# Multi-cavity Field Control (MFC) Module Description

Low Level RF Group



July 2009

## Introduction

The MFC board is an FPGA-based, 33-channel down-conversion and signal processing module designed for vector control of multiple cavities driven by a single klystron, such as the proposed RF scheme for the ILC. A block diagram of the main components of the board is shown in Fig. 1 and a picture of the board in Fig. 2. There are 4 DAC channels for RF outputs, and multiple high-speed serial transceivers on the front panel and the backplane bus allow a flexible architecture for inter-module real-time data exchange. A floating-point DSP provides additional computational capability for calibration and for implementing more complex control algorithms. Both the FPGA and the DSP have external SDRAM for waveform and diagnostic data storage. Non-volatile Flash memory stores the DSP program and the FPGA configuration.

The interface CPLD supports the VXI bus protocol for communication with a Slot 0 CPU, which provides Ethernet connections for the control system interface, for remote in-system programming of the FPGA and DSP, and for data acquisition and diagnostics.

#### Features

- 32 12-bit, 65 MS/s ADC input channels including 2 DC coupled channels
- 1 14-bit, 105 MS/s ADC input channel
- 4 14-bit, 260 MS/s DAC channels configurable as AC or DC coupled
- 1 8-output clock divider chip with a 1.6 GHz max external clock input
- 1 External Aux clock input (LVCMOS) to FPGA and/or DAC
- 2 Front panel TTL trigger inputs
- All Linear regulators for low board noise
- Optional DC/DC converter for higher FPGA current (>3A) applications
- Cyclone II FPGA with 68000 Logic elements and 300 9-bit multipliers, 400 MHz
- 2 Front panel and 2 backplane serial ports 600 Mb/s each
- 400 MHz, 32-bit floating point DSP with 5 Serial Ports, SPI and UART ports
- 64 MB FPGA SDRAM and 16 MB DSP SDRAM for waveform capture
- 16 MB Flash memory for FPGA and DSP code storage
- Remote download of programs to Flash through Ethernet.
- Max II CPLD for VXI interface to backplane







Fig. 2 MFC Board

## 1) Signal I/O

#### a) 12-bit ADC

There are 32 RF input channels connected to four 8-channel, 12-bit, 65 MS/s ADCs (AD9222) through 4 Harting Minicoax connectors. 30 of these channels are AC coupled through a center-tapped differential transformer (1:4 impedance ratio) with a broadband filter (>70 MHz) and an impedance matching network. The other 2 channels are DC coupled with AD8138 differential buffers. The full-scale signal amplitude for all 32 channels is 1 Vpp (4 dBm) into a 50 ohm termination. The ADCs have the following features:

- 114 mW power dissipation per channel at 65 MHz
- SNR = 70 dB (to Nyquist)
- SFDR = 80 dBc
- Serial LVDS outputs with Data and Frame clocks
- Digital SPI port for ADC settings such as
  - Full chip or individual channel power down modes
  - Custom and built-in test pattern generation
  - Programmable data and clock alignment
  - Flexible bit orientation
  - Programmable output resolution

#### ADC Performance

A 13 MHz single-tone input was sampled at Fs = 61.9 MHz. A 64k-point FFT with a Hanning window was applied to the acquired data. The results are shown in Fig. 3.

A two-tone signal with 1 MHz spacing, centered at 13 MHz and sampled at Fs = 61.9 MHz, is shown in Fig. 4.

Cross-talk measurements between the 8 channels of a single ADC are shown in Fig. 5. The measurements were made by applying a full-scale 10 MHz signal to 7 of the 8 channels while measuring the noise on the 8th channel.
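The analysis step described above (Hanning window followed by an FFT) can be sketched on synthetic data. The block below is only an illustration under stated assumptions: it uses an ideal 12-bit quantizer instead of acquired ADC data, a 4096-point record instead of 64k points to keep pure Python fast, and a hand-written radix-2 FFT. None of the function names come from the MFC software.

```python
import cmath
import math

FS = 61.9e6      # sample rate, from the text
F_IN = 13.0e6    # input tone, from the text
N = 4096         # assumption: shorter than the 64k-point record in the text
BITS = 12        # resolution of the AD9222 channels


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def quantize(v, bits=BITS):
    """Round a value in [-1, 1) to a signed integer code (ideal ADC model)."""
    full = 2 ** (bits - 1)
    return max(-full, min(full - 1, round(v * full)))


def spectrum_dbfs():
    """Hann-windowed spectrum of a synthetic tone, in dB relative to full scale."""
    samples = [quantize(0.99 * math.sin(2 * math.pi * F_IN * i / FS))
               for i in range(N)]
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
    bins = fft([s * w for s, w in zip(samples, window)])
    # A full-scale sine centred on a bin has magnitude full * N / 4:
    # the Hann coherent gain is 0.5 and a real tone splits over two bins.
    ref = (2 ** (BITS - 1)) * N / 4
    return [20 * math.log10(max(abs(b), 1e-12) / ref) for b in bins[: N // 2]]


def peak_frequency(spec):
    """Frequency of the strongest bin in Hz."""
    k = max(range(len(spec)), key=lambda i: spec[i])
    return k * FS / N
```

Calling `peak_frequency(spectrum_dbfs())` returns the bin nearest 13 MHz. The measured noise floor and spurs in Fig. 3 depend on the real hardware and are not reproduced by this ideal model.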







Fig. 4 1 MHz apart Twin-Tone centered at 13 MHz with Fs = 61.9 MHz



Fig. 5 Cross-talk in the 8-channel ADCs with Fin = 10 MHz and Fs = 61.9 MHz

#### b) 14-bit ADC

RF input channel 33 is connected to a 14-bit, 105 MS/s ADC (AD6645) through a Harting Minicoax connector, AC coupled by a center-tapped differential transformer (1:4 impedance ratio) and an impedance matching network. The AD6645 is a single-channel, high-performance ADC with parallel CMOS outputs and low latency (3.5 cycles). The full-scale signal amplitude is 1.1 Vpp (4.8 dBm) into a 50 ohm termination. The ADC has the following features:

- SNR = 75 dB (to Nyquist)
- SFDR = 89 dBc, Fin = 70 MHz
- Sampling jitter 0.1 ps
- Parallel CMOS outputs with Data Ready and Over Range bits
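The full-scale levels quoted for the two ADC types (1 Vpp as 4 dBm and 1.1 Vpp as 4.8 dBm, both into 50 ohm) follow from the usual sine-wave power relation. A minimal check, assuming a pure sine wave and using a helper name (`vpp_to_dbm`) that is introduced here for illustration only:

```python
import math


def vpp_to_dbm(vpp, r_ohm=50.0):
    """Power in dBm of a sine wave of peak-to-peak amplitude vpp into r_ohm."""
    v_rms = vpp / (2.0 * math.sqrt(2.0))   # sine wave: Vrms = Vpp / (2 * sqrt(2))
    p_watt = v_rms ** 2 / r_ohm
    return 10.0 * math.log10(p_watt / 1e-3)


if __name__ == "__main__":
    for vpp in (1.0, 1.1):
        print(f"{vpp:.1f} Vpp -> {vpp_to_dbm(vpp):.2f} dBm")
```

The two results round to the 4 dBm and 4.8 dBm figures given in the text.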

#### ADC Performance

A 13 MHz single-tone input was sampled at Fs = 61.9 MHz. A 64k-point FFT with a Hanning window was applied to the acquired data. The results are shown in Fig. 6. A two-tone signal with 1 MHz spacing, centered at 13 MHz and sampled at Fs = 61.9 MHz, is shown in Fig. 7. The spectrum of the signal source for the two-tone test is shown in Fig. 8.







Fig. 7 1 MHz apart Twin-Tone centered at 13 MHz with Fs = 61.9 MHz



Fig. 8 Spectrum of two-tone signal source

#### c) 14-bit DAC

There are 4 DAC output channels from two dual-output, 14-bit, 260 MSPS DACs (ISL5927). The outputs are AC coupled (DC coupling is available as an option) through a 1:1 transformer and connected through a Harting Minicoax connector. Full-scale output is 1 Vpp into a 50 ohm load. The DACs have the following features:

- 233 mW power dissipation at 130 MSPS
- SFDR = 70 dBc (to Nyquist), Fs = 130 MSPS, Fout = 10 MHz
- Adjustable Full Scale Output Current from 2 to 20 mA
- 3V LVCMOS compatible inputs
- Voltage compliance of +1.25V and -1V

A single 13 MHz tone, generated within the FPGA by an NCO clocked at 65 MHz, was used to measure the DAC output spectrum shown in Fig. 9 to Fig. 12.
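The test tone can be pictured as a phase-accumulator NCO feeding a sine lookup table. The sketch below is a software model only: the 32-bit accumulator width and the 256-entry table are assumptions made for the example, and the FPGA implementation is not described in this document.

```python
import math

PHASE_BITS = 32   # phase accumulator width (assumption)
TABLE_BITS = 8    # 256-entry sine table (assumption)
DAC_BITS = 14     # DAC resolution, from the text

FULL_SCALE = 2 ** (DAC_BITS - 1) - 1
SINE_TABLE = [
    round(FULL_SCALE * math.sin(2 * math.pi * i / 2 ** TABLE_BITS))
    for i in range(2 ** TABLE_BITS)
]


def tuning_word(f_out_hz, f_clk_hz):
    """Phase increment per clock so that the accumulator wraps f_out times per second."""
    return round(f_out_hz / f_clk_hz * 2 ** PHASE_BITS)


def nco_samples(f_out_hz, f_clk_hz, count):
    """Return `count` signed DAC codes of a sine at f_out, one per clock."""
    step = tuning_word(f_out_hz, f_clk_hz)
    mask = 2 ** PHASE_BITS - 1
    phase = 0
    samples = []
    for _ in range(count):
        # The top TABLE_BITS bits of the phase select the table entry.
        samples.append(SINE_TABLE[phase >> (PHASE_BITS - TABLE_BITS)])
        phase = (phase + step) & mask
    return samples
```

With a 13 MHz output and a 65 MHz clock the tuning word is one fifth of the accumulator range, so the tone repeats every five samples apart from a small truncation error in the tuning word.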
















Fig. 12 DAC Phase Noise Measurement (65 MSPS)

Cross-talk measurements (dBm) between the DAC channels are shown in Table 1.

| DAC A | DAC B | DAC C | DAC D |
|-------|-------|-------|-------|
| -3.3  | -71.5 | -90.5 | -91.5 |
| -72.3 | -3.4  | -74.4 | -88.2 |
| -89.8 | -74.3 | -3.42 | -76.9 |
| -91.2 | -88.9 | -76.4 | -3.45 |

Table 1. Cross talk between the DAC channels (dBm)
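The absolute levels in Table 1 can be turned into relative isolation figures. The sketch below assumes that each row is one measurement in which the channel on the diagonal is driven and the other three are idle; the table itself does not label its rows, so this reading is an assumption, and the function names are illustrative.

```python
CHANNELS = ["A", "B", "C", "D"]

# Values copied from Table 1 (dBm).
TABLE_1_DBM = [
    [-3.3, -71.5, -90.5, -91.5],
    [-72.3, -3.4, -74.4, -88.2],
    [-89.8, -74.3, -3.42, -76.9],
    [-91.2, -88.9, -76.4, -3.45],
]


def isolation_db(table):
    """Off-diagonal level minus the diagonal level of the same row, in dB."""
    result = {}
    for r, row in enumerate(table):
        for c, level in enumerate(row):
            if r != c:
                result[(CHANNELS[r], CHANNELS[c])] = level - row[r]
    return result


def worst_case(table):
    """Return the channel pair with the least isolation and its value in dB."""
    iso = isolation_db(table)
    pair = max(iso, key=iso.get)
    return pair, iso[pair]
```

Under that assumption the weakest isolation is between DAC A and DAC B, the two outputs that would share one dual DAC chip if the channels are paired in order.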

## 2) Clock Distribution and Synchronization

Clock distribution is provided with an 8-output, 2-input, clock divider and distribution chip (AD9510). The AD9510 has the following features

- 2 Clock inputs, up to 1.6GHz
- 8 Programmable dividers, 1 to 32, all integers
- Additive output jitter < 275 fs rms
- Adjustable Phase on outputs
- Digital SPI port for clock settings
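The divider range in the feature list fixes which output rates can be derived from a given input clock. A small sketch, in which the function names are hypothetical and the example input frequencies in the usage note are chosen only for illustration:

```python
MAX_INPUT_HZ = 1.6e9        # maximum clock input, from the feature list
DIVIDERS = range(1, 33)     # integer dividers 1 to 32, from the feature list


def divided_clock(f_in_hz, divider):
    """Output frequency for one AD9510 output with the given integer divider."""
    if f_in_hz > MAX_INPUT_HZ:
        raise ValueError("input clock above 1.6 GHz")
    if divider not in DIVIDERS:
        raise ValueError("divider must be an integer from 1 to 32")
    return f_in_hz / divider


def closest_divider(f_in_hz, f_target_hz):
    """Divider whose output frequency is nearest to the requested target."""
    return min(DIVIDERS,
               key=lambda d: abs(divided_clock(f_in_hz, d) - f_target_hz))
```

For example, `closest_divider(1313e6, 65e6)` returns 20, and a 65 MHz input with a divider of 1 passes through unchanged. Which input frequency and divider values are actually programmed on the board is not stated here.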

One of the clock inputs is connected to the front panel through an SMA connector for external clock inputs. The second clock input is connected to an on-board 65 MHz crystal oscillator for testing and debugging purposes.

The following clock signals are provided by the AD9510 clock distribution chip.

- 1 clock signal for each of the four 8-channel ADCs (AD9222),
- 1 clock signal for the single-channel fast ADC (AD6645),
- 1 clock signal for the main FPGA (Altera Cyclone II)
- 1 clock signal for each of the two 2-channel DACs (ISL5927)

The AD9510 provides 8 programmable outputs: 4 PECL and 4 LVDS/CMOS. The FPGA and the 12-bit ADCs can accept either LVDS or PECL clock inputs, the 14-bit ADC requires a differential input, and the DACs require an LVCMOS clock input. For uniformity, the four 12-bit ADCs are therefore driven by PECL clocks, the two outputs devoted to the DACs use the CMOS mode, and the two remaining outputs provide LVDS clock signals for the FPGA and the 14-bit ADC, as shown in Fig. 13.

The PECL outputs of the AD9510 offer the best jitter performance; however, no part in that family provides PECL-only outputs.



Fig. 13 Clock Distribution

The AD9510 is programmed through its SPI port by the DSP, which is independently clocked and will come up before the clock distribution. The AD9510 has two clock inputs (CLK1 and CLK2). The default clock input is CLK1. To avoid sending uncontrolled clock signals to the ADCs before the AD9510 is programmed, the LO clock input is routed to the non-default input (CLK2).

Clock synchronization between different MFC boards is achieved using the AD9510 FUNCTION pin as illustrated in Fig. 14.



Fig. 14 Clock synchronization between three MFC modules

The booting sequence is as follows:

- a) Before booting, the AD9510 function pin is kept low by the MAX II (default). The default role of the function pin (FCT) is reset, so keeping this pin low inhibits all outputs.
- b) After power-up, as the MAX II comes up and is configured, it brings the function pin high (this could be triggered externally by the slot 0). The outputs of the AD9510 become active but do not send any signal, because the LO is connected to the non-default clock input (CLK2).
- c) Now that the AD9510 is active, the DSP can configure it, setting the correct divider values and changing the role of the function pin to a synchronization trigger input.
- d) After all boards are up and their clock distribution chips are properly configured, the MAX II sends a negative pulse (3.3 V to 0 V) and then holds the function pin high at all subsequent times. This same SYNC pulse is propagated from one module to another through the local buses, so that all boards are synchronized at the same time, in a consistent way.

With this synchronization scheme, the output clocks of two AD9510s are synchronized to within one LO clock period (one period at 1313 MHz is about 762 ps).
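The ordering constraints in steps a) to d) can be made explicit with a toy model. Everything in the block below is a sketch under assumptions: the class and method names are invented, the chip is treated as unprogrammable while held in reset, and selecting CLK2 is assumed to be part of the DSP configuration in step c).

```python
class Ad9510Model:
    """Toy model of the clock chip states named in steps a) to d)."""

    def __init__(self):
        self.function_pin = 0          # a) held low by the MAX II before boot
        self.function_role = "reset"   # a) default role of the function pin
        self.clock_input = "CLK1"      # default input; the LO is wired to CLK2
        self.dividers_set = False
        self.synchronized = False

    def held_in_reset(self):
        return self.function_role == "reset" and self.function_pin == 0

    def outputs_toggling(self):
        # A clock appears only when out of reset and using the input that has the LO.
        return not self.held_in_reset() and self.clock_input == "CLK2"

    def drive_function_pin(self, level):
        falling_edge = self.function_pin == 1 and level == 0
        self.function_pin = level
        if falling_edge and self.function_role == "sync":
            self.synchronized = True   # d) dividers restart on the SYNC pulse

    def configure_over_spi(self):
        if self.held_in_reset():
            raise RuntimeError("chip is held in reset and cannot be programmed")
        self.dividers_set = True
        self.clock_input = "CLK2"      # assumption: the DSP selects the LO input here
        self.function_role = "sync"    # c) function pin becomes a sync trigger


def boot(boards):
    """Run steps b) to d) on every board, in order."""
    for b in boards:
        b.drive_function_pin(1)        # b) MAX II releases reset
    for b in boards:
        b.configure_over_spi()         # c) DSP programs the chip
    for b in boards:
        b.drive_function_pin(0)        # d) shared negative SYNC pulse
    for b in boards:
        b.drive_function_pin(1)        #    pin held high afterwards
```

Calling `boot([Ad9510Model(), Ad9510Model(), Ad9510Model()])` leaves all three models synchronized with their outputs running, while calling `configure_over_spi()` on a fresh model raises an error because step b) has not happened yet.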

## 3) Signal Processing FPGA

The primary signal processing functions of this board are performed in a Cyclone II FPGA from Altera. The device has the following features

- 68000 Logic Elements and 1.5 Mbits of RAM
- 300 x 9-bit multipliers
- 672 pins with 472 user I/O
- Maximum clock speed of over 400 MHz
- 4 PLLs

A 64 MB, 130 MHz SDRAM is connected to the FPGA for waveform storage and other diagnostic storage purposes. Data can be transferred through the SDRAM interface or may be read directly by the slot0 CPU or the DSP through the host interface.

Fig. 15 shows an example signal processing application (Multi-cavity control for ILC) implemented in the FPGA.

Serial data from the 32 RF channels is converted to 12-bit parallel words in the Serial-Parallel latch. Down-conversion is performed by multiplying the data with an 18-bit scaled and offset cosine/sine table to provide a composite gain plus rotation. The tables are 256 deep and can be written by the DSP or the slot0 CPU through the host interface. The 24 I,Q pairs corresponding to the cavity fields are summed for vector processing. A pair of 5th-order CIC filters completes the signal processing before the feedback error is computed. A gain and klystron linearizer multiplier table provides loop gain. A feedforward input is added, and the output from the fast klystron loop is summed in before up-conversion to the IF frequency. The I and Q signals are output to the external modulator through a dual-channel 14-bit DAC.
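The table-based down-conversion and vector sum can be illustrated in software. This is a sketch under several assumptions, not the FPGA design: the IF is taken to be exactly one fifth of the sample rate, only 255 of the 256 table entries are used so that the table holds whole IF periods, a plain average stands in for the CIC filters, and the sign convention for Q is chosen for the example.

```python
import math

SAMPLES_PER_PERIOD = 5       # IF period in samples (assumption)
TABLE_LENGTH = 255           # entries used out of the 256-deep table (assumption)
TABLE_SCALE = 2 ** 17 - 1    # largest positive signed 18-bit value


def make_tables(gain=1.0, rotation_rad=0.0):
    """Cosine/sine tables carrying a composite gain (|gain| <= 1) and rotation."""
    cos_table, sin_table = [], []
    for n in range(TABLE_LENGTH):
        phase = 2 * math.pi * n / SAMPLES_PER_PERIOD + rotation_rad
        cos_table.append(round(gain * TABLE_SCALE * math.cos(phase)))
        sin_table.append(round(gain * TABLE_SCALE * math.sin(phase)))
    return cos_table, sin_table


def downconvert(samples, cos_table, sin_table):
    """Multiply by the tables and average; returns (I, Q) in ADC code units."""
    n = len(cos_table)
    i_acc = sum(s * c for s, c in zip(samples, cos_table))
    q_acc = -sum(s * c for s, c in zip(samples, sin_table))
    norm = 2.0 / (n * TABLE_SCALE)
    return i_acc * norm, q_acc * norm


def cavity_samples(amplitude_codes, phase_rad):
    """Synthetic 12-bit ADC codes for one cavity probe signal at the IF."""
    return [
        round(amplitude_codes
              * math.cos(2 * math.pi * n / SAMPLES_PER_PERIOD + phase_rad))
        for n in range(TABLE_LENGTH)
    ]


def vector_sum(iq_pairs):
    """Sum the I and Q components over all cavities."""
    return sum(i for i, _ in iq_pairs), sum(q for _, q in iq_pairs)
```

A signal of 1000 codes at 30 degrees comes back as roughly I = 866 and Q = 500 with unrotated tables. Loading tables with `rotation_rad=math.radians(30)` rotates the same signal onto the I axis, which is the kind of per-channel calibration the writable tables make possible.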

Reference signals for the beam and cavity phase (one for each cryo-module with 8 cavities each) are processed in 4 of the auxiliary channels. Phase computation is done in the DSP and the corresponding I and Q set-point tables are updated.

Extensive diagnostics are available at various points along the signal chain. The diagnostic buffers are configured as 1024 x 18-bit blocks. There are 2 double-data-line LVDS serial ports (1.2 Gb/s) connected to the front panel and 2 single-data-line LVDS serial ports (600 Mb/s) connected to the VXI backplane for high-speed inter-module data transfers. Two double-data-line serial ports (200 Mb/s) are connected to the DSP for data transfers independent of the local parallel bus.

For test and debugging purposes, 22 pins are connected to a logic analyzer connector pad compatible with the Tektronix 34-channel logic analyzer probe P6960.

There are three ways to load the FPGA configuration file. The first is directly through the FPGA JTAG connector with the Quartus programmer. The second is through a serial configuration chip, which can be programmed using a second JTAG connector; the FPGA configuration is then loaded automatically from the configuration chip at power-up. The third method is to load the configuration from the Flash memory chip connected to the local parallel bus. In this scheme, a parallel flash loader module in the MaxII interface chip reads the configuration from the Flash over the parallel bus and loads the FPGA in the passive serial mode.

The third method provides the maximum flexibility, as the FPGA configuration can be written remotely over Ethernet through the slot0 controller and the VXI interface. Multiple versions of the configuration (up to 7) can be stored in the Flash, and slot0 commands can be used to select the version loaded and to initiate reconfiguration. DIP switches determine whether the Flash or the serial configuration chip is used.






## 4) DSP

A 32-bit, 400 MHz, floating point SHARC DSP (ADSP-21369) is connected to the local 32-bit parallel bus as shown in Fig. 16. The DSP through its various peripherals provides the flexibility to perform a variety of tasks in addition to the computational support to the FPGA signal processing functions. The DSP has the following features

- 2 Mb on-chip SRAM for program and data storage
- 16 MB external 32-bit SDRAM with an on board SDRAM controller
- 8 Serial Ports (5 used)
- 1 SPI port and 1 UART port
- External port with 32-bit data and 24-bit address lines to connect to the parallel bus
- 34 DMA channels
- 2.4 GFLOPS at 400 MHz

At power-up the DSP loads its program from the external Flash chip using an internal bootloader. During initialization the DSP configures the 5 devices that have SPI ports (the 4 ADCs and the clock chip). When the DSP initialization is complete, it signals the MaxII interface chip with a flag bit to proceed with the FPGA configuration. This prevents contention on the parallel bus at power-up for access to the Flash chip, where both the DSP program and the FPGA configuration are stored.

The devices connected to the parallel bus are shown in Fig. 16. The DSP can read and write to all the devices shown as well as to the FPGA SDRAM through the Cyclone host interface. The MaxII can directly read and write to the FPGA (including its SDRAM) and the Flash memory. Access to the DSP internal memory and the DSP SDRAM is obtained indirectly because the DSP does not have a host port for external access. There are eight hardware interrupts from the MaxII chip to the DSP. Two of these interrupts are used to signal a read or write request. The DSP responds to the interrupt by reading the address and data from the MaxII chip and writing the requested data back to the MaxII, which triggers the data acknowledge signal DTACK for completing the slot0 request.

The DSP program can be stored on the Flash chip remotely through the Ethernet and VXI interface, in the same way as the FPGA configuration. The DSP program can also be written to the Flash with the MFCProgramFlash DSP program, which is run with the USB-ICE JTAG DSP emulator.

The UART connections are available on the front panel as an additional debug port to the DSP and indirectly to the rest of the board. A LabVIEW GUI is used along with a command parser in the DSP program to do simple read and write operations to various memory locations on the board. Two spare I/O lines are connected to the FPGA and two to the MaxII chip for hardware flags or triggers.

A memory map of the board is shown in Fig. 17. The left side shows the memory space from the DSP while the right hand side shows the parts of the local memory that are accessible from the VXI A32 address space allocated for the board (64MB). The DSP SDRAM has dedicated control lines from the DSP SDRAM controller and can therefore be directly accessed only by the DSP. The maximum speed for SDRAM access is 166 MHz.

External port operations from the DSP use the handshake signal ACK along with wait states to address the different speeds of the devices on the parallel port. When a device receives a read or write request it can pull the ACK line low, causing the DSP to wait until it has completed the operation at which point it can release the ACK line.

Tasks for the DSP from slot0 are executed using a vector interrupt scheme that uses one of the eight hardware interrupts and nine 32-bit registers in the MaxII interface chip. Eight message registers carry input and output parameters, and the VIRPT register holds the address of the vector interrupt subroutine in the DSP program. As an example, the slot0 test function MfcAdd2 passes two numbers to the DSP, which returns their sum. The slot0 first writes all the parameters needed by the DSP subroutine (two in this case) to the message registers. It then writes the address of the function to the VIRPT register, which raises the hardware interrupt to the DSP. The DSP responds by reading the address of the routine from the VIRPT register and branching to that routine, which reads the needed parameters from the message registers, carries out the task and writes any outputs to the message register(s). Finally, the DSP writes 0 to the VIRPT register. The slot0 controller polls the VIRPT register to confirm completion, at which point it can read any outputs from the message register(s).
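The MfcAdd2 exchange can be followed in a small software model of the mailbox. The class names, the routine address `ADDR_MFC_ADD2` and the choice of message register 0 for the result are all hypothetical; the real register map and DSP code are not given in this document.

```python
MASK32 = 0xFFFFFFFF
ADDR_MFC_ADD2 = 0x0000A000   # hypothetical address of the DSP routine


class MaxIIRegisters:
    """Eight message registers plus VIRPT, as seen by slot0 and by the DSP."""

    def __init__(self, interrupt_handler):
        self.message = [0] * 8
        self.virpt = 0
        self._interrupt_handler = interrupt_handler

    def write_virpt_from_slot0(self, address):
        self.virpt = address & MASK32
        self._interrupt_handler(self)   # models the hardware interrupt to the DSP


class Dsp:
    def __init__(self):
        self.routines = {ADDR_MFC_ADD2: self.mfc_add2}

    def on_vector_interrupt(self, regs):
        routine = self.routines[regs.virpt]   # read the routine address
        routine(regs)                         # branch to it
        regs.virpt = 0                        # writing 0 signals completion

    @staticmethod
    def mfc_add2(regs):
        # Assumption: inputs in message[0] and message[1], sum returned in message[0].
        regs.message[0] = (regs.message[0] + regs.message[1]) & MASK32


def slot0_call(regs, address, params, max_polls=1000):
    """Write parameters, trigger the routine, poll VIRPT, return the message registers."""
    for index, value in enumerate(params):
        regs.message[index] = value & MASK32
    regs.write_virpt_from_slot0(address)
    for _ in range(max_polls):
        if regs.virpt == 0:
            return list(regs.message)
    raise TimeoutError("DSP did not clear VIRPT")
```

In this model the interrupt is serviced synchronously, so the first poll already sees VIRPT cleared; on hardware the slot0 would poll until the DSP finishes.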

There are two hardware interrupts from the MaxII chip that are used in the bus control scheme described in the next section. An additional hardware interrupt line is connected to an external trigger input (Start Trigger) through the front panel Minicoax connector.

There are 5 serial ports used: two to the FPGA, two to the backplane and one to the front panel. All the serial ports have dual data lines and are bidirectional, with a maximum data rate of 100 Mb/s each. Each serial port can be attached to DMA channels on the DSP so that data transfers can take place without interrupting the operations of the core. The serial ports to the FPGA can be used to bring data in to the DSP, and the computed parameters can be written to the FPGA using the parallel bus, allowing a simple loop architecture for continuous signal processing schemes.



Fig. 16 MFC Data Transfer Control

Fig. 17 MFC Memory Map





## 5) VXI Interface Chip

An Altera MaxII CPLD (EPM2210F324C5) is used to provide the VXI interface.

The MaxII has the following features

- Instant on non-volatile architecture using on board Flash memory
- 2210 Logic elements and 324 pins
- User Flash Memory (UFM) area of up to 8 kbits
- Maximum clock rate of 300 MHz

In addition to the VXI interface, the MaxII provides multiplexing logic for the DSP SPI port control, the Parallel Flash Loader that controls the configuration of the FPGA, logic for driving the front panel LEDs, the watchdog chip and local reset control, and 12 pins to the logic analyzer pad. There are two DIP switches for user digital inputs.

#### VXI Interface

A register-based, dynamically configured (DC) A32/D32 VXI interface is implemented in VHDL. The interface includes six 16-bit configuration registers and a state machine to handle the data read/write protocol. The six configuration registers are used during initialization to provide the slot 0 controller with information such as Device ID, Manufacturer ID and memory requirement. An 8-bit value called the Logical Address is also provided; a value of 0xFF indicates a dynamic configuration device, and any other value makes it a static configuration (SC) device. For DC devices, the slot 0 controller determines the logical address. 64 bytes in A16 space are allocated to each device, with the base address given by $49152 + 64 \times V$, where $V$ is the logical address. The six 16-bit configuration registers are mapped to the first 8 bytes starting at this base address. They are the Manufacturer ID, Device type, Status, Offset, Logical Address and Control registers, as shown in Fig. 18.
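The A16 base address rule can be checked with a few lines of code. The function and constant names are illustrative; only the numbers 49152, 64 and 0xFF come from the text.

```python
A16_CONFIG_START = 49152          # 0xC000, start of the configuration area
BYTES_PER_DEVICE = 64             # A16 bytes allocated to each device
DYNAMIC_LOGICAL_ADDRESS = 0xFF    # marks a dynamically configured device


def a16_base_address(logical_address):
    """Base address of a device's 64-byte block in A16 space."""
    if not 0 <= logical_address <= 0xFF:
        raise ValueError("logical address must fit in 8 bits")
    return A16_CONFIG_START + BYTES_PER_DEVICE * logical_address


def is_dynamically_configured(logical_address):
    return logical_address == DYNAMIC_LOGICAL_ADDRESS


if __name__ == "__main__":
    for v in (0x00, 0x01, 0xFF):
        print(f"logical address 0x{v:02X} -> base 0x{a16_base_address(v):04X}")
```

An unconfigured DC device (logical address 0xFF) therefore starts at 0xFFC0, the last 64-byte block of the 64 KB A16 space.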

For a DC device, during initialization when the A16 base address is not yet assigned, the logical address is passed to the Resource Manager through the Offset register, which is initially mapped to a base address corresponding to a logical address of 0xFF. Since all the cards start off with this same base address, the MODID line is used to single out one card at a time. Once the Resource Manager has assigned a logical address, it will write the new base address to the logical address configuration register. All subsequent reads and writes to this device use this new base address and the MODID line is ignored.

In the Offset register, the base address of the device's A32 memory is written by the Resource Manager based on the memory requirement indicated in the Device type register. The contents of this Offset register are used to do address decoding for any reads and writes to the A32 memory of the device. This process is repeated by the Resource Manager until all slots have been configured.



Fig. 18 MFC VXI Interface

Fermilab



#### Bus sharing scheme between DSP and MaxII (Slot0)

The data transfer state machine is shown in Fig. 19. For slot0 accesses to the DSP internal memory or DSP SDRAM, it is necessary for the DSP to be directly involved in the data transfer. The MaxII first sends an interrupt to the DSP, signaling a read or write request. The DSP reads the address and data copied into MaxII registers, decodes the address, performs the read or write and writes the data to the data register in the MaxII. This signals the completion, and the MaxII can conclude the slot0 transaction with the DTACK handshake signal. Access to the Flash and the FPGA memory directly from the MaxII does not involve the DSP. However, a definite scheme for sharing the local bus mastership between the DSP and the MaxII is required.

In order to minimize disruption to the DSP and to reduce the overhead in the slot0, a dual scheme for interchanging bus mastership is adopted. Short accesses to a few registers, for a parameter change for instance, are handled differently from a sustained transfer of a block of memory, such as downloading or uploading tables and waveforms or a Flash write sequence.

For short accesses, the process is transparent to the slot0. Each access results in two interrupts to the DSP from the MaxII: the first signals it to release the bus and the second indicates that it may resume bus mastership. The DSP acknowledges through the existing DSP BRQ output to the MaxII. Specifically, when there is an access to the Flash or the FPGA, the interface state machine first checks whether the semaphore register indicates slot0 control. If it does, the DSP BRQ line is checked to see if it is LOW, which indicates that the DSP external accesses are suspended. If the semaphore register indicates DSP control, an interrupt is sent to the DSP to release the bus, and the state machine then waits for the BRQ line to go LOW. Once BRQ goes LOW, the transfer can proceed. Upon completion of a transfer, another interrupt is sent to the DSP to resume bus mastership, unless the semaphore register indicates slot0 control, which corresponds to an extended access. Thus for an extended access, bus mastership remains with the MaxII.

The semaphore register is written to only by the slot0. An extended access requires two writes to the semaphore register, one at the beginning and one at the end of the access. Block transfers should be treated as extended access. For slot0 accesses to the DSP internal memory or DSP SDRAM it does not make sense to use extended access as described because the transfer involves the DSP directly which must keep bus mastership.
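The difference between short and extended accesses can be seen by counting DSP interrupts in a simplified model of the scheme described above. The model rests on assumptions that the text does not settle: the DSP is taken to respond to each interrupt immediately, the first access of an extended block is assumed to trigger the release interrupt if BRQ is not yet LOW, and clearing the semaphore is assumed to trigger the resume interrupt. All names are hypothetical.

```python
class LocalBus:
    """Simplified model of bus mastership between the DSP and the MaxII."""

    def __init__(self):
        self.semaphore_slot0 = False   # written only by slot0
        self.brq_low = False           # True while DSP external accesses are suspended
        self.interrupts = []           # record of interrupts sent to the DSP

    def _interrupt_release(self):
        self.interrupts.append("release")
        self.brq_low = True            # the modelled DSP responds at once

    def _interrupt_resume(self):
        self.interrupts.append("resume")
        self.brq_low = False

    def slot0_write_semaphore(self, slot0_control):
        self.semaphore_slot0 = slot0_control
        if not slot0_control and self.brq_low:
            self._interrupt_resume()   # assumption: end of an extended access

    def maxii_access(self, transfer):
        """One Flash or FPGA access issued by the MaxII on behalf of slot0."""
        if not self.brq_low:
            # DSP control, or (assumption) the first access of an extended block.
            self._interrupt_release()
        result = transfer()            # BRQ is LOW here, the transfer proceeds
        if not self.semaphore_slot0:
            self._interrupt_resume()   # short access: hand the bus back
        return result
```

A single short access records one release and one resume interrupt. Three accesses wrapped in `slot0_write_semaphore(True)` and `slot0_write_semaphore(False)` also record only one of each, which is the overhead saving the extended mode is meant to provide for block transfers.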

## 6) Power Supply

All power to the board is provided by linear regulators which are driven from the +5V VXI crate power supply. The only other backplane power used is a -5V connection for the AD8138 differential buffer amplifiers for the two DC coupled RF inputs. A 3A regulator is used for the I/O power supply to the FPGA. Depending upon the resource usage in the FPGA, this current requirement can be in excess of 5A. For this purpose an optional 6A DC/DC converter can be substituted for the linear regulator.