S. Böttcher, J. Parsons, S. Simion, W. Sippach
Columbia University / Nevis Labs
The DSP processing unit (PU) is part of the ATLAS calorimeter read-out driver (ROD), responsible for receiving the digitized samples from the FEB, processing these data, and making the processed data available to the DAQ [1]. The data processing includes the linear filtering of the digitized samples and computation of the energy, time, and pulse shape quality for all channels. The PU is also responsible for data monitoring for every channel.
This note describes the PU prototype built around a TMS320C6202 DSP. A general description of the board is given, followed by more detailed information on how the board is operated (hardware guide). Software issues, including the implementation of the physics algorithms on this board, are not discussed here; they will be addressed in a separate document.
The PU (Figure 1) is organized around a 250 MHz TMS320C6202 DSP with 256K Bytes of internal program memory and 128K Bytes of data memory. Input data are provided through the input FPGA in charge of receiving the FEB data and TTC information, data format consistency checking, and some data rearrangement. The events are then buffered into a dual-port memory, available to the DSP as read-only external asynchronous RAM. This dual-port RAM also serves for initializing the DSP internal program and data memory, at reset. After event processing, the DSP writes the output data in the final format, to the output FIFO, ready to be read by the ROD motherboard.
The board is controlled by the output FPGA, which implements the PU interfacing to the ROD motherboard. The output FPGA is responsible for booting the PU board by configuring the input FPGA, then loading the DSP code. It also provides a second FIFO (for histogramming purposes) which can be written to by the DSP, and read from by the ROD motherboard. Finally, the output FPGA provides a two-way serial communication with the DSP, and a few other control signals.
DSP: TMS320C6202 @ 250 MHz
Dual-port RAM: CY7C057V, 12 ns, 32K × 36 (used as 32K × 32)
Output FIFO: IDT72V3680, 16K × 36 (used as 16K × 32)
This section describes only those peripherals which are relevant for the prototype board.
The dual-port memory is the only device connected to the External Memory Interface (EMIF). The right port of this device is available to the DSP as read-only asynchronous RAM, and can be accessed directly via LOADs or by DMA transfers. Note that all four CEn spaces will in fact address the same physical device. The EMIF timings are normally configured (immediately or soon after boot) to the fastest settings supported by the dual-port RAM.
Please refer to Section 10.3 of the Peripherals Guide [5], describing the memory map. On this board, the DSP boots in MAP1 mode.
The expansion bus (XBus) is configured in synchronous FIFO mode. It provides the high throughput necessary for event output, as well as for histogramming and for test purposes. The XBus can only be written by DMA transfers; it is not possible to use STOREs. The XCE3 space corresponds to the event output FIFO, while XCE2 corresponds to a second FIFO inside the output FPGA. In addition to the XBus signals, the following synchronization signals can be used:
Before writing events to the output FIFO, a few registers must be set up appropriately, as shown in Table 2. These are the XBus global control register, DMA primary control register, DMA secondary control register, and DMA destination address register. In addition, the DMA source address register must be set up before each transfer. The DMA channel 2 should in principle be used for writing out events, since the DMAC2 signal increments the FPGA event counter.
Using these settings, the transfer does not proceed if the event FIFO is almost full (fewer than 1024 words free). You must not issue a second DMA transfer before the previous transfer has completed. This can be checked by reading the STATUS field of the DMA primary control register. For details, please refer to Chapters 5 and 8 of the Peripherals Guide [5], and to the DMA Performance Application Report [6].
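The completion check described above can be sketched in C as follows. This is only an illustration: the bit positions of the START and STATUS fields (bits 1:0 and 3:2) and the status encodings are assumptions that should be verified against the Peripherals Guide [5], and the function names are hypothetical. The register is accessed through a pointer so that the sketch can also be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the DMA primary control register; verify against [5]. */
#define DMA_STATUS_SHIFT 2u     /* STATUS assumed in bits 3:2 */
#define DMA_STATUS_MASK  0x3u

enum dma_status {
    DMA_STOPPED = 0,            /* assumed encoding */
    DMA_RUNNING = 1,
    DMA_PAUSED = 2,
    DMA_RUNNING_AUTOINIT = 3
};

/* Extract the STATUS field from a primary control register value. */
static enum dma_status dma_status(uint32_t prictl)
{
    return (enum dma_status)((prictl >> DMA_STATUS_SHIFT) & DMA_STATUS_MASK);
}

/* Nonzero when the channel is stopped and a new transfer may be issued. */
static int dma_channel_idle(uint32_t prictl)
{
    return dma_status(prictl) == DMA_STOPPED;
}

/* Poll the register up to max_polls times; returns 1 if the channel
 * became idle, 0 if the poll limit was reached first. */
static int dma_wait_idle(const volatile uint32_t *prictl, unsigned max_polls)
{
    assert(prictl != 0);
    while (max_polls-- > 0) {
        if (dma_channel_idle(*prictl)) {
            return 1;
        }
    }
    return 0;
}
```

A caller would invoke `dma_wait_idle` on the channel's primary control register before loading a new source address; the poll limit keeps a stalled transfer (for example, with the event FIFO almost full) from blocking the DSP indefinitely.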
To set up a multiframe DMA transfer, please follow the indications in Table 3 below. Frames are synchronized using the half-full flag of the FIFO, available as EXT_INT5. In this example, we assume that DMA channel 3 is used.
The C6202 has three Multichannel Buffered Serial Ports (McBSPs), of which McBSP0 is used for two-way communication with the output FPGA. To ensure compatibility with the FPGA, this port must be configured as described in Table 4, with the steps applied in the order listed.
Change the polarity of CLKX0 so that data are driven by the falling edge of the clock.
The two other serial ports are used as general control signals: DX1, FSR1, FSX1, DX2, FSR2, FSX2, CLKR2. Each of these pins is connected to the output FPGA controller, and therefore the assignment of these signals as inputs or outputs must match the configuration of the output FPGA.
The present hardware configuration of these signals is described in Table 5.
Please refer to Chapter 11 of the Peripherals Guide [5] for the software configuration of the DSP serial ports, and in particular to Table 11-22 of that document for the McBSP configuration for general-purpose I/O. Note that if you do not wish to use these signals, their default configuration at reset does not conflict with the hardware assignment.
Do not configure the FSR1, FSR2, or FSX2 as DSP outputs!
The output FPGA controller is configured from the on-board PROM. The PU initialization must be completed via VME, by configuring the input FPGA and loading the DSP code.
During and after power-up, the output FPGA holds the DSP RESET signal asserted.
The complete initialization sequence is the following:
1. For a warm reinitialization, the DSP RESET signal should first be asserted.
2. The input FPGA is configured as specified in the ALTERA application note.
3. The input FPGA is put in Offline mode (as opposed to Event processing mode). In this way, the dual-port RAM can be initialized via VME.
4. Load the output FPGA address register with the absolute address in the dual-port RAM where the data is to be stored.
5. Send the data as 32-bit words by writing to the corresponding register in the output FPGA. The address register is incremented automatically.
6. Go back to step 4 to initialize another block of memory, if necessary.
7. The DSP RESET signal is deasserted. The DSP now initiates its ROM boot process. The program located in the dual-port RAM is copied to address 0 by the DMA controller. This is done with a single-frame block transfer of 64K bytes. After completion of the transfer, the DSP starts executing from address 0.
8. If the DSP code and data needed for event processing occupy more than 64K bytes, a second phase of the boot process is needed. Subsequent transfers will be initiated by the DSP (by software) to download the data or code from the upper 64K of the dual-port RAM.
9. If the DSP code and data occupy more than the 128K bytes of the dual-port RAM, a third phase of the boot process is needed, involving some VME handshaking via the DSP serial port or via the general control signals: The host CPU refills the dual-port RAM and notifies the DSP that more data is available. The DSP then copies the data from the dual-port RAM to its internal memory and acknowledges the transfer.
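Steps 4 to 6 and the choice of boot phases can be illustrated with a small host-side model in C. This is a sketch under stated assumptions, not the actual VME driver: the structure `pu_model` and all function names are hypothetical, the address register is assumed to hold a 32-bit word address, and the real register accesses are replaced by writes into an in-memory array standing in for the dual-port RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DPRAM_WORDS    (32u * 1024u)        /* 32K x 32 bits */
#define DPRAM_BYTES    (DPRAM_WORDS * 4u)   /* 128K bytes */
#define ROM_BOOT_BYTES (64u * 1024u)        /* copied to address 0 at boot */

/* Hypothetical stand-in for the output FPGA registers and the RAM behind them. */
struct pu_model {
    uint32_t addr_reg;                      /* assumed to be a word address */
    uint32_t dpram[DPRAM_WORDS];
};

/* Step 4: load the address register. */
static void pu_set_address(struct pu_model *pu, uint32_t word_addr)
{
    assert(pu != NULL && word_addr < DPRAM_WORDS);
    pu->addr_reg = word_addr;
}

/* Step 5: write one 32-bit word; the address register auto-increments. */
static void pu_write_data(struct pu_model *pu, uint32_t word)
{
    assert(pu != NULL && pu->addr_reg < DPRAM_WORDS);
    pu->dpram[pu->addr_reg++] = word;
}

/* Steps 4 and 5 for one block; call repeatedly for step 6. */
static void pu_load_block(struct pu_model *pu, uint32_t word_addr,
                          const uint32_t *src, size_t nwords)
{
    size_t i;
    assert(src != NULL);
    assert(word_addr < DPRAM_WORDS && nwords <= DPRAM_WORDS - word_addr);
    pu_set_address(pu, word_addr);
    for (i = 0; i < nwords; i++) {
        pu_write_data(pu, src[i]);
    }
}

/* Number of boot phases needed (steps 7 to 9) for a given image size. */
static int boot_phases(size_t image_bytes)
{
    if (image_bytes <= ROM_BOOT_BYTES) {
        return 1;   /* ROM boot transfer only */
    }
    if (image_bytes <= DPRAM_BYTES) {
        return 2;   /* plus DSP-initiated copy from the upper 64K */
    }
    return 3;       /* plus refills of the dual-port RAM with handshaking */
}
```

In a real driver, `pu_set_address` and `pu_write_data` would become VME writes to the corresponding output FPGA registers; the block structure of the load would stay the same.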
The PU interface to the ROD motherboard is implemented by the output FPGA controller. Its main functionalities are described below.
The output FPGA controller drives the nCONFIG, DATA0 and DCLK pins and monitors the nSTATUS and CONF_DONE pins of the input FPGA.
The output FPGA controller drives the Offline signal to the input FPGA. When this signal is high, data can be sent via VME to the input FPGA or to the dual-port RAM. This Offline mode effectively inhibits FEB and TTC data processing by the input FPGA.
The output FPGA controller drives the DSP RESET pin. This low-active signal is asserted at power-up, and can be controlled via VME.
The output FPGA controller provides a two-way interface to the DSP via the McBSP0 port. There is currently no buffering at the FPGA receive end: a newly received word always overwrites the previous one. The DSP must therefore not send a new data word until the previous one has been read via VME.
The output FPGA controller monitors the general-purpose DX1 and FSX1 DSP signals.
The output FPGA controller receives data from the XCE2 space, into the internal histogram FIFO. This FIFO is then read out via VME, through the PU 8-bit wide data bus. The internal histogram FIFO word count can be read via VME as the f_bwc field of the status register.
Data from the XCE3 space is written directly into the external event FIFO. The event FIFO is read out directly via the 32-bit wide event data bus.
The CW_nRSTBUF and CW_nRSTFIFO bits of the FPGA control register must be set for the corresponding data path to be enabled. Clearing the nRSTFIFO bit resets the external event FIFO. Clearing the nRSTBUF bit resets the internal histogram FIFO.
The output FPGA controller monitors the event FIFO empty flag EF and the event FIFO almost-full flag PAF. The PAF flag is also connected to the INT6 pin of the DSP.
The output FPGA controller implements the Event-Ready logic via a counter. The rising edge of DMAC1 increments the event counter, while the OC_EvtEnd signal decrements the counter. The output signal FIFO_EvtRdy is active when the event counter is non-zero.
A software pulse sent via the control register can be used to reset both the event FIFO and the counter. The event counter can be read via VME as the f_nevt field of the status register.
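The Event-Ready logic can be modelled in a few lines of C. This is a behavioural sketch only, not the FPGA code: the type and function names are hypothetical, the counter width is not specified in this note, and the saturation at zero on decrement is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural model of the event counter (hypothetical names). */
struct evt_ready {
    uint32_t count;     /* value visible as f_nevt; width is an assumption */
    int last_dmac;      /* previous level of the DMA-complete signal */
};

/* Sample the DMA-complete signal; only a rising edge increments. */
static void evt_dmac(struct evt_ready *e, int level)
{
    assert(e != 0);
    level = (level != 0);
    if (level && !e->last_dmac) {
        e->count++;
    }
    e->last_dmac = level;
}

/* OC_EvtEnd: one event has been read out. Saturating at zero is assumed. */
static void evt_end(struct evt_ready *e)
{
    assert(e != 0);
    if (e->count > 0) {
        e->count--;
    }
}

/* Software reset pulse: clears the counter. */
static void evt_reset(struct evt_ready *e)
{
    assert(e != 0);
    e->count = 0;
}

/* FIFO_EvtRdy is active while at least one event is pending. */
static int fifo_evt_rdy(const struct evt_ready *e)
{
    assert(e != 0);
    return e->count != 0;
}
```

The edge detection is what keeps a DMA-complete signal that stays high for several clock cycles from being counted as several events.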
The least significant 8 bits are sent as configuration data to the input FPGA.
During normal event processing, the input FPGA receives the FEB data and the TTC trigger information. The data are then formatted and written to the dual-port RAM. In Offline mode, the input FPGA receives 32-bit serial data from the output FPGA controller, and writes the data to the dual-port RAM (if A15 is 0) or to its internal data-formatting table (if A15 is 1).
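The Offline-mode destination selection can be written down as a short C sketch. The names are hypothetical, and two points are assumptions rather than facts from this note: that the address is 16 bits wide, and that the lower 15 bits give the word offset inside the selected destination.

```c
#include <assert.h>
#include <stdint.h>

#define A15_MASK    0x8000u
#define OFFSET_MASK 0x7FFFu   /* assumed: lower 15 bits are the word offset */

enum offline_dest {
    DEST_DPRAM = 0,           /* A15 = 0: dual-port RAM */
    DEST_FORMAT_TABLE = 1     /* A15 = 1: internal data-formatting table */
};

/* Select the destination of an Offline-mode write from address bit 15. */
static enum offline_dest offline_dest(uint32_t addr)
{
    assert(addr <= 0xFFFFu);  /* 16-bit address assumed */
    return (addr & A15_MASK) ? DEST_FORMAT_TABLE : DEST_DPRAM;
}

/* Word offset inside the selected destination (assumption, see above). */
static uint32_t offline_offset(uint32_t addr)
{
    assert(addr <= 0xFFFFu);
    return addr & OFFSET_MASK;
}
```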
The 128K Bytes of the dual-port RAM are divided into 128 contiguous event blocks of 1K Bytes each. It is therefore assumed that the complete formatted data from one event (64 channels) fits entirely within one 1K Byte event block. The input FPGA Altera code may be modified to accommodate a different block size, if necessary.
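The block layout translates into simple address arithmetic, sketched here in C. The function names are hypothetical, and the mapping of a running event count onto a block index modulo 128 (a circular buffer) is an assumption that this note does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define EVENT_BLOCKS 128u
#define BLOCK_BYTES  1024u
#define BLOCK_WORDS  (BLOCK_BYTES / 4u)   /* 256 words of 32 bits */

/* Assumed circular mapping from a running event count to a block index. */
static uint32_t block_index(uint32_t event_count)
{
    return event_count % EVENT_BLOCKS;
}

/* Byte offset of the start of a block inside the dual-port RAM. */
static uint32_t block_byte_offset(uint32_t block)
{
    assert(block < EVENT_BLOCKS);
    return block * BLOCK_BYTES;
}

/* 32-bit word address of the start of a block. */
static uint32_t block_word_addr(uint32_t block)
{
    assert(block < EVENT_BLOCKS);
    return block * BLOCK_WORDS;
}
```

With these constants, the last block starts at word address 0x7F00, so the last word of the dual-port RAM (0x7FFF) lies inside that block.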
The last word of the dual-port RAM (address 0x7FFF) contains the TTC event block counter and the FEB event block counter. The TTC counter occupies the upper 16 bits and the FEB counter occupies the lower 16 bits.
The DSP must poll on these counters when waiting for incoming data; no other signal is delivered to the DSP upon reception of a TTC trigger or FEB event.
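A polling routine on the DSP would start by splitting the counter word into its two halves. The C sketch below shows this under stated assumptions: the function names are hypothetical, both counters are assumed to wrap around as 16-bit values, and an event is assumed to be complete once both counters have advanced past the last processed block.

```c
#include <assert.h>
#include <stdint.h>

#define COUNTER_WORD_ADDR 0x7FFFu   /* last 32-bit word of the dual-port RAM */

/* Read the counter word; dpram points to the start of the dual-port RAM. */
static uint32_t read_counter_word(const volatile uint32_t *dpram)
{
    assert(dpram != 0);
    return dpram[COUNTER_WORD_ADDR];
}

/* TTC event block counter: upper 16 bits. */
static uint16_t ttc_count(uint32_t w)
{
    return (uint16_t)(w >> 16);
}

/* FEB event block counter: lower 16 bits. */
static uint16_t feb_count(uint32_t w)
{
    return (uint16_t)(w & 0xFFFFu);
}

/* Blocks received since last_seen, assuming 16-bit wrap-around. */
static uint16_t new_blocks(uint16_t current, uint16_t last_seen)
{
    return (uint16_t)(current - last_seen);
}

/* Nonzero when both TTC and FEB data have arrived for the next event. */
static int event_complete(uint32_t w, uint16_t last_done)
{
    return new_blocks(ttc_count(w), last_done) > 0 &&
           new_blocks(feb_count(w), last_done) > 0;
}
```

The `volatile` qualifier matters here, since the input FPGA updates the word behind the back of the DSP and the compiler must not cache the value between polls.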
The prompt BCID information, and the delayed Trigger Type, are decoded and buffered separately into two FIFOs. When the TTC information is complete (i.e. the Trigger Type FIFO is not empty), both FIFOs are read, and the 32-bit TTC word is written to the dual-port RAM, at offset 0xFE inside the corresponding event block. The TTC event counter (i.e. block counter) is then incremented.
Currently the input FPGA does not check the TTC BCID and the FEB BCID for equality. This can be done by the DSP.
The FEB data are de-serialized. Reception of the FEB event header (BEVT) triggers the decoding of the FEB event. The FEB control words CTL1, CTL2, and CTL3, and the capacitor addresses RADDn, are written to the dual-port RAM without any modification, while the ADC values and the gain bits are written separately. The writing takes place at the offsets prescribed in the data-formatting table.
All data words (except BEVT and EEVT) are checked for odd parity. If one or more errors are detected, a flag is set in the error status word.
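An odd-parity check of the kind performed by the input FPGA can be expressed in C as follows. This is a software illustration, not the FPGA implementation: the function names are hypothetical, the word width is passed as a parameter because it is not given in this note, and the parity bit is assumed to be one of the counted bits.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits set in a 32-bit word. */
static unsigned popcount32(uint32_t w)
{
    unsigned n = 0;
    while (w != 0) {
        w &= w - 1u;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Nonzero if the low nbits of word (parity bit included, by assumption)
 * contain an odd number of ones. */
static int odd_parity_ok(uint32_t word, unsigned nbits)
{
    assert(nbits >= 1u && nbits <= 32u);
    if (nbits < 32u) {
        word &= ((uint32_t)1 << nbits) - 1u;
    }
    return (popcount32(word) & 1u) == 1u;
}
```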
The gain bits for a given channel must keep the same value for all samples of an event. Otherwise, a flag is set in the error status word.
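The same consistency rule could also be re-checked in software, for example while debugging. The C sketch below assumes a hypothetical layout in which the gain values of one channel have been collected into an array with one entry per sample; neither this layout nor the function name comes from this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if all samples of one channel carry the same gain value.
 * gain[i] holds the gain bits of sample i (hypothetical layout). */
static int gain_consistent(const uint8_t *gain, size_t nsamples)
{
    size_t i;
    assert(gain != NULL && nsamples > 0);
    for (i = 1; i < nsamples; i++) {
        if (gain[i] != gain[0]) {
            return 0;   /* gain changed within the event */
        }
    }
    return 1;
}
```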
Since the input event buffering uses a dual-port RAM, the format of the FEB data to the DSP is configurable to a large extent. This flexibility is useful for a fully efficient DSP code.
The different types of information are described in the table below. The data are always packed in 32-bit words. Each one of the words in the left column can be written or not to the dual-port RAM. If a word is written, the RAM write address is also user-defined, via a data-formatting table which is downloaded to the input FPGA.
Input FPGA error status word.
The software library provides an easy way to configure the event format in the dual-port RAM using the data-formatting table.
The input FPGA detects various errors while receiving FEB data. These errors cause the corresponding bits to be set in the event status word, as shown in Table 10.
The C6202 board consumes 4 W when idle, and 5.8 W during event processing. Event processing includes: computation of energy, time, and χ² for 64 channels at 200 kHz LVL1 trigger rate, and the associated I/O; and filling of energy and time histograms for each channel, for all events.
The processor board described in this document is believed to be able to provide the CPU and I/O bandwidth necessary to process and monitor 64 channels at 100 kHz trigger rate. However, some straightforward steps can be taken to boost the performance of this board, which may be useful for flexibility (e.g. if the DSP code can be written in C instead of assembler) or, alternatively, for doubling the number of channels which can be processed.
Therefore, the following changes are considered:
[1] ATLAS Liquid Argon Calorimeter TDR, CERN/LHCC 96-41.
[2] Liquid Argon Calorimeters Read-Out Drivers web page, http://atlasinfo.cern.ch/Atlas/GROUPS/LIQARGON/ROD/largrod.html
[3] TMS320C62X Family Application Notes, http://www.ti.com/sc/docs/apps/dsp/tms320c62x.html
[4] TMS320C6000 CPU and Instruction Set Reference Guide, SPRU189E.
[5] TMS320C6000 Peripherals Reference Guide, SPRU190C.
[6] TMS320C620X/TMS320C6701 DMA and CPU: Data Access Performance, SPRA614.
[7] TMS320C6000 Expansion Bus to External Synchronous FIFO Interface, SPRA547.
[8] TMS320C6000 Technical Brief, SPRU197D.
[9] TMS320C6202 Data Sheet, SPRS104A.
[10] TMS320C6000 Optimizing C Compiler User's Guide, SPRU187G.
[11] TMS320C6000 Programmer's Guide, SPRU198D.
[12] TMS320C6000 Tools: Vector Table and Boot ROM Creation, SPRA544A.
[13] TMS320C6000 Assembly Language Tools User's Guide, SPRU186G.