MONITORING THE SLIC

1. BASIC STRUCTURE

During operation, monitoring and troubleshooting of the SLICs will happen at four levels of increasing complexity:

1) Standard monitoring displays constructed from collect status data. The goal here is for shifters to be able to anticipate problems from changes in some basic histograms and to quickly identify a problem as coming from the SLICs when it occurs. For this to work efficiently, the shifter must not be bombarded with too much information from the SLICs.
2) Expert monitoring displays constructed from collect status data. These would be used mainly by L2muon experts to keep an eye on the detailed performance of the SLICs without having to stop the run and check the SLICs explicitly. Obviously, the more information that can go in here, the better.
3) A slic_alive program that shifters could run to localize an existing problem to a given SLIC. This program would be run on the entire collection of SLICs with the run stopped and would quickly test all major components of each board for basic functionality. The goal of this test would be to identify a problem SLIC as quickly as possible so that it could be replaced and the run continued.
4) The full SLIC test suite, which would be run only by experts, either when all other avenues of investigation had failed or to localize problems during the repair of broken SLICs.

The slic_alive program will be a subset of the full SLIC test suite and is under development now. The rest of this note addresses the constraints that the SLIC architecture places on the data obtained on collect status events.
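As a rough illustration of the slic_alive idea described above, the sketch below loops over a set of boards and runs one quick check per element. The element names follow the element list in Section 2. The `check` hook, the slot numbers, and the report format are hypothetical placeholders, not the program under development.

```python
from typing import Callable, Dict, List

# Element names follow the SLIC element list; everything else is illustrative.
ELEMENTS = ["Inputs", "Link", "DSP Input", "Worker DSP", "Admin DSP", "Output"]


def slic_alive(slots: List[int],
               check: Callable[[int, str], bool]) -> Dict[int, List[str]]:
    """Return {slot: [failed elements]} for every board with a failure.

    `check(slot, element)` is a hypothetical hook that performs a quick
    functional test of one element on one board and returns True on success.
    """
    failures: Dict[int, List[str]] = {}
    for slot in slots:
        bad = [elem for elem in ELEMENTS if not check(slot, elem)]
        if bad:
            failures[slot] = bad
    return failures


def fake_check(slot: int, element: str) -> bool:
    # Stand-in for real hardware access: pretend the board in slot 7
    # has a broken Link element.
    return not (slot == 7 and element == "Link")
```

Calling `slic_alive([5, 6, 7, 8], fake_check)` reports only slot 7, which is the kind of answer a shifter needs in order to swap a board and resume the run.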

Note that Arthur has made a proposal for SLIC output formats. This proposal includes monitoring information in a very sensible way. The purpose of this note is not to make another such proposal, but to lay out the constraints due to the SLIC design that lead to Arthur's scheme. To do this we will present several possible monitoring schemes and discuss their strengths and weaknesses.

2. SLIC ELEMENTS

Structurally, the SLIC consists of six basic elements, shown in the table below with their components. Also shown is how information about the status of each element can be obtained over VME. More detailed information about the SLIC architecture is available in the overview.

Element      Components              VME Access

Inputs       Hotlink Receivers       limited - through FPGA
             Input FPGAs             status
             Input FIFOs             limited - through FPGA

Link         Link FPGAs              status

DSP Input    DSP FPGAs               status
             DSP FIFOs               only almost-full

Worker DSP   DSPs 1-4                DSP cmds via FPGA

Admin DSP    DSP 5                   DSP cmds via FPGA

Output       Buffer FPGA             status
             Output FIFOs            limited - through FPGA
             Output FPGA             no
             Hotlink Transmitters    no status

3. AVAILABLE INFORMATION

Given below is a list of information from each element that could be available for online monitoring (collect status). This list is intended to be exhaustive for the FPGAs, so anything not on it should be considered as impossible (or at least very difficult) to access. The main sources of information in the table below are the input trailer word added to the end of each data block by the input FPGA, the input, link and output status commands (VME) and the DSP FPGA status register (VME).

DSP monitoring information is limited only by inventiveness in coding and by processing power, so only an abbreviated list is given.

Element      Information                 Where

Input        Input Word Count            trailer / status
             Local Event Count           trailer / inp status
             FIFO Full or Almost-Full    trailer / inp status
             Error Flags                 trailer / inp status
             Configuration               inp status

Link         FIFO empty                  link status
             Local Event Count           link status
             Configuration               link status

DSP Input    FIFO Almost-full            dspfpga status
             Configuration               dspfpga status

DSPs         Processing Times            code
             Data Sizes                  code
             Errors                      code

Output       FIFO Full or Almost-full    out status
             Local Event Count           out status
             Configuration               out status
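The table above can be captured as a small lookup, which makes it easy to ask which quantities travel with the data in the input trailer, which need a VME status read, and which are produced by DSP code. The sketch below is only a transcription of the table; the function name and the three group labels are illustrative.

```python
# (element, information) -> where it can be obtained, copied from the table.
CATALOGUE = {
    ("Input", "Input Word Count"): "trailer / status",
    ("Input", "Local Event Count"): "trailer / inp status",
    ("Input", "FIFO Full or Almost-Full"): "trailer / inp status",
    ("Input", "Error Flags"): "trailer / inp status",
    ("Input", "Configuration"): "inp status",
    ("Link", "FIFO empty"): "link status",
    ("Link", "Local Event Count"): "link status",
    ("Link", "Configuration"): "link status",
    ("DSP Input", "FIFO Almost-full"): "dspfpga status",
    ("DSP Input", "Configuration"): "dspfpga status",
    ("DSPs", "Processing Times"): "code",
    ("DSPs", "Data Sizes"): "code",
    ("DSPs", "Errors"): "code",
    ("Output", "FIFO Full or Almost-full"): "out status",
    ("Output", "Local Event Count"): "out status",
    ("Output", "Configuration"): "out status",
}


def group_by_access():
    """Split the catalogue into trailer, DSP-code and VME-only items."""
    groups = {"trailer": [], "dsp_code": [], "vme_only": []}
    for key, where in CATALOGUE.items():
        if "trailer" in where:
            groups["trailer"].append(key)      # arrives with the event data
        elif where == "code":
            groups["dsp_code"].append(key)     # produced by DSP software
        else:
            groups["vme_only"].append(key)     # needs a VME status read
    return groups
```

The `vme_only` group is the information that any scheme relying purely on trailer words and DSP code cannot see.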

4. ONLINE MONITORING SCHEMES

The basic constraint that the SLIC design imposes on the monitoring data available on collect status events is that there is only limited provision for event-asynchronous information on the inputs and outputs. For example, there is no free-running clock that samples the occupancy of the input FIFOs at some low rate. (In fact, the FIFOs we use don't support this function anyway.) All status information on input and output data (except for FIFO-full and almost-full) is assembled in the respective FPGAs and is available only after receipt of an end of event character.
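The consequence of this event-synchronous behaviour can be shown with a toy software model of one input channel. This is not a description of the FPGA logic; the class, its field names, and the `END_OF_EVENT` marker are all stand-ins chosen for illustration.

```python
END_OF_EVENT = object()  # stand-in for the end of event character


class InputChannelModel:
    """Toy model: status is assembled only when an event is complete."""

    def __init__(self):
        self._words_in_progress = 0   # not visible to a status read
        self.event_count = 0
        self.latched_word_count = 0

    def receive(self, word):
        if word is END_OF_EVENT:
            # Only now does the status for this event become readable.
            self.latched_word_count = self._words_in_progress
            self.event_count += 1
            self._words_in_progress = 0
        else:
            self._words_in_progress += 1

    def read_status(self):
        # A read between events returns the last completed event's values.
        return {"event_count": self.event_count,
                "word_count": self.latched_word_count}
```

In this model, a status read issued while an event is still arriving says nothing about the words received so far, which is why a true mid-event snapshot of the inputs is not available.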

For the DSPs, asynchronous time-in-state counters can be implemented in the code at the expense of extra processing time. The speed loss will depend heavily on how ambitious this monitoring is.
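A time-in-state counter of the kind mentioned above could look roughly like the following. It is written in Python for readability rather than as DSP code, and the state names and the integer tick source are assumptions made for the example.

```python
class TimeInStateCounter:
    """Accumulate the number of clock ticks spent in each state."""

    def __init__(self, initial_state, now):
        self.state = initial_state
        self._since = now      # tick at which the current state was entered
        self.totals = {}       # state -> accumulated ticks

    def transition(self, new_state, now):
        # This bookkeeping is the extra work paid on every state change.
        elapsed = now - self._since
        self.totals[self.state] = self.totals.get(self.state, 0) + elapsed
        self.state = new_state
        self._since = now

    def snapshot(self, now):
        """Totals including the time spent so far in the current state."""
        totals = dict(self.totals)
        elapsed = now - self._since
        totals[self.state] = totals.get(self.state, 0) + elapsed
        return totals
```

For example, entering "processing" at tick 10 and returning to "idle" at tick 25 gives 15 ticks in each state when read at tick 30. The more states are tracked and the more often they change, the larger the cost in processing time.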

The main questions to be addressed in the SLIC online monitoring scheme are then where the information will be stored and who will access it. Several possibilities suggest themselves and are listed below in approximate order of increasing dependence on having functional DSPs.

1) Direct VME Access to each Element
In this scheme some external agent (the Alpha, TCC,...) that is in charge of collecting monitoring data loops over a list of all elements in each SLIC and reads status information from them directly over VME.
Advantages
- Complete independence from the functioning of any part of the SLIC.
- All information is available.
- Real "snapshots" of the system state are taken.
Disadvantages
- Horribly time consuming and disruptive of normal data-flow in the SLIC.
- Not trivial to ensure that data is ready when asked for.
- Difficult to provide event stamps on most of the information.
Summary
This type of scheme would probably involve too much overhead. Directly accessing each SLIC element is more appropriate to do outside of normal running as part of slic_alive and the test suite.

2) VME Access to each DSP
Here each of the five DSPs collects status information from the inputs through the input trailer word. They add their own processing statistics to this and put it all in a convenient location. An external agent then accesses this location over VME and reads the information.
Advantages
- More efficient than scheme 1).
- Format of monitoring data can be changed easily in DSP code (including event stamps).
- Reduced dependence on functionality of the full SLIC readout chain.
Disadvantages
- Information available only in status registers is lost.
- While each DSP is sending monitoring data to VME it cannot accept or send normal data on the Link.
Summary
This scheme will probably also prove to be too disruptive to normal data taking to be implemented.
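The per-DSP collection step in this scheme can be sketched as below. The trailer bit layout used here is purely hypothetical (the real input trailer word format is not given in this note), as are the function names and the shape of the monitoring block.

```python
def decode_trailer(word):
    """Unpack one input trailer word using a HYPOTHETICAL bit layout."""
    return {
        "word_count": word & 0xFFF,               # bits 0-11 (assumed)
        "event_count": (word >> 12) & 0xFF,       # bits 12-19 (assumed)
        "fifo_full": bool((word >> 20) & 1),      # bit 20 (assumed)
        "fifo_almost_full": bool((word >> 21) & 1),  # bit 21 (assumed)
        "error_flags": (word >> 24) & 0xFF,       # bits 24-31 (assumed)
    }


def build_monitor_block(dsp_id, event_number, trailers, processing_ticks):
    """Combine input status with the DSP's own statistics.

    The event number acts as the event stamp; a VME reader would pick this
    block up from wherever the DSP code chooses to store it.
    """
    return {
        "dsp": dsp_id,
        "event": event_number,
        "inputs": [decode_trailer(w) for w in trailers],
        "processing_ticks": processing_ticks,
    }
```

Because the block is built in DSP code, its contents and format can be changed without touching the FPGAs, which is the flexibility listed among the advantages.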

3) VME Access to DSP-5
In this scheme the administrator DSP serves as the collection point for all monitoring information in a SLIC. Each worker DSP collects input channel status via trailer words and adds its own status information. This data is included in the normal worker DSP output, as in Arthur's proposal, and sent to DSP-5 over the link after it has finished processing each event. DSP-5 then assembles the information from all the worker DSPs into a convenient location which is accessed over VME by an external agent.
Advantages
- Most of the advantages of scheme 2).
- More streamlined access to monitoring data.
Disadvantages
- Requires full readout chain in SLIC to be functional.
- Output from the SLIC is interrupted while DSP-5 is being read over VME.
- Some processing and memory burden on DSP-5.
Summary
This scheme is attractive in that it balances efficiency of SLIC operation with autonomy from other boards in the system. We need to study how VME accesses to the SLIC during data-taking will really affect the system though.
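The collection role of DSP-5 in this scheme might be modelled as follows. The sketch assumes that all four worker DSPs report a block for every event; the class, its method names, and the block contents are illustrative and do not represent the DSP-5 code.

```python
from collections import defaultdict


class AdminCollector:
    """Toy model of DSP-5 gathering monitoring blocks from the workers."""

    def __init__(self, n_workers=4):
        self.n_workers = n_workers
        self._pending = defaultdict(dict)   # event -> {worker_id: block}
        self.latest_complete = None         # what a VME read would return

    def add_worker_block(self, event, worker_id, block):
        self._pending[event][worker_id] = block
        if len(self._pending[event]) == self.n_workers:
            # All workers have reported: publish one event-stamped summary.
            self.latest_complete = {
                "event": event,
                "workers": self._pending.pop(event),
            }

    def vme_read(self):
        # The external agent needs a single access point per SLIC.
        return self.latest_complete
```

The pending table and the published summary are the memory burden mentioned above, and a missing worker block means nothing is published for that event, which reflects the dependence on the full readout chain.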

4) Monitoring Data as part of SLIC output
The SLIC can simply pass the buck on monitoring upstream to a worker Alpha by sending the monitoring data, collected as in scheme 3), out with its normal data output. This could conceivably be streamlined by adding the monitoring block only on collect status events. (Is this possible?)
Advantages
- Minimal impact on SLIC performance.
Disadvantages
- Requires upstream elements to be functional in order to access SLIC monitoring data.
- Increases output data volume.
Summary
This would certainly be the easiest scheme to implement (for SLIC people).
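The streamlined variant, in which the monitoring block is appended only on collect status events, can be sketched as below. Whether the SLIC can actually tell that an event is a collect status event is the open question raised above, so the `collect_status` flag is an assumption, as are the function names and word lists.

```python
from typing import List


def build_output(event_words: List[int],
                 monitor_words: List[int],
                 collect_status: bool) -> List[int]:
    """Return the output block, with monitoring appended when requested."""
    if collect_status:
        return event_words + monitor_words
    return list(event_words)


def mean_output_size(n_event_words: int,
                     n_monitor_words: int,
                     collect_status_fraction: float) -> float:
    """Average output size in words for a given collect status fraction."""
    return n_event_words + collect_status_fraction * n_monitor_words
```

With a fraction of 1.0 the monitoring block rides on every event, and with a small fraction the increase in average output volume becomes correspondingly small.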
