MONITORING THE SLIC
1. BASIC STRUCTURE
During operation, monitoring and troubleshooting of the SLICs will happen at
four levels of increasing complexity.
- Standard monitoring displays constructed from collect
status data. The goal here is for shifters to be able to
anticipate problems from changes in some basic histograms and to
be able to quickly identify a problem as coming from the SLICs
when it occurs. For this to occur efficiently the shifter must
not be bombarded with too much information from the SLICs.
- Expert monitoring displays constructed from collect
status data. These would be used mainly by L2muon experts to
keep an eye on the detailed performance of the SLICs without
having to stop the run and check the SLICs
explicitly. Obviously, the more information that can go in here,
the better.
- A slic_alive program that shifters could run to localize
an existing problem to a given SLIC. This program would be run
on the entire collection of SLICs with the run stopped and would
quickly test all major components of each board for basic
functionality. The goal of this test would be to identify
as quickly as possible a problem SLIC so that it could be replaced
and the run continued.
- The full SLIC test suite which would only be run by
experts when all other avenues of investigation had failed or
for localizing problems during the repair of broken SLICs.
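The slic_alive pass described in the list above can be sketched roughly as follows. This is only an illustration of the control flow (run every quick check on every board, then report the boards that failed), not the program under development. The names `Probe`, `SlicReport`, `slic_alive` and `problem_slics` are hypothetical, and the probes are plain callables standing in for real VME tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a probe takes a SLIC identifier and returns True if the
# component responds correctly. Real probes would exercise the board over VME.
Probe = Callable[[int], bool]


@dataclass
class SlicReport:
    slic_id: int
    failures: List[str]

    @property
    def alive(self) -> bool:
        return not self.failures


def slic_alive(slic_ids: List[int], probes: Dict[str, Probe]) -> List[SlicReport]:
    """Run every probe on every SLIC and record which components failed."""
    reports = []
    for slic_id in slic_ids:
        failures = []
        for name, probe in probes.items():
            try:
                ok = probe(slic_id)
            except Exception:
                ok = False  # a probe that raises counts as a failed component
            if not ok:
                failures.append(name)
        reports.append(SlicReport(slic_id, failures))
    return reports


def problem_slics(reports: List[SlicReport]) -> List[int]:
    """Identifiers of the boards a shifter would want to replace."""
    return [r.slic_id for r in reports if not r.alive]
```

With stand-in probes such as `{"input_fpga": lambda s: True, "dsp_fifo": lambda s: s != 2}`, calling `problem_slics(slic_alive([0, 1, 2], probes))` singles out board 2.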
The slic_alive program will be a subset of the full SLIC test
suite and is under development now. The rest of this note will address
constraints on data obtained on collect status events due to
the SLIC architecture.
Note that Arthur has made a proposal for SLIC output formats.
This proposal includes monitoring information in a
very sensible way. The purpose of this note is not to make another
such proposal, but to lay out the constraints due to the SLIC design
that lead to Arthur's scheme. To do this we will present several
possible monitoring schemes and discuss their strengths and weaknesses.
2. SLIC ELEMENTS
Structurally, the SLIC consists of six basic elements, shown in the table
below with their components. Also shown is how information about the
status of each element can be obtained over VME. More detailed
information about the SLIC architecture is available in the
overview.
| Element    | Components           | VME Access             |
| Inputs     | Hotlink Receivers    | limited - through fpga |
|            | Input FPGAs          | status                 |
|            | Input FIFOs          | limited - through fpga |
| Link       | Link FPGAs           | status                 |
| DSP Input  | DSP FPGAs            | status                 |
|            | DSP FIFOs            | only almost-full       |
| Worker DSP | DSPs 1-4             | DSP cmds via fpga      |
| Admin DSP  | DSP 5                | DSP cmds via fpga      |
| Output     | Buffer FPGA          | status                 |
|            | Output FIFOs         | limited - through fpga |
|            | Output FPGA          | no                     |
|            | Hotlink Transmitters | no status              |
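For monitoring software it may be convenient to carry the table above as data. The sketch below transcribes it into a dictionary, reading each VME Access entry as belonging to the component in the same position within its element (an assumption about how the table is laid out). The names `SLIC_ELEMENTS` and `blind_components` are hypothetical.

```python
# Element -> {component: VME access}, transcribed from the table above.
SLIC_ELEMENTS = {
    "Inputs": {
        "Hotlink Receivers": "limited - through fpga",
        "Input FPGAs": "status",
        "Input FIFOs": "limited - through fpga",
    },
    "Link": {"Link FPGAs": "status"},
    "DSP Input": {
        "DSP FPGAs": "status",
        "DSP FIFOs": "only almost-full",
    },
    "Worker DSP": {"DSPs 1-4": "DSP cmds via fpga"},
    "Admin DSP": {"DSP 5": "DSP cmds via fpga"},
    "Output": {
        "Buffer FPGA": "status",
        "Output FIFOs": "limited - through fpga",
        "Output FPGA": "no",
        "Hotlink Transmitters": "no status",
    },
}


def blind_components():
    """(element, component) pairs that offer no status over VME."""
    return [
        (element, component)
        for element, components in SLIC_ELEMENTS.items()
        for component, access in components.items()
        if access.startswith("no")
    ]
```

Under that reading, the only components with no VME visibility sit at the end of the output chain, which is where any monitoring scheme has to rely on indirect evidence.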
3. AVAILABLE INFORMATION
Given below is a list of information from each element that could be
available for online monitoring (collect status).
This list is intended to be exhaustive for the FPGAs, so
anything not on it should be considered as impossible (or at least
very difficult) to access. The main sources of information in the
table below are the input trailer word added to the end of each data
block by the input FPGA, the input, link and output status commands
(VME) and the DSP FPGA status register (VME).
DSP monitoring information is limited only by inventiveness in coding
and processing power, so only an abbreviated list is given.
| Element   | Information              | Where                |
| Input     | Input Word Count         | trailer / status     |
|           | Local Event Count        | trailer / inp status |
|           | FIFO Full or Almost-Full | trailer / inp status |
|           | Error Flags              | trailer / inp status |
|           | Configuration            | inp status           |
| Link      | FIFO empty               | link status          |
|           | Local Event Count        | link status          |
|           | Configuration            | link status          |
| DSP Input | FIFO Almost-full         | dspfpga status       |
|           | Configuration            | dspfpga status       |
| DSPs      | Processing Times         | code                 |
|           | Data Sizes               | code                 |
|           | Errors                   | code                 |
| Output    | FIFO Full or Almost-full | out status           |
|           | Local Event Count        | out status           |
|           | Configuration            | out status           |
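To make the input rows of the table concrete, here is a sketch of how a DSP or an offline tool might unpack a trailer word and use the local event counts to spot an input that has fallen out of step. The bit layout is purely hypothetical and is NOT the real SLIC trailer format: this note does not give the layout, so field positions and widths are invented for illustration, as are the names `InputTrailer`, `decode_trailer` and `out_of_sync_inputs`.

```python
from collections import Counter
from typing import List, NamedTuple


class InputTrailer(NamedTuple):
    word_count: int
    event_count: int
    fifo_almost_full: bool
    fifo_full: bool
    error_flags: int


def decode_trailer(word: int) -> InputTrailer:
    """Unpack a 32-bit trailer word using an ASSUMED, illustrative layout:
    bits 0-11 word count, bits 12-23 local event count,
    bit 24 almost-full, bit 25 full, bits 26-31 error flags."""
    return InputTrailer(
        word_count=word & 0xFFF,
        event_count=(word >> 12) & 0xFFF,
        fifo_almost_full=bool((word >> 24) & 1),
        fifo_full=bool((word >> 25) & 1),
        error_flags=(word >> 26) & 0x3F,
    )


def out_of_sync_inputs(trailers: List[InputTrailer]) -> List[int]:
    """Indices of inputs whose local event count differs from the most common one."""
    if not trailers:
        return []
    majority, _ = Counter(t.event_count for t in trailers).most_common(1)[0]
    return [i for i, t in enumerate(trailers) if t.event_count != majority]
```

Comparing local event counts across inputs is one simple consistency check that needs nothing beyond the trailer words themselves.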
4. ONLINE MONITORING SCHEMES
The basic constraint imposed by the design of the SLIC on monitoring
data available on collect status events is that there is limited
provision for event-asynchronous information on the inputs and
outputs. For example, there is no free-running clock that samples the
occupancy of the input FIFOs at some low rate. (In fact, the FIFOs we
use don't support this function anyway.) All status information
(except for FIFO-full and almost-full) on
input and output data is assembled in the respective FPGAs and is
available only after receipt of an end-of-event character.
For the DSPs, asynchronous time-in-state counters can be implemented
in the code at the expense of extra processing time. The speed loss
will depend heavily on how ambitious this monitoring is.
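A time-in-state counter of the kind mentioned above could look something like the following. It is written in Python only to show the bookkeeping; actual DSP code would be quite different, and the class name `TimeInState` and the state names in the usage note are hypothetical. The cost per state change is one clock read and one accumulator update, which is the kind of extra processing time the text refers to.

```python
import time


class TimeInState:
    """Accumulate how long a processor spends in each named state."""

    def __init__(self, initial_state, clock=time.perf_counter):
        self._clock = clock          # injectable so the counter can be tested
        self._state = initial_state
        self._entered = clock()
        self.totals = {}

    def enter(self, state):
        """Close the interval for the current state and start a new one."""
        now = self._clock()
        elapsed = now - self._entered
        self.totals[self._state] = self.totals.get(self._state, 0.0) + elapsed
        self._state = state
        self._entered = now

    def fractions(self):
        """Fraction of accumulated time per state (the open interval is excluded)."""
        total = sum(self.totals.values())
        if not total:
            return {}
        return {state: t / total for state, t in self.totals.items()}
```

A worker would call `enter("processing")` when it picks up an event and `enter("idle")` when it is done; the `fractions()` result is what would be shipped out as monitoring data.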
The main questions to be addressed in the SLIC online monitoring
scheme are then where the information will be stored and who will
access it. Several possibilities suggest themselves and are listed
below in approximate order of increasing dependence on having functional
DSPs.
- Direct VME Access to each Element
In this scheme some external agent (the Alpha, TCC,...) that is
in charge of collecting monitoring data loops over a list of all
elements in each SLIC and reads status information from them
directly over VME.
Advantages
- Complete independence from the functioning of any part of
the SLIC.
- All information is available.
- Real "snapshots" of the system state are taken.
Disadvantages
- Horribly time consuming and disruptive of normal data-flow
in the SLIC.
- Not trivial to ensure that data is ready when asked
for.
- Difficult to provide event stamps on most of the
information.
Summary
This type of scheme would probably involve too much
overhead. Directly accessing each SLIC element is more
appropriate to do outside of normal running as part of
slic_alive and the test suite.
- VME Access to each DSP
Here each of the five DSPs collects status information from the
inputs through the input trailer word. They add their own processing
statistics to this and put it all in a convenient location. An
external agent then accesses this location over VME and reads the
information.
Advantages
- More efficient than scheme 1).
- Format of monitoring data can be changed easily in DSP
code (including event stamps).
- Reduced dependence on functionality of the full SLIC
readout chain.
Disadvantages
- Information available only in status registers is lost.
- While each DSP is sending monitoring data to VME it cannot
accept or send normal data on the Link.
Summary
This scheme will probably also prove to be too disruptive to
normal data taking to be implemented.
- VME Access to DSP-5
In this scheme the administrator DSP serves as the collection
point for all monitoring information in a SLIC. Each worker DSP
collects input channel status via trailer words and adds its own
status information. This data is included in the normal worker DSP
output, as in Arthur's proposal, and sent to DSP-5 over the link
after it has finished processing each event. DSP-5 then assembles the
information from all the worker DSPs into a convenient location
which is accessed over VME by an external agent.
Advantages
- Most of the advantages of scheme 2).
- More streamlined access to monitoring data.
Disadvantages
- Requires full readout chain in SLIC to be functional.
- Output from the SLIC is interrupted while DSP-5 is being
read over VME.
- Some processing and memory burden on DSP-5.
Summary
This scheme is attractive in that it balances efficiency of SLIC
operation with autonomy from other boards in the system. We need
to study how VME accesses to the SLIC during data-taking will
really affect the system though.
- Monitoring Data as part of SLIC output
The SLIC can simply pass the buck on monitoring upstream to a
worker Alpha by sending monitoring data which is collected as in
scheme 3) out with its normal data output. This could
conceivably be streamlined by adding the monitoring block only
on collect status events. (Is this possible?)
Advantages
- Minimal impact on SLIC performance.
Disadvantages
- Requires upstream elements to be functional in order to
access SLIC monitoring data.
- Increases output data volume.
Summary
This would certainly be the easiest scheme to implement (for
SLIC people).
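The assembly step shared by the last two schemes, with the monitoring block attached only on collect status events, can be sketched as below. This shows the logic in isolation and does not answer the question raised above of whether the SLIC can actually know, at output time, that an event is a collect status event. The block format, the marker word `MONITOR_MARKER`, and the names `WorkerOutput` and `assemble_output` are all hypothetical and are not taken from Arthur's proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker: upper 16 bits flag the monitoring block,
# lower 16 bits carry the number of monitoring words that follow.
MONITOR_MARKER = 0xFFFF0000


@dataclass
class WorkerOutput:
    dsp_id: int
    data: List[int]                                       # normal physics output
    monitoring: List[int] = field(default_factory=list)   # trailer info + DSP statistics


def assemble_output(workers: List[WorkerOutput], collect_status: bool) -> List[int]:
    """Build the administrator DSP's output for one event."""
    words: List[int] = []
    for w in workers:
        words.extend(w.data)
    if collect_status:
        n_monitor = sum(len(w.monitoring) for w in workers)
        words.append(MONITOR_MARKER | (n_monitor & 0xFFFF))
        for w in workers:
            words.extend(w.monitoring)
    return words
```

On ordinary events the output is unchanged, so the extra data volume listed as a disadvantage is paid only when monitoring data is requested.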