Dear Colleagues, Given below is a first attempt at a list of testing documentation we should have in hand for the System Review. The actual tests required to produce this documentation are described below in Dan's Master List. Everyone will need to go over my list carefully to check for things I have missed. I have also made a first attempt at assigning names to documentation tasks. The idea is that the named person is responsible for making sure that the test they are in charge of gets done (you don't have to do all the work yourself though!) and for producing the documentation for that test at least a week before the review (i.e. Aug. 19). Please let me know if you think I've put you down for something that you aren't comfortable doing - or if you think you should be responsible for something. We should discuss this tentative assignment of responsibility at the meeting on Tuesday. Regards - Hal ---------------------------------------------------------------------- 18-Aug-05 DELIVERABLES FOR L1CAL SYSTEM REVIEW: AUGUST 26-27, 2005 ======================================================== Installation Strategy STONE --------------------- We need to pull together an updated description of the L1Cal installation procedure, including: 1) removal of old racks 2) installation of new racks 3) connection to services 4) cabling: - including labeling conventions and support schemes 5) hardware verification software 6) updated schedule Intra- and Inter-System Link Quality Tests ------------------------------------------ This comprises tests described in Dan's List: i) BLS Signal Digitization ii) Readout iii) Generation of And-Or Terms For each link in the system, we need to have: a) a BER limit (or the time the link ran without error) or a quantitative measurement of the analog signal quality b) a description of exactly what data was sent during the test Obviously, the more closely the data resembles actual L1Cal running conditions the better. 
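For item (a), a link that ran without error can be turned into a BER upper limit using the standard zero-error counting estimate: if N bits were transferred with no errors, then BER < -ln(1 - CL) / N at confidence level CL (about 3/N at 95%). The sketch below is only illustrative; the function names and the example bit rate are assumptions, not measured L1Cal numbers.

```python
import math

def bits_transferred(hours, bit_rate_hz):
    """Total bits sent in an error-free run (bit_rate_hz is an assumed link rate)."""
    return hours * 3600.0 * bit_rate_hz

def ber_upper_limit(bits_sent, confidence=0.95):
    """Upper limit on the bit error rate after bits_sent bits with zero errors."""
    if bits_sent <= 0:
        raise ValueError("bits_sent must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Zero observed errors: Poisson upper limit on the mean is -ln(1 - CL).
    return -math.log(1.0 - confidence) / bits_sent

if __name__ == "__main__":
    # Hypothetical example: 10 hours error-free on a link carrying 1e9 bits/s.
    n_bits = bits_transferred(10.0, 1.0e9)
    print(f"BER < {ber_upper_limit(n_bits):.2e} at 95% CL")
```

The limit only improves linearly with running time, so quoting both the run time and the data rate used in the test keeps the number interpretable.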
The links to document are:

  1)   BLS to ADF (through transition system)                  BENITEZ
  2.a) single ADF to single TAB (through ATC)                  UNALAN
  2.b) multi-ADF to TAB(s) (through ATC)                       UNALAN
  3)   TAB algorithm performance in hardware vs sim            JOHNSON
  4)   TAB to L2 & L3 (using optical splitter)                 CWIOK/NAYEEM
  5)   TAB to Cal-Track                                        FANTASIA
  6)   TABs to GAB                                             JOHNSON
  7)   GAB to L2/L3 (using optical splitter)                   MULHEARN
  8)   GAB trigger term creation in hardware vs simulation     JOHNSON
  9)   GAB and/or terms to TFW                                 MULHEARN
  10)  Readout of ADF/TAB/GAB readback memories during         MULHEARN
       operation
       - this is a first step toward "Monitoring" in Dan's List

Monitoring
----------
Status (and plots, if possible) should be presented for the following:

  1) Pulser system                                             RENKEL
  2) Examines                                                  YORK/ADAMS
  3) Online Monitoring/Alarms                                  EVANS
     - probably not much will be available for this, but we should present a plan

Trigger/COOR Files (c.f. Dan's List item "Controls")           LAMMERS
------------------
The status of this should be described. To be included:

  1) a list of "special" trigger files and what they do
  2) formats (if available) of new "trigger configuration files" to be used by COOR and the Trigger DB

Slice Test / Data Readout                                      LAMMERS/MULHEARN
-------------------------
The goal of this test is to show that the components of the system have been integrated with each other and that they all function reliably together. This test involves (see Dan's "Readout" section for more details):

  1) Operation of a "System Slice" (BLS-->Trans-->ADF-->TAB-->GAB) including L2/L3 readout (& data to Cal-Track?) for an extended period of time.
  2) Data input should come from (in order of preference)
     a) real Cal data taken during normal running
     b) pulser system data
     c) hand pulser data
  3) Data should be written to tape for analysis
  4) TFW terms should be checked (at least) for basic timing

The end products of this test should be:

  1) "X" in the statement: We ran for X hours without problems.
  2) plot of Run IIb TTs vs precision readout (vs Run IIa TTs)
  3) Measurement of coherent noise in data

Simulation & Trigger List
-------------------------
As discussed in the meeting, simulation will not be a major part of this review. We do need to show that the algorithms in hardware agree with what we expect, but that is included above. The following will NOT BE NECESSARY for the review.

  1) Plots from tsim_l1cal2b showing that it produces          TAYLOR
     sensible results for all algorithms:
     - JET, EM, TAU(?), MET, AND/ORs
  2) proposed Run IIb trigger terms                            BARBERIS
     - plots of: turn-on curves, rate vs efficiency for each term
  3) comparison of simulation with hardware                    JOHNSON
     (as much as possible)
  4) progress on / plans for inserting these terms             BARBERIS
     into the Trigger DB

------------------------------------------------------------------------

DAN'S MASTER LIST OF NECESSARY TESTS TO PROVE SYSTEM READINESS
==============================================================

Functions to be tested:

  BLS Signal Digitization
  Readout
  Generation of And-Or Terms
  Controls
  Monitoring
  Documentation
  BLS Signal Quality

1. BLS Signal Digitization

Prove that you can correctly digitize the analog BLS signal and send the resulting digital representation of this signal to the TAB labeled with the BX Number that caused the energy deposit.

- The first step involves using Stefano's Test Waveform Generator as a substitute for BLS signals. We are making a new "adaptor" at the output of Stefano's TWG so that instead of plugging its test signals into the back of the ADF Crate backplane (as was used for production ADF-2 card testing) the TWG will now be able to plug its 32 test signals into the Patch Panel of the BLS Cable Transition System. Thus testing at this step will include flowing the analog BLS signal through the full BLS Cable Transition System.
For this test the TWG makes sine wave stimulus signals of frequencies below, at the peak of, and above the principal Fourier component of a real BLS "bump". We analyze the resulting digital representation of this stimulus to check that: the input signal is wired to the correct ADF-2 channel, the signal has the correct RMS amplitude, the signal is still a sine wave (FFT), and there is no crosstalk. When the stimulus is turned off we also check the noise floor (aka pedestal width) to check that the noise level through the analog part of the system is OK. Together these tests show that the cabling, analog, and digitization circuits are working OK. Making these tests on the sidewalk just reuses existing software from the production ADF-2 card testing. We can also test all the racks this way when they are assembled in the MCH this fall.

- Next prove that the BX Number labeling of the ADF-2 Et output data is correct. To do this use the existing Run I Cal Trig "hand pulser" to inject a signal into the Patch Panel BLS Cable connector for a known BX. While the stimulus signal from the "hand pulser" is only approximately the same shape as a real Calorimeter BLS signal, its big point is that it reaches its peak, for a known and adjustable BX, at the same point in time as a real Cal BLS signal would for a Tevatron energy deposit from that BX. We capture and read out a turn's worth of ADF-2 buffer data that shows both the Et value sent to the TAB and its associated BX Number. Check to verify that the peak signal in this turn's worth of data has the correct BX Number. This test shows that the ADF-2 is correctly labeling its Et output data with the BX Number that it comes from. This test can be done with existing software.

- Show that the Channel Link transport of data from the ADF to the TAB is working reliably. This test is done with pseudo-random data generated by the ADF-2.
All 3 Channel Link outputs of all 100 ADF-2 cards have already been checked using the Denis Saclay Channel Link Test Receiver. We are currently rechecking some cards with the Channel Link signals flowing through the (just now available) ADF Crate Transition Card. The big point here is to start making this test on the sidewalk with: a fully loaded ADF Crate, realistic LVDS cable routing that includes the ADF Crate Transition Card, and with the data testing done in the TAB card. I don't know if additional TAB firmware or control software is needed to make this test. The intent is that the ability to make this test should be kept readily available for the life of this system. At any point in the future when something is not working you may need to be able to test and prove that the ADF to TAB data transport part of the system is OK. Together the above steps prove that the TAB receives correctly digitized and BX Number labeled BLS signals. 2. Readout Prove that when sent an L1_Acpt for a given BX and then an L2_Acpt for that event, that the TAB and GAB will correctly readout their event data from the correct BX, and send this data to both L2 and to L3-DAQ. This is a big point. Without this working we can not debug the more detailed Physics aspects of the system. In order for us to move forward, readout from the new L1 Cal Trig must work reliably, i.e. It must work every time you turn it on. It must run for hours at the full L1 rate without getting stuck and hanging the DAQ system. Every bit of data in every event must be understood and correct. There are a number of aspects to prove that readout is working: - Prove that the optical split is working and that the same data reliably reaches both L2 and L3-DAQ. If it is hard (or takes special software) to compare L2 data and L3-DAQ data then do the split the other way. Split and send two copies of the TAB-GAB data to the L3-DAQ readout VRB(s). Then record a ton of events and compare the two copies. 
This could all be done between stores. For example prove that you could run the sidewalk system in ZB between Stores. That is, prove that the "readout engine" for this system is working and never hangs during a ton of ZB events and that the two copies that you send to VRB(s) in its readout crate are always the same. I don't care what the data is. If things don't work at this fundamental "readout engine" level then work must be done down at that level before moving forward. - We need to prove that in response to an L1_Acpt for BX Number "N" that the TABs and GAB actually read out their data from BX Number "N". Remember these cards have to buffer and hold on to this readout data in anticipation of a possible L1_Acpt. If an L1_Acpt does arrive then they have to access the readout data from their buffer for the correct event. As far as I know this has not been checked at all so far. This can not be checked with fixed data. There are a number of ways that people could go about testing this. My note, "Using BLS signals in the Sidewalk Cal Trig test stand" from 7-June-2005 talked about this. Briefly: One could use the "hand pulser" to inject a signal for a known BX in front of the Splitter. Use the current L1 Cal Trig to control when L1_Acpts are issued, i.e. write a Trigger Configuration File that uses the current L1 Cal Trig to control when your trigger fires. Run this between Stores and prove that you see the signal in the TAB-GAB readout data. Get the readout working reliably enough so that it is rational to use 10 minutes of beam time for a special run at the end of a Store. Use the current L1 Cal Trig to generate the triggers for this special run. Clamp down the TT eta,phi coverage of the Reference Sets in the current L1 Cal Trig so that it only covers the 4 or 16 TTs that are connected to the sidewalk test stand. For this special run, readout at least: the TABs and GAB, the current L1 Cal Trig, and the Cal Precision Readout. 
Note that the sidewalk test stand equipment is just working as a readout system - it is not controlling when the L1 trigger fires. Quickly record 1000 events and then do the analysis to prove a correlation between TAB GAB readout and the Cal Precision readout. These were triggered events so all 1000 will have energy in them. If one either: does not understand the layout of the TAB data or if the TAB is reading out event data from the wrong BX, then there will not be a correlation. I would like to see this done in two steps. First make it work between Stores with the "hand pulser" and then once you know: how to do it and that it is working OK for the single channel that the pulser is connected to, make a special run with beam data. - Finally prove that the readout function of the Run2B Cal Trig is working well enough that it gets along OK in real Global Beam Physics Running. Once again this is just a test of the readout function of the new Cal Trig system. Have the new Cal Trig included along with all the rest of the crates in a normal Global Beam Physics Run and verify that there are no problems. This will require baby sitting the new system for a full 4 hour run (or a full Store). As you know, lots of other systems have had trouble at this step in their testing. This test is important because this is how the new system must operate on "day one" after the Shutdown is over. You must test this now before the shutdown starts. Because this is a normal Global Physics run not all events will have energy in the TTs that are connected to the sidewalk. But still one needs to look through the 50 events that are recorded each second and find a subset that have energy in these TT's and verify the correlation with Cal Precision Readout. Hunting for these events is simple because you can just look at the readout data from the current L1 Cal Trig to find events with energy in these TTs. 
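The correlation check used in the special run and in the Global run can be sketched as follows. This is a minimal illustration, assuming that per-event Et values for one TT have already been unpacked from the TAB readout and that the matching TT sum has been computed from the Cal Precision Readout; the function names, the 0.8 threshold, and the example numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant readout carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def readout_looks_correlated(tab_et, precision_et, threshold=0.8):
    """True if the TAB Et tracks the precision Et (threshold is an assumption)."""
    return pearson(tab_et, precision_et) > threshold

if __name__ == "__main__":
    # Hypothetical Et values (GeV) for one TT over a handful of triggered events.
    precision = [12.0, 25.5, 8.3, 40.1, 17.6]
    tab = [11.5, 26.0, 8.0, 39.0, 18.0]
    print("correlated:", readout_looks_correlated(tab, precision))
```

If the TAB were reading out event data from the wrong BX, or if the layout of the TAB data were misunderstood, the coefficient would be expected to sit near zero rather than near one.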
These steps should go a long way to proving that the readout function of the new L1 Cal Trig is working OK. Without reliable readout data I don't know how anyone could hope to move forward with this project. 3. Generation of And-Or Terms Independent of what the actual TAB-GAB "trigger algorithm" is, prove that the And-Or Terms coming out of the GAB are for the correct BX and that their latency is OK. This set of tests is now looking at the trigger generation function of the new L1 Cal Trig. Note about "Latency". As far as I know, all of the recent talk about latency has been limited to the Cal-Trk Match path, i.e. Cal --> ADF --> TAB --> Cal-Trk_Match --> And-Or Terms to TFW. I have not seen latency numbers for the principal signal path, i.e. Cal --> ADF --> TAB --> GAB --> And-Or Terms to TFW. If we do not have an official estimate for the latency through the principal signal path then we should make one. The steps to actually test the generation of And-Or Term could be: - In the first step you can use the "hand pulser" and just a scope to look at the And-Or Terms from the GAB. The "hand pulser" can be set to inject a signal into some ADF channel once every turn for example for the first BX of the first Super Bunch of the turn. The TAB-GAB algorithm can be set for anything that will generate an asserted And-Or Term from that algorithm with just one TT having a lot of energy in it. On the scope you can look at a subset of the GAB output signals, i.e. a couple of And-Or Terms, the Gap Marker, and the Strobe. You can trigger the scope on the Gap Marker and then count over some number of Strobe Clocks until you reach the asserted And-Or Term(s). That count of Strobe Clocks tells you which BX the new Cal Trig system says is the interesting one with the energy deposit in it, i.e. which BX the TFW will subsequently issue an L1_Acpt for. Please note we are not using the horizontal scale of the scope to measure a time. 
Rather the scope is just displaying the state of digital signals. Counting the number of Strobe Clock edges over to the asserted And-Or Terms tells you directly which BX those And-Or Terms are asserted for.

The And-Or Term signal path from an L1 Trig Sub-system to the Trigger Framework is a fully digital clocked synchronous connection. There are no cable lengths or little screwdriver pots on front panels that need to be adjusted to get the L1 Trig Sub-system to correctly tell the TFW which BX Number had the interesting energy deposit. The L1 Trig Sub-systems effectively label their And-Or Term data by marking their And-Or Terms that came from their analysis of the BX "that happened" at the beginning of each turn (the Gap Marker). The TFW takes the And-Or Term data from all of the various L1 Trig Sub-systems, lines this data up in time, and then uses it to make the L1 Decision for each tick.

This test can be done at any time, even during a Store. No software is required. The result of this test tells you whether or not the GAB is asserting its And-Or Terms for the correct tick that actually had the energy deposit in it. One should repeat this test for the various TAB-GAB algorithms to verify that the And-Or Terms from all the algorithms are correctly aligned with the ticks that had the Cal energy deposits.

- Next verify the latency of the And-Or Terms from the GAB. Making this test is as simple as plugging the And-Or Term cable from the GAB into the TFW and looking at monitor information from the TFW. This monitoring information will tell you whether or not the TFW is able to lock onto the GAB's And-Or Term cable signals and it will tell you how many ticks the TFW is storing them for before they are used to make the L1 Decision. The GAB just needs to be programmed and running normally. No And-Or Terms need to be firing. No signals need to be injected into the BLS inputs.
- Finally verify that the whole string is actually working and L1_Acpts are being issued for the correct BX (the one with the energy deposit in it). Set up the "hand pulser" to inject a signal into the ADF input that is at the correct point in time to correspond with a real BLS signal from a known BX. Program the TAB-GAB with any algorithm that will cause And-Or Terms from the GAB to be asserted in response to a lot of energy in one TT. Download a trigger configuration file that tells the TFW to use that And-Or Term in the definition of your L1 Trigger. Start your run and see what BX numbers your L1 Trig fires on. They should all match the BX Number that the "hand pulser" is injecting a signal on. Repeat this for the different TAB-GAB algorithms. This can all be done between stores and does not require any beam or special software. It does not matter what crates are being read out in response to the L1 trigger that was set up by your trigger configuration file. We only care about what BX Number your L1 trigger is firing on.

This set of 3 tests proves that the new L1 Cal Trig can correctly identify which beam crossings it thought were interesting. It is important to verify that this is working correctly. If there is a problem in this area then when we come out of the fall Shutdown at low luminosity the L1 Cal Trig would not be able to correctly identify the beam crossings with Calorimetry energy in them. Without being able to trigger on beam crossings with energy, you can not move forward with debugging the L1 Cal Trig (or work on other detector systems that depend on the L1 Cal Trig to find interesting beam crossings with good energy deposits in them).

4. Controls

To be useful it is necessary to be able to control the new L1 Cal Trig from the context of the D-Zero DAQ System. Some test of this control chain needs to be made. The control chain is something like:

Trigger Board decides what they want the L1 Cal Trig to look for.
Trigger Meisters write trigger configuration files.

At run time, i.e. right before the Store goes in, COOR reads the trigger configuration file, selects what resources in the L1 Cal Trig it will use to implement the required L1 Cal trigger functions and sends the appropriate high level ascii human readable messages to the TCC requesting that these trigger functions be set up.

TCC receives these request messages from COOR, determines what values need to be loaded into what control registers in the L1 Cal Trig in order to implement these functions, initiates the VME cycles that load these control registers, and sends acknowledgment messages back to COOR to let COOR know that either its requests have been set up or else that there is a problem.

I believe that some aspects of this control chain are actually a bit more complicated than what is shown above, e.g. there may be software that automates some aspects of writing the trigger configuration files; there may be a separate branch that controls simulators and such; and because the messages from COOR are "incremental" and not "full context" with each message, there is a predefined "Initialize State" that needs to be understood and verified. The point is that without the full control chain the new L1 Cal Trig hardware will not be useful for doing Physics.

I only know about testing the last step, i.e. TCC. For that we have the possibility of doing the following: From a separate menu screen that lets you pretend that you are COOR you can compose and send to TCC examples of the various COOR messages, i.e. the various requests for how to set up the new L1 Cal Trig hardware. This menu screen supplies much of the syntax for composing these COOR messages. These messages are, by design, high level and clearly human readable.
After sending a COOR setup request message to TCC an "expert" either examines directly the Cal Trig hardware or else studies the TCC log file (which contains time stamped full details of what TCC did to satisfy each message from COOR). This rather slow, expert intensive process is repeated for the various messages that COOR can send to TCC.

I have no idea about testing the steps above the COOR to TCC message. Clearly the development and testing of the COOR part depends on Scott Snyder.

There are other auxiliary Controls functions that TCC needs to be able to take care of and that need to be tested, e.g. finding pedestals for the ADF-2 cards, setting up the various control registers in the SBC, VRBs, and VRBC in the readout crate at Initialize time, ...

There is the whole big topic of "tester exerciser" which can be considered a TCC "controls function", i.e. loading test data into the ADF cards and then examining the output from TAB-GAB. The general outline of this is known, e.g. it is done on a per turn basis so that no system wide isochronous "I'm triggering you for test data now" signal is needed. Is demonstration of this capability part of the system review? In any case there are a lot of lower level functions that we need to make work before using a "tester exerciser" program.

[Note from Paul & Darien: This tool should be in place for commissioning, but if the same data integrity can be tested in other ways (e.g. offline), then that would be sufficient for the system review.]

5. Monitoring

Without being able to monitor the operation of the new trigger hardware it is not a useful tool for doing science.

Once the readout from the new Cal Trig is working someone should be able to modify the existing L1 Cal Examine to work with the new data format. The existing L1 Cal Examine displays, along with the existing TFW monitoring displays of individual L1 trigger rates and And-Or Term rates, are the basic blocks of online run time monitoring.
At a basic level, testing these tools is as simple as asking someone to show you these displays in the control room.

I believe that it is called the "Expert Mode" of the existing L1 Cal Trig Examine that allows you to make detailed per TT comparison of the L1 Cal readout Et values with those calculated from the Cal Precision Readout data. This is information that people will want to look at right after the first store coming out of the fall Shutdown. This is how we determine the "energy calibration" of the new L1 Cal Trig. We need to make this special version of the L1 Cal Trig Examine work now with current L1 Cal Trig data and with the data format from the new L1 Cal Trig.

Because the TAB-GAB will read out into the event stream additional data besides the TT Et values, the current L1 Cal Examine could be expanded to allow this additional data to be displayed and studied. We should determine how we can make use of this additional data to verify the operation of the new L1 Cal Trig, write up what new displays should be added to the existing L1 Cal Examine, and then implement and test them.

Are there "Alarm Messages" that should be sent from the new L1 Cal Trig to the D-Zero Significant Event System? (I mean in addition to the obvious power supply alarms and such.) Can these be listed and their source, e.g. the Examine or TCC, specified?

The CAN-Bus based monitoring of the basic Wiener Crate parameters is being implemented by Geoff and Fritz. This is power supply Voltage monitor and such. I believe that they basically take care of getting this monitoring data from the crates to the online system and then we need to make the monitoring display and alarm values part of this. This is needed and should be tested by the time that the new trigger starts Physics operation.

6. Documentation

To the extent that this is a list of things to get ready for a system review then we need a list of what documentation must be available by the time of this review.
This documentation should include the safety review operating permit documents and the basic shifter operating documents.

7. BLS Signal Quality

On one hand the generation of the BLS Trigger Pickoff signals is kind of the responsibility of the "calorimeter group" and not the responsibility of the "calorimeter trigger group". These signals are generated by electronics on the BLS cards which is not a component that was supplied by the "cal trig group".

On the other hand: We are all in this together. We can not do our job of generating good L1 triggers without good BLS Trigger Pickoff signals feeding into the L1 Cal Trig. It certainly is our responsibility to at least analyze the BLS Trigger Pickoff signals, as seen by the L1 Cal Trig, and determine which ones have a problem.

It is this last point where we are stuck. To date we have identified "bad BLS signals" by using the L1 Cal Trig Examine in "offline expert mode with text file output". Previously, this special version of the L1 Cal Examine would be run offline on a big file of a couple of hundred thousand events that were taken from a recent good quality Global Beam Physics run. There were enough events so that this examine could make a channel by channel comparison between the Precision Cal Readout and the L1 Cal Trig Et readout. This expert mode examine then identified bad L1 Cal Channels (which 90+% of the time were caused by bad BLS signals). It identified bad L1 Cal Channels based on a number of criteria, e.g. gain and noise. Much of this was based on calculating the average response of all the L1 Cal channels at a given TT Eta and then looking for channels in that group that were far from the average. As you can imagine there were a number of parameters in this program that needed to be tuned to most effectively make a bad channel list.

The most effective use of this expert mode L1 Cal Examine was to run it right before a shutdown on a recent set of good quality beam Physics data.
You then had the duration of the shutdown to examine and repair the L1 Cal Trig channels (BLS signals) that it had identified. A problem with this system is that once you thought that you had repaired a BLS signal, you had no way to test it until after the shutdown was over and you could once again collect beam Physics data. Status of the expert mode L1 Cal Trig Examine, as I understand it. This has not been run since fall 2003 or early 2004. I have no current list of what BLS signals need to be worked on. If the shutdown starts today I have no idea what BLS signals to test / repair. As of a phone conversation yesterday with Bob Kehoe this special version of the L1 Cal Trig Examine is either: lost or no longer works (or some problem like that). The other approach to determining which BLS signals need to be worked on is to analyze the L1 Cal Trig Et readout when the system is stimulated by the Precision French Cal Pulser system. This is how we debugged and maintained the BLS Trigger Pickoff electronics during Run I. This method has a number of advantages: You can make runs during the shutdown and see what problems you have fixed and what BLS signals still need to be worked on. This method is fast. The data collection time is fast - you only need a few events from each of the pulser patterns. The data analysis time is fast - the analysis code can run in just a few minutes. Because the different Precision French Cal Pulser patterns stimulate different subsets of the Calorimeter elements you can see the individual contributions to the TT sum. Thus you can find and fix "single BLS summer resistor problems" that you would never see in the analysis of beam Physics data. You can make a more precise test of the BLS signal response. The stability of the Precision French Cal Pulser is good enough that it is easy to see a change in the response of a BLS signal at the percent level. 
During a period of beam Physics running, between stores, it is easy to make one of these "pulser runs" and verify that all is OK.

Tests that we need working to take care of the BLS signals:

We need the "offline expert mode of the L1 Cal Trig Examine with text file output" working. We need this to make certain that we have a list of BLS signals to repair when the shutdown starts. We need this as a way to "bootstrap" the program that analyzes the Cal Trig Et readout of pulser run data. We need to verify that the program that analyzes the pulser data finds many of the same BLS signal problems as the expert mode examine does.

We need the program to collect pulser data and the program to analyze the L1 Cal Trig Et readout of this pulser data to be working. We need to be able to locate BLS signal problems and verify the repair of BLS signal problems during the shutdown. We need to be able to locate and repair the "single BLS resistor problems". This has never been looked at during Run II. It would be very good to use pulser runs for maintenance of the system once we return to beam Physics running.

Both the expert mode examine and the pulser data analysis programs need to be able to work with the readout data from the current L1 Cal Trig and with the data from the new system.
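
The "far from the average at a given TT Eta" criterion described above for the expert mode examine can be sketched roughly as below. This is not the examine's actual code (which, as noted, may be lost); it assumes a per-channel "gain" has already been computed as the ratio of L1 Cal Trig Et to Precision Cal Et, and the function name, the (eta, phi) keying, and the 25% tolerance are all hypothetical.

```python
from collections import defaultdict

def flag_bad_channels(gains, tolerance=0.25):
    """Return the (eta, phi) channels whose gain is far from their eta-ring average.

    gains: dict mapping (eta, phi) -> gain (assumed L1 Et / precision Et ratio).
    tolerance: assumed maximum fractional deviation from the ring mean.
    """
    # Group the gains by TT eta so each channel is compared to its own ring.
    rings = defaultdict(list)
    for (eta, _phi), gain in gains.items():
        rings[eta].append(gain)

    bad = []
    for (eta, phi), gain in sorted(gains.items()):
        ring = rings[eta]
        mean = sum(ring) / len(ring)
        # A ring averaging zero response is suspicious as a whole.
        if mean == 0.0 or abs(gain - mean) / abs(mean) > tolerance:
            bad.append((eta, phi))
    return bad

if __name__ == "__main__":
    # Hypothetical gains: one low-gain channel in the eta = 1 ring.
    example = {
        (1, 1): 1.00, (1, 2): 1.02, (1, 3): 0.98, (1, 4): 0.50,
        (2, 1): 1.00, (2, 2): 1.00,
    }
    print(flag_bad_channels(example))
```

A plain mean is used here because that is what the text describes; since a single very bad channel pulls the ring mean toward itself, the tolerance is one of the parameters that would need tuning. The same comparison could in principle be applied to pulser run data as well as to beam Physics data.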