Update on TAB->L2 current BER measurements (18 Oct 2005, M.Cwiok, J.Heinmiller)
===============================================================================

I. Some improvements implemented after readiness review on Aug 26th, 2005:
--------------------------------------------------------------------------

1. Mike implemented sending a 16-bit XOR parity word at the end of each
   broadcast event, which allows fast on-the-fly event verification on
   the L2 side.

2. Mike corrected the VMESCL/TAB firmware so that the TAB is less vulnerable
   to SCL inits during off-line tests at a constant rate of L1Accepts.
   Before this fix, corrupted events occurred occasionally. Such events:
   * failed the XOR test
   * had more 16-bit words than the nominal 192
   * had a 2nd header in the middle of the event

3. Mikolaj observed that cron jobs running on the Betas around 4am might
   cause MBUS communication to become stalled. Such MBUS failures occur not
   only during TAB->L2 tests at L1Accept rates between 2 and 24kHz, but also
   during tests using the dedicated HotLink test event generator running at
   a sustainable rate of 2.5 kHz.
   Therefore, in the latest TAB->L2 tests described here all cron jobs were
   temporarily disabled. In addition, the BER test executable was run at
   Real Time priority, like the executables in the on-line system.

II. TAB->L2 test set-up:
------------------------

1. Hardware tested:
   - VMESCL unit with the latest firmware (==> see II.2, I.1-2)
     sending L1Accepts at a constant rate of 12kHz
   - 2 TABs (#0 and #1) with the latest firmware (==> see II.2, I.1-2)
   - final version of Fiberdyne optical splitter: (2ch)x(1-to-2)
     that matches optical fibers with 50um core diameter
   - one standard VTM (#561030)
   - one FIC (#002) with modified ch.0 (#002)
   - one SFO (#13) serving as an analog HotLink fan out
     working in (6ch)x(1-to-2) mode
   - 2 MBTs (#1 and #3) with 4 channels being read out from each MBT
     (simultaneous test of 8 channels)
   - standard L2Beta (d0l2beta11) with disabled cron jobs
     (==> see I.3)

2. TAB/VMESCL configuration:
   - firmwares as of Oct 14th, 2005
   - commands to configure VMESCL and TABs executed from tbilisi-clued0:

     $> cd /scratch/work/l1cal2b/tablib/run
     $> ./bin/powerdown 0 1          # power down TAB0 and TAB1
     $> ./bin/configure stable 0 1   # firmware for TAB0 and TAB1
     $> ./bin/setup_tab 1 1 165      # load TAB1's test memory with hex 0xA5
     $> ./bin/setup_tab 0 1 165      # load TAB0's test memory with hex 0xA5
     $> ./bin/setup_vmescl 10 47 0   # firmware for VMESCL, switch off broadcasting
     $> ./bin/send_const_l1a 47      # start broadcasting from TABs

     The frequency parameter sets the L1Accept rate, e.g.:
     *  2 for 24kHz L1Accept rate,
     *  3 for 16kHz L1Accept rate,
     *  4 for 12kHz L1Accept rate  <---- parameter used in this test
     * 23 for  2kHz L1Accept rate.

III. Results:
-------------

- constant L1Accept rate: 12kHz
- overall test duration: 44h
- longest uninterrupted run without any problems: 18h
- total number of broadcast events: N_evts=1.35e9
- total number of bits tested:
  N_bits=(32 bits per MBT word)*(96 words)*N_evts=4.1e12 bits
  per MBT channel
- number of corrupted events (see I.2): N_bad=1
  * all 8 channels had the same errors in this event, which allows one
    to exclude the splitter, VTM, FIC, SFO and MBTs as a source of
    the error
  * a FIC monitoring program did not detect any G-Link or Protocol errors
- number of MBUS communication failures (see I.3): N_mbus=4

IV. Conclusions:
----------------

1. Two TABs were tested simultaneously using the final version of the
   1:2 passive optical splitter.

2. Probability of a corrupted event: p_bad = N_bad/N_evts = 7.4e-10 per
   event. The absence of G-Link and Protocol errors, together with exactly
   the same location of the 2nd header in all 8 MBT channels (4 for TAB0
   and 4 for TAB1), suggests that it _can_ be related to the TAB firmware.

3. Probability of an MBUS failure: p_mbus = N_mbus/N_evts = 3.0e-9 per
   event. This is most likely _not_ related to the TAB at all.

4. Assuming that the observed MBUS failures and the 1 bad event were not
   related to G-Link and Protocol errors, but rather to an MBT-to-Beta
   communication problem and to a momentary glitch of the D0 framework
   clock, respectively, one can estimate BER<2.4e-13 per TAB channel.

5. Currently the probability of an MBT hang is higher than the probability
   of sending a "corrupted" event from the TAB.
   Solution: modify the BER test code to ignore such MBT failures and
   re-start the BER test automatically after re-configuring all MBTs via
   the VME bus.
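
The figures quoted in sections III and IV can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function and constant names are hypothetical, the input counts are copied from section III, and it assumes that the BER limit was obtained as "fewer than one bit error in N_bits tested bits", i.e. 1/N_bits, which this note does not state explicitly.

```python
# Counts quoted in section III (copied from the note, not re-measured).
N_EVTS = 1.35e9        # broadcast events
WORDS_PER_EVENT = 96   # 32-bit MBT words per event
BITS_PER_WORD = 32     # 96 x 32 bits is the same size as 192 x 16 bits
N_BAD = 1              # corrupted events
N_MBUS = 4             # MBUS communication failures


def summarize(n_evts=N_EVTS, n_bad=N_BAD, n_mbus=N_MBUS):
    """Return the derived quantities of sections III-IV as a dict."""
    n_bits = BITS_PER_WORD * WORDS_PER_EVENT * n_evts  # per MBT channel
    return {
        "n_bits": n_bits,
        "p_bad": n_bad / n_evts,     # corrupted events per event
        "p_mbus": n_mbus / n_evts,   # MBUS failures per event
        # Assumption: limit taken as one bit error in n_bits tested bits.
        "ber_limit": 1.0 / n_bits,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name} = {value:.2e}")
```

Rounded to two significant figures, the values agree with those quoted above (4.1e12 bits, 7.4e-10, 3.0e-9 and 2.4e-13). Note that 1/N_bits is a simple estimate, not a limit at a stated confidence level.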