Subject: glink transmission issues resolved
From: Michael J Mulhearn
Date: Wed, 20 Jul 2005 11:53:29 -0400 (EDT)
To: d0-l1cal2b@fnal.gov
CC: cwiok@fnal.gov, janderson@fnal.gov, jheinm1@uic.edu

Hello!

Mikolaj Cwiok reports that we have sent 5,000,000 events at 1.4 kHz to L2 with no transmission errors reported by the FIC.
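
For a sense of scale, here is a quick back-of-envelope check of how long such a run takes. It assumes the 1.4 kHz rate was held constant for the whole run, which the report does not state explicitly; the function name is just for illustration.

```python
# Back-of-envelope run length, assuming a constant L1 accept rate.
N_EVENTS = 5_000_000   # events sent to L2 (from the report)
RATE_HZ = 1.4e3        # event rate in Hz (from the report)


def run_duration_s(n_events: int, rate_hz: float) -> float:
    """Return the time in seconds needed to send n_events at rate_hz."""
    return n_events / rate_hz


duration_s = run_duration_s(N_EVENTS, RATE_HZ)
print(f"{duration_s:.0f} s = {duration_s / 60:.1f} min")
```

Under that assumption the run lasts roughly an hour.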

Transmission errors come in two varieties. First, glink chip-level errors:

 -> glink loses sync
 -> glink reconstructs invalid word

then errors in the D0-specified protocol for the data sent over glink
(i.e. the beginning-of-event and end-of-event markers):

 -> protocol error

We can now run for hours without the FIC ever issuing any of these transmission errors.

 ***

It appears that (contrary to evidence from our initial tests) the problems we were having were *not* from the L1Cal hardware.  Instead, we had three problems:

1)  VME_SCL was being inadvertently dropped to "online" mode, which
    uses a noisy oscillator.  This causes glink to lose sync.

2)  An (as yet unresolved) systematic problem with channel 0 of all tested
    VTM/FIC pairs (James Heinmiller is now looking into this).

3)  A bad VTM.

The combination of these three and, once (1) was resolved, the combination of (2) and (3) gave the appearance of sporadic errors across different channels and VTM/FIC hardware.  The logical (but wrong!) conclusion was that the untested new hardware was to blame.

It was only after a systematic study with the scope and logic analyzer, and much help from John Anderson and James Heinmiller, that we became convinced that the TABs were transmitting correctly.  We also arranged for many L1As to be sent without any VME transactions, which made it possible to generate errors *immediately* as opposed to sporadically.  These two developments led to a more systematic study of the receiver hardware (VTM and FIC) that allowed us to diagnose (2) and (3).

 ***

We are not done yet!  At the next stage of testing, we are looking for known test patterns received at L2.  In the 5,000,000 events mentioned above, there were 39 events with the correct data format but with sections of data set to zero.  I have some ideas about what is causing this, but it is almost certainly on the data generation side of the test, and *not* a transmission error. Therefore, it does not make sense to calculate a BER from these results.
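
To put the 39 events in proportion, the fraction of affected events can be worked out directly. This is only an event-level fraction, not a BER: it says nothing about how many bits were wrong, and the zeroed sections are believed to come from data generation rather than transmission.

```python
# Fraction of events with zeroed data sections (event-level, NOT a bit error rate).
N_EVENTS = 5_000_000   # events sent to L2 (from the report)
N_ZEROED = 39          # events with sections of data set to zero (from the report)

fraction = N_ZEROED / N_EVENTS
print(f"{fraction:.1e} of events had zeroed sections")
```

That is about 8 events per million.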

cheers,
mike