Subject: Eight hour LVDS test complete
From: Michael J Mulhearn
Date: Sun, 26 Feb 2006 02:19:18 -0500 (EST)
To: D0 Run2b L1Cal Trigger

SUMMARY OF STATUS:

Through a combination of measures described below, we have operated the ADF->TAB LVDS link at the test stand, using two entire ADF crates, for eight hours without a single bit error.

We will of course extend this result, but it should be noted that this meets the minimum requirements agreed upon for the stable operations period for this link: the ability to run for at least eight hours without any bit errors, with at least 50% of the system included.  The minimum of two ADF crates was specified during the review, and the eight-hour benchmark was proposed by me and agreed upon by the L1Cal2B group during planning for stable operations.
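To put a zero-error run in context, here is a rough sketch of the bit error rate limit such a run implies. It uses the standard Poisson result that, with zero errors observed in N bits, the upper limit at confidence level CL is -ln(1 - CL) / N (about 3/N at 95%). The function names and parameters are hypothetical, and the link rate and number of links are left as inputs because they are not quoted in this note.

```python
import math

def bits_in_run(hours, bits_per_second_per_link, n_links):
    """Total bits transferred in a run (all inputs supplied by the caller)."""
    return hours * 3600.0 * bits_per_second_per_link * n_links

def ber_upper_limit(bits_transferred, confidence=0.95):
    """Upper limit on the bit error rate after observing ZERO errors.

    For zero observed errors, Poisson statistics give
    limit = -ln(1 - confidence) / bits_transferred.
    """
    if bits_transferred <= 0:
        raise ValueError("bits_transferred must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1.0 - confidence) / bits_transferred

# Example with placeholder numbers (NOT the actual ADF->TAB link parameters):
n_bits = bits_in_run(hours=8, bits_per_second_per_link=1.0e9, n_links=100)
print("95%% CL upper limit on BER: %.2e" % ber_upper_limit(n_bits))
```

The limit scales inversely with the number of bits sent, so extending the run or including more of the system tightens it in direct proportion.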

DETAILS OF LATEST TESTS:

In what follows, for brevity, I will only discuss systematic effects. There were of course single-channel failures which were corrected by such measures as unbending bent pins, replacing broken cables, and replacing failing cards.

Extensive long-term testing of this system has occurred at many stages. The recent problems were with the initialization of the link, where we discovered a flaky response to the deskew function performed by the channel-link LVDS transmitter/receiver chipset.  Each time the system was initialized, about 10% of the links failed to deskew properly.  The marginal behavior was not restricted to a few bad channels, but was widespread throughout the system.

We have added pre-emphasis to 2 of the 4 ADF crates.  This substantially reduces the number of links that fail upon a global LVDS link initialization.  For the 50% of the system under consideration, the number of failed channels on a typical global reinitialization drops to between 0 and 4.  Also, the total number of cables that ever fail for the present configuration has dropped dramatically to around 15.

We did extensive testing of the start-up failure rates of each cable.  We replaced the most problematic with ERNI 4-meter cables.  We have also recently acquired three 6-meter high-quality AMP cables, which were also installed in problematic locations.  The higher quality cables have never been seen to fail, at initialization or during long-term operation.

With pre-emphasis and these cable replacements, there are no failed channels in more than 90% of global link initializations.  We have demonstrated that by iteratively deskewing the link, we can consistently get the system to an error-free state.
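The iterative deskewing idea can be sketched roughly as follows. This is only an illustration of the retry logic, not the code actually run at the test stand: deskew_all and failed_links are hypothetical callables standing in for the real hardware-control routines, and FakeLinks is a toy stand-in that replays a scripted list of failures.

```python
def iterative_deskew(deskew_all, failed_links, max_attempts=10):
    """Repeat a global deskew until no link reports a failure.

    deskew_all   -- hypothetical callable that (re)initializes every link
    failed_links -- hypothetical callable returning the links that failed
    Returns the number of attempts used; raises if the limit is reached.
    """
    for attempt in range(1, max_attempts + 1):
        deskew_all()
        if not failed_links():
            return attempt
    raise RuntimeError("links still failing after %d attempts" % max_attempts)


class FakeLinks:
    """Toy stand-in for the hardware: replays scripted failures per attempt."""

    def __init__(self, failures_per_attempt):
        self._script = list(failures_per_attempt)
        self._current = []

    def deskew_all(self):
        # Take the next scripted result; once the script runs out, no failures.
        self._current = self._script.pop(0) if self._script else []

    def failed_links(self):
        return list(self._current)


# Two links fail on the first pass, one on the second, none on the third.
links = FakeLinks([[3, 17], [17], []])
print(iterative_deskew(links.deskew_all, links.failed_links))
```

If each global initialization independently comes up clean at least 90% of the time (an assumption, since successive attempts may be correlated), the chance of n failed attempts in a row is at most 0.1^n, so a small max_attempts should almost always suffice.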

In the most recent test, the combination of pre-emphasis, select cable replacement, and iterative deskewing has allowed 2 ADF crates to successfully send data to the TABs error-free for eight hours.

Although we are encouraged by this result, we do not consider our job done until we have convinced ourselves that we have the most robust system we can achieve.  For that reason, we are planning additional measures for the installation that we are convinced will make the system more robust.  In particular, we consider the iterative deskewing procedure to be highly undesirable for the installed system.  However, we are able to operate the system in its present state.

PLANNED IMPROVEMENTS:

The three four-meter ERNI cables and three six-meter high-quality AMP cables have never failed at startup or during long-term running.  This includes several thousand link initializations run continuously during the past three nights.  For this reason, we are convinced that the other AMP cables we have purchased are of poor quality.  We propose to replace them with ERNI cables.

We have also directly observed that shorter cables are more reliable than longer cables.  For this reason, we wish to use the shortest cables possible to reach the ADF crates.  Even though it appears that we can operate with 5-meter cables (and even 6-meter cables) it is evident that we have less of a margin of error than is desirable.  An easy way to increase the margin is to use shorter cables.

WHY NOT PUT PRE-EMPHASIS ON ALL FOUR ADF CRATES?

Two of the four ADF crates are within 3 meters of the TAB/GAB backplane. For this length, it may be possible to operate the link without pre-emphasis or with less pre-emphasis.  This is desirable as the pre-emphasis does add more high-frequency noise to the system.

CONCLUSION:

I believe we have shown we can "get by" with the present system, and have a clear plan to make the full installation far more robust.  There's more work to be done, but I believe that we can go forward now with confidence.

cheers,
mike