SUMMARY OF STATUS:
Through a combination of measures described below, we have demonstrated
error-free operation of the ADF->TAB LVDS link at the test stand, using
two full ADF crates, for eight hours without a single bit error.
We will of course extend this result, but it should be noted that this
meets the minimum requirements agreed upon for the stable operations
period for this link: the ability to run for at least eight hours without
any bit errors, with at least 50% of the system included. The minimum of
two ADF crates was specified during the review, and the eight-hour
benchmark was proposed by me and agreed upon by the L1Cal2B group during
planning for stable operations.
DETAILS OF LATEST TESTS:
In what follows, for brevity, I will only discuss systematic effects.
There were of course single-channel failures, which were corrected by such
measures as straightening bent pins, replacing broken cables, and replacing
failing cards.
Extensive long-term testing of this system has occurred at many stages.
The recent problems were with the initialization of the link, where we
discovered a flaky response to the deskew function performed by the
channel-link LVDS transmitter/receiver chipset. Each time the system was
initialized, about 10% of the links failed to deskew properly. The
marginal behavior was not restricted to a few bad channels, but it was
widespread throughout the system.
We have added pre-emphasis to 2 of the 4 ADF crates. This substantially
reduces the number of links that fail upon a global LVDS link
initialization. For the 50% of the system under consideration, the number
of failed channels on a typical global reinitialization drops to around 0-4
channels. Also, the total number of cables that ever fail for the present
configuration has dropped dramatically to around 15.
We did extensive testing of the startup failure rates of each cable. We
replaced the most problematic cables with 4-meter ERNI cables. We have also
recently acquired three 6-meter high-quality AMP cables, which were also
installed in problematic locations. The higher-quality cables have never
been seen to fail, at initialization or during long-term operation.
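
For anyone who wants to repeat the per-cable bookkeeping, here is a minimal
sketch of how the startup failure rates could be tallied and ranked. It is
not the script used at the test stand. The function name, the cable labels,
and the log format (one set of failed cable IDs per global initialization)
are all assumptions made for illustration.

```python
from collections import Counter


def rank_cables_by_failure_rate(init_logs, all_cables):
    """Rank cables by the fraction of initializations in which they failed.

    init_logs  -- hypothetical log format: one collection of failed cable
                  IDs per global link initialization
    all_cables -- every cable ID in the test, so cables that never failed
                  still appear in the ranking with a rate of 0.0
    """
    n_inits = len(init_logs)
    counts = Counter()
    for failed in init_logs:
        # set() guards against a cable being listed twice in one pass
        counts.update(set(failed))
    rates = {
        cable: (counts[cable] / n_inits if n_inits else 0.0)
        for cable in all_cables
    }
    # Worst cable first; ties are broken by cable ID for a stable order
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up example data, not measurements from the test stand
    logs = [{"A", "B"}, {"A"}, set(), {"A"}]
    for cable, rate in rank_cables_by_failure_rate(logs, ["A", "B", "C"]):
        print(cable, rate)
```

The cables at the top of such a ranking are the natural candidates for
replacement.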
With pre-emphasis and these cable replacements, there are no failed
channels in more than 90% of global link initializations. We have
demonstrated that by iteratively deskewing the link, we can consistently
get the system to an error-free state.
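
The idea behind iterative deskewing can be sketched as a simple retry loop.
This is only an illustration under stated assumptions, not the actual
control code: it assumes that only the links which failed are deskewed
again on each pass, and the names iterative_deskew, try_deskew, and
make_simulated_deskew are hypothetical. The simulated link stands in for
the real hardware call.

```python
import random


def iterative_deskew(links, try_deskew, max_passes=10):
    """Repeat the deskew on failed links until none fail or passes run out.

    links      -- iterable of link identifiers
    try_deskew -- hypothetical callable: returns True if the link deskewed
                  properly, False otherwise
    Returns (links_still_failing, passes_used).
    """
    failed = list(links)
    passes = 0
    while failed and passes < max_passes:
        passes += 1
        # Assumption: only links that failed the previous pass are retried
        failed = [link for link in failed if not try_deskew(link)]
    return failed, passes


def make_simulated_deskew(fail_probability, seed=0):
    """Stand-in for the hardware: each attempt fails with a fixed probability."""
    rng = random.Random(seed)

    def try_deskew(link):
        return rng.random() >= fail_probability

    return try_deskew


if __name__ == "__main__":
    # The 10% figure mirrors the failure rate seen before pre-emphasis;
    # the link count of 100 is an arbitrary illustrative number.
    still_failing, passes = iterative_deskew(
        range(100), make_simulated_deskew(0.10), max_passes=10
    )
    print("passes used:", passes, "links still failing:", len(still_failing))
```

A cap on the number of passes keeps a genuinely broken channel from
stalling the initialization forever.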
In the most recent test, the combination of pre-emphasis, selective cable
replacement, and iterative deskewing has allowed two ADF crates to
successfully send data to the TABs error-free for eight hours.
Although we are encouraged by this result, we do not consider our job done
until we have convinced ourselves that we have the most robust system we
can achieve. For that reason, we are planning additional measures for the
installation that we are convinced will make the system more robust. In
particular, we consider the iterative deskewing procedure to be highly
undesirable for the installed system. However, we are able to operate the
system in its present state.
PLANNED IMPROVEMENTS:
The three four-meter ERNI cables and three six-meter high-quality AMP
cables have never failed at startup or during long-term running. This
includes several thousand link initializations run continuously during the
past three nights. Since the failures we do see are confined to the other
cables, we are convinced that the other AMP cables we have purchased are
of poor quality. We propose to replace them with ERNI cables.
We have also directly observed that shorter cables are more reliable than
longer cables. For this reason, we wish to use the shortest cables
possible to reach the ADF crates. Even though it appears that we can
operate with 5-meter cables (and even 6-meter cables) it is evident that
we have less of a margin of error than is desirable. An easy way to
increase the margin is to use shorter cables.
WHY NOT PUT PRE-EMPHASIS ON ALL FOUR ADFS?
Two of the four ADF crates are within 3 meters of the TAB/GAB backplane.
For this length, it may be possible to operate the link without
pre-emphasis or with less pre-emphasis. This is desirable as the
pre-emphasis does add more high-frequency noise to the system.
CONCLUSION:
I believe we have shown we can "get by" with the present system, and have
a clear plan to make the full installation far more robust. There's more
work to be done, but I believe that we can go forward now with confidence.
cheers,
mike