L1CAL MEETING MINUTES
12 March, 2002

Present
-------
o Saclay: D. Calvet, M. Irakli
o Nevis:  J. Ban, H. Evans, J. Mitrevski, J. Parsons, B. Sippach, M. Tuts

General
-------
o Discussion centered on the emails from John and Denis that were
  circulated in the last week. These emails are appended at the end of
  the minutes for reference.

1) Data Duplication
-------------------
o Possible to do the duplication at the ADF
  - doubles the cable count
  - not a big issue for small cables
  - need to identify a candidate cable
    . Jamieson's cable (in Denis' note): $30/cable (6-pair, 6 m)
    . requirement of halogen-free? - need to check
o LVDS stub to duplicate signals at the TAB
  - not much savings here, but an increase in complexity
    . note: 512-768 signals/TAB depending on the algorithm

DECISION
* Decided to do the duplication at the ADF

2) Data Transfer
----------------
o Two general classes of options:
  a) use FPGAs to send/receive LVDS signals
  b) use National Semiconductor Channel Link chipsets to send/receive
o FPGA LVDS
  - need to use Xilinx to do LVDS-to-LVDS with FPGAs
    . Altera (high end needed) doesn't have enough clock signals
    . could use small Xilinx chips for two cables
      > could use the small cable proposed by Denis
      > but expensive ($60)
      > and power-sequencing problems - require large switch-on currents
      > would not pack as densely as Channel Link
    . algorithm chip would need to use the highest speed rating
      --> much higher cost
  - requires that both sender and receiver use the same family of chip
    . nobody is ready to choose right now

DECISION
* general agreement that this is not the best solution to pursue
* Channel Link seems to be the best way to go

o Two Channel Link possibilities were discussed
  - it was generally agreed that we should try to send data directly
    from an ADF to the TABs (2 cables per ADF)
    . not merge data from several ADFs onto a single cable
  a) chipset: 28-bit parallel to 4 (LVDS pairs) serial data
     . corresponds to Denis' suggestion 2/ below
     . send 3x8 bits each clock
  b) chipset: 48-bit parallel to 8 serial
     . only use 32-bit parallel to 6 serial
o Channel Link 28-to-4
  - Positive
    > fits in a small 6-pair cable
  - Negative
    > high bandwidth: 582 Mbits/s
    > overhead at the receiver: unpacking the 3x8 bits is complicated
    > route signals to processing FPGAs at 83 MHz
      (original scheme was 60 MHz)
o Channel Link 48-to-8
  - Have to check whether leaving off 2 signal lines is possible
    . this will be checked at Nevis
    . --> 6 pairs + clock
  - Cable possibilities
    a) Molex cables
       . stackable: 5-pin blocks
       . best suited to go on the back of the card
         - may be difficult on the TAB front panel
       . need to check cost
       . but a better fit to the signals
    b) Amp cable: exists in 10-pair (see below)
       . this may cause real-estate problems on the TAB

ACTION
* To decide between the 48- and 28-bit versions
  - need to study the issue (48-8) on the ADF side
  - need to get input from National about dropping data lines

>>> Note added after the meeting <<<
National confirms that it is possible to drop 2 of the data lines
- meaning we could use a cable with 6 data pairs + 1 clock pair
- need to check the cable
  . facilities exist to do this at Nevis

3) Crate Layouts
----------------
o ADF Crate/Cards
  - 6U with VME on 3U, hard-metric on P1
  - input cables from BLS
    . interleaved cables on board
  - output to TAB
    . cables on back
  - Saclay has prepared a drawing of how this could work
o TAB Crate/Cards
  - bring signals from ADF to the front panel
    . need enough space for this
  - control signals on P3

4) Protocols
------------
o Several issues were raised here for discussion later
o What extra bits are needed
  - framing, parity, etc.
o How to recover when alignment is lost
  - probably will require a system-wide reset
o Initial synchronization
  - the proposed scheme requires the first data word on start-up to
    start with a 1
  - after that, need to periodically check alignment

ACTION
* This should be discussed at the next meeting

----------------------------------------------------------------------------
From calvet@hep.saclay.cea.fr Mon Mar 11 11:31:35 2002
Date: Mon, 11 Mar 2002 17:25:27 +0100
From: Denis Calvet
To: "John Parsons", "Hal Evans"
Cc: "L1Cal Trigger -- Maris Abolins", "Chip Brock", "Dan Edmunds",
    "Patrick LeDu", "Hal Evans", "John Parsons", "Emmanuelle Perez",
    "Michael Tuts", "Harry Weerts", "Jovan Pavle Mitrevski"
Subject: Re: L1Cal video

Dear all,

Following last week's ideas from John on the ADC-to-TAB links, I wrote
a series of comments. We can discuss all these points tomorrow at the
video conference.

Best Regards,
Denis.

>Now for our comments. We've thought about the implications of two
>main issues: ADF data duplication and ADF-to-TAB data transmission.
>
>1) Data Duplication
>As mentioned above, we can minimize the amount of ADF data
>duplication by tuning the amount of data dealt with in each TAB. The
>question is, where should this duplication be done? Two ideas have
>been proposed.
> a) Duplicate the ADF data on the ADFs, doubling the cable count to
>    the TABs above the case of no duplication.
> b) Send only one copy of the ADF data to the TABs and have each TAB
>    fan out some of its data to its neighbors using either a custom
>    backplane or point-to-point links.
>
>We believe that the system is greatly simplified if scheme a) is
>chosen.
>There are two main arguments for this choice:
> i) The amount of data to pass around per board is much smaller if
>    the duplication is done at the ADF, where there are only 32
>    channels to consider. Since the TABs share all of their data with
>    their neighboring boards (half going to board i-1 and half to
>    board i+1), there would be a huge amount of data to pass around:
>    8x32x2x8 bits in the 8x32 scheme and 12x32x2x8 bits in the 12x32
>    scheme. Any system we could devise to do the data sharing at the
>    TABs is much more costly and complicated than the increase in
>    cable costs when doing the duplication at the ADFs.
> ii) In any scheme we can think of there will be increased latency,
>    of at least one BC, due to the requirement of duplicating the
>    TAB data.

The total amount of data to be duplicated is identical whether the
duplication of the flows is made at the level of the ADC boards or at
the level of the TABs. At first sight, one would expect that
duplicating data within the same crate, for the board that sits in the
next slot, can be made simpler/cheaper than duplicating it across
different crates that are a few meters apart. In the present case, we
have a good picture of a solution to send data at 250 MByte/s (i.e.
the throughput of a 32-channel ADC card) a couple of meters away, but
we do not have a scheme to send 1 GByte/s (i.e. what is shared between
each pair of TABs) 10 centimeters away. I do not have a solution
either, but I was thinking of the following possibilities:

1/ Duplicate the serial data stream of each ADC board by placing LVDS
stubs on the TABs at the receiving end of each cable from an ADC
board. This would keep the ADC-to-TAB cable count to its minimum and
would not introduce a noticeable additional latency.

2/ Wide (unipolar) busses going through the backplane in the TAB
crate. I guess that 64-bit-wide busses between 2 adjacent cards are
feasible.
Running one such bus at 64 MHz with double-edge clocking leads to a
throughput of 1.024 GByte/s. This scheme would introduce more latency
than the previous one, though it would be less than 1 BC because you
need not wait for a whole packet from an ADC card to start duplicating
it.

I agree that both options require a significant effort in engineering
and for their validation. I also understand that adding 4 GByte/s of
I/O bandwidth at the TAB level is an option you are not particularly
keen on. In contrast, I do not have strong arguments against placing 2
identical output links on each ADC board. If neither of the two ideas
I gave above is to be pursued on your side, we can agree on making the
duplication of data at the level of the ADC cards, and double the
cable count.

>2) ADF-to-TAB data transmission
>
>Two main schemes have been discussed:
> a) Channel Link LVDS chipset, which multiplexes by a factor of 6.
>For 32 channels, we would need 6 data pairs plus 1 CLK pair = 7
>pairs. Or, in the case where data from more than one ADF is merged,
>one could send 48 channels over 8 data pairs plus 1 CLK pair = 9
>pairs.
>
> b) LVDS serializer/deserializers built into larger FPGA chips.
>For example, it has been mentioned to multiplex by a factor of 8 and
>send 32 channels over 4 data pairs plus 1 CLK pair.
>
> We have spent quite some time comparing these two possibilities,
>and believe the Channel Link solution is strongly preferred. There
>are several different reasons supporting this preference, and the
>reasons are of differing natures. While perhaps no one reason is a
>"show stopper", we feel the collection together provides a persuasive
>case:
>
> i) FPGA chip sizes and speed grades
> - using the FPGA LVDS requires the use of large FPGAs, where this
>option is available on-chip. In addition, meeting the LVDS speed
>requirement implies using higher speed grade FPGAs than would
>otherwise be necessary. Both of these factors lead to increases in
>cost.
> In contrast, John showed that, for the Channel Link case, we have
>demonstrated that the sliding window logic could be implemented in
>low-cost, speed grade -3 Altera FPGAs. The choice of chip size is
>then also dictated by the logic requirements, and not by the
>presence/absence of on-chip LVDS.
>
> ii) interface simplicity
> - the Channel Link chipset provides the CLK and data recovery
>automatically. You simply feed it up to 48 channels at 60 MHz at the
>ADF end, and get back the 48 channels of data at 60 MHz at the TAB
>end. In contrast, the FPGA LVDS solution relies on a more complicated
>protocol to establish and correct for skew effects upon system
>initialisation.
>
> iii) PCB routability
> - the FPGA solution implies high speed (~480 MHz) data lines all
>the way from the output pins of the ADF FPGA to the inputs of the
>corresponding TAB FPGA. This requires very careful attention to PCB
>routing, skews, etc. and has a strong impact on the PCB layout/design
>(for example, it argues for placing the ADF FPGA very close to the
>output cable connector, which might otherwise not be natural if the
>same FPGA is to hold all the FIR logic for all 32 channels, given
>that the 32 ADCs will necessarily be spread out over the large PCB).
>
> iv) ADC noise issues
> - for FPGA LVDS, the required use of large, more costly FPGAs
>implies a strong need to minimize the number of chips in order to be
>cost effective. At the ADF end, this might imply coding all 32 FIR
>channels within a single FPGA. This would lead to the ADC digital
>outputs (10 * 32 = 320 per board) being sent over very long traces
>to be collected at this single FPGA. This is not very "clean" from
>the perspective of noise on this mixed analog-digital board. By
>contrast, the Channel Link solution allows use of small, cheap FPGAs
>placed very near the ADCs, allowing cleaner separation of the analog
>and digital functions of the board.
>
> v) FPGA technology
> - the FPGA LVDS solution seems only viable with Xilinx. While
>large Altera FPGAs with on-chip LVDS do exist, they come with only a
>single PLL for CLK and data recovery. This is not sufficient, since
>each FPGA on the TAB would have to receive data from several
>different ADF boards, each with its own CLK and data skew. Therefore,
>the FPGA LVDS scheme has a single technology available, namely
>Xilinx. The power-on current surges of large Xilinx chips worry us;
>we do not want to have to arrange the system such that one must turn
>on one chip at a time.

I had a look at the different connectors/cables that we could use. On
the receiving TAB end, the 110-pin AMP 352272 looks adequate. For
cables we could use AMP 621409-6 (6 pairs) or 621411-5 (10 pairs).
Two 110-pin connectors take 10 cm in height and allow plugging in 16
cables of 6 pairs each. Hence, from the point of view of cabling,
connecting each TAB to 16 ADC cards does not require an intermediate
concentrator stage and could allow the TAB to fit in 6U format. By
using high-density FPGAs, I think it would be worth trying to fit
each TAB on a 6U board without going to 9U, which would be more
expensive and delicate in terms of clock distribution, grounding,
etc.

Concerning the technology of the links, I see 4 options:

1/ FPGA LVDS pins multiplexing 32 ADC channels over 4 pairs + 1 clock
+ 1 framing signal. Using double-edge clocking, the signals on each
pair would have to be carried at BC x 64 / 2 = 242 MHz.

2/ National Semiconductor channel link, 28-bit parallel to 4 LVDS
pairs + 1 clock. The best way to fit 32 x 8 bits on that chip is to
send 3 x 8 bits at a time (i.e. use 24 bits out of the 27 available
-- I assume the 28th bit is used to transmit a framing signal). With
such a scheme, we would clock the parallel bus at BC x 11 = 83.27
MHz. This is just within the capacity of the DS90CR287/288A chipset
(rated 85 MHz). For each BC we transmit 27 x 11 = 297 bits and use
256.
We have 41 spare bits for protocol, parity... The signals on the LVDS
pairs are carried at BC x 11 x 7 / 2 = 291 MHz.

3/ National Semiconductor channel link, 48-bit parallel to 8 LVDS
pairs + 1 clock, carrying the data of 1 ADC card. It would be
sufficient to run the parallel bus at BC x 6 = 45 MHz to transmit 36
x 8 bits per crossing, but the DS90CR481 chip has a minimum clock
rate of 66 MHz. So it is indeed more than what we need -- it would
reduce the latency of the data transmission and give a lot of spare
bandwidth.

4/ Same as above, grouping 2 ADC cards. We could run the parallel bus
at BC x 11 = 83.27 MHz to carry 2 x 33 x 8 bits per crossing. We
would have to put a framing pattern in the data stream for frame
recovery. The speed over the cable is the same as in case 2/.

So, coming to the discussion, I would reject solution 3/ because it
does not make efficient use of the chipset/cable. It would lead to 4
times as many cable wires as the minimum scheme (1/ or 2/) with data
duplication at the TAB level. Doubling the number of cables compared
to the minimum scheme is probably OK, but multiplying it by 4 does
not seem judicious to me. I would also reject solution 4/, which is
no better than 1/ or 2/ in terms of cabling and signal speed but adds
the complication of merging the data of 2 ADC cards onto a single
link, and imposes a more complex frame-recovery method.

Deciding between 1/ and 2/ is more difficult, in my opinion. As far
as PCB routing is concerned, I agree that the FPGA solution will need
a good end-to-end design, but we should note that the speed of the
signals over the wires and the cable will be lower than with the
channel link solution. Hence, one can expect more skew margin with
the FPGA solution. If PCB traces are a worry, a possible solution is
to solder a cable that goes from the front panel of the board to the
closest point on the FPGA. With the channel link solution, we will
have both 85 MHz parallel busses and high-speed LVDS signals.
On the TAB, you will have 16 28-bit-wide or 16 48-bit-wide busses --
that is 448 to 768 signals to route... while the FPGA solution would
require ~192 signals.

As far as FPGA size and speed are concerned, the channel link will
give us an 85 MHz bus to drive. A 5-tap FIR running at BC x 2
requires logic running at ~91 MHz. So, on the ADC board side, I do
not expect any gain in FPGA speed with the channel link solution.
There might be a gain on the TAB side, but there are only 10 boards,
so I doubt it will make such a big difference. As for FPGA size, I
had in mind that going to denser logic would be better, but we have
not yet decided on that. For another project, we purchased 1M-gate
Xilinx Virtex 2 devices for ~$300 each; one of these is equivalent to
30 devices at $10... While the price of high-end devices is likely to
remain high, the price of mid-range devices should go down. An
argument for going to denser logic is flexibility, because an FPGA
can be reprogrammed, while the connections on a printed circuit board
cannot be changed.

As far as interface simplicity is concerned, it is true that channel
links can be used as a black box, while implementing the same
functions in an FPGA requires an intellectual effort. The channel
link devices do not, however, provide a mechanism for frame recovery.
This can easily be added in the FPGA solution, and also in the 28-bit
channel link solution.

Concerning ADC noise issues, I am not particularly convinced by your
arguments. I even think that splitting the design across multiple
chips will lead to more noise than an integrated scheme: all the
external busses and signals that interconnect the different
components will draw a significant amount of current because of the
relatively high capacitance of PCB traces compared to on-chip lines.
This will likely make more noise than keeping as many signals as
possible within the same chip and having the minimum number of pins
driving higher-capacitance external traces.
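The routing totals quoted above (448-768 signals vs. ~192) can be
sanity-checked with a short sketch. This is only an illustration of
the arithmetic in the mail; the constant names are ours, and the
assumption of 16 links per TAB comes from the cabling discussion
earlier in this message.

```python
# Sanity check of the TAB input-signal counts discussed above.
# Assumption (from the cabling discussion): 16 links feed each TAB,
# one per ADC (ADF) card.
N_LINKS_PER_TAB = 16

# Channel Link: the receiver re-expands each link to a 28-bit
# (DS90CR287/288A) or 48-bit (DS90CR481 family) parallel bus, so the
# TAB has to route the full parallel width for every link.
channel_link_signals = {width: N_LINKS_PER_TAB * width
                        for width in (28, 48)}

# FPGA LVDS: each link stays serial all the way into the FPGA --
# 4 data pairs + 1 clock pair + 1 framing pair = 6 pairs = 12 wires.
fpga_lvds_signals = N_LINKS_PER_TAB * (4 + 1 + 1) * 2

print(channel_link_signals)  # {28: 448, 48: 768}
print(fpga_lvds_signals)     # 192
```

The counts reproduce the figures in the mail: the channel link
options cost 2.3x to 4x more TAB routing than the FPGA LVDS option.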
Concerning FPGA technology, I think that Altera parts are indeed more
advanced than their competitors': Altera claims 1 Gbps per LVDS pin,
while Xilinx advertises only 840 Mbps. In Altera devices, I agree
that we would be short on PLLs (I think there are 2 PLLs for the LVDS
high-speed receivers) if each link has its own clock. The situation
is better with Xilinx; one would need at least 2 chips per TAB to
have 16 input links. With Altera, one could think of the APEX 20
series (200K to 1M gates) and place 8 such devices per TAB. I haven't
looked very much into that. In my opinion, the FPGA solution is a
more modern but less proven way to go. The channel link solution is
conventional; it may not be simpler/cheaper than the FPGA solution,
but it follows what most people have done so far (e.g. the DFE), so
it is rather "safe".

An argument not made by John in favor of the channel link solution is
that it would simplify the interface between us: once we have agreed
on a chipset and cable, each of us just needs to refer to a
datasheet. For the FPGA solution, we would probably have to write a
specification ourselves (in VHDL?) and agree on it. Though more
flexible, this would require more transatlantic coordination than
freezing a choice on a given chipset. Also, the channel link solution
leaves more freedom on each side as to whether to use Xilinx or
Altera parts and how to partition them. For the FPGA solution, we
would certainly need to use devices from the same vendor, and maybe
from the same family. It is a bit early on the Saclay side to decide
which vendor and which family we will use.

Although I would prefer that some R&D effort be put into the FPGA
solution, the channel link solution (2/) is certainly acceptable for
us. I would however suggest that a decision is not made prematurely.
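The bandwidth figures quoted for options 1/ and 2/ above all follow
from the beam-crossing (BC) rate. A minimal cross-check, assuming a
132 ns crossing interval (BC ~= 7.57 MHz -- our reading of the
numbers, not something stated in the mail; the mail's 83.27 MHz
suggests a marginally lower BC value, hence small rounding
differences):

```python
# Cross-check of the link rates for options 1/ (FPGA LVDS) and
# 2/ (28-bit Channel Link). The BC value is our assumption.
BC_MHZ = 1000.0 / 132.0  # ~7.58 MHz beam-crossing frequency

# Option 1/: 32 channels x 8 bits = 256 bits per BC over 4 pairs,
# double-edge clocked, so each pair toggles at BC * 64 / 2.
fpga_pair_mhz = BC_MHZ * 64 / 2               # ~242 MHz

# Option 2/: parallel bus clocked at BC * 11 (3 bytes per clock,
# i.e. 33 bytes per crossing, of which 32 carry channel data).
cl28_bus_mhz = BC_MHZ * 11                    # ~83 MHz, within 85 MHz rating
cl28_bits_per_bc = 27 * 11                    # 297 usable bits per crossing
cl28_spare_bits = cl28_bits_per_bc - 32 * 8   # 41 bits for protocol/parity
cl28_pair_mhz = cl28_bus_mhz * 7 / 2          # ~291 MHz on each LVDS pair
cl28_pair_mbps = cl28_bus_mhz * 7             # ~583 Mbit/s per pair

print(round(fpga_pair_mhz), round(cl28_bus_mhz), cl28_spare_bits)
```

The results match the quoted 242 MHz, 83 MHz, 41 spare bits, and
~291 MHz / ~582 Mbit/s per pair to within the rounding noted above.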