[...] I have tried to do a direct comparison between the two DSPs, the 6414 and the 6203. This was possible since we have prototype boards equipped with each of the DSPs.

Two different versions of the DSP code were used:

- The first one is based on the loop26 program. It is known that loop26 is not well optimized for the 64x. The existing histogramming part of the DSP code (as shown at the December 2000 ROD meeting) was, however, modified in order to: a) optimize it for the 64x (remove the cross-path stalls); and b) handle histograms with 16-bit bins for low and medium gain, and 32-bit bins for high gain (instead of the previous 32-bit bins for any gain). In this case, cycle count differences between the two DSPs for the histogramming part are due only to the 6414 cache architecture.

- The second one is based on loop16, a "brand new" program I developed and optimized for the 6414 last August. It is designed to perform exactly the same computation as loop26. Simulation studies done last September suggested that loop16 on a 600 MHz 6414 should be twice as fast as loop26 on a 300 MHz 6203. This code was verified on the hardware (the 6414 Demo board) just a few weeks ago. In this case, the histogramming part of the code had to be modified, since loop16 writes the energy in integer format rather than floating-point. Also, this histogramming code ideally (i.e. with no cache inefficiency) needs 6 cycles for each of the 128 channels, as opposed to 6 cycles per channel above threshold for the histogramming code associated with loop26.

The DSP computation time has been obtained, in real time, by using one of the DSP counters (the 6203 counter counts at 1/4 of the CPU frequency; on the 6414 it counts at 1/8 of the CPU frequency). This is a nice alternative to using an oscilloscope, since the time per event is now stored in the DSP output data along with the results of the computation, and can be read out via VME.

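To make the counter arithmetic concrete, here is a minimal sketch in C of how raw counter values could be converted to microseconds. It is not the DSP code used for the measurements: the type and function names (dsp_timer_t, ticks_to_us, elapsed_ticks) are hypothetical, and a free-running 32-bit counter is assumed. Only the clock frequencies and the 1/4 and 1/8 dividers come from the text above. Note that both devices end up with the same 75 MHz counter rate (300/4 and 600/8), i.e. a resolution of about 13.3 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer description; names are illustrative only. */
typedef struct {
    double   cpu_mhz;  /* CPU clock frequency in MHz            */
    uint32_t divider;  /* counter runs at CPU clock / divider   */
} dsp_timer_t;

/* Values taken from the text: 6203 at 300 MHz with a 1/4 counter,
 * 6414 at 600 MHz with a 1/8 counter. */
static const dsp_timer_t TIMER_6203 = { 300.0, 4 };
static const dsp_timer_t TIMER_6414 = { 600.0, 8 };

/* Elapsed ticks between two counter reads. Assumes a free-running
 * 32-bit counter; unsigned subtraction is modulo 2^32, so a single
 * wraparound between the two reads is handled correctly. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t stop)
{
    return stop - start;
}

/* Convert a tick count to microseconds:
 * one tick lasts divider / cpu_mhz microseconds. */
static double ticks_to_us(const dsp_timer_t *t, uint32_t ticks)
{
    assert(t != 0 && t->cpu_mhz > 0.0 && t->divider > 0);
    return (double)ticks * (double)t->divider / t->cpu_mhz;
}
```

With these assumptions, the DSP would read the counter before and after processing an event, and the difference (or the converted time) would be written to the output data next to the computed quantities.
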
The histogramming code fills histograms only for channels above a user-specified energy threshold. For all measurements, this threshold was set to 1.9 ADC counts. (The main optimal filtering code, loop26 or loop16, still computes energy, time and chi2 for all channels.)

The main result is shown in the attached PostScript file. Event by event, I plot the computation time (in microseconds) versus the number of channels above the energy threshold (for which histograms are filled). There are 4 main families of points:

- blue is for loop26 on the 300 MHz 6203 silicon
- black is for loop26 on the 600 MHz 6414 silicon
- red is for loop16 on the 600 MHz 6414 silicon
- magenta is for loop16, device simulator results (Code Composer)

Here I plot only the first 64 events. The three "horizontal" groups are for the optimal filtering computation (loop26 or loop16) alone. The four other groups, which show an increase in time as a function of the number of channels histogrammed, give the total processing time (optimal filtering plus histogramming).

The 6203 behaviour is a "straight line" since there is no cache; for the 6414, the processing time fluctuates and is in fact data-dependent. Note also that the 6414 simulator does not exactly reproduce the silicon. However, if the simulator did not take the cache into account at all, the magenta plot would be flat.

So, even if the 6414 is not twice as fast as the 6203 (especially when histograms come into play), it is still slightly better, and it has another advantage: a larger memory.

S.Simion - June 2002