| | | Horizontal | | Vertical | |
| Step | Description | Object Size | SW Inputs | Object Size | SW Inputs |
| 1 | Add ICR | 1 TT | as needed | 1 TT | as needed |
| 2.a | ROI Sums | 2x1 | TTS: 0-8 x 0-7 | 1x2 | TTS: 0-7 x 0-8 |
| 2.b | EM Iso Sums | 2 (2x1) | ROIs: 2-5 x 2-5 | 2 (1x2) | ROIs: 2-5 x 2-5 |
| 2.c | HAD Sums | 2x1 | ROIs: 2-5 x 2-5 | 1x2 | ROIs: 2-5 x 2-5 |
| 3 | ROI Compares | 2x1 | ROIs: 0-8 x 0-7 | 1x2 | ROIs: 0-7 x 0-8 |
| 4.a | EM Iso LM > 2a*Iso | 2x1 | ROIs: 2-5 x 2-5 | 1x2 | ROIs: 2-5 x 2-5 |
| 4.b | EM Frac LM > 2b*HAD | 2x1 | ROIs: 2-5 x 2-5 | 1x2 | ROIs: 2-5 x 2-5 |
| 5 | AND Iso & EM Frac | 1-bit | 2-5 x 2-5 | 1-bit | 2-5 x 2-5 |
| 6 | Find Highest Thresh Passed | 3-bits | 2-5 x 2-5 | 3-bits | 2-5 x 2-5 |
| 7 | Choose H or V | ROIs: 2-5 x 2-5 | | | |
| 8 | Make Output Words (Thresh & Iso) | EM: 2-5 x 2-5 | | | |
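As a rough illustration of step 2.a, the sketch below forms the horizontal (2x1) and vertical (1x2) pair sums over a grid of tower energies. The function names, the `towers[x][y]` indexing, and the reading of "2x1" as two towers adjacent in x are assumptions made for this example; they are not taken from the firmware.

```python
def horizontal_sums(towers):
    """Sum each tower with its neighbour in x (assumed meaning of a 2x1 object).

    towers[x][y] is the energy of the tower at column x, row y (assumed layout).
    The result has one fewer column than the input.
    """
    nx, ny = len(towers), len(towers[0])
    return [[towers[x][y] + towers[x + 1][y] for y in range(ny)]
            for x in range(nx - 1)]


def vertical_sums(towers):
    """Sum each tower with its neighbour in y (assumed meaning of a 1x2 object).

    The result has one fewer row than the input.
    """
    nx, ny = len(towers), len(towers[0])
    return [[towers[x][y] + towers[x][y + 1] for y in range(ny - 1)]
            for x in range(nx)]


# Small 3 x 2 example grid, indexed as grid[x][y].
grid = [[1, 2], [3, 4], [5, 6]]
h = horizontal_sums(grid)
v = vertical_sums(grid)
```

Keeping the two orientations in separate functions mirrors the table, where the horizontal and vertical columns take differently shaped input ranges.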
Rules for making the choice are given below.
An important question is at what point in the algorithm to choose between the Horizontal and Vertical ROIs in a given 2x2 region. Latency is minimized if the choice is made just before the threshold words for each ROI are written to the output word, so that is the point we have chosen.
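The structure of a late choice can be sketched as follows: both orientations are carried through steps 1 to 6 independently, and a single selection is made at step 7. The `RoiResult` record, its field names, and `example_rule` are hypothetical; in particular, `example_rule` is only a placeholder and is not the selection rule used by the algorithm.

```python
from typing import Callable, NamedTuple


class RoiResult(NamedTuple):
    """Per-ROI result after steps 1-6 for one orientation (hypothetical layout)."""
    passed: bool     # step 5: AND of the isolation and EM-fraction bits
    threshold: int   # step 6: highest threshold passed, as a 3-bit code (0-7)


def choose_late(h: RoiResult, v: RoiResult,
                prefer_h: Callable[[RoiResult, RoiResult], bool]) -> RoiResult:
    """Step 7: select one orientation after both have been fully evaluated."""
    return h if prefer_h(h, v) else v


def example_rule(h: RoiResult, v: RoiResult) -> bool:
    """Placeholder rule for illustration only: prefer horizontal on ties."""
    return h.threshold >= v.threshold


chosen = choose_late(RoiResult(True, 3), RoiResult(True, 5), example_rule)
```

Because the selection rule is passed in as a parameter, the sketch makes no claim about the rule itself, only about where in the sequence the selection happens.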
The following alternatives are also possible, but they would cost a full BC in latency and are therefore disfavored.