# Systematic errors
A *statistical* error is one that's due to some inherent randomness in
your process of making a measurement. A *systematic* error comes from a
consistent bias in that measurement, but you don't know how much that
bias is. The systematic error is the limit you assign to the potential
range of that bias.
To explain this concept, I like to start with that old statistics
example: measuring the size of a table with a ruler. You repeat this
measurement every day. There is some variance in the day-to-day
measurement: you tilt your head differently, the light in the room
depends on time of day, you're feeling tired that day, etc.
There's a reasonable chance that if you were to plot these measurements,
the result would look like a Gaussian distribution. The standard
deviation of that distribution would be related to the *statistical*
error in your measurement.
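
Here's a minimal sketch of that idea in Python, using only the standard library. Every number in it (the table length, the day-to-day scatter, the ruler's offset) is made up for illustration. It simulates one reading per day, then computes the standard deviation of the readings and the error on their mean. The simulated ruler also has a fixed offset, to show what statistics alone can't tell you.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# All of these values are hypothetical, chosen only for illustration.
TRUE_LENGTH_CM = 150.0   # the table's "true" length
READING_SIGMA_CM = 0.2   # random day-to-day scatter in a reading
RULER_BIAS_CM = 0.5      # fixed offset of this particular ruler


def measure_table(n_days):
    """Simulate one reading per day: truth + fixed bias + Gaussian scatter."""
    return [
        TRUE_LENGTH_CM + RULER_BIAS_CM + random.gauss(0.0, READING_SIGMA_CM)
        for _ in range(n_days)
    ]


def summarize(readings):
    """Return the mean, the sample standard deviation, and the error on the mean."""
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)
    error_on_mean = spread / len(readings) ** 0.5
    return mean, spread, error_on_mean


for n_days in (10, 1000):
    mean, spread, error_on_mean = summarize(measure_table(n_days))
    print(f"{n_days:5d} days: mean = {mean:.3f} cm, "
          f"std dev = {spread:.3f} cm, error on mean = {error_on_mean:.3f} cm")
```

With more days of readings, the error on the mean shrinks, but the mean itself stays offset from the true length by roughly the ruler's bias. No amount of repetition with the same ruler reveals that offset.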
To understand the *systematic* error in the measurement, you have to
ask: How do we know that 20*cm* as measured on your ruler is the same as
20*cm* as measured on mine? Or 20*cm* as measured by the Physics
Department of Polytechnic Prep in Birnin Zana? Or the International
Committee for Weights and Measures in Saint-Cloud, Hauts-de-Seine,
France?
:::{figure-md} kilogram-fig
:class: align-center
by Randall Munroe
:::
I'll give you a hypothetical chain of reasoning along the lines that a
physicist (or a metrologist) might use to think about systematic errors.
Assume that your ruler is similar to the one sitting next to my desk
right now, a cheap one I purchased at a drug store 30 years ago.
- The ruler is made of plastic. I assume liquid plastic was poured
into a mold then allowed to harden. What are the thermal
characteristics of this particular type of plastic? Does it shrink
when it's cooled? Does its shape distort when it gets hot in my
apartment? Has it become warped over the past 30 years due to the
age of the plastic or the conditions in which I've stored it?
- If a metal mold was used to shape the plastic, does it have thermal
characteristics of its own? It might have been shaped at room
temperature, yet plastic is poured into it at some higher
temperature. Is this temperature variation enough to distort the
mold to some degree?
- How was that mold made? Did it start out as a block that was then
shaped at a tool-and-die factory? What was the precision of the
drill, mill, or press used to create that mold?
- Who manufactured that drill, mill, or press? How accurate was the
tool that made it?
And so on.
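
To put a rough number on just the first item in that list, here's a sketch of the standard linear thermal-expansion estimate, $\Delta L = \alpha L \Delta T$. The expansion coefficient and the temperature swing are assumptions I picked for illustration (the coefficient is only meant to be the right order of magnitude for a rigid plastic). They are not measured properties of any actual ruler.

```python
def thermal_length_change(length_cm, alpha_per_kelvin, delta_t_kelvin):
    """Linear thermal expansion: delta_L = alpha * L * delta_T."""
    return alpha_per_kelvin * length_cm * delta_t_kelvin


RULER_LENGTH_CM = 30.0
ASSUMED_ALPHA_PER_K = 7e-5   # assumption: order of magnitude for a rigid plastic
ASSUMED_DELTA_T_K = 10.0     # assumption: temperature swing in the apartment

shift_cm = thermal_length_change(
    RULER_LENGTH_CM, ASSUMED_ALPHA_PER_K, ASSUMED_DELTA_T_K
)
print(f"length change: {shift_cm:.3f} cm "
      f"({100 * shift_cm / RULER_LENGTH_CM:.2f}% of the ruler)")
```

Under these assumptions the ruler changes by a fraction of a millimeter over its full length.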
Your probable reaction to the above list is that all these effects are
too small to worry about for an actual 30*cm* plastic ruler being used
to measure a typical living-room table. Let's consider a more
realistic scenario: the imaginary experiment mentioned in
{numref}`Figure %s `, the discovery of the P
particle.
For the purposes of this example, the P particle is hypothesized to be
emitted by a rare decay of Vb299. The energies of the decay products of
Vb299 are measured with a calorimeter. The detector setup is located
under the Jabari mountains, but even so enough cosmic rays get through
to be a substantial background for the rare signal they're trying to
detect.
The calorimeter measures the energy of the particles and returns some
value in millivolts. You have to calibrate the calorimeter, to translate
those millivolts into *WeV*. The typical way to do this is to shoot a
beam of particles of known energy at the calorimeter, and see how many
*WeV* correspond to the calorimeter output in millivolts.
- A calorimeter has some energy resolution. Even if you shoot a beam
of known energy into one, you're going to see a spread in the
resulting detector response. Perhaps that distribution will look
like a Gaussian, but you'll still have to fit it. Take another look
at {numref}`Figure %s `.
There's a fitting error associated with the mean and
sigma of the distribution. The width of that distribution is your
energy resolution; the error in the mean is a systematic error of
your energy calibration.[^f127]$^{,}$[^resolution]
- What is the exact energy of that beam of electrons used to calibrate
the calorimeter? The electrons might be extracted from an ARC
reactor and sent through a chain of focusing and steering
magnets. The final step is to point the calibration beam at the
calorimeter with a bending magnet to select those electrons with a
given energy. The mean energy of the beam will depend on the
magnetic field of the final bending magnet. How well do you know
that magnetic field? That will be another source of systematic
error.
- You have to separate the energy signatures of the P particle from
those of the cosmic rays that pass through the calorimeter. How well
can you identify the event type? You'll apply various analysis cuts
(there are examples of this in the main ROOT tutorial), but there's
always a limit to their efficiency, which gives you another source of
systematic error.
- The above were sources of *experimental* systematic error. Now let's
consider a *theoretical* systematic error: Both Dr. Shuri Wright and
Dr. William Ginter Riva have published models of the predicted
energy spectrum from Vb299 decays involving the P particle. The
separation of your signal from the cosmic-ray background depends on
the model. You must perform your analysis with both models and treat
the difference as a theoretical systematic error.
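
The first two items in that list can be sketched in a few lines of Python. This is a toy, not a real calibration procedure: the beam energy, detector gain, response spread, and beam-energy uncertainty are all invented, and I use the sample mean and standard deviation as a stand-in for an actual Gaussian fit. The sketch also assumes the two error sources are uncorrelated, so that they can be combined in quadrature.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Every value below is hypothetical.
BEAM_ENERGY_WEV = 50.0        # nominal energy of the calibration beam
TRUE_MV_PER_WEV = 4.0         # detector gain (unknown in a real experiment)
RESPONSE_SIGMA_MV = 10.0      # spread of the detector response
BEAM_ENERGY_REL_ERR = 0.005   # assumed 0.5% uncertainty from the bending-magnet field


def simulate_response(n_events):
    """Simulate the calorimeter output (in mV) for a beam of known energy."""
    center_mv = TRUE_MV_PER_WEV * BEAM_ENERGY_WEV
    return [random.gauss(center_mv, RESPONSE_SIGMA_MV) for _ in range(n_events)]


def calibrate(response_mv):
    """Estimate the mV-to-WeV conversion and its relative uncertainty."""
    mean_mv = statistics.mean(response_mv)
    sigma_mv = statistics.stdev(response_mv)
    mean_err_mv = sigma_mv / math.sqrt(len(response_mv))
    rel_err_fit = mean_err_mv / mean_mv
    return {
        "wev_per_mv": BEAM_ENERGY_WEV / mean_mv,   # calibration constant
        "resolution": sigma_mv / mean_mv,          # width of the response
        "rel_err_fit": rel_err_fit,                # from the error on the mean
        # Quadrature sum; assumes the two sources are uncorrelated.
        "rel_err_total": math.hypot(rel_err_fit, BEAM_ENERGY_REL_ERR),
    }


result = calibrate(simulate_response(2000))
for name, value in result.items():
    print(f"{name:>14s} = {value:.5f}")
```

The width of the response (the resolution) and the uncertainty on the conversion factor (the calibration error) come out as two different numbers. With these made-up inputs, the uncertainty in the beam energy contributes more to the calibration error than the uncertainty in the mean does.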
You may feel that these examples are as unimportant as the systematic
errors I hypothesized for the ruler,[^f128] but they were adapted from
cases within experiments I've worked on. The relative sizes of such
errors are much larger than the errors in a plastic ruler due to a
milling machine. If anything, I've underestimated the number of
systematic errors considered in a typical physics experiment.
In case my fictional example left you dubious about the concept of
systematic errors, here's a systematic error table from a real physics
analysis. Note how the total error at the bottom is dominated by the
systematic errors over the statistical errors. In particular, the
largest systematic error is "ISR and FSR" (Initial State Radiation and
Final State Radiation) which is a theoretical systematic error.[^adding]
:::{figure-md} atlas-systematics-fig
:class: align-center
From ATLAS PUB Note
ATL-PHYS-PUB-2018-001
31st January 2018
Investigation of systematic uncertainties on the measurement of the
top-quark mass using lepton transverse momenta
:::
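
A total in a table like this is usually formed by adding the individual errors in quadrature, $\left(\sum_{i} \delta_{i}^{2} \right)^{1/2}$, which assumes the errors are uncorrelated. Here's a small sketch of that sum, along with the more general form that includes correlation coefficients $\rho_{ij}$. The two error values and the correlation matrices are invented for illustration; they are not taken from the ATLAS table.

```python
import math


def add_in_quadrature(errors):
    """Combine uncorrelated errors: sqrt(sum of squares)."""
    return math.sqrt(sum(e * e for e in errors))


def add_with_correlations(errors, correlation):
    """Combine errors given a correlation matrix (correlation[i][j] = rho_ij).

    Computes sqrt(sum_ij rho_ij * delta_i * delta_j); the diagonal is 1.
    """
    total = 0.0
    for i, e_i in enumerate(errors):
        for j, e_j in enumerate(errors):
            total += correlation[i][j] * e_i * e_j
    return math.sqrt(total)


errors = [3.0, 4.0]  # made-up values

uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
partly_correlated = [[1.0, 0.5], [0.5, 1.0]]
fully_correlated = [[1.0, 1.0], [1.0, 1.0]]

print("simple sum:        ", sum(errors))                 # 7.0
print("quadrature:        ", add_in_quadrature(errors))   # 5.0
print("rho = 0 (matrix):  ", add_with_correlations(errors, uncorrelated))
print("rho = 0.5:         ", add_with_correlations(errors, partly_correlated))
print("rho = 1:           ", add_with_correlations(errors, fully_correlated))
```

With no correlation, the matrix form reduces to the quadrature sum. With full positive correlation, it reduces to the simple sum. Anything in between depends on how well you know the correlations.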
I did a lot of hand-waving to condense what little I know about
statistics into these pages without (I hope) getting too bogged down
in the math. If you'd like more rigorous explanations of these concepts,
see my [list of statistics
books](https://www.nevis.columbia.edu/~seligman/root-class/links.html).
Anyone can make a measurement. Understanding the error on that
measurement is the true skill of a physicist.
:::{figure-md} error_bars-fig
:class: align-center
by Randall Munroe
:::
:::{note}
Don't laugh too quickly. Adding statistical and systematic errors can be
a tricky business. Often an experiment will report them separately, and
sometimes will plot them in a way similar to this cartoon.
:::
[^f127]: For a $\chi^{2}$ fit, the uncertainty in a parameter comes from
shifting that parameter away from its best-fit value and seeing how far
it moves before $\chi^{2}$ rises by 1 above its minimum. I don't expect
you to absorb that bit of
arcane trivia right now; it's enough to know that any fits to points
with error bars will necessarily have error estimates in the fit
results.
By the way, this is the answer to the {ref}`statistics question `
I posed back in {ref}`Fitting a Histogram `.
[^resolution]: The error in the mean from fitting to the detector
response is usually reported as the "energy calibration." The
standard deviation of that fit is the "energy resolution." You'll
usually see these two reported separately (as in {numref}`Figure %s `).
In my thesis experiment, it took us years to understand both the
energy calibration and the energy resolution, and their correlation
with each other. In part this is because they're also a function
of energy; e.g., the energy resolution is often reported using a
formula like $\sigma(E)/E=K/\sqrt{E}$, where $K$ has to be
determined by the analysis.
[^f128]: You might be justified in this impression given the obscure
pop-culture references. If you didn't get the references, do a web
search on Birnin Zana, then ask yourself which element is Vb299 and
what *WeV* stands for.
[^adding]: Let's add up those individual errors. Wait... the total is
7.27, not 2.27! What's happening?
The answer is that the errors are being added in quadrature. If a
given error is $\delta_{i}$, then adding them in quadrature means
to compute $\left(\sum_{i} \delta_{i}^{2} \right)^{1/2}$.
But that computation assumes that none of the systematic errors
are correlated; i.e., there are no $\delta_{i} \delta_{j}$ terms
with $i \neq j$. Is that necessarily true? For example, in
{numref}`Figure %s `, what if the "Jet
energy scale" was correlated with the "Jet energy resolution"?
At this point, you may be coming to understand the complexity of
handling errors in a physics analysis.