Type I and Type II Errors - Making Mistakes in the Justice System
Ever wonder how someone in America can be arrested if they really are
presumed innocent, why a defendant is found not guilty instead of
innocent, or why Americans put up with a justice system which sometimes
allows criminals to go free on technicalities? These questions can be
understood by examining the similarity of the American justice system to
hypothesis testing in statistics and the two types of errors it can
produce. (This discussion assumes that the reader has at least been
introduced to the normal distribution and its use in hypothesis testing.
Also, please note that the American justice system is used here for
convenience; others, such as the British system that inspired it, are
similar in nature.)
True, the trial process does not use numerical values
while hypothesis testing in statistics does, but both share at least four
common elements (other than a lot of jargon that sounds like double talk):
- The alternative hypothesis - This is the
reason a criminal is arrested. Obviously the police don't think the
arrested person is innocent or they wouldn't arrest him. In statistics
the alternative hypothesis is the hypothesis the researchers wish to
evaluate.
- The null hypothesis - In the criminal justice
system this is the presumption of innocence. In both the judicial system
and statistics the null hypothesis indicates that the suspect or
treatment didn't do anything. In other words, nothing out of the
ordinary happened. The null is the logical opposite of the alternative. For example, "not white" is the logical opposite of "white".
Colors such as red, blue, and green, as well as black, all qualify as "not
white".
- A standard of judgment - In the justice system
and statistics there is no possibility of absolute proof and so a
standard has to be set for rejecting the null hypothesis. In the justice
system the standard is "a reasonable doubt". The null hypothesis has to
be rejected beyond a reasonable doubt. In statistics the standard is the
maximum acceptable probability that the effect is due to random
variability in the data rather than the potential cause being
investigated. This standard is often set at 5%, which is called the alpha
level.
- A data sample - This is the information
evaluated in order to reach a conclusion. As mentioned earlier, the data
is usually in numerical form for statistical analysis, while it may take
a wide diversity of forms--eyewitness testimony, fiber analysis, fingerprints,
DNA analysis, etc.--in the justice system. However, in both cases there
are standards for how the data must be collected and for what is
admissible. Both statistical analysis and the justice system operate on
samples of data, or in other words partial information, because, let's
face it, getting the whole truth and nothing but the truth is not
possible in the real world.
It only takes one good piece of evidence to send a
hypothesis down in flames but an endless amount to prove it correct. If
the null is rejected then logically the alternative hypothesis is
accepted. This is why both the justice system and statistics concentrate
on disproving or rejecting the null hypothesis rather than proving the
alternative. It's much easier to do. If a jury
rejects the presumption of innocence, the defendant is pronounced guilty.
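To see how these elements line up in a statistical setting, here is a minimal sketch of a one-sample z-test in Python. The population mean, standard deviation, sample values, and the 5% alpha level are invented purely for illustration; they are not tied to any example in this article.

    # Minimal sketch: the four shared elements in a one-sample z-test.
    # All numbers below are made up for illustration.
    from math import sqrt
    from statistics import NormalDist, mean

    mu_null = 100.0   # null hypothesis: the treatment did nothing (mean stays at 100)
    sigma = 15.0      # assumed known population standard deviation
    alpha = 0.05      # standard of judgment: maximum acceptable chance of a fluke

    sample = [108, 112, 109, 117, 115, 110, 113, 116]   # the data sample
    n = len(sample)

    # How far the sample mean sits from the null, measured in standard errors.
    z = (mean(sample) - mu_null) / (sigma / sqrt(n))
    # One-sided p-value: the alternative hypothesis says the mean is greater than 100.
    p_value = 1 - NormalDist().cdf(z)

    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject the null (the 'guilty' verdict)")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null ('not guilty')")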
Type I errors: Unfortunately, neither the legal
system nor statistical testing is perfect. A jury sometimes makes an error
and an innocent person goes to jail. Statisticians, being highly
imaginative, call this a type I error. Civilians call it a travesty.
In the justice system, failure to reject the presumption
of innocence gives the defendant a not guilty verdict. This means only
that the standard for rejecting innocence was not met. It does not mean
the person really is innocent. It would take an endless amount of evidence
to actually prove the null hypothesis of innocence.
Type II errors: Sometimes, guilty people are set
free. Statisticians have given this error the highly imaginative name,
type II error.
Americans find type II errors disturbing but not as
horrifying as type I errors. A type I error means that not only has an
innocent person been sent to jail but the truly guilty person has gone
free. In a sense, a type I error in a trial is twice as bad as a type II
error. Needless to say, the American justice system puts a lot of emphasis
on avoiding type I errors. This emphasis on avoiding type I errors,
however, is not true in all cases where statistical hypothesis testing is
done.
In statistical hypothesis testing used for quality
control in manufacturing, the type II error is considered worse than a
type I. Here the null hypothesis indicates that the product satisfies the
customer's specifications. If the null hypothesis is rejected for a batch
of product, it cannot be sold to the customer. Rejecting a good batch by
mistake--a type I error--is a very expensive error but not as expensive as
failing to reject a bad batch of product--a type II error--and shipping it
to a customer. This can result in losing the customer and tarnishing the
company's reputation.
Justice System - Trial

|                                                              | Defendant Innocent | Defendant Guilty |
| Reject Presumption of Innocence (Guilty Verdict)             | Type I Error       | Correct          |
| Fail to Reject Presumption of Innocence (Not Guilty Verdict) | Correct            | Type II Error    |

Statistics - Hypothesis Test

|                                | Null Hypothesis True | Null Hypothesis False |
| Reject Null Hypothesis         | Type I Error         | Correct               |
| Fail to Reject Null Hypothesis | Correct              | Type II Error         |
In the criminal justice system a
measurement of guilt or innocence is packaged in the form of a witness,
similar to a data point in statistical analysis. Using this comparison we
can talk about sample size in both trials and hypothesis tests. In a
hypothesis test a single data point would be a sample size of one and ten
data points a sample size of ten. Likewise, in the justice system one
witness would be a sample size of one, ten witnesses a sample size of ten,
and so forth.
Impact on a jury is going to depend on the
credibility of the witness as well as the actual testimony. An articulate
pillar of the community is going to be more credible to a jury than a
stuttering wino, regardless of what he or she says.
The normal distribution shown in figure 1
represents the distribution of testimony for all possible witnesses in a
trial for a person who is innocent. Witnesses represented by the left hand
tail would be highly credible people who are convinced that the person is
innocent. Those represented by the right tail would be highly credible
people wrongfully convinced that the person is guilty.
At first glance, the idea that highly
credible people could not just be wrong but also adamant about their
testimony might seem absurd, but it happens. According to the
Innocence Project, "eyewitness misidentifications contributed to over 75% of the
more than 220 wrongful convictions in the United States overturned by
post-conviction DNA evidence." Who could possibly be more credible than a
rape victim convinced of the identity of her attacker, yet even here
mistakes have been documented.
For example, a rape victim mistakenly
identified
John Jerome White as her attacker even though the actual perpetrator
was in the lineup at the time of identification. Thanks to DNA evidence
White was eventually exonerated, but only after wrongfully serving 22
years in prison.
If the standard of judgment for evaluating
testimony were positioned as shown in figure 2 and only one witness
testified, the accused innocent person would be judged guilty (a type I
error) if the witness's testimony fell in the red area. Since the normal
distribution extends to infinity, type I errors would never be zero even
if the standard of judgment were moved to the far right. The only way to
prevent all type I errors would be to arrest no one. Unfortunately this
would drive the number of unpunished criminals or type II errors through
the roof.
figure 1. Distribution of possible witnesses in a trial when the accused is innocent.

figure 2. Distribution of possible witnesses in a trial when the accused is innocent, showing the probable outcomes with a single witness.
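For readers who want the arithmetic behind figure 2, the red area is simply the tail of the "innocent" distribution beyond the standard of judgment. The sketch below assumes a standard normal distribution of testimony and an arbitrary cutoff of 1.645 standard deviations; neither value comes from the figure itself.

    # Type I error with one witness: the tail area of the "innocent"
    # distribution beyond the standard of judgment (illustrative values).
    from statistics import NormalDist

    innocent = NormalDist(mu=0.0, sigma=1.0)   # testimony against an innocent defendant
    standard_of_judgment = 1.645               # testimony beyond this point convicts

    type_I = 1 - innocent.cdf(standard_of_judgment)
    print(f"P(type I error with one witness) = {type_I:.3f}")   # about 0.05

    # Moving the cutoff far to the right shrinks the tail but never makes it zero.
    print(f"Cutoff at 4 standard deviations: {1 - innocent.cdf(4.0):.6f}")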
Figure 3 shows what happens not only to innocent
suspects but also guilty ones when they are arrested and tried for crimes.
In this case, the criminals are clearly guilty and face certain punishment
if arrested.
figure 3. Distribution of possible witnesses in a trial showing the probable outcomes with a single witness if the accused is innocent or obviously guilty.

figure 4. Distribution of possible witnesses in a trial showing the probable outcomes with a single witness if the accused is innocent or not clearly guilty.
If the police bungle the investigation and
arrest an innocent suspect, there is still a chance that the
innocent person could go to jail. Also, since the normal
distribution extends to infinity in both positive and negative
directions there is a very slight chance that a guilty person
could be found on the left side of the standard of judgment and
be incorrectly set free.
Unfortunately, justice is often not as
straightforward as illustrated in figure 3. Figure 4 shows the
more typical case in which the real criminals are not so clearly
guilty. Notice that the means of the two distributions are much
closer together. As before, if bungling police officers arrest
an innocent suspect there's a small chance that the wrong person
will be convicted. However, there is now also a significant
chance that a guilty person will be set free. This is
represented by the yellow/green area under the curve on the left
and is a type II error.
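The same arithmetic covers the figure 4 situation. The sketch below assumes the "guilty" distribution sits just one standard deviation to the right of the "innocent" one, an illustrative value chosen to show how large the type II error can become when the two distributions overlap heavily.

    # Figure 4 in numbers: overlapping distributions produce a large type II error.
    # The means, spread, and cutoff are illustrative assumptions.
    from statistics import NormalDist

    innocent = NormalDist(mu=0.0, sigma=1.0)
    guilty = NormalDist(mu=1.0, sigma=1.0)   # not clearly guilty: means close together
    cutoff = 1.645                           # standard of judgment

    alpha = 1 - innocent.cdf(cutoff)   # innocent person convicted (type I)
    beta = guilty.cdf(cutoff)          # guilty person set free (type II)
    print(f"type I  (innocent convicted): {alpha:.3f}")   # about 0.05
    print(f"type II (guilty set free):    {beta:.3f}")    # about 0.74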
figure 5. The effects of increasing sample size or, in other words, the number of independent witnesses.
If the standard of judgment is moved to the
left by making it less strict, the number of type II errors, or
criminals going free, will be reduced. This change in the
standard of judgment could be accomplished by throwing out the
reasonable doubt standard and instructing the jury to find the
defendant guilty if they simply think it's possible that she did
the crime. However, such a change would make the type I errors
unacceptably high. While fixing the justice system by moving the
standard of judgment has great appeal, in the end there's no
free lunch.
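The trade-off is easy to see numerically. Keeping the same illustrative distributions as in the sketch above and sliding the standard of judgment to the left, the type II error shrinks while the type I error balloons:

    # No free lunch: moving the standard of judgment trades one error for the other.
    from statistics import NormalDist

    innocent = NormalDist(0.0, 1.0)   # illustrative "innocent" distribution
    guilty = NormalDist(1.0, 1.0)     # illustrative "guilty" distribution

    for cutoff in (2.0, 1.5, 1.0, 0.5, 0.0):
        alpha = 1 - innocent.cdf(cutoff)   # innocent convicted
        beta = guilty.cdf(cutoff)          # guilty set free
        print(f"cutoff {cutoff:4.1f}: type I = {alpha:.3f}, type II = {beta:.3f}")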
Fortunately, it's possible to reduce type I
and II errors without adjusting the standard of judgment. Juries
tend to average the testimony of witnesses. In other
words, a highly credible witness for the accused will counteract
a highly credible witness against the accused. So, although at
some point there is a diminishing return, increasing the number
of witnesses (assuming they are independent of each other) tends
to give a better picture of innocence or guilt.
Increasing sample size is an obvious way to
reduce both types of errors for either the justice system or a
hypothesis test. As shown in figure 5, an increase in sample size
narrows the distribution. Why? Because the distribution
represents the average of the entire sample instead of just a
single data point.
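The narrowing shown in figure 5 comes from the standard error, which is the population standard deviation divided by the square root of the sample size. The sketch below keeps the illustrative one-standard-deviation gap between the two distributions and places the standard of judgment halfway between their means; both error rates fall as more independent witnesses are averaged.

    # Larger samples narrow both sampling distributions, shrinking both errors.
    # The gap between means and the cutoff are illustrative assumptions.
    from math import sqrt
    from statistics import NormalDist

    sigma = 1.0    # spread of a single witness's testimony
    cutoff = 0.5   # standard of judgment, halfway between the two means

    for n in (1, 4, 16, 64):
        se = sigma / sqrt(n)                   # standard error of the average of n witnesses
        innocent_avg = NormalDist(0.0, se)
        guilty_avg = NormalDist(1.0, se)
        alpha = 1 - innocent_avg.cdf(cutoff)   # type I error
        beta = guilty_avg.cdf(cutoff)          # type II error
        print(f"n = {n:2d}: type I = {alpha:.4f}, type II = {beta:.4f}")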
In hypothesis testing the sample size is
increased by collecting more data. In the justice system it's
increased by finding more witnesses. Obviously, there are
practical limitations to sample size. In the justice system
witnesses are also often not independent and may end up
influencing each other's testimony--a situation similar to
reducing sample size. Giving both the accused and the prosecution access to lawyers
helps make sure that no significant witness goes unheard, but
again, the system is not perfect.
About the only other way to decrease both the
type I and type II errors is to increase the reliability of the
data measurements or witnesses. For example, the Innocence
Project has proposed
reforms on how lineups are performed. These include blind
administration, meaning that the police officer administering
the lineup does not know who the suspect is. That way the
officer cannot inadvertently give hints resulting in
misidentification.
The value
of unbiased, highly trained, top quality police investigators with
state of the art equipment should be obvious. There is no
possibility of having a type I error if the police never arrest
the wrong person. Of course, modern tools such as DNA testing are
very important, but so are properly designed and executed police
procedures and professionalism. The famous trial of O.
J. Simpson would have likely ended in a guilty verdict if the Los
Angeles Police officers investigating the crime had been beyond
reproach.
Statistical Errors Applet
The applet below can alter both the
standard of judgment and distance between means for a
statistical hypothesis test. It calculates type I and type
II errors when you move the sliders. Like any analysis of
this type it assumes that the distribution for the null
hypothesis is the same shape as the distribution of the
alternative hypothesis.
Note that the horizontal axis is set up
to indicate how many standard deviations a value is away
from the mean. Zero represents the mean of the
distribution of the null hypothesis.
When the sample size is one, the normal
distributions drawn in the applet represent the population
of all data points for the respective condition of Ho
correct or Ha correct. When the sample size is increased
above one the distributions become sampling distributions
which represent the means of all possible samples drawn
from the respective population. Standard error is simply
the standard deviation of a sampling distribution. Note
that it is the same for both sampling distributions.
Try
adjusting the sample size, standard of judgment (the
dashed red line), and position of the distribution for
the alternative hypothesis (Ha) and you will develop a
"feeling" for how they interact.
Note that the probability of a type I error is often
called alpha, and the probability of a type II error is called beta.
The power of the test = (100% - beta).
Applet 1. Statistical Errors
Note: to run the above applet you must have Java enabled in your
browser and have a Java runtime environment (JRE) installed on your
computer. If you have not installed a JRE you can download it for free
here.
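For readers who cannot run Java, the applet's basic arithmetic can be reproduced in a few lines of Python. The function below is only a rough stand-in for the applet: it assumes a unit standard deviation, mirrors the three sliders (standard of judgment, distance between means, and sample size), and reports alpha, beta, and power.

    # A rough stand-in for the applet's calculation of alpha, beta, and power.
    from math import sqrt
    from statistics import NormalDist

    def errors(standard_of_judgment, distance_between_means, sample_size=1):
        se = 1.0 / sqrt(sample_size)                  # standard error, same for both distributions
        h0 = NormalDist(0.0, se)                      # sampling distribution if Ho is correct
        ha = NormalDist(distance_between_means, se)   # sampling distribution if Ha is correct
        alpha = 1 - h0.cdf(standard_of_judgment)      # type I error
        beta = ha.cdf(standard_of_judgment)           # type II error
        return alpha, beta, 1 - beta                  # power = 100% - beta

    a, b, power = errors(standard_of_judgment=1.645, distance_between_means=3.0, sample_size=4)
    print(f"alpha = {a:.4f}, beta = {b:.4f}, power = {power:.4f}")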