Probability
Descriptive Statistics provides methods to describe the variables measured in the sample and their relations, but it does not allow us to draw any conclusions about the population.
Now it is time to take the leap from the sample to the population and the bridge for that is Probability Theory.
Remember that the sample has limited information about the population, and in order to draw valid conclusions for the population the sample must be representative of it. To guarantee its representativeness, the sample must be drawn randomly, which means that the individuals in the sample are chosen by chance.
Probability Theory will provide us with the tools to control the randomness in the sampling and to determine the level of reliability of the conclusions drawn from the sample.
Random experiments and events
Random experiments
The study of a characteristic of the population is conducted through random experiments.
Definition - Random experiment. A random experiment is an experiment that meets two conditions:
- The set of possible outcomes is known.
- It is impossible to predict the outcome with absolute certainty.
Example. Games of chance are typical examples of random experiments. The roll of a die, for example, is a random experiment because
- The set of possible outcomes is known: $\{1,2,3,4,5,6\}$.
- Before rolling the die, it is impossible to predict the outcome with absolute certainty.
Another example, unrelated to gambling, is the random choice of an individual from a human population and the determination of their blood type.
Generally, drawing a sample by a random method is a random experiment.
Sample space
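Example. Some examples of sample spaces are:
- For the toss of a coin, $\Omega=\{\text{heads},\text{tails}\}$.
- For the roll of a die, $\Omega=\{1,2,3,4,5,6\}$.
- For the blood type of an individual drawn by chance, $\Omega=\{A, B, AB, O\}$.
- For the height of an individual drawn by chance, $\Omega=\mathbb{R}^+$.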
Tree diagrams
In experiments where more than one variable is measured, the determination of the sample space can be difficult. In such cases, it is advisable to use a tree diagram to construct the sample space.
In a tree diagram every variable is represented by a level of the tree and every possible outcome of the variable by a branch.
Example. The tree diagram below represents the sample space of a random experiment where the gender and the blood type are measured in a random individual.
Random events
There are different types of events:
- Impossible event: the event with no elements, $\emptyset$. It has no chance of occurring.
- Elemental events: events with only one element, that is, singletons.
- Composed events: events with two or more elements.
- Sure event: the event that contains the whole sample space, $\Omega$. It always happens.
Set theory
Event space
Example. Given the sample space
As events are subsets of the sample space, set theory gives us the following operations on events:
- Union
- Intersection
- Complement
- Difference
Union of events
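Definition - Union event. Given two events $A, B\subseteq\Omega$, the union of $A$ and $B$, denoted by $A\cup B$, is the event of all elements that are members of $A$ or $B$ or both,
$$A\cup B=\{x \mid x\in A \text{ or } x\in B\}.$$
The union event $A\cup B$ happens when $A$ or $B$ happens.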
Intersection of events
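Definition - Intersection event. Given two events $A, B\subseteq\Omega$, the intersection of $A$ and $B$, denoted by $A\cap B$, is the event of all elements that are members of both $A$ and $B$,
$$A\cap B=\{x \mid x\in A \text{ and } x\in B\}.$$
The intersection event $A\cap B$ happens when $A$ and $B$ happen together.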
Two events are incompatible if their intersection is empty.
Complement of an event
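Definition - Complementary event. Given an event $A\subseteq\Omega$, the complementary or contrary event of $A$, denoted by $\overline{A}$, is the event of all elements of $\Omega$ that are not members of $A$,
$$\overline{A}=\{x\in\Omega \mid x\notin A\}.$$
The complementary event $\overline{A}$ happens when $A$ does not happen.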
Difference of events
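Definition - Difference event. Given two events $A, B\subseteq\Omega$, the difference of $A$ and $B$, denoted by $A-B$, is the event of all elements that are members of $A$ but not of $B$,
$$A-B=\{x \mid x\in A \text{ and } x\notin B\}=A\cap\overline{B}.$$
The difference event $A-B$ happens when $A$ happens but $B$ does not.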
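Example. Given the sample space of rolling a die $\Omega=\{1,2,3,4,5,6\}$ and the events $A=\{2,4,6\}$ and $B=\{1,2,3,4\}$:
- The union of $A$ and $B$ is $A\cup B=\{1,2,3,4,6\}$.
- The intersection of $A$ and $B$ is $A\cap B=\{2,4\}$.
- The complement of $A$ is $\overline{A}=\{1,3,5\}$.
- The events $A$ and $\overline{A}$ are incompatible.
- The difference of $A$ and $B$ is $A-B=\{6\}$, and the difference of $B$ and $A$ is $B-A=\{1,3\}$.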
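The operations on events map directly onto set operations in most programming languages. The following minimal Python sketch uses the sample space of a die roll; the events $A=\{2,4,6\}$ and $B=\{1,2,3,4\}$ are illustrative choices.

```python
# Sample space of rolling a die and two illustrative events.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}       # the result is even
B = {1, 2, 3, 4}    # the result is at most 4

union = A | B               # A or B happens
intersection = A & B        # A and B happen together
complement_A = omega - A    # A does not happen
difference_AB = A - B       # A happens but B does not
difference_BA = B - A       # B happens but A does not

# An event and its complement are always incompatible (empty intersection).
incompatible = (A & complement_A) == set()

print(union, intersection, complement_A, difference_AB, difference_BA, incompatible)
```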
Algebra of events
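Given the events $A, B, C\subseteq\Omega$, the following properties hold:
- $A\cup A=A$, $A\cap A=A$ (idempotency).
- $A\cup B=B\cup A$, $A\cap B=B\cap A$ (commutative).
- $(A\cup B)\cup C=A\cup(B\cup C)$, $(A\cap B)\cap C=A\cap(B\cap C)$ (associative).
- $(A\cup B)\cap C=(A\cap C)\cup(B\cap C)$, $(A\cap B)\cup C=(A\cup C)\cap(B\cup C)$ (distributive).
- $A\cup\emptyset=A$, $A\cap\Omega=A$ (neutral element).
- $A\cup\Omega=\Omega$, $A\cap\emptyset=\emptyset$ (absorbing element).
- $A\cup\overline{A}=\Omega$, $A\cap\overline{A}=\emptyset$ (complementary symmetric element).
- $\overline{\overline{A}}=A$ (double contrary).
- $\overline{A\cup B}=\overline{A}\cap\overline{B}$, $\overline{A\cap B}=\overline{A}\cup\overline{B}$ (De Morgan's laws).
- $A\cap B\subseteq A\cup B$.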
Probability definition
Classical definition of probability
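Definition - Probability (Laplace). Given a sample space $\Omega$ of a random experiment in which all outcomes are equally likely, the probability of an event $A\subseteq\Omega$ is the quotient between the number of elements of $A$ and the number of elements of $\Omega$,
$$P(A)=\frac{|A|}{|\Omega|}=\frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}.$$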
This definition is well known, but it has important restrictions:
- It is required that all the elements of the sample space are equally likely (equiprobability).
- It cannot be used with infinite sample spaces.
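Example. Given the sample space of rolling a die $\Omega=\{1,2,3,4,5,6\}$ and the event of getting an even number, $A=\{2,4,6\}$, the probability of $A$ is $P(A)=|A|/|\Omega|=3/6=0.5$.
However, given the sample space of the blood type of a random individual $\Omega=\{A, B, AB, O\}$, the classical definition cannot be used to compute the probability of a given blood type (it would assign $1/4$ to each of them), because the blood types are not equally likely in human populations.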
Frequency definition of probability
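The following definition of probability relies on the law of large numbers: when a random experiment is repeated a large number of times, the relative frequency of an event tends to stabilize around a fixed number.
Definition - Frequency probability. Given a sample space $\Omega$ of a replicable random experiment, the probability of an event $A\subseteq\Omega$ is the relative frequency of $A$ in an infinite number of repetitions of the experiment,
$$P(A)=\lim_{n\to\infty}\frac{n_A}{n},$$
where $n$ is the number of repetitions and $n_A$ the number of repetitions in which $A$ happens.
Although the frequency definition avoids the restrictions of the classical definition, it also has some drawbacks: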
- It computes an estimation of the real probability (more accurate the higher the sample size).
- The repetition of the experiment must be in identical conditions.
Example. Given the sample space of tossing a coin
Given the sample space of the blood type of a random individual
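The frequency definition can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it simulates a fair coin with Python's pseudo-random generator, and the function name `relative_frequency` and the fixed seed are hypothetical choices made here so that the run is reproducible.

```python
import random

def relative_frequency(n_tosses, seed=0):
    """Relative frequency of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The estimate tends to get closer to 0.5 as the number of repetitions grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few repetitions the estimate can be far from 0.5, which is the first drawback listed above: the relative frequency is only an estimate of the probability, and it is more accurate the larger the number of repetitions.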
Axiomatic definition of probability
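Definition - Probability (Kolmogorov). Given a sample space $\Omega$ of a random experiment, a probability function is a function that maps every event $A\subseteq\Omega$ to a real number $P(A)$, known as the probability of $A$, and that meets the following axioms:
- The probability of any event is nonnegative, $P(A)\geq 0$.
- The probability of the sure event is 1, $P(\Omega)=1$.
- The probability of the union of two incompatible events ($A\cap B=\emptyset$) is the sum of their probabilities, $P(A\cup B)=P(A)+P(B)$.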
From the previous axioms it is possible to deduce some important properties of a probability function.
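Given a sample space $\Omega$ of a random experiment and events $A, B\subseteq\Omega$, the following properties hold:
- $P(\overline{A})=1-P(A)$.
- $P(\emptyset)=0$.
- If $A\subseteq B$ then $P(A)\leq P(B)$.
- $P(A)\leq 1$. This means that $P(A)\in[0,1]$.
- $P(A-B)=P(A)-P(A\cap B)$.
- $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.
- If $A=\{e_1,\ldots,e_n\}$, where $e_1,\ldots,e_n$ are elemental events, then $P(A)=\sum_{i=1}^n P(e_i)$.
Each property follows from the axioms:
- $A\cup\overline{A}=\Omega$ and $A$ and $\overline{A}$ are incompatible, so $P(A)+P(\overline{A})=P(\Omega)=1$ and $P(\overline{A})=1-P(A)$.
- $\emptyset=\overline{\Omega}$, so $P(\emptyset)=1-P(\Omega)=0$.
- $B=A\cup(B-A)$. As $A$ and $B-A$ are incompatible, $P(B)=P(A)+P(B-A)\geq P(A)$. If we think of probabilities as areas, it is easy to see graphically.
- $A\subseteq\Omega$, so $P(A)\leq P(\Omega)=1$.
- $A=(A-B)\cup(A\cap B)$. As $A-B$ and $A\cap B$ are incompatible, $P(A)=P(A-B)+P(A\cap B)$, and therefore $P(A-B)=P(A)-P(A\cap B)$. If we think of probabilities as areas, it is easy to see graphically.
- $A\cup B=(A-B)\cup(B-A)\cup(A\cap B)$. As $A-B$, $B-A$ and $A\cap B$ are incompatible, $P(A\cup B)=P(A-B)+P(B-A)+P(A\cap B)=P(A)-P(A\cap B)+P(B)-P(A\cap B)+P(A\cap B)=P(A)+P(B)-P(A\cap B)$. If we think again of probabilities as areas, it is easy to see graphically, because the area of $A\cap B$ is added twice (once for $A$ and once for $B$), so it must be subtracted once.
- $A=\{e_1\}\cup\cdots\cup\{e_n\}$ and the elemental events are mutually incompatible, so $P(A)=P(e_1)+\cdots+P(e_n)$.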
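These properties can be checked numerically on a small equiprobable sample space. The sketch below assumes a fair die and the classical definition of probability; the events `A` and `B` and the helper `prob` are illustrative names introduced here, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # fair die: all outcomes assumed equally likely

def prob(event):
    """Classical probability: favorable outcomes over possible outcomes."""
    return Fraction(len(set(event) & omega), len(omega))

A = {2, 4, 6}     # illustrative event: even result
B = {1, 2, 3, 4}  # illustrative event: result of at most 4

complement_rule = prob(omega - A) == 1 - prob(A)
difference_rule = prob(A - B) == prob(A) - prob(A & B)
union_rule = prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(complement_rule, difference_rule, union_rule)
```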
Probability interpretation
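As set by the previous axioms, the probability of an event $A$ is a real number $P(A)$ that always lies between 0 and 1.
In a certain way, this number expresses the plausibility of the event, that is, the chances that the event $A$ occurs in the experiment. It is therefore also a measure of the uncertainty about the event.
- The maximum uncertainty corresponds to probability $P(A)=0.5$ ($A$ and $\overline{A}$ have the same chances of happening).
- The minimum uncertainty corresponds to probability $P(A)=1$ ($A$ will happen with absolute certainty) and $P(A)=0$ ($A$ won't happen with absolute certainty).
When $P(A)$ is closer to 0 than to 1, $A$ is less likely to happen than not to happen; when $P(A)$ is closer to 1 than to 0, $A$ is more likely to happen than not.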
Conditional probability
Conditional experiments
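Occasionally, we can get some information about the experiment before it is carried out. Usually that information is given as an event $B$ of the same sample space that we know to be true before we conduct the experiment.
In such a case, we will say that $B$ is a conditioning event, and the probability of another event $A$ is known as a conditional probability and written $P(A|B)$. This is read as the probability of $A$ given $B$.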
Usually, conditioning events change the sample space and therefore the probabilities of events.
Example. Assume that we have a sample of 100 women and 100 men with the following frequencies
Then, using the frequency definition of probability, the
However, if we know that the person is a woman, then the sample is reduced to the first row, and the probability of being smoker is
Conditional probability
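Definition - Conditional probability. Given a sample space $\Omega$ of a random experiment and two events $A, B\subseteq\Omega$, the probability of $A$ conditioned by the occurrence of $B$ is
$$P(A|B)=\frac{P(A\cap B)}{P(B)},$$
as long as $P(B)\neq 0$.
This definition allows us to calculate conditional probabilities without changing the original sample space.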
Example. In the previous example
Probability of the intersection event
From the definition of conditional probability it is possible to derive the formula for the probability of the intersection of two events,
$$P(A\cap B)=P(A)P(B|A)=P(B)P(A|B).$$
Example. In a population 30% of the people are smokers, and 40% of the smokers have breast cancer. Writing $S$ for being a smoker and $C$ for having breast cancer, the probability of a random person being a smoker and having breast cancer is
$$P(S\cap C)=P(S)P(C|S)=0.3\cdot 0.4=0.12.$$
Independence of events
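Sometimes, the conditioning event does not change the original probability of the main event.
Definition - Independent events. Given a sample space $\Omega$ of a random experiment, two events $A, B\subseteq\Omega$ are independent if the probability of $A$ does not change when it is conditioned on $B$, and vice versa, that is,
$$P(A|B)=P(A)\quad\text{and}\quad P(B|A)=P(B),$$
if $P(A)\neq 0$ and $P(B)\neq 0$.
This means that the occurrence of one event does not give relevant information to change the uncertainty of the other.
When two events are independent, the probability of their intersection is equal to the product of their probabilities,
$$P(A\cap B)=P(A)P(B).$$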
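Example. The sample space of tossing a coin twice is $\Omega=\{(H,H),(H,T),(T,H),(T,T)\}$, and all its elements are equally likely if the coin is fair. Thus, by the classical definition of probability, $P((H,H))=1/4=0.25$.
If we name $H_1=\{(H,H),(H,T)\}$ the event of getting heads in the first toss and $H_2=\{(H,H),(T,H)\}$ the event of getting heads in the second toss, we get the same result by assuming that these events are independent: $P((H,H))=P(H_1\cap H_2)=P(H_1)P(H_2)=\frac{2}{4}\cdot\frac{2}{4}=\frac{1}{4}$.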
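Independence can be checked by enumeration when the sample space is small. The following sketch assumes a fair coin tossed twice, so that all four outcomes are equally likely; the names `H1`, `H2`, `prob` and `independent` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two tosses: (H,H), (H,T), (T,H), (T,T).
omega = set(product("HT", repeat=2))

def prob(event):
    """Classical probability, assuming a fair coin."""
    return Fraction(len(event), len(omega))

def independent(a, b):
    """Two events are independent when P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

H1 = {w for w in omega if w[0] == "H"}  # heads in the first toss
H2 = {w for w in omega if w[1] == "H"}  # heads in the second toss

print(prob(H1 & H2), independent(H1, H2))
```

Note that an event with probability strictly between 0 and 1 is never independent of itself, so `independent(H1, H1)` is false.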
Probability Space
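Definition - Probability space. A probability space of a random experiment is a triplet $(\Omega,\mathcal{F},P)$ where
- $\Omega$ is the sample space of the experiment.
- $\mathcal{F}$ is a set of events of the experiment.
- $P$ is a probability function.
If we know the probabilities of all the elements of $\Omega$, then we can calculate the probability of every event in $\mathcal{F}$ and easily construct the probability space.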
Probability space construction
In order to determine the probability of every elemental event we can use a tree diagram, using the following rules:
- For every node of the tree, label the incoming edge with the probability of the variable in that level having the value of the node, conditioned by events corresponding to its ancestor nodes in the tree.
- The probability of every elemental event in the leaves is the product of the probabilities on the edges that go from the root to the leaf.
Probability tree with dependent variables
In a probability tree with dependent variables, the probabilities at each level of the tree differ depending on the outcomes of the previous levels.
Example. In a population 30% of the people are smokers, and we know that 40% of the smokers have breast cancer, while only 10% of the non-smokers have breast cancer. The probability tree of the probability space of the random experiment consisting of picking a random person and measuring the variables smoking and breast cancer is shown below.
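The leaf probabilities of this tree can be computed with the product rule stated above. The sketch below only uses the three percentages given in the example; the dictionary layout and the variable names are hypothetical choices made for illustration.

```python
# Edge probabilities taken from the example:
# P(smoker) = 0.3, P(cancer | smoker) = 0.4, P(cancer | non-smoker) = 0.1.
p_smoker = 0.3
p_cancer_given = {"smoker": 0.4, "non-smoker": 0.1}

leaves = {}
for smoking, p_first in (("smoker", p_smoker), ("non-smoker", 1 - p_smoker)):
    p_cancer = p_cancer_given[smoking]
    # Probability of a leaf = product of the probabilities along its path.
    leaves[(smoking, "cancer")] = p_first * p_cancer
    leaves[(smoking, "no cancer")] = p_first * (1 - p_cancer)

for leaf, p in leaves.items():
    print(leaf, round(p, 2))
```

The four leaves are the elemental events of the probability space, so their probabilities add up to 1.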
Probability tree with independent variables
In a probability tree with independent variables, the probabilities at each level of the tree are the same regardless of the outcomes of the previous levels.
Example. The probability tree of the random experiment of tossing two coins is shown below.
Example. In a population there are 40% of males and 60% of females, the probability tree of drawing a random sample of three persons is shown below.
Total probability theorem
Partition of the sample space
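Definition - Partition of the sample space. A collection of events $A_1, A_2,\ldots,A_n$ of the same sample space $\Omega$ is a partition of the sample space if it satisfies the following conditions:
- The union of the events is the sample space, that is, $A_1\cup\cdots\cup A_n=\Omega$.
- All the events are mutually incompatible, that is, $A_i\cap A_j=\emptyset$ for all $i\neq j$.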
Usually it is easy to get a partition of the sample space splitting a population according to some categorical variable, like for example gender, blood type, etc.
Total probability theorem
If we have a partition of a sample space, we can use it to calculate the probabilities of other events in the same sample space.
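Theorem - Total probability. Given a partition $A_1,\ldots,A_n$ of a sample space $\Omega$, the probability of any other event $B$ of the same sample space can be calculated with the formula
$$P(B)=\sum_{i=1}^n P(A_i\cap B)=\sum_{i=1}^n P(A_i)P(B|A_i).$$
The proof of the theorem is quite simple. As $A_1,\ldots,A_n$ is a partition of $\Omega$,
$$B=B\cap\Omega=B\cap(A_1\cup\cdots\cup A_n)=(B\cap A_1)\cup\cdots\cup(B\cap A_n).$$
And all the events of this union are mutually incompatible, as $A_1,\ldots,A_n$ are, so
$$P(B)=P(B\cap A_1)+\cdots+P(B\cap A_n)=\sum_{i=1}^n P(A_i)P(B|A_i).$$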
Example. A symptom
What is the probability that a random person of the population has the symptom?
To answer the question we can apply the total probability theorem using the partition
That is, half of the population has the symptom.
Indeed, it is a weighted mean of probabilities!
The answer to the previous question is even clearer with the tree diagram of the probability space.
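As a minimal sketch of the total probability theorem, the function below computes the weighted mean of conditional probabilities over a partition. The numbers in the usage line are hypothetical (a disease present in 20% of the population, with the symptom appearing in 90% of people with the disease and 40% of people without it); they are chosen here only so that the result agrees with the statement that half of the population has the symptom.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum of P(A_i) * P(B | A_i) over a partition A_1, ..., A_n."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the probabilities of a partition must add up to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Hypothetical values: P(D) = 0.2, P(S | D) = 0.9, P(S | not D) = 0.4.
p_symptom = total_probability([0.2, 0.8], [0.9, 0.4])
print(round(p_symptom, 2))
```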
Bayes theorem
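A partition of a sample space $A_1,\ldots,A_n$ may also be interpreted as a set of feasible hypotheses for a fact $B$.
In such cases it may be helpful to calculate the posterior probability $P(A_i|B)$ of every hypothesis.
Theorem - Bayes. Given a partition $A_1,\ldots,A_n$ of a sample space $\Omega$ and another event $B$ of the same sample space with $P(B)\neq 0$, the conditional probability of every event $A_i$ given $B$ can be calculated with the formula
$$P(A_i|B)=\frac{P(A_i\cap B)}{P(B)}=\frac{P(A_i)P(B|A_i)}{\sum_{j=1}^n P(A_j)P(B|A_j)}.$$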
Example. In the previous example, a more interesting question is about the diagnosis for a person with the symptom.
In this case we can interpret
However, if after examining the person we observe the symptom, that information changes the uncertainty about the hypotheses, and we need to calculate the posterior probabilities to diagnose, that is,
To calculate the posterior probabilities we can use the Bayes theorem.
As we can see the probability of having the disease has increased. Nevertheless, the probability of not having the disease is still greater than the probability of having it, and for that reason, the diagnosis is not having the disease.
In this case it is said that the symptom
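The posterior probabilities of Bayes theorem can be sketched in a few lines. The function name `posterior` and the input values are assumptions made for illustration: the values are hypothetical (prior probability of disease 0.2, symptom present in 90% of people with the disease and in 40% of people without it) and are not taken from the example.

```python
def posterior(priors, likelihoods):
    """Posterior P(A_i | B) for each hypothesis A_i of a partition (Bayes theorem)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i and B)
    evidence = sum(joint)                                 # P(B), by total probability
    return [j / evidence for j in joint]

# Hypothetical values: P(D) = 0.2, P(S | D) = 0.9, P(S | not D) = 0.4.
post_disease, post_no_disease = posterior([0.2, 0.8], [0.9, 0.4])
print(round(post_disease, 2), round(post_no_disease, 2))
```

With these assumed inputs the probability of disease rises from the prior 0.2 after observing the symptom, yet it stays below the probability of not having the disease, which is the same qualitative situation described above.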
Epidemiology
One of the branches of Medicine that makes intensive use of probability is Epidemiology, which studies the distribution and causes of diseases in populations, identifying risk factors for disease and targets for preventive healthcare.
In Epidemiology we are interested in how often a medical event $E$ occurs in a population.
There are different measures related to the frequency of a medical event. The most important are:
- Prevalence
- Incidence
- Relative risk
- Odds ratio
Prevalence
Definition - Prevalence. The prevalence of a medical event $E$ is the proportion of a particular population that is affected by the event,
$$\text{Prevalence}(E)=\frac{\text{number of people affected by } E}{\text{population size}}.$$
Often, the prevalence is estimated from a sample as the relative frequency of people affected by the event in the sample. It is also common to express that frequency as a percentage.
Example. To estimate the prevalence of flu, a sample of 1000 persons was studied and 150 of them had flu. Thus, the prevalence of flu is approximately 150/1000=0.15, that is, 15%.
Incidence
Incidence measures the probability of occurrence of a medical event in a population within a given period of time. Incidence can be measured as a cumulative proportion or as a rate.
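Definition - Cumulative incidence. The cumulative incidence of a medical event $E$ is the proportion of a particular population at risk that acquires the event within a given period of time,
$$R(E)=\frac{\text{number of new cases of } E \text{ during the period}}{\text{size of the population at risk at the start of the period}}.$$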
Example. A population initially contains 1000 persons without flu and after two years of observation 160 of them got the flu. The incidence proportion of flu is 160 cases per 1000 persons per two years, i.e. 16% per two years.
Incidence rate or Absolute risk
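Definition - Incidence rate. The incidence rate or absolute risk of a medical event $E$ is the number of new cases of the event divided by the size of the population at risk and by the number of time units in the observation period.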
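Example. A population initially contains 1000 persons without flu and after two years of observation 160 of them got the flu. Taking the year as the time unit, the incidence rate of flu is $160/(1000\cdot 2)=0.08$ cases per person per year, i.e. 8% per year.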
Prevalence vs Incidence
Prevalence must not be confused with incidence. Prevalence indicates how widespread the medical event is, and is more a measure of the burden of the event on society with no regard to time at risk or when subjects may have been exposed to a possible risk factor, whereas incidence conveys information about the risk of being affected by the event.
Prevalence can be measured in cross-sectional studies at a particular time, while in order to measure incidence we need a longitudinal study observing the individuals during a period of time.
Incidence is usually more useful than prevalence in understanding the event etiology: for example, if the incidence of a disease in a population increases, then there is a risk factor that promotes it.
When the incidence is approximately constant for the duration of the event, prevalence is approximately the product of event incidence and average event duration, so
$$\text{prevalence}\approx\text{incidence}\times\text{average duration}.$$
Comparing risks
In order to determine if a factor or characteristic is associated with the medical event we need to compare the risk of the medical event in two populations, one exposed to the factor and the other not exposed. The group of people exposed to the factor is known as the treatment group or experimental group and the group of people unexposed as the control group.
Usually the cases observed for each group are represented in a 2×2 table like the one below.
Group | Event $E$ | No event $\overline{E}$
Treatment group (exposed) | $a$ | $b$
Control group (unexposed) | $c$ | $d$
Attributable risk or Risk difference
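Definition - Attributable risk. The attributable risk or risk difference of a medical event $E$ for people exposed to a factor is the difference between the absolute risks of the treatment group and the control group,
$$AR(E)=R_T(E)-R_C(E)=\frac{a}{a+b}-\frac{c}{c+d},$$
where $a$ and $b$ are the numbers of people with and without the event in the treatment group, and $c$ and $d$ the corresponding numbers in the control group.
The attributable risk is the risk of an event that is specifically due to the factor of interest.
Observe that the attributable risk is positive when the risk of the treatment group is greater than the risk of the control group, and negative otherwise.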
Example. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.
Group | Flu | No flu
Treatment group (vaccinated) | 20 | 480
Control group (unvaccinated) | 80 | 420
The attributable risk of getting the flu for vaccinated people is
$$AR=\frac{20}{20+480}-\frac{80}{80+420}=0.04-0.16=-0.12.$$
This means that the risk of getting the flu in vaccinated people is 12 percentage points lower than in unvaccinated people.
Relative risk
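Definition - Relative risk. The relative risk of a medical event $E$ for people exposed to a factor is the quotient between the absolute risks of the treatment group and the control group,
$$RR(E)=\frac{R_T(E)}{R_C(E)}=\frac{a/(a+b)}{c/(c+d)},$$
where $a$ and $b$ are the numbers of people with and without the event in the treatment group, and $c$ and $d$ the corresponding numbers in the control group.
Relative risk compares the risk of a medical event between the treatment and the control groups.
- $RR=1$: There is no association between the event and the exposure to the factor.
- $RR<1$: Exposure to the factor decreases the risk of the event.
- $RR>1$: Exposure to the factor increases the risk of the event.
The further from 1, the stronger the association.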
Example. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.
Group | Flu | No flu
Treatment group (vaccinated) | 20 | 480
Control group (unvaccinated) | 80 | 420
The relative risk of getting the flu for vaccinated people is
$$RR=\frac{20/(20+480)}{80/(80+420)}=\frac{0.04}{0.16}=0.25.$$
This means that vaccinated people were only one-fourth as likely to develop flu as unvaccinated people, i.e. the vaccine reduces the risk of flu by 75%.
Odds
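An alternative way of measuring the risk of a medical event is the odds: the quotient between the number of people who acquire the event and the number of people who do not, or equivalently $ODDS(E)=P(E)/P(\overline{E})$.
Unlike the incidence or absolute risk, which is a proportion no greater than 1, the odds can be greater than 1. However, it is possible to convert odds into a probability with the formula
$$P(E)=\frac{ODDS(E)}{ODDS(E)+1}.$$
Example. A population initially contains 1000 persons without flu, and 160 of them get the flu during the observation period. The odds of flu are $160/840\approx 0.19$.
Observe that the incidence is 160/1000.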
Odds ratio
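Definition - Odds ratio. The odds ratio of a medical event $E$ for people exposed to a factor is the quotient between the odds of the event in the treatment group and in the control group,
$$OR(E)=\frac{ODDS_T(E)}{ODDS_C(E)}=\frac{a/b}{c/d}=\frac{ad}{bc},$$
where $a$ and $b$ are the numbers of people with and without the event in the treatment group, and $c$ and $d$ the corresponding numbers in the control group.
Odds ratio compares the odds of a medical event between the treatment and the control groups. The interpretation is similar to the relative risk.
- $OR=1$: There is no association between the event and the exposure to the factor.
- $OR<1$: Exposure to the factor decreases the risk of the event.
- $OR>1$: Exposure to the factor increases the risk of the event.
The further from 1, the stronger the association.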
Example. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.
Group | Flu | No flu
Treatment group (vaccinated) | 20 | 480
Control group (unvaccinated) | 80 | 420
The odds ratio of getting the flu for vaccinated people is
$$OR=\frac{20/480}{80/420}=\frac{20\cdot 420}{480\cdot 80}=0.21875\approx 0.22.$$
This means that the odds of getting the flu versus not getting it in vaccinated individuals are almost one fifth of those in unvaccinated individuals, i.e. the odds of flu among the vaccinated are about 22% of the odds among the unvaccinated.
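The three measures of association can be computed from the four cells of the 2×2 table. The sketch below applies them to the vaccine data above; the function name `risk_measures` and the argument names `a`, `b`, `c`, `d` (events and non-events in the treatment group, then in the control group) are naming assumptions made here.

```python
def risk_measures(a, b, c, d):
    """Association measures from a 2x2 table.

    a, b: people with / without the event in the treatment group.
    c, d: people with / without the event in the control group.
    """
    risk_t = a / (a + b)  # absolute risk in the treatment group
    risk_c = c / (c + d)  # absolute risk in the control group
    return {
        "attributable_risk": risk_t - risk_c,
        "relative_risk": risk_t / risk_c,
        "odds_ratio": (a / b) / (c / d),
    }

vaccine = risk_measures(20, 480, 80, 420)
for name, value in vaccine.items():
    print(name, round(value, 4))
```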
Relative risk vs Odds ratio
Relative risk and odds ratio are both measures of association, but their interpretation is slightly different. While the relative risk compares the risks of the treatment and control groups, the odds ratio compares their odds, which are not the same as the risks. Thus, an odds ratio of 2 does not mean that the treatment group has twice the risk of acquiring the medical event.
The interpretation of the odds ratio is trickier because it is counterfactual: it tells us how many times more frequent the event is in the treatment group than in the control group, assuming that in the control group the event is as frequent as the non-event.
The advantage of the odds ratio is that it does not depend on the prevalence or the incidence of the event, and it must be used when the number of people with the medical event is selected arbitrarily in both groups, as in case-control studies.
Example. In order to determine the association between lung cancer and smoking, two samples were selected (the second one with twice as many non-cancer individuals), with the following results:
Sample 1
Group | Cancer | No cancer
Smokers | 60 | 80
Non-smokers | 40 | 320
$$RR=\frac{60/140}{40/360}\approx 3.86,\qquad OR=\frac{60/80}{40/320}=6.$$
Sample 2
Group | Cancer | No cancer
Smokers | 60 | 160
Non-smokers | 40 | 640
$$RR=\frac{60/220}{40/680}\approx 4.64,\qquad OR=\frac{60/160}{40/640}=6.$$
Thus, when we change the incidence or the prevalence of the event (lung cancer) the relative risk changes, while the odds ratio does not.
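The relation between the relative risk and the odds ratio is given by the following formula
$$RR=\frac{OR}{1-R_C+R_C\cdot OR}=OR\cdot\frac{1-R_T}{1-R_C},$$
where $R_C$ and $R_T$ are the prevalence or the incidence of the event in the control and treatment groups respectively.
The odds ratio always overestimates the relative risk when it is greater than 1 and underestimates it when it is less than 1. However, with rare medical events (with very small prevalence or incidence) the relative risk and the odds ratio are almost the same.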
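The behaviour of both measures can be checked on the two lung cancer samples. The sketch below recomputes the relative risk and the odds ratio for each table, and also recovers the relative risk from the odds ratio and the control-group risk through the algebraic identity $RR=OR/(1-R_C+R_C\cdot OR)$; the function names are hypothetical.

```python
def rr_and_or(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 table (exposed row first)."""
    risk_t = a / (a + b)
    risk_c = c / (c + d)
    return risk_t / risk_c, (a / b) / (c / d)

def rr_from_or(odds_ratio, risk_c):
    """Relative risk recovered from the odds ratio and the control-group risk."""
    return odds_ratio / (1 - risk_c + risk_c * odds_ratio)

rr1, or1 = rr_and_or(60, 80, 40, 320)   # sample 1
rr2, or2 = rr_and_or(60, 160, 40, 640)  # sample 2: twice as many non-cancer individuals

print(round(rr1, 2), round(or1, 2))
print(round(rr2, 2), round(or2, 2))
```

The odds ratio is the same in both samples, while the relative risk changes with the proportion of cancer cases, and in both samples the odds ratio is larger than the relative risk, as expected for an association greater than 1.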
Diagnostic tests
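In Epidemiology it is common to use diagnostic tests to diagnose diseases.
In general, diagnostic tests are not fully reliable and have some risk of misdiagnosis, as represented in the table below.
Test outcome | Disease present $D$ | Disease absent $\overline{D}$
Positive $+$ | True positive (TP) | False positive (FP)
Negative $-$ | False negative (FN) | True negative (TN)
Sensitivity and specificity of a diagnostic test
The performance of a diagnostic test depends on the following two probabilities.
- Sensitivity: the probability of a positive outcome in a person with the disease, $P(+|D)=\frac{TP}{TP+FN}$.
- Specificity: the probability of a negative outcome in a person without the disease, $P(-|\overline{D})=\frac{TN}{TN+FP}$.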
Sensitivity and specificity interpretation
Usually, there is a trade-off between sensitivity and specificity.
A test with high sensitivity will detect the disease in most sick persons, but it will also produce more false positives than a less sensitive test. Thus, a positive outcome in a test with high sensitivity is not useful for confirming the disease, but a negative outcome is useful for ruling out the disease, since the test rarely misses those who have the disease.
On the other hand, a test with high specificity will rule out the disease in most healthy persons, but it will also produce more false negatives than a less specific test. Thus, a negative outcome in a test with high specificity is not useful for ruling out the disease, but a positive outcome is useful for confirming the disease, since the test rarely gives positive outcomes in healthy people.
Deciding on a test with greater sensitivity or a test with greater specificity depends on the type of disease and the goal of the test. In general, we will use a sensitive test when:
- The disease is serious and it is important to detect it.
- The disease is curable.
- The false positives do not provoke serious traumas.
And we will use a specific test when:
- The disease is important but difficult or impossible to cure.
- The false positives provoke serious traumas.
- The treatment of false positives can have dangerous consequences.
Predictive values of a diagnostic test
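But the most important aspect of a diagnostic test is its predictive power, which is measured with the following two posterior probabilities, where $D$ denotes having the disease and $+$ and $-$ a positive and a negative test outcome.
- Positive predictive value: the probability of having the disease given a positive outcome, $P(D|+)$.
- Negative predictive value: the probability of not having the disease given a negative outcome, $P(\overline{D}|-)$.
Positive and negative predictive values allow us to confirm or to rule out the disease, respectively, if they reach at least a threshold of 0.5.
However, these probabilities depend on the proportion of persons with the disease in the population, $P(D)$, that is, on the prevalence of the disease. By Bayes theorem,
$$P(D|+)=\frac{P(D)P(+|D)}{P(D)P(+|D)+P(\overline{D})P(+|\overline{D})},\qquad P(\overline{D}|-)=\frac{P(\overline{D})P(-|\overline{D})}{P(\overline{D})P(-|\overline{D})+P(D)P(-|D)}.$$
Thus, with frequent diseases, the positive predictive value increases, and with rare diseases, the negative predictive value increases.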
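The dependence of the predictive values on the prevalence can be explored with a short sketch based on Bayes theorem. The function name `predictive_values` and all the input values are hypothetical: a test with sensitivity and specificity of 0.9 is evaluated for a rare disease (prevalence 0.01) and for a frequent one (prevalence 0.3).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes theorem."""
    # Probability of a positive outcome, by the total probability theorem.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / (1 - p_positive)
    return ppv, npv

# Hypothetical test and prevalences, chosen only for illustration.
ppv_rare, npv_rare = predictive_values(0.9, 0.9, 0.01)
ppv_frequent, npv_frequent = predictive_values(0.9, 0.9, 0.3)

print(round(ppv_rare, 3), round(npv_rare, 3))
print(round(ppv_frequent, 3), round(npv_frequent, 3))
```

With the same test, the positive predictive value is much higher for the frequent disease, and the negative predictive value is higher for the rare one.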
Example. A diagnostic test for the flu has been tried in a random sample of 1000 persons. The results are summarized in the table below.
According to this sample, the prevalence of the flu can be estimated as
The sensitivity of this diagnostic test is
And the specificity is
The predictive positive value of the diagnostic test is
As this value is over
On the other hand, the predictive negative value is
As this value is almost 1, it is almost certain that a person does not have the flu if he or she gets a negative outcome in the test.
Thus, this test is a powerful test to rule out the flu, but not so powerful to confirm it.
Likelihood ratios of a diagnostic test
The following measures are usually derived from sensitivity and specificity.
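Definition - Positive likelihood ratio. The positive likelihood ratio of a diagnostic test is the quotient between the probabilities of a positive outcome in people with the disease and in people without it,
$$LR+=\frac{P(+|D)}{P(+|\overline{D})}=\frac{\text{sensitivity}}{1-\text{specificity}}.$$
Definition - Negative likelihood ratio. The negative likelihood ratio of a diagnostic test is the quotient between the probabilities of a negative outcome in people with the disease and in people without it,
$$LR-=\frac{P(-|D)}{P(-|\overline{D})}=\frac{1-\text{sensitivity}}{\text{specificity}}.$$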
Positive likelihood ratio can be interpreted as the number of times that a positive outcome is more probable in people with the disease than in people without it.
On the other hand, negative likelihood ratio can be interpreted as the number of times that a negative outcome is more probable in people with the disease than in people without it.
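Post-test probabilities can be calculated from pre-test probabilities through likelihood ratios: the post-test odds are the pre-test odds multiplied by the likelihood ratio. For a positive outcome,
$$P(D|+)=\frac{P(D)\,LR+}{1-P(D)+P(D)\,LR+},$$
and the same formula with $LR-$ gives $P(D|-)$.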
Thus,
- A likelihood ratio greater than 1 increases the probability of disease.
- A likelihood ratio less than 1 decreases the probability of disease.
- A likelihood ratio 1 does not change the pre-test probability.
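The update from pre-test to post-test probability can be sketched by working with odds: the post-test odds are the pre-test odds multiplied by the likelihood ratio. The function names and the numbers below (sensitivity 0.9, specificity 0.8, pre-test probability 0.2) are hypothetical and only serve as an illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a diagnostic test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test, lr):
    """Post-test probability of disease from the pre-test probability and a likelihood ratio."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical test and pre-test probability.
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
after_positive = post_test_probability(0.2, lr_pos)
after_negative = post_test_probability(0.2, lr_neg)

print(round(lr_pos, 3), round(lr_neg, 3))
print(round(after_positive, 3), round(after_negative, 3))
```

In this illustration the positive likelihood ratio is greater than 1 and raises the probability of disease, the negative one is less than 1 and lowers it, and a likelihood ratio of exactly 1 would leave the pre-test probability unchanged.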