Degrees: Pharmacy, Biotechnology
Date: January 10, 2017

# Descriptive Statistics and Regression

## Question 1

The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.

1. Plot the ogive of the waiting time.
2. Compute the median of the distribution, and explain its meaning.
3. What percentage of patients have waited for longer than 38 minutes?
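
As a reminder of how parts 2 and 3 are usually read off the ogive, the following sketch assumes linear interpolation inside each class. The symbols are assumptions introduced here, not notation from the table: $[l_{i-1}, l_i]$ is the class that contains the value of interest, and $F_{i-1}$, $F_i$ are the cumulative relative frequencies at its endpoints.

$$Me = l_{i-1} + \frac{0.5 - F_{i-1}}{F_i - F_{i-1}}\,(l_i - l_{i-1})$$

$$F(38) = F_{i-1} + \frac{38 - l_{i-1}}{l_i - l_{i-1}}\,(F_i - F_{i-1}), \qquad \text{percentage above 38} = 100\,\bigl(1 - F(38)\bigr)$$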

## Question 2

To study fertility in two different populations $A$ and $B$, a sample of each population was taken and the number of pregnancies for each woman was recorded. The results of such records are shown below.

1. Draw the box plot of each sample and compare them.
2. In which of the two samples is the mean more representative? Justify your answer.
3. Compute the skewness coefficient for both samples; which one is more skewed?
4. Which is relatively larger, a case of 5 pregnancies in sample $A$ or a case of 3 pregnancies in sample $B$?

Consider the following sums for your computations:
$\sum a_i=51$, $\sum a_i^2=199$, $\sum (a_i-\bar a)^3=-11.6016$, $\sum (a_i-\bar a)^4=217.9954$,
$\sum b_i=20$, $\sum b_i^2=52$, $\sum (b_i-\bar b)^3=49.5$, $\sum (b_i-\bar b)^4=220.3125$.
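
The sums above plug into the usual descriptive statistics. The sketch below is written for sample $A$ and assumes that $n$ is the sample size read from the table and that variances are computed dividing by $n$ (not $n-1$); the symbols $cv$, $g_1$ and $z$ are labels chosen here. The same formulas apply to sample $B$ with $b_i$ in place of $a_i$.

$$\bar a = \frac{\sum a_i}{n}, \qquad s_a^2 = \frac{\sum a_i^2}{n} - \bar a^2, \qquad cv_a = \frac{s_a}{|\bar a|}$$

$$g_{1,a} = \frac{\sum (a_i - \bar a)^3 / n}{s_a^3}, \qquad z = \frac{x - \bar a}{s_a}$$

Under these assumptions, the mean is more representative in the sample with the smaller $cv$, and the relative size of an individual value is compared through its standard score $z$.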

## Question 3

A study to find the relation between the reduction in cholesterol levels in blood and exercise has been carried out. The results are shown in the table below.

1. Which regression model better explains the reduction of cholesterol as a function of the exercise time, the linear or the exponential? Justify your answer.
2. According to the linear regression model, by how much does the cholesterol reduction change when the exercise time is increased by one minute?
3. According to the logarithmic model, how much exercise time is needed to get a cholesterol reduction of 100 mg/dl? Is this estimate reliable? Justify your answer.

Consider the following values for your computations, where $X$=exercise time in minutes, and $Y$=cholesterol reduction:
$\sum x_i=2148$, $\sum \log(x_i)=53.0559$, $\sum y_j=199$, $\sum \log(y_j)=27.1766$,
$\sum x_i^2=507082$, $\sum \log(x_i)^2=282.9578$, $\sum y_j^2=5779$, $\sum \log(y_j)^2=80.035$,
$\sum x_iy_j=50750$, $\sum x_i\log(y_j)=6359.0468$, $\sum \log(x_i)y_j=1097.978$, $\sum \log(x_i)\log(y_j)=147.0682$.
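
The sums above are the ingredients of the regression formulas. As a sketch, assuming $n$ is the number of observations in the table and that variances and covariances are computed dividing by $n$, the linear model of $Y$ on $X$ uses:

$$\bar x = \frac{\sum x_i}{n}, \qquad s_x^2 = \frac{\sum x_i^2}{n} - \bar x^2, \qquad s_{xy} = \frac{\sum x_i y_j}{n} - \bar x\,\bar y$$

$$b_{yx} = \frac{s_{xy}}{s_x^2}, \qquad r^2 = \frac{s_{xy}^2}{s_x^2\, s_y^2}$$

The exponential model of $Y$ on $X$ and the logarithmic model of $X$ on $Y$ both work with the pair $(x, \log y)$, so the same formulas apply with $\log(y_j)$ in place of $y_j$, and both share the same coefficient of determination:

$$s_{x\log y} = \frac{\sum x_i \log(y_j)}{n} - \bar x\,\overline{\log y}, \qquad x = \bar x + \frac{s_{x\log y}}{s_{\log y}^2}\,\bigl(\log y - \overline{\log y}\bigr)$$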

# Probability and Random Variables

## Question 4

The medical emergency service of a town receives 6 requests per day on average. The service is staffed in three shifts of 8 hours each.

1. Compute the probability of getting more than 3 requests in an 8-hour shift.
2. Compute the probability that at least one of the three shifts in a day has no requests.
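
One way to approach this question is sketched below. It assumes, beyond what the statement says, that the number of requests follows a Poisson distribution, that requests are spread evenly over the day (so each shift has mean $6/3 = 2$), and that the three shifts are independent. The helper `poisson_pmf` and the variable names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


requests_per_day = 6
shifts_per_day = 3

# Assumption: requests are spread evenly, so each 8-hour shift has mean 2.
lam_shift = requests_per_day / shifts_per_day

# Part 1: P(X > 3) = 1 - P(X <= 3) for one shift.
p_more_than_3 = 1 - sum(poisson_pmf(k, lam_shift) for k in range(4))

# Part 2: at least one of the three shifts has no requests,
# assuming the shifts are independent.
p_empty_shift = poisson_pmf(0, lam_shift)
p_some_empty = 1 - (1 - p_empty_shift) ** shifts_per_day

print(f"P(more than 3 requests in a shift) = {p_more_than_3:.4f}")
print(f"P(at least one empty shift)        = {p_some_empty:.4f}")
```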

## Question 5

The prevalence of a certain disease in a population is 10%. A diagnostic test for that disease has a sensitivity of 95% and a specificity of 85%.

1. Compute the positive and negative predictive values and interpret the results. Is the test more useful for detecting the disease or for ruling it out?
2. What specificity would the test need in order to have a positive predictive value of 80%?
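
The predictive values follow from Bayes' theorem applied to the prevalence, sensitivity and specificity. The sketch below is one possible way to organise the computation; the function names `ppv`, `npv` and `specificity_for_ppv` are illustrative, and the last one simply solves the positive predictive value formula for the specificity with prevalence and sensitivity held fixed.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value: P(disease | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def npv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


def specificity_for_ppv(prevalence: float, sensitivity: float, target_ppv: float) -> float:
    """Specificity needed to reach target_ppv, other parameters unchanged."""
    true_pos = prevalence * sensitivity
    # From target = TP / (TP + FP): FP = TP * (1 - target) / target.
    false_pos_allowed = true_pos * (1 - target_ppv) / target_ppv
    return 1 - false_pos_allowed / (1 - prevalence)


prevalence, sensitivity, specificity = 0.10, 0.95, 0.85

print(f"PPV = {ppv(prevalence, sensitivity, specificity):.4f}")
print(f"NPV = {npv(prevalence, sensitivity, specificity):.4f}")
print(f"Specificity for PPV of 0.80 = {specificity_for_ppv(prevalence, sensitivity, 0.80):.4f}")
```

With these inputs the negative predictive value comes out much higher than the positive one, which is the comparison part 1 asks about.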

## Question 6

In a study of blood pressure on 8000 individuals, it has been recorded that 2254 people show readings of blood pressure above 130 mmHg, and 3126 individuals show readings between 110 and 130 mmHg. Assume that blood pressure is normally distributed.

1. Compute the mean and standard deviation (of blood pressure).
2. Readings above 140 mmHg are considered to indicate a high blood pressure problem. How many people in the group have such a problem?
3. A test will flag a blood pressure problem if a patient's reading is in the bottom 5% or in the top 5% of the results for the population. For what blood pressure values is an individual in the population considered normal?
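
A possible numerical approach, sketched with the standard library's `statistics.NormalDist`: the two observed proportions fix two quantiles of the normal distribution, which give two linear equations in the mean and the standard deviation. The variable names are illustrative, and the sketch treats the observed frequencies as exact probabilities.

```python
from statistics import NormalDist

n_total = 8000
p_above_130 = 2254 / n_total
p_110_to_130 = 3126 / n_total
p_below_110 = 1 - p_above_130 - p_110_to_130

# Standard normal quantiles matching the two cut points.
std_normal = NormalDist()
z_130 = std_normal.inv_cdf(1 - p_above_130)  # P(Z <= z_130) = P(X <= 130)
z_110 = std_normal.inv_cdf(p_below_110)      # P(Z <= z_110) = P(X <= 110)

# Solve 130 = mu + z_130 * sigma and 110 = mu + z_110 * sigma.
sigma = (130 - 110) / (z_130 - z_110)
mu = 130 - z_130 * sigma

pressure = NormalDist(mu, sigma)

# Part 2: expected number of people with readings above 140 mmHg.
n_above_140 = n_total * (1 - pressure.cdf(140))

# Part 3: central 90% of the distribution.
low, high = pressure.inv_cdf(0.05), pressure.inv_cdf(0.95)

print(f"mean = {mu:.2f} mmHg, standard deviation = {sigma:.2f} mmHg")
print(f"people above 140 mmHg = {n_above_140:.0f}")
print(f"normal range = ({low:.2f}, {high:.2f}) mmHg")
```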

## Question 7

Students in a Chemistry class must take two midterm exams to pass the subject. 60% of the students passed the first midterm and 68% passed the second. In addition, 80% of the students who passed the first midterm also passed the second. A student is picked at random from the class.

1. Compute the probability that the student has failed both exams.
2. Compute the probability that the student has passed the first exam if we know that she has failed the second exam.
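
Both parts reduce to the product rule, the addition rule and the definition of conditional probability. A minimal sketch, with illustrative variable names:

```python
p_first = 0.60               # P(pass first exam)
p_second = 0.68              # P(pass second exam)
p_second_given_first = 0.80  # P(pass second | pass first)

# Product rule: P(pass both).
p_both = p_first * p_second_given_first

# Addition rule: P(pass at least one), then its complement.
p_at_least_one = p_first + p_second - p_both
p_fail_both = 1 - p_at_least_one

# Conditional probability: P(pass first | fail second).
p_first_and_fail_second = p_first - p_both
p_first_given_fail_second = p_first_and_fail_second / (1 - p_second)

print(f"P(fail both)                = {p_fail_both:.3f}")
print(f"P(pass first | fail second) = {p_first_given_fail_second:.3f}")
```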