Pharmacy exam 2017-01-10

Degrees: Pharmacy, Biotechnology
Date: January 10, 2017

Descriptive Statistics and Regression

Question 1

The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.

$$ \begin{array}{cr} \hline \mbox{Time} & \mbox{Patients}
(0,10] & 22\newline (10,20] & 43\newline (20,30] & 33\newline (30,40] & 12\newline (40,50] & 6\newline (50,60] & 4\newline \hline \end{array} $$

  1. Plot the ogive of the waiting time.
  2. Compute the median of the distribution, and explain its meaning.
  3. What percentage of patients have waited for longer than 38 minutes?

plot of chunk ogive_waiting_time_emergency 2. $Me=18.89$ min. 3. 10% of patients have waited for longer than 38 minutes.

Question 2

To study fertility in two different populations $A$ and $B$, a sample of each population was taken and the number of pregnancies for each woman was recorded. The results of such records are shown below.

$$ \begin{array}{ccccccccccccccccc} \hline A & 2 & 3 & 4 & 4 & 3 & 2 & 6 & 1 & 5 & 3 & 4 & 4 & 3 & 2 & 5 & 0\newline B & 1 & 0 & 2 & 1 & 0 & 2 & 0 & 3 & 0 & 1 & 0 & 2 & 5 & 1 & 1 & 1\newline \hline \end{array} $$

  1. Draw the box diagram of each sample and compare them.
  2. In which of the two samples is the mean more representative? Justify your answer.
  3. Compute the skewness coefficient for both samples; which one is more skewed?
  4. What is relatively bigger, a case of 5 pregnancies in sample $A$, or a case of 3 pregnancies in sample $B$?

Consider the following sums for your computations:
$\sum a_i=51$, $\sum a_i^2=199$, $\sum (a_i-\bar a)^3=-11.6016$, $\sum (a_i-\bar a)^4=217.9954$,
$\sum b_i=20$, $\sum b_i^2=52$, $\sum (b_i-\bar b)^3=49.5$, $\sum (b_i-\bar b)^4=220.3125$.

plot of chunk fertility_boxplot 2. $\bar a=3.1875$ pregnancies, $s_a^2=2.2773$ pregnancies², $s_a=1.5091$ pregnancies, $cv_a=0.4734$. $\bar b=1.25$ pregnancies, $s_b^2=1.6875$ pregnancies², $s_b=1.299$ pregnancies, $cv_b=1.0392$. As the coefficient of variation of $A$ is less than the coefficient of variation of $B$, the mean of population $A$ is more representative than the mean of population $B$. 3. $g_{1,a}=-0.211$ and $g_{1,b}=1.4113$, so the distribution of $B$ is more skewed than the distribution of $A$. 5. $z_a(5)=1.2011$ and $z_b(3)=1.3472$, so 3 pregnancies is relatively bigger in population $B$ than 5 pregnancies in population $A$.

Question 3

A study to find the relation between the reduction in cholesterol levels in blood and exercise has been carried out. The results are shown in the table below.

$$ \begin{array}{lrrrrrrrrrr} \hline \mbox{Minutes of exercise} & 96 & 106 & 163 & 207 & 227 & 244 & 261 & 271 & 272 & 301\newline \mbox{Cholesterol reduction (mg/dl)} & 4 & 5 & 8 & 13 & 15 & 17 & 22 & 39 & 31 & 45\newline \hline \end{array} $$

  1. Which regression models explains better the reduction of cholesterol as a function of the exercise time, the linear o the exponential? Justify the answer.
  2. According to the linear regression model, how much will be the reduction in cholesterol when the exercise time is increased by one minute?
  3. According to the logarithmic model, how much exercise time is needed to get a reduction of cholesterol of 100 mg/dl? Is this estimation reliable? Justify your answer.

Consider the following values for your computations, where $X$=exercise time in minutes, and $Y$=cholesterol reduction:
$\sum x_i=2148$, $\sum \log(x_i)=53.0559$, $\sum y_j=199$, $\sum \log(y_j)=27.1766$,
$\sum x_i^2=507082$, $\sum \log(x_i)^2=282.9578$, $\sum y_j^2=5779$, $\sum \log(y_j)^2=80.035$,
$\sum x_iy_j=50750$, $\sum x_i\log(y_j)=6359.0468$, $\sum \log(x_i)y_j=1097.978$, $\sum \log(x_i)\log(y_j)=147.0682$.

1.Linear regression model of cholesterol reduction on exercise time: $\bar x=214.8$ min, $s_x^2=4569.16$ min². $\bar y=19.9$ mg/dl, $s_y^2=181.89$ (mg/dl)². $s_{xy}=800.48$ min⋅mg/dl. $r^2 = 0.771$. Exponential regression model of cholesterol reduction on exercise time: $\overline{\log(y)}=2.7177$ log(mg/dl), $s_{\log(y)}^2=0.6178$ log(mg/dl)². $s_{x\log(y)}=52.1504$ min⋅log(mg/dl). $r^2 = 0.9635$. Therefore, the exponential regression model is better since its coefficient of determination is higher. 2. Regression line of cholesterol reduction on exercise time: $y=-17.7312 + 0.1752x$. The cholesterol reduction increases 0.1752 mg/dl when the exercise time is increased by one minute. 3. Logarithmic regression model of exercise time on cholesterol reduction: $x=-14.6075 + 84.4135\log(y)$. $x(100)=374.131$ min. Despite the coefficient of determination is pretty close to 1, the estimation is not reliable since 100 mg/dl is far away from the range of values in the sample.

Probability and random variables

Question 4

The medical emergency services of a town gets 6 requests per day in average. This service is staffed with three shifts of 8 hours each.

  1. Compute the probability of getting more than 3 requests in an 8 hours shift.
  2. Compute the probability that in some of the three shifts there are no requests.

  1. Naming $X$ to the number of requests in an 8 hours shift, $X\sim P(2)$ and $P(X>3)=0.1429$.
  2. Naming $Y$ to the number of shifts with no requests, $Y\sim B(3,0.1353)$ and $P(Y>0)=0.3535$.

Question 5

The prevalence on certain disease in a population is 10%. A diagnosis test for that disease has a sensitivity of 95% and a specificity of 85%.

  1. Compute the positive and negative predictive values and explain the result obtained. What is the test more useful for, to detect the disease or to rule it out?
  2. What should be the specificity of the test so that the test has a positive predictive value equal to 80%?

  1. $PPV=P(D|+)=0.413$ and $NPV=P(\overline D|-)=0.9935$.
  2. The specificity should be $97.37%$.

Question 6

In a study of blood pressure on 8000 individuals, it has been recorded that 2254 people show readings of blood pressure above 130 mmHg, and 3126 individuals show readings between 110 and 130 mmHg. Assume that blood pressure is normally distributed.

  1. Compute the mean and standard deviation (of blood pressure).
  2. Readings above 140 mmHg are considered to be a high pressure problem. How many people in the group have such pressure problem?
  3. A test will flag a blood pressure problem if the reading of a patient pressure is in the bottom 5% or in the top 5% of the results for the population. For what values of the blood pressure is an individual in the population considered normal?

  1. Naming $X$ to the blood pressure, $X\sim N(118.723, 19.5221)$.
  2. $P(X>140)=0.1379$ and there are $1103.0473$ persons with high pressure.
  3. The blood pressure is normal in the interval $(86.612, 150.8341)$.

Question 7

Students in a Chemistry class need to take two exams in order to pass the subject. The percentage of students that passed the midterm were 60% for the first exam, and 68% for the second. We also have that 80% of the students that passed the first midterm also passed the second midterm. A student from the class is picked randomly.

  1. Compute the probability that the student has failed both exams.
  2. Compute the probability that the student has passed the first exam if we know that she has failed the second exam.

Naming $E_1$ tho the event of passing the first exam and $E_2$ to the event of passing the second exam:

  1. $P(\overline E_1\cap \overline E_2)=0.2$.
  2. $P(E_1|\overline E_2)=0.375$.
Previous
Next