Pharmacy exam 2018-01-19

Degrees: Pharmacy, Biotechnology
Date: January 19, 2018

Question 1

A study of a group of elderly people, carried out to determine the relation between age $X$ and the number of visits to the doctor $Y$, gave the following results:

$$
\begin{array}{lrrrrrrrr}
\hline
\mbox{Age} & 62 & 65 & 71 & 79 & 83 & 88 & 90 & 95\newline
\mbox{No. of visits} & 2 & 3 & 4 & 6 & 6 & 8 & 10 & 14\newline
\hline
\end{array}
$$

1. Estimate the number of times a 70-year-old patient will visit the doctor, according to a linear regression model.
2. What is the estimate if an exponential model is used instead of the linear one? Which of the two estimates is more reliable?
3. A power model has an equation of the form $Y=aX^b$, where $a$ and $b$ are constants to be determined. What transformation should be applied to the variables $X$ and $Y$ to turn a power model into a linear one?

Use the following sums for the computations (here $\log$ denotes the natural logarithm): $\sum x_i=633$, $\sum \log(x_i)=34.8835$, $\sum y_j=53$, $\sum \log(y_j)=13.7827$, $\sum x_i^2=51109$, $\sum \log(x_i)^2=152.28$, $\sum y_j^2=461$, $\sum \log(y_j)^2=26.6206$, $\sum x_iy_j=4509$, $\sum x_i\log(y_j)=1144.0108$, $\sum \log(x_i)y_j=235.1289$, $\sum \log(x_i)\log(y_j)=60.7921$.

Solution

Linear model of Visits on Age: $\bar x=79.125$ years, $s_x^2=127.8594$ years². $\bar y=6.625$ visits, $s_y^2=13.7344$ visits². $s_{xy}=39.4219$ years⋅visits. Regression line of Visits on Age: $y=-17.771 + 0.3083x$. $y(70)=3.8116$ visits.

Exponential model of Visits on Age: $\overline{\log(y)}=1.7228$ log(visits), $s_{\log(y)}^2=0.3594$ log(visits)². $s_{x\log(y)}=6.6823$ years⋅log(visits). Fitted model: $y=e^{-2.4124 + 0.0523x}$. $y(70)=3.4762$ visits.

The linear coefficient of determination of Visits on Age is $r^2=0.885$ and the exponential one is $r^2=0.9716$. Since the exponential model has the higher coefficient of determination, it explains the number of visits as a function of age somewhat better, and its estimate is the more reliable of the two.
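The linear and exponential fits above can be reproduced directly from the sums given in the statement. The following is a minimal Python sketch, not part of the original exam; the helper name `fit_line` and the variable names are illustrative assumptions. It fits the exponential model as a straight line of $\log(y)$ on $x$, and its results should agree with the values quoted above up to rounding of the sums.

```python
import math

# Sums taken from the exam statement (n = 8 patients; log is the natural logarithm).
n = 8
sum_x, sum_x2 = 633, 51109
sum_y, sum_y2 = 53, 461
sum_xy = 4509
sum_logy, sum_logy2 = 13.7827, 26.6206
sum_xlogy = 1144.0108


def fit_line(n, sx, sxx, sy, syy, sxy):
    """Hypothetical helper: least-squares line of y on x from raw sums.

    Returns (intercept, slope, r2), using population variances and covariance.
    """
    mean_x, mean_y = sx / n, sy / n
    var_x = sxx / n - mean_x ** 2
    var_y = syy / n - mean_y ** 2
    cov = sxy / n - mean_x * mean_y
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    r2 = cov ** 2 / (var_x * var_y)
    return intercept, slope, r2


# Linear model: y = a + b x
a_lin, b_lin, r2_lin = fit_line(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)
# Exponential model: log(y) = a + b x, so y = exp(a + b x)
a_exp, b_exp, r2_exp = fit_line(n, sum_x, sum_x2, sum_logy, sum_logy2, sum_xlogy)

visits_linear = a_lin + b_lin * 70
visits_exponential = math.exp(a_exp + b_exp * 70)

print(f"linear:      y(70) = {visits_linear:.4f}, r2 = {r2_lin:.4f}")
print(f"exponential: y(70) = {visits_exponential:.4f}, r2 = {r2_exp:.4f}")
```

Working from the sums rather than the raw data mirrors how the exam expects the computation to be done by hand. Small differences in the last decimal are to be expected, because the sums in the statement are themselves rounded.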
We must apply the logarithm to both Visits and Age: $\log(Y)=\log(aX^b)=\log(a)+\log(X^b)=\log(a)+b\log(X)=a'+b\log(X)$, where $a'=\log(a)$. Thus $\log(Y)$ is a linear function of $\log(X)$.

Question 2

The grass pollen concentration in the center of a city, in grains/m$^3$ of air, during the last year is given in the following table:

$$
\begin{array}{cr}
\hline
\mbox{Pollen concentration} & \mbox{No. of days}\newline
0-300 & 51\newline
300-500 & 60\newline
500-600 & 79\newline
600-800 & 91\newline
800-1000 & 60\newline
1000-1300 & 24\newline
\hline
\end{array}
$$

1. Health authorities have determined that the pollen level did not pose a risk on 75% of the days of the year. What is the minimum pollen level that is considered a health hazard?
2. On days with a pollen level between 575 and 860, health authorities issue a warning to citizens. On how many days of the last year were warnings issued?
3. Are there outliers in the above sample?
4. Platanaceae has a pollen cycle similar to that of grass: if $X$ is the pollen level of grass and $Y$ is the pollen level of platanaceae, it is known that $Y=0.5X-100$. What is the average pollen level for platanaceae? Which of the two averages is more representative?
5. Can one say that the level of grass pollen comes from a normally distributed population?

Use the following sums for the computations: $\sum x_i=220400$ grains/m$^3$, $\sum x_i^2=159575000$ (grains/m$^3$)$^2$, $\sum (x_i-\bar x)^3=261917220.867$ (grains/m$^3$)$^3$ and $\sum (x_i-\bar x)^4=4872705679772.61$ (grains/m$^3$)$^4$.

Solution

$P_{75}=784.0417$ grains/m³, so pollen levels above this value are considered a health hazard.

$F(575)=0.4664$ and $F(860)=0.8192$, so the relative frequency of days with a warning is $0.3528$, which corresponds to $128.77$ days, that is, about 129 days.

$Q_1=434.1849$ grains/m³, $Q_3=784.0417$ grains/m³ and $IQR=349.8568$ grains/m³. Fences: $F_1=-90.6001$ grains/m³ and $F_2=1308.8269$ grains/m³. Since all the values fall within the fences, there are no outliers.
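The number of warning days can be checked by interpolating the cumulative frequency linearly inside each class of the table. The sketch below is an illustration in Python, not part of the original exam; the function name `cumulative_days` is a hypothetical helper, and the code assumes that days are spread uniformly within each class.

```python
# Class limits (grains/m^3) and number of days per class, from the table above.
limits = [0, 300, 500, 600, 800, 1000, 1300]
days = [51, 60, 79, 91, 60, 24]
n = sum(days)  # total number of days in the year


def cumulative_days(x):
    """Days with pollen level <= x, assuming a uniform spread inside each class."""
    total = 0.0
    for lower, upper, count in zip(limits, limits[1:], days):
        if x >= upper:
            total += count  # whole class lies below x
        elif x > lower:
            total += count * (x - lower) / (upper - lower)  # partial class
    return total


f_575 = cumulative_days(575) / n
f_860 = cumulative_days(860) / n
warning_days = cumulative_days(860) - cumulative_days(575)

print(f"F(575) = {f_575:.4f}, F(860) = {f_860:.4f}, warning days = {warning_days:.2f}")
```

Computed without intermediate rounding, the count is 128.75 days. The figure of 128.77 quoted in the solution comes from multiplying the rounded relative frequency 0.3528 by 365, so both lead to the same answer of about 129 days.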
$\bar x=603.8356$ grains/m³, $s_x^2=72574.3291$ (grains/m³)², $s_x=269.3962$ grains/m³ and $cv_x=0.4461$. Since $Y=0.5X-100$, $\bar y=0.5\bar x-100=201.9178$ grains/m³, $s_y=0.5s_x=134.6981$ grains/m³ and $cv_y=0.6671$. The mean of $X$ is more representative than the mean of $Y$ because $cv_x<cv_y$.

Coefficient of skewness $g_1=0.0367$ and coefficient of kurtosis $g_2=-0.4654$. As both of them are between -2 and 2, we can assume that the pollen concentrations are normally distributed.

Question 3

The pollen level in Madrid in the year 2017 is normally distributed with a mean of 90 particles per cubic meter. On 42 days of 2017, the level was above 120 particles per cubic meter.

1. Compute the standard deviation of the pollen level in the year 2017.
2. On how many days did the pollen level not exceed 50 particles per cubic meter of air?
3. On 20% of the days the pollen level was high enough to pose a health risk for allergic people. What is the pollen level that triggers this high-risk situation?

Solution

Let $X$ be the pollen level in Madrid in 2017. $X\sim N(90,\sigma)$.

$P(X>120)=42/365=0.1151$, which corresponds to a standardized value of $z=1.2$, so $\sigma=(120-90)/1.2=25$ grains/m³.

$P(X\leq 50)=0.0548$, which corresponds to $20.0017$ days, that is, about 20 days.

$P_{80}=111.0405$ grains/m³.

Question 4

A study of two drugs for reducing blood cholesterol levels shows that drug $A$ is effective in 75% of people and drug $B$ is effective in 85% of people. Neither drug works for 5% of people.

1. Compute the percentage of the population for which only drug $A$ works.
2. Assume that drug $A$ works for a person. What is the probability that drug $B$ will also work for that person?
3. On the other hand, if drug $B$ has not worked for a person, what is the probability that drug $A$ will work?
4. Are the effects of the two drugs independent events?

Solution

$P(A\cup B)=1-0.05=0.95$, so $P(A\cap B)=0.75+0.85-0.95=0.65$ and $P(A\cap \overline B)=0.75-0.65=0.1$, that is, 10% of the population.

$P(B|A)=0.8667$.

$P(A|\overline B)=0.6667$.

$P(B|A)\neq P(B)$, thus the events are dependent.

Question 5

The average number of births per week in a hospital is 14.

1. Compute the probability that more than 2 births take place on a given day.
2. Compute the probability that, during a week, there is more than one day without births.

Solution

Let $X$ be the number of births in a day. $X\sim P(2)$. $P(X>2)=0.3233$.

Let $Y$ be the number of days without births in a week. $Y\sim B(7,0.1353)$, where $0.1353=P(X=0)$. $P(Y>1)=0.2427$.

Question 6

A diagnostic test for a disease is being trialed on 250 people, of whom 50 have the disease and 200 are healthy. The medical team in charge of the trial wants the test to have a positive predictive value of $0.7$ and a negative predictive value of $0.9$.

1. To achieve these values, how many of the healthy people should get a positive outcome in the test? And how many of the sick people should get a negative outcome?
2. What is the probability that a person with two positive outcomes in the test has the disease?

Solution

Let $D$ be the event of having the disease.

$P(+|\overline{D})=0.0625\Rightarrow 12.5$ persons. $P(-|D)=0.4167\Rightarrow 20.83$ persons.

$P(D|+\cap +)=0.9561$.
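The values for the diagnostic test can be checked by writing the two predictive-value conditions as a linear system in the number of true positives and false positives, and then applying Bayes' theorem for the two positive outcomes. The Python sketch below is only an illustration, not part of the original exam; the variable names are assumptions, and it assumes that the two test results are independent given the person's disease status.

```python
from fractions import Fraction

sick, healthy = 50, 200
ppv, npv = Fraction(7, 10), Fraction(9, 10)

# Unknowns: tp (sick with a positive test) and fp (healthy with a positive test).
# PPV condition: tp / (tp + fp) = ppv
#   ->  (1 - ppv) * tp - ppv * fp = 0
# NPV condition: (healthy - fp) / ((healthy - fp) + (sick - tp)) = npv
#   ->  npv * tp - (1 - npv) * fp = npv * sick - (1 - npv) * healthy
a11, a12, b1 = 1 - ppv, -ppv, Fraction(0)
a21, a22, b2 = npv, -(1 - npv), npv * sick - (1 - npv) * healthy

# Solve the 2x2 system with Cramer's rule.
det = a11 * a22 - a12 * a21
tp = (b1 * a22 - a12 * b2) / det
fp = (a11 * b2 - b1 * a21) / det
fn = sick - tp  # sick people with a negative test

# Bayes' theorem for two positive outcomes
# (assumption: the two results are independent given the disease status).
prevalence = Fraction(sick, sick + healthy)
sensitivity = tp / sick            # P(+ | D)
false_positive_rate = fp / healthy  # P(+ | not D)
numerator = prevalence * sensitivity ** 2
posterior = numerator / (numerator + (1 - prevalence) * false_positive_rate ** 2)

print(f"healthy with positive test: {float(fp):.4f}")
print(f"sick with negative test:    {float(fn):.4f}")
print(f"P(D | two positives):       {float(posterior):.4f}")
```

Exact fractions are used so that rounding does not affect the result. The non-integer head counts simply reflect that the required predictive values cannot be met exactly with whole numbers of people in a sample of this size.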