Date: May 19, 2016

Question 1

To check whether the recovery time from patellar tendonitis under physiotherapy treatment depends on gender, a sample of 390 patients (210 males and 180 females) was drawn and the recovery time was measured for every patient. The table below shows the frequency distribution of the recovery times.

1. Calculate the mean recovery time for males, for females and for the whole sample. Which mean is more representative, that of males or that of females? Justify the answer.
2. Which distribution is more symmetric, the distribution of the recovery time of males or that of females?
3. Compare the kurtosis of the recovery time of males and females.
4. Calculate the 80th percentile of the recovery time of males.
5. What percentage of females will have a recovery time greater than 63 days?

Use the following sums for the calculations. Males: $\sum x_in_i = 9290$ days, $\sum x_i^2n_i=474050$ days$^2$, $\sum(x_i-\bar x)^3n_i = 812271.3832$ days$^3$ and $\sum(x_i-\bar x)^4n_i = 48895722.3971$ days$^4$. Females: $\sum x_in_i = 6720$ days, $\sum x_i^2n_i=282300$ days$^2$, $\sum(x_i-\bar x)^3n_i = 347773.3333$ days$^3$ and $\sum(x_i-\bar x)^4n_i = 14802393.3333$ days$^4$.
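As a rough sketch of how the given sums feed into items 1 to 3, the snippet below derives the mean, standard deviation, coefficient of variation, skewness and excess kurtosis for each group. The function name `summary_from_sums` is a hypothetical helper, and the population formulas (dividing by $n$) are assumed. Items 4 and 5 need the frequency table and are not covered.

```python
from math import sqrt

def summary_from_sums(n, sum_x, sum_x2, sum_d3, sum_d4):
    """Descriptive statistics from grouped-data sums (hypothetical helper).

    sum_d3 and sum_d4 are the sums of cubed and fourth-power deviations
    from the mean, as listed in the problem statement.
    """
    mean = sum_x / n
    var = sum_x2 / n - mean ** 2               # population variance (assumed)
    sd = sqrt(var)
    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / abs(mean),                   # coefficient of variation
        "skewness": (sum_d3 / n) / sd ** 3,
        "kurtosis": (sum_d4 / n) / sd ** 4 - 3, # excess kurtosis
    }

males = summary_from_sums(210, 9290, 474050, 812271.3832, 48895722.3971)
females = summary_from_sums(180, 6720, 282300, 347773.3333, 14802393.3333)
overall_mean = (9290 + 6720) / (210 + 180)

for label, stats in (("males", males), ("females", females)):
    print(label, {key: round(value, 4) for key, value in stats.items()})
print("overall mean", round(overall_mean, 4))
```

Under these formulas, the group with the smaller coefficient of variation has the more representative mean, and the group whose skewness is closer to zero is the more symmetric one.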

Question 2

To check whether the recovery time from patellar tendonitis under physiotherapy treatment depends on age, a sample of 8 patients was drawn and the recovery time $Y$ (in days) and the age $X$ (in years) were measured for every patient. The table below shows the results.

| Age (years) | Recovery time (days) |
|---|---|
| 32 | 20 |
| 38 | 25 |
| 48 | 32 |
| 51 | 40 |
| 57 | 55 |
| 61 | 75 |
| 68 | 102 |
| 71 | 130 |

1. Calculate the regression line of the recovery time on the age.
2. According to the linear regression model, what is the expected age for a patient with a recovery time of 100 days?
3. Calculate the exponential regression model of the recovery time on the age.
4. Which regression model better explains the relation between the recovery time and the age, the exponential or the linear? Justify the answer.

Use the following sums for the calculations: $\sum x_i=426$, $\sum \log(x_i)=31.5425$, $\sum y_j=479$, $\sum \log(y_j)=31.1866$, $\sum x_i^2=24008$, $\sum \log(x_i)^2=124.909$, $\sum y_j^2=39603$, $\sum \log(y_j)^2=124.7374$, $\sum x_iy_j=29042$, $\sum x_i\log(y_j)=1724.5468$, $\sum \log(x_i)y_j=1956.6274$, $\sum \log(x_i)\log(y_j)=124.2263$.
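One possible way to check the regression calculations is sketched below, working from the raw data rather than from the sums. It assumes natural logarithms (which is what the listed log sums correspond to) and population covariances; `fit_line` is a hypothetical helper. For item 2 it fits the line of age on recovery time, since age is the variable being predicted there.

```python
from math import log

ages = [32, 38, 48, 51, 57, 61, 68, 71]       # X, years
times = [20, 25, 32, 40, 55, 75, 102, 130]    # Y, days

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    # Population covariance: mean of products minus product of means.
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)

def fit_line(x, y):
    """Least-squares line y = a + b*x plus its coefficient of determination."""
    b = cov(x, y) / cov(x, x)
    a = mean(y) - b * mean(x)
    r2 = cov(x, y) ** 2 / (cov(x, x) * cov(y, y))
    return a, b, r2

# Item 1: recovery time on age.
a_lin, b_lin, r2_lin = fit_line(ages, times)

# Item 2: age on recovery time, evaluated at 100 days.
a_age, b_age, _ = fit_line(times, ages)
age_at_100 = a_age + b_age * 100

# Item 3: exponential model y = exp(a + b*x), fitted as log(y) on x.
a_exp, b_exp, r2_exp = fit_line(ages, [log(y) for y in times])

print(f"linear:      y = {a_lin:.4f} + {b_lin:.4f} x,  r2 = {r2_lin:.4f}")
print(f"exponential: y = exp({a_exp:.4f} + {b_exp:.4f} x),  r2 = {r2_exp:.4f}")
print(f"expected age at 100 days: {age_at_100:.2f}")
```

For item 4, the model with the larger coefficient of determination explains more of the variability in the recovery time.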

Question 3

In a random sample of 500 people drawn from a population there are 20 people with an injury $A$, 40 people with another injury $B$ and 450 people with neither injury. Use relative frequencies to estimate probabilities in the following questions:

1. Calculate the probability that a person has both injuries.
2. Calculate the probability that a person has some injury.
3. Calculate the probability that a person has injury $A$ but not $B$.
4. Calculate the probability that a person has injury $A$ if he or she has injury $B$.
5. Calculate the probability that a person has injury $B$ if he or she doesn’t have injury $A$.
6. Are the injuries $A$ and $B$ dependent?
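The counts in the statement determine every probability asked for. The sketch below derives them with exact fractions, using inclusion-exclusion to obtain the number of people with both injuries. Variable names are illustrative only.

```python
from fractions import Fraction

n, n_a, n_b, n_none = 500, 20, 40, 450

n_union = n - n_none              # people with at least one injury
n_both = n_a + n_b - n_union      # inclusion-exclusion: |A and B|

p_both = Fraction(n_both, n)                       # P(A and B)
p_some = Fraction(n_union, n)                      # P(A or B)
p_a_not_b = Fraction(n_a - n_both, n)              # P(A and not B)
p_a_given_b = Fraction(n_both, n_b)                # P(A | B)
p_b_given_not_a = Fraction(n_b - n_both, n - n_a)  # P(B | not A)

# A and B are independent only if P(A | B) equals P(A).
dependent = p_a_given_b != Fraction(n_a, n)

print(p_both, p_some, p_a_not_b, p_a_given_b, p_b_given_not_a, dependent)
```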

Question 4

The level of severity $X$ of an injury is classified on a scale from 1 to 5, from low to high severity. The probability distribution of $X$ in a population is plotted below.

1. Calculate and plot the distribution function.
2. Calculate the following probabilities: $P(X\leq 2)$, $P(X>3)$, $P(X=4.2)$ and $P(1<X\leq 4.2)$.
3. Calculate the mean and the standard deviation of $X$. Is the mean representative?
4. If a level of severity of 0.05 is considered incurable, what is the probability of having some person with an incurable injury in a sample of 10 persons with the injury?
5. If there are 6 persons injured per month on average, what is the probability of having more than 2 persons injured? What is the probability of having more than 1 person injured with an incurable injury?
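Items 4 and 5 can be approached with a binomial and a Poisson model respectively; the sketch below is one way to set that up. Several things are assumptions here: `P_INCURABLE = 0.05` reads the 0.05 in item 4 as the probability that an injury is incurable (the plotted distribution is not reproduced, so replace it with the value read from the plot if that differs), the number of injured persons is taken to be Poisson with mean 6 over a one-month period, and incurable cases are taken to be Poisson with mean $6p$.

```python
from math import exp, factorial

def binom_at_least_one(n, p):
    """P(at least one success in n independent trials with probability p)."""
    return 1 - (1 - p) ** n

def poisson_more_than(k, lam):
    """P(X > k) for X ~ Poisson(lam)."""
    cdf = sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))
    return 1 - cdf

P_INCURABLE = 0.05   # assumption: probability that an injury is incurable
MONTHLY_MEAN = 6     # average number of injured persons per month

# Item 4: some incurable injury among 10 injured persons.
p_some_incurable = binom_at_least_one(10, P_INCURABLE)

# Item 5: more than 2 injured persons in a month (assumed period).
p_more_than_2 = poisson_more_than(2, MONTHLY_MEAN)

# Item 5: more than 1 incurable injury in a month, thinning the Poisson rate.
p_more_than_1_incurable = poisson_more_than(1, MONTHLY_MEAN * P_INCURABLE)

print(round(p_some_incurable, 4),
      round(p_more_than_2, 4),
      round(p_more_than_1_incurable, 4))
```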

Question 5

A diagnostic test to determine doping of athletes returns a positive outcome when the concentration of a substance in blood is greater than 4 $\mu$g/ml. If the concentration of the substance in doped athletes follows a normal distribution with mean 4.5 $\mu$g/ml and standard deviation 0.2 $\mu$g/ml, and in non-doped athletes it follows a normal distribution with mean 3 $\mu$g/ml and standard deviation 0.3 $\mu$g/ml,

1. what are the sensitivity and specificity of the test?
2. If 10% of the athletes in a competition are doped, what are the positive and negative predictive values of the test?
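A minimal sketch of the calculation, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later): sensitivity is the probability that a doped athlete exceeds the 4 $\mu$g/ml threshold, specificity is the probability that a non-doped athlete does not, and the predictive values follow from Bayes' theorem with a prevalence of 0.10. Variable names are illustrative.

```python
from statistics import NormalDist

THRESHOLD = 4.0                           # positive if concentration > 4 ug/ml
doped = NormalDist(mu=4.5, sigma=0.2)     # concentration in doped athletes
clean = NormalDist(mu=3.0, sigma=0.3)     # concentration in non-doped athletes
prevalence = 0.10                         # proportion of doped athletes

sensitivity = 1 - doped.cdf(THRESHOLD)    # P(positive | doped)
specificity = clean.cdf(THRESHOLD)        # P(negative | not doped)

# Bayes' theorem for the predictive values.
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
ppv = prevalence * sensitivity / p_positive                # P(doped | positive)
npv = (1 - prevalence) * specificity / (1 - p_positive)    # P(not doped | negative)

print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")
```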