Problems of Linear Regression

Exercise 1

Give some examples of:

  1. Unrelated variables.
  2. Variables with an increasing relation.
  3. Variables with a decreasing relation.

  1. The daily average temperature and the daily number of births in a city.
  2. The hours spent preparing for an exam and the score obtained.
  3. The weight of a person and the time required to run 100 meters.

Exercise 2

In a study of the effect of different doses of a drug, 2 patients received 2 mg and took 5 days to recover, 4 patients received 2 mg and took 6 days, 2 patients received 3 mg and took 3 days, 4 patients received 3 mg and took 5 days, 1 patient received 3 mg and took 6 days, 5 patients received 4 mg and took 3 days, and 2 patients received 4 mg and took 5 days.

  1. Construct the joint frequency table.
  2. Get the marginal frequency distributions and compute the main statistics for each variable.
  3. Compute the covariance and interpret it.

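  1. Joint frequency table (dose in mg, days to recover):

| Dose \ Days | 3 | 5 | 6 |
|-------------|---|---|---|
| 2           | 0 | 2 | 4 |
| 3           | 2 | 4 | 1 |
| 4           | 5 | 2 | 0 |

  2. Joint frequency table with the marginal distributions:

| Dose \ Days | 3 | 5 | 6 | Sum |
|-------------|---|---|---|-----|
| 2           | 0 | 2 | 4 | 6   |
| 3           | 2 | 4 | 1 | 7   |
| 4           | 5 | 2 | 0 | 7   |
| Sum         | 7 | 8 | 5 | 20  |

    Dose: $\bar x=3.05$ mg, $s_x^2=0.6475$ mg$^2$, $s_x=0.8047$ mg.
    Days: $\bar y=4.55$ days, $s_y^2=1.4475$ days$^2$, $s_y=1.2031$ days.
  3. $s_{xy}=-0.6775$ mg·days. The covariance is negative, so the linear relation is decreasing: higher doses are associated with fewer days to recover.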
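
The computations in this exercise can be checked with a short script. What follows is a minimal sketch in plain Python, assuming the joint frequencies from the statement are stored in a dictionary keyed by (dose, days). The names `joint`, `dose_freq`, `days_freq` and `mean_and_variance` are illustrative and not part of the original exercise.

```python
# Joint frequencies from the statement: (dose in mg, days to recover) -> patients.
joint = {
    (2, 5): 2, (2, 6): 4,
    (3, 3): 2, (3, 5): 4, (3, 6): 1,
    (4, 3): 5, (4, 5): 2,
}

n = sum(joint.values())

# Marginal frequency distributions of dose and days.
dose_freq, days_freq = {}, {}
for (dose, days), count in joint.items():
    dose_freq[dose] = dose_freq.get(dose, 0) + count
    days_freq[days] = days_freq.get(days, 0) + count


def mean_and_variance(freq, n):
    """Mean and population variance of a frequency distribution."""
    mean = sum(value * count for value, count in freq.items()) / n
    variance = sum(value**2 * count for value, count in freq.items()) / n - mean**2
    return mean, variance


mean_x, var_x = mean_and_variance(dose_freq, n)
mean_y, var_y = mean_and_variance(days_freq, n)

# Covariance: mean of the products minus the product of the means.
cov_xy = sum(x * y * count for (x, y), count in joint.items()) / n - mean_x * mean_y

print("marginal of dose:", dose_freq)
print("marginal of days:", days_freq)
print("dose mean, variance:", round(mean_x, 4), round(var_x, 4))
print("days mean, variance:", round(mean_y, 4), round(var_y, 4))
print("covariance:", round(cov_xy, 4))
```

Note that the covariance comes out negative (about -0.6775 mg·days), which is what makes the relation between dose and recovery time a decreasing one.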

Exercise 3

The table below shows the two-dimensional frequency distribution of a sample of 80 people in a study of the relation between blood cholesterol (X), in mg/dl, and high blood pressure (Y), in mmHg.

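| X \ Y     | [110,130) | [130,150) | [150,170) | $n_x$ |
|-----------|-----------|-----------|-----------|-------|
| [170,190) |           | 4         |           | 12    |
| [190,210) | 10        | 12        | 4         |       |
| [210,230) | 7         |           | 8         |       |
| [230,250) | 1         |           |           | 18    |
| $n_y$     |           | 30        | 24        |       |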

  1. Complete the table.
  2. Construct the linear regression model of cholesterol on pressure.
  3. Use the linear model to calculate the expected cholesterol for a person with pressure 160 mmHg.
  4. According to the linear model, what is the expected pressure for a person with cholesterol 270 mg/dl?

Use the following sums: $\sum x_i=16960$ mg/dl, $\sum y_j=11160$ mmHg, $\sum x_i^2=3627200$ (mg/dl)$^2$, $\sum y_j^2=1576800$ mmHg$^2$ and $\sum x_iy_j=2378800$ mg/dl·mmHg.

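  1. Completed table:

| X \ Y     | [110,130) | [130,150) | [150,170) | $n_x$ |
|-----------|-----------|-----------|-----------|-------|
| [170,190) | 8         | 4         | 0         | 12    |
| [190,210) | 10        | 12        | 4         | 26    |
| [210,230) | 7         | 9         | 8         | 24    |
| [230,250) | 1         | 5         | 12        | 18    |
| $n_y$     | 26        | 30        | 24        | 80    |

  2. $\bar x=212$ mg/dl, $s_x^2=396$ (mg/dl)$^2$.
    $\bar y=139.5$ mmHg, $s_y^2=249.75$ mmHg$^2$.
    $s_{xy}=161$ mg/dl·mmHg.
    Regression line of cholesterol on blood pressure: $x=122.0721+0.6446y$.
  3. $x(160)=225.2152$ mg/dl.
  4. Regression line of blood pressure on cholesterol: $y=53.3081+0.4066x$.
    $y(270)=163.0808$ mmHg.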
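
The sums given in the statement are consistent with representing each interval by its midpoint (class mark). The sketch below, in plain Python, rebuilds those sums from the completed table and then fits both regression lines. The use of midpoints is an assumption, and the variable names are illustrative.

```python
# Class marks (interval midpoints), assumed as representatives of each class.
x_marks = [180, 200, 220, 240]   # cholesterol, mg/dl
y_marks = [120, 140, 160]        # blood pressure, mmHg

# Completed joint frequency table: rows follow x_marks, columns follow y_marks.
table = [
    [8, 4, 0],
    [10, 12, 4],
    [7, 9, 8],
    [1, 5, 12],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(row[j] for row in table) for j in range(len(y_marks))]
n = sum(row_totals)

# Frequency-weighted sums.
sum_x = sum(x * f for x, f in zip(x_marks, row_totals))
sum_y = sum(y * f for y, f in zip(y_marks, col_totals))
sum_x2 = sum(x**2 * f for x, f in zip(x_marks, row_totals))
sum_y2 = sum(y**2 * f for y, f in zip(y_marks, col_totals))
sum_xy = sum(x * y * row[j]
             for x, row in zip(x_marks, table)
             for j, y in enumerate(y_marks))

mean_x, mean_y = sum_x / n, sum_y / n
var_x = sum_x2 / n - mean_x**2
var_y = sum_y2 / n - mean_y**2
cov_xy = sum_xy / n - mean_x * mean_y

# Cholesterol on pressure: x = a_xy + b_xy * y.
b_xy = cov_xy / var_y
a_xy = mean_x - b_xy * mean_y

# Pressure on cholesterol: y = a_yx + b_yx * x.
b_yx = cov_xy / var_x
a_yx = mean_y - b_yx * mean_x

chol_at_160 = a_xy + b_xy * 160
pressure_at_270 = a_yx + b_yx * 270

print(round(chol_at_160, 4), "mg/dl")
print(round(pressure_at_270, 4), "mmHg")
```

Each prediction uses the line whose dependent variable is the quantity being predicted: cholesterol on pressure for part 3, pressure on cholesterol for part 4.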

Exercise 4

A research study has been conducted to determine the loss of activity of a drug. The table below shows the results of the experiment.

| Time (in years) | 1  | 2  | 3  | 4  | 5  |
|-----------------|----|----|----|----|----|
| Activity (%)    | 96 | 84 | 70 | 58 | 52 |

  1. Construct the linear regression model of activity on time.
  2. According to the linear model, when will the activity be 80%? When will the drug have lost all activity?

  1. $\bar x=3$ years, $s_x^2=2$ years$^2$.
    $\bar y=72$ %, $s_y^2=264$ %$^2$.
    $s_{xy}=-22.8$ years·%.
    Regression line of activity on time: $y=106.2-11.4x$.
  2. Regression line of time on activity: $x=9.2182-0.0864y$.
    $x(80)=2.3091$ years and $x(0)=9.2182$ years.
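
As a check, both lines can be fitted directly from the five data pairs. This is a minimal sketch in plain Python; `fit_line` is a hypothetical helper written for this example, not a library function.

```python
time = [1, 2, 3, 4, 5]             # years
activity = [96, 84, 70, 58, 52]    # %


def fit_line(x, y):
    """Least-squares line of y on x, returned as (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum(v**2 for v in x) / n - mean_x**2
    cov_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


# Activity on time: y = a + b * x (the slope is negative: activity is lost).
a, b = fit_line(time, activity)

# Time on activity: x = c + d * y, used to predict time from activity.
c, d = fit_line(activity, time)

time_at_80 = c + d * 80   # activity of 80 %
time_at_0 = c             # activity of 0 %

print(f"y = {a:.1f} + ({b:.1f}) x")
print(f"x = {c:.4f} + ({d:.4f}) y")
print(round(time_at_80, 4), round(time_at_0, 4))
```

The prediction for 0% activity lies well outside the observed range of 52% to 96%, so it is an extrapolation and should be read with more caution than the prediction for 80%.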

Exercise 5

A basketball team is testing a new stretching program to reduce injuries during the league season. The data below show the number of minutes spent daily on stretching exercises and the number of injuries over the season.

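| Stretching minutes | 0 | 30 | 10 | 15 | 5 | 25 | 35 | 40 |
|--------------------|---|----|----|----|---|----|----|----|
| Injuries           | 4 | 1  | 2  | 2  | 3 | 1  | 0  | 1  |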

  1. Construct the regression line of the number of injuries on the time of stretching.
  2. By how much does the number of injuries decrease for each additional minute of stretching?
  3. How many minutes of stretching are required to have no injuries? Is this prediction reliable?

Use the following sums (X = number of minutes stretching, Y = number of injuries): $\sum x_i=160$ min, $\sum y_j=14$ injuries, $\sum x_i^2=4700$ min$^2$, $\sum y_j^2=36$ injuries$^2$ and $\sum x_iy_j=160$ min·injuries.

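  1. $\bar x=20$ min, $s_x^2=187.5$ min$^2$.
    $\bar y=1.75$ injuries, $s_y^2=1.4375$ injuries$^2$.
    $s_{xy}=-15$ min·injuries.
    Regression line of injuries on stretching time: $y=3.35-0.08x$.
  2. Each additional minute of stretching reduces the expected number of injuries by 0.08.
  3. Regression line of stretching time on injuries: $x=38.2609-10.4348y$.
    $x(0)=38.2609$ min.
    $r^2=0.8348$, so the linear model explains about 83% of the variability and the prediction is reasonably reliable, although the sample is small.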
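
The number of minutes is predicted with the line of X on Y rather than by solving the line of Y on X for $y=0$. The sketch below, in plain Python with illustrative variable names, computes both from the sums above to show that they differ. The two regression lines coincide only when $r^2=1$.

```python
n = 8  # number of data pairs in the table
sum_x, sum_y = 160, 14
sum_x2, sum_y2, sum_xy = 4700, 36, 160

mean_x, mean_y = sum_x / n, sum_y / n
var_x = sum_x2 / n - mean_x**2
var_y = sum_y2 / n - mean_y**2
cov_xy = sum_xy / n - mean_x * mean_y

# Injuries on minutes: y = a_yx + b_yx * x.
b_yx = cov_xy / var_x
a_yx = mean_y - b_yx * mean_x

# Minutes on injuries: x = a_xy + b_xy * y.
b_xy = cov_xy / var_y
a_xy = mean_x - b_xy * mean_y

# Minutes for zero injuries, computed two ways.
minutes_from_x_on_y = a_xy              # x(0) on the line of X on Y
minutes_from_inverting = -a_yx / b_yx   # solving 0 = a_yx + b_yx * x

# Coefficient of determination as the product of the two slopes.
r2 = b_yx * b_xy

print(round(minutes_from_x_on_y, 4))
print(round(minutes_from_inverting, 4))
print(round(r2, 4))
```

The line of X on Y minimizes the errors in the predicted minutes, which is why it is the one used when minutes are the quantity being predicted.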

Exercise 6

For two variables X and Y we have:

  • The regression line of Y on X is $y-x-2=0$.
  • The regression line of X on Y is $y-4x+22=0$.

Calculate:

  1. The means $\bar x$ and $\bar y$.
  2. The correlation coefficient.

  1. $\bar x=8$ and $\bar y=10$.
  2. $r=0.5$.
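
A worked derivation may help here. It reads the two lines as $y-x-2=0$ (Y on X) and $y-4x+22=0$ (X on Y), and relies on two standard facts: both regression lines pass through $(\bar x,\bar y)$, and $r^2$ is the product of the two regression slopes. The symbols $b_{yx}$ and $b_{xy}$ for the slopes are notation introduced for this sketch.

```latex
\begin{align*}
&\text{Intersection: } y = x + 2,\quad y = 4x - 22
  \;\Rightarrow\; x + 2 = 4x - 22
  \;\Rightarrow\; \bar x = 8,\ \bar y = 10. \\
&\text{Y on X: } y = x + 2
  \;\Rightarrow\; b_{yx} = \frac{s_{xy}}{s_x^2} = 1. \\
&\text{X on Y: } x = \tfrac{1}{4}\,y + \tfrac{22}{4}
  \;\Rightarrow\; b_{xy} = \frac{s_{xy}}{s_y^2} = \tfrac{1}{4}. \\
&r^2 = b_{yx}\, b_{xy} = \tfrac{1}{4}
  \;\Rightarrow\; r = +\sqrt{\tfrac{1}{4}} = 0.5 .
\end{align*}
```

The positive root is taken because both slopes are positive, and $r$ has the same sign as the covariance.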

Exercise 7

The means of two variables X and Y are $\bar x=2$ and $\bar y=1$, and the correlation coefficient is 0.

  1. Predict the value of Y for x=10.
  2. Predict the value of X for y=5.
  3. Plot both regression lines.

  1. y(10)=1.
  2. x(5)=2.
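
The reasoning behind both predictions can be written out as follows. This is a sketch using the usual point-slope form of the regression lines; it assumes both variances are non-zero.

```latex
\begin{align*}
r = \frac{s_{xy}}{s_x s_y} = 0 \quad &\Rightarrow\quad s_{xy} = 0, \\
\text{Y on X:}\quad y &= \bar y + \frac{s_{xy}}{s_x^2}\,(x - \bar x) = \bar y = 1, \\
\text{X on Y:}\quad x &= \bar x + \frac{s_{xy}}{s_y^2}\,(y - \bar y) = \bar x = 2 .
\end{align*}
```

So the line of Y on X is the horizontal line $y=1$ and the line of X on Y is the vertical line $x=2$. They cross at right angles at the point of means $(2,1)$, which is what the plot in part 3 should show.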

Exercise 8

A study to determine the relation between age and physical strength produced the scatter plot below.

[Figure: scatter plot of age and physical strength]

  1. Calculate the linear coefficient of determination for the whole sample.
  2. Calculate the linear coefficient of determination for the sample of people younger than 25 years old.
  3. Calculate the linear coefficient of determination for the sample of people older than 25 years old.
  4. For which age group is the relation between age and strength stronger?

Use the following sums (X = age, Y = weight lifted):

  • Whole sample: $\sum x_i=431$ years, $\sum y_j=769$ kg, $\sum x_i^2=13173$ years$^2$, $\sum y_j^2=39675$ kg$^2$ and $\sum x_iy_j=21792$ years·kg.

  • Young people: $\sum x_i=123$ years, $\sum y_j=294$ kg, $\sum x_i^2=2339$ years$^2$, $\sum y_j^2=14418$ kg$^2$ and $\sum x_iy_j=5766$ years·kg.

  • Old people: $\sum x_i=308$ years, $\sum y_j=475$ kg, $\sum x_i^2=10834$ years$^2$, $\sum y_j^2=25257$ kg$^2$ and $\sum x_iy_j=16026$ years·kg.

  1. $\bar x=26.9375$ years, $s_x^2=97.6836$ years$^2$.
    $\bar y=48.0625$ kg, $s_y^2=169.6836$ kg$^2$.
    $s_{xy}=67.3164$ years·kg.
    $r^2=0.2734$.
  2. $\bar x=17.5714$ years, $s_x^2=25.3878$ years$^2$.
    $\bar y=42$ kg, $s_y^2=295.7143$ kg$^2$.
    $s_{xy}=85.7143$ years·kg.
    $r^2=0.9786$.
  3. $\bar x=34.2222$ years, $s_x^2=32.6173$ years$^2$.
    $\bar y=52.7778$ kg, $s_y^2=20.8395$ kg$^2$.
    $s_{xy}=-25.5062$ years·kg.
    $r^2=0.9571$.
  4. The linear relation between age and physical strength is slightly stronger in the group of young people.
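
The three coefficients of determination can be checked from the sums alone. The sketch below is in plain Python and `r_squared` is a hypothetical helper written for this example. The sample sizes (16, 7 and 9) are not stated in the exercise. They are inferred here as the values of n that reproduce the means quoted in the solution.

```python
def r_squared(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (covariance, r^2) computed from the raw sums of a sample of size n."""
    mean_x, mean_y = sum_x / n, sum_y / n
    var_x = sum_x2 / n - mean_x**2
    var_y = sum_y2 / n - mean_y**2
    cov_xy = sum_xy / n - mean_x * mean_y
    return cov_xy, cov_xy**2 / (var_x * var_y)


# (n, sum x, sum y, sum x^2, sum y^2, sum xy); n is inferred, not given:
# 431/16 = 26.9375, 123/7 = 17.5714..., 308/9 = 34.2222...
groups = {
    "whole": (16, 431, 769, 13173, 39675, 21792),
    "young": (7, 123, 294, 2339, 14418, 5766),
    "older": (9, 308, 475, 10834, 25257, 16026),
}

results = {name: r_squared(*sums) for name, sums in groups.items()}

for name, (cov, r2) in results.items():
    print(f"{name}: s_xy = {cov:.4f}, r^2 = {r2:.4f}")
```

The covariance is positive for the young group and negative for the older group. Strength increases with age in one group and decreases in the other, which explains why a single line fits the whole sample poorly even though each group is close to linear on its own.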