In [1]:
#@author: Venky Rao raove@us.ibm.com
#@last edited: 14 Oct 2017
#@source: materials, data and examples adapted from R in Action 2nd Edition by Dr. Robert Kabacoff

Regression in R

In [2]:
#in many ways, regression analysis lives at the heart of statistics
#it is a broad term for a set of methodologies used to predict a response variable
#(also called a dependent, criterion or outcome variable) from one or more
#predictor variables (also called independent or explanatory variables)

Ordinary Least Squares (OLS) regression

In [3]:
#in OLS regression, a quantitative dependent variable is predicted from a weighted sum of
#predictor variables, where the weights are parameters estimated from the data.
#OLS regression is the most common variety of statistical analysis today.
#the methods that fall under the rubric of OLS regression include simple linear regression,
#polynomial regression and multiple linear regression.
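In [ ]:
#a quick reference (not from the book) for the R formula syntax used by lm() throughout this notebook:
# y ~ x1 + x2           #predict y from x1 and x2
# y ~ x1 + x2 + x1:x2   #also include the interaction between x1 and x2
# y ~ x1 + I(x1 ^ 2)    #wrap arithmetic in I() so that ^ means exponentiation rather than a formula operator
#fitted models are then inspected with functions such as summary(), coefficients(), fitted(),
#residuals(), predict() and confint(), all of which appear in the cells below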

Simple linear regression

In [1]:
#we use the dataset "women" in the base installation of R which provides the height
#and weight of 15 women of ages 30 to 39.  The following code calculates an equation
#for predicting weight from height.
#let's review the dataset first
women
   height weight
       58    115
       59    117
       60    120
       61    123
       62    126
       63    129
       64    132
       65    135
       66    139
       67    142
       68    146
       69    150
       70    154
       71    159
       72    164
In [9]:
#fit a simple linear regression model
fit <- lm(weight ~ height, data = women)
#display results of the fitted model
summary(fit)
Call:
lm(formula = weight ~ height, data = women)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7333 -1.1333 -0.3833  0.7417  3.1167 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
height        3.45000    0.09114   37.85 1.09e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared:  0.991,	Adjusted R-squared:  0.9903 
F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
In [11]:
#because a height of zero is impossible, you would not try to give a physical interpretation
#to the intercept

#from the Pr(>|t|) column, you see that the regression coefficient (3.45) is significantly
#different from zero (p < 0.001) and indicates that there is an expected increase of 3.45
#pounds of weight for every 1 inch increase in height.
#1.091e-14 = 1.091 * 10^-14, i.e. effectively zero
#let us understand this further:
#let us begin with a null hypothesis: there is no linear relationship between height and weight
#(i.e. the true coefficient for height is zero)
#the p-value is the probability of obtaining a coefficient estimate at least this far from zero
#if that null hypothesis were true
#because the p-value < 0.001, you reject the null hypothesis of no linear relationship between height and weight

#the multiple R-squared (0.991) indicates that the model accounts for 99.1% of the variance in weight
#the multiple R-squared is the squared correlation between the actual and the predicted values

#the residual standard error (1.53 pounds) can be thought of as the average error in predicting
#weight from height using this model

#the F-statistic tests whether the predictor variables, taken together, predict the response variable
#above chance levels.  because there is only one predictor variable in simple regression, as in this
#example, the F-statistic is equivalent to the t-test for the regression coefficient for height
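In [ ]:
#a minimal sketch (not from the book) verifying some of the summary() quantities by hand;
#it assumes the "fit" object created above is still in memory
cor(women$weight, fitted(fit)) ^ 2 #squared correlation between actual and predicted values = multiple R-squared (0.991)
sqrt(sum(residuals(fit) ^ 2) / df.residual(fit)) #residual standard error (about 1.525 pounds on 13 degrees of freedom)
summary(fit)$coefficients["height", "t value"] ^ 2 #the squared t value for height equals the F-statistic (about 1433)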
In [3]:
#list all the weights of women
women$weight
 [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
In [5]:
#list all the predicted values in a fitted model
round(fitted(fit))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
113 116 119 123 126 130 133 137 140 144 147 151 154 157 161 
In [7]:
#list the residual values in a fitted model
round(residuals(fit))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
 2  1  1  0  0 -1 -1 -2 -1 -2 -1 -1  0  2  3 
In [8]:
#plot height versus weight and add a line of fit
plot(women$height, women$weight,
    xlab = "Height (in inches)", # x-axis label
    ylab = "Weight (in pounds)") #y-axis label
abline(fit) #add the line of fit
In [9]:
#list the coefficients of the fitted model
coefficients(fit)
(Intercept)      height 
  -87.51667     3.45000 
In [12]:
#from the above, we can see that the prediction equation is:
#predictedWeight = -87.52 + 3.45 * Height
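In [ ]:
#a small sketch (not from the book) showing two equivalent ways to use the prediction equation;
#it assumes the "fit" object created above is still in memory
coef(fit)["(Intercept)"] + coef(fit)["height"] * 65 #plug a height of 65 inches into the equation by hand
predict(fit, newdata = data.frame(height = 65)) #let predict() do the same calculation: about 136.7 pounds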

Polynomial regression

In [3]:
#the following code shows the results of fitting a quadratic equation
fit2 <- lm(weight ~ height + I(height ^ 2), data = women)
summary(fit2)
Call:
lm(formula = weight ~ height + I(height^2), data = women)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50941 -0.29611 -0.00941  0.28615  0.59706 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
height       -7.34832    0.77769  -9.449 6.58e-07 ***
I(height^2)   0.08306    0.00598  13.891 9.32e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3841 on 12 degrees of freedom
Multiple R-squared:  0.9995,	Adjusted R-squared:  0.9994 
F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16
In [4]:
#the term "height ^ 2" adds a height-squared term to the prediction equation
#the "I" function treats the contents within the parenthesis as an R regular expression.

#from this analysis, the new prediction equation is:
# predictedWeight = 261.88 - 7.35 * Height + 0.083 * (Height ^ 2)
#both regression coefficients are significant at the p < 0.0001 level
#regression coefficient for height has Pr(>|t|) = 6.58e-07 = 6.58 * 10 ^ -7 = 0.000000658 
#regression coefficient for height ^ 2 has Pr(>|t|) = 9.32e-09 = 9.32 * 10 ^ -9 = 0.00000000932

#each p-value is the probability of obtaining a coefficient estimate at least this far from zero
#if the corresponding term had no linear effect on weight
#because both p-values are < 0.001, you reject the hypothesis that the height and height ^ 2 terms have no effect on weight

#the multiple R-squared has improved from 0.991 in the simple linear model to
#0.9995 in the quadratic model.  This shows that the amount of variance accounted for has increased (this is a good thing!)

#the significance of the squared term (t = 13.89, p < 0.001) suggests that the inclusion of the quadratic term improves model fit
#the plot below confirms this
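In [ ]:
#one way (not shown above) to test formally whether the quadratic term improves the fit is to
#compare the two nested models with anova(); this assumes "fit" and "fit2" are still in memory
anova(fit, fit2) #a significant F test indicates that adding I(height ^ 2) meaningfully reduces the residual sum of squares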
In [16]:
#plot height versus weight and add a line of fit
plot(women$height, women$weight,
    xlab = "Height (in inches)", # x-axis label
    ylab = "Weight (in pounds)") #y-axis label
lines(women$height, fitted(fit2), col = "dark green") #add the line of fit including the quadratic term
abline(fit, col = "blue") #adding the original line of fit
In [18]:
#to fit a cubic polynomial, use the following code:
fit3 <- lm(weight ~ height + I(height ^ 2) + I(height ^ 3), data = women)
summary(fit3)
#although higher polynomials are possible, terms higher than cubic are rarely necessary
Call:
lm(formula = weight ~ height + I(height^2) + I(height^3), data = women)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40677 -0.17391  0.03091  0.12051  0.42191 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -8.967e+02  2.946e+02  -3.044  0.01116 * 
height       4.641e+01  1.366e+01   3.399  0.00594 **
I(height^2) -7.462e-01  2.105e-01  -3.544  0.00460 **
I(height^3)  4.253e-03  1.079e-03   3.940  0.00231 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2583 on 11 degrees of freedom
Multiple R-squared:  0.9998,	Adjusted R-squared:  0.9997 
F-statistic: 1.679e+04 on 3 and 11 DF,  p-value: < 2.2e-16
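In [ ]:
#an equivalent, more compact way (not used above) to specify the cubic model is poly() with raw = TRUE,
#which expands height into height, height ^ 2 and height ^ 3; "fit3b" is a hypothetical name chosen
#here so that fit3 above is not overwritten
fit3b <- lm(weight ~ poly(height, 3, raw = TRUE), data = women)
coef(fit3b) #the estimates match those of fit3, only the term names differ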
In [20]:
#the scatterplot() function in the "car" package provides a simple and convenient
#method of plotting a bivariate relationship.  here's an example:
#install.packages("car") in case you need to install the car package
library(car) #load the "car" package
scatterplot(weight ~ height, data = women, #data that you want to plot
           spread = FALSE, #this option suppresses spread and asymmetry information
            smoother.args = list(lty = 2), #specifies that the loess fit be rendered as a dashed line
           pch = 19, #displays points as filled circles
           main = "Women Age 30 - 39", #main title
           xlab = "Height (inches)", #x-axis label
           ylab = "Weight (lbs.)") #y-axis label
In [21]:
#you can tell from the chart above that the distributions of the two variables are somewhat
#asymmetrical and that a curved line will fit the data points better than a straight line

#loess stands for locally weighted scatterplot smoothing.  More information can be found here:
# https://en.wikipedia.org/wiki/Local_regression
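In [ ]:
#for comparison (not from the book), a loess curve can also be added to a plain base-R scatter plot:
plot(women$height, women$weight,
    xlab = "Height (in inches)", #x-axis label
    ylab = "Weight (in pounds)") #y-axis label
lines(women$height, predict(loess(weight ~ height, data = women)), lty = 2) #dashed locally weighted (loess) fit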

Multiple linear regression

In [2]:
#when there is more than one predictor variable, simple regression becomes
#multiple linear regression.  Technically, polynomial regression is a special
#case of multiple regression

#to understand multiple linear regression, we will use the state.x77 dataset
#in the base R package and explore the relationship between a state's murder rate
#and other characteristics of the state including population, illiteracy rate,
#average income and frost levels (mean number of days below freezing)
In [56]:
#the state.x77 dataset is contained in a matrix.  Since the lm() function needs
#a data frame, we first create a data frame using the following code:
states <- as.data.frame(state.x77[,c("Murder", "Population", "Illiteracy", "Income", "Frost")])
states #print the data frame
               Murder Population Illiteracy Income Frost
Alabama          15.1       3615        2.1   3624    20
Alaska           11.3        365        1.5   6315   152
Arizona           7.8       2212        1.8   4530    15
Arkansas         10.1       2110        1.9   3378    65
California       10.3      21198        1.1   5114    20
Colorado          6.8       2541        0.7   4884   166
Connecticut       3.1       3100        1.1   5348   139
Delaware          6.2        579        0.9   4809   103
Florida          10.7       8277        1.3   4815    11
Georgia          13.9       4931        2.0   4091    60
Hawaii            6.2        868        1.9   4963     0
Idaho             5.3        813        0.6   4119   126
Illinois         10.3      11197        0.9   5107   127
Indiana           7.1       5313        0.7   4458   122
Iowa              2.3       2861        0.5   4628   140
Kansas            4.5       2280        0.6   4669   114
Kentucky         10.6       3387        1.6   3712    95
Louisiana        13.2       3806        2.8   3545    12
Maine             2.7       1058        0.7   3694   161
Maryland          8.5       4122        0.9   5299   101
Massachusetts     3.3       5814        1.1   4755   103
Michigan         11.1       9111        0.9   4751   125
Minnesota         2.3       3921        0.6   4675   160
Mississippi      12.5       2341        2.4   3098    50
Missouri          9.3       4767        0.8   4254   108
Montana           5.0        746        0.6   4347   155
Nebraska          2.9       1544        0.6   4508   139
Nevada           11.5        590        0.5   5149   188
New Hampshire     3.3        812        0.7   4281   174
New Jersey        5.2       7333        1.1   5237   115
New Mexico        9.7       1144        2.2   3601   120
New York         10.9      18076        1.4   4903    82
North Carolina   11.1       5441        1.8   3875    80
North Dakota      1.4        637        0.8   5087   186
Ohio              7.4      10735        0.8   4561   124
Oklahoma          6.4       2715        1.1   3983    82
Oregon            4.2       2284        0.6   4660    44
Pennsylvania      6.1      11860        1.0   4449   126
Rhode Island      2.4        931        1.3   4558   127
South Carolina   11.6       2816        2.3   3635    65
South Dakota      1.7        681        0.5   4167   172
Tennessee        11.0       4173        1.7   3821    70
Texas            12.2      12237        2.2   4188    35
Utah              4.5       1203        0.6   4022   137
Vermont           5.5        472        0.6   3907   168
Virginia          9.5       4981        1.4   4701    85
Washington        4.3       3559        0.6   4864    32
West Virginia     6.7       1799        1.4   3617   100
Wisconsin         3.0       4589        0.7   4468   149
Wyoming           6.9        376        0.6   4566   173
In [4]:
#a good first step in multiple regression is to examine the relationships among
#the variables two at a time.  Let's do this below:
cor(states) #this function provides the bivariate correlations
               Murder Population Illiteracy     Income      Frost
Murder      1.0000000  0.3436428  0.7029752 -0.2300776 -0.5388834
Population  0.3436428  1.0000000  0.1076224  0.2082276 -0.3321525
Illiteracy  0.7029752  0.1076224  1.0000000 -0.4370752 -0.6719470
Income     -0.2300776  0.2082276 -0.4370752  1.0000000  0.2262822
Frost      -0.5388834 -0.3321525 -0.6719470  0.2262822  1.0000000
In [5]:
#to visually see these correlations, we can generate scatterplots as follows:
# install.packages("car") #in case you don't have the "car" package installed
library(car) #load the "car" package to access the scatterplotMatrix() function
scatterplotMatrix(states, #data to be plotted
                  spread = FALSE, #this option suppresses spread and asymmetry information
            smoother.args = list(lty = 2), #specifies that the loess fit be rendered as a dashed line
                 main = "Scatter Plot Matrix") #main heading
In [6]:
#by default, the scatterplotMatrix() function provides scatter plots of the variables
#with each other in the off-diagonals and superimposes smoothed (loess) and linear fit
#lines on these plots.  the principal diagonal contains density and rug plots for each variable.

#you can see that the murder rate may be bimodal and that each of the predictor variables
#is skewed to some extent

#murder rates rise with population and illiteracy and fall with income and frost

#colder states have lower illiteracy rates, smaller populations and higher incomes
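In [ ]:
#a quick check (not from the book) of the distribution shapes described above:
par(mfrow = c(1, 2)) #two plots side by side
hist(states$Murder, main = "Murder rate") #the histogram hints at the possible bimodality of the murder rate
hist(states$Illiteracy, main = "Illiteracy") #illiteracy is noticeably right-skewed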
In [7]:
#now we can fit an lm() function as follows:
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
summary(fit)
Call:
lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
    data = states)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7960 -1.6495 -0.0811  1.4815  7.6210 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
Population  2.237e-04  9.052e-05   2.471   0.0173 *  
Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
Income      6.442e-05  6.837e-04   0.094   0.9253    
Frost       5.813e-04  1.005e-02   0.058   0.9541    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared:  0.567,	Adjusted R-squared:  0.5285 
F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08
In [8]:
#print the coefficients
coefficients(fit)
 (Intercept)   Population   Illiteracy       Income        Frost 
1.234563e+00 2.236754e-04 4.142837e+00 6.442470e-05 5.813055e-04 
In [9]:
#when there is more than one predictor variable, the regression coefficients
#indicate the increase in the dependent variable for a unit change in a
#predictor variable, holding all other predictor variables constant
#for example, the regression coefficient for illiteracy is 4.14,
#suggesting that an increase of 1% in illiteracy is associated with a 4.14% increase
#in the murder rate, controlling for population, income and temperature.

#the illiteracy coefficient is significantly different from zero at the p < 0.001 level
#the p-value is the probability of obtaining a coefficient estimate at least this far from zero
#if illiteracy had no linear effect on the murder rate, controlling for the other predictors
#because the p-value < 0.001, you reject the hypothesis that the illiteracy coefficient is zero!

#on the other hand, the coefficient for frost is NOT significantly different from zero (p = 0.954)
#here the p-value is the probability of obtaining a coefficient estimate at least this far from zero
#if frost had no linear effect on the murder rate
#because the p-value is far above conventional significance levels, you fail to reject the hypothesis that the frost coefficient is zero
#this suggests that frost and the murder rate are not linearly related when controlling for
#the other predictor variables.

#as indicated by multiple R-squared, taken together, the predictor variables account for
#57% of the variance in murder rates across states.
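In [ ]:
#a one-line check (not from the book) of the 57% figure, assuming the multiple-regression "fit" object above:
cor(states$Murder, fitted(fit)) ^ 2 #squared correlation between actual and predicted murder rates = multiple R-squared (about 0.567)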

Multiple linear regression with interactions

In [4]:
#so far we have assumed that predictor variables do not interact with each other.
#however, this is not necessarily true as we can see in the following code:
fit <- lm(mpg ~ hp + wt + hp:wt, data = mtcars) #here mpg is the response variable
#hp, wt and the interaction between hp and wt (hp:wt) are the predictor variables
#the dataset is the "mtcars" dataset that is provided with the base R installation
summary(fit) #summary of the results
coefficients(fit) #prints out the coefficients
Call:
lm(formula = mpg ~ hp + wt + hp:wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0632 -1.6491 -0.7362  1.4211  4.5513 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
hp          -0.12010    0.02470  -4.863 4.04e-05 ***
wt          -8.21662    1.26971  -6.471 5.20e-07 ***
hp:wt        0.02785    0.00742   3.753 0.000811 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.153 on 28 degrees of freedom
Multiple R-squared:  0.8848,	Adjusted R-squared:  0.8724 
F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13
(Intercept)          hp          wt       hp:wt 
49.80842343 -0.12010209 -8.21662430  0.02784815 
In [10]:
#you can see from the "Pr(>|t|)" column that the interaction between
#hp and wt is significant (p-value = 0.000811)
#the coefficient is significantly different from zero at the p < 0.001 level
#the p-value is the probability of obtaining a coefficient estimate at least this far from zero
#if the hp:wt interaction had no effect on the response variable mpg
#because the p-value < 0.001, you reject the hypothesis that the interaction coefficient is zero!

#this means that the relationship between one predictor and the response variable depends
#on the level of the other predictor.  here it means that the relationship between
#mpg and hp depends on wt

#the model for predicting mpg is as follows:
# predictedMpg = 49.81 - 0.12 * hp - 8.22 * wt + 0.03 * hp * wt

#you can plug in various values of wt into this equation to examine the impact
#you can visualize the impacts using the effect() function in the effects package
#the code is provided below:

#install.packages("effects") #install the "effects" package
library(effects) #load the effects library
myEffect <- effect("hp:wt", #this is the term we want to plot; i.e. the terms whose interaction we want to visualize
                  fit,, #this is the fitted model returned by lm()
                  list(wt = c(2.2, 3.2, 4.2))) #list specifying the variables to be set to constant values and the values to employ
plot(myEffect, multiline = TRUE) #multiline = TRUE superimposes the lines being plotted
In [8]:
#you can see from this graph that as the weight of the car increases, the relationship
#between horsepower and miles per gallon weakens.  For wt = 4.2, the line is almost horizontal,
#indicating that as hp increases, mpg does not change.
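In [ ]:
#a small sketch (not from the book) making the weakening relationship explicit: in this model the slope
#of mpg with respect to hp is (-0.12 + 0.028 * wt), so it can be evaluated at the three weights plotted above;
#this assumes the interaction model is still stored in "fit"
b <- coef(fit) #named coefficient vector
b["hp"] + b["hp:wt"] * c(2.2, 3.2, 4.2) #hp slopes of roughly -0.059, -0.031 and -0.003; nearly flat at wt = 4.2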

Regression diagnostics

In [14]:
#to properly interpret the coefficients of the OLS model, you must satisfy
#a number of statistical assumptions:

#1. Normality: for fixed values of the independent variables, the dependent variable
    #is normally distributed

#2. Independence: the values of the dependent variable are independent of each other

#3. Linearity: the dependent variable is linearly related to the independent variables

#4. Homoscedasticity: the variance of the dependent variable does not vary with the
    #levels of the independent variables.  Aka "constant variance".
In [16]:
#let us look at the output from the confint() function applied to the "states"
#multiple regression problem:
states <- as.data.frame(state.x77[,c("Murder", "Population", "Illiteracy", "Income", "Frost")]) #create the data frame
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states) #specify the lm() function
summary(fit) #print a summary of the lm() function
coefficients(fit) #print the coefficients from the lm() function
confint(fit) #print the confidence intervals of the lm() function
Call:
lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
    data = states)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7960 -1.6495 -0.0811  1.4815  7.6210 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
Population  2.237e-04  9.052e-05   2.471   0.0173 *  
Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
Income      6.442e-05  6.837e-04   0.094   0.9253    
Frost       5.813e-04  1.005e-02   0.058   0.9541    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-squared:  0.567,	Adjusted R-squared:  0.5285 
F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08
 (Intercept)   Population   Illiteracy       Income        Frost 
1.234563e+00 2.236754e-04 4.142837e+00 6.442470e-05 5.813055e-04 
                    2.5 %        97.5 %
(Intercept) -6.552191e+00  9.0213182149
Population   4.136397e-05  0.0004059867
Illiteracy   2.381799e+00  5.9038743192
Income      -1.312611e-03  0.0014414600
Frost       -1.966781e-02  0.0208304170
In [3]:
#the results above suggest, inter alia, that you can be 95% confident:
#1. that the interval [2.38, 5.9] contains the true change in the murder rate for a 1% change in illiteracy rate;
#2. that because the interval for Frost contains 0, you can conclude that a change in temperature is unrelated
    #to murder rate, holding the other variables constant.

#However, your faith in these results is only as strong as the evidence you have that your data satisfies the
#statistical assumptions underlying the model, i.e. normality, independence, linearity and homoscedasticity.

#a set of techniques called "Regression Diagnostics" provides the necessary tools for evaluating the appropriateness
#of the regression model and can help you uncover and correct problems.

A typical approach to regression diagnostics

In [4]:
#the most common approach is to apply the plot() function to the object returned by lm() as shown below:
fit <- lm(weight ~ height, data = women) #store the results of the lm() function in the fit object
par(mfrow = c(2,2)) #create a 2X2 matrix to store the plots generated by the plot() function
plot(fit) #generate the plots
In [5]:
#let us evaluate these charts against each of the assumptions of OLS regression:

#1. Normality: if a dependent variable is normally distributed for a fixed set of
    #predictor values, then the residual values should be normally distributed with
    #a mean of 0.  The Normal Q-Q plot (upper right) is a probability plot of the
    #standardized residuals against the values that would be expected under normality.
    #if you have met the normality assumption, the points on this graph should fall on
    #the straight 45-degree line.  Because they do not, you have clearly violated the
    #normality assumption.

#2. Independence: you cannot tell if the dependent variables are independent from these
    #plots.  You have to use your understanding of how the data was collected.  There's
    #no a priori reason to believe that one woman's weight influences another woman's
    #weight.  If you found out that the data were sampled from families, you might have
    #to adjust your assumption of independence.

#3. Linearity: if the dependent variable is linearly related to the independent variables,
    #there should be no systematic relationship between the residuals and the predicted
    #(fitted) values.  In other words, the model should capture all the systematic variance
    #present in the data, leaving nothing but random noise.  In the Residuals v/s Fitted
    #graph (upper left), you see clear evidence of a curved relationship, which suggests
    #you may want to add a quadratic term to the regression.

#4. Homoscedasticity: if you have met the constant variance assumption, the points in
    #the Scale-Location graph (bottom left) should be a random band around a horizontal
    #line.  You seem to meet this assumption.

#Finally, the Residuals v/s Leverage graph (bottom right) provides information about
#individual observations that you may wish to attend to.  The graph identifies outliers,
#high-leverage points and influential observations.  Specifically:

    #an outlier is an observation that is not predicted well by the fitted regression
    #model (i.e. it has a large positive or negative residual)

    #an observation with a high leverage value has an unusual combination of predictor
    #values.  That is, it is an outlier in the predictor space.  The dependent variable
    #value is not used to calculate an observation's leverage

    #an influential observation is an observation that has a disproportionate impact on
    #the determination of the model parameters.  Influential observations are identified
    #using a statistic called Cook's distance or Cook's D.
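In [ ]:
#the quantities behind that bottom-right panel can also be inspected directly; a minimal sketch
#(not from the book), using the simple linear "fit" for the women data from the cell above
round(rstandard(fit), 2) #standardized residuals; large absolute values flag potential outliers
round(hatvalues(fit), 2) #leverage (hat) values; unusually large values flag unusual combinations of predictor values
round(cooks.distance(fit), 2) #Cook's D; large values flag influential observations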
In [6]:
#let us review diagnostic plots for the quadratic fit using the following code:
fit2 <- lm(weight ~ height + I(height ^ 2), data = women) #store the results of the lm() function in the fit2 object
par(mfrow = c(2,2)) #create a 2X2 matrix to store the plots generated by the plot() function
plot(fit2) #generate the plots
In [7]:
#this set of plots suggests that the polynomial regression provides a better fit with regard
#to the linearity assumption, normality of residuals (except for observation 13), and
#homoscedasticity (constant residual variance).  Observation 15 appears to be influential
#(based on a large Cook's D value), and deleting it has an impact on the parameter estimates.

#let's try deleting these values and evaluate model fit (you need to be careful when deleting data;
#your models should fit your data, not the other way around!)

fit3 <- lm(weight ~ height + I(height ^ 2), data = women[-c(13, 15),]) #store the results of the lm() function in the fit3 object
par(mfrow = c(2,2)) #create a 2X2 matrix to store the plots generated by the plot() function
plot(fit3) #generate the plots
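In [ ]:
#to see the impact of dropping observations 13 and 15 on the parameter estimates, the coefficients of the
#two quadratic fits can be compared side by side (a small sketch, not from the book; assumes fit2 and fit3 above)
rbind(with_all_points = coef(fit2), without_13_and_15 = coef(fit3))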