Aaron Zhu

Summary

The provided content discusses the challenges and remedies of heteroskedasticity and autocorrelation in linear regression models using Ordinary Least Squares (OLS), emphasizing the use of Heteroskedasticity-consistent (HC) and Heteroskedasticity-Autocorrelation-consistent (HAC) standard errors, as well as Generalized Least Squares (GLS) and Feasible GLS (FGLS) methods to improve estimation efficiency.

Abstract

The article delves into the statistical issues of heteroskedasticity and autocorrelation in the context of OLS linear regression. It explains that while OLS estimators remain unbiased under these conditions, they are no longer the Best Linear Unbiased Estimators (BLUE) due to the lack of efficiency. The text outlines the consequences of these violations, such as misleading statistical inferences, and presents solutions like HC and HAC standard errors to adjust for these issues. It also details the process of using GLS and FGLS to correct for heteroskedasticity and autocorrelation, ensuring that the estimators regain efficiency. The author provides a comprehensive guide on applying FGLS for both heteroskedasticity and autocorrelation, including step-by-step procedures and the transformation of variables to meet the assumptions of homoscedasticity and independence.

Opinions

  • The author suggests that the presence of heteroskedasticity and autocorrelation in OLS models undermines the efficiency of the estimator, but not its unbiasedness.
  • It is argued that one should routinely use robust standard errors, as the assumption of homoscedasticity is rarely certain.
  • The article posits that GLS is a superior alternative to OLS when heteroskedasticity or autocorrelation is present, as it provides an efficient estimator.
  • The author emphasizes the practicality of FGLS as a method to estimate the variance-covariance matrix when it is unknown, making it a valuable tool for real-world applications.
  • The author provides a pragmatic approach to dealing with autocorrelation, particularly in time-series data, by suggesting an AR(1) correction as a common and effective solution.
  • The article concludes with a call to action for readers to explore further topics in linear regression and causal inference, indicating the author's view on the importance of understanding these concepts for data analysis.

Linear Regression with OLS: Heteroskedasticity and Autocorrelation

Understand OLS Linear Regression with a bit of math


Heteroskedasticity and Autocorrelation are issues we often need to address when setting up a linear regression. In this article, let’s dive deeper into what Heteroskedasticity and Autocorrelation are, what their consequences are, and which remedies can handle them.

A typical linear regression takes the following form. The response variable (i.e., Y) is explained as a linear combination of the explanatory variables (e.g., the intercept, X1, X2, X3, …), and ε is the error term (i.e., a random variable) that represents the difference between the actual response value and the value given by that linear combination. (The residual, by contrast, is the difference between the actual response value and the fitted response value.)

Figure 1 (Image by author)
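For readers without access to the figure, the model described above can be written as follows. This is the standard textbook notation, and the symbols n (number of observations) and k (number of explanatory variables) are assumptions introduced here, not taken from the original figure.

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

% Stacked over all n observations, in matrix form:
Y = X\beta + \varepsilon
```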

What is Homoscedasticity?

Under the assumption of Homoscedasticity, the error term has constant variance. Together with the assumption of no autocorrelation (the two are often bundled as "iid errors"), this means the diagonal values in the variance-covariance matrix of the error term are all equal to the same constant and the off-diagonal values are all 0.

Figure 2 (Image by author)
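In matrix notation, the structure described above can be sketched like this. The notation is the usual textbook one and is an assumption about what the figure shows.

```latex
\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n =
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
```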

What is Heteroskedasticity?

In the real world, the Homoscedasticity assumption may not be plausible. The variance of the error terms may not remain the same across observations. Sometimes the variance of the error terms depends on an explanatory variable in the model.

For example, the number of bedrooms is often used to predict house prices. We typically see larger prediction errors for houses with 6+ bedrooms than for houses with 2 bedrooms. Houses with 6+ bedrooms are usually worth a lot more than 2-bedroom houses and therefore have larger unexplained, and sometimes irreducible, price variance, which leaks into the error term.

Figure 3 (Image by author)

We call an error term whose variance is NOT constant across observations a Heteroskedastic error. This property is called Heteroskedasticity.

Figure 4 (Image by author)
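A small simulation can make the bedroom example concrete. This is only a sketch: the error standard deviation of 10 times the number of bedrooms (in arbitrary price units) and the sample size are invented for illustration, not taken from real housing data.

```python
import random
import statistics

random.seed(3)
n = 6000

# Hypothetical data: each house has 1 to 6 bedrooms.
bedrooms = [random.randint(1, 6) for _ in range(n)]

# Assumed error model: the standard deviation grows with the bedroom count,
# so the error variance is NOT constant across observations.
errors = [random.gauss(0.0, 10.0 * b) for b in bedrooms]

# Sample variance of the errors within each bedroom group.
var_by_group = {
    b: statistics.pvariance([e for e, bb in zip(errors, bedrooms) if bb == b])
    for b in range(1, 7)
}

for b in sorted(var_by_group):
    print(f"{b} bedrooms: error variance = {var_by_group[b]:.1f}")
```

Under homoscedasticity the six group variances would be roughly equal. Here they increase with the number of bedrooms.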

What is Autocorrelation?

When there is autocorrelation in the model, the error terms are correlated. It means off-diagonal values of the covariance matrix of error terms are NOT all 0s.

Figure 5 (Image by author)

There are several possible sources of autocorrelation. In time-series data, time is the factor that produces autocorrelation. For example, the current stock price is influenced by the prices from previous trading days (e.g., the stock price is more likely to fall after a huge price hike). In cross-sectional data, neighboring units tend to have similar characteristics, so their error terms can be correlated as well.
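To see what correlated errors look like in the time-series case, the sketch below simulates errors from a first-order autoregressive process and measures how strongly each error is correlated with the previous one. The coefficient 0.7 and the sample size are assumptions chosen for illustration.

```python
import random

random.seed(7)
n = 2000
rho = 0.7  # assumed strength of the dependence on the previous error

# Each error is rho times the previous error plus fresh independent noise.
eps = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    eps.append(rho * eps[-1] + random.gauss(0.0, 1.0))

# Sample lag-1 autocorrelation of the simulated errors.
mean = sum(eps) / n
num = sum((eps[t] - mean) * (eps[t - 1] - mean) for t in range(1, n))
den = sum((e - mean) ** 2 for e in eps)
r1 = num / den

print(f"lag-1 autocorrelation: {r1:.3f}")
```

With independent errors this statistic would be close to 0. Here it should land near the assumed value of rho.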

What are the Consequences of Heteroskedasticity and Autocorrelation?

The OLS estimator (OLSE) remains unbiased under both Heteroskedasticity and Autocorrelation as long as the Zero conditional mean assumption (i.e., the expected value of the error term is zero conditional on all values of the explanatory variables) holds.

Figure 6 (Image by author)

Under Heteroskedasticity or Autocorrelation, the OLS estimator no longer has the least variance among all linear unbiased estimators, because the Gauss-Markov Theorem requires errors that are both homoskedastic and uncorrelated.

So the OLS estimator under Heteroskedasticity or Autocorrelation is no longer BLUE: it is still linear and unbiased, but it is no longer the most efficient estimator in that class.

Figure 7 (Image by author)

In addition, the usual formula for the variance of the OLS estimator, σ²(X'X)⁻¹, is no longer valid under Heteroskedasticity or Autocorrelation, so standard errors, t-tests, and confidence intervals computed from it might give misleading results.
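The reason inference breaks down can be written out in one line. In the sketch below, Σ = σ²Ω denotes the variance-covariance matrix of the error term, which is standard notation assumed here.

```latex
\operatorname{Var}(\hat\beta_{OLS} \mid X)
  = (X'X)^{-1} X' \Sigma X (X'X)^{-1},
\qquad \Sigma = \operatorname{Var}(\varepsilon \mid X) = \sigma^2 \Omega

% Only when \Omega = I (homoskedastic, uncorrelated errors) does this collapse to
\operatorname{Var}(\hat\beta_{OLS} \mid X) = \sigma^2 (X'X)^{-1}
```

Standard regression output reports the second expression, so when Ω is not the identity matrix the reported standard errors do not match the true sampling variability of the estimator.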

What are the remedies for Heteroskedasticity and Autocorrelation?

Remedy 1: Heteroskedasticity-consistent (HC) and Heteroskedasticity-Autocorrelation-consistent (HAC) Standard Errors

Under Heteroskedasticity or Autocorrelation, we can still use the inefficient OLS estimator, but much of the literature suggests using Heteroskedasticity-consistent (HC) standard errors (aka robust standard errors or White standard errors) or Heteroskedasticity-Autocorrelation-consistent (HAC) standard errors (aka Newey-West standard errors), which remain valid in the presence of Heteroskedasticity or Autocorrelation (see Figure 7). These are the easiest and most common solutions.

Many econometricians argue that one should always use robust standard errors because one can never rely on Homoskedasticity.
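As a rough illustration of the HC idea, the sketch below compares the classical standard error of the slope with the basic White (HC0) standard error in a regression with one explanatory variable. It covers only the heteroskedasticity case, not the HAC (Newey-West) case. The data-generating process, in which the error standard deviation is 0.5 times x, is an assumption made for this example.

```python
import math
import random

random.seed(42)
n = 20000

# Simulated data (assumed design): the error spread grows with x.
x = [random.uniform(1.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

# OLS for the simple regression y = b0 + b1 * x.
x_bar = sum(x) / n
y_bar = sum(y) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Classical standard error: assumes one common error variance.
sigma2_hat = sum(r * r for r in resid) / (n - 2)
se_classical = math.sqrt(sigma2_hat / sst_x)

# HC0 (White) standard error: each squared residual stands in for
# that observation's own error variance.
hc0_var = sum((xi - x_bar) ** 2 * r * r for xi, r in zip(x, resid)) / sst_x ** 2
se_hc0 = math.sqrt(hc0_var)

print(f"slope estimate: {b1:.4f}")
print(f"classical SE:   {se_classical:.5f}")
print(f"HC0 robust SE:  {se_hc0:.5f}")
```

The slope estimate is the same in both cases; only the reported uncertainty changes. In this particular design the observations with large x are both the most influential and the noisiest, so the classical formula tends to understate the standard error.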

Remedy 2: Generalized Least Squares (GLS) and Feasible GLS (FGLS)

Instead of accepting an inefficient OLS estimator and correcting the standard errors, we can address Heteroskedasticity or Autocorrelation with a fully efficient estimator (i.e., unbiased and with the least variance) by using Generalized Least Squares (GLS).

Under Heteroskedasticity or Autocorrelation, although the OLS estimator and GLS estimator both are unbiased, the GLS estimator has a smaller variance than the OLS estimator.

If there is Heteroskedasticity or Autocorrelation and we either know the variance-covariance matrix of the error term or can estimate it empirically, then we can transform the model into a homoscedastic one.

Figure 8 (Image by author)
Figure 9 (Image by author)

Q: Is the transformed model homoscedastic?

A: Yes, the error terms in the transformed model have constant variance and are uncorrelated with each other.

Figure 10 (Image by author)

The transformed model satisfies the homoscedastic assumption; therefore, the OLS estimator for the transformed model (i.e., the GLS estimator) is efficient. The GLS estimator can be computed as

Figure 11 (Image by author)
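For reference, the GLS transformation and the resulting estimator can be sketched in standard notation as follows. The symbol P for the transformation matrix is an assumption made here, and P is taken to be any invertible matrix satisfying P'P = Ω⁻¹.

```latex
% Transformed model: pre-multiply Y = X\beta + \varepsilon by P
PY = PX\beta + P\varepsilon

% The transformed errors are homoskedastic and uncorrelated:
\operatorname{Var}(P\varepsilon \mid X) = P(\sigma^2\Omega)P' = \sigma^2 I

% OLS on the transformed model gives the GLS estimator:
\hat\beta_{GLS} = (X'P'PX)^{-1} X'P'PY = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} Y

% with variance
\operatorname{Var}(\hat\beta_{GLS} \mid X) = \sigma^2 (X'\Omega^{-1}X)^{-1}
```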

If we know the value of σ²Ω or Σ, we can just plug it into the closed-form solution to find the GLS estimator.

If we don’t know the value of σ²Ω or Σ, the million-dollar question is “can we estimate it?” The answer is YES. A common way to handle this situation is to use Feasible GLS (FGLS).

How to apply FGLS under Heteroskedasticity?

As discussed in Wooldridge’s Introductory Econometrics: A Modern Approach, we can assume that

Figure 12 (Image by author)
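The figure is not reproduced here, but the assumption made in that textbook treatment is the exponential variance function below. Treat the notation (u for the error term, δ for the variance parameters, k for the number of regressors) as an assumption about what the figure shows.

```latex
\operatorname{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)

% Taking logs motivates regressing \log(\hat{u}^2) on the explanatory variables:
\log(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e
```

The exponential form guarantees that the predicted variances are positive, which is why the procedure works with the log of the squared residuals.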

Let’s call the estimate of this variance function the weight, W, in the FGLS model (aka Weighted Least Squares (WLS) estimation).

A Feasible GLS Procedure to correct for Heteroskedasticity:

Step 1: Run OLS on the original model and obtain the residuals, i.e., Ui hat.

Figure 13 (Image by author)

Step 2: Create a new variable by first squaring the residuals and then taking the natural log.

Figure 14 (Image by author)

Step 3: Regress this newly created variable on the Xs, then compute its fitted values.

Figure 15 (Image by author)

Step 4: Exponentiate the fitted values from Step 3 and call the result the weight, W. Then create a new matrix p (i.e., an N x N matrix).

Figure 16 (Image by author)
Figure 17 (Image by author)

Step 5: Transform both Y and X by multiplying them by the new matrix p.

Figure 18 (Image by author)

Step 6: Apply OLS to the transformed model. The resulting β hat is the FGLS estimator, which is asymptotically efficient when the variance function is correctly specified.

Figure 19 (Image by author)
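The six steps above can be sketched in plain Python for a model with an intercept and one explanatory variable. Everything about the simulated data is an assumption made for illustration (true coefficients 1 and 2, error variance exp(0.2 + 0.6x)). The helper ols2 is a hypothetical name for a small two-column least-squares routine, and the matrix p is taken to be diagonal with entries 1/sqrt(W), so only its diagonal is stored.

```python
import math
import random

def ols2(z1, z2, y):
    """Least squares of y on two columns z1 and z2 via the 2x2 normal equations."""
    a = sum(p * p for p in z1)
    b = sum(p * q for p, q in zip(z1, z2))
    d = sum(q * q for q in z2)
    e = sum(p * t for p, t in zip(z1, y))
    f = sum(q * t for q, t in zip(z2, y))
    det = a * d - b * b
    return (d * e - b * f) / det, (a * f - b * e) / det

random.seed(0)
n = 5000
x = [random.uniform(0.0, 5.0) for _ in range(n)]
# Assumed heteroskedastic errors: Var(u | x) = exp(0.2 + 0.6 * x).
u = [random.gauss(0.0, math.exp(0.5 * (0.2 + 0.6 * xi))) for xi in x]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
ones = [1.0] * n

# Step 1: run OLS as is and obtain the residuals.
b0, b1 = ols2(ones, x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Step 2: square the residuals, then take the natural log.
log_u2 = [math.log(r * r) for r in resid]

# Step 3: regress the new variable on x and compute fitted values.
g0, g1 = ols2(ones, x, log_u2)
fitted = [g0 + g1 * xi for xi in x]

# Step 4: exponentiate the fitted values to get the weights W;
# the diagonal of p holds 1 / sqrt(W).
w = [math.exp(gi) for gi in fitted]
p_diag = [1.0 / math.sqrt(wi) for wi in w]

# Step 5: transform Y and every column of X (including the intercept column).
y_star = [pi * yi for pi, yi in zip(p_diag, y)]
c_star = [pi * 1.0 for pi in p_diag]
x_star = [pi * xi for pi, xi in zip(p_diag, x)]

# Step 6: OLS on the transformed model gives the FGLS estimates.
f0, f1 = ols2(c_star, x_star, y_star)

print(f"OLS : intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"FGLS: intercept = {f0:.3f}, slope = {f1:.3f}")
```

Note that the intercept column is transformed too, so the transformed model has no constant term. That is why ols2 takes two explicit columns instead of adding an intercept on its own.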

How to apply FGLS under Autocorrelation?

For most time-series data with autocorrelation, a correction for first-order autoregressive (i.e., AR(1)) disturbances would be sufficient. We have

Figure 20 (Image by author)
Figure 21 (Image by author)
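In standard notation, the AR(1) error structure can be sketched as follows. The symbol u_t for the independent innovation and σ_u² for its variance are assumptions introduced here.

```latex
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad |\rho| < 1,
\qquad u_t \ \text{iid with mean } 0 \text{ and variance } \sigma_u^2

% Implied variances and covariances of the error term:
\operatorname{Var}(\varepsilon_t) = \frac{\sigma_u^2}{1 - \rho^2},
\qquad
\operatorname{Cov}(\varepsilon_t, \varepsilon_{t-s}) = \rho^{s}\,\frac{\sigma_u^2}{1 - \rho^2}
```

The covariance shrinks geometrically with the distance s between observations, so a single parameter ρ pins down the entire off-diagonal pattern of the variance-covariance matrix.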

Step 1: Run OLS on the original model and obtain the residual vector e.

Figure 22 (Image by author)

Step 2: Estimate ρ by r, then create a new matrix p (i.e., an N x N matrix).

Figure 23 (Image by author)

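Step 3: Transform both Y and X by multiplying them by the new matrix p. The first observation is treated differently from the other observations because it has no previous observation to difference against. For our application, we can ignore the first observation (i.e., t=1); dropping it gives the Cochrane-Orcutt version of the procedure, while keeping it gives the Prais-Winsten version.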

Figure 24 (Image by author)

Step 4: Apply OLS to the transformed model and obtain the FGLS estimator.

Figure 25 (Image by author)
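The four steps above can be sketched in plain Python for a model with an intercept and one explanatory variable, dropping the first observation (the Cochrane-Orcutt variant). The simulated data (true coefficients 1 and 2, ρ = 0.7) are assumptions for illustration, and ols2 is a hypothetical helper name. Estimating ρ by regressing each residual on the previous one is one common choice and may differ from the formula in the figure.

```python
import math
import random

def ols2(z1, z2, y):
    """Least squares of y on two columns z1 and z2 via the 2x2 normal equations."""
    a = sum(p * p for p in z1)
    b = sum(p * q for p, q in zip(z1, z2))
    d = sum(q * q for q in z2)
    e = sum(p * t for p, t in zip(z1, y))
    f = sum(q * t for q, t in zip(z2, y))
    det = a * d - b * b
    return (d * e - b * f) / det, (a * f - b * e) / det

random.seed(1)
n = 3000
rho_true = 0.7  # assumed autocorrelation of the errors

x = [random.gauss(0.0, 1.0) for _ in range(n)]
# AR(1) errors, started from the stationary distribution.
eps = [random.gauss(0.0, 1.0 / math.sqrt(1.0 - rho_true ** 2))]
for _ in range(n - 1):
    eps.append(rho_true * eps[-1] + random.gauss(0.0, 1.0))
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]
ones = [1.0] * n

# Step 1: run OLS as is and obtain the residuals.
b0, b1 = ols2(ones, x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Step 2: estimate rho by regressing each residual on the previous one.
r = (sum(resid[t] * resid[t - 1] for t in range(1, n))
     / sum(resid[t - 1] ** 2 for t in range(1, n)))

# Step 3: quasi-difference Y and every column of X, dropping t = 1.
y_star = [y[t] - r * y[t - 1] for t in range(1, n)]
x_star = [x[t] - r * x[t - 1] for t in range(1, n)]
c_star = [1.0 - r] * (n - 1)  # transformed intercept column

# Step 4: OLS on the transformed model gives the FGLS estimates.
f0, f1 = ols2(c_star, x_star, y_star)

print(f"estimated rho: {r:.3f}")
print(f"OLS : intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"FGLS: intercept = {f0:.3f}, slope = {f1:.3f}")
```

Because the transformed intercept column equals 1 - r rather than 1, the first coefficient returned in Step 4 is already on the scale of the original intercept.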

Final Notes

When there is Heteroskedasticity in the linear regression model, the variance of the error terms is not constant, and when there is Autocorrelation, the covariances between error terms are not all zero.

Under Heteroskedasticity or Autocorrelation, the OLS estimator would still be unbiased, but no longer efficient, meaning it won’t have the least variance.

To address Heteroskedasticity or Autocorrelation, we can either keep the OLS estimator and obtain robust standard errors for it or, to make the estimator more efficient, step up to a GLS estimator obtained by FGLS.

Thank you for reading!