Covariance is a measure of how much two variables change together; its standardized form, the correlation coefficient, measures the strength of the relationship.[1] Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). Therefore, when performing ANCOVA, we are adjusting the DV means to what they would be if all groups were equal on the CV.[2]
Uses of ANCOVA
Increase Power
ANCOVA can be used to increase statistical power[3] (the ability to find a significant difference between groups when one exists) by reducing the within-group error variance. In order to understand this, it is necessary to understand the test used to evaluate differences between groups, the F-test. The F-test is computed by dividing the explained variance between groups (e.g., gender difference) by the unexplained variance within the groups. Thus,
F = MS_between / MS_within
If this value is larger than a critical value, we conclude that there is a significant difference between groups. Unexplained variance includes error variance (e.g., individual differences) as well as the influence of other factors, so the influence of CVs is grouped into the denominator. When we control for the effect of CVs on the DV, we remove it from the denominator, making F larger and thereby increasing the power to find a significant effect if one exists.
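This variance-reduction effect can be sketched with synthetic data (a minimal illustration, not a full ANCOVA fit; the group sizes, effect sizes, seed, and the use of simple residualization are assumptions of the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50

# Two groups; the DV depends on the group and, strongly, on a covariate.
group = np.repeat([0, 1], n)
cv = rng.normal(size=2 * n)
dv = 0.5 * group + 2.0 * cv + rng.normal(size=2 * n)

# F-test on the raw DV: the CV's influence inflates the denominator.
f_raw = stats.f_oneway(dv[group == 0], dv[group == 1]).statistic

# Regress out the CV, then retest on the residuals: the unexplained
# within-group variance shrinks, which tends to enlarge F.
slope, intercept, *_ = stats.linregress(cv, dv)
resid = dv - (intercept + slope * cv)
f_adj = stats.f_oneway(resid[group == 0], resid[group == 1]).statistic
```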
Adjusting Preexisting Differences
Another use of ANCOVA is to adjust for preexisting differences in nonequivalent (intact) groups. This controversial application aims to correct for initial group differences (prior to group assignment) that exist on the DV among several intact groups. In this situation, participants cannot be made equal through random assignment, so CVs are used to adjust scores and make participants more similar than they would be without the CV. However, even with the use of covariates, no statistical technique can equate unequal groups. Furthermore, the CV may be so intimately related to the IV that removing the variance on the DV associated with the CV would also remove considerable variance on the DV, rendering the results meaningless.[4]
Assumptions of ANCOVA
There are five assumptions that underlie the use of ANCOVA and affect interpretation of the results:[5]
Assumption 1: Normality of Residuals
The residuals (error terms) should be normally distributed.
Assumption 2: Homogeneity of Variances
The error variances should be equal for different treatment classes.
Assumption 3: Homogeneity of Regression Slopes
The slopes of the different regression lines should be equal.
Assumption 4: Linearity of Regression
The regression relationship between the dependent variable and concomitant variables must be linear.
Assumption 5: Independence of Error terms
The error terms should be uncorrelated.
The third assumption, concerning the homogeneity of the regression slopes across treatments, is particularly important in evaluating the appropriateness of the ANCOVA model. Note also that only the error terms need to be normally distributed; in most cases, neither the independent variable nor the concomitant variables will be.
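A quick residual-based check of Assumption 1 can be done with the Shapiro–Wilk test (a sketch on synthetic data; the per-group linear fits and the seed are illustrative assumptions, not a complete set of ANCOVA diagnostics):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 40

# Three treatment groups with a common covariate
group = np.repeat([0, 1, 2], n)
cv = rng.normal(size=3 * n)
dv = group + cv + rng.normal(size=3 * n)

# Residuals from a per-group linear fit of DV on CV
resid = np.empty_like(dv)
for g in range(3):
    m = group == g
    slope, intercept, *_ = stats.linregress(cv[m], dv[m])
    resid[m] = dv[m] - (intercept + slope * cv[m])

# Assumption 1: residuals should be (approximately) normal
w, p = stats.shapiro(resid)
```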
Conducting an ANCOVA
Test Multicollinearity
If a CV is highly related to another CV (at a correlation of .5 or more), then it will not adjust the DV over and above the other CV. One or the other should be removed since they are statistically redundant.
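A minimal redundancy check between two candidate covariates (the .5 cutoff follows the text; the data-generating process and seed are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
cv1 = rng.normal(size=n)
cv2 = 0.8 * cv1 + rng.normal(scale=0.5, size=n)   # deliberately related to cv1

# Pairwise correlation between the two covariates
r = np.corrcoef(cv1, cv2)[0, 1]
if abs(r) >= 0.5:
    print(f"r = {r:.2f}: covariates are statistically redundant; drop one")
```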
Test the Homogeneity of Variance Assumption
This assumption is tested with Levene's test of equality of error variances. It matters most after adjustments have been made, but if the variances are equal before adjustment, they are likely to remain equal afterwards.
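Levene's test is available in SciPy; a sketch with three synthetic treatment groups (group sizes, spreads, and seed are assumptions of the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(scale=1.0, size=40)
g2 = rng.normal(scale=1.1, size=40)
g3 = rng.normal(scale=0.9, size=40)

# H0: the error variances of the treatment groups are equal
stat, p = stats.levene(g1, g2, g3)
# A large p-value gives no evidence against homogeneity of variance
```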
Test the Homogeneity of Regression Slopes Assumption
To see if the CV significantly interacts with the IV, run an ANCOVA model including both the IV and the CVxIV interaction term. If the CVxIV interaction is significant, ANCOVA should not be performed. Instead, Green & Salkind[6] suggest assessing group differences on the DV at particular levels of the CV. Also consider using a moderated regression analysis, treating the CV and its interaction as another IV. Alternatively, one could use mediation analyses to determine if the CV accounts for the IV’s effect on the DV.
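The slope-homogeneity check amounts to comparing the model with and without the CVxIV term; a sketch using an extra-sum-of-squares F-test on synthetic data (the dummy coding, equal true slopes, and seed are assumptions of the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
iv = np.repeat([0.0, 1.0], n)                       # two groups, dummy coded
cv = rng.normal(size=2 * n)
dv = 1.0 + 0.5 * iv + 1.5 * cv + rng.normal(size=2 * n)  # equal slopes by construction

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones(2 * n)
X_full = np.column_stack([ones, iv, cv, iv * cv])   # with the CVxIV interaction
X_red = np.column_stack([ones, iv, cv])             # without it

rss_full, rss_red = rss(X_full, dv), rss(X_red, dv)
df_full = 2 * n - X_full.shape[1]
f_int = (rss_red - rss_full) / (rss_full / df_full)
p_int = stats.f.sf(f_int, 1, df_full)
# A large p_int suggests homogeneous slopes, so ANCOVA may proceed
```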
Run ANCOVA Analysis
If the CVxIV interaction is not significant, rerun the ANCOVA without the CVxIV interaction term. In this analysis, you need to use the adjusted means and adjusted MSerror. The adjusted means refer to the group means after controlling for the influence of the CV on the DV.
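Adjusted means can be computed from the pooled within-group regression slope; a sketch on synthetic data where the groups deliberately differ on the CV (the effect sizes and seed are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50

# Groups differ on the CV, so raw DV means are confounded with it.
group = np.repeat([0, 1], n)
cv = rng.normal(loc=group.astype(float), size=2 * n)
dv = 2.0 * cv + 1.0 * group + rng.normal(size=2 * n)  # true group effect = 1.0

# Pooled within-group slope of DV on CV
sp = ss = 0.0
for g in (0, 1):
    c, d = cv[group == g], dv[group == g]
    sp += ((c - c.mean()) * (d - d.mean())).sum()
    ss += ((c - c.mean()) ** 2).sum()
b = sp / ss

# Adjusted mean: the group's DV mean, moved to the grand CV mean
adj = {g: dv[group == g].mean() - b * (cv[group == g].mean() - cv.mean())
       for g in (0, 1)}
raw_diff = dv[group == 1].mean() - dv[group == 0].mean()
adj_diff = adj[1] - adj[0]   # close to the true group effect
```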
Follow-up Analyses
If there was a significant main effect, it means that there is a significant difference between the levels of one IV, ignoring all other factors.[1] To find exactly which levels are significantly different from one another, one can use the same follow-up tests as for the ANOVA. If there are two or more IVs, there may be a significant interaction, which means that the effect of one IV on the DV changes depending on the level of another factor. One can investigate the simple main effects using the same methods as in a factorial ANOVA.
Power considerations
While including a covariate in an ANOVA generally increases statistical power, because accounting for some of the variance in the dependent variable increases the proportion of variance explained by the independent variables, each added covariate also consumes a degree of freedom. Accordingly, adding a covariate that accounts for very little variance in the dependent variable may actually reduce power.
See also
- MANCOVA (Multivariate analysis of covariance)
References
1. Howell, D. C. (2009). Statistical methods for psychology (7th ed.). Belmont: Cengage Wadsworth.
2. Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs: Prentice-Hall, Inc.
3. Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson Education, Inc.
4. Miller, G. A., & Chapman, J. P. (2001). Misunderstanding Analysis of Covariance. Journal of Abnormal Psychology, 110(1), 40-48.
5. Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models (5th ed.). New York, NY: McGraw-Hill/Irwin.
6. Green, S. B., & Salkind, N. J. (2011). Using SPSS for Windows and Macintosh: Analyzing and Understanding Data (6th ed.). Upper Saddle River, NJ: Prentice Hall.