Many statistical tests make the assumption of equal variance. If this assumption is violated, then the results of the tests become unreliable.
The most common statistical tests and procedures that make this assumption of equal variance include:
1. ANOVA
2. t-tests
3. Linear Regression
This tutorial explains the assumption made for each test, how to determine if this assumption is met, and what to do if it is violated.
Equal Variance Assumption in ANOVA
An ANOVA (“Analysis of Variance”) is used to determine whether or not there is a significant difference between the means of three or more independent groups.
Here’s an example of when we might use an ANOVA:
Suppose we recruit 90 people to participate in a weight-loss experiment. We randomly assign 30 people to use program A, B, or C for one month.
(Video) Two Sample t-Test:Equal vs Unequal Variance Assumption| Statistics Tutorial #24| MarinStatsLecturesTo see if the program has an impact on weight loss, we can perform a one-way ANOVA.
An ANOVA assumes that each of the groups has equal variance. There are two ways to test if this assumption is met:
1. Create boxplots.
Boxplots offer a visual way to check the assumption of equal variances.
The variance of weight loss in each group can be seen by the length of each box plot. The longer the box, the higher the variance. For example, we can see that the variance is a bit higher for participants in program C compared to both program A and program B.
2. Conduct Bartlett’s Test.
Bartlett’s Test tests the null hypothesis that the samples have equal variances vs. the alternative hypothesis that the samples do not have equal variances.
If the p-value of the test is less than some significance level (like 0.05), then we have evidence to say that the samples do not all have equal variances.
What if the equal variance assumption is violated?
In general, ANOVA’s are considered to be fairly robust against violations of the equal variances assumption as long as each group has the same sample size.
However, if the sample sizes are not the same and this assumption is severely violated, you could instead run a Kruskal-Wallis Test, which is the non-parametric version of the one-way ANOVA.
Equal Variance Assumption in t-tests
A two sample t-test is used to test whether or not the means of two populations are equal.
The test makes the assumption that the variances are equal between the two groups. There are two ways to test if this assumption is met:
1. Use the rule of thumb ratio.
As a rule of thumb, if the ratio of the larger variance to the smaller variance is less than 4, then we can assume the variances are approximately equal and use the two sample t-test.
For example, suppose sample 1 has a variance of 24.5 and sample 2 has a variance of 15.2. The ratio of the larger sample variance to the smaller sample variance would be calculated as 24.5 / 15.2 = 1.61.
Since this ratio is less than 4, we could assume that the variances between the two groups are approximately equal.
2. Perform an F-test.
The F-test tests the null hypothesis that the samples have equal variances vs. the alternative hypothesis that the samples do not have equal variances.
If the p-value of the test is less than some significance level (like 0.05), then we have evidence to say that the samples do not all have equal variances.
What if the equal variance assumption is violated?
If this assumption is violated then we can perform Welch’s t-test, which is a non-parametric version of the two sample t-test and does not make the assumption that the two samples have equal variances.
Equal Variance Assumption in Linear Regression
Linear regression is used to quantify the relationship between one or more predictor variables and a response variable.
Linear regression makes the assumption that the residuals have constant variance at every level of the predictor variable(s). This is known as homoscedasticity. When this is not the case, the residuals are said to suffer from heteroscedasticity and the results of the regression analysis become unreliable.
The most common way to determine if this assumption is met is to created a plot of residuals vs. fitted values. If the residuals in this plot seem to be scattered randomly around zero, then the assumption of homoscedasticity is likely met.
However, if there exists a systematic pattern in the residuals, such as the “cone” shape in the following plot then heteroscedasticity is a problem:
What if the equal variance assumption is violated?
If this assumption is violated, the most common way to deal with it is to transform the response variable using one of the three transformations:
1. Log Transformation:Transform the response variable from y tolog(y).
2. Square Root Transformation: Transform the response variable from y to √y.
3. Cube Root Transformation:Transform the response variable from y to y1/3.
By performing these transformations, the problem of heteroscedasticity typically goes away.
Another way to fix heteroscedasticity is to use weighted least squares regression. This type of regression assigns a weight to each data point based on the variance of its fitted value.
Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals. When the proper weights are used, this can eliminate the problem of heteroscedasticity.
Additional Resources
The Three Assumptions Made in an ANOVA
The Four Assumptions Made in a T-Test
The Four Assumptions of Linear Regression
FAQs
What is the equal variance assumption? ›
What Is the Assumption of Equal Variance? In simple terms, variance refers to the data spread or scatter. Statistical tests, such as analysis of variance (ANOVA), assume that although different samples can come from populations with different means, they have the same variance.
What is equal variance in statistics? ›'variances equal' simply means that the population variance for one thing is the same as the population variance for some other thing or things.
What statistic tells you if your assumptions of equal variance are correct? ›Levene's test ( Levene 1960) is used to test if k samples have equal variances. Equal variances across samples is called homogeneity of variance. Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples. The Levene test can be used to verify that assumption.
Can you assume population variances are equal? ›If sample standard deviations are similar, assumption of equal population variance may be reasonable and the pooled procedure could be used.
What does it mean equal variances not assumed? ›When equal variances are assumed, the calculation uses pooled variances; when equal variances cannot be assumed, the calculation utilizes un-pooled variances and a correction to the degrees of freedom.
What are the two main assumptions for the analysis of variance? ›ANOVA assumes that the data is normally distributed. The ANOVA also assumes homogeneity of variance, which means that the variance among the groups should be approximately equal. ANOVA also assumes that the observations are independent of each other.
Does equal variance mean equal standard deviation? ›Standard deviation can be greater than the variance since the square root of a decimal is larger (and not smaller) than the original number when the variance is less than one (1.0 or 100%). The standard deviation is smaller than the variance when the variance is more than one (e.g. 1.2 or 120%).
Why is equal variance important? ›It is important because it is a formal requirement for statistical analyses such as ANOVA or the Student's t-test. The unequal variance doesn't have much impact on ANOVA if the data sets have equal sample sizes. However, if the sample sizes are different, ANOVA will end up with inaccurate results.
How do you interpret equality of variance? ›If the p-value is larger than the alpha level, then you can say that the null hypothesis stands — that the variances are equal; if the p-value is smaller than the alpha level, then the implication is that the variances are unequal.
What is homoscedasticity vs heteroscedasticity? ›Heteroskedastic refers to a condition in which the variance of the residual term, or error term, in a regression model varies widely. Homoskedastic refers to a condition in which the variance of the error term in a regression model is constant.
Does Mann Whitney U test assume equal variance? ›
So, Mann-Whitney U test assumes the equal variances (homoscedasticity) and the different variations of two populations affect results of the test. It has been noted for a long time in statistical books (for references, see the paper shown in below).
Does a paired t-test assume equal variance? ›The paired t-test is really a test of the mean difference between the samples (such as the difference between 'before' and 'after' values), and the assumption in the test is that the differences are approximately normally distributed. This does not require nor imply that the two samples have the same variance.
When can you assume equal variance in t-test? ›Two-sample T-Test with equal variance can be applied when (1) the samples are normally distributed, (2) the standard deviation of both populations are unknown and assumed to be equal, and (3) the sample is sufficiently large (over 30).
Does Chi Square assume equal variance? ›Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data.
What are the 3 assumptions for an independent measures t-test? ›Independence of the observations.
Each subject should belong to only one group. There is no relationship between the observations in each group. No significant outliers in the two groups.
For this test, a p-value of less than 0.05 indicates that there is, in fact, enough variance in the sample to account for possible mean differences. The p-value reported for Levene's Test for Equality of Variance in the table above is p = 0.000, which is well below the 0.05 threshold.
What are the 3 most common assumptions in statistical analysis? ›A few of the most common assumptions in statistics are normality, linearity, and equality of variance.
What are the two types of variance in statistics? ›There are two types of Variance: Common Cause Variation and Special Cause Variation.
What are the 4 assumptions of normal distribution? ›A normal distribution can be described by four moments: mean, standard deviation, skewness and kurtosis.
What is the assumption of equal standard deviations? ›Further we check, if the standard deviations of the two populations are roughly equal. As a rule of thumb, the condition of equal population standard deviations is met, if the ratio of the larger to the smaller sample standard deviation is less than 2 (Weiss 2010).
What is the equal variance assumption in linear regression? ›
Equal variances: The variance of the residuals should be consistent across all predicted values. Check this assumption by examining the scatterplot of “residuals versus fits.” The variance of the residuals should be consistent across the x-axis.
What is the difference between multicollinearity and heteroscedasticity? ›Introduction. Heteroscedasticity is an error term that implies unequal scattered distribution of residuals in a regression analysis. Heteroscedasticity mainly occurs due to outliers in the data. Besides, Multicollinearity indicates a high correlation between independent variables.
What does it mean if you have heteroskedasticity? ›Heteroskedasticity refers to a situation where the variance of the residuals is unequal over a range of measured values. If heteroskedasticity exists, the population used in the regression contains unequal variance, the analysis results may be invalid.
How do you know if data is homoscedastic? ›You can tell if a regression is homoskedastic by looking at the ratio between the largest variance and the smallest variance. If the ratio is 1.5 or smaller, then the regression is homoskedastic.
Does one-way ANOVA assume equal variance? ›One-way ANOVA assumes that all the populations have the same standard deviation (and thus the same variance). This assumption is not very important when all the groups have the same (or almost the same) number of subjects, but is very important when sample sizes differ.
What are the assumptions of a Mann-Whitney test? ›- The sample drawn from the population is random.
- Independence within the samples and mutual independence is assumed. That means that an observation is in one group or the other (it cannot be in both).
- Ordinal measurement scale is assumed.
The major difference between the Mann-Whitney U and the Kruskal-Wallis H is simply that the latter can accommodate more than two groups. Both tests require independent (between-subjects) designs and use summed rank scores to determine the results.
Is the square of standard deviation equal variance? ›Generally, "the variance is equal to the square of the standard deviation" is widely used as the relationship between the variance and the standard deviation for a sample data set. Hence, the variance is equal to the square of standard deviation.
What is equal variance and UNequal variance? ›The Two-Sample assuming Equal Variances test is used when you know (either through the question or you have analyzed the variance in the data) that the variances are the same. The Two-Sample assuming UNequal Variances test is used when either: You know the variances are not the same.
Does two way Anova assume equal variance? ›Two-way ANOVA, like all ANOVA tests, assumes that the observations within each cell are normally distributed and have equal variances.
What are the 4 assumption of linear regression? ›
Linearity: The relationship between X and the mean of Y is linear. Homoscedasticity: The variance of residual is the same for any value of X. Independence: Observations are independent of each other. Normality: For any fixed value of X, Y is normally distributed.