public class VIF_Tolerance
extends java.lang.Object
The whole idea of checking the tolerance is to see whether the predictors that are going to be used in a model are too highly correlated with each other. According to Field (2005), perfect multicollinearity exists when at least one predictor is a perfect linear combination of the others.
1) If the tolerance is considerably low, or equivalently the VIF is considerably high, any model containing those predictors may have reduced predictive power.
2) If the tolerance is almost 0, the correlation is so high that the (optimization) algorithms cannot "decide" which combination of coefficients to use, since infinitely many combinations would work (Field 2005).
According to Bowerman & O'Connell (1990), an average VIF substantially higher than 1 is a cause for concern.
According to Myers (1990), a VIF value higher than 10 is a problem.
According to Menard (1995), a tolerance below 0.2 is also a problem.
From experience I would say it depends on the data set. In small sets, even a VIF value of 3 can be an issue.
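The rules of thumb above can be written down as simple checks. This is only an illustrative sketch: the class and method names are hypothetical and not part of VIF_Tolerance, and the cutoffs are just the ones quoted above, which may be too lenient for small data sets.

```java
/** Hypothetical helper (not part of VIF_Tolerance) encoding the quoted rules of thumb. */
final class CollinearityRulesSketch {

    /** Myers (1990): a VIF above 10 is considered a problem. */
    static boolean exceedsMyers(double vif) {
        return vif > 10.0;
    }

    /** Menard (1995): a tolerance below 0.2 is considered a problem. */
    static boolean belowMenard(double tolerance) {
        return tolerance < 0.2;
    }

    /**
     * Average VIF, for the Bowerman & O'Connell (1990) check.
     * No numeric cutoff is applied here because the source only says
     * "substantially higher than 1".
     */
    static double meanVif(double[] vifs) {
        double sum = 0.0;
        for (double v : vifs) sum += v;
        return sum / vifs.length;
    }
}
```

The average VIF is returned rather than thresholded, so the caller decides what "substantially higher than 1" means for the data at hand.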
The general idea of the algorithm is that, given a set of n predictors (of type double), we run n least-squares regressions. In each one, a single predictor becomes the dependent variable Y and the remaining predictors form the independent predictor matrix X, iterating until every predictor has served as Y.
From each iteration we retrieve the regression's R-square with the formula:
R2 = 1 - SSR / SSTO
where SSR is the sum of squared residuals and SSTO is the total sum of squares.
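One iteration of this procedure can be sketched in plain Java as follows. This is a minimal illustration, not the class's actual implementation: the class name, method names, and the assumed layout (rows are observations, columns are predictors) are hypothetical, and the regression is solved through the normal equations with an intercept term.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of VIF_Tolerance): R-square of one predictor regressed on the rest. */
final class RSquareSketch {

    /** Assumes data[i][j] is observation i of predictor j; target is the column used as Y. */
    static double rSquare(double[][] data, int target) {
        int n = data.length, p = data[0].length;
        double[][] x = new double[n][p]; // intercept column plus the p-1 remaining predictors
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i][0] = 1.0;
            int c = 1;
            for (int j = 0; j < p; j++) {
                if (j == target) y[i] = data[i][j];
                else x[i][c++] = data[i][j];
            }
        }
        double[] beta = solveNormalEquations(x, y);
        double mean = Arrays.stream(y).average().orElse(0.0);
        double ssr = 0.0, ssto = 0.0;
        for (int i = 0; i < n; i++) {
            double fitted = 0.0;
            for (int k = 0; k < p; k++) fitted += x[i][k] * beta[k];
            ssr += (y[i] - fitted) * (y[i] - fitted);   // squared residual
            ssto += (y[i] - mean) * (y[i] - mean);      // squared deviation from the mean
        }
        return 1.0 - ssr / ssto;
    }

    /** Solves (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting. */
    static double[] solveNormalEquations(double[][] x, double[] y) {
        int n = x.length, m = x[0].length;
        double[][] a = new double[m][m + 1]; // augmented matrix [X'X | X'y]
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++)
                for (int r = 0; r < n; r++) a[i][j] += x[r][i] * x[r][j];
            for (int r = 0; r < n; r++) a[i][m] += x[r][i] * y[r];
        }
        for (int col = 0; col < m; col++) {
            int piv = col;
            for (int r = col + 1; r < m; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = 0; r < m; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= m; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] beta = new double[m];
        for (int i = 0; i < m; i++) beta[i] = a[i][m] / a[i][i];
        return beta;
    }
}
```

Calling rSquare once per column reproduces the n regressions described above. Note that this sketch does not guard against exactly collinear regressors, for which the normal equations become singular.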
Tolerance is calculated from the following formula:
Tol = 1 - R2
where R2 is the OLS regression's R-square. More details can be found at http://en.wikipedia.org/wiki/Multicollinearity
VIF's formula is:
VIF = 1 / (1 - R2)
where R2 is again the OLS regression's R-square. More details can be found at http://en.wikipedia.org/wiki/Variance_inflation_factor
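As a quick worked example of the two formulas, the sketch below converts an R-square value into tolerance and VIF. The class and method names are hypothetical and only illustrate the arithmetic: an R2 of 0.9 gives a tolerance of about 0.1 and a VIF of about 10, and as R2 approaches 1 the VIF grows without bound.

```java
/** Hypothetical helper (not part of VIF_Tolerance): tolerance and VIF from an R-square value. */
final class VifFromR2Sketch {

    /** Tol = 1 - R2 */
    static double tolerance(double r2) {
        return 1.0 - r2;
    }

    /** VIF = 1 / (1 - R2); yields positive infinity when R2 is exactly 1. */
    static double vif(double r2) {
        return 1.0 / (1.0 - r2);
    }
}
```

Because VIF is simply the reciprocal of the tolerance, a tolerance cutoff of 0.2 corresponds to a VIF cutoff of 5.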
| Constructor and Description |
|---|
| VIF_Tolerance() |
| Modifier and Type | Method and Description |
|---|---|
| double[] | get_TOL() Returns the double[] array of tolerance values. |
| void | get_VIF_TOL_old(double[][] variables) Computes VIF (variance inflation factor) and tolerance using a one-out regression approach, where each predictor in turn is regressed on the others. Warning: it might be quite inefficient on large sets with many rows and columns. |
| void | get_VIF_TOL(double[][] variables) Computes VIF (variance inflation factor) and tolerance using a correlation matrix as the basis for the computations. |
| double[] | get_VIF() Returns the double[] array of VIF values. |
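For readers curious how a correlation matrix can serve as the basis for the computation, one common route relies on the fact that the j-th VIF equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below illustrates that route only. Whether get_VIF_TOL works exactly this way is an assumption, and the class name, method names, and layout (rows are observations, columns are predictors) are hypothetical.

```java
/** Hypothetical helper (not part of VIF_Tolerance): VIFs from the inverse correlation matrix. */
final class CorrelationVifSketch {

    /** Assumes data[i][j] is observation i of predictor j; returns one VIF per column. */
    static double[] vif(double[][] data) {
        double[][] inv = invert(correlation(data));
        double[] vif = new double[inv.length];
        for (int j = 0; j < inv.length; j++) vif[j] = inv[j][j]; // diagonal of R^-1
        return vif;
    }

    /** Pearson correlation matrix of the columns of data. */
    static double[][] correlation(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data)
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        double[][] s = new double[p][p]; // sums of cross-products of centered columns
        for (double[] row : data)
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++)
                    s[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]);
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                r[a][b] = s[a][b] / Math.sqrt(s[a][a] * s[b][b]);
        return r;
    }

    /** Gauss-Jordan inversion with partial pivoting; no check for singular input. */
    static double[][] invert(double[][] m) {
        int p = m.length;
        double[][] a = new double[p][2 * p]; // augmented matrix [M | I]
        for (int i = 0; i < p; i++) {
            System.arraycopy(m[i], 0, a[i], 0, p);
            a[i][p + i] = 1.0;
        }
        for (int col = 0; col < p; col++) {
            int piv = col;
            for (int r = col + 1; r < p; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            double d = a[col][col];
            for (int c = 0; c < 2 * p; c++) a[col][c] /= d;
            for (int r = 0; r < p; r++) {
                if (r == col) continue;
                double f = a[r][col];
                for (int c = 0; c < 2 * p; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[][] inv = new double[p][p];
        for (int i = 0; i < p; i++) System.arraycopy(a[i], p, inv[i], 0, p);
        return inv;
    }
}
```

This needs a single matrix inversion instead of one regression per predictor, which is the usual reason for preferring it on large sets. The tolerance values then follow as 1 / VIF.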
public void get_VIF_TOL_old(double[][] variables)
Parameters:
variables - the [k,n] array representing the x sample of predictors
Throws:
MathIllegalArgumentException - if the x set contains fewer than 2 variables
public void get_VIF_TOL(double[][] variables)
Parameters:
variables - the [k,n] array representing the x sample of predictors
public double[] get_VIF()
Returns the double[] array of VIF values.
This array holds as many values as there are variables (columns).
Watch out for very large values here.
public double[] get_TOL()
Returns the double[] array of tolerance values.
This array holds as many values as there are variables (columns).
Watch out for very low values here.