Why do we impute missing data?

Multiple imputation is a method for estimating observations that were originally missing: by making assumptions about how the missing data arose, we can re-estimate them. Missing data occur in almost every analysis, and the loss of data reduces the accuracy of any estimate. The idea of imputation is to replace the missing values with the most plausible values.

There are three different mechanisms of missingness that can occur:

  • Missing Completely at Random (MCAR)
  • Missing at Random (MAR)
  • Missing Not at Random (MNAR)
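
The three mechanisms differ in what the chance of a value being missing depends on. The sketch below simulates each of them on made-up data: under MCAR the chance is constant, under MAR it depends only on a fully observed variable x, and under MNAR it depends on the unobserved value y itself. The function names, the missingness probabilities (0.3 and 0.6) and the threshold rules are illustrative assumptions, not part of any particular method.

```python
import random


def simulate_missingness(n=10_000, seed=0):
    """Generate (x, y) pairs and delete y under three assumed mechanisms."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)      # fully observed covariate
        y = x + rng.gauss(0, 1)  # variable that will have missing values
        rows.append((x, y))

    def mask(prob):
        # Replace y by None with a row-specific probability prob(x, y).
        return [None if rng.random() < prob(x, y) else y for x, y in rows]

    return {
        "full": [y for _, y in rows],
        # MCAR: missingness ignores both x and y.
        "MCAR": mask(lambda x, y: 0.3),
        # MAR: missingness depends only on the observed x.
        "MAR": mask(lambda x, y: 0.6 if x > 0 else 0.0),
        # MNAR: missingness depends on the value of y that goes missing.
        "MNAR": mask(lambda x, y: 0.6 if y > 0 else 0.0),
    }


def observed_mean(values):
    """Mean of the values that are still observed (complete cases)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


if __name__ == "__main__":
    data = simulate_missingness()
    for name, values in data.items():
        print(name, round(observed_mean(values), 3))
```

In this simulation the complete-case mean of y stays close to the full-data mean under MCAR, but is pulled downwards under MAR and MNAR because larger values are removed more often. Under MAR the observed x carries the information needed to correct for this, whereas under MNAR it does not.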


Terminology

Univariable: One single predictor variable
Univariate: One single outcome variable
Multivariable: More than one predictor variable
Multivariate: More than one outcome variable
Sum of Squares: Sum of the squared deviations of individual points from the mean, e.g. for x, \(SS_{xx} = \sum_{i=1}^{n}(x_{i} - \bar{x})^{2}\)
Sum of Cross Products: Sum of the products of the deviations of the outcome variable, y, and the predictor variable, x, from their respective means, \(SS_{xy} = \sum_{i=1}^{n}(y_{i} - \bar{y})(x_{i} - \bar{x})\)
Variance Estimator: As the MLE of the variance is biased downwards (the residuals are measured from the estimated line, which uses up two degrees of freedom), the following estimator is used instead: \({\hat{\sigma}}^2 = \frac{1}{n-2}\sum_{i=1}^{n} {\hat{\varepsilon}}_{i}^{2} = \sum_{i=1}^{n}\frac{(y_{i} - {\hat{\alpha}} - {\hat{\beta}}x_{i})^{2}}{n-2}\)


Dated: Feb 2019
