Health Journalism Glossary

Imputation

In biostatistics, the results of calculations are only as good as the data used to generate them. When too many data are missing, the analysis suffers: the gaps can introduce bias, make results less reliable, and reduce the statistical efficiency of the calculations. In some cases, missing data can be left alone. But if the gaps are too large, or the missing fields are essential to the calculation, researchers and statisticians must sometimes substitute values for the missing data to carry out the analysis. Imputation is the replacement of those missing values with statistically derived estimates.

Deeper dive
Imputed data are not random numbers that researchers make up. Instead, imputation involves estimating appropriate, likely values or ranges for the missing data, either by using averages from the existing data, drawing on a different data set, or making assumptions about the data based on other information. Whenever researchers impute missing data, the study should explain what method they used and why they chose it. Ideally, the limitations section should also address any potential biases that could have resulted from the imputation. If it doesn’t, it’s wise to ask the researchers and/or outside sources about potential bias arising from the imputed data.
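
To make the "averages from the existing data" approach concrete, here is a minimal Python sketch of mean imputation, one of the simplest methods. The function name `impute_with_mean` and the blood pressure readings are hypothetical, chosen only for illustration; real studies often use more sophisticated methods and should state which one they chose.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical systolic blood pressure readings; None marks a missing value.
readings = [120, None, 130, 110, None]
imputed = impute_with_mean(readings)
print(imputed)
```

Every gap here receives the same value, the average of the three observed readings. That keeps the overall average unchanged but makes the data look less variable than they probably were, which is one example of the kind of bias a limitations section should discuss.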
