We usually want to avoid missing values in our data as much as possible, because they can reduce statistical power, bias results, and complicate modeling.
Interestingly, though, inserting missing values on purpose can be very useful. For example, it lets you compare imputation methods under controlled conditions, test how robust your results are when assumptions change, and create realistic practice datasets for tutorials and courses.
In R, a convenient way to do this is data amputation with mice::ampute(). The visualization below shows how the function can insert different missingness mechanisms into the same dataset. Under MCAR (missing completely at random), missing values are spread randomly across the data. Under MAR (missing at random), the missing values in x1 shift mainly along the x2 axis, meaning their occurrence is explained by an observed variable. Under MNAR (missing not at random), the shift happens mainly along the x1 axis, meaning the missingness depends on the values of the incomplete variable itself. Note: The code shown below the graph is a simplified example and does not reproduce the exact same output.
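Below is a minimal R sketch of the MCAR/MAR/MNAR comparison. It assumes the mice package is installed and uses simulated data. The variables x1 and x2, the seed, the sample size, and the helper summarise_shift() are illustrative choices, not the code behind the graph. The sketch relies on ampute()'s default weights, which (as I understand the documentation) put weight on the observed variables under MAR and on the to-be-amputed variables under MNAR. Because the complete data are still available, you can check directly where the deleted values came from.

```r
# Minimal sketch (assumed setup): two correlated numeric variables,
# with x1 as the only variable that receives missing values.
library(mice)

set.seed(2024)
n  <- 2000
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n, sd = 0.8)
dat <- data.frame(x1 = x1, x2 = x2)

# One missingness pattern: 0 = variable is made missing, 1 = stays observed.
pat <- matrix(c(0, 1), nrow = 1)

# Amputate the same complete data under each mechanism.
# prop = 0.3 asks for roughly 30% incomplete rows.
mechs <- c("MCAR", "MAR", "MNAR")
amps <- lapply(mechs, function(m) {
  ampute(dat, prop = 0.3, patterns = pat, mech = m)$amp
})
names(amps) <- mechs

# Hypothetical helper: compare complete-data means of x1 and x2 in rows
# where x1 was deleted versus rows where x1 was kept.
summarise_shift <- function(amp, complete) {
  miss <- is.na(amp$x1)
  c(prop_missing = mean(miss),
    x1_gap = mean(complete$x1[miss]) - mean(complete$x1[!miss]),
    x2_gap = mean(complete$x2[miss]) - mean(complete$x2[!miss]))
}

# One row per mechanism, one column per summary.
shift <- t(sapply(amps, summarise_shift, complete = dat))
print(round(shift, 2))
```

Under MCAR, both gaps should be close to zero. Under MAR, the x2 gap should be the larger one. Under MNAR, the x1 gap should be the larger one. Because x1 and x2 are correlated, the other variable also shifts somewhat.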
Thanks to Rianne Schouten for introducing me to the ampute() function. Make sure to check out this article by Schouten et al. (2022), which explains the topic in more detail:
rianneschouten.github.io/mic…
I’ve also added a new bonus module on data amputation to my Missing Data Imputation in R course. It introduces the core concept and walks through a reproducible R example.
If you’d like to explore the full course, you can still join and get lifetime access to all materials:
statisticsglobe.com/online-c…
Talk to you soon.
Joachim
#rstats #datascience #statistics #missingdata