Censored Data
With truncated or censored data, parameter estimates are biased, typically underestimated, if the truncation or censoring is ignored (see the PyMC example).
Note that truncated data means data points outside a certain range are not included at all, while censored data means data points outside a certain range are kept but clipped to the boundary value.
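The difference is easy to see on simulated data. The snippet below is a minimal sketch using only the Python standard library; the bounds, the sample size, and the variable names are illustrative assumptions, not taken from the PyMC example.

```python
import random

random.seed(0)

# Hypothetical bounds chosen for illustration only.
lower, upper = -1.0, 1.0
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Truncation: points outside [lower, upper] are dropped entirely.
truncated = [x for x in samples if lower <= x <= upper]

# Censoring: points outside [lower, upper] are kept but clipped to the bound.
censored = [min(max(x, lower), upper) for x in samples]

print(len(samples), len(truncated), len(censored))
print("values sitting exactly on the upper bound:", censored.count(upper))
```

Truncation shrinks the dataset, while censoring keeps its size and piles up values on the bounds.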
There are three approaches to handle censored data:
- Impute the censored data with a value, e.g. mean, median, or mode
- Add non-negative random noise to the censored data, fit separate models to the censored and uncensored data, then pool the parameters of the two models (see the PyMC example)
- Use the censored or truncated distribution directly (censored distribution, truncated distribution). This solves the bias problem by updating the likelihood function to reflect our knowledge about the data-generating process, namely that we have zero probability of observing data outside the upper and lower bounds, so the limited range of observed data won’t shrink the posterior distribution.
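The last approach can be sketched without PyMC by writing the censored likelihood by hand. This is a minimal illustration, not PyMC's implementation: it assumes a normal model, made-up true parameters and bounds, and a crude grid search in place of posterior sampling. The function name `censored_normal_loglik` is hypothetical. Interior points contribute the usual density, while points sitting on a bound contribute the probability mass beyond that bound.

```python
import math
import random
from statistics import NormalDist, fmean

def censored_normal_loglik(data, mu, sigma, lower, upper):
    """Log-likelihood of a normal model for data clipped to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    # Probability mass that censoring piles up on each bound
    # (floored to avoid log(0) for extreme parameter values).
    log_p_low = math.log(max(dist.cdf(lower), 1e-300))
    log_p_high = math.log(max(1.0 - dist.cdf(upper), 1e-300))
    total = 0.0
    for x in data:
        if x <= lower:
            total += log_p_low
        elif x >= upper:
            total += log_p_high
        else:
            total += math.log(max(dist.pdf(x), 1e-300))
    return total

random.seed(1)
true_mu, true_sigma = 1.0, 1.0   # assumed ground truth for the simulation
lower, upper = -0.5, 1.5         # assumed censoring bounds
raw = [random.gauss(true_mu, true_sigma) for _ in range(400)]
data = [min(max(x, lower), upper) for x in raw]  # censor by clipping

# Naive estimate: treat the clipped values as if they were real observations.
naive_mu = fmean(data)

# Crude grid search for the maximum-likelihood estimate under censoring.
mu_grid = [i * 0.02 for i in range(101)]          # 0.0 to 2.0
sigma_grid = [0.5 + i * 0.05 for i in range(21)]  # 0.5 to 1.5
mu_hat, sigma_hat = max(
    ((m, s) for m in mu_grid for s in sigma_grid),
    key=lambda p: censored_normal_loglik(data, p[0], p[1], lower, upper),
)

print("naive mean:", round(naive_mu, 3))
print("censored-likelihood estimate:", round(mu_hat, 3), round(sigma_hat, 3))
```

Because the upper bound clips more of the mass than the lower one in this setup, the naive mean is pulled below the true value, while the censored likelihood should land much closer to it. In PyMC the same idea is expressed by wrapping the distribution in its censored or truncated variant instead of hand-coding the likelihood.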