
Central Limit Theorem


When does the Central Limit Theorem not hold?

When its conditions are not met:

  1. the random variables must be iid (this can be weakened in some variants, e.g. the Lindeberg and Lyapunov CLTs)
  2. each variable must have finite variance (the simulation after this list shows what goes wrong when this fails)
  3. the sample size should be large enough (n >= 30 as a rule of thumb)
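A minimal simulation sketch of the finite-variance condition, assuming NumPy; the specific distributions (Exponential vs. Cauchy) are my illustrative choices, not from the post. Sample means of Exponential(1) draws standardize to roughly standard normal, while sample means of Cauchy draws are themselves Cauchy, so the CLT never kicks in no matter how large n gets.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 5_000

# Exponential(1): iid with finite variance. The standardized sample mean
# should be approximately standard normal, so |z| < 1.96 about 95% of the time.
exp_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
z = (exp_means - 1.0) * np.sqrt(n)  # Exponential(1) has mean 1 and sd 1
print("exponential, share of |z| < 1.96:   ", np.mean(np.abs(z) < 1.96))  # ~0.95

# Standard Cauchy: infinite variance. The mean of n Cauchy draws is itself
# standard Cauchy, so this coverage stays near 0.70 no matter how large n is.
cauchy_means = rng.standard_cauchy(size=(trials, n)).mean(axis=1)
print("cauchy, share of |mean| < 1.96:     ", np.mean(np.abs(cauchy_means) < 1.96))  # ~0.70
```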

Why do we always use the Normal Distribution?

First, many distributions we wish to model are truly close to being normal distributions. The central limit theorem shows that the sum of many independent random variables is approximately normally distributed. This means that in practice, many complicated systems can be modeled successfully as normally distributed noise, even if the system can be decomposed into parts with more structured behavior.

Second, out of all possible probability distributions with the same variance, the normal distribution encodes the maximum amount of uncertainty over the real numbers. We can thus think of the normal distribution as being the one that inserts the least amount of prior knowledge into a model.

— from Deep Learning, section 3.9.3
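A small numerical check of the second claim, assuming SciPy is available; the comparison set (Laplace and uniform, both scaled to unit variance) is my choice for illustration. Among these unit-variance distributions, the normal has the largest differential entropy.

```python
import numpy as np
from scipy import stats

# Three distributions, each scaled to have variance exactly 1.
unit_variance = {
    "normal":  stats.norm(loc=0.0, scale=1.0),
    "laplace": stats.laplace(scale=1.0 / np.sqrt(2)),                # Var = 2b^2 = 1
    "uniform": stats.uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)), # Var = width^2/12 = 1
}
for name, dist in unit_variance.items():
    print(f"{name:8s} var={float(dist.var()):.3f}  entropy={float(dist.entropy()):.4f} nats")
# normal (~1.419) > laplace (~1.347) > uniform (~1.242)
```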

In many applications, the error term aggregates many miscellaneous factors not captured by the regressors, i.e. $\epsilon_i$ captures all influences on $y_i$ other than $x_i$. The Central Limit Theorem then suggests that the error term is approximately normally distributed.
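An illustrative simulation of this point, assuming NumPy and SciPy; the model and all its parameters are hypothetical. Each unmodeled influence is a small, heavily skewed shock, yet their sum, which plays the role of $\epsilon_i$, comes out nearly symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_obs, n_factors = 5_000, 50

x = rng.uniform(0.0, 10.0, size=n_obs)

# Each unmodeled influence is a small, centered, heavily skewed shock
# (a shifted exponential), so no single factor is anywhere near normal.
shocks = rng.exponential(scale=0.1, size=(n_obs, n_factors)) - 0.1
eps = shocks.sum(axis=1)      # the aggregate error term
y = 1.0 + 2.0 * x + eps       # hypothetical regression model

# The CLT pulls the aggregate toward normality: skewness drops from ~2
# for a single exponential factor to ~2/sqrt(50) ~= 0.28 for the sum.
print("skew of a single factor:", stats.skew(shocks[:, 0]))
print("skew of the sum (eps):  ", stats.skew(eps))
```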

Further reading: "Dissecting Deep Learning (1): Why is the Normal Distribution so useful?" (剖析深度學習 (1))
