Daily data science tweets from @JohnDCook.

Joined November 2010
“Statistics is hard, especially when effects are small and variable and measurements are noisy. There are no quick fixes … and a formulaic approach to statistics is a principal cause of the current replication crisis.” — Blakeley McShane
'We can only connect the dots that we collect.' -- Amanda Palmer
The sample variance of normally distributed data has a gamma distribution.
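A quick simulation makes this concrete. Assume n independent draws from a normal distribution with standard deviation σ. Then (n - 1)s²/σ² is chi-squared with n - 1 degrees of freedom. So s² is gamma with shape (n - 1)/2 and scale 2σ²/(n - 1). The sketch below compares the simulated mean and variance of s² with those of that gamma distribution. The helper name simulate_sample_variances and the values n = 5 and σ = 2 are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_sample_variances(n, sigma, trials, seed=0):
    """Draw `trials` independent normal samples of size n; return their sample variances."""
    rng = random.Random(seed)
    return [
        # statistics.variance uses the n - 1 denominator
        statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]

n, sigma = 5, 2.0
shape = (n - 1) / 2              # gamma shape k
scale = 2 * sigma**2 / (n - 1)   # gamma scale theta

variances = simulate_sample_variances(n, sigma, trials=20_000)
print("simulated mean:", statistics.fmean(variances), " gamma mean:", shape * scale)
print("simulated var: ", statistics.variance(variances), " gamma var: ", shape * scale**2)
```

Both summaries should land close to the gamma values σ² = 4 and 2σ⁴/(n - 1) = 8. The agreement is up to simulation error.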
'Variance is the tendency to learn random things irrespective of the real signal.' -- Pedro Domingos
Common sense and statistics andrewgelman.com/2014/12/25/…

Looks like the US Census won’t be using differential privacy any more. x.com/grok/status/2064492352…

Jun 9
The Commerce Dept just banned "noise infusion" for Census & economic stats. Noise infusion = adding random fake tweaks to real data so no one can ID individuals (a privacy trick, aka differential privacy). It protects secrets but slightly distorts the numbers. New policy: Don't do that. Use "coarsening" instead — round numbers, lump data into bigger groups, or blank out specifics only if needed. Goal: Keep the published stats as accurate and trustworthy as possible while still hiding personal info.
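Here is a minimal sketch of the two approaches, assuming a single published count.

- Noise infusion is illustrated with the Laplace mechanism, a standard differential-privacy building block.
- Coarsening is illustrated by rounding counts and by replacing exact ages with bands.

All function names and parameter values are hypothetical. This is not the Census Bureau's actual production method.

```python
import random

# Hypothetical helpers for illustration only; not the Census Bureau's method.

def laplace_noise(scale, rng):
    """Laplace(0, scale) draw: the difference of two iid exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1): noise scale is 1/epsilon."""
    return true_count + laplace_noise(1 / epsilon, rng)

def coarsen_count(true_count, base=10):
    """Coarsening: publish the count rounded to the nearest multiple of `base`."""
    return base * round(true_count / base)

def age_band(age, width=10):
    """Coarsening: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

rng = random.Random(42)
print(noisy_count(1234, epsilon=0.5, rng=rng))  # random; depends on the seed
print(coarsen_count(1234))                      # 1230
print(age_band(37))                             # 30-39
```

The trade-off mirrors the post. The noisy count is correct on average but changes from draw to draw. The coarsened values are deterministic, but coarsening on its own carries no formal differential-privacy guarantee.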
A Grammar of Graphics for Python plotnine.readthedocs.io/en/s…

Sample variance (dividing by n - 1) is an unbiased estimator of the population variance for independent samples from any distribution with finite variance.
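The claim can be checked exactly, without simulation, by enumerating every possible sample from a small discrete distribution. The distribution below is an arbitrary choice for illustration. Exact rational arithmetic via fractions.Fraction avoids any rounding error.

```python
from fractions import Fraction
from itertools import product

# Arbitrary small discrete distribution: value -> probability.
dist = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}

mu = sum(p * x for x, p in dist.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in dist.items())  # population variance

def expected_estimator(n, denominator):
    """Exact expectation of sum((x_i - xbar)^2) / denominator over all iid samples of size n."""
    total = Fraction(0)
    for sample in product(dist, repeat=n):  # every ordered sample of size n
        prob = Fraction(1)
        for x in sample:
            prob *= dist[x]
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)
        total += prob * ss / denominator
    return total

n = 3
print(sigma2)
print(expected_estimator(n, n - 1))  # equals sigma2: unbiased
print(expected_estimator(n, n))      # equals (n - 1)/n * sigma2: biased low
```

Dividing by n instead of n - 1 gives an estimator whose expectation is (n - 1)/n times the true variance. That shortfall is why the n - 1 version is the unbiased one.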
Relationship between PCA and SVD stats.stackexchange.com/ques…
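The relationship can be stated briefly. Suppose the centered data matrix has SVD Xc = U S Vᵀ. Then:

- The columns of V are the principal directions.
- The eigenvalues of the sample covariance matrix are S²/(n - 1).
- The principal component scores are U S.

Below is a minimal NumPy sketch, assuming randomly generated correlated data, that checks both routes agree. Eigenvectors are only defined up to sign, so the direction comparison uses absolute values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-column data (mixing matrix chosen arbitrarily for illustration).
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])

Xc = X - X.mean(axis=0)  # PCA works on centered data
n = Xc.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]       # sort descending to match the SVD
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix, Xc = U S V^T.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(eigvals, S**2 / (n - 1)))        # variances along the PCs
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))  # same directions, up to sign
scores = Xc @ Vt.T                                 # principal component scores
print(np.allclose(scores, U * S))                  # equivalently U S
```

In practice the SVD route is usually preferred because it never forms XᵀX explicitly, which avoids squaring the condition number.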

Karl Pearson developed moment matching estimators in the late 19th century.
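The method of moments sets population moments equal to sample moments and solves for the parameters. Here is a sketch for the gamma distribution, which is an illustrative choice and not one from the original post. Its mean is kθ and its variance is kθ². Solving gives k = mean²/variance and θ = variance/mean. The function name is hypothetical.

```python
import statistics

def gamma_method_of_moments(data):
    """Match the gamma mean k*theta and variance k*theta^2 to the sample moments."""
    m = statistics.fmean(data)
    v = statistics.pvariance(data)  # second central sample moment (n denominator)
    shape = m * m / v               # k = mean^2 / variance
    scale = v / m                   # theta = variance / mean
    return shape, scale

print(gamma_method_of_moments([1, 2, 3, 4, 5]))  # shape 4.5, scale 2/3
```

Moment estimators are easy to compute but generally less efficient than maximum likelihood. They are often used as starting values for an iterative fit.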
Bayesian nonparametric regression = Gaussian processes = Kriging
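Here is a minimal NumPy sketch of the correspondence. It assumes a zero prior mean and a squared-exponential covariance, both illustrative choices. The GP posterior mean, k*ᵀ(K + σ²I)⁻¹y, becomes the simple kriging predictor as the noise σ² goes to zero. Ordinary kriging, which estimates an unknown constant mean, differs slightly. The function names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean of a zero-mean GP; as noise -> 0 this is the simple kriging predictor."""
    K = sq_exp_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_test, x_train, **kernel_args)
    weights = np.linalg.solve(K, y_train)  # K^{-1} y without forming the inverse
    return k_star @ weights

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)
x_new = np.linspace(0.0, 4.0, 9)
print(gp_posterior_mean(x, y, x_new))
```

Solving the linear system with np.linalg.solve, rather than inverting K, is the usual numerically safer choice. The tiny noise term keeps K well conditioned. Far from the data, the prediction reverts to the prior mean of zero.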
What do practitioners need to know about regression? andrewgelman.com/2010/12/05/…

The sum of (x_i - x)² over a set of data points x_i is minimized when x is the sample mean.
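A one-line derivation: add and subtract the sample mean x̄ inside the square. The cross term vanishes because the deviations from x̄ sum to zero.

```latex
\sum_{i=1}^{n} (x_i - x)^2
  = \sum_{i=1}^{n} \bigl((x_i - \bar{x}) + (\bar{x} - x)\bigr)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2
    + 2(\bar{x} - x)\sum_{i=1}^{n} (x_i - \bar{x})
    + n(\bar{x} - x)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - x)^2
```

The second term is nonnegative and equals zero only when x = x̄. So the sum is smallest exactly at the sample mean.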
p-values are kinda weird when you think about them carefully. Screenshot from slides here: fharrell.com/talk/bguide/
Ridge regression uses an L2 regularizer. Lasso uses L1. Elastic net uses a convex combination of L2 and L1.
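Here is a minimal sketch of the three penalties. The elastic net mixing weight is called alpha, a hypothetical name since libraries differ. In each method, the penalty is multiplied by a tuning constant and added to the least-squares loss. Conventions vary: glmnet, for example, puts a factor of 1/2 on the squared L2 term.

```python
def l2_penalty(beta):
    """Ridge penalty: squared L2 norm of the coefficients."""
    return sum(b * b for b in beta)

def l1_penalty(beta):
    """Lasso penalty: L1 norm of the coefficients."""
    return sum(abs(b) for b in beta)

def elastic_net_penalty(beta, alpha):
    """Convex combination: alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l1_penalty(beta) + (1 - alpha) * l2_penalty(beta)

beta = [3.0, -4.0, 0.0]
print(l2_penalty(beta), l1_penalty(beta), elastic_net_penalty(beta, 0.5))  # 25.0 7.0 16.0
```

The L1 term's kink at zero is what lets the lasso set coefficients exactly to zero. The smooth L2 term only shrinks them.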
The blessing of dimensionality andrewgelman.com/2004/10/27/…
