Data Science Fact

Pinned Tweet

Data Science Fact @DataSciFact

Apr 30

“Statistics is hard, especially when effects are small and variable and measurements are noisy. There are no quick fixes … and a formulaic approach to statistics is a principal cause of the current replication crisis.” — Blakeley McShane

Data Science Fact @DataSciFact

Jun 12

Occam's razor and Bayes' theorem johndcook.com/blog/2011/01/1…

Occam's razor says that if two models fit equally well, the simpler model is likely to be a better description of reality. Why should that be? A paper by Jim Berger suggests a Bayesian justification...

Data Science Fact @DataSciFact

Jun 12

'We can only connect the dots that we collect.' -- Amanda Palmer

Data Science Fact @DataSciFact

Jun 11

The sample variance of normally distributed data has a gamma distribution.
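The fact above follows from (n - 1) S^2 / sigma^2 having a chi-squared distribution with n - 1 degrees of freedom, which makes S^2 a gamma variable with shape (n - 1) / 2 and scale 2 sigma^2 / (n - 1). A minimal simulation sketch, assuming i.i.d. normal draws and the n - 1 divisor; the function name, sample size, and seed are illustrative choices, not part of the original tweet:

```python
import random
import statistics

def simulate_sample_variances(n, sigma, trials, seed=0):
    """Draw `trials` samples of size n from N(0, sigma^2); return their sample variances."""
    rng = random.Random(seed)
    return [
        statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])  # n - 1 divisor
        for _ in range(trials)
    ]

n, sigma, trials = 5, 2.0, 20000
variances = simulate_sample_variances(n, sigma, trials)

# Gamma parameters implied by (n - 1) * S^2 / sigma^2 ~ chi-squared(n - 1)
shape = (n - 1) / 2
scale = 2 * sigma**2 / (n - 1)

gamma_mean = shape * scale       # equals sigma^2
gamma_var = shape * scale**2     # equals 2 * sigma^4 / (n - 1)

# The simulated moments should land close to the gamma moments.
print(statistics.fmean(variances), gamma_mean)
print(statistics.variance(variances), gamma_var)
```

Matching the first two moments is only a sanity check, not a proof of the distributional claim; a histogram against the gamma density would be a stronger visual check.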

Data Science Fact @DataSciFact

Jun 11

'Variance is the tendency to learn random things irrespective of the real signal.' -- Pedro Domingos

Data Science Fact @DataSciFact

Jun 10

Common sense and statistics andrewgelman.com/2014/12/25/…

Data Science Fact @DataSciFact

Jun 10

Buggy code is biased code johndcook.com/blog/2010/10/1…

Buggy simulation code is biased code because you're more likely to discover bugs that contradict your expectations than bugs that confirm your expectations.

Data Science Fact @DataSciFact

Jun 10

Looks like the US Census won’t be using differential privacy any more. x.com/grok/status/2064492352…

Grok @grok

Jun 9

Replying to @BasedBasset @censusSDC @CommerceGov

The Commerce Dept just banned "noise infusion" for Census & economic stats. Noise infusion = adding random fake tweaks to real data so no one can ID individuals (a privacy trick, aka differential privacy). It protects secrets but slightly distorts the numbers. New policy: Don't do that. Use "coarsening" instead — round numbers, lump data into bigger groups, or blank out specifics only if needed. Goal: Keep the published stats as accurate and trustworthy as possible while still hiding personal info.
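To make the contrast in the reply concrete, here is a rough sketch of the two ideas on made-up counts. It assumes Laplace noise as the noise-infusion mechanism and rounding to a multiple of 10 as the coarsening rule; neither is stated in the thread, the function names are hypothetical, and this is not the Census Bureau's actual procedure.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noise_infuse(counts, scale, seed=0):
    """Add independent Laplace noise to each count (assumed mechanism)."""
    rng = random.Random(seed)
    return [c + laplace_noise(scale, rng) for c in counts]

def coarsen(counts, base=10):
    """Round each count to the nearest multiple of `base`."""
    return [base * round(c / base) for c in counts]

counts = [3, 17, 42, 128]  # made-up cell counts, for illustration only

print(noise_infuse(counts, scale=2.0))  # perturbed values, random but seeded
print(coarsen(counts))                  # [0, 20, 40, 130]
```

Noise infusion changes every published number by a random amount, while coarsening publishes exact but less detailed values.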

Data Science Fact @DataSciFact

Jun 9

A Grammar of Graphics for Python plotnine.readthedocs.io/en/s…

Data Science Fact @DataSciFact

Jun 9

Sample variance is an unbiased estimator of variance for any distribution with finite variance.
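A quick way to see this on non-normal data is to average many sample variances from an exponential distribution with rate 1, whose true variance is 1. This is a sketch under the assumptions of i.i.d. draws and the n - 1 divisor; the function name, sample size, and seed are illustrative.

```python
import random
import statistics

def average_estimates(n, trials, seed=1):
    """Average the n - 1 and n divisor variance estimates over many samples."""
    rng = random.Random(seed)
    unbiased_total = 0.0
    biased_total = 0.0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # true variance is 1
        unbiased_total += statistics.variance(sample)      # divides by n - 1
        biased_total += statistics.pvariance(sample)       # divides by n
    return unbiased_total / trials, biased_total / trials

unbiased_avg, biased_avg = average_estimates(n=4, trials=20000)

# The n - 1 version should average near 1; the n version near (n - 1) / n = 0.75.
print(unbiased_avg, biased_avg)
```

The gap between the two averages is the bias that the n - 1 divisor removes.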

Data Science Fact @DataSciFact

Jun 8

New data, not just bigger data johndcook.com/blog/2015/10/2…

New data, not just big data

The term "big data" is misleading to the extent it implies greater quantities of what we've had before. We don't just have more data, we have new data.

Data Science Fact @DataSciFact

Jun 8

Relationship between PCA and SVD stats.stackexchange.com/ques…

Data Science Fact @DataSciFact

Jun 5

Karl Pearson developed moment matching estimators in the late 19th century.

Data Science Fact @DataSciFact

Jun 5

Bayesian nonparametric regression = Gaussian processes = Kriging

Data Science Fact @DataSciFact

Jun 4

Conditional independence notation X ⫫ Y johndcook.com/blog/2020/03/2…

Random variables being independent is analogous to lines being perpendicular, so a variation on the symbol for perpendicular is used to denote independence.
