Unexpectedly Intriguing!
16 August 2024

The Inventions In Everything team has covered a lot of unusual patents over the years and is used to seeing really strange things, but even they were dumbfounded after being presented with the following chart. If you read the headline on this article, you already know something remarkable is afoot!

Spurious Correlations: American cheese consumption correlates with Patents Granted in the US

If you know your correlation coefficients, you can see from the information presented with the chart that the correlation between the per capita consumption of American cheese in the U.S. and the total number of patents granted in the U.S. appears to be both strong and statistically significant.

What could possibly explain the strength of that correlation? Could the inspiration behind the visions of American inventors be explained by their consumption of American cheese? Then again, maybe it's not the inventors but the patent examiners instead. Could an increased intake of American cheese be making them more productive in issuing patents?

Or maybe, just maybe, someone's playing fast and loose with their data analysis and this whole cheesy correlation is a carefully invented fraud.

As much as we'd love to be able to definitively link the total number of patents issued to the individual consumption of American cheese, and despite the existence of an AI-generated paper proclaiming that such a relationship is real, the truth is it is all a spurious correlation. Tyler Vigen did some serious data dredging to uncover the unlikely relationship. Here, he explains why it works:

Why this works

Data dredging: I have 25,153 variables in my database. I compare all these variables against each other to find ones that randomly match up. That's 632,673,409 correlation calculations! This is called “data dredging.” Instead of starting with a hypothesis and testing it, I instead abused the data to see what correlations shake out. It’s a dangerous way to go about analysis, because any sufficiently large dataset will yield strong correlations completely at random.
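To get a feel for the data dredging Vigen describes, here is a minimal sketch in Python. It is not his code or his database. It assumes 200 made-up variables of pure random noise, each observed for 10 "years", and counts how many pairs happen to correlate strongly. The function names, the 0.9 threshold, and the sizes are all our own illustrative choices.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def dredge(n_vars=200, n_years=10, threshold=0.9, seed=0):
    """Compare every pair of unrelated random series; count the strong matches.

    All sizes here are hypothetical, chosen only to keep the run short.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    hits = [
        (i, j)
        for i, j in pairs
        if abs(pearson(series[i], series[j])) >= threshold
    ]
    return len(pairs), len(hits)


if __name__ == "__main__":
    total, strong = dredge()
    print(f"{strong} of {total} unrelated pairs have |r| >= 0.9")
```

Every series is noise by construction, so any pair that clears the threshold is a match found by chance alone. Raising `n_vars` increases the number of comparisons quadratically, which is why a database of tens of thousands of variables can be expected to turn up plenty of impressive-looking coincidences.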

Lack of causal connection: There is probably no direct connection between these variables, despite what the AI says above. This is exacerbated by the fact that I used "Years" as the base variable. Lots of things happen in a year that are not related to each other! Most studies would use something like "one person" instead of "one year" to be the "thing" studied.

Observations not independent: For many variables, sequential years are not independent of each other. If a population of people is continuously doing something every day, there is no reason to think they would suddenly change how they are doing that thing on January 1. A simple p-value calculation does not take this into account, so mathematically it appears less probable than it really is.
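Vigen's point about non-independent years can be illustrated with a small simulation. The sketch below is our own and assumes a deliberately simple model: a "random walk", in which each year's value is last year's value plus fresh noise, stands in for a trending annual series, and it is compared with "white noise", in which every year is independent. Neither is meant to represent the actual cheese or patent data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def random_walk(rng, n):
    """Each value builds on the previous one, so successive years are linked."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out


def white_noise(rng, n):
    """Each value is drawn independently of every other year."""
    return [rng.gauss(0, 1) for _ in range(n)]


def mean_abs_r(make_series, n_years=20, trials=500, seed=1):
    """Average |r| between pairs of series that are unrelated by construction."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = make_series(rng, n_years)
        b = make_series(rng, n_years)
        total += abs(pearson(a, b))
    return total / trials


if __name__ == "__main__":
    print("independent years:", round(mean_abs_r(white_noise), 2))
    print("linked years:     ", round(mean_abs_r(random_walk), 2))
```

In both cases the two series in each pair have nothing to do with each other, yet the linked-year series tend to show noticeably larger correlations. A p-value that treats each year as an independent observation does not account for that, which is the overstatement Vigen warns about.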

Confounding variable: 2020 is particularly different from the other years on this graph. Confounding variables (like global pandemics) will cause two variables to look connected when in fact a "sneaky third" variable is influencing both of them behind the scenes.
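The "sneaky third" variable can also be sketched in a few lines. In this hypothetical setup, which we made up for illustration, a hidden factor `z` pushes both `x` and `y` around, and the two never influence each other directly. The partial correlation, which is the standard formula for the correlation between `x` and `y` after accounting for `z`, shows how much of the apparent link is left once the confounder is controlled for.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=2000, seed=2):
    """Hypothetical data: z drives both x and y; x and y share nothing else."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    return x, y, z


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    x, y, z = simulate()
    print("raw correlation of x and y:", round(pearson(x, y), 2))
    print("after controlling for z:   ", round(partial_corr(x, y, z), 2))
```

The raw correlation is strong, but it largely vanishes once `z` is taken into account. In real data the confounder is rarely measured this neatly, so a sketch like this only shows the mechanism, not a way to rescue a dredged result.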

Y-axis doesn't start at zero: I truncated the Y-axes of the graph above. I also used a line graph, which makes the visual connection stand out more than it deserves. Mathematically what I showed is true, but it is intentionally misleading.

Vigen's website is a treasure trove of similarly dubious connections, and of course, he's the author of a book on the topic. He has also recorded a video overview of what can make a correlation spurious.

Still, these kinds of things are fun to think about. If you know any inventors or patent examiners, don't be afraid to send them some good old-fashioned American cheese to snack on so they can keep their productive output up. Just in case it might really work.

References

Tyler Vigen. Spurious correlation #2,196. tylervigen.com. Creative Commons Attribution 4.0 International (CC BY 4.0).
