Political Calculations
May 13, 2008

Should we rebuild from scratch the tool we may very well have lost forever, or should we wait five to seven business days to see if the code can be recovered?

It's almost a classic economics question. We've already sunk time and effort into creating the tool we had planned to post yesterday, which argues in favor of waiting until the code is recovered from our no-longer-functioning USB flash drive. But that doesn't take into account the risk involved in just letting things ride. We don't know for certain that the data on our flash drive can ever be recovered. If it can't, that argues in favor of rebuilding the tool, taking advantage of what we learned in building it the first time to make the process go quickly.

As you're about to find out, our answer to that question is that we should do both.

One of the things that we've learned over time is that when it comes to coding a major project, it really helps to use a modular approach. That way, we can redeploy code as needed to save time and effort in future projects. In this situation, it occurred to us that we could rebuild just one portion of the code, the portion that mathematically models the distribution of household taxpayer Adjusted Gross Income (AGI) in the United States, and use it in a different application that we would very likely have developed anyway.

More than that, it allowed us to revisit the household income distribution model that we had previously created, exploring an option that we hadn't previously considered: Could we make a model that more accurately reflects reality if we broke the overall distribution into segments?

The answer, luckily, was yes. Better still, should our potentially lost code be recovered, we can simply swap out its previous income distribution model for the new one. That way, we won't have to rebuild the entire tool from scratch unless and until we find out for sure that the rest of the code is gone. Even better, the tool will improve, since we've made the model behind it much more accurate!

Speaking of which, here's that new distribution, for which we used ZunZun's invaluable 2D Function Finder to identify the NIST Hahn distribution as being particularly good. We've split the distribution into four main segments, based upon AGIs of $0-$28,038, $28,038-$31,630, $31,630-$206,409, and $206,409 on up:

2005 U.S. Cumulative Number Tax Returns vs Household Adjusted Gross Income

In modeling the cumulative number of tax returns against household adjusted gross income this way, we reduced the error between our model and the actual data points to within plus or minus 0.004% for values from $0 through $206,398. We did find a division-by-zero anomaly centered on an AGI of $15,600 that affected the calculated results from $14,100 through $17,400, which we corrected with a simple data patch: a NIST Hahn distribution with different coefficients over this range.

Above $206,409, our greatest error is an overcount of 722 at $1,547,988 and the magnitude of the error is within 0.0006% of the actual cumulative number of returns filed in 2005 at this point. Otherwise, the model either slightly overcounts or undercounts the cumulative number of personal income tax returns by magnitudes of plus or minus 0.0003%.
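For readers who want to see how a segmented model like this hangs together, here is a rough sketch in JavaScript. It is not the code behind our tool. It assumes that the "NIST Hahn" form is the cubic-over-cubic rational function of NIST's Hahn1 reference model, and every coefficient shown is a made-up placeholder, since the fitted values aren't listed in this post. The function and variable names are hypothetical, as is the choice to assign an AGI sitting exactly on a boundary to the lower segment. Only the segment boundaries and the patch range come from the text above. Because the model divides by a polynomial, a denominator that passes through zero is the kind of thing that would produce a division-by-zero anomaly like the one we patched.

```javascript
// Rational (cubic over cubic) form, assumed to match NIST's Hahn1 model:
//   y = (b1 + b2*x + b3*x^2 + b4*x^3) / (1 + b5*x + b6*x^2 + b7*x^3)
// b is an array of seven coefficients [b1, ..., b7].
function hahn(x, b) {
  const numerator = b[0] + x * (b[1] + x * (b[2] + x * b[3]));
  const denominator = 1 + x * (b[4] + x * (b[5] + x * b[6]));
  return numerator / denominator; // blows up wherever the denominator nears zero
}

// Segment boundaries are taken from the post.
// The coefficient arrays are PLACEHOLDERS, not fitted values.
const SEGMENTS = [
  { upTo: 28038, coeffs: [0, 1, 0, 0, 0, 0, 0] },
  { upTo: 31630, coeffs: [0, 2, 0, 0, 0, 0, 0] },
  { upTo: 206409, coeffs: [0, 3, 0, 0, 0, 0, 0] },
  { upTo: Infinity, coeffs: [0, 4, 0, 0, 0, 0, 0] },
];

// Patch range from the post; its coefficients are also PLACEHOLDERS.
const PATCH = { from: 14100, to: 17400, coeffs: [0, 5, 0, 0, 0, 0, 0] };

// Choose the coefficient set for a given AGI. The patch takes priority
// over the segment that would otherwise cover its range.
function pickCoefficients(agi) {
  if (agi >= PATCH.from && agi <= PATCH.to) {
    return PATCH.coeffs;
  }
  return SEGMENTS.find((segment) => agi <= segment.upTo).coeffs;
}

// Modeled cumulative number of returns at or below the given AGI.
function cumulativeReturns(agi) {
  return hahn(agi, pickCoefficients(agi));
}
```

Keeping the patch as its own entry, checked before the segments, means the main fit for the first segment never has to be touched to work around the anomaly.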

Here's the corresponding chart we have for the number of personal income tax returns filed in 2005 by household Adjusted Gross Income, or in other words, the number of income tax filing households within each $100 interval of Adjusted Gross Income from $0 through $500,000:

2005 Distribution of Number of Personal Income Tax Returns vs Household AGI

In this chart, you begin to see the effects of a minor blending error between our mathematical models for Segments #1, #2 and #3, which occurs approximately at a household AGI of $31,000. In the next chart, the effect of this blending error will appear to be amplified, as it occurs near the effective peak of where the maximum aggregate amount of household income is to be found in the United States, at the transition between our modeled segments:

2005 Approximate Aggregate Household Income vs Household Adjusted Gross Income

We should note that the median household income for our model would be at roughly $31,880, which is just above the top income threshold for our Segment #2. Our next chart shows a close-up view of this region of the chart, which confirms that the "peak" is largely a visual artifact given the scale of the chart:

2005 Approximate Aggregate Household Income vs Household Adjusted Gross Income Close-Up view of 'Peak' Feature

Now we get to the point of this whole exercise! We've rebuilt the portion of our potentially lost code with our mathematical model of the distribution of household income in the United States for 2005, and we've built a tool around it so that anyone can now "read" the values off of each of the charts that we've incorporated into this post!

Household Adjusted Gross Income Data
Input Data Values
Household Adjusted Gross Income [2006 USD]


Distribution Data
Calculated Results Values
Cumulative Number of Returns for AGI
Number of Returns at Income Level (AGI plus or minus $50)
Aggregate Income at Income Level (AGI plus or minus $50)

Our household income distribution model effectively tops out at an AGI of $142,000,000. We've arbitrarily capped the tool's results at this level, as the number of individuals earning this kind of money in a year is awfully sparse. If you enter a larger figure, you'll only get results for this maximum value.
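To give a sense of how the tool's three results relate to one another, here is a minimal sketch, again in JavaScript and again not the tool's actual code. It assumes the number of returns at an income level is the difference between the cumulative count at AGI plus $50 and at AGI minus $50, and that aggregate income is approximated as that count times the AGI entered. The function names, the clamping of negative inputs to zero, and the toy cumulative function are all hypothetical stand-ins. The $142,000,000 cap is the one described above.

```javascript
const MAX_AGI = 142000000; // the cap described in the post
const HALF_WIDTH = 50;     // results cover AGI plus or minus $50

// cumulative is any function mapping an AGI to the modeled cumulative
// number of returns at or below that AGI.
function distributionAt(agi, cumulative) {
  // Inputs above the cap are treated as the cap; negative inputs are
  // clamped to zero (an assumption, not stated in the post).
  const capped = Math.min(Math.max(agi, 0), MAX_AGI);
  const lower = Math.max(capped - HALF_WIDTH, 0);
  const upper = Math.min(capped + HALF_WIDTH, MAX_AGI);

  // Returns in the band are the difference of two cumulative counts.
  const returnsAtLevel = cumulative(upper) - cumulative(lower);

  return {
    agi: capped,
    cumulativeReturns: cumulative(capped),
    returnsAtLevel: returnsAtLevel,
    // Assumption: band count times the AGI at the middle of the band.
    aggregateIncome: returnsAtLevel * capped,
  };
}

// Toy stand-in for the fitted model: two returns per dollar of AGI.
const toyCumulative = (agi) => agi * 2;

console.log(distributionAt(1000, toyCumulative));
```

Passing the cumulative function in as an argument is in keeping with the modular approach: the same wrapper works whether the model behind it is the old distribution or the new segmented one.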

What can we say? Without access to all the data we typically keep on hand for the various projects we have going at any one time, we get bored pretty easily....
