23 December 2016

The Biggest Math Story of 2016

NASA's Real World: Mathematics - Source: http://www.nasa.gov/audience/foreducators/nasaeclips/

We've reached the end of 2016 here at Political Calculations, which means that it is time once again to celebrate the biggest math-related news stories of the year!

But not just any math-related news stories. Because the most frequent question that most people ask about the maths they learn in school is "where would I ever use this in real life?", we celebrate the math-related stories that have real world impact, where the outcomes of the maths described in the news really do matter to daily life.

So while stories about perfectoid spaces or the remarkable geometry of hexaflexagons might be cool to mathematicians, it's hard to find where they might be relevant to everyone else. The same is true of the story of the computer-generated proof of the Boolean Pythagorean Triples problem, which is so long that it would take a human an estimated 10 billion years to read through all of it!

But then, there's one story in that category that's so off the wall, we cannot help ourselves and have decided to include it in our year-end wrap-up just for pure fun. It's the story of the group of Oxford University mathematicians who teamed up with an engineer at Tufts University to model the dynamics of the tongues of chameleons. Really. Because... math!

The chameleon tongue strike is well documented; most people have seen examples of it in action in nature documentaries—generally in slow motion. What sets it apart is its speed—a chameleon can push its tongue out at a target at speeds up to 100 kilometers per hour. But how it does so has not been well understood. In this new effort, the researchers found that in order to reach such incredible speeds so quickly, the chameleon relies on three main parts: the sticky pad situated on the end of its tongue, which adheres to prey; coils of accelerator muscles; and retractor muscles that pull prey back in before it has a chance to escape. They also note that both types of muscles coil around a tiny bone in the mouth—the hyoid. In order for a chameleon to catch prey, all of these systems must work in near perfect unison.

It all starts, the researchers report, with the accelerator muscles contracting, which squeezes the tube-shaped sheaths inside the tongue, pushing them to the far end in what the team calls a loaded position. As the accelerator muscles contract, the tongue is forced outward while, at the same time, the tube-shaped sheaths are pushed outward telescopically, like an old-fashioned car radio antenna. The sheaths are made of collagen, which is of course very elastic, so they are stretched out as the tongue is pushed away from the mouth, but then naturally recoil once the target has been reached. Retraction is assisted by the retractor muscles.

The researchers have put all of these actions into a mathematical model that allows them to manipulate various factors, such as how large the radius of the sheaths can be. They noted that such changes to the system could be destructive—if the radius of the inner sheath were more than 1.4 millimeters, they found, the tongue would rip loose from its base as it was launched, causing the loss of the tongue.

Aside from the obvious but limited military applications in the latter example, the truth is that these are all stories that just don't click with everyday experiences. And at first glance, a story that came very early in 2016 would not either, with the announcement of the discovery of the largest prime number yet found. That newly discovered prime number contains over 22 million digits, which is some 5 million digits longer than the previous record holder, and can be calculated using the following math:

The Largest Known Prime Number in 2016 = 2^74,207,281 - 1
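
The exponent alone is enough to work out just how long that number is. As a minimal sketch (our own illustration in Python, not anything from the discovery announcement), the digit count of 2^74,207,281 - 1 follows directly from the base-10 logarithm, with no need to construct the 22-million-digit number itself:

    import math

    exponent = 74_207_281
    # Number of decimal digits of 2**exponent (subtracting 1 doesn't change it)
    digits = math.floor(exponent * math.log10(2)) + 1
    print(f"{digits:,}")   # 22,338,618 digits, roughly 5 million more than the prior record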

Why that's significant to everyday life is that prime numbers are essential to modern cryptography, where they are used as the keys that allow the intended recipients of encrypted information to access or copy it, whether in the form of digitally-signed e-mails or movies stored on digital video discs. The more known prime numbers there are, and especially the larger they are, the more secure the encrypted information generally may be.
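
As a quick, textbook-style illustration of the role primes play in encryption (a toy example added for this write-up, not anything drawn from the articles above), the RSA scheme builds a public key from the product of two primes and relies on the difficulty of factoring that product:

    p, q = 61, 53                       # two small primes; real keys use enormous ones
    n = p * q                           # public modulus
    phi = (p - 1) * (q - 1)             # Euler's totient of n
    e = 17                              # public exponent, chosen coprime to phi
    d = pow(e, -1, phi)                 # private exponent: modular inverse of e (Python 3.8+)

    message = 42
    ciphertext = pow(message, e, n)     # encrypt with the public key (e, n)
    recovered = pow(ciphertext, d, n)   # decrypt with the private key (d, n)
    assert recovered == message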

That's also an important development because of another discovery regarding prime numbers, particularly smaller ones: Stanford University mathematicians Robert Lemke Oliver and Kannan Soundararajan found that they are not as randomly distributed as had previously been believed.

Though the idea behind prime numbers is very simple, they still are not fully understood — they cannot be predicted, for example, and finding each new one grows increasingly difficult. They have also, at least until now, been believed to be completely random. In this new effort, the researchers found that the last digit of a prime number does not repeat randomly. Primes can only end in the digits 1, 3, 7 or 9 (apart from 2 and 5, of course), thus if a given prime number ends in a 1, there should be a 25 percent chance that the next one ends in a 1 as well—but that is not the case, the researchers found. In looking at all the prime numbers up to several trillion, they made some odd discoveries.

For the first several million, for example, prime numbers ending in 1 were followed by another prime ending in 1 just 18.5 percent of the time. Primes ending in a 3 or a 7 were followed by a 1, 30 percent of the time, and primes ending in 9 were followed by a 1, 22 percent of the time. These numbers show that the distribution of the final digit of consecutive prime numbers is clearly not random, which suggests that prime numbers are not actually random. On the other hand, they also found that the more distant the prime numbers became, the more random the distribution of their last digits became.
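
A back-of-the-envelope version of that tally is easy to reproduce. The following is a minimal sketch (our own illustration in Python, not the researchers' code), counting how often one last digit follows another among consecutive primes below a chosen limit:

    from collections import Counter

    def primes_up_to(n):
        """Simple sieve of Eratosthenes."""
        sieve = bytearray([1]) * (n + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
        return [i for i in range(2, n + 1) if sieve[i]]

    primes = [p for p in primes_up_to(1_000_000) if p > 5]   # skip 2, 3 and 5
    pairs = Counter((a % 10, b % 10) for a, b in zip(primes, primes[1:]))

    from_1 = sum(count for (first, _), count in pairs.items() if first == 1)
    print(pairs[(1, 1)] / from_1)   # noticeably below the naive 25 percent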

Prime numbers aren't the only big numbers that appear to have hidden patterns. Just as a quick aside, even the well-known constant pi, which is the ratio of the circumference of a circle to its diameter, has hidden patterns within its infinitely long, non-repeating, non-terminating decimal expansion. And then, why we call it pi isn't random either, thanks to the mathematical contributions of a farm boy from Wales named William Jones in the early 1700s. Wales, of course, is a country whose geographic area has become a popular unit of measure, which made the news back in January 2016 when it survived a petition challenge by metric system purists/fascists who wanted to stop everyone else from using the size of Wales as a unit of measure.

Getting back to mathematical patterns of note, 2016 also saw the first mathematically rigorous, large-scale demonstration of Zipf's Law:

Zipf's law in its simplest form, as formulated in the thirties by American linguist George Kingsley Zipf, states surprisingly that the most frequently occurring word in a text appears twice as often as the next most frequent word, three times more than the third most frequent one, four times more than the fourth most frequent one, and so on.

The law can be applied to many other fields, not only literature, and it has been tested more or less rigorously on large quantities of data, but until now had not been tested with maximum mathematical rigour and on a database large enough to ensure statistical validity.

Researchers at the Centre de Recerca Matemàtica (CRM), part of the Government of Catalonia's CERCA network, who are attached to the UAB Department of Mathematics, have conducted the first sufficiently rigorous study, in mathematical and statistical terms, to test the validity of Zipf's law. This study falls within the framework of the Research in Collaborative Mathematics project run by Obra Social "la Caixa". To achieve this, they analysed the whole collection of English-language texts in Project Gutenberg, a freely accessible database with over 30,000 works in this language. There is no precedent for this: in the field of linguistics the law had never been put to the test on sets of more than a dozen texts.

According to the analysis, if the rarest words are left out - those that appear only once or twice throughout a book - 55% of the texts fit perfectly into Zipf's law, in its most general formulation. If all the words are taken into account, even the rarest ones, the figure is 40%.
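
For readers who want to see what that kind of test looks like in miniature, here is a hedged sketch (our own illustration in Python, not the CRM team's method; "book.txt" stands in for any one of the Project Gutenberg texts): rank a text's words by frequency and compare the observed counts with the 1/rank prediction of Zipf's law:

    import re
    from collections import Counter

    text = open("book.txt", encoding="utf-8").read().lower()   # any plain-text book
    counts = Counter(re.findall(r"[a-z']+", text))
    ranked = counts.most_common()

    top = ranked[0][1]
    for rank, (word, count) in enumerate(ranked[:10], start=1):
        predicted = top / rank   # Zipf's law: frequency proportional to 1/rank
        print(f"{rank:2d}. {word:<12} observed {count:>7}   Zipf predicts ~{predicted:>9.0f}")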

While not much more than an interesting finding now, that Zipf's Law appears to hold across such a large body of human language will have applications in the development of artificial intelligence, particularly for machine understanding of human language. That next generation Amazon Echo, Google Home, or whatever voice recognition app winds up on your mobile devices in the future will get better at understanding and communicating with you as a result. More practically, because technology will be better able to replicate the patterns inherent in human writing, movie producers have moved one step closer to realizing their long-held dream of being able to replace all those annoying and costly human screenwriters with automated script writers, where audiences won't be able to tell the difference for most Hollywood productions.

Another advance in developing a better understanding of the real world through math can be found in the story of a team of University of Washington social scientists who have found a way to improve their ability to collect data about people who are hard to collect data about, which has direct and immediate applications for both social science research and public policy.

A hallmark of good government is policies which lift up vulnerable or neglected populations. But crafting effective policy requires sound knowledge of vulnerable groups. And that is a daunting task since these populations — which include undocumented immigrants, homeless people or drug users — are usually hidden in the margins thanks to cultural taboos, murky legal status or simple neglect from society.

"These are not groups where there's a directory you can go to and look up a random sample," said Adrian Raftery, a professor of statistics and sociology at the University of Washington. "That makes it very difficult to make inferences or draw conclusions about these 'hidden' groups."

Raftery and his team started looking for methods to assess the uncertainty in respondent-driven sampling (RDS) studies. They quickly settled on bootstrapping, a statistical approach used to assess uncertainty in estimates based on a random sample. In traditional bootstrapping, researchers take an existing dataset—for example, condom use among 1,000 HIV-positive men—and randomly resample a new dataset, calculating condom use in the new dataset. They then do this many times, yielding a distribution of values of condom use that reflects the uncertainty in the original sample.
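
To make the idea of bootstrapping concrete, here is a minimal sketch of the traditional, individual-level version described above (our own illustration in Python; the dataset and the 60 percent figure are made up for the example):

    import random

    # Hypothetical survey: 1 = reported condom use, 0 = did not, for 1,000 men
    data = [1 if random.random() < 0.6 else 0 for _ in range(1000)]

    estimates = []
    for _ in range(5000):
        resample = random.choices(data, k=len(data))       # draw with replacement
        estimates.append(sum(resample) / len(resample))    # proportion in this resample

    estimates.sort()
    low = estimates[int(0.025 * len(estimates))]
    high = estimates[int(0.975 * len(estimates))]
    print(f"middle 95% of bootstrap estimates: {low:.3f} to {high:.3f}")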

The team modified bootstrapping for RDS datasets. But instead of bootstrapping data on individuals, they bootstrapped data about the connections among individuals....

Using tree bootstrapping, Raftery's team found that they could get much better statements of scientific certainty about their conclusions from these RDS-like studies. They then applied their method to a third dataset—an RDS study of intravenous drug users in Ukraine. Again, Raftery's team found that they could draw firm conclusions.

"Previously, RDS might give an estimate of 20 percent of drug users in an area being HIV positive, but little idea how accurate this would be. Now you can say with confidence that at least 10 percent are," said Rafferty. "That's something firm you can say. And that can form the basis of a policy to respond, as well as additional studies of these groups."

While all of these stories have significant real world impact, the biggest math story of 2016 is the failure of "scientific" political polling to anticipate two major world events: the United Kingdom's "Brexit" referendum, in which British voters unexpectedly opted to leave the European Union, and the equally unexpected election of Donald Trump as the 45th President of the United States over Hillary Clinton. Numerous political polls in both nations had anticipated the opposite outcomes.

The pollsters, the betting markets and the tenor of (mainstream) media reports all favored a Hillary Clinton win on Tuesday. The widely followed FiveThirtyEight forecast gave her a 71.4% chance, and they were relatively skeptical. The New York Times' Upshot, which had endorsed the Democrat, gave her an 84% chance. Punters were sanguine as well. The Irish betting site Paddy Power gave Clinton 2/9 odds, or 81.8%.

They were wrong, just as they had been five months prior.

"It will be an amazing day," Donald Trump thundered in Raleigh, North Carolina on Monday, Election Eve. "It will be called Brexit plus plus plus. You know what I mean?"

What he meant is that conventional wisdom ascribed similarly thin chances to a "Leave" victory in Britain's June referendum to exit the European Union. Stephen Fischer and Rosalind Shorrocks of Elections Etc summed up the probabilities on June 22, the eve of that vote: poll-based models gave "Remain" 68.5%; polls themselves gave it 55.6%; and betting markets gave it 76.7%. No matter who you asked, Britain's future in the trading bloc seemed secure. Yet the Leavers won, 51.9% to 48.1%, sending the pound plunging against other currencies and wiping out a record $3 trillion in wealth as markets across the globe swooned.
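
As a quick aside on the betting figures quoted above, a bookmaker's fractional odds convert into an implied probability with a one-line calculation (a worked example added for this write-up, not part of the original articles):

    def implied_probability(numerator, denominator):
        """Fractional odds a/b pay out a for every b staked,
        so the implied chance of winning is b / (a + b)."""
        return denominator / (numerator + denominator)

    print(implied_probability(2, 9))   # Paddy Power's 2/9 on Clinton -> 0.818..., the 81.8% cited above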

Ultimately, the failure of 2016's political polls, and the resulting shocks they generated in the economies of both the U.K. and the U.S., which will have enduring impacts for years to come, can be traced to the mathematical models that were used to represent the preferences of each nation's voting population, which proved to be well off the mark. So much so that the story of the Political Polls of 2016 also made it into our Examples of Junk Science series.

After the U.S. elections, 2016's political pollsters reflected on what went wrong for them:

Patrick Murray, director of the Monmouth University Polling Institute, told Investopedia Wednesday morning that the result was "very similar to what happened in Brexit," and pollsters "didn't learn that lesson well enough." A large contingent of Trump voters, whom he describes as "quietly biding their time," simply did not show up in the polls; Monmouth's last national poll, conducted from November 3 to 6, showed Clinton leading Trump by 6 points among likely voters.

Asked what pollsters could do about these quiet voters, Murray confessed, "We don't know yet." He laid out a plan for a "first line of inquiry": see who didn't talk to the pollsters, look for patterns, try to develop a profile of the kind of voter that threw the forecasts off. "It'll take a while to sort this out," he said, adding, "there's no question that this was a big miss."

Is he worried polling will be discredited? "Yes." The industry has relied on "establishment models," and these are apparently inadequate when it comes to capturing a transformational political movement.

Writing on Wednesday, YouGov's Anthony Wells – who, as a member of the pollster's British team, describes himself as "quite separate" from its American operations – offered his colleagues across the pond some advice, since "we in Britain have, shall I say, more recent experience of the art of being wrong." As with the Brexit vote, he attempts to separate the quality of the data from the quality of the narrative. Given that Clinton probably won the popular vote, polls showing her doing so are not fundamentally wrong. They favored her too much, however, and state polls were without a doubt way off the mark.

Can polling be fixed? It's too early to say, but one suggestion Wells makes is that modeling turnout on historical data, as U.S. pollsters have tended to do, can be treacherous. When Brits tried this method on the Brexit vote, rather than simply asking people how likely they were to vote, they got "burnt."

Perhaps tree bootstrapping might have helped the pollsters, but that wasn't to be in 2016. Their failure to correctly anticipate the outcomes of the most significant elections in years, because of the mathematical models they employed, makes the story of 2016's political polls the biggest math-related story of the year.

As we look back over 2016 and the biggest math-related stories that we identified for the year, we can't help but notice yet another hidden pattern - nearly all of the stories we selected as having the most significant real world impact are really about communication. Whether it's about patterns in human language, or securing communications between people, or finding out what people think, the biggest math-related stories of 2016 illustrate a portion of the extent to which maths are intersecting with our daily lives in ways that are almost invisible, and yet have profound influence.

And that wraps up our year here at Political Calculations. We'll be back again after New Year's!

Previously on Political Calculations