EXCLUSIVE: US economist confirms Warmest 100 findings
Australian David Quach is a MATLAB developer with Supported Intelligence in Chicago, where he’s currently developing business valuation and decision support software. Before joining Supported Intelligence, Quach was a Senior Economist with Access Economics (now Deloitte Access Economics) and an economic adviser to the Australian Competition and Consumer Commission.
Following an exclusive story published on TheVine earlier this week, Quach reached out to Warmest 100 founder Nick Drewe via Twitter and asked if he could run a series of statistical analyses on the team's raw data: 35,081 votes across 3,602 entries, about 2.7% of the 1.3 million votes registered in last year's poll.
“I listen to the Hottest 100 and always find it interesting to discuss which songs you think are going to be near the top,” the 28-year-old told TheVine. “But also as an economist and someone who deals with data a lot, I think what Nick Drewe and Tom Knox did was really interesting.”
Quach used MATLAB, a numerical computing environment, to run a bootstrap analysis of the prediction, and has released the results exclusively to TheVine. Bootstrapping is a statistical technique that estimates how accurate conclusions drawn from sample data are when data for the whole population is unavailable.
Here, in his own words, Quach explains how accurate he expects the Warmest 100 prediction to be, and why.
David Quach: It was reported this week that a Brisbane-based online marketer collected 35,081 votes posted on Facebook and Twitter to predict the Triple J Hottest 100. According to the online marketer Nick Drewe, the list of predictions, the Warmest 100, is an accurate prediction of the Hottest 100. But only 2.7% of the expected total number of votes has been collected, so just how accurate is their prediction?
I performed a statistical analysis on the data set behind the Warmest 100, kindly provided by Nick Drewe, and found the predictions will be surprisingly accurate. According to my analysis of the raw data, the Warmest 100 will correctly predict 90 to 95 of the songs that make the Hottest 100. In addition, the song the Warmest 100 predicts at number one has an 83% chance of being correct.
Although the Warmest 100 accurately predicts a large majority of the songs in the Hottest 100, the order of the songs will almost certainly be different. The probability that the Warmest 100 predicts the top 10 songs in the correct order is 2%, while the chance that all 100 songs are predicted in the correct order is close to zero.
Interestingly, predictions near the top of the list are highly accurate. The probability that the Warmest 100 correctly predicts 9 or more of the top 10 songs, regardless of their order, is 99%. The probability that the number 1 song on the Hottest 100 is either the number 1 or number 2 song predicted by the Warmest 100 is almost 100%.
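Probabilities like these come from counting: generate many simulated Hottest 100 rankings (the resampling itself is described further down) and record the share of them in which each event occurs. Below is a minimal Python sketch of that counting step, assuming each simulated ranking is a list of song ids from first place down. The function name `accuracy_metrics`, its inputs and the toy rankings are hypothetical; this is an illustration, not Quach's MATLAB code.

```python
from typing import Dict, List, Sequence


def accuracy_metrics(predicted: Sequence[str],
                     replicates: Sequence[Sequence[str]],
                     top_n: int = 10) -> Dict[str, object]:
    """Estimate each event's probability as the share of simulated rankings
    in which it occurs. `predicted` is the Warmest 100 order; each replicate
    is one simulated Hottest 100 order (hypothetical inputs)."""
    n = len(replicates)
    top_pred = list(predicted[:top_n])
    # Top n songs in exactly the predicted order
    exact_order = sum(list(sim[:top_n]) == top_pred for sim in replicates)
    # "9 or more of the top 10", generalised to "all but at most one of the top n"
    near_complete = sum(len(set(sim[:top_n]) & set(top_pred)) >= top_n - 1
                        for sim in replicates)
    # Simulated winner is the predicted number 1 or number 2
    winner_in_top_two = sum(sim[0] in predicted[:2] for sim in replicates)
    # Songs landing in exactly their predicted position, per replicate
    right_placings: List[int] = [sum(a == b for a, b in zip(sim, predicted))
                                 for sim in replicates]
    return {
        "exact_top_order": exact_order / n,
        "all_but_one_of_top": near_complete / n,
        "winner_in_predicted_top_two": winner_in_top_two / n,
        "right_placings": right_placings,
    }


# Toy example with four songs and four simulated rankings
predicted = ["a", "b", "c", "d"]
sims = [["a", "b", "c", "d"], ["b", "a", "d", "c"],
        ["c", "d", "a", "b"], ["a", "c", "b", "d"]]
print(accuracy_metrics(predicted, sims, top_n=2))
```

With 20,000 replicates in place of four, the same counts would become the kind of percentages quoted above.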
There was also money to be made from this analysis. Yesterday an online betting agency was paying $5 for a $1 bet that the Warmest 100 would place between 1 and 19 songs of the Hottest 100 in the right position, and the same odds on it placing between 20 and 39 songs correctly. The problem, for the agency, was that my analysis showed 1-19 songs would be correctly placed about 96% of the time, and 20-39 songs the other 4% of the time.
If I had put $100 on each bet, I would have got $500 back with near certainty: a $300 profit on the $200 staked. This was as good as free money. However, the betting website promptly suspended betting on this event.
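The arithmetic behind the "free money" can be checked in a few lines. This sketch assumes the agency quoted decimal odds (a $1 bet at $5 returns $5 in total, stake included, which matches the $500 return above) and that the two ranges are mutually exclusive, so at most one bet can win. The function name `two_bet_outcome` is hypothetical.

```python
def two_bet_outcome(stake: float, decimal_odds: float,
                    p_first: float, p_second: float) -> tuple:
    """Back two mutually exclusive outcomes with the same stake and odds.

    Returns (total staked, payout if either bet wins, expected profit).
    Assumes decimal odds, i.e. the payout already includes the stake.
    """
    total_staked = 2 * stake
    payout_if_win = stake * decimal_odds
    # At most one range can occur, so the two win probabilities simply add.
    expected_profit = (p_first + p_second) * payout_if_win - total_staked
    return total_staked, payout_if_win, expected_profit


staked, payout, profit = two_bet_outcome(100, 5.0, 0.96, 0.04)
print(staked, payout, profit)
```

Because the analysis put essentially all of the probability on the two ranges (96% plus 4%), the $500 comes back whichever range occurs; any probability left outside both ranges would reduce the expected profit.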
The statistical technique I used is called 'bootstrapping', a statistical technique used to estimate the accuracy of sample data.
In the bootstrapping analysis, I assumed that the sample of 35,080 votes is the same as the population data. This assumption is reasonable if the collected votes are a random sample of all votes. I then simulated 20,000 different samples, each with 3,508 voters. As under the Hottest 100 voting rules, each voter chooses 10 songs (drawn randomly from the population data) and cannot vote for the same song twice.
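Simulating 20,000 samples makes it possible to count how many of them match up with the population data: the share of samples in which an outcome occurs, such as the top song matching, is the estimated probability of that outcome.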
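The resampling described above can be sketched in Python. Quach's analysis was done in MATLAB and his code was not published, so this is only an illustration under stated assumptions: the collected votes are pooled and treated as the population, a simulated voter who draws a song already on their ballot simply draws again, and tied songs are ranked alphabetically. The function names (`rank`, `simulate_poll`, `bootstrap_rankings`) and the toy tally are hypothetical.

```python
import random
from collections import Counter


def rank(tally: Counter) -> list:
    """Song ids ordered from most to fewest votes; ties broken alphabetically."""
    return sorted(tally, key=lambda song: (-tally[song], song))


def simulate_poll(vote_pool: list, n_voters: int, picks: int,
                  rng: random.Random) -> Counter:
    """Tally one simulated poll (one bootstrap replicate).

    Each simulated voter draws votes uniformly from the pooled sample, so a
    song's chance of being drawn is proportional to its vote count. A draw
    that repeats a song already on the ballot is discarded and redrawn
    (assumed handling of the no-duplicate rule).
    """
    tally = Counter()
    for _ in range(n_voters):
        ballot = set()
        while len(ballot) < picks:
            ballot.add(rng.choice(vote_pool))  # the set ignores repeats
        tally.update(ballot)
    return tally


def bootstrap_rankings(population: Counter, n_voters: int, picks: int = 10,
                       n_samples: int = 1000, seed: int = 0) -> list:
    """Return n_samples simulated rankings, treating the sample as the population."""
    if len(population) < picks:
        raise ValueError("the data must contain at least `picks` distinct songs")
    rng = random.Random(seed)
    vote_pool = list(population.elements())  # one entry per collected vote
    return [rank(simulate_poll(vote_pool, n_voters, picks, rng))
            for _ in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical toy tally: 30 songs with 400 down to 110 votes (7,650 votes).
    population = Counter({f"song_{i:02d}": 400 - 10 * i for i in range(30)})
    predicted = rank(population)
    sims = bootstrap_rankings(population, n_voters=765, picks=10, n_samples=100)
    p_top_song = sum(sim[0] == predicted[0] for sim in sims) / len(sims)
    overlap_top10 = sum(len(set(sim[:10]) & set(predicted[:10]))
                        for sim in sims) / len(sims)
    print(f"share of samples where the predicted number one wins: {p_top_song:.2f}")
    print(f"average songs shared with the predicted top 10: {overlap_top10:.1f}")
```

In this sketch, the setup described in the text would correspond roughly to `bootstrap_rankings(population, n_voters=3508, picks=10, n_samples=20000)` on the full tally; the toy run uses far fewer samples because a pure-Python loop is slow at that scale.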
Like any statistical analysis, the results are only as good as their data and assumptions. So come Saturday, you can still enjoy a beer and a snag in the comfort that the Hottest 100 countdown is not completely spoiled. Just don't be too surprised if you've already heard many of the songs played on the Warmest 100.
David Quach is an economist and software developer currently residing in Chicago.