By Thilak A
Most pollsters believe that since they correctly picked the NDA as the winner, their forecasts are much more accurate than in 2004 and 2009. Is that assumption right?
Let’s take a look at some data on exit-poll versus actual seat variances from 2004, 2009 and 2014.
As is evident from the table below, exit polls tend to over- or underestimate the winner’s or loser’s seats by anywhere from 40 to 60.
Merely getting a result directionally right doesn’t make it statistically accurate; the variances have to be as small as possible.
It is well understood that India is the toughest place in the world to get polling right because of multi-cornered fights, wide disparities in languages, incomes, cultures and various other factors.
But, that is life and a well-functioning polling mechanism serves an important purpose for society. We can’t just throw up our hands after getting vote share predictions (even if we get them right) and ignore seat projections. After all, seat projections are what people care about.
The first step in becoming a better forecaster is having a strong measurement mechanism to know how wrong you have been. For example, when Netflix ran its prize competition, it judged the predictive accuracy of submissions by the Root Mean Square Error (RMSE) method.
Do we need a similar method to judge the accuracy of polls?
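To make the idea concrete, here is a minimal sketch of an RMSE calculation applied to the NDA seat forecasts discussed later in this piece. Only the three forecast numbers come from the article; the final tally of 336 seats is an assumption for illustration:

```python
import math

def rmse(forecasts, actuals):
    """Root Mean Square Error between forecast and actual values."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Seat forecasts for the NDA from three 2014 exit polls (per the article),
# scored against an assumed final tally of 336 seats.
forecasts = [340, 276, 279]   # Today's Chanakya, CNN-IBN-Lokniti, NDTV-Hansa
actuals = [336] * len(forecasts)
print(round(rmse(forecasts, actuals), 1))  # roughly 47.8
```

A metric like this penalises large misses heavily, so one poll landing close cannot hide two polls missing by 60 seats.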
Just as the Congress is going through an “introspection” phase right now, so should pollsters, through a rigorous evaluation of their forecasts, data and underlying processes.
However, the golden boy of the moment is Today’s Chanakya, whose 340-seat exit poll forecast for the NDA was within a hair’s breadth of the actual result. The forecast is considered to be as epic as Modi’s victory itself.
We tried to understand what drove the accuracy of the forecast. We took a look at the Vozag exit polling database and were surprised by the results.
But, before we jump in, let’s go through a basic primer on how the forecasting process works.
Any forecasting project starts with a survey of how people intend to vote, from which the vote share is computed. This is the “science” part of the process and covers survey questions, sample design, response rates and other factors.
The second part takes the vote share output from the survey and converts it into seat projections based on some science (not all of which works), a little bit of art and a good dose of black magic.
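One simple, admittedly crude, seat-conversion technique is the uniform swing model: shift every constituency’s previous winning margin by the projected change in vote margin, and count the seats that end up positive. This sketch uses invented numbers and is not the method any of these pollsters actually used:

```python
def seats_under_uniform_swing(prev_margins, margin_swing):
    """Count seats a party wins if every constituency's margin
    (its lead over the nearest rival, in percentage points;
    negative where it lost) shifts by the same margin_swing."""
    return sum(1 for m in prev_margins if m + margin_swing > 0)

# Hypothetical five-seat state: the party held 2 seats last time.
margins = [12.0, 3.0, -1.5, -4.0, -10.0]
print(seats_under_uniform_swing(margins, 0.0))  # 2: no swing, no change
print(seats_under_uniform_swing(margins, 5.0))  # 4: a 5-point swing flips two seats
```

Real seat models layer regional swings, alliance arithmetic and turnout adjustments on top of something like this, which is where the art and black magic come in.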
Now, given this background, let’s see how Today’s Chanakya differed from CNN-IBN-Lokniti and NDTV-Hansa, whose forecasts sat near the median. CNN-IBN-Lokniti’s nationwide projection for the NDA was 276 seats, underestimating its performance by 61 seats. Similarly, NDTV-Hansa projected 279 seats, a shortfall of 58.
Just one state, Uttar Pradesh, accounted for a big portion of the variance: CNN-IBN’s Uttar Pradesh forecast was off by 24 seats (40 percent of its total national variance), and NDTV-Hansa’s by 17 seats (30 percent of its total variance).
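The state-level percentages are simply each state’s seat error divided by the poll’s national seat error. A quick sketch reproducing that arithmetic from the article’s figures (24 of 61 and 17 of 58 come out to about 39 and 29 percent, rounded above to 40 and 30):

```python
def share_of_national_variance(state_error, national_error):
    """A state's seat error as a percentage of the national seat error."""
    return 100 * state_error / national_error

print(round(share_of_national_variance(24, 61)))  # CNN-IBN-Lokniti, UP alone: ~39
print(round(share_of_national_variance(17, 58)))  # NDTV-Hansa, UP alone: ~29
```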
We analysed the vote share variance against the seat share variance for Uttar Pradesh in the table below.
As you can see from the table, Chanakya significantly underestimated the BJP+’s vote share in UP, by 9.3 percentage points.
In fact, both CNN-IBN and NDTV underestimated the BJP+’s vote share by only 3.3 percentage points, far closer to the final number than Chanakya was. Yet, despite its larger underestimation, Chanakya magically ended up with a much higher seat forecast than CNN-IBN or NDTV, and one much closer to the final tally than either of them.
Thilak A is the product manager and senior data analyst at Vozag.com, a quantitative and data analytics research firm that helps consumers make better decisions.
Updated Date: May 20, 2014 11:55:14 IST