Govt jobs data raises stink: Garbage in, garbage out

Do India's official number-crunchers know their onions? Or are the people who analyse the data numerically challenged?

The recent release of employment data by the National Sample Survey Office (NSSO) left everyone scratching their heads.

After six years of UPA spending on aam aadmi schemes, including the National Rural Employment Guarantee Act (NREGA), the last thing one expected was to see the aam aadmi opting out of work.

Is the India job math correct? (Image: Sherwin Crasto/Reuters)

But this is what the data show. India's labour force participation rate (LFPR), the proportion of people who are either employed or actively seeking work, dropped from 43% to 40% between 2004-05 and 2009-10. And this at a time when the unemployment rate had also fallen.

Not surprisingly, the same numbers led to opposing conclusions. Jobless growth, pronounced one Mint commentator. Business Standard said the exact opposite: "Data debunks jobless growth theory", the paper headlined. Who is right? Maybe both? The Economic Times surmised a few days later that the data were either wrong due to poor structuring of the survey's questions, or faultily collected by those who administered it.

Firstpost, which has reserved judgment till the detailed numbers are made available by the NSSO, has taken the view that there's more to the data than meets the eye. The only way the drop in the LFPR can be squared with the drop in unemployment is to assume that more people are dropping out of the job market altogether, since the LFPR counts both those with jobs and those actively seeking one.
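A back-of-the-envelope calculation shows how this works. In the sketch below, only the 43% and 40% LFPR figures come from the NSSO release; the population of 1,000 and the unemployment rates are hypothetical, chosen purely to illustrate how the unemployment rate can fall even as employment shrinks, so long as enough people leave the labour force.

```python
# Illustrative arithmetic only: the 43% and 40% LFPR figures are from the
# article; the population and unemployment rates are hypothetical.

def labour_market(population, lfpr, unemployment_rate):
    labour_force = population * lfpr           # employed + actively seeking work
    unemployed = labour_force * unemployment_rate
    employed = labour_force - unemployed
    outside = population - labour_force        # neither working nor seeking work
    return labour_force, employed, unemployed, outside

# 2004-05: LFPR 43%, assumed unemployment rate of 8%
lf1, emp1, unemp1, out1 = labour_market(1000, 0.43, 0.08)
# 2009-10: LFPR 40%, assumed lower unemployment rate of 7%
lf2, emp2, unemp2, out2 = labour_market(1000, 0.40, 0.07)

print(f"Labour force shrank by {lf1 - lf2:.0f} per 1,000 people")
print(f"People outside the labour force rose by {out2 - out1:.0f}")
print(f"Employed changed by {emp2 - emp1:.0f}")  # falls even as the unemployment rate falls
```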

So what's going wrong with the data? Is it poor collection, shoddy implementation, lousy calculations, or something else? The Economic Times talked to two sarkari statisticians, one past and one present, and came up with two different answers.

While Pronab Sen, former Chief Statistician and currently Principal Advisor to the Planning Commission, told the newspaper the design of the survey itself was faulty, the current incumbent in the Chief Statistician's chair, TCA Anant, defended his boys.

Sen thinks the problem lies in the way the survey's questions are asked, and in the fact that many questions need follow-ups (for example, if you ask someone whether he is employed and he says he has a temporary job, you have to ask further questions to get accurate data).

Anant, however, thinks the devil is in the aggregate, not in the detail. "Once you disaggregate the LFPR data into women, children...and other categories, the fall in labour force makes a lot of sense."

What he means is that the drop in the labour force participation rate has much to do with the reduction in child employment, and also with the decision of many women to opt out of the workforce when family incomes rise. Firstpost has reported this part of the argument.

But employment is not the only area plagued by dubious data. As Business Standard points out, the NSSO has been showing total organised sector employment at around 30 million for decades, but the Employees' Provident Fund Organisation (EPFO), which covers only organised sector employees, has nearly 45 million accounts.

Even accounting for the fact that people who change jobs may hold two or more accounts, it is clear that the organised sector employs more people than the NSSO data show.

Thanks to the confusion, Business Standard takes the high road and pooh-poohs the government's data. It disagrees with Sen that the NSSO survey is merely poorly administered. The problem is more fundamental: "The answer lies not in its sampling or quality of data collection; it simply does not ask the obvious questions." In short, the NSSO needs to devise another questionnaire.

Controversies have raged around the Index of Industrial Production (which The Financial Express says it can't trust) and the Wholesale Price Index, whose figures are constantly revised. Firstpost was talking about double-digit inflation long before the actual numbers were reported.

Similar controversies surround data relating to poverty, with the poverty-wallahs and NGOs always damning the official figures as a gross underestimate. But Surjit Bhalla, chairman of Oxus Investments, an emerging-market advisory and fund management firm, believes there is dishonesty in the way people look at figures, especially poverty figures.

Writing in The Indian Express, he points out how world poverty estimates have remained virtually unchanged despite enormous growth in incomes.

He says: "Curiously, after each decade of high growth, the proportion of World Bank poor has stayed constant at around 25 to 30%. In 1987, according to its calculations, 28.7% of the world's population was absolutely poor; in 2005, the proportion of world poor: 25.2%. During the same period, per capita incomes in the entire developing world nearly doubled! Yet poverty stayed the same? How come?"

To find out, the government is launching yet another questionnaire. It has devised a new poverty measurement scheme: the first-ever census of people below the poverty line (BPL), which kicked off on Wednesday.

But how are we to know whether it will capture the right data?

This survey will classify you as a BPL case not by asking you about your income or consumption, but by looking at what your household owns and the jobs your family members hold. The exclusion criteria include ownership of any motorised vehicle, a refrigerator, a landline phone (but not a mobile, which is apparently okay for BPL cases), mechanised farm equipment, or a kisan credit card with a credit limit above Rs 50,000, among other things. Land ownership linked to the level of irrigation available, and having a family member working in government or in a public sector organisation, will also get you excluded from the BPL list.
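As a rough illustration of how such exclusion-based screening works, here is a sketch of the logic. The criteria are paraphrased from the list above; the field names, the example household and the implementation itself are hypothetical, not the actual census instrument or scoring method.

```python
# Rough sketch of exclusion-based BPL screening, paraphrasing the criteria
# listed above. Field names and the example household are hypothetical;
# the land/irrigation criterion is omitted since its thresholds vary.

def excluded_from_bpl(household):
    """Return True if any exclusion criterion applies to the household."""
    rules = [
        household.get("owns_motorised_vehicle", False),
        household.get("owns_refrigerator", False),
        household.get("owns_landline_phone", False),        # mobiles do not count
        household.get("owns_mechanised_farm_equipment", False),
        household.get("kisan_credit_limit", 0) > 50000,
        household.get("member_in_govt_or_psu_job", False),
    ]
    return any(rules)

# A hypothetical household that owns a motorcycle but reports a very low income:
household = {"owns_motorised_vehicle": True, "kisan_credit_limit": 0}
print(excluded_from_bpl(household))  # True: excluded, income never asked
```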

So far, so good. But once again, what looks good on paper may not pan out exactly as planned. How is the surveyor going to find out if my brother works with the government or not? Or even if I own a Bajaj Pulsar? I can simply deny it, if my purpose is to obtain BPL benefits.

It seems Sen, Anant and the newspaper pundits all have a point, just like the Blind Men and the Elephant: you can't figure out what the creature is by feeling just one part of it.

Sensitive social sector data need extensive revalidation and cross-verification against other sources before they can even be considered fit for use. Such data are fine when you have nothing else to go by, but otherwise the GIGO principle applies.

Garbage in, garbage out.