Data Protection Bill Series: The importance of defining personal data

Editor's note: The Data Protection Bill series carefully examines the various sections of the draft Personal Data Protection Bill, 2018, as drafted by the Justice Srikrishna Committee and submitted to MEITY for approval. This is Part III of the series.

At the crux of any data protection law are the definitions of personal data and sensitive personal data, which define the material scope of the law. The Personal Data Protection Bill, 2018 takes a fairly comprehensive approach to its definitions and contains many welcome inclusions in the category of sensitive personal data, such as data on religious and political beliefs. One issue that needs consideration, however, is the adequacy of a list-based approach to defining sensitive personal data.

Normally, broad definitions, and the greater protection they accord to such data, are extremely welcome. In this Bill, however, the introduction of data mirroring and data localization norms casts the breadth of these definitions in a new light. Companies are likely to face major difficulties in identifying and separating this data, and thereafter in complying with the requirements of the law.

Personal data on an identifiable natural person

To see why, the definitions of personal data and sensitive personal data first need to be looked at. The Bill defines personal data under Section 3(29) to include data about or relating to a natural person who is directly or indirectly identifiable. The data may be with regard to any characteristic, trait, attribute, or other feature of the identity of a person. It may further involve a combination of these features, or a combination of features with other information.

This definition is broad enough to cover a wide range of data, including typical personal data like names, images and phone numbers. Cookie data, IP addresses and other online identifiers will also amount to personal data when combined with other information. Further, the definition draws no distinction between accurate and inaccurate data, and instead broadly covers ‘data relating to a natural person’. Thus, even opinions that are not statements of fact, such as assessments or a credit score, would be included.

Data mirroring for personal data

Since the release of the Bill, the data localization norms, in particular, have sparked widespread concern. Section 40(1) of the Bill imposes data mirroring requirements on personal data, with at least one copy of all personal data to be kept on a server in India.

This is an immense requirement for companies, since personal data is produced in huge volumes in the internet age. Each type of data discussed in this article, be it data from cookies, Google searches, smartphone data like sleep patterns, Facebook likes and clicks, and so on, will normally be personal data unless it is anonymized. Under the data mirroring requirements, a copy of all of this data will have to be stored in India.

Adequacy of a list-based definition of sensitive personal data

Turning to the definition of sensitive personal data, Section 3(35) adopts a list-based approach, similar to that under the Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011 (the IT (SPDI) Rules). The new list includes passwords, financial data, health data, official identifiers issued by the government, sex life, sexual orientation, biometric data, genetic data, transgender status, intersex status, caste or tribe, religious or political beliefs, and any other category of data specified. While the list is comprehensive at present, the list-based approach raises concerns about whether it will stand the test of time. Consider the earlier list-based definition under the IT (SPDI) Rules, which very soon became inadequate.

Further, advances in technology could create newer forms of sensitive data that may not fit directly into these categories. For instance, consider sentiment analysis, which can be used for several purposes, including recruitment, and is a big part of the analysis to be done by the proposed Social Media Communication Hub. This would not fit into the current list. Consider also data like Google search data or data collected from the internet of things (IoT), which, while not sensitive personal data, could reveal a great deal when subjected to big data analytics.
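
To make the concern concrete, a list-based definition behaves like a fixed lookup: data counts as sensitive only if its category appears on the list. The Python sketch below is purely illustrative. The category labels are hypothetical paraphrases of the list described above, the names `SENSITIVE_CATEGORIES` and `is_sensitive` are not drawn from the Bill, and the power to specify further categories is not modelled.

```python
# Illustrative only: hypothetical labels paraphrasing the categories
# listed under Section 3(35) as described in this article.
SENSITIVE_CATEGORIES = frozenset({
    "passwords",
    "financial data",
    "health data",
    "official identifier",
    "sex life",
    "sexual orientation",
    "biometric data",
    "genetic data",
    "transgender status",
    "intersex status",
    "caste or tribe",
    "religious or political belief",
})


def is_sensitive(category: str) -> bool:
    """Return True only if the category label appears on the fixed list."""
    return category.strip().lower() in SENSITIVE_CATEGORIES


if __name__ == "__main__":
    # Newer kinds of data fall outside the list, however revealing they are.
    for category in ("health data", "sentiment analysis output", "search history"):
        print(category, "->", is_sensitive(category))
```

Anything absent from the list, such as the output of sentiment analysis or a search history, is treated as non-sensitive until the list itself is amended, which is the gap discussed above.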

Applicability to financial data like transaction histories

Several related definitions have also been introduced. Financial data, for instance, includes not only account numbers, credit card numbers and the like, but also ‘any personal data regarding the relationship between a financial institution and the data principal’. While this specifically includes financial status and credit histories, it will also include data like transaction histories. This broader definition would cover instances such as wallet companies like Paytm and MobiKwik using such data to prepare credit scores and grant loans.

Health data, health records, and health apps

Health data includes data relating to the physical or mental health of a person, and covers records regarding the past, present or future state of health as well as data collected in relation to the provision of health services. While the term ‘health services’ is undefined, the inclusion of physical or mental health and health records in general makes the definition broad enough to include even data from smartphone apps that track health. Some ambiguity remains over data like records of sleep patterns on smartphones, or Google's retention of search records, which may include health-related searches.

Biometric data and bar on its processing

Biometric data has been defined to include facial images, fingerprints, iris scans, or any other similar personal data, based on technical processes and other measurements, which allow the unique identification of that person. This would include, for example, authentication mechanisms like the use of facial recognition and even heartbeats. Section 106 allows the government to identify certain forms of biometric data, the processing of which will not be allowed except in accordance with the law.

Explicit consent required for sensitive data

The inclusion of religious and political beliefs and caste is another welcome addition. Further, because Section 18 requires explicit consent for the processing of sensitive personal data, a firm like Cambridge Analytica would, technically, need explicit consent before extracting political affiliation and other related data from Facebook data.

Data Localization norms under the Bill

Section 40(2) allows the Central Government to specify categories of ‘critical personal data’ that cannot be taken outside India, thus imposing data localization norms. Since not all sensitive personal data is automatically subject to this requirement, it is unclear what will be categorised as ‘critical personal data’.

Justifications offered for data localisation

The justifications offered for these requirements in the Justice Srikrishna Committee’s Report accompanying the Bill include enabling better enforcement (for example, Google reports compliance with about 50 percent of data requests), contributing to an AI ecosystem and preventing foreign surveillance. Another justification offered is the vulnerability of cross-border data transfers that depend on undersea cables. The sabotage of these cables, the Report says, has the potential to lead to economic turmoil and civil disorder.

Will defining ‘critical personal data’ really ensure security?

The Report mentions data like the Aadhaar number, genetic data, biometric data, health data, etc., as among those which could be identified as critical data. However, in the age of big data and IoT, a large amount of crucial data can be derived from seemingly innocuous data. For instance, if the intention is to prohibit the transfer of biometric data, then every single image would have to be prohibited from being transferred abroad since facial recognition data and even fingerprints can be extracted from normal images.

Consider also the SmeshApp incident of 2016, where location data derived from an app installed on the phones of army personnel revealed crucial information such as troop movements and the army's counter-terrorism operations. While location data is clearly personal data under the Bill, it is not sensitive personal data. The Report recommends including location data as sensitive personal data but does not clarify whether it would be critical personal data. If it were identified as such, this could put a halt to a service like Google Maps.

Will prohibiting critical data transfer prevent surveillance?

Further, on foreign surveillance, how far will prohibiting the transfer of specific types of data actually prevent it? Consider, for instance, the content of e-mails and messages, which would be a crucial source of information for surveillance purposes. The status of such content is ambiguous: it may be personal data; it may not be personal data if separated from the sender and receiver of the message or otherwise anonymized; and it may be sensitive personal data, depending on what the message says. How would its surveillance be prevented this way?

Looking at the Cambridge Analytica incident again, highly sensitive data like political affiliations could be derived from something as obscure as Facebook likes and clicks. The concerns expressed by the Supreme Court about the proposed Social Media Communication Hub are founded on exactly this: the amount of surveillance that can be done using data from social media sites. The same concerns arise even if this were to be restricted to publicly available data. In other words, how far will prohibiting the transfer of ‘critical’ data prevent surveillance?

The Bill on publicly available data

The last issue to be considered here is the Bill’s position on publicly available data. This is mentioned in Section 17, under which the Data Protection Authority created under the Act may permit non-consensual processing of publicly available data for ‘reasonable purposes’. It is discussed in more detail in the Report, with reference to the Right to be Forgotten (which will be discussed in a later part of this series) and with respect to Section 17.

Here, the Report notes that the American concept of allowing the free processing of publicly available information will not suit the internet age. The Report further notes that while a person will have a lower expectation of privacy with respect to personal data made publicly available, he may not expect that the government and private persons may process this and profile it for various purposes. Based on this, the Report directs the Authority to whitelist purposes for which the processing of personal data would be allowed. Apart from this, the Report does not further clarify what purposes would be so whitelisted.

This is a particular concern, given the proposed Social Media Communication Hub, the recently proposed social media monitoring by the UIDAI, and the proposed National AI Marketplace. The Report’s discussion of requiring data mirroring of personal data for use in the development of AI creates a concern that personal data will form a major part of the National AI Marketplace.

The next part of the series will examine ‘personal data’ and other important definitions laid down under the Bill. You can read the past parts of the series:

Part I: Quick overview of India's draft data protection law

Part II: Understanding jurisdiction within and outside the country

The author is a lawyer specialising in technology, privacy, and cyber laws. She is also a certified information privacy professional.
