Publications

Reconciling Big Data and GDPR

Published on 14 March 2023
Big data and personal data

The digital age marks a profound change in society and in the daily lives of individuals and, by extension, in all economic actors. The digital revolution has given rise to new approaches to digital development.

In this context of digital transformation, Big Data is one of the emerging technologies that can generate undeniable competitive advantages.

The CNIL defines Big Data as a “gigantic volume of digital data produced, combined with ever-increasing storage capacities and increasingly sophisticated real-time analysis tools”.

The key characteristics of Big Data are:

  • Volume: gigantic volumes of data;
  • Velocity: the need for high processing speed coupled with rapid changes in the available data;
  • Variety: varied, unstructured data, whether stored by social media (social networks, blogs), exchanged between people (e.g., via email), or organized by data controllers on traditional media (internal databases).

The considerable mass of new data created, stored, transferred, processed and finally transformed into information requires rethinking methods and tools. In other words, dealing with an ever-increasing variety of data requires rethinking our relationship to that data. Data exploitation schemes must be based on suitable technologies and methods. Whereas data creation used to be the result of purposeful action, today data exist without human intervention. Data no longer merely characterize individuals, things and facts: they exist as entities in their own right; they are an immaterial product.

In this context of massive data use, data are perceived as a fragile asset to be protected, since they can easily be altered over the course of numerous manipulations (whether automatic or manual transformations). The question that arises is therefore to what extent Big Data can be reconciled with GDPR and with all of the personal data protection principles laid down by this European regulation.

Two essential uses of this emerging technology concern, on the one hand, predictive justice, and on the other hand, statistics performed by businesses through Big Data Analytics, a tool for strategic decision-making to improve a company’s operational efficiency.

First, the use of Big Data systems greatly expands the capabilities of the statistical systems used by businesses. It also improves their marketing strategy, business performance and operational efficiency, and helps optimize the services they offer.

The “data culture” in companies is based on a quantitative approach to data.

The life cycle of a piece of data can be described in three stages: the management stage, the storage and centralization stage, and the use and consumption stage. Failure to comply with principles such as the purpose limitation of data processing, proportionality and relevance, a limited retention period, security and confidentiality, and in particular the rights of individuals, can result in severe penalties (fines of up to 4% of global turnover or 20 million euros, as provided for in Articles 83 et seq. of GDPR).
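As a purely illustrative sketch, the retention-period principle mentioned above can be enforced at the storage stage of this life cycle by flagging records whose retention limit has elapsed. The field names and the 36-month limit below are assumptions chosen for the example, not figures taken from GDPR itself:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention limit, for illustration only: GDPR requires a period
# "no longer than necessary" for the purpose, not a fixed figure.
RETENTION_LIMIT = timedelta(days=36 * 30)

@dataclass
class StoredRecord:
    subject_id: str
    collected_on: date
    purpose: str

def records_to_purge(records: list[StoredRecord], today: date) -> list[StoredRecord]:
    """Return records whose retention period has elapsed and which should be
    deleted or anonymized (storage-limitation principle)."""
    return [r for r in records if today - r.collected_on > RETENTION_LIMIT]
```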

Other risks are related to the concepts of the “monetization” of data or its “commoditization”. Companies are aware of the benefits they can gain from collecting as much data as possible on their prospects or customers, with the aim of better segmenting, targeting and understanding their behavior. On the one hand, they can carry out “data monetization”: in exchange for customers’ personal data, companies can offer them discounts or preferential rates. On the other hand, we speak of “data commoditization”: companies are becoming aware that data can not only give them a competitive advantage in their own activities, but also have a market value for other sectors or activities. As a result, companies are led to change and adapt their business models in order to draw the appropriate benefits.

Among the solutions that provide a framework for the massive use of data by companies is the mapping of existing data flows, that is to say, the identification of strategic data and their use for a specific purpose corresponding to the company’s objectives. To this end, particular attention will have to be paid to the principle of data minimization: the data must be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” (Article 5 of GDPR). This principle will have to be applied not only at the time of collection of personal data but also throughout their life cycle, with the aim of collecting only the data necessary for the purposes originally intended by the data controller. Companies must also implement backup, retention and security processes to provide sufficient protection for data that is essential to the conduct of their business.
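By way of illustration, data minimization can be translated into code by whitelisting, for each declared purpose, the only fields that may be kept at collection time. The purposes and field names below are invented for the example; this is a minimal sketch, not a complete compliance mechanism:

```python
# Minimal sketch of data minimization at collection time.
# Purposes and field names are hypothetical examples.
ALLOWED_FIELDS_BY_PURPOSE = {
    "order_fulfilment": {"name", "delivery_address", "email"},
    "newsletter": {"email"},
}

def minimize(raw_record: dict, purpose: str) -> dict:
    """Keep only the fields necessary for the declared purpose,
    discarding everything else before storage (data minimization)."""
    allowed = ALLOWED_FIELDS_BY_PURPOSE.get(purpose, set())
    return {k: v for k, v in raw_record.items() if k in allowed}

# Example: the phone number and birth date are dropped for the "newsletter" purpose.
print(minimize({"email": "a@b.fr", "phone": "06 00 00 00 00", "birth_date": "1990-01-01"},
               "newsletter"))
```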

Secondly, the use of Big Data in the service of predictive justice has increased significantly in recent years.

Big Data offers the possibility of deeper knowledge of targeted populations and, where appropriate, the construction of predictive models of behavior through the processing of a significant volume of structured or unstructured data via complex analysis algorithms.

Among the examples of tools that can predict a court decision, we can mention the Predictice, PredPol, HART and COMPAS software. The major risks of this collection and exploitation of data via Big Data mainly concern: the lawfulness of the processing carried out, the rights of individuals, the diversion of these data to a purpose other than the one initially intended, and in particular profiling.

This last risk, which can moreover have significant repercussions on respect for the rights of individuals, has given rise to numerous debates at the international level. Nevertheless, in Europe, a legal regime framing algorithmic decisions, protective of litigants, has been established by GDPR. Article 22 of GDPR provides that “the data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, producing legal effects concerning him or her or significantly affecting him or her in a similar way”.

As such, this article establishes the principle of prohibiting the use of predictive justice alone for any legal decision. However, an exception is provided for: this principle is not applicable if the decision “is authorized by Union law or the law of the Member State to which the controller is subject and which also provides for appropriate measures to safeguard the rights and freedoms and legitimate interests of the data subject”.

Special attention will thus have to be paid to the rights of individuals in the context of this use of Big Data in the service of predictive justice. To this end, anonymization is a technique used in order to reduce the potential risks due to the massive use of personal data.

Anonymization is, according to the former G29 group (the Article 29 Working Party), “a technique applied to personal data in order to irreversibly prevent their identification”. The main anonymization techniques are, on the one hand, randomization, which consists of modifying the attributes in a dataset so that they become less precise, and, on the other hand, generalization, which consists of modifying the attributes of datasets so that they are common to a set of individuals (both are illustrated in the sketch after the list below). In addition, for anonymization to be considered effective, three conditions will need to be met:

  • The impossibility of isolating an individual in the dataset (individualization)
  • The impossibility of linking together distinct datasets about the same individual (correlation)
  • The impossibility of inferring new information about an individual with near certainty (inference).
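To make these two families of techniques more concrete, here is a minimal, purely illustrative Python sketch of generalization (age bands, truncated postal codes) and randomization (noise added to a numeric attribute). The record fields are hypothetical, and a real anonymization exercise would still have to be assessed against the three criteria above on the full dataset:

```python
import random

def generalize(record: dict) -> dict:
    """Generalization: make attributes less specific so that they are
    shared by a whole group of individuals (age band, partial postcode)."""
    out = dict(record)
    band = (record["age"] // 10) * 10
    out["age"] = f"{band}-{band + 9}"
    out["postcode"] = record["postcode"][:2] + "XXX"  # keep only the department
    return out

def randomize(record: dict, scale: float = 2.0) -> dict:
    """Randomization: perturb attributes with random noise so that they
    no longer point reliably to a single individual."""
    out = dict(record)
    out["income"] = round(record["income"] + random.gauss(0, scale * 1000), -2)
    return out

# Hypothetical record used only for the demonstration.
person = {"age": 34, "postcode": "75011", "income": 32000}
print(randomize(generalize(person)))
```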

It is necessary to emphasize, however, that anonymization is not a risk-free technique. The Article 29 Working Party (the former G29, predecessor of the European Data Protection Board), in an opinion on anonymization techniques published on April 10, 2014, points out certain limitations inherent in these techniques that will need to be carefully considered by the data controller. Its advice to data controllers, in order to avoid the risks inherent in the use of anonymization techniques, is to “carefully design the application of an individual technique to the situation at hand and opt for a combination of such techniques to enhance the reliability of the result”.

What other solutions are there to better reconcile Big Data and GDPR?

Good personal data governance by the players using Big Data solutions is necessary. In this perspective, the adoption of a set of rules ensuring respect for the principles laid down by GDPR by design (“Privacy by Design”) and by default (“Privacy by Default”) is required.
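As an illustration of “Privacy by Default”, the idea can be reflected in the default values of a settings object: the most protective option applies unless the data subject actively chooses otherwise. The settings below are invented for the example, not requirements drawn from the regulation:

```python
from dataclasses import dataclass

@dataclass
class PrivacySettings:
    """Privacy by Default sketch: every optional processing is off unless the
    data subject explicitly opts in. Field names are illustrative only."""
    analytics_tracking: bool = False
    marketing_emails: bool = False
    profile_visible_to_partners: bool = False
    retention_extension_agreed: bool = False

# A new account starts with the most protective configuration.
default_settings = PrivacySettings()
print(default_settings)
```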

Any processing of personal data (collection, recording, modification, organization, storage, consultation, communication) must correspond to a precise, determined purpose, and must be based on one of the six legal bases provided by GDPR.
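By way of illustration, this requirement can be reflected in a record of processing activities that ties each operation to a declared purpose and to one of the six legal bases of Article 6 of GDPR. The class below is a hypothetical sketch, not an official model:

```python
from dataclasses import dataclass
from enum import Enum

class LegalBasis(Enum):
    """The six legal bases of Article 6(1) GDPR."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"

@dataclass
class ProcessingRecord:
    """Hypothetical entry of a record of processing activities:
    each operation is tied to a precise purpose and a single legal basis."""
    operation: str   # e.g. "collection", "storage", "communication"
    purpose: str
    legal_basis: LegalBasis

record = ProcessingRecord("collection", "customer order management", LegalBasis.CONTRACT)
print(record)
```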

The upstream consideration of the rules stemming from GDPR would make it possible to have an ethical Big Data system, compliant with the requirements provided for by the European regulation and respectful of the original values of Big Data.

In conclusion, all of these aspects will have to be taken into account and integrated into the tools and software used by data controllers. Adopting good data governance practices, while maintaining the innovative nature of Big Data solutions, will make it possible to monitor an effective compliance process and to respect ethical principles in terms of data protection and information security.

– Georgiana HRISCU

Sources