Since the early 2010s, and even more so in the wake of the current COVID-19 pandemic, learning and learning environments have been undergoing digitalization. This development has led to a “boom” in the data collected from the various learning platforms. The current interest in learning analytics thus mirrors what is happening with Big Data more broadly. Big Data is commonly defined by three factors: volume, velocity, and variety. It allows us to understand the uses and practices of individuals, but also to design theoretical and behavioral models and predict trends. In this way, Big Data has both immediate and long-term uses, granting real-time access to the needs and habits of individuals while allowing the creation of predictive models.
The massive amount of educational data, the continuous increase in processing power, and the motivation to deduce new information from available data explain why learning analytics is of utmost importance.
Learning analytics, according to Siemens and Baker in 2012 and to Fischer, Hmelo-Silver, Goldman, and Reimann in 2018, is defined as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs.” It is a tool for understanding learning processes and their context, which in turn allows discovery and prediction. We can also use it to improve learning processes in digital environments.
This emerging economy is just the tip of the iceberg of Big Data. It is of particular interest to companies and large commercial groups looking for a better understanding of the habits and practices of learners when acting as consumers. The goal is to better target and optimize commercial offers and to predict future trends. According to a study conducted among higher education teachers by the Center for Digital Education, the main benefits of Big Data analytics in education would be tracking and predicting a student’s performance (69%), increasing graduation rates (61%), making real-time adjustments to curricula (47%), measuring the institutional performance of the school (44%), and uncovering potential flaws in the administration through analytics (22%).
This is a set of issues that must be addressed now, to avoid being caught up in the speed of technological development and being overwhelmed by events.
Learning analytics comprises computational, mathematical and statistical techniques that enable relevant information to be detected in massive data sets. The term “analytics” should therefore be understood as an action carried out to understand, and even predict, future actions in the pursuit of ever greater performance and efficiency. This method is very close to data science and rests on the same principle as data mining.
The subject of learning analytics differs from the two previous scientific methods because it is multidisciplinary and aims at designing computer environments favorable to human learning.
The reasons why scientists are particularly interested in learning analytics are:
Learning analytics allows the tracking of data produced during learning activities. These data can include exam results, participation rates during activities and, more generally, attendance rates, end-of-year exam results and diplomas obtained. These are the data traditionally collected by teachers. However, new digital environments, such as learning platforms and personal spaces on the intranet, are progressively becoming places where data are collected. These new ecosystems provide a considerable amount of data on the behavior of learners, from each login to the various digital traces they leave, which are automatically stored by the server before being processed by algorithms capable of analyzing them and establishing the user’s profile.
All these reasons make it all the more necessary to recall the indispensable characteristics that learning analytics must meet:
Following from these characteristics, three major functions can be attached to learning analytics: the descriptive function, the predictive function and the prescriptive function. Each of these functions helps explain why digital learning analytics goes beyond the classical field of education sciences.
A) The descriptive function
The descriptive function describes the specific context in which learners operate. This extends from the conditions to the expectations of the learners during their learning sessions, while taking into consideration the courses, the available resources, and the tasks to be carried out in the digital environment. In other words, it concerns the characteristics of the learning, the learning strategies adopted and the possibility of varying them.
In this analysis, three elements are used to predict the success, failure, or dropout of the learner (one of the goals of learning analytics):
B) The predictive function
The predictive function focuses on the trends identified from the context and situations of the learners. These trends are then refined by machine learning. Thanks to this system, which is continuously fed with new data, it is possible to influence and predict the success of learners.
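As a rough illustration of how such a prediction might be produced, the sketch below fits a small logistic regression by gradient descent. Everything in it is an assumption made for the example: the two features (logins per week, share of tasks completed), the invented training values, and the choice of logistic regression itself, which the text does not prescribe.

```python
import math

# Hypothetical training data, invented for illustration only:
# (logins per week, share of tasks completed) -> 1 if the learner passed, else 0.
SAMPLES = [
    ((1.0, 0.10), 0),
    ((2.0, 0.30), 0),
    ((1.0, 0.40), 0),
    ((3.0, 0.20), 0),
    ((5.0, 0.80), 1),
    ((6.0, 0.70), 1),
    ((4.0, 0.90), 1),
    ((7.0, 0.95), 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Estimated probability of success for one learner."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, epochs=2000, lr=0.1):
    """Fit the weights with plain stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


weights, bias = train(SAMPLES)
p_active = predict(weights, bias, (6.0, 0.85))    # frequent, diligent learner
p_inactive = predict(weights, bias, (1.0, 0.15))  # rarely connected learner
print(f"active learner: {p_active:.2f}, inactive learner: {p_inactive:.2f}")
```

Because the model is retrained whenever new samples are added, it reflects the idea of a system "continuously fed with new data"; a real deployment would need far more data, validation, and a lawful basis for the processing.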
C) The prescriptive function
The prescriptive function concerns the action to be taken in response to the situation described by the two previous functions. In other words, it proposes specific recommendations that can support and guide the learners towards their objectives. The advice offered must be based on past experiences and be reliable.
A) The desire to predict the progress of learners
Thanks to real-time access to learning data, researchers have developed analysis techniques that are very similar to data mining and that allow profiles to be classified. These classifications cover the learner’s abilities, the adaptability of the content, the level of deployment of engagement strategies and the fight against dropout.
B) The necessity to measure social interactions
Social learning analytics is based on the analysis of social networks and the communities that exist within them, the subsets created by these networks but also their proximity, the frequency of exchanges, their density and the affinities that emerge. From these observations, it is possible to identify the accompanying persons and thus to characterize the links between the different actors.
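Two of the measures named above, the frequency of exchanges and the density of the network, can be sketched in a few lines. The message list and learner names below are hypothetical, and the network is treated as undirected, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical forum exchanges: one (sender, receiver) tuple per message.
EXCHANGES = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "ben"),
    ("ana", "chloe"), ("chloe", "ben"), ("dev", "ana"),
]


def exchange_frequency(exchanges):
    """Number of messages per undirected pair of learners."""
    return Counter(frozenset(pair) for pair in exchanges)


def links(exchanges):
    """Distinct undirected links, ignoring messages sent to oneself."""
    return {frozenset(pair) for pair in exchanges if pair[0] != pair[1]}


def density(exchanges):
    """Share of possible links that actually exist: 2E / (N * (N - 1))."""
    learners = {name for pair in exchanges for name in pair}
    n = len(learners)
    if n < 2:
        return 0.0
    return 2 * len(links(exchanges)) / (n * (n - 1))


def degree(exchanges):
    """Number of distinct partners per learner."""
    counts = Counter()
    for link in links(exchanges):
        for name in link:
            counts[name] += 1
    return counts


print(exchange_frequency(EXCHANGES).most_common(1))
print(round(density(EXCHANGES), 2))
print(degree(EXCHANGES).most_common())
```

A learner with a high degree is a candidate for the "accompanying person" role described above, although such a reading would need to be checked against the content of the exchanges.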
The study of social networks presents a new and vast scope for deepening social learning. It allows us to considerably broaden the techniques for measuring and representing biometric data, thanks to the recurrent use of cameras that make it possible to examine the synchronization of physical postures. In line with these developments, it is reasonable to imagine that it will soon be possible to analyze new indicators, such as voice or heart rate, to measure the intensity of social interactions.
C) The need to analyze discourse
Language partly conditions the success of activities, the use of tools and the success of joint work. This is why speech analysis should be taken into account by learning analytics, since it makes the individual’s place in the group visible and therefore improves it.
D) The importance of visualizing learning: an aid to decision-making
Learning analytics must be perceived as tools that serve and support the regulation and self-regulation of learning. This necessarily leads to awareness based on the information made available, to the learner’s self-questioning, to the adaptation of learning strategies and to the production of regulation and improvement paths. Learning analytics have a wide range of advantages, from the objectivity of the elements (decisions, results and behaviors) to the speed and regularity of the feedback available to both learners and teachers.
Learning analytics are the result of a cycle in which traces pass through different stages, such as collection and processing. These steps undeniably raise questions about the protection of personal data, which is closely linked to respect for privacy. Even data that appear to be “insignificant” can affect the privacy of individuals. Moreover, consenting to the collection, processing and re-use of data by online services, for the specific and distinct purposes presented to the user, rarely gives that user control over the information that may be produced when data from websites, applications and other sources are automatically aggregated.
The General Data Protection Regulation (GDPR), which was adopted in 2016 and has applied since May 25, 2018, tends to slow down the development of learning analytics. All processing undertaken by software using learning analytics must be lawful and justified. The context is very important because, depending on it, the controller and/or its processor will have to justify the lawfulness of the processing.
Today, data protection is an integral part of the learning analytics research process.
In the case of a learning analytics development project, the specificities of each institutional project will have to be studied. In most cases, the approach will at least be confronted with the GDPR whenever there is large-scale data collection, cross-referencing of data or innovative use of said data. This can be explained by the large number of learners, whose data allow the creation of personalized interfaces and standard profiles. It is important to know that digital learning analytics also processes administrative data (registration…), performance data (exam results…), and usage data of the learning resources offered by the institution in question. On top of that, it is rare for the data to be stored in a single database.
For all these reasons, it is essential to conduct a privacy impact analysis beforehand, and to submit it to the institution’s data protection officer for approval.
It appears that data exploitation is central to the digital learning analytics approach. Data are therefore a major issue for this emerging discipline. The objective remains the improvement of the quality of learning without increasing the workload of teachers and other supervisory staff. Maximum automation is also an essential element of this data processing because it allows rapid decision-making by those involved in the learning process. Of course, automation must be compliant with the GDPR as well.
Some researchers have proposed adding the “DELICATE Checklist” model to this first layer of regulation, to allow learning analytics practitioners to reflect on its privacy aspects. Simply put, the point is to continue placing the human being at the center of concerns while being transparent and open. This model is not yet applied.
Even if the objective of learning analytics is rather benevolent towards the learner, their conditions of use can raise many questions.
Indeed, despite the principles of the GDPR, which tend to constrain learning analytics, it is difficult to determine in advance which information about the environment and the results is likely to be relevant for the study. The very activity of learning analytics is therefore part of an exploratory approach allowing for a posteriori modeling. This means that digital learning analytics does not seek to validate theoretical hypotheses, but rather conducts research in order to then hypothesize about online learning habits. This approach is diametrically opposed to what is usually done in the humanities and social sciences. It is also hardly compatible with the GDPR principle of data minimization, which is why this practice is so criticized.
To further investigate the issue of minimizing data collection, many questions need to be asked, starting with the relevance of the processed data. Is it limited to learning data? Is it only academic data, or also social data (scholarship student or employee…) or even psychological data?
In addition to this first difficulty, an official definition for school data is lacking. The perimeter covered by this type of data is thus particularly wide, ranging from administrative information to productions that are handed in, to learning traces. This explains why it is fundamental to organize a real ethical reflection on data collection. Data relevance is measured in terms of its usefulness for its intended purpose.
Another danger, which we previously touched on, is the combined use of data on learners’ learning with data on their bodily intimacy (fingerprints, facial recognition, etc.). The generalized surveillance of the body through learning analytics is a real problem, especially for people starting higher education while still minors. Beyond this specific case, collecting physiological data (as in Knewton’s example) at an early age, from people who are likely to forget that these data were ever collected, is a risk that ought to be foreseen. It should not be forgotten that the right to erasure exists, which is why it is important for learners to be aware of the data collected.
Another important aspect that has not been highlighted at all is the possibility offered by learning analytics to preserve anonymity or at least to contribute to data privacy. Anonymization is a major challenge, because the cross-referencing of data makes it very easy to identify a person without having their name.
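The re-identification risk from cross-referencing can be made concrete with a k-anonymity check, a standard measure that the text itself does not name and that is used here only as an illustration. The records, field names and choice of quasi-identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical export from which names have already been removed.
RECORDS = [
    {"birth_year": 2001, "postcode": "75011", "programme": "Law", "grade": 14},
    {"birth_year": 2001, "postcode": "75011", "programme": "Law", "grade": 9},
    {"birth_year": 2002, "postcode": "69003", "programme": "Physics", "grade": 12},
    {"birth_year": 1987, "postcode": "31000", "programme": "Law", "grade": 16},
]

# Assumed quasi-identifiers: fields that look harmless alone but identify when combined.
QUASI_IDENTIFIERS = ("birth_year", "postcode", "programme")


def group_sizes(records, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def k_anonymity(records, keys):
    """Size of the smallest group; k = 1 means at least one learner is unique."""
    sizes = group_sizes(records, keys)
    return min(sizes.values()) if sizes else 0


def unique_records(records, keys):
    """Records that can be singled out without any name attached."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[k] for k in keys)] == 1]


print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))
print("records that can be singled out:", len(unique_records(RECORDS, QUASI_IDENTIFIERS)))
```

In this invented data set, removing names is not enough: two learners remain unique on the combination of birth year, postcode and programme, so anyone holding a second data set with those three fields could attach a grade to a person.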
A) Permanent surveillance of learners
Learning analytics are a form of permanent surveillance during courses or exams: this is an obvious fact that cannot be denied. Indeed, it is possible to know, for each learner, when he or she left the course, whether he or she listened to a part of the course several times, etc. Similarly, the use of a camera allows facial recognition (artificial intelligence) and therefore the collection of physiological data, but also the recording of the learner’s emotions. This once again brings up the importance of consent and the need to make learners aware of the collection of their data.
Without going back to the specific case of minors, is consent really free in the educational context? Indeed, it is quite legitimate to wonder whether the student can be considered as validly giving consent, given the authority exercised over him or her by the teacher who proposes the use of such a tool. As a reminder, Article 4(11) of the GDPR defines consent as “any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her”. Is this really respected when there is a form of authority weighing on the signatory, i.e. the learner who is going to be evaluated by said teacher?
Notwithstanding this possible “pressure” to consent to the collection of data in order to use the tool submitted by the teacher, another question arises. Is the learning part of a public interest mission?
B) Lack of consideration for personality
Learning analytics are used to obtain information on the learning of learners and therefore allow them to be classified in relation to each other. Thus, learners are quantified or rather their personality is measured to obtain the best performance. In other words, learning analytics undermine the very essence of the learner’s personality. In fact, by associating learning platforms with sensors to analyze learners’ reactions ever more finely, we can offer support that is as close as possible to the needs of each individual in order to set up personalized learning.
The expected benefits are in line with the “quantified self” phenomenon, which was studied in 2013 by the French Supervisory Authority (CNIL) in its study “The body, the new connected object”. By dint of quantifying learners’ performances, is it possible that learners develop a compulsive and anxiety-provoking need to conform to the norm? Conversely, isn’t the search for personalized learning to improve the learner’s performance likely to turn into an obsession? All the more so since success is not only reflected in results and grades.
In any case, learning analytics must not transform learners into learning “objects”; they must remain a service. In other words, the massification of analyzed data and objectification should not contribute to the emergence of a new discipline. In the same way, the learner should not tend to become a simple passive producer of data; the relevance of the data should remain a priority.