Bikram Sengupta | Mar 26, 2018 17:41:45 IST
The education industry is set to be transformed through digitisation and the application of cognitive computing.
Briefly, digitisation is leading to the creation of enormous data sets, whose volume, velocity and variety are rapidly outpacing the human ability to analyse or make sense of them. Educational institutions that have embraced digitisation are often found “drowning in data, but starving for insights”. Cognitive computing — which can analyse data (including unstructured data) at scale, extract insights and interact naturally with human stakeholders — can drive deeper learning engagement through personalisation, improve decision-making and thereby lead to better educational outcomes.
But what kinds of data does one find in education? And what insights and decisions can result from them? In this article, we will try to shed some light on these questions.
There are three broad categories of data in educational settings: data about individuals (eg students), content (eg related to learning or assessments) and interactions (eg between students, or between students and content). We will take a look at each category and consider how cognitive technologies can help extract value from it. Finally, we will briefly discuss how educational institutions should prepare themselves for the cognitive era.
Data about individuals
This is mostly data about learners, although it can include data about teachers or instructors as well (such as their academic credentials, certifications and experience levels). As learners go through their educational journey, an enormous amount of data is collected and stored in school records, learning management systems (LMS) and other applications. This is largely structured data but can include unstructured and semi-structured data as well. Examples include:
• demographic data such as age, gender, socio-economic indicators, special needs indicators (eg disability), educational and occupational data on parents or guardians
• data on student attendance and behaviour (eg disciplinary actions taken), and notes recorded by teachers documenting various observations about a student (related to learning or behaviour)
• academic performance, such as scores obtained in homework and formative assessments, as well as performance in standardised summative assessments at regional or national levels
In many cases, the collection of such data through longitudinal data systems, and its use to improve instruction, is being encouraged through competitive grants such as Race to the Top, created by the United States Department of Education. Globally, many countries are implementing such systems, although these are at different stages of maturity and integration.
Descriptive, predictive and prescriptive analysis
Once such data is available to an analytical engine, the immediate benefit is the increased visibility it provides into how students are doing, and the interesting relationships it can draw from the data. Descriptive analytics, which tries to answer the question “What has happened?”, can aggregate and mine the data to report results ranging from standard but useful statistics to non-obvious insights. These can spur an institution to take corrective action when needed.
For example, “John consistently scored the lowest in History through the course” might lead to John being placed in a special class on History, “On average, Section A scored 20 percent more than Section B in Mathematics” might lead to a review of the performance of Section B’s Mathematics teacher, or “80 percent of students in Grade 1 who scored poorly in English also struggled in Mathematics” can underscore the need to improve the students’ language skills so that they are better equipped to tackle word problems.
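To make this concrete, here is a minimal sketch in Python of the kind of aggregation a descriptive engine performs. The records (names, sections, scores) are invented for illustration only:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical score records: (student, section, subject, score)
records = [
    ("John", "A", "History", 42), ("Mary", "A", "History", 78),
    ("John", "A", "Maths", 55), ("Mary", "A", "Maths", 80),
    ("Ravi", "B", "Maths", 60), ("Sara", "B", "Maths", 52),
]

# "On average, how did each section score in Mathematics?"
by_section = defaultdict(list)
for student, section, subject, score in records:
    if subject == "Maths":
        by_section[section].append(score)
section_avg = {s: mean(scores) for s, scores in by_section.items()}
print(section_avg)

# "Who scored the lowest in History?"
history = [(score, student) for student, _, subject, score in records
           if subject == "History"]
print(min(history)[1])
```

Real descriptive engines run queries like these over millions of records, but the underlying operations (grouping, aggregating, ranking) are the same.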
While useful, descriptive analytics is limited by the fact that it provides various perspectives on what has already happened. The corrective actions that stem from it can lead to some improvements in the future but are essentially reactive in nature. In many cases, one is left wondering what could have been done to prevent an undesirable outcome from arising in the first place.
Are there signals that could have been picked up from past data to predict such an outcome, with enough time left to take early and proactive corrective action? When we consider high-cost outcomes such as failure in key tests, or drop-out from school, the potential benefits of being able to predict, correct and avoid these can be enormous.
This is where predictive analytics comes in. Historical data can be used to train machine learning systems to recognise patterns that lead students towards undesirable educational outcomes, and to give early warning of “what could happen”.
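As a sketch of the idea (not of any production system), a tiny logistic-regression “early warning” model can be trained on hypothetical historical records, here just two invented features per student (attendance rate and average score), to estimate dropout risk for new students:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a tiny logistic-regression model by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted dropout probability
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def risk(w, b, x):
    """Predicted probability of the undesirable outcome for features x."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical history: [attendance rate, average score]; label 1 = dropped out
X = [[0.95, 0.85], [0.90, 0.75], [0.85, 0.80],
     [0.50, 0.40], [0.45, 0.35], [0.40, 0.30]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)

print(risk(w, b, [0.45, 0.38]))  # a struggling student: high risk
print(risk(w, b, [0.92, 0.81]))  # a thriving student: low risk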
Going forward, with the proliferation of IoT devices on campus, other kinds of student data can potentially become available – for example, data on a student’s location, movement or health can come from sensors on their personal devices such as smartphones, tablets or wellness bands. Such data can have a variety of uses – from a deeper personalisation of learning experience based on a more holistic and continuous view of a student to applications that inform and optimise resource usage, logistics and security on campus.
Learning Content and Assessments
This data is highly heterogeneous in its form, involving a variety of text, audio, image or video content, assembled in creative ways to deliver engaging learning experiences. It includes formal, expert-created content (eg digital textbooks, instructor lectures) as well as an ever-growing body of user-generated and often informal content (eg student notes, blogs, e-portfolios). Assessments — from multiple-choice and short-answer to essay-type questions or problems, along with student responses — also belong here. In short, all content arising from teaching, learning and assessment comes under this category.
Finding the right content
With the explosive growth of digital content, managing its scale, heterogeneity and complexity — and being able to derive actionable insights from it — can quickly overwhelm even digitally savvy institutions.
Managing and making sense of unstructured data has been the biggest driver for the growth and advancement of cognitive computing in recent years. A plethora of techniques in machine learning, natural language processing, speech recognition and computer vision have been developed to help extract structure, meaning, relationships and eventually value from unstructured data. These techniques can be employed to address many of the pressing needs of teachers and students.
For example, using natural language processing techniques, one can enrich curriculum instruction text and map both instructions and content into a structured semantic domain, where their relationship becomes quantifiable as proximity in that domain. This leads to automatic linking of learning content with curriculum instructions – essentially resulting in a digital library with automatic cataloguing capabilities, where every new resource uploaded becomes easily available to instructors looking for help with teaching specific instructions. Besides extracting semantics, natural language processing techniques can also be used to infer other characteristics of a resource that can assist its use – for example, its readability, complexity or alignment to Bloom’s taxonomy.
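A crude sketch of this matching idea: represent each text as a bag-of-words vector and measure proximity by cosine similarity. The instruction text and resource names below are invented; real systems would use richer semantic representations (embeddings, ontologies) rather than raw word counts:

```python
import math
from collections import Counter

def vectorise(text: str) -> Counter:
    """Bag-of-words vector: word -> occurrence count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical curriculum instruction and candidate library resources
instruction = "explain the force of gravity acting on falling objects"
resources = {
    "gravity-intro.mp4": "introduction to gravity why objects fall towards the earth",
    "algebra-basics.pdf": "solving linear equations and basic algebra practice",
}

query = vectorise(instruction)
ranked = sorted(resources,
                key=lambda r: cosine(query, vectorise(resources[r])),
                reverse=True)
print(ranked[0])  # the gravity resource ranks closest to the instruction
```

The automatic cataloguing described above amounts to running this kind of matching for every newly uploaded resource against the full set of curriculum instructions.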
When a learning resource contains audio, speech to text services can be used to extract natural language transcripts for analysis. When a resource includes images or video, visual recognition, as well as video analytics techniques, can be leveraged to recognise characters, identify objects or detect motion – thereby extracting semantics and structure that can make these learning resources more discoverable. An instructor looking for a learning resource to teach gravity can then not only search for “introduction to gravity” (thereby expressing a preference for basic rather than advanced materials on gravity), but can additionally indicate an interest in videos that involve the use of an apple’s motion to explain the concept (maybe inspired by stories on how gravity is said to have been discovered!).
Of course, such technology is not yet fully mature or available in the tools we use daily, but research is making significant advancements in these areas and with the data available to train and improve machine learning systems growing daily, these experiences may become commonplace in the not-too-distant future.
Besides making content more discoverable, there are many other exciting applications of content analytics in learning – too many to list here, but to give a couple of examples: the extraction of concept maps and summaries from text (imagine how helpful these would be for students pressed for time before an exam!), and the segmentation of large, legacy content into shorter, more cohesive modules that can support the agile learning today’s learners need.
Another area that can be truly transformed by cognitive computing is assessment. This is a highly human-effort-intensive domain that often suffers from uneven grader experience, subjectivity and bias. So far, the only aspect that could be automated at scale (beyond the evaluation of answers to multiple-choice questions) is adaptive testing. This is the technology behind much of the computer-assisted testing we experience; it relies on estimating the difficulty level of a question from how students have fared on it in the past, and then using this knowledge to pose a more difficult (or easier) question to a learner if her answer to the previous question is correct (or incorrect).
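The core loop of such a system can be sketched in a few lines. Here the historical fraction of correct answers per question stands in as a (very rough) proxy for difficulty; the item bank and question IDs are hypothetical, and real adaptive tests use proper psychometric models such as item response theory:

```python
# Hypothetical item bank: question -> fraction of past students who answered correctly.
# A lower historical success rate is treated as a proxy for higher difficulty.
item_bank = {
    "q1": 0.90,  # easy
    "q2": 0.70,
    "q3": 0.50,
    "q4": 0.30,  # hard
}

def next_question(current: str, was_correct: bool, asked: set) -> str:
    """Pick the unasked question one difficulty step harder (if the learner
    was right) or easier (if wrong) than the current question."""
    cur = item_bank[current]
    candidates = [(q, p) for q, p in item_bank.items() if q not in asked]
    if was_correct:
        harder = [c for c in candidates if c[1] < cur]
        pool = harder or candidates
        return max(pool, key=lambda c: c[1])[0]  # nearest harder question
    easier = [c for c in candidates if c[1] > cur]
    pool = easier or candidates
    return min(pool, key=lambda c: c[1])[0]      # nearest easier question

print(next_question("q2", True, {"q2"}))   # correct answer -> step up in difficulty
print(next_question("q3", False, {"q2", "q3"}))  # wrong answer -> step down
```

Each administered question in turn refines the estimate of the learner’s ability, which is what lets adaptive tests reach an accurate score with fewer questions.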
Research has made progress in the areas of essay grading and short answer scoring – however, given the current levels of technology maturity as well as the high stakes that are often involved, this may be an area where we employ technology to play an assistive role, helping human graders better handle scale and variability, rather than aim for complete automation.
Finally, technology can also play an important role in the generation of assessments from content. While some progress has been made on the automated generation of very simple questions from text, human oversight is still needed to select meaningful questions and filter out the rest. Generation of more intelligent questions that probe deeper levels of understanding is an important area for the research community to focus on.
Interaction data
Student data and learning content can offer valuable insights, but a new, exciting stream of data arises when students start interacting with content. This data, which we call interaction data, details the student’s journey through the content – for example, when did she launch a learning video, how long did she watch it, where did she pause, rewind or skip forward, and so on. Such data is interesting because it can provide novel insights on both the learner and the content.
For example, a student pausing at a point in a video multiple times and rewinding may indicate some difficulty in understanding the related areas of content but also active engagement, which might suggest that showing her additional materials that explain those concepts better would help. On the other hand, when many students display such a struggle in a similar area of content, the problem may lie in the content itself – so the content creators can benefit from such “live” feedback and deliberate on ways to make the content more accessible.
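Detecting such “struggle hotspots” can be as simple as counting rewind events per segment of a video. The playback events below are invented, and real analytics pipelines would process event streams at much larger scale:

```python
from collections import Counter

# Hypothetical playback events: (student, action, video timestamp in seconds)
events = [
    ("s1", "rewind", 120), ("s1", "rewind", 122), ("s2", "rewind", 118),
    ("s3", "rewind", 121), ("s2", "pause", 45), ("s3", "rewind", 300),
]

BUCKET = 30  # group timestamps into 30-second segments
rewinds = Counter((t // BUCKET) * BUCKET
                  for _, action, t in events if action == "rewind")

# Segments rewound repeatedly across students may indicate hard-to-follow content
hotspots = [segment for segment, n in rewinds.items() if n >= 3]
print(hotspots)  # segment starting at 120s drew repeated rewinds
```

An instructor dashboard could surface these segments as candidates for clearer explanation or supplementary material.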
Social aspects of learning
Besides learning from content, students learn a lot from teachers and from each other. This social nature of learning has led most courses these days to support online discussion forums where students and teachers can engage in conversations on the subject, beyond what is possible in class. This leads to another kind of interaction data that is human-to-human, but whose analysis can also reveal many interesting insights.
For example, natural language processing techniques can be used to mine “frequently asked questions” from a discussion forum, which would give instructors insight into the topics that need the most attention. Such questions, and the answers instructors provide, can be used to train a virtual assistant that, over time, can give students immediate feedback on questions in which it has high confidence, while referring other questions to the human instructor. Moreover, sentiment analysis on student posts, as well as analysis of the social networks that emerge over time in the student population, can provide interesting insights on student behaviour and the flow of knowledge, which can enrich the perspectives an instructor develops from other sources such as assessment data or in-class observations.
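At its simplest, FAQ mining means grouping posts that ask the same thing in different words and surfacing the largest groups. The sketch below uses invented posts, a hand-picked stopword list and very crude plural stemming; real systems would rely on proper NLP (lemmatisation, paraphrase detection, sentence embeddings):

```python
from collections import Counter

# Hypothetical forum posts (invented for illustration)
posts = [
    "How do I balance a chemical equation?",
    "What is the difference between speed and velocity?",
    "how to balance chemical equations??",
    "how to balance chemical equations please",
    "difference of velocity vs speed",
]

STOPWORDS = {"how", "do", "i", "a", "what", "is", "the", "to",
             "of", "between", "and", "vs", "please"}

def key_terms(post: str) -> frozenset:
    """Reduce a post to its content words; crude plural stemming via rstrip."""
    words = post.lower().replace("?", "").split()
    return frozenset(w.rstrip("s") for w in words if w not in STOPWORDS)

# Posts reducing to the same key terms are treated as the same question
groups = Counter(key_terms(p) for p in posts)
faq, count = groups.most_common(1)[0]
print(sorted(faq), count)  # the most frequently asked question and its frequency
```

The same grouping, paired with instructor-approved answers, is what a virtual teaching assistant would be trained on.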
Preparing for the data (and cognitive) journey
In a large organisation, digitisation often results in a multitude of platforms, systems and applications from a variety of vendors, serving diverse management, pedagogical, learning and assessment tasks. This leads to fragmentation of data across multiple vendor-supplied software and hardware, along with security, privacy and operational risks. More importantly, from a cognitive perspective, this creates major roadblocks to harnessing all the data in a consolidated and coordinated manner to maximise value at the organisational level; the likely result is fragmented insights and recommendations.
As educational institutions gear up for the cognitive journey, they must begin with a conversation with their trusted technology partners to move towards a consolidated, scalable, secure and extensible data foundation based on industry standards. Along with this, it is important to discuss and agree on several important principles and aspects of governance and stewardship. For example:
• Who owns a particular data set? Who else is entitled to access and make use of it? What are acceptable forms of usage?
• For what purposes can cognitive technologies be applied on data? Who owns the models that result from applying such technologies on a dataset? Who owns the insights?
• How will the quality of models and insights/recommendations be measured? What information on the data and training methods need to be shared with stakeholders for them to decide on the usefulness, trustworthiness and relevance of a model or recommendation?
• How can all data and models be made secure? What are possible malicious uses and how can these be detected and prevented?
• What skills would be needed to run and manage the cognitive systems on an ongoing basis? How should training of human personnel be organised, and how can skills be kept current?
The last word
In today’s knowledge-driven economy, data is the “next natural resource”, the “new oil”. However, owning an oil well or a coal field by itself does not lead to economic prosperity. One needs to invest in additional machinery and equipment to extract natural resources and turn them into something useful and valuable. The same is true of data, across all industries, including education.
Data on students, content and interactions offer tremendous value, but only if educational institutions invest in the platforms and processes necessary to govern these in a consolidated and secure manner. In addition, institutions need to empower themselves with cognitive technologies to be able to analyse all of the data at-scale and derive insights that transform learning experiences and outcomes.
Instead of the overwhelming fear of “drowning in data”, educational institutions can then swim with confidence and stay in control as they use cognitive insights to navigate a new course that brings them competitive advantage and lasting value to their key stakeholders — the learners.
The author is a senior technical staff member at IBM Research and senior manager, Cognitive Education and Interactions Department, IBM Research (India)