  • Review article
  • Open access
  • Published: 02 November 2020

Big data in education: a state of the art, limitations, and future research directions

  • Maria Ijaz Baig
  • Liyana Shuib (ORCID: orcid.org/0000-0002-7907-0671)
  • Elaheh Yadegaridehkordi

International Journal of Educational Technology in Higher Education, volume 17, Article number: 44 (2020)

Big data is an essential aspect of innovation that has recently gained major attention from both academics and practitioners. Given the importance of the education sector, research is increasingly examining the role of big data in this sector. Many studies have investigated the application of big data in different fields for various purposes. However, a comprehensive review of big data in education is still lacking. This study therefore conducts a systematic review of big data in education in order to explore the trends, classify the research themes, highlight the limitations, and provide possible future directions in the domain. Following a systematic review procedure, 40 primary studies published from 2014 to 2019 were selected and the relevant information extracted. The findings show an increase in the number of studies addressing big data in education during the last 2 years. The studies cover four main research themes: learner’s behavior and performance, modelling and educational data warehouse, improvement in the educational system, and integration of big data into the curriculum. Most big data research in education has focused on learner’s behavior and performance. Moreover, this study highlights research limitations and portrays future directions. It provides a guideline for future studies and highlights new insights and directions for the successful utilization of big data in education.

Introduction

The world is changing rapidly due to the emergence of innovative technologies (Chae, 2019). Individuals currently use a large number of technological devices (Shorfuzzaman, Hossain, Nazir, Muhammad, & Alamri, 2019), and these devices produce an enormous amount of data at every moment (ur Rehman et al., 2019). To cater for this massive data, technologies and applications are being developed that are useful for data analysis and storage (Kalaian, Kasim, & Kasim, 2019). Big data has consequently become a matter of interest for researchers (Anshari, Alas, & Yunus, 2019), who are trying to define and characterize it in different ways (Mikalef, Pappas, Krogstie, & Giannakos, 2018).

According to Yassine, Singh, Hossain, and Muhammad (2019), big data is a large volume of data. However, De Mauro, Greco, and Grimaldi (2016) referred to it as an informational asset that is characterized by high quantity, speed, and diversity. Moreover, Shahat (2019) described big data as large data sets that are difficult to process, control, or examine in a traditional way. Big data is generally characterized by 3 Vs: Volume, Variety, and Velocity (Xu & Duan, 2019). Volume refers to a large amount of data or an increasing scale of data. The size of big data can be measured in terabytes and petabytes (Herschel & Miori, 2017), and high-capacity storage systems are required to cater for it. Variety refers to the type or heterogeneity of data. The data can be in a structured format (databases) or an unstructured format (images, video, emails). Big data analytical tools are helpful in handling unstructured data. Velocity refers to the speed at which big data can be accessed. The data is virtually present in a real-time environment (Internet logs) (Sivarajah, Kamal, Irani, & Weerakkody, 2017).

Currently, the concept of 3 Vs has been expanded into several Vs. For instance, Demchenko, Grosso, De Laat, and Membrey (2013) classified big data into 5 Vs, which are Volume, Velocity, Variety, Veracity, and Value. Similarly, Saggi and Jain (2018) characterized big data by 7 Vs, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value.

The demand for big data is increasing significantly in different fields such as insurance and construction (Dresner Advisory Services, 2017), healthcare (Wang, Kung, & Byrd, 2018), telecommunication (Ahmed et al., 2018), and e-commerce (Wu & Lin, 2018). According to Dresner Advisory Services (2017), technology (14%), financial services (10%), consulting (9%), healthcare (9%), education (8%), and telecommunication (7%) are the most active sectors in producing a vast amount of data.

The educational sector is no exception. In the educational realm, a large volume of data is produced through online courses and teaching and learning activities (Oi, Yamada, Okubo, Shimada, & Ogata, 2017). With the advent of big data, teachers can now access students’ academic performance and learning patterns and provide instant feedback (Black & Wiliam, 2018). Timely and constructive feedback motivates and satisfies students, which has a positive impact on their performance (Zheng & Bender, 2019). Academic data can help teachers to analyze their teaching pedagogy and effect changes according to students’ needs and requirements. Many online educational sites have been designed, and multiple courses based on individual student preferences have been introduced (Holland, 2019). Improvement in the educational sector depends upon acquisition and technology. Large-scale administrative data can play a tremendous role in managing various educational problems (Sorensen, 2018). Therefore, it is essential for professionals to understand the effectiveness of big data in education in order to minimize educational issues.

So far, several review studies have been conducted in the big data realm. Mikalef et al. (2018) conducted a systematic literature review that focused on big data analytics capabilities in the firm. Mohammad & Torabi (2018), in their review study on big data, observed the emerging trends of big data in the oil and gas industry. Another systematic literature review was conducted by Neilson, Daniel, and Tjandra (2019) on big data in the transportation system. Kamilaris, Kartakoullis, and Prenafeta-Boldú (2017) conducted a review study on the use of big data in agriculture. Similarly, Wolfert, Ge, Verdouw, and Bogaardt (2017) conducted a review study on the use of big data in smart farming. Moreover, Camargo Fiorini, Seles, Jabbour, Mariano, and Sousa Jabbour (2018) conducted a review study on big data and management theory. Although many fields have been covered in previous review studies, a comprehensive review of big data in the education sector is still lacking. Thus, this study aims to conduct a systematic review of big data in education in order to identify the primary studies, their trends and themes, as well as limitations and possible future directions. This research can play a significant role in the advancement of big data in the educational domain. The identified limitations and future directions will help new researchers to advance this particular realm.

The research questions of this study are stated below:

What are the trends in the papers published on big data in education?

What research themes have been addressed in the big data in education domain?

What are the limitations and possible future directions?

The remainder of this study is organized as follows: Section 2 explains the review methodology and presents the systematic literature review (SLR) results; Section 3 reports the findings for the research questions; and finally, Section 4 presents the discussion, conclusion, and research implications.

Review methodology

In order to achieve the aforementioned objective, this study employs a systematic literature review method. An effective review is based on an analysis of the literature that finds the limitations and research gaps in a particular area. A systematic review can be defined as a process of analyzing, assessing, and understanding the method. It explains the relevant research questions and area of research. The essential purpose of conducting a systematic review is to explore and conceptualize the extant studies, identify the themes, relations, and gaps, and describe the future directions accordingly. These purposes match the aim of this study. This research applies the Kitchenham and Charters (2007) strategies. A systematic review comprises three phases: organizing the review, managing the review, and reporting the review. Each phase has specific activities. These activities are: 1) Develop review protocol 2) Formulate inclusion and exclusion criteria 3) Describe the search strategy process 4) Define the selection process 5) Perform the quality evaluation procedure and 6) Data extraction and synthesis. The description of each activity is provided in the following sections.

Review protocol

The review protocol provides the foundation and mechanism to undertake a systematic literature review. Its essential purpose is to minimize research bias. The review protocol comprises background, research questions, search strategy, selection process, quality assessment, and data extraction and synthesis. It helps to maintain the consistency of the review and makes it easy to update at a later stage when new findings are incorporated. This is the most significant aspect that distinguishes an SLR from other literature reviews.

Inclusion and exclusion criteria

The aim of defining the inclusion and exclusion criteria is to ensure that only highly relevant studies are included in this review. This study considers articles published in journals, workshops, conferences, and symposia. Articles consisting of introductions, tutorials, posters, and summaries were eliminated. Complete, full-length relevant studies published in the English language between January 2014 and March 2019 were considered. The search terms had to be present in the title, abstract, or keywords section.

Table  1 shows a summary of the inclusion and exclusion criteria.

Search strategy process

The search strategy comprised two stages, namely S1 (automatic stage) and S2 (manual stage). Initially, an automatic search (S1) was applied to identify the primary studies of big data in education. The following databases and search engines were explored: Science Direct, SAGE Journals, Emerald Insight, Springer Link, IEEE Xplore, ACM Digital Library, Taylor and Francis, and AIS e-Library. These databases were chosen because they host the highest-impact journals and germane conference proceedings, workshops, and symposia. According to Kitchenham and Charters (2007), electronic databases provide a broader perspective on a subject than a limited set of specific journals and conferences. To find the relevant articles, keywords on big data and education were searched. General words related to education were also explored (education OR academic OR university OR learning OR curriculum OR higher education OR school), and this search string was paired with big data. The second stage was a manual search (S2), performed on the references of all studies found in the initial search. Kitchenham (2004) suggested that manual search should be applied to the primary study references. EndNote was used to manage, sort, and remove duplicate studies.
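The keyword pairing described in the search strategy can be sketched in a few lines of Python. This is only an illustration: the `build_query` helper, the use of `AND` to pair the two parts, and the quoting of multi-word phrases are assumptions, since the review does not state the exact syntax submitted to each database.

```python
# Education-related terms listed in the search strategy.
EDUCATION_TERMS = [
    "education", "academic", "university", "learning",
    "curriculum", "higher education", "school",
]


def build_query(topic: str, terms: list[str]) -> str:
    """Pair a quoted topic phrase with an OR-group of related terms.

    Hypothetical helper: the AND operator and the phrase quoting are
    assumptions, because each database has its own query syntax.
    """
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f'"{topic}" AND ({" OR ".join(quoted)})'


query = build_query("big data", EDUCATION_TERMS)
print(query)
```

In practice the resulting string would still need to be adapted to the field restrictions (title, abstract, keywords) that each database offers.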

Selection process

The selection process identifies the studies that are relevant to the research questions of this review; it is presented in Fig. 1. Applying the keyword string returned a total of 559 studies through automatic search. Of these, 348 were duplicates and were removed using the EndNote library. The inclusion and exclusion criteria were applied to the remaining 211 studies. According to Kitchenham and Charters (2007), recommendation and irrelevant studies should be excluded from the review subject. At this phase, 147 studies were excluded as full-length articles were not available to download, leaving 64 full-length articles, which were downloaded. To ensure the comprehensiveness of the initial search results, the snowball technique was used: in the second stage, a manual search (S2) was performed on the references of all the relevant papers through Google Scholar (Fig. 1), which yielded 1 additional study. The quality assessment criteria were applied to the resulting 65 studies, and 25 were excluded because they did not fulfil the criteria. Therefore, a total of 40 highly relevant primary studies were included in this research. The selection of studies from different databases and sources before and after results retrieval is shown in Table 2. The initial results were distributed across Science Direct (90), SAGE Journals (50), Emerald Insight (81), Springer Link (38), IEEE Xplore (158), ACM Digital Library (73), Taylor and Francis (17), and AIS e-Library (52). Google Scholar was employed only for the second round of manual search.
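The counts reported for the selection process can be checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Automatic-search hits per database, as reported in the text.
hits_per_database = {
    "Science Direct": 90,
    "SAGE Journals": 50,
    "Emerald Insight": 81,
    "Springer Link": 38,
    "IEEE Xplore": 158,
    "ACM Digital Library": 73,
    "Taylor and Francis": 17,
    "AIS e-Library": 52,
}

automatic_hits = sum(hits_per_database.values())  # S1: automatic search
after_dedup = automatic_hits - 348                # duplicates removed in EndNote
full_text = after_dedup - 147                     # excluded at the criteria stage
assessed = full_text + 1                          # S2: one study from manual search
included = assessed - 25                          # failed quality assessment

print(automatic_hits, after_dedup, full_text, assessed, included)
# prints: 559 211 64 65 40
```

The per-database counts sum to the 559 automatic-search results, and each later stage matches the totals given in the text.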

Fig. 1 Selection process

Quality assessment

According to Kitchenham and Charters (2007), quality assessment plays a significant role in checking the quality of primary studies. The subtleties of assessment depend entirely on the quality of the instruments. The assessment mechanism can be based on a checklist of components or a set of questions, whose primary purpose is to analyze the quality of every study. For this study, four quality measurement standards were created to evaluate the quality of each study. The measurement standards are as follows:

QA1. Is the topic addressed in the study related to big data in education?

QA2. Does the study describe the context?

QA3. Is the research method given in the paper?

QA4. Is the data collection described in the article?

The four quality assessment standards were applied to the 65 selected studies to determine the integrity of each one. The results were categorized as low, medium, or high, depending on each study’s total score. Each quality assessment standard is worth up to two points: a study that fully meets a standard is awarded a score of 2, partial fulfillment earns a score of 1, and a standard that is not met earns a score of 0. A total score below 4 is counted as ‘low’, a total of exactly 4 as ‘medium’, and a total above 4 as ‘high’. The details of the studies are presented in Table 11 in Appendix B. Twenty-five studies were excluded because they did not meet the quality assessment standard. Therefore, based on the quality assessment standard, a total of 40 primary studies were included in this systematic literature review (Table 10 in Appendix A). The scores of the studies (in terms of low, medium, and high) are presented in Fig. 2.
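The scoring rule can be expressed as a small function. This is a minimal sketch under one assumption: the 0/1/2 score is read as applying to each of QA1 to QA4 separately, so that the total ranges from 0 to 8. The function name `rate_quality` is hypothetical.

```python
QA_IDS = ("QA1", "QA2", "QA3", "QA4")


def rate_quality(scores: dict[str, int]) -> tuple[int, str]:
    """Return (total, label) for one study.

    Each standard scores 2 (fully met), 1 (partially met) or 0 (not met).
    Totals below 4 are 'low', exactly 4 'medium', above 4 'high'.
    """
    if set(scores) != set(QA_IDS):
        raise ValueError(f"expected scores for exactly {QA_IDS}")
    if any(s not in (0, 1, 2) for s in scores.values()):
        raise ValueError("each score must be 0, 1 or 2")
    total = sum(scores.values())
    if total < 4:
        label = "low"
    elif total == 4:
        label = "medium"
    else:
        label = "high"
    return total, label


print(rate_quality({"QA1": 2, "QA2": 2, "QA3": 1, "QA4": 1}))  # (6, 'high')
print(rate_quality({"QA1": 1, "QA2": 1, "QA3": 1, "QA4": 1}))  # (4, 'medium')
```

The text does not state which label served as the cut-off for excluding the 25 studies, so the sketch only classifies and does not filter.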

Fig. 2 Scores of studies

Data extraction and synthesis

The data extraction and synthesis process was carried out by reading the 65 primary studies. The studies were read thoroughly and the required details extracted. The objective of this stage is to find the needed facts and figures from the primary studies. Data were collected on the following aspects: research ID, author names, title of the research, publishing year and place, research themes, research context, research method, and data collection method. Data were extracted from the 65 studies using these aspects. The description of each item is given in Table 3. The data extracted from all primary studies are tabulated. The process of data synthesis is presented in the next section.
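The extraction aspects listed above map naturally onto a record type. The following sketch is one possible representation; the class name, field names, and the sample values are hypothetical placeholders and do not describe any real study in the review.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class ExtractionRecord:
    """One row of the extraction form; field names are illustrative."""
    research_id: str
    authors: list[str]
    title: str
    year: int
    venue: str                   # the "place" of publication
    themes: list[str]
    context: str
    research_method: str
    data_collection_method: str


# Placeholder values for illustration only; not a real study record.
example = ExtractionRecord(
    research_id="S01",
    authors=["Author A", "Author B"],
    title="Placeholder title",
    year=2017,
    venue="Placeholder venue",
    themes=["learner's behavior and performance"],
    context="higher education",
    research_method="quantitative",
    data_collection_method="survey",
)

print([f.name for f in fields(ExtractionRecord)])
print(asdict(example)["year"])
```

Keeping one record per study makes it straightforward to tabulate the extracted data by year, theme, or method afterwards.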

Figure 3 presents the allocation of studies based on their publication sources. All publications were from high-impact journals, high-level conferences, and workshops. The primary studies comprise 21 journal articles, 17 conference papers, 1 workshop paper, and 1 symposium paper. By source, 14 studies were from Science Direct journals and conferences, 5 from the SAGE group, and 1 from SpringerLink. Six studies were from IEEE conferences and 2 from an IEEE symposium and workshop. In addition, 1 primary study was from an AISeL conference, 4 were from Emerald Insight journals, 5 from ACM conferences, and 2 from Taylor and Francis. The summary of published sources is given in Table 4.

Fig. 3 Allocation of studies based on publication

Temporal view of research

The selection period of this study is January 2014 to March 2019. The yearly allocation of primary studies is presented in Fig. 4. The trend of big data in education started in 2014 and gradually gained popularity. In 2015, 8 studies were published in this domain. The number of studies rose in 2017, which saw the highest number of publications on big data in education, with 12 studies. The trend continued in 2018, when 11 studies were published. Because this review covers 2019 only until March, 4 studies from that year were included.

Fig. 4 Temporal view of papers

Google Scholar was used to find the total citation count for the studies. The number of citations is shown in Fig. 5. It was observed that 28 studies were cited by other sources 1 to 50 times, 11 studies were not cited by any other source, and 1 study was cited 127 times. The top cited studies with their titles are presented in Table 5, which provides general verification. The data provided here are not intended for comparison among the studies.

Fig. 5

Research methodologies

The research methods employed by primary studies are shown in Fig. 6. Review-based studies formed the largest group, covering 28% of primary studies; these reviews were conducted in different educational and big data contexts. The second most used research method was quantitative, covering 23% of the primary studies. A further 20% of the studies used qualitative research methods. Only 3% of the studies were based on a mixed-method approach, and the design science method also covered 3%. The remaining 25% of the studies did not report their research method.

Fig. 6 Distribution of research methods of primary studies

Data collection methods

The data collection methods used by primary studies are shown in Fig. 7. The primary studies employed different data collection methods, with extant literature being the most common: 11 studies (28%) used it. Six studies (15%) conducted interviews, 5 studies (13%) conducted surveys, 4 studies (10%) carried out experiments, and 4 studies (10%) used data logs. Three studies (8%) used website data, 2 studies (5%) collected data through observations, 1 study (3%) used social network data, and 1 study (3%) extracted data from a focus group discussion. The data collection method is not available for the remaining 3 studies.
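The percentages quoted for the data collection methods follow from the study counts over the 40 primary studies. The sketch below reproduces them; it assumes the shares were rounded half up (for example, 12.5% becomes 13%), which is consistent with the reported figures. The `share` helper is hypothetical.

```python
# Study counts per data collection method, as reported in the text.
counts = {
    "extant literature": 11,
    "interviews": 6,
    "surveys": 5,
    "experiments": 4,
    "data logs": 4,
    "website data": 3,
    "observations": 2,
    "social network data": 1,
    "focus group": 1,
    "not available": 3,
}
total = sum(counts.values())  # 40 primary studies


def share(n: int, total: int) -> int:
    """Percentage of total, rounded half up with integer arithmetic.

    Python's built-in round() rounds halves to the nearest even number
    (round(12.5) gives 12), so it would not reproduce the reported 13%.
    """
    return (200 * n + total) // (2 * total)


shares = {method: share(n, total) for method, n in counts.items()}
for method, pct in shares.items():
    print(f"{method}: {counts[method]} studies, {pct}%")
```

Because each share is rounded separately, the rounded percentages need not sum to exactly 100.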

Fig. 7 Distribution of data collection methods of primary studies

What research themes have been addressed in educational studies of big data?

A theme is an idea, topic, or area covered by different research studies. The central idea reflects the theme and can be helpful in developing real insight and analysis. A theme can consist of a single word or a combination of words (Rimmon-Kenan, 1995). This study classified big data research themes into four groups (Table 6). Fig. 8 shows a mind map of big data in education research themes, sub-themes, and the methodologies.

Fig. 8 Mind map of big data in education research themes, sub-themes, and the methodologies

Figure 9 presents the research themes under big data in education, namely learner’s behavior and performance, modelling and educational data warehouse, improvement of the educational system, and integration of big data into the curriculum.

Fig. 9 Research themes

The first research theme is learner’s behavior and performance. This theme covers 21 studies, or 53% of all primary studies (Fig. 9). The studies in this theme address teaching and learning analytics, big data frameworks, user behaviour and attitude, learner’s strategies, adaptive learning, and satisfaction. Eight studies rely on teaching and learning analytics (Table 7), three (3) deal with big data frameworks, 6 concentrate on user behaviour and attitude, and 2 dwell on learning strategies. Adaptive learning and satisfaction are covered by 1 study each. In this theme, 2 studies conducted surveys, 4 carried out experiments, and 1 employed the observational method. Five studies reported extant literature. In addition, 4 studies used event log data and 5 conducted interviews (Fig. 10).

Fig. 10 Number of studies and data collection methods

The second theme focuses on modeling and educational data warehouses. It comprises 6 studies, or 15% of primary studies. The studies in this theme investigated the cloud environment, big data modeling, cluster analysis, and data warehouses for educational purposes (Table 8). Three (3) studies introduced big data modeling in education and highlighted the potential for organizing data from multiple sources. One study analyzed a data warehouse with big data tools (Hadoop). Moreover, 1 study analyzed the accessibility of huge academic data in a cloud computing environment, whereas 1 study used clustering techniques and a data warehouse for educational purposes. In this theme, 4 studies reported extant review, 1 study conducted a survey, and 1 study used social network data.

The third theme concentrates on the improvement of the educational system. It comprises 9 studies, or 23% of the primary studies. These studies address statistical tools and measurements, educational research implications, big data training, the introduction of a ranking system, usage of websites, and big data educational challenges and effectiveness (Table 9). Two (2) studies considered statistical tools and measurements. Educational research implications, ranking system, usage of websites, and big data training were covered by 1 study each, and 3 studies considered big data effectiveness and challenges. In this theme, 1 study conducted a survey for data collection, 2 studies used website traffic data, and 1 study used the observational method. In addition, 3 studies reported extant literature.

The fourth theme concentrates on incorporating big data approaches into the curriculum. It comprises 4 studies, or 10% of the primary studies. These 4 studies considered the introduction of big data topics into different courses. Of these, 1 study conducted interviews, 1 study employed the survey method, and 1 study used a focus group discussion.

Twenty percent of the studies (Fig. 6) used qualitative research methods (Dinter et al., 2017; Veletsianos et al., 2016; Yang & Du, 2016). Qualitative methods are mostly applicable to observing a single variable and its relationship with other variables. However, this method does not quantify relationships. In qualitative research, understanding is attained through ‘wording’ (Chaurasia & Frieda Rosin, 2017). Behaviors, attitude, satisfaction, performance, and overall learning performance are related to human phenomena (Cantabella et al., 2019; Elia et al., 2018; Sedkaoui & Khelfaoui, 2019). Qualitative research is not statistically tested (Chaurasia & Frieda Rosin, 2017). Big data educational studies that employed qualitative methods lack some certainties that are present in quantitative research methods. Therefore, future research might quantify educational big data applications and their impact on higher education.

Six studies conducted interviews for data collection (Chaurasia et al., 2018; Chaurasia & Frieda Rosin, 2017; Nelson & Pouchard, 2017; Troisi et al., 2018; Veletsianos et al., 2016). In addition, 2 studies used the observational method (Maldonado-Mahauad et al., 2018; Sooriamurthi, 2018) and one (1) study conducted a focus group discussion (Buffum et al., 2014) for data collection (Fig. 10). The observational studies were conducted in uncontrolled environments, and their results can sometimes suffer from self-selection bias. There is a chance of ambiguity in data collection where human language and observation are involved. The findings of interviews, observations, and focus group discussions are limited and cannot be extended to a wider population of learners (Dinter et al., 2017).

Four big data educational studies analyzed event log data and conducted interviews (Cantabella et al., 2019; Hirashima et al., 2017; Liang et al., 2016; Yang & Du, 2016). However, longitudinal data are more appropriate for multidimensional measurements and for analyzing large data sets in the future (Sorensen, 2018).

Eight studies considered teaching and learning analytics (Chaurasia et al., 2018; Chaurasia & Frieda Rosin, 2017; Dessì et al., 2019; Roy & Singh, 2017). Limited research has covered the aspects of learning environments, ethical and cultural values, and government support in the adoption of educational big data (Yang & Du, 2016). In the future, comparison of big data in different learning environments, ethical and cultural values, government support, and training in adopting big data in higher education can be covered through leading journals and conferences.

Three studies are related to big data frameworks for education (Cantabella et al., 2019; Muthukrishnan & Yasin, 2018). However, the existing frameworks did not cover organizational and institutional cultures and lacked robust theoretical grounds (Dubey & Gunasekaran, 2015; Muthukrishnan & Yasin, 2018). In the future, a big data educational framework that concentrates on theories and the adoption of big data technology is recommended. The extension of existing models and the interpretation of data models are also recommended. This will help in making better decisions and ensure predictive analysis in the academic realm. Moreover, further relations can be tested by integrating other constructs such as university size and type (Chaurasia et al., 2018).

Three studies dwelled on big data modeling (Pardos, 2017; Petrova-Antonova et al., 2017; Wassan, 2015). These models do not integrate with present systems (Santoso & Yulia, 2017). Therefore, efficient research solutions that can manage educational data, new forms of interchange, and resources are required in the future. One (1) study explored a cloud-based solution for managing academic big data (Logica & Magdalena, 2015). However, this solution is expensive. In the future, an LMS supported by open-source applications and software could be used instead. This development would help universities benefit from a unified LMS and introduce new trends and economic opportunities for the academic sector. One (1) study investigated a data warehouse with big data tools (Santoso & Yulia, 2017). In the future, a multi-node cluster could be implemented to process and access structured and unstructured data (Ramos et al., 2015). In addition, new techniques based on relational and non-relational databases, together with the development of index catalogs, are recommended to improve the overall retrieval system. Furthermore, the applicability of the latest analytical tools and parallel programming models needs to be tested for academic big data. MapReduce, MongoDB, Pig, Cassandra, YARN, and Mahout are suggested for exploring and analyzing educational big data (Wassan, 2015). These tools would improve the analysis process and help in the development of reliable models for academic analytics.
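To make the parallel programming model mentioned above more concrete, the following is a minimal, single-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to counting LMS events per student. It is an illustration only, not the method of any cited study: the log records, the field layout, and the function names are all hypothetical, and a real deployment would run the same phases on a cluster framework rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical LMS click-log records: (student_id, action).
# These are made-up values, not data from any cited study.
LOG = [
    ("s01", "view_lecture"),
    ("s02", "submit_quiz"),
    ("s01", "post_forum"),
    ("s03", "view_lecture"),
    ("s01", "submit_quiz"),
    ("s02", "view_lecture"),
]


def map_phase(records):
    """Map: emit one (key, 1) pair per log record, keyed by student."""
    for student_id, _action in records:
        yield student_id, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def events_per_student(records):
    """Run the three phases in sequence on an iterable of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(events_per_student(LOG))
```

Because the map and reduce steps operate on each record and each key independently, they are the parts a cluster framework could distribute across nodes; the sketch keeps them as separate functions for that reason.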

One (1) study detected ICT factors through data mining techniques and tools in order to enhance educational effectiveness and improve the educational system (Martínez-Abad et al., 2018). Additionally, two studies employed big data analytic tools on popular websites to examine academic users' interests (Martínez-Abad et al., 2018; Qiu et al., 2015). In future research, more targeted strategies and regions can be selected for organizing academic data. Similarly, in-depth data mining techniques can be applied according to the nature of the data. Future research can also validate these findings by applying the same approach to other educational websites, and the present research can be extended by analyzing socioeconomic backgrounds and the use of other websites (Qiu et al., 2015).

Two studies were conducted on measurement and the selection of statistical software for educational big data (Ozgur et al., 2015; Selwyn, 2014). However, no statistical software fits every academic project. Therefore, future research is recommended to work towards 'all-in-one' statistical software for big data that can meet the needs of all academic projects. Four studies were based on incorporating big data into academic curricula (Buffum et al., 2014; Sledgianowski et al., 2017). However, integrating big data into the curriculum requires significant changes. Firstly, curricula need to be redeveloped or restructured according to the level and learning environment (Nelson & Pouchard, 2017). Secondly, the training factor, learning objectives, and outcomes should be well designed in future studies. Lastly, comparable exercises, learning activities, and assessment plans need to be well structured before big data is integrated into curricula (Dinter et al., 2017).

Discussion and conclusion

Big data has become an essential part of the educational realm. This study presented a systematic review of the literature on big data in the educational sector. Three research questions were formulated to present the trends and themes of big data educational studies and to identify limitations and directions for further research. The primary studies were collected by performing a systematic search through the IEEE Xplore, ScienceDirect, Emerald Insight, AIS Electronic Library, Sage, ACM Digital Library, Springer Link, Taylor and Francis, and Google Scholar databases. Finally, 40 studies that met the research protocol were selected. These studies were published between January 2014 and April 2019. The findings show that 53% of the extant studies addressed the learner's behavior and performance theme. Moreover, 15% of the studies were on modeling and educational data warehouses, and 23% were on improvement of the educational system. Only 10% of the studies were on the integration of big data into the curriculum.
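The four reported shares add up to 101% rather than 100%, which is what rounding each share separately can produce. The sketch below checks this under a loudly stated assumption: the per-theme counts of 21, 6, 9 and 4 studies are inferred from the reported percentages and the total of 40 studies, and are not given in this section. The function names are illustrative only.

```python
TOTAL_STUDIES = 40  # stated in the review

# ASSUMPTION: per-theme counts inferred from the reported percentages
# and the total of 40; they are not stated in this section.
THEME_COUNTS = {
    "learner behavior and performance": 21,
    "modeling and educational data warehouse": 6,
    "improvement of the educational system": 9,
    "integration of big data into the curriculum": 4,
}


def percent_half_up(count, total):
    """Percentage rounded to the nearest integer, with .5 rounded up.

    Integer arithmetic is used because Python's built-in round()
    rounds halves to the nearest even number (round(52.5) == 52).
    """
    return (200 * count + total) // (2 * total)


def theme_shares(counts, total):
    """Rounded percentage share of each theme."""
    return {theme: percent_half_up(n, total) for theme, n in counts.items()}


if __name__ == "__main__":
    shares = theme_shares(THEME_COUNTS, TOTAL_STUDIES)
    for theme, share in shares.items():
        print(f"{theme}: {share}%")
    print("sum of rounded shares:", sum(shares.values()))
```

Under these assumed counts, two of the exact shares (52.5% and 22.5%) fall on a half and are rounded up, which accounts for the extra percentage point.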

Thus, a large number of studies addressed the learner's behavior and performance theme, while the other themes gained less attention. Therefore, more research is expected in the future on modeling and educational data warehouses, improvement of the educational system, and integration of big data into the curriculum.

It was found that 20% of the studies used qualitative research methods. For data collection, 6 studies conducted interviews, 2 studies used the observational method, and 1 study conducted a focus group discussion. The findings of interviews, observations and focus group discussions are limited and cannot be generalized to a wider population of learners. Therefore, prospective research might quantify educational big data applications and their impact in higher education. Longitudinal data are more appropriate for multidimensional measurement and future analysis of large data sets. Eight studies were carried out on teaching and learning analytics. In the future, comparison of big data in different learning environments, ethical and cultural values, government support, and training to adopt big data in higher education can be covered in leading journals and conferences.

Three studies were related to big data frameworks for education. In the future, big data educational frameworks that draw on theory and extend existing models are recommended. Three studies concentrated on big data modeling. These models cannot be integrated with present systems. Therefore, efficient research solutions that can manage educational data, new forms of interchange, and resources are required in future studies. Two studies explored a cloud-based solution for managing academic big data and investigated a data warehouse with big data tools. In the future, a multi-node cluster can be implemented for processing and accessing structured and unstructured data. The applicability of the latest analytical tools and parallel programming models needs to be tested for academic big data.

One (1) study considered the detection of ICT factors through data mining techniques, and two (2) studies employed big data analytic tools on popular websites to examine academic users' interests. Thus, more targeted strategies and regions can be selected for organizing academic data in the future. Four (4) studies focused on incorporating big data into academic curricula. However, big data based curricula need to be redeveloped with the learning objectives in mind. In the future, well-designed learning activities for big data curricula are suggested.

Research implications

This study has twofold implications for stakeholders and researchers. Firstly, this review explored the trends in published research on big data in the education realm. The identified trends reveal the distribution of studies, publication sources, chronological view, and most cited papers. In addition, the review highlights the research methods used in these studies. The described trends can provide opportunities and new ideas for researchers to identify accurate directions for future studies.

Secondly, this research explored the themes, sub-themes, and methodologies in the domain of big data in education. The classified themes, sub-themes, and methodologies present a comprehensive overview of the existing literature on big data in education. The described themes and sub-themes can help researchers identify new research gaps and avoid repeating themes in future studies. They can also help researchers focus on combinations of different themes in order to uncover new insights into how big data can improve the learning and teaching process. In addition, the illustrated methodologies can help researchers select a method according to the nature of a future study.

The identified research also has implications for stakeholders regarding the holistic expansion of educational competencies. The identified themes give universities new insight for planning mixed learning programs that combine conventional learning with web-based learning. This allows students to accomplish focused learning outcomes and engaging exercises at an ideal pace. It can help teachers understand ways to gauge students' learning behaviour and attitude simultaneously and to advance their teaching strategy accordingly. Understanding the latest trends in big data and education is of growing importance for the ministry of education, which can develop flexible policies to support institutions in improving the educational system.

Lastly, the identified limitations and possible future directions can provide guidelines for researchers about what has been explored and what still needs to be explored. In addition, stakeholders can extract ideas for teaching future cohorts and for understanding learning and academic requirements.

Availability of data and materials

Not applicable.

References

Ahmed, E., Yaqoob, I., Hashem, I. A. T., Shuja, J., Imran, M., Guizani, N., & Bakhsh, S. T. (2018). Recent advances and challenges in mobile big data. IEEE Communications Magazine , 56 (2), 102–108. China: East China Normal University. https://doi.org/10.1109/MCOM.2018.1700294 .

Anshari, M., Alas, Y., & Yunus, N. (2019). A survey study of smartphones behavior in Brunei: A proposal of Modelling big data strategies. In Multigenerational Online Behavior and Media Use: Concepts, Methodologies, Tools, and Applications , (pp. 201–214). IGI global.

Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education: Principles, Policy & Practice , 25 (6), 551–575. https://doi.org/10.1080/0969594X.2018.1441807 .

Buffum, P. S., Martinez-Arocho, A. G., Frankosky, M. H., Rodriguez, F. J., Wiebe, E. N., & Boyer, K. E. (2014, March). CS principles goes to middle school: Learning how to teach big data. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (pp. 151–156). New York: ACM. https://doi.org/10.1145/2538862.2538949.

Camargo Fiorini, P., Seles, B. M. R. P., Jabbour, C. J. C., Mariano, E. B., & Sousa Jabbour, A. B. L. (2018). Management theory and big data literature: From a review to a research agenda. International Journal of Information Management , 43 , 112–129. https://doi.org/10.1016/j.ijinfomgt.2018.07.005 .

Cantabella, M., Martínez-España, R., Ayuso, B., Yáñez, J. A., & Muñoz, A. (2019). Analysis of student behavior in learning management systems through a big data framework. Future Generation Computer Systems , 90 (2), 262–272. https://doi.org/10.1016/j.future.2018.08.003 .

Chae, B. K. (2019). A general framework for studying the evolution of the digital innovation ecosystem: The case of big data. International Journal of Information Management , 45 , 83–94. https://doi.org/10.1016/j.ijinfomgt.2018.10.023 .

Chaurasia, S. S., & Frieda Rosin, A. (2017). From big data to big impact: Analytics for teaching and learning in higher education. Industrial and Commercial Training , 49 (7), 321–328. https://doi.org/10.1108/ict-10-2016-0069 .

Chaurasia, S. S., Kodwani, D., Lachhwani, H., & Ketkar, M. A. (2018). Big data academic and learning analytics. International Journal of Educational Management , 32 (6), 1099–1117. https://doi.org/10.1108/ijem-08-2017-0199 .

Coccoli, M., Maresca, P., & Stanganelli, L. (2017). The role of big data and cognitive computing in the learning process. Journal of Visual Languages & Computing , 38 , 97–103. https://doi.org/10.1016/j.jvlc.2016.03.002 .

De Mauro, A., Greco, M., & Grimaldi, M. (2016). A formal definition of big data based on its essential features. Library Review , 65 (3), 122–135. https://doi.org/10.1108/LR-06-2015-0061 .

Demchenko, Y., Grosso, P., De Laat, C., & Membrey, P. (2013). Addressing big data issues in scientific data infrastructure. In Collaboration Technologies and Systems (CTS), 2013 International Conference on , (pp. 48–55). San Diego: IEEE. https://doi.org/10.1109/CTS.2013.6567203 .

Dessì, D., Fenu, G., Marras, M., & Reforgiato Recupero, D. (2019). Bridging learning analytics and cognitive computing for big data classification in micro-learning video collections. Computers in Human Behavior , 92 (1), 468–477. https://doi.org/10.1016/j.chb.2018.03.004 .

Dinter, B., Jaekel, T., Kollwitz, C., & Wache, H. (2017). Teaching Big Data Management – An Active Learning Approach for Higher Education. Paper presented at the proceedings of the pre-ICIS 2017 SIGDSA (pp. 1–17). North America: AISeL.

Dresner Advisory Services. (2017). Big data adoption: State of the market. ZoomData. Retrieved from https://www.zoomdata.com/master-class/state-market/big-data-adoption

Dubey, R., & Gunasekaran, A. (2015). Education and training for successful career in big data and business analytics. Industrial and Commercial Training , 47 (4), 174–181. https://doi.org/10.1108/ict-08-2014-0059 .

Elia, G., Solazzo, G., Lorenzo, G., & Passiante, G. (2018). Assessing learners’ satisfaction in collaborative online courses through a big data approach. Computers in Human Behavior , 92 , 589–599. https://doi.org/10.1016/j.chb.2018.04.033 .

Gupta, D., & Rani, R. (2018). A study of big data evolution and research challenges. Journal of Information Science, 45(3), 322–340. https://doi.org/10.1177/0165551518789880.

Herschel, R., & Miori, V. M. (2017). Ethics & big data. Technology in Society , 49 , 31–36. https://doi.org/10.1016/j.techsoc.2017.03.003 .

Hirashima, T., Supianto, A. A., & Hayashi, Y. (2017, September). Model-based approach for educational big data analysis of learners thinking with process data. In 2017 International Workshop on Big Data and Information Security (IWBIS) (pp. 11-16). San Diego: IEEE. https://doi.org/10.1177/0165551518789880

Holland, A. A. (2019). Effective principles of informal online learning design: A theory-building metasynthesis of qualitative research. Computers & Education , 128 , 214–226. https://doi.org/10.1016/j.compedu.2018.09.026 .

Kalaian, S. A., Kasim, R. M., & Kasim, N. R. (2019). Descriptive and predictive analytical methods for big data. In Web Services: Concepts, Methodologies, Tools, and Applications , (pp. 314–331). USA: IGI global. https://doi.org/10.4018/978-1-5225-7501-6.ch018 .

Kamilaris, A., Kartakoullis, A., & Prenafeta-Boldú, F. X. (2017). A review on the practice of big data analysis in agriculture. Computers and Electronics in Agriculture , 143 , 23–37. https://doi.org/10.1016/j.compag.2017.09.037 .

Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, UK, Keele University , 33 (2004), 1–26.

Kitchenham, B., & Charters, S. (2007). Guidelines for performing systematic literature reviews in software engineering version 2.3. Engineering , 45 (4), 13–65.

Lia, Y., & Zhaia, X. (2018). Review and prospect of modern education using big data. Procedia Computer Science , 129 (3), 341–347. https://doi.org/10.1016/j.procs.2018.03.085 .

Liang, J., Yang, J., Wu, Y., Li, C., & Zheng, L. (2016). Big Data Application in Education: Dropout Prediction in Edx MOOCs. In Paper presented at the 2016 IEEE second international conference on multimedia big data (BigMM) , (pp. 440–443). USA: IEEE. https://doi.org/10.1109/BigMM.2016.70 .

Logica, B., & Magdalena, R. (2015). Using big data in the academic environment. Procedia Economics and Finance , 33 (2), 277–286. https://doi.org/10.1016/s2212-5671(15)01712-8 .

Maldonado-Mahauad, J., Pérez-Sanagustín, M., Kizilcec, R. F., Morales, N., & Munoz-Gama, J. (2018). Mining theory-based patterns from big data: Identifying self-regulated learning strategies in massive open online courses. Computers in Human Behavior, 80(1), 179–196. https://doi.org/10.1016/j.chb.2017.11.011.

Martínez-Abad, F., Gamazo, A., & Rodríguez-Conde, M. J. (2018). Big Data in Education. In Paper presented at the proceedings of the sixth international conference on technological ecosystems for enhancing Multiculturality - TEEM'18, Salamanca, Spain , (pp. 145–150). New York: ACM. https://doi.org/10.1145/3284179.3284206 .

Mikalef, P., Pappas, I. O., Krogstie, J., & Giannakos, M. (2018). Big data analytics capabilities: A systematic literature review and research agenda. Information Systems and e-Business Management, 16(3), 547–578. https://doi.org/10.1007/s10257-017-0362-y.

Mohammadpoor, M., & Torabi, F. (2018). Big Data analytics in oil and gas industry: An emerging trend. Petroleum. In press. https://doi.org/10.1016/j.petlm.2018.11.001 .

Muthukrishnan, S. M., & Yasin, N. B. M. (2018). Big Data Framework for Students’ Academic. Paper presented at the symposium on computer applications & industrial electronics (ISCAIE), Penang, Malaysia (pp. 376–382). USA: IEEE. https://doi.org/10.1109/ISCAIE.2018.8405502

Neilson, A., Daniel, B., & Tjandra, S. (2019). Systematic review of the literature on big data in the transportation Domain: Concepts and Applications. Big Data Research . In press. https://doi.org/10.1016/j.bdr.2019.03.001 .

Nelson, M., & Pouchard, L. (2017). A pilot “big data” education modular curriculum for engineering graduate education: Development and implementation. In Paper presented at the Frontiers in education conference (FIE), Indianapolis, USA , (pp. 1–5). USA: IEEE. https://doi.org/10.1109/FIE.2017.8190688 .

Nie, M., Yang, L., Sun, J., Su, H., Xia, H., Lian, D., & Yan, K. (2018). Advanced forecasting of career choices for college students based on campus big data. Frontiers of Computer Science , 12 (3), 494–503. https://doi.org/10.1007/s11704-017-6498-6 .

Oi, M., Yamada, M., Okubo, F., Shimada, A., & Ogata, H. (2017). Reproducibility of findings from educational big data. In Paper presented at the proceedings of the Seventh International Learning Analytics & Knowledge Conference , (pp. 536–537). New York: ACM. https://doi.org/10.1145/3027385.3029445 .

Ong, V. K. (2015). Big Data and Its Research Implications for Higher Education: Cases from UK Higher Education Institutions. In Paper presented at the 2015 IIAI 4th international congress on advanced applied informatics (pp. 487–491). USA: IEEE. https://doi.org/10.1109/IIAI-AAI.2015.178.

Ozgur, C., Kleckner, M., & Li, Y. (2015). Selection of statistical software for solving big data problems. SAGE Open , 5 (2), 59–94. https://doi.org/10.1177/2158244015584379 .

Pardos, Z. A. (2017). Big data in education and the models that love them. Current Opinion in Behavioral Sciences , 18 (2), 107–113. https://doi.org/10.1016/j.cobeha.2017.11.006 .

Petrova-Antonova, D., Georgieva, O., & Ilieva, S. (2017, June). Modelling of educational data following big data value chain. In Proceedings of the 18th International Conference on Computer Systems and Technologies (pp. 88–95). New York City: ACM. https://doi.org/10.1145/3134302.3134335

Qiu, R. G., Huang, Z., & Patel, I. C. (2015, June). A big data approach to assessing the US higher education service. In 2015 12th International Conference on Service Systems and Service Management (ICSSSM) (pp. 1–6). New York: IEEE. https://doi.org/10.1109/ICSSSM.2015.7170149

Ramos, T. G., Machado, J. C. F., & Cordeiro, B. P. V. (2015). Primary education evaluation in Brazil using big data and cluster analysis. Procedia Computer Science, 55(1), 1031–1039. https://doi.org/10.1016/j.procs.2015.07.061.

Rimmon-Kenan, S. (1995). What Is Theme and How Do We Get at It?. Thematics: New Approaches, 9–20.

Roy, S., & Singh, S. N. (2017). Emerging trends in applications of big data in educational data mining and learning analytics. In 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence , (pp. 193–198). New York: IEEE. https://doi.org/10.1109/confluence.2017.7943148 .

Saggi, M. K., & Jain, S. (2018). A survey towards an integration of big data analytics to big insights for value-creation. Information Processing & Management , 54 (5), 758–790. https://doi.org/10.1016/j.ipm.2018.01.010 .

Santoso, L. W., & Yulia (2017). Data warehouse with big data Technology for Higher Education. Procedia Computer Science , 124 (1), 93–99. https://doi.org/10.1016/j.procs.2017.12.134 .

Sedkaoui, S., & Khelfaoui, M. (2019). Understand, develop and enhance the learning process with big data. Information Discovery and Delivery , 47 (1), 2–16. https://doi.org/10.1108/idd-09-2018-0043 .

Selwyn, N. (2014). Data entry: Towards the critical study of digital data and education. Learning, Media and Technology , 40 (1), 64–82. https://doi.org/10.1080/17439884.2014.921628 .

Shahat, O. A. (2019). A novel big data analytics framework for smart cities. Future Generation Computer Systems , 91 (1), 620–633. https://doi.org/10.1016/j.future.2018.06.046 .

Shorfuzzaman, M., Hossain, M. S., Nazir, A., Muhammad, G., & Alamri, A. (2019). Harnessing the power of big data analytics in the cloud to support learning analytics in mobile learning environment. Computers in Human Behavior , 92 (1), 578–588. https://doi.org/10.1016/j.chb.2018.07.002 .

Sivarajah, U., Kamal, M. M., Irani, Z., & Weerakkody, V. (2017). Critical analysis of big data challenges and analytical methods. Journal of Business Research , 70 , 263–286. https://doi.org/10.1016/j.jbusres.2016.08.001 .

Sledgianowski, D., Gomaa, M., & Tan, C. (2017). Toward integration of big data, technology and information systems competencies into the accounting curriculum. Journal of Accounting Education , 38 (1), 81–93. https://doi.org/10.1016/j.jaccedu.2016.12.008 .

Sooriamurthi, R. (2018). Introducing big data analytics in high school and college. In Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education (pp. 373–374). New York: ACM. https://doi.org/10.1145/3197091.3205834

Sorensen, L. C. (2018). "Big data" in educational administration: An application for predicting school dropout risk. Educational Administration Quarterly , 45 (1), 1–93. https://doi.org/10.1177/0013161x18799439 .

Su, Y. S., Ding, T. J., Lue, J. H., Lai, C. F., & Su, C. N. (2017). Applying big data analysis technique to students’ learning behavior and learning resource recommendation in a MOOCs course. In 2017 International conference on applied system innovation (ICASI) (pp. 1229–1230). New York: IEEE. https://doi.org/10.1109/ICASI.2017.7988114

Troisi, O., Grimaldi, M., Loia, F., & Maione, G. (2018). Big data and sentiment analysis to highlight decision behaviours: A case study for student population. Behaviour & Information Technology , 37 (11), 1111–1128. https://doi.org/10.1080/0144929x.2018.1502355 .

Ur Rehman, M. H., Yaqoob, I., Salah, K., Imran, M., Jayaraman, P. P., & Perera, C. (2019). The role of big data analytics in industrial internet of things. Future Generation Computer Systems , 92 , 578–588. https://doi.org/10.1016/j.future.2019.04.020 .

Veletsianos, G., Reich, J., & Pasquini, L. A. (2016). The Life Between Big Data Log Events. AERA Open , 2 (3), 1–45. https://doi.org/10.1177/2332858416657002 .

Wang, Y., Kung, L., & Byrd, T. A. (2018). Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological Forecasting and Social Change , 126 , 3–13. https://doi.org/10.1016/j.techfore.2015.12.019 .

Wassan, J. T. (2015). Discovering big data modelling for educational world. Procedia - Social and Behavioral Sciences , 176 , 642–649. https://doi.org/10.1016/j.sbspro.2015.01.522 .

Wolfert, S., Ge, L., Verdouw, C., & Bogaardt, M. J. (2017). Big data in smart farming–a review. Agricultural Systems , 153 , 69–80. https://doi.org/10.1016/j.agsy.2017.01.023 .

Wu, P. J., & Lin, K. C. (2018). Unstructured big data analytics for retrieving e-commerce logistics knowledge. Telematics and Informatics , 35 (1), 237–244. https://doi.org/10.1016/j.tele.2017.11.004 .

Xu, L. D., & Duan, L. (2019). Big data for cyber physical systems in industry 4.0: A survey. Enterprise Information Systems , 13 (2), 148–169. https://doi.org/10.1080/17517575.2018.1442934 .

Yang, F., & Du, Y. R. (2016). Storytelling in the age of big data. Asia Pacific Media Educator , 26 (2), 148–162. https://doi.org/10.1177/1326365x16673168 .

Yassine, A., Singh, S., Hossain, M. S., & Muhammad, G. (2019). IoT big data analytics for smart homes with fog and cloud computing. Future Generation Computer Systems , 91 (2), 563–573. https://doi.org/10.1016/j.future.2018.08.040 .

Zhang, M. (2015). Internet use that reproduces educational inequalities: Evidence from big data. Computers & Education , 86 (1), 212–223. https://doi.org/10.1016/j.compedu.2015.08.007 .

Zheng, M., & Bender, D. (2019). Evaluating outcomes of computer-based classroom testing: Student acceptance and impact on learning and exam performance. Medical Teacher , 41 (1), 75–82. https://doi.org/10.1080/0142159X.2018.1441984 .

Acknowledgements

Not applicable

Author information

Authors and affiliations.

Department of Information Systems, Faculty of Computer Science & Information Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia

Maria Ijaz Baig, Liyana Shuib & Elaheh Yadegaridehkordi

Contributions

Maria Ijaz Baig composed the manuscript under the guidance of Elaheh Yadegaridehkordi. Liyana Shuib supervised the project. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Liyana Shuib .

Ethics declarations

Competing interests.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Baig, M.I., Shuib, L. & Yadegaridehkordi, E. Big data in education: a state of the art, limitations, and future research directions. Int J Educ Technol High Educ 17 , 44 (2020). https://doi.org/10.1186/s41239-020-00223-0

Received : 09 March 2020

Accepted : 10 June 2020

Published : 02 November 2020

DOI : https://doi.org/10.1186/s41239-020-00223-0

  • Data science applications in education
  • Learning communities
  • Teaching/learning strategies

A decade of research into the application of big data and analytics in higher education: A systematic review of the literature

  • Open access
  • Published: 20 July 2023
  • Volume 29, pages 5807–5831 (2024)

  • Ana Stojanov (ORCID: orcid.org/0000-0002-8377-4372) &
  • Ben Kei Daniel

The need for data-driven decision-making primarily motivates interest in analysing Big Data in higher education. Although there has been considerable research on the value of Big Data in higher education, its application to address critical issues within the sector is still limited. This systematic review, conducted in December 2021 and encompassing 75 papers, analysed the applications of Big Data and analytics in higher education. The focus was on their usage in supporting learning, teaching and administration as reported in papers indexed in SCOPUS, Web of Science and IEEE Xplore. The key findings from the review revealed that Big Data and analytics are predominantly used to support learning and, to a lesser extent, to guide teaching and inform administrative decision-making processes. The review also identified a set of studies focused on supporting student well-being. Further, we extend the use of Big Data in higher education to include the well-being of students and staff. This paper contributes to the growing debate on the practical use of Big Data and analytics to provide valuable insights for solving systemic challenges facing higher education in the twenty-first century.

1 Introduction and related research

Higher education institutions generate large and exponentially increasing amounts of ‘Big Data’ (Monino & Sedkaoui, 2016), either via interaction between different stakeholders or via stakeholders interacting with learning management systems and student records systems. Although there is no single definition of Big Data (Arbia, 2021), it is generally described using 3 Vs (Erevelles et al., 2016; Grover et al., 2018; Ristevski & Chen, 2018): volume (a large amount of data), variety (heterogeneity of data, ranging from structured to unstructured) and velocity (the speed with which the data can be accessed). Some researchers have added veracity (data quality, e.g., Jin et al., 2015), value (the worth of the generated insights, e.g., Lycett, 2013; Naeem et al., 2022) or variability (the presence of inconsistency and noise, e.g., Jo, 2019). Regardless of any particular conceptualisation, the increasing amount of data generated in the higher education sector provides opportunities for extracting valuable, actionable insights, similar to other sectors. For example, in the healthcare sector (Singh et al., 2021), Big Data techniques are used in medical image processing to detect or predict disease progression (Rehman et al., 2022). In cybersecurity (Alani, 2021), Big Data and analytics are being used for ransomware (Huang et al., 2018) or phishing detection (Gutierrez et al., 2018). In addition, Big Data has been used for crop yield prediction (Abbas et al., 2020), digital marketing (Kushwaha et al., 2021) and search engine optimisation (Drivas et al., 2020).

With the increasing use of digital technologies to support learning and teaching, a significant amount of data is being generated, primarily through students and faculty engaging with learning management systems (LMS). This data can be harvested, processed and used to address critical challenges higher education institutions face. Drawing on the successes of using Big Data in various sectors, higher education can likewise seize the opportunity to apply Big Data techniques to gain valuable insight for decision-making. Early research noted that higher education is a sector yet to be penetrated by Big Data and analytics (Attaran et al., 2018), referring to Big Data as an ‘untapped opportunity’ in higher education (Chaurasia & Frieda Rosin, 2017). Several early articles discussed the potential of harnessing Big Data in higher education (e.g., Attaran et al., 2018; Daniel, 2017; Tasmin et al., 2020). For example, Daniel (2015) proposed three scenarios in which Big Data can support learning, teaching and administration. These scenarios were developed from a critical analysis of early work on applying Big Data in education. Chaurasia and Frieda Rosin (2017) further proposed four potential uses of Big Data in education: reporting and compliance (which could be subsumed under supporting administration in Daniel's (2015) framework), analysis and visualisation (classified under supporting the teaching and learning process), security and risk mitigation (supporting administration) and predictive analytics (supporting the teaching and learning process). Big Data can identify at-risk students, provide individualised learning experiences, or improve student assessment (Ray & Saeed, 2018). However, it is less clear how far the value of Big Data in higher education was realised over the 2011–2021 period.

A literature review indicated that each of these uses had been realised. As a case in point, Waheed et al. (2020) focused on using Big Data to support learning. They demonstrated that a neural network model trained on data from the virtual learning environment could predict students at risk of failing a course. On the other hand, Cooper et al. (2016) used Big Data to identify whether courses have accessibility issues, demonstrating its use to support administration. Individual papers may provide insights into a particular use within each of the proposed scenarios, but not a comprehensive overview, as they are limited to demonstrating an application in a single case study. Therefore, systematic reviews are beneficial for obtaining an overview of the application of Big Data in higher education.

Notably, current systematic reviews of Big Data in higher education focus on how learning analytics supports study success (Ifenthaler & Yau, 2020), the effectiveness of interventions on student outcomes such as retention, engagement and performance (Foster & Francis, 2020) and the effectiveness of learning analytics in addressing student dropout rates (De Oliveira et al., 2021). This relatively narrow focus is a drawback, as each systematic review covers a single use of Big Data in higher education. Two possible exceptions are Baig et al. (2020), who examined the trends in 40 published papers on Big Data in education as well as the research themes addressed in this domain, and Alkhalil et al. (2021), who conducted a systematic mapping study on the use of Big Data in higher education. However, Baig et al.’s (2020) review focused on general education, without considering the specific characteristics of higher education settings. Similarly, Alkhalil et al.’s (2021) mapping article does not provide a summary and synthesis of the application of Big Data in higher education. As a result, there is limited research providing a broader overview of the available literature on the role of Big Data and associated analytics in higher education.

Further, studies focusing on Big Data appear to be predominantly concerned with system performance, such as the development of predictive algorithms (Ifenthaler & Yau, 2020) or student satisfaction with using a dashboard (Ramaswami et al., 2019), and less with the application of those algorithms or the usage of the dashboard for the betterment of students’ learning outcomes. Moreover, the available reviews predominantly focus on the potential benefits for learning while neglecting the benefits for teaching and administration. This leaves open the question of whether the latter applications are lacking or whether those conducting the reviews have simply paid more attention to the benefits for learning.

Furthermore, it remains unclear what challenges are faced when Big Data and analytics are used in higher education. For example, technical issues, ethical considerations, and practical limitations have posed significant obstacles to the widespread adoption of Big Data in higher education (Daniel, 2019 ; Klein et al., 2019 ).

In this article, we report on the outcome of a systematic review of the application of Big Data in higher education that covers published work from 2011 to 2022, providing a comprehensive picture, identifying gaps and suggesting directions for future research. Our work opens up a valuable dialogue for policymakers interested in incorporating Big Data analytics into their operational and strategic initiatives.

1.1 Framing the review

The systematic review examines the literature on using Big Data in higher education in 2011–2022. We were particularly interested in research that showcased some benefits, improvements or otherwise contributed to decision-making or better student outcomes. The following questions guided the systematic review:

What are Big Data's and related analytics' existing uses in higher education?

To what extent does applying Big Data and analytics support learning, teaching or administrative decision-making in higher education?

What are the existing challenges of applying Big Data in higher education?

Prior to conducting the systematic review (a type of descriptive research), we performed a broad search ("big data" AND "Education") to familiarise ourselves with the scope of the literature and core keywords and to develop the exclusion criteria. Upon identifying the main keywords, we constructed our query: ("learning analytics" OR "big data" OR "data mining" OR "dashboard" OR "academic analytics") AND ("higher education" OR "tertiary education" OR "HEI" OR "University" OR "College" OR "Faculty") AND ("intervention" OR "implementation" OR "case study" OR "application") AND ("improve" OR "enhance" OR "decision making"). We searched the title, abstract and keywords in Scopus, Web of Science and IEEE Xplore. Web of Science was chosen for its multidisciplinarity and because it is the leading citation search database (Li et al., 2018), Scopus because it is the largest abstract database (Schotten et al., 2017), and IEEE Xplore because it contains computer science papers.
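The structure of the query (four concept groups joined with AND, with synonyms inside each group joined with OR) can be assembled programmatically, which makes it easier to rerun or extend the search. The following is a minimal sketch; the function name `build_query` is hypothetical, and the database-specific field syntax for restricting the search to title, abstract and keywords is deliberately left out because it differs between databases.

```python
# Illustrative only: assemble the boolean search string from its concept groups.
CONCEPT_GROUPS = [
    ["learning analytics", "big data", "data mining", "dashboard", "academic analytics"],
    ["higher education", "tertiary education", "HEI", "University", "College", "Faculty"],
    ["intervention", "implementation", "case study", "application"],
    ["improve", "enhance", "decision making"],
]


def build_query(groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(CONCEPT_GROUPS)
print(query)
```

Keeping the terms in a list rather than a hand-typed string avoids unbalanced parentheses when a synonym is added or removed.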

The search (December 2021) returned 1,072 entries, of which 851 remained after duplicates were removed. Articles not in English, review articles and editorials were excluded from the search. We also limited the search to the following categories: "Education", "Educational research", "Computer Science OR Information Systems", "Computer Science Artificial Intelligence", "Computer Science Interdisciplinary applications", "Education Scientific disciplines", and "Telecommunications" (in Web of Science), as well as to "Computer Science" and "Social Sciences" (in Scopus). The articles were screened in two stages. In the initial stage, the first author read the titles of the articles to determine whether they were relevant to the research questions. In the second stage, the abstracts were read to determine whether the articles would be retained for analysis. If no decision could be made based on the abstract, the full text was screened too. The first author read and summarised all included studies. We excluded papers (1) whose context was not higher education and (2) that focused on attitudes, beliefs or opinions about Big Data. Our exclusions also extended to (3) discussion/conceptual/review papers, (4) papers where Big Data was not the main focus, (5) papers that dealt with the teaching of Big Data, (6) papers merely focused on the performance of algorithms, (7) papers dealing with architecture, (8) papers with incomprehensible abstracts, (9) papers whose full text was not available, and (10) articles retracted because of nonsensical content. The search procedure and retention of articles for the final review are depicted in Fig. 1. The summary of the reviewed studies is given in Table 1 in the online supplementary materials.
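The text does not say how duplicates across the three databases were identified. As a hedged illustration only, one common approach is to match records on DOI when it is present and on a normalised title otherwise. The record fields (`doi`, `title`, `source`) and the sample rows below are hypothetical and are not data from the review.

```python
import re


def record_key(record):
    """Return a matching key: the DOI if present, otherwise a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical export rows, not data from the review.
sample = [
    {"source": "Scopus", "doi": "10.1000/xyz1", "title": "A dashboard study"},
    {"source": "Web of Science", "doi": "10.1000/XYZ1", "title": "A Dashboard Study"},
    {"source": "IEEE Xplore", "doi": "", "title": "Mining LMS logs"},
    {"source": "Scopus", "doi": None, "title": "Mining LMS Logs."},
]
unique_sample = deduplicate(sample)
print(len(sample), "->", len(unique_sample))
```

A limitation of this sketch is that a record exported with a DOI will not match a copy of the same article exported without one, so a manual check of near-duplicates would still be needed.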

Fig. 1 Flowchart for the paper selection process

In this review, we did not perform a formal quality or risk of bias assessment as our focus was primarily on mapping the types and areas of Big Data applications in higher education, aligning with this study's descriptive and exploratory nature. The review was not registered. Online first (ahead of print) articles were also included in the review.

3 Results and discussion

In presenting the findings, we were guided by the tripartite approach (Daniel & Harland, 2017 ), which suggests systematic review studies should be described, synthesised and critically evaluated to provide new insights for further research.

3.1 Description of the studies

We used descriptive statistics to summarise the findings of the review. Figure 2 presents the number of published articles per year. As the figure shows, the number of articles published per year follows an upward trend, implying that interest in research into the role of Big Data in higher education continues to grow globally.

Fig. 2 Number of published papers per year

The largest share of the papers came from the US (f = 14), followed by papers co-authored by scholars affiliated with universities in different countries (f = 8), Spain (f = 7), the UK (f = 6), China (f = 5), Indonesia, Finland and Australia (f = 3 each), Türkiye, Singapore, Japan, Italy, India and Ecuador (f = 2 each), and Vietnam, Sweden, South Korea, Philippines, Ireland, Iran, Greece, Czechia, Chile, Canada and Belgium (f = 1 each).

3.2 Synthesis of the studies

In synthesising the findings, we grouped the publications into three main use categories: Big Data's role in supporting learning, teaching, and administration (Daniel, 2015). As will be seen later, some studies could not be classified into any of these, so we included additional categories (see Fig. 3).

Fig. 3 Graphical representation of the uses of Big Data in higher education

3.2.1 Supporting learning

Performance

At the time of the review, most published studies focused on supporting students' learning or various aspects of it. One such aspect is the use of Big Data to predict student performance. For example, Castells et al. (2020) and Perez and Gonzalez (2016) each presented a tool that predicts students' performance, while Prieto et al. (2020) presented two case studies related to the development of a data visualisation tool that analyses student performance and facilitates conversations between students and counsellors. Furthermore, Gutiérrez et al. (2020) presented a learning analytics dashboard that uses a multilevel clustering algorithm to predict a student's chance of success in an academic programme and then depicts that chance along with information on the quality of the prediction. Other studies reported using learning analytics to generate student dashboards (Ramaswami et al., 2019); such dashboards display an individual's activity or performance and compare it to the class average, or use an explainable ML algorithm (Afzaal et al., 2021). Researchers have also used students’ data to identify the factors behind the predicted performance in quizzes and assignments and to provide information on how the prediction could be improved by listing activities to be performed (e.g. watching a video) (Azcona et al., 2017).

In addition to the development of dashboards, researchers (Lonn et al., 2015) also endeavoured to discover the relationship between advisors’ use of a learning analytics early warning system, which used information from a learning management system to provide advisors with weekly updates about student engagement and performance, and their students’ performance during a summer bridge program. Researchers also used on-campus geolocation data to identify study groups and examine performance among group members (Azcona et al., 2017) or examined the link between learning input (the frequency of playing videos) and performance (Ji & Han, 2019). Similarly, others discussed the associations between performance, submission time and the number of submissions, how work at night or weekends impacts performance, or how different group work patterns affect performance (Apiola et al., 2019).

Other uses of Big Data included predicting whether students will graduate within four years (He et al., 2018), examining the relationship between performance and digital footprint, identifying engagement patterns as an early predictor of performance, and correlating performance at an early stage with overall performance (Summers et al., 2021). Research also examined the effect of an intervention in the form of guidance and recommendations (prepared by the teacher and based on learning analytics) on students' academic self-efficacy and problem-solving skills (Karaoglan Yilmaz, 2022), and developed tools (Broos et al., 2018) that provide feedback about performance on a positioning test (ability to solve math problems). Other studies reported the effect of a learning analytics intervention on student performance in a blended course (Gong et al., 2018; Zhang et al., 2020) or applied uplift modelling to demonstrate that offering tutorials to the students most likely to be retained as a result of the tutorials boosts the effects of such retention efforts (Olaya et al., 2020).

Researchers were interested in students dropping out both late and early. For example, Salazar-Fernandez et al. ( 2019 ) analysed the educational trajectories of 414 students in courses with high failure rates to identify similarities and differences between students. In particular, they looked at factors such as gender, income and entry math skills that could explain the different trajectories and failure rates. On the other hand, Dodge et al. ( 2015 ), Figueroa-Cañas and Sancho-Vinuesa ( 2021 ), and Linden & Webster ( 2019 ) trialled an intervention aimed at minimising the number of students who are unlikely to succeed earlier on in their academic programme. Similarly, Herodotou et al. ( 2020 ) demonstrated how PLA (predictive learning analytics) could inform the practice of the student support team who contacted the students identified as having a low probability of completing their studies. To identify causes of academic failure, Nkhoma et al. ( 2020 ) analysed 968 letters (written by students in a business school) using natural language processing (at the university in question, students who satisfy the criteria to be classified as "at risk" are asked to explain their situation in a letter and work with an advisor on a study plan). Frequencies of the most common words and word pairs were extracted, enabling the researchers to identify the most common reasons. To gain more insights from the data and provide context for the keywords by modelling the relationship between them, the researchers used visualisation based on semantic network analysis and topic modelling to validate the findings. Five significant reasons were identified: learning skills, assessment, time management, courses and family issues. Géryk and Popelínský ( 2014 ) presented an interactive visual analytics tool EDAIME that explores academic analytics and examines whether changes in the fields of study are related to retention.
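The first step described for the letter analysis, extracting frequencies of the most common words and word pairs, can be illustrated with a short sketch. This is not the pipeline used by Nkhoma et al. (2020): the stop-word list is a tiny hypothetical one, the two letters are invented, and word pairs are formed from tokens that are adjacent after stop words have been removed, which is one of several possible choices.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"i", "my", "the", "a", "to", "and", "of", "was", "with", "for", "had"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def frequencies(letters):
    """Count single words and adjacent word pairs across all letters."""
    words, pairs = Counter(), Counter()
    for letter in letters:
        tokens = tokenize(letter)
        words.update(tokens)
        # Pairs are built from tokens adjacent after stop-word removal.
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs


# Invented example letters, not data from the study.
letters = [
    "I struggled with time management and family issues.",
    "Poor time management was the main problem for my assessment.",
]
word_counts, pair_counts = frequencies(letters)
print(word_counts.most_common(3))
print(pair_counts.most_common(1))
```

Counts like these only surface candidate themes; the semantic network analysis and topic modelling mentioned above are what provide context for the keywords.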

Student engagement

Most studies focused on student engagement reported either a pattern of engagement with the learning resources (Nkomo & Nat, 2021) or the results of an intervention aimed at increasing student engagement (e.g., Lawrence et al., 2019; Lu et al., 2017). For example, Karaoglan Yilmaz and Yilmaz (2022) provided students in the experimental group (N = 33) with personalised metacognitive feedback (based on learning analytics containing information on weekly learning management system use) and personalised recommendations, and compared their engagement to that of the control group. Similarly, Cobos and Ruiz‐Garcia (2021) provided an intervention in the form of feedback about students' progress, as well as suggestions for improving performance, to examine whether an intervention can change the engagement of students enrolled in a MOOC (massive open online course) as well as their perceptions of their persistence, intention to complete and pass the course, and performance. Álvarez-Méndez et al. (2020) extracted Moodle log files (N = 33,776) for 121 students and analysed them to examine their interactions with the LMS resources.

Another, more unusual use of Big Data in understanding engagement was reported by Cheong et al. (2018). They presented the results of piloting an integrated Telegram application and web-based forum that uses natural language processing and text mining to provide thoughtfulness scores on students' questions and answers as they write them, with the idea that students might gain insight into how the document is developed and that their contribution to it may foster metacognitive skills. McNely et al. (2012) presented a visualisation tool (Uatu) that provides real-time engagement metrics on co-authorship and collaborative writing. Wang et al. (2021) were interested in what features could be derived from the logged interaction data of the problem-solving process during a simulation and whether the extracted features could predict success or failure in problem-solving.

Miscellaneous

Other uses of Big Data for supporting student learning can be seen in the work of Althbiti et al. (2021), who introduced PAARS (Personalised Academic Advisory Recommender System), an automated recommender system that helps students with course selection. Students can input their research area or learning objectives, and the system provides a list of recommended courses based on content-based filtering algorithms and ensemble learning algorithms. Another example of miscellaneous use is Park and Jo (2015), who developed a learning analytics dashboard called LAPA (Learning Analytics for Prediction & Action), which supports students' learning by informing them of their online behaviour.

3.2.2 Supporting teaching

Teaching-focused curriculum analytics

Dawson and Hubball ( 2014 ) developed and implemented a curriculum analytics tool that visualises the connections between courses in a curriculum network. The tool can analyse individual student learning pathways and identify dominant student pathways or curriculum pathways that impede/promote timely completion. Similarly, Hilliger et al. ( 2020 ) present a tool that generates reports of attained competencies at the course and program levels. In a different paper, Barb and Kilicay-Ergin ( 2020 ) evaluate the curriculum coherence of the Information Science programme by identifying academic overlaps and gaps using ontologies and natural language processing.

Monitoring student behaviour

Other uses include examining whether students engage with the learning materials as intended. For example, Nagi (2019), Alachiotis et al. (2019), Ayub et al. (2017), and Llopis-Albert and Rubio (2021) extracted learning analytics from the learning management system to examine whether students were engaging with the assigned hands-on activities (e.g., participation in quizzes, visits to the platform), while Harindranathan and Folkestad (2019) examined whether student behaviour in terms of quiz taking was aligned with the intention of the instructional design by extracting and analysing Canvas quiz log data. Similarly, Baralis et al. (2017) set out to find out whether the planned objectives of the educational video service, such as appreciation (number of accessed courses), effectiveness (as reflected in the correlation between use and performance) and flexibility, were reflected in the users’ behaviour. In addition to examining the use of video recordings, Sarsfield and Conway (2018) also looked at potential differences between subgroups of students and modules in terms of usage. To examine whether learners differ in their use of practical vs theoretical resources, Braccini et al. (2021) looked at 2,000,000 records. To review the visible (i.e. how many times a user replied to a post by a different user) and invisible (i.e. how many times a user read a message by another user) interactions of students, Hernández-García et al. (2016) extracted and analysed data from the LMS (~ 114,756 records).

Examples of miscellaneous use include Beasley et al. ( 2021 ), who analysed peer review text from two visualisation courses (~ 4,687 reviews in total) using sentiment analysis and Gottipati et al. ( 2017 ), who present a learning analytics tool that analyses qualitative data by extracting the sentiments of the feedback that students leave at the end of the course.

To capture the effectiveness of peer instruction compared to traditional teaching, Kuromiya et al. (2020) chose the number of accesses to Moodle content, Moodle quizzes, the Moodle forum and Moodle resources as engagement indicators. They compared these indicators for periods when three teachers held traditional lectures vs peer instruction (intervention period).

3.2.3 Supporting administration

Administrative curriculum analytics

Regarding supporting administration, one common use of Big Data is the examination of curriculum analytics. Armatas et al. (2022) developed a programme review tool to conduct learning analytics associated with the performance of a programme. The tool’s forms of analysis include network analysis, grades analysis (e.g., information about how complex a subject is), and prediction of award GPA (grade point average). Similarly, Cooper et al. (2016) used Big Data to identify accessibility deficits in courses. They recorded the percentage of students (2009–2013) who did and did not declare a disability and compared the odds of completing each module between the two groups using odds ratios, which allowed them to pinpoint several modules that may have accessibility issues.
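To make the odds-ratio comparison concrete, the sketch below computes the standard odds ratio from a 2×2 table of completion counts for one module. The exact formulation used by Cooper et al. (2016) is not given here, so this is only the textbook definition; the function name and all counts are hypothetical.

```python
def odds_ratio(completed_a, not_completed_a, completed_b, not_completed_b):
    """Odds of completion in group A divided by odds of completion in group B."""
    odds_a = completed_a / not_completed_a
    odds_b = completed_b / not_completed_b
    return odds_a / odds_b


# Hypothetical module counts, not figures from Cooper et al. (2016).
# Group A: students who declared a disability; group B: students who did not.
ratio = odds_ratio(completed_a=30, not_completed_a=20,
                   completed_b=400, not_completed_b=100)
print(round(ratio, 3))
```

A ratio well below 1 for a given module would indicate lower odds of completion for students who declared a disability, flagging that module for closer inspection. The sketch does not handle zero counts, which would raise a `ZeroDivisionError`.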

Analysis of admission and enrolment

The use of Big Data for tracking admission is reported in the work of Khudzaeva et al. (2018), who used clustering techniques to group high schools based on the GPA of their students so that admission quotas could be changed according to the results (e.g. increasing quotas for schools that produce students with a high GPA and reducing them for schools whose students have a low GPA). Another example is Burkhardt et al. (2016), who conducted a retrospective analysis of admission and enrolment data. They produced a dashboard using Visual Basic and Excel that allows decision-makers to input student factors (e.g., financial aid offers) and obtain predicted enrolment as output.

Analysis of resources

Big Data was also used to study the available resources at an institutional level. For example, Alrehaili et al. (2021) present the Higher Education Activities and Processes Automation Framework (HEAPAF) and a higher education ontology. This framework can be used to extract data from different resources; the authors use it to analyse, find, and rank the right resources for teaching a course. Likewise, to examine the collaborative relationships between authors and identify experts in particular fields through social network analysis, Elisabeth et al. (2019) downloaded the metadata of published articles from Scopus, which contained the authors’ names, keywords, affiliations and funding. They examined author-author, author-keyword and author-affiliation links by creating a graph of the author network, which showed the connectedness between authors and the most central (famous) authors in the network. Anastasios et al. (2011) present a tool that evaluates the research performance of a university and the achievement of a research policy using multiple indicators. The tool provides graphical visualisation (e.g., network analysis) on four indicators: scientific publications, collaboration with other higher education institutions, and collaboration with industry and research sectors. Scholars (Srinivas & Rajendran, 2019) also report on a SWOT (strengths, weaknesses, opportunities and threats) analysis with the help of text analysis of students' online reviews (N = 24,390) collected from a university review website. The authors used topic modelling (to automatically identify predominantly discussed topics and categorise each sentence in the appropriate topic), sentiment analysis (to detect the affect associated with each sentence), and SWOT analysis using the topic-based opinion summaries to identify strengths and weaknesses. They compared the topic opinion summaries concerning their university to those of other universities to identify threats and opportunities.
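The idea of building an author network from publication metadata and finding its most central authors can be sketched as follows. The centrality measure used by Elisabeth et al. (2019) is not specified here, so degree centrality is an assumption chosen for simplicity; the author lists are invented and do not come from Scopus.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected graph linking every pair of authors on the same paper."""
    neighbours = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def degree_centrality(neighbours):
    """Share of the other authors each author is directly connected to."""
    n = len(neighbours)
    if n < 2:
        return {author: 0.0 for author in neighbours}
    return {author: len(links) / (n - 1) for author, links in neighbours.items()}


# Invented author lists, not the Scopus metadata used in the study.
papers = [["Ana", "Budi"], ["Ana", "Citra", "Dewi"], ["Ana", "Eko"]]
centrality = degree_centrality(coauthor_graph(papers))
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

In this sketch, authors who only appear on single-author papers never enter the graph, so a fuller analysis would need to decide how to treat them.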

Other studies report administrative use of Big Data more broadly on the university campus. For example, Du et al. ( 2019 ) uncovered student recreation centre usage patterns by using historical data from swipe cards and user profile data. They developed a web app that predicts visit volume. At the same time, Xia and Liu ( 2018 ) used library data of 18,294 students and staff about the books borrowed in 2017. They examined the relationship between readers and book categories to inform decision-making and give recommendations (e.g., books that are frequently borrowed together to be located near each other). Chi et al. ( 2012 ) developed SAS (smart alumni system) to connect alumni and students, which incorporates a social networking style mentoring system and uses data mining to discover user relationships. Alumni who have expressed interest in mentoring and guiding students on their career path are matched with students based on interests, occupation or the city where they live. Big Data has also been applied to examine the effect of training on the use of the academic management system and how enhanced usage of the system is reflected in the performance of staff and faculty (Joy & Nambirajan, 2021 ).

Other uses include Rad et al. ( 2011 ), who endeavoured to cluster and rank university majors in Iran. The authors identified 177 university majors from a list solicited from the relevant ministry, and eight main specialisation groups were defined. Then 64 experts were asked to compare the university majors' influence on these eight specialisation groups and the importance of each specialisation group for present-day Iran. Ten clusters were derived using k-means clustering.

3.2.4 The intersection of supporting learning and teaching

Some studies could not be categorised as supporting either learning or teaching alone. Thus, we created a separate category representing the intersection of supporting teaching and learning. A paper that fits into this category is Villamañe et al. (2016), who present RubricVis, a tool that provides visually enriched feedback for rubric assessment. The tool can present information as a radar graph; students can see their weak and strong areas, compare their performance to that of their peers in the group, and follow their progress. Teachers can observe the performance of a student or a group of students, track their progress, or compare the performance of different groups.

A further example is Romero et al. ( 2013 ), who analysed the quiz results of 104 students and developed association rules. The quiz was changed, and the course was modified based on the results. The results of the students taking the original quiz were compared to those of two other groups taking the modified quiz. The updates in the quiz resulted in a better score, indicating that improvement in the course also improved performance.
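Association rules of the kind mentioned above are usually judged by their support and confidence. The sketch below computes both measures from scratch; how Romero et al. (2013) encoded the quiz results and which rule-mining algorithm they used are not stated here, so the encoding (one set of incorrectly answered questions per student) and the data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Invented quiz records: each set lists the questions a student got wrong.
wrong_answers = [{"q1", "q3"}, {"q1", "q3", "q4"}, {"q2"}, {"q1"}]
rule_support = support(wrong_answers, {"q1", "q3"})
rule_confidence = confidence(wrong_answers, {"q1"}, {"q3"})
print(rule_support, rule_confidence)
```

A rule such as "students who miss q1 also tend to miss q3" with high confidence could point to a shared misunderstanding, which is the sort of signal that might motivate revising a quiz or course material.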

Nguyen et al. (2018) examined the extent to which students’ timing of engagement and instructors' learning design match, as well as how performance relates to study patterns (i.e. engagement), while Essa and Ayad (2012) present a tool (S3) that offers 1) a synoptic view of students' progress as well as visualisation and identification of students at risk; 2) a comparison to another learner; and 3) a sociogram showing patterns of collaboration.

Taniguchi et al. (2017) aimed to study the impression topics hidden in students' journals by extracting weekly keywords commonly mentioned in the journals and students' impressions regarding those keywords. Students (~ 100) in an "Information Science" course were instructed to write weekly reflective entries after class (total N of entries = 1,664). Students frequently mentioned the weekly topics when writing about something they found problematic.

To profile students enrolled in a MOOC on an IELTS preparation course, Ocaña et al. (2019) obtained the data of 22,164 students via the EdX platform and applied the k-means algorithm to obtain five clusters: strong starters, weak finishers; more content, less assessment; more assessment, less content; very high engagement, moderate performance; and high engagement, high performance. Laakso et al. (2018) present ViLLE, a tool that automatically assesses exercises and provides insights to the teacher.
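For readers unfamiliar with k-means, the sketch below shows its two alternating steps on a toy data set. It is not the procedure used by Ocaña et al. (2019): the features, the deterministic initialisation from the first k points and the fixed iteration count are simplifying assumptions, and the learner scores are invented.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers, initialised with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Invented (engagement, performance) scores on a 0-1 scale, not EdX data.
learners = [(0.9, 0.9), (0.1, 0.2), (0.85, 0.95), (0.15, 0.1), (0.2, 0.15)]
centroids, labels = kmeans(learners, k=2)
print(labels)
```

In practice, results depend on the initial centroids and on the choice of k, so analyses typically use several random restarts and a criterion for selecting the number of clusters; naming the clusters, as in the profiles above, remains an interpretive step.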

3.2.5 Miscellaneous

This section summarises the application of Big Data in higher education, which could not be neatly classified into support for learning, teaching or administration, or the intersection of two or more. McCulloch et al. ( 2021 ) created a visualisation tool and web-based visual analytics dashboard for empowering autistic students to communicate their experiences and manage their activities. The visualisation highlights students' physiological status (e.g. stressed vs unstressed) and marks locations related to high stress (using geolocations and Fitbit data), nudging them to consider stress management techniques. The tool also visualises sleep quality (timing and duration of each sleeping phase).

To identify the discourse in the social media footprint left by students on unofficial Facebook pages of 41 private and public higher education institutions in the Ilocos Region in the Philippines, Aviles and Esquivel (2019) used sentiment analysis to determine the polarity of the posts and comments (N = 3,000). A web-based application classified the posts into five categories (academics, social engagement, emotions, finances, policies and health). A word cloud of the most frequent words was produced. The majority of the posts were positive and related to social engagement. Similarly, to analyse student feedback from the Twitter API and web applications using sentiment analysis, Sivakumar and Reddy (2017) extracted tweets on engineering education and then calculated their emotion and polarity.

To examine the affective response evoked by viewing one’s learning analytics information, Joseph-Richard et al. (2021) asked 42 students to indicate their emotional reactions using a questionnaire while they viewed their learning analytics. Students reacted with diverse emotions. Viewing one’s own PLA did not necessarily lead to increased motivation; it could also lead to fear, scepticism, and doubt.

To analyse student agency, 130 computer programming students completed a questionnaire measuring 11 dimensions of agency, and, based on the answers, they were clustered into one of four profiles (Jääskelä et al., 2021 ).

4.1 What are Big Data's and related analytics' existing uses in higher education?

The review reveals that the current application of Big Data and analytics in higher education is varied and spans the three spheres of supporting learning, teaching and administration. The majority of studies, however, reflected efforts to support learning, with fewer supporting teaching and administration. This trend suggests that the application of Big Data and analytics is recognised as a potentially powerful tool to enhance student learning outcomes, which aligns with previous research on the application of Big Data in education (Ifenthaler & Yau, 2020). Although most studies focused on exploring strategies to support student learning, the issues they addressed were predominantly assessing students' performance, predicting dropout and proposing intervention strategies. Some studies employed sentiment analysis or addressed the development of metacognitive skills.

In contrast, fewer studies focused on the role of Big Data in addressing administrative issues. The few studies that reported using Big Data in decision-making related to administrative matters covered curriculum analysis, equity issues in representation and access to learning, admission processes, resource allocation, collaborative relationships and library usage, as well as SWOT analysis and social network analysis. This dearth of exploration might be due to factors such as the more visible and immediate benefits for learning outcomes or potential skill gaps among administrative staff. It aligns with Alkhalil et al.'s (2021) observation that current research on Big Data in higher education is still at an immature stage, which suggests the potential for further exploration in administrative decision-making. Leveraging Big Data in administrative functions presents immense opportunities for strategic planning, operational efficiency, and promoting equity in higher education institutions. By harnessing the predictive power and insightful analytics of Big Data, administrative decisions can be more data-driven and effective. This further underscores the importance of future research into administrative uses of Big Data. Such research would contribute to our understanding of the scope of Big Data applications in higher education and help identify and address potential barriers to implementation (Daniel, 2015).

4.1.1 Theoretical implications

Some studies could not be classified because they overlap between supporting teaching and supporting learning. It is worth remembering that, in practice, it is difficult to draw a line between where support for teaching ends and support for learning begins. Ultimately, all teaching activity is aimed at supporting learning.

Our findings are consistent with those of previous reviews (e.g., Aytaç & Bilge, 2020; Baig et al., 2020; Ifenthaler & Yau, 2020). However, a unique aspect we noticed is the emergence of Big Data applications to support stakeholder well-being, which has been less emphasised in earlier research. This observation opens up a new perspective on the potential of Big Data in education. We therefore propose a three-dimensional model consisting of supporting the teaching and learning process, supporting administration, and supporting the well-being of actors in higher education. Accordingly, the model proposed by Daniel (2015) could be refined and updated to reflect these findings.

4.2 To what extent does applying Big Data and analytics support learning, teaching or administrative decision-making in higher education?

As discussed above, although we identified studies spanning the three applications, most focused on supporting students' learning. Moreover, even the studies focused on students have more to do with the data aspect than the learning aspect, as also identified by other studies (Ifenthaler & Yau, 2020 ). Thus, it seems that the potential of Big Data in higher education is underutilised.

4.3 What are the existing challenges of applying Big Data in higher education?

4.3.1 Ethical considerations are swept under the carpet

Few reviewed studies (e.g., Burkhardt et al., 2016; Joseph-Richard et al., 2021; Sarsfield & Conway, 2018) explicitly mentioned ethical considerations or stated whether the study had ethics approval. This low level of concern for ethical issues is in line with a review of 252 papers on learning analytics, which found that only 18% mentioned ethics (Viberg et al., 2018). Yet, given the growing discussion of data privacy and ownership in educational technology research (Ifenthaler & Tracey, 2016; Jones et al., 2020; Lawson et al., 2016), such questions deserve deeper reflection and are likely to pose challenges for future research.

4.3.2 The focus is on the technology, not the stakeholder

As other reviews have noted (e.g., Ifenthaler & Yau, 2020), most publications focus on system performance and are more concerned with analytics than learning. This observation also aligns with other authors' views (Gašević et al., 2015; Roberts et al., 2017). Thus, the potential of Big Data and analytics is not adequately realised. For example, although Ocaña et al. (2019) discuss how profiling can identify at-risk students and improve course design and delivery, they do not take the profiling in their study one step further to implement this potential. Similarly, Ayub et al. (2017) state that the association rules obtained could be used to improve the learning management system but do not elaborate on how this could be achieved. Instead, they offer some generic advice, such as the introduction of gamification in the LMS, but it is unclear how those recommendations follow from the association rules.

Further, as the findings by Joseph-Richard et al. ( 2021 ) suggest, the implementation of predictive learning analytics needs to be conducted with the student's well-being in mind, as some may experience nudging as nagging (Lawrence et al., 2019 ) and may not want to know their prediction (Afzaal et al., 2021 ) or may have privacy concerns (Laakso et al., 2018 ). Furthermore, the predictions may be confusing, especially if students do not know how the predictions are arrived at (Gutiérrez et al., 2020 ).

4.3.3 Eagerness to demonstrate the effectiveness

In some instances, the quality of evidence for a given claim needs to be more robust (Ferguson & Clow, 2017), as the studies examining intervention effectiveness do not always report the results of statistical tests. For example, Lawrence et al. (2019) report "increases in at-risk students engaging with their courses" (p. 53), but provide no statistics to back this claim. Likewise, in another cohort study, 10 out of 35 students in the intervention group took the final exam, compared with 4 out of 36 in the group without intervention (Figueroa-Cañas & Sancho-Vinuesa, 2021), yet no formal statistical tests were reported. Hence, the effect remains unclear (i.e., are the descriptive differences due to chance, or do they represent a meaningful effect?).
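As an illustration of the kind of formal test that could accompany such counts, the following minimal sketch applies a two-sided Fisher's exact test to the 10 of 35 versus 4 of 36 comparison quoted above. The choice of test and the function name `fisher_exact_two_sided` are our assumptions for illustration; this is not an analysis reported by the original authors.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: intervention / no intervention; columns: took the exam / did not.
p_value = fisher_exact_two_sided(10, 25, 4, 32)
print(f"two-sided p = {p_value:.3f}")
```

Reporting a p-value of this kind, ideally together with an effect size such as the difference in proportions, would let readers judge whether the descriptive difference is distinguishable from chance.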

Similarly, Cobos and Ruiz‐Garcia (2021) report a "considerable imbalance" in the success rate between the control and experimental groups but do not provide the appropriate statistics. Even when statistical tests are performed, the conclusions drawn and the evidence to back them can remain unclear because the applied test is inappropriate for answering the research question. For example, Afzaal et al. (2021) conducted two separate paired-samples t-tests to test whether the students who used the dashboard between quizzes 1 and 2 performed better at time two than those who did not. However, in this situation, a more appropriate approach would have been a mixed ANOVA, with time one and time two performance as a within-subjects factor and group as a between-subjects factor, because the increase in one group may not differ significantly from the increase in the other group.
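To make the point about the mixed design concrete: with two groups and two time points, the group-by-time interaction of a mixed ANOVA is equivalent to an independent-samples t-test on the gain scores (F = t²). The sketch below computes that statistic. All scores are hypothetical and the function name `interaction_from_gains` is ours; it is not a re-analysis of any reviewed study.

```python
from math import sqrt
from statistics import mean, variance


def interaction_from_gains(gains_a, gains_b):
    """Pooled-variance t statistic and degrees of freedom for two sets of gain scores.

    In a 2 (group) x 2 (time) mixed design, t**2 equals the F statistic
    of the group-by-time interaction.
    """
    n_a, n_b = len(gains_a), len(gains_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(gains_a) + (n_b - 1) * variance(gains_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(gains_a) - mean(gains_b)) / std_err
    return t_stat, df


# Hypothetical quiz scores (illustrative only, not data from any study).
dashboard_q1, dashboard_q2 = [60, 55, 70, 65, 58], [68, 60, 75, 74, 63]
control_q1, control_q2 = [62, 57, 69, 64, 59], [66, 60, 71, 69, 62]

gains_dashboard = [post - pre for pre, post in zip(dashboard_q1, dashboard_q2)]
gains_control = [post - pre for pre, post in zip(control_q1, control_q2)]

t_stat, df = interaction_from_gains(gains_dashboard, gains_control)
print(f"interaction: t({df}) = {t_stat:.2f}, F(1, {df}) = {t_stat ** 2:.2f}")
```

The resulting statistic would be compared against the t distribution with the stated degrees of freedom. Unlike two separate paired tests, it directly asks whether the improvement differs between the groups.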

Sample size considerations or post hoc discussions of achieved power are rarely mentioned, even though in some instances the sample size, and thus the power to detect an effect, is relatively small. For example, Zhang et al. (2020) had 49 participants divided into experimental and control groups. Similarly, Gong et al. (2018) examined the effectiveness of an intervention using 31 participants in total.
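A rough sense of what such sample sizes imply can be obtained from a normal-approximation power calculation. The sketch below is illustrative only: the 25/24 split of 49 participants and the standardised effect size d = 0.5 are our assumptions, not values reported in the cited studies, and the normal approximation slightly overstates power relative to an exact t-based calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the assumed standardised effect.
    shift = effect_size_d / sqrt(1 / n1 + 1 / n2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumed: 49 participants split 25/24 and a "medium" effect of d = 0.5.
print(f"approximate power = {approx_power(0.5, 25, 24):.2f}")
```

Under these assumptions the approximate power falls well below the conventional target of 0.8, so a non-significant result in a study of this size would say little about whether the intervention works.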

4.3.4 Data/tools-related limitations

The data used in the predictive models, or for data mining, is limited to what is available in the learning management system, and other "offline" variables are not considered (Nguyen et al., 2018). Further, multiple tests are conducted on the same data set without adjusting the significance level for type I error inflation; with such an adjustment, some findings (e.g., Álvarez-Méndez et al., 2020) could turn out to be non-significant. The overreliance on the p-value, disregarding effect size, may be further misleading (Nuzzo, 2014). Pre-processing the data is time-consuming (Harindranathan & Folkestad, 2019). In some cases, the results do not provide insights beyond the specific context (Sarsfield & Conway, 2018; Nkomo & Nat, 2021; Ji & Han, 2019). The need for some tools is unclear, or users do not find them attractive (e.g., Cheong et al., 2018). The diagrams produced by some tools can be overwhelming (Dawson & Hubball, 2014) and complex (McCulloch et al., 2021), making them difficult to understand, and the interventions based on learning analytics may have limited or no impact (Dodge et al., 2015; Park & Jo, 2015). Some of these challenges are overcome in those cases where the dashboards offer an interpretation or call to action. Still, even in those cases, we do not yet understand the long-term impact of "negative" predictions on students' well-being.
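The concern about type I error inflation when several tests are run on one data set can be illustrated with Holm's step-down adjustment, one common correction. The p-values below are hypothetical and the helper `holm_reject` is ours; the sketch simply shows how results that look significant without adjustment can stop being so once the number of tests is taken into account.

```python
def holm_reject(p_values, alpha=0.05):
    """Return, for each p-value, whether Holm's step-down procedure rejects its null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha / m, the next with
        # alpha / (m - 1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test is retained, all larger p-values are retained too
    return reject


# Hypothetical p-values from four tests run on the same data set.
p_values = [0.010, 0.020, 0.040, 0.300]
unadjusted = [p <= 0.05 for p in p_values]
bonferroni = [p <= 0.05 / len(p_values) for p in p_values]
holm = holm_reject(p_values)
print("unadjusted:", unadjusted)
print("Bonferroni:", bonferroni)
print("Holm:      ", holm)
```

In this made-up example three tests fall below 0.05 before adjustment, but only one survives either correction, which is the pattern the paragraph above warns about.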

4.3.5 Lack of theory

Further, as others have noted (Foster & Francis, 2020), the explanatory mechanisms or theory behind the interventions and the subsequent change are lacking. In cases where a theory drives the intervention, for example, increasing metacognitive skills, the results are not as expected. Thus, researchers must be vigilant to the danger that educational technology may become an educational hazard. Such a risk arises, for example, if researchers applying Big Data in higher education try to adapt the stakeholders to the available technology instead of adapting the available technology to the needs of the stakeholders. Are, for example, the multiple dashboards reviewed in this paper improving some outcome, or are students using them for no particular purpose? Can the negative predictions about performance become self-fulfilling prophecies (Archer & Prinsloo, 2020; Parkes et al., 2020)? Such questions have not been the focus of the reviewed papers.

The papers implementing an intervention often lack scientific rigour, and the long-term outcomes of such interventions regarding motivation or well-being are unknown. Although the ultimate aim of Big Data and analytics is to inform decision-making (Lonn et al., 2012 ), a limited number of the articles reviewed achieved this aim.

5 Limitations and conclusions

Big Data and analytics are arguably two critical research paradigms that have emerged in an era where society rapidly generates data in large volumes. As an emerging paradigm, working with Big Data and analytics requires knowledge of the 'fourth tradition' (Daniel, 2017). According to Daniel (2017), this fourth tradition is an empirical data-intensive scientific approach underpinned by the principles of knowledge discovery through data mining and visualisation. The fourth tradition necessitates the development of predictive and actionable analytics to solve complex societal problems. For over a decade, predictive and actionable analytics have been viewed as promising mechanisms for addressing the challenges that the higher education sector faces in the twenty-first century. The systematic review presented in this article closely examined published literature on the role of Big Data and associated analytics. As our goal was to gain a broad overview of the application of Big Data in higher education in the stated review period, we did not control for the quality of the reviewed papers.

Furthermore, the search for articles was limited to three databases. There is always the possibility that including or excluding particular keywords would have led to a different set of papers. Nonetheless, we believe the included keywords reflect a sufficient breadth of uses of Big Data in higher education.

Despite these limitations, this systematic review offers a broad overview of the uses of Big Data in higher education. The review findings revealed that Big Data and analytics are predominantly used to support learning and, to a lesser extent, teaching and administration, which is broadly consistent with other studies (Aytaç & Bilge, 2020; Baig et al., 2020). However, we observed some varied uses in supporting the latter. We also identified a set of studies focused on supporting student well-being. Thus, we proposed a refinement of the model postulated by Daniel (2015) to include supporting the teaching and learning process, administration, and the well-being of students and staff. Future studies could expand on the well-being aspect. Further, studies examining the effect of interventions based on Big Data should use more rigorous statistical tests to make a convincing argument for their effectiveness. In addition, studies utilising Big Data should be based on a theoretical perspective and be targeted towards solving an existing problem in practice.

Data availability

The data is available at https://osf.io/2eq7k/?view_only=1100a797b33b4d5f98072db99d3b9325 .

The references used in the systematic review are marked with *

Abbas, F., Afzaal, H., Farooque, A. A., & Tang, S. (2020). Crop yield prediction through proximal sensing and machine learning algorithms. Agronomy, 10 (7), 1046.  https://doi.org/10.3390/agronomy10071046

*Afzaal, M., Nouri, J., Zia, A., Papapetrou, P., Fors, U., Wu, Y., Li, X., & Weegar, R. (2021). Explainable AI for data-driven feedback and intelligent action recommendations to support students self-regulation [Original Research]. Frontiers in Artificial Intelligence , 4 . https://doi.org/10.3389/frai.2021.723447

*Alachiotis, N., Verykios, V., & Stavropoulos, E. (2019). Analysing learners behavior and resources effectiveness in a distance learning course: a case study of the Hellenic Open University. Journal of Information Science Theory and Practice, 7, 6–20. https://doi.org/10.1633/JISTaP.2019.7.3.1

Alani, M. M. (2021). Big data in cybersecurity: A survey of applications and future trends. Journal of Reliable Intelligent Environments, 7 (2), 85–114. https://doi.org/10.1007/s40860-020-00120-3

Alkhalil, A., Abdallah, M. A. E., Alogali, A., & Aljaloud, A. (2021). Applying Big Data analytics in higher education. International Journal of Information and Communication Technology Education, 17 (3), 29–51. https://doi.org/10.4018/ijicte.20210701.oa3

*Alrehaili, N. A., Aslam, M. A., Alahmadi, D. H., Alrehaili, D. A., Asif, M., & Arshad Malik, M. S. (2021). Ontology-based smart system to automate higher education activities. Complexity , 2021 , 1-20.  https://doi.org/10.1155/2021/5588381

*Althbiti, A., Algarni, S., Alghamdi, T., & Ma, X. (2021). A Personalised Academic advisory recommender system (PAARS): a case study. Proceedings 4th International Conference on Information and Computer Technologies, 270–278. https://doi.org/10.1109/ICICT52872.2021.00051

*Álvarez-Méndez, A., Carrera, M., Barrios, J., Llatas, C., & Vázquez, P. (2020). Application of data mining in Moodle platform for the analysis of the academic performance of a compulsory subject in university students. Proceedings 14th International Technology, Education and Development Conference , 984–992. https://doi.org/10.21125/inted.2020.0355

*Anastasios, T., Sgouropoulou, C., Xydas, I., Terraz, O., & Miaoulis, G. (2011). Academic research policy-making and evaluation using graph visualisation. 15th Panhellenic Conference on Informatics , https://doi.org/10.1109/PCI.2011.38

*Apiola, M., Lokkila, E., & Laakso, M.-J. (2019). Digital learning approaches in an intermediate-level computer science course. The International Journal of Information and Learning Technology , 36 (5), 467-484.  https://doi.org/10.1108/ijilt-06-2018-0079

Arbia, G. (2021). Statistics New Empiricism and Society in the Era of Big Data . Springer. https://doi.org/10.1007/978-3-030-73030-7

Archer, E., & Prinsloo, P. (2020). Speaking the unspoken in learning analytics: Troubling the defaults. Assessment & Evaluation in Higher Education, 45 (6), 888–900.  https://doi.org/10.1080/02602938.2019.1694863

*Armatas, C., Kwong, T., Chun, C., Spratt, C., Chan, D., & Kwan, J. (2022). Learning analytics for programme review: evidence, analysis, and action to improve student learning outcomes. Technology, Knowledge and Learning , 27 (2), 461-478.  https://doi.org/10.1007/s10758-021-09559-6

Attaran, M., Stark, J., & Stotler, D. (2018). Opportunities and challenges for big data analytics in US higher education: A conceptual model for implementation. Industry and Higher Education, 32 (3), 169–182. https://doi.org/10.1177/0950422218770937

Aviles, J., & Esquivel, R. (2019). Mining social media data of philippine higher education institutions using naive bayes classifier algorithm. SSRN Electronic Journal .  https://doi.org/10.2139/ssrn.3379025

Aytaç, Z., & Bilge, H. Ş. (2020). Big data analytics in higher education: A systematic review. Journal of Internet Applications and Management, 11 (2), 81–99.

*Ayub, M., Toba, H., Wijanto, M., & Yong, S. (2017). Modelling students’ activities in programming subjects through educational data mining. Global Journal of Engineering Education , 19 , 249-255

*Azcona, D., Corrigan, O., Scanlon, P., & Smeaton, A. F. (2017). Innovative Learning Analytics Research at a Data-Driven HEI. Proceedings of the 3rd International Conference on Higher Education Advances , 435–443. https://doi.org/10.4995/HEAd17.2017.5245

Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: A state of the art, limitations, and future research directions. International Journal of Educational Technology in Higher Education, 17 (1), 44. https://doi.org/10.1186/s41239-020-00223-0

*Baralis, E., Cagliero, L., Farinetti, L., Mezzalama, M., & Venuto, E. (2017). Experimental validation of a massive educational service in a blended learning environment. IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), 1, 381 - 390. https://doi.org/10.1109/COMPSAC.2017.123

Barb, A. S., & Kilicay-Ergin, N. (2020). Applications of natural language techniques to enhance curricular coherence. Procedia Computer Science, 168 , 88–96. https://doi.org/10.1016/j.procs.2020.02.263

Beasley, Z. J., Friedman, A., & Rosen, P. (2021). Through the looking glass: insights into visualization pedagogy through sentiment analysis of peer review text. IEEE Computer Graphics and Applications, 41 (6), 59–70. https://doi.org/10.1109/mcg.2021.3115387

*Braccini, A. M., Limongelli, C., Sciarrone, F., & Temperini, M. (2021). Business intelligence for teaching analytics: a case study. Springer Proceedings in Complexity , 341–351.  https://doi.org/10.1007/978-3-030-62066-0_26

*Broos, T., Verbert, K., Langie, G., Soom, C., & De Laet, T. (2018). Multi-institutional positioning test feedback dashboard for aspiring students: lessons learnt from a case study in Flanders. 8th International Conference on Learning Analytics and Knowledge, LAK 2018, 51–55. https://doi.org/10.1145/3170358.3170419

*Burkhardt, J. C., DesJardins, S. L., Teener, C. A., Gay, S. E., & Santen, S. A. (2016). Enrollment Management in Medical School Admissions: A Novel Evidence-Based Approach at One Institution. Academic Medicine , 91 (11), 1561-1567.  https://doi.org/10.1097/acm.0000000000001188

*Castells, J., Doust, M. P., Galárraga, L., Méndez, G. G., Ortiz-Rojas, M., & Jiménez, A. (2020). A student-oriented tool to support course selection in academic counselling sessions. In M.-M. P.J., K. C.D., T. Y.-S., G. D., V. K., P.-S. M., P.-S. M., H. I., Z.-P. MA, O.-R. M., & S. E. (Eds.), 2020 Workshop on Adoption, Adaptation and Pilots of Learning Analytics in Under-Represented Regions, LAUR 2020 , 2704, 48–57. CEUR-WS

Chaurasia, S. S., & Frieda Rosin, A. (2017). From Big Data to Big Impact: Analytics for teaching and learning in higher education. Industrial and Commercial Training, 49 (7/8), 321–328.  https://doi.org/10.1108/ict-10-2016-0069

*Cheong, M. L. F., Chen, J. Y. C., & Dai, B. T. (2018). Integrated Telegram and Web-based Forum with Automatic Assessment of Questions and Answers for Collaborative Learning. IEEE International Conference on Teaching, Assessment, and Learning for Engineering, TALE 2018, 9-16 .

*Chi, H., Jones, E. L., & Grandham, L. P. (2012). Enhancing Mentoring Between Alumni and Students via Smart Alumni System. 12th Annual International Conference on Computational Science, ICCS 2012 , 9 , 1390–1399. https://doi.org/10.1016/j.procs.2012.04.153

*Cobos, R., & Ruiz‐Garcia, J. C. (2021). Improving learner engagement in MOOCs using a learning intervention system: A research study in engineering education. Computer Applications in Engineering Education , 29 (4), 733-749.  https://doi.org/10.1002/cae.22316

*Cooper, M., Ferguson, R., & Wolff, A. (2016). What can analytics contribute to accessibility in e-learning systems and to disabled students' learning? Proceedings of the Sixth International Conference on Learning Analytics & Knowledge , 99–103. https://doi.org/10.1145/2883851.2883946

Daniel, B. (2015). Big Data and analytics in higher education: opportunities and challenges. British Journal of Educational Technology, 46 (5), 904–920. https://doi.org/10.1111/bjet.12230

Daniel, B. K. (2017). Big Data in Higher Education: The Big Picture. In Big Data and Learning Analytics in Higher Education: Current Theory and Practice. (pp. 19–28). Springer International Publishing. https://doi.org/10.1007/978-3-319-06520-5_3

Daniel, B. K. (2019). Big Data and data science: A critical review of issues for educational research. British Journal of Educational Technology, 50 (1), 101–113.  https://doi.org/10.1111/bjet.12595

Daniel, B. K., & Harland, T. (2017). Higher education research methodology: a step-by-step guide to the research process. Routledge . https://doi.org/10.4324/9781315149783

*Dawson, S., & Hubball, H. (2014). Curriculum analytics: application of social network analysis for improving strategic curriculum decision-making in a research-intensive university. Teaching & Learning Inquiry: The ISSOTL Journal , 2 (2), 59-74.  https://doi.org/10.2979/teachlearninqu.2.2.59

De Oliveira, C. F., Sobral, S. R., Ferreira, M. J., & Moreira, F. (2021). How does learning analytics contribute to prevent students’ dropout in higher education: a systematic literature review. Big Data and Cognitive Computing, 5 (4), 64. https://doi.org/10.3390/bdcc5040064

Dodge, B., Whitmer, J., & Frazee, J. P. (2015). Improving undergraduate student achievement in large blended courses through data-driven interventions. Proceedings of the fifth international conference on learning analytics and knowledge. Poughkeepsie, New York. https://doi.org/10.1145/2723576.2723657

Drivas, I. C., Sakas, D. P., Giannakopoulos, G. A., & Kyriaki-Manessi, D. (2020). Big data analytics for search engine optimization. Big Data and Cognitive Computing, 4 (2), 5. https://doi.org/10.3390/bdcc4020005

*Du, Y., Gebremedhin, A. H., & Taylor, M. E. (2019). Analysis of University Fitness Center Data Uncovers Interesting Patterns, Enables Prediction. IEEE Transactions on Knowledge and Data Engineering , 31 (8), 1478-1490.  https://doi.org/10.1109/tkde.2018.2863705

*Elisabeth, D., Rokhman, M. F., Harahap, N. C., Hakim, S. A., & Sensuse, D. I. (2019). Discovering scientific collaboration activities using social network analysis. a case study: faculty of computer science universitas Indonesia. 11th International Conference on Information Technology and Electrical Engineering, ICITEE 2019 . https://doi.org/10.1109/ICITEED.2019.8929957

Erevelles, S., Fukawa, N., & Swayne, L. (2016). Big Data consumer analytics and the transformation of marketing. Journal of Business Research, 69 (2), 897–904. https://doi.org/10.1016/j.jbusres.2015.07.001

*Essa, A., & Ayad, H. (2012). Improving student success using predictive models and data visualisations. Research in Learning Technology , 20 , 58-70. https://doi.org/10.3402/rlt.v20i0.19191

Ferguson, R., & Clow, D. (2017). Where is the evidence? A call to action for learning analytics. Proceedings of the Seventh International Learning Analytics & Knowledge Conference , 56–65. https://doi.org/10.1145/3027385.3027396

*Figueroa-Cañas, J., & Sancho-Vinuesa, T. (2021). Changing the recent past to reduce ongoing dropout: an early learning analytics intervention for an online statistics course. Open Learning: The Journal of Open, Distance and e-Learning , 1–18. https://doi.org/10.1080/02680513.2021.1971963

Foster, C., & Francis, P. (2020). A systematic review on the deployment and effectiveness of data analytics in higher education to improve student outcomes. Assessment & Evaluation in Higher Education, 45 (6), 822–841. https://doi.org/10.1080/02602938.2019.1696945

Gašević, D., Dawson, S., & Siemens, G. (2015). Let’s not forget: Learning analytics are about learning. TechTrends, 59 (1), 64–71. https://doi.org/10.1007/s11528-014-0822-x

*Géryk, J., & Popelínský, L. (2014). Visual Analytics for increasing efficiency of higher education institutions. In W. Abramowicz & A. Kokkinaki (Eds.), Business Information Systems Workshops. BIS 2014. Lecture Notes in Business Information Processing (Vol. 183, pp. 117–127). Springer International Publishing. https://doi.org/10.1007/978-3-319-11460-6_11

*Gong, L., Liu, Y., & Zhao, W. (2018). Using learning analytics to promote student engagement and achievement in blended learning. ICEBT '18: Proceedings of the 2018 2nd International Conference on E-Education, E-Business and E-Technology ,19–24. https://doi.org/10.1145/3241748.3241760

Gottipati, S., Shankararaman, V., & Gan, S. (2017). A conceptual framework for analysing students' feedback. 47th IEEE Frontiers in Education Conference, FIE 2017, 1-8.  https://doi.org/10.1109/FIE.2017.8190703

Grover, V., Chiang, R. H. L., Liang, T.-P., & Zhang, D. (2018). Creating strategic business value from big data analytics: a research framework. Journal of Management Information Systems, 35 (2), 388–423. https://doi.org/10.1080/07421222.2018.1451951

Gutierrez, C. N., Kim, T., Corte, R. D., Avery, J., Goldwasser, D., Cinque, M., & Bagchi, S. (2018). Learning from the ones that got away: detecting new forms of phishing attacks. IEEE Transactions on Dependable and Secure Computing, 15 (6), 988–1001. https://doi.org/10.1109/tdsc.2018.2864993

*Gutiérrez, F., Seipp, K., Ochoa, X., Chiluiza, K., De Laet, T., & Verbert, K. (2020). LADA: A learning analytics dashboard for academic advising. Computers in Human Behavior , 107 , 105826. https://doi.org/10.1016/j.chb.2018.12.004

*Harindranathan, P., & Folkestad, J. (2019). Learning analytics to inform the learning design: supporting instructor’s inquiry into student learning in unsupervised technology-enhanced platforms. Online Learning , 23 (3), 34–55. https://doi.org/10.24059/olj.v23i3.2057

*He, L., Levine, R. A., Bohonak, A. J., Fan, J., & Stronach, J. (2018). Predictive analytics machinery for STEM student success studies. Applied Artificial Intelligence , 32 (4), 361-387.  https://doi.org/10.1080/08839514.2018.1483121

*Hernández-García, Á., González-González, I., Jimenez-Zarco, A., & Chaparro-Peláez, J. (2016). Visualisations of online course interactions for social network learning analytics. International Journal of Emerging Technologies in Learning (iJET) , 11 (7), 6-15.  https://doi.org/10.3991/ijet.v11i07.5889

*Herodotou, C., Naydenova, G., Boroowa, A., Gilmour, A., & Rienties, B. (2020). How can predictive learning analytics and motivational interventions increase student retention and enhance administrative support in distance education? Journal of Learning Analytics , 7 (2), 72–83. https://doi.org/10.18608/jla.2020.72.4

*Hilliger, I., Aguirre, C., Miranda, C., Celis, S., & Pérez-Sanagustín, M. (2020). Design of a curriculum analytics tool to support continuous improvement processes in higher education. Proceedings of the Tenth International Conference on Learning Analytics & Knowledge , 181–186. https://doi.org/10.1145/3375462

Huang, D. Y., Aliapoulios, M. M., Li, V. G., Invernizzi, L., Bursztein, E., McRoberts, K., Levin, J., Levchenko, K., Snoeren, A. C., & McCoy, D. (2018). Tracking ransomware end-to-end. Proceedings - IEEE Symposium on Security and Privacy , 618–631. https://doi.org/10.1109/SP.2018.00047

Ifenthaler, D., & Tracey, M. W. (2016). Exploring the relationship of ethics and privacy in learning analytics and design: Implications for the field of educational technology. Educational Technology Research and Development, 64 (5), 877–880. https://doi.org/10.1007/s11423-016-9480-3

Ifenthaler, D., & Yau, J.Y.-K. (2020). Utilising learning analytics to support study success in higher education: A systematic review. Educational Technology Research and Development, 68 (4), 1961–1990. https://doi.org/10.1007/s11423-020-09788-z

*Jääskelä, P., Heilala, V., Kärkkäinen, T., & Häkkinen, P. (2021). Student agency analytics: learning analytics as a tool for analysing student agency in higher education. Behaviour & Information Technology , 40 (8), 790-808.  https://doi.org/10.1080/0144929x.2020.1725130

*Ji, Y., & Han, Y. (2019). Monitoring Indicators of the Flipped Classroom Learning Process based on Data Mining – Taking the Course of “Virtual Reality Technology” as an example. International Journal of Emerging Technologies in Learning (iJET) , 14 (3), 166-176.  https://doi.org/10.3991/ijet.v14i03.10105

Jin, X., Wah, B. W., Cheng, X., & Wang, Y. (2015). Significance and challenges of big data research. Big Data Research, 2 (2), 59–64. https://doi.org/10.1016/j.bdr.2015.01.006

Jo, T. (2019). Text Mining: Concepts, Implementation, and Big Data Challenge . Springer.

Jones, K. M. L., Asher, A., Goben, A., Perry, M. R., Salo, D., Briney, K. A., & Robertshaw, M. B. (2020). “We’re being tracked at all times”: Student perspectives of their privacy in relation to learning analytics in higher education. Journal of the Association for Information Science and Technology, 71 (9), 1044–1059. https://doi.org/10.1002/asi.24358

*Joseph-Richard, P., Uhomoibhi, J., & Jaffrey, A. (2021). Predictive learning analytics and the creation of emotionally adaptive learning environments in higher education institutions: a study of students' affect responses. The International Journal of Information and Learning Technology , 38 (2), 243-257.  https://doi.org/10.1108/ijilt-05-2020-0077

*Joy, J., & Nambirajan, T. (2021). Learning analytics for academic management system enhancement: A participatory action research in an Indian context. Management in Education .  https://doi.org/10.1177/08920206211037689

*Karaoglan Yilmaz, F. G. (2022). Utilising learning analytics to support students' academic self-efficacy and problem-solving skills. The Asia-Pacific Education Researcher , 31 (2), 175-191.  https://doi.org/10.1007/s40299-020-00548-4

*Karaoglan Yilmaz, F. G., & Yilmaz, R. (2022). Learning Analytics Intervention Improves Students’ Engagement in Online Learning. Technology, Knowledge and Learning , 27 (2), 449-460.  https://doi.org/10.1007/s10758-021-09547-w

*Khudzaeva, E., Mintarsih, F., Muharam, A. T., & Wirawan, C. (2018). Application of clustering method in data mining for determining SNMPTN quota invitation UIN Syarif Hidayatullah Jakarta. 6th International Conference on Cyber and IT Service Management, CITSM 2018, 1–4 . https://doi.org/10.1109/CITSM.2018.8674329

Klein, C., Lester, J., Rangwala, H., & Johri, A. (2019). Technological barriers and incentives to learning analytics adoption in higher education: Insights from users. Journal of Computing in Higher Education, 31 (3), 604–625. https://doi.org/10.1007/s12528-019-09210-5

*Kuromiya, H., Majumdar, R., & Ogata, H. (2020). Fostering evidence-based education with learning analytics: capturing teaching-learning cases from log data. Educational Technology & Society , 23 , 1176-3647

Kushwaha, A. K., Kar, A. K., & Dwivedi, Y. K. (2021). Applications of big data in emerging management disciplines: A literature review using text mining. International Journal of Information Management Data Insights, 1 (2), 100017. https://doi.org/10.1016/j.jjimei.2021.100017

*Laakso, M.-J., Kaila, E., & Rajala, T. (2018). ViLLE – collaborative education tool: Designing and utilising an exercise-based learning environment. Education and Information Technologies , 23 (4), 1655–1676.  https://doi.org/10.1007/s10639-017-9659-1

*Lawrence, J., Brown, A., Redmond, P., & Basson, M. (2019). Engaging the disengaged: Exploring the use of course-specific learning analytics and nudging to enhance online student engagement. Student Success , 10 , 47-58.  https://doi.org/10.5204/ssj.v10i2.1295

Lawson, C., Beer, C., Rossi, D., Moore, T., & Fleming, J. (2016). Identification of ‘at risk’ students using learning analytics: The ethical dilemmas of intervention strategies in a higher education institution. Educational Technology Research and Development, 64 (5), 957–968. https://doi.org/10.1007/s11423-016-9459-0

Li, K., Rollins, J., & Yan, E. (2018). Web of Science use in published research and review papers 1997–2017: A selective, dynamic, cross-domain, content-based analysis. Scientometrics, 115 (1), 1–20. https://doi.org/10.1007/s11192-017-2622-5

*Linden, K., & Webster, L. (2019). Back to Basics: combining analytics and early assessment with personalised contact to improve student progress. 36th International Conference of Innovation, Practice and Research in the Use of Educational Technologies in Tertiary Education: Personalised Learning. Diverse Goals. One Heart, ASCILITE 2019 , 499–502.  https://doi.org/10.14742/apubs.2019.319

*Llopis-Albert, C., & Rubio, F. (2021). Application of Learning Analytics to Improve Higher Education. Multidisciplinary Journal for Education, Social and Technological Sciences , 8 (2), 1-18.  https://doi.org/10.4995/muse.2021.16287

*Lonn, S., Aguilar, S. J., & Teasley, S. D. (2015). Investigating student motivation in the context of a learning analytics intervention during a summer bridge program. Computers in Human Behavior , 47 , 90-97.  https://doi.org/10.1016/j.chb.2014.07.013

Lonn, S., Krumm, A. E., Waddington, R. J., & Teasley, S. D. (2012). Bridging the gap from knowledge to action: putting analytics in the hands of academic advisors. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, 184–187.  https://doi.org/10.1145/2330601

*Lu, O. H. T., Huang, J. C. H., Huang, A. Y. Q., & Yang, S. J. H. (2017). Applying learning analytics for improving students engagement and learning outcomes in an MOOCs enabled collaborative programming course. Interactive Learning Environments , 25 (2), 220-234.  https://doi.org/10.1080/10494820.2016.1278391

Lycett, M. (2013). ‘Datafication’: Making sense of (big) data in a complex world. European Journal of Information Systems, 22 (4), 381–386. https://doi.org/10.1057/ejis.2013.10

*McCulloch, S., Gildner, J., Hoefel, B., Cervantes, G., Ahmed, S., & Sharmin, M. (2021). Visualisation as a Tool to Understand the Experience of College Students with Autism. Proceedings - 2021 IEEE 45th Annual Computers, Software, and Applications Conference, COMPSAC 2021 , 438–445.  https://doi.org/10.1109/COMPSAC51774.2021.00067

*McNely, B., Gestwicki, P., Hill, J., Parli-Horne, P., & Johnson, E. (2012). Learning analytics for collaborative writing: A prototype and case study. 2nd International Conference on Learning Analytics and Knowledge, LAK 2012, 222–225. https://doi.org/10.1145/2330601.2330654

Monino, J.-L., & Sedkaoui, S. (2016). The Big Data Revolution . In Big Data, Open Data and Data Development (eds J.-L. Monino and S. Sedkaoui).  https://doi.org/10.1002/9781119285199.ch1

Naeem, M., Jamal, T., Diaz-Martinez, J., Aziz Butt, S., Montesano, N., Imran Tariq, M., De-la-Hoz-Franco, E., & De-La-Hoz-Valdiris, E. (2022). Trends and future perspective challenges in big data. Smart Innovation, Systems and Technologies, 253 , 309–325. https://doi.org/10.1007/978-981-16-5036-9_30

*Nagi, K. (2019). Using Learning Analytic Tools to Enhance Quality of Hands-on-Activities in Online Technology Courses. Universal Journal of Educational Research , 7 (4), 1084–1089. https://doi.org/10.13189/ujer.2019.070420

*Nguyen, Q., Huptych, M., & Rienties, B. (2018). Linking students' timing of engagement to learning design and academic performance. 8th International Conference on Learning Analytics and Knowledge, LAK 2018, 141–150. https://doi.org/10.1145/3170358.3170398

*Nkhoma, C., Dang-Pham, D., Hoang, A.-P., Nkhoma, M., Le-Hoai, T., & Thomas, S. (2020). Learning analytics techniques and visualisation with textual data for determining causes of academic failure. Behaviour & Information Technology , 39 (7), 808-823.  https://doi.org/10.1080/0144929x.2019.1617349

*Nkomo, L. M., & Nat, M. (2021). Student Engagement Patterns in a Blended Learning Environment: an Educational Data Mining Approach. TechTrends , 65 (5), 808-817.  https://doi.org/10.1007/s11528-021-00638-0

Nuzzo, R. (2014). Scientific method: statistical errors. Nature, 506 (7487), 150–152. https://doi.org/10.1038/506150a

*Ocaña, M., Khosravi, H., & Bakharia, A. (2019). Profiling language learners in the big data era. 36th International Conference of Innovation, Practice and Research in the Use of Educational Technologies in Tertiary Education: Personalised Learning. Diverse Goals. One Heart, ASCILITE 2019, 237–245.

*Olaya, D., Vásquez, J., Maldonado, S., Miranda, J., & Verbeke, W. (2020). Uplift modeling for preventing student dropout in higher education. Decision Support Systems , 134 , 113320. https://doi.org/10.1016/j.dss.2020.113320

*Park, Y., & Jo, I.-H. (2015). Development of the Learning Analytics Dashboard to Support Students' Learning Performance. Journal of Universal Computer Science , 21 , 110-133.

Parkes, S., Benkwitz, A., Bardy, H., Myler, K., & Peters, J. (2020). Being more human: Rooting learning analytics through distance and reconnection with the values of higher education. Higher Education Research & Development, 39 (1), 113–126. https://doi.org/10.1080/07294360.2019.1677569

Perez, O. A., & Gonzalez, V. E. (2016). Student dashboard for a multi-agent approach for academic advising. Computers in Education Journal , 16 (3) , 73-90

*Prieto, M. Á. Z., Ortiz-Rojas, M., Ulloa, M., & Jiménez, A. (2020). Applying the LALA Framework for the adoption of a Learning Analytics tool in Latin America: Two case studies in Ecuador. In M.-M. P.J., K. C.D., T. Y.-S., G. D., V. K., P.-S. M., P.-S. M., H. I., Z.-P. MA, O.-R. M., & S. E. (Eds.), 2020 Workshop on Adoption, Adaptation and Pilots of Learning Analytics in Under-Represented Regions, LAUR 2020 (Vol. 2704, pp. 6–14). CEUR-WS.

*Rad, A., Naderi, B., & Soltani, M. (2011). Clustering and ranking university majors using data mining and AHP algorithms: A case study in Iran. Expert Systems with Applications , 38 (1), 755-763.  https://doi.org/10.1016/j.eswa.2010.07.029

*Ramaswami, G. S., Susnjak, T., & Mathrani, A. (2019). Capitalising on learning analytics dashboard for maximising student outcomes. IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE, 2019 , 1-6.  https://doi.org/10.1109/CSDE48274.2019.9162357

Ray, S., & Saeed, M. (2018). Applications of Educational data mining and learning analytics tools in handling big data in higher education. Applications of Big Data Analytics: Trends, Issues, and Challenges , 135–160. https://doi.org/10.1007/978-3-319-76472-6_7

Rehman, A., Naz, S., & Razzak, I. (2022). Leveraging big data analytics in healthcare enhancement: Trends, challenges and opportunities. Multimedia Systems, 28 (4), 1339–1371. https://doi.org/10.1007/s00530-020-00736-8

Ristevski, B., & Chen, M. (2018). Big Data Analytics in Medicine and Healthcare. Journal of Integrative Bioinformatics , 15 (3). https://doi.org/10.1515/jib-2017-0030

Roberts, L. D., Howell, J. A., & Seaman, K. (2017). Give me a customizable dashboard: personalized learning analytics dashboards in higher education. Technology, Knowledge and Learning, 22 (3), 317–333. https://doi.org/10.1007/s10758-017-9316-1

*Romero, C., Zafra, A., Luna, J. M., & Ventura, S. (2013). Association rule mining using genetic programming to provide feedback to instructors from multiple choice quiz data. Expert Systems , 30 (2), 162-172.  https://doi.org/10.1111/j.1468-0394.2012.00627.x

*Salazar-Fernandez, J. P., Sepulveda, M., & Munoz-Gama, J. (2019). Influence of student diversity on educational trajectories in engineering high-failure rate courses that lead to late dropout. 10th IEEE Global Engineering Education Conference, EDUCON 2019 (pp. 607–616). IEEE Computer Society. https://doi.org/10.1109/EDUCON.2019.8725143

*Sarsfield, M., & Conway, J. (2018). What can we learn from learning analytics? A case study based on an analysis of student use of video recordings. Research in Learning Technology , 26 . https://doi.org/10.25304/rlt.v26.2087

Schotten, M., El Aisati, M. H., Meester, W. J. N., Steiginga, S., & Ross, C. A. (2017). A Brief History of Scopus: The World’s Largest Abstract and Citation Database of Scientific Literature. Research Analytics , 31–58. https://doi.org/10.1201/9781315155890-3

Singh, R. K., Agrawal, S., Sahu, A., & Kazancoglu, Y. (2021). Strategic issues of big data analytics applications for managing healthcare sector: a systematic literature review and future research agenda. The TQM Journal , ahead-of-print (ahead-of-print). https://doi.org/10.1108/tqm-02-2021-0051

*Sivakumar, M., & Reddy, U. S. (2017). Aspect-based sentiment analysis of students opinion using machine learning techniques. 2017 International Conference on Inventive Computing and Informatics, ICICI 2017 , 726–731. https://doi.org/10.1109/ICICI.2017.8365231

*Srinivas, S., & Rajendran, S. (2019). Topic-based knowledge mining of online student reviews for strategic planning in universities. Computers & Industrial Engineering , 128 , 974-984.  https://doi.org/10.1016/j.cie.2018.06.034

*Summers, R. J., Higson, H. E., & Moores, E. (2021). Measures of engagement in the first three weeks of higher education predict subsequent activity and attainment in first-year undergraduate students: a UK case study. Assessment & Evaluation in Higher Education , 46 (5), 821-836.  https://doi.org/10.1080/02602938.2020.1822282

*Taniguchi, Y., Suehiro, D., Shimada, A., & Ogata, H. (2017). Revealing hidden impression topics in students' journals based on nonnegative matrix factorisation. In H. R., V. R., Kinshuk, S. DG, C. N.-S., & C. M. (Eds.), 17th IEEE International Conference on Advanced Learning Technologies, ICALT 2017 (pp. 298–300). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICALT.2017.113

Tasmin, R., Muhammad, R. N., & Nor Aziati, A. H. (2020). Big data analytics applicability in higher learning educational system. IOP Conference Series: Materials Science and Engineering, 917 (1), 012064. https://doi.org/10.1088/1757-899X/917/1/012064

Viberg, O., Hatakka, M., Bälter, O., & Mavroudi, A. (2018). The current landscape of learning analytics in higher education. Computers in Human Behavior, 89 , 98–110. https://doi.org/10.1016/j.chb.2018.07.027

*Villamañe, M., Larrañaga, M., Álvarez, A., & Ferrero, B. (2016). RubricVis: enriching rubric-based formative assessment with visual learning analytics. TEEM '16: Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality , 363–368. https://doi.org/10.1145/3012430.3012541

Waheed, H., Hassan, S.-U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2020). Predicting academic performance of students from VLE big data using deep learning models. Computers in Human Behavior, 104 , 106189. https://doi.org/10.1016/j.chb.2019.106189

*Wang, K. D., Salehi, S., Arseneault, M., Nair, K., & Wieman, C. (2021). Automating the Assessment of Problem-solving Practices Using Log Data and Data Mining Techniques. L@S 2021 - Proceedings of the 8th ACM Conference on Learning @ Scale, 69–76. https://doi.org/10.1145/3430895.3460127

*Xia, T., & Liu, Y. (2018). Application of improved association-rules mining algorithm in the circulation of university library. 2018 International conference on big data and artificial intelligence (pp. 60–64). https://doi.org/10.25236/icbdai.2018.010

Zhang, J.-H., Zou, L.-C., Miao, J.-J., Zhang, Y.-X., Hwang, G.-J., & Zhu, Y. (2020). An individualised intervention approach to improving university students’ learning performance and interactive behaviours in a blended learning environment. Interactive Learning Environments , 28 (2), 231-245.  https://doi.org/10.1080/10494820.2019.1636078

Open Access funding enabled and organized by CAUL and its Member Institutions

Author information

Authors and affiliations.

Higher Education Development Centre, University of Otago, 65/75 Union Place West, Dunedin, New Zealand, 9016

Ana Stojanov & Ben Kei Daniel

Corresponding author

Correspondence to Ana Stojanov .

Ethics declarations

Competing interests.

We have no competing interests to declare. We received no financial support for this review.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Stojanov, A., Daniel, B.K. A decade of research into the application of big data and analytics in higher education: A systematic review of the literature. Educ Inf Technol 29 , 5807–5831 (2024). https://doi.org/10.1007/s10639-023-12033-8

Received : 05 October 2022

Accepted : 05 July 2023

Published : 20 July 2023

Issue Date : April 2024

DOI : https://doi.org/10.1007/s10639-023-12033-8

  • Systematic review
  • Learning analytics
  • Higher education

  • Open access
  • Published: 16 October 2023

Education big data and learning analytics: a bibliometric analysis

  • Shaza Arissa Samsul   ORCID: orcid.org/0000-0003-3417-1433 1 ,
  • Noraffandy Yahaya 1 &
  • Hassan Abuhassna   ORCID: orcid.org/0000-0002-5774-3652 1  

Humanities and Social Sciences Communications, volume 10, Article number: 709 (2023)

  • Science, technology and society

The contemporary era’s extensive use of data, particularly in education, has provided new insights and benefits. This data is called ‘education big data’, and the process of learning from such data is called ‘learning analytics’. Education big data and learning analytics are two important processes that produce impactful results and understanding. It is crucial to take advantage of these processes to enhance the current education system. We conduct a bibliometric analysis based on the PRISMA statement template. The publications used for the analysis cover the years 2012–2021. We examine and analyze a total of 250 publications, sourced from the Scopus database, for insights regarding education big data and learning analytics. All of the publications are filtered according to specific inclusion and exclusion criteria. From the bibliometric analysis, we determine the distribution of education big data and learning analytics publications across the years 2012–2021, the most relevant journals and authors, the most significant countries, the primary research keywords, and the most important subject area involved. This study presents the trends and recommendations in education big data and learning analytics. We also offer suggestions for improvement and highlight the potential for enhancing the education system through the full utilization of education big data and learning analytics.

Introduction.

Big data in education has become a trend in recent years (Wang, 2016). The current era involves the creation and use of an enormous volume of data. Big data results from the growing use of data in several industries, including banking, economics, and education. The shift of technology toward digital operations creates a massive digital treasure trove of data, especially in education (Michalik et al., 2014). Big data in education can be valuable and can be converted into insight using learning analytics. The possession of big data can produce new knowledge and intuition in the education sector (Wang, 2016). The concept of big data is underpinned by the massive increase in the volume, structure, and speed with which data is generated (Daniel, 2017). Educators can analyze and improve the traditional educational system through the use of big data (Drigas and Leliopoulos, 2014). The key accomplishment of learning analytics in recent years may be identified as the growth of digital learning, which has improved the quality and accessibility of educational data (Sghir et al., 2023).

Industry evolution 4.0 demands that higher education be upgraded in terms of programs and courses to prepare students for a highly computerized learning environment (Mkrttchian et al., 2021). Moreover, technological advancements in big data are accelerators for boosting analytics in higher education (Mkrttchian et al., 2021). The structure of learning environments may be changed and enhanced as a result of input from learning analytics data (Talan and Demirbilek, 2023). Hence, education big data and learning analytics are major facilitators in the process of enhancing the structure of learning environments. The objective of this study is to analyze the trends and recommendations of education big data and learning analytics publications using the bibliometric analysis method. This study also presents a visualization of the current trends in education big data and learning analytics across different topics. Data is taken from the Scopus database to answer the following research questions:

What is the distribution of education big data and learning analytics publications in the years 2012–2021?

What are the most relevant journals and authors in education big data and learning analytics research?

What are the most significant countries in the education big data and learning analytics research area?

What are the primary research keywords for education big data and learning analytics within the last decade?

What is the most important subject area involving education big data and learning analytics?

Materials and methods

The bibliometric analysis and meta-analyses method was used in conducting this study’s systematic literature review (SLR). The research process and procedure used in this study are based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement template. Based on comprehensive reporting through the PRISMA template, readers may evaluate the applicability of the methodologies and, consequently, the veracity of the study’s conclusion (Page et al., 2021). There are four phases involved in using the PRISMA template, namely identification, screening, eligibility, and inclusion, the last of which establishes which studies were included in the review. The details of these phases are shown in Fig. 1.

figure 1

The systematic literature review process using the PRISMA statement template, based on four phases: identification, screening, eligibility, and inclusion.

The topics chosen for this SLR were education big data and learning analytics. The Scopus database was used to source studies for review. As shown in Fig. 1, the SLR process involved using the PRISMA statement template for data selection. The first step in the identification phase was to identify records in the database using the keywords “Education Big Data” and “Learning Analytics”. The total number of documents found based on this search was 885.

These results then underwent a screening process, which left 252 documents remaining. The screening process excluded studies published in the year 2022 and some subject areas that were irrelevant to this study, such as business, management, and accounting. Conference papers, reviews, and editorial documents were also excluded from the analysis. The process continued with eligibility screening, which reduced the number of documents to 250 after removing full articles that were not in English. A software tool called VOSviewer was used to conduct data analysis and visualization. As explained by Soegoto et al. (2022), VOSviewer can effectively analyze and visualize bibliometric data.
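The selection flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the three record counts reported in the text; the function and variable names are illustrative and not part of the study.

```python
# Record counts reported in the text for each PRISMA phase.
phases = [
    ("identification", 885),  # records matching the two search keywords
    ("screening", 252),       # after excluding 2022, irrelevant subject areas, and some document types
    ("eligibility", 250),     # after removing full articles not in English
]

def exclusions(phases):
    """Return (phase, records_removed) for each transition between consecutive phases."""
    return [
        (later_name, earlier_count - later_count)
        for (_, earlier_count), (later_name, later_count) in zip(phases, phases[1:])
    ]

removed = exclusions(phases)
for name, n in removed:
    print(f"{name}: {n} records removed")

# Share of the initially identified records that reached the final review set.
retained_share = phases[-1][1] / phases[0][1]
print(f"included: {phases[-1][1]} ({retained_share:.1%} of identified records)")
```

Under these counts, screening accounts for nearly all exclusions (633 records), while the language check at the eligibility phase removes only two.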

Research question 1

This study sought to examine education big data and learning analytics. The first finding answers the first research question, which concerns the distribution of education big data and learning analytics publications between 2012 and 2021. As shown in Fig. 2, the number of documents produced increased over this 10-year period. Just one document on this subject was published in 2012, whereas 54 documents were produced in 2021. There was a decrease between 2017 and 2018, from 31 documents to 26, but this was followed by a further increase from 2019 onwards.

figure 2

“Document” on the y-axis represents the number of publications and “Year” on the x-axis represents the year observed.

Research question 2

The second research question sought to identify the most relevant journals and authors in education big data and learning analytics research. Figure 3 shows the most relevant journals in education big data and learning analytics research in terms of total publications (TP). The Scopus database was then searched for the top ten frequently cited journals on this topic. The details about the journals, such as TP, Total Citations (TC), Citation Score, Most Cited Article, Times Cited, and Publisher Name, are shown in Table 1.

figure 3

Each label has the form (text: number), where the text is the journal name and the number is its total number of publications.

As shown in Table 1, the most relevant journal in education big data and learning analytics research, with a total of 8 publications and 63 citations, was “IEEE Access” published by IEEE. This journal was followed by “Lecture Notes in Educational Technology” published by Springer Nature, which had a total of 8 publications and 19 citations. The most cited article in this journal was “Big Data Learning Analytics: A new perspective”, which examined the significance of education big data and learning analytics. The “Educational Technology and Society” journal, with a total of 3 publications and 126 citations, was also highly relevant.

Research question two also determined the most productive authors in the area of education big data and learning analytics. The top fifteen authors were searched in the Scopus database. The most productive authors in education big data and learning analytics research, based on TC, are presented in Fig. 4. A summary of the authors, including Author Name, Year of First Publication, TP, h-Index, TC, Current Affiliation, and Country, is given in Table 2.

figure 4

The y-axis represents the author name and the x-axis represents the total number of citations.

Table 2 provides a summary of the most productive authors in education big data and learning analytics research. According to the Scopus database, the most productive author was Ben Williamson from the University of Edinburgh, UK, whose first publication in this area was in 2007, and who, at the time of investigation, had a total of 60 publications, 1700 citations, and an h-index of 25. The second most productive author was Hiroaki Ogata, from Japan, with a total of 371 publications, 3155 citations, and an h-index of 27, followed by Lynne D. Roberts from Australia, with 107 total publications, 1883 TC, and an h-index of 23. Of the top fifteen authors, Ryan Shaun Joazeiro de Baker from Columbia University, New York, United States, had the highest total number of citations, at 7752, and TP, at 278. The full list of the most productive authors in education big data and learning analytics research is provided in Table 2.
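The h-index values quoted above follow the standard definition: an author has index h when h of their papers have each been cited at least h times. The sketch below illustrates that definition in Python. The per-paper citation counts are hypothetical and are not taken from Table 2 or from Scopus.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts are sorted, so no later paper can qualify
    return h

# Hypothetical per-paper citation counts for one author (illustration only).
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))
```

Because the index depends on how citations are spread across papers, an author with more total citations can still have a lower h-index than another, which is why the table reports TP, TC, and h-index separately.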

Research question 3

The third research question sought to identify the countries with the most significant contributions to research on education big data and learning analytics. Figure 5 is a map that visualizes the most significant countries in this regard in terms of TP, according to the Scopus database. Table 3 presents a summary of the most significant countries in the education big data and learning analytics research area. The criteria listed for the analysis are Rank, Country, TP, and Most Significant Academic Institution.

figure 5

Map chart of the most significant countries in education big data and learning analytics according to total publications.

As shown in Table 3, the most significant country in the education big data and learning analytics research area was the United States, with a total of 59 publications, with City College of New York being the most significant research institution in this area. This was followed by the United Kingdom, with a total of 35 publications, with the University of Aberdeen being the most significant institution. China was ranked third, with a total of 22 publications, and Capital University of Economics and Business was its most significant institution in this area. The other countries that were most productive in this research area are set out in Table 3.

The study then measured the number of documents produced by each country, which is shown as a bar chart to provide a clearer view. As illustrated in Fig. 6, the highest number of documents was produced in the United States, followed by the United Kingdom, China, and India. Malaysia is also included in the top ten countries, being ranked ninth.

figure 6

The y-axis represents the number of publications and the x-axis represents the country name.

Next, this study examined co-authorship relationships between countries in education big data and learning analytics research using VOSviewer software. The country with the highest total co-authorship link strength was the United States, with 14 links involving 59 documents and 1544 TC. As shown in Fig. 7, the country with the second highest link strength was the United Kingdom, which also had 14 links with other countries, involving 35 documents and 752 TC. The map also shows other countries’ co-authorship relationships.

figure 7

The line linking each country represents its co-authorship relationship with other countries. The size of the circle shows the number of publications.

Research question 4

The fourth research question concerned the primary research keywords used in education big data and learning analytics research within the last decade. This study examined the co-occurrence of all keywords in the data associated with education big data and learning analytics. Figure 8 is a map based on the co-occurrence relationships of all keywords in the Scopus database. The keyword with the highest co-occurrence (Oc = 126) and a link strength of 485 was “Big Data”, followed by “Learning Analytics” (Oc = 89). Other keywords with high co-occurrence included “Machine Learning” (Oc = 38), “Data Analytics” (Oc = 36), “Education” (Oc = 36), “Data Mining” (Oc = 30), and “Learning Systems” (Oc = 28).

figure 8

The line linking each keyword represents its co-occurrence with other keywords. The size of the circle shows the number of occurrences.

The co-occurrence of author keywords was also analyzed, as mapped in Fig. 9. The keyword with the highest occurrence was “Big Data”, with 90 occurrences and 150 total links with other keywords, followed by “Learning Analytics” (Oc = 88). Other keywords, such as “Machine Learning” (Oc = 33), “Higher Education” (Oc = 32), “Data Analytics” (Oc = 14), and “Educational Data Mining” (Oc = 14), are also included in the map.

figure 9

The line linking each keyword represents its co-occurrence with other author keywords. The size of the circle shows the number of occurrences.
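To make the occurrence and link strength figures above more concrete, the sketch below shows one common way such counts can be derived from per-document keyword lists. It is an illustration under stated assumptions, not a reproduction of VOSviewer's internals: it assumes simple full counting (each keyword pair in a document adds 1 to that pair's link), and the three keyword lists are hypothetical rather than drawn from the Scopus export.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists for three documents (illustration only).
documents = [
    ["Big Data", "Learning Analytics", "Machine Learning"],
    ["Big Data", "Learning Analytics"],
    ["Big Data", "Data Mining"],
]

def cooccurrence(documents):
    """Count keyword occurrences and pairwise co-occurrence links across documents."""
    occurrences = Counter()
    links = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))          # count each keyword once per document
        occurrences.update(unique)
        links.update(combinations(unique, 2))   # every unordered pair in the document
    return occurrences, links

def total_link_strength(keyword, links):
    """Sum the weights of all links that involve the given keyword."""
    return sum(weight for pair, weight in links.items() if keyword in pair)

occurrences, links = cooccurrence(documents)
print(occurrences["Big Data"])                  # number of documents containing the keyword
print(links[("Big Data", "Learning Analytics")])
print(total_link_strength("Big Data", links))
```

In a map built this way, circle size would correspond to the occurrence count and line thickness to the pairwise link weight, which matches how the figure captions describe the maps.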

Research question 5

To answer the fifth research question, which sought to identify the most important subject area involving education big data and learning analytics, the data were analyzed according to subject area. As shown in Fig. 10, the largest share (34.6%), comprising 152 of the documents published, was in the Computer Science area. Computer Science thus appears to be the most relevant subject area regarding education big data and learning analytics. This was followed by 145 (33%) documents from the Social Sciences area. The fewest publications, at three documents, were in the Chemical Engineering area. The distribution of publications across the remaining subject areas is presented in Fig. 10.

figure 10

Pie chart showing the number of publications produced according to subject area.
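The counts and percentages reported for the two largest subject areas can be cross-checked against each other. The short Python sketch below back-calculates the total that each pair implies. The interpretation is an assumption rather than something the text states: both figures point to a base of roughly 439, which is larger than the 250 documents reviewed and would be consistent with a single document being assigned to more than one subject area.

```python
# (document count, reported percentage) for the two largest subject areas.
reported = {
    "Computer Science": (152, 34.6),
    "Social Sciences": (145, 33.0),
}

def implied_total(count, percent):
    """Total number of subject-area assignments implied by one count/percentage pair."""
    return count / (percent / 100)

totals = {area: implied_total(count, pct) for area, (count, pct) in reported.items()}
for area, total in totals.items():
    print(f"{area}: implied total of about {round(total)}")
```

Both pairs give the same rounded total, so the two reported percentages are at least mutually consistent.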

Based on the analysis of documents sourced from the Scopus database, all of the research questions have been answered in sequence. The distribution of publications on education big data and learning analytics between the years 2012 and 2021 shows an increasing pattern. This suggests that awareness of the importance of education big data and learning analytics is rising. Research by Şahin and Yurdugül (2020) supports this, indicating that education big data and learning analytics are two significant fields that can improve the e-learning environment. Interest in education big data and learning analytics has escalated because of the possibilities of advancement in many sectors. Yu and Couldry (2022) also stated that digital platforms and learning analytics are proliferating in the education industry. This is consistent with the increasing pattern in the distribution of education big data and learning analytics publications in the last decade.

Furthermore, the bibliometric analysis shows that the most relevant journals in education big data and learning analytics research were published by IEEE. The most frequently cited article was about the use of a data-driven approach for understanding learners’ behavior. This article argued that learning analytics is the most well-organized analytical method for advancing learning strategies (Al-Shabandar et al., 2018). This study also found that decision-making and learning methods can be expanded using big data in education and learning analytics. Big data implementations can fully realize the potential of personalized learning and improve teaching (Lutfiani and Meria, 2022). The article from the most prolific author in this analysis, Ben Williamson from the University of Edinburgh in the UK, discussed two important learning advancements that can be made using big data, namely instructing machines and the use of computerized choice systems to influence human judgments (Knox et al., 2020).

In addition, the analysis found that the most significant country in the area of education big data and learning analytics research was the United States, with the highest number of publications. City College of New York was the most significant academic institution in the United States. One study from the United States agrees that major impacts on educational practices can be seen by fully utilizing educational data mining and learning analytics (Baker and Inventado, 2014). Another study suggests that collaboration between the education system, industry players, and government entities in processing data analytics can facilitate the transition of technology to Industry 4.0 (Qin and Chiang, 2019). According to one of the most cited articles from the United States, employment and admittance screening, financial management, sponsorship tracking, and academic achievement evaluation are just a few of the administrative and educational applications that might benefit from big data principles and data analytics (Picciano, 2012). Hence, numerous advancements and improvements can be achieved with education big data and learning analytics, with collaboration from many sectors.

Moreover, the primary research keywords for education big data and learning analytics within the last decade were also analyzed in this study. Based on the results, “Big Data” and “Learning Analytics” were the most frequently used keywords in this research area. Both keywords also had high co-occurrence with all other keywords. Research on potential ways to optimize e-learning agrees that big data and learning analytics play a crucial role in the future of higher education (García and Secades, 2013). Learning analytics has made an important contribution to the education field by producing reliable projections of academic achievement based on assessments of the educational process (Tempelaar et al., 2021). Over the last 5 years, educational big data and learning analytics research has increasingly focused on classifying and identifying students’ behavior (Lemay et al., 2021).
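To make the idea of keyword occurrence and co-occurrence concrete, the following is a minimal Python sketch. It assumes each publication is represented by its list of author keywords; the function name and the three sample records are hypothetical and invented purely for illustration, not taken from the study's Scopus data or from the tool the authors used.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count keyword occurrences and pairwise co-occurrences.

    records: iterable of keyword lists, one list per publication.
    Returns two Counters: (occurrences, cooccurrences).
    """
    occurrences = Counter()
    cooccurrences = Counter()
    for keywords in records:
        # Normalise case and drop duplicates within a single publication.
        unique = sorted({k.strip().lower() for k in keywords})
        occurrences.update(unique)
        # Every unordered keyword pair in the same publication co-occurs once.
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences


# Hypothetical sample records, invented for illustration only.
sample = [
    ["Big Data", "Learning Analytics", "Higher Education"],
    ["Learning Analytics", "Educational Data Mining"],
    ["Big Data", "Learning Analytics", "Machine Learning"],
]

occ, co = keyword_cooccurrence(sample)
print(occ.most_common(2))
print(co[("big data", "learning analytics")])
```

Pairs are stored in alphabetical order, so a lookup must use the same ordering, for example `("big data", "learning analytics")`.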

This study also identified the most important subject area involving education big data and learning analytics, namely computer science. Computer science, education, statistics, and other social sciences fields are brought together by educational data science to investigate and comprehend theoretical and practical phenomena (Daniel, 2016 ). Educational data mining also involves the combination of computer science, education, and statistics to better comprehend learning, administration processes, and research issues in higher education (Ray and Saeed, 2018 ).

In summary, based on the present study’s analysis, the trends and recommendations of education big data and learning analytics include acting as a system for early detection that recognizes students who are at risk for academic failure or dropout, helping to provide dashboards for learning analytics, enabling amalgamation with Artificial Intelligence (AI) and machine learning, and providing future orientation in education. The increasing awareness of the importance of education big data and learning analytics is beneficial in enabling early detection of declining student performance based on the availability of big data sources. Dashboards and data visualization using learning analytics can greatly help to analyse complex data to produce insights for prevention and measures to be taken for specific problems. The simplification of data visualization from learning analytics is the future trend in the process of enabling amalgamation with AI and machine learning. The integration of AI and machine learning in the education system could bring a huge impact. As an example, AI and machine learning can create new opportunities for automated evaluation and personalized critiques, and enable smart teaching systems. All of these trends in fully utilizing big data and learning analytics are likely to continue and escalate over time.
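As a rough illustration of what an early-detection rule might look like, the sketch below flags a student when recent assessment scores are low or steadily falling. Everything here is an assumption made for illustration: the function name, the pass mark of 50, the three-assessment window, and the sample scores are hypothetical, and real systems described in the literature typically rely on richer data and trained predictive models rather than a fixed rule.

```python
from statistics import mean


def flag_at_risk(scores, pass_mark=50.0, window=3):
    """Return True when the last `window` scores average below `pass_mark`
    or decline strictly from one assessment to the next."""
    recent = scores[-window:]
    if len(recent) < window:
        return False  # too few assessments to judge
    declining = all(a > b for a, b in zip(recent, recent[1:]))
    return mean(recent) < pass_mark or declining


# Hypothetical score histories, invented for illustration only.
students = {
    "s01": [72, 70, 74, 75],  # stable
    "s02": [80, 71, 63, 55],  # passing but steadily declining
    "s03": [48, 52, 45, 41],  # recent average below the pass mark
}

at_risk = sorted(sid for sid, sc in students.items() if flag_at_risk(sc))
print(at_risk)
```

A dashboard could surface the flagged identifiers to instructors, who would then decide whether an intervention is appropriate.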

Conclusion and implications

The new era involves enormous amounts of data, which should be fully utilized to advance and enhance traditional systems in many sectors, especially education. This study involved a bibliometric analysis of 250 publications on education big data and learning analytics. The increasing number of publications on these topics during the 10 years from 2012 to 2021 implies that their importance has been acknowledged. Most of the relevant journals were published by IEEE, which has made a significant contribution to the field. The most productive author to date is Ben Williamson, currently affiliated with the University of Edinburgh, UK, with a total of 1700 citations since his first publication in this area in 2007. This study also found that the United States, with a total of 59 publications, is the most significant country in this research area, with City College of New York as its most prolific institution. The United States also had the highest total link strength of co-authorship on education big data and learning analytics. Because the United States is a large country, its influence in exposing the importance of using education big data and learning analytics worldwide is highly significant. “Big Data” and “Learning Analytics” were the keywords with the highest occurrence in the publications analyzed in this study. Most of the journals used these two keywords to describe research on education big data and learning analytics. Computer Science and Social Sciences appear to be the most important subject areas, and both play important parts in making full use of the benefits of education big data and learning analytics. Other subject areas were also involved, such as engineering, mathematics, arts and humanities, psychology, and many more.
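The notion of total link strength of co-authorship can be sketched as follows. This sketch assumes that the link strength between two countries is the number of publications they co-author and that a country's total link strength is the sum of its link strengths; the article itself does not spell out the formula, so treat this as one plausible reading. The function name and the sample author-country lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def total_link_strength(papers):
    """Compute pairwise and total co-authorship link strengths.

    papers: iterable of country lists, one list per publication.
    Returns (links, totals), both Counters.
    """
    links = Counter()
    for countries in papers:
        # Each distinct pair of countries on one paper adds one to their link.
        for a, b in combinations(sorted(set(countries)), 2):
            links[(a, b)] += 1
    totals = Counter()
    for (a, b), strength in links.items():
        totals[a] += strength
        totals[b] += strength
    return links, totals


# Hypothetical author-country lists, invented for illustration only.
papers = [
    ["United States", "United Kingdom"],
    ["United States", "China", "Australia"],
    ["United Kingdom", "Australia"],
    ["United States", "China"],
    ["United States"],  # single-country paper contributes no link
]

links, totals = total_link_strength(papers)
print(totals.most_common(1))
```

Single-country publications add to a country's publication count but not to its link strength, which is why the two rankings can differ.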

The main conclusion that can be drawn from this study is that big data and learning analytics are currently important skills to master. Both could bring many significant benefits, such as improving the e-learning environment, understanding learners’ behavior, and advancing learning strategies. Learning techniques can be enhanced to achieve the best-structured analytical strategy by employing learning analytics. The trend of making use of big data and learning analytics is certainly growing, especially in the field of education. In the future, new teaching and learning impacts gained from education big data and learning analytics could accelerate the trend toward personalized learning, predictive analytics, and adaptive learning, and enable data-driven decision-making. In the coming years, education big data and learning analytics will remain significant in the field of education.

Limitations

One limitation of this study is its limited information access, as it used only Scopus to identify publications for the bibliometric analysis. Other databases, such as Springer Link, IEEE Xplore Digital Library, or Web of Science, might have provided different insights and produced different results. In addition, the results could have been narrowed down, and thus made more accurate, if more specific keywords had been used. The keywords used were “Education Big Data” and “Learning Analytics”. The depth of analysis could also have been increased if more keywords related to education big data and learning analytics had been used, such as “Big Data Analytics”, “Educational Data Mining”, and “Deep Learning”.

Data availability

All data sets are available upon request.

Al-Shabandar R, Hussain AJ, Liatsis P, Keight R (2018) Analyzing learners behavior in MOOCs: An examination of performance and motivation using a data-driven approach. IEEE Access 6:73669–73685

Baker RS, Inventado PS (2014) Educational Data Mining and Learning Analytics. In: pp. 61–75. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3305-7_4

Daniel BK (2016) Big data and learning analytics in higher education: current theory and practice. Springer International Publishing, Switzerland

Daniel BK (2017) Big data in higher education: the big picture. In Big data and learning analytics in higher education, Springer, Cham. p 19–28

Drigas AS, Leliopoulos P (2014) The use of big data in education. Int J Comput Sci Issues 11(5):58

García OA, Secades VA (2013) Big Data & learning analytics: a potential way to optimize elearning technological tools. In: International Association for Development of the Information Society International conference e-learning

Knox J, Williamson B, Bayne S (2020) Machine behaviourism: future visions of “learnification” and “datafication” across humans and digital technologies. Learn Media Technol 45(1):31–45

Lemay DJ, Baek C, Doleck T (2021) Comparison of learning analytics and educational data mining: a topic modeling approach. Comput Educ: Artif Intell 2:100016

Lutfiani N, Meria L (2022) Utilization of big data in educational technology research. Int Trans Educ Technol 1(1):73–83

Michalik P, Štofa J, Zolotova I (2014) Concept definition for Big Data architecture in the education system. Paper presented at the 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI), IEEE, pp 331–334

Mkrttchian V, Gamidullaeva L, Finogeev A, Chernyshenko S, Chernyshenko V, Amirov D et al. (2021) Big data and internet of things (IoT) technologies’ influence on higher education: current state and future prospects. Int J Web-Based Learn Teach Technol 16(5):137–157

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD et al. (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg 88:105906

Picciano AG (2012) The evolution of big data and learning analytics in American higher education. J Asynchronous Learn Netw 16(3):9–20

Qin SJ, Chiang LH (2019) Advances and opportunities in machine learning for process data analytics. Comput Chem Engin 126:465–473

Ray S, Saeed M (2018) Applications of educational data mining and learning analytics tools in handling big data in higher education. In Alani M, Tawfik H, Saeed M, Anya O (eds) Applications of big data analytics. Springer, Cham

Şahin M, Yurdugül H (2020) Educational data mining and learning analytics: past, present and future. Bartın Univ J Fac Educ 9(1):121–131

Sghir N, Adadi A, Lahmer M (2023) Recent advances in predictive learning analytics: a decade systematic review (2012–2022). Educ Inform Technol 28(7):8299–8333

Soegoto H, Soegoto ES, Luckyardi S, Rafdhi AA (2022) A bibliometric analysis of management bioenergy research using Vosviewer application. Indones J Sci Technol 7(1):89–104

Talan T, Demirbilek M (2023) Bibliometric analysis of research on learning analytics based on web of science database. Inform Educ 22(1):161–181

Tempelaar D, Rienties B, Nguyen Q (2021) The contribution of dispositional learning analytics to precision education. Educ Technol Soc 24(1):109–122

Wang Y (2016) Big opportunities and big concerns of big data in education. TechTrends 60(4):381–384

Yu J, Couldry N (2022) Education as a domain of natural data extraction: analysing corporate discourse about educational tracking. Inform Commun Soc 25(1):127–144

Acknowledgements

This work was funded by the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme (FRGS/1/2020/SSI0/UTM/02/8).

Author information

Authors and affiliations.

School of Education, Faculty of Social Sciences and Humanities, Universiti Teknologi Malaysia, 81310 UTM, Johor Bahru, Johor, Malaysia

Shaza Arissa Samsul, Noraffandy Yahaya & Hassan Abuhassna

Contributions

All authors contributed to the research conception and design. The introduction and methodology were done by NY. The methodology and analysis were also planned and performed by HA. The full draft of the manuscript, including all parts, was analyzed and written by SAS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shaza Arissa Samsul .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Overall data from Scopus database

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Samsul, S.A., Yahaya, N. & Abuhassna, H. Education big data and learning analytics: a bibliometric analysis. Humanit Soc Sci Commun 10 , 709 (2023). https://doi.org/10.1057/s41599-023-02176-x

Received : 10 October 2022

Accepted : 21 September 2023

Published : 16 October 2023

DOI : https://doi.org/10.1057/s41599-023-02176-x

REVIEW article

Challenges and future directions of big data and artificial intelligence in education.

Hui Luan

  • 1 Institute for Research Excellence in Learning Sciences, National Taiwan Normal University, Taipei, Taiwan
  • 2 National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
  • 3 School of Dentistry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
  • 4 Graduate School of Education, Rutgers – The State University of New Jersey, New Brunswick, NJ, United States
  • 5 Apprendis, LLC, Berlin, MA, United States
  • 6 Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science, National Central University, Taoyuan City, Taiwan
  • 7 Graduate School of Informatics, Kyoto University, Kyoto, Japan
  • 8 Department of Electrical Engineering, College of Technology and Engineering, National Taiwan Normal University, Taipei, Taiwan
  • 9 Centro de Tecnologia, Universidade Federal de Santa Maria, Santa Maria, Brazil
  • 10 Department of Chinese and Bilingual Studies, Faculty of Humanities, The Hong Kong Polytechnic University, Kowloon, Hong Kong
  • 11 Program of Learning Sciences, National Taiwan Normal University, Taipei, Taiwan

We discuss the new challenges and directions facing the use of big data and artificial intelligence (AI) in education research, policy-making, and industry. In recent years, applications of big data and AI in education have made significant headway. This highlights a novel trend in leading-edge educational research. The convenience and embeddedness of data collection within educational technologies, paired with computational techniques, have made the analysis of big data a reality. We are moving beyond proof-of-concept demonstrations and applications of techniques, and are beginning to see substantial adoption in many areas of education. The key research trends in the domains of big data and AI are associated with assessment, individualized learning, and precision education. Model-driven data analytics approaches will grow quickly to guide the development, interpretation, and validation of the algorithms. However, conclusions from educational analytics should, of course, be applied with caution. At the education policy level, the government should be devoted to supporting lifelong learning, offering teacher education programs, and protecting personal data. With regard to the education industry, reciprocal and mutually beneficial relationships should be developed in order to enhance academia-industry collaboration. Furthermore, it is important to make sure that technologies are guided by relevant theoretical frameworks and are empirically tested. Lastly, in this paper we advocate an in-depth dialog between supporters of “cold” technology and “warm” humanity so that it can lead to greater understanding among teachers and students about how technology, and specifically the big data explosion and AI revolution, can bring new opportunities (and challenges) that can be best leveraged for pedagogical practices and learning.

Introduction

The purpose of this position paper is to present the current status, opportunities, and challenges of big data and AI in education. The work originated from the opinions and panel discussion minutes of an international conference on big data and AI in education (The International Learning Sciences Forum, 2019), where prominent researchers and experts from disciplines such as education, psychology, data science, AI, and cognitive neuroscience exchanged their knowledge and ideas. This article is organized as follows: we start with an overview of recent progress in big data and AI in education. Then we present the major challenges and emerging trends. Finally, based on our discussion of big data and AI in education, we offer conclusions and suggest future directions.

Rapid advancements in big data and artificial intelligence (AI) technologies have had a profound impact on all areas of human society including the economy, politics, science, and education. Thanks in large part to these developments, we are able to continue many of our social activities under the COVID-19 pandemic. Digital tools, platforms, applications, and the communications among people have generated vast amounts of data (‘big data’) across disparate locations. Big data technologies aim at harnessing the power of extensive data in real-time or otherwise ( Daniel, 2019 ). The characteristic attributes of big data are often referred to as the four V’s. That is, volume (amount of data), variety (diversity of sources and types of data), velocity (speed of data transmission and generation), and veracity (the accuracy and trustworthiness of data) ( Laney, 2001 ; Schroeck et al., 2012 ; Geczy, 2014 ). Recently, a 5th V was added, namely value (i.e., that data could be monetized; Dijcks, 2013 ). Because of intrinsic big data characteristics (the five Vs), large and complex datasets are impossible to process and utilize by using traditional data management techniques. Hence, novel and innovative computational technologies are required for the acquisition, storage, distribution, analysis, and management of big data ( Lazer et al., 2014 ; Geczy, 2015 ). Big data analytics commonly encompasses the processes of gathering, analyzing, and evaluating large datasets. Extraction of actionable knowledge and viable patterns from data are often viewed as the core benefits of the big data revolution ( Mayer-Schönberger and Cukier, 2013 ; Jagadish et al., 2014 ). Big data analytics employ a variety of technologies and tools, such as statistical analysis, data mining, data visualization, text analytics, social network analysis, signal processing, and machine learning ( Chen and Zhang, 2014 ).

As a subset of AI, machine learning focuses on building computer systems that can learn from and adapt to data automatically without explicit programming ( Jordan and Mitchell, 2015 ). Machine learning algorithms can provide new insights, predictions, and solutions to customize the needs and circumstances of each individual. With the availability of large quantity and high-quality input training data, machine learning processes can achieve accurate results and facilitate informed decision making ( Manyika et al., 2011 ; Gobert et al., 2012 , 2013 ; Gobert and Sao Pedro, 2017 ). These data-intensive, machine learning methods are positioned at the intersection of big data and AI, and are capable of improving the services and productivity of education, as well as many other fields including commerce, science, and government.

Regarding education, our main area of interest here, the application of AI technologies can be traced back to approximately 50 years ago. The first Intelligent Tutoring System “SCHOLAR” was designed to support geography learning, and was capable of generating interactive responses to student statements ( Carbonell, 1970 ). While the amount of data was relatively small at that time, it was comparable to the amount of data collected in other traditional educational and psychological studies. Research on AI in education over the past few decades has been dedicated to advancing intelligent computing technologies such as intelligent tutoring systems ( Graesser et al., 2005 ; Gobert et al., 2013 ; Nye, 2015 ), robotic systems ( Toh et al., 2016 ; Anwar et al., 2019 ), and chatbots ( Smutny and Schreiberova, 2020 ). With the breakthroughs in information technologies in the last decade, educational psychologists have had greater access to big data. Concretely speaking, social media (e.g., Facebook, Twitter), online learning environments [e.g., Massive Open Online Courses (MOOCs)], intelligent tutoring systems (e.g., AutoTutor), learning management systems (LMSs), sensors, and mobile devices are generating ever-growing amounts of dynamic and complex data containing students’ personal records, physiological data, learning logs and activities, as well as their learning performance and outcomes ( Daniel, 2015 ). Learning analytics, described as “the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs” ( Long and Siemens, 2011 , p. 34), are often implemented to analyze these huge amounts of data ( Aldowah et al., 2019 ). Machine learning and AI techniques further expand the capabilities of learning analytics ( Zawacki-Richter et al., 2019 ). 
The essential information extracted from big data could be utilized to optimize learning, teaching, and administration ( Daniel, 2015 ). Hence, research on big data and AI is gaining increasing significance in education ( Johnson et al., 2011 ; Becker et al., 2017 ; Hwang et al., 2018 ) and psychology ( Harlow and Oswald, 2016 ; Yarkoni and Westfall, 2017 ; Adjerid and Kelley, 2018 ; Cheung and Jak, 2018 ). Recently, the adoption of big data and AI in the psychology of learning and teaching has been trending as a novel method in cutting-edge educational research ( Daniel, 2015 ; Starcic, 2019 ).

The Position Formulation

A growing body of literature has attempted to uncover the value of big data at different education levels, from preschool to higher education ( Chen N.-S. et al., 2020 ). Several journal articles and book chapters have presented retrospective descriptions and the latest advances in the rapidly expanding research area from different angles, including systematic literature review ( Zawacki-Richter et al., 2019 ; Quadir et al., 2020 ), bibliometric study ( Hinojo-Lucena et al., 2019 ), qualitative analysis ( Malik et al., 2019 ; Chen L. et al., 2020 ), and social network analysis ( Goksel and Bozkurt, 2019 ). More details can be found in the previously mentioned reviews. In this paper, we aim at presenting the current progress of the application of big data and AI in education. By and large, the research on the learner side is devoted to identifying students’ learning and affective behavior patterns and profiles, improving methods of assessment and evaluation, predicting individual students’ learning performance or dropouts, and providing adaptive systems for personalized support ( Papamitsiou and Economides, 2014 ; Zawacki-Richter et al., 2019 ). On the teacher side, numerous studies have attempted to enhance course planning and curriculum development, evaluation of teaching, and teaching support ( Zawacki-Richter et al., 2019 ; Quadir et al., 2020 ). Additionally, teacher dashboards, such as Inq-Blotter, driven by big data techniques are being used to inform teachers’ instruction in real time while students simultaneously work in Inq-ITS ( Gobert and Sao Pedro, 2017 ; Mislevy et al., 2020 ). Big data technologies employing learning analytics and machine learning have demonstrated high predictive accuracy of students’ academic performance ( Huang et al., 2020 ). Only a small number of studies have focused on the effectiveness of learning analytics programs and AI applications. 
However, recent findings have revealed encouraging results in terms of improving students’ academic performance and retention, as well as supporting teachers in learning design and teaching strategy refinement ( Viberg et al., 2018 ; Li et al., 2019 ; Sonderlund et al., 2019 ; Mislevy et al., 2020 ).

Despite the growing number of reports and methods outlining implementations of big data and AI technologies in educational environments, we see a notable gap between contemporary technological capabilities and their utilization for education. The fast-growing education industry has developed numerous data processing techniques and AI applications, which may not be guided by current theoretical frameworks and research findings from psychology of learning and teaching. The rapid pace of technological progress and relatively slow educational adoption have contributed to the widening gap between technology readiness and its application in education ( Macfadyen, 2017 ). There is a pressing need to reduce this gap and stimulate technological adoption in education. This work presents varying viewpoints and their controversial issues, contemporary research, and prospective future developments in adoption of big data and AI in education. We advocate an interdisciplinary approach that encompasses educational, technological, and governmental spheres of influence. In the educational domain, there is a relative lack of knowledge and skills in AI and big data applications. On the technological side, few data scientists and AI developers are familiar with the advancements in education psychology, though this is changing with the advent of graduate programs at the intersection of Learning Sciences and Computer Science. Finally, in terms of government policies, the main challenges faced are the regulatory and ethical dilemmas between support of educational reforms and restrictions on adoptions of data-oriented technologies.

An Interdisciplinary Approach to Educational Adoption of Big Data and AI

In response to the new opportunities and challenges that the big data explosion and AI revolution are bringing, academics, educators, policy-makers, and professionals need to engage in productive collaboration. They must work together to cultivate our learners’ necessary competencies and essential skills important for the 21st century work, driven by the knowledge economy ( Bereiter, 2002 ). Collaboration across diverse disciplines and sectors is a demanding task—particularly when individual sides lack a clear vision of their mutually beneficial interests and the necessary knowledge and skills to realize that vision. We highlight several overlapping spheres of interest at the intersection of research, policy-making, and industry engagements. Researchers and the industry would benefit from targeted educational technology development and its efficient transfer to commercial products. Businesses and governments would benefit from legislature that stimulates technology markets while suitably protecting data and users’ privacy. Academics and policy makers would benefit from prioritizing educational reforms enabling greater adoption of technology-enhanced curricula. The recent developments and evolving future trends at intersections between researchers, policy-makers, and industry stakeholders arising from advancements and deployments of big data and AI technologies in education are illustrated in Figure 1 .

Figure 1. Contemporary developments and future trends at the intersections between research, policy, and industry driven by big data and AI advances in education.

The constructive domains among stakeholders progressively evolve along with scientific and technological developments. Therefore, it is important to reflect on longer-term projections and challenges. The following sections highlight the novel challenges and future directions of big data and AI technologies at the intersection of education research, policy-making, and industry.

Big Data and AI in Education: Research

An understanding of individual differences is critical for developing pedagogical tools to target specific students and to tailor education to individual needs at different stages. Intelligent educational systems employing big data and AI techniques are capable of collecting accurate and rich personal data. Data analytics can reveal students’ learning patterns and identify their specific needs ( Gobert and Sao Pedro, 2017 ; Mislevy et al., 2020 ). Hence, big data and AI have the potential to realize individualized learning to achieve precision education ( Lu et al., 2018 ). We see the following emerging trends, research gaps, and controversies in integrating big data and AI into education research so that there is a deep and rigorous understanding of individual differences that can be used to personalize learning in real time and at scale.

(1) Education is progressively moving from a one-size-fits-all approach to precision education or personalized learning ( Lu et al., 2018 ; Tsai et al., 2020 ). The one-size-fits-all approach was designed for average students, whereas precision education takes into consideration the individual differences of learners in their learning environments, along with their learning strategies. The main idea of precision education is analogous to “precision medicine,” where researchers harvest big data to identify patterns relevant to specific patients such that prevention and treatment can be customized. Based on the analysis of student learning profiles and patterns, precision education predicts students’ performance and provides timely interventions to optimize learning. The goal of precision education is to improve the diagnosis, prediction, treatment, and prevention of learning outcomes ( Lu et al., 2018 ). Contemporary research gaps related to adaptive tools and personalized educational experiences are impeding the transition to precision education. Adaptive educational tools and flexible learning systems are needed to accommodate individual learners’ interaction, pace, and learning progress, and to fit the specific needs of the individual learners, such as students with learning disabilities ( Xie et al., 2019 ; Zawacki-Richter et al., 2019 ). Hence, as personalized learning is customized for different people, researchers are able to focus on individualized learning that is adaptive to individual needs in real time ( Gobert and Sao Pedro, 2017 ; Lu et al., 2018 ).

(2) The research focus on deploying AI in education is gradually shifting from a computational focus that demonstrates use cases of new technology to a cognitive focus that incorporates cognition in its design, such as perception (VanRullen, 2017), emotion (Song et al., 2016), and cognitive thinking (Bramley et al., 2017). Moreover, it is also shifting from a single domain (e.g., domain expertise, or expert systems) to a cross-disciplinary approach through collaboration (Spikol et al., 2018; Krouska et al., 2019) and domain transfers (L’heureux et al., 2017). These controversial shifts are facilitating transitions from the knowing of the unknown (gaining insights through reasoning) to the unknown of the unknown (figuring out hidden values and unknown results through algorithms) (Abed Ibrahim and Fekete, 2019; Cutumisu and Guo, 2019). In other words, deterministic learning, aimed at deductive/inductive reasoning and inference engines, predominated in traditional expert systems and old AI, whereas dynamic and stochastic learning, the outcome of which involves some randomness and uncertainty, is gradually becoming the trend in modern machine learning techniques.

(3) The format of machine-generated data and the purpose of machine learning algorithms should be carefully designed. There is a notable gap between theoretical design and its applicability. A theoretical model is needed to guide the development, interpretation, and validation of algorithms ( Gobert et al., 2013 ; Hew et al., 2019 ). The outcomes of data analytics and algorithmically generated evidence must be shared with educators and applied with caution. For instance, efforts to algorithmically detect mental states such as boredom, frustration, and confusion ( Baker et al., 2010 ) must be supported by the operational definitions and constructs that have been prudently evaluated. Additionally, the affective data collected by AI systems should take into account the cultural differences combined with contextual factors, teachers’ observations, and students’ opinions ( Yadegaridehkordi et al., 2019 ). Data need to be informatively and qualitatively balanced, in order to avoid implicit biases that may propagate into algorithms trained on such data ( Staats, 2016 ).

(4) There are ethical and algorithmic challenges when balancing human-provided learning and machine-assisted learning. The significant influence of AI and contemporary technologies is a double-edged sword ( Khechine and Lakhal, 2018 ). On the one hand, it facilitates better usability and drives progress. On the other, it might lead to algorithmic bias and the loss of certain essential skills among students who rely extensively on technology. For instance, in creativity- or experience-based learning, technology may even become an obstacle to learning, since it may hinder students from attaining first-hand experiences and participating in the learning activities ( Cuthbertson et al., 2004 ). Appropriately balancing technology adoption and human involvement in various educational contexts will be a challenge in the foreseeable future. Nonetheless, the convergence of human and machine learning has the potential for highly effective teaching and learning beyond the simple “sum of the parts of human and artificial intelligence” ( Topol, 2019 ).

(5) Algorithmic bias is another controversial issue ( Obermeyer et al., 2019 ). Since modern AI algorithms rely extensively on data, their performance is governed solely by data. Algorithms adapt to the inherent qualitative and quantitative characteristics of data. For example, if the data are unbalanced and contain disproportionately better information on students from the general population than on minorities, the algorithms may produce systematic and repeatable errors that disadvantage minorities. These issues need to be addressed before AI is widely implemented in educational practice, since every single student is precious. More rigorous studies and validation in real learning environments are required, though work along these lines is being done ( Sao Pedro et al., 2013 ).

(6) The fast expansion of technology and inequalities of learning opportunities have aroused great controversy. Due to the exponential nature of technological progress, particularly the big data and AI revolution, a fresh paradigm and new learning landscape are on the horizon. For instance, the elite smartphone 10 years ago, in 2010, was the BlackBerry. Today, 10 years later, even in sub-Saharan Africa, 75% of the population has mobile phones several generations more advanced ( GSMA Intelligence, 2020 ). Hence, the entry barriers are shifting from the technical requirements to the willingness of and/or need for adoption. This has been clearly demonstrated during the COVID-19 pandemic. The need for social distancing and continuing education has led to online/e-learning deployments within months ( United Nations, 2020 ). A huge amount of learning data is created accordingly. The extraction of meaningful patterns and the discovery of knowledge from these data are expected to be carried out through learning analytics and AI techniques. Inevitably, the current learning cultures, learning experiences, and classroom dynamics are changing as “we live algorithmic lives” ( Bucher, 2018 ). Thus, there is a critical need to adopt proper learning theories of educational psychology and to encourage our learners to be active participants rather than passive recipients or merely tracked objects ( Loftus and Madden, 2020 ). For example, under the constructionist framework ( Tsai, 2000 ), technology-enhanced or AI-powered education may empower students to know their learning activities and patterns, predict their possible learning outcomes, and strategically regulate their learning behavior ( Koh et al., 2014 ; Loftus and Madden, 2020 ). On the other hand, in the era of information explosion and AI revolution, disadvantaged students and developing countries are indeed facing a wider digital divide. To reduce the inequalities and bring more opportunities, cultivating young people’s competencies seems to be one of the most promising means ( UNESCO, 2015 ). Meanwhile, overseas support from international organizations such as the World Bank and UNESCO is imperative for developing countries in establishing their communication infrastructure (e.g., hardware, software, connectivity, electricity). Naturally, technology will not replace or hinder human learning; rather, a smart use of new technologies will facilitate the transfer and acquisition of knowledge ( Azevedo et al., 2019 ).
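
The unbalanced-data scenario described in point (5) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and does not come from the studies cited above: the synthetic scores, the group sizes (950 versus 50), the group-specific pass thresholds, and the hypothetical helper names `make_group`, `error_rate`, and `fit_cutoff`. A single cut-off is fitted to the pooled data, so it is dominated by the larger group and its errors fall entirely on the smaller one.

```python
def make_group(n, pass_threshold):
    """Return n synthetic (score, passed) pairs with evenly spaced scores in (0, 1).

    The pass threshold is a hypothetical group-specific property."""
    scores = [(i + 0.5) / n for i in range(n)]
    return [(s, s >= pass_threshold) for s in scores]


def error_rate(data, cutoff):
    """Fraction of students whose predicted outcome (score >= cutoff) is wrong."""
    wrong = sum((score >= cutoff) != passed for score, passed in data)
    return wrong / len(data)


def fit_cutoff(data):
    """Choose the single cut-off (in steps of 0.01) with the lowest pooled error."""
    candidates = [i / 100 for i in range(101)]
    return min(candidates, key=lambda c: error_rate(data, c))


# Unbalanced training data: 950 majority students, 50 minority students.
majority = make_group(950, pass_threshold=0.5)
minority = make_group(50, pass_threshold=0.3)

# One model for everyone, fitted on the pooled data.
cutoff = fit_cutoff(majority + minority)
majority_error = error_rate(majority, cutoff)
minority_error = error_rate(minority, cutoff)

print(f"cutoff={cutoff:.2f} "
      f"majority_error={majority_error:.2f} "
      f"minority_error={minority_error:.2f}")
# prints: cutoff=0.50 majority_error=0.00 minority_error=0.20
```

In this sketch the pooled error is only 1%, which would look acceptable in an aggregate report, yet one in five minority students is misclassified. Reporting error rates per group, rather than only overall, is one simple way to make such systematic errors visible.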

An overarching theme from the above trends of research is that we need theories of cognitive and educational psychology to guide our understanding of the individual learner (and individual differences), in order to develop the best tools, algorithms, and practices for personalized learning. Take, for example, VR (virtual reality) or AR (augmented reality) as a fast-developing technology for education. The industry has developed many different types of VR/AR applications (e.g., Google Expeditions with over 100 virtual field trips), but these have typically been developed from the industry’s perspective (see further discussion below) and may not be informed by theories and data from educational psychology about how students actually learn. To make VR/AR effective learning tools, we must separate the technological features from the human experiences and abilities (e.g., cognitive, linguistic, spatial abilities of the learner; see Li et al., 2020 ). For example, VR provides a high-fidelity 3D real-life virtual environment, and the technological tools are built on the assumption that 3D realism enables the learner to gain ‘perceptual grounding’ during learning (e.g., having access to visual, auditory, tactile experiences as in the real world). Following the ‘embodied cognition’ theory ( Barsalou, 2008 ), we should expect VR learning to yield better learning outcomes compared with traditional classroom learning. However, empirical data suggest that there are significant individual differences in that some students benefit more than others from VR learning. It may be that individuals with higher cognitive and perceptual abilities need no additional visuospatial information (provided in VR) to succeed in learning. In any case, we need to understand how embodied experiences (provided by the technology) interact with different learners’ inherent abilities (as well as their prior knowledge and background) for the best application of the relevant technology in education.

Big Data and AI in Education: Policy-Making

Following the revolution triggered by breakthroughs in big data and AI technology, policy-makers have attempted to formulate strategies and policies regarding how to incorporate AI and emerging technologies into primary, secondary, and tertiary education ( Pedró et al., 2019 ). Major challenges must be overcome in order to suitably integrate big data and AI into educational practice. The following three segments highlight pertinent policy-oriented challenges, gaps, and evolving trends.

(1) In digitally-driven knowledge economies, traditional formal education systems are undergoing drastic changes or even a paradigm shift ( Peters, 2018 ). Lifelong learning is quickly being adopted and implemented through online or project-based learning schemes that incorporate multiple ways of teaching ( Lenschow, 1998 ; Sharples, 2000 ; Field, 2001 ; Koper and Tattersall, 2004 ). This new concept of continual education will require micro-credits or micro-degrees to sustain learners’ efforts ( Manuel Moreno-Marcos et al., 2019 ). The need to change the scope and role of education will become evident in the near future ( Williams, 2019 ). For example, in the next few years, new instruction methods, engagement, and assessment will need to be developed in formal education to support lifelong education. The system should be based on micro-credits or micro-degrees.

(2) Solutions for integrating cutting-edge research findings, innovative theory-driven curricula, and emerging technologies into students’ learning are evidently beneficial, and perhaps even ready for adoption. However, pre-service and in-service teachers apparently diverge in their willingness to support and adopt these emerging technologies ( Pedró et al., 2019 ). Pre-service teachers have greater exposure to modern technologies and, in general, are more willing to adopt them. In-service teachers have greater practical experience and tend to rely more on it. To bridge the gap, effective teacher education programs and continuing education programs have to be developed and offered to support the adoption of these new technologies so that they can be implemented with fidelity ( O’Donnell, 2008 ). This issue could become even more pressing in light of the extended period of the COVID-19 pandemic.

(3) A suitable legislative framework is needed to protect personal data from unscrupulous collection, unauthorized disclosure, commercial exploitation, and other abuses ( Boyd and Crawford, 2012 ; Pardo and Siemens, 2014 ). Education records and personal data are highly sensitive. There are significant risks associated with students’ educational profiles, records, and other personal data. Appropriate security measures must be adopted by educational institutions. Commercial educational system providers are actively exploiting both legislative gaps and concealed data acquisition channels. Increasing numbers of industry players are implementing data-oriented business models ( Geczy, 2018 ). There is a vital role to play for legislative, regulatory, and enforcing bodies at both the national and local levels. It is pertinent that governments enact, implement, and enforce privacy and personal data protection legislation and measures. In doing so, there is a need to strike a proper balance between desirable use of personal data for educational purposes and undesirable commercial monetization and abuse of personal data.

Big Data and AI in Education: Industry

As scientific and academic aspects of big data and AI in education have their unique challenges, so does the commercialization of educational tools and systems ( Renz et al., 2020 ). Numerous countries have attempted to stimulate innovation-based growth through enhancing technology transfer and fostering academia-industry collaboration ( Huggins and Thompson, 2015 ). In the United States, this was initiated by the Bayh-Dole Act ( Mowery et al., 2001 ). Building a reciprocal and sustained partnership is strongly encouraged. It facilitates technology transfers and strengthens the links between academia and the education industry. There are several points to be considered when approaching academia-industry collaboration. It is important that collaboration is mutually beneficial. The following points highlight the overlapping spheres of benefits for both educational commerce and academia. They also expose existing gaps and future prospects.

(1) Commercializing intelligent educational tools and systems that include the latest scientific and technological advances can provide educators with tools for developing more effective curricula, pedagogical frameworks, assessments, and programs. Timely release of educational research advances onto commercial platforms is desirable for vendors from development, marketing, and revenue perspectives ( Renz and Hilbig, 2020 ). Implementation of the latest research enables progressive development of commercial products and distinctive differentiation for marketing purposes. This could also potentially close the significant gap between what the industry knows and develops and what the academic research says with regard to student learning. Novel features may also be suitably monetized, hence expanding revenue streams. The gaps between availability of the latest research and its practical adoption are slowing progress and negatively impacting commercial vendors. A viable solution is a closer alignment and/or direct collaboration between academia and industry.

(2) A greater spectrum of commercially and freely available tools helps maintain healthy market competition. It also helps to avoid monopolies and oligopolies that stifle innovation, limit choices, and damage markets for educational tools. Some well-established or free-of-charge platforms (e.g., Moodle, LMS) might show such oligopolistic potential during the COVID-19 pandemic. With more tools available on the market, educators and academics may explore novel avenues for improving education and research. New and more effective forms of education may be devised. For instance, multimodal virtual educational environments have high potential. These are environments that would otherwise be impossible in conventional physical settings (see previous discussion of VR/AR). Expanding educational markets and commerce should inevitably lead to expanding resources for research and development funding ( Popenici and Kerr, 2017 ). Collaborative research projects sponsored by the industry should provide support and opportunities for academics to advance educational research. Controversially, in numerous geographies there is a decreasing trend in collaborative research. To reverse the trend, it is desirable that academic researchers and industry practitioners increase their engagement via mutual presentations, education, and even government initiatives. All three stakeholders (i.e., academia, industry, and government) should play more active roles.

(3) Vocational and practical education provides numerous opportunities for fruitful academia-industry collaboration. With the changing nature of work and growing technology adoption, there is an increasing demand for radical changes in vocational education for both teachers and students ( World Development Report, 2019 ). Domain knowledge provided by teachers is beneficially supplemented by AI-assisted learning environments in academia. Practical skills are enhanced in industrial environments with hands-on experience and feedback from both trainers and technology tools. Hence, students benefit from acquiring domain knowledge and enhancing their skills via interactions with human teachers and trainers. Equally, they benefit from gaining practical skills via interactions with simulated and real-world technological environments. Effective vocational training demands teachers and trainers on the human-learning side, and AI environments and actual technology tools on the machine-learning side. Collaboration between academia and industry, as well as balanced human and machine learning approaches, is pertinent for vocational education.

Discussion and Conclusion

Big data and AI have enormous potential to realize highly effective learning and teaching. They stimulate new research questions and designs, exploit innovative technologies and tools in data collection and analysis, and ultimately become a mainstream research paradigm ( Daniel, 2019 ). Nonetheless, they are still fairly novel and unfamiliar to many researchers and educators. In this paper, we have described the general background, core concepts, and recent progress of this rapidly growing domain. Along with the arising opportunities, we have highlighted the crucial challenges and emerging trends of big data and AI in education, which are reflected in educational research, policy-making, and industry. Table 1 concisely summarizes the major challenges and possible solutions of big data and AI in education. In summary, future studies should be aimed at theory-based precision education, incorporating cross-disciplinary application, and appropriately using educational technologies. The government should be devoted to supporting lifelong learning, offering teacher education programs, and protecting personal data. With regard to the education industry, reciprocal and mutually beneficial relationships should be developed in order to enhance academia-industry collaboration.

Table 1. Major challenges and possible solutions for integrating big data and AI into education.

Regarding the future development of big data and AI, we advocate an in-depth dialog between the supporters of “cold” technology and “warm” humanity so that users of technology can benefit from its capacity and not see it as a threat to their livelihood. An equally important issue is that overreliance on technology may lead to an underestimation of the role of humans in education. Remember the fundamental role of schooling: the school is a great equalizer as well as a central socialization agent. We need to better understand the role of social and affective processing (e.g., emotion, motivation) in addition to cognitive processing in student learning successes (or failures). After all, human learning is a social behavior, and a number of key regions in our brains are wired to be socially engaged (see Li and Jeong, 2020 for a discussion).

It has been estimated that approximately half of the current routine jobs might be automated in the near future ( Frey and Osborne, 2017 ; World Development Report, 2019 ). However, the teacher’s job cannot be replaced. The teacher-student relationship is indispensable in students’ learning, and inspirational in students’ personal growth ( Roorda et al., 2011 ; Cheng and Tsai, 2019 ). On the other hand, new developments in technologies will enable us to collect and analyze large-scale, multimodal, and continuous real-time data. Such data-intensive and technology-driven analysis of human behavior, in real-world and simulated environments, may assist teachers in identifying students’ learning trajectories and patterns, developing corresponding lesson plans, and adopting effective teaching strategies ( Klašnja-Milicevic et al., 2017 ; Gierl and Lai, 2018 ). It may also support teachers in tackling students’ more complex problems and cultivating students’ higher-order thinking skills by freeing the teachers from their monotonous and routine tasks ( Li, 2007 ; Belpaeme et al., 2018 ). Hence, it is now imperative for us to embrace AI and technology and prepare our teachers and students for the future of AI-enhanced and technology-supported education.

The adoption of big data and AI in learning and teaching is still in its infancy and limited by technological and mindset challenges for now; however, the convergence of developments in psychology, data science, and computer science shows great promise in revolutionizing educational research, practice, and industry. We hope that the latest achievements and future directions presented in this paper will advance our shared goal of helping learners and teachers pursue sustainable development.

Author Contributions

HLu wrote the initial draft of the manuscript. PG, HLa, JG, and PL revised the drafts and provided theoretical background. SY, HO, JB, and RG contributed content for the original draft preparation of the manuscript. C-CT provided theoretical focus, design, draft feedback, and supervised throughout the research. All authors contributed to the article and approved the submitted version.

Funding

This work was financially supported by the Institute for Research Excellence in Learning Sciences of National Taiwan Normal University (NTNU) from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

Conflict of Interest

JG was employed by the company Apprendis, LLC, Berlin.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abed Ibrahim, L., and Fekete, I. (2019). What machine learning can tell us about the role of language dominance in the diagnostic accuracy of German LITMUS non-word and sentence repetition tasks. Front. Psychol. 9:2757. doi: 10.3389/fpsyg.2018.02757

Adjerid, I., and Kelley, K. (2018). Big data in psychology: a framework for research advancement. Am. Psychol. 73, 899–917. doi: 10.1037/amp0000190

Aldowah, H., Al-Samarraie, H., and Fauzy, W. M. (2019). Educational data mining and learning analytics for 21st century higher education: a review and synthesis. Telemat. Inform. 37, 13–49. doi: 10.1016/j.tele.2019.01.007

Anwar, S., Bascou, N. A., Menekse, M., and Kardgar, A. (2019). A systematic review of studies on educational robotics. J. Pre-College Eng. Educ. Res. (J-PEER) 9, 19–42. doi: 10.7771/2157-9288.1223

Azevedo, J. P. W. D., Crawford, M. F., Nayar, R., Rogers, F. H., Barron Rodriguez, M. R., Ding, E. Y. Z., et al. (2019). Ending Learning Poverty: What Will It Take? Washington, DC: The World Bank.

Baker, R. S. J. D., D’Mello, S. K., Rodrigo, M. M. T., and Graesser, A. C. (2010). Better to be frustrated than bored: the incidence, persistence, and impact of learners’ cognitive-affective states during interactions with three different computer-based learning environments. Int. J. Human-Comp. Stud. 68, 223–241. doi: 10.1016/j.ijhcs.2009.12.003

Barsalou, L. W. (2008). “Grounding symbolic operations in the brain’s modal systems,” in Embodied Grounding: Social, Cognitive, Affective, and Neuroscientific Approaches , eds G. R. Semin and E. R. Smith (Cambridge: Cambridge University Press), 9–42. doi: 10.1017/cbo9780511805837.002

Becker, S. A., Cummins, M., Davis, A., Freeman, A., Hall, C. G., and Ananthanarayanan, V. (2017). NMC Horizon Report: 2017 Higher Education Edition. Austin, TX: The New Media Consortium.

Belpaeme, T., Kennedy, J., Ramachandran, A., Scassellati, B., and Tanaka, F. (2018). Social robots for education: a review. Sci. Robot. 3:eaat5954. doi: 10.1126/scirobotics.aat5954

Bereiter, C. (2002). Education and Mind in the Knowledge Age. Mahwah, NJ: LEA.

Boyd, D., and Crawford, K. (2012). Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inform. Commun. Soc. 15, 662–679. doi: 10.1080/1369118x.2012.678878

Bramley, N. R., Dayan, P., Griffiths, T. L., and Lagnado, D. A. (2017). Formalizing Neurath’s ship: approximate algorithms for online causal learning. Psychol. Rev. 124, 301–338. doi: 10.1037/rev0000061

Bucher, T. (2018). If Then: Algorithmic Power and Politics. New York, NY: Oxford University Press.

Carbonell, J. R. (1970). AI in CAI: an artificial-intelligence approach to computer-assisted instruction. IEEE Trans. Man-Machine Sys. 11, 190–202. doi: 10.1109/TMMS.1970.299942

Chen, C. P., and Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inform. Sci. 275, 314–347. doi: 10.1016/j.ins.2014.01.015

Chen, L., Chen, P., and Lin, Z. (2020). Artificial intelligence in education: a review. IEEE Access 8, 75264–75278. doi: 10.1109/ACCESS.2020.2988510

Chen, N.-S., Yin, C., Isaias, P., and Psotka, J. (2020). Educational big data: extracting meaning from data for smart education. Interact. Learn. Environ. 28, 142–147. doi: 10.1080/10494820.2019.1635395

Cheng, K.-H., and Tsai, C.-C. (2019). A case study of immersive virtual field trips in an elementary classroom: students’ learning experience and teacher-student interaction behaviors. Comp. Educ. 140:103600. doi: 10.1016/j.compedu.2019.103600

Cheung, M. W.-L., and Jak, S. (2018). Challenges of big data analyses and applications in psychology. Zeitschrift Fur Psychol. J. Psychol. 226, 209–211. doi: 10.1027/2151-2604/a000348

Cuthbertson, B., Socha, T. L., and Potter, T. G. (2004). The double-edged sword: critical reflections on traditional and modern technology in outdoor education. J. Adv. Educ. Outdoor Learn. 4, 133–144. doi: 10.1080/14729670485200491

Cutumisu, M., and Guo, Q. (2019). Using topic modeling to extract pre-service teachers’ understandings of computational thinking from their coding reflections. IEEE Trans. Educ. 62, 325–332. doi: 10.1109/te.2019.2925253

Daniel, B. (2015). Big data and analytics in higher education: opportunities and challenges. Br. J. Educ. Technol. 46, 904–920. doi: 10.1111/bjet.12230

Daniel, B. K. (2019). Big data and data science: a critical review of issues for educational research. Br. J. Educ. Technol. 50, 101–113. doi: 10.1111/bjet.12595

Dijcks, J. (2013). Oracle: Big data for the enterprise. Oracle White Paper . Redwood Shores, CA: Oracle Corporation.

Field, J. (2001). Lifelong education. Int. J. Lifelong Educ. 20, 3–15. doi: 10.1080/09638280010008291

Frey, C. B., and Osborne, M. A. (2017). The future of employment: how susceptible are jobs to computerisation? Technol. Forecast. Soc. Change 114, 254–280. doi: 10.1016/j.techfore.2016.08.019

Geczy, P. (2014). Big data characteristics. Macrotheme Rev. 3, 94–104.

Geczy, P. (2015). Big data management: relational framework. Rev. Bus. Finance Stud. 6, 21–30.

Geczy, P. (2018). Data-Oriented business models: gaining competitive advantage. Global J. Bus. Res. 12, 25–36.

Gierl, M. J., and Lai, H. (2018). Using automatic item generation to create solutions and rationales for computerized formative testing. Appl. Psychol. Measurement 42, 42–57. doi: 10.1177/0146621617726788

Gobert, J., Sao Pedro, M., Raziuddin, J., and Baker, R. S. (2013). From log files to assessment metrics for science inquiry using educational data mining. J. Learn. Sci. 22, 521–563. doi: 10.1080/10508406.2013.837391

Gobert, J. D., and Sao Pedro, M. A. (2017). “Digital assessment environments for scientific inquiry practices,” in The Wiley Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications , eds A. A. Rupp and J. P. Leighton (West Sussex: Wiley), 508–534. doi: 10.1002/9781118956588.ch21

Gobert, J. D., Sao Pedro, M. A., Baker, R. S., Toto, E., and Montalvo, O. (2012). Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. J. Educ. Data Min. 4, 104–143. doi: 10.5281/zenodo.3554645

Goksel, N., and Bozkurt, A. (2019). “Artificial intelligence in education: current insights and future perspectives,” in Handbook of Research on Learning in the Age of Transhumanism , eds S. Sisman-Ugur and G. Kurubacak (Hershey, PA: IGI Global), 224–236. doi: 10.4018/978-1-5225-8431-5.ch014

Graesser, A. C., Chipman, P., Haynes, B. C., and Olney, A. (2005). AutoTutor: an intelligent tutoring system with mixed-initiative dialogue. IEEE Trans. Educ. 48, 612–618. doi: 10.1109/te.2005.856149

GSMA Intelligence (2020). The Mobile Economy 2020 . London: GSM Association.

Harlow, L. L., and Oswald, F. L. (2016). Big data in psychology: introduction to the special issue. Psychol. Methods 21, 447–457. doi: 10.1037/met0000120

Hew, K. F., Lan, M., Tang, Y., Jia, C., and Lo, C. K. (2019). Where is the “theory” within the field of educational technology research? Br. J. Educ. Technol. 50, 956–971. doi: 10.1111/bjet.12770

Hinojo-Lucena, F. J., Aznar-Díaz, I., Cáceres-Reche, M. P., and Romero-Rodríguez, J. M. (2019). Artificial intelligence in higher education: a bibliometric study on its impact in the scientific literature. Educ. Sci. 9:51. doi: 10.3390/educsci9010051

Huang, A. Y., Lu, O. H., Huang, J. C., Yin, C., and Yang, S. J. (2020). Predicting students’ academic performance by using educational big data and learning analytics: evaluation of classification methods and learning logs. Interact. Learn. Environ. 28, 206–230. doi: 10.1080/10494820.2019.1636086

Huggins, R., and Thompson, P. (2015). Entrepreneurship, innovation and regional growth: a network theory. Small Bus. Econ. 45, 103–128. doi: 10.1007/s11187-015-9643-3

Hwang, G.-J., Spikol, D., and Li, K.-C. (2018). Guest editorial: trends and research issues of learning analytics and educational big data. Educ. Technol. Soc. 21, 134–136.

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., et al. (2014). Big data and its technical challenges. Commun. ACM. 57, 86–94. doi: 10.1145/2611567

Johnson, L., Smith, R., Willis, H., Levine, A., and Haywood, K. (2011). The 2011 Horizon Report. Austin, TX: The New Media Consortium.

Jordan, M. I., and Mitchell, T. M. (2015). Machine learning: trends, perspectives, and prospects. Science 349, 255–260. doi: 10.1126/science.aaa8415

Khechine, H., and Lakhal, S. (2018). Technology as a double-edged sword: from behavior prediction with UTAUT to students’ outcomes considering personal characteristics. J. Inform. Technol. Educ. Res. 17, 63–102. doi: 10.28945/4022

Klašnja-Milicevic, A., Ivanovic, M., and Budimac, Z. (2017). Data science in education: big data and learning analytics. Comput. Applicat. Eng. Educ. 25, 1066–1078. doi: 10.1002/cae.21844

Koh, J. H. L., Chai, C. S., and Tsai, C. C. (2014). Demographic factors, TPACK constructs, and teachers’ perceptions of constructivist-oriented TPACK. J. Educ. Technol. Soc. 17, 185–196.

Koper, R., and Tattersall, C. (2004). New directions for lifelong learning using network technologies. Br. J. Educ. Technol. 35, 689–700. doi: 10.1111/j.1467-8535.2004.00427.x

Krouska, A., Troussas, C., and Virvou, M. (2019). SN-Learning: an exploratory study beyond e-learning and evaluation of its applications using EV-SNL framework. J. Comp. Ass. Learn. 35, 168–177. doi: 10.1111/jcal.12330

Laney, D. (2001). 3D data management: controlling data volume, velocity and variety. META Group Res. Note 6, 70–73.

Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science 343, 1203–1205. doi: 10.1126/science.1248506

Lenschow, R. J. (1998). From teaching to learning: a paradigm shift in engineering education and lifelong learning. Eur. J. Eng. Educ. 23, 155–161. doi: 10.1080/03043799808923494

L’heureux, A., Grolinger, K., Elyamany, H. F., and Capretz, M. A. (2017). Machine learning with big data: challenges and approaches. IEEE Access 5, 7776–7797. doi: 10.1109/ACCESS.2017.2696365

Li, H., Gobert, J., and Dickler, R. (2019). “Evaluating the transfer of scaffolded inquiry: what sticks and does it last?,” in Artificial Intelligence in Education , eds S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren, and R. Luckin (Cham: Springer), 163–168. doi: 10.1007/978-3-030-23207-8_31

Li, P., and Jeong, H. (2020). The social brain of language: grounding second language learning in social interaction. npj Sci. Learn. 5:8. doi: 10.1038/s41539-020-0068-7

Li, P., Legault, J., Klippel, A., and Zhao, J. (2020). Virtual reality for student learning: understanding individual differences. Hum. Behav. Brain 1, 28–36. doi: 10.37716/HBAB.2020010105

Li, X. (2007). Intelligent agent–supported online education. Dec. Sci. J. Innovat. Educ. 5, 311–331. doi: 10.1111/j.1540-4609.2007.00143.x

Loftus, M., and Madden, M. G. (2020). A pedagogy of data and Artificial intelligence for student subjectification. Teach. Higher Educ. 25, 456–475. doi: 10.1080/13562517.2020.1748593

Long, P., and Siemens, G. (2011). Penetrating the fog: analytics in learning and education. Educ. Rev. 46, 31–40. doi: 10.1007/978-3-319-38956-1_4

Lu, O. H. T., Huang, A. Y. Q., Huang, J. C. H., Lin, A. J. Q., Ogata, H., and Yang, S. J. H. (2018). Applying learning analytics for the early prediction of students’ academic performance in blended learning. Educ. Technol. Soc. 21, 220–232.

Macfadyen, L. P. (2017). Overcoming barriers to educational analytics: how systems thinking and pragmatism can help. Educ. Technol. 57, 31–39.

Malik, G., Tayal, D. K., and Vij, S. (2019). “An analysis of the role of artificial intelligence in education and teaching,” in Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, eds P. Sa, S. Bakshi, I. Hatzilygeroudis, and M. Sahoo (Singapore: Springer), 407–417.

Manuel Moreno-Marcos, P., Alario-Hoyos, C., Munoz-Merino, P. J., and Delgado Kloos, C. (2019). Prediction in MOOCs: a review and future research directions. IEEE Trans. Learn. Technol. 12, 384–401. doi: 10.1109/TLT.2018.2856808

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., et al. (2011). Big Data: The Next Frontier for Innovation, Competition and Productivity. New York, NY: McKinsey Global Institute.

Mayer-Schönberger, V., and Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston, MA: Houghton Mifflin Harcourt.

Mislevy, R. J., Yan, D., Gobert, J., and Sao Pedro, M. (2020). “Automated scoring in intelligent tutoring systems,” in Handbook of Automated Scoring, eds D. Yan, A. A. Rupp, and P. W. Foltz (London: Chapman and Hall/CRC), 403–422. doi: 10.1201/9781351264808-22

Mowery, D. C., Nelson, R. R., Sampat, B. N., and Ziedonis, A. A. (2001). The growth of patenting and licensing by US universities: an assessment of the effects of the Bayh–Dole Act of 1980. Res. Pol. 30, 99–119. doi: 10.1515/9780804796361-008

Nye, B. D. (2015). Intelligent tutoring systems by and for the developing world: a review of trends and approaches for educational technology in a global context. Int. J. Art. Intell. Educ. 25, 177–203. doi: 10.1007/s40593-014-0028-6

Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453. doi: 10.1126/science.aax2342

O’Donnell, C. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes in K-12 curriculum intervention research. Rev. Educ. Res. 78, 33–84. doi: 10.3102/0034654307313793

Papamitsiou, Z., and Economides, A. A. (2014). Learning analytics and educational data mining in practice: a systematic literature review of empirical evidence. Educ. Technol. Soc. 17, 49–64.

Pardo, A., and Siemens, G. (2014). Ethical and privacy principles for learning analytics. Br. J. Educ. Technol. 45, 438–450. doi: 10.1111/bjet.12152

Pedró, F., Subosa, M., Rivas, A., and Valverde, P. (2019). Artificial Intelligence in Education: Challenges and Opportunities for Sustainable Development. Paris: UNESCO.

Peters, M. A. (2018). Deep learning, education and the final stage of automation. Educ. Phil. Theory 50, 549–553. doi: 10.1080/00131857.2017.1348928

Popenici, S. A., and Kerr, S. (2017). Exploring the impact of artificial intelligence on teaching and learning in higher education. Res. Pract. Technol. Enhanced Learn. 12:22. doi: 10.1186/s41039-017-0062-8

Quadir, B., Chen, N.-S., and Isaias, P. (2020). Analyzing the educational goals, problems and techniques used in educational big data research from 2010 to 2018. Interact. Learn. Environ. 1–17. doi: 10.1080/10494820.2020.1712427

Renz, A., and Hilbig, R. (2020). Prerequisites for artificial intelligence in further education: identification of drivers, barriers, and business models of educational technology companies. Int. J. Educ. Technol. Higher Educ. 17:14. doi: 10.1186/s41239-020-00193-3

Renz, A., Krishnaraja, S., and Gronau, E. (2020). Demystification of artificial intelligence in education–how much AI is really in the educational technology? Int. J. Learn. Anal. Art. Intell. Educ. (IJAI) 2, 4–30. doi: 10.3991/ijai.v2i1.12675

Roorda, D. L., Koomen, H. M. Y., Spilt, J. L., and Oort, F. J. (2011). The influence of affective teacher-student relationships on students’ school engagement and achievement: a meta-analytic approach. Rev. Educ. Res. 81, 493–529. doi: 10.3102/0034654311421793

Sao Pedro, M., Baker, R., and Gobert, J. (2013). “What different kinds of stratification can reveal about the generalizability of data-mined skill assessment models,” in Proceedings of the 3rd Conference on Learning Analytics and Knowledge (Leuven), 190–194.

Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., and Tufano, P. (2012). Analytics: the real-world use of big data. IBM Global Bus. Serv. 12, 1–20. doi: 10.1002/9781119204183.ch1

Sharples, M. (2000). The design of personal mobile technologies for lifelong learning. Comp. Educ. 34, 177–193. doi: 10.1016/s0360-1315(99)00044-5

Smutny, P., and Schreiberova, P. (2020). Chatbots for learning: a review of educational chatbots for the Facebook Messenger. Comp. Educ. 151:103862. doi: 10.1016/j.compedu.2020.103862

Sonderlund, A. L., Hughes, E., and Smith, J. (2019). The efficacy of learning analytics interventions in higher education: a systematic review. Br. J. Educ. Technol. 50, 2594–2618. doi: 10.1111/bjet.12720

Song, Y., Dai, X.-Y., and Wang, J. (2016). Not all emotions are created equal: expressive behavior of the networked public on China’s social media site. Comp. Hum. Behav. 60, 525–533. doi: 10.1016/j.chb.2016.02.086

Spikol, D., Ruffaldi, E., Dabisias, G., and Cukurova, M. (2018). Supervised machine learning in multimodal learning analytics for estimating success in project-based learning. J. Comp. Ass. Learn. 34, 366–377. doi: 10.1111/jcal.12263

Staats, C. (2016). Understanding implicit bias: what educators should know. Am. Educ. 39, 29–33. doi: 10.2307/3396655

Starcic, A. I. (2019). Human learning and learning analytics in the age of artificial intelligence. Br. J. Educ. Technol. 50, 2974–2976. doi: 10.1111/bjet.12879

The International Learning Sciences Forum (2019). The International Learning Sciences Forum: International Trends for AI and Big Data in Learning Sciences. Taipei: National Taiwan Normal University.

Toh, L. P. E., Causo, A., Tzuo, P. W., Chen, I. M., and Yeo, S. H. (2016). A review on the use of robots in education and young children. Educ. Technol. Soc. 19, 148–163.

Topol, E. J. (2019). High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56. doi: 10.1038/s41591-018-0300-7

Tsai, C. C. (2000). Relationships between student scientific epistemological beliefs and perceptions of constructivist learning environments. Educ. Res. 42, 193–205. doi: 10.1080/001318800363836

Tsai, S. C., Chen, C. H., Shiao, Y. T., Ciou, J. S., and Wu, T. N. (2020). Precision education with statistical learning and deep learning: a case study in Taiwan. Int. J. Educ. Technol. Higher Educ. 17, 1–13. doi: 10.1186/s41239-020-00186-2

UNESCO (2015). SDG4-Education 2030, Incheon Declaration (ID) and Framework for Action. For the Implementation of Sustainable Development Goal 4, Ensure Inclusive and Equitable Quality Education and Promote Lifelong Learning Opportunities for All, ED-2016/WS/28. London: UNESCO.

United Nations (2020). Policy Brief: Education During Covid-19 and Beyond. New York, NY: United Nations.

VanRullen, R. (2017). Perception science in the age of deep neural networks. Front. Psychol. 8:142. doi: 10.3389/fpsyg.2017.00142

Viberg, O., Hatakka, M., Bälter, O., and Mavroudi, A. (2018). The current landscape of learning analytics in higher education. Comp. Hum. Behav. 89, 98–110. doi: 10.1016/j.chb.2018.07.027

Williams, P. (2019). Does competency-based education with blockchain signal a new mission for universities? J. Higher Educ. Pol. Manag. 41, 104–117. doi: 10.1080/1360080x.2018.1520491

World Development Report (2019). The Changing Nature of Work. Washington, DC: The World Bank/International Bank for Reconstruction and Development.

Xie, H., Chu, H.-C., Hwang, G.-J., and Wang, C.-C. (2019). Trends and development in technology-enhanced adaptive/personalized learning: a systematic review of journal publications from 2007 to 2017. Comp. Educ. 140:103599. doi: 10.1016/j.compedu.2019.103599

Yadegaridehkordi, E., Noor, N. F. B. M., Ayub, M. N. B., Affal, H. B., and Hussin, N. B. (2019). Affective computing in education: a systematic review and future research. Comp. Educ. 142:103649. doi: 10.1016/j.compedu.2019.103649

Yarkoni, T., and Westfall, J. (2017). Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122. doi: 10.1177/1745691617693393

Zawacki-Richter, O., Marín, V. I., Bond, M., and Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education–where are the educators? Int. J. Educ. Technol. Higher Educ. 16:39. doi: 10.1186/s41239-019-0171-0

Keywords: big data, artificial intelligence, education, learning, teaching

Citation: Luan H, Geczy P, Lai H, Gobert J, Yang SJH, Ogata H, Baltes J, Guerra R, Li P and Tsai C-C (2020) Challenges and Future Directions of Big Data and Artificial Intelligence in Education. Front. Psychol. 11:580820. doi: 10.3389/fpsyg.2020.580820

Received: 07 July 2020; Accepted: 22 September 2020; Published: 19 October 2020.

Copyright © 2020 Luan, Geczy, Lai, Gobert, Yang, Ogata, Baltes, Guerra, Li and Tsai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chin-Chung Tsai, tsaicc@ntnu.edu.tw

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
