Unobtrusive and Multimodal Approach for Behavioral Engagement Detection of Students
We propose a multimodal approach for detecting students’ behavioral engagement states (i.e., On-Task vs. Off-Task) based on three unobtrusive modalities: Appearance, Context-Performance, and Mouse. Final behavioral engagement states are obtained by fusing modality-specific classifiers at the decision level. Various experiments were conducted on a student dataset collected in an authentic classroom.
Student engagement in learning is critical to achieving positive learning outcomes. Fredricks et al. framed student engagement in three dimensions: behavioral, emotional, and cognitive. In this work, we focus on behavioral engagement, where we aim to detect whether a student is On-Task or Off-Task [10, 11] at any point during a learning task. Towards this end, we propose a multimodal approach for detecting students’ behavioral engagement states (i.e., On-Task vs. Off-Task) based on three unobtrusive modalities: Appearance, Context-Performance, and Mouse. The final behavioral engagement states are obtained by fusing modality-specific classifiers at the decision level.
The proposed detection scheme incorporates data collected from three unobtrusive modalities: (1) Appearance: upper-body video captured using a camera; (2) Context-Performance: students’ interaction and performance data related to the learning content; (3) Mouse: data related to mouse movements during the learning process. For a finer-grained evaluation, we analyzed the results separately for the two available learning tasks: (1) Instructional, where students watch videos; and (2) Assessment, where students solve related questions.
Modality-specific data are fed into dedicated feature extractors [3, 9, 6], and the features are then classified with respective uni-modal classifiers (i.e., Random Forest classifiers). The decisions of the separate classifiers are fused to output a final behavioral engagement state. For fusion, we propose to form a decision pool containing all decision trees of the modality-specific random forests and to apply majority voting over this pool. This is equivalent to summing the modality-specific confidence values and selecting the label with the highest total confidence. Further details of the modalities, extracted features, and the various fusion approaches we explored can be found in the full version of this paper.
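The pooled-tree fusion described above can be sketched as follows. The feature matrices, feature dimensionality, and forest sizes are illustrative stand-ins, not the paper’s actual features. Note that scikit-learn trees cast soft probability votes, so summing the forests’ class-probability vectors and taking the arg-max realizes the pooled (soft) majority vote when the forests are equally sized:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in features for the three modalities
# (Appearance, Context-Performance, Mouse); shapes are illustrative.
rng = np.random.default_rng(0)
n = 200
X_app, X_ctx, X_mouse = (rng.normal(size=(n, 8)) for _ in range(3))
y = rng.integers(0, 2, size=n)  # 0 = Off-Task, 1 = On-Task

# One random forest per modality, all with the same number of trees.
forests = [RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
           for X in (X_app, X_ctx, X_mouse)]

def fuse(forests, X_list):
    """Decision-level fusion over a pool of all modality-specific trees.

    Each forest's predict_proba is the average of its trees' votes, so
    summing the per-modality class-confidence vectors and selecting the
    highest-confidence label is equivalent to majority voting over the
    pooled decision trees (for equally sized forests).
    """
    conf = sum(f.predict_proba(X) for f, X in zip(forests, X_list))
    return conf.argmax(axis=1)

pred = fuse(forests, [X_app, X_ctx, X_mouse])
```

A per-modality weight could be folded into the sum to favor, e.g., Appearance during Instructional sections, though the sketch above uses uniform weights.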
3 Experimental Results
Through authentic classroom pilots, data were collected while the students were consuming digital learning materials for Math on camera-equipped laptops. In total, 113 hours of data were collected from 17 9th-grade students over 13 sessions (40 minutes each), covering the three unobtrusive modalities. For feature extraction, an 8-second sliding window with a 4-second overlap was used for each modality, following prior work. The collected data were labeled using HELP by three educational experts. For the final ground-truth labels, the same windowing was applied to the three label sets, followed by majority voting and validity filtering.
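A minimal sketch of the windowing and label-aggregation steps, assuming binary On-Task/Off-Task labels from three experts; the exact validity filter used in the paper is not specified here, so the agreement threshold below is an assumption:

```python
import numpy as np

WIN, STEP = 8.0, 4.0  # 8-second windows sliding by 4 seconds (50% overlap)

def window_starts(duration_s):
    """Start times (in seconds) of the sliding windows over one session."""
    return np.arange(0.0, duration_s - WIN + 1e-9, STEP)

def window_label(expert_labels):
    """Majority vote over the experts' labels for one window.

    `expert_labels` holds the three experts' labels for the window
    (1 = On-Task, 0 = Off-Task). The window is kept only if at least
    two experts agree (a simple stand-in for validity filtering).
    """
    votes = np.bincount(expert_labels, minlength=2)
    return int(votes.argmax()) if votes.max() >= 2 else None

# One 40-minute session yields the per-window feature/label grid.
starts = window_starts(40 * 60)
```

With three binary labels a two-expert majority always exists, so the threshold only matters once invalid or missing labels are filtered out beforehand.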
For the classification experiments, we divided each student’s data into 80% training and 20% testing partitions. To reduce the effect of overfitting, we conducted leave-one-subject-out cross-validation and applied 10-fold random selection to balance the training sets. The uni-modal and fusion results for the different learning tasks (averaged over all runs and all students) are summarized in Table 1. As these results indicate, for Instructional sections, the Appearance modality performs best (0.74) due to the lack of the interactions necessary for the other modalities. For Assessment sections, fusing all modalities yields the best performance (0.89).
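The evaluation protocol (leave-one-subject-out splits with repeated random balancing of the training set) can be sketched as follows on synthetic stand-in data; the feature dimensionality, forest size, number of students, and the F1 metric are assumptions for illustration, not the paper’s exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

def balance(X, y, rng):
    """Random under-sampling: equalize On-Task/Off-Task counts."""
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    k = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, k, replace=False),
                           rng.choice(idx1, k, replace=False)])
    return X[keep], y[keep]

# Synthetic stand-in: 5 students, 100 windowed samples each.
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)
students = np.repeat(np.arange(5), 100)

scores = []
for s in np.unique(students):            # leave-one-subject-out
    tr, te = students != s, students == s
    for _ in range(10):                  # 10 random balanced draws
        Xb, yb = balance(X[tr], y[tr], rng)
        clf = RandomForestClassifier(n_estimators=50,
                                     random_state=0).fit(Xb, yb)
        scores.append(f1_score(y[te], clf.predict(X[te])))
mean_f1 = float(np.mean(scores))         # averaged over all runs/students
```

Averaging over the 10 balanced draws per held-out student reduces the variance introduced by the random under-sampling.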
In summary, for behavioral engagement, we obtain relatively high results using only the Appearance modality in Instructional sections, whereas the fusion of all modalities yields better results in Assessment sections. The experiments also showed that it is beneficial to have context-dependent classification pipelines for the different section types (i.e., Instructional and Assessment). In light of these results, we can say that context plays an important role even when different tasks within the same vertical (i.e., learning) are considered.
- N. Alyuz, E. Okur, U. Genc, S. Aslan, C. Tanriover, and A. A. Esme. An unobtrusive and multimodal approach for behavioral engagement detection of students. In Proceedings of the 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education, MIE 2017, pages 26–32, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-5557-5. doi: 10.1145/3139513.3139521. URL https://doi.acm.org/10.1145/3139513.3139521.
- S. Aslan, S. E. Mete, E. Okur, E. Oktay, N. Alyuz, U. E. Genc, D. Stanhill, and A. A. Esme. Human Expert Labeling Process (HELP): Towards a reliable higher-order user state labeling process and tool to assess student engagement. Educational Technology, 57(1):53–59, 2017. ISSN 00131962. URL https://eric.ed.gov/?id=EJ1126255.
- G. Bradski and A. Kaehler. OpenCV. Dr. Dobb’s Journal of Software Tools, 3, 2000.
- R. M. Carini, G. D. Kuh, and S. P. Klein. Student engagement and student learning: Testing the linkages. Research in Higher Education, 47(1):1–32, Feb 2006. ISSN 1573-188X. doi: 10.1007/s11162-005-8150-9. URL https://doi.org/10.1007/s11162-005-8150-9.
- C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley, 110:1–12, 2004.
- M. Christ, A. W. Kempa-Liehr, and M. Feindt. Distributed and parallel time series feature extraction for industrial big data applications. CoRR, abs/1610.07717, 2016. URL http://arxiv.org/abs/1610.07717.
- J. A. Fredricks, P. C. Blumenfeld, and A. H. Paris. School engagement: Potential of the concept, state of the evidence. Review of educational research, 74(1):59–109, 2004. doi: 10.3102/00346543074001059. URL https://doi.org/10.3102/00346543074001059.
- E. Okur, N. Alyuz, S. Aslan, U. Genc, C. Tanriover, and A. A. Esme. Behavioral engagement detection of students in the wild. In International Conference on Artificial Intelligence in Education (AIED 2017), volume 10331 of Lecture Notes in Computer Science, pages 250–261, Cham, June 2017. Springer International Publishing. ISBN 978-3-319-61425-0. doi: 10.1007/978-3-319-61425-0_21. URL https://doi.org/10.1007/978-3-319-61425-0_21.
- Z. A. Pardos, R. S. Baker, M. San Pedro, S. M. Gowda, and S. M. Gowda. Affective states and state tests: investigating how affect and engagement during the school year predict end-of-year learning outcomes. Journal of Learning Analytics, 1(1):107–128, 2014.
- R. Pekrun and L. Linnenbrink-Garcia. Academic emotions and student engagement. In Handbook of research on student engagement, pages 259–282. Springer, 2012.
- M. M. T. Rodrigo, R. Baker, L. Rossi, et al. Student off-task behavior in computer-based learning in the Philippines: Comparison to prior research in the USA. Teachers College Record, 115(10):1–27, 2013.