Authors:
(1) Krist Shingjergji, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands (krist.shingjergji@ou.nl);
(2) Deniz Iren, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands (deniz.iren@ou.nl);
(3) Felix Bottger, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands;
(4) Corrie Urlings, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands;
(5) Roland Klemke, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands.
Editor's note: This is Part 2 of 6 of a study detailing the development of a gamified method of acquiring annotated facial emotion data.
A. Facial Expressions, Action Units, and Their Automated Recognition
Facial expressions are a means for humans to express their emotions; thus, their automated detection is an important goal of affective computing. Facial expressions are movements and positions of the facial muscles that can be identified by Action Units (AUs), i.e., hierarchical components of the movements of individual facial muscles or groups of facial muscles that describe the changes in facial expressions [13]. Several studies focus on the correlation between AUs and the basic emotions, namely happiness, sadness, fear, disgust, anger, and surprise [14]. For example, Reisenzein et al. [15] reported coherence between amusement and smiling, and Wegrzyn et al. [16] presented a detailed mapping between the basic emotions and different parts of the face, e.g., the lid raiser is essential for detecting fear and the lid tightener for detecting anger. Beyond the basic emotions, other studies use AUs to detect more complex emotional states, such as confusion [17].
The strong relationship between emotional facial expressions and AUs has motivated researchers to develop AU detection algorithms and to curate AU-labeled facial expression datasets such as CK+ [18] and DISFA [19]. For instance, Baltrušaitis et al. [20] presented an algorithm for AU occurrence and intensity estimation based on Histograms of Oriented Gradients (HOG; a method that describes an image by the distribution of its intensity gradients or edge directions [21]) and geometric features (e.g., shape and landmarks, i.e., the detection and localization of certain characteristic points on the face [22]). This work also highlights the positive impact of using various datasets on the generalizability of model performance. Shao et al. [23] presented a framework for detecting 10 AUs using the attention mechanism, i.e., finding the region of interest for each AU. Jacob and Stenger [24] outperformed their previous model by employing a correlation network based on a transformer-encoder architecture to capture the relationships between different AUs for a wide range of emotional expressions. Other prominent architectures for AU detection are JAA-Net [25], which uses high-level features of face alignment for AU detection, and DRML [26], which uses feed-forward functions to induce the regions of the face that are important.
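To make the HOG idea concrete, the following is a minimal sketch, not the implementation used in [20] or [21]. It computes an orientation histogram for a single image cell, where each pixel votes for the bin of its gradient direction with a weight equal to its gradient magnitude. The function name `hog_cell_histogram`, the choice of 9 unsigned orientation bins, and the toy images are assumptions made for illustration; a full HOG descriptor would also tile the image into many cells and normalize over blocks of cells.

```python
import math


def hog_cell_histogram(cell, n_bins=9):
    """Return an L2-normalized orientation histogram for one grayscale cell.

    `cell` is a 2D list of pixel intensities. Hypothetical helper for
    illustration only; real HOG pipelines combine many such cells.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # unsigned gradients: angles in [0, 180)

    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the intensity gradient.
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against an angle that rounds up to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude

    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy cells: intensity changes along x (vertical edge) or along y (horizontal edge).
vertical_edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal_edge = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

print(hog_cell_histogram(vertical_edge))    # all weight falls in bin 0 (0 degrees)
print(hog_cell_histogram(horizontal_edge))  # all weight falls in bin 4 (90 degrees)
```

The two toy cells show why such histograms are informative: the same edge at a different orientation lands in a different bin, so the descriptor captures the local shape of the face rather than raw pixel intensities.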
B. Explainable and Interpretable AI
As AI finds widespread application across many domains, the need for explainable AI (XAI) is growing rapidly as well. However, most explainability approaches do not target end users, and their outcomes are not directly interpretable by humans. One way to address this issue is to improve the transparency of AI models. Model transparency focuses on explaining “how the system made a decision” [27]. Some models are transparent by design, e.g., decision trees, whereas others are “black boxes” that require additional tools for explainability [28]. In recent years, explanation tools have been designed to give users insight into the decision-making process of a system. The study of Jeyakumar et al. [12] showed that users prefer explanations by example in most cases. Rosenfeld [29] presents a set of metrics suitable for evaluating the effectiveness of explainable AI, namely, i) the difference between the explanation’s logic and the agent’s actual performance, ii) the number of rules in the agent’s explanation, iii) the number of features used to construct the explanation, and iv) the stability of the agent’s explanation.
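The four metrics listed above can be illustrated with a small sketch. This is one possible reading of the metrics, not the formal definitions given in [29]: the performance difference is taken as an accuracy gap, an explanation is represented as a list of rules where each rule is the set of features it tests, and stability is approximated by the Jaccard overlap between the feature sets of two explanations. All function names, feature names, and toy data are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def performance_gap(agent_preds, explanation_preds, labels):
    """Metric i (assumed form): agent accuracy minus accuracy of the explanation's logic."""
    return accuracy(agent_preds, labels) - accuracy(explanation_preds, labels)


def rule_and_feature_counts(rules):
    """Metrics ii and iii: number of rules and number of distinct features used."""
    features = {feature for rule in rules for feature in rule}
    return len(rules), len(features)


def stability(features_a, features_b):
    """Metric iv (assumed form): Jaccard overlap of the features in two explanations."""
    a, b = set(features_a), set(features_b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Toy data: four samples, the agent's outputs, and the outputs of its explanation.
labels = [1, 0, 1, 1]
agent_preds = [1, 0, 1, 0]
explanation_preds = [1, 0, 0, 0]

# Hypothetical explanation with two rules over facial features.
rules = [{"lid_raiser", "jaw_drop"}, {"lid_tightener"}]

print(performance_gap(agent_preds, explanation_preds, labels))  # 0.25
print(rule_and_feature_counts(rules))                           # (2, 3)
print(stability({"lid_raiser", "jaw_drop"}, {"lid_raiser"}))    # 0.5
```

Under this reading, a smaller gap, fewer rules, fewer features, and a higher overlap all point toward an explanation that is faithful, compact, and consistent.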
C. Crowdsourcing and Gamification for Data Collection
Crowdsourcing is defined as the act of outsourcing a task that is commonly performed by designated agents to a large number of individuals [30]. Crowdsourcing has been used by both industry and scientific communities for a variety of purposes, one of which is the collection of labeled data. In most cases, crowd workers complete a task for monetary gain. Even though this approach has proven cost-effective, it has also been criticized because it can lead to questionable data quality unless the necessary quality assurance mechanisms are in place [31]. A subtype of crowdsourcing, games-with-a-purpose [32], provides a different kind of incentive [33] for workers to complete the tasks to the best of their abilities, and it generally incurs no additional costs to the employer. The design of crowdsourcing tasks in the form of a game is part of a much larger concept: gamification. Gamification can be defined as a technique of using game elements in non-game systems to improve user experience and engagement [34], increasing the motivation of the respondents by satisfying psychological and social needs [35]. Gamification of data collection finds application in different domains [36], such as education [37], [38] and health [39].
This paper is available on arXiv under a CC BY 4.0 DEED license.