A novel database of Children’s Spontaneous Facial Expressions (LIRIS-CSE)
Computing environments are moving towards human-centered designs instead of computer-centered designs, and humans tend to communicate a wealth of information through affective states or expressions. Traditional Human Computer Interaction (HCI) based systems ignore the bulk of the information communicated through those affective states and cater only for the user’s intentional input. Generally, for evaluating and benchmarking different facial expression analysis algorithms, standardized databases are needed to enable a meaningful comparison. In the absence of comparative tests on such standardized databases, it is difficult to find the relative strengths and weaknesses of different facial expression recognition algorithms. Recording truly spontaneous instances of basic emotion expressions is extremely difficult, because in everyday life the basic emotions are not shown frequently. However, when they are displayed, they convey a very strong message to someone’s surroundings.
In this article we present a novel database of Children’s Spontaneous facial Expressions (LIRIS-CSE). The database contains six universal spontaneous facial expressions shown by 12 ethnically diverse children between the ages of 6 and 12 years with a mean age of 7.3 years. To the best of our knowledge, this database is the first of its kind, as it records and shows six universal spontaneous facial expressions of children. Previously there were only a few databases of children’s expressions, and all of them show posed or exaggerated expressions, which differ from spontaneous or natural expressions. Thus, this database will be a milestone for human behavior researchers and an excellent resource for the vision community for benchmarking and comparing results.
Keywords: facial expressions, database, deep learning, six universal expressions, expression recognition, transfer learning
The computing paradigm has shifted from computer-centered computing to human-centered computing N1 (). This paradigm shift has created a tremendous opportunity for the computer vision research community to propose solutions to existing problems and to invent ingenious applications and products that were not thought of before. One of the most important properties of human-centered computing interfaces is the ability of machines to understand and react to social and affective or emotional signals N2 (); N10 ().
Humans mostly express their emotions via the facial channel, i.e. through facial expressions N10 (). Humans are blessed with the amazing ability to recognize facial expressions robustly in real time, but for machines decoding facial expressions is still a difficult task. Variability in pose, illumination and the way people show expressions across cultures are some of the parameters that make this task more difficult KHAN2013_jPat ().
Another problem that hinders the development of such systems for real-world applications is the lack of databases with natural displays of expressions 112 (). A number of publicly available benchmark databases with posed displays of the six basic emotions 53 (), i.e. happiness, anger, disgust, fear, surprise and sadness, exist, but there is no equivalent for spontaneous / natural basic emotions, even though it has been shown that spontaneous facial expressions differ substantially from posed expressions 111 ().
Another issue with most publicly available databases is the absence of children in the recorded videos or images. The research community has put a lot of effort into building databases of emotional videos or images, but almost all of them contain adult emotional faces 91 (); 88 (). By excluding children’s stimuli from publicly available databases, the vision research community has not only restricted itself to applications catering only for adults but has also produced limited studies on how expressions are interpreted developmentally N5 ().
2 Publicly available databases of children’s emotional stimuli
To the best of our knowledge there are only three publicly available databases that contain children’s emotional stimuli / images. They are:
The NIMH Child Emotional Faces Picture Set (NIMH-ChEFS) N6 () database has 482 emotional frames containing expressions of “fear”, “anger”, “happy” and “sad” with two gaze conditions: direct and averted gaze. The children who posed for this database were between 10 and 17 years of age. The database was validated by 20 adult raters.
The Dartmouth Database of Children’s Faces N7 () contains emotional images (six basic emotions) of 40 male and 40 female Caucasian children between the ages of 6 and 16 years. All facial images in the database were assessed by at least 20 human raters for facial expression identifiability and intensity. The expression of happy was identified most accurately, with human raters correctly classifying 94.3% of the happy faces, while the expression of fear was identified least accurately, in only 49.08% of the images. On average, human raters correctly identified the expression in 79.7% of the images. Refer to Figure 1 for example images from the database.
The Child Affective Facial Expression (CAFE) database N5 () is composed of 1192 emotional images (six basic emotions and neutral) of 2 to 8 year old children. The children who posed for this database were ethnically and racially diverse. Refer to Figure 1 for example frames from the database.
2.1 Weaknesses of publicly available databases of children’s emotional stimuli
Although the children’s expression databases described above are diverse in terms of pose, camera angles and illumination, they have the following drawbacks:
All of these databases contain only static images / mug shots with the expression at peak intensity. A study conducted by the psychologist Bassili N8 () concluded that facial muscle motion / movement is fundamental to the recognition of facial expressions, and that humans recognize expressions more robustly from a video clip than from a mug shot.
Generally, for evaluating and benchmarking different facial expression analysis algorithms, standardized databases are needed to enable a meaningful comparison. In the absence of comparative tests on such standardized databases, it is difficult to find the relative strengths and weaknesses of different facial expression recognition algorithms. Thus, it is of the utmost importance to develop a natural / spontaneous emotional database that contains movie clips / dynamic images of children. This will allow the research community to build robust systems for children’s natural facial expression recognition.
3 Novelty of proposed database (LIRIS-CSE)
To overcome the above-mentioned drawbacks of databases of children’s facial expressions, we present a novel emotional database that contains movie clips / dynamic images of 12 ethnically diverse children. This unique database contains spontaneous / natural facial expressions of children in diverse settings (refer to Figure 2 to see variations in recording scenarios) showing six universal or prototypic emotional expressions (“happiness”, “sadness”, “anger”, “surprise”, “disgust” and “fear”) N15 (); 48 (). The children were recorded in a constraint-free environment (no restriction on head movement, no restriction on hand movement, free sitting setting, no restriction of any sort) while they watched specially built / selected stimuli. This constraint-free environment allowed us to record the spontaneous / natural expressions of children as they occur. The database has been validated by 22 human raters. Details of the recording parameters are presented in Table 2.
The spontaneity of the recorded expressions can easily be observed in Figure 1. Expressions in our proposed database are spontaneous and natural and can easily be differentiated from the posed / exaggerated expressions of the other two databases. Figure 3 shows facial muscle motion / transition for different spontaneous expressions.
In total, 12 ethnically diverse children (five male and seven female) between the ages of 6 and 12 years, with a mean age of 7.3 years, participated in our database recording sessions. 60% of the recordings were made in a classroom / lab environment and 40% of the clips in the database were recorded in home conditions. The children were recorded in two different environments in order to obtain different background and illumination conditions in the recorded database. Refer to Figure 2 for example images with different backgrounds and illumination conditions.
4 Database acquisition details
The first step in creating the proposed spontaneous expression database was the selection of visual stimuli that can induce emotions in children. For ethical reasons, and considering the young age of the children, we selected the stimuli carefully and removed any stimuli that could have a long-term negative impact on the children. For the same ethical reasons we did not include emotion inducer clips for the negative expression of “anger” and selected very few clips to induce the emotions of “fear” and “sadness”. The same has been practiced before by Valstar et al. N9 (). For this reason the proposed database contains more emotional clips of the expressions of “happiness” and “surprise”.
Although there were no emotion inducer clips for the expression of “anger”, the database still contains a few clips in which children show the expression of “anger” (refer to Figure 1), because young children use the expressions of “disgust” and “anger” interchangeably N12 ().
4.1 Emotion inducing stimuli
For the stimuli list we selected only animated cartoons / movies or short video clips of kids doing funny actions. The reasons for selecting videos to induce emotions in children are as follows:
All the videos selected for inducing emotions contain audio as well. Video stimuli together with audio give an immersive experience and are thus a powerful emotion inducer N13 ().
Video stimuli provide a more engaging experience than static images, restricting undesirable head movement.
Video stimuli can evoke emotions for a longer duration. This helped us in recording and spotting children’s facial expressions.
The list of stimuli selected as emotion inducers is presented in Table 1. The total running length of the selected stimuli is 17 minutes and 35 seconds. One of the considerations for not selecting more stimuli was to prevent the children’s loss of interest or disengagement over time toAddinThesis6 ().
|No.||Induced expression||Source||Clip name||Duration|
|1||Disgust||YouTube||Babies Eating Lemons for the First Time Compilation 2013||42 Sec|
|2||Disgust||YouTube||On a Plane with Mr Bean (Kid puke)||50 Sec|
|3||Fear & surprise||YouTube||Ghoul Friend - A Mickey Mouse Cartoon - Disney Shows||50 Sec|
|4||Fear||YouTube||Mickey Mouse - The Mad Doctor - 1933||57 Sec|
|5||Fear & surprise||Film||“How To Train Your Dragon” (Monster dragon suddenly appears and kills small dragon)||121 Sec|
|6||Fear||Film||“How To Train Your Dragon” (Monster dragon throwing fire)||65 Sec|
|7||Fear||YouTube||Les Trois Petits Cochons||104 Sec|
|8||Happy||YouTube||Best Babies Laughing Video Compilation 2014 (three clips)||59 Sec|
|9||Happy||YouTube||Tom And Jerry Cartoon Trap Happy||81 Sec|
|10||Happy, surprise & fear||YouTube||Donald Duck- Lion Around 1950||40 Sec|
|11||Happily surprised||YouTube||Bip Bip et Coyote - Tired and feathered||44 Sec|
|12||Happily surprised||YouTube||Donald Duck - Happy Camping||53 Sec|
|13||Sad||YouTube||Fox and the Hound - Sad scene||57 Sec|
|14||Sad||YouTube||Crying Anime Crying Spree 3||14 Sec|
|15||Sad||YouTube||Bulldog and Kitten Snuggling||29 Sec|
|16||Surprise||Film||Ice Age- Scrat’s Continental Crack-Up||32 Sec|
|17||Surprise & happy||Film||Ice Age (4-5) Movie CLIP - Ice Slide (2002)||111 Sec|
|18||Happy||YouTube||bikes funny (3)||03 Sec|
|19||Happy||YouTube||bikes funny||06 Sec|
|20||Happy||YouTube||The Pink Panther in ’Pink Blue Plate’||37 Sec|
|Total running length of stimuli = 17 minutes and 35 seconds|
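The stated total can be checked directly against the per-clip durations listed in Table 1. The short sketch below simply sums the twenty durations; the variable names are illustrative only.

```python
# Durations (in seconds) of the 20 stimuli, copied from Table 1 in row order.
durations_sec = [42, 50, 50, 57, 121, 65, 104, 59, 81, 40,
                 44, 53, 57, 14, 29, 32, 111, 3, 6, 37]

total_sec = sum(durations_sec)
minutes, seconds = divmod(total_sec, 60)
print(f"{len(durations_sec)} clips, {total_sec} s = {minutes} min {seconds} s")
# prints "20 clips, 1055 s = 17 min 35 s"
```

The sum of 1055 seconds agrees with the reported running length of 17 minutes and 35 seconds.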
4.2 Recording setup
|Subjects||Recording environment||FPS||Video resolution|
|S1-S7||Classroom||25||800 * 600|
|S8-S10||Home||25||720 * 480|
|S11-S12||Home||25||1920 * 1080|
Inspired by Li et al. N13 (), we set up a high speed webcam, mounted at the top of a laptop with speaker output, at a distance of 50 cm. As explained above, the audio output enhanced the visual experience of a child, thus helping us induce emotions robustly. The recording setup is illustrated in Figure 4. As mentioned in Section 3, children were recorded in two different environments, i.e. a classroom / lab environment and a home environment. Details of the recording parameters are presented in Table 2.
4.3 Video segmentation
After recording the video for each child, we carefully examined the recording and removed any unnecessary recorded parts, usually at the beginning and at the end of the video recording. As the video (with audio) stimulus that the children watched was a combination of different emotional videos (refer to Section 4.1 for the details of the visual stimuli), each recorded video contained the whole spectrum of expressions in one single video. We then carefully and manually segmented the single video recording of each child into small video chunks such that each chunk shows one pronounced expression. Refer to Figure 3 to see the results of the segmentation process. It can be observed from that figure that each small video chunk contains a neutral expression at the beginning, then shows the onset of an expression, and finishes when the expression is visible at its peak, along with some frames after the peak expression frame. The total number of small video chunks in this database, each containing a specific expression, is 208.
Seventeen (17) video chunks in this database have two labels, for example “happily surprised”, “fear surprise” etc. This is because in young children different expressions co-occur (blended expressions) N14 (); N12 (), or because a visual stimulus was so immersive that the transition from one expression to another was not pronounced. Refer to Figure 5 to see example images from video segments that show blended expressions.
4.4 Database validation
The database has been validated by 22 human raters / evaluators between the ages of 18 and 40 years with a mean age of 26.8 years. 50% of the raters / evaluators were in the age bracket of 18 - 25 years and the remaining 50% were in the age bracket of 26 - 40 years. The evaluators in the age bracket of 18 - 25 years were university students and the other group of evaluators were university faculty members. The raters / evaluators were briefed about the experiment before they started validating the database.
For database validation we built software that plays the segmented videos (in random order) and records the human evaluator’s choice of expression label. A screen capture of the software is presented in Figure 7. If required, an evaluator can play a video multiple times before recording their choice for that video. We also provided evaluators with the option “undecided”. This helped us in collecting authentic responses and removing biases, as an evaluator can choose this option when the played video does not show any visible expression. Refer to Figure 6 to see example images from the video segments that were labeled as “undecided” most often.
In summary, the instructions given to the human raters / evaluators were the following:
Watch each segmented video carefully and select the expression that is shown in the played segmented video.
If the played video does not show any visible / pronounced expression, select the option “undecided”. Each video can be played multiple times, without any upper bound on the number of times it is played.
Once an expression label / response is submitted for a played segmented video, this label cannot be edited.
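The validation protocol described above (random presentation order, unlimited replays, an “undecided” option and non-editable responses) can be sketched as follows. This is only a minimal illustration and not the software that was actually used; the class name `ValidationSession`, the clip identifiers and the exact label set offered to raters are assumptions made for the example.

```python
import random

# Assumed label set: six universal expressions plus the "undecided" option.
LABELS = ["happiness", "sadness", "anger", "surprise", "disgust", "fear", "undecided"]


class ValidationSession:
    """Presents clips in random order and stores one final label per clip."""

    def __init__(self, clip_ids, seed=None):
        self.order = list(clip_ids)
        random.Random(seed).shuffle(self.order)   # random presentation order
        self.responses = {}                       # clip id -> submitted label
        self.play_counts = {c: 0 for c in self.order}

    def play(self, clip_id):
        # Replays are unlimited; the sketch only counts them.
        self.play_counts[clip_id] += 1

    def submit(self, clip_id, label):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if clip_id in self.responses:
            raise RuntimeError("a submitted label cannot be edited")
        self.responses[clip_id] = label


# Hypothetical clip identifiers, used only for illustration.
session = ValidationSession(["clip_001", "clip_002", "clip_003"], seed=0)
first = session.order[0]
session.play(first)
session.play(first)                # watched twice before deciding
session.submit(first, "undecided")
```

Rejecting a second submission for the same clip mirrors the rule that a response cannot be edited once it has been recorded.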
4.5 Validation data analysis
After collecting the validation data, we performed statistical analysis on the gathered data and calculated a confusion matrix. Refer to Figure 8 to see the calculated confusion matrix. Rows in the confusion matrix show the induced (intended) expressions (average %), while columns show the expression labels given by the human raters / evaluators (average %). Diagonal values represent agreement between the induced intended expressions and the expression labels given by the human evaluators, with darker colors representing greater agreement.
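A row-normalized confusion matrix of this kind can be computed as sketched below. The function name and the toy responses are assumptions for illustration; the sketch pools all responses per intended expression, which may differ from the exact averaging used for Figure 8.

```python
from collections import Counter, defaultdict


def confusion_matrix_percent(responses):
    """responses: iterable of (intended_label, rater_label) pairs.

    Returns {intended: {rater_label: percentage of that row}}.
    """
    counts = defaultdict(Counter)
    for intended, chosen in responses:
        counts[intended][chosen] += 1
    matrix = {}
    for intended, row in counts.items():
        total = sum(row.values())
        matrix[intended] = {label: 100.0 * n / total for label, n in row.items()}
    return matrix


# Toy responses, made up for illustration (not taken from the database).
toy = [("happy", "happy"), ("happy", "happy"), ("happy", "surprise"), ("happy", "happy"),
       ("fear", "fear"), ("fear", "surprise"), ("fear", "disgust"), ("fear", "undecided")]

cm = confusion_matrix_percent(toy)
print(cm["happy"]["happy"])  # 75.0
print(cm["fear"]["fear"])    # 25.0
```

The diagonal entries (for example `cm["happy"]["happy"]`) correspond to the agreement between the intended expression and the label chosen by the raters.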
According to the calculated results, the expression of “happy” was spotted most correctly by evaluators, with an average accuracy of 73.23%. On the other hand, the expression of “fear” was identified least correctly by evaluators, with an average accuracy of 35.65%. These results are consistent with the results from N7 (); N15 ().
The expression of “fear”, which is the least identified, is often perceptually mixed with the expressions of “surprise” and “disgust”. As mentioned above, this is because in young children different expressions co-occur (blended expressions) N14 (); N12 (), or because a visual stimulus was so immersive that the transition from one expression to another was not pronounced. Refer to Figure 5 to see example images from video segments that show blended expressions.
The overall average accuracy of the human evaluators / raters is 54.7%. According to a study published by Matsumoto et al. N16 (), humans can usually spot expressions correctly 50% of the time, and the easiest expressions for humans to identify are “happy” and “surprise”. These findings agree well with the results that we obtained from human evaluators, as the expression of “happy” was identified most correctly and the average accuracy of the human evaluators / raters is also around 50% (54.7% to be exact).
5 Database availability
The novel database of Children’s Spontaneous Expressions (LIRIS-CSE) is freely available to the research community. It can be downloaded by a researcher / lab after signing and returning the End User License Agreement (EULA). The website for downloading the LIRIS-CSE database is:
6 Automatic recognition of affect, a transfer learning based approach
In recent years, researchers have been successful in developing models that can recognize affect robustly KHAN2013_jPat (); Khan_ISVC (); Khan2018 (). Most of these successful models are based on a deep learning approach 117 (); 118 (); 119 (), specifically a Convolutional Neural Network (CNN) architecture 120 (). CNNs are a class of deep, feed-forward neural networks that have shown robust results for applications involving visual input, e.g. image / object recognition 125 (), face expression analysis 119 (), semantic scene analysis / semantic segmentation 125 (); 127 (), gender classification 128 () etc.
The architecture of CNN, as seen in Figure 9, was first proposed by LeCun 120 (). It is a multi-stage or multi-layer architecture. This essentially means there are multiple stages in CNN for feature extraction. Every stage in the network has an input and output which is composed of arrays known as feature maps. Every output feature map consists of patterns or features extracted on locations of the input feature map. Every stage is made up of three layers after which classification takes place corr1803 (); zeilerconv (); 123 (). These layers are:
Convolution layer: This layer makes use of filters, which are convolved with the image, producing activation or feature maps.
Feature Pooling layer: This layer is inserted to reduce the size of the image representation, to make the computation efficient. The number of parameters is also reduced which in turn controls over-fitting.
Classification layer: This is the fully connected layer. It computes the probability / score of each learned class from the features extracted by the convolution layers in the preceding steps.
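The three layer types listed above can be illustrated with a minimal sketch in plain Python. The toy image, the hand-picked filter and the weights of the fully connected layer are assumptions chosen for illustration; a real CNN learns its filters and weights from data and stacks many such stages.

```python
import math


def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def dense_softmax(features, weights, biases):
    """Fully connected layer followed by softmax: one probability per class."""
    scores = [sum(w * f for w, f in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy 5x5 "image" with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]               # hand-picked vertical-edge filter

fmap = conv2d(image, kernel)              # convolution layer: 4x4 feature map
pooled = max_pool(fmap, size=2)           # pooling layer: 2x2 representation
features = [v for row in pooled for v in row]

weights = [[0.5] * 4, [-0.5] * 4]         # two classes, arbitrary weights
biases = [0.0, 0.0]
probs = dense_softmax(features, weights, biases)
print(pooled)                             # [[0, 2], [0, 2]]
```

Pooling shrinks the 4x4 feature map to 2x2 while keeping the strong edge response, which is the size reduction the pooling layer is meant to provide.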
A CNN requires a large database to learn a concept 129 (), which makes it impractical for many applications. This bottleneck is usually avoided by using the transfer learning technique 121 (). Transfer learning is a machine learning approach that focuses on the ability to apply relevant knowledge from previous learning experiences to a different but related problem.
We used the transfer learning approach to build a framework for expression recognition on our proposed database, as the size of our database is not sufficient to robustly train all layers of a CNN. We used the pre-trained VGG model (vgg16, a 16-layer architecture), which is a deep convolutional network trained for object recognition 125 (). It was developed and trained by Oxford University’s Visual Geometry Group (VGG) and has been shown to achieve robust performance on the ImageNet dataset 126 () for object recognition.
We replaced the last fully connected layer of the pre-trained VGG model with a dense layer having five outputs. The number of outputs of the last dense layer corresponds to the number of classes to be recognized; in our experiment we learned five classes, i.e. five expressions to be recognized (out of the six universal expressions, the expression of “anger” was not included in this experiment as there are few clips for “anger”; for an explanation see Section 4). We trained the last dense layer with images (frames from videos) from our proposed database using the softmax activation function and the ADAM optimizer 130 (). With the above-mentioned parameters, the proposed CNN achieved an average expression accuracy of 72.45% on our proposed database (five expressions).
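The idea of training only a new five-output softmax layer on top of frozen features can be sketched in plain Python as follows. This is not the code used for the reported experiment: the synthetic feature vectors merely stand in for the output of the frozen pre-trained layers, and the function names, learning rate and number of epochs are assumptions. In practice a deep learning framework and the actual VGG16 weights would be used.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_head(features, labels, n_classes, epochs=100, lr=0.05,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit a softmax layer with Adam on cross-entropy; the features stay fixed."""
    d = len(features[0])
    params = [[0.0] * (d + 1) for _ in range(n_classes)]   # weights + bias column
    m = [[0.0] * (d + 1) for _ in range(n_classes)]        # Adam first moment
    v = [[0.0] * (d + 1) for _ in range(n_classes)]        # Adam second moment
    for t in range(1, epochs + 1):
        grads = [[0.0] * (d + 1) for _ in range(n_classes)]
        for x, y in zip(features, labels):
            xb = x + [1.0]                                  # append bias input
            probs = softmax([sum(w * xi for w, xi in zip(row, xb)) for row in params])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)   # dLoss/dscore_k
                for j in range(d + 1):
                    grads[k][j] += err * xb[j] / len(features)
        for k in range(n_classes):
            for j in range(d + 1):
                g = grads[k][j]
                m[k][j] = beta1 * m[k][j] + (1 - beta1) * g
                v[k][j] = beta2 * v[k][j] + (1 - beta2) * g * g
                m_hat = m[k][j] / (1 - beta1 ** t)          # bias correction
                v_hat = v[k][j] / (1 - beta2 ** t)
                params[k][j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


def predict(params, x):
    xb = x + [1.0]
    scores = [sum(w * xi for w, xi in zip(row, xb)) for row in params]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic stand-in for frozen features: five classes, four samples each.
rng = random.Random(0)
n_classes, per_class = 5, 4
features, labels = [], []
for k in range(n_classes):
    for _ in range(per_class):
        x = [rng.uniform(0.0, 0.1) for _ in range(n_classes)]
        x[k] += 1.0                                         # class-specific feature
        features.append(x)
        labels.append(k)

head = train_head(features, labels, n_classes)
accuracy = sum(predict(head, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy on toy features: {accuracy:.2f}")
```

Because only the final layer is updated, the number of trainable parameters stays small, which is the reason transfer learning suits a database of this size.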
In this article we presented a novel database of Children’s Spontaneous Expressions (LIRIS-CSE). The database contains six universal spontaneous expressions shown by 12 ethnically diverse children between the ages of 6 and 12 years with a mean age of 7.3 years. There were five male and seven female children. 60% of the recordings were made in a classroom / lab environment and 40% of the clips in the database were recorded in home conditions.
The LIRIS-CSE database contains 208 small video chunks. Each chunk contains a neutral expression at the beginning, then shows the onset of an expression, and finishes when the expression is visible at its peak, along with some frames after the peak expression frame. The database has been validated by 22 human raters / evaluators between the ages of 18 and 40 years.
To the best of our knowledge, this database is the first of its kind, as it records and shows six universal spontaneous expressions of children. Previously there were only a few databases of children’s expressions, and all of them show posed or exaggerated expressions, which differ from spontaneous or natural expressions. Thus, this database will be a milestone for human behavior researchers and an excellent resource for the vision community for benchmarking and comparing results.
For benchmarking automatic recognition of expressions, we have also provided results using a Convolutional Neural Network (CNN) architecture with a transfer learning approach. The proposed approach obtained an average expression accuracy of 72.45% on our proposed database, LIRIS-CSE (five expressions).
- M. Pantic, A. Pentland, A. Nijholt, T. Huang, Human computing and machine understanding of human behavior: A survey, 2006.
- M. Pantic, Machine analysis of facial behaviour: naturalistic and dynamic behaviour, Philosophical Transactions of the Royal Society B: Biological Sciences, 364 (1535) (2009) 3505–3513.
- B. C. Ko, A brief review of facial emotion recognition based on visual information, Sensors 18 (2). doi:https://doi.org/10.3390/s18020401.
- R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Framework for reliable, real-time facial expression recognition for low resolution images, Pattern Recognition Letters 34 (10) (2013) 1159–1168.
- M. Valstar, M. Pantic, Induced disgust, happiness and surprise: an addition to the MMI facial expression database, in: International Language Resources and Evaluation Conference, 2010.
- P. Ekman, Universals and cultural differences in facial expressions of emotion, in: Nebraska Symposium on Motivation, Lincoln University of Nebraska Press, 1971, pp. 207–283.
- B. M. S., L. G., B. B., S. T. J., M. J. R., A prototype for automatic recognition of spontaneous facial actions, in: Advances in Neural Information Processing Systems, 2002.
- L. P., C. J. F., K. T., S. J., A. Z., M. I., The extended Cohn-Kanade dataset (CK+): A complete facial expression dataset for action unit and emotion-specified expression, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2010.
- M. Pantic, M. F. Valstar, R. Rademaker, L. Maat, Web-based database for facial expression analysis, in: IEEE International Conference on Multimedia and Expo, 2005.
- L. V., T. C., The child affective facial expression (CAFE) set: validity and reliability from untrained adults, Frontiers in Psychology 5 (1532). doi:http://doi.org/10.3389/fpsyg.2014.01532.
- H. L. Egger, D. S. Pine, E. Nelson, E. Leibenluft, M. Ernst, K. E. Towbin, A. Angold, The nimh child emotional faces picture set (NIMH-ChEFS): a new set of children’s facial emotion stimuli, International Journal of Methods in Psychiatric Research.
- K. A. Dalrymple, J. Gomez, B. Duchaine, The Dartmouth database of children’s faces: Acquisition and validation of a new face stimulus set, PLOS ONE 8 (11) (2013) 1–7.
- Q. Gan, S. Nie, S. Wang, Q. Ji, Differentiating between posed and spontaneous expressions with latent regression bayesian network, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- J. Bassili, Emotion recognition: The role of facial movement and the relative importance of upper and lower areas of the face.
- E. P, F. W, Pictures of facial affect, Consulting Psychologists.
- P. Ekman, Facial expression of emotion, Psychologist 48 (1993) 384–392.
- M. F. Valstar, M. Pantic, Induced disgust, happiness and surprise: an addition to the mmi facial expression database, in: Proceedings of Int’l Conf. Language Resources and Evaluation, Workshop on EMOTION, Malta, 2010, pp. 65–70.
- W. SC, R. JA., Children’s recognition of disgust in others, Psychological Bulletin 139 (2013) 271–299.
- X. Li, T. Pfister, X. Huang, G. Zhao, M. Pietikäinen, A spontaneous micro-expression database: Inducement, collection and baseline, in: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013, pp. 1–6. doi:10.1109/FG.2013.6553717.
- R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Exploring human visual system: study to aid the development of automatic facial expression recognition framework, in: Computer Vision and Pattern Recognition Workshop, 2012.
- M. W. Sullivan, M. Lewis, Emotional expressions of young infants and children: A practitioner’s primer, Infants & Young Children.
- D. Matsumoto, H. S. Hwang, Reading facial expressions of emotion, Tech. rep., American Psychological Association (APA) , Psychological Science Agenda (2011).
- R. A. Khan, A. Meyer, S. Bouakaz, Automatic affect analysis: From children to adults, in: Advances in Visual Computing, Springer International Publishing, 2015, pp. 304–313.
- R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Saliency-based framework for facial expression recognition, Frontiers of Computer Science. doi:10.1007/s11704-017-6114-9.
- H.-W. Ng, N. V. Dung, V. Vassilios, W. Stefan, Deep learning for emotion recognition on small datasets using transfer learning, in: International Conference on Multimodal Interaction, ICMI ’15, ACM, New York, NY, USA, 2015.
- D. Hamester, P. Barros, S. Wermter, Face expression recognition with a 2-channel convolutional neural network, in: International Joint Conference on Neural Networks, 2015.
- L. Chen, M. Zhou, W. Su, M. Wu, J. She, K. Hirota, Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction, Information Sciences 428 (2018) 49–61.
- Y. LeCun, K. Kavukcuoglu, C. Farabet, Convolutional networks and applications in vision, in: IEEE International Symposium on Circuits and Systems, 2010.
- B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, A. Torralba, Places: A 10 million image database for scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (6) (2018) 1452–1464. doi:10.1109/TPAMI.2017.2723009.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, CoRR arXiv:1606.00915.
- O. Arriaga, M. Valdenegro-Toro, P. Plöger, Real-time convolutional neural networks for emotion and gender classification, CoRR arXiv:1710.07557.
- M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, S. Nasrin, B. C. V. Esesn, A. A. S. Awwal, V. K. Asari, The history began from alexnet: A comprehensive survey on deep learning approaches, CoRR arXiv:1803.01164.
- I. Hadji, R. P. Wildes, What do we understand about convolutional networks?, CoRR abs/1803.08834.
- M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.), Computer Vision – ECCV 2014, Springer International Publishing, Cham, 2014, pp. 818–833.
- F. Zhou, B. Wu, Z. Li, Deep meta-learning: Learning to learn in the concept space, CoRR arXiv:1802.03596.
- S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2010) 1345–1359.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: Computer Vision and Pattern Recognition, 2009.
- D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, CoRR arXiv:1412.6980.