The ACII Affective Vocal Bursts (A-VB) Workshop & Competition

Understanding a critically understudied modality of emotional expression

Alice Baird, Panagiotis Tzirakis, Jeffrey Brooks,
Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen

The 2022 ACII Affective Vocal Bursts Workshop & Competition (A-VB) is a workshop-based challenge that introduces the problem of understanding emotion in vocal bursts – the wide range of non-verbal vocalizations that includes laughs, grunts, gasps, and much more. With affective states informing both mental and physical wellbeing, the core focus of the A-VB workshop is the broader discussion of current strategies in affective computing for modeling vocal emotional expression. In this first iteration of the A-VB Challenge, participants are presented with four emotion-focused sub-challenges that utilize the large-scale and “in-the-wild” Hume-VB dataset. The dataset and the four sub-challenges draw attention to new innovations in emotion science as they pertain to vocal expression, addressing low- and high-dimensional theories of emotional expression, cultural variation, and “call types” (laugh, cry, sigh, etc.).

Links: arXiv · Proceedings · White Paper · GitHub

Baselines and Results

The A-VB white paper details the baseline results. Participant results for each task are given below. * denotes a team that includes an A-VB organiser; these results are excluded from the official rankings.

Team                    Task       Test CCC
Organisers ComParE BL   A-VB High  0.5214
Organisers End2You BL   A-VB High  0.5686
TeamEP-ITS              A-VB High  0.6554
SclabCNU                A-VB High  0.6677
HCAI                    A-VB High  0.6846
HCCL                    A-VB High  0.7237
Anonymous (Winners!)    A-VB High  0.7295
EIHW*                   A-VB High  0.7363


Team                    Task       Test CCC
Organisers ComParE BL   A-VB Two   0.4986
Organisers End2You BL   A-VB Two   0.5084
SclabCNU                A-VB Two   0.6202
TeamEP-ITS              A-VB Two   0.6290
HCCL (Winners!)         A-VB Two   0.6854
EIHW*                   A-VB Two   0.7066


Team              Task          Test CCC
ComParE BL        A-VB Culture  0.3887
End2You BL        A-VB Culture  0.4401
TeamEP-ITS        A-VB Culture  0.5199
HCAI              A-VB Culture  0.5258
SclabCNU          A-VB Culture  0.5495
HCCL (Winners!)   A-VB Culture  0.6017
EIHW*             A-VB Culture  0.6195


Team              Task       Test UAR
ComParE BL        A-VB Type  0.3839
End2You BL        A-VB Type  0.4172
TeamEP-ITS        A-VB Type  0.4902
SclabCNU          A-VB Type  0.4970
AVB               A-VB Type  0.5190
EIHW*             A-VB Type  0.5618
HCAI (Winners!)   A-VB Type  0.5856

Workshop Schedule

Monday 17th October, ACII (Virtual).

NYC (GMT-4) | Tokyo (GMT+9) | Type | Talk Title | Who
06:00-06:10 | 19:00-19:10 | Organisers | Workshop Welcome | Alice Baird
06:10-06:25 | 19:10-19:25 | Paper | The ACII 2022 Affective Vocal Bursts Workshop & Competition: Understanding a critically understudied modality of emotional expression | Alice Baird
06:30-06:45 | 19:30-19:45 | Paper | Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding | Bagus Tris Atmaja
06:50-07:05 | 19:50-20:05 | Paper | Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0 | Bagus Tris Atmaja
07:05-07:50 | 20:05-20:50 | Keynote | Nonverbals and what you might have wanted to know about them | Anton Batliner
08:00-08:15 | 21:00-21:15 | Break | Coffee Break | --
08:15-08:30 | 21:15-21:30 | Paper | Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings | Zafi Sherhan Syed
08:35-08:50 | 21:35-21:50 | Paper | Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst | Dang-Linh Trinh
08:55-09:10 | 21:55-22:10 | Paper | Fine-tuning Wav2vec for Vocal-burst Emotion Recognition | Hyung-Jeong Yang
09:15-10:00 | 22:15-23:00 | Keynote | New resources for measuring expressive communication | Alan Cowen
10:15-10:30 | 23:15-23:30 | Break | Coffee Break | --
10:30-10:45 | 23:30-23:45 | Paper | An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis | Tobias Hallmen
10:50-11:05 | 23:50-00:05 | Paper | Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts | Vincent Karas
11:10-11:30 | 00:10-00:30 | Organisers | Winner Announcements and Closing Remarks | Alice Baird

Keynote Speakers

Dr. Anton Batliner. University of Augsburg, Germany. “Nonverbals and what you might have wanted to know about them”.

Dr. Anton Batliner received his doctoral degree in Phonetics in 1978 from LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics (h-index > 50; more than 13,000 citations).


Dr. Alan Cowen. Hume AI, New York, USA. “New resources for measuring expressive communication”.

Dr. Alan Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in leading journals such as Nature, PNAS, Science Advances, and Nature Human Behavior and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.

Important Dates

  • Challenge Opening (data available): May 27, 2022

  • Baseline information released: July 1, 2022

  • ‘Other Topics’ deadline [CMT]: August 1, 2022 (extended from July 22, 2022)
    (Included in ACII Proceedings)

  • Notification of Acceptance: August 8, 2022 (extended from July 29, 2022)

  • Camera Ready: August 15, 2022

  • Competition deadline: September 14, 2022 (extended from September 2, 2022)

  • Competition Technical Report submission deadline [CMT]: September 16, 2022 (extended from September 6, 2022)
    (Peer reviewed by the A-VB technical committee; not included in ACII Proceedings)

  • Notification of Acceptance: September 23, 2022 (extended from September 16, 2022)

  • Workshop: October 17, 2022

Paper Submission

In addition to their test set results, all participants in the A-VB competition should submit a technical report describing their approach and results. We suggest that this paper be no more than 4 pages; it can be uploaded to arXiv as well as to CMT.

The baseline white paper provides a more extensive description of the data as well as baseline results. Competition papers should include the following citation for the data repository:

@article{Cowen2022HumeVB,
     title={The Hume Vocal Burst Competition Dataset {(H-VB)} | Raw Data [ExVo: updated 02.28.22] [Data set]},
     author={Cowen, Alan and Baird, Alice and Tzirakis, Panagiotis and Opara, Michael and Kim, Lauren and Brooks, Jeff and Metrick, Jacob},
     journal={Zenodo}, 
     doi = {https://doi.org/10.5281/zenodo.6308780},
     year={2022}}

@misc{BairdA-VB2022,
    author = {Baird, Alice and Tzirakis, Panagiotis and Batliner, Anton and  Schuller, Björn and Keltner, Dacher and Cowen, Alan},
    title = {The ACII 2022 Affective Vocal Bursts Workshop and Competition: Understanding a critically understudied modality of emotional expression},
    publisher = {arXiv},
    doi = {[to appear]},
    year = {2022}}

Other Topics

For those interested in submitting research to the A-VB workshop outside of the competition, we encourage contributions covering the following topics:

  • Detecting and Understanding Nonverbal Vocalizations

  • Modeling Vocal Emotional Expression

  • Cross-Cultural Emotional Expression Modeling

  • Other topics related to Auditory Affective Computing

These submissions will be included in the IEEE ACII proceedings. Authors are asked to submit papers of up to 6 pages (including references), following the submission guidelines of the ACII 2022 conference. Directions for submitting papers will be announced soon; submissions will be handled via the conference submission system, EasyChair. All submissions will be reviewed single-blind.

Competition Tasks and Rules

The High-Dimensional Emotion Task (A-VB High).

The A-VB High track explores a high-dimensional emotion space for understanding vocal bursts. Participants are challenged with predicting the intensity of 10 emotions (Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, and Surprise) associated with each vocal burst, as a multi-output regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this task is based on CCC.
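For concreteness, the following is a minimal NumPy sketch of computing per-emotion CCC and its average; the function names are ours and not part of any official challenge tooling:

    import numpy as np

    def ccc(y_true, y_pred):
        # Concordance Correlation Coefficient between two 1-D arrays.
        mean_t, mean_p = y_true.mean(), y_pred.mean()
        var_t, var_p = y_true.var(), y_pred.var()
        cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
        return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    def mean_ccc(y_true, y_pred):
        # Average CCC over emotion dimensions (columns).
        return np.mean([ccc(y_true[:, i], y_pred[:, i])
                        for i in range(y_true.shape[1])])

    # Toy example: 10 emotion intensities for 100 samples.
    rng = np.random.default_rng(0)
    labels = rng.random((100, 10))
    preds = labels + 0.05 * rng.standard_normal((100, 10))
    print(f"Mean CCC: {mean_ccc(labels, preds):.4f}")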


The Two-Dimensional Emotion Task (A-VB Two).

In the A-VB Two track, we investigate a low-dimensional emotion space based on the circumplex model of affect. Participants will predict values of arousal and valence (on a scale from 1 = unpleasant/subdued, through 5 = neutral, to 9 = pleasant/stimulated) as a regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across the two dimensions. The baseline for this task is based on CCC.
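Because evaluation is CCC-based, one common strategy (a possible approach, not one prescribed by the challenge) is to optimise CCC directly instead of mean squared error. A minimal PyTorch sketch of a differentiable CCC loss over a (batch, dimensions) tensor; the 1024-dimensional feature size is an assumption standing in for any acoustic embedding:

    import torch

    def ccc_loss(y_pred, y_true):
        # 1 - mean CCC across output dimensions; inputs are (batch, dims).
        mean_p, mean_t = y_pred.mean(dim=0), y_true.mean(dim=0)
        var_p = y_pred.var(dim=0, unbiased=False)
        var_t = y_true.var(dim=0, unbiased=False)
        cov = ((y_pred - mean_p) * (y_true - mean_t)).mean(dim=0)
        ccc = 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)
        return 1 - ccc.mean()

    # Usage with a 2-output (arousal, valence) regression head.
    head = torch.nn.Linear(1024, 2)
    features = torch.randn(16, 1024)     # a batch of acoustic embeddings
    targets = torch.rand(16, 2) * 8 + 1  # labels on the 1-9 scale
    loss = ccc_loss(head(features), targets)
    loss.backward()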

The Cross-Cultural Emotion Task (A-VB Culture).

In the A-VB Culture track, participants are challenged with predicting the intensity of 10 emotions associated with each vocal burst as a multi-output regression task, using one or more models that generate predictions specific to each of the four cultures (the U.S., China, Venezuela, and South Africa). Specifically, annotations of each vocal burst consist of culture-specific ground truth: the ground truth for each sample is the average of annotations solely from the sample's country of origin. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this task is based on CCC.
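One way to obtain culture-specific predictions from a single network (a sketch of one possible design, not the official baseline) is a shared trunk with 4 × 10 = 40 outputs, indexed at each step by the sample's country of origin; again, the feature size is an assumption:

    import torch

    N_CULTURES, N_EMOTIONS = 4, 10  # U.S., China, Venezuela, South Africa

    class CultureHeads(torch.nn.Module):
        # Shared feature trunk feeding one 10-emotion head per culture.
        def __init__(self, feat_dim=1024):  # feature size is an assumption
            super().__init__()
            self.heads = torch.nn.Linear(feat_dim, N_CULTURES * N_EMOTIONS)

        def forward(self, feats, culture):
            # feats: (batch, feat_dim); culture: (batch,) integer indices.
            out = self.heads(feats).view(-1, N_CULTURES, N_EMOTIONS)
            return out[torch.arange(feats.size(0)), culture]  # (batch, 10)

    model = CultureHeads()
    preds = model(torch.randn(8, 1024), torch.randint(0, N_CULTURES, (8,)))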

The Expressive Burst-Type Task (A-VB Type).

In the A-VB Type task, participants are challenged with classifying the type of expressive vocal burst into one of 8 classes (Gasp, Laugh, Cry, Scream, Grunt, Groan, Pant, Other). Participants will report the Unweighted Average Recall (UAR) as the measure of performance.
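UAR is recall averaged over classes without weighting by class frequency, which keeps the metric informative under class imbalance. With scikit-learn it corresponds to macro-averaged recall; the toy labels below are purely illustrative:

    from sklearn.metrics import recall_score

    # Toy ground truth and predictions over the vocal burst classes.
    y_true = ["Laugh", "Cry", "Gasp", "Laugh", "Grunt", "Other"]
    y_pred = ["Laugh", "Cry", "Laugh", "Laugh", "Grunt", "Cry"]

    # UAR = unweighted mean of per-class recall = macro-averaged recall.
    uar = recall_score(y_true, y_pred, average="macro")
    print(f"UAR: {uar:.4f}")  # 3 of 5 classes fully recalled -> 0.6000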

Data and Team Registration

The challenge data package includes the raw data for a subset of the Hume Vocal Burst Database (H-VB): all train, validation, and test recordings, with corresponding emotion ratings for the train and validation recordings.

The dataset contains 59,201 audio recordings of vocal bursts from 1,702 speakers aged 20 to 39.5 years, from 4 cultures: the U.S., South Africa, China, and Venezuela. This version of H-VB comprises 36 hours of audio (mean sample duration: 2.23 seconds). The emotion ratings cover the ten emotion concepts listed below, as intensities averaged on a 0-100 scale, with each sample rated by an average of 85.2 raters.

Emotion Labels: Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, Surprise


              Train     Validation  Test
HH:MM:SS      12:19:06  12:05:45    12:22:12
Samples       19,990    19,396      19,815
Speakers      571       568         563
F:M           305:266   324:244     --
USA           206       206         --
China         79        76          --
South Africa  244       244         --
Venezuela     42        42          --

Figure 1. t-SNE representation of the emotional space of the Hume-VB dataset (training set only).
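For readers who want to reproduce a similar view, here is a sketch using scikit-learn; the file name and column layout are assumptions about how per-sample mean emotion intensities might be stored:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    EMOTIONS = ["Awe", "Excitement", "Amusement", "Awkwardness", "Fear",
                "Horror", "Distress", "Triumph", "Sadness", "Surprise"]

    # Hypothetical file: one row per training sample, one column per emotion.
    ratings = pd.read_csv("train_labels.csv")[EMOTIONS].to_numpy()

    # Project the 10-dimensional ratings to 2-D for visualisation.
    points = TSNE(n_components=2, perplexity=30,
                  random_state=0).fit_transform(ratings)

    plt.scatter(points[:, 0], points[:, 1],
                c=ratings.argmax(axis=1), cmap="tab10", s=4)
    plt.title("t-SNE of Hume-VB training-set emotion ratings")
    plt.show()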

An overview of the data can be found at this Zenodo repository. To gain access, register your team by emailing competitions@hume.ai with the following information:

Team Name, Researcher Name, Affiliation, and Research Goals

Restricted Access: After registering your team, you will receive an End User License Agreement (EULA) for signature. Please note that this dataset is provided only for competition use. Requests for use of the data beyond the competition should be directed to Hume AI (hello@hume.ai).

Results Submission

For all tasks, participants should submit their test set results as a zip file to competitions@hume.ai, following these guidelines:

  • Predictions should be submitted as a comma-delimited CSV with the following naming convention: [taskname]_[team name]_[submission no].csv (see the sketch after this list).

  • The CSV should contain only one prediction per test set file.
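A short sketch of writing a compliant file; the column header and prediction values are illustrative assumptions, and only the naming convention above is specified by the challenge:

    import csv

    task, team, submission_no = "A-VB_High", "MyTeam", 1
    filename = f"{task}_{team}_{submission_no}.csv"

    # Hypothetical predictions: one 10-emotion vector per test file.
    predictions = {
        "test_00001.wav": [0.12, 0.03, 0.55, 0.08, 0.01,
                           0.02, 0.04, 0.10, 0.03, 0.02],
    }

    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["File_ID"] + [f"emotion_{i}" for i in range(10)])
        for file_id, values in predictions.items():
            writer.writerow([file_id] + values)  # one row per test file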


Organizers

Alice Baird. Hume AI, New York, USA. Alice Baird is an audio researcher with interdisciplinary expertise in machine learning, computational paralinguistics, stress, and emotional well-being. She completed her PhD at the University of Augsburg’s Chair of Embedded Intelligence for Health Care and Wellbeing in 2021, supervised by Dr. Björn Schuller. Her work on emotion understanding from speech, physiological, and multimodal data has been published extensively in leading journals and conferences, including INTERSPEECH, ICASSP, IEEE Intelligent Systems, and the IEEE Journal of Biomedical and Health Informatics (i10-index: 29). Alice has extensive experience with competition organization, having held the data chair role for both the INTERSPEECH Computational Paralinguistics Challenge (ComParE) and the ACM MM Multimodal Sentiment Analysis Challenge (MuSe). She recently joined Hume AI as an AI research scientist.

Panagiotis Tzirakis. Hume AI, New York, USA. Dr. Tzirakis is a computer scientist and AI researcher with expertise in deep learning and emotion recognition across modalities. He earned his Ph.D. with the Intelligent Behaviour Understanding Group (iBUG) at Imperial College London, where he advanced multimodal emotion recognition efforts. He has published in top outlets including Information Fusion, the International Journal of Computer Vision, and several IEEE conference proceedings (e.g., ICASSP, INTERSPEECH), on topics including 3D facial motion synthesis, multi-channel speech enhancement, the detection of gibbon calls, and emotion recognition from audio and video (i10-index: 16). He recently joined Hume AI as an AI research scientist.

Jeffrey Brooks. Hume AI, New York, USA. Dr. Brooks is a computational psychologist with expertise in emotion, face perception, and social neuroscience. He earned his Ph.D. from the Social Cognitive and Neural Sciences Lab at NYU, where he researched the computational and neural mechanisms of emotion perception and social evaluation. His research has been published in the Proceedings of the National Academy of Sciences, Nature Human Behavior, and other leading journals.

Anton Batliner. University of Augsburg, Germany. Anton Batliner received his doctoral degree in Phonetics in 1978 at LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics. Amongst other events, he co-organized the previous INTERSPEECH challenges on Computational Paralinguistics. He is co-editor/author of two books and author/co-author of more than 300 technical articles, with an i10-index of 172 and over 12,000 citations.

Björn Schuller. Imperial College London, United Kingdom. Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich, Germany. He is Full Professor of Artificial Intelligence and Head of GLAM – the Group on Language, Audio, & Music – at Imperial College London, UK, and Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, as well as co-founding CEO and current CSO of audEERING, an audio intelligence company based near Munich and in Berlin, Germany, amongst other professorships and affiliations. He has (co-)authored 1000+ publications (43k+ citations; i10-index: 563), is Field Chief Editor of Frontiers in Digital Health, and was Editor-in-Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community, including serving as Technical Chair of INTERSPEECH 2019 and organizing more than 25 research challenges.

Dacher Keltner. The University of California, Berkeley, California, U.S.A. Dr. Keltner is one of the world’s foremost emotion scientists. He is a professor of psychology at UC Berkeley and the director of the Greater Good Science Center. He has over 200 scientific publications (i10-index: 222) and six books, including Born to Be Good, The Compassionate Instinct, and The Power Paradox. He has written for many popular outlets, from The New York Times to Slate. He was also the scientific advisor behind Pixar’s Inside Out, is involved with the education of health care providers and judges, and has consulted extensively for Google, Facebook, Apple, and Pinterest, on issues related to emotion and well-being.

Alan Cowen. Hume AI, New York, U.S.A. Dr. Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in top journals such as Nature, PNAS, Science Advances, and Nature Human Behavior (i10-index: 16) and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.