The ACII Affective Vocal Bursts (A-VB) Workshop & Competition

Understanding a critically understudied modality of emotional expression

Alice Baird, Panagiotis Tzirakis, Jeffrey Brooks,
Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen

The 2022 ACII Affective Vocal Bursts Workshop & Competition (A-VB) is a workshop-based challenge that introduces the problem of understanding emotion in vocal bursts – the wide range of non-verbal vocalizations that includes laughs, grunts, gasps, and much more. With affective states informing both mental and physical wellbeing, the core focus of the A-VB workshop is the broader discussion of current strategies in affective computing for modeling vocal emotional expression. In this first iteration of the A-VB Challenge, participants are presented with four emotion-focused sub-challenges that utilize the large-scale and “in-the-wild” Hume-VB dataset. The dataset and the four sub-challenges draw attention to new innovations in emotion science as they pertain to vocal expression, addressing low- and high-dimensional theories of emotional expression, cultural variation, and “call types” (laugh, cry, sigh, etc.).

Links: arXiv · Proceedings · White Paper · GitHub

Baselines and Results

The A-VB white paper details the baseline results. Participant results for each task are given below. * denotes a team that includes an A-VB organiser; these results are excluded from the official rankings.

Team                    Task       Test CCC
Organisers ComParE BL   A-VB High  0.5214
Organisers End2You BL   A-VB High  0.5686
TeamEP-ITS              A-VB High  0.6554
SclabCNU                A-VB High  0.6677
HCAI                    A-VB High  0.6846
HCCL                    A-VB High  0.7237
Anonymous (Winners!)    A-VB High  0.7295
EIHW*                   A-VB High  0.7363


Team                    Task       Test CCC
Organisers ComParE BL   A-VB Two   0.4986
Organisers End2You BL   A-VB Two   0.5084
SclabCNU                A-VB Two   0.6202
TeamEP-ITS              A-VB Two   0.6290
HCCL (Winners!)         A-VB Two   0.6854
EIHW*                   A-VB Two   0.7066


Team              Task          Test CCC
ComParE BL        A-VB Culture  0.3887
End2You BL        A-VB Culture  0.4401
TeamEP-ITS        A-VB Culture  0.5199
HCAI              A-VB Culture  0.5258
SclabCNU          A-VB Culture  0.5495
HCCL (Winners!)   A-VB Culture  0.6017
EIHW*             A-VB Culture  0.6195


Team              Task       Test UAR
ComParE BL        A-VB Type  0.3839
End2You BL        A-VB Type  0.4172
TeamEP-ITS        A-VB Type  0.4902
SclabCNU          A-VB Type  0.4970
AVB               A-VB Type  0.5190
EIHW*             A-VB Type  0.5618
HCAI (Winners!)   A-VB Type  0.5856

Workshop Schedule

Monday 17th October, ACII (Virtual).

NYC (GMT-4) | Tokyo (GMT+9) | Type | Talk Title | Who
06:00-06:10 | 19:00-19:10 | Organisers | Workshop Welcome | Alice Baird
06:10-06:25 | 19:10-19:25 | Paper | The ACII 2022 Affective Vocal Bursts Workshop & Competition: Understanding a critically understudied modality of emotional expression | Alice Baird
06:30-06:45 | 19:30-19:45 | Paper | Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding | Bagus Tris Atmaja
06:50-07:05 | 19:50-20:05 | Paper | Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0 | Bagus Tris Atmaja
07:05-07:50 | 20:05-20:50 | Keynote | Nonverbals and what you might have wanted to know about them | Anton Batliner
08:00-08:15 | 21:00-21:15 | Break | Coffee Break | --
08:15-08:30 | 21:15-21:30 | Paper | Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings | Zafi Sherhan Syed
08:35-08:50 | 21:35-21:50 | Paper | Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst | Dang-Linh Trinh
08:55-09:10 | 21:55-22:10 | Paper | Fine-tuning Wav2vec for Vocal-burst Emotion Recognition | Hyung-Jeong Yang
09:15-10:00 | 22:15-23:00 | Keynote | New resources for measuring expressive communication | Alan Cowen
10:15-10:30 | 23:15-23:30 | Break | Coffee Break | --
10:30-10:45 | 23:30-23:45 | Paper | An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis | Tobias Hallmen
10:50-11:05 | 23:50-00:05 | Paper | Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts | Vincent Karas
11:10-11:30 | 00:10-00:30 | Organisers | Winner Announcements and Closing Remarks | Alice Baird

Keynote Speakers

Dr. Anton Batliner. University of Augsburg, Germany. “Nonverbals and what you might have wanted to know about them”.

Dr. Anton Batliner received his doctoral degree in Phonetics in 1978 from LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics (h-index > 50; more than 13,000 citations).


Dr. Alan Cowen. Hume AI, New York, USA. “New resources for measuring expressive communication”.

Dr. Alan Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in leading journals such as Nature, PNAS, Science Advances, and Nature Human Behavior and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.

Important Dates

  • Challenge Opening (data available): May 27, 2022

  • Baseline information released: July 1, 2022

  • ‘Other Topics’ deadline [CMT]: August 1, 2022 (extended from July 22, 2022)
    (Included in ACII Proceedings)

  • Notification of Acceptance: August 8, 2022 (extended from July 29, 2022)

  • Camera Ready: August 15, 2022

  • Competition deadline: September 14, 2022 (extended from September 2, 2022)

  • Competition Technical Report submission deadline [CMT]: September 16, 2022 (extended from September 6, 2022)
    (Peer reviewed by the A-VB technical committee; not included in ACII Proceedings)

  • Notification of Acceptance: September 23, 2022 (extended from September 16, 2022)

  • Workshop: October 17, 2022

Paper Submission

In addition to their test set results, all participants in the A-VB competition should submit a technical report describing their approach and results. We suggest that this paper be no more than 4 pages; it can be uploaded to arXiv as well as to CMT.

The baseline white paper provides a more extensive description of the data as well as baseline results. Competition papers should include the following citation for the data repository:

@article{Cowen2022HumeVB,
     title={The Hume Vocal Burst Competition Dataset {(H-VB)} | Raw Data [ExVo: updated 02.28.22] [Data set]},
     author={Cowen, Alan and Baird, Alice and Tzirakis, Panagiotis and Opara, Michael and Kim, Lauren and Brooks, Jeff and Metrick, Jacob},
     journal={Zenodo}, 
     doi = {https://doi.org/10.5281/zenodo.6308780},
     year={2022}}

@misc{BairdA-VB2022,
    author = {Baird, Alice and Tzirakis, Panagiotis and Batliner, Anton and  Schuller, Björn and Keltner, Dacher and Cowen, Alan},
    title = {The ACII 2022 Affective Vocal Bursts Workshop and Competition: Understanding a critically understudied modality of emotional expression},
    publisher = {arXiv},
    doi = {[to appear]},
    year = {2022}}

Other Topics

For those interested in submitting research to the A-VB workshop outside of the competition, we encourage contributions covering the following topics:

  • Detecting and Understanding Nonverbal Vocalizations

  • Modeling Vocal Emotional Expression

  • Cross-Cultural Emotional Expression Modeling

  • Other topics related to Auditory Affective Computing

These submissions will be included in the IEEE ACII proceedings. Authors are asked to submit papers of up to 6 pages (including references), following the submission guidelines of the ACII 2022 conference. Directions for submitting papers will be announced soon; submissions will be handled via the conference submission system, EasyChair. All submissions will be reviewed single-blind.

Competition Tasks and Rules

The High-Dimensional Emotion Task (A-VB High).

The A-VB High track explores a high-dimensional emotion space for understanding vocal bursts. Participants are challenged with predicting the intensity of 10 emotions (Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, and Surprise) associated with each vocal burst, as a multi-output regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this task is based on CCC.
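For concreteness, the following is a minimal NumPy sketch of computing per-emotion CCC and its average; the function names are ours and not part of any official challenge tooling:

    import numpy as np

    def ccc(y_true, y_pred):
        # Concordance Correlation Coefficient between two 1-D arrays.
        mean_t, mean_p = y_true.mean(), y_pred.mean()
        var_t, var_p = y_true.var(), y_pred.var()
        cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
        return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    def mean_ccc(y_true, y_pred):
        # Average CCC over emotion dimensions (columns).
        return np.mean([ccc(y_true[:, i], y_pred[:, i])
                        for i in range(y_true.shape[1])])

    # Toy example: 10 emotion intensities for 100 samples.
    rng = np.random.default_rng(0)
    labels = rng.random((100, 10))
    preds = labels + 0.05 * rng.standard_normal((100, 10))
    print(f"Mean CCC: {mean_ccc(labels, preds):.4f}")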


The Two-Dimensional Emotion Task (A-VB Two).

In the A-VB Two track, we investigate a low-dimensional emotion space based on the circumplex model of affect. Participants will predict values of arousal and valence (on a scale from 1 = unpleasant/subdued, through 5 = neutral, to 9 = pleasant/stimulated) as a regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across the two dimensions. The baseline for this task is based on CCC.
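Because evaluation is CCC-based, one common strategy (a possible approach, not one prescribed by the challenge) is to optimise CCC directly instead of mean squared error. A minimal PyTorch sketch of a differentiable CCC loss over a (batch, dimensions) tensor; the 1024-dimensional feature size is an assumption standing in for any acoustic embedding:

    import torch

    def ccc_loss(y_pred, y_true):
        # 1 - mean CCC across output dimensions; inputs are (batch, dims).
        mean_p, mean_t = y_pred.mean(dim=0), y_true.mean(dim=0)
        var_p = y_pred.var(dim=0, unbiased=False)
        var_t = y_true.var(dim=0, unbiased=False)
        cov = ((y_pred - mean_p) * (y_true - mean_t)).mean(dim=0)
        ccc = 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)
        return 1 - ccc.mean()

    # Usage with a 2-output (arousal, valence) regression head.
    head = torch.nn.Linear(1024, 2)
    features = torch.randn(16, 1024)     # a batch of acoustic embeddings
    targets = torch.rand(16, 2) * 8 + 1  # labels on the 1-9 scale
    loss = ccc_loss(head(features), targets)
    loss.backward()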

The Cross-Cultural Emotion Task (A-VB Culture).

In the A-VB Culture track, participants are challenged with predicting the intensity of 10 emotions associated with each vocal burst as a multi-output regression task, using one or more models that generate predictions specific to each of the four cultures (the U.S., China, Venezuela, and South Africa). Specifically, annotations of each vocal burst consist of culture-specific ground truth: the ground truth for each sample is the average of annotations solely from the sample's country of origin. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this task is based on CCC.
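One way to obtain culture-specific predictions from a single network (a sketch of one possible design, not the official baseline) is a shared trunk with 4 × 10 = 40 outputs, indexed at each step by the sample's country of origin; again, the feature size is an assumption:

    import torch

    N_CULTURES, N_EMOTIONS = 4, 10  # U.S., China, Venezuela, South Africa

    class CultureHeads(torch.nn.Module):
        # Shared feature trunk feeding one 10-emotion head per culture.
        def __init__(self, feat_dim=1024):  # feature size is an assumption
            super().__init__()
            self.heads = torch.nn.Linear(feat_dim, N_CULTURES * N_EMOTIONS)

        def forward(self, feats, culture):
            # feats: (batch, feat_dim); culture: (batch,) integer indices.
            out = self.heads(feats).view(-1, N_CULTURES, N_EMOTIONS)
            return out[torch.arange(feats.size(0)), culture]  # (batch, 10)

    model = CultureHeads()
    preds = model(torch.randn(8, 1024), torch.randint(0, N_CULTURES, (8,)))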

The Expressive Burst-Type Task (A-VB Type).

In the A-VB Type task, participants are challenged with classifying the type of expressive vocal burst into one of 8 classes (Gasp, Laugh, Cry, Scream, Grunt, Groan, Pant, Other). Participants will report the Unweighted Average Recall (UAR) as the measure of performance.
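UAR is recall averaged over classes without weighting by class frequency, which keeps the metric informative under class imbalance. With scikit-learn it corresponds to macro-averaged recall; the toy labels below are purely illustrative:

    from sklearn.metrics import recall_score

    # Toy ground truth and predictions over the vocal burst classes.
    y_true = ["Laugh", "Cry", "Gasp", "Laugh", "Grunt", "Other"]
    y_pred = ["Laugh", "Cry", "Laugh", "Laugh", "Grunt", "Cry"]

    # UAR = unweighted mean of per-class recall = macro-averaged recall.
    uar = recall_score(y_true, y_pred, average="macro")
    print(f"UAR: {uar:.4f}")  # 3 of 5 classes fully recalled -> 0.6000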

Data and Team Registration

The challenge data package includes the raw data for a subset of the Hume Vocal Burst Database (H-VB): all train, validation, and test recordings, with corresponding emotion ratings for the train and validation recordings.

The dataset contains 59,201 audio recordings of vocal bursts from 1,702 speakers aged 20 to 39.5 years, from 4 cultures: the U.S., South Africa, China, and Venezuela. This version of H-VB comprises 36 hours of audio (mean sample duration: 2.23 seconds). The emotion ratings cover the ten emotion concepts listed below, as intensities averaged on a 0-100 scale, with each sample rated by an average of 85.2 raters.

Emotion Labels: Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, Surprise


              Train     Validation  Test
HH:MM:SS      12:19:06  12:05:45    12:22:12
Samples       19,990    19,396      19,815
Speakers      571       568         563
F:M           305:266   324:244     --
USA           206       206         --
China         79        76          --
South Africa  244       244         --
Venezuela     42        42          --

Figure 1. t-SNE representation of the emotional space of the Hume-VB dataset (training set only).
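For readers who want to reproduce a similar view, here is a sketch using scikit-learn; the file name and column layout are assumptions about how per-sample mean emotion intensities might be stored:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    EMOTIONS = ["Awe", "Excitement", "Amusement", "Awkwardness", "Fear",
                "Horror", "Distress", "Triumph", "Sadness", "Surprise"]

    # Hypothetical file: one row per training sample, one column per emotion.
    ratings = pd.read_csv("train_labels.csv")[EMOTIONS].to_numpy()

    # Project the 10-dimensional ratings to 2-D for visualisation.
    points = TSNE(n_components=2, perplexity=30,
                  random_state=0).fit_transform(ratings)

    plt.scatter(points[:, 0], points[:, 1],
                c=ratings.argmax(axis=1), cmap="tab10", s=4)
    plt.title("t-SNE of Hume-VB training-set emotion ratings")
    plt.show()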

An overview of the data can be found at this Zenodo repository. To gain access, register your team by emailing competitions@hume.ai with the following information:

Team Name, Researcher Name, Affiliation, and Research Goals

Restricted Access: After registering your team, you will receive an End User License Agreement (EULA) for signature. Please note that this dataset is provided only for competition use. Requests for use of the data beyond the competition should be directed to Hume AI (hello@hume.ai).

Results Submission

For all tasks, participants should submit their test set results as a zip file to competitions@hume.ai, following these guidelines:

  • Predictions should be submitted as a comma-delimited CSV with the following naming convention: [taskname]_[team name]_[submission no].csv (see the sketch after this list).

  • The CSV should contain only one prediction per test set file.
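A short sketch of writing a compliant file; the column header and prediction values are illustrative assumptions, and only the naming convention above is specified by the challenge:

    import csv

    task, team, submission_no = "A-VB_High", "MyTeam", 1
    filename = f"{task}_{team}_{submission_no}.csv"

    # Hypothetical predictions: one 10-emotion vector per test file.
    predictions = {
        "test_00001.wav": [0.12, 0.03, 0.55, 0.08, 0.01,
                           0.02, 0.04, 0.10, 0.03, 0.02],
    }

    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["File_ID"] + [f"emotion_{i}" for i in range(10)])
        for file_id, values in predictions.items():
            writer.writerow([file_id] + values)  # one row per test file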


Organizers

Alice Baird. Hume AI, New York, USA. Alice Baird is an audio researcher with interdisciplinary expertise in machine learning, computational paralinguistics, stress, and emotional well-being. She completed her PhD at the University of Augsburg’s Chair of Embedded Intelligence for Health Care and Wellbeing in 2021, supervised by Dr. Björn Schuller. Her work on emotion understanding from speech, physiological, and multimodal data has been published extensively in leading journals and conferences, including INTERSPEECH, ICASSP, IEEE Intelligent Systems, and the IEEE Journal of Biomedical and Health Informatics (i10-index: 29). Alice has extensive experience with competition organization, having held the data chair role for both the INTERSPEECH Computational Paralinguistics Challenge (ComParE) and the ACM MM Multimodal Sentiment Analysis Challenge (MuSe). She recently joined Hume AI as an AI research scientist.

Panagiotis Tzirakis. Hume AI, New York, USA. Dr. Tzirakis is a computer scientist and AI researcher with expertise in deep learning and emotion recognition across modalities. He earned his Ph.D. with the Intelligent Behaviour Understanding Group (iBUG) at Imperial College London, where he advanced multimodal emotion recognition efforts. He has published in top outlets including Information Fusion, the International Journal of Computer Vision, and several IEEE conference proceedings (e.g., ICASSP, INTERSPEECH), on topics including 3D facial motion synthesis, multi-channel speech enhancement, the detection of gibbon calls, and emotion recognition from audio and video (i10-index: 16). He recently joined Hume AI as an AI research scientist.

Jeffrey Brooks. Hume AI, New York, USA. Dr. Brooks is a computational psychologist with expertise in emotion, face perception, and social neuroscience. He earned his Ph.D. from the Social Cognitive and Neural Sciences Lab at NYU, where he researched the computational and neural mechanisms of emotion perception and social evaluation. His research has been published in the Proceedings of the National Academy of Sciences, Nature Human Behavior, and other leading journals.

Anton Batliner. University of Augsburg, Germany. Anton Batliner received his doctoral degree in Phonetics in 1978 at LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics. Amongst other events, he co-organized the previous INTERSPEECH challenges on Computational Paralinguistics. He is co-editor/author of two books and author/co-author of more than 300 technical articles, with an i10-index of 172 and over 12,000 citations.

Björn Schuller. Imperial College London, United Kingdom. Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich, Germany. He is Full Professor of Artificial Intelligence and Head of GLAM – the Group on Language, Audio, & Music – at Imperial College London, UK, and Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, as well as co-founding CEO and current CSO of audEERING, an audio intelligence company based near Munich and in Berlin, Germany, amongst other professorships and affiliations. He has (co-)authored 1000+ publications (43k+ citations; i10-index: 563), is Field Chief Editor of Frontiers in Digital Health, and was Editor-in-Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community, including serving as Technical Chair of INTERSPEECH 2019 and organizing more than 25 research challenges.

Dacher Keltner. The University of California, Berkeley, California, U.S.A. Dr. Keltner is one of the world’s foremost emotion scientists. He is a professor of psychology at UC Berkeley and the director of the Greater Good Science Center. He has over 200 scientific publications (i10-index: 222) and six books, including Born to Be Good, The Compassionate Instinct, and The Power Paradox. He has written for many popular outlets, from The New York Times to Slate. He was also the scientific advisor behind Pixar’s Inside Out, is involved with the education of health care providers and judges, and has consulted extensively for Google, Facebook, Apple, and Pinterest, on issues related to emotion and well-being.

Alan Cowen. Hume AI, New York, U.S.A. Dr. Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in top journals such as Nature, PNAS, Science Advances, and Nature Human Behavior (i10-index: 16) and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.