MSP-Conversation corpus - Multimodal Signal Processing (MSP) Laboratory

MSP-Conversation corpus:

A large naturalistic speech database with emotional traces

The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to analyze emotional displays at different temporal resolutions while leveraging contextual information. Release 2.0 contains 310 conversations, each lasting between 10 and 20 minutes (77 hours and 26 minutes in total). Each conversation was annotated by at least six workers.

Spontaneous emotional speech with time-continuous emotional traces

A key feature of the corpus is that its recordings overlap with those included in the MSP-Podcast corpus, which provides sentence-level annotations of short segments retrieved from podcasts. The MSP-Podcast corpus is not well suited to studying contextual information: its isolated speaking turns are evaluated separately, so the temporal relationship between consecutive turns is lost. The MSP-Conversation corpus complements the MSP-Podcast corpus, providing a natural platform to explore temporal information.

This approach has the advantage that we can easily balance the emotional content and the speaker demographics by choosing the right podcasts. Because it does not intentionally manipulate the speakers or induce emotions, it offers a flexible and scalable way to collect emotional data. The MSP-Podcast corpus was recorded as part of our NSF project "CCRI: New: Creating the largest speech emotional database by leveraging existing naturalistic recordings" (NSF CNS: 2016719). For further information on the corpus, please read:

Luz Martinez-Lucas, Mohammed Abdelwahab, and Carlos Busso, "The MSP-conversation corpus," in Interspeech 2020, Shanghai, China, October 2020, pp. 1823-1827.

Release of the Corpus: Academic License

The corpus is now available under an Academic License (free of cost). Please download the license agreement (PDF). The process requires your institution to sign the agreement.

Instructions:

The agreement has to be signed by someone with signing authority on behalf of the university (usually someone from the sponsored research office).
The license is a standard FDP data transfer form, so it should be easy for you to obtain a signature.
Sign the third page of the PDF and send the signed form to Prof. Carlos Busso.

Some of our publications using this corpus:

Luz Martinez-Lucas, Wei-Cheng Lin, and Carlos Busso, "Analyzing continuous-time and sentence-level annotations for speech emotion recognition," IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 1754-1768, July-September 2024.
Luz Martinez-Lucas and Carlos Busso, "Dynamic speech emotion recognition using a conditional neural process," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), Seoul, Republic of Korea, April 2024, pp. 12036-12040.
Luz Martinez-Lucas, Mohammed Abdelwahab, and Carlos Busso, "The MSP-conversation corpus," in Interspeech 2020, Shanghai, China, October 2020, pp. 1823-1827.

This material is based upon work supported by the National Science Foundation under Grants CNS:1823166 and CNS:2016719. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.