This data listing is a DCASE Community effort to collect curated meta-information about DCASE-related datasets into a uniform structure. DCASE is a community for research on Detection and Classification of Acoustic Scenes and Events, and it offers a platform for discussing different perspectives and approaches, from algorithm development to practical applications and their commercial value.
The list focuses specifically on pre-packaged datasets rather than online data repositories. Datasets included in the list are well documented, packaged for easy usage, and have a free or open license. Many of the listed datasets have been used in DCASE Challenges or peer-reviewed academic papers.
Datasets are grouped at a high level into data collections based on the type of audio content analysis they mainly target. Some datasets can be used for multiple content analysis tasks, and in these cases they are placed into multiple collections.
The data listing is maintained through a GitHub repository. If you notice missing datasets or errors, or want to contribute to the data listings in some other way, you can raise an issue in the repository, or fork it and open a pull request with your edits. Proposals for new data collections are welcome as well.
The list is maintained by Toni Heittola.
This collection pools together all task-specific collections to ease searching for data across collections.
Task-specific data collections
An acoustic scene is a descriptor for the surrounding acoustic environment, defined by the physical and social situation in the scene. The acoustic scene is identified by a scene label, for example, “outdoor market”, “busy street”, or “office”. The goal of automatic acoustic scene classification is to classify a test recording into one of the predefined classes characterizing the environment in which it was recorded.
Audio captions are free-text descriptions of the audio recordings' content using natural language.
This data list pulls together various types of datasets containing everyday sounds. These datasets are suitable for research on sound event detection, sound event detection and localization, or audio tagging. A sound event corresponds to an audio segment that is attributed to a specific sound source and is perceived as an entity. A sound event has start and end timestamps along with a textual label related to the sound source. Datasets in this list contain either strong annotations (annotations with start and end timestamps) or weak annotations (annotations indicating sound presence at the clip or time-segment level).
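The distinction between strong and weak annotations can be illustrated with a minimal sketch. The `SoundEvent` class and `weak_tags` helper below are purely illustrative assumptions and do not correspond to any listed dataset's actual format; the sketch only shows how clip-level (weak) tags follow from timestamped (strong) annotations.

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    """A strongly annotated sound event: a label plus onset/offset in seconds."""
    label: str
    onset: float
    offset: float

def weak_tags(events, clip_start, clip_end):
    """Collapse strong annotations into weak (clip-level) tags:
    the sorted set of labels whose events overlap the given clip."""
    return sorted({e.label for e in events
                   if e.onset < clip_end and e.offset > clip_start})

# Illustrative strong annotations for a 10-second clip
events = [
    SoundEvent("dog_bark", 0.5, 1.2),
    SoundEvent("car_passing", 2.0, 6.5),
    SoundEvent("dog_bark", 7.1, 7.9),
]

print(weak_tags(events, 0.0, 10.0))  # → ['car_passing', 'dog_bark']
```

Sound event detection systems predict the full strong annotation, while audio tagging systems predict only the weak tags, which is why weakly annotated datasets appear in this list alongside strongly annotated ones.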
- Voice datasets list maintained by Jim Schwoebel
- A categorization of robust speech processing datasets