AudioSet

Label				Value	Description
General information
	Name			AudioSet	Full dataset name
	ID			sounds/audioset	Datalist id for external indexing
	Abbreviation			AudioSet	Official dataset abbreviation, e.g., the one used in the original paper
	Provider			Google
	Year			2017	Dataset release year
	Modalities			Audio Video	Data modalities included in the dataset
	Collection name			AudioSet	Common name for all related datasets, used to group datasets coming from the same source
	Research domain			Tagging Audio-visual	Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator
	Related datasets name			AudioSet-Strong
	Download			Download (None)
	Companion site			Site	Link to the companion site for the dataset
	Citation			[Gemmeke2017] Audio Set: An ontology and human-labeled dataset for audio events
Audio
Label				Value	Description
	Data
		Data type		Audio	Possible values: Audio \| Features
		File format
			File format type	Constant	Possible values: Constant \| Variable
			Lossy compression	Yes	Whether the audio is compressed in a lossy manner
		Channels
			Setup	Mono	Possible values: Mono \| Stereo \| Binaural \| Ambisonic \| Array \| Multi-Channel \| Variable
			Number of channels	1
		Material
			Source	Youtube	Possible values: Original \| Youtube \| Freesound \| Online \| Crowdsourced \| [Dataset name]
	Content
		Content type		Freefield	Possible values: Freefield \| Synthetic \| Isolated
	Recording
		Setup		Uncontrolled	Possible values: Near-field \| Far-field \| Mixed \| Uncontrolled \| Unknown
		Spot type		Unknown	Possible values: Fixed \| Moving \| Unknown
	Files
		Count		2084320 files	Total number of files
		Total duration (minutes)		347386.667 min	Total duration of the dataset in minutes
		File length		Constant	Characterization of the file lengths, possible values: Constant \| Quasi-constant \| Variable
		File length (seconds)		10 sec	Approximate length of files
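
The total duration in the Files rows follows from the file count and the nominal file length. A minimal Python sketch of that arithmetic, assuming every file is exactly 10 seconds long (the table gives the length only as approximate); the variable names are illustrative and not part of the dataset tooling. The result agrees with the listed total to within rounding of the last digit.

```python
# Values taken from the Files rows above.
file_count = 2_084_320
file_length_sec = 10  # nominal length; assumed exact for this estimate

total_minutes = file_count * file_length_sec / 60
total_hours = total_minutes / 60

print(f"{total_minutes:.3f} min")  # 347386.667 min
print(f"{total_hours:.1f} h")      # 5789.8 h
```
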
Meta
Label				Value	Description
	Types			Tag	List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc.
	Scene
		Classes		False	Possible values: True \| False \| Almost
	Event
		Classes		527	Number of event classes
		Classes		False	Possible values: True \| False \| Almost
		Annotation
			Type	Weak	Possible values: Strong \| Weak \| Location \| None
			Labelled amount (%)	100 %	Percentage of all data that is labelled
			Validated amount (%)	100 %	Percentage of all data that is validated by a human
			Strong annotations amount (%)	0 %	Percentage of all data that has strong annotations
			Overlapping event instances	Yes
		Labeling
			Hierarchical	Yes
			Ontology name	Yes
		Instance
			Count	2084320	Count of all event instances in the dataset
			Average instances per class	3297.9	Average per class instance count
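
The per-class average is the instance count divided by a class count. The sketch below, with illustrative variable names, carries out that division on the values in this table. Dividing by the 527 listed event classes gives about 3955.1, whereas the listed average of 3297.9 corresponds to a divisor of 632. That divisor is an assumption here: it is not stated in this table, and it may reflect the size of the full ontology rather than the labelled class set.

```python
# Values taken from the Event rows above.
instance_count = 2_084_320
listed_classes = 527
assumed_divisor = 632  # assumption: inferred from the listed average, not stated in the table

avg_listed_classes = instance_count / listed_classes
avg_assumed_divisor = instance_count / assumed_divisor

print(round(avg_listed_classes, 1))   # 3955.1
print(round(avg_assumed_divisor, 2))  # 3297.97
```
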
Cross-validation setup
Label				Value	Description
		Provided		Yes
		Folds		1
		Sets		Train Test	Set types provided in the split, possible values: Train \| Test \| Val \| Dev \| Eval
