EPIC-SOUNDS

Label				Value	Description
General information
	Name			EPIC-SOUNDS	Full dataset name
	ID			sounds/epic_sounds	Datalist id for external indexing
	Abbreviation			EPIC-SOUNDS	Official dataset abbreviation, e.g. one used in the original paper
	Provider			University of Oxford, University of Bristol
	Year			2023	Dataset release year
	Modalities			Audio \| Video	Data modalities included in the dataset
	Collection name			EPIC-KITCHENS	Common name for all related datasets, used to group datasets from the same source
	Research domain			SED, Strong annotation	Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator
	Related datasets name			EPIC-KITCHENS-100
	License			Creative Commons, CC BY-NC 4.0
	Download			Download
	Companion site			Site	Link to the companion site for the dataset
	Citation			[Huh2023] Epic-Sounds: A Large-scale Dataset of Actions That Sound
Audio
Label				Value	Description
	Data
		Data type		Audio	Possible values: Audio \| Features
		File format
			File format type	Constant	Possible values: Constant \| Variable
			Lossy compression	No	Whether the audio is compressed in a lossy manner
			Bit depth	16	Bit depth of audio in bits, possible values: 8 \| 16 \| 24 \| 32
			Sampling rate (kHz)	24 kHz	Sampling rate in kHz, possible values: 8 \| 16 \| 22.05 \| 32 \| 44.1 \| 48
		Channels
			Setup	Mono	Possible values: Mono \| Stereo \| Binaural \| Ambisonic \| Array \| Multi-Channel \| Variable
			Number of channels	1
		Material
			Source	EPIC-KITCHENS-100	Possible values: Original \| Youtube \| Freesound \| Online \| Crowdsourced \| [Dataset name]
	Content
		Content type		Freefield	Possible values: Freefield \| Synthetic \| Isolated
		Scene		Variable	Whether the scene class is constant within a single recording, possible values: Constant \| Variable
		Event / Spatial location		Moving	Possible values: Constant \| Moving \| Unknown
	Recording
		Setup		Uncontrolled	Possible values: Near-field \| Far-field \| Mixed \| Uncontrolled \| Unknown
		Setup count		1	Number of different recording setups (microphone and recording device) used
		Spot type		Unknown	Possible values: Fixed \| Moving \| Unknown
	Files
		Count		700 files	Total number of files
		Total duration (minutes)		6002 min	Total duration of the dataset in minutes
		File length		Variable	Characterization of the file lengths, possible values: Constant \| Quasi-constant \| Variable
		File length (seconds)		10-3708 sec	Approximate length of files
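
As a rough sanity check on the figures above, the uncompressed size of the audio and the mean file length can be estimated from the sampling rate, bit depth, channel count, file count and total duration listed in the table. The sketch below assumes plain PCM storage with no container overhead; the function name `pcm_size_bytes` is illustrative and not part of any EPIC-SOUNDS tooling.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, duration_s: float) -> int:
    """Estimate the size of uncompressed PCM audio in bytes (no header overhead)."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * duration_s)


# Values taken from the table above: 24 kHz, 16 bit, mono, 6002 minutes, 700 files.
total_duration_s = 6002 * 60
total_bytes = pcm_size_bytes(24_000, 16, 1, total_duration_s)
mean_file_length_s = total_duration_s / 700

print(f"Estimated PCM size: {total_bytes / 1e9:.2f} GB")
print(f"Mean file length: {mean_file_length_s:.1f} s")
```

Under these assumptions the audio comes to roughly 17.3 GB, with a mean file length of about 8.6 minutes; the files as actually distributed may differ in size.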
Meta
Label				Value	Description
	Types			Event, Timestamp	List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc.
	Event
		Classes		44	Number of event classes
		Classes balanced		False	Whether instances are balanced across classes, possible values: True \| False \| Almost
		Classes		background, beep, ceramic / glass collision, ceramic / marble collision, ceramic / wood collision, ceramic-only collision, click, cloth-only collision, cut / chop, drink / eat, footstep, glass / marble collision, glass-only collision, hoover / fan, human, kettle / mixer / appliance, kneading, metal / ceramic collision, metal / cloth collision, metal / glass collision, metal / marble collision, metal / paper collision, metal / plastic collision, metal / wood collision, metal-only collision, open / close, paper-only collision, plastic / ceramic collision, plastic / glass collision, plastic / marble collision, plastic / paper collision, plastic / wood collision, plastic-only collision, pour, rustle, scrub / scrape / scour / wipe, sizzling / boiling, slide object, spray, stir / mix / whisk, water, wood / glass collision, wood-only collision, zip
		Annotation
			Type	Strong	Possible values: Strong \| Weak \| Location \| None
			Annotations per item	1	How many annotations there are available per item (possible multi-annotator setup)
			Labelled amount (%)	66.7 %	Percentage of all data that is labelled
			Validated amount (%)	66.7 %	Percentage of all data that is validated by a human
			Strong annotations amount (%)	66.7 %	Percentage of all data that has strong annotations
			Overlapping event instances	Yes
		Labeling
			Hierarchical	No
			Ontology name	Yes
		Instance
			Count	117553	Count of all event instances in the dataset
			Average instances per class	2672	Average per class instance count
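
The annotation fields above describe strong labels: each event instance carries a class plus onset and offset timestamps, and instances may overlap in time. A minimal sketch of such a record and an overlap check follows. The `Event` type, its field names and the example values are hypothetical and do not reflect the actual annotation file layout. The last line reproduces the per-class average from the instance count and class count in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One strongly annotated event instance (field names are hypothetical)."""
    label: str
    start_s: float
    stop_s: float


def overlaps(a: Event, b: Event) -> bool:
    """Return True when the two time intervals share a non-zero span."""
    return a.start_s < b.stop_s and b.start_s < a.stop_s


# Made-up example values, only to illustrate overlapping instances.
events = [
    Event("water", 0.0, 4.0),
    Event("metal / ceramic collision", 3.5, 3.9),
    Event("open / close", 5.0, 6.0),
]

# Collect every pair of events whose intervals overlap.
overlapping_pairs = [
    (a.label, b.label)
    for i, a in enumerate(events)
    for b in events[i + 1:]
    if overlaps(a, b)
]

# Average instances per class from the table: 117553 instances over 44 classes.
average_per_class = round(117553 / 44)
```

With the made-up events, only the "water" and "metal / ceramic collision" instances overlap, and the rounded average matches the 2672 reported in the table.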
Cross-validation setup
Label				Value	Description
		Provided		Yes
		Sets		Train \| Val \| Test	Set types provided in the split, possible values: Train \| Test \| Val \| Dev \| Eval
Baseline
Label				Value	Description
		Download		Download	Link to baseline system source code
		Citation		[Huh2023]	Paper to cite for the baseline
