Wearable SELD FOA

Label				Value	Description
General information
	Name			Wearable SELD FOA	Full dataset name
	ID			sounds/wearable_seld_foa	Datalist id for external indexing
	Abbreviation			Wearable SELD FOA	Official dataset abbreviation, e.g. one used in the original paper
	Provider			NTT Media Intelligence Laboratories
	Year			2022	Dataset release year
	Modalities			Audio	Data modalities included in the dataset
	Collection name			Wearable SELD	Common name for all related datasets, used to group datasets coming from same source
	Research domain			SELD	Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator
	Related datasets name			Wearable SELD Earphone, Wearable SELD Mounting
	License			Non-commercial
	Download			Download (64.1 GB)
	Companion site			Site	Link to the companion site for the dataset
	Citation			[Nagatomo2022] Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head
Audio
Label				Value	Description
	Data
		Data type		Audio	Possible values: Audio | Features
		File format
			File format type	Constant	Possible values: Constant | Variable
			File format	wav	Possible values: wav | aiff | flac | mp3 | aac | ogg
			Lossy compression	No	Whether the audio is compressed in a lossy manner
			Sampling rate (kHz)	48 kHz	Sampling rate in kHz, possible values: 8 | 16 | 22.05 | 32 | 44.1 | 48
		Channels
			Setup	Ambisonic	Possible values: Mono | Stereo | Binaural | Ambisonic | Array | Multi-Channel | Variable
			Number of channels	4
		Material
			Source	Original	Possible values: Original | Youtube | Freesound | Online | Crowdsourced | [Dataset name]
	Content
		Content type		Freefield	Possible values: Freefield | Synthetic | Isolated
		Scene		Constant	Whether the scene class is constant within a single recording, possible values: Constant | Variable
		Event / Spatial location		Constant	Possible values: Constant | Moving | Unknown
	Recording
		Setup		Near-field	Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown
		Setup count		1	Number of different recording setups (microphone and recording device) used
		Spot type		Fixed	Possible values: Fixed | Moving | Unknown
	Files
		Count		500 files	Total number of files
		Total duration (minutes)		500 min	Total duration of the dataset in minutes
		File length		Constant	Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable
		File length (seconds)		60 sec	Approximate length of files
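The audio properties above (wav, 4-channel first-order Ambisonics, 48 kHz, files of about 60 s, so 500 files give roughly 500 minutes) can be checked on downloaded files with a small script. The sketch below uses only Python's standard `wave` module. The function name `check_foa_wav` and the 1 s tolerance are hypothetical. It also assumes the files carry headers that `wave` can read: Python versions before 3.12 reject WAVE_FORMAT_EXTENSIBLE headers, so a third-party reader such as soundfile may be needed instead.

```python
import wave

# Expected properties taken from the Audio table; the 60 s length is listed
# there as approximate, so a tolerance is applied to the duration check.
EXPECTED_CHANNELS = 4        # first-order Ambisonics (FOA)
EXPECTED_RATE_HZ = 48_000    # 48 kHz
EXPECTED_SECONDS = 60.0      # approximate file length


def check_foa_wav(path, expected_seconds=EXPECTED_SECONDS, tolerance_s=1.0):
    """Return a list of problems found in one file; an empty list means it matches.

    Hypothetical helper: the thresholds are assumptions, not dataset rules.
    """
    problems = []
    with wave.open(str(path), "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        seconds = wf.getnframes() / rate
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channels, got {channels}")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, got {rate} Hz")
    if abs(seconds - expected_seconds) > tolerance_s:
        problems.append(f"expected about {expected_seconds} s, got {seconds:.2f} s")
    return problems
```

For example, `check_foa_wav(path)` could be called on every `.wav` file in the downloaded FOA folder. The folder layout itself is not described on this page.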
Meta
Label				Value	Description
	Types			Event, Spatial location	List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc.
	Event
		Classes		12	Number of event classes
		Class names		organ, piano, toy train, toy gun shot, metallophone, bicycle bell, security buzzer, shaker, handclap, woodblock, shaking bell, hit drum
		Annotation
			Type	Strong	Possible values: Strong | Weak | Location | None
			Source	Synthetic	Possible values: Experts | Crowdsourced | Synthetic | Metadata | Automatic
			Annotations per item	1	Number of annotations available per item (possible multi-annotator setup)
			Labelled amount (%)	100 %	Percentage of all data that is labelled
			Validated amount (%)	100 %	Percentage of all data that has been validated by humans
			Strong annotations amount (%)	100 %	Percentage of all data that has strong annotations
		Labeling
			Hierarchical	No
		Instance
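Strong annotations give onset and offset times for each event. For training an SED or SELD model, these times are typically converted into a frame-by-class activity matrix. The sketch below shows one way to do that for the 12 classes listed in the Meta table. Several details are assumptions, not taken from the dataset:
- the class index order follows this page's list;
- the input is a list of `(onset_s, offset_s, class_name)` tuples;
- frames use a 100 ms hop, a common label resolution in DCASE SELD setups.

The function name `events_to_activity` is hypothetical, and the dataset's real annotation file format should be checked before use.

```python
import math

# The 12 event classes in the order listed in the Meta table. This ordering is
# an assumption: the dataset's own metadata may assign indices differently.
CLASS_NAMES = [
    "organ", "piano", "toy train", "toy gun shot", "metallophone",
    "bicycle bell", "security buzzer", "shaker", "handclap",
    "woodblock", "shaking bell", "hit drum",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def events_to_activity(events, clip_seconds=60.0, hop_seconds=0.1):
    """Convert strong labels into a frame-by-class 0/1 activity matrix.

    events: iterable of (onset_s, offset_s, class_name) tuples (hypothetical format).
    Returns one row per frame, each a list with one 0/1 entry per class.
    """
    num_frames = int(round(clip_seconds / hop_seconds))
    activity = [[0] * len(CLASS_NAMES) for _ in range(num_frames)]
    eps = 1e-9  # guards against float error at frame boundaries, e.g. 0.3 / 0.1
    for onset, offset, name in events:
        cls = CLASS_TO_INDEX[name]
        # Frame t covers [t * hop, (t + 1) * hop); mark every frame the event overlaps.
        first = max(0, math.floor(onset / hop_seconds + eps))
        last = min(num_frames, math.ceil(offset / hop_seconds - eps))
        for t in range(first, last):
            activity[t][cls] = 1
    return activity
```

A frame is marked active whenever it overlaps the event at all. A stricter rule, such as requiring more than half of the frame to be covered, would be an equally valid design choice.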
Cross-validation setup
Label				Value	Description
		Provided		Yes
		Folds		1
		Sets		Train, Test	Set types provided in the split, possible values: Train | Test | Val | Dev | Eval
