General information |
Label |
Value |
Description |
|
Name |
EPIC-SOUNDS |
Full dataset name |
|
ID |
sounds/epic_sounds
|
Datalist id for external indexing |
|
Abbreviation |
EPIC-SOUNDS |
Official dataset abbreviation, e.g. one used in the original paper |
|
Provider |
University of Oxford, University of Bristol |
|
|
Year |
2023 |
Dataset release year |
|
Modalities |
Audio | Video
|
Data modalities included in the dataset |
|
Collection name |
EPIC-KITCHENS |
Common name for all related datasets, used to group datasets coming from the same source |
|
Research domain |
SED
Strong annotation
|
Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator |
|
Related datasets name |
|
|
|
License |
Creative Commons, CC BY-NC 4.0 |
|
|
Download |
Download
|
|
|
Companion site |
Site
|
Link to the companion site for the dataset |
|
Citation |
[Huh2023] Epic-Sounds: A Large-scale Dataset of Actions That Sound
|
|
Audio |
Label |
Value |
Description |
|
Data |
|
|
Data type |
Audio
|
Possible values: Audio | Features |
|
|
File format |
|
|
|
File format type |
Constant
|
Possible values: Constant | Variable |
|
|
|
Lossy compression |
No
|
Whether the audio is compressed in a lossy manner |
|
|
|
Bit depth (bits) |
16 |
Bit depth of audio, possible values: 8 | 16 | 24 | 32 |
|
|
|
Sampling rate (kHz) |
24 kHz |
Sampling rate in kHz, possible values: 8 | 16 | 22.05 | 32 | 44.1 | 48 |
|
|
Channels |
|
|
|
Setup |
Mono
|
Possible values: Mono | Stereo | Binaural | Ambisonic | Array | Multi-Channel | Variable |
|
|
|
Number of channels |
1 |
|
|
|
Material |
|
|
|
Source |
EPIC-KITCHENS-100
|
Possible values: Original | Youtube | Freesound | Online | Crowdsourced | [Dataset name] |
|
Content |
|
|
Content type |
Freefield
|
Possible values: Freefield | Synthetic | Isolated |
|
|
Scene |
Variable
|
Whether the scene class is constant within a single recording, possible values: Constant | Variable |
|
|
Event / Spatial location |
Moving
|
Possible values: Constant | Moving | Unknown |
|
Recording |
|
|
Setup |
Uncontrolled
|
Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown |
|
|
Setup count |
1 |
Number of different recording setups (microphone and recording device) used |
|
|
Spot type |
Unknown
|
Possible values: Fixed | Moving | Unknown |
|
Files |
|
|
Count |
700 files |
Total number of files |
|
|
Total duration (minutes) |
6002 min |
Total duration of the dataset in minutes |
|
|
File length |
Variable
|
Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable |
|
|
File length (seconds) |
10-3708 sec |
Approximate length of files |
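The file statistics above can be cross-checked with a little arithmetic. The sketch below uses only values listed in this table (700 files, 6002 minutes, 24 kHz, 16-bit, mono). It assumes uncompressed PCM storage, which the table does not state because the file format field is empty, so the size figure is a rough estimate rather than the actual download size.

```python
# Values copied from the Audio table above.
FILE_COUNT = 700
TOTAL_MINUTES = 6002
SAMPLE_RATE_HZ = 24_000
BIT_DEPTH = 16
CHANNELS = 1

total_seconds = TOTAL_MINUTES * 60

# Mean file length; individual files range from roughly 10 to 3708 seconds.
mean_file_seconds = total_seconds / FILE_COUNT

# Assumption: raw PCM with no container overhead and no compression.
bytes_per_second = SAMPLE_RATE_HZ * (BIT_DEPTH // 8) * CHANNELS
total_bytes = total_seconds * bytes_per_second

print(f"Mean file length: {mean_file_seconds:.1f} s")
print(f"Estimated raw PCM size: {total_bytes / 1e9:.2f} GB")
```

The mean of about 514 seconds sits well inside the listed 10-3708 second range, which is consistent with the "Variable" file length characterization.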
Meta |
Label |
Value |
Description |
|
Types |
Event
Timestamp
|
List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc. |
|
Event |
|
|
Classes |
44 |
Number of event classes |
|
|
Classes |
False
|
Possible values: True | False | Almost |
|
|
Classes |
- background
- beep
- ceramic / glass collision
- ceramic / marble collision
- ceramic / wood collision
- ceramic-only collision
- click
- cloth-only collision
- cut / chop
- drink / eat
- footstep
- glass / marble collision
- glass-only collision
- hoover / fan
- human
- kettle / mixer / appliance
- kneading
- metal / ceramic collision
- metal / cloth collision
- metal / glass collision
- metal / marble collision
- metal / paper collision
- metal / plastic collision
- metal / wood collision
- metal-only collision
- open / close
- paper-only collision
- plastic / ceramic collision
- plastic / glass collision
- plastic / marble collision
- plastic / paper collision
- plastic / wood collision
- plastic-only collision
- pour
- rustle
- scrub / scrape / scour / wipe
- sizzling / boiling
- slide object
- spray
- stir / mix / whisk
- water
- wood / glass collision
- wood-only collision
- zip
|
|
|
|
Annotation |
|
|
|
Type |
Strong
|
Possible values: Strong | Weak | Location | None |
|
|
|
Annotations per item |
1 |
How many annotations there are available per item (possible multi-annotator setup) |
|
|
|
Labelled amount (%) |
66.7 % |
Percentage of all data that is labelled |
|
|
|
Validated amount (%) |
66.7 % |
Percentage of all data that has been validated by a human |
|
|
|
Strong annotations amount (%) |
66.7 % |
Percentage of all data that has strong annotations |
|
|
|
Overlapping event instances |
Yes
|
|
|
|
Labeling |
|
|
|
Hierarchical |
No
|
|
|
|
|
Ontology name |
Yes
|
|
|
|
Instance |
|
|
|
Count |
117553 |
Count of all event instances in the dataset |
|
|
|
Average instances per class |
2672 |
Average number of event instances per class |
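To illustrate what "Strong" annotation with overlapping event instances means in practice, here is a minimal sketch. The `SoundEvent` class, its field names, and the example events are hypothetical and are not the dataset's actual annotation schema. Only the class labels and the instance and class counts come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundEvent:
    """One strongly annotated event: a class label plus onset and offset times.

    Hypothetical structure for illustration, not the dataset's real format.
    """
    label: str
    onset: float   # seconds from the start of the recording
    offset: float  # seconds from the start of the recording


def overlaps(a: SoundEvent, b: SoundEvent) -> bool:
    """Return True when the two events share any stretch of time."""
    return a.onset < b.offset and b.onset < a.offset


# Made-up events using class names from the list above.
events = [
    SoundEvent("water", 1.0, 6.5),
    SoundEvent("metal / ceramic collision", 4.2, 4.9),
    SoundEvent("open / close", 8.0, 9.1),
]

# Collect every pair of events whose time spans intersect.
overlapping_pairs = [
    (a.label, b.label)
    for i, a in enumerate(events)
    for b in events[i + 1:]
    if overlaps(a, b)
]

# Cross-check of the reported average, using the counts from the table.
INSTANCE_COUNT = 117553
CLASS_COUNT = 44
average_per_class = round(INSTANCE_COUNT / CLASS_COUNT)

print(overlapping_pairs)
print(average_per_class)
```

Because the class distribution is not balanced, the average of 2672 instances per class should not be read as a typical per-class count.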
Cross-validation setup |
Label |
Value |
Description |
|
|
Provided |
Yes
|
|
|
|
Sets |
Train
Val
Test
|
Set types provided in the split, possible values: Train | Test | Val | Dev | Eval |
Baseline |
Label |
Value |
Description |
|
|
Download |
Download
|
Link to baseline system source code |
|
|
Citation |
[Huh2023]
|
Paper to cite for the baseline |