General information | |||||
| Label | Value | Description | |||
|---|---|---|---|---|---|
| Name | Dataset for Environmental Sound Classification, unlabeled dataset | Full dataset name | |||
| ID | sounds/esc_us | Datalist id for external indexing | |||
| Abbreviation | ESC-US | Official dataset abbreviation, e.g. one used in the original paper | |||
| Provider | ESC | ||||
| Year | 2015 | Dataset release year | |||
| Modalities | Audio | Data modalities included in the dataset | |||
| Collection name | ESC dataset | Common name for all related datasets, used to group datasets coming from same source | |||
| Research domain | Unlabelled | Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator | |||
| Related datasets name | |||||
| License | Creative Commons, CC BY-NC 3.0 | ||||
| Download | Download | ||||
| Citation | [Piczak2015] ESC: Dataset for Environmental Sound Classification | ||||
Audio | |||||
| Label | Value | Description | |||
Data | |||||
| Data type | Audio | Possible values: Audio | Features | |||
File format | |||||
| File format type | Constant | Possible values: Constant | Variable | |||
| File format | ogg | Possible values: wav | aiff | flac | mp3 | aac | ogg | |||
| Lossy compression | Yes | Whether the audio is compressed in a lossy manner | |||
| Bit depth | 16 | Bit depth of audio, possible values: 8 | 16 | 24 | 32 | |||
| Sampling rate (kHz) | 44.1 kHz | Sampling rate in kHz, possible values: 8 | 16 | 22.05 | 32 | 44.1 | 48 | |||
Channels | |||||
| Setup | Mono | Possible values: Mono | Stereo | Binaural | Ambisonic | Array | Multi-Channel | Variable | |||
| Number of channels | 1 | ||||
Material | |||||
| Source | Freesound | Possible values: Original | Youtube | Freesound | Online | Crowdsourced | [Dataset name] | |||
Content | |||||
| Content type | Freefield | Possible values: Freefield | Synthetic | Isolated | |||
Recording | |||||
| Setup | Unknown | Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown | |||
| Spot type | Unknown | Possible values: Fixed | Moving | Unknown | |||
Files | |||||
| Count | 250000 files | Total number of files | |||
| Total duration (minutes) | 20833.3 min | Total duration of the dataset in minutes | |||
| File length | Constant | Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable | |||
| File length (seconds) | 5 sec | Approximate length of files | |||
Meta | |||||
| Label | Value | Description | |||
| Types | Event | List of meta data types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc. | |||
Event | |||||
Annotation | |||||
| Labelled amount (%) | 0 % | Percentage of the data that is labelled | |||
Instance | |||||
| Count | 250000 | Count of all event instances in the dataset | |||
Info | |||||
| Label | Value | Description | |||
| Comments | Unlabeled dataset, suitable for unsupervised pre-training | ||||
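
The total duration listed under Files follows from the file count and the constant file length. A minimal sketch of that arithmetic, assuming every clip is exactly 5 seconds long (the table describes the length as approximate); the function and constant names are illustrative and not part of the dataset:

```python
# Values copied from the Files section of the table above.
FILE_COUNT = 250_000   # total number of files
FILE_LENGTH_S = 5      # assumed exact clip length in seconds


def total_duration_minutes(file_count: int, file_length_s: float) -> float:
    """Return the total duration in minutes for equal-length clips."""
    return file_count * file_length_s / 60


total_min = total_duration_minutes(FILE_COUNT, FILE_LENGTH_S)
print(f"{total_min:.1f} min")         # 20833.3 min
print(f"{total_min / 60:.1f} hours")  # 347.2 hours
```

Under this assumption the result matches the 20833.3 min stated in the table, which is roughly 347 hours of audio.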