General information | |||||
| Label | Value | Description | |||
|---|---|---|---|---|---|
| Name | VGGSound | Full dataset name | |||
| ID | sounds/vgg_sound | Datalist id for external indexing | |||
| Abbreviation | VGGSound | Official dataset abbreviation, e.g. one used in the original paper | |||
| Provider | VGG | ||||
| Year | 2020 | Dataset release year | |||
| Modalities | Audio, Video | Data modalities included in the dataset | |||
| Collection name | VGG | Common name for all related datasets, used to group datasets from the same source | |||
| Research domain | Tagging, Weak annotation, Audio-visual | Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator | |||
| License | Creative Commons, YouTube | ||||
| Download | Download (None) | ||||
| Companion site | Site | Link to the companion site for the dataset | |||
| Citation | [Chen2020] VGGSound: A Large-scale Audio-Visual Dataset | ||||
Audio | |||||
| Label | Value | Description | |||
Data | |||||
| Data type | Audio | Possible values: Audio | Features | |||
File format | |||||
| File format type | Constant | Possible values: Constant | Variable | |||
Channels | |||||
Material | |||||
| Source | YouTube | Possible values: Original | YouTube | Freesound | Online | Crowdsourced | [Dataset name] | |||
Content | |||||
| Content type | Freefield | Possible values: Freefield | Synthetic | Isolated | |||
Recording | |||||
| Setup | Unknown | Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown | |||
| Spot type | Unknown | Possible values: Fixed | Moving | Unknown | |||
Files | |||||
| Count | 199176 files | Total number of files | |||
| Total duration (minutes) | 33196 min | Total duration of the dataset in minutes | |||
| File length | Constant | Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable | |||
| File length (seconds) | 10 sec | Approximate length of files | |||
Meta | |||||
| Label | Value | Description | |||
| Types | Tag | List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc. | |||
Scene | |||||
Event | |||||
| Classes | 310 | Number of event classes | |||
| Classes | False | Possible values: True | False | Almost | |||
Annotation | |||||
| Type | Weak | Possible values: Strong | Weak | Location | None | |||
| Source | Experts | Possible values: Experts | Crowdsourced | Synthetic | Metadata | Automatic | |||
| Annotations per item | 1 | Number of annotations available per item (relevant for multi-annotator setups) | |||
| Labelled amount (%) | 100 % | Percentage of all data that is labelled | |||
| Validated amount (%) | 0 % | Percentage of all data that has been validated by a human | |||
| Strong annotations amount (%) | 0 % | Percentage of all data that has strong annotations | |||
| Overlapping event instances | No | ||||
Labeling | |||||
| Hierarchical | No | ||||
Instance | |||||
| Count | 199176 | Count of all event instances in the dataset | |||
| Average instances per class | 642.503225806 | Average per class instance count | |||
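
The derived figures in the tables above (total duration and average instances per class) follow from the file count, the file length, and the class count. The short Python sketch below recomputes them as a consistency check. It assumes every file is exactly 10 seconds long, whereas the table lists the length as approximate, and the function and variable names are illustrative only.

```python
# Figures copied from the tables above.
FILE_COUNT = 199176          # total number of files (one event instance each)
FILE_LENGTH_SECONDS = 10     # assumed exact here; the table says "approximate"
CLASS_COUNT = 310            # number of event classes


def total_duration_minutes(file_count: int, file_length_seconds: float) -> float:
    """Total duration in minutes, assuming all files share one length."""
    return file_count * file_length_seconds / 60


def average_instances_per_class(instance_count: int, class_count: int) -> float:
    """Mean number of instances per class."""
    return instance_count / class_count


total_minutes = total_duration_minutes(FILE_COUNT, FILE_LENGTH_SECONDS)
avg_per_class = average_instances_per_class(FILE_COUNT, CLASS_COUNT)

print(f"Total duration: {total_minutes:.0f} min")
print(f"Average instances per class: {avg_per_class:.9f}")
```

Under that assumption, both results agree with the values listed in the tables.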
Cross-validation setup | |||||
| Label | Value | Description | |||
| Provided | Yes | ||||
| Folds | 1 | ||||
| Sets | Train, Test | Set types provided in the split, possible values: Train | Test | Val | Dev | Eval | |||
Baseline | |||||
| Label | Value | Description | |||
| Download | Download | Link to baseline system source code | |||
| Citation | [Chen2020] | Paper to cite for the baseline | |||