| General information | 
    
    | Label | Value | Description | 
    
        
    
    |  | Name | EPIC-SOUNDS | Full dataset name | 
    
    
        
    
    |  | ID | sounds/epic_sounds | Datalist id for external indexing | 
    
    
        
    
    |  | Abbreviation | EPIC-SOUNDS | Official dataset abbreviation, e.g. one used in the original paper | 
    
    
        
    
    |  | Provider | University of Oxford, University of Bristol |  | 
    
    
        
    
    |  | Year | 2023 | Dataset release year | 
    
    
        
    
    |  | Modalities | Audio, Video | Data modalities included in the dataset | 
    
    
        
    
    |  | Collection name | EPIC-KITCHENS | Common name for all related datasets, used to group datasets coming from same source | 
    
    
        
    
    |  | Research domain | SED, Strong annotation | Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator | 
    
    
        
    
    |  | Related datasets name |  |  | 
    
    
        
    
    |  | License | Creative Commons, CC BY-NC 4.0 |  | 
    
    
        
    
    |  | Download | Download |  | 
    
    
        
    
    |  | Companion site | Site | Link to the companion site for the dataset | 
    
    
        
    
    |  | Citation | [Huh2023] Epic-Sounds: A Large-scale Dataset of Actions That Sound |  | 
    
                    
    
| Audio | 
    
    | Label | Value | Description | 
    
        
    
    |  | Data | 
        
            
    
    |  |  | Data type | Audio | Possible values: Audio | Features | 
        
        
            
    
    |  |  | File format | 
            
                
    
    |  |  |  | File format type | Constant | Possible values: Constant | Variable | 
            
            
            
                
    
    |  |  |  | Lossy compression | No | Whether the audio is compressed in a lossy manner | 
            
            
                
    
    |  |  |  | Bit depth | 16 | Bit depth of audio, possible values: 8 | 16 | 24 | 32 | 
            
            
                
    
    |  |  |  | Sampling rate (kHz) | 24 kHz | Sampling rate in kHz, possible values: 8 | 16 | 22.05 | 32 | 44.1 | 48 | 
            
        
        
            
    
    |  |  | Channels | 
            
                
    
    |  |  |  | Setup | Mono | Possible values: Mono | Stereo | Binaural | Ambisonic | Array | Multi-Channel | Variable | 
            
            
                
    
    |  |  |  | Number of channels | 1 |  | 
            
        
        
            
    
    |  |  | Material | 
            
                
    
    |  |  |  | Source | EPIC-KITCHENS-100 | Possible values: Original | Youtube | Freesound | Online | Crowdsourced | [Dataset name] | 
            
            
        
    
    
        
    
    |  | Content | 
        
            
    
    |  |  | Content type | Freefield | Possible values: Freefield | Synthetic | Isolated | 
        
        
            
    
    |  |  | Scene | Variable | Whether the scene class is constant within a single recording, possible values: Constant | Variable | 
        
        
            
                
    
    |  |  | Event / Spatial location | Moving | Possible values: Constant | Moving | Unknown | 
            
            
        
    
    
        
    
    |  | Recording | 
        
            
    
    |  |  | Setup | Uncontrolled | Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown | 
        
        
            
    
    |  |  | Setup count | 1 | Number of different recording setups (microphone and recording device) used | 
        
        
            
    
    |  |  | Spot type | Unknown | Possible values: Fixed | Moving | Unknown | 
        
    
    
        
    
    |  | Files | 
        
            
    
    |  |  | Count | 700 files | Total number of files | 
        
        
            
    
    |  |  | Total duration (minutes) | 6002 min | Total duration of the dataset in minutes | 
        
        
            
    
    |  |  | File length | Variable | Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable | 
        
        
            
    
    |  |  | File length (seconds) | 10-3708 sec | Approximate length of files | 
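
The file count and total duration above imply an overall size in hours and a mean file length. The short sketch below derives both, using only the values listed in this table; the variable names are illustrative and not part of the dataset.

```python
# Values copied from the "Files" rows of the table above.
FILE_COUNT = 700        # total number of files
TOTAL_MINUTES = 6002    # total duration in minutes

# Derived figures (rounded only for display).
total_hours = TOTAL_MINUTES / 60
mean_file_seconds = TOTAL_MINUTES * 60 / FILE_COUNT

print(f"total duration: {total_hours:.1f} h")
print(f"mean file length: {mean_file_seconds:.0f} s")
```

The mean of roughly 514 s sits well inside the listed 10-3708 sec range, which is consistent with the "Variable" file length characterization.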
        
    
                    
    
| Meta | 
    
    | Label | Value | Description | 
    
        
    
    |  | Types | Event, Timestamp | List of metadata types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc. | 
    
    
    
        
    
    |  | Event | 
        
            
    
    |  |  | Classes | 44 | Number of event classes | 
        
        
            
    
    |  |  | Classes balanced | False | Possible values: True | False | Almost | 
        
        
            
    
    |  |  | Classes | background, beep, ceramic / glass collision, ceramic / marble collision, ceramic / wood collision, ceramic-only collision, click, cloth-only collision, cut / chop, drink / eat, footstep, glass / marble collision, glass-only collision, hoover / fan, human, kettle / mixer / appliance, kneading, metal / ceramic collision, metal / cloth collision, metal / glass collision, metal / marble collision, metal / paper collision, metal / plastic collision, metal / wood collision, metal-only collision, open / close, paper-only collision, plastic / ceramic collision, plastic / glass collision, plastic / marble collision, plastic / paper collision, plastic / wood collision, plastic-only collision, pour, rustle, scrub / scrape / scour / wipe, sizzling / boiling, slide object, spray, stir / mix / whisk, water, wood / glass collision, wood-only collision, zip |  | 
        
        
            
    
    |  |  | Annotation | 
            
                
    
    |  |  |  | Type | Strong | Possible values: Strong | Weak | Location | None | 
            
            
            
                
    
    |  |  |  | Annotations per item | 1 | How many annotations there are available per item (possible multi-annotator setup) | 
            
            
                
    
    |  |  |  | Labelled amount (%) | 66.7 % | Percentage of all data that is labelled | 
            
            
                
    
    |  |  |  | Validated amount (%) | 66.7 % | Percentage of all data that has been validated by a human | 
            
            
                
    
    |  |  |  | Strong annotations amount (%) | 66.7 % | Percentage of all data that has strong annotations | 
            
            
                
    
    |  |  |  | Overlapping event instances | Yes |  | 
            
        
        
            
    
    |  |  | Labeling | 
            
                
    
    |  |  |  | Hierarchical | No |  | 
            
            
                
    
    |  |  |  | Ontology name | Yes |  | 
            
        
        
            
    
    |  |  | Instance | 
            
                
    
    |  |  |  | Count | 117553 | Count of all event instances in the dataset | 
            
            
                
    
    |  |  |  | Average instances per class | 2672 | Average per class instance count | 
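
Strong annotations mark each event with a start and stop time, and the table notes that event instances may overlap. The sketch below illustrates one possible in-memory representation and an overlap test. The `Event` type, the `overlaps` helper, and the timestamps are hypothetical and not taken from the dataset files; only the class labels and the two counts come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One strongly annotated event (hypothetical structure)."""
    label: str
    start: float  # onset in seconds
    stop: float   # offset in seconds


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two time intervals share any duration."""
    return a.start < b.stop and b.start < a.stop


# Made-up timestamps; labels are taken from the class list above.
events = [
    Event("water", 2.0, 9.5),
    Event("metal / ceramic collision", 4.1, 4.6),
    Event("open / close", 12.0, 13.2),
]

# Collect every pair of events that overlap in time.
overlapping_pairs = [
    (a.label, b.label)
    for i, a in enumerate(events)
    for b in events[i + 1:]
    if overlaps(a, b)
]
print(overlapping_pairs)

# Consistency check of the "Average instances per class" row.
INSTANCE_COUNT = 117553
CLASS_COUNT = 44
average_per_class = round(INSTANCE_COUNT / CLASS_COUNT)
print(average_per_class)
```

Here only the collision falls inside the longer "water" event, so a single overlapping pair is reported, and the rounded average matches the 2672 listed in the table.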
            
        
    
    
    
                    
    
| Cross-validation setup | 
    
    | Label | Value | Description | 
    
        
    
    |  |  | Provided | Yes |  | 
    
    
    
        
    
    |  |  | Sets | Train, Val, Test | Set types provided in the split, possible values: Train | Test | Val | Dev | Eval | 
    
                    
    
| Baseline | 
    
    | Label | Value | Description | 
    
        
    
    |  |  | Download | Download | Link to baseline system source code | 
    
    
        
    
    |  |  | Citation | [Huh2023] | Paper to cite for the baseline |