| General information | 
    
    | Label | Value | Description | 
    
        
    
    |  | Name | SONYC Urban Sound Tagging | Full dataset name | 
    
    
        
    
    |  | ID | sounds/sonyc | Datalist id for external indexing | 
    
    
        
    
    |  | Abbreviation | SONYC-UST | Official dataset abbreviation, e.g. one used in the original paper | 
    
    
        
    
    |  | Provider | NYU |  | 
    
    
        
    
    |  | Year | 2020 | Dataset release year | 
    
    
        
    
    |  | Modalities | Audio | Data modalities included in the dataset | 
    
    
        
    
    |  | Collection name | SONYC | Common name for all related datasets, used to group datasets from the same source | 
    
    
        
    
    |  | Research domain | Tagging, Weak annotation, Urban | Related domains, e.g., Scenes, Mobile devices, Audio-visual, Open set, Ambient noise, Unlabelled, Multiple sensors, SED, SELD, Tagging, FL, Strong annotation, Weak annotation, Multi-annotator | 
    
    
    
        
    
    |  | License | Creative Commons, CC BY 4.0 |  | 
    
    
        
    
    |  | Download | Download (4.45 GB) |  | 
    
    
    
        
    
    |  | Citation | [Bello2019] SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution. |  | 
    
                    
    
| Audio | 
    
    | Label | Value | Description | 
    
        
    
    |  | Data | 
        
            
    
    |  |  | Data type | Audio | Possible values: Audio | Features | 
        
        
            
    
    |  |  | File format | 
            
                
    
    |  |  |  | File format type | Constant | Possible values: Constant | Variable | 
            
            
                
    
    |  |  |  | File format | wav | Possible values: wav | aiff | flac | mp3 | aac | ogg | 
            
            
                
    
    |  |  |  | Lossy compression | No | Whether the audio is compressed in a lossy manner | 
            
            
                
    
    |  |  |  | Bit depth | 16 | Bit depth of audio in bits, possible values: 8 | 16 | 24 | 32 | 
            
            
                
    
    |  |  |  | Sampling rate (kHz) | 48 kHz | Sampling rate in kHz, possible values: 8 | 16 | 22.05 | 32 | 44.1 | 48 | 
            
        
        
            
    
    |  |  | Channels | 
            
                
    
    |  |  |  | Setup | Mono | Possible values: Mono | Stereo | Binaural | Ambisonic | Array | Multi-Channel | Variable | 
            
            
                
    
    |  |  |  | Number of channels | 1 |  | 
            
        
        
            
    
    |  |  | Material | 
            
                
    
    |  |  |  | Source | Original | Possible values: Original | Youtube | Freesound | Online | Crowdsourced | [Dataset name] | 
            
            
        
    
    
        
    
    |  | Content | 
        
            
    
    |  |  | Content type | Freefield | Possible values: Freefield | Synthetic | Isolated | 
        
        
            
    
    |  |  | Scene | Constant | Whether the scene class is constant within a single recording, possible values: Constant | Variable | 
        
        
            
                
    
    |  |  | Event / Spatial location | Unknown | Possible values: Constant | Moving | Unknown | 
            
            
        
    
    
        
    
    |  | Recording | 
        
            
    
    |  |  | Setup | Uncontrolled | Possible values: Near-field | Far-field | Mixed | Uncontrolled | Unknown | 
        
        
            
    
    |  |  | Setup count | 1 | Number of different recording setups (microphone and recording device) used | 
        
        
            
    
    |  |  | Spot type | Fixed | Possible values: Fixed | Moving | Unknown | 
        
    
    
        
    
    |  | Files | 
        
            
    
    |  |  | Count | 18510 files | Total number of files | 
        
        
            
    
    |  |  | Total duration (minutes) | 3085 min | Total duration of the dataset in minutes | 
        
        
            
    
    |  |  | File length | Constant | Characterization of the file lengths, possible values: Constant | Quasi-constant | Variable | 
        
        
            
    
    |  |  | File length (seconds) | 10 sec | Approximate length of files | 
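
The audio properties listed above (wav, mono, 16-bit, 48 kHz, about 10 seconds per file) can be checked against a downloaded file. The following is a minimal sketch using only Python's standard `wave` module. The names `EXPECTED`, `describe_wav`, and `format_mismatches`, and the 0.5 second duration tolerance, are illustrative assumptions and not part of the dataset or its tooling.

```python
import wave

# Expected values copied from the Audio table above.
EXPECTED = {
    "channels": 1,
    "bit_depth": 16,
    "sample_rate_hz": 48000,
    "duration_s": 10.0,
}


def describe_wav(source):
    """Read basic format properties from a WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def format_mismatches(actual, expected=EXPECTED, duration_tolerance_s=0.5):
    """Return {field: (actual, expected)} for each field that differs.

    The duration is compared with a tolerance (an assumed value), because
    the table only gives an approximate file length.
    """
    mismatches = {}
    for key, want in expected.items():
        got = actual[key]
        if key == "duration_s":
            ok = abs(got - want) <= duration_tolerance_s
        else:
            ok = got == want
        if not ok:
            mismatches[key] = (got, want)
    return mismatches
```

Calling `format_mismatches(describe_wav(path))` on a file returns an empty dictionary when the file matches the table, and otherwise lists each differing field with its actual and expected value.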
        
    
                    
    
| Meta | 
    
    | Label | Value | Description | 
    
        
    
    |  | Types | Tag, Annotator, Timestamp, Geolocation, Presence, Proximity | List of meta data types provided for the data, possible values: Event, Tag, Scene, Caption, Geolocation, Spatial location, Annotator, Timestamp, Presence, Proximity, etc. | 
    
    
    
        
    
    |  | Event | 
        
            
    
    |  |  | Classes | 23 | Number of event classes | 
        
        
        
            
    
    |  |  | Classes | alert signal, amplified speech, car alarm, car horn, chainsaw, dog, dog barking whining, engine, engine of uncertain size, hoe ram, human voice, ice-cream truck, jackhammer, large crowd, large rotating saw, large sounding engine, machinery impact, medium sounding engine, mobile music, music, music from uncertain source, non-machinery impact, other unknown alert signal, other unknown human voice, other unknown impact machinery, other unknown powered saw, person or small group shouting, person or small group talking, pile driver, powered saw, reverse beeper, rock drill, siren, small medium rotating saw, small sounding engine, stationary music |  | 
        
        
            
    
    |  |  | Annotation | 
            
                
    
    |  |  |  | Type | Weak | Possible values: Strong | Weak | Location | None | 
            
            
                
    
    |  |  |  | Source | Experts, Crowdsourced | Possible values: Experts | Crowdsourced | Synthetic | Metadata | Automatic | 
            
            
                
    
    |  |  |  | Annotations per item | 3 | Number of annotations available per item (possible multi-annotator setup) | 
            
            
                
    
    |  |  |  | Labelled amount (%) | 100 % | Percentage of all data that is labelled | 
            
            
            
                
    
    |  |  |  | Strong annotations amount (%) | 0 % | Percentage of all data that has strong annotations | 
            
            
                
    
    |  |  |  | Overlapping event instances | Yes |  | 
            
        
        
            
    
    |  |  | Labeling | 
            
                
    
    |  |  |  | Hierarchical | Yes |  | 
            
            
                
    
    |  |  |  | Ontology name | Yes |  | 
            
        
        
            
    
    |  |  | Instance | 
            
                
    
    |  |  |  | Count | 18510 | Count of all event instances in the dataset | 
            
            
                
    
    |  |  |  | Average instances per class | 804.78 | Average per class instance count | 
            
        
    
    
    
                    
    
| Cross-validation setup | 
    
    | Label | Value | Description | 
    
        
    
    |  |  | Provided | Yes |  | 
    
    
        
    
    |  |  | Folds | 3 |  | 
    
    
        
    
    |  |  | Sets | Train, Test, Val | Set types provided in the split, possible values: Train | Test | Val | Dev | Eval | 
    
                    
    
| Baseline | 
    
    | Label | Value | Description | 
    
        
    
    |  |  | Download | Download | Link to baseline system source code | 
    
    
                    
    
| Info | 
    
    | Label | Value | Description | 
    
        
    
    |  |  | Evaluation campaign | DCASE2019 task5, DCASE2020 task5 | Evaluation campaigns where the dataset was used. |
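
The derived figures in the tables above follow from the raw counts: 18510 files of about 10 seconds each give the listed total duration, and 18510 instances spread over 23 classes give the listed per-class average. The sketch below reproduces that arithmetic. The function names are illustrative assumptions, and the input values are copied from the Files, Event, and Instance rows.

```python
def total_duration_minutes(file_count, file_length_s):
    """Total duration in minutes for files of one constant length."""
    return file_count * file_length_s / 60


def average_per_class(instance_count, class_count):
    """Mean number of event instances per class."""
    return instance_count / class_count


# Values copied from the tables above.
FILE_COUNT = 18510
FILE_LENGTH_S = 10
INSTANCE_COUNT = 18510
CLASS_COUNT = 23

print(total_duration_minutes(FILE_COUNT, FILE_LENGTH_S))          # 3085.0
print(round(average_per_class(INSTANCE_COUNT, CLASS_COUNT), 2))   # 804.78
```

The instance count equals the file count, which suggests that one instance is counted per file for this weakly annotated dataset. That reading is an inference from the numbers and is not stated in the tables.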