TAU Urban Audio-Visual Scenes 2021, Development
Audio-Visual Scene Classification
Results table for "Audio-Visual Scene Classification" task with "TAU Urban Audio-Visual Scenes 2021, Development" dataset.
Publication | Year | ID | Title | Model identifier | Accuracy (%) | Log loss |
---|---|---|---|---|---|---|
Naranjo-Alcazar2021 | 2021 | c8fb7b05910bbbe89aa00c8b3b00ef2c | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Multi-Modal (Late Fusion), Gammatone | 90.0 | |
Naranjo-Alcazar2021 | 2021 | 86e05ed25150c29251245e6eda38b2dc | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Multi-Modal (Early Fusion), Gammatone | 89.2 | |
Naranjo-Alcazar2021 | 2021 | af239bc38de7f2dfea9c2481b1fce164 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Multi-Modal (Late Fusion), log-Mel | 88.7 | |
Naranjo-Alcazar2021 | 2021 | 3977cb2075a33f63981b7c7d50af363d | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Multi-Modal (Early Fusion), log-Mel | 88.5 | |
Naranjo-Alcazar2021 | 2021 | 005d74f476c41f13fd7b3cfe9d3e6748 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Visual-Only, log-Mel | 87.0 | |
Naranjo-Alcazar2021 | 2021 | 4e0a72eb73e91540742d968bcb9c4790 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Audio-Only, log-Mel | 68.4 | |
Naranjo-Alcazar2021 | 2021 | 70734aaf94506a2df347d4fd9fbe69e8 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Visual-Only, Gammatone | 87.0 | |
Naranjo-Alcazar2021 | 2021 | 549060d6b669175ae0d87b6b69039d10 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification | Audio-Only, Gammatone | 69.0 | |
Okazaki2021 | 2021 | 069fa08bfe899e3354f42eed0cfd5f67 | A Multi-Modal Fusion Approach for Audio-Visual Scene Classification Enhanced by CLIP Variants | E02 | 96.1 | 0.149 |
Okazaki2021 | 2021 | 82a18b92c57e21f04fe642d3a0d460e2 | A Multi-Modal Fusion Approach for Audio-Visual Scene Classification Enhanced by CLIP Variants | E01 | 95.8 | 0.238 |
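
To compare systems across publications, the rows can be ranked by accuracy, with log loss used only where it is reported. The following is a minimal sketch in Python, assuming the values are transcribed by hand from the table; `Result`, `RESULTS`, `rank_by_accuracy`, and `best_logloss` are illustrative names and not part of any official tooling for this dataset.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    publication: str
    model: str
    accuracy: float           # accuracy in percent, as listed in the table
    logloss: Optional[float]  # None where the table reports no log loss


# Values transcribed from the results table, in table order.
RESULTS: List[Result] = [
    Result("Naranjo-Alcazar2021", "Multi-Modal (Late Fusion), Gammatone", 90.0, None),
    Result("Naranjo-Alcazar2021", "Multi-Modal (Early Fusion), Gammatone", 89.2, None),
    Result("Naranjo-Alcazar2021", "Multi-Modal (Late Fusion), log-Mel", 88.7, None),
    Result("Naranjo-Alcazar2021", "Multi-Modal (Early Fusion), log-Mel", 88.5, None),
    Result("Naranjo-Alcazar2021", "Visual-Only, log-Mel", 87.0, None),
    Result("Naranjo-Alcazar2021", "Audio-Only, log-Mel", 68.4, None),
    Result("Naranjo-Alcazar2021", "Visual-Only, Gammatone", 87.0, None),
    Result("Naranjo-Alcazar2021", "Audio-Only, Gammatone", 69.0, None),
    Result("Okazaki2021", "E02", 96.1, 0.149),
    Result("Okazaki2021", "E01", 95.8, 0.238),
]


def rank_by_accuracy(results: List[Result]) -> List[Result]:
    """Sort by accuracy, highest first. The sort is stable, so ties keep table order."""
    return sorted(results, key=lambda r: r.accuracy, reverse=True)


def best_logloss(results: List[Result]) -> Optional[Result]:
    """Return the entry with the lowest reported log loss, or None if none is reported."""
    reported = [r for r in results if r.logloss is not None]
    return min(reported, key=lambda r: r.logloss) if reported else None


if __name__ == "__main__":
    for rank, r in enumerate(rank_by_accuracy(RESULTS), start=1):
        logloss = "n/a" if r.logloss is None else f"{r.logloss:.3f}"
        print(f"{rank:2d}  {r.accuracy:5.1f}  {logloss:>5}  {r.publication}  {r.model}")
```

Entries without a reported log loss are kept in the accuracy ranking and skipped only in the log-loss comparison, so a missing value is never treated as zero.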