Datasets

Classes for dataset handling

Dataset

dcase_util.datasets.Dataset

This is the base class from which all specialized datasets inherit. The base class itself should not be used directly.

Usage examples:

    import dcase_util

    # Create the dataset object
    dataset = dcase_util.datasets.TUTAcousticScenes_2017_DevelopmentSet(data_path='data')
    # Initialize the dataset: this makes sure the dataset is downloaded,
    # packages are extracted, and the needed meta files are created
    dataset.initialize()
    # Show meta data
    dataset.meta.show()
    # Get all evaluation setup folds
    folds = dataset.folds()
    # Get training and testing material for the first fold
    train_data_fold1 = dataset.train(fold=folds[0])
    test_data_fold1 = dataset.test(fold=folds[0])

Dataset([name, storage_name, data_path, ...])

Dataset base class

Dataset.initialize()

Initialize the dataset: download and extract the packages, then prepare the dataset for use.

Dataset.download_packages(**kwargs)

Download dataset packages over the internet to the local path

Dataset.extract_packages(**kwargs)

Extract the dataset packages

Dataset.debug_packages([local_check, ...])

Debug remote packages associated with the dataset.

Dataset.prepare()

Prepare the dataset for use.

Dataset.process_meta_item(item[, absolute_path])

Process single meta data item

Dataset.check_filelist()

Generate a hash from the file list and check whether it matches the one saved in filelist.hash.
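
The idea behind such a check can be sketched with the standard library alone. This is an illustration only, not dcase_util's implementation: the function names, the use of MD5, and the choice to hash sorted relative paths are all assumptions made for the example.

```python
import hashlib
import os


def filelist_hash(root):
    """Return a hex digest computed over the sorted relative file paths under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "filelist.hash":
                continue  # the stored hash file must not influence the hash
            full_path = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full_path, root).replace(os.sep, "/"))

    digest = hashlib.md5()  # assumption: any stable hash function would do
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def check_filelist(root, hash_filename="filelist.hash"):
    """Return True if the stored hash exists and matches the current file list."""
    hash_path = os.path.join(root, hash_filename)
    if not os.path.isfile(hash_path):
        return False
    with open(hash_path, "r") as handle:
        stored = handle.read().strip()
    return stored == filelist_hash(root)
```

A mismatch signals that files were added or removed since the hash was written, which is a cheap way to decide whether a dataset needs to be prepared again.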

Dataset.show([mode, indent, show_meta])

Show dataset information.

Dataset.log([show_meta])

Log dataset information.

Dataset.load()

Load dataset meta data and cross-validation sets into the container.

Dataset.load_meta()

Load meta data into the container.

Dataset.load_crossvalidation_data()

Load cross-validation sets into the container.

Dataset.audio_files

Get all audio files in the dataset

Dataset.audio_file_count

Get number of audio files in dataset

Dataset.meta

Get meta data for dataset.

Dataset.meta_count

Number of meta data items.

Dataset.error_meta

Get audio error meta data for dataset.

Dataset.error_meta_count

Number of error meta data items.

Dataset.folds([mode])

List of fold ids

Dataset.fold_count

Number of folds in the evaluation setup.

Dataset.evaluation_setup_filename([...])

Evaluation setup filename generation.

Dataset.train([fold, absolute_paths])

List of training items.

Dataset.test([fold, absolute_paths])

List of testing items.

Dataset.eval([fold, absolute_paths])

List of evaluation items.

Dataset.train_files([fold, absolute_paths])

List of training files.

Dataset.test_files([fold, absolute_paths])

List of testing files.

Dataset.eval_files([fold, absolute_paths])

List of evaluation files.

Dataset.validation_split([fold, ...])

List of validation files.

Dataset.validation_files_dataset([fold])

List of validation files delivered by the dataset.

Dataset.validation_files_random([fold, ...])

List of validation files selected randomly from the training material.

Dataset.validation_files_balanced([fold, ...])

List of validation files selected randomly while maintaining data balance.
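
To make "maintaining data balance" concrete, here is a minimal sketch of a per-label random split. It is not the library's algorithm: the function name, the `validation_amount` and `seed` parameters, and the `(filename, label)` input format are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def balanced_validation_split(items, validation_amount=0.3, seed=0):
    """Split (filename, label) pairs into training and validation file lists.

    Files are sampled separately for each label, so every label contributes
    roughly the same fraction of its files to the validation set.
    """
    rng = random.Random(seed)

    # Group files by their label
    files_by_label = defaultdict(list)
    for filename, label in items:
        files_by_label[label].append(filename)

    training_files, validation_files = [], []
    for label in sorted(files_by_label):
        files = sorted(files_by_label[label])
        rng.shuffle(files)
        validation_count = int(round(len(files) * validation_amount))
        validation_files.extend(files[:validation_count])
        training_files.extend(files[validation_count:])

    return sorted(training_files), sorted(validation_files)
```

A purely random split over all files could leave a rare label with no validation files at all; sampling within each label avoids that.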

Dataset.scene_labels()

List of unique scene labels in the meta data.

Dataset.scene_label_count()

Number of unique scene labels in the meta data.

Dataset.event_labels(**kwargs)

List of unique event labels in the meta data.

Dataset.event_label_count(**kwargs)

Number of unique event labels in the meta data.

Dataset.tags()

List of unique audio tags in the meta data.

Dataset.tag_count()

Number of unique audio tags in the meta data.

Dataset.file_meta(filename)

Meta data for given file

Dataset.file_error_meta(filename)

Error meta data for given file

Dataset.file_features(filename)

Pre-calculated acoustic features for given file

Dataset.relative_to_absolute_path(path)

Converts relative path into absolute path.

Dataset.absolute_to_relative_path(path)

Converts absolute path into relative path.
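
The two path conversions are inverses of each other with respect to the dataset's local directory. A minimal sketch using `os.path`, where the `local_path` argument stands in for the dataset's storage directory (an assumption; the real methods take only `path`):

```python
import os


def relative_to_absolute_path(local_path, path):
    """Resolve a dataset-relative path against the dataset directory."""
    return os.path.abspath(os.path.join(local_path, path))


def absolute_to_relative_path(local_path, path):
    """Express an absolute path relative to the dataset directory."""
    return os.path.relpath(path, start=local_path)
```

Storing relative paths in meta files keeps a dataset portable between machines, while absolute paths are what file-reading code usually needs.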

Dataset.dataset_bytes()

Total download size of the dataset in bytes.

Dataset.dataset_size_string()

Total download size of the dataset as a string.

Dataset.dataset_size_on_disk()

Total size of the dataset currently stored locally.
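
The size helpers above boil down to summing byte counts and formatting them for display. The following is a hedged sketch of both steps; the function names, the binary units, and the one-decimal formatting are assumptions, and the library's own output format may differ.

```python
import os


def size_on_disk(path):
    """Sum the sizes of all files below path, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def size_string(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024.0:
            break
        size /= 1024.0
    else:
        # The loop ran out of units, so the value is now expressed in TB
        unit = "TB"

    if unit == "bytes":
        return "{:d} bytes".format(int(size))
    return "{:.1f} {}".format(size, unit)
```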

AcousticSceneDataset

dcase_util.datasets.AcousticSceneDataset

AcousticSceneDataset(*args, **kwargs)

Acoustic scene dataset base class

Specialized classes inherited from AcousticSceneDataset:

TUTAcousticScenes_2016_DevelopmentSet([...])

TUT Acoustic scenes 2016 development dataset

TUTAcousticScenes_2016_EvaluationSet([...])

TUT Acoustic scenes 2016 evaluation dataset

TUTAcousticScenes_2017_DevelopmentSet([...])

TUT Acoustic scenes 2017 development dataset

TUTAcousticScenes_2017_EvaluationSet([...])

TUT Acoustic scenes 2017 evaluation dataset

TUTAcousticScenes_2017_FeaturesSet([...])

TUT Acoustic scenes 2017 features dataset

TUTUrbanAcousticScenes_2018_DevelopmentSet([...])

TUT Urban Acoustic Scenes 2018 Development dataset

TUTUrbanAcousticScenes_2018_LeaderboardSet([...])

TUT Urban Acoustic Scenes 2018 Leaderboard dataset

TUTUrbanAcousticScenes_2018_EvaluationSet([...])

TUT Urban Acoustic Scenes 2018 Evaluation dataset

TUTUrbanAcousticScenes_2018_Mobile_DevelopmentSet([...])

TUT Urban Acoustic Scenes 2018 Mobile Development dataset

TUTUrbanAcousticScenes_2018_Mobile_LeaderboardSet([...])

TUT Urban Acoustic Scenes 2018 Mobile Leaderboard dataset

TUTUrbanAcousticScenes_2018_Mobile_EvaluationSet([...])

TUT Urban Acoustic Scenes 2018 Mobile Evaluation dataset

TAUUrbanAcousticScenes_2019_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2019 Development dataset

TAUUrbanAcousticScenes_2019_LeaderboardSet([...])

TAU Urban Acoustic Scenes 2019 Leaderboard dataset

TAUUrbanAcousticScenes_2019_EvaluationSet([...])

TAU Urban Acoustic Scenes 2019 Evaluation dataset

TAUUrbanAcousticScenes_2019_Mobile_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2019 Mobile Development dataset

TAUUrbanAcousticScenes_2019_Mobile_LeaderboardSet([...])

TAU Urban Acoustic Scenes 2019 Mobile Leaderboard dataset

TAUUrbanAcousticScenes_2019_Mobile_EvaluationSet([...])

TAU Urban Acoustic Scenes 2019 Mobile Evaluation dataset

TAUUrbanAcousticScenes_2019_Openset_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2019 Open set Development dataset

TAUUrbanAcousticScenes_2019_Openset_LeaderboardSet([...])

TAU Urban Acoustic Scenes 2019 Open set Leaderboard dataset

TAUUrbanAcousticScenes_2019_Openset_EvaluationSet([...])

TAU Urban Acoustic Scenes 2019 Open set Evaluation dataset

TAUUrbanAcousticScenes_2020_Mobile_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2020 Mobile Development dataset

TAUUrbanAcousticScenes_2020_Mobile_EvaluationSet([...])

TAU Urban Acoustic Scenes 2020 Mobile Evaluation dataset

TAUUrbanAcousticScenes_2020_3Class_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2020 3Class Development dataset

TAUUrbanAcousticScenes_2020_3Class_EvaluationSet([...])

TAU Urban Acoustic Scenes 2020 3Class Evaluation dataset

TAUUrbanAudioVisualScenes_2021_DevelopmentSet([...])

TAU Urban Audio-Visual Scenes 2021 Development dataset

TAUUrbanAudioVisualScenes_2021_EvaluationSet([...])

TAU Urban Audio-Visual Scenes 2021 Evaluation dataset

TAUUrbanAcousticScenes_2021_Mobile_EvaluationSet([...])

TAU Urban Acoustic Scenes 2021 Mobile Evaluation dataset

TAUUrbanAcousticScenes_2022_Mobile_DevelopmentSet([...])

TAU Urban Acoustic Scenes 2022 Mobile Development dataset

TAUUrbanAcousticScenes_2022_Mobile_EvaluationSet([...])

TAU Urban Acoustic Scenes 2022 Mobile Evaluation dataset

DCASE2018_Task5_DevelopmentSet([...])

Task 5, Monitoring of domestic activities based on multi-channel acoustics, development set

DCASE2018_Task5_EvaluationSet([...])

Task 5, Monitoring of domestic activities based on multi-channel acoustics, evaluation set

SoundEventDataset

dcase_util.datasets.SoundEventDataset

SoundEventDataset(*args, **kwargs)

Sound event dataset base class

SoundEventDataset.event_label_count([...])

Number of unique event labels in the meta data.

SoundEventDataset.event_labels([scene_label])

List of unique event labels in the meta data.
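
The optional `scene_label` argument suggests that event labels can be restricted to a single acoustic scene. A minimal sketch of that filtering over a plain list of dictionaries follows; the `scene_label` and `event_label` field names and the sorted return value are assumptions, and the real method operates on the dataset's meta data container rather than on a list.

```python
def event_labels(meta, scene_label=None):
    """Return the sorted unique event labels, optionally for one scene only."""
    labels = set()
    for item in meta:
        # Skip items from other scenes when a scene filter is given
        if scene_label is not None and item.get("scene_label") != scene_label:
            continue
        if item.get("event_label"):
            labels.add(item["event_label"])
    return sorted(labels)


def event_label_count(meta, scene_label=None):
    """Return the number of unique event labels."""
    return len(event_labels(meta, scene_label=scene_label))
```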

SoundEventDataset.train([fold, ...])

List of training items.

SoundEventDataset.test([fold, ...])

List of testing items.

Specialized classes inherited from SoundEventDataset:

TUTRareSoundEvents_2017_DevelopmentSet([...])

TUT Rare sound events 2017 development dataset

TUTRareSoundEvents_2017_EvaluationSet([...])

TUT Rare sound events 2017 evaluation dataset

TUTSoundEvents_2017_DevelopmentSet([...])

TUT Sound events 2017 development dataset

TUTSoundEvents_2017_EvaluationSet([...])

TUT Sound events 2017 evaluation dataset

TUTSoundEvents_2016_DevelopmentSet([...])

TUT Sound events 2016 development dataset

TUTSoundEvents_2016_EvaluationSet([...])

TUT Sound events 2016 evaluation dataset

TUT_SED_Synthetic_2016([storage_name, ...])

TUT SED Synthetic 2016

AudioTaggingDataset

dcase_util.datasets.AudioTaggingDataset

AudioTaggingDataset(*args, **kwargs)

Audio tag dataset base class

Specialized classes inherited from AudioTaggingDataset:

DCASE2017_Task4tagging_DevelopmentSet([...])

DCASE 2017 Large-scale weakly supervised sound event detection for smart cars

DCASE2017_Task4tagging_EvaluationSet([...])

DCASE 2017 Large-scale weakly supervised sound event detection for smart cars

CHiMEHome_DomesticAudioTag_DevelopmentSet([...])

CHiME-Home domestic audio tag development dataset

Helpers

dcase_util.datasets

Helper functions to access Dataset classes.

dataset_list([data_path, group, display])

List of datasets available

dataset_factory(dataset_class_name, **kwargs)

Factory to get the correct dataset class based on its name

dataset_exists(dataset_class_name)

Check whether a dataset class exists, based on its name
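
The name-based lookup behind such helpers can be sketched as a small registry built from subclasses. Everything below is a stand-in written for illustration: the `Dataset` class, the two example subclasses, and the `ValueError` for unknown names are hypothetical and do not reproduce dcase_util's code or error handling.

```python
class Dataset:
    """Stand-in base class (hypothetical, not dcase_util's Dataset)."""

    def __init__(self, data_path="data"):
        self.data_path = data_path


class ExampleScenes_DevelopmentSet(Dataset):
    """Hypothetical specialized dataset."""


class ExampleEvents_DevelopmentSet(Dataset):
    """Hypothetical specialized dataset."""


def _dataset_classes(base=Dataset):
    """Collect all direct and indirect subclasses of base, keyed by class name."""
    found = {}
    for cls in base.__subclasses__():
        found[cls.__name__] = cls
        found.update(_dataset_classes(cls))
    return found


def dataset_exists(dataset_class_name):
    """Return True if a dataset class with the given name is known."""
    return dataset_class_name in _dataset_classes()


def dataset_factory(dataset_class_name, **kwargs):
    """Instantiate the dataset class with the given name."""
    classes = _dataset_classes()
    if dataset_class_name not in classes:
        raise ValueError("Unknown dataset class: {}".format(dataset_class_name))
    return classes[dataset_class_name](**kwargs)
```

Selecting the class by name lets the dataset be chosen from a configuration file or a command-line argument without importing each class explicitly.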