dcase_util.datasets.SoundEventDataset

class dcase_util.datasets.SoundEventDataset(*args, **kwargs)

Sound event dataset base class

Constructor

Parameters
name: str

Dataset name. Default value: 'dataset'.

storage_name: str

Name to be used when storing the dataset on disk. Default value: 'dataset'.

data_path: str

Root path where the dataset is stored. If None, os.path.join(tempfile.gettempdir(), 'dcase_util_datasets') is used. Default value: None.

local_path: str

Direct storage path setup for the dataset. If None, data_path and storage_name are used to create one. Default value: None.

show_progress_in_console: bool

Show progress in console. Default value: True.

log_system_progress: bool

Show progress in log. Default value: True.

use_ascii_progress_bar: bool

Show progress bar using ASCII characters. Use this if your console does not support UTF-8 characters. Default value: True.

dataset_group: str

Dataset group label, one of ['scene', 'event', 'tag']. Default value: 'base class'.

dataset_meta: dict

Dictionary containing metadata about the dataset, e.g., collecting device information, dataset authors. Default value: None.

evaluation_setup_folder: str

Directory name where evaluation setup files are stored. Default value: 'evaluation_setup'.

evaluation_setup_file_extension: str

Setup file extension. Default value: 'txt'.

meta_filename: str

Filename to be used for the main meta file of the dataset (contains all files with their reference data). Default value: 'meta.txt'.

error_meta_filename: str

Filename for the error annotation file. Default value: 'error.txt'.

filelisthash_filename: str

Filename for the filelist hash file. Default value: 'filelist.python.hash'.

filelisthash_exclude_dirs: str

Directories to be excluded from the filelist hash calculation. Default value: None.

crossvalidation_folds: int

Number of cross-validation folds. Indexing starts from one. Default value: None.

package_list: list of dict

Package list, the remote files associated with the dataset. Item format:

    {
        'content_type': 'documentation',  # Possible values ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']
        'remote_file': 'https://zenodo.org/record/45759/files/TUT-sound-events-2016-development.doc.zip',  # URL
        'remote_bytes': 70918,  # Size of remote file in bytes
        'remote_md5': '33fd26a895530aef607a07b08704eacd',  # MD5 hash of remote file
        'filename': 'TUT-sound-events-2016-development.doc.zip',  # Filename, always relative to self.local_path
    }

Default value: None.

package_extract_parameters: dict

Extra parameters for package extraction. Default value: None.

included_content_types: list of str or str

Indicates which content types should be processed. One or more of ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']. If None is given, ['all'] is used. The parameter can also be a comma-separated string. Default value: None.

audio_paths: list of str

List of paths to include audio material associated to the dataset. If None is given, ['audio'] is used. Default value: None.

default_audio_extension: str

Default audio extension. Default value: 'wav'.

reference_data_present: bool

Reference data is delivered with the dataset. Default value: True.

check_meta: bool

Check meta data during the initialization. Default value: True.

active_scenes: list of str

List of active scene classes. If None is given, all classes are considered. Default value: None.

active_events: list of str

List of active event classes. If None is given, all classes are considered. Default value: None.
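
The defaulting rules described for local_path and included_content_types can be sketched with the standard library alone. This is an illustrative sketch, not the dcase_util implementation: the helper names resolve_local_path and normalize_content_types are hypothetical, and joining data_path and storage_name into a single directory is an assumption about how the two are combined.

```python
import os
import tempfile


def resolve_local_path(data_path=None, storage_name='dataset', local_path=None):
    """Hypothetical helper: pick the directory where the dataset is stored."""
    if local_path is not None:
        # A direct path takes precedence over data_path and storage_name.
        return local_path
    if data_path is None:
        # Documented fallback when no root path is given.
        data_path = os.path.join(tempfile.gettempdir(), 'dcase_util_datasets')
    # Assumption: the dataset lives in <data_path>/<storage_name>.
    return os.path.join(data_path, storage_name)


def normalize_content_types(included_content_types=None):
    """Hypothetical helper: accept None, a comma-separated string, or a list."""
    if included_content_types is None:
        return ['all']
    if isinstance(included_content_types, str):
        parts = included_content_types.split(',')
        return [part.strip() for part in parts if part.strip()]
    return list(included_content_types)


print(resolve_local_path(storage_name='my_dataset'))
print(normalize_content_types('audio, meta'))
```

Passing local_path bypasses the other two path parameters, which is convenient when a dataset has already been unpacked to a fixed location.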

__init__(*args, **kwargs)

Constructor

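
Each package_list item carries the expected size (remote_bytes) and MD5 hash (remote_md5) of the remote file, so a downloaded file can be checked against its entry. The following is a minimal sketch of such a check, not the routine dcase_util itself uses: verify_package is a hypothetical helper, and the file, URL, and hash in the demonstration are stand-ins rather than a real dataset package.

```python
import hashlib
import os
import tempfile


def verify_package(package, local_path):
    """Hypothetical helper: check a downloaded file against its package entry."""
    file_path = os.path.join(local_path, package['filename'])
    if not os.path.isfile(file_path):
        return False
    # Cheap check first: the size must match remote_bytes.
    if os.path.getsize(file_path) != package['remote_bytes']:
        return False
    # Hash the file in chunks so large archives do not need to fit in memory.
    md5 = hashlib.md5()
    with open(file_path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b''):
            md5.update(chunk)
    return md5.hexdigest() == package['remote_md5']


# Demonstration with a small stand-in file.
with tempfile.TemporaryDirectory() as local_path:
    payload = b'example package payload'
    with open(os.path.join(local_path, 'example.zip'), 'wb') as handle:
        handle.write(payload)

    package = {
        'content_type': 'documentation',
        'remote_file': 'https://example.com/example.zip',  # placeholder URL
        'remote_bytes': len(payload),
        'remote_md5': hashlib.md5(payload).hexdigest(),
        'filename': 'example.zip',
    }
    ok = verify_package(package, local_path)
    corrupted = verify_package(dict(package, remote_md5='0' * 32), local_path)

print(ok, corrupted)
```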

Methods

__init__(*args, **kwargs)

Constructor

absolute_to_relative_path(path)

Converts absolute path into relative path.

check_filelist()

Generates a hash from the file list and checks whether it matches the one saved in filelist.hash.

check_metadata()

Check meta data and cross-validation setup.

dataset_bytes()

Total download size of the dataset in bytes.

dataset_size_on_disk()

Total size of the dataset currently stored locally.

dataset_size_string()

Total download size of the dataset as a string.

debug_packages([local_check, remote_check])

Debug remote packages associated with the dataset.

download_packages(**kwargs)

Download dataset packages over the internet to the local path

eval([fold, absolute_paths, scene_label, ...])

List of evaluation items.

eval_files([fold, absolute_paths, ...])

List of evaluation files.

evaluation_setup_filename([setup_part, ...])

Evaluation setup filename generation.

event_label_count([scene_label])

Number of unique event labels in the meta data.

event_labels([scene_label])

List of unique event labels in the meta data.

extract_packages(**kwargs)

Extract the dataset packages

file_error_meta(filename)

Error meta data for given file

file_features(filename)

Pre-calculated acoustic features for given file

file_meta(filename)

Meta data for given file

folds([mode])

List of fold ids

initialize()

Initialize the dataset: download and extract files, and prepare the dataset for use.

load()

Load dataset meta data and cross-validation sets into the container.

load_crossvalidation_data()

Load cross-validation data into the container.

load_meta()

Load meta data into the container.

log([show_meta])

Log dataset information.

prepare()

Prepare the dataset for use.

process_meta_container(container)

Process meta container.

process_meta_item(item[, absolute_path])

Process single meta data item

relative_to_absolute_path(path)

Converts relative path into absolute path.

scene_label_count()

Number of unique scene labels in the meta data.

scene_labels()

List of unique scene labels in the meta data.

show([mode, indent, show_meta])

Show dataset information.

tag_count()

Number of unique audio tags in the meta data.

tags()

List of unique audio tags in the meta data.

test([fold, absolute_paths, scene_label, ...])

List of testing items.

test_files([fold, absolute_paths, ...])

List of testing files.

train([fold, absolute_paths, scene_label, ...])

List of training items.

train_files([fold, absolute_paths, ...])

List of training files.

validation_files_balanced([fold, ...])

List of validation files selected randomly while maintaining data balance.

validation_files_dataset([fold])

List of validation files delivered by the dataset.

validation_files_random([fold, ...])

List of validation files selected randomly from the training material.

validation_split([fold, training_meta, ...])

List of validation files.
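
The idea behind check_filelist(), hashing the current file list and comparing it with a stored hash, can be sketched as follows. This is an assumption-laden illustration rather than the dcase_util code: filelist_hash and check_filelist are hypothetical standalone functions, and hashing relative paths together with file sizes is an assumption about what goes into the hash.

```python
import hashlib
import os
import tempfile

HASH_FILENAME = 'filelist.python.hash'  # default name from the parameter list


def filelist_hash(root, exclude_dirs=None):
    """Hypothetical helper: hash relative paths and sizes of files under root."""
    exclude_dirs = set(exclude_dirs or [])
    digest = hashlib.md5()
    for current, dirs, files in os.walk(root):
        # Prune excluded directories in place and fix the traversal order.
        dirs[:] = sorted(d for d in dirs if d not in exclude_dirs)
        for name in sorted(files):
            if name == HASH_FILENAME:
                continue  # the stored hash must not influence itself
            path = os.path.join(current, name)
            relative = os.path.relpath(path, root)
            entry = '{}:{}\n'.format(relative, os.path.getsize(path))
            digest.update(entry.encode('utf-8'))
    return digest.hexdigest()


def check_filelist(root, exclude_dirs=None):
    """Hypothetical helper: True if the stored hash matches the current files."""
    saved_path = os.path.join(root, HASH_FILENAME)
    if not os.path.isfile(saved_path):
        return False
    with open(saved_path) as handle:
        saved = handle.read().strip()
    return saved == filelist_hash(root, exclude_dirs)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'audio'))
    with open(os.path.join(root, 'audio', 'a.wav'), 'wb') as handle:
        handle.write(b'\x00' * 16)
    current_hash = filelist_hash(root)
    with open(os.path.join(root, HASH_FILENAME), 'w') as handle:
        handle.write(current_hash)
    matches_before = check_filelist(root)

    # Adding a file changes the file list, so the stored hash no longer matches.
    with open(os.path.join(root, 'audio', 'b.wav'), 'wb') as handle:
        handle.write(b'\x00' * 8)
    matches_after = check_filelist(root)

print(matches_before, matches_after)
```

A mismatch of this kind is a cheap signal that files were added, removed, or resized since the dataset was last prepared, without reading any audio content.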

Attributes

audio_file_count

Get number of audio files in dataset

audio_files

Get all audio files in the dataset

error_meta

Get audio error meta data for dataset.

error_meta_count

Number of error meta data items.

fold_count

Number of folds in the evaluation setup.

logger

meta

Get meta data for dataset.

meta_count

Number of meta data items.
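
Because cross-validation fold indexing starts from one, a fold count of N corresponds to fold ids 1 through N. A small sketch of that relationship is given below. The helper fold_ids is hypothetical, and the 'all_data' label returned when no folds are defined is an assumption, not something stated in this reference.

```python
def fold_ids(crossvalidation_folds=None):
    """Hypothetical helper: list fold identifiers, counted from one."""
    if crossvalidation_folds is None:
        # Assumption: without folds, a single pseudo-fold covers all data.
        return ['all_data']
    return list(range(1, crossvalidation_folds + 1))


for fold in fold_ids(4):
    print('processing fold', fold)
```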