dcase_util.datasets.AudioTaggingDataset
- class dcase_util.datasets.AudioTaggingDataset(*args, **kwargs)
Audio tag dataset base class
Constructor
- Parameters
- name : str
Dataset name. Default value 'dataset'
- storage_name : str
Name to be used when storing the dataset on disk. Default value 'dataset'
- data_path : str
Root path where the dataset is stored. If None, os.path.join(tempfile.gettempdir(), 'dcase_util_datasets') is used. Default value None
- local_path : str
Direct storage path setup for the dataset. If None, data_path and storage_name are used to create one. Default value None
- show_progress_in_console : bool
Show progress in console. Default value True
- log_system_progress : bool
Show progress in log. Default value True
- use_ascii_progress_bar : bool
Show progress bar using ASCII characters. Use this if your console does not support UTF-8 characters. Default value True
- dataset_group : str
Dataset group label, one of ['scene', 'event', 'tag']. Default value 'base class'
- dataset_meta : dict
Dictionary containing metadata about the dataset, e.g., collecting device information, dataset authors. Default value None
- evaluation_setup_folder : str
Directory name where evaluation setup files are stored. Default value 'evaluation_setup'
- evaluation_setup_file_extension : str
Setup file extension. Default value 'txt'
- meta_filename : str
Filename to be used for the main meta file of the dataset (contains all files with their reference data). Default value 'meta.txt'
- error_meta_filename : str
Filename for the error annotation file. Default value 'error.txt'
- filelisthash_filename : str
Filename for the filelist hash file. Default value 'filelist.python.hash'
- filelisthash_exclude_dirs : str
Directories to be excluded from the filelist hash calculation. Default value None
- crossvalidation_folds : int
Count of cross-validation folds. Indexing starts from one. Default value None
- package_list : list of dict
Package list, remote files associated with the dataset. Item format:
    {
        'content_type': 'documentation',  # Possible values ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']
        'remote_file': 'https://zenodo.org/record/45759/files/TUT-sound-events-2016-development.doc.zip',  # URL
        'remote_bytes': 70918,  # Size of remote file in bytes
        'remote_md5': '33fd26a895530aef607a07b08704eacd',  # MD5 hash of remote file
        'filename': 'TUT-sound-events-2016-development.doc.zip',  # Filename relative to self.local_path always
    }
Default value None
- package_extract_parameters : dict
Extra parameters for package extraction. Default value None
- included_content_types : list of str or str
Indicates which content types should be processed. One or more of ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']. If None is given, ['all'] is used. The parameter can also be a comma-separated string. Default value None
- audio_paths : list of str
List of paths containing the audio material associated with the dataset. If None is given, ['audio'] is used. Default value None
- default_audio_extension : str
Default audio extension. Default value 'wav'
- reference_data_present : bool
Reference data is delivered with the dataset. Default value True
- check_meta : bool
Check meta data during initialization. Default value True
- active_scenes : list of str
List of active scene classes. If None is given, all classes are considered. Default value None
- active_events : list of str
List of active event classes. If None is given, all classes are considered. Default value None
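The fallback rules described for data_path, local_path and included_content_types can be sketched with the standard library alone. This is a minimal illustration of the documented defaults, not the library's implementation: resolve_local_path and normalize_content_types are hypothetical helper names, and joining data_path with storage_name through a plain os.path.join is an assumption.

```python
import os
import tempfile


def resolve_local_path(storage_name="dataset", data_path=None, local_path=None):
    """Hypothetical helper mirroring the documented fallback rules."""
    if local_path is not None:
        # An explicitly given local_path is used as-is.
        return local_path
    if data_path is None:
        # Documented fallback root when data_path is None.
        data_path = os.path.join(tempfile.gettempdir(), "dcase_util_datasets")
    # Assumption: data_path and storage_name are combined with a plain join.
    return os.path.join(data_path, storage_name)


def normalize_content_types(included_content_types=None):
    """Hypothetical helper: accept None, a list, or a comma-separated string."""
    if included_content_types is None:
        # Documented fallback when nothing is given.
        return ["all"]
    if isinstance(included_content_types, str):
        # Split the comma-separated form and drop surrounding whitespace.
        parts = included_content_types.split(",")
        return [part.strip() for part in parts if part.strip()]
    return list(included_content_types)


print(resolve_local_path(storage_name="my_tags"))
print(normalize_content_types("audio, meta"))
```

Keeping the two rules in separate functions makes each default easy to check in isolation before passing the values to the constructor.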
- __init__(*args, **kwargs)
Constructor
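The package_list item format pairs each remote file with a byte size and an MD5 hash. The sketch below shows one way such an item could be checked against a file stored under the local path. It is an illustration only: package_matches is a hypothetical helper, the demo URL is a placeholder that is never fetched, and whether the library performs its check this way is an assumption.

```python
import hashlib
import os
import tempfile


def package_matches(package, local_path):
    """Hypothetical check of a local file against remote_bytes and remote_md5."""
    filename = os.path.join(local_path, package["filename"])
    if not os.path.isfile(filename):
        return False
    if os.path.getsize(filename) != package["remote_bytes"]:
        return False
    md5 = hashlib.md5()
    with open(filename, "rb") as handle:
        # Read in 1 MiB chunks so large packages are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest() == package["remote_md5"]


# Demonstration with a throwaway file standing in for a downloaded package.
payload = b"example package content"
with tempfile.TemporaryDirectory() as local_path:
    with open(os.path.join(local_path, "demo.zip"), "wb") as handle:
        handle.write(payload)
    package = {
        "content_type": "documentation",
        "remote_file": "https://example.org/demo.zip",  # placeholder, never fetched
        "remote_bytes": len(payload),
        "remote_md5": hashlib.md5(payload).hexdigest(),
        "filename": "demo.zip",  # relative to local_path
    }
    demo_ok = package_matches(package, local_path)
    package["remote_md5"] = "0" * 32  # deliberately wrong hash
    demo_bad = package_matches(package, local_path)

print(demo_ok, demo_bad)
```

Checking the size before hashing avoids reading a file that is already known to be incomplete.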
Methods
__init__(*args, **kwargs)
Constructor
absolute_to_relative_path(path)
Converts absolute path into relative path.
check_filelist()
Generates a hash from the file list and checks whether it matches the one saved in filelist.hash.
check_metadata()
Checks meta data and cross-validation setup.
dataset_bytes()
Total download size of the dataset in bytes.
dataset_size_on_disk()
Total size of the dataset currently stored locally.
dataset_size_string()
Total download size of the dataset as a string.
debug_packages([local_check, remote_check])
Debug remote packages associated with the dataset.
download_packages(**kwargs)
Download dataset packages over the internet to the local path.
eval([fold, absolute_paths])
List of evaluation items.
eval_files([fold, absolute_paths])
List of evaluation files.
evaluation_setup_filename([setup_part, ...])
Evaluation setup filename generation.
event_label_count(**kwargs)
Number of unique event labels in the meta data.
event_labels(**kwargs)
List of unique event labels in the meta data.
extract_packages(**kwargs)
Extract the dataset packages.
file_error_meta(filename)
Error meta data for the given file.
file_features(filename)
Pre-calculated acoustic features for the given file.
file_meta(filename)
Meta data for the given file.
folds([mode])
List of fold ids.
initialize()
Initialize the dataset: download and extract files, and prepare the dataset for use.
load()
Load dataset meta data and cross-validation sets into the container.
load_crossvalidation_data()
Load cross-validation data into the container.
load_meta()
Load meta data into the container.
log([show_meta])
Log dataset information.
prepare()
Prepare the dataset for use.
process_meta_container(container)
Process meta container.
process_meta_item(item[, absolute_path])
Process a single meta data item.
relative_to_absolute_path(path)
Converts relative path into absolute path.
scene_label_count()
Number of unique scene labels in the meta data.
scene_labels()
List of unique scene labels in the meta data.
show([mode, indent, show_meta])
Show dataset information.
tag_count()
Number of unique audio tags in the meta data.
tags()
List of unique audio tags in the meta data.
test([fold, absolute_paths])
List of testing items.
test_files([fold, absolute_paths])
List of testing files.
train([fold, absolute_paths])
List of training items.
train_files([fold, absolute_paths])
List of training files.
validation_files_balanced([fold, ...])
List of validation files selected randomly while maintaining data balance.
validation_files_dataset([fold])
List of validation files delivered by the dataset.
validation_files_random([fold, ...])
List of validation files selected randomly from the training material.
validation_split([fold, training_meta, ...])
List of validation files.
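To make the tag-related and validation-related summaries above more concrete, the sketch below works on plain Python data instead of the library's containers. Everything in it is an assumption made for illustration: the meta item layout, the helper names unique_tags and random_validation_files, and the validation_amount and seed parameters. It shows the idea of "unique tags in the meta data" and "validation files selected randomly from the training material", not how the class implements them.

```python
import random

# Hypothetical meta data: one dict per audio file, each with a list of tags.
meta = [
    {"filename": "audio/a.wav", "tags": ["speech", "music"]},
    {"filename": "audio/b.wav", "tags": ["music"]},
    {"filename": "audio/c.wav", "tags": ["tv", "speech"]},
    {"filename": "audio/d.wav", "tags": []},
]


def unique_tags(items):
    """Sorted list of the unique tags found over all meta items."""
    return sorted({tag for item in items for tag in item["tags"]})


def random_validation_files(training_files, validation_amount=0.3, seed=0):
    """Pick a random share of the training files to hold out for validation."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    count = max(1, int(round(len(training_files) * validation_amount)))
    return sorted(rng.sample(training_files, count))


tags = unique_tags(meta)
tag_count = len(tags)
training_files = [item["filename"] for item in meta]
validation_files = random_validation_files(training_files)

print(tags, tag_count)
print(validation_files)
```

A balanced variant would additionally compare the tag distribution of the held-out files with that of the remaining training files before accepting a split.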
Attributes
audio_file_count
Get number of audio files in dataset
audio_files
Get all audio files in the dataset
error_meta
Get audio error meta data for dataset.
error_meta_count
Number of error meta data items.
fold_count
Number of folds in the evaluation setup.
logger
meta
Get meta data for dataset.
meta_count
Number of meta data items.
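As a rough illustration of what audio_files and audio_file_count summarize, the following sketch collects files with the default audio extension from the audio paths under a local dataset directory. The helper list_audio_files is hypothetical, and matching on a single extension in a flat directory is an assumption; the library's own lookup may differ.

```python
import os
import tempfile


def list_audio_files(local_path, audio_paths=None, extension="wav"):
    """Hypothetical helper: files with the given extension under each audio path."""
    if audio_paths is None:
        audio_paths = ["audio"]  # documented fallback for audio_paths
    found = []
    for audio_path in audio_paths:
        root = os.path.join(local_path, audio_path)
        if not os.path.isdir(root):
            continue  # skip audio paths that do not exist locally
        for name in os.listdir(root):
            if name.lower().endswith("." + extension):
                found.append(os.path.join(root, name))
    return sorted(found)


# Demonstration on a temporary directory standing in for the dataset's local path.
with tempfile.TemporaryDirectory() as local_path:
    os.makedirs(os.path.join(local_path, "audio"))
    for name in ("a.wav", "b.wav", "notes.txt"):
        with open(os.path.join(local_path, "audio", name), "w"):
            pass  # create an empty placeholder file
    audio_files = [os.path.basename(p) for p in list_audio_files(local_path)]
    audio_file_count = len(audio_files)

print(audio_files, audio_file_count)
```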