dcase_util.datasets.Dataset
- class dcase_util.datasets.Dataset(name='dataset', storage_name='dataset', data_path=None, local_path=None, show_progress_in_console=True, log_system_progress=True, use_ascii_progress_bar=True, dataset_group='base class', dataset_meta=None, evaluation_setup_folder='evaluation_setup', evaluation_setup_file_extension='txt', meta_filename='meta.txt', error_meta_filename='error.txt', filelisthash_filename='filelist.python.hash', filelisthash_exclude_dirs=None, crossvalidation_folds=None, package_list=None, package_extract_parameters=None, included_content_types=None, audio_paths=None, default_audio_extension='wav', reference_data_present=True, check_meta=True, active_scenes=None, active_events=None, **kwargs)
Dataset base class.
Specific dataset classes inherit from this class and reimplement only the methods they need.
Constructor
- Parameters
- name: str
Dataset name. Default value: 'dataset'
- storage_name: str
Name to be used when storing the dataset on disk. Default value: 'dataset'
- data_path: str
Root path where the dataset is stored. If None, os.path.join(tempfile.gettempdir(), 'dcase_util_datasets') is used. Default value: None
- local_path: str
Direct storage path setup for the dataset. If None, data_path and storage_name are used to create one. Default value: None
- show_progress_in_console: bool
Show progress in console. Default value: True
- log_system_progress: bool
Show progress in log. Default value: True
- use_ascii_progress_bar: bool
Show progress bar using ASCII characters. Use this if your console does not support UTF-8 characters. Default value: True
- dataset_group: str
Dataset group label, one of ['scene', 'event', 'tag']. Default value: 'base class'
- dataset_meta: dict
Dictionary containing metadata about the dataset, e.g., collecting device information, dataset authors. Default value: None
- evaluation_setup_folder: str
Directory name where evaluation setup files are stored. Default value: 'evaluation_setup'
- evaluation_setup_file_extension: str
Setup file extension. Default value: 'txt'
- meta_filename: str
Filename to be used for the main meta file of the dataset (contains all files with their reference data). Default value: 'meta.txt'
- error_meta_filename: str
Filename for the error annotation file. Default value: 'error.txt'
- filelisthash_filename: str
Filename for the filelist hash file. Default value: 'filelist.python.hash'
- filelisthash_exclude_dirs: str
Directories to be excluded from the filelist hash calculation. Default value: None
- crossvalidation_folds: int
Number of cross-validation folds. Indexing starts from one. Default value: None
- package_list: list of dict
Package list, remote files associated with the dataset. Item format:
    {
        'content_type': 'documentation',  # Possible values: ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']
        'remote_file': 'https://zenodo.org/record/45759/files/TUT-sound-events-2016-development.doc.zip',  # URL
        'remote_bytes': 70918,  # Size of remote file in bytes
        'remote_md5': '33fd26a895530aef607a07b08704eacd',  # MD5 hash of remote file
        'filename': 'TUT-sound-events-2016-development.doc.zip',  # Filename, always relative to self.local_path
    }
Default value: None
- package_extract_parameters: dict
Extra parameters for package extraction. Default value: None
- included_content_types: list of str or str
Indicates which content types should be processed. One or more of ['all', 'audio', 'video', 'features', 'meta', 'code', 'documentation', 'examples']. If None given, ['all'] is used. The parameter can also be a comma-separated string. Default value: None
- audio_paths: list of str
List of paths that contain audio material associated with the dataset. If None given, ['audio'] is used. Default value: None
- default_audio_extension: str
Default audio extension. Default value: 'wav'
- reference_data_present: bool
Reference data is delivered with the dataset. Default value: True
- check_meta: bool
Check meta data during initialization. Default value: True
- active_scenes: list of str
List of active scene classes. If None given, all classes are considered. Default value: None
- active_events: list of str
List of active event classes. If None given, all classes are considered. Default value: None
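The defaulting rules described for data_path, local_path and included_content_types can be illustrated with a small standard-library sketch. This is not the library's implementation: resolve_local_path and normalize_content_types are hypothetical helper names, and joining data_path with storage_name is an assumption about how the two are combined into a storage path.

```python
import os
import tempfile


def resolve_local_path(data_path=None, local_path=None, storage_name="dataset"):
    """Hypothetical helper: pick the storage directory following the documented rules."""
    # An explicit local_path takes precedence over everything else.
    if local_path is not None:
        return local_path
    # Documented fallback root when data_path is None.
    if data_path is None:
        data_path = os.path.join(tempfile.gettempdir(), "dcase_util_datasets")
    # Assumption: the storage path is data_path joined with storage_name.
    return os.path.join(data_path, storage_name)


def normalize_content_types(included_content_types=None):
    """Hypothetical helper: turn the parameter into a list of content types."""
    # None means that all content types are processed.
    if included_content_types is None:
        return ["all"]
    # A comma-separated string is split into its individual items.
    if isinstance(included_content_types, str):
        return [item.strip() for item in included_content_types.split(",") if item.strip()]
    return list(included_content_types)
```

For example, normalize_content_types('audio, meta') yields a two-item list, and resolve_local_path() with no arguments points inside the system temporary directory.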
- __init__(name='dataset', storage_name='dataset', data_path=None, local_path=None, show_progress_in_console=True, log_system_progress=True, use_ascii_progress_bar=True, dataset_group='base class', dataset_meta=None, evaluation_setup_folder='evaluation_setup', evaluation_setup_file_extension='txt', meta_filename='meta.txt', error_meta_filename='error.txt', filelisthash_filename='filelist.python.hash', filelisthash_exclude_dirs=None, crossvalidation_folds=None, package_list=None, package_extract_parameters=None, included_content_types=None, audio_paths=None, default_audio_extension='wav', reference_data_present=True, check_meta=True, active_scenes=None, active_events=None, **kwargs)
Constructor. The parameters are identical to those listed for the class above.
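The remote_bytes and remote_md5 fields of a package_list item make it possible to check a downloaded package for completeness. The following is a minimal sketch of such a check using only the standard library; package_matches is a hypothetical function, not part of dcase_util, and the library's own verification logic may differ.

```python
import hashlib
import os


def package_matches(package, local_path):
    """Hypothetical check: does the local file match the package description?"""
    # The filename field is documented as relative to the dataset's local path.
    file_path = os.path.join(local_path, package["filename"])
    if not os.path.isfile(file_path):
        return False
    # Cheap check first: compare the file size with remote_bytes.
    if os.path.getsize(file_path) != package["remote_bytes"]:
        return False
    # Then compare the MD5 digest, reading the file in 1 MiB chunks.
    md5 = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest() == package["remote_md5"]
```

Comparing the size before hashing avoids reading large archives that are obviously incomplete.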
Methods
- __init__([name, storage_name, data_path, ...])
Constructor.
- absolute_to_relative_path(path)
Converts an absolute path into a relative path.
- check_filelist()
Generates a hash from the file list and checks whether it matches the one saved in filelist.hash.
- check_metadata()
Checks meta data and cross-validation setup.
- dataset_bytes()
Total download size of the dataset in bytes.
- dataset_size_on_disk()
Total size of the dataset currently stored locally.
- dataset_size_string()
Total download size of the dataset as a string.
- debug_packages([local_check, remote_check])
Debug remote packages associated with the dataset.
- download_packages(**kwargs)
Download dataset packages over the internet to the local path.
- eval([fold, absolute_paths])
List of evaluation items.
- eval_files([fold, absolute_paths])
List of evaluation files.
- evaluation_setup_filename([setup_part, ...])
Evaluation setup filename generation.
- event_label_count(**kwargs)
Number of unique event labels in the meta data.
- event_labels(**kwargs)
List of unique event labels in the meta data.
- extract_packages(**kwargs)
Extract the dataset packages.
- file_error_meta(filename)
Error meta data for the given file.
- file_features(filename)
Pre-calculated acoustic features for the given file.
- file_meta(filename)
Meta data for the given file.
- folds([mode])
List of fold ids.
- initialize()
Initialize the dataset: download and extract files, and prepare the dataset for use.
- load()
Load dataset meta data and cross-validation sets into the container.
- load_crossvalidation_data()
Load cross-validation data into the container.
- load_meta()
Load meta data into the container.
- log([show_meta])
Log dataset information.
- prepare()
Prepare the dataset for use.
- process_meta_container(container)
Process meta container.
- process_meta_item(item[, absolute_path])
Process a single meta data item.
- relative_to_absolute_path(path)
Converts a relative path into an absolute path.
- scene_label_count()
Number of unique scene labels in the meta data.
- scene_labels()
List of unique scene labels in the meta data.
- show([mode, indent, show_meta])
Show dataset information.
- tag_count()
Number of unique audio tags in the meta data.
- tags()
List of unique audio tags in the meta data.
- test([fold, absolute_paths])
List of testing items.
- test_files([fold, absolute_paths])
List of testing files.
- train([fold, absolute_paths])
List of training items.
- train_files([fold, absolute_paths])
List of training files.
- validation_files_balanced([fold, ...])
List of validation files selected randomly while maintaining data balance.
- validation_files_dataset([fold])
List of validation files delivered by the dataset.
- validation_files_random([fold, ...])
List of validation files selected randomly from the training material.
- validation_split([fold, training_meta, ...])
List of validation files.
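To make the idea of selecting validation files randomly from the training material concrete, here is a minimal standard-library sketch. It does not reproduce the library's selection logic: random_validation_split is a hypothetical function, and the validation_amount and seed arguments are illustrative names chosen for this sketch.

```python
import random


def random_validation_split(training_files, validation_amount=0.3, seed=0):
    """Hypothetical sketch: move a random share of the training files to validation."""
    # A dedicated generator with a fixed seed keeps the split reproducible.
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input order.
    files = sorted(training_files)
    rng.shuffle(files)
    validation_count = int(round(len(files) * validation_amount))
    validation_files = sorted(files[:validation_count])
    remaining_training_files = sorted(files[validation_count:])
    return remaining_training_files, validation_files
```

A random split like this ignores class balance, which is the gap a balanced variant such as validation_files_balanced is meant to close.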
Attributes
- audio_file_count
Number of audio files in the dataset.
- audio_files
All audio files in the dataset.
- error_meta
Audio error meta data for the dataset.
- error_meta_count
Number of error meta data items.
- fold_count
Number of folds in the evaluation setup.
- logger
- meta
Meta data for the dataset.
- meta_count
Number of meta data items.