dcase_util.features.OpenL3Extractor

class dcase_util.features.OpenL3Extractor(fs=48000, hop_length_samples=None, hop_length_seconds=0.02, model=None, input_repr='mel256', content_type='music', embedding_size=6144, center=True, batch_size=32, verbose=False, **kwargs)

OpenL3 Embedding extractor class

Constructor

Parameters

fs : int

Sampling rate of the incoming signal. If the audio is not 48 kHz, it will be resampled. Default value 48000.

hop_length_samples : int

Hop length in samples. If None, the hop length is derived from hop_length_seconds and fs. Default value None.

hop_length_seconds : float

Hop length in seconds. Default value 0.02.

model : keras.models.Model or None

Loaded model object. If a model is provided, input_repr, content_type, and embedding_size are ignored. If None, the model is loaded using the given values of input_repr, content_type, and embedding_size. Default value None.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used by the model. Ignored if model is a valid Keras model. Default value “mel256”.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model. Default value “music”.

embedding_size : 6144 or 512

Embedding dimensionality. Ignored if model is a valid Keras model. Default value 6144.

center : bool

If True, pad the beginning of the signal so that timestamps correspond to the center of each window. Default value True.

batch_size : int

Batch size used for input to the embedding model. Default value 32.

verbose : bool

If True, print verbose messages. Default value False.
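The timing parameters above interact in a simple way: when only hop_length_seconds is given, the hop in samples follows from the sampling rate, and the number of embedding frames follows from the signal length. A minimal sketch of this bookkeeping (the rounding rule and the exact frame count are assumptions; the real extractor also pads according to the center parameter):

```python
# Sketch of the hop-length bookkeeping behind OpenL3Extractor.
# Assumption: hop_length_samples = round(hop_length_seconds * fs)
# when only the seconds value is given.
fs = 48000                 # default sampling rate
hop_length_seconds = 0.02  # default hop
embedding_size = 6144      # default embedding dimensionality

hop_length_samples = int(round(hop_length_seconds * fs))
print(hop_length_samples)  # 960 samples per hop

# For a 5-second signal, the embedding matrix is approximately
# (n_frames, embedding_size); exact counts depend on centering/padding.
signal_length = 5 * fs
n_frames = signal_length // hop_length_samples
print((n_frames, embedding_size))  # (250, 6144)
```

With the defaults, a 20 ms hop at 48 kHz yields one 6144-dimensional embedding every 960 samples, i.e. 50 frames per second of audio.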


Methods

__init__([fs, hop_length_samples, ...])

Constructor

extract(y)

Extract features for the audio signal.

log([level])

Log container content

show([mode, indent, visualize])

Print container content

to_html([indent])

Get container information as an HTML-formatted string

to_string([ui, indent])

Get container information in a string

Attributes

description

Extractor description

label

Extractor label

logger

Logger instance