Data processing
Data manipulation
There are a few different utilities to manipulate data:
dcase_util.data.Normalizer, calculating normalization factors and normalizing data.
dcase_util.data.RepositoryNormalizer, normalizing data repositories at once.
dcase_util.data.Aggregator, aggregating data inside a sliding processing window.
dcase_util.data.Sequencer, sequencing data matrices.
dcase_util.data.Stacker, stacking data matrices based on a given vector recipe.
dcase_util.data.Selector, selecting data segments based on events with onset and offset.
dcase_util.data.Masker, masking data segments based on events with onset and offset.
Normalization
dcase_util.data.Normalizer class can be used for calculating normalization factors (mean and standard deviation) for the data without reading all the data in at once. Intermediate statistics are accumulated while reading the data in small portions.
Calculating normalization factors file-by-file:
import dcase_util

data = dcase_util.utils.Example.feature_container()
# Initialize normalizer
normalizer = dcase_util.data.Normalizer()
# Accumulate -- feed data per file in
normalizer.accumulate(data=data)
# After accumulation calculate normalization factors (mean + std)
normalizer.finalize()
# Save
normalizer.save(filename='norm_factors.cpickle')
# Load
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
Using with statement:
data = dcase_util.utils.Example.feature_container()
# Accumulate
with dcase_util.data.Normalizer() as normalizer:
    normalizer.accumulate(data=data)
# Save
normalizer.save(filename='norm_factors.cpickle')
Initializing normalizer with pre-calculated values:
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer(
    **data.stats
)
Normalize data:
data = dcase_util.utils.Example.feature_container()
normalizer = dcase_util.data.Normalizer().load(filename='norm_factors.cpickle')
normalizer.normalize(data)
Figure: original data and normalized data.
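The accumulation idea behind the normalizer can be sketched in plain Python. This is a minimal illustration, not the dcase_util implementation: the RunningStats class and its method names are hypothetical, and it computes the population standard deviation, which may differ from what the library uses.

```python
import math


class RunningStats:
    """Hypothetical helper: accumulates per-dimension sums for mean and std."""

    def __init__(self, n_dims):
        self.n = 0                      # number of frames seen so far
        self.s1 = [0.0] * n_dims        # running sum of values
        self.s2 = [0.0] * n_dims        # running sum of squared values

    def accumulate(self, frames):
        """Add a chunk of data given as a list of frames (one vector per frame)."""
        for frame in frames:
            self.n += 1
            for i, value in enumerate(frame):
                self.s1[i] += value
                self.s2[i] += value * value

    def finalize(self):
        """Return (mean, std) per dimension; std is the population standard deviation."""
        mean = [s / self.n for s in self.s1]
        std = [
            math.sqrt(max(q / self.n - m * m, 0.0))
            for q, m in zip(self.s2, mean)
        ]
        return mean, std


stats = RunningStats(n_dims=2)
stats.accumulate([[1.0, 10.0], [2.0, 20.0]])   # first "file"
stats.accumulate([[3.0, 30.0], [4.0, 40.0]])   # second "file"
mean, std = stats.finalize()

# Normalize one frame with the accumulated factors
normalized = [(v - m) / s for v, m, s in zip([2.5, 25.0], mean, std)]
print(mean, normalized)
```

Only the running sums and the frame count are kept between chunks, so memory use does not grow with the amount of data.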
Aggregation
Data aggregator class (dcase_util.data.Aggregator) can be used to process a data matrix in a sliding processing window. This processing stage can be used to collapse the data within a given window length by calculating its mean and standard deviation, or to flatten the matrix into a single vector.
Supported processing methods:
flatten
mean
std
cov
kurtosis
skew
All of these processing methods can be combined.
Calculating mean and standard deviation in 10 frame window, with 1 frame hop:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)
data_aggregator = dcase_util.data.Aggregator(
    recipe=['mean', 'std'],
    win_length_frames=10,
    hop_length_frames=1,
)
data = data_aggregator.aggregate(data)
print(data.shape)
# (80, 501)
Figure: original data and aggregated data.
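The sliding-window aggregation can be illustrated with a small stand-alone sketch. It is not the library code: aggregate_mean_std is a hypothetical helper, the data is stored as a list of frames rather than a (feature, frame) matrix, and only windows that fit completely inside the data are produced. The example output above keeps all 501 frames, so the library presumably treats the edges differently.

```python
import statistics


def aggregate_mean_std(frames, win_length, hop_length):
    """Return one [means..., stds...] vector per window position.

    frames is a list of feature vectors (one per time frame). Only windows
    that fit completely inside the data are produced.
    """
    out = []
    for start in range(0, len(frames) - win_length + 1, hop_length):
        window = frames[start:start + win_length]
        columns = list(zip(*window))          # one tuple per feature dimension
        means = [statistics.fmean(c) for c in columns]
        stds = [statistics.pstdev(c) for c in columns]
        out.append(means + stds)
    return out


frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
aggregated = aggregate_mean_std(frames, win_length=2, hop_length=1)
print(len(aggregated), len(aggregated[0]))    # 3 4
print(aggregated[0])
```

Combining two methods doubles the feature dimension (2 features become 4 here), which mirrors the change from 40 to 80 rows in the example above.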
Flattening data matrix with 10 frames into one single vector, with 1 frame hop:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)
data_aggregator = dcase_util.data.Aggregator(
    recipe=['flatten'],
    win_length_frames=10,
    hop_length_frames=1,
)
data = data_aggregator.aggregate(data)
print(data.shape)
# (400, 501)
Figure: original data and aggregated data.
Sequencing
Sequencer class (dcase_util.data.Sequencer) processes data matrices into sequences (images). Sequences can overlap, and the sequencing grid can be shifted between calls.
Creating sequence:
data = dcase_util.utils.Example.feature_container()
print(data.shape)
# (40, 501)
data_sequencer = dcase_util.data.Sequencer(
    sequence_length=10,
    hop_length=100
)
sequenced_data = data_sequencer.sequence(data)
print(sequenced_data.shape)
# (40, 10, 5)
sequenced_data.show()
# DataMatrix3DContainer :: Class
#   Data
#     data                  : matrix (40,10,5)
#     Dimensions
#       time_axis           : 1
#     Timing information
#       time_resolution     : None
#   Meta
#     stats                 : Calculated
#     metadata              : -
#     processing_chain      : -
#   Duration
#     Frames                : 10
#   Data
#     Dimensions
#       time_axis           : 1
#       data_axis           : 0
#       sequence_axis       : 2
Figure: original data and sequenced data.
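How the sequence count follows from the sequence length and hop can be seen in a short sketch. The make_sequences function and its shift parameter are hypothetical names used for illustration only. The sketch assumes that trailing frames which do not fill a whole sequence are dropped, which is consistent with the 5 sequences obtained from 501 frames above.

```python
def make_sequences(frames, sequence_length, hop_length, shift=0):
    """Cut a list of frames into fixed-length sequences.

    shift moves the sequencing grid forward by a number of frames.
    Trailing frames that do not fill a whole sequence are dropped.
    """
    sequences = []
    start = shift
    while start + sequence_length <= len(frames):
        sequences.append(frames[start:start + sequence_length])
        start += hop_length
    return sequences


frames = list(range(501))                    # stand-in for 501 time frames
sequences = make_sequences(frames, sequence_length=10, hop_length=100)
print(len(sequences), len(sequences[0]))     # 5 10
print([s[0] for s in sequences])             # [0, 100, 200, 300, 400]

# Shifting the grid by 50 frames moves every sequence start
shifted = make_sequences(frames, sequence_length=10, hop_length=100, shift=50)
print([s[0] for s in shifted])               # [50, 150, 250, 350, 450]
```

Sequences overlap whenever the hop length is smaller than the sequence length.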
Stacking
Stacker class (dcase_util.data.Stacker) stacks the data stored in the data repository based on a recipe. This class can be used, for example, to create a new feature matrix containing data extracted with multiple feature extractors. With a recipe one can select the full matrix, only part of the data vectors with start and end index, or individual data rows.
Example:
# Load data repository
repo = dcase_util.utils.Example.feature_repository()
# Show labels in the repository
print(repo.labels)
# Select full matrix from 'mel' and with default stream (0) (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel').stack(repo)
print(data.shape)
# (40, 501)
# Select full matrix from 'mel' and define stream 0 (40 mel bands).
data = dcase_util.data.Stacker(recipe='mel=0').stack(repo)
print(data.shape)
# (40, 501)
# Select full matrix from 'mel' and 'mfcc' with default stream (0) (40 mel bands + 20 mfccs).
data = dcase_util.data.Stacker(recipe='mel;mfcc').stack(repo)
print(data.shape)
# (60, 501)
# Select data from 'mfcc' matrix with default stream (0), and omit first coefficient (19 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1-19').stack(repo)
print(data.shape)
# (19, 501)
# Select data from 'mfcc' matrix with default stream (0), select coefficients 1,5,7 (3 mfccs).
data = dcase_util.data.Stacker(recipe='mfcc=1,5,7').stack(repo)
print(data.shape)
# (3, 501)
Figure: original data and stacked data (selecting 1st, 5th, and 7th row from the MFCC feature matrix).
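The recipe forms used above can be read as a small grammar: items separated by ';', and an optional '=' selector that is a stream number, an inclusive index range, or a comma-separated list of rows. The sketch below is a hypothetical parser for only these forms, operating on a plain dict as a stand-in for the repository. It is not the library's parser, which may accept further forms.

```python
def parse_recipe(recipe):
    """Parse a recipe string such as 'mel;mfcc=1-19' into a list of dicts."""
    items = []
    for part in recipe.split(';'):
        label, _, selector = part.partition('=')
        item = {'label': label.strip(), 'stream': 0, 'rows': None}  # None = full matrix
        selector = selector.strip()
        if '-' in selector:
            start, end = (int(v) for v in selector.split('-'))
            item['rows'] = list(range(start, end + 1))   # inclusive range
        elif ',' in selector:
            item['rows'] = [int(v) for v in selector.split(',')]
        elif selector:
            item['stream'] = int(selector)               # bare number selects the stream
        items.append(item)
    return items


def stack(repo, recipe):
    """Stack selected rows from a dict-based stand-in repository.

    repo maps label -> list of streams, each stream being a list of rows.
    """
    stacked = []
    for item in parse_recipe(recipe):
        matrix = repo[item['label']][item['stream']]
        rows = item['rows'] if item['rows'] is not None else range(len(matrix))
        stacked.extend(matrix[r] for r in rows)
    return stacked


repo = {
    'mel': [[[0.0] * 4 for _ in range(40)]],     # one stream, 40 rows, 4 frames
    'mfcc': [[[1.0] * 4 for _ in range(20)]],    # one stream, 20 rows, 4 frames
}
for recipe in ['mel', 'mel=0', 'mel;mfcc', 'mfcc=1-19', 'mfcc=1,5,7']:
    print(recipe, len(stack(repo, recipe)))
```

The printed row counts (40, 40, 60, 19, 3) match the first dimension of the shapes in the example above.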
Data encoding
Data encoders can be used to convert reference metadata into binary matrices.
One-hot
OneHotEncoder class (dcase_util.data.OneHotEncoder) can be used to create a binary matrix where a single class is active throughout the signal. This encoder is suitable for multi-class single-label classification applications.
Example:
# Initialize encoder
onehot_encoder = dcase_util.data.OneHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)
# Encode
binary_matrix = onehot_encoder.encode(
    label='class B',
    length_seconds=10.0
)
# Visualize
binary_matrix.plot()
Many-hot
ManyHotEncoder class (dcase_util.data.ManyHotEncoder) can be used to create a binary matrix where multiple classes are active throughout the signal. This encoder is suitable for multi-class multi-label classification applications such as audio tagging.
Example:
# Initialize encoder
manyhot_encoder = dcase_util.data.ManyHotEncoder(
    label_list=['class A', 'class B', 'class C'],
    time_resolution=0.02
)
# Encode
binary_matrix = manyhot_encoder.encode(
    label_list=['class A', 'class B'],
    length_seconds=10.0
)
# Visualize
binary_matrix.plot()
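The structure of the resulting binary matrices can be shown without the library. The many_hot function below is a hypothetical stand-in that takes the number of frames directly, instead of deriving it from the signal length and time resolution.

```python
def many_hot(label_list, active_labels, n_frames):
    """Binary (label, frame) matrix with the given labels active in every frame."""
    return [
        [1 if label in active_labels else 0] * n_frames
        for label in label_list
    ]


labels = ['class A', 'class B', 'class C']
# One-hot is the special case with exactly one active label
one_hot_matrix = many_hot(labels, ['class B'], n_frames=4)
many_hot_matrix = many_hot(labels, ['class A', 'class B'], n_frames=4)
print(one_hot_matrix)
# [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(many_hot_matrix)
# [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
```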
Event roll
EventRollEncoder class (dcase_util.data.EventRollEncoder) can be used to create a binary matrix where multiple events are active within specified time segments. This encoder is suitable for event detection applications.
Example:
# Metadata
meta = dcase_util.containers.MetaDataContainer([
    {
        'filename': 'test1.wav',
        'event_label': 'cat',
        'onset': 1.0,
        'offset': 3.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'dog',
        'onset': 2.0,
        'offset': 6.0
    },
    {
        'filename': 'test1.wav',
        'event_label': 'speech',
        'onset': 5.0,
        'offset': 8.0
    },
])
# Initialize encoder
event_roll_encoder = dcase_util.data.EventRollEncoder(
    label_list=meta.unique_event_labels,
    time_resolution=0.02
)
# Encode
event_roll = event_roll_encoder.encode(
    metadata_container=meta,
    length_seconds=10.0
)
# Visualize
event_roll.plot()
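The mapping from onset and offset times to active frames can be sketched as follows. The event_roll function is a hypothetical helper, and the rounding convention (floor for onsets, ceiling for offsets) is an assumption rather than something taken from the library. The same three events as above are used, with a coarse 1-second resolution so that the matrix is small enough to print.

```python
import math


def event_roll(events, label_list, time_resolution, length_seconds):
    """Build a (label, frame) binary matrix from (label, onset, offset) events."""
    n_frames = math.ceil(length_seconds / time_resolution)
    roll = [[0] * n_frames for _ in label_list]
    for label, onset, offset in events:
        row = label_list.index(label)
        first = math.floor(onset / time_resolution)
        last = min(math.ceil(offset / time_resolution), n_frames)
        for frame in range(first, last):
            roll[row][frame] = 1
    return roll


events = [('cat', 1.0, 3.0), ('dog', 2.0, 6.0), ('speech', 5.0, 8.0)]
labels = ['cat', 'dog', 'speech']
roll = event_roll(events, labels, time_resolution=1.0, length_seconds=10.0)
for label, row in zip(labels, roll):
    print(label, row)
# cat [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# dog [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
# speech [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

Overlapping events simply activate several rows in the same frame, as with cat and dog in frame 2.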
Probability encoding
ProbabilityEncoder class (dcase_util.data.ProbabilityEncoder) can be used to process a 2D data matrix (class, time) containing probabilities.
Collapsing matrix over time axis to vector with per class values:
import numpy
import dcase_util

p = dcase_util.data.ProbabilityEncoder()
probabilities = numpy.array(
    [
        [0.2, 0.3, 0.1],
        [0.4, 0.6, 0.7]
    ]
)
out = p.collapse_probabilities(
    probabilities=probabilities,
    time_axis=1,
    operator='prod'
)
print(out)
# [0.006 0.168]
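The 'prod' result above is the product of each row over the time axis. A plain-Python sketch of the same arithmetic is given below. The collapse function is hypothetical, and the operator names other than 'prod' are assumptions added for illustration, not a list of what the library supports.

```python
import math


def collapse(probabilities, operator='sum'):
    """Collapse each row (class) of a (class, time) matrix to a single value."""
    operators = {
        'sum': sum,
        'prod': math.prod,
        'max': max,
        'mean': lambda row: sum(row) / len(row),
    }
    return [operators[operator](row) for row in probabilities]


probabilities = [
    [0.2, 0.3, 0.1],
    [0.4, 0.6, 0.7],
]
collapsed = collapse(probabilities, operator='prod')
print([round(v, 3) for v in collapsed])
# [0.006, 0.168]
```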
Collapsing data in the matrix with sliding window over time axis:
probabilities = numpy.array(
    [
        [0.1, 0.1, 0.1],
        [0.2, 0.2, 0.2],
        [0.1, 0.3, 0.3],
        [0.1, 0.1, 0.1],
    ]
)
out = p.collapse_probabilities_windowed(
    probabilities=probabilities,
    window_length=2,
    time_axis=1
)
print(out)
# [[0.2 0.1 0.1]
# [0.4 0.2 0.2]
# [0.4 0.3 0.3]
# [0.2 0.1 0.1]]
Binarizing probabilities in the matrix with global threshold:
probabilities = numpy.array(
    [
        [0.1, 0.5, 0.1],
        [0.2, 0.2, 0.2],
        [0.1, 0.6, 0.7],
        [0.1, 0.6, 0.6],
    ]
)
out = p.binarization(
    probabilities=probabilities,
    binarization_type='global_threshold',
    threshold=0.5,
    time_axis=1
)
print(out)
# [[0 1 0]
# [0 0 0]
# [0 1 1]
# [0 1 1]]
Binarizing probabilities in the matrix with class-wise thresholds:
probabilities = numpy.array(
    [
        [0.1, 0.5, 0.1],
        [0.2, 0.2, 0.2],
        [0.1, 0.6, 0.7],
        [0.1, 0.6, 0.6],
    ]
)
out = p.binarization(
    probabilities=probabilities,
    binarization_type='class_threshold',
    threshold=[0.5, 0.2, 0.1, 0.4],
    time_axis=1
)
print(out)
# [[0 1 0]
# [1 1 1]
# [1 1 1]
# [0 1 1]]
Binarizing probabilities in the matrix with frame-wise max:
probabilities = numpy.array(
    [
        [0.1, 0.5, 0.1],
        [0.2, 0.2, 0.2],
        [0.1, 0.6, 0.7],
        [0.1, 0.6, 0.6],
    ]
)
out = p.binarization(
    probabilities=probabilities,
    binarization_type='frame_max',
    time_axis=1
)
print(out)
# [[0 0 0]
# [1 0 0]
# [0 1 1]
# [0 1 0]]
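The three binarization types can be reproduced with a few lines of plain Python. The function names below are hypothetical. The conventions are inferred from the outputs shown above and are not confirmed against the library source: a value equal to the threshold counts as active, and ties for the frame-wise maximum mark every tied class.

```python
def binarize_global(probabilities, threshold):
    """Mark every value at or above a single threshold."""
    return [[int(v >= threshold) for v in row] for row in probabilities]


def binarize_class(probabilities, thresholds):
    """Mark values at or above the threshold of their own row (class)."""
    return [
        [int(v >= t) for v in row]
        for row, t in zip(probabilities, thresholds)
    ]


def binarize_frame_max(probabilities):
    """Mark, in every frame (column), the class or classes holding the maximum."""
    frame_max = [max(column) for column in zip(*probabilities)]
    return [
        [int(v == m) for v, m in zip(row, frame_max)]
        for row in probabilities
    ]


probabilities = [
    [0.1, 0.5, 0.1],
    [0.2, 0.2, 0.2],
    [0.1, 0.6, 0.7],
    [0.1, 0.6, 0.6],
]
print(binarize_global(probabilities, threshold=0.5))
# [[0, 1, 0], [0, 0, 0], [0, 1, 1], [0, 1, 1]]
print(binarize_class(probabilities, thresholds=[0.5, 0.2, 0.1, 0.4]))
# [[0, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1]]
print(binarize_frame_max(probabilities))
# [[0, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
```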
Decision encoding
DecisionEncoder class (dcase_util.data.DecisionEncoder) can be used to process a binary 2D data matrix (class, time) with frame-wise activity.
Majority vote:
d = dcase_util.data.DecisionEncoder(label_list=['A', 'B', 'C'])
activity_matrix = numpy.array([
    [0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0]
])
out = d.majority_vote(
    frame_decisions=activity_matrix,
    time_axis=1
)
print(out)
# B
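One plausible reading of the majority vote is to count the active frames per class and pick the class with the highest count. The sketch below follows that reading. The library may count differently (for example per-frame winners first), but for this input the result is the same. The majority_vote function here is a stand-alone hypothetical helper, not the library method.

```python
def majority_vote(activity_matrix, label_list):
    """Return the label whose row has the most active frames."""
    counts = [sum(row) for row in activity_matrix]
    return label_list[counts.index(max(counts))]   # ties resolve to the first label


activity_matrix = [
    [0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
]
winner = majority_vote(activity_matrix, ['A', 'B', 'C'])
print(winner)
# B
```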
Many hot encoding:
d = dcase_util.data.DecisionEncoder(label_list=['A', 'B', 'C'])
activity_matrix = numpy.array([
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0]
])
out = d.many_hot(
    frame_decisions=activity_matrix,
    time_axis=1
)
print(out)
# [['A', 'C'], ['B'], ['B'], ['A', 'C'], ['A', 'C'], ['B']]
Translating activity array into start and end index pairs:
activity_array = numpy.array([1, 1, 1, 0, 0, 1, 1, 0, 1])
d = dcase_util.data.DecisionEncoder()
out = d.find_contiguous_regions(
    activity_array=activity_array
)
print(out)
# [[0 3]
# [5 7]
# [8 9]]
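The output pairs are half-open: the start index is the first active frame and the end index is one past the last active frame. A minimal stand-alone sketch of this run detection, written without the library, could look like this:

```python
def find_contiguous_regions(activity):
    """Return [start, end) index pairs for every run of active frames."""
    regions = []
    start = None
    for index, active in enumerate(activity):
        if active and start is None:
            start = index                        # a run begins here
        elif not active and start is not None:
            regions.append([start, index])       # the run ended at the previous frame
            start = None
    if start is not None:
        regions.append([start, len(activity)])   # the last run reaches the end
    return regions


regions = find_contiguous_regions([1, 1, 1, 0, 0, 1, 1, 0, 1])
print(regions)
# [[0, 3], [5, 7], [8, 9]]
```

Multiplying the indices by the time resolution would turn such pairs into onset and offset times.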
Filter activity matrix with median filter:
activity_matrix = numpy.array([
    [0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0]
])
d = dcase_util.data.DecisionEncoder()
out = d.process_activity(
    activity_matrix=activity_matrix,
    window_length=3,
    time_axis=1
)
print(out)
# [[0 0 0 1 1 0]
# [0 1 1 1 1 1]
# [0 0 0 0 0 0]]
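The effect of the median filter can be checked by hand with a small sketch. The median_filter function below is a hypothetical helper that assumes an odd window length and zero padding at both ends. With these assumptions it reproduces the output shown above, but the library's edge handling is not confirmed here.

```python
import statistics


def median_filter(row, window_length):
    """Median-filter a binary sequence, padding both ends with zeros."""
    half = window_length // 2
    padded = [0] * half + list(row) + [0] * half
    return [
        int(statistics.median(padded[i:i + window_length]))
        for i in range(len(row))
    ]


activity_matrix = [
    [0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
filtered = [median_filter(row, window_length=3) for row in activity_matrix]
for row in filtered:
    print(row)
# [0, 0, 0, 1, 1, 0]
# [0, 1, 1, 1, 1, 1]
# [0, 0, 0, 0, 0, 0]
```

Isolated single-frame activations (third row) are removed, and single-frame gaps inside an active region (second row) are filled.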