Musical Structure and Segmentation Demo
Music Information Retrieval - MUSS1112 |
November 14, 2022 |
Martin Hartmann - martin.hartmann@jyu.fi |
1 Introduction
In this demo we will have a look at two segmentation functions that are available in MIDI toolbox, and we will test them using a large collection of songs. We will also use an optimization algorithm to try to find an optimal parameter for applying a threshold to the results obtained to one of the segmentation functions.
2 What is MIDI Toolbox?
MIDI Toolbox provides a set of Matlab functions, which together have all the necessary machinery to analyze and visualize MIDI data. The development of the Toolbox has been part of ongoing research involved in topics relating to musical data mining, modelling music perception and decomposing the data for and from perceptual experiments.
3 Downloading MIDI Toolbox
The latest version of MIDI toolbox can be downloaded from this website: https://github.com/miditoolbox/
4 Notematrix (NMAT) representation
MIDI files read into MIDI Toolbox are represented as matrices that always contain 7 columns and a number of rows that equals the amount of note events in the MIDI file. The column labels are as follows:
ONSET (BEATS) | DURATION (BEATS) | MIDI channel | MIDI PITCH | VELOCITY | ONSET (SEC) | DURATION (SEC) |
---|---|---|---|---|---|---|
… | … | … | … | … | … | … |
… | … | … | … | … | … | … |
The first column indicates the onset of the notes in beats (based on pulses per quarter note) and the second column the duration of the notes in these same beat-values. The third column denotes the MIDI channel (1-16), and the fourth the MIDI pitch, where middle C (C4) is 60. The fifth column is the velocity describing how fast the key of the note is pressed, in other words, how loud the note is played (0-127). The last two columns correspond to the first two (onset in beats, duration in beats) except that seconds are used instead of beats.
Often one wants to refer only to pitch or duration values in the notematrix. For clarity and convenience, these columns may be called by few basic selector functions that refer to each specific column only. These are onset
(either 'beat'
or 'sec'
, the former is the default), dur
(either 'beat'
or 'sec'
), channel
, pitch
, and velocity
.
5 Starting up
We will use MIDI toolbox in this demo, so please install it and add it to the MATLAB path if it is not there yet, e.g.:
addpath(genpath('~/Downloads/miditoolbox'))
If miditoolbox is in the path, you should be able to run this command and have a look at the functions in MIDI toolbox:
help miditoolbox
A better to ensure that things are working is to read in a midi file that is included with MIDI toolbox. We can create a waveform using FM synthesis and listen to the result:
a = readmidi('laksin.mid') playsound(a)
We can also use shepard tones instead of FM synthesis:
playsound(a,'shepard')
6 The Essen collection
The Essen folksong collection (http://essen.themefinder.org) consists of 9,354 folk songs (Schaffrath, 1995) which have been annotated based on perceived musical grouping structures. We will work with a smaller subset in NMAT format (6236 songs), which is available here. Please download essen.mat and then load the .mat file:
load('~/Downloads/essen.mat')
7 Having a look at the Essen collection .mat file
This workspace contains several variables. nm
is a cell array containing the note matrices of each song. Because it is a cell, we access its elements using key brackets {}
. For instance, let’s have a look at the notematrix for the 7th song:
>> nm{7} ans = 0 2.0000 1.0000 65.0000 100.0000 0 1.0000 2.0000 1.0000 1.0000 60.0000 100.0000 1.0000 0.5000 3.0000 1.0000 1.0000 65.0000 100.0000 1.5000 0.5000 4.0000 1.0000 1.0000 65.0000 100.0000 2.0000 0.5000 5.0000 1.0000 1.0000 65.0000 100.0000 2.5000 0.5000 6.0000 1.0000 1.0000 67.0000 100.0000 3.0000 0.5000 7.0000 1.0000 1.0000 69.0000 100.0000 3.5000 0.5000 8.0000 2.0000 1.0000 65.0000 100.0000 4.0000 1.0000 10.0000 2.0000 1.0000 69.0000 100.0000 5.0000 1.0000 12.0000 1.0000 1.0000 70.0000 100.0000 6.0000 0.5000 13.0000 0.5000 1.0000 69.0000 100.0000 6.5000 0.2500 13.5000 0.5000 1.0000 67.0000 100.0000 6.7500 0.2500 14.0000 1.0000 1.0000 67.0000 100.0000 7.0000 0.5000 15.0000 1.0000 1.0000 69.0000 100.0000 7.5000 0.5000 16.0000 1.0000 1.0000 65.0000 100.0000 8.0000 0.5000 17.0000 1.0000 1.0000 69.0000 100.0000 8.5000 0.5000 18.0000 2.0000 1.0000 67.0000 100.0000 9.0000 1.0000 21.0000 1.0000 1.0000 65.0000 100.0000 10.5000 0.5000 22.0000 1.0000 1.0000 65.0000 100.0000 11.0000 0.5000 23.0000 1.0000 1.0000 69.0000 100.0000 11.5000 0.5000 24.0000 2.0000 1.0000 72.0000 100.0000 12.0000 1.0000 26.0000 1.0000 1.0000 65.0000 100.0000 13.0000 0.5000 27.0000 1.0000 1.0000 67.0000 100.0000 13.5000 0.5000 28.0000 1.0000 1.0000 69.0000 100.0000 14.0000 0.5000 29.0000 1.0000 1.0000 72.0000 100.0000 14.5000 0.5000 30.0000 0.5000 1.0000 69.0000 100.0000 15.0000 0.2500 30.5000 1.5000 1.0000 65.0000 100.0000 15.2500 0.7500 33.0000 1.0000 1.0000 60.0000 100.0000 16.5000 0.5000 34.0000 1.0000 1.0000 65.0000 100.0000 17.0000 0.5000 35.0000 1.0000 1.0000 69.0000 100.0000 17.5000 0.5000 36.0000 2.0000 1.0000 72.0000 100.0000 18.0000 1.0000 38.0000 2.0000 1.0000 69.0000 100.0000 19.0000 1.0000 40.0000 2.0000 1.0000 67.0000 100.0000 20.0000 1.0000 42.0000 2.0000 1.0000 65.0000 100.0000 21.0000 1.0000
- What is the duration of the last note of this song?
There are also variables called name
and keyword
, which contains the filename and name of each song, respectively.
- What is the filename of song 3925?
keyword
is a character (char) array, so it is like a 2D matrix with characters instead of numbers. For instance, to access the information of song 2000, we need to look for all columns in row 2000:
keyword(2000,:)
version
is also a character array containing information about the .mat file in Finnish.
- What type of information is contained in the variable
met
and how is it represented?
For the following, we will make use of information contained in phrase
, subcollection
, genre
, and nm
. phrase
tells us, for each piece, what note numbers correspond with annotated boundaries. subcollection
and genre
can be used to identify the genre of each piece.
8 Visualization
Let’s have a look at the piano roll representation of the first song in the collection:
pianoroll(nm{1})
- What is the pitch class of song 5689?
We can use this function based on pianoroll to visualize the annotated boundaries. Please paste the following code into a new file and save it as visualboundaries.m. It can be saved in your current folder or in the miditoolbox folder (or in any other folder, as long as it belongs to the MATLAB path).
function fig = visualboundaries(nmat,b,gt) % VISUALBOUNDARIES - Visualize a pianoroll with predicted and actual boundaries % FIG = VISUALBOUNDARIES(NMAT,B,GT) % % Input arguments: % NMAT = a note matrix % B = an index vector or a logical vector corresponding to predicted boundaries (such as obtained from a segmentation function such as segmentgestalt.m) or actual boundaries (i.e., manual annotations). B can also be a boundary profile, but in this case it will be interpolated to fit the number of beats of the notematrix. % GT = an index vector or a logical vector for a second point process, such as a ground truth that can be compared against the boundaries in B. GT can be a boundary profile, but it is interpolated to fit the number of beats of the notematrix. (OPTIONAL ARGUMENT) % % Outputs: % A pianoroll with predicted boundary indications, which are plotted as dotted black lines, and actual boundary indications, plotted as magenta dashed lines. % % NOTE: the number of points of B and GT should be equal to the number of rows of the NMAT. if nargin == 3, ground = 1; else, ground = 0; end; if islogical(b) b = find(b == 1); end on=onset(nmat); minimi = min(pitch(nmat)); maksimi = max(pitch(nmat)); pianoroll(nmat) % using only 'sec' as option set(gca,'Xgrid','off'); hold on if ~mod(b,1) % if integer for k=1:length(b) plot([on(b(k)) on(b(k))],[minimi-2 maksimi+2],'k:','LineWidth',2); end else ylim = get(gca,'YLim'); xlim = get(gca,'XLim'); if ~isrow(b) b = b'; end bmap = mapminmax(b,ylim(1),ylim(2)); x = 1:numel(bmap); xp = linspace(x(1), x(end), xlim(end)-xlim(1)); bimap = interp1(x,bmap,xp); plot(bimap,'k:','LineWidth',2) end if ground if islogical(gt) gt = find(gt == 1); end if ~mod(gt,1) for k=1:length(gt) plot([on(gt(k)) on(gt(k))],[minimi-2 maksimi+2],'m--','LineWidth',2); end else ylim = get(gca,'YLim'); xlim = get(gca,'XLim'); if ~isrow(gt) gt = gt'; end gtmap = mapminmax(gt,ylim(1),ylim(2)); x = 1:numel(gtmap); xp = linspace(x(1), x(end), xlim(end)-xlim(1)); gtimap = interp1(x,gtmap,xp); plot(gtimap,'m--','LineWidth',2) end end hold off
Let’s now visualize the annotated boundaries for song 600 on top of its pianoroll, for instance.
visualboundaries(nm{600},phrase{600})
Listen to song 600. Would you agree with these boundary indications?
For a better exploration of the data, it is easier to write things like this:
k = 600; visualboundaries(nm{k},phrase{k})
Now you can just change the value of k to have a look at different songs. Or we can use a for
loop to visualize songs 10 to 15. We need to add figure
to create a new figure at each iteration of the loop.
for k = 10:15 figure visualboundaries(nm{k},phrase{k}) end
- You can add a title to each figure that is generated in your loop. For instance, a title can be used to inform us what song number is each figure about. You can add the title after the call to visualboundaries. Add
title(sprintf('Song %d', k))
to yourfor
loop and run it again.
If you want to close all your figures at the same time, you can type close all
.
9 Boundary prediction
MIDI toolbox includes three algorithms for boundary prediction, segmentgestalt, segmentprob, and boundary. You can read more about them in the MIDI toolbox documentation (pp. 32-33), which should be located in ~/Downloads/miditoolbox/documentation/
. We will focus on segmentgestalt and boundary.
Let’s have a look at how does segmentgestalt segment song 600. Let’s focus on its first output, which are the locations of each clang. We can store the result in sg{k}
so we know that we refer to song 600:
k = 600; sg{k} = segmentgestalt(nm{k})
Let’s note that sg{k}
is a binary vector. Each element of the vector is a note, and notes that correspond to boundaries are indicated with ones.
sg{k}
We can visually compare this prediction with the ground truth:
visualboundaries(nm{k},sg{k},phrase{k})
In this figure, predicted boundary indications are plotted as dotted black lines, whereas actual boundary indications are plotted as magenta dashed lines.
The boundaries look correct, but in the binary vector they appear one note event late. So they need to be circularly shifted down by 1.
sg{k} = circshift(sg{k},-1)
So we can fix this function. Please edit your segmentgestalt.m file by typing:
edit segmentgestalt.m
Now, add the following to the end of the function to shift the binary vectors of clangs and segments down by 1:
c = circshift(c,-1); s = circshift(s,-1);
10 Performance statistics
We can use this function to estimate the performance of different segmentation algorithms. Please paste this code into a new file and save it as perfstats.m.
function [precision recall f_measure] = perfstats(ppi,gt) % PERFSTATS - Obtain perfomance statistics (precision, recall, and f-measure) from point process data given a ground truth % [PRECISION RECALL F_MEASURE] = PERFSTATS(PPI,GT) % % Input arguments: % PPI = an index vector corresponding to the predicted point process % GT = an index vector for the actual point process or ground truth % % Output: % PRECISION = hits divided by total number of estimated elements % RECALL = hits divided by total number of annotations % F_MEASURE = harmonic mean between precision and recall % % NOTE: Any NaNs included in the ground truth data will be interpreted as false negatives in the evaluation pp_bin = zeros(max([max(gt) max(ppi)]),1); gt_bin = pp_bin; pp_bin(ppi) = 1; gt_bin(gt(~isnan(gt))) = 1; % add zeros to predicted boundaries and ones to ground truth if ground truth contained NaNs if sum(isnan(gt)) > 0, pp_bin = [pp_bin; zeros(sum(isnan(gt)),1)]; gt_bin = [gt_bin; ones(sum(isnan(gt)),1)]; end hits = numel(nonzeros(pp_bin & gt_bin)); num_estimated = numel(nonzeros(pp_bin)); num_annotations = numel(nonzeros(gt_bin)); precision = hits / num_estimated; recall = hits / num_annotations; f_measure = 2 * (precision * recall / (precision + recall));
Let’s first create a variable with genre labels, which will use later. We will also remove some songs that have incomplete data.
genrelabels = { 'ALTDEU' 'BALLADE' 'BOEHME' 'DVA' 'ERK1' 'ERK2' 'ERK3' 'ERK5' 'FINK' 'KINDER' 'VARIANT' 'ZUCCAL' 'CHINA' 'ZHEN' 'HUANG' 'SENGANSU' 'PELOG1' 'MEXICO' 'OTHER' }; emptyCells = cellfun(@isempty,phrase); phrase(emptyCells) = []; nm(emptyCells) = []; genre(emptyCells) = []; phrase(find(genre == 0)) = []; nm(find(genre == 0)) = []; genre(find(genre == 0)) = [];
Now, let’s try running perfstats with our prediction and the ground truth. The function will give us three values (precision, recall, and f-measure). Each value goes to a different output. Let’s call these outputs p(k)
, r(k)
, and f(k)
, once again, to indicate that we are obtaining results for song 600.
[p(k) r(k) f(k)] = perfstats(sg{k}, phrase{k})
However, we get an error, because perfstats only accepts data that is formatted like phrase{k}
(note onset numbers); it does not understand the binary format of sg{k}
. So we need to find the vector indices of sg{k}
that correspond to boundaries (that is, that are ones):
sg_indices{k} = find(sg{k} == 1) [p(k) r(k) f(k)] = perfstats(sg_indices{k}, phrase{k})
- We could add an
if
statement to the perfstats function: If the content of any of the inputs is just ones and zeros, then let’s find the indices of those elements that are equal to one. Please modify the perfstats function accordingly to avoid errors when using binary input(s). This workaround has been implemented in the visualboundaries function, so it will be helpful to have a look at it.
We can go a step further and try to obtain results for the whole collection using a for
loop. We need to tell matlab how many notematrices we have (that is, how many elements we have in the nm
variable). We can use the built-in numel
function for this.
for k = 1:numel(nm) sg{k} = segmentgestalt(nm{k}); sg_indices{k} = find(sg{k} == 1); [p(k) r(k) f(k)] = perfstats(sg_indices{k}, phrase{k}); end
Then we can average the results that are not NaN
:
mean_precision_across_songs = mean(p(~isnan(p))) mean_recall_across_songs = mean(r(~isnan(r))) mean_f_measure_across_songs = mean(f(~isnan(f)))
And get the performance measures for each class:
pg = accumarray(genre,p,[],@(x) mean(x(~isnan(x)))); rg = accumarray(genre,r,[],@(x) mean(x(~isnan(x)))); fg = accumarray(genre,f,[],@(x) mean(x(~isnan(x)))); musical_genre = genrelabels(unique(genre)); precision = pg(unique(genre)); recall = rg(unique(genre)); f_measure = fg(unique(genre)); number_of_songs = histc(genre,unique(genre)); disp(table(mean_precision_across_songs,mean_recall_across_songs, mean_f_measure_across_songs)) disp(table(musical_genre,precision,recall,f_measure,number_of_songs))
The other algorithm, boundary, gives us a different kind of information regarding boundary locations. Instead of a vector of zeros and ones, we get a boundary profile:
clear sg k = 1; sg{k} = boundary(nm{k},'fig'); sg{k}
In order to evaluate this model, we can pick peaks over a given threshold. So instead of choosing indices for values equal to 1, we will chose indices for values greater than a given threshold thres
, such as .1:
thres = .1; for k = 1:numel(nm) sg{k} = boundary(nm{k}); sg_indices{k} = find(sg{k} > thres); [p(k) r(k) f(k)] = perfstats(sg_indices{k}, phrase{k}); end
The results depend on the threshold that we use. But how can we find an optimal threshold? Moving on to the next section of this demo…
11 Optimization
To find a threshold value that gives us the best results, we can either try a few thresholds (say .25, .5, and .75) or use optimization to (hopefully) find the threshold that maximizes the performance of the algorithm. We will try using optimization, for which we need to make an objective function (also called cost function). The following objective function, optimize, receives a threshold value as an input and outputs the negative of the F-measure (by convention, optimization algorithms minimize an objective function, so that’s why we are interested in the negative of the performance) for the boundary algorithm. So we have one design variable (threshold value) and one objective (to minimize the negative of the F-measure).
function res = optimize(input) thres = input; load ('~/Downloads/essen.mat'); genrelabels = { 'ALTDEU' 'BALLADE' 'BOEHME' 'DVA' 'ERK1' 'ERK2' 'ERK3' 'ERK5' 'FINK' 'KINDER' 'VARIANT' 'ZUCCAL' 'CHINA' 'ZHEN' 'HUANG' 'SENGANSU' 'PELOG1' 'MEXICO' 'OTHER' }; emptyCells = cellfun(@isempty,phrase); phrase(emptyCells) = []; nm(emptyCells) = []; genre(emptyCells) = []; phrase(find(genre == 0)) = []; nm(find(genre == 0)) = []; genre(find(genre == 0)) = []; % segment for k = 1:numel(nm) b{k} = boundary(nm{k}); b_thres{k} = find(b{k} > thres); [p(k) r(k) f(k)] = perfstats(b_thres{k},phrase{k}); end neg_mean_precision_across_songs = -mean(p(~isnan(p))); neg_mean_recall_across_songs = -mean(r(~isnan(r))); neg_mean_f_measure_across_songs = -mean(f(~isnan(f))); res = neg_mean_f_measure_across_songs; disp(sprintf('Threshold = %0.5g, F-measure = %0.5g', thres ,-res))
Please paste this code into a new file and save it as optimize.m.
Now we can use the following script to run the optimization. Note that the threshold to be used should be higher than 0 and lower than 1, so we need to use an optimization algorithm that allows us to set lower and upper bounds. This is possible to do with constrained optimization algorithms such as fminbnd. This function may fail to give us a global solution but we can run it many times to see if it always converges with the same solution.
tic nVars = 1; lb = .1; % threshold should not exceed .1 ub = 1; % threshold should not exceed 1 fun = @optimize; [x fval]= fminbnd(fun,lb,ub); time = toc
- What seems to be the optimal threshold value for boundary?
It is possible to create a scatter plot to visualize what is the threshold value and its performance at each point of the optimization. For that, we need a plotting function.
Please paste this code into a new file and save it as OPTIMPLOTXFVAL.m.
function stop = optimplotxfval(x,optimValues,state,varargin) % OPTIMPLOTXFVAL Plotting function based on the built-in Matlab function OPTIMPLOTFVAL. % Plot current point and negative of the value of the objective function at each iteration. stop = false; switch state case 'iter' plotscalar(x,optimValues.fval,optimValues.iteration); end function plotscalar(x,fval,iteration) % PLOTSCALAR initializes or updates a plot of the function value % at each iteration. if iteration == 0 plotfval = scatter(x,-fval,'kd','MarkerFaceColor',[1 0 1]); title(getString(message('MATLAB:optimfun:funfun:optimplots:TitleCurrentFunctionValue',sprintf('%g',fval))),'interp','none'); xlabel('Threshold'); set(plotfval,'Tag','optimplotfval'); ylabel('Performance'); else plotfval = findobj(get(gca,'Children'),'Tag','optimplotfval'); newX = [get(plotfval,'Xdata') x]; newY = [get(plotfval,'Ydata') -fval]; set(plotfval,'Xdata',newX, 'Ydata',newY); set(get(gca,'Title'),'String',getString(message('MATLAB:optimfun:funfun:optimplots:TitleCurrentFunctionValue',sprintf('%g',-fval)))); end
Now we can add the plot function to our optimization script:
tic nVars = 1; lb = .1; % threshold should not exceed .1 ub = 1; % threshold should not exceed 1 fun = @optimize; options = optimset('Display','iter','PlotFcns',@optimplotxfval); [x fval]= fminbnd(fun,lb,ub,options); time = toc
- Which of these two segmentation algorithms in MIDI toolbox (segmentgestalt, boundary) seems to yield optimal results for the Essen collection?