Review: Music and Probability (ch. 1-3)

January 24, 2007

This is my initial summary of chapters 1-3 of David Temperley’s new book. More to come as I read through the book!

Music and Probability presents a collection of Temperley’s new Bayesian models for rhythm detection and key detection, in both monophonic and polyphonic music, working from symbolic (MIDI-like) data. It also reviews several other Bayesian approaches to problems in music perception. The first two chapters motivate the Bayesian approach and give a basic review of conditional probability and cross-entropy.
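Cross-entropy, one of the measures reviewed in the opening chapters, is the average negative log probability that a model assigns to observed data. Lower values mean the model predicts the data better. Here is a minimal Python sketch in bits (log base 2). The toy distributions and observations are hypothetical and are not taken from the book:

```python
import math

def cross_entropy(model, observations):
    """Average negative log2 probability that `model` assigns to each observation (bits per event)."""
    return -sum(math.log2(model[x]) for x in observations) / len(observations)

# Hypothetical data: the meters of five songs, and two candidate models of meter choice.
observed = ["duple", "duple", "triple", "duple", "duple"]
model_a = {"duple": 0.8, "triple": 0.2}
model_b = {"duple": 0.5, "triple": 0.5}

print(cross_entropy(model_a, observed))  # about 0.72 bits: model_a fits this data better
print(cross_entropy(model_b, observed))  # exactly 1.0 bit per event
```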

Monophonic rhythm model

The first new model presented detects rhythm in monophonic data. The input consists of note onset and offset times. Pitch is also available in the dataset, but as far as I can tell this model does not use it in any way. The output of the model is a metrical grid aligned with the input data. The grid has three levels: the middle-level tactus (corresponding, say, to quarter notes in 3/4 time), a higher level (the measure, in this example) and a lower level (eighth notes). The relation between adjacent levels can be either duple or triple. The model also represents the “phase” of the upper level relative to the tactus, so that one or more tactus beats (pickup notes) can precede the first upper-level beat.
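The following is a minimal sketch of how a three-level grid like this might be represented. It assumes the tactus is given as a list of beat times that may be unevenly spaced, as in an expressive performance. The class `MetricalGrid` and its field names are hypothetical and are not Temperley's data structures. Lower-level beats are placed by evenly subdividing each tactus interval, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class MetricalGrid:
    """Hypothetical three-level metrical grid built around a tactus."""
    tactus: list[float]   # tactus beat times in seconds (may be unevenly spaced)
    lower_ratio: int      # 2 = duple, 3 = triple subdivision of each tactus interval
    upper_ratio: int      # 2 or 3 tactus beats per upper-level beat
    upper_phase: int      # index of the first tactus beat that is also an upper-level beat

    def __post_init__(self):
        if self.lower_ratio not in (2, 3) or self.upper_ratio not in (2, 3):
            raise ValueError("level relations must be duple (2) or triple (3)")
        if not 0 <= self.upper_phase < self.upper_ratio:
            raise ValueError("upper_phase must be in [0, upper_ratio)")
        if len(self.tactus) < 2:
            raise ValueError("need at least two tactus beats")

    def lower_level(self) -> list[float]:
        """Evenly subdivide each tactus interval into lower_ratio beats."""
        beats = []
        for start, end in zip(self.tactus, self.tactus[1:]):
            step = (end - start) / self.lower_ratio
            beats.extend(start + i * step for i in range(self.lower_ratio))
        beats.append(self.tactus[-1])
        return beats

    def upper_level(self) -> list[float]:
        """Every upper_ratio-th tactus beat, starting after any pickup beats."""
        return self.tactus[self.upper_phase::self.upper_ratio]

# Example: 3/4 time, quarter-note tactus, eighth-note lower level, one pickup beat.
grid = MetricalGrid(tactus=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
                    lower_ratio=2, upper_ratio=3, upper_phase=1)
print(grid.upper_level())  # [0.5, 2.0]: measure downbeats after one pickup beat
```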

The input to the model is an imprecise (expressive) human performance. Without any other information, such as a score, the model determines the metrical grid that most likely fits the data. In essence, the model computes the most likely metrical structure given the surface-level performance, using Bayes’ rule: P(structure | surface) is proportional to P(surface | structure) × P(structure). In terms of Temperley’s model, the goal is to find the metrical grid that maximizes P(grid | onset pattern). This is equivalent to maximizing P(onset pattern | grid) × P(grid), because the normalizing term P(onset pattern) is the same for every candidate grid.
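Here is a toy illustration of that Bayes'-rule step. It collapses each candidate grid to its tactus-level duple/triple choice, which is a deliberate simplification. The prior uses the 76%/24% duple/triple split that the book reports for its Essen-trained model. The likelihood values are invented for illustration. The function name `most_probable_structure` is hypothetical.

```python
def most_probable_structure(prior, likelihood):
    """Pick the structure maximizing P(surface | structure) * P(structure).

    Also returns the normalized posterior P(structure | surface) for each candidate.
    """
    joint = {s: likelihood[s] * prior[s] for s in prior}
    evidence = sum(joint.values())           # P(surface): identical for every candidate
    posterior = {s: j / evidence for s, j in joint.items()}
    best = max(joint, key=joint.get)         # same argmax as the posterior
    return best, posterior

# Prior: the duple/triple split reported for the Essen-trained model.
prior = {"duple": 0.76, "triple": 0.24}
# Likelihoods P(onset pattern | grid): hypothetical values for illustration only.
likelihood = {"duple": 0.001, "triple": 0.005}

best, posterior = most_probable_structure(prior, likelihood)
print(best)  # triple: the likelihood outweighs the smaller prior
```

Dropping `evidence` would not change which structure wins. That is why maximizing the unnormalized product is enough.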

Both probabilities are computed from a generative model that first produces a metrical grid and then produces an onset pattern from that grid. For instance, the model starts by making a probabilistic choice between duple and triple meter at the tactus level. These probabilities are estimated from a training corpus (based on the Essen folksong collection); for example, there is a 76% chance of choosing duple meter and a 24% chance of choosing triple. Every choice point in the generative model contributes a probability. Multiplying them together gives the overall probability of generating a particular metrical grid, P(grid), and of generating the given onset pattern from that grid, P(onset pattern | grid) (i.e., the likelihood of the data given the grid). Finding the grid that maximizes this product means searching over all possible grids. That is a seemingly intractable problem, but it is naturally solved with dynamic programming.
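To make the dynamic-programming step concrete, here is a Viterbi-style search over a single beat level. It is a much-simplified stand-in for Temperley's three-level model, not his algorithm. Time is quantized into frames. The beat-interval distribution and the onset probabilities (`interval_probs`, `p_onset_on_beat`, `p_onset_off_beat`) are hypothetical parameters. The first and last frames are forced to be beats. Following the generative story described in the paragraph above, a beat sequence's score combines the interval choices (a prior over grids) with the onset emissions (the likelihood). These are summed in log space rather than multiplied, to avoid underflow.

```python
import math

def best_beats(onsets, interval_probs, p_onset_on_beat=0.8, p_onset_off_beat=0.1):
    """Most probable beat placement over quantized frames (simplified, single level).

    onsets: list of bools, one per frame (True = a note starts in that frame).
    interval_probs: dict mapping a beat interval (in frames) to its probability.
    Simplifying assumption: the first and last frames are always beats.
    """
    def log_emit(frame, is_beat):
        # Log probability of seeing (or not seeing) an onset in this frame.
        p = p_onset_on_beat if is_beat else p_onset_off_beat
        return math.log(p if onsets[frame] else 1.0 - p)

    n = len(onsets)
    best = [-math.inf] * n  # best[t]: log prob of the best beat sequence ending with a beat at t
    back = [None] * n       # back[t]: previous beat frame on that best sequence
    best[0] = log_emit(0, True)
    for t in range(1, n):
        for d, p in interval_probs.items():
            prev = t - d
            if prev < 0 or best[prev] == -math.inf:
                continue
            # Frames strictly between the two beats are non-beats.
            gap = sum(log_emit(k, False) for k in range(prev + 1, t))
            score = best[prev] + math.log(p) + gap + log_emit(t, True)
            if score > best[t]:
                best[t], back[t] = score, prev
    if best[-1] == -math.inf:
        raise ValueError("no beat sequence fits the allowed intervals")
    # Trace the best path back from the final frame.
    beats, t = [], n - 1
    while t is not None:
        beats.append(t)
        t = back[t]
    return beats[::-1], best[-1]

onsets = [i % 4 == 0 for i in range(17)]  # a note every 4 frames
beats, score = best_beats(onsets, {3: 0.3, 4: 0.4, 5: 0.3})
print(beats)  # [0, 4, 8, 12, 16]
```

Each frame looks back only over the allowed intervals. As a result, the search grows polynomially with the number of frames, rather than exponentially with the number of possible beat sequences.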

Several paragraphs at the end of the chapter point out differences between this model and that of Raphael (2002). Temperley also gives results showing that his earlier meter model (1999) performed better than the new Bayesian model, although he argues that this is not grounds for dismissing the new model. As an aside, he presents a useful metric for comparing the output of rhythm models.


One comment

  1. The problem that I have with expectation-based or probability-based theories of music is that, in much of the music that people listen to, the probability of the occurrence of the next note is always 100%.

    Uncertainty about what note is coming next may have some effect on our appreciation of music, but such uncertainty is neither necessary nor sufficient in order to achieve musicality.
