I submitted my written thesis in early August 2008. Rather than continue to hold off on posting it for IP reasons, I’m going to opt for the “open is better” attitude and just post it. I hope that this will encourage people to chat with me about where this research stands, where it’s going, and how they can get involved.
If this is your first visit to this site, I’ll briefly catch you up: This is a blog for my Master’s thesis at the MIT Media Lab, on a music browser called MusicBox. In this blog, you can find screenshots and videos of the project, which show the interactive maps that MusicBox creates.
I intend to continue developing MusicBox in the coming year, but will likely not update this blog with the developments, since the project as a thesis is complete. Once I decide where I will post updates, I will add a link here. (The new place will have a more complete set of screenshots and videos.)
If you would like to read my written thesis (10.2 MB, 124 pages), you can download the PDF. Feel free to send me feedback, either by posting a comment here or by emailing me.
Here is what MusicBox looked like at the time my thesis was submitted:
A quick guide to the parts of this interface is displayed here:
(See my written thesis for much more detail.)
I had a good meeting with Paul today. We talked about a lot of things I need to think about, but a couple of them were some simple features to add. Here are some screenshots.
Album art view
Paul says lots of people judge music based on its cover, and that viewing album art would be valuable to those kinds of people.
A control for selecting songs within a particular distance of a given song.
I’m trying to wrap up the main coding tasks this week. Here’s what the interface to the MusicBox software looks like today:
I’ve added a few new features and done a fair amount of debugging over the past 1-2 weeks. Here are screenshots of some of the new features…
Create a tour
The “Create a tour” feature helps when you want to get a general idea of what music is sitting in front of you. You select a tour size, and the program selects that many songs evenly dispersed through the space. With one click, you have a new playlist made up of a sampling of songs that represents the variety of music in the library.
In the screenshots, the glowing pink songs are the songs in the tour, and are automatically added to your working list of songs on the right. You can create a tour of whatever portion of the library you are currently viewing.
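One way to pick songs evenly dispersed through the map is farthest-point sampling: after a random first pick, each new tour song is the one farthest from everything chosen so far. This is just a sketch of the idea under that assumption; the thesis may use a different dispersion heuristic, and the function name is mine.

```python
import random

def create_tour(points, k, seed=0):
    """Pick k song indices spread evenly through a 2-D map via
    farthest-point sampling (an illustrative approach, not
    necessarily the one MusicBox uses)."""
    rng = random.Random(seed)
    tour = [rng.choice(range(len(points)))]
    while len(tour) < k:
        # The next tour song maximizes its distance to the nearest tour song.
        def dist_to_tour(i):
            return min((points[i][0] - points[j][0]) ** 2 +
                       (points[i][1] - points[j][1]) ** 2 for j in tour)
        tour.append(max(range(len(points)), key=dist_to_tour))
    return tour
```

Because each new pick maximizes distance to the existing tour, the selection naturally covers whatever portion of the library is currently in view.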
Wander from here
The “Wander from here” feature is a smart shuffle. If you like the song you are currently listening to, you can click on “Wander from here”, and the program will do a random walk through the space, starting with the current track. To choose each track, it will only venture a particular distance (that you set) from the previous track. This way, you can avoid those abrupt (and sometimes unpleasant) acoustic shifts you might experience in an iTunes shuffle.
In the screenshots, the yellow song is the current song, and the pink songs are those that have been selected for the “Wander from here” feature. The first screenshot shows a random walk with a small jump size (more smooth acoustic changes from track to track), and the next shows one with a large jump size (more dramatic changes).
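The random walk described above can be sketched in a few lines: from the current track, choose the next track at random among the unplayed songs within the user-set jump distance. A small sketch with illustrative names, not the actual MusicBox code.

```python
import random

def wander(points, start, n, max_jump, seed=0):
    """Smart shuffle: random walk of up to n tracks where each next
    track lies within max_jump of the previous one."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(n - 1):
        cur = points[walk[-1]]
        # Candidate next tracks: unplayed songs within the jump radius.
        nearby = [i for i, p in enumerate(points)
                  if i not in walk and
                  (p[0] - cur[0]) ** 2 + (p[1] - cur[1]) ** 2 <= max_jump ** 2]
        if not nearby:  # dead end: no unplayed song is close enough
            break
        walk.append(rng.choice(nearby))
    return walk
```

A small `max_jump` gives the smooth track-to-track transitions described above; a large one allows the more dramatic shifts.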
Playlists as paths
Say you want to start with your hard-hitting rap and hip-hop, and move step by step to your classical music. The playlist path function lets you do this by drawing a path through the space; it selects the songs along that path.
The first screenshot was taken while the user was drawing the path, and the next screenshot shows the resulting selection of songs along that path.
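Selecting songs along a drawn path can be approximated by collecting every song within some radius of the path’s vertices, ordered by where along the path it first appears. A sketch of the idea only; the radius parameter and function name are my assumptions.

```python
def songs_along_path(points, path, radius):
    """Return song indices near the drawn path, ordered by position
    along the path (rap at the start, classical at the end, say)."""
    hits = {}
    for step, (px, py) in enumerate(path):
        for i, (x, y) in enumerate(points):
            if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
                hits.setdefault(i, step)  # keep the earliest path position
    return sorted(hits, key=hits.get)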
Instead of viewing each song circle colored according to the genre in the ID3 tag, you can view each song as a mini soundsieve-style image. These images summarize some acoustic characteristics of each song, like a visual fingerprint. This view is inspired by my previous project on visualizing music, soundsieve.
Amongst the new additions:
- full playlist integration
- showing genre legend with elements sized by number of occurrences
- individual song images (from soundsieve project)
I implemented the first set of contextual data for my test set of music. I’m grabbing moods off of allmusic.com. These are descriptors like “spooky”, “lively”, or “epic”, which are usually listed per album. I’m incorporating these features into the PCA by treating each descriptor as a separate feature, giving each song with the given descriptor a “1”, and all others a “0”. These contextual data are being added into the model that already has audio-based data.
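The encoding described above is a standard one-hot (binary indicator) scheme. A minimal sketch, with made-up function and variable names:

```python
def mood_features(song_moods, vocabulary):
    """One-hot encode per-song mood tags (e.g. "spooky", "lively")
    so they can be appended to each song's audio feature vector
    before running PCA."""
    return [[1.0 if mood in moods else 0.0 for mood in vocabulary]
            for moods in song_moods]
```

Each mood descriptor becomes one extra column; songs sharing a tag set get identical values in those columns, which is what produces the album stripes described below.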
As expected, this stratifies the whole data set, and is an interesting way of separating out the albums… those albums with the same sets of mood tags appear in a stripe across the space, and their position in the stripe comes from the audio features. If I had to guess what the audio feature separation means here, I’d say it’s distributing music on a spectrum of edgier, more bursty sounds (on the left in this visual representation) to smoother, softer sounds (on the right). I am guessing this purely from inspection.
Here are some examples:
- I see a stripe of this album: Backstreet Boys — “Black & Blue”. It moves from “Shining Star” (click to listen; you need Windows Media Player) on the edgy side, to “How Did I Fall In Love With You” (click to listen IF YOU DARE; you need Windows Media Player) on the smooth side.
- I see a stripe of this album: The Beatles — “Sgt. Pepper’s Lonely Hearts Club Band”. It moves from “Getting Better” on the edgy side, to “Lovely Rita” in the middle, to “A Day In the Life” on the smooth side.
- I see a stripe of this album: 10,000 Maniacs — “Our Time in Eden”. It moves from “Candy Everybody Wants” on the edgy side, to “How You’ve Grown” on the smooth side.
Also, Busta Rhymes’ smoothest piece in the test set, “Hot Fudge”, is way less smooth than the Beatles’ edgiest piece.
I was able to add some rhythm features to the model. I’m using Thomas Lidy and Andreas Rauber’s Rhythm Histograms (RH), summarizing each histogram (one histogram per song) with four features: mean, median, standard deviation, and (at Thomas’s recommendation) kurtosis. Here’s how the map of the same music library looks now:
You can compare with the last version, before RH features were used:
Unfortunately the Rhythm Histogram code couldn’t open my classical music, so you have to disregard that data (to which I gave the mean value of the rest of the songs’ RH data).
So the main thing that happened here is that the rap music popped out. And perhaps a wee bit the jazz.
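The four summary statistics I’m using per histogram can be computed as below. This is a sketch of the statistics only (using population variance and plain Pearson kurtosis; Lidy and Rauber’s RH extraction itself is a separate codebase, and the exact conventions in my model may differ):

```python
def summarize_histogram(hist):
    """Collapse one song's rhythm histogram into four scalar
    features: mean, median, standard deviation, and kurtosis."""
    n = len(hist)
    mean = sum(hist) / n
    ordered = sorted(hist)
    median = (ordered[n // 2] if n % 2 else
              (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    var = sum((x - mean) ** 2 for x in hist) / n
    std = var ** 0.5
    # Kurtosis: fourth central moment over squared variance.
    kurt = sum((x - mean) ** 4 for x in hist) / (n * var ** 2) if var else 0.0
    return mean, median, std, kurt
```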
Now posting the equivalent of the last post’s video, except in 2D, built with Processing. Just click the image below to load the 1.5 MB QuickTime video (larger version here — 12 MB). This is the version I demoed for sponsors April 1st and 2nd.
Songs/circles are still located based on a PCA of timbre, tempo, duration, number of distinct sections, and time signature stability. The circles’ colors are mapped straight from genre tag (see the legend at bottom-left).
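For anyone curious what “located based on a PCA” means mechanically: standardize the feature columns, then project each song onto the top two principal components of the covariance matrix. A tiny pure-Python sketch via power iteration, purely illustrative (the real system presumably uses a proper linear algebra library, and the feature vectors here are placeholders):

```python
def pca_2d(rows, iters=200):
    """Project feature vectors (timbre, tempo, duration, etc.) onto
    their top two principal components: the 2-D map coordinates."""
    n, d = len(rows), len(rows[0])
    # Standardize each feature column (zero mean, unit variance).
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [max((sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5, 1e-12)
            for j in range(d)]
    X = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # Covariance matrix of the standardized features.
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(d)]
         for a in range(d)]
    axes = []
    for _ in range(2):
        v = [1.0 / (a + 1) for a in range(d)]  # asymmetric start vector
        for _ in range(iters):  # power iteration toward top eigenvector
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5 or 1.0
            v = [x / norm for x in w]
        axes.append(v)
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d))
                  for a in range(d))
        # Deflate so the next pass finds the second component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[j] * v[j] for j in range(d)) for v in axes] for x in X]
```

The first axis captures the largest spread in the feature space (my guess in an earlier post: roughly edgy-to-smooth), and the second the largest remaining spread.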
Here’s a more detailed look at the layout of this small music library:
A few things to note:
- Some of the genre colorings are fairly well separated/clustered based on the small set of audio features I’m using. Note how classical music sits almost on its own. Jazz sits neatly nearby, between Classical and everything else. Rock is a nice mess that overlaps with Pop, Country, Rap, and Dance. These are relationships that make sense if you think about how loose or tight each genre label is.
- In a few instances, there are pairs of songs that are actually the same track at different bit rates (I had duplicates-with-different-bit-rates in the test library). One example of this is the light-blue-and-white pair just south of Jazz. This is The Flaming Lips’ “The Observer”, at bit rates 256 and 192. (One of those has a bad genre tag as well — that’s why it’s white.) Looking at The Flaming Lips’ “Zaireeka” album (not pictured) was also very cool, since you can see how the complementary stereo tracks are either very similar or very different.
- I listened to the leftmost Country song; it is Willie Nelson’s “Still Is Still Moving to Me”. According to this representation of the music library, this is the “country song most like rock”. You can certainly argue for or against that, but it’s still interesting to be able to start making characterizations like this.
- This is a small library (~150 songs). As it gets bigger, and as the feature set grows (i.e. when I implement new features), hopefully these patterns will just become more meaningful… We’ll see.
Just posting a short video of the unfinished 3D browser. I’ve decided to focus on a 2D interface built with Processing, but wanted to show anyone who’s interested how far I went with the 3D version. Just click the image to load the 10 MB QuickTime video.
Songs/spheres are located based on a PCA of timbre, tempo, duration, number of distinct sections, and time signature stability. The spheres’ colors are mapped straight from genre tag (e.g. yellow is “Classical”).
Ok, this is pretty ugly… but I’m very excited to see it. It is the first test image coming out of the PCA engine I’ve built. It’s mapping test tracks based on timbre values, number of sections, duration, and time signature. JFreeChart was a great recommendation, thanks Paul :)
(The “null”s are Bach.)
I’ll use this kind of interface as I clean up the code and test out new features. Then, on to 3-D!