The responses I got to my previous post about evaluation requirements for my thesis pretty much boils down to this: I need to clarify what I hope my contribution will be before I can decide how to appropriately test it.
To make my goals more clear, here are the kinds of questions I’m trying to ask with this new interface:
- Does this spatial interface give a better understanding of the overall scope of your music?
- Is the music->visual mapping intuitive?
- To introduce the notion of “really fuzzy searching”… Is a spatial representation more appropriate than text-based lists for music browsing?
- Does the interface help you see/examine your listening patterns?
- Does the interface help you see the relationship between multiple people’s music libraries?
- Can you more easily find music of a certain style/type, or for a particular activity, than you can with a more traditional music browser?
- Can looking at your music and someone else’s music at the same time in the space help you find recommendations for new music?
- Does this interface change what things are important when it comes to looking at your library? Do you find that you are looking for, or thinking about, different sorts of things than you would with a traditional music browser?
- Does the interface make you more aware of the context for your music-browsing decisions?
I’m at a big decision point in my thesis. I have a very primitive music browser implemented in both 2D and 3D. I want to choose the number of dimensions (2 or 3) for my main project before I move too much farther in developing the interface. I just don’t have time to develop them both.
My biggest concern: I had been pushing for a 3D interface throughout the proposal process, but I’m worried that continuing with it will force me in my remaining time to focus much more on elements of 3D interfaces (e.g. how to orient the user, how to show the overall cloud shape despite obscuration) than on elements core to my own thesis motivations (e.g. how to organize music, how to find patterns in music listening).
I think a 2D interface is currently more easy to develop than a 3D interface, and that perhaps I should focus on only two dimensions and have a better chance of making an interface that demonstrates all the things I had hoped to show (outlined in my proposal).
In the end, my thesis is not about interfaces; it is about the organizational model itself. That organizational model is the use of audio and contextual data to organize a music collection in a fuzzy manner that I think is more appropriate for this type of data, in addition to providing others with a framework to add onto it, both in terms of input features and output interface. This approach is in opposition to what we see in most music browsers (well, and data browsers in general), which limit organization to non-configurable lists and, ultimately, text labels.
So, my thesis work becomes: (1) an implementation of this organizational model, (2) made publicly-available, along with (3) demonstration(s) of an interface built on top of the model. An analog to this manner of thinking is the Echo Nest’s recent announcement of their AudioAnalysis API. Last year, they made this tool (1) available to others (2) — it gave me numbers, and I built an interface on top of it (3). In this thesis, I am the one providing the numbers, and letting others build interfaces on top.
Even though the main contribution is the model, I will demonstrate one such interface with a 2D representation of a music collection that is user-configurable and dynamically updated through RSS feeds.
Here are the main questions:
- Am I losing something integral to the project if I move down from three dimensions to two?
- Is this line of thinking (that my contribution is more an organizational model than an interface) too dangerous?
- Am I contributing enough?
“For the system you are designing, I would put an emphasis on first and foremost building a really killer application based on your underlying research, grounding this in a discussion of important precedents and your reasons for doing this work, following the design process and how important decisions were made, capped with your own critical analysis of the system, including what works best, what remains most problematic, and what major challenges remain. If you wanted to do some limited user testing, that would be fine, but for this particular project I would be more interested in your ideas and your evaluation.”
John, Paul, and Henry: Please let me know if you have any concerns about my plans for the evaluation part of my thesis work.
I’m getting more into my coding, and now am trying to answer the question, “What is a feature?” Specifically, what are the features that I can glean from the audio to get a meaningful distinction between one song and the next, and what is the general description of this thing I call a “feature”.
My project is not intended to be focused on figuring out or implementing these features; I am focused on bringing them together into a navigable representation. But in the design of the Feature class I’m finding myself wondering:
- Do features operate at the song level, or at the section level? I should have the ability to deal with either type, but then, I am sometimes mapping song sections, and sometimes whole songs. What do I show to the user in the interface if I’m only really mapping a section of a song?
- Should I try to choose one section to be representative of the piece as a whole, and just do my analysis on that section?
- What are the must-have features people have already written code for, that can be easily adapted and plugged in to my engine?
- What kind of rhythm-based features can I pull out? (I mention this because I am sorely lacking in the rhythm arena.)
I will start with features like this (for each track):
- number of sections
- number of types of sections (counted by timbre type)
- number of types of sections (counted by pitch pattern type)
- mode of timbres
- mean level of specific timbre coefficients (coefficients shown visually at the bottom of this page)
- mean loudness (or max, maybe)
- confidence level of autotag assignment with tag1, tag2, tag3, etc… (multiple features here)
- frequency of appearance of tag assignment with tagA, tagB, tagC, etc… (multiple features here)
- time signature
- time signature stability
- track duration
(Note that, right now, I am not talking about similarity measures for pairs of songs, but rather quantifiable measures for one song at a time. I’ll deal with similarity later.)