I’m getting more into my coding, and now I’m trying to answer the question, “What is a feature?” Specifically, what features can I glean from the audio to draw a meaningful distinction between one song and the next, and what is the general shape of this thing I call a “feature”?
My project is not meant to focus on figuring out or implementing these features; it focuses on bringing them together into a navigable representation. But in designing the Feature class I find myself wondering:
- Do features operate at the song level or at the section level? I should be able to handle either type, but I am sometimes mapping song sections and sometimes whole songs. What do I show the user in the interface if I’m really only mapping a section of a song?
- Should I try to choose one section to be representative of the piece as a whole, and just do my analysis on that section?
- What are the must-have features people have already written code for, that can be easily adapted and plugged into my engine?
- What kind of rhythm-based features can I pull out? (I mention this because I am sorely lacking in the rhythm arena.)
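One way to resolve the song-versus-section question is to let each feature carry its own scope, and to lift section-level values to the song level with an explicit aggregation step. Here is a minimal sketch in Python; the `Feature` and `Scope` names and the mean-based `aggregate` helper are my own hypothetical design, not an existing library:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Scope(Enum):
    SONG = "song"
    SECTION = "section"


@dataclass
class Feature:
    """One quantifiable measure extracted from audio.

    Hypothetical sketch: the names here are illustrative, not from
    any existing feature-extraction library.
    """
    name: str
    scope: Scope
    value: float
    section_index: Optional[int] = None  # which section, when scope is SECTION


def aggregate(section_features: Sequence[Feature]) -> Feature:
    """Lift section-level measurements to one song-level feature by
    averaging -- one simple answer to the 'which scope?' question."""
    assert section_features, "need at least one section"
    mean_value = sum(f.value for f in section_features) / len(section_features)
    return Feature(name=section_features[0].name,
                   scope=Scope.SONG,
                   value=mean_value)


# Example: per-section loudness measurements rolled up to the song level.
sections = [Feature("loudness", Scope.SECTION, v, i)
            for i, v in enumerate([-12.0, -9.0, -15.0])]
song_loudness = aggregate(sections)
print(song_loudness.value)  # -12.0
```

The nice property of this shape is that the interface never has to special-case scope: a map of whole songs and a map of sections both consume `Feature` objects, and the aggregation policy (mean here, but it could be max, mode, or “pick one representative section”) is an explicit, swappable choice rather than something baked into each feature.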
I will start with features like this (for each track):
- number of sections
- number of types of sections (counted by timbre type)
- number of types of sections (counted by pitch pattern type)
- mode of timbres
- mean level of specific timbre coefficients (coefficients shown visually at the bottom of this page)
- mean loudness (or max, maybe)
- confidence level of autotag assignment with tag1, tag2, tag3, etc. (multiple features here)
- frequency of appearance of tag assignment with tagA, tagB, tagC, etc. (multiple features here)
- time signature
- time signature stability
- track duration
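A few of these measures are simple enough to sketch directly. Below is a hedged example computing track duration and mean/max loudness, with loudness taken as RMS level in dBFS; that is my stand-in, since a perceptual loudness model would differ from plain RMS. The signal here is a synthetic tone; a real track would be decoded to a sample array first:

```python
import numpy as np

SR = 22050  # sample rate in Hz; an assumption for this synthetic example

# Stand-in for a decoded track: 3 seconds of a 440 Hz tone at half amplitude.
t = np.arange(SR * 3) / SR
y = 0.5 * np.sin(2 * np.pi * 440 * t)


def track_duration(y, sr):
    """Track duration in seconds."""
    return len(y) / sr


def mean_loudness_db(y):
    """Mean loudness as overall RMS level in dBFS (0 dB = full scale)."""
    rms = np.sqrt(np.mean(y ** 2))
    return 20 * np.log10(rms)


def max_loudness_db(y, frame=2048):
    """Max loudness: RMS level of the loudest fixed-size frame, in dBFS."""
    starts = range(0, len(y) - frame + 1, frame)
    peak_rms = max(np.sqrt(np.mean(y[i:i + frame] ** 2)) for i in starts)
    return 20 * np.log10(peak_rms)


duration = track_duration(y, SR)   # 3.0 seconds
mean_db = mean_loudness_db(y)      # about -9.0 dBFS for this tone
max_db = max_loudness_db(y)
print(duration, round(mean_db, 1))
```

The frame-based `max_loudness_db` is the “or max, maybe” variant: on real material the loudest frame can sit far above the mean, so the two make genuinely different features rather than redundant ones.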
(Note that, right now, I am not talking about similarity measures for pairs of songs, but rather quantifiable measures for one song at a time. I’ll deal with similarity later.)