What to do with missing data?
My newest problem is one that I knew I’d come across eventually: What do I do with songs that have missing data? This most recently came up when I was adapting Thomas Lidy’s rhythm feature code… It couldn’t open some of the MP3s in my test set, so I have no rhythm feature data for those songs. Lacking a better idea, I just gave them the mean values of all the other songs. But this doesn’t seem right… I can’t really give them any value. But if I don’t give them any values, the PCA can’t process these tracks anymore; I can’t put them in the space at all.
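To make the stopgap concrete, here is a minimal sketch of mean-filling followed by PCA. It assumes the features sit in a NumPy array with one row per song and NaN wherever extraction failed; the numbers and function names are made up for illustration and are not the actual pipeline. It also shows why the result feels wrong: a song filled entirely with column means lands exactly at the origin of the PCA space, so it looks like the most "average" song in the set.

```python
import numpy as np

def fill_missing_with_column_means(features):
    """Replace NaN entries with the mean of the observed values in each column."""
    features = np.asarray(features, dtype=float)
    col_means = np.nanmean(features, axis=0)  # means ignore the NaN entries
    return np.where(np.isnan(features), col_means, features)

def pca_project(features, n_components=2):
    """Project rows onto the top principal components using an SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical rhythm features: 4 songs x 3 features.
# The second song could not be opened, so its whole row is missing.
rhythm = np.array([
    [1.0, 2.0, 3.0],
    [np.nan, np.nan, np.nan],
    [3.0, 4.0, 5.0],
    [2.0, 6.0, 1.0],
])

filled = fill_missing_with_column_means(rhythm)
coords = pca_project(filled)

print(filled[1])  # the column means of the other three songs
print(coords[1])  # numerically zero: the filled song sits at the origin
```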
So, not sure what to do about these songs. Anyone have any suggestions?
Comment by N
Posted on April 15, 2008 at 4:32 pm
how about adding a transcoding step that converts these tracks to a format Lidy’s code can read, whenever it fails to read them in the first place?
e.g. transcoding them to MP3 CBR 192 kbps might work?
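A rough sketch of that fallback, assuming ffmpeg with the libmp3lame encoder is installed. The `extract` argument is a hypothetical stand-in for whatever function wraps Lidy’s rhythm feature code, since its real interface isn’t shown here. If the re-encoded copy still fails, the function gives up and returns None so the song can be flagged as missing.

```python
import subprocess
import tempfile
from pathlib import Path

def ffmpeg_cbr_command(src, dst, bitrate="192k"):
    """Build an ffmpeg command that re-encodes src as constant-bitrate MP3."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-codec:a", "libmp3lame", "-b:a", bitrate, str(dst)]

def transcode_with_ffmpeg(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_cbr_command(src, dst), check=True, capture_output=True)

def extract_with_fallback(path, extract, transcode=transcode_with_ffmpeg):
    """Try extract(path); on failure, transcode to a temp file and retry once.

    `extract` is a hypothetical callable wrapping the feature extractor.
    Returns None if both attempts fail.
    """
    try:
        return extract(path)
    except Exception:
        pass  # fall through to the transcoding attempt
    with tempfile.TemporaryDirectory() as tmp:
        fixed = Path(tmp) / (Path(path).stem + "_cbr192.mp3")
        try:
            transcode(path, fixed)
            return extract(fixed)
        except Exception:
            return None  # still unreadable: treat as missing data
```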
Comment by Anita
Posted on April 15, 2008 at 4:41 pm
Yes, I’d definitely like to debug the problem I’m having running Lidy’s code in particular, but I’m sure this same kind of problem will come up again as I add more features. For example, for Echo Nest analysis, sometimes the file does not process and I don’t know what went wrong. So there are some instances in which I will simply not be able to remedy the missing data… what to do with them?
Comment by N
Posted on April 15, 2008 at 6:49 pm
may I suggest that your fastest alternative is probably to apply Pareto’s law (80/20), i.e. just drop the 20% of your library that are “faulty” tracks so you can get on with your project and stay focused.
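Dropping the faulty tracks could look something like the sketch below, assuming features are stored as a NumPy array with NaN marking anything that failed to extract. The track IDs and values are invented for illustration. It keeps a list of what was dropped, so the actual fraction lost can be checked rather than assumed.

```python
import numpy as np

def drop_incomplete_tracks(track_ids, features):
    """Keep only tracks whose feature rows contain no NaN values."""
    features = np.asarray(features, dtype=float)
    complete = ~np.isnan(features).any(axis=1)  # True where the row is fully observed
    kept_ids = [t for t, ok in zip(track_ids, complete) if ok]
    dropped_ids = [t for t, ok in zip(track_ids, complete) if not ok]
    return kept_ids, features[complete], dropped_ids

# Hypothetical library: 5 tracks x 2 features, two tracks with missing data.
ids = ["song_a", "song_b", "song_c", "song_d", "song_e"]
feats = np.array([
    [0.1, 0.9],
    [np.nan, np.nan],
    [0.4, 0.2],
    [0.7, 0.5],
    [0.3, np.nan],
])

kept_ids, kept, dropped_ids = drop_incomplete_tracks(ids, feats)
print(f"dropped {len(dropped_ids)} of {len(ids)} tracks: {dropped_ids}")
```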