Name That Tune: Finding a song from a sung query
Bryan Pardo, University of Michigan
February 10, 2004
Music Information Retrieval has become an active area of research motivated by the increasing importance of internet-based music distribution. Online catalogs are already approaching one million songs, so it is important to study new techniques for searching these vast stores of audio. One approach to finding music that has received much attention is "Query by Humming" (QBH). This approach enables users to retrieve songs and information about them by singing, humming, or whistling a melodic fragment. In QBH systems, the query is a digital audio recording of a melodic fragment, and the ultimate target is a complete digital audio recording of a piece.
We have created a QBH system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe two approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using probabilistic string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. Experiments show that while no approach is clearly superior in retrieval ability, string matching often has a significant speed advantage. Moreover, neither approach surpasses human performance.
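The string-alignment idea above can be sketched in a few lines: encode each melody as pitch intervals (semitone differences between consecutive notes, which makes the match transposition-invariant) and score a query against each target by dynamic-programming edit cost. This is a minimal illustration with simple unit costs; the system described in the talk uses probabilistic alignment costs, so the function names and cost values here are assumptions for illustration only.

```python
# Minimal sketch of QBH string matching (illustrative costs, not the
# probabilistic model from the talk).

def pitch_intervals(midi_pitches):
    """Encode a melody as differences between consecutive MIDI pitches,
    so the representation is invariant to the key the user sings in."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

def edit_cost(query, target, ins=1.0, dele=1.0):
    """Classic string-alignment (Levenshtein-style) edit cost between two
    interval sequences; lower cost means more similar."""
    m, n = len(query), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if query[i - 1] == target[j - 1] else 1.0
            d[i][j] = min(d[i - 1][j] + dele,      # skip a query interval
                          d[i][j - 1] + ins,       # skip a target interval
                          d[i - 1][j - 1] + sub)   # match / substitute
    return d[m][n]

# Rank database themes by edit cost against the sung query.
query = pitch_intervals([60, 62, 64, 65])          # sung fragment (MIDI pitches)
themes = {"theme_a": [60, 62, 64, 65, 67],         # hypothetical database themes
          "theme_b": [60, 59, 57, 55]}
ranked = sorted(themes,
                key=lambda k: edit_cost(query, pitch_intervals(themes[k])))
```

In the HMM approach, the same ranking step would instead score each target by the likelihood its model assigns to the query's observation sequence (e.g., via the forward algorithm), trading the speed of string matching for a probabilistic treatment of singer error.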
Bryan Pardo is a doctoral candidate in the Electrical Engineering and Computer Science department of the University of Michigan, with a specialization in Intelligent Systems. He applies machine learning, probabilistic natural language processing, and database search techniques to auditory user interfaces for human-computer interaction. Bryan takes a broader view of natural language than is traditional in computational linguistics, including timbre and prosody (timing, pitch contour, loudness), with an emphasis on music. In addition to his research activities, Bryan is an adjunct professor of Music at Madonna University in Livonia, Michigan, where he teaches a course in music technology. He also performs regularly throughout Michigan on saxophone and clarinet with his band, Into the Freylakh.