New approach for discovering already entered songs on MB

I am trying to develop a new, faster, and cleaner approach for finding out which songs (recording), albums (release), and artists (artist) are already on MusicBrainz. I’ve noticed that an advanced query like this one:

Intro release:Polvo+de+Estrellas artist:Alberto+Plaza

or this equivalent:

http://musicbrainz.org/search?query=Intro+release%3APolvo%2Bde%2BEstrellas+artist%3AAlberto%2BPlaza&type=recording&limit=25&advanced=1

returns good results.
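Because the URL form is so regular, the query can also be built programmatically. Below is a minimal sketch using the standard library's `urllib.parse.urlencode`. `build_search_url` is a hypothetical helper, and replacing spaces inside field values with `+` simply mirrors the example queries in this post; it is an assumption, not a documented MusicBrainz requirement.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://musicbrainz.org/search"


def build_search_url(recording, release, artist, limit=25):
    """Build an advanced recording search URL (hypothetical helper).

    Spaces inside each field value become '+', as in the example queries;
    urlencode() then percent-encodes ':' and '+' and turns the remaining
    spaces (between the fields) into '+'.
    """
    def plus(value):
        return value.replace(" ", "+")

    query = "{} release:{} artist:{}".format(plus(recording), plus(release), plus(artist))
    params = {"query": query, "type": "recording", "limit": limit, "advanced": 1}
    return SEARCH_URL + "?" + urlencode(params)


# Prints the same URL as the example above.
print(build_search_url("Intro", "Polvo de Estrellas", "Alberto Plaza"))
```

The same helper with `build_search_url("Sol Luminoso", "Indi", "Indi", limit=5)` reproduces the second query URL in this post.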

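Then, by fetching the page and parsing the output with BeautifulSoup (the page has to be downloaded first, since BeautifulSoup parses markup, not URLs),

from urllib.request import urlopen
from bs4 import BeautifulSoup
soup = BeautifulSoup(urlopen('http://www.the.url').read(), 'html.parser')
out = soup.find_all('tbody')[0].find_all('a')  # every link in the first results table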

we can easily obtain the links for the recording, release, and artist:

[Intro,
Alberto Plaza,
Polvo de Estrellas,
Intro,
Alberto Plaza,
Polvo de estrellas,
Milagro de Abril,
Alberto Plaza,
Polvo de Estrellas,
No Seas Cruel,
Alberto Plaza…]
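To compare these links with the song we are looking for, it helps to group them per result row. This is a minimal sketch, assuming every row contributes exactly three links in the order recording, artist, release, as in the list above; rows with extra links (for example, several credited artists) would break that assumption. `group_results` is a hypothetical helper, and with bs4 its input could be obtained as `[a.get_text() for a in out]`.

```python
def group_results(link_texts):
    """Group a flat list of link texts into (recording, artist, release) tuples.

    Assumes each result row yields exactly three links, in that order.
    """
    if len(link_texts) % 3 != 0:
        raise ValueError("expected a multiple of three links, got %d" % len(link_texts))
    it = iter(link_texts)
    # zip() pulls three consecutive items from the same iterator per tuple.
    return list(zip(it, it, it))


texts = ["Intro", "Alberto Plaza", "Polvo de Estrellas",
         "Intro", "Alberto Plaza", "Polvo de estrellas"]
for recording, artist, release in group_results(texts):
    print(recording, "|", artist, "|", release)
```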

However, I still need to figure out how to filter recordings, releases, or artists that do not have a perfect score, as with this query:

Sol+Luminoso release:Indi artist:Indi

or its equivalent:

http://musicbrainz.org/search?query=Sol%2BLuminoso+release%3AIndi+artist%3AIndi&type=recording&limit=5&advanced=1

It returns:

If we are *too* strict with the three fields, we will lose some of the songs that are already in the database, so we need to allow some flexibility. For instance, the artist and the release may be retrieved correctly while the recording does not match.

One approach could be to calculate the Levenshtein distance for each field and relate it to the number of letters in that field, since longer strings can accumulate a larger distance.
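A minimal sketch of that idea: a standard dynamic-programming Levenshtein distance, divided by the length of the longer string so that 0.0 means identical and 1.0 means completely different. Dividing by the longer length is one possible normalization among several, and lower-casing both strings is an added assumption, motivated by results such as ‘Polvo de estrellas’ that differ from the query only in case.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Case-insensitive edit distance divided by the longer string's length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

For example, ‘viglienponi’ and ‘vigliensoni’ differ by one substitution, giving 1/11, or about 0.09.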

Preliminary tests of the level of *strictness* of each field indicate that the artist field is stricter (it doesn’t find anything for ‘viglienponi’), while release and recording are more relaxed (it retrieves the correct release for ‘anything artist:vigliensoni recording:twist and shout’).
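One way to reflect that observed strictness on our side is a per-field threshold on the length-normalized distance, strict for the artist and looser for release and recording. This is a sketch only: the threshold values, the dict layout of a result and the names `matches` and `already_entered` are all hypothetical, and it includes its own compact Levenshtein function so it runs on its own.

```python
# Hypothetical thresholds on the normalized distance (0.0 = identical).
# Artist is strict; release and recording tolerate more differences.
THRESHOLDS = {"artist": 0.1, "release": 0.3, "recording": 0.3}


def levenshtein(a, b):
    """Dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Case-insensitive edit distance divided by the longer string's length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def matches(wanted, found, thresholds=THRESHOLDS):
    """True if every field of `found` is within its threshold of `wanted`.

    Both arguments are dicts with 'recording', 'artist' and 'release' keys.
    """
    return all(normalized_distance(wanted[field], found[field]) <= limit
               for field, limit in thresholds.items())


def already_entered(wanted, results, thresholds=THRESHOLDS):
    """Return the search results that plausibly correspond to `wanted`."""
    return [r for r in results if matches(wanted, r, thresholds)]


wanted = {"recording": "Intro", "artist": "Alberto Plaza", "release": "Polvo de Estrellas"}
results = [
    {"recording": "Intro", "artist": "Alberto Plaza", "release": "Polvo de estrellas"},
    {"recording": "Milagro de Abril", "artist": "Alberto Plaza", "release": "Polvo de Estrellas"},
    {"recording": "Intr", "artist": "Alberto Plaza", "release": "Polvo de Estrellas"},
]
print(already_entered(wanted, results))
```

Note that a ratio threshold is harsh on short names: a single typo in ‘Indi’ already gives 1/4 = 0.25, which passes the hypothetical release threshold but fails the artist one, so the values would need tuning against real queries.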
 

 
