I am trying to develop a new, faster, cleaner approach to check which songs (recordings), albums (releases), and artists are already on MusicBrainz, and I’ve noticed that creating an advanced query like this:
Intro release:Polvo+de+Estrellas artist:Alberto+Plaza
or this equivalent:
generates a good output.
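As a minimal sketch, the query text above could be assembled from the three fields as follows. The helper name `build_query` is hypothetical, and the only assumption about the format is what the example shows: the recording title comes first, followed by `release:` and `artist:` terms, with spaces inside each value replaced by `+`.

```python
def build_query(recording: str, release: str, artist: str) -> str:
    """Assemble an advanced query string in the style of the example above.

    Hypothetical helper: it only mirrors the format shown in the text.
    """
    def plus(text: str) -> str:
        # Collapse any run of whitespace into a single '+'.
        return "+".join(text.split())

    return f"{plus(recording)} release:{plus(release)} artist:{plus(artist)}"


query = build_query("Intro", "Polvo de Estrellas", "Alberto Plaza")
print(query)
```

The result still has to be URL-encoded before it is placed in a search URL.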
Then, parsing and comparing the output with BeautifulSoup
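from urllib.request import urlopen
from bs4 import BeautifulSoup
soup = BeautifulSoup(urlopen('http://www.the.url'), 'html.parser')  # fetch the page first, then parse the HTML
out = soup.select('tbody a')  # every link inside a table body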
we can easily obtain the links for the recording, release, and artist:
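One way to sort those links by entity type is sketched below. It assumes that each link's path starts with the entity name (`/recording/...`, `/release/...`, `/artist/...`). The function name `group_links` and the placeholder IDs in the usage lines are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

ENTITY_TYPES = ("recording", "release", "artist")


def group_links(hrefs):
    """Group hrefs by the entity type found in the first path segment."""
    grouped = defaultdict(list)
    for href in hrefs:
        parts = urlparse(href).path.strip("/").split("/")
        # Keep only links shaped like /<entity>/<id>; skip everything else.
        if len(parts) >= 2 and parts[0] in ENTITY_TYPES:
            grouped[parts[0]].append(href)
    return dict(grouped)


# Placeholder hrefs, not real MusicBrainz identifiers.
sample = ["/recording/id-1", "/release/id-2", "/artist/id-3", "/doc/About"]
links = group_links(sample)
```

With the parsed page, the input would be something like `[a.get('href', '') for a in out]`.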
However, I still need to figure out how to filter a recording, release, or artist without a perfect score, like this one:
Sol+Luminoso release:Indi artist:Indi or
If we are *too* strict with the three fields we will lose some of the songs already in the database, so we need to allow some flexibility. For instance, the artist and the release can both be retrieved correctly while the recording is wrong.
An approach could be to calculate the Levenshtein distance for each field and relate it to the number of characters in that field (a longer string can tolerate a larger distance).
(Preliminary tests with the level of *strictness* for each field indicate that while artist is more strict (it doesn’t find anything for ‘viglienponi’), release and recording are more relaxed (it retrieves the correct release for ‘anything artist:vigliensoni recording:twist and shout’).)
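A rough sketch of that idea: compute the Levenshtein distance with the standard dynamic-programming recurrence, divide it by the length of the longer string, and compare the result with a per-field threshold. The threshold values and the names `normalized_distance` and `field_matches` are assumptions for illustration only and would need tuning against real results.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the length of the longer string."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical thresholds: stricter for artist, looser for recording.
THRESHOLDS = {"artist": 0.15, "release": 0.25, "recording": 0.35}


def field_matches(field: str, wanted: str, found: str) -> bool:
    """True if the two strings are close enough for the given field."""
    return normalized_distance(wanted, found) <= THRESHOLDS[field]


print(field_matches("artist", "vigliensoni", "viglienponi"))
```

Scaling by length means one wrong letter in an 11-letter name counts for less than one wrong letter in a 4-letter name, which is the relation between distance and field length described above.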