The Chilean record labels’ contribution, demystified

Using the data from the BDMC, we can see that there are 37335 songs with an assigned record label (93% of all BDMC entries), spread across 334 labels.

The distribution of published songs per record label is shown here:

We can see in this distribution the ‘short-head’ and the ‘long-tail’ of music.

If we zoom in on the ‘short-head’ and select only labels with more than 100 published songs, we can see:

The first column, with 26% of the total published songs, corresponds to ‘Independent’ editions: these are isolated publications that cannot be counted as a record label. The figure shows some interesting results. Emi Odeón and Warner Music are the two biggest record labels (9% and 6% respectively), but Oveja Negra, the label of the SCD (the Chilean music copyright society), also accounts for 6% of the published songs after only 10 years of work. ‘Sello Azul’, a second SCD label devoted to new artists, has 3%, so the SCD is responsible for almost the same amount of Chilean music as the biggest record label. However, this effect may also reflect that the SCD takes great care over the distribution of its own products, so most of the artists it represents are in its database.
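The per-label shares discussed above can be computed along these lines. This is only a minimal sketch: the function name `label_shares`, the input format (one label string per song), and the toy data are assumptions for illustration, not the script or figures actually used for the BDMC.

```python
from collections import Counter


def label_shares(labels, min_songs=100):
    """Return (label, count, share) tuples for labels with more than
    `min_songs` songs, sorted from most to least songs.

    `labels` is assumed to hold one record-label string per song.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (label, n, n / total)
        for label, n in counts.most_common()
        if n > min_songs
    ]


# Toy data only (NOT BDMC figures): 10 songs over 3 labels.
toy_labels = ["Independent"] * 5 + ["Label A"] * 3 + ["Label B"] * 2

for label, n, share in label_shares(toy_labels, min_songs=2):
    print(f"{label}: {n} songs ({share:.0%})")
```

The threshold is what separates the ‘short-head’ from the ‘long-tail’: with the real data it would be 100, and everything below it falls into the tail.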

If we do a per-year analysis, we can see:

The songs in the BDMC database belong mostly to the 1990s and 2000s, with 23% and 69% of the total respectively. Songs from before that period make up only 1% of the total. The number of published songs peaked in 2007 and decreased rapidly in the following years, reaching in 2011 the same number of songs as in 1993.
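A per-decade breakdown like the one above could be obtained roughly as follows. The helper names `decade_shares` and `peak_year` are hypothetical, and the list of years is toy data rather than the real BDMC publication years.

```python
from collections import Counter


def decade_shares(years):
    """Map each decade (e.g. 1990) to its share of all songs."""
    counts = Counter((year // 10) * 10 for year in years)
    total = sum(counts.values())
    return {decade: n / total for decade, n in sorted(counts.items())}


def peak_year(years):
    """Return the publication year with the most songs."""
    return Counter(years).most_common(1)[0][0]


# Toy data only (NOT BDMC figures): one publication year per song.
toy_years = [1985, 1993, 1998, 2003, 2007, 2007, 2007, 2011]

print(decade_shares(toy_years))
print(peak_year(toy_years))
```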

Matching BDMC and MusicBrainz

 

I have been querying MusicBrainz with the data from the BDMC. As a first outcome:

  • In the BDMC there is a total of:
    • 40132 entries
    • 3343 different artists
    • 3085 different albums
    • 32570 songs with different names

Of the 3343 artists, there are:

  • 457 artist names that can be found in MusicBrainz with the EXACT same spelling
  • 2886 artists that cannot be found

The exact matches are only 14% of the total. However, some artist names in the databases are not spelled properly but are close to the original (e.g., ‘DJ Mendez’ instead of ‘DJ Méndez’, or ‘Alvaro Henriquez’ instead of ‘Álvaro Henríquez’), and those should be counted as found artists. Also, some artists share their name with other artists, such as ‘Mito’. The Chilean ‘Mito’ appears as the third entry in MB, without an explicit country, only with a disambiguation (‘Chilean’).

After running the script again, this time also checking whether the BDCH entry matches any of the aliases listed for each artist in MB, the numbers are a bit better:

  • 565 (17%) artists were recognized
  • 56 (2%) have CL as the country
  • 72 (2%) have a country other than CL

So, if we remove from the database these last 72 artists, who are very likely not Chilean, we end up with 493 recognized artists.
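The alias check and the country-based filtering described above can be sketched like this. It is not the script that produced the numbers: the record layout (dictionaries with `name`, `aliases`, and `country` keys), the function names, and the artist names are all assumptions made up for illustration, and no real MusicBrainz query is performed.

```python
def matches(bdmc_name, mb_artist):
    """True if the BDMC name equals the MB name or one of its aliases."""
    names = [mb_artist["name"], *mb_artist.get("aliases", [])]
    return bdmc_name in names


def classify(bdmc_names, mb_artists):
    """Sort BDMC artist names into four buckets by match and country."""
    result = {"chilean": [], "other_country": [], "undeclared": [], "not_found": []}
    for name in bdmc_names:
        hit = next((a for a in mb_artists if matches(name, a)), None)
        if hit is None:
            result["not_found"].append(name)
        elif hit.get("country") == "CL":
            result["chilean"].append(name)
        elif hit.get("country"):
            result["other_country"].append(name)
        else:
            result["undeclared"].append(name)
    return result


# Hypothetical records, loosely shaped like artist search results.
toy_mb = [
    {"name": "Artista Á", "aliases": ["Artista A"], "country": "CL"},
    {"name": "Grupo Sin País", "aliases": [], "country": None},
    {"name": "Banda Extranjera", "aliases": [], "country": "DE"},
]
toy_bdmc = ["Artista A", "Grupo Sin País", "Banda Extranjera", "Nadie"]

buckets = classify(toy_bdmc, toy_mb)
# Recognized artists that are kept: everything except "other_country".
kept = buckets["chilean"] + buckets["undeclared"]
print(len(kept), "recognized artists kept")
```

Keeping the ‘undeclared’ bucket mirrors the reasoning above: only artists with an explicit non-CL country are dropped, since a missing country says nothing either way.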

 

I’ve also been correcting the many inconsistencies in the BDCH: merging artists listed under different spellings and adding accents to artist names that lack them. I have done 25% of it (10^4 entries), and the new numbers are:

  • 3308 different artists
  • 551 artists were recognized (17% of the total)
    • 466 possibly Chilean (14%)
      • 56 Chilean (explicitly declared)
      • 410 undeclared country
      • 177 groups (38% of the recognized possibly Chilean artists)
      • 142 people (30% of the recognized possibly Chilean artists)
      • 147 undefined (32% of the recognized possibly Chilean artists)
    • 75 non-Chilean artists (should be discarded from the database)

Our idea is to provide MB with one big file containing all the data in our database, together with the corresponding MBIDs for artist, title, and album (if any).

  • Of the 551 recognized artists (using the out_correct file), there are:
    •  9454 titles (out_BDMC_w_artist_MBID)

Over the last few days I’ve been trying to solve the following problem: for the Chilean artist Dogma, there are 8 different entries with the same score (100):

Score | Name | Sort Name | Type | Begin | End
100 | Døgma | Døgma | | |
100 | Dogma (German trance artist) | Dogma | | |
100 | Dogma (portuguese band) | Dogma | Group | 1996 | 2003
100 | Dogma (Brazilian progressive rock band) | Dogma | Group | 1996 |
100 | Dogma (Swiss trance duo Robin Mandrysch & Guido Walter) | Dogma | Group | |
100 | Dogma (goa trance duo Damir Ludvig & Goran Stetic) | Dogma | Group | |
100 | Dogma (Chilean artist) | Dogma | | |
100 | Dogma (Italo-dance artist) | Dogma | | |

It seems that I need to look at the disambiguation field and search for the word ‘Chile’ (or a derivative) to identify the artist we are looking for.
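A minimal sketch of that idea follows. The candidates are the eight entries listed above, written by hand as dictionaries; the key names and the function name `pick_by_disambiguation` are assumptions for illustration, not a real client library.

```python
def pick_by_disambiguation(candidates, keyword="chile"):
    """Among the top-scoring candidates, keep those whose disambiguation
    contains `keyword` (case-insensitive), e.g. 'Chile' or 'Chilean'."""
    if not candidates:
        return []
    top = max(c["score"] for c in candidates)
    return [
        c for c in candidates
        if c["score"] == top
        and keyword in (c.get("disambiguation") or "").lower()
    ]


# The eight 'Dogma' entries from the table above, all with score 100.
candidates = [
    {"score": 100, "name": "Døgma", "disambiguation": ""},
    {"score": 100, "name": "Dogma", "disambiguation": "German trance artist"},
    {"score": 100, "name": "Dogma", "disambiguation": "portuguese band"},
    {"score": 100, "name": "Dogma", "disambiguation": "Brazilian progressive rock band"},
    {"score": 100, "name": "Dogma",
     "disambiguation": "Swiss trance duo Robin Mandrysch & Guido Walter"},
    {"score": 100, "name": "Dogma",
     "disambiguation": "goa trance duo Damir Ludvig & Goran Stetic"},
    {"score": 100, "name": "Dogma", "disambiguation": "Chilean artist"},
    {"score": 100, "name": "Dogma", "disambiguation": "Italo-dance artist"},
]

for match in pick_by_disambiguation(candidates):
    print(match["name"], "-", match["disambiguation"])
```

A substring test on ‘chile’ also catches derivatives such as ‘Chilean’, ‘chileno’, or ‘chilena’. It returns a list rather than a single entry, because more than one candidate (or none) may mention Chile, and those cases would still need a manual check.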

Very first numbers…

I was granted access to the BDMC (“La Base de Datos de la Música Chilena”, compiled by the SCD, the Chilean Copyright Society). Here are some numbers on the amount of information this database holds:

bdmc

  • 40132 total songs
  • 32569 different song names (so 7563 are covers or songs sharing a name?)
  • 3342 different artists
  • 3085 different albums (some noise, though, as in the case of “Obras Sinfónicas en Vivo CD1” and “Obras Sinfónicas en Vivo CD2”, and some possible identical names between releases)
  • 79 different genres (tags)
  • 432 different record labels
However, this data is noisy, because the same item written in different styles shows up as different entries (e.g., “DJ Méndez y Yoan Amor” and “DJ Méndez – Yoan Amor”, or “A ti”, “A Ti”, and “A tí”). The data needs to be normalized before further processing!
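One possible normalization, sketched under my own assumptions rather than taken from any existing BDMC tooling, is to build a comparison key that ignores case, accents, and the separator used between collaborating artists. The function names are hypothetical.

```python
import re
import unicodedata


def normalize_title(text):
    """Comparison key: no accents, case-folded, single spaces.

    Lossy on purpose (e.g. 'ñ' becomes 'n'), so use it only to group
    variants, never as the displayed name.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", stripped.casefold()).strip()


def normalize_artist(text):
    """Like normalize_title, but also unifies ' y ', ' & ' and dashes
    between artist names into a single ' / ' separator."""
    key = normalize_title(text)
    return re.sub(r"\s+(?:y|&|[-\u2013\u2014])\s+", " / ", key)


# The variants quoted in the text above.
titles = ["A ti", "A Ti", "A tí"]
artists = ["DJ Méndez y Yoan Amor", "DJ Méndez \u2013 Yoan Amor"]

print({normalize_title(t) for t in titles})
print({normalize_artist(a) for a in artists})
```

The separator rule is applied to artist names only, because in a song title a word like ‘y’ is usually part of the title itself.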

It is interesting to see how the scope of the BDMC differs from that of other sources of Chilean music information, such as musicapopular.cl, mus.cl, portaldisc.cl, and vccl.tv. The BDMC contains only songs that have already generated copyright royalties for their authors, so most of the songs have been played on air.

I have already scraped the data from all the other sites. Preliminary numbers are:

mus.cl

  • 502 album reviews
  • 332 interviews
  • 564 concert reviews

musicapopular.cl

  • 3353 artist biographies (I still need to extract the full discographies)

portaldisc.cl

  • 3634 album reviews (although there is some noise because there are some non-Chilean artists)

vccl.tv

  • 1661 videoclips

Base de Datos de la Música Chilena

The ‘Base de datos de la música chilena’ (BDCH, the Chilean-music database) is a project developed by the Fundación Música de Chile (FMC, the Music of Chile Foundation) that gives radio stations secure, easy, and fast access to the biggest online repository of Chilean music.

The collection covers a wide range of genres, from rock, pop, and hip-hop to classical and folklore, structured into 52 genres and subgenres. For easy navigation, BDCH provides search methods that let users query by artist, song, album, publication year, or genre.

The songs in the collection belong to artists that are already members of the ‘Sociedad Chilena del Derecho de Autor’ (SCD, the Chilean copyright society), so in most cases they have been played on commercial radio stations across Chile.

The website provides access to the statistics about the contents in the database:

Total songs :: 39958
Total directories :: 3140
Total albums and directories :: 3140
Total album artist :: 1818
Total song artists :: 3163

Average song length :: 03:36
Standard deviation of the song length :: 01:50
Longest song :: 01:00:46
Shortest song :: 00:00
Database total length :: 2393:36:04

Avg. file size :: 4.53 MB
Total file size :: 176.69 GB

Avg. file bitrate :: 176.02 kbps