Extracting and parsing Spanish-formatted dates

We have collected birth and death dates for many of the Chilean artists in our databases. Most of this data comes from musicapopular.cl. However, the data does not follow any formatting rules, and it seems that different people entered it using different criteria. In other words, the Spanish dates come in several different styles.

As MB supports fields for date periods in the form YYYY-MM-DD, we developed a script that uses regular expressions to parse all dates into this format. Done.
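To give an idea of what such a script can look like, here is a minimal sketch in Python. It is not the actual script: the function name `parse_spanish_date`, the list of date styles it recognizes, the day-first reading of numeric dates, and the choice to return a partial date (YYYY or YYYY-MM) when the day or month is missing are all assumptions made for illustration.

```python
import re

# Spanish month names mapped to month numbers ("setiembre" is a common variant).
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "setiembre": 9,
    "octubre": 10, "noviembre": 11, "diciembre": 12,
}
MONTH_RE = "|".join(MONTHS)

# Patterns are tried from most to least specific (assumed styles, not an exhaustive list).
PATTERNS = [
    # "12 de marzo de 1950", "12 marzo 1950", "1º de mayo de 1930"
    re.compile(rf"\b(?P<day>\d{{1,2}})[º°]?\s+(?:de\s+)?(?P<month>{MONTH_RE})\s+(?:de\s+|del\s+)?(?P<year>\d{{4}})\b"),
    # "marzo de 1950"
    re.compile(rf"\b(?P<month>{MONTH_RE})\s+(?:de\s+|del\s+)?(?P<year>\d{{4}})\b"),
    # "12/03/1950" or "12-03-1950", read as day-month-year (assumption)
    re.compile(r"\b(?P<day>\d{1,2})[/.-](?P<month>\d{1,2})[/.-](?P<year>\d{4})\b"),
    # "1950"
    re.compile(r"\b(?P<year>\d{4})\b"),
]


def parse_spanish_date(text):
    """Return 'YYYY-MM-DD', 'YYYY-MM', or 'YYYY', or None if nothing matches."""
    text = text.strip().lower()
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        parts = match.groupdict()
        result = parts["year"]
        month = parts.get("month")
        if month is None:
            return result
        month_number = int(month) if month.isdigit() else MONTHS[month]
        if not 1 <= month_number <= 12:
            return None
        result += f"-{month_number:02d}"
        day = parts.get("day")
        if day is not None:
            if not 1 <= int(day) <= 31:
                return None
            result += f"-{int(day):02d}"
        return result
    return None


print(parse_spanish_date("12 de marzo de 1950"))  # 1950-03-12
print(parse_spanish_date("Marzo de 1950"))        # 1950-03
```

Trying the most specific pattern first keeps a full date from being reduced to its year. Strings that match none of the patterns return None so they can be reviewed by hand.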

musicapopular.cl parsing outcome

I just finished parsing the http://musicapopular.cl website. As a first outcome, I can see the following numbers:

  • There are 1547 bands in their database
  • There are 1826 people. This value does not mean that there are that many soloists, because the database also includes people such as managers, writers, journalists, and music producers who work for the Chilean music industry. These people should be part of the PEOPLE table and linked to a resource, if necessary.
  • Here is the list of genres and the number of artists associated with each:
    • Balada 252
    • Bolero 113
    • Canción melódica 119
    • Vals 29
    • Nueva Ola 128
    • Neofolklore 34
    • TV pop 180
    • Canto 150
    • Trova 150
    • Fusión latinoamericana 439
    • Pop 833
    • Funk 112
    • Jazz 565
    • Tango 28
    • Ranchera 48
    • Corrido 48
    • Tropical 204
    • Folclor 270
    • Música orquestada 48
    • Canto a lo poeta 21
    • Canto Nuevo 57
    • Música andina 22
    • Música infantil 37
    • Nueva Canción Chilena 61
    • Rock 789
    • Cueca 112
    • Tonada 43
    • Electrónica 224
    • Hiphop 145
    • Música experimental 208
    • Música típica 45
    • Foxtrot 13
    • Fusión étnica 55
    • Música clásica 38
    • Música contemporánea 101
    • Música incidental 25
    • Rock progresivo 39
    • Música chilota 9
    • Proyección folclórica 62
    • Metal 78
    • Punk 55



As an idea: since there is information about birth and death dates for many musicians, it would be great to create a memorial with the dates.

Very first numbers…

I was granted access to the BDMC (“La Base de Datos de la Música Chilena”, compiled by the SCD, the Chilean Copyright Society). Here are some numbers that give an idea of how much information this database holds:


  • 40132 total songs
  • 32569 different songs (so 7563 are covers or songs that share a name?)
  • 3342 different artists
  • 3085 different albums (some noise, though, as in the case of “Obras Sinfónicas en Vivo CD1” and “Obras Sinfónicas en Vivo CD2”, and some possible identical names between releases)
  • 79 different genres (tags)
  • 432 different record labels
However, there is some noise in this data because entries written in different styles appear as different things (e.g., “DJ Méndez y Yoan Amor” and “DJ Méndez – Yoan Amor”, or “A ti”, “A Ti”, and “A tí”). The data needs to be normalized before further processing!
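As a rough illustration of what that normalization could involve, the Python sketch below builds a comparison key for each entry and groups the entries that share a key. The function names `normalization_key` and `group_variants` are hypothetical, and the rules (ignore case and accents, treat “y”, “&”, and dashes between names as the same separator) are assumptions based only on the examples above. They would need tuning against the real data, since “y” can also be part of a legitimate name.

```python
import re
import unicodedata
from collections import defaultdict


def normalization_key(name):
    """Build a key that is identical for spelling variants of the same entry."""
    # Decompose accented characters and drop the combining marks ("tí" -> "ti").
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    key = without_accents.casefold()
    # Treat " y ", " & ", and dashes between names as one separator (assumption).
    key = re.sub(r"\s+(?:y|&|[-\u2013\u2014])\s+", " / ", key)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", key).strip()


def group_variants(names):
    """Group names by key; keep only the groups with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[normalization_key(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


entries = ["DJ Méndez y Yoan Amor", "DJ Méndez – Yoan Amor", "A ti", "A Ti", "A tí", "Cueca"]
for key, variants in group_variants(entries).items():
    print(key, "->", variants)
```

The key is only used to detect candidate duplicates. Choosing which spelling to keep as the canonical one is a separate decision that probably needs manual review.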

It is interesting to see how the scope of the BDMC differs from that of other sources of Chilean music information, such as musicapopular.cl, mus.cl, portaldisc.cl, and vccl.tv. The BDMC only contains songs that have already generated copyright royalties for their authors, so most of the songs have received airplay.

I have already scraped the data from all the other sites. The preliminary numbers are:


  • 502 album reviews
  • 332 interviews
  • 564 concert reviews


  • 3353 artist biographies (I still need to extract the full discographies)


  • 3634 album reviews (although there is some noise because there are some non-Chilean artists)


  • 1661 videoclips