Visualising the one million popular songs extracted from the Million Song
Dataset (the MillionSongSubset) [1]. The Self-Organising Map (SOM) is built
using the minisom library [2].
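
For readers new to the technique, here is a minimal, dependency-free sketch of how a Self-Organising Map is trained. It is not minisom's implementation and not the code in analyse.py. The function names `train_som` and `winner`, and all parameter defaults, are hypothetical and chosen only for illustration.

```python
import math
import random


def winner(weights, sample):
    """Return the (row, col) of the node whose weights are closest to sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, node in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(node, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, iterations=500, sigma=1.0,
              learning_rate=0.5, seed=0):
    """Train a rows x cols map on data (a list of equal-length float lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised at random in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(iterations):
        decay = 1.0 - t / iterations          # shrinks linearly towards 0
        lr = learning_rate * decay
        radius = max(sigma * decay, 1e-3)     # keep the radius strictly positive
        sample = rng.choice(data)
        br, bc = winner(weights, sample)
        for r in range(rows):
            for c in range(cols):
                # Nodes near the winner on the grid are pulled harder.
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                node = weights[r][c]
                for k in range(dim):
                    node[k] += lr * influence * (sample[k] - node[k])
    return weights
```

After training, calling `winner(weights, song_features)` for each song gives its position on the map, which is what a SOM visualisation plots. Features are assumed to be scaled to a comparable range beforehand.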
1. Get the SQLite database with the songs' metadata
2. Export it to CSV, filtering out the invalid songs

    sqlite3 -header -csv track_metadata.db < filter_data.sql > track_metadata.csv

The attribute artist_mbid is dropped from the dataset because it is only an
external identifier for the artist in the musicbrainz.org database.
The attribute track_7digitalid is dropped from the dataset because it is only
an external identifier for the track in the 7digital database.
Songs without a year, shs_work or shs_perf information are discarded.
Only 10,000 songs should be exported to the CSV, due to memory constraints.
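
The filtering rules above can also be sketched in Python with the standard library. This is not the content of filter_data.sql, which is not shown here. It assumes that the table is named `songs`, that a missing year is stored as 0, and that a missing shs_work or shs_perf is stored as -1. The function name `export_songs` is hypothetical.

```python
import csv
import sqlite3

# Columns dropped because they are only external identifiers.
DROPPED = {"artist_mbid", "track_7digitalid"}

# Assumption: sentinel values that mean "no information" for each column.
MISSING = {"year": {0}, "shs_work": {-1}, "shs_perf": {-1}}


def export_songs(conn: sqlite3.Connection, out, limit=10000):
    """Write at most `limit` valid songs to `out` as CSV; return the count."""
    cur = conn.execute("SELECT * FROM songs")  # assumed table name
    names = [col[0] for col in cur.description]
    keep = [i for i, name in enumerate(names) if name not in DROPPED]
    required = [(names.index(name), bad) for name, bad in MISSING.items()]

    writer = csv.writer(out)
    writer.writerow([names[i] for i in keep])  # header row
    written = 0
    for row in cur:
        if written >= limit:
            break
        # Discard songs lacking year, shs_work or shs_perf information.
        if any(row[i] is None or row[i] in bad for i, bad in required):
            continue
        writer.writerow([row[i] for i in keep])
        written += 1
    return written
```

A possible call would open the database with `sqlite3.connect("track_metadata.db")` and pass a file opened with `open("track_metadata.csv", "w", newline="")` as `out`.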
3. Run the script and log its result

    python -u analyse.py 2>&1 | tee "$(date --iso-8601='minutes').log"

[1] BERTIN-MAHIEUX, Thierry. Million Song Dataset, official website. Available at: http://millionsongdataset.com/. Accessed on 05 May 2022.
[2] VETTIGLI, Giuseppe. MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map. Available at: https://github.com/JustGlowing/minisom. Accessed on 05 May 2022.