Project Description: Using Exportify, I gathered my Spotify playlist data and wondered: can this reveal my mood and well-being over the years based on when I listened to songs and the sentiment in their names and lyrics?
Data: The dataset spans April 2016 to August 2024, including artist, track, genre, and an attribute called Valence (high = positive, low = negative). Read more about it here.
Motivation & Hypothesis: This period covers a significant part of my life, from college in Indiana to married life in San Francisco, with a pandemic in the middle. My playlists serve as emotional journals, and I want to explore how they reflect my evolving moods and life events.
Key focus areas for this project include:
By doing this analysis, I hope to reveal how my music choices reflect my life experiences and emotions.
I was fully expecting my top artist to be Charli XCX but it is actually… Future??? I did not see this in my Future.
To dig into this deeper, I looked into when I added all the songs with Future on them… to find most of it was this year.
Bingo! This year, Future took the lead, thanks to his 2024 albums We Don’t Trust You and We Still Don’t Trust You.
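For anyone who wants to repeat this check on their own export, here is a rough sketch in plain Python. It assumes each track is a dict with an `artists` string (comma-separated for collaborations) and an `added_at` date string starting with the year; those field names and both function names are placeholders for illustration, not Exportify's actual column names.

```python
from collections import Counter

def split_artists(track):
    # Assumption: collaborators are separated by commas in one string.
    return [name.strip() for name in track["artists"].split(",")]

def top_artists(tracks, n=3):
    """Count every track an artist appears on, including features."""
    counts = Counter()
    for track in tracks:
        counts.update(split_artists(track))
    return counts.most_common(n)

def adds_per_year(tracks, artist):
    """Count how many tracks featuring `artist` were added in each year."""
    return Counter(
        track["added_at"][:4]  # first four characters are the year
        for track in tracks
        if artist in split_artists(track)
    )
```

One caveat: splitting on commas will mangle artist names that themselves contain a comma, so a real export may need a smarter parser.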
Spring is “BRAT Green” with Charli XCX; winter features an eclectic mix of Disney, Britney Spears, and BTS. (P.S. Click “BRAT” if you’re unfamiliar with the term.)
My curation became erratic post-2022 due to full-time work, with a notable spike in August 2024 during a stressful period.
I added the most songs during the pandemic in 2020 and my unemployment in 2024—two challenging times.
I normalized the data by dividing the number of songs added each season by the number of days in that season and then by the 9 years in my dataset (almost a third of my life!). Winter sees the most activity, likely due to long college breaks and bad weather.
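Here is a minimal sketch of that normalization in Python. The month-to-season mapping (meteorological seasons), the fixed day counts that ignore leap years, and the function name `songs_per_day_per_year` are assumptions for illustration and may not match how the seasons were actually defined in this analysis.

```python
from collections import Counter
from datetime import date

# Assumption: meteorological seasons (winter = Dec, Jan, Feb, and so on).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
# Days per season in a non-leap year.
SEASON_DAYS = {"winter": 90, "spring": 92, "summer": 92, "autumn": 91}

def songs_per_day_per_year(added_dates, n_years):
    """Songs added per season, divided by season length and by years covered."""
    counts = Counter(SEASON_BY_MONTH[d.month] for d in added_dates)
    return {
        season: counts[season] / days / n_years
        for season, days in SEASON_DAYS.items()
    }
```

Dividing by the number of days keeps a 92-day summer from looking busier than a 90-day winter just because it is longer, and dividing by the number of years turns the total into a typical per-year rate.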
My taste remains consistent year-round, favoring high-energy, danceable songs.
I focused on comparing 2019-2020 with 2023-2024 to see how they differed. My rationale for this comparison is as follows:
2020 was the year of the pandemic, and 2019 serves as a baseline for comparing pre- and post-COVID-19 music preferences.
2023 was the year I got married, while 2024 was the year I was stuck without a job in a bad market.
Despite the pandemic, 2020 featured more upbeat songs (grad school acceptance!), while 2024 showed declines in most traits (except tempo) during my job search.
Initially, I plotted Valence over time and included a monthly moving average, but the resulting graph was too messy and hard to interpret.
Instead, I plotted the average monthly valence of my Spotify playlist data over time, annotated with significant life events.
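The two aggregations mentioned here, a per-month average and a moving average over those months, can be sketched as follows. This assumes the data is available as `(date_added, valence)` pairs; the function names and the window size are illustrative choices, not the ones used for the actual plots.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_mean_valence(rows):
    """Average valence per (year, month), in chronological order."""
    buckets = defaultdict(list)
    for added_on, valence in rows:
        buckets[(added_on.year, added_on.month)].append(valence)
    return {month: mean(vals) for month, vals in sorted(buckets.items())}

def moving_average(values, window):
    """Trailing moving average; the first window - 1 points are dropped."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]
```

Averaging by month first removes most of the song-to-song noise, which is why the monthly view is easier to read than a raw scatter of every track.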
Observations:
These patterns suggest my music preferences mirror my emotional state, encouraging deeper exploration through sentiment analysis to understand my emotions over time.
I was surprised to find ‘Modern Rock’ appearing in my winter playlists. It seems like an anomaly; it has nearly vanished in later years.
I was then curious to see which winters this rock phase happened in.
It is pretty apparent that as time passed, this genre faded from my playlists, appearing only once in 2023 (talk about ROCK bottom :P).
I then visualized this differently, with a stacked bar chart covering my top 20 genres overall.
This chart displays the popularity of various music genres across seasons:
I also broke this down by the top 5 genres per year, separated by season:
Key points:
A boxplot visually represents data spread and symmetry by dividing it into quartiles, showing the median and potential outliers. For more on interpreting boxplots, see this guide.
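The numbers a boxplot draws can also be computed directly. This is a rough sketch that assumes the common 1.5 × IQR rule for flagging outliers; plotting libraries may compute quartiles slightly differently, and the function name is made up for illustration.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Quartiles plus the points a boxplot would typically draw as outliers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}
```

A median sitting off-center inside the box is the quickest sign that the data is skewed rather than symmetric.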
Key findings:
In summary, my music preferences shift with the seasons, with summer evoking nostalgia and autumn sparking interest in new music.
I also looked at yearly trends to see if this nostalgic streak holds up over time:
The trend of favoring older music remains consistent, except for Spring 2023 and part of Winter 2024, when I leaned towards newer releases.
The Pareto principle (80/20 rule) suggests that roughly 80% of effects come from 20% of causes, which helps direct effort toward the most impactful areas. In my case, nearly 50% of my music is concentrated in just 30 of 1412 genres (about 2% of them), so a small set of genres dominates my listening. The graph below shows only the top 100 genres, which masks the full distribution, but this concentration hints at potential insights for building a recommendation engine, which I’ll explore later.
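A concentration figure like "30 genres cover nearly 50%" comes from sorting genres by frequency and accumulating their share. Here is a small sketch, assuming a mapping from genre name to the number of times that genre tag appears; the function name is hypothetical.

```python
from itertools import accumulate

def genres_needed_for_share(genre_counts, target_share):
    """Smallest number of top genres whose combined count reaches target_share."""
    counts = sorted(genre_counts.values(), reverse=True)
    total = sum(counts)
    for n_genres, running_total in enumerate(accumulate(counts), start=1):
        if running_total / total >= target_share:
            return n_genres
    return len(counts)
```

Note that if a song carries several genre tags, it is counted once per tag here, so the shares describe genre tags rather than individual songs.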
t-SNE simplifies complex data into a 2D map to reveal patterns, while KMeans groups similar items into clusters. In this analysis, KMeans clusters songs based on their audio features, and t-SNE visualizes the result. Using the elbow method, I settled on four clusters. Click on any of the following links to learn more about these methods: t-SNE, K-means clustering, Elbow method. Here’s the result:
This graph shows a t-SNE (t-Distributed Stochastic Neighbor Embedding) visualization of song clustering based on audio features. Key takeaways are:
Number of Clusters: The graph and centroid table show 4 clusters (0, 1, 2, 3), represented by different colors.
Spatial Distribution: Clusters aren’t perfectly separated, showing shared characteristics and variability within genres.
Feature Importance: Energy differentiates clusters, especially high-energy (Cluster 1) vs. low-energy (Cluster 2). Acousticness and instrumentalness influence Cluster 2’s separation.
Popularity: Cluster 1 is the most popular, but popularity isn’t the main clustering factor.
Tempo and Duration: Variations in tempo and duration across clusters contribute to differentiation, though not visible in the t-SNE plot.
In summary, the t-SNE plot effectively visualizes clusters based on audio features, reflecting the complexity of music categorization, with overlap showing shared characteristics across genres.
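To make the elbow method less of a black box, here is a bare-bones k-means written in plain Python, along with the "inertia" (total squared distance to the nearest centroid) that the elbow method tracks as the number of clusters grows. This is a teaching sketch, not the code behind the plots above; a real analysis would more likely use a library implementation, and the names `kmeans` and `elbow_curve` are placeholders.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia

def elbow_curve(points, max_k):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop levels off."""
    return {k: kmeans(points, k)[2] for k in range(1, max_k + 1)}
```

Because k-means relies on Euclidean distance, features on large scales (tempo in BPM, duration in milliseconds) will dominate features bounded between 0 and 1 (energy, valence) unless everything is scaled first.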
I also looked at the seasonal distribution of clusters, just for kicks.
Key Takeaways:
This analysis reveals how music preferences shift with the seasons, useful for playlist curation, release timing, and personalized recommendations.
Winter playlists are the looongest, especially in 2020.
To set the stage for the next phase of my project—sentiment analysis of song lyrics using LLMs—I created word clouds for track names by season to identify frequently occurring words.
While Spotify tracks song “Valence,” can I gauge overall mood through word clouds of track titles? After all, a picture is worth a thousand words!
This analysis could lead to mood tracking through music, enhance therapy with tailored playlists, and help understand how seasonal music preferences affect mental health.
Privacy and data security are critical, especially when using emotional data. AI’s role in therapy must be carefully regulated.
I’m diving into sentiment analysis on song lyrics using LLMs to uncover emotional patterns in titles and lyrics. Stay tuned!
© Copyright of Varshini Srinivas