Varshini Srinivas

Hi, I’m Varshi! 👋

I’m a data scientist with a background in consulting, industrial engineering, and operations research. I love storytelling with data using 🐍 Python, 📊 Tableau, and 💾 SQL. In my spare time, I love to 🍳 cook, make 🎨 linocut prints, and volunteer at 🐾 Muttville SF!

Soundtrack of My Life: Unraveling My Spotify Saga Over The Last 9 Years

Author: Varshi S, 9 minute read.

Click here to see the Jupyter Notebook for this EDA!


Project Description: Using Exportify, I gathered my Spotify playlist data and wondered: can this reveal my mood and well-being over the years based on when I listened to songs and the sentiment in their names and lyrics?

Data: The dataset spans April 2016 to August 2024, including artist, track, genre, and an attribute called Valence (high = positive, low = negative). Read more about it here.

Motivation & Hypothesis: This period covers a significant part of my life, from college in Indiana to married life in San Francisco, with a pandemic in the middle. My playlists serve as emotional journals, and I want to explore how they reflect my evolving moods and life events.

Key focus areas for this project include:

  1. Track my evolving music taste over time.
  2. Examine playlist patterns by seasons (Am I as predictable as I think?).
  3. Use LLMs for sentiment analysis of lyrics (and maybe build a recommender system based on that?)

By doing this analysis, I hope to reveal how my music choices reflect my life experiences and emotions.

Note: The data records when songs were added to playlists, not how often they were played. Play data would enable more in-depth analysis of listening patterns.

Findings and Visualizations:

1. Top Artist of All Time:

I was fully expecting my top artist to be Charli XCX but it is actually… Future??? I did not see this in my Future.

To dig deeper, I looked at when I added all the songs featuring Future… and found that most were added this year.

Bingo! This year, Future took the lead, thanks to his 2024 albums We Don’t Trust You and We Still Don’t Trust You.
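
A minimal sketch of the artist count and the per-year breakdown described above, not the notebook's actual code. The column names "Artist Name(s)" and "Added At", the comma-separated artist field, and the sample rows are all assumptions about what an Exportify CSV looks like.

```python
from collections import Counter

# Made-up rows shaped like an Exportify export (column names are assumptions).
rows = [
    {"Artist Name(s)": "Future,Metro Boomin", "Added At": "2024-03-25T10:00:00Z"},
    {"Artist Name(s)": "Charli xcx", "Added At": "2024-06-08T09:30:00Z"},
    {"Artist Name(s)": "Future,Metro Boomin", "Added At": "2024-04-14T18:15:00Z"},
    {"Artist Name(s)": "Drake,Future", "Added At": "2017-02-20T12:00:00Z"},
]

def split_artists(row):
    """Split the artist field so featured artists are counted too."""
    return [name.strip() for name in row["Artist Name(s)"].split(",")]

def artist_counts(rows):
    """Count how many added tracks each artist appears on."""
    counts = Counter()
    for row in rows:
        counts.update(split_artists(row))
    return counts

def additions_by_year(rows, artist):
    """Count tracks with `artist` on them, grouped by the year they were added."""
    years = Counter()
    for row in rows:
        if artist in split_artists(row):
            years[int(row["Added At"][:4])] += 1
    return dict(sorted(years.items()))

top_artist, top_count = artist_counts(rows).most_common(1)[0]
print(top_artist, top_count)
print(additions_by_year(rows, "Future"))
```

One caveat of splitting on commas: an artist whose name itself contains a comma would be split into two entries, so real data may need a more careful parser.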

2. Seasonal Artists:

Spring is “BRAT Green” with Charli XCX; winter features an eclectic mix of Disney, Britney Spears, and BTS. (P.S. Click “BRAT” if you’re unfamiliar with the term.)

3. Playlist Curation Frequency by Month/Season:

My curation became erratic post-2022 due to full-time work, with a notable spike in August 2024 during a stressful period.

4. Songs Added by Season Over the Years:

I added the most songs during the pandemic in 2020 and my unemployment in 2024—two challenging times.

5. Song Additions Normalized by Season:

I normalized the data by dividing the number of songs added each season by the number of days in that season and then by the 9 years in my dataset (almost a third of my life!). Winter sees the most activity, likely due to long college breaks and bad weather.
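
A minimal sketch of this normalization, under a few assumptions that the post does not spell out: meteorological seasons (December to February is winter, and so on), non-leap-year day counts, and ISO-style "Added At" timestamps. The sample timestamps are made up.

```python
from collections import Counter

# Assumed month-to-season mapping (meteorological seasons).
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}
# Days per season in a non-leap year (90 + 92 + 92 + 91 = 365).
SEASON_DAYS = {"Winter": 90, "Spring": 92, "Summer": 92, "Autumn": 91}

def season_of(added_at):
    """Map an ISO timestamp such as '2020-01-15T08:00:00Z' to a season."""
    month = int(added_at[5:7])
    return SEASON_BY_MONTH[month]

def songs_per_season_day(added_at_values, n_years=9):
    """Songs added per day of each season, averaged over n_years."""
    counts = Counter(season_of(value) for value in added_at_values)
    return {s: counts[s] / SEASON_DAYS[s] / n_years for s in SEASON_DAYS}

# Made-up timestamps, only to exercise the function.
sample = ["2020-01-15T08:00:00Z"] * 90 + ["2021-07-04T12:00:00Z"] * 46
rates = songs_per_season_day(sample)
print(rates)
```

Dividing by days per season keeps a slightly longer season from looking busier just because it has more days, and dividing by the number of years turns the totals into a typical per-year rate.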

6. Seasonal Music Traits:

My taste remains consistent year-round, favoring high-energy, danceable songs.

7. Comparing 2019-2020 vs. 2023-2024:

I focused on comparing 2019-2020 and 2023-2024 to see how they differed. My rationale for this comparison is as follows:

Despite the pandemic, 2020 featured more upbeat songs (grad school acceptance!), while 2024 showed declines in most traits (except tempo) during my job search.


8. Valence Over Time:

Initially, I plotted Valence over time and included a monthly moving average, but the resulting graph was too messy and hard to interpret.

Note: the gap in the graph falls around the time I briefly used Apple Music, a time we will not speak of again… hehe.

Instead, I plotted the average monthly valence of my Spotify playlist data over time, with annotations for significant life events.

Observations:

These patterns suggest my music preferences mirror my emotional state, encouraging deeper exploration through sentiment analysis to understand my emotions over time.
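
The monthly averaging behind that graph can be sketched roughly as follows. This is an illustration rather than the notebook's code: the "Added At" and "Valence" column names and the sample rows are assumptions.

```python
from collections import defaultdict
from statistics import mean

def monthly_mean_valence(rows):
    """Average valence of the songs added in each calendar month (keyed 'YYYY-MM')."""
    by_month = defaultdict(list)
    for row in rows:
        month_key = row["Added At"][:7]  # e.g. '2020-03'
        by_month[month_key].append(float(row["Valence"]))
    return {month: mean(values) for month, values in sorted(by_month.items())}

# Made-up rows; values read from a CSV arrive as strings, hence float() above.
rows = [
    {"Added At": "2020-03-01T10:00:00Z", "Valence": "0.2"},
    {"Added At": "2020-03-20T10:00:00Z", "Valence": "0.4"},
    {"Added At": "2020-05-02T10:00:00Z", "Valence": "0.9"},
]
monthly = monthly_mean_valence(rows)
print(monthly)
```

Months with no additions simply do not appear in the result, which is how a gap like the Apple Music period would show up when the averages are plotted.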

9. Top Genres by Season:

I was surprised to find ‘Modern Rock’ appearing in my winter playlists. It seems like an anomaly; it has nearly vanished in later years.

I was then curious to see which winters this rock phase happened in.

It is pretty apparent that this genre faded from my playlists as time passed, appearing just once in 2023 (talk about ROCK bottom :P).

I then visualized this differently, with a stacked bar chart covering my top 20 genres overall.

This chart displays the popularity of various music genres across seasons:

I also broke this down by the top 5 genres over the years, separated by season:

Key points:

10. Nostalgia vs. New Music:

A boxplot visually represents data spread and symmetry by dividing it into quartiles, showing the median and potential outliers. For more on interpreting boxplots, see this guide.

Key findings:

In summary, my music preferences shift with the seasons, with summer evoking nostalgia and autumn sparking interest in new music.

I also looked at yearly trends to see if this nostalgic streak holds up over time:

The trend of favoring older music remains consistent, except for Spring 2023 and part of Winter 2024, when I leaned towards newer releases.

11. Genre Pareto Analysis:

The Pareto principle (80/20 rule) suggests that roughly 80% of effects come from 20% of causes, which helps focus effort on the most impactful areas. In my case, nearly 50% of my music is concentrated in just 30 of 1412 genres (about 2%), so a small set of genres dominates my playlists. The graph below shows only the top 100 genres, so it masks the long tail of the full distribution. Still, this concentration hints at potential insights for building a recommendation engine, though I’ll explore that later.
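
A minimal sketch of the cumulative-share calculation behind a Pareto chart. The genre counts are made up, and how a track with several genre tags should be counted is left open here, since the post does not say.

```python
from collections import Counter

def cumulative_share(genre_counts):
    """Return (genre, cumulative share) pairs, most common genre first."""
    total = sum(genre_counts.values())
    running = 0
    shares = []
    for genre, count in genre_counts.most_common():
        running += count
        shares.append((genre, running / total))
    return shares

def genres_needed(genre_counts, target=0.5):
    """Smallest number of top genres whose combined share reaches `target`."""
    for rank, (_, share) in enumerate(cumulative_share(genre_counts), start=1):
        if share >= target:
            return rank
    return len(genre_counts)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = Counter({"pop": 40, "rap": 30, "dance pop": 15, "modern rock": 10, "k-pop": 5})
print(cumulative_share(counts))
print(genres_needed(counts, target=0.5))
```

Running `genres_needed` on the full genre table with `target=0.5` is the kind of check that would produce a figure like "30 of 1412 genres".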

12. t-SNE & K-Means Clustering:

t-SNE projects high-dimensional data onto a 2D map to reveal patterns, while K-means groups similar items into clusters. In this analysis, K-means clusters songs based on their audio features, and t-SNE visualizes the result. Using the elbow method, I settled on four clusters. Click on any of the following links to learn more about these methods: t-SNE, K-means clustering, Elbow method. Here’s the result:

This graph shows a t-SNE (t-Distributed Stochastic Neighbor Embedding) visualization of song clustering based on audio features. Key takeaways are:

In summary, the t-SNE plot effectively visualizes clusters based on audio features, reflecting the complexity of music categorization, with overlap showing shared characteristics across genres.
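
For anyone who wants to try the same workflow, here is a rough sketch using scikit-learn. It is an illustration under assumptions, not the notebook's code: the data is synthetic (four artificial groups of "songs" with three stand-in features), and the scaling step, the range of k, and the perplexity value are my own choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for audio features: 4 groups of 30 songs, 3 features each.
centers = np.array([
    [0.2, 0.2, 0.2],
    [0.8, 0.2, 0.8],
    [0.2, 0.8, 0.8],
    [0.8, 0.8, 0.2],
])
features = np.vstack([c + rng.normal(scale=0.03, size=(30, 3)) for c in centers])

# Put every feature on the same scale so none dominates the distances.
X = StandardScaler().fit_transform(features)

# Elbow method: record the within-cluster sum of squares (inertia) for each k
# and look for the k after which the improvement flattens out.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Cluster with the chosen k, then embed in 2D for plotting.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)

print({k: round(v, 1) for k, v in inertias.items()})
print(embedding.shape)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]`, colored by `labels`, gives the kind of cluster map described above. Keep in mind that t-SNE is only used for the picture here; the clusters themselves come from K-means on the scaled features.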

I also looked at the seasonal distribution of clusters, just for kicks.

Key Takeaways:

This analysis reveals how music preferences shift with the seasons, useful for playlist curation, release timing, and personalized recommendations.

13. Track Duration by Season:

Winter playlists are the looongest, especially in 2020.

14. Word Clouds by Season:

To set the stage for the next phase of my project—sentiment analysis of song lyrics using LLMs—I created word clouds for track names by season to identify frequently occurring words.

While Spotify tracks song “Valence,” can I gauge overall mood through word clouds of track titles? After all, a picture is worth a thousand words!
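
A word cloud is essentially a picture of word counts, so the underlying step can be sketched with the standard library alone. The stopword list and the sample titles below are made up for illustration; a real run would use the track names from each season.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "on", "my", "you", "i", "me",
    "feat", "with", "remix", "version",
}
# Runs of letters, optionally joined by apostrophes (so "don't" stays whole).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def title_word_counts(titles):
    """Count non-stopword words across a list of track titles."""
    counts = Counter()
    for title in titles:
        for word in WORD_RE.findall(title.lower()):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts

# Hypothetical titles for one season.
summer_titles = ["Love Me Like You Do", "Love On Top", "Summer Love (feat. Someone)"]
word_counts = title_word_counts(summer_titles)
print(word_counts.most_common(3))
```

Comparing the top words per season this way gives a quick numeric check on whatever the word clouds seem to show at a glance.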

Applications of My Analysis:

This analysis could lead to mood tracking through music, enhance therapy with tailored playlists, and help understand how seasonal music preferences affect mental health.

Ethical Considerations:

Privacy and data security are critical, especially when using emotional data. AI’s role in therapy must be carefully regulated.


C’est Fin! Now for Next Steps:

I’m diving into sentiment analysis on song lyrics using LLMs to uncover emotional patterns in titles and lyrics. Stay tuned!


Thanks for reading! Feel free to share your thoughts or feedback at battler_haft_0h@icloud.com!

© Copyright of Varshini Srinivas