Introduction

Every year, Spotify users eagerly await the release of Spotify Wrapped, a personalized year-in-review showcasing their most listened-to songs, artists, and genres. What if I told you there's a way to get a sneak peek at your Spotify statistics before the official release?

In this guide, I'll walk you through a Spotify Wrapped hack that allows you to create your own personalized stats using your Spotify streaming data. This way you won’t need to wait for Spotify Wrapped, and you will also be able to create stats that Spotify won’t show you.

Prerequisites

Similar to one of my earlier projects, we will use Jupyter Notebook for this one. It’s a great tool for experimenting and working with data.

If you haven’t installed Jupyter Notebook yet, follow the instructions on their official website. Once installed, you can create a new Jupyter Notebook and get ready to dive into your Spotify stats.

Gathering and Sanitizing Data

To get started, you'll need to request your Spotify streaming data. You can do this here (make sure you request the “Extended streaming history”). It will take some time for Spotify to send you your data. Requesting only the “Account data” will be faster and will also give you last year’s streaming history. However, it is far less detailed, and you will have to adapt the code.

Once you have the data, we can import it. You will get multiple JSON files. Each file consists of an array of objects containing information about a played song or podcast episode:

{
	"ts": "2023-01-30T16:36:40Z",
	"username": "",
	"platform": "linux",
	"ms_played": 239538,
	"conn_country": "DE",
	"ip_addr_decrypted": "",
	"user_agent_decrypted": "",
	"master_metadata_track_name": "Wonderwall - Remastered",
	"master_metadata_album_artist_name": "Oasis",
	"master_metadata_album_album_name": "(What's The Story) Morning Glory? (Deluxe Remastered Edition)",
	"spotify_track_uri": "spotify:track:7ygpwy2qP3NbrxVkHvUhXY",
	"episode_name": null,
	"episode_show_name": null,
	"spotify_episode_uri": null,
	"reason_start": "remote",
	"reason_end": "remote",
	"shuffle": false,
	"skipped": false,
	"offline": false,
	"offline_timestamp": 0,
	"incognito_mode": false
}

This not only lets you figure out when and on which device you listened to a song but also tells you whether and when you skipped it. We will simply merge all of the files into a single pandas DataFrame:

import os
import pandas as pd

path_to_json = 'my_spotify_data/'
frames = []
for file_name in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    frames.append(pd.read_json(path_to_json + file_name))

df = pd.concat(frames)

Afterward, we'll sanitize it by removing podcasts, filtering out short play durations, and converting timestamps to a more readable format:

# drop all rows containing podcasts
df = df[df['spotify_track_uri'].notna()]

# drop all plays that lasted 15 seconds or less
df = df[df['ms_played'] > 15000]

# convert ts from string to datetime
df['ts'] = pd.to_datetime(df['ts'], utc=False)
df['date'] = df['ts'].dt.date

# drop all columns which are not needed
columns_to_keep = [
    'ts',
    'date',
    'ms_played',
    'platform',
    'conn_country',
    'master_metadata_track_name',
    'master_metadata_album_artist_name',
    'master_metadata_album_album_name',
    'spotify_track_uri'
]
df = df[columns_to_keep]

df = df.sort_values(by=['ts'])
songs_df = df.copy()

Analyzing and Visualizing Your Spotify Stats

Top Songs of All Time

Let's kick things off by exploring your all-time favorite songs. We can easily unveil your top tracks based on your streaming history:

df = songs_df.copy()


df = df.groupby(['spotify_track_uri']).size().reset_index().rename(columns={0: 'count'})
df = df.sort_values(by=['count'], ascending=False).reset_index()

df = df.merge(songs_df.drop_duplicates(subset='spotify_track_uri'))
df = df[['master_metadata_track_name', 'master_metadata_album_artist_name', 'master_metadata_album_album_name', 'count']]

df.head(20)

Top Songs in 2023

Curious about this year's music trends? We can use this function to reveal the top songs of 2023:

import datetime

def top_songs_in_year(year):
    df = songs_df.copy()

    df['year'] = df['ts'].dt.year

    df = df.loc[(df['year'] == year)]

    print(f"Time listened in {year}: {datetime.timedelta(milliseconds=int(df['ms_played'].sum()))}")

    df = df.groupby(['spotify_track_uri']).size().reset_index().rename(columns={0: 'count'})
    df = df.sort_values(by=['count'], ascending=False).reset_index()

    df = df.merge(songs_df.drop_duplicates(subset='spotify_track_uri'))
    df = df[['master_metadata_track_name',
             'master_metadata_album_artist_name',
             'master_metadata_album_album_name',
             'count']]

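    return df.head(20)

top_songs_in_year(2023)

Interactivity With Widgets

That works very well already, but why settle for that? We can use interactive widgets to customize the queries using UI elements. This allows us to find your top songs in any specific time range effortlessly:

import ipywidgets as widgets
from ipywidgets import interact

# Assumption: the definition of date_range_slider is not shown in this article.
# This is one possible definition: a range slider over every day in the data.
dates = pd.date_range(songs_df['date'].min(), songs_df['date'].max(), freq='D')
date_range_slider = widgets.SelectionRangeSlider(
    options=[(d.strftime('%d.%m.%Y'), d) for d in dates],
    index=(0, len(dates) - 1),
    description='Dates',
    layout={'width': '500px'},
)

@interact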
def top_songs(date_range=date_range_slider):
    df = songs_df.copy()

    time_range_start = pd.Timestamp(date_range[0])
    time_range_end = pd.Timestamp(date_range[1])

    df = df.loc[(df['date'] >= time_range_start.date())
                & (df['date'] <= time_range_end.date())]

    df = df.groupby(['spotify_track_uri']).size().reset_index().rename(columns={0: 'count'})
    df = df.sort_values(by=['count'], ascending=False).reset_index()

    df = df.merge(songs_df.drop_duplicates(subset='spotify_track_uri'))
    df = df[['master_metadata_track_name',
             'master_metadata_album_artist_name',
             'master_metadata_album_album_name',
             'count']]

    return df.head(20)

Temporal and Weekday Distribution

Now that we know our top songs, top artists, and top albums, we can go a little further. For example, we can explore which days of the week we're most active on Spotify:

import matplotlib.pyplot as plt

def plot_weekday_distribution():
    df = songs_df.copy()

    df['year'] = df['ts'].dt.year
    df['weekday'] = df['ts'].dt.weekday

    df = df.groupby(['year', 'weekday']).size().reset_index(name='count')

    fig, ax = plt.subplots(figsize=(12, 8))

    for year, data in df.groupby('year'):
        ax.plot(data['weekday'], data['count'], label=str(year))

    weekdays_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    plt.xticks(range(7), weekdays_order)

    plt.title('Weekday Distribution of Played Tracks Over Years')
    plt.xlabel('Weekday')
    plt.ylabel('Number of Played Tracks')
    plt.legend(title='Year')

    plt.show()

plot_weekday_distribution()

How to Do It Yourself

Ready to dive into your own Spotify stats? Check out my GitHub repository to find all the code, including even more functions to explore your listening stats.

Conclusion

Creating your Spotify stats before the official release not only adds an element of fun but also provides insights into your unique listening habits. As we eagerly anticipate Spotify Wrapped, why not get a head start on your music analysis adventure? Get ready to groove into your personalized Spotify Wrapped experience!
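
The weekday section mentions top artists, but the snippets in this article only rank songs. As a minimal sketch of how the same groupby approach could be pointed at artists, the following ranks artists by play count and adds the total listening time in minutes. The `top_artists` helper and the sample rows are made up for illustration and are not taken from the repository; in the notebook you would pass in `songs_df` instead.

```python
import pandas as pd

# Made-up sample rows (assumption), using the column names from the export.
sample_df = pd.DataFrame({
    'master_metadata_album_artist_name': ['Oasis', 'Blur', 'Oasis', 'Pulp'],
    'ms_played': [240_000, 180_000, 120_000, 60_000],
})


def top_artists(df, n=20):
    """Hypothetical helper: rank artists by play count and listening time."""
    stats = (
        df.groupby('master_metadata_album_artist_name')
          .agg(count=('ms_played', 'size'), minutes=('ms_played', 'sum'))
          .reset_index()
    )
    # convert the summed milliseconds to minutes
    stats['minutes'] = stats['minutes'] / 60_000
    stats = stats.sort_values(by=['count', 'minutes'], ascending=False)
    return stats.head(n).reset_index(drop=True)


print(top_artists(sample_df))
```

Sorting by count first and minutes second keeps the ranking consistent with the top-songs tables, while the minutes column shows whether an artist's plays were long listens or short ones.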


Spotify Wrapped Hack: Create Your Own Stats Before the Official Release
