Finding the Next Best Songs with Machine Learning


Music is an important component of our daily lives. We dance, sing, enjoy, and cry simply because of music. But what music is right for any given moment? Spotify tries to use song, playlist, and other data to predict this.

In this article, I seek to demonstrate the steps needed to generate a preliminary model for predicting the next best song for a music playlist on Spotify. Any playlist! Even your own.

The process I guide you through in this article consists of several steps. First, we collect 1000 Spotify playlists whose genre we want to predict. Then, we select 5000 songs from the top 10 genres on Spotify. Knowing the metafeatures (such as “key”, “acousticness”, and “danceability”) of these songs and calculating the metafeatures of the playlists, we can use them as independent variables in supervised machine learning to predict the genre of the playlists under examination. From there, we can use a Gaussian mixture model to place a variety of songs under each genre in a high-dimensional space. Finally, if we query where a playlist (relative to its metafeatures) sits in that high-dimensional space, we can recommend to the user the songs within that genre that are closest in Euclidean distance.

What I hope you’ll learn about by reading this article is the following:

While this is only a preliminary model to get you started, the results are promising and provide unique insight into a variety of machine learning methods for predictive inference.

See the GitHub repo for code here!

Getting Started: Data Collection and Cleaning

Spotify API

Just as a DJ tunes music to appeal to the audience, Spotify uses vast amounts of data and machine learning algorithms to seamlessly play the “next best” song once a user finishes the playlist they are listening to. Typically, this song is predicted using features such as the name of the playlist, song traits, and the similarity of preferences across users with corresponding taste (via methods like collaborative filtering).

Machine learning needs data. While the Spotify API can provide information on any song we choose to query, we need a list of playlists to predict the next best songs for. The quickest way to get started is to download some playlist data, so we use the Million Playlist Dataset hosted by AIcrowd (Spotify, 2020). Fortunately, this dataset was released by Spotify and is available for “non-commercial, open research use”. A million playlists, however, is a LOT of data. We instead take a small sample of 1000 playlists from this dataset to test our model.

Data Uploading

First, we can define some global variables for the Spotify API. I use the spotipy library to make things easier. Check out this article for setting up the API key (Tingle, 2019)!

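# imports for the Spotify API client
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# define global variables
SIZE = 500 # this defines how many playlists we want

# for Spotify API
cid = 'INSERT CID'
secret = 'INSERT SECRET' # client secret, needed by the credentials manager below
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)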

Once our data is downloaded (as JSON files), we can unpack them and find the URIs of the songs contained within each playlist. We define functions here so that we can vary how many playlists are unpacked without re-running all cells.

import json

def unpack(json_name):
    """
    unpack a json playlist file to obtain a playlist
    input: a file name
    return: list of playlists
    """
    # open the JSON file and load it as a dictionary
    with open(json_name) as f:
        data = json.load(f)
    playlists = data['playlists']
    return playlists

playlists = unpack(json_name='playlists.json')

def find_uris(playlists, start=0, SIZE=SIZE):
    """
    output the uri lists for a range of playlists
    input: list of playlists, with start (inclusive) and SIZE (exclusive) indices
    return: a list of track uri lists (one per playlist), and the playlist ids
    """
    track_uris = [[i['track_uri'] for i in playlists[j]['tracks']] for j in range(start, SIZE)]
    pids = [playlists[i]['pid'] for i in range(start, SIZE)]
    return track_uris, pids

# get track uris and playlist ids for the first SIZE playlists in the dataset
uri_list, pids = find_uris(playlists, start=0, SIZE=SIZE)

The result is a list of track URIs (each song's ID) for every playlist, along with the associated playlist ID (PID).

Selecting Variables

The Spotify API provides a lot of information on a given playlist, such as its name, the number of followers it has, how many tracks it contains, and whether or not it is collaborative. The tracks within it also each have a variety of features. There is a lot of data to collect!

To avoid overload when querying the API, we make sure to only query the features necessary for song prediction. In general, this means variables that are measurable, independent, and stored as a float, integer, or binary value. It also means features that I believe will actually contribute to predicting the genres of playlists. For example, “liveness” is a float that estimates the presence of an audience in the recording, telling us whether or not the track was performed live; I deemed it unimportant for genre prediction and hence did not store it. However, aspects of songs like “danceability”, “speechiness”, and “valence” were all extremely important. Overall, I recommend using 9 features as variables for representing songs and thus playlists: key, acousticness, danceability, energy, instrumentalness, loudness, speechiness, valence, and tempo (see the Spotify API documentation for descriptions).

Generating the Playlist DataFrame

What we do with these features is query the values for each song within each playlist. From there, we take the mean of the values to have a holistic feature score for each playlist. This, in essence, allows us to treat our playlist as a specific song that is a combination of all of the songs within the playlist.

We have a couple of functions we use to do this. Firstly, calculating the means:

import numpy as np
from tqdm import tqdm

# the nine audio features stored for every track
FEATURE_NAMES = ['key', 'acousticness', 'danceability', 'energy',
                 'instrumentalness', 'loudness', 'speechiness',
                 'valence', 'tempo']

def playlist_summarise(playlist_uri):
    """
    where we query playlist uris with spotify API
    input: list of uris for a given playlist
    return: the mean features of the given playlist
    """
    # one row per track, one column per feature
    all_features = np.zeros((len(playlist_uri), len(FEATURE_NAMES)))

    # unpack each uri
    for i in tqdm(range(len(playlist_uri))):
        # query spotify api
        audio_features = sp.audio_features(playlist_uri[i])
        all_features[i] = [audio_features[0][name] for name in FEATURE_NAMES]
    # calculate and return the mean of each feature, in FEATURE_NAMES order
    return all_features.mean(axis=0).tolist()
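
The function above issues one request per track. As a hedged alternative, spotipy's `audio_features` also accepts a list of track URIs (Spotify documents a limit of 100 tracks per call), so the URIs could be grouped into batches first. The sketch below shows only the batching step; `chunked` is a hypothetical helper and the URIs are placeholders, not real tracks.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# placeholder URIs standing in for a playlist's tracks
example_uris = [f"spotify:track:{n}" for n in range(250)]
batches = list(chunked(example_uris))
print([len(b) for b in batches])  # [100, 100, 50]

# each batch could then be sent in a single call, e.g.
# audio_features = sp.audio_features(batch)
```

With batching, a 250-track playlist needs three requests instead of 250, which matters when summarising hundreds of playlists.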

From there we need a function to normalise the values in our dataframe. We do this so that the machine learning models weight the features equally rather than favouring those with the largest raw ranges. This is done using the MinMaxScaler class from scikit-learn.

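import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def normalize_df(df, col_names):
    """rescale every column of df to the range [0, 1]"""
    x = df.values # returns a numpy array
    min_max_scaler = MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    df = pd.DataFrame(x_scaled, columns=col_names)
    return df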
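
To make the scaling step concrete, here is a minimal pure-Python sketch of what min-max scaling computes for a single column with the default [0, 1] range: (x - min) / (max - min). The helper name `min_max_scale` and the sample values are illustrative assumptions, not part of the notebook.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # a constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

tempo = [90.0, 120.0, 150.0]      # illustrative tempos in BPM
loudness = [-12.0, -6.0, -3.0]    # illustrative loudness values in dB
print(min_max_scale(tempo))       # [0.0, 0.5, 1.0]
print(min_max_scale(loudness))
```

One consequence worth keeping in mind: `normalize_df` fits a fresh scaler on whichever dataframe it receives, so the same raw tempo can map to different scaled values in the playlist and song dataframes if their minimum and maximum differ.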

Finally, a function that calls the functions above to generate the pandas dataframe of all playlists we have calculated the mean features of:

def create_playlist_dataframe(playlists):
    """
    summary function to allow ease of playlist transformation into a df
    input: list of playlists
    output: dataframe with all mean playlist features
    """
    # find uris and playlist ids
    uri_list, pids = find_uris(playlists, start=0, SIZE=SIZE)
    # set up dataframe
    col_names = ['pid', 'key', 'acousticness', 'danceability', 'energy', 'instrumentalness', 'loudness', 'speechiness', 'valence', 'tempo']
    df = pd.DataFrame(columns=col_names) # generate empty df
    # iterate through and get features for each playlist
    for i in range(len(uri_list)):
        features = playlist_summarise(uri_list[i])
        features.insert(0, 0) # placeholder for pid, filled in after scaling
        df.loc[i] = features
    df = normalize_df(df, col_names)
    # insert ids
    df['pid'] = pids
    return df

playlist_df = create_playlist_dataframe(playlists)

The output of this function is something like the following:

An example dataframe of 500 playlists for which the mean features were calculated.

Because we do not use our playlist dataframe for the supervised training of our models, we do not need a large number of playlists. In essence, the playlists act as our unlabelled prediction set: the trained model labels each one with a genre depending on its features.

Generating the Song DataFrame

In order to recommend the next best song, a variety of songs must be chosen and their features and genre known. This is so that we can train our machine learning model on the songs, and then use that model to classify our playlists as a particular genre (the one that corresponds most with the mean of the playlist’s features).

To generate the song dataframe, we use the Spotify API to query the top songs from 10 different genres. I selected a variety of the most popular genres to get a diverse range of features and to increase the pool of songs I could sample from. Ten genres were enough to provide a good spread of song types without overcomplicating the classification problem. The genres selected are: ‘pop’, ‘hip-hop’, ‘edm’, ‘latin’, ‘rock’, ‘r-n-b’, ‘country’, ‘jazz’, ‘classical’, and ‘alternative’.

# genres to query; the list index doubles as the encoded label
genres = ['pop', 'hip-hop', 'edm', 'latin', 'rock',
          'r-n-b', 'country', 'jazz', 'classical',
          'alternative']
# number of songs to query per genre
song_num = 100
n_requests = 20
# we generate using genre seeds
pop_uris = []
hip_hop_uris = []
edm_uris = []
latin_uris = []
rock_uris = []
randb_uris = []
country_uris = []
jazz_uris = []
classical_uris = []
alternative_uris = []

# query spotify api for each genre n_requests time
# this method bypasses the 100 limit on song queries
for i in range(n_requests):
    pop_recs = sp.recommendations(seed_genres=['pop'], limit=song_num)
    pop_uris += [i['uri'] for i in pop_recs['tracks']]
    hip_hop_recs = sp.recommendations(seed_genres=['hip-hop'], limit=song_num)
    hip_hop_uris += [i['uri'] for i in hip_hop_recs['tracks']]
    edm_recs = sp.recommendations(seed_genres=['edm'], limit=song_num)
    edm_uris += [i['uri'] for i in edm_recs['tracks']]
    latin_recs = sp.recommendations(seed_genres=['latin'], limit=song_num)
    latin_uris += [i['uri'] for i in latin_recs['tracks']]
    rock_recs = sp.recommendations(seed_genres=['rock'], limit=song_num)
    rock_uris += [i['uri'] for i in rock_recs['tracks']]
    randb_recs = sp.recommendations(seed_genres=['r-n-b'], limit=song_num)
    randb_uris += [i['uri'] for i in randb_recs['tracks']]
    country_recs = sp.recommendations(seed_genres=['country'], limit=song_num)
    country_uris += [i['uri'] for i in country_recs['tracks']]
    jazz_recs = sp.recommendations(seed_genres=['jazz'], limit=song_num)
    jazz_uris += [i['uri'] for i in jazz_recs['tracks']]
    classical_recs = sp.recommendations(seed_genres=['classical'], limit=song_num)
    classical_uris += [i['uri'] for i in classical_recs['tracks']]
    alternative_recs = sp.recommendations(seed_genres=['alternative'], limit=song_num)
    alternative_uris += [i['uri'] for i in alternative_recs['tracks']]

# turn into sets to remove duplicates
pop_uris = list(set(pop_uris))
hip_hop_uris = list(set(hip_hop_uris))
edm_uris = list(set(edm_uris))
latin_uris = list(set(latin_uris))
rock_uris = list(set(rock_uris))
randb_uris = list(set(randb_uris))
country_uris = list(set(country_uris))
jazz_uris = list(set(jazz_uris))
classical_uris = list(set(classical_uris))
alternative_uris = list(set(alternative_uris))

To overcome the limit of 100 songs per recommendation request, we query the API in multiple batches. The problem with this, however, is that the sampling has the potential to return the same song more than once. Hence, we filter out duplicates using the set() function. In the code contained in the notebook, we make 20,000 song requests, and only 5,201 of those remained after removing duplicates. Once we gather all of the URIs for the songs in each genre, we can compile them and put them into a dataframe containing each song's features and labelled genre.
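
As a small illustration of the de-duplication step, the sketch below merges two overlapping batches of placeholder URIs. It also shows one possible variation, `dict.fromkeys`, which removes duplicates while keeping first-seen order; `set()` makes no ordering guarantee. The batch contents are made up for the example.

```python
batch_1 = ["spotify:track:a", "spotify:track:b", "spotify:track:c"]
batch_2 = ["spotify:track:b", "spotify:track:d", "spotify:track:a"]
requested = batch_1 + batch_2

unique_unordered = list(set(requested))           # order is not guaranteed
unique_ordered = list(dict.fromkeys(requested))   # keeps first-seen order

print(len(requested), len(unique_ordered))  # 6 4
print(unique_ordered)
```

Either approach yields the same set of songs; the ordered variant only matters if reproducible row order in the dataframe is useful.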

# compile uris
song_uris = (pop_uris + hip_hop_uris + edm_uris + latin_uris + rock_uris + randb_uris + country_uris + jazz_uris + classical_uris + alternative_uris)
# create a list for labels
genre_list = ((['pop'] * len(pop_uris)) + (['hip-hop'] * len(hip_hop_uris)) + 
              (['edm'] * len(edm_uris)) + (['latin'] * len(latin_uris)) +
              (['rock'] * len(rock_uris)) + (['r-n-b'] * len(randb_uris)) +
              (['country'] * len(country_uris)) + (['jazz'] * len(jazz_uris)) + 
              (['classical'] * len(classical_uris)) + (['alternative'] * len(alternative_uris)))

def create_song_dataframe(song_uris):
    """
    combine all song URIs into a df
    input: song uris
    output: dataframe with all song features
    """
    # set up dataframe
    col_names = ['uri', 'genre', 'key', 'acousticness', 'danceability', 'energy', 'instrumentalness', 'loudness', 'speechiness', 'valence', 'tempo']
    feature_names = col_names[2:]
    df = pd.DataFrame(columns=col_names) # generate empty df
    # iterate through and get features for each song
    for i in tqdm(range(len(song_uris))):
        # get song features
        audio_features = sp.audio_features(song_uris[i])
        features = [audio_features[0][name] for name in feature_names]
        # two placeholders for uri and genre, filled in after scaling
        df.loc[i] = [0, 0] + features
    df = normalize_df(df, col_names)
    # insert uris and genres
    df['uri'] = song_uris
    df['genre'] = genre_list
    return df

song_df = create_song_dataframe(song_uris)

We can label encode these genres for use in machine learning models as well:

# label encode
for i in range(len(genres)):
    song_df['genre'] = np.where(song_df['genre'] == genres[i], i, song_df['genre'])
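
The same encoding can be expressed as a lookup table, which also makes decoding a predicted label back into a genre name straightforward. This is a minimal sketch on a toy list; `genre_to_label` and `example_genres` are illustrative names.

```python
genres = ['pop', 'hip-hop', 'edm', 'latin', 'rock',
          'r-n-b', 'country', 'jazz', 'classical', 'alternative']

# map each genre name to its position in the list
genre_to_label = {genre: index for index, genre in enumerate(genres)}

example_genres = ['rock', 'pop', 'jazz', 'rock']
encoded = [genre_to_label[g] for g in example_genres]
print(encoded)  # [4, 0, 7, 4]

# decoding is just indexing back into the list
decoded = [genres[label] for label in encoded]
print(decoded)
```

On a dataframe, the equivalent would be something like `song_df['genre'].map(genre_to_label)`, which yields an integer column in one pass.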

From there we get a dataframe such as the following:

The labelled song dataframe. We use this data to train our model and test the accuracy (once train_test_split is used).

Supervised Machine Learning: Classifying Playlists

In this section, we go through the process of predicting the genre of a playlist using supervised learning methods. Here I demonstrate the results of K-Nearest-Neighbours (KNN) and a neural network. In the notebook I also test a Bayesian logistic regression approach using PyStan.

Now that the data is in a usable format, we build machine learning models to train on the genre labels of our song dataframe and predict the genre labels of our playlist dataframe. We can also measure the accuracy of our models by using a train-test-split approach.

from sklearn.model_selection import train_test_split

# set up unlabelled song features X and genre labels y
X = song_df.drop(columns=['uri', 'genre']).values
y = song_df[['genre']].values.ravel()
# train test split for model testing
X_train, X_test, y_train, y_test = train_test_split(X, list(y), test_size=0.05, random_state=2)


The KNN model provides a strong baseline for our future models. The assumption of KNN is that similar data points exist in close proximity in a space. This falls perfectly in line with my hypothesis. The value of K is the number of nearest labelled samples (neighbours) the algorithm consults when classifying a new point. The most frequent label (genre) among those K neighbours becomes the label of the new point. Hence, to predict using KNN we simply place a playlist in the feature space alongside the labelled songs, and it receives the majority genre of its K closest songs.
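
To make the voting rule concrete, here is a minimal pure-Python sketch of a KNN prediction on toy two-feature data. It is an illustration of the idea, not the scikit-learn implementation used in this article; `knn_predict` and the data points are assumptions made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# toy data: (danceability, acousticness), values are illustrative
train_points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.2, 0.9), (0.3, 0.8)]
train_labels = ['edm', 'edm', 'edm', 'classical', 'classical']

print(knn_predict(train_points, train_labels, query=(0.75, 0.25)))  # edm
```

A playlist's mean feature vector plays the role of `query` here: it is labelled with whichever genre dominates among its nearest songs.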

Warning: we have to be careful because KNN can suffer from the curse of dimensionality, which occurs when considering too many features. However, in this case I only have 9, which means the model likely doesn’t suffer from the curse of dimensionality.

For finding the best value of K, we build an algorithm that iterates through values of K ranging from 1 to 50, and uses 10-Fold cross-validation to find the average accuracy for each K. By doing so, we can find the best value of K and validate that value using our cross-validation, such that we are not overfitting to a single training dataset.

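from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

# increment k from 1 to 50 and save the mean validation accuracy to find best k
k_range = range(1, 51)
scores_list = []

# test across values of k
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    kf = KFold(n_splits=10, shuffle=True)
    score_acc_list = []
    # implement k folding (10)
    for train_index, val_index in kf.split(X):
        # separate names so the held-out X_test / y_test above are not overwritten
        X_tr, X_val = list(X[train_index]), list(X[val_index])
        y_tr, y_val = list(y[train_index]), list(y[val_index])
        knn.fit(X_tr, y_tr)
        y_pred = knn.predict(X_val)
        score_acc_list.append(accuracy_score(y_val, y_pred))
    # average accuracy across the 10 folds for this k
    scores_list.append(np.mean(score_acc_list))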
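
For readers unfamiliar with what the fold splitter is doing, the following pure-Python sketch generates shuffled k-fold index splits by hand. It is a simplified stand-in for illustration, not scikit-learn's `KFold`; `k_fold_indices` is a hypothetical helper.

```python
import random

def k_fold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train_indices, validation_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # deal the shuffled indices into n_splits roughly equal folds
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_samples=25, n_splits=5))
for train, validation in splits:
    print(len(train), len(validation))  # 20 5 on every line
```

Every sample is used for validation exactly once, so the averaged accuracy does not depend on a single lucky or unlucky split.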

The following is a plot of the results:

Testing different values of K using k-fold cross validation.

We then select the optimal value of K, fit our model using that value, and then predict on our X_test.

# k somewhere near 40 is best
knn = KNeighborsClassifier(n_neighbors=40)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

We can test our predictions using a confusion matrix and outputting the accuracy score using the actual labels of our predictions.

## using sklearn functions
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Create the confusion matrix using test data and predictions
cm = confusion_matrix(y_test, y_pred)
# plot the confusion matrix
plt.figure(figsize=(14, 12))
ax = plt.subplot()
# draw the matrix (a seaborn heatmap is an assumption; any matrix plot works)
sns.heatmap(cm, annot=True, fmt='g', ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
plt.show()
# Show the accuracy score
print("Accuracy Score", accuracy_score(y_test, y_pred))

Our confusion matrix for the KNN. Output accuracy score: 0.4326923076923077.

Neural Network

We can do the same process as above but with a neural network and test our accuracy to see which model we should use.

We build an 8-layer neural network with maximum width of 96 neurons. The goal for implementing this neural network is to achieve a higher classification accuracy than my KNN model. Using Keras features, we implement some unique tuning to optimize the classification accuracy of the neural network.

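# using keras create NN
# (import paths assume the Keras bundled with TensorFlow 2.x)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy

def classification_model():
    #Create the model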
    model = Sequential()
    #Add 1 layer with 12 nodes, input of 9 dim with relu function
    model.add(Dense(12, input_dim=9, activation='relu', name='Dense_1'))
    model.add(Dropout(0.1, input_shape=(12,), name='Dropout_1'))
    # Add another layer
    model.add(Dense(24, input_dim=12, activation='relu', name='Dense_2'))
    # dropout layers lets us prevent overfitting
    model.add(Dropout(0.1, input_shape=(24,), name='Dropout_2'))
    # Add another layer
    model.add(Dense(48, input_dim=24, activation='relu', name='Dense_3'))
    # add tanh layer for sigmoid classification if i want to output embeddings
    model.add(Dense(96, input_dim=48, activation='tanh', name='Dense_4'))
    model.add(Dense(10, input_dim=96, activation='softmax', name='Output_Layer'))
    # Compile the model using cat cross ent loss function and adam optimizer with learning rate, 
    # accuracy correspond to the metric displayed
    opt = Adam(learning_rate=0.02)
    loss = CategoricalCrossentropy(label_smoothing=0.2)
    model.compile(loss=loss, optimizer=opt, metrics=['accuracy'])
    return model

# define model
classifier = KerasClassifier(build_fn=classification_model, epochs=3000, batch_size=300, verbose=0)
# implement early stopping to prevent epoch maximisation which may lead to overfitting
es = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=400)
# fit on the training data, holding out 5% of it to monitor val_loss
history = classifier.fit(X_train, y_train, validation_split=0.05, callbacks=[es])

Firstly, we use ‘Dropout’ layers (Keras, 2020) to prevent overfitting. This applies to the training process of my neural network and randomly sets input units to 0 with a frequency of 0.1 at each step during training time. This acts as a form of regularization to temporarily remove neurons from the forward pass and not update weights on the back propagation, making the model less sensitive to specific neuron weights and more generalizable (Brownlee, Dropout Regularization in Deep Learning Models With Keras, 2016). The ReLU (Rectified Linear Unit) activation layers act as the default neurons in my neural network. The function “is a piecewise linear function that will output the input directly if it is positive, otherwise, it will output zero” (Brownlee, A Gentle Introduction to the Rectified Linear Unit (ReLU), 2019).
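
The two building blocks described above are simple enough to sketch in pure Python. This is an illustration of the idea rather than Keras's implementation: `relu` returns its input when positive and zero otherwise, and `dropout` zeroes each value with probability `rate` while scaling the survivors by 1 / (1 - rate), which is how Keras documents its Dropout layer during training. Both function names are assumptions for the example.

```python
import random

def relu(x):
    """Return x when positive, otherwise 0."""
    return max(0.0, x)

def dropout(values, rate=0.1, rng=None):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    rng = rng or random.Random(0)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

activations = [relu(x) for x in [-1.5, 0.0, 0.4, 2.0]]
print(activations)  # [0.0, 0.0, 0.4, 2.0]

dropped = dropout([1.0] * 1000, rate=0.1)
print(sum(1 for v in dropped if v == 0.0))  # roughly 100 of the 1000 units
```

The rescaling keeps the expected sum of the layer's outputs unchanged, so no adjustment is needed when dropout is switched off at prediction time.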

For concision purposes, I will not go into the specific details for the rest of the model other than briefly mentioning the other features implemented. A learning rate for the Adam optimizer allows control over how quickly the model is adapted to the problem. Label smoothing allows us to make the model less overconfident in its predictions. This regularization method allows the model to not “overclassify” a playlist — but rather restrains the largest logit from becoming much bigger than the rest. This allows the model to think about different genres and a combination of genres, rather than being overconfident towards one. Finally, an early stopping method prevents the model overfitting by running too many epochs.
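
As a worked example of label smoothing, the sketch below applies the formula Keras documents for categorical cross-entropy, y * (1 - smoothing) + smoothing / num_classes, to a one-hot target with the smoothing value of 0.2 used in the model. The helper `smooth_labels` is a hypothetical name for illustration.

```python
def smooth_labels(one_hot, label_smoothing=0.2):
    """Blend a one-hot target with a uniform distribution over the classes."""
    num_classes = len(one_hot)
    return [y * (1.0 - label_smoothing) + label_smoothing / num_classes
            for y in one_hot]

# a song labelled as genre index 4 ('rock') out of the ten genres
target = [0.0] * 10
target[4] = 1.0
smoothed = smooth_labels(target)
print([round(v, 2) for v in smoothed])
# [0.02, 0.02, 0.02, 0.02, 0.82, 0.02, 0.02, 0.02, 0.02, 0.02]
```

The network is therefore rewarded for putting about 82% of its probability on the labelled genre rather than 100%, leaving some mass for the other nine.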

We measure the accuracy and loss of the model using the following graphs and they help consider the overfitting/underfitting of the model to balance the bias-variance trade-off by tuning hyperparameters (train model loss should never go to 0, otherwise we are overfitting).

Model accuracy graph over 600 epochs.

Model loss graph over 600 epochs.

We then test the model and output the confusion matrix and accuracy score as with the KNN.

# Train the model with the train data
classifier.fit(X_train, y_train)
# Predict the model with the test data
y_pred = classifier.predict(X_test)

Our confusion matrix for the neural network. Output accuracy score: 0.48659003831417624. This is better than the KNN.

From the results we can see that the neural network performs better at classifying the test songs correctly. We now implement this method for our playlists.

Predicting Genres of Playlists

Using the neural network, we can predict the genres of our unseen playlists that we generated before.

# set up unlabelled dfs
playlist_df_ul = playlist_df.drop(columns=['pid'])
song_df_ul = song_df.drop(columns=['uri', 'genre'])
# neural network predictions
nn_classes = classifier.predict(playlist_df_ul.values)


Unsupervised Machine Learning: Finding the Most Relevant Songs

In this section, we use an unsupervised clustering method, a Gaussian Mixture Model (GMM), to find the songs closest in Euclidean distance, in a high-dimensional space, to a playlist among the songs of the genre it was classified as. These songs are hypothesized to be the likely “next best”. While we cannot visualise a high-dimensional space to represent all the features of a song at once, we can still think about the “similarity” of a playlist and song as just the Euclidean distance between the two in this space.
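
To illustrate the distance-based notion of similarity, here is a minimal sketch that ranks a handful of made-up songs by Euclidean distance to a playlist's feature vector. The function `nearest_songs`, the song names, and the three-feature vectors are all illustrative assumptions; the article's pipeline works with nine features.

```python
import math

def nearest_songs(playlist_vector, songs, top_n=2):
    """Return song names sorted by Euclidean distance to the playlist vector."""
    ranked = sorted(songs, key=lambda name: math.dist(songs[name], playlist_vector))
    return ranked[:top_n]

# illustrative, already-normalised features: (danceability, energy, valence)
songs = {
    'song_a': (0.80, 0.70, 0.60),
    'song_b': (0.20, 0.30, 0.10),
    'song_c': (0.75, 0.65, 0.70),
    'song_d': (0.50, 0.90, 0.40),
}
playlist_vector = (0.78, 0.68, 0.62)

print(nearest_songs(playlist_vector, songs))  # ['song_a', 'song_c']
```

A ranking like this could also be applied within a single GMM cluster to order its songs, rather than returning them in no particular order.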

We use a GMM because it can successfully find a probabilistic representation of a playlist in a range of clusters, each, in theory, containing similar songs from the specific genre the playlist has been classified to. The advantage of using a GMM instead of K-means clustering, both of which are generally easy-to-apply unsupervised models, is that our GMM can handle non-circular clusters of data, as we have specified using the “full” covariance type. The second advantage is that a GMM performs soft clustering, telling us the probabilities that a given playlist belongs to each of the possible clusters. This is useful for finding songs that are similar but outside of the cluster assigned (which may be necessary if the cluster a playlist is assigned to runs out of songs).

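Mathematically, in the standard GMM notation, we can write the probability that any given sample $x_i$ came from Gaussian $k$ in our GMM as

$$p(z_i = k \mid \theta) = \pi_k$$

where $z_i$ is the (unobserved) component that generated $x_i$ and $\theta$ represents the parameters of our Gaussians (mean $\mu_k$, covariance $\Sigma_k$, weight $\pi_k$).

Similarly, we can write the likelihood of observing a data point given that it came from Gaussian $k$ as

$$p(x_i \mid z_i = k, \theta) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

The representation of a normally distributed likelihood. Image by Author.

To take into account all possible Gaussians, we can use the sum rule to marginalise over the components, and then multiply across samples under the assumption that they are independent of one another (Maklin, 2019).

$$p(X \mid \theta) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

We use the log likelihood here because the logarithm of a product is the sum of the logarithms.

$$\ln p(X \mid \theta) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$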

In order to calculate the parameters of our Gaussians, we use the Expectation Maximisation algorithm, which helps us find the local maximum likelihood estimates of our parameters. To summarise this process, iteratively the EM algorithm performs an expectation (E) step, “which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step” (Wikipedia, 2020). See here for more details on GMMs.
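
The E step can be sketched for a single data point in one dimension: each component's weight times its density at the point, normalised so the results sum to one. These normalised values are the soft-cluster probabilities a fitted mixture model reports. The code is a pure-Python illustration under simplifying assumptions (one feature, two components, made-up parameters), not the scikit-learn implementation.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, stds):
    """E step for one point: P(component k | x) for each component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# two illustrative components over a single normalised feature
weights, means, stds = [0.5, 0.5], [0.25, 0.75], [0.125, 0.125]

print(responsibilities(0.5, weights, means, stds))  # [0.5, 0.5] by symmetry
print(responsibilities(0.3, weights, means, stds))  # heavily favours the first component
```

The M step would then re-estimate each component's weight, mean, and spread using these responsibilities as soft counts, and the two steps repeat until the log-likelihood stops improving.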

We write overarching functions to implement the neural network and then use the GMM to find the probability of a playlist belonging to a certain group of songs. The songs nearest in Euclidean distance in this high dimensional space represented by our different features are the most similar and thus the best to recommend next. We can even select a new playlist by URI on Spotify, calculate the mean features, and then recommend songs for that too!

from sklearn import mixture

def predict_song(playlist_index, uri_label, own_playlist):
    # if uri is provided
    if own_playlist:
        playlist = sp.playlist(uri_label)
        playlist_uris = [i['track']['uri'] for i in playlist['tracks']['items']]
        # note: these means come straight from the API and are not min-max scaled
        features = np.array(playlist_summarise(playlist_uris))
        print(f"Name of playlist: {playlist['name']}")
    # if querying playlist from dataset
    else:
        print(f"Name of playlist: {playlists[playlist_index]['name']}")
        features = playlist_df_ul.values[playlist_index]
        # track uris of this playlist, so we can avoid recommending songs already in it
        playlist_uris, pid = find_uris(playlists, start=playlist_index, SIZE=playlist_index + 1)
        playlist_uris = playlist_uris[0]
    playlist_prediction = classifier.predict(features.reshape(1, 9))
    print(f'The playlist is genre: {genres[playlist_prediction[0]]}')
    # generate songs of specific genre
    genre_songs = song_df.loc[song_df['genre'] == playlist_prediction[0]]
    genre_songs = genre_songs.drop(columns=['genre']).reset_index(drop=True)
    song_features = genre_songs.drop(columns=['uri']).values
    # fit a Gaussian Mixture Model to the songs of that genre
    clf = mixture.GaussianMixture(n_components=len(genre_songs) // n_requests,
                                  covariance_type='full', random_state=0).fit(song_features)
    # predict classes using GMM
    classes = clf.predict(song_features)
    # probability of the playlist belonging to each cluster
    cluster_probs = clf.predict_proba(features.reshape(1, -1))[0]
    max_index = np.argmax(cluster_probs)
    # take the songs in the most probable cluster
    songs_index = np.where(classes == max_index)[0]
    selected_songs_uris = genre_songs.loc[songs_index, 'uri'].values
    # remove overlapping songs
    selected_songs_uris = [uri for uri in selected_songs_uris if uri not in playlist_uris]
    print('The recommended songs, in no particular order, are:')
    # print at most 20 recommendations
    for uri in selected_songs_uris[:20]:
        track = sp.track(uri)
        print(f"{track['name']}, by {track['artists'][0]['name']}")
# using NN

Here we classify the playlist called pump as hip-hop, and then recommend the most similar hip-hop songs!

Pretty cool, right?

Concluding Thoughts

The most constraining factor on our predictions, I believe, is the assumption that the mean of all features in a playlist is an accurate representation of the genre of a playlist. Many playlists are not created as “genres” to begin with. For example, should the playlist called “pump” be hip-hop, pop, rock, or EDM? While a playlist might be the sum of its songs, the mean of all features can be skewed by outliers, and we simply aren’t using enough classes to get close to the true genre of a given playlist. As a future task and potential improvement, it may be worth taking the median of features instead, as this is less prone to being affected by anomalous songs in playlists. Songs are also diverse. A song can hardly be assigned to a single genre, and to cope with this Spotify now has over 5,000 genres (Davison, 2020). Hence, given that I only queried 10 genres for the songs used to train my models, it is likely that we classify playlists poorly. Also, the sample size used to train our models is relatively small. Perhaps we could make better predictions with a larger sample, although this would require more computational power.
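
A tiny worked example shows why the median may be the more robust summary. The danceability values below are made up: four similar songs plus one anomalous track.

```python
from statistics import mean, median

# four similar songs and one outlier (illustrative values)
danceability = [0.70, 0.72, 0.68, 0.71, 0.05]

print(round(mean(danceability), 3))  # 0.572
print(median(danceability))          # 0.7
```

The single outlier drags the mean well below every typical song in the playlist, while the median stays with the majority.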

Some improvements beyond testing the median as a better measure of a playlist’s features, could be to use a sentiment analysis approach on the name of the playlist as well. If I could rank the name “pump” on a scale of 0 to 1 in terms of low to high energy, for example, then we would have another feature to predict on. I could also change methods entirely. As seen in past challenges with Spotify datasets (Hamed Zamani, 2019), most high-performing teams use collaborative filtering where they “create an incomplete playlist-track matrix and use matrix factorization to learn a low-dimensional dense representation for each playlist and track. They learn similar representations for the tracks that often occur together in user-created playlists.”

Despite this, we have successfully run through the process of recommending songs for a playlist on Spotify using supervised and unsupervised machine learning methods. Congratulations on getting to the end of this tutorial!


Brownlee, J. (2016). Dropout Regularization in Deep Learning Models With Keras. Retrieved from Machine Learning Mastery:

Brownlee, J. (2019). A Gentle Introduction to the Rectified Linear Unit (ReLU). Retrieved from Machine Learning Mastery:

Davison, C. (2020). Spotify Users Are Noticing Something Very Strange About Their Top Genres. Retrieved from PureWow:


Hamed Zamani, M. S. (2019). An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation. Retrieved from ACM Digital Library:

Keras. (2020). Dropout layer. Retrieved from Keras:

Maklin, C. (2019). Gaussian Mixture Models Clustering Algorithm Explained. Retrieved from Medium:

Sean M. O’Brien, D. B. (n.d.). Bayesian Multivariate Logistic Regression. Retrieved from Duke Statistics:

Spotify. (2020). Explore. Retrieved from Spotify For Developers:

Spotify. (2020). Spotify Million Playlist Dataset Challenge. Retrieved from AIcrowd: Licensing: “The dataset and challenge will be available on an ongoing, open-ended basis, and allow for non-commercial, open research use. We hope that this re-release will enable further research and improvements in the field of music recommendation and automatic playlist continuation.”

Stan. (2020). Multi-Logit Regression. Retrieved from Stan User’s Guide:

Tingle, M. (2019). Retrieved from

Wikipedia. (2020). Expectation–maximization algorithm. Retrieved from Wikipedia:

Alexander Bricken
