Finding the Next Best Songs with Machine Learning
Music is an important component of our daily lives. We dance, sing, enjoy, and cry simply because of music. But what music is right for any given moment? Spotify tries to use song, playlist, and other data to predict this.
In this article, I seek to demonstrate the steps needed to generate a preliminary model for predicting the next best song for a music playlist on Spotify. Any playlist! Even your own.
The process I guide you through in this article consists of several steps. Firstly, we collect 1000 Spotify playlists whose genre we want to predict. Then, we select roughly 5000 songs from 10 popular genres on Spotify. Using the metafeatures (such as “key”, “acousticness”, and “danceability”) of these songs as independent variables, we train supervised machine learning models to predict genre, and then apply them to the mean metafeatures of each playlist. From there, we can use a Gaussian mixture model to place a variety of songs under each genre in a high dimensional space. Thereafter, if we query where a playlist (represented by its metafeatures) sits in that space, we can recommend to the user the songs closest to it in Euclidean distance within that genre.
What I hope you’ll learn about by reading this article is the following:
- How to use the Spotify API and pre-process data.
- Supervised methods such as K-nearest neighbours and neural networks.
- Unsupervised methods such as Gaussian Mixture Models.
- A novel way to think about recommendation systems as clustering in high dimensional spaces.
While this is only a preliminary model to get you started, the results are promising and provide unique insight into a variety of machine learning methods for predictive inference.
See the GitHub repo for code here!
Getting Started: Data Collection and Cleaning
Spotify API
Just as a DJ tunes music to appeal to the audience, Spotify uses vast amounts of data and machine learning algorithms to seamlessly play the “next best” song once a user finishes the playlist they are listening to. Typically, this song is predicted using features such as the name of the playlist, song traits, and the similarity of preferences across users with corresponding taste (via methods like collaborative filtering).
Machine learning needs data. While the Spotify API can provide information on any song we choose to query, we still need a list of playlists to predict the next best songs for. The quickest way to get started is to download some playlist data, so we use the Million Playlist Dataset hosted by AIcrowd (Spotify, 2020). This dataset was fortunately released by Spotify and is available for “non-commercial, open research use”. A million playlists, however, is a LOT of data. We instead take a small sample of 1000 playlists from this dataset to test our model.
Data Uploading
First, we can define some global variables for the Spotify API. I use the spotipy library to make things easier. Check out this article for setting up the API key (Tingle, 2019)!
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# define global variables
SIZE = 500 # this defines how many playlists we want
# for Spotify API
cid = 'INSERT CID'
secret = 'INSERT SECRET KEY'
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
Once our data is downloaded (as JSON files), we can unpack them and find the URIs of the songs contained within each playlist. We define functions here so that we can vary how many playlists are unpacked without re-running all cells.
import json

def unpack(json_name):
    '''
    unpack a json playlist file to obtain a playlist
    input: a file name
    return: list of all playlists in the file
    '''
    # open the JSON file and load it as a dictionary
    with open(json_name) as f:
        data = json.load(f)
    playlists = data['playlists']
    return playlists

playlists = unpack(json_name='playlists.json')
def find_uris(playlists, start=0, SIZE=SIZE):
    '''
    output the uri lists for a range of playlists
    input: list of playlists, and the range [start, SIZE) to unpack
    return: a list of track uri lists (one per playlist), and the playlist ids
    '''
    track_uris = [[i['track_uri'] for i in playlists[j]['tracks']] for j in range(start, SIZE)]
    pids = [playlists[i]['pid'] for i in range(start, SIZE)]
    return track_uris, pids

# check track_uris for the first SIZE playlists in the dataset
start = 0
uri_list, pids = find_uris(playlists, start, SIZE)
The result is a list of track URIs (the IDs of the songs) for each playlist, together with the associated playlist IDs (PIDs).
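To make the shape of that output concrete, here is a minimal sketch that runs the same list comprehensions on a tiny hand-made structure. The two playlists, their URIs, and the helper name toy_find_uris are invented for illustration; only the 'playlists'-style layout and the 'tracks', 'track_uri', and 'pid' keys mirror the fields used above.

```python
# Toy stand-in for the loaded JSON; the URIs and pids are invented.
toy_playlists = [
    {'pid': 0, 'tracks': [{'track_uri': 'spotify:track:aaa'},
                          {'track_uri': 'spotify:track:bbb'}]},
    {'pid': 1, 'tracks': [{'track_uri': 'spotify:track:ccc'}]},
]

def toy_find_uris(playlists, start, stop):
    # one inner list of track URIs per playlist in [start, stop)
    track_uris = [[t['track_uri'] for t in playlists[j]['tracks']]
                  for j in range(start, stop)]
    pids = [playlists[j]['pid'] for j in range(start, stop)]
    return track_uris, pids

toy_uris, toy_pids = toy_find_uris(toy_playlists, 0, 2)
print(toy_uris)
print(toy_pids)
```

Each inner list lines up with the playlist ID at the same position, which is what lets us attach the IDs back onto the dataframe later.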
Selecting Variables
The Spotify API provides a lot of information on a given playlist, such as its name, the number of followers it has, how many tracks it contains, whether or not it is collaborative, and the tracks within it all have a variety of features. There is a lot of data to collect!
To avoid overload when querying the API, we make sure to only query the features necessary for song prediction. In general, this means a variable that is measurable, independent, and is a float, integer, or binary value. It also means features that I believe will actually contribute to predicting the genres of playlists. A feature like “liveness”, a float that reflects the presence of a “background audience” and so tells us whether or not the track was performed live, I deemed unimportant for genre prediction and hence did not store. However, aspects of songs like “danceability”, “speechiness”, and “valence” were all extremely important. Overall, I recommend using 9 features as variables for representing songs and thus playlists (descriptions from the Spotify API):
- key (int): The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
- acousticness (float): A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- danceability (float): Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- energy (float): Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- instrumentalness (float): Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
- loudness (float): The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
- speechiness (float): Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
- valence (float): A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- tempo (float): The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Generating the Playlist DataFrame
What we do with these features is query the values for each song within each playlist. From there, we take the mean of the values to have a holistic feature score for each playlist. This, in essence, allows us to treat our playlist as a specific song that is a combination of all of the songs within the playlist.
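As a minimal sketch of this averaging step, the snippet below collapses three songs into one playlist-level feature vector. The feature values and the helper name playlist_means are made up for illustration; the real pipeline pulls the values from the Spotify API, as shown in the function that follows.

```python
from statistics import fmean

# Hypothetical feature values for three songs in one playlist.
songs = [
    {'danceability': 0.5, 'energy': 0.25, 'valence': 0.5},
    {'danceability': 0.75, 'energy': 0.5, 'valence': 0.25},
    {'danceability': 1.0, 'energy': 0.75, 'valence': 0.75},
]

def playlist_means(songs):
    # average every feature across the songs of the playlist
    return {name: fmean(song[name] for song in songs) for name in songs[0]}

means = playlist_means(songs)
print(means)
```

The resulting dictionary has the same keys as a single song, which is what allows a playlist to be fed into a model trained on songs.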
We have a couple of functions we use to do this. Firstly, calculating the means:
import numpy as np
from tqdm import tqdm

# the nine features we keep, in the order they are returned
FEATURE_NAMES = ['key', 'acousticness', 'danceability', 'energy',
                 'instrumentalness', 'loudness', 'speechiness',
                 'valence', 'tempo']

def playlist_summarise(playlist_uri):
    '''
    where we query playlist uris with spotify API
    input: list of uris for a given playlist
    return: the mean features of the given playlist
    '''
    # one row per track, one column per feature
    all_features = np.zeros((len(playlist_uri), len(FEATURE_NAMES)))
    # unpack each uri
    for i in tqdm(range(len(playlist_uri))):
        # query spotify api (returns a list with one dict per requested track)
        audio_features = sp.audio_features(playlist_uri[i])
        all_features[i] = [audio_features[0][name] for name in FEATURE_NAMES]
    # calculate and return all means, in the order of FEATURE_NAMES
    return list(np.mean(all_features, axis=0))
From there, we need a function to normalise the values in our dataframe. We do this so that the machine learning models give each feature comparable weight regardless of its original scale. This is done using the MinMaxScaler class from scikit-learn.
def normalize_df(df, col_names):
    x = df.values  # returns a numpy array
    min_max_scaler = MinMaxScaler()
    # rescale every column to the [0, 1] range
    x_scaled = min_max_scaler.fit_transform(x)
    df = pd.DataFrame(x_scaled, columns=col_names)
    return df
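For intuition, min-max scaling maps each column through (x - min) / (max - min). The pure-Python sketch below spells that out on made-up [loudness, tempo] rows; the helper names fit_min_max and apply_min_max are hypothetical. One caveat worth keeping in mind is that normalize_df fits a fresh scaler on whichever dataframe it receives, so playlists and songs end up scaled with different minima and maxima. Reusing the song ranges for the playlists, as the sketch does, is one possible refinement and an assumption here, not what the notebook does.

```python
def fit_min_max(rows):
    # column-wise minimum and maximum of a list of equal-length rows
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    # (x - min) / (max - min); constant columns map to 0.0
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]

# Hypothetical [loudness, tempo] values
song_rows = [[-60.0, 60.0], [-30.0, 120.0], [0.0, 180.0]]
playlist_rows = [[-15.0, 90.0]]

# fit the ranges on the songs, then reuse them for the playlists
mins, maxs = fit_min_max(song_rows)
scaled_songs = apply_min_max(song_rows, mins, maxs)
scaled_playlists = apply_min_max(playlist_rows, mins, maxs)
print(scaled_songs)
print(scaled_playlists)
```

With shared ranges, a scaled playlist value of 0.75 means the same thing as a scaled song value of 0.75, which matters once the two are compared by distance.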
Finally, a function that calls the functions above to generate the pandas dataframe of all playlists we have calculated the mean features of:
def create_playlist_dataframe(playlists):
    '''
    summary function to allow ease of playlist transformation into a df
    input: SIZE list of playlists
    output: dataframe with all mean playlist features
    '''
    # find uris and playlist ids
    uri_list, pids = find_uris(playlists, start, SIZE)
    # set up dataframe
    col_names = ['pid', 'key', 'acousticness', 'danceability', 'energy', 'instrumentalness', 'loudness', 'speechiness', 'valence', 'tempo']
    df = pd.DataFrame(columns=col_names) # generate empty df
    # iterate through and get features for each playlist
    for i in range(len(uri_list)):
        features = playlist_summarise(uri_list[i])
        features.insert(0, 0)  # placeholder for the pid column
        df.loc[i] = features
    df = normalize_df(df, col_names)
    # insert ids
    df['pid'] = pids
    return df

playlist_df = create_playlist_dataframe(playlists)
playlist_df
The output of this function is something like the following:
Because we do not use our playlist dataframe for the supervised training of our models, we do not need a large number of playlists. In essence, the playlists act as our unseen test dataset: the trained model labels each one with a genre depending on its features.
Generating the Song DataFrame
In order to recommend the next best song, a variety of songs must be chosen and their features and genre known. This is so that we can train our machine learning model on the songs, and then use that model to classify our playlists as a particular genre (the one that corresponds most with the mean of the playlist’s features).
For generating the song dataframe, we use the Spotify API to query the top songs from 10 different genres. I selected a variety of the most popular genres to get a diverse range of features and to increase the pool of songs I could sample from. The number 10 was chosen because it provides a good spread of song types without overcomplicating the classification problem. The genres selected are: ‘pop’, ‘hip-hop’, ‘edm’, ‘latin’, ‘rock’, ‘r-n-b’, ‘country’, ‘jazz’, ‘classical’, and ‘alternative’.
# convert genre to label encoded
genres = ['pop', 'hip-hop', 'edm', 'latin', 'rock',
'r-n-b', 'country', 'jazz', 'classical',
'alternative']
# number of songs to query per genre
song_num = 100
n_requests = 20
# we generate using genre seeds
pop_uris = []
hip_hop_uris = []
edm_uris = []
latin_uris = []
rock_uris = []
randb_uris = []
country_uris = []
jazz_uris = []
classical_uris = []
alternative_uris = []
# query spotify api for each genre n_requests times
# this method bypasses the 100 limit on song queries
for _ in range(n_requests):
    pop_recs = sp.recommendations(seed_genres=['pop'], limit=song_num)
    pop_uris += [t['uri'] for t in pop_recs['tracks']]
    hip_hop_recs = sp.recommendations(seed_genres=['hip-hop'], limit=song_num)
    hip_hop_uris += [t['uri'] for t in hip_hop_recs['tracks']]
    edm_recs = sp.recommendations(seed_genres=['edm'], limit=song_num)
    edm_uris += [t['uri'] for t in edm_recs['tracks']]
    latin_recs = sp.recommendations(seed_genres=['latin'], limit=song_num)
    latin_uris += [t['uri'] for t in latin_recs['tracks']]
    rock_recs = sp.recommendations(seed_genres=['rock'], limit=song_num)
    rock_uris += [t['uri'] for t in rock_recs['tracks']]
    randb_recs = sp.recommendations(seed_genres=['r-n-b'], limit=song_num)
    randb_uris += [t['uri'] for t in randb_recs['tracks']]
    country_recs = sp.recommendations(seed_genres=['country'], limit=song_num)
    country_uris += [t['uri'] for t in country_recs['tracks']]
    jazz_recs = sp.recommendations(seed_genres=['jazz'], limit=song_num)
    jazz_uris += [t['uri'] for t in jazz_recs['tracks']]
    classical_recs = sp.recommendations(seed_genres=['classical'], limit=song_num)
    classical_uris += [t['uri'] for t in classical_recs['tracks']]
    alternative_recs = sp.recommendations(seed_genres=['alternative'], limit=song_num)
    alternative_uris += [t['uri'] for t in alternative_recs['tracks']]
# turn into sets to remove duplicates
pop_uris = list(set(pop_uris))
hip_hop_uris = list(set(hip_hop_uris))
edm_uris = list(set(edm_uris))
latin_uris = list(set(latin_uris))
rock_uris = list(set(rock_uris))
randb_uris = list(set(randb_uris))
country_uris = list(set(country_uris))
jazz_uris = list(set(jazz_uris))
classical_uris = list(set(classical_uris))
alternative_uris = list(set(alternative_uris))
A single recommendations request returns at most 100 songs, so to gather more we query the Spotify API in multiple batches. The problem with this, however, is that the same song can be sampled more than once. Hence, we filter out duplicates using the set() function. In the code contained in the notebook, we make 20,000 song requests, and only 5,201 of those songs remained once duplicates were removed. Once we gather all of the URIs for the songs in each genre we want to query, we can compile them and put them into a dataframe, with each row containing a song's features and its labelled genre.
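The batching-and-deduplication idea can be sketched without touching the API. In the snippet below, fake_recommendations is a hypothetical stand-in that samples from a small invented catalogue (it is not a Spotify or spotipy function), which makes repeats across batches likely and shows why far fewer unique songs come back than were requested.

```python
import random

def fake_recommendations(genre, limit, rng):
    # Stand-in for an API call: draws `limit` track ids from a small
    # made-up catalogue, so repeats across calls are likely.
    catalogue = [f'{genre}-track-{n}' for n in range(300)]
    return rng.sample(catalogue, limit)

def collect_unique(genre, n_requests, limit, seed=0):
    rng = random.Random(seed)
    seen = set()
    requested = 0
    for _ in range(n_requests):
        batch = fake_recommendations(genre, limit, rng)
        requested += len(batch)
        seen.update(batch)  # the set silently drops repeats
    return sorted(seen), requested

unique_tracks, requested = collect_unique('pop', n_requests=20, limit=100)
print(requested, len(unique_tracks))
```

The gap between the two printed numbers plays the same role as the gap between 20,000 requests and 5,201 unique songs in the notebook.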
# compile uris
song_uris = (pop_uris + hip_hop_uris + edm_uris + latin_uris + rock_uris + randb_uris + country_uris + jazz_uris + classical_uris + alternative_uris)
# create a list for labels
genre_list = ((['pop'] * len(pop_uris)) + (['hip-hop'] * len(hip_hop_uris)) +
(['edm'] * len(edm_uris)) + (['latin'] * len(latin_uris)) +
(['rock'] * len(rock_uris)) + (['r-n-b'] * len(randb_uris)) +
(['country'] * len(country_uris)) + (['jazz'] * len(jazz_uris)) +
(['classical'] * len(classical_uris)) + (['alternative'] * len(alternative_uris)))
def create_song_dataframe(song_uris):
    '''
    combine all song URIS into a df
    input: song uris
    output: dataframe with all song features
    '''
    # set up dataframe
    feature_names = ['key', 'acousticness', 'danceability', 'energy', 'instrumentalness', 'loudness', 'speechiness', 'valence', 'tempo']
    col_names = ['uri', 'genre'] + feature_names
    df = pd.DataFrame(columns=col_names) # generate empty df
    # iterate through and get features for each song
    for i in tqdm(range(len(song_uris))):
        # get song features
        audio_features = sp.audio_features(song_uris[i])
        features = [audio_features[0][name] for name in feature_names]
        # placeholders for the uri and genre columns
        features.insert(0, 0)
        features.insert(0, 0)
        df.loc[i] = features
    df = normalize_df(df, col_names)
    # insert uris and genres (genre_list is the global list built above)
    df['uri'] = song_uris
    df['genre'] = genre_list
    return df

song_df = create_song_dataframe(song_uris)
We can label encode these genres for use in machine learning models as well:
# label encode
for i in range(len(genres)):
    song_df['genre'] = np.where(song_df['genre'] == genres[i], i, song_df['genre'])
song_df
From there we get a dataframe such as the following:
Supervised Machine Learning: Classifying Playlists
In this section, we go through the process of predicting the genre of a playlist using supervised learning methods. Here I demonstrate the results of K-Nearest-Neighbours (KNN) and a neural network. In the notebook I also test a Bayesian logistic regression approach using PyStan.
Now that the data is in a usable format, we build machine learning models to train on the genre labels of our song dataframe and predict the genre labels of our playlist dataframe. We can also measure the accuracy of our models by using a train-test-split approach.
# set up unlabelled song_df (but with index for reference)
X = song_df.drop(columns=['uri', 'genre']).values
y = song_df[['genre']].values.ravel()
# train test split for model testing
X_train, X_test, y_train, y_test = train_test_split(X, list(y), test_size=0.05, random_state=2)
K-Nearest-Neighbours
The KNN model provides a strong baseline for our future models. The assumption of KNN is that similar data points exist in close proximity in a space. This falls perfectly in line with my hypothesis. The value of K is the number of nearest labelled samples the algorithm consults for each new point. The most frequent label (genre) among those K neighbours becomes the label of the new point. Hence, to predict using KNN we simply place a playlist in the same space as the labelled songs, and whichever genre dominates among its K nearest songs is the label it receives.
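The voting rule is short enough to write out by hand. The following is a from-scratch sketch on made-up two-feature points, not the scikit-learn implementation used below; knn_predict and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    # rank the training points by Euclidean distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # majority vote among the k nearest neighbours
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (danceability, acousticness) pairs
train_points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
                (0.2, 0.9), (0.3, 0.8), (0.25, 0.95)]
train_labels = ['edm', 'edm', 'edm', 'classical', 'classical', 'classical']

prediction = knn_predict(train_points, train_labels, query=(0.7, 0.3), k=3)
print(prediction)
```

Here the query sits next to the three danceable, non-acoustic points, so all three votes go to the same genre.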
Warning: we have to be careful because KNN can suffer from the curse of dimensionality, which occurs when considering too many features. However, in this case I only have 9, which means the model likely doesn’t suffer from the curse of dimensionality.
For finding the best value of K, we build an algorithm that iterates through values of K ranging from 1 to 50, and uses 10-Fold cross-validation to find the average accuracy for each K. By doing so, we can find the best value of K and validate that value using our cross-validation, such that we are not overfitting to a single training dataset.
# increment k from 1 to 50 and save the cross-validated accuracy to find best k
k_range = range(1, 51)
scores_list = []
# test across values of k
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    kf = KFold(n_splits=10, shuffle=True)
    score_acc_list = []
    # implement k folding (10)
    for train_index, val_index in kf.split(X):
        # fold-specific names, so the hold-out split made earlier is not overwritten
        X_fold_train, X_fold_val = list(X[train_index]), list(X[val_index])
        y_fold_train, y_fold_val = list(y[train_index]), list(y[val_index])
        knn.fit(X_fold_train, y_fold_train)
        y_fold_pred = knn.predict(X_fold_val)
        score_acc_list.append(accuracy_score(y_fold_val, y_fold_pred))
    scores_list.append(np.mean(score_acc_list))
The following is a plot of the results:
We then select the optimal value of K, fit our model using that value, and then predict on our X_test.
# k somewhere near 40 is best
knn = KNeighborsClassifier(n_neighbors=40)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
We can test our predictions using a confusion matrix and outputting the accuracy score using the actual labels of our predictions.
## using sklearn functions
#Create the confusion matrix using test data and predictions
cm = confusion_matrix(y_test, y_pred)
#plot the confusion matrix
plt.figure(figsize=(14, 12))
ax = plt.subplot()
sns.heatmap(cm,annot=True,ax=ax)
labels = song_df['genre'].tolist()
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(genres)
ax.yaxis.set_ticklabels(genres)
plt.show()
#Show the accuracy score
print("Accuracy Score", accuracy_score(y_test, y_pred))
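To read the heatmap, it helps to see how the underlying counts are built. The sketch below is a hand-rolled version on toy labels, using the same convention as the plot above (rows are true labels, columns are predicted labels); the function names and the toy data are illustrative only.

```python
def confusion(y_true, y_pred, n_classes):
    # rows are true labels, columns are predicted labels
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    # correct predictions sit on the diagonal
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]
cm_toy = confusion(y_true_toy, y_pred_toy, n_classes=3)
print(cm_toy)
print(accuracy(cm_toy))
```

Off-diagonal cells show which genres get mistaken for which, something a single accuracy number hides.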
Neural Network
We can do the same process as above but with a neural network and test our accuracy to see which model we should use.
We build an 8-layer neural network with maximum width of 96 neurons. The goal for implementing this neural network is to achieve a higher classification accuracy than my KNN model. Using Keras features, we implement some unique tuning to optimize the classification accuracy of the neural network.
# using keras create NN
def classification_model():
    # Create the model
    model = Sequential()
    # Add a first layer with 12 nodes, input of 9 dim with relu function
    model.add(Dense(12, input_dim=9, activation='relu', name='Dense_1'))
    model.add(Dropout(0.1, input_shape=(12,), name='Dropout_1'))
    # Add another layer
    model.add(Dense(24, input_dim=12, activation='relu', name='Dense_2'))
    # dropout layers let us prevent overfitting
    model.add(Dropout(0.1, input_shape=(24,), name='Dropout_2'))
    # Add another layer
    model.add(Dense(48, input_dim=24, activation='relu', name='Dense_3'))
    # add a tanh layer before the output, useful if I want to output embeddings
    model.add(Dense(96, input_dim=48, activation='tanh', name='Dense_4'))
    # softmax output with one unit per genre
    model.add(Dense(10, input_dim=96, activation='softmax', name='Output_Layer'))
    # Compile the model using categorical cross-entropy loss and the Adam optimizer
    # with a set learning rate; accuracy is the metric displayed
    opt = Adam(learning_rate=0.02)
    loss = CategoricalCrossentropy(label_smoothing=0.2)
    model.compile(loss=loss, optimizer=opt, metrics=['accuracy'])
    return model

# define model
classifier = KerasClassifier(build_fn=classification_model, epochs=3000, batch_size=300, verbose=0)
# implement early stopping so we do not run every epoch, which may lead to overfitting
es = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=400)
history = classifier.fit(X_train, y_train, validation_split=0.05, callbacks=[es])
Firstly, we use ‘Dropout’ layers (Keras, 2020) to prevent overfitting. This applies to the training process of my neural network and randomly sets input units to 0 with a frequency of 0.1 at each step during training time. This acts as a form of regularization to temporarily remove neurons from the forward pass and not update weights on the back propagation, making the model less sensitive to specific neuron weights and more generalizable (Brownlee, Dropout Regularization in Deep Learning Models With Keras, 2016). The ReLU (Rectified Linear Unit) activation layers act as the default neurons in my neural network. The function “is a piecewise linear function that will output the input directly if it is positive, otherwise, it will output zero” (Brownlee, A Gentle Introduction to the Rectified Linear Unit (ReLU), 2019).
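Both ideas fit in a few lines of plain Python. The sketch below is for intuition only and is not how Keras implements its layers internally; the function names are hypothetical. It does follow the behaviour described in the Keras documentation, where units that survive dropout are scaled up by 1 / (1 - rate) so the expected sum of the layer is unchanged.

```python
import random

def relu(x):
    # pass positive inputs through unchanged, clamp the rest to zero
    return x if x > 0 else 0.0

def dropout(values, rate, rng):
    # zero each unit with probability `rate`; scale survivors by 1 / (1 - rate)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

activations = [relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(activations)

rng = random.Random(0)
dropped = dropout([1.0] * 10000, rate=0.1, rng=rng)
# fraction of units that were zeroed, which should sit close to the rate
print(sum(1 for v in dropped if v == 0.0) / len(dropped))
```

Dropout is only active during training; at prediction time every unit is kept.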
For concision purposes, I will not go into the specific details for the rest of the model other than briefly mentioning the other features implemented. A learning rate for the Adam optimizer allows control over how quickly the model is adapted to the problem. Label smoothing allows us to make the model less overconfident in its predictions. This regularization method allows the model to not “overclassify” a playlist — but rather restrains the largest logit from becoming much bigger than the rest. This allows the model to think about different genres and a combination of genres, rather than being overconfident towards one. Finally, an early stopping method prevents the model overfitting by running too many epochs.
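Label smoothing is easy to see numerically. The sketch below blends a one-hot target with a uniform distribution, using the same smoothing value of 0.2 and 10 classes as the model above, and compares the loss of an extremely confident prediction with a more moderate one. The two prediction vectors are made up, and the helpers are a hand-written illustration rather than the Keras loss itself.

```python
import math

def smooth_labels(one_hot, smoothing):
    # blend the one-hot target with a uniform distribution over classes
    k = len(one_hot)
    return [y * (1.0 - smoothing) + smoothing / k for y in one_hot]

def cross_entropy(target, predicted):
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

one_hot = [0.0] * 10
one_hot[3] = 1.0
smoothed = smooth_labels(one_hot, smoothing=0.2)
print(smoothed[3], smoothed[0])

# a very confident prediction versus a more moderate one (made-up numbers)
confident = [0.001] * 10
confident[3] = 0.991
moderate = [0.02] * 10
moderate[3] = 0.82
print(cross_entropy(smoothed, confident), cross_entropy(smoothed, moderate))
```

Against the smoothed target, the moderate prediction earns the lower loss, which is the mechanism that discourages the network from pushing one genre towards certainty.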
We track the accuracy and loss of the model over the training epochs using the following graphs. They help us diagnose overfitting and underfitting, so that we can balance the bias-variance trade-off by tuning hyperparameters (if the training loss heads towards 0 while the validation loss stalls or rises, we are overfitting).
We then test the model and output the confusion matrix and accuracy score as with the KNN.
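# The classifier was already trained above with early stopping, so we do not refit it here;
# calling fit again would rebuild the model and retrain it without the callback
# Predict with the trained model on the test data
y_pred = classifier.predict(X_test)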
From the results we can see that the neural network performs better at classifying the test songs correctly. We now implement this method for our playlists.
Predicting Genres of Playlists
Using the neural network, we can predict the genres of our unseen playlists that we generated before.
# set up unlabelled dfs
playlist_df_ul = playlist_df.drop(columns=['pid'])
song_df_ul = song_df.drop(columns=['uri', 'genre'])
# neural network predictions
nn_classes = classifier.predict(playlist_df_ul.values)
print(nn_classes)
Unsupervised Machine Learning: Finding the Most Relevant Songs
In this section, we use an unsupervised clustering method, a Gaussian Mixture Model (GMM), to find the songs with the closest Euclidean distance in a high dimensional space to a playlist, restricted to songs of the genre the playlist was classified as. These songs are hypothesized to be the likely “next best”. While we cannot visualise a high dimensional space that represents all the features of a song at once, we can still think about the “similarity” of a playlist and a song as just the Euclidean distance between the two in this space.
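As a minimal sketch of that notion of similarity, the snippet below ranks a handful of songs by their Euclidean distance to a playlist vector. The three-feature vectors and song names are invented, and nearest_songs is a hypothetical helper; the pipeline later in this section groups songs with a GMM rather than ranking them directly like this.

```python
import math

def nearest_songs(playlist_vector, songs, top_n):
    # songs: list of (name, feature_vector); smaller distance = more similar
    ranked = sorted(songs, key=lambda s: math.dist(playlist_vector, s[1]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical normalised (danceability, energy, valence) vectors
songs = [
    ('song_a', (0.9, 0.8, 0.7)),
    ('song_b', (0.2, 0.1, 0.3)),
    ('song_c', (0.8, 0.95, 0.6)),
    ('song_d', (0.5, 0.5, 0.5)),
]
playlist_vector = (0.85, 0.85, 0.65)

closest = nearest_songs(playlist_vector, songs, top_n=2)
print(closest)
```

The same idea carries over unchanged to the nine features used in this article; only the length of the vectors grows.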
We use a GMM because it gives a probabilistic representation of where a playlist sits among a set of clusters, each, in theory, containing similar songs from the specific genre the playlist has been classified as. The first advantage of using a GMM instead of K-means clustering, both of which are generally easy-to-apply unsupervised models, is that a GMM can handle non-circular clusters of data, which we allow by specifying the “full” covariance type. The second advantage is that a GMM performs soft clustering, telling us the probabilities that a given playlist belongs to each of the possible clusters. This is useful for finding songs that are similar but outside of the assigned cluster (which may be necessary if the songs in the cluster a playlist is assigned to run out).
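Mathematically, we can write the probability that any given sample $x_i$ came from Gaussian $k$ in our GMM as

$$p(z_i = k \mid \theta) = \pi_k,$$

where $z_i$ is the unobserved component that generated $x_i$, $\pi_k$ is the mixing weight of Gaussian $k$, and $\theta$ collects all of the model parameters. Similarly, we can write the likelihood of observing a data point given that it came from Gaussian $k$, with mean $\mu_k$ and covariance $\Sigma_k$, as

$$p(x_i \mid z_i = k, \theta) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k).$$

To take into account all possible Gaussians, we can simply use the sum rule and marginalise over $k$. Under the assumption that the samples are independent of one another, the likelihood of the whole dataset is then the product over samples (Maklin, 2019):

$$p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k), \qquad p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta).$$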
In order to calculate the parameters of our Gaussians, we use the Expectation Maximisation algorithm, which helps us find the local maximum likelihood estimates of our parameters. To summarise this process, iteratively the EM algorithm performs an expectation (E) step, “which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step” (Wikipedia, 2020). See here for more details on GMMs.
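To make the two steps concrete, here is a from-scratch sketch of EM for a one-dimensional mixture of two Gaussians. It is a simplification for intuition: the data are made up, the starting parameters are arbitrary, each component has a scalar variance rather than the full covariance matrix used later, and the small variance floor is an added safeguard rather than part of the textbook algorithm.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, weights, means, variances):
    # responsibility of each component k for each point x
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp):
    # re-estimate weights, means, and variances from the responsibilities
    weights, means, variances = [], [], []
    for k in range(len(resp[0])):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        weights.append(nk / len(data))
        means.append(mean)
        variances.append(max(var, 1e-6))  # floor avoids a collapsed component
    return weights, means, variances

def log_likelihood(data, weights, means, variances):
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

# Made-up 1-D data with two visible groups
data = [0.1, 0.15, 0.2, 0.25, 0.8, 0.85, 0.9, 0.95]
params = ([0.5, 0.5], [0.3, 0.7], [0.05, 0.05])

history = [log_likelihood(data, *params)]
for _ in range(10):
    resp = e_step(data, *params)
    params = m_step(data, resp)
    history.append(log_likelihood(data, *params))

print(params[1])                 # estimated means
print(history[0], history[-1])   # log-likelihood before and after
```

The responsibilities computed in the E step are the same kind of soft assignment that predict_proba returns in the scikit-learn model used below.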
We write overarching functions to implement the neural network and then use the GMM to find the probability of a playlist belonging to a certain group of songs. The songs nearest in Euclidean distance in this high dimensional space represented by our different features are the most similar and thus the best to recommend next. We can even select a new playlist by URI on Spotify, calculate the mean features, and then recommend songs for that too!
from sklearn import mixture

def predict_song(playlist_index, uri_label, own_playlist):
    if own_playlist:
        # a playlist uri is provided: query it directly from Spotify
        playlist_info = sp.playlist(uri_label)
        playlist_uris = [i['track']['uri'] for i in playlist_info['tracks']['items']]
        # note: these are raw feature means; the models were trained on
        # min-max scaled features, so the values may need the same scaling
        features = np.array(playlist_summarise(playlist_uris))
        print(f"Name of playlist: {playlist_info['name']}")
    else:
        # querying a playlist from the dataset
        playlist_uris, _ = find_uris(playlists, start=playlist_index, SIZE=playlist_index + 1)
        playlist_uris = playlist_uris[0]
        features = playlist_df_ul.values[playlist_index]
        print(f"Name of playlist: {playlists[playlist_index]['name']}")
    # classify the playlist with the neural network
    playlist_prediction = classifier.predict(features.reshape(1, 9))
    print(f'The playlist is genre: {genres[playlist_prediction[0]]}')
    # keep only the songs of the predicted genre
    genre_songs = song_df.loc[song_df['genre'] == playlist_prediction[0]]
    genre_songs = genre_songs.drop(columns=['genre']).reset_index(drop=True)
    song_features = genre_songs.drop(columns=['uri']).values
    # fit a Gaussian Mixture Model on the songs of that genre
    n_components = max(1, len(genre_songs) // n_requests)
    clf = mixture.GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)
    clf.fit(song_features)
    # predict the cluster of every song using the GMM
    classes = clf.predict(song_features)
    # find the cluster the playlist most probably belongs to
    cluster_probs = clf.predict_proba(features.reshape(1, -1))[0]
    max_index = int(np.argmax(cluster_probs))
    # take the songs in that cluster
    selected_songs = genre_songs.loc[classes == max_index]
    # make sure songs aren't already in the playlist
    selected_songs_uris = [uri for uri in selected_songs['uri'].values
                           if uri not in playlist_uris]
    print('\n')
    print('The recommended songs, in no particular order, are:')
    # recommend at most 20 songs
    for uri in selected_songs_uris[:20]:
        track = sp.track(uri)
        print(f"{track['name']}, by {track['artists'][0]['name']}")
    return

# using NN
predict_song(playlist_index=0,
             uri_label='',
             own_playlist=False)
Pretty cool, right?
Concluding Thoughts
The most constraining factor on our predictions, I believe, is the assumption that the mean of all features in a playlist is an accurate representation of the genre of a playlist. Many playlists are not created as “genres” to begin with. For example, should the playlist called “pump” be hip-hop, pop, rock, or EDM? While a playlist might be the sum of its songs, the mean of its features can be skewed by outliers, and we simply aren’t using enough classes to get close to the true genre of a given playlist. As a future task and potential improvement, it may be worth taking the median of features instead, as this is less prone to being affected by anomalous songs in playlists. Songs are also diverse: a song can hardly be assigned to a single genre, and to cope with this, Spotify now has over 5,000 genres (Davison, 2020). Hence, given that I only queried 10 genres for the songs used to train my models, it is likely that we classify many playlists poorly. The sample used to train our models is also relatively small; a larger sample might give better predictions, although it would require more computational power.
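The difference between the two summaries shows up even on a tiny example. The energy values below are made up: four calm songs and one loud outlier.

```python
from statistics import mean, median

# Hypothetical "energy" values: a calm playlist with one loud outlier.
energy = [0.20, 0.22, 0.25, 0.21, 0.95]

print(mean(energy))    # pulled upwards by the outlier
print(median(energy))  # stays with the bulk of the playlist
```

Swapping np.mean for np.median in playlist_summarise would be one way to try this, though whether it improves the genre predictions is untested here.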
Some improvements beyond testing the median as a better measure of a playlist’s features, could be to use a sentiment analysis approach on the name of the playlist as well. If I could rank the name “pump” on a scale of 0 to 1 in terms of low to high energy, for example, then we would have another feature to predict on. I could also change methods entirely. As seen in past challenges with Spotify datasets (Hamed Zamani, 2019), most high-performing teams use collaborative filtering where they “create an incomplete playlist-track matrix and use matrix factorization to learn a low-dimensional dense representation for each playlist and track. They learn similar representations for the tracks that often occur together in user-created playlists.”
Despite this, we have successfully run through the process of recommending songs for a playlist on Spotify using supervised and unsupervised machine learning methods. Congratulations on getting to the end of this tutorial!
References
Brownlee, J. (2016). Dropout Regularization in Deep Learning Models With Keras. Retrieved from Machine Learning Mastery: https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
Brownlee, J. (2019). A Gentle Introduction to the Rectified Linear Unit (ReLU). Retrieved from Machine Learning Mastery: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
Davison, C. (2020). Spotify Users Are Noticing Something Very Strange About Their Top Genres. Retrieved from PureWow: https://www.purewow.com/entertainment/spotify-wrapped-genres
Gelman, A. J.-S. (2008). A WEAKLY INFORMATIVE DEFAULT PRIOR DISTRIBUTION FOR LOGISTIC AND OTHER REGRESSION MODELS. Retrieved from ArXiv: https://arxiv.org/pdf/0901.4011.pdf
Hamed Zamani, M. S. (2019). An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation. Retrieved from ACM Digital Library: https://dl.acm.org/doi/abs/10.1145/3344257
Keras. (2020). Dropout layer. Retrieved from keras.io: https://keras.io/api/layers/regularization_layers/dropout/
Maklin, C. (2019). Gaussian Mixture Models Clustering Algorithm Explained. Retrieved from Medium: https://towardsdatascience.com/gaussian-mixture-models-d13a5e915c8e#:~:text=Gaussian%20mixture%20models%20can%20be,of%20the%2%200bell%20shape%20curve
Sean M. O’Brien, D. B. (n.d.). Bayesian Multivariate Logistic Regression. Retrieved from Duke Statistics: http://www2.stat.duke.edu/courses/Fall03/sta216/lecture10.pdf
Spotify. (2020). Explore. Retrieved from Spotify For Developers: https://developer.spotify.com/
Spotify. (2020). Spotify Million Playlist Dataset Challenge. Retrieved from AIcrowd: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge Licensing: “The dataset and challenge will be available on an ongoing, open-ended basis, and allow for non-commercial, open research use. We hope that this re-release will enable further research and improvements in the field of music recommendation and automatic playlist continuation.”
Stan. (2020). Multi-Logit Regression. Retrieved from Stan User’s Guide: https://mc-stan.org/docs/2_25/stan-users-guide/multi-logit-section.html
Tingle, M. (2019). Retrieved from https://medium.com/@maxtingle/getting-started-with-spotifys-api-spotipy-197c3dc6353b
Wikipedia. (2020). Expectation–maximization algorithm. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm