Igor Kostiuk | 2016
Tags: #music, #recommender_systems, #deep_learning, #neural_networks, #mel_spectrograms
How to train your music recommender system
Recommender systems are a family of methods that seek to predict the rating or preference that a user would give to an item (Wikipedia).
Is there something similar to something else?
There are two common ways to make recommendations.
Collaborative filtering
- cold-start problem (requires a large amount of information about a user to make accurate recommendations)
- will not recommend rare or new songs, games, etc. (popular items are much easier to recommend than unpopular ones)
- poor scalability
+ content-agnostic
Example: Last.fm recommends music based on a comparison of the listening habits of similar users.
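A minimal sketch of user-based collaborative filtering on a play-count matrix. It assumes cosine similarity between users' play-count vectors and a similarity-weighted sum over the k nearest neighbours. This shows the general idea only and is not Last.fm's actual algorithm; the function and parameter names are hypothetical.

```python
import numpy as np

def recommend_user_based(plays, user, k=2, top_n=3):
    """Score unseen songs for `user` from the play counts of the k most similar users.

    plays: (n_users, n_songs) array of play counts; 0 means "never played".
    """
    # Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(plays, axis=1)
    norms[norms == 0] = 1.0                 # avoid division by zero for empty rows
    sims = plays @ plays[user] / (norms * norms[user])
    sims[user] = -np.inf                    # never pick the user as their own neighbour
    neighbours = np.argsort(sims)[::-1][:k]
    # Similarity-weighted sum of the neighbours' play counts.
    scores = sims[neighbours] @ plays[neighbours]
    scores[plays[user] > 0] = -np.inf       # only recommend songs the user has not played
    return np.argsort(scores)[::-1][:top_n]
```

With `plays = [[5, 3, 0, 0], [4, 3, 2, 0], [0, 0, 4, 5]]`, user 0's nearest neighbour is user 1, so song 2 is recommended. A brand-new user with an all-zero row gets zero similarity to everyone and therefore no meaningful scores, which is the cold-start problem in miniature.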
Content-based filtering
- can only make recommendations that are similar to the original seed
- semantic gap between the audio or video signal and the aspects of music or movies that affect user preferences (genre, mood)
- obvious recommendations (Doom → Doom 4, etc.)
There is nothing more similar to a tea kettle than another tea kettle.
Approaches
1. Automatic generation of social tags
Social tags are user-generated keywords associated with songs. Predicting these social tags directly from MP3 files avoids the cold-start problem. Using a set of one-vs-all classifiers, one per tag, we can map audio features onto social tags collected from the Web.
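The one-vs-all idea can be sketched with one independent logistic-regression classifier per tag, trained by batch gradient descent in NumPy. The audio features, learning rate and epoch count are placeholders; the source does not say which classifier or features were used. Because each column of `W` only sees its own tag's labels, updating all columns in one matrix step is equivalent to training the per-tag classifiers separately.

```python
import numpy as np

def train_one_vs_all(features, tags, lr=0.1, epochs=500):
    """Train one logistic-regression classifier per tag.

    features: (n_songs, n_features) audio features (hypothetical, e.g. averaged MFCCs)
    tags:     (n_songs, n_tags) binary matrix; tags[s, t] = 1 if song s carries tag t
    Returns a (n_features + 1, n_tags) weight matrix, one column per tag.
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append a bias column
    W = np.zeros((X.shape[1], tags.shape[1]))
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-X @ W))          # sigmoid output, one column per tag
        grad = X.T @ (probs - tags) / X.shape[0]      # log-loss gradient for every tag at once
        W -= lr * grad
    return W

def predict_tags(W, features, threshold=0.5):
    """Boolean (n_songs, n_tags) matrix of predicted tags."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-X @ W)) >= threshold
```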
2. Music genre classification
Attempt to classify songs into a set of genre classes. Clustering: each cluster represents a specific genre. Each cluster is labelled by "majority vote", i.e. with the genre that is most common among its songs.
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
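The clustering-plus-majority-vote scheme can be sketched with plain k-means in NumPy. The per-song feature vectors (for example averaged MFCCs) and the known genre labels are assumptions for illustration; the original work may have used a different clustering setup. A new song is then classified by its nearest centroid and inherits that cluster's label.

```python
import numpy as np
from collections import Counter

def nearest(X, centroids):
    """Index of the closest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        assign = nearest(X, centroids)
        # Move each centroid to the mean of its songs; keep it in place if its cluster is empty.
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, nearest(X, centroids)

def majority_vote_labels(assign, genres, k):
    """Label each cluster with the most common known genre among its songs."""
    labels = {}
    for j in range(k):
        members = [g for g, a in zip(genres, assign) if a == j]
        labels[j] = Counter(members).most_common(1)[0][0] if members else None
    return labels
```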
Deep learning approach
Predict listening preferences from audio signals by training a regression model to predict the latent representations of songs obtained from a collaborative filtering model.
Pipeline:
Data from a collaborative filtering model → latent factor vector extraction (matrix factorization) → network output
Raw MP3 data → mel-spectrogram extraction → network input
Deep neural network: input → output (prediction)
Advantages
+ Effectiveness in recommending new and unpopular songs
+ Good recommendations despite the semantic gap
Development stages
Data retrieval
The Echo Nest Taste Profile Subset
http://labrosa.ee.columbia.edu/millionsong/tasteprofile
user_id                                   song_id             play_count
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBSUJE12A6D4F8CF5  2
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBVFZR12A6D4F8AE3  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBXALG12A8C13C108  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBXHDL12A81C204C0  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBYHAJ12A6701BF1D  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOCNMUH12A6D4F6E6D  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SODACBL12A8C13C273  1
b80344d063b5ccb3212f76538f3d9e43d87dca9e  SODDNQT12A6D4F5F7E  5
The Taste Profile subset is big. Some numbers:
1,019,318 unique users
384,546 unique MSD songs
48,373,586 user-song-play count triplets
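A minimal sketch of loading the triplets and turning string IDs into integer row and column indices for a user-song matrix. It assumes one tab-separated `user_id`, `song_id`, `play_count` record per line; the function names are hypothetical. The returned lists can be passed to `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(len(user_index), len(song_index)))`. For all 48 million triplets, nested Python dicts are memory-hungry, so a streaming or columnar loader would be preferable.

```python
from collections import defaultdict

def load_triplets(lines):
    """Parse 'user_id<TAB>song_id<TAB>play_count' lines into {user_id: {song_id: count}}."""
    plays = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, song, count = line.split("\t")
        plays[user][song] = int(count)
    return plays

def to_coordinates(plays):
    """Map string IDs to integer indices and return (rows, cols, values) for a sparse matrix."""
    user_index = {u: i for i, u in enumerate(sorted(plays))}
    song_index = {s: j for j, s in enumerate(sorted({s for d in plays.values() for s in d}))}
    rows, cols, vals = [], [], []
    for user, songs in plays.items():
        for song, count in songs.items():
            rows.append(user_index[user])
            cols.append(song_index[song])
            vals.append(count)
    return user_index, song_index, rows, cols, vals
```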
The original dataset has no raw audio, only precomputed, poorly documented features, so audio was retrieved from 7digital:
https://www.7digital.com/
We were able to obtain 29-second audio clips for over 99% of the dataset.
Weighted matrix factorization
https://youtu.be/o8PiWO8C3zs
[Figure: play counts arranged as a matrix indexed by user_id and song_id]
$$R \approx P\,Q$$
R: rating matrix, m × n (m users, n songs)
P: user matrix, m × f
Q: song matrix, f × n
f: number of latent features
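The "weighted" part refers to per-entry confidence weights. Following Hu, Koren and Volinsky (reference 3), and written in this slide's notation (R, P, Q, m, n, f), one common objective is sketched below; α and λ are hyperparameters, p_u is row u of P written as a column vector, q_i is column i of Q, C^u and C^i are diagonal matrices holding user u's (resp. song i's) confidences, and b_u, b_i are the corresponding rows and columns of the binary preference matrix. The exact weighting used for this deck is not stated, so treat the linear confidence c_ui as an assumption.

```latex
\min_{P,\,Q}\; \sum_{u=1}^{m} \sum_{i=1}^{n} c_{ui}\,\bigl(b_{ui} - \mathbf{p}_u^{\top}\mathbf{q}_i\bigr)^2
  + \lambda \Bigl( \sum_{u=1}^{m} \lVert \mathbf{p}_u \rVert^2 + \sum_{i=1}^{n} \lVert \mathbf{q}_i \rVert^2 \Bigr),
\qquad
b_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}

% Alternating least squares: with one matrix fixed, each row/column has a closed form
\mathbf{p}_u = \bigl(Q\,C^{u} Q^{\top} + \lambda I\bigr)^{-1} Q\,C^{u}\,\mathbf{b}_u,
\qquad
\mathbf{q}_i = \bigl(P^{\top} C^{i} P + \lambda I\bigr)^{-1} P^{\top} C^{i}\,\mathbf{b}_i
```

Low play counts still get confidence 1 on the zero entries, so unplayed songs pull predictions toward 0 without being treated as hard negatives.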
Alternating Least Squares
http://mendeley.github.io/mrec/
https://github.com/benanne/wmf
https://github.com/benanne/theano_wmf
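Alternating least squares for weighted matrix factorization can be sketched in NumPy on a small dense matrix. With Q fixed, each user row solves (Q C^u Q^T + λI) p_u = Q C^u b_u; with P fixed, each song column is solved the same way. Assumptions: the implicit-feedback weighting c_ui = 1 + α·r_ui from Hu et al., illustrative values α = 40 and λ = 0.1, and dense storage. The linked libraries exploit sparsity instead, which is what makes the full Taste Profile tractable.

```python
import numpy as np

def wmf_als(R, f=2, alpha=40.0, lam=0.1, n_iter=15, seed=0):
    """Weighted matrix factorization of a dense play-count matrix R (m users x n songs).

    Returns P (m x f) and Q (f x n) such that P @ Q approximates the binary preference matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    B = (R > 0).astype(float)    # b_ui: did the user play the song at all?
    C = 1.0 + alpha * R          # c_ui: confidence in b_ui (assumed linear weighting)
    P = rng.normal(scale=0.01, size=(m, f))
    Q = rng.normal(scale=0.01, size=(f, n))
    I = np.eye(f)
    for _ in range(n_iter):
        # Fix Q and solve a weighted ridge regression for every user row.
        for u in range(m):
            QC = Q * C[u]                                 # Q @ diag(c_u)
            P[u] = np.linalg.solve(QC @ Q.T + lam * I, QC @ B[u])
        # Fix P and solve for every song column.
        for i in range(n):
            PC = P.T * C[:, i]                            # P^T @ diag(c_i)
            Q[:, i] = np.linalg.solve(PC @ P + lam * I, PC @ B[:, i])
    return P, Q
```

On a toy matrix where users 0-1 and users 2-3 listen to disjoint sets of songs, `P @ Q` should come out high inside each listening block and low outside it.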
[Figure: error vs. iteration]
http://benanne.github.io/2014/08/05/spotify-cnns.html
Mel-spectrograms
A mel-spectrogram is a kind of time-frequency representation.
It is obtained from an audio signal by computing the Fourier transforms of short, overlapping windows.
Finally, the frequency axis is changed from a linear scale to a mel scale.
https://en.wikipedia.org/wiki/Mel_scale
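The three steps above (short overlapping windows, a Fourier transform per window, warping the frequency axis to the mel scale) can be sketched from scratch in NumPy. This is an illustrative sketch, not librosa's implementation: it assumes the HTK mel formula m = 2595·log10(1 + f/700), a symmetric Hann window and natural-log compression with a small floor, so values will not match librosa exactly. The defaults (1024-sample window, 512-sample hop, 128 mel bands) are the ones the deck reports using.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for j in range(n_mels):
        lo, center, hi = hz_points[j], hz_points[j + 1], hz_points[j + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[j] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=128):
    """Log-compressed mel-spectrogram of a mono signal, shape (n_mels, n_frames)."""
    y = np.pad(y, n_fft // 2, mode="reflect")          # centre frames on multiples of hop
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)                # short, overlapping, windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # Fourier transform of each frame
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # linear frequency -> mel bands
    return np.log(mel + 1e-10).T                       # log compression
```

For a 3-second 440 Hz sine sampled at 44.1 kHz, the result has shape (128, 259) and the strongest band lies near 440 Hz.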
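import numpy as np
time = np.linspace(0, 10 * np.pi, 1000)  # hypothetical time axis; not given in the slides
series = np.sin(time)
# filename = "The Prodigy - Invaders Must Die.mp3"
# filename = "Lady GaGa - Poker Face.mp3"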
We used log-compressed mel-spectrograms with 128 mel bands, a window size of 1024 samples and a hop size of 512 samples.
https://github.com/librosa/librosa
http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
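The window and hop sizes translate into time resolution as follows. The slides do not state the sample rate; assuming 44.1 kHz audio and librosa's default centred framing (`center=True`), a 3-second excerpt yields 259 frames, which is consistent with the 259 x 128 input size on the CNN slide. In librosa itself, the representation would be computed roughly as `librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=512, n_mels=128))`.

```python
def frame_count(n_samples, hop_length=512):
    """Frames produced by a centred STFT (librosa's default center=True framing)."""
    return 1 + n_samples // hop_length

sr = 44100          # assumed sample rate; not stated in the slides
n_fft, hop = 1024, 512

print(f"window length: {1000 * n_fft / sr:.1f} ms")        # 23.2 ms
print(f"hop length:    {1000 * hop / sr:.1f} ms")          # 11.6 ms
print("frames per 3 s window:", frame_count(3 * sr, hop))  # 259
```

At 22.05 kHz the same 3 seconds would give only 130 frames, so the 259-frame input suggests full-rate audio.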
t-SNE
https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
[Figure: t-SNE panels labelled 1024, 1024 * 2 and 1024 * 4]
Convolutional neural network
The baseline deep neural network architecture can consist of two convolutional layers and two fully connected layers.
http://benanne.github.io/2014/08/05/spotify-cnns.html
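A forward pass through such a baseline can be sketched in NumPy with untrained random weights. Assumptions not stated on the slide: convolutions run only along time with filters spanning all 128 mel bands (matching the 4 x 128 filter shape on the CNN slide), 32 filters per layer, max-pooling by 4 after the first layer, a global average over time before the dense layers, a 64-unit hidden layer, and a linear 40-unit output for the latent factors.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_time(x, w, b):
    """Valid 1-D convolution along time. x: (T, C_in), w: (k, C_in, C_out) -> (T - k + 1, C_out)."""
    k = w.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-k+1, k, C_in)
    return np.einsum("tkc,kco->to", windows, w) + b

def max_pool_time(x, size):
    """Non-overlapping max-pooling along time; drops leftover frames."""
    T = x.shape[0] // size * size
    return x[:T].reshape(-1, size, x.shape[1]).max(axis=1)

def relu(x):
    return np.maximum(x, 0.0)

def init(*shape):
    return rng.normal(scale=0.05, size=shape)

# Hypothetical layer sizes: 128 mel bands in, 40 latent factors out.
params = {
    "w1": init(4, 128, 32), "b1": np.zeros(32),   # conv 1: 4-frame filters over all mel bands
    "w2": init(4, 32, 32),  "b2": np.zeros(32),   # conv 2
    "w3": init(32, 64),     "b3": np.zeros(64),   # fully connected 1
    "w4": init(64, 40),     "b4": np.zeros(40),   # fully connected 2 (output)
}

def forward(spec, p):
    """spec: (T, 128) log-mel frames -> (40,) predicted latent factors."""
    h = max_pool_time(relu(conv1d_time(spec, p["w1"], p["b1"])), 4)  # (259, 128) -> (64, 32)
    h = relu(conv1d_time(h, p["w2"], p["b2"]))                        # -> (61, 32)
    h = h.mean(axis=0)                                                # global average over time
    h = relu(h @ p["w3"] + p["b3"])
    return h @ p["w4"] + p["b4"]                                      # linear regression output
```

Averaging over time makes the output independent of where in the window a musical event occurs, which suits predicting a single song-level vector.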
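[Figure: convolutional layer on the mel-spectrogram, with shapes 259 x 128 x 1 (input), 4 x 128 x 1 and 259 x 4 x 32, time annotations 4s and 0.0029s, and a panel of learned filters]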
The network can be trained on 3-second windows sampled randomly from the audio clips.
The last layer of the network is the output layer, which predicts the 40 latent factors obtained from collaborative filtering.
http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014
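A sketch of sampling training data as described: crop a random 3-second window from each clip's mel-spectrogram and pair it with that song's 40 latent factors. The 259-frame window length assumes 44.1 kHz audio with a 512-sample hop, and the mean-squared-error objective is an assumption (a common choice for this kind of regression); the slides do not name the loss. Function names are hypothetical.

```python
import numpy as np

def random_window(spec, n_frames, rng):
    """Crop n_frames consecutive frames at a random offset from a (T, n_mels) clip spectrogram."""
    start = rng.integers(0, spec.shape[0] - n_frames + 1)
    return spec[start:start + n_frames]

def make_batch(specs, factors, batch_size, rng, n_frames=259):
    """Sample clips, crop a random window from each, and pair it with the clip's latent factors."""
    idx = rng.integers(0, len(specs), size=batch_size)
    X = np.stack([random_window(specs[i], n_frames, rng) for i in idx])  # (batch, n_frames, n_mels)
    Y = factors[idx]                                                    # (batch, 40)
    return X, Y

def mse_loss(pred, target):
    """Mean squared error between predicted and collaborative-filtering latent factors."""
    return float(np.mean((pred - target) ** 2))
```

A 29-second clip gives about 2,500 frames under the same assumptions, so each clip offers many distinct 3-second windows; resampling them every epoch acts as cheap data augmentation.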
Album cover based models
References
1) series = (np.sin(time) - np.sin(time / np.pi)), i.e. y = sin(x) - sin(x / pi): https://www.google.com.ua/#q=y+%3D+sin%28x%29+-+sin%28x+%2F+pi%29
2) Deep content-based music recommendation: http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf
3) Collaborative Filtering for Implicit Feedback Datasets: http://yifanhu.net/PUB/cf.pdf
4) Alternating Least Squares Method for Collaborative Filtering: http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/
5) Recommending music on Spotify with deep learning: http://benanne.github.io/2014/08/05/spotify-cnns.html
6) * Automatic generation of social tags for music recommendation: http://papers.nips.cc/paper/3370-automatic-generation-of-social-tags-for-music-recommendation.pdf
http://cs229.stanford.edu/proj2013/FauciCastSchulze-MusicGenreClassification.pdf
http://ismir2011.ismir.net/papers/PS6-10.pdf
http://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms/
http://www.slideshare.net/irecsys/matrix-factorization-in-recommender-systems
Let’s stay in touch:
https://www.facebook.com/neverdraw
https://www.linkedin.com/in/awesomengineer
GitHub
https://github.com/spaceuniverse
Thanks