
An Analysis of Rate Your Music Ratings: Is Today's Music Really Worse?

Aaron Levine

October 2016


Introduction

● Rate Your Music (RYM) is one of the largest online databases of music

– It's also the largest database that contains user-submitted reviews and ratings [1]

● Top albums from the most recent decade are rated lower than top albums from prior decades

– Is music just getting worse?

● Hypothesis: we know which albums from the 60's and 70's are classics and rate them accordingly. Albums added after RYM went online are fiercely debated, and the average rating declines as a result

– This will be supported if the average rating declines after RYM launched

● Let's use data from RYM to quantify this effect!

[1] https://en.wikipedia.org/wiki/List_of_online_music_databases


Data Gathering

● RYM has no API

– Data gathered using HTML scraping and parsing

– Time consuming, so only a limited amount of data was gathered for this small project

● Focus on top albums

– “Worst” albums from prior decades are forgotten, but more recent “worst” albums are ridiculed on the site and receive many ratings and reviews

   ● Entirely different effect, for a future project!

– Initial goal was to analyze all albums between 1990 and 2014 with a rating over 3.5/5 and over 500 reviews

   ● RYM's ratings are strict: 3.5/5 is actually quite good for the site

● Scraped data from top-of-year charts until all albums with over 3.5/5 and over 500 reviews were collected
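The scrape-and-parse step can be illustrated with a minimal sketch using Python's standard `html.parser`. The markup in `SAMPLE_HTML`, the class names, the `data-year` attribute, and the album entries are all hypothetical placeholders; the real RYM chart pages are structured differently and this is not the code used for the project.

```python
from html.parser import HTMLParser

# Hypothetical chart markup with placeholder albums (NOT real RYM HTML or data).
SAMPLE_HTML = """
<div class="album" data-year="1990">
  <span class="artist">Artist A</span>
  <span class="title">Album A</span>
  <span class="rating">3.71</span>
  <span class="votes">812</span>
</div>
<div class="album" data-year="1990">
  <span class="artist">Artist B</span>
  <span class="title">Album B</span>
  <span class="rating">3.48</span>
  <span class="votes">1530</span>
</div>
"""


class ChartParser(HTMLParser):
    """Collects one dict per <div class="album"> block in the assumed markup."""

    FIELDS = {"artist", "title", "rating", "votes"}

    def __init__(self):
        super().__init__()
        self.albums = []
        self._current = None  # album currently being read, if any
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div" and css_class == "album":
            self._current = {"year": int(attrs["data-year"])}
        elif tag == "span" and css_class in self.FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            album = {k: v.strip() if isinstance(v, str) else v
                     for k, v in self._current.items()}
            album["rating"] = float(album["rating"])
            album["votes"] = int(album["votes"])
            self.albums.append(album)
            self._current = None


def parse_chart(html):
    """Return a list of album dicts parsed from one chart page."""
    parser = ChartParser()
    parser.feed(html)
    parser.close()
    return parser.albums


albums = parse_chart(SAMPLE_HTML)
```

Each parsed entry carries the year, rating, and vote count, which is all the later selection and averaging steps need.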

Data Selection

● RYM ranks albums by a proprietary algorithm that takes into account number of ratings and votes

– An album with 3.4/5 but 1000 ratings could be ranked above an album with 3.5/5 and 600 ratings

● After scraping enough HTML from the RYM charts to get all albums with over 3.5/5 and over 500 ratings for the year, the ratings formed a Gaussian distribution once the rating requirement was disregarded

– Reason: tail on the lower end of the rating scale consisting of albums with more votes

● Ultimately, it doesn't make sense to apply a universal ratings cutoff when the whole point of this project was to determine how much the ratings varied from year to year!

● Final selection: Scraped HTML for the “top” 680 albums per year, from 1990 to 2014, and disregarded all albums with fewer than 500 votes
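The final selection rule can be sketched as follows. This is an illustration under assumptions: each album is a dict with hypothetical keys `year`, `rank` (position in that year's chart), `rating`, and `votes`, and the sample entries are made up.

```python
from collections import defaultdict


def select_albums(albums, top_n=680, min_votes=500):
    """Keep the top_n chart entries per year, then drop albums under min_votes."""
    by_year = defaultdict(list)
    for album in albums:
        by_year[album["year"]].append(album)

    selected = {}
    for year, entries in by_year.items():
        # Chart position decides what counts as "top"; no rating cutoff is applied.
        top = sorted(entries, key=lambda a: a["rank"])[:top_n]
        selected[year] = [a for a in top if a["votes"] >= min_votes]
    return selected


# Made-up entries for illustration only.
sample = [
    {"year": 1990, "rank": 1, "rating": 4.0, "votes": 900},
    {"year": 1990, "rank": 2, "rating": 3.9, "votes": 300},
    {"year": 1990, "rank": 3, "rating": 3.8, "votes": 700},
    {"year": 1991, "rank": 1, "rating": 3.7, "votes": 499},
]
selection = select_albums(sample)
```

Note that the vote filter runs after the top-680 cut, so a year can end up with fewer than 680 albums.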


RYM Ratings by Year

Scroll to see changes!

[Figures: one slide per year, 1990-2014, each showing that year's ratings]

Annotations on the year slides:

● 2001: RYM launched. First year of RYM: mean lower than all years but 1992 and 1999

● 2002: Second year of RYM: lowest mean so far

● 2004: Mean keeps declining

● 2008: User reviews added [1]. Immediately a 0.05 decline in mean from 2007! Largest previous change was 0.03

[1] wikipedia.org/wiki/Rate_Your_Music

1990 vs 2014


Three Distinct RYM Ratings Eras

● Pre RYM (up to 2000):

– Top albums discussed in pop culture, users have solidified opinions before rating

● Transitional period (2001-2007):

– RYM online, but still in early period. Internet less universally prevalent, lack of reviews

● Modern RYM (2008-Present):

– Reviews added, average rating of top albums drops an unprecedented 0.05 in first year of modern era, continues declining throughout era

– Increased awareness of good but not classic albums shrinks width of top 680 distribution


Modern RYM Era: A Closer Look

● Mean declines linearly throughout era

– Apply a linear regression to quantify the effect

● Result for year > 2007:

– f(x) = 33.15 - 0.15x

● Strong linear relationship

– R² = 0.9788

– Residual standard error: 0.0051
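The quantities reported above (slope, intercept, R², residual standard error) all come out of an ordinary least-squares fit of yearly mean rating against year. A minimal sketch in plain Python follows; the yearly means in `means` are made-up placeholders, not the values measured from RYM, so the printed fit will not match the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Goodness of fit: R^2 and residual standard error (n - 2 degrees of freedom).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    rse = (ss_res / (n - 2)) ** 0.5
    return slope, intercept, r_squared, rse


years = list(range(2008, 2015))
# Hypothetical yearly means for illustration only.
means = [3.50, 3.49, 3.47, 3.46, 3.44, 3.43, 3.41]

slope, intercept, r_squared, rse = fit_line(years, means)
print(f"f(x) = {intercept:.2f} + ({slope:.4f})x, R^2 = {r_squared:.4f}, RSE = {rse:.4f}")
```

Because x is the calendar year, the intercept is the extrapolated value at year 0 and is large; only the slope and the fitted values within 2008-2014 are meaningful.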


Corrected RYM Ratings

● Correct Modern RYM ratings to account for drop in mean

● For each rating, use:

– Corrected rating = (initial rating - mean(year)) × σ(PreRYM) / σ(year) + mean(PreRYM)

   ● mean(year) is from the linear regression for 2008-2014

   ● mean(PreRYM) = mean(1990-2000) = 3.66

   ● σ(PreRYM) = σ(1990-2000) = 0.16

   ● σ(year) = σ for each year of the modern RYM era

● This simple method produces a distribution for 2008-2014 with the same mean and σ as the 1990-2000 distribution

– Simplest way of correcting for this effect. Many more sophisticated corrections could be implemented in the future
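The correction is a standard rescaling: convert each rating to a z-score within its own year, then map it onto the Pre RYM mean and σ. A minimal sketch, using the Pre RYM values from the slide (3.66 and 0.16); the `year_mean` and `year_sigma` values in the example call are hypothetical, since the per-year figures are not listed here.

```python
PRE_RYM_MEAN = 3.66   # mean(1990-2000), from the slide
PRE_RYM_SIGMA = 0.16  # sigma(1990-2000), from the slide


def corrected_rating(rating, year_mean, year_sigma,
                     pre_mean=PRE_RYM_MEAN, pre_sigma=PRE_RYM_SIGMA):
    """Rescale a Modern RYM rating onto the Pre RYM distribution.

    Equivalent to (rating - year_mean) * pre_sigma / year_sigma + pre_mean.
    """
    z = (rating - year_mean) / year_sigma  # standard deviations above the year's mean
    return z * pre_sigma + pre_mean


# Hypothetical year statistics for illustration only.
example = corrected_rating(rating=3.9, year_mean=3.45, year_sigma=0.12)
print(round(example, 2))
```

An album sitting exactly at its year's mean maps to 3.66, and an album one σ above its year's mean maps to 3.66 + 0.16 = 3.82, regardless of the year.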


Corrected ModernRYM vs. PreRYM


Top 10: PreRYM and ModernRYM

Before Corrections

Rating  Year  Artist                      Album
4.22    1997  Radiohead                   Ok Computer
4.21    1991  My Bloody Valentine         Loveless
4.18    1993  Wu Tang Clan                Enter the Wu Tang
4.17    1994  Nas                         Illmatic
4.16    2000  Radiohead                   Kid A
4.15    1995  GZA                         Liquid Swords
4.14    2000  Godspeed You Black Emperor  Lift Your Skinny Fists Like Antennas to Heaven
4.12    1994  Portishead                  Dummy
4.11    1996  DJ Shadow                   Endtroducing
4.09    1997  Godspeed You Black Emperor  F♯ A♯ ∞

Nothing from ModernRYM is in top 10


Top 10: PreRYM and ModernRYM

After Corrections

Rating  Year  Artist               Album
4.41    2012  Kendrick Lamar       Good Kid Maad City
4.22    1997  Radiohead            Ok Computer
4.21    1991  My Bloody Valentine  Loveless
4.19    2010  Kanye West           My Beautiful Dark Twisted Fantasy
4.19    2014  D'Angelo             Black Messiah
4.18    1993  Wu Tang Clan         Enter the Wu Tang
4.18    2012  Swans                The Seer
4.17    1994  Nas                  Illmatic
4.16    2000  Radiohead            Kid A
4.15    2009  Vektor               Black Future

Mixture of ModernRYM and PreRYM in top 10


Results and Conclusions

● Data supports hypothesis: ratings decline after creation of RYM

● Simple corrections to ModernRYM to account for the ratings decline produce a top album list with an equal mixture of albums from both eras

– More sophisticated corrections taking into account number of votes and genre could be added as part of a larger future project

● Future work could also examine the volatile transition period from 2001-2007

– More difficult to examine because of a rapidly changing and growing user base

● Apparently I really need to listen to Kendrick Lamar's 2012 album!