Importing the data
First, we import pandas and read in the assignment data.
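The loading step might look like the sketch below. The file contents, the missing header row, and the column names are assumptions; in-memory strings stand in for the real assignment files so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the assignment files (the real file names are not shown here).
ratings_csv = io.StringIO("1,809,4.0\n1,601,5.0\n2,122,3.5\n")
titles_csv = io.StringIO(
    '11,"Star Wars: Episode IV - A New Hope (1977)"\n'
    '12,"Finding Nemo (2003)"\n'
)

# Assumed layout: no header row, so column names are supplied by hand.
ratings = pd.read_csv(ratings_csv, header=None, names=["userId", "movieId", "rating"])
titles = pd.read_csv(titles_csv, header=None, names=["movieId", "title"])

# Index the titles by movieId (keeping the column) so a title can be looked up by id.
titles = titles.set_index("movieId", drop=False)

print(ratings.head())
print(titles.head())
```

With real files, the `io.StringIO` objects would be replaced by file paths.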
| | title | movieId |
---|---|---|
11 | Star Wars: Episode IV - A New Hope (1977)" | 11 |
12 | Finding Nemo (2003)" | 12 |
13 | Forrest Gump (1994)" | 13 |
14 | American Beauty (1999)" | 14 |
22 | Pirates of the Caribbean: The Curse of the Bla... | 22 |

| | userId | movieId | rating |
---|---|---|---|
0 | 1 | 809 | 4.0 |
1 | 1 | 601 | 5.0 |
2 | 1 | 238 | 5.0 |
3 | 1 | 664 | 4.5 |
4 | 1 | 3049 | 3.0 |

| | userId | uniqueId |
---|---|---|
0 | 1000 | kmh1234-wd4321-iamawesome6789 |
1 | 1001 | ca6faa08-232e-4ac1-a364-13c062fd3ae4 |
2 | 1002 | rdb |
3 | 1003 | skins428 |
4 | 1004 | ergunner |

My assigned movie IDs:
[122, 558, 788]
Users who watched my movies
I check the ratings for one of my assigned movies (122).
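A filter like this can be written as a boolean mask on the `movieId` column. The sketch below uses a small made-up `ratings` frame, not the assignment data.

```python
import pandas as pd

# Toy ratings frame (made-up values) with the same three columns as the real data.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 4, 5],
    "movieId": [122, 809, 122, 122, 601],
    "rating": [3.5, 4.0, 4.5, 2.0, 5.0],
})

# Keep only the rows where the rated movie is 122.
movie_122 = ratings[ratings["movieId"] == 122]
print(movie_122.head())
```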
| | userId | movieId | rating |
---|---|---|---|
20 | 1 | 122 | 3.5 |
89 | 2 | 122 | 4.5 |
160 | 4 | 122 | 2.0 |
237 | 5 | 122 | 3.5 |
333 | 6 | 122 | 3.0 |
Ratings for my movies
Let’s gather the ratings for all my assigned movies.
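One way to do this is to repeat the single-movie filter inside a list comprehension, which gives one frame per assigned movie. This is a sketch on made-up data; the variable names are assumptions.

```python
import pandas as pd

# Toy ratings frame (made-up values).
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 4],
    "movieId": [122, 558, 788, 122, 788],
    "rating": [3.5, 3.0, 4.0, 4.5, 3.0],
})

my_movies = [122, 558, 788]

# One filtered DataFrame per assigned movie, in the same order as my_movies.
ratings_per_movie = [ratings[ratings["movieId"] == m] for m in my_movies]

for frame in ratings_per_movie:
    print(frame.head())
```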
[ userId movieId rating
20 1 122 3.5
89 2 122 4.5
160 4 122 2.0
237 5 122 3.5
333 6 122 3.0,
userId movieId rating
33 1 558 3.0
264 5 558 2.5
368 7 558 2.5
406 8 558 3.5
437 9 558 3.5,
userId movieId rating
22 1 788 4.0
118 3 788 3.0
171 4 788 3.0
213 5 788 2.5
322 6 788 2.0]
I explore the case of movie 122 first, without generalizing, to figure out whether I have the math right. I start by getting the users who watched movie 122 and counting them.
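Getting and counting those users could look like the sketch below, again on made-up data. Counting both the rows and the distinct users is a cheap check that nobody rated the movie twice.

```python
import pandas as pd

# Toy ratings frame (made-up values).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 4, 5],
    "movieId": [122, 809, 122, 122, 601],
    "rating": [3.5, 4.0, 4.5, 2.0, 5.0],
})

# userId values of everyone who rated movie 122.
watchers_122 = ratings.loc[ratings["movieId"] == 122, "userId"]

n_rows = len(watchers_122)          # number of rating rows for movie 122
n_users = watchers_122.nunique()    # number of distinct users among them
print(n_rows, n_users)
```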
4393
4393.0
I then get all the movie ratings of the users who watched movie 122.
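A membership mask built with `Series.isin` is one way to select those ratings. The sketch assumes the same three-column layout and uses made-up data; describing the mask shows what share of all rating rows belongs to these users.

```python
import pandas as pd

# Toy ratings frame (made-up values).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 4],
    "movieId": [122, 809, 122, 601, 122],
    "rating": [3.5, 4.0, 4.5, 5.0, 2.0],
})

# Distinct users who rated movie 122.
watchers = ratings.loc[ratings["movieId"] == 122, "userId"].unique()

# True for every rating row written by one of those users, for any movie.
mask = ratings["userId"].isin(watchers)
ratings_by_watchers = ratings[mask]

print(mask.describe())
print(ratings_by_watchers.head())
```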
count 338355
mean 0.8752878
std 0.3303928
min False
25% 1
50% 1
75% 1
max True
dtype: object
| | userId | movieId | rating |
---|---|---|---|
0 | 1 | 809 | 4.0 |
1 | 1 | 601 | 5.0 |
2 | 1 | 238 | 5.0 |
3 | 1 | 664 | 4.5 |
4 | 1 | 3049 | 3.0 |
And finally perform the rating calculation.
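The exact formula is not shown, so the sketch below rests on an assumption: `rec122` is the percentage of movie-122 watchers who also rated each other movie. That reading would explain why movie 122 scores 100 against itself. The data is made up.

```python
import pandas as pd

# Toy ratings frame (made-up values): users 1, 2 and 4 rated movie 122.
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3, 4],
    "movieId": [122, 120, 603, 122, 120, 120, 122],
    "rating": [4.0, 5.0, 4.5, 3.5, 4.0, 3.0, 2.0],
})

# Distinct users who rated movie 122.
watchers = ratings.loc[ratings["movieId"] == 122, "userId"].unique()

# All ratings written by those users.
by_watchers = ratings[ratings["userId"].isin(watchers)]

# Assumed score: share of movie-122 watchers who also rated each movie, in percent.
rec122 = by_watchers.groupby("movieId")["userId"].nunique() / len(watchers) * 100
rec122 = rec122.sort_values(ascending=False)
print(rec122)
```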
| | movieId | rec122 | title |
---|---|---|---|
120 | 120 | 95.128614 | The Lord of the Rings: The Fellowship of the R... |
121 | 121 | 94.627817 | The Lord of the Rings: The Two Towers (2002)" |
603 | 603 | 93.922149 | The Matrix (1999)" |
597 | 597 | 89.050763 | Titanic (1997)" |
604 | 604 | 88.071933 | The Matrix Reloaded (2003)" |
I got suspicious when I saw the top recommendations, so just to double check that 122 is actually LOTR3, I look at the titles info.
title The Lord of the Rings: The Return of the King ...
movieId 122
rec122 100
Name: 122, dtype: object
Generalize the exploration
I now abstract what I gathered from the exploration and put it into functions. The names are slightly verbose, but they made it easier for me to follow what was going on.
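The functions themselves are not shown, so the following is only a sketch of how the exploration could be generalized. All function names are hypothetical, the score is assumed to be the share of a movie's watchers who also rated another movie, and the data is made up.

```python
import pandas as pd


def users_who_rated(ratings, movie_id):
    """Return the distinct userIds that rated movie_id."""
    return ratings.loc[ratings["movieId"] == movie_id, "userId"].unique()


def share_of_watchers_who_also_rated(ratings, movie_id):
    """For each other movie, the fraction of movie_id's watchers who rated it."""
    watchers = users_who_rated(ratings, movie_id)
    by_watchers = ratings[ratings["userId"].isin(watchers)]
    share = by_watchers.groupby("movieId")["userId"].nunique() / len(watchers)
    # Drop the movie itself (it always scores 1.0) and put the best scores first.
    return share.drop(movie_id).sort_values(ascending=False)


def top_n_output_row(ratings, movie_id, n=5):
    """Flatten the top-n (movieId, score) pairs into one output row."""
    top = share_of_watchers_who_also_rated(ratings, movie_id).head(n).round(2)
    row = [movie_id]
    for other_id, score in top.items():
        row += [int(other_id), float(score)]
    return row


# Toy ratings frame (made-up values).
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3, 4],
    "movieId": [122, 120, 603, 122, 120, 120, 122],
    "rating": [4.0, 5.0, 4.5, 3.5, 4.0, 3.0, 2.0],
})

row = top_n_output_row(ratings, 122, n=2)
print(row)
```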
These functions now create the output for the assignment.
And victory, let’s create the solutions and save them to CSV.
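Writing the rows out could be done with `DataFrame.to_csv`, as sketched here. The values are copied from the result table in this post; using the movie id as the index and writing to an in-memory buffer instead of a file are assumptions made to keep the snippet self-contained.

```python
import io

import pandas as pd

# Top-5 (movieId, score) pairs per assigned movie, copied from the result table in this post.
my_movies = [122, 558, 788]
rows = [
    [120, 0.95, 121, 0.95, 603, 0.94, 597, 0.89, 604, 0.88],
    [603, 0.93, 557, 0.93, 597, 0.91, 607, 0.89, 604, 0.88],
    [603, 0.94, 329, 0.91, 607, 0.91, 13, 0.91, 597, 0.90],
]

# The movie id becomes the index, so it is written as the first field of each line.
solution = pd.DataFrame(rows, index=my_movies)

# Write without a header row; a file path could be passed instead of the buffer.
buffer = io.StringIO()
solution.to_csv(buffer, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```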
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
122 | 120 | 0.95 | 121 | 0.95 | 603 | 0.94 | 597 | 0.89 | 604 | 0.88 |
558 | 603 | 0.93 | 557 | 0.93 | 597 | 0.91 | 607 | 0.89 | 604 | 0.88 |
788 | 603 | 0.94 | 329 | 0.91 | 607 | 0.91 | 13 | 0.91 | 597 | 0.90 |
122,120,0.95,121,0.95,603,0.94,597,0.89,604,0.88
558,603,0.93,557,0.93,597,0.91,607,0.89,604,0.88
788,603,0.94,329,0.91,607,0.91,13,0.91,597,0.9
122,121,4.86,243,3.69,120,3.69,2164,3.56,1894,3.17
558,36658,3.85,414,2.99,786,2.58,557,2.56,9331,2.45
788,9331,4.85,243,4.68,786,3.85,134,3.63,1900,3.44
11,603,0.96,1892,0.94,1891,0.94,120,0.93,1894,0.93
121,120,0.95,122,0.95,603,0.94,597,0.89,604,0.88
8587,603,0.92,597,0.9,607,0.87,120,0.86,13,0.86