The main goal of this machine learning project is to build a Movie Recommendation System that recommends movies to users. This Python, machine learning, and data science project is designed to show how a recommendation system works. I developed an item-based collaborative filter, which gave me experience applying my Python, data science, and machine learning skills in a real-life project.
Dataset used
I used the TMDB5000 Movies Dataset. The data consists of 5000 movies in the movies.csv file.
Essential Libraries
Scikit-Learn, Matplotlib, Pandas, and NumPy.
Data Pre-processing
After retrieving data from the movies.csv and ratings.csv datasets, I observed that the userId column, as well as the movieId column, consisted of integers. Furthermore, I needed to convert the genres present in the movie_data dataframe into a format that is easier to work with. To do so, I first applied one-hot encoding to create a matrix containing the corresponding genres for each film. I then created a search matrix that makes it easy to look up films by specifying a genre from the list.
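The encoding and search steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the pipe-separated format of the genres column, and the helper name find_by_genre are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample standing in for movies.csv; the pipe-separated
# "genres" format is an assumption, not confirmed by the dataset.
movie_data = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Comedy", "Drama", "Action|Drama"],
})

# One-hot encode the genres: one 0/1 column per genre.
genre_matrix = movie_data["genres"].str.get_dummies(sep="|")

# Search matrix: identifying columns placed next to the genre indicators.
search_matrix = pd.concat([movie_data[["movieId", "title"]], genre_matrix], axis=1)


def find_by_genre(genre):
    """Return the titles of all movies tagged with the given genre (hypothetical helper)."""
    return search_matrix.loc[search_matrix[genre] == 1, "title"].tolist()


action_titles = find_by_genre("Action")
print(action_titles)
```

Because a movie with several genres simply gets a 1 in several columns, the same lookup works for single-genre and multi-genre films.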
Some movies have several genres. For the movie recommendation system to make sense of the ratings through recommenderlab, I converted the rating matrix into a sparse matrix. This new matrix is of the class realRatingMatrix. I then reviewed some important parameters that provide various options for building recommendation systems for movies.
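For readers working in Python, a comparable sparse user-by-movie rating matrix can be sketched with SciPy. This is only an analogy to the realRatingMatrix mentioned above, not the same class: the use of scipy.sparse.csr_matrix, the sample ratings, and the "rating" column name are assumptions made for illustration.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical sample standing in for ratings.csv.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
})

# Map raw ids to consecutive row/column positions.
user_codes, user_ids = pd.factorize(ratings["userId"])
movie_codes, movie_ids = pd.factorize(ratings["movieId"])

# Rows are users, columns are movies; unrated pairs are simply not stored.
rating_matrix = csr_matrix(
    (ratings["rating"].to_numpy(), (user_codes, movie_codes)),
    shape=(len(user_ids), len(movie_ids)),
)

print(rating_matrix.shape, rating_matrix.nnz)
```

Storing only the observed ratings keeps memory use proportional to the number of ratings rather than to users times movies, which matters because most users rate only a small fraction of the catalogue.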
Content-Based Filtering — Make Tags
Content-based filtering suggests items based on the content of what a user has already engaged with. For example, if a user watches movie A, this Movie Recommendation System will suggest movies similar to A. This is made possible by using the title, genres, tags, overview, cast, and director or crew.
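One common way to turn these fields into recommendations is to merge them into a single "tags" string per movie, vectorize the tags, and rank movies by cosine similarity. The sketch below assumes that approach; the sample movies, the column names, and the recommend helper are hypothetical and not taken from the project.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample rows; column names are assumptions for illustration.
movies = pd.DataFrame({
    "title": ["Space War", "Galaxy Fight", "Love Story"],
    "overview": [
        "space battle between fleets",
        "space battle in a distant galaxy",
        "two people fall in love",
    ],
    "genres": ["action scifi", "action scifi", "romance drama"],
    "cast": ["actorone", "actorone", "actortwo"],
})

# Make tags: concatenate the descriptive fields into one lowercase string.
movies["tags"] = (
    movies["overview"] + " " + movies["genres"] + " " + movies["cast"]
).str.lower()

# Bag-of-words vectors for the tags, then pairwise cosine similarity.
vectorizer = CountVectorizer(stop_words="english")
tag_vectors = vectorizer.fit_transform(movies["tags"])
similarity = cosine_similarity(tag_vectors)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given one (hypothetical helper)."""
    idx = movies.index[movies["title"] == title][0]
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    return [movies.loc[i, "title"] for i in ranked if i != idx][:top_n]


recommendations = recommend("Space War", top_n=1)
print(recommendations)
```

Writing each cast member as a single token (for example "actorone") is a design choice in this sketch so that a shared first name between two different actors does not count as a match.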