Movie Data Analysis Project

Jaden
Apr 27, 2020

Module 1 is finally coming to an end, and the final project marks its completion. I was assigned the task of helping Microsoft create a movie. Using my novice coding skills, I went to work to come up with actionable insights on what movie they should make.

I began by scraping IMDb’s website. IMDb is a very popular online database that holds almost every detail you need about a particular movie, including a list of the most popular movies sorted by release year. I used a Python library called BeautifulSoup to build a function that takes a start and end year as arguments and returns a list with 150 movie titles for each year. I decided that three decades would be enough data for my exploratory data analysis (EDA).
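A minimal sketch of that kind of scraping function could look like the block below. It is not the exact code from the project: the CSS selector `h3.title a` and the `fetch_page` callable (which would make the HTTP request for one year's listing page) are hypothetical stand-ins, since the real page markup and URL are not shown here.

```python
from bs4 import BeautifulSoup


def parse_titles(html, selector="h3.title a", limit=150):
    """Return up to `limit` movie titles found in one listing page.

    `selector` is a hypothetical CSS selector; the real one depends on
    the markup of the page being scraped.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = [tag.get_text(strip=True) for tag in soup.select(selector)]
    return titles[:limit]


def titles_by_year(start_year, end_year, fetch_page, per_year=150):
    """Collect titles for every year from start_year to end_year inclusive.

    `fetch_page` is a caller-supplied function (an assumption of this
    sketch) that takes a year and returns that year's page HTML.
    """
    titles = []
    for year in range(start_year, end_year + 1):
        html = fetch_page(year)
        titles.extend(parse_titles(html, limit=per_year))
    return titles
```

Keeping the HTTP request outside the parser makes the parsing step easy to check against a saved HTML file before hitting the live site.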

My next step was using the TMDb API. TMDb is crowdsourced, so individuals can add information to the dataset, but it is regularly checked to keep the information accurate. The API has a search-by-title function, so I passed the title and year as parameters to obtain each movie’s unique ID. With this ID I was able to obtain more data.
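The title lookup can be sketched roughly as follows. This assumes the TMDb v3 `search/movie` endpoint with `api_key`, `query`, and `year` parameters, and a response containing a `results` list whose entries have an `id` field. The helper names are made up for illustration, and taking the first result as the match is a simplification.

```python
from urllib.parse import urlencode

# Assumed TMDb v3 search endpoint.
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(title, year, api_key):
    """Build the search URL for one title and release year."""
    params = {"api_key": api_key, "query": title, "year": year}
    return f"{SEARCH_URL}?{urlencode(params)}"


def first_movie_id(payload):
    """Return the ID of the first search result, or None if nothing matched.

    `payload` is the already-decoded JSON response (a dict).
    """
    results = payload.get("results", [])
    return results[0]["id"] if results else None
```

Titles for which `first_movie_id` returns `None` are the ones that drop out of the dataset at this stage.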

After running the movie titles through the API, only 4,205 of the original 4,500 were found. We lost about 300 movie titles, but we still have enough for our research.

Next I used the unique IDs to make an API call that returns a JSON object with the details of each movie, such as budget, revenue, domestic gross, genre, and popularity rating. After parsing out the necessary information, it was time to do some exploratory analysis.
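Parsing the details response might look something like this. The field names (`budget`, `revenue`, `genres`, `popularity`, `release_date`) are assumed to match the TMDb movie details response, and `parse_details` is a hypothetical helper, not the project's actual code.

```python
import json


def parse_details(raw_json):
    """Pull the fields needed for analysis out of one details response."""
    data = json.loads(raw_json)
    return {
        "title": data.get("title"),
        "budget": data.get("budget"),
        "revenue": data.get("revenue"),
        # Genres are assumed to arrive as a list of {"id": ..., "name": ...}.
        "genres": [genre["name"] for genre in data.get("genres", [])],
        "popularity": data.get("popularity"),
        "release_date": data.get("release_date"),
    }
```

Using `.get` means a movie with a missing field yields `None` (or an empty genre list) instead of raising, so incomplete records can be filtered out later.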

Does a higher budget bring in bigger profits?

We see a slight positive relationship between a movie’s budget and its gross, meaning that typically the bigger the budget, the bigger the gross. But that’s not always true: some lower-budget movies out-earn, in terms of profit, those with higher budgets.
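One way to put a number on a relationship like this is the Pearson correlation coefficient. The sketch below computes it by hand; the budget and gross figures are made up purely for illustration and are not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up figures in millions, for illustration only.
budgets = [10, 50, 100, 150, 200]
grosses = [80, 40, 260, 120, 300]
r = pearson(budgets, grosses)
print(f"correlation between budget and gross: {r:.2f}")
```

A value near 1 means budget and gross rise together almost in lockstep, while a value closer to 0 means the relationship is weak.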

What is the most popular genre?

The most popular movie genre is science fiction, AKA sci-fi. At this point I had the idea that popularity meant profit. It kind of makes sense, because a popular movie may have more people watching it, buying the merchandise, etc. But I later found out this isn’t the case.

The highest-grossing genre is actually animation. Sci-fi came in 4th. So the most popular genre is not the highest-grossing one, and popularity alone does not predict gross.
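A ranking like this can be produced by averaging gross per genre. The sketch below assumes each movie is a dict with a `genres` list and a `gross` number, which is a hypothetical record layout; a movie with several genres counts once toward each of them.

```python
from collections import defaultdict


def mean_gross_by_genre(movies):
    """Return (genre, mean gross) pairs sorted from highest to lowest."""
    totals = defaultdict(lambda: [0, 0])  # genre -> [sum of gross, count]
    for movie in movies:
        for genre in movie["genres"]:
            totals[genre][0] += movie["gross"]
            totals[genre][1] += 1
    means = {genre: total / count for genre, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

Running the same function with a popularity score in place of `gross` would give the popularity ranking, which makes the two orderings easy to compare side by side.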

Does the release month of the movie impact gross?

Yes, there seems to be a big difference in gross between a movie released in June and one released in January. Movies released in June earn, on average, 248 million in gross.
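The monthly comparison comes down to grouping movies by release month and averaging their gross. Here is a rough sketch, assuming each movie is a dict with an ISO-formatted `release_date` string (such as `"2010-06-18"`) and a `gross` number; both keys are assumptions about the record layout.

```python
from collections import defaultdict
from datetime import date


def mean_gross_by_month(movies):
    """Return {month number: mean gross}, ordered from January to December."""
    by_month = defaultdict(list)
    for movie in movies:
        month = date.fromisoformat(movie["release_date"]).month
        by_month[month].append(movie["gross"])
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}
```

Months with no releases simply do not appear in the result, so a missing key should be read as "no data" rather than as zero gross.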

Conclusion

The ideal release date is during the spring/summer, as the weather is warmer and more people are out looking for things to do. The movie should be a science-fiction animated film to maximize the fanbase and gross.
