Predicting NBA Player Salaries through Statistics

Jaden
2 min read · Aug 29, 2021

In every professional team sport, acquiring players in free agency is arguably one of the most important factors in a team’s success when done correctly. We’ve seen championship dreams solidify during this period, well before the season begins. In recent memory, LeBron James and Chris Bosh joining the Miami Heat in 2010 free agency and Kevin Durant shocking the world by joining the Golden State Warriors in 2016 helped propel their teams to multiple titles. On the other end, contracts like Chandler Parsons (4-year, $94M) and Joakim Noah (4-year, $72M) only increased their teams’ chances of getting a better lottery pick. Free agency is vital to a team’s success, and to predict a player’s value, my model uses the player’s statistics from the previous season.

Data Collection

The data collection consists of two parts: salaries and statistics. For salaries, I used Spotrac’s website, where I collected over 2,000 individual contracts signed in the past 20 years. Since this project focuses on free agency signings, the majority of those contracts were removed because they did not fit that criterion. For player statistics, the best way to gather the data is through Basketball-Reference’s API: a simple data frame module returns every statistical output for a player, grouped by season. After combining the two datasets, we are left with 573 rows of data and over 30 features, covering about a decade of free agent contracts.
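To make the combining step concrete, here is a minimal sketch of joining contracts to the previous season’s statistics. It is not the project’s actual code: the field names (`signed_year`, `season_end_year`, `aav`), the player names, and all numbers are made up for illustration. It also assumes that a contract signed in a given summer is matched to the season that ended that same year. It uses only the Python standard library so it runs without extra packages.

```python
# Hypothetical rows: field names and numbers are invented for illustration.
contracts = [
    {"player": "Player A", "signed_year": 2019, "aav": 12_000_000},
    {"player": "Player B", "signed_year": 2020, "aav": 4_500_000},
    {"player": "Player C", "signed_year": 2020, "aav": 8_000_000},
]
season_stats = [
    {"player": "Player A", "season_end_year": 2019, "pts": 18.2, "per": 17.9},
    {"player": "Player A", "season_end_year": 2018, "pts": 15.0, "per": 15.1},
    {"player": "Player B", "season_end_year": 2020, "pts": 7.4, "per": 11.3},
]


def join_contracts_with_stats(contracts, season_stats):
    """Attach to each contract the stats from the season ending in the signing year."""
    # Index the stats by (player, season) so each lookup is a dictionary access.
    stats_by_key = {(s["player"], s["season_end_year"]): s for s in season_stats}
    rows = []
    for contract in contracts:
        stats = stats_by_key.get((contract["player"], contract["signed_year"]))
        if stats is None:
            continue  # inner join: skip contracts with no matching season
        rows.append({**stats, **contract})
    return rows


rows = join_contracts_with_stats(contracts, season_stats)
for row in rows:
    print(row["player"], row["signed_year"], row["per"], row["aav"])
```

Contracts without a matching season are dropped, which is one way a collection of over 2,000 contracts can shrink to a few hundred usable rows.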

Feature Selection

The methodology was to figure out which of the features correlated with a player’s AAV (average annual value).

Using this table, we were able to understand the relationship between each feature and a player’s value, and removed features like games played, offensive rebounds, and personal fouls. Defensive stats like blocks and steals were pretty low on the list, while offensive stats like PER (Player Efficiency Rating) were highly correlated with AAV.

Model Results

The data was used to train a Linear Regression and a Random Forest Regressor, and both came back with similar results: R² was in the 0.45 to 0.55 range for both models. For further analysis of the model, check out my GitHub repo here.
