Game Predictor

Calculates the estimated number of reviews, downloads, and revenue for a game, and shows the distribution of positive reviews within its cluster.

Project: Steam Game Analytics Tool

For technical details on how the tool was developed, refer to the Python notebook.

Data Source and Usage

This tool uses a dataset from Steam covering over 83,000 games, with metadata such as titles, publishers, genres, and release dates. The model and analysis cover only games up to 2022. The interquartile range (IQR) method was used to remove outliers in the total number of reviews and in price. As a result, the predictive model only considers games with at most 279 reviews and a price of at most 13.57 USD. The distributions of these variables before and after applying the IQR method are shown below.


Figure 1: Distribution transformation of Total User Reviews before and after applying IQR


Figure 2: Distribution transformation of Price before and after applying IQR
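The IQR filtering described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the 1.5 multiplier, the column names `total_reviews` and `price`, the sample rows, and the choice to filter on reviews first and then on price are all assumptions.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier; the project's value is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_by_iqr(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the IQR fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


# Hypothetical sample rows; the real dataset has over 83,000 games.
games = [
    {"title": "A", "total_reviews": 10, "price": 0.99},
    {"title": "B", "total_reviews": 20, "price": 1.99},
    {"title": "C", "total_reviews": 30, "price": 2.99},
    {"title": "D", "total_reviews": 40, "price": 3.99},
    {"title": "E", "total_reviews": 50, "price": 4.99},
    {"title": "F", "total_reviews": 60, "price": 5.99},
    {"title": "G", "total_reviews": 70, "price": 6.99},
    {"title": "H", "total_reviews": 80, "price": 59.99},    # price outlier
    {"title": "I", "total_reviews": 5000, "price": 9.99},   # review-count outlier
]

# Assumption: outliers are removed on review count first, then on price.
filtered = filter_by_iqr(filter_by_iqr(games, "total_reviews"), "price")
print([game["title"] for game in filtered])
```

Applied to the full dataset, the upper fences of a filter like this are what would produce caps such as the 279 reviews and 13.57 USD mentioned above.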

Predicted Multiple Linear Regression Model

The predicted number of reviews comes from a multiple linear regression model trained on the data described above. The model's evaluation metrics provide essential insights, but they also highlight important limitations.

It's important for users to understand these limitations and use the model's predictions as a guide, rather than a definitive forecast. Creative judgment and market understanding should complement these insights.

Estimated Downloads and Revenue

The estimated number of downloads is calculated by multiplying the predicted number of user reviews by 35. This multiplier is based on Simon Carless' analysis, which states that the ratio of estimated owners to reviews has been closer to 30 in recent years.

Revenue is estimated by multiplying the estimated downloads by a factor of 0.38. This factor is based on Weinbaum's calculation, which multiplies adjustments for VAT (0.93), returns (0.92), average regional price (0.8), average discount (0.8), and platform cut (0.7).
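The arithmetic behind both estimates can be checked with a short sketch. The function and constant names are hypothetical. Multiplying by the game's list price is an assumption made here so that the result is a currency amount; the text above only mentions the 0.38 factor.

```python
import math

# Multiplier from reviews to downloads, as stated in the text.
REVIEWS_TO_DOWNLOADS = 35

# Adjustment factors attributed to Weinbaum's calculation in the text.
ADJUSTMENTS = {
    "vat": 0.93,
    "returns": 0.92,
    "average_regional_price": 0.80,
    "average_discount": 0.80,
    "platform_cut": 0.70,
}


def net_revenue_factor(adjustments=ADJUSTMENTS):
    """Multiply all adjustment factors together (about 0.383, rounded to 0.38)."""
    return math.prod(adjustments.values())


def estimate_downloads(predicted_reviews):
    """Estimated downloads = predicted reviews * 35."""
    return predicted_reviews * REVIEWS_TO_DOWNLOADS


def estimate_revenue(predicted_reviews, price_usd):
    """Estimated net revenue in USD.

    Assumption: gross revenue is downloads * list price, which is then scaled
    by the net revenue factor. The price term is not stated in the text.
    """
    return estimate_downloads(predicted_reviews) * price_usd * net_revenue_factor()


print(round(net_revenue_factor(), 4))
print(estimate_downloads(100))
print(round(estimate_revenue(100, 10.0), 2))
```

For example, under these assumptions a game with 100 predicted reviews and a 10 USD price gives 3,500 estimated downloads and roughly 13,400 USD in estimated net revenue.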

K-Means Clustering

K-Means Clustering is used to offer deeper insights by comparing games to similar titles. This process involves determining the optimal number of clusters, then analyzing the distribution of positive reviews within each cluster. Four clusters were formed, each with a different distribution of the percentage of positive reviews. Users can use this as a reference for which cluster a game belongs to.

Figure 3: The four clusters a game can belong to
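The clustering step and the per-cluster summary of positive reviews can be sketched as below. This is a plain implementation of Lloyd's algorithm using only the standard library, not the project's actual code, which may well rely on a library implementation. The two features, the sample values, and the use of the first k points as initial centroids are all assumptions chosen to keep the example small and deterministic.

```python
import math
from statistics import mean, median


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm.

    Assumption: the first k points are the initial centroids, for determinism.
    Library implementations usually choose them differently.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(mean(dim) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def positive_share_by_cluster(labels, positive_pct):
    """Summarize the percentage of positive reviews within each cluster."""
    groups = {}
    for label, pct in zip(labels, positive_pct):
        groups.setdefault(label, []).append(pct)
    return {
        label: {"n": len(v), "median": median(v), "min": min(v), "max": max(v)}
        for label, v in sorted(groups.items())
    }


# Hypothetical, already-scaled features (two per game) and positive-review percentages.
features = [
    (1.0, 1.0), (1.0, 10.0), (10.0, 1.0), (10.0, 10.0),
    (1.2, 0.8), (0.8, 10.2), (10.2, 1.2), (9.8, 9.8),
]
positive_pct = [90.0, 55.0, 70.0, 30.0, 80.0, 65.0, 60.0, 40.0]

labels, centroids = kmeans(features, k=4)  # four clusters, as in the text
summary = positive_share_by_cluster(labels, positive_pct)
for cluster, stats in summary.items():
    print(cluster, stats)
```

With real data, features on different scales (for example price and review count) would normally be standardized before clustering so that no single feature dominates the distance.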

Considerations

When using this tool, please consider the following:

Remember: This tool aims to support, not dictate, your game development journey.