Winpredict


Project maintained by juseotin Hosted on GitHub Pages — Theme by mattgraham

Objectives-Winning Games(LOL)

By Justin Seo(miseo@ucsd.edu)

Introduction

Our dataset comprises all professional League of Legends games that took place in 2023. This dataset includes various match statistics, focusing on specific events and overall performance metrics. The dataset is extensive, containing detailed information on each game’s events, team performances, and outcomes.

Our primary question is: “How do specific early game events impact the likelihood of a team winning?” By addressing this question, we aim to provide insights into the critical events that can shift the balance of a game. Understanding these key events can help teams strategize better and improve their chances of winning.

Although the data does not explicitly indicate the impact of each event, we can infer this information from columns such as “event1”, “event2”, “event3”, “blueJungleKDA”, “redJungleKDA”, and the final “result” of the game.

League of Legends is a complex game where early game advantages can snowball into a decisive victory. Players and coaches often debate which early game actions (like securing the first tower or achieving a high KDA) are most beneficial for winning the game. Through this analysis, we aim to identify which early game events significantly impact the overall win rate.

The columns we are interested in to answer our question include: “event1”, “event2”, “event3”, “blueJungleGP”, “blueJungleKDA”, “redJungleGP”, “redJungleKDA”, and “result”. These columns provide information about the early game events, jungle performance, and the final result of each match. By analyzing these columns, we can determine the influence of specific events on the game’s outcome.

Columns:

I created additional columns,

Data Cleaning and Exploratory Data Analysis

The raw data was loaded from a CSV file containing various match statistics for professional League of Legends games. Checked for any missing values in the dataset and decided to either fill them with appropriate values or drop them based on their significance. Removed the columns that are not being used, and added several new columns.

game event1 event2 event3 blueJungleKDA redJungleKDA result First_Event_Red Second_Event_Red Third_Event_Red Red_More_Objectives
ESPORTSTMNT03_3087499 BLUE: first_blood BLUE: dragon BLUE: herald 0 0 1 0 0 0 False
ESPORTSTMNT01_3309691 BLUE: first_blood BLUE: dragon RED: herald 0 0 1 0 0 1 False
ESPORTSTMNT01_3311651 BLUE: first_blood BLUE: dragon BLUE: herald 0 0 1 0 0 0 False
ESPORTSTMNT01_3309743 RED: first_blood RED: dragon RED: herald 1.56 0 1 1 1 1 True
ESPORTSTMNT01_3311712 BLUE: first_blood RED: herald RED: dragon 9 0 1 0 1 1 True

This pie chart shows the distribution of the First_Event_Red column, indicating whether the first event was advantageous for the Red team (1) or the Blue team (0). This visualization helps in understanding the frequency of the first event being advantageous for each team and its relation to the game result. The probability of red team taking the first objectives is 39%. I made two more pie charts for the second and third objective as well, red team had a probability of 41% to take the second objective, and 36.8% for the third objective. This pie chart showed that Red team has lower percentage of taking objectives than the Blue team.

I made another bar chart which displays the rate for Red team to win while not perfectly securing the first three objectives, when Red team did take the first objectives, or did not take the first objectives. Comnparing these two charts helps in understanding the significance of securing the first objective in the overall game outcome. This indicates that taking the first objectives helps team winning with all objectives secure, and they are going to have a different way other than winning by taking all objectives if the opponent team took the first objectives.

First_Event_Red Second_Event_Red Third_Event_Red
0.638691 0.610242 0.56899
0.390387 0.410317 0.368113

Assessment of Missingness

I used the difference in proportions of missing event1 between games that were won and games that were lost as an test statistics.

Observed Difference: 0, P-value: 1

Observed difference indicates that there were no difference in the result of the game when there is a missingness of event1. With a P-value of 1, we fail to reject the null hypothesis, suggesting that the missingness of event1 is not significantly dependent on the game result. This also means that the game competition takes a good care of preventing the unknown accidents during the game, so it does not control the result of the game. By performing this analysis, I was able to gain insights into whether the missingness of event1 is potentially NMAR and consider additional data that might help transform it to MAR.

Hypothesis Testing

There were no missing values in the data I used.

The Null and Alternative Hypothesis came from the idea to figure out if the first three events are significant for the team winning to get to know about the relationship between objectives and game results. The test statistic was the difference between the game winning when took advantages in the first three events, and the game winning without taking advantages in the first, in a significance level of 5%.

Result: observed differnce: 0.3 p-value = 1, fail to reject the null hypothesis. The observation we saw in our data did not happen due to random chance, so taking advantages in first three events is not related to winning the game.

Framing a Prediction Problem

Type: Classification (Binary Classification)

Prediction Problem: Predict whether a team wins the game based on their performance in the jungle laner’s KDA and the first three events.

Response Variable: The response variable is result, which indicates whether the team won (1) or lost (0) the game. This binary outcome is crucial for understanding the effectiveness of early game advantages in determining the final outcome of the game.

Evaluation Metric: F1-score

Reason for Choosing F1-score:

At the “time of prediction,” we use information from the first three events, which are part of the early game. This ensures that our model is practical and can be applied in real-time scenarios, where decisions and predictions need to be made based on the ongoing game’s current state.

In this classification problem, we’re predicting whether a team will win based on the jungle laner’s performance early game events. We use features available at the time of prediction to ensure the model’s practicality. The F1-score is chosen as the evaluation metric due to its suitability for potentially imbalanced datasets, providing a balanced measure of precision and recall.

Baseline Model

Model Type: Random Forest Classifier

Quantitative Features: Red_More_Objectives(Binary indicator if the Red team had more objectives in the first three events), redJungleKDA(KDA score for the Red team’s jungler.)

F1-score: 0.645

I believe my current model is good since it would catch if it is the jungle laner who contributes the most to loss of both objectives and a game, however, with a importance of 0.8 in redJungleKDA, I would like to add the difference between the KDA of the jungle laners in each team for a better prediction.

Final Model

Model Type: Random Forest Classifier

Hyperparameters that worked the best

Number of Trees (n_estimators): Controls the number of trees in the forest.

Maximum Depth (max_depth): Controls the maximum depth of the trees.

Minimum Samples Split (min_samples_split): The minimum number of samples required to split an internal node.

Categorical Features: event1(Encoded using OneHotEncoder)

Quantitative Features: Red_More_Objectives(Binary indicator if the Red team had more objectives in the first three events), redJungleKDA(KDA score for the Red team’s jungler),Total_Objectives(Sum of blueJungleGP and redJungleGP),KDA_Difference(Difference between blueJungleKDA and redJungleKDA)

F1-score: 0.695(increased from 0.65)

I decided to add the Total_Objectives since I am trying to find the relation between the objectives and the game result in a bigger pie. Before I built the baseline model, the first three objectives were not significant to the result of the game, so I started thinking about the position related to taking objectives which is jungle laner. I added the columns comparing the KDA of the jungle laners in each team so it could include the part that was never brought up before, and was found highly important. By adding new features (Total_Objectives and KDA_Difference) and the best working the hyperparameters, the final model is expected to improve upon the baseline model. The pipeline ensures that all preprocessing steps and model training are performed in a single step, making the process efficient and reproducible.

Fairness Analysis

Group X: Teams with a high redJungleKDA (above the median value).

Group Y: Teams with a low redJungleKDA (below or equal to the median value).

Evaluation Metric: F-1 Score

Hypotheses

p-value: 0.99

Observed Difference: 0.13817487933550343

We fail to reject the null hypothesis, with a p-value of 0.99, which indicates the Jungle laner’s performance do not effect the result of the game. This concludes to that our model was fair, and the jungle laner’s performance do not effect the game result.