Data-Driven Prediction of Athletes’ Performance based on their Social Media Presence

We investigated whether proxies of athlete social media activity are useful features for a machine learning model to predict athletes’ performance in subsequent competitions. We extracted millions of tweets that NBA basketball players posted themselves or were tagged in and derived features reflecting players’ mood, social media behaviour, and sleep quality before games. Using these and other non-social media-related features, we performed statistical tests to examine whether the features significantly improve the accuracy of a random forest model for predicting players’ BPM scores in upcoming games.

Paper authors: Frank Dreyer, Jannik Greif, Kolja Günther, Myra Spiliopoulou, and Uli Niemann

25th International Conference on Discovery Science

By Uli Niemann in Research

July 26, 2022

Abstract

It is well known in the sports industry that the performance of athletes is strongly influenced by physiological and psychological factors. In recent years, many researchers have analysed whether athlete-generated social media content can be used as proxies for such performance factors, with some promising results. In this study, we investigated whether such proxies are useful features for a machine learning model to predict athletes’ performance in subsequent competitions. We extracted millions of tweets that NBA basketball players posted themselves or were tagged in and derived features reflecting players’ mood, social media behaviour, and sleep quality before games. Using these and other non-social media-related features, we performed statistical tests to examine whether the features significantly improve the accuracy of a random forest model for predicting players’ BPM scores in upcoming games. The results show that, in particular, the number of tweets a player is tagged in prior to a game significantly improves the predictions of the model. Our findings provide insights for practitioners on the effects of social media on athlete performance that can be used prospectively for mental health awareness training and optimisation of pre-game routines.
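The abstract does not say which statistical test was used to compare models with and without the social media features. As a minimal sketch of the general idea, assuming a one-sided paired sign-flip permutation test (our choice for illustration, not necessarily the authors' method), the following compares per-game absolute errors of a baseline model and a model extended with one extra feature. The function name, its parameters, and the error values are hypothetical and do not come from the paper.

```python
import random


def paired_permutation_test(errors_baseline, errors_extended,
                            n_permutations=10_000, seed=0):
    """One-sided paired sign-flip permutation test (illustrative only).

    Checks whether the extended model's per-game errors are lower on
    average than the baseline's. Returns (mean improvement, p-value).
    """
    if len(errors_baseline) != len(errors_extended):
        raise ValueError("error lists must have the same length")
    if not errors_baseline:
        raise ValueError("error lists must not be empty")

    # Positive difference = the extended model had the smaller error.
    diffs = [b - e for b, e in zip(errors_baseline, errors_extended)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null hypothesis both models are exchangeable, so each
        # paired difference is equally likely to carry either sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if permuted >= observed:
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical absolute BPM prediction errors for the same eight games.
baseline_errors = [3.1, 2.7, 4.0, 3.5, 2.9, 3.8, 3.3, 4.2]
extended_errors = [2.8, 2.6, 3.5, 3.4, 2.5, 3.6, 3.4, 3.7]

improvement, p = paired_permutation_test(baseline_errors, extended_errors)
print(f"mean error reduction: {improvement:.3f}, p-value: {p:.4f}")
```

Pairing the errors game by game matters here: both models are evaluated on the same games, so the test looks only at the per-game differences and not at the overall spread of errors across games.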

Important figure

Figure 3. Relationship between each significant feature and Box Plus/Minus (BPM) score.

BibTeX citation

@InProceedings{Dreyer:DS2022,
  author    = {Dreyer, Frank and Greif, Jannik and G{\"u}nther, Kolja and Spiliopoulou, Myra and Niemann, Uli},
  title     = {{Data-Driven Prediction of Athletes' Performance based on their Social Media Presence}},
  booktitle = {Discovery Science (DS)},
  year      = {2022},
  note      = {Accepted; to appear},
}
Categories:
Research
Tags:
Text Mining, Predictive Modeling, Sports Analytics, Social Media