Odds of Orioles winning World Series
The question, then, is: how much uncertainty do we have about how teams will finish, compared to a normal season? Seems like a problem for Bayesian statistics.

Setup

We'll build a simple model to predict a team's wins using their preseason odds of winning the World Series. We'll use 2019 data to build the model, and then see how the results change using the 2020 odds, comparing a normal season to a short one.
First we'll scrape the data on 2019 results and the odds for 2019 and 2020.

    library(tidyverse)
    library(rvest)
    library(jtcr)
    library(ggrepel)
    library(scales)
    library(rethinking)

    theme_set(theme_jtc())

    # get 2019 results
    results_2019_url <- ...
    names_2019 <- c("team", "wins", "losses", "pct", "games_back", "home", "road", ...)
    rename_table <- function(x) {
      names(x) ...
    }
    ...
    odds <- bind_rows(odds_2019, odds_2020) %>%
      arrange(team)

First let's take a look at the distributions. Below is a plot that shows the win percentage of a team that is expected to win 50% of its games, based on 100,000 simulated seasons of either normal length (162 games) or shortened length (60 games).
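As a rough analytic cross-check on that comparison, the spread of win percentage can be worked out directly from the binomial distribution, without simulating. This is a minimal sketch, assuming every game is an independent coin flip with a fixed win probability; the names `games`, `sd_pct`, and `tail_prob` are illustrative and not from the original analysis.

```r
# Analytic cross-check for a team with a 50% chance of winning each game.
# Assumption: games are independent with a constant win probability.
p <- 0.5
games <- c(short = 60, long = 162)

# Standard deviation of win percentage under a binomial model:
# sqrt(p * (1 - p) / n)
sd_pct <- sqrt(p * (1 - p) / games)

# Exact probability of winning 40% or fewer of the games,
# taken from the binomial CDF
tail_prob <- pbinom(floor(0.4 * games), size = games, prob = p)

print(round(sd_pct, 3))
print(round(tail_prob, 4))
```

The simulation that follows should agree with these figures up to Monte Carlo noise, since it draws from the same binomial distributions.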
    short <- rbinom(1e5, 60, prob = .5) / 60
    long <- rbinom(1e5, 162, prob = .5) / 162

    tibble(short, long) %>%
      gather() %>%
      ggplot(aes(value, fill = key, color = key)) +
      geom_density(bw = .02, alpha = .5) +
      scale_x_continuous(labels = percent) +
      labs(title = "Simulated win percentage for a team with 50% win probability", ...)

In our imaginary world where this team has a 50% probability of winning every game, in a short season that team would end up winning 40% or fewer of its games about 7.8% of the time, while in a normal season that would happen far less often.

Modeling

We'll build a model to predict win percentage based on preseason odds of winning the World Series.
Below are the preseason World Series odds for both years:

    odds %>%
      mutate(team = fct_reorder(team, -odds)) %>%
      ggplot(aes(team, odds, color = factor(year))) +
      geom_point(size = 2) +
      coord_flip() +
      scale_y_continuous(
        labels = function(x) paste0(..., comma(x)),
        trans = "log10"
      ) +
      labs(
        title = "Comparing World Series champion odds between years",
        y = "odds of winning World Series (log scale)",
        x = "",
        color = "season",
        caption = "odds from ..."
      )

And below is how the 2019 odds compared to each team's final regular-season win percentage:

    odds %>%
      filter(year == 2019) %>%
      inner_join(results_2019, by = "team") %>%
      ggplot(aes(odds, pct)) +
      geom_point() +
      geom_text_repel(aes(label = team)) +
      scale_x_log10(labels = scales::comma) +
      scale_y_continuous(labels = scales::percent) +
      labs(x = "preseason World Series odds", title = "Preseason World ...")

Prior predictive checks

Our model is a logistic regression that looks like:

\[
\begin{aligned}
W_i &\sim \mathsf{Binom}(n, p_i) \\
\operatorname{logit}(p_i) &= \alpha + \beta \, O_i \\
\alpha &\sim \mathcal{N}(0, 1) \\
\beta &\sim \mathcal{N}(0, 0.5)
\end{aligned}
\]

where \(W\) is the number of wins.

First, we check our priors to make sure they produce reasonable results.
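One way such a check could look is sketched below in base R. This is a minimal sketch under stated assumptions, not the original analysis: it assumes the predictor \(O_i\) has been standardized (the grid `o_std` is hypothetical), that the second argument of each normal prior is a standard deviation, and that a full season has 162 games. All object names are illustrative.

```r
# Prior predictive simulation for the logistic regression above.
# Assumptions: o_std is a hypothetical standardized version of the odds
# predictor, and the priors' second parameters are standard deviations.
set.seed(2020)

n_draws <- 1000
n_games <- 162

# Draw intercepts and slopes from the priors
alpha <- rnorm(n_draws, mean = 0, sd = 1)
beta <- rnorm(n_draws, mean = 0, sd = 0.5)

# Grid of standardized predictor values (hypothetical)
o_std <- seq(-2, 2, length.out = 30)

# Implied win probability for every prior draw (rows) and grid value (columns).
# alpha is recycled down the columns, so row i uses alpha[i] and beta[i].
win_prob <- plogis(alpha + outer(beta, o_std))

# Simulated season win totals implied by the priors alone
wins <- matrix(
  rbinom(length(win_prob), size = n_games, prob = win_prob),
  nrow = n_draws
)

# Summaries of what the priors imply for an average team (o_std near 0)
mid_col <- which.min(abs(o_std))
print(quantile(win_prob[, mid_col], c(0.05, 0.5, 0.95)))
print(quantile(wins[, mid_col], c(0.05, 0.5, 0.95)))
```

If the summaries put a lot of prior mass on win totals that no real team reaches, that would suggest tightening the priors before fitting.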