Interactive Tables Update

We recently updated the app with a few more bells and whistles so that you can make more noise. I explained the xGoal model changes earlier (link), so here I’ll highlight the app’s key updates: we’ve made the app publicly accessible, integrated compensation into the data tabs, and improved the sidebar filter options.

Model Update: Coefficient Blending

With our most recent app update, you might notice that some numbers in the xGoals tables have changed for past years where it wouldn’t normally make sense to see changes. As an example, Josef Martinez had 29.2 xG in 2018, but the updated app shows 28.7 (-1.7%). No, this is not an Atlanta effect, though I can understand why you might support such an effect. Gyasi Zardes lost 0.5 xG as well (-2.4%), and no one dislikes Columbus.

We have updated our xGoal models with the 2018 season’s data, and that is the culprit behind all the discrepancies since the last version of the app. I have already cited the two largest discrepancies by magnitude, so this isn’t some major overhaul of the model. In fact, only 2018’s xG values have been materially adjusted.* The new model estimated 35.6 fewer xGoals in 2018 than it did before, equivalent to a 2.8% drop.

Playoff Seeding Probabilities Model

Starting yesterday, you will find playoff seeding probabilities in our web app. We show the probability that each team finishes in each playoff seeding position in its conference, as well as the Supporters’ Shield probabilities for all teams.

What is this based on? Well, it’s a two-part process. First, we built a model capable of predicting the probabilities of future game outcomes based on team performance to date. Then we set up a simulation to randomly determine outcomes for all the remaining games this season, with probabilities derived from that predictive model. For each of 1,000 simulated seasons, we tallied each team’s final points, wins, and goals scored and allowed, and seeded the teams in each conference. Then we figured out what proportion of those 1,000 seasons each team finished in each place.
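
The simulation step described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the function name `simulate_seeding`, the `outcome_probs` callback (standing in for the predictive model), and the team names are all hypothetical, and ties on points are broken at random here rather than by wins and goals.

```python
import random
from collections import Counter, defaultdict


def simulate_seeding(current_points, remaining_games, outcome_probs,
                     n_sims=1000, seed=0):
    """Estimate the share of simulated seasons each team finishes in each place.

    current_points: dict of team -> points earned so far
    remaining_games: list of (home, away) pairs still to be played
    outcome_probs: function (home, away) -> (p_home_win, p_draw, p_away_win);
        a stand-in for the predictive model (hypothetical interface)
    """
    rng = random.Random(seed)
    place_counts = defaultdict(Counter)
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away in remaining_games:
            p_home, p_draw, _ = outcome_probs(home, away)
            u = rng.random()
            if u < p_home:
                points[home] += 3
            elif u < p_home + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        # Simplification: ties on points are broken randomly, not by
        # wins and goals as in the process described in the text.
        ranked = sorted(points, key=lambda t: (points[t], rng.random()),
                        reverse=True)
        for place, team in enumerate(ranked, start=1):
            place_counts[team][place] += 1
    return {team: {place: n / n_sims for place, n in sorted(counts.items())}
            for team, counts in place_counts.items()}


# Usage with made-up teams and a constant outcome model.
table = simulate_seeding(
    {"A": 60, "B": 40, "C": 39},
    [("B", "C")],
    lambda home, away: (0.50, 0.25, 0.25),
)
```

Each inner dictionary maps a finishing place to the proportion of simulated seasons in which the team landed there, which is the quantity shown in the app.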

Expected goal chains are back!

Expected Goal Chains is not a new thing here on the site, but we have now streamlined the process of generating them so that you can enjoy weekly updates on our web app! Kevin Shank (@Kev_Shank) introduced the concept last year, and because the concept hasn’t changed, I will steal some of his and Statsbomb’s explanation…

An Updated Expected Passing Model

In the offseason we upgraded our passing model, and its outputs are now featured in our xPassing tables (both interactive and static). After a few minor tweaks this week, now is as good a time as any to explain how it works. 

Much like our Expected Goals (xG) models, the purpose of this model is to estimate the probability of success. Only, in this case, a success is a pass that is completed rather than a shot that is scored. For example, if Player A is passing the ball toward Player B, we can assign a likelihood of that pass being completed. We do this based on a variety of factors, such as the circumstances and the player's position on the field, the pass type, and the direction of the pass. And in this case, we opted to use a gradient-boosted ensemble of decision trees (GBM) rather than a logistic regression model (GLM).

Adjusting team xGoals

By Matthias Kullowatz (@mattyanselmo)

When we produced the game-by-game expected goals results last week, we were surprised to see that Seattle had outpaced Portland 4.0 to 1.7. That didn't feel right, but it didn't take long before we noticed that Seattle recorded five shots inside the six-yard box leading up to its first goal. Those shots added up to more than 2.0 expected goals, despite the fact that soccer's rules limit scoring to one goal at a time. 
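
To make the problem concrete, here is a small sketch of one common way to cap a sequence of shots at a single goal: treat the shots as independent chances and compute the probability that at least one of them scores. This is an assumption for illustration, not necessarily the adjustment we settled on, and the shot values below are made up rather than taken from the Seattle game.

```python
from math import prod


def possession_xg(shot_xgs):
    """Probability that at least one shot in a sequence scores,
    assuming the shots are independent (an illustrative simplification)."""
    return 1.0 - prod(1.0 - p for p in shot_xgs)


# Hypothetical xG values for five close-range shots in one sequence.
shots = [0.45, 0.40, 0.42, 0.38, 0.40]

naive_total = sum(shots)           # simple sum, exceeds 2.0 goals
adjusted = possession_xg(shots)    # can never exceed 1.0
```

The simple sum credits the sequence with more than two goals, while the adjusted figure stays below one, which matches the fact that only one goal can come out of it.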

Introducing interactive data at ASA

You know how you go to some sports websites and you can sort and filter their data, and there are lots of options and it looks cool and stuff? Well, starting today, we’re rolling out interactive versions of our stats that also look cool. You can find the link up at the top under "xG Interactive Tables." This first iteration focuses on shot stats and expected goals, and it gives you guys more ways to filter and explore the data.

Validating the ASA xGoals Model

It was more than two years ago that we built the current model for determining the expected goals of each shot, so let’s go back and see how it’s doing. I've included some R code for fitting our generalized linear model (GLM), as well as a gradient-boosted tree model (GBM) for making comparisons. I selected the training dataset to be shots from 2011 - 2014, and the validation dataset to be shots from 2015 and 2016. Actual and predicted goals per shot are shown across each variable of the model.

First, I fit the original model as seen on the ASA website. This is a logistic generalized linear model, which is designed to predict the probability of binary outcomes like shots (goal vs. not goal). Coefficients will differ somewhat from what we posted long ago, as this is a different training dataset.
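
For readers who want to see what a logistic GLM does with a shot, here is a minimal Python sketch of the prediction step (the R code for fitting is part of the full post). The feature names and coefficient values are hypothetical and chosen only for illustration; they are not the fitted ASA coefficients.

```python
import math


def goal_probability(features, coefs, intercept):
    """Logistic-link prediction: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    eta = intercept + sum(coefs[name] * value
                          for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, NOT the values from the fitted model.
coefs = {"distance_yards": -0.10, "header": -0.60}
intercept = -0.5

close_shot = goal_probability({"distance_yards": 8, "header": 0},
                              coefs, intercept)
long_shot = goal_probability({"distance_yards": 25, "header": 0},
                             coefs, intercept)
```

Whatever the coefficients, the output always lands between 0 and 1, which is why this form suits a goal vs. not-goal outcome.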

MLS Playoff Projections

By Matthias Kullowatz (@mattyanselmo)

In preparation for the beginning of the MLS Playoffs on Wednesday, we're rolling out projections for each subsequent round. Throughout the playoffs, you can find them under the "Projections" tab in the upper right. First, let's take a look at what our simulation spit out, and then I'll explain what the simulation was thinking.

Team   Quarters   Semis   Finals   Cup Winners
NYRB   1.000      0.672   0.445    0.326
CLB    1.000      0.511   0.197    0.130
MTL    0.632      0.331   0.139    0.089
VAN    1.000      0.554   0.288    0.089
FCD    1.000      0.526   0.270    0.082
NE     0.496      0.211   0.099    0.064
TOR    0.368      0.145   0.082    0.051
LA     0.429      0.227   0.126    0.043
POR    0.591      0.257   0.117    0.039
SEA    0.571      0.242   0.106    0.033
SKC    0.409      0.195   0.092    0.028
DCU    0.504      0.130   0.037    0.026

The simulation is designed to follow the new MLS Playoffs format. Two-legged series, which occur in the conference semifinals and finals, are modeled using simulated scores from a bivariate Poisson model. This allows us both to project outcomes precisely and to update the probabilities after game one of such a series. We run 50,000 iterations of the MLS Cup Playoffs and summarize the outcomes from those iterations to produce the projections you see above.
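
As a rough sketch of how a two-legged series can be simulated from scorelines, the code below draws correlated scores with the common-shock construction of a bivariate Poisson (each team's goals are its own Poisson draw plus a shared Poisson draw). The construction, the function names, and the scoring rates are assumptions for illustration; they are not the fitted parameters or the exact code behind the table.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def bivariate_poisson_score(rng, lam_home, lam_away, lam_shared):
    """Return (home_goals, away_goals) that share a common Poisson component."""
    shared = poisson(rng, lam_shared)
    return poisson(rng, lam_home) + shared, poisson(rng, lam_away) + shared


def simulate_series(leg1, leg2, n_sims=50_000, seed=0):
    """Simulate a two-legged series decided on aggregate, then away goals.

    leg1: (lam_home, lam_away, lam_shared) with the lower seed at home
    leg2: (lam_home, lam_away, lam_shared) with the higher seed at home
    Returns the share of simulations won by each side in regulation and
    the share that would go to extra time.
    """
    rng = random.Random(seed)
    wins = {"higher": 0, "lower": 0, "extra_time": 0}
    for _ in range(n_sims):
        low_home, high_away = bivariate_poisson_score(rng, *leg1)
        high_home, low_away = bivariate_poisson_score(rng, *leg2)
        high_total = high_home + high_away
        low_total = low_home + low_away
        if high_total != low_total:
            wins["higher" if high_total > low_total else "lower"] += 1
        elif high_away != low_away:
            # Aggregate is level, so away goals decide the series.
            wins["higher" if high_away > low_away else "lower"] += 1
        else:
            wins["extra_time"] += 1
    return {outcome: n / n_sims for outcome, n in wins.items()}


# Hypothetical scoring rates for the two legs.
series = simulate_series((1.1, 1.3, 0.1), (1.6, 0.9, 0.1))
```

Because the simulation works at the level of scorelines, updating after game one amounts to fixing the first leg's score and simulating only the second leg.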

It should come as no surprise that the Red Bulls are far and away the most probable team to win the Cup. They have dominated our power rankings for weeks, and their 32.6% chance of winning the Cup lines up very closely with what we gave 2014's favorite LA Galaxy (33.4%) and 2013's favorite Sporting KC (30.2%). New York led the league in both actual goals scored and expected goals scored, and the model has found that goal scoring is more predictive of future success than goal allowing. This is why they have topped our power rankings for so long.

It should also come as no surprise that D.C. United received our worst probability of winning the Cup. Despite home-field advantage, DCU is given only a 50.4% chance of beating New England in their play-in game. DCU's expected goal differential is bad, and their actual goal differential is surprisingly bad. They are the only playoff team with a negative xGD, and the only playoff team with a negative GD. In other words, even if you don't subscribe to how xGoals handles DCU, actual goals don't like them either.

I think seeing Columbus and Montreal with the next-best chances of winning the Cup is a bit confusing at first, but it actually makes perfect sense. If either of those teams has to face NYRB, they will do so in a two-legged series where home-field advantage is largely stripped away. On the other coast, whichever Western Conference team makes the final has a good chance (44.5%) of playing in New York in that one-game championship. Essentially, when and how you play New York largely determines your probability of winning the Cup.

Speaking of home-field advantage, we account for it with two processes. First, the model knows who's playing at home, and adjusts outputs accordingly. That has been true with our Playoff Push all season. Second, the two-legged series are set up such that if teams tie on goals, and on away goals, they will play two 15-minute overtime periods followed by penalty kicks if necessary. Additionally, that will only happen on the higher seed's turf. Our simulation determines if such an aggregate tie occurs, and then indirectly gives the home team (also the higher-seeded team) a slight advantage in extra time. We regress the home team's 90-minute probability of winning, conditional on not tying, halfway back toward 50%. This is an approximation to what FiveThirtyEight has done with extra time, where the better teams are still given advantages in what is not a 50-50 outcome.
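
The extra-time adjustment is simple enough to write out. Here is a minimal sketch, with a hypothetical function name and made-up input probabilities: take the home team's 90-minute win probability conditional on the game not ending in a draw, then move it halfway back toward 0.5.

```python
def extra_time_home_win_prob(p_home_win, p_away_win):
    """Home team's chance of advancing from extra time.

    Uses the 90-minute home win probability conditional on no draw,
    regressed halfway back toward 0.5.
    """
    conditional = p_home_win / (p_home_win + p_away_win)
    return 0.5 + 0.5 * (conditional - 0.5)


# Made-up 90-minute probabilities: 50% home win, 25% draw, 25% away win.
p_extra = extra_time_home_win_prob(0.50, 0.25)
```

In this example the conditional probability is two-thirds, and regressing it halfway toward 50% leaves the home team at roughly 58%.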

Anyway, enjoy the playoffs! And check back for updated projections. 

How can Portland play Seattle in the play-in round?

Thanks to Drew's work this week on playoff scenarios, enumerating many of the scenarios that would lead to a potential repeat of this is now a simpler task. One interesting note I discovered working through these scenarios: there is only one scenario related to Seattle and Portland in which today's goal differential might matter. If San Jose ties, and Seattle loses by at least three goals, then San Jose could take Seattle's seed. Otherwise, no goal differential today, no matter how lopsided, can determine the fate of Seattle or Portland.*

Here are the results that would lead to a Timbers-Sounders one-game playoff.
