The Expected Own Goals NWSL Awards Ballot

Hi. Evan from Expected Own Goals here. The NWSL regular season becomes past tense after this weekend, and with most players’ seasons more or less fully formed, we thought it was the perfect time to lay down our marker for the annual player awards.

Europe, Money, and the Problem with Disparity

American Soccer Analysis has been in the analytics game since 2013, and early on in this project we noticed something that has always troubled us about taking the seminal analytics studies and concepts developed in Europe and applying them to an MLS data set. To put it frankly, they don’t work as well.

Coaches Reward Goalscorers. But Should They?

On March 30, 2019, the 16-year-old midfielder Gianluca Busio came on for Sporting Kansas City in a rout of Montreal. He didn’t do a whole lot in his half hour on the pitch—seven of his eight completed passes went backwards—but in the 78th minute he poked the ball away from a center back and slotted home his team’s sixth goal. The next week Busio was rewarded with a full 90 minutes and he scored again. The week after that, another appearance, a third straight goal. Coach Peter Vermes was sticking with the red-hot kid and it was paying off.

Alas, not all breakthroughs go as smoothly as Busio’s. On July 17, a teenage striker named Theo Bair earned his second career start for Vancouver. He made a couple of promising runs where he held off a New England defender and found a shot from a low cross, but neither chance connected. The first hit the far post and ricocheted out. Two minutes later, Bair reached back for a bouncing pass at the top of the six-yard box but couldn’t quite corral it. The shot sailed over the crossbar from embarrassingly close range and Bair tumbled head over heels into the goal, where he slapped the grass in frustration. He was subbed off, and next game he only appeared for the last 14 minutes.

Tactics, Talent, and Success: Diversity in Scoring and Chance Creation

I’ve been wondering for some time about soccer teams’ reliance on star power and top statistical producers. Is it really a good strategy? Are teams with one main goal scorer or playmaker easier to “figure out”? When the game is on the line, is a singular threat easier to neutralize than a team with a plethora of attacking options? And would this kind of reliance actually hamper a team’s success across a season?

My skepticism must seem foolish to European executives, given the huge fees Gonzalo Higuain and Paul Pogba went for this summer. But the conventional wisdom is different in the American sports landscape. In our most popular sports, one person simply can’t do it all. Here, Defense Wins Championships. The San Antonio Spurs, the best NBA team of the past two decades, emphasize team play over everything. Peyton Manning was completely underwhelming in both of his Super Bowl wins, needing his incredible teams to carry him to glory. One star pitcher or one star hitter is simply not capable of winning a World Series on their own. The anecdotal evidence even appears in MLS. Chris Wondolowski’s 27 goals in 2012 didn’t get the Earthquakes past the first round of MLS playoffs; neutralize MVP Sebastian Giovinco, and 2015’s Toronto FC didn’t have much else to offer.

Do expected goals models lack style?

By Jared Young (@JaredEYoung)

Expected goals models are hip in the land of soccer statistics. If you have developed one, you are no doubt sporting some serious soccer knowledge. But it seems to be consistent across time and geography that the smart kids always lack a bit of style.

If you are reading this post you are probably at least reasonably aware of what an expected goals model is. It tells you how many goals a team should have scored given the shots they took. Analysts can then compare the goals actually scored with the goals a team was expected to score and use that insight to better understand players and teams and their abilities.
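As a minimal sketch of that comparison, assume a model has already assigned each shot a probability of becoming a goal. The probabilities and goal count below are made up for illustration and do not come from any real model or match.

```python
# Hypothetical per-shot goal probabilities for one team in one match.
# These numbers are invented for illustration only.
shot_probabilities = [0.05, 0.12, 0.30, 0.08, 0.45]
goals_scored = 2


def expected_goals(probabilities):
    """Sum the per-shot scoring probabilities to get the team's xG."""
    return sum(probabilities)


def goals_minus_xg(goals, probabilities):
    """Positive values mean the team scored more than the model expected."""
    return goals - expected_goals(probabilities)


xg = expected_goals(shot_probabilities)
diff = goals_minus_xg(goals_scored, shot_probabilities)
print(f"xG = {xg:.2f}, G - xG = {diff:+.2f}")
```

A positive G - xG means the team finished better than the model expected; a negative value means it finished worse.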

The best expected goals models incorporate almost everything imaginable about the shot. What body part did the shooter connect with? What were the exact X,Y coordinates of the shooter? What was the position of the goalie? Did the player receive a pass beforehand? Was it a set piece? All of these factors are part of the model. Like I said, they are really cool.

But as with all models of the real world, there is room for improvement. For example, expected goals models aren’t great at factoring in the number of defenders between the shooter and the goal. More defenders could mean more blocked shots, or could force the shooter into a more difficult shot than they would like. At the opposite end of that spectrum, a shooter may be wide open on a counterattack. The models would likely not recognize that situation and would undervalue the likelihood of a goal being scored. But I may have found something that will help in these instances.

I recently created a score that attempted to numerically define extreme styles of play. On the one end of the score are extreme counterattacking teams (score of 1) and on the other end are extreme possession-oriented teams (score of 7). The question is, if I overlay this score on top of expected goals models, will I find any opportunities like those mentioned above? It appears there are indeed places where looking at style will help.

I have only scored one full MLS season with the Proactive Score (PScore) so I’ll start with MLS in 2014, where I found two expected goals models with sufficient data. There is the model managed here by the American Soccer Analysis team (us!) and there is the publicly available data compiled by Michael Caley (@MC_of_A). Here is a chart of the full season’s average PScore and the difference between goals scored and expected goals scored for the ASA model and Michael Caley’s model.

Both models are pretty similar. If you were to draw a straight-line regression through this data you would find no particular trend. But allowing a polynomial curve to find a best fit reveals an interesting pattern in both charts. When the PScores are below 3, indicating strong counterattacking play, the two models consistently underpredict the number of goals scored. This makes sense given what I mentioned above; teams committed to the counterattack should find more space when shooting and should have a better chance of making their shots. Michael Caley’s model does a better job handling it, but there is still room for improvement.
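To illustrate why a straight line can show nothing while a curve shows a pattern, here is a rough sketch that fits both to a small U-shaped data set. The PScore and G - xG values are synthetic, chosen only to be U-shaped; they are not the 2014 MLS numbers. The `polyfit` helper is a hypothetical stand-in written with the standard library, and the degree-2 curve is an assumption, since the post does not say which polynomial was used.

```python
# Synthetic season averages: (PScore, goals minus expected goals).
# Invented values, shaped like a U on purpose.
pscores = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
g_minus_xg = [6.0, 2.0, -3.0, -4.0, -3.0, 2.0, 6.0]


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + ...
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def sum_squared_error(xs, ys, coeffs):
    """Sum of squared residuals between the data and the fitted polynomial."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


linear = polyfit(pscores, g_minus_xg, 1)
quadratic = polyfit(pscores, g_minus_xg, 2)
linear_error = sum_squared_error(pscores, g_minus_xg, linear)
quadratic_error = sum_squared_error(pscores, g_minus_xg, quadratic)

print(f"linear slope: {linear[1]:.3f}")
print(f"quadratic x^2 coefficient: {quadratic[2]:.3f}")
print(f"squared error, linear vs quadratic: {linear_error:.1f} vs {quadratic_error:.1f}")
```

On data like this the linear slope is essentially zero, while the quadratic term is positive and the squared error drops sharply. In practice a library routine such as `numpy.polyfit` would do the same job in one line.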

It’s worth pointing out that teams that rely on the counterattack tend to be teams that consider themselves to be less talented (I repeat, tend to be). But you would think that less-talented teams would also be teams that would have shooters that are worse than average. The fact that counterattacking teams outperform the model indicates they might also be overcoming a talent gap to do so.

On the other hand, when the PScore is greater than 4, the models also underpredict the actual performance. This, however, might be for a different reason. Usually possession-oriented teams are facing more defenders when shooting. The bias here may be a result of the fact that teams that can outpossess their opponent to that level may also have the shooting talent to outperform the model.

Notice also where most teams reside, between 3 and 4. This appears to be no man’s land; a place where the uncommitted or incapable teams underperform.

Looking at teams in aggregate, however, comes with its share of bias, most notably the hypothesis I suggested for possession-oriented teams. To remove that bias, I looked at each game played in MLS in 2014, home and away, and plotted those same metrics. I did not have Michael Caley’s data by game, so I only looked at the ASA model.

For both home and away games there does appear to be a consistent bias against counterattacking teams. In games where teams produce strong counterattacking PScores of 1 or 2, we see them also typically outperforming expected goals (G - xG). Given that xG models are somewhat blind to defensive density, it would make perfect sense that counterattacking teams shoot better than expected. By design they should have more open shots than teams that play possession soccer. It definitely appears to me that xG models should somehow factor in teams that are playing counterattacking soccer, or they will underestimate goals for those teams.

What’s interesting is that the same bias does not reveal itself as clearly at the other end of the spectrum, as it did in the first graph. When looking at the high-possession teams, the sixes and sevens, the teams’ efficiencies become murkier. If anything, it appears that being proactive to an extreme is detrimental to efficiency (G - xG), especially for away teams. The best-fit line doesn’t quite do the situation justice. When away teams are very possession-oriented, with a PScore of 6 or 7, they actually underperform the ASA xG model by an average of 0.3 goals per game. That seems meaningful, and might suggest that game states are playing a role in confusing us. With larger sample sizes this phenomenon could be explored further, but for now it’s safe to say that when a team plays a counterattacking game, it tends to outperform its expected goals.
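The per-game comparison described above amounts to grouping games by venue and style and averaging G - xG within each group. A rough sketch follows, assuming one row per team per game. The rows, the `style_bucket` helper, and the bucket names are hypothetical; only the cutoffs (PScores of 1 or 2 for counterattacking, 6 or 7 for possession) come from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (venue, pscore, goals, xg). Invented for illustration,
# not taken from the 2014 MLS data.
matches = [
    ("away", 1, 2, 1.1),
    ("away", 2, 1, 0.8),
    ("away", 6, 1, 1.4),
    ("away", 7, 0, 0.9),
    ("home", 2, 3, 1.9),
    ("home", 6, 2, 1.8),
]


def style_bucket(pscore):
    """Label a single-game PScore using the cutoffs mentioned in the text."""
    if pscore <= 2:
        return "counterattacking"
    if pscore >= 6:
        return "possession"
    return "middle"


def average_diff_by_bucket(rows):
    """Average goals minus xG for each (venue, style bucket) pair."""
    grouped = defaultdict(list)
    for venue, pscore, goals, xg in rows:
        grouped[(venue, style_bucket(pscore))].append(goals - xg)
    return {key: mean(values) for key, values in grouped.items()}


averages = average_diff_by_bucket(matches)
for (venue, bucket), value in sorted(averages.items()):
    print(f"{venue:5s} {bucket:17s} {value:+.2f}")
```

With real data, the sample size in each bucket matters as much as the average, which is why the extreme possession games are hard to read.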

Focusing on home teams with high possession over the course of the season, we saw an uptick in goals minus expected goals. But based on the trends we saw from game to game, it doesn’t appear to be the case that possession-oriented teams shoot better because of possession itself. It seems that possession-oriented teams play that way because they have the talent to, and it’s the talent on the team that is driving them to outperform their expected goals.

So should xG models make adjustments for styles of play? It really depends on the goal of the model. If the goal is to be supremely accurate, then I would say that xG models should look at the style of play and make adjustments. However, style is not specific to one shot; it describes an entire game. Will modelers want to overlay macro conditions on their models rather than focus solely on the unique conditions of each shot?

Perhaps the model should allow this bias to continue. After all, it could reveal that counterattacking teams have an advantage in scoring as one would expect.

If the xG models look to isolate shots based on certain characteristics, perhaps they should strive to add data to each particular moment. Perhaps an aggregate overlay on counterattacks would be counterproductive as it would take the foot off the pedal of collecting better data for each shot taken. Perhaps this serves as inspiration to keep digging, keep mining for the data that helps fix this apparent bias. Perhaps it’s the impetus to shed the sweater vest and find an old worn-in pair of boots. Something a little more hip to match the intellect.