Introducing Goals Subtracted: Where you aren’t but you oughta be

By Matthias Kullowatz

While valuations of offensive actions in soccer are by no means perfect, they are still significantly more accurate and meaningful than how we evaluate defensive actions and players’ defensive contributions. In a challenge-accepted moment of weakness, we took a stab at better assigning a Goals Added (g+) equivalent for defense: g- (“g minus”). What we’re about to share won’t blow your mind so much as reinforce just how hard it is to quantify the value of an individual’s defensive actions, but hopefully I can also entertain you down this rabbit hole we’ve been playing around in for more than a year.

At the team level, evaluating defensive efficacy is actually not that hard. Expected goals allowed (xGA) or g+ allowed do a pretty good job of ranking teams’ defensive prowess in a meaningful way that predicts their future defensive performance. But how do you take all the goals, xG, or g+ allowed and allocate it to individual players? It’s clear that Interrupting goals added–the value gained by making defensive actions–falls well short here. Tiotal Football had this to say when g+ came out in 2020:

“When a player interrupts or stops a +0.030 scenario by tackling the ball away to generate a +0.005 net goal difference scenario for their own team, the model awards +0.035 [Interrupting g+] to the defender (the change in the two game states). But what about all the times the player was not in position to intercept the pass or make the tackle or block the shot? And what about the fact that the defender is part of a defensive unit that allowed its opponents to move the ball [into the] +0.030 net goal scenario in the first place? It is a problem.”

Goals minus (g-) in theory seems like a pretty good idea, so I’ll spend a few thousand words here explaining how we tackled it, what was hard, and what the results look like today.

Allocation methodology

At its core, g- is the allocation of a team’s g+ allowed to individual players. When the opposition splits the center backs with a through ball that puts their striker in on goal, that’s a lot of value the defense just gave up without any interrupting action for us to measure. Clearly the center backs in that example carry some responsibility and should be penalized, but the defensive midfielder(s?) may also be partially at fault for not pressuring the ball enough to deny such a pass. Other people probably screwed up, too! Some combination of the players defending that pass needs to take the blame, and g- is essentially the assignment of that blame. One caveat is that we don’t have tracking data, so we don’t actually know the center backs got split. As you’ll see, an on-ball-events methodology must revolve around estimating which sets of players should have been defending which actions, and how to apportion the blame. (Somewhere down there I’ll also rant about how tracking doesn’t fully solve this issue.)

Zones

We opted to evaluate fixed, rectangular zones discretely—not because this is clearly the best approach, but because using individual players’ customized convex hulls (or whatever) seemed significantly more difficult. If we’re allocating the g+ allowed in the back left corner, clearly the left back should be more responsible than the right winger. Our method should assign a greater portion of the 100% total responsibility to the left back (and that’s what it does). The allocation math details are below, but an important question for this section is how many zones to use and where to draw them. Too many small zones lead to small-sample issues in each zone, while too few large zones aren’t specific enough to distinguish between adjacent defensive positions (e.g., left back and left center back). For this draft, we opted to use a 30-zone grid, as we do in the app, starting with zone #1 in the right back’s defensive corner and ending with zone #30 containing the left forward/winger’s corner flag.
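
To make the grid concrete, here is a minimal sketch of mapping an event location to one of the 30 zones. The coordinate conventions, the grid dimensions (five zones across by six zones long), and the function itself are assumptions for illustration, not the app’s actual definition.

```python
N_COLS = 5   # assumed: 5 zones across the width of the pitch
N_ROWS = 6   # assumed: 6 zones along the length of the pitch

def zone_index(x: float, y: float) -> int:
    """Map an event location to a zone number from 1 to 30.

    Illustrative conventions: x runs 0-100 from a team's own goal line to
    the opponent's, and y runs 0-100 from the right touchline to the left
    (from the defending team's point of view), so zone 1 sits in the right
    back's defensive corner and zone 30 in the left winger's attacking corner.
    """
    col = min(int(y / (100 / N_COLS)), N_COLS - 1)  # 0..4 across the width
    row = min(int(x / (100 / N_ROWS)), N_ROWS - 1)  # 0..5 up the pitch
    return row * N_COLS + col + 1

print(zone_index(2, 3))    # deep defensive right corner -> zone 1
print(zone_index(99, 99))  # attacking left corner -> zone 30
```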

Positions

When a player is on the field, they will be responsible for various zones. That is, except for goalkeepers, who were left out of this iteration because they play defense in a very different way than any other position. For all other positions, a player’s allocation of responsibility is based on where their general position tends to play, and also on where their teammates’ positions tend to play. To the degree that certain positions are specific to certain formations, this approach takes formation into account. However, we are not directly taking formation into account, nor are we trying to granularly assess each player’s role. When we ask “where should this player be defending?” we answer with “wherever people of their position tend to play.”

There are potential weaknesses to this methodology–some fullbacks in a 4-4-2 may be regular contributors to the attack (think Carson Pickett or Pedro Santos), whereas others may play a defense-first role (e.g., Alex Roldan or Reggie Cannon). Should the Picketts of the world get more leeway defensively if their role is to play more attacking than the average fullback? Should the Roldans be held to a higher standard for the opposite reason? Perhaps, but if the model is working properly then they should offset their g- debts with g+ credits. Just because the model evaluates players of a position in the places it is most commonly played doesn’t mean that’s the best way to play the position… but for the purposes of our model it does.

Patterns and Phases of Play

If an alien were to drop down and watch a soccer game, they might note that there are two different games being played: one when all the humans and the sphere are moving (open play!) and one when most humans are stopped, waiting for one human to kick the sphere (set pieces!). For this analysis, we focused only on open play, defined by any of the following conditions:

  • The possession began via a defensive action with no stoppage of play, or

  • The possession is more than 10 possessing actions removed from a corner kick, or

  • The possession is more than five possessing actions removed from a goal kick or throw-in, or

  • The possession is more than 10 possessing actions removed from any other sort of free kick

In some cases, I may have been a little strict here. Arguably, the game settles into open play faster than five actions after a throw-in, for example, but I think these conditions represent a reasonably conservative definition for this first iteration. This filter left us with slightly more than 50% of all actions in the dataset.
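
Here is one way that filter might look in code, under one reading of the conditions above (each condition keyed to how the possession began). The DataFrame column names and category labels are hypothetical, not our actual schema.

```python
import pandas as pd

def open_play_mask(actions: pd.DataFrame) -> pd.Series:
    """Flag actions that count as open play under the filters above.

    Hypothetical columns:
      - possession_start_type: how the possession began ('defensive_action',
        'corner', 'goal_kick', 'throw_in', 'free_kick', ...)
      - actions_since_restart: possessing actions since the restart that
        began the possession
    """
    start = actions["possession_start_type"]
    n = actions["actions_since_restart"]

    return (
        (start == "defensive_action")
        | ((start == "corner") & (n > 10))
        | (start.isin(["goal_kick", "throw_in"]) & (n > 5))
        | ((start == "free_kick") & (n > 10))
    )

# Usage: keep only open-play actions
# open_play_actions = actions[open_play_mask(actions)]
```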

Related to the pattern of play is the phase of play—buildup vs. counter, for example—which can affect where a player’s defensive responsibilities lie (whether they are attacking and about to have to start defending, or whether they are already defending). For this analysis we avoided breaking the game down further into phases of play, but we had some ideas for what this might look like.

  1. The first idea revolves around labeling the dataset with phases. This would likely require a few someones to watch tape and manually label some reasonable sample of possessions–or sub-possessions–as various phases of play, followed by training an ML algorithm to learn from context clues in the data to identify those phases, and then using that ML model to label the rest of the data. 

  2. The second idea we discussed is more of a heuristic. We would break the game down further by features like the length of the prior possession, where the prior possession ended, and the length of the current possession. Presumably that would effectively classify a decent percentage of the phases that we would otherwise spend much more time labeling in idea #1. But instead of doing it, we just talked about it and probably also made fun of the Seattle Sounders, and thus that segmentation across phases of play was not performed for this analysis.

Gamestates

Clearly players and teams play differently when they are down a goal in the 89th minute versus when they are up a goal at the same minute mark. Halves and scorelines may be the largest drivers of where players “should be,” so we further broke down a position’s responsibility (or allocation) based on half and scoreline—i.e., gamestate. We did not take into account player differentials due to red cards here, but that seems like a perfectly reasonable segment to consider. The primary issue there is small sample size, but because we are averaging across all players within a position, we have some room to segment down further.

Math

We start by applying the filters and segments above to every position across all thirty zones to derive a frequency distribution of where that position plays. One final, completely arbitrary adjustment we made is to weight defensive actions three times greater than offensive actions in determining where positions play. We are trying to suggest where a player should cover defensively, but it seems that some input from where the player plays offensively can provide information about where they should be defending. Here, as an example, is the frequency distribution of touches for defensive midfielders in 2021.

Weighted fraction of touches by zone for DMs, 2021
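
A minimal sketch of that weighted frequency calculation, assuming a table of one position’s open-play actions with hypothetical 'zone' and 'is_defensive' columns:

```python
import pandas as pd

DEF_WEIGHT = 3.0  # defensive actions count three times as much as offensive ones

def position_zone_frequencies(touches: pd.DataFrame) -> pd.Series:
    """Weighted fraction of touches by zone for a single position.

    Expects one row per open-play action by players at that position,
    with columns 'zone' (1-30) and 'is_defensive' (bool). Returns a
    Series indexed by zone that sums to 1.0.
    """
    weights = touches["is_defensive"].map({True: DEF_WEIGHT, False: 1.0})
    by_zone = weights.groupby(touches["zone"]).sum()
    return by_zone / by_zone.sum()
```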

Next, we broke each game down into segments separated by goals, substitutions, and halftime timestamps. This gave us segments where the same set of players was on the field the entire time, so we could more easily divide up debits among fixed units. That’s important, because the goal here was to take the global positional responsibility allocations from above and apply them granularly to the players on the field during specific segments of each game. During each game segment, we adjusted each player’s responsibility relative to their teammates’ positional expectations. Because we later needed to allocate exactly 100% of the g+ allowed in each zone during that game segment, we normalized each zone’s allocations to add up to 100%.
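
A rough sketch of that segmentation step, assuming a hypothetical per-match events table with 'minute' and 'event_type' columns:

```python
import pandas as pd

def game_segments(events: pd.DataFrame) -> list[tuple[float, float]]:
    """Split one match into segments bounded by goals, substitutions, and halftime.

    Within each returned (start_minute, end_minute) interval, the same
    players are on the field and the scoreline does not change.
    """
    total_minutes = float(events["minute"].max())
    boundary_types = {"goal", "substitution", "halftime"}
    cuts = sorted(
        set(events.loc[events["event_type"].isin(boundary_types), "minute"].astype(float))
        | {0.0, total_minutes}
    )
    return [(start, end) for start, end in zip(cuts[:-1], cuts[1:]) if end > start]
```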

The normalization math here is actually pretty simple: within a zone, add up all the team’s positions’ frequency percentages in that zone (among the positions on the field at that time), and multiply by whatever factor makes the sum equal 100%. In a very simple version of soccer with only two zones and two defenders, suppose the first defender’s frequencies were 25% in Zone One and 75% in Zone Two, and the second defender’s were 60% and 40%. The second zone is more popular (75% + 40% = 115%), so for that zone we divide both players’ individual frequencies by 1.15 to get about 65% and 35%, respectively. For the first zone, we divide both individual frequencies by 60% + 25% = 85% to get about 30% and 70%, respectively. Now each zone has exactly 100% of its responsibility allocated to the players on the field at that moment, according to where players at those positions tend to play. This is crucial for allocating all of a team’s g+ allowed to its players in a meaningful way, leaving us with player g- that sums to team g-.
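
Here is that normalization as a short sketch, reproducing the two-zone, two-defender example above (the function and its inputs are illustrative, not our production code):

```python
def normalize_zone_shares(freqs: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale each zone's frequencies so they sum to 100% across the players on the field."""
    n_zones = len(next(iter(freqs.values())))
    zone_totals = [sum(f[z] for f in freqs.values()) for z in range(n_zones)]
    return {
        player: [f[z] / zone_totals[z] for z in range(n_zones)]
        for player, f in freqs.items()
    }

# The two-zone, two-defender example from above:
shares = normalize_zone_shares({
    "defender_1": [0.25, 0.75],   # Zone One, Zone Two
    "defender_2": [0.60, 0.40],
})
print(shares["defender_1"])  # [~0.29, ~0.65]
print(shares["defender_2"])  # [~0.71, ~0.35]
```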

Why, in the first positional weighting step, would we use each position’s frequency percentages instead of something like touches per minute? By first normalizing at the position level to frequencies summing to 100%, we don’t hand all the responsibility to the defenders who touch the ball the most. That approach would overburden defenders with really high responsibility allocations. While defense is literally in their job title, the g- “penalties” just didn’t look reasonable when I checked that out. More on this later, but as an example of where that went wrong, attacking players who track back to get into the mix ended up losing more value simply by being in more places more often. That does not seem like what we want to do here if we can help it.

Even though the math itself is not particularly fancy—you can do it all on a four-function calculator—it’s easy to get lost in what’s really going on. I’ve been lost for a while. So let’s take a look at an example. Sticking with our defensive midfielders, here’s Diego Chara’s 2019 season of average allocation of defensive responsibility by zone and gamestate. Note that nothing constrains a single player’s allocations to add to 100%. Effectively this means that positions that make more defensive actions will still tend to take on larger shares of the defensive responsibility than their teammates in other positions.

Average allocation of defensive responsibility for Diego Chara, 2019

g- Calculation

The final step is to factor in the amount of g+ allowed within game segments. Recall that game segments are separated by goals, substitutions, and halftime timestamps, so we can line up zonal responsibility—derived from the various gamestate combinations, as seen above—with each game segment and allocate the g+ allowed to all defenders as g-. If the right center back has 35% responsibility for zone 7 (the zone bordering the top-right of the 18-yard box), then they’ll be debited 35% of the g+ that flowed through that zone during that game segment. We constructed the responsibilities to sum to 100% in each zone so that 100% of team g+ allowed gets allocated to individuals.
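
For a single game segment, the allocation might look like this sketch, where the inputs are a hypothetical player-by-zone responsibility table (each zone column summing to 100%) and the team’s g+ allowed by zone during that segment:

```python
import pandas as pd

def allocate_g_minus(
    responsibility: pd.DataFrame,  # rows = players on the field, columns = zones; each column sums to 1.0
    g_plus_allowed: pd.Series,     # g+ allowed by the team in each zone during this game segment
) -> pd.Series:
    """Debit each player their share of the team's g+ allowed in a segment.

    Returns each player's g- for the segment, expressed as a negative number
    to match the sign convention in the tables below.
    """
    # A player's debit in a zone is their responsibility share times the g+
    # that flowed through that zone; sum across zones for the segment total.
    # Example: 35% responsibility for zone 7 and 0.20 g+ allowed there
    # means a debit of 0.35 * 0.20 = 0.07 from that zone alone.
    return -(responsibility * g_plus_allowed).sum(axis=1)
```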

Because allocations are derived from league-wide positional locations, the allocation portion of this current iteration of g- is not really specific to the team at all–well, outside of some formations having two attacking midfielders or three center backs or something like that, but that wouldn’t make a huge difference. The point I want to make is that the g+ allowed is specific to the team and the players on the field during any particular game segment. So if a team is consistently getting beaten up one wing, then the back on that side will take a larger hit, which seems fair.

Tracking data

When those center backs got split in our earlier example, tracking data could have told us who they were and how close they were to the ball, and a model using tracking data might even have been able to estimate what proportion of center backs would have made that play, or something like that. But who should have made that pass impossible to complete by pressuring the ball? The point I’m trying to make is that tracking data doesn’t simply solve this whole problem of defensive responsibility, and I think much of the methodology and thinking presented in this article would apply similarly to tracking data. How g+ allowed is allocated to individual players–separating open play from set pieces, constructing and segmenting by phases of play–all of these concepts would still need to be explored with tracking data.

Attackers’ Defensive Value

Attackers don’t tend to defend high-value regions. The g+ allowed on passes between a goalkeeper and center back, for example, is worth virtually nothing. Thus, this whole approach doesn’t penalize attackers much, because their highest-allocation zones are also worth the least in g+ allowed. From the team’s perspective, the value of defending your attacking half is not about stopping gobs of g+ from flowing through, but more about earning high-value turnovers, starting your possessions in an advanced region of the field, and occasionally poaching a dangerous counter inside your opponent’s half. Attackers who press well should get some decent Interrupting g+ for their work, and on the side, g- will reward them a little for forcing clearances and passes out of bounds.

But this leaves attackers—who already score the highest as a position in offensive g+—with some of the best defensive scores as well (the sum of Interrupting g+ and g-). Using the magical handwaving trick we employed with g+, we normalized g- scores by nominal position to get g- above average and g- above replacement. The results of that normalization are shown in the Results section.
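
That positional normalization could look something like this sketch for the above-average flavor (the column names are assumptions, and the above-replacement version would simply use a different baseline):

```python
import pandas as pd

def rating_above_positional_average(seasons: pd.DataFrame) -> pd.Series:
    """Center each player-season's defensive rating on its position's average.

    Assumes hypothetical columns 'position', 'minutes', and 'def_rating_per96'
    (g- plus Interrupting g+ per 96 minutes). The positional average is
    minutes-weighted so short stints don't dominate it.
    """
    weighted = seasons["def_rating_per96"] * seasons["minutes"]
    pos_avg = (
        weighted.groupby(seasons["position"]).sum()
        / seasons["minutes"].groupby(seasons["position"]).sum()
    )
    return seasons["def_rating_per96"] - seasons["position"].map(pos_avg)
```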

Results

When we combine (raw, position-agnostic) interrupting g+ with g-, we see that the best seasons from 2019 and 2021 come from a wide variety of positions…except center backs. Looking at the worst seasons from 2019 and 2021, they are all center backs. Clearly removing your center backs and adding two strikers is not a valid strategy, so something is missing here, and I’m still not sure what.

Best player-seasons by raw combined defensive rating, 2019 and 2021
Player Team Season Minutes Position g-/96 Int. g+/96 (g- + Int. g+)/96
Latif Blessing LAFC 2019 2957 RCM -0.058 0.072 0.014
Claudio Bravo POR 2021 2242 LB -0.098 0.108 0.010
Dru Yearwood NYRB 2021 1851 DM -0.058 0.066 0.008
Judson SJE 2019 2068 DM -0.097 0.105 0.008
Jordan Harvey LAFC 2019 2560 LB -0.059 0.064 0.004
Déiber Caicedo VAN 2021 2449 F -0.051 0.053 0.002
Cristian Dájome VAN 2021 2890 F -0.048 0.049 0.001
Julian Araujo LAG 2021 3056 RB -0.084 0.084 0.000
Mark-Anthony Kaye LAFC 2019 2730 LCM -0.060 0.059 -0.001
Worst player-seasons by raw combined defensive rating, 2019 and 2021
Player Team Season Minutes Position g-/96 Int. g+/96 (g- + Int. g+)/96
Steve Birnbaum DCU 2021 1773 RCB -0.146 0.008 -0.138
Botond Baráth SKC 2019 1915 RCB -0.131 -0.007 -0.138
Bill Tuiloma POR 2019 2138 RCB -0.190 0.049 -0.141
Omar González TOR 2021 2510 RCB -0.188 0.046 -0.141
Matt Besler ATX 2021 1938 LCB -0.169 0.023 -0.146
Giancarlo Gonzalez LAG 2019 1602 LCB -0.207 0.060 -0.147
Aljaz Struna HOU 2019 2883 RCB -0.156 0.000 -0.157
Nick Hagglund CIN 2019 1982 RCB -0.253 0.080 -0.173
Tommy Smith COL 2019 2659 LCB -0.216 0.038 -0.178
Julio Cascante POR 2019 1610 LCB -0.192 0.000 -0.192

Unsurprisingly, when we aggregate by position we see center backs getting hit the hardest. While center backs do record more Interrupting g+ on average than other positions, it’s not enough to make up for the large disparities in g+ allowed (i.e., g-).

Position Minutes Int. g+ g- g-/96 Int. g+/96 (g- + Int. g+)/96
RCM 81,822 39.28 -70.22 -0.08 0.05 -0.04
LB 113,603 55.56 -99.35 -0.08 0.05 -0.04
LCM 69,375 31.95 -59.24 -0.08 0.04 -0.04
RM 60,766 18.94 -43.45 -0.07 0.03 -0.04
DM 159,495 87.94 -152.53 -0.09 0.05 -0.04
LM 59,008 16.25 -42.49 -0.07 0.03 -0.04
RB 137,365 69.93 -131.03 -0.09 0.05 -0.04
F 181,660 39.00 -120.55 -0.06 0.02 -0.04
AM 78,336 22.48 -58.19 -0.07 0.03 -0.04
CB 27,422 16.83 -34.43 -0.12 0.06 -0.06
LCB 142,480 86.40 -202.90 -0.14 0.06 -0.08
RCB 142,308 80.84 -202.44 -0.14 0.05 -0.08

So, we employ the hand-wavey trick of simply normalizing every player’s defensive rating around their position’s average. Now check out the top 10 players by normalized defensive rating. There are six center backs in the top 10, but this makes sense: center backs have the widest range of bad-to-good raw g-, so when we center their ratings around positional averages, their most extreme representatives are going to land in both the top and bottom 10s. Indeed, though not shown here, eight of the bottom 10 in normalized defensive rating are center backs. But now every position has a chance to be best or worst!

Best player-seasons by normalized defensive ratings, 2019 and 2021
Player Team Season Minutes Position g-/96 Int. g+/96 (g- + Int. g+)/96
Yeimar Gómez Andrade SEA 2021 2932 RCB 0.034 0.040 0.074
Alexander Callens NYC 2021 2293 LCB 0.024 0.038 0.062
Marco Farfan LAFC 2021 1934 LCB 0.052 0.005 0.056
Jack Elliott PHI 2021 3289 LCB 0.020 0.037 0.056
Latif Blessing LAFC 2019 2957 RCM 0.024 0.026 0.050
Eddie Segura LAFC 2019 3246 LCB 0.038 0.011 0.049
Dru Yearwood NYRB 2021 1851 DM 0.034 0.013 0.047
Andy Najar DCU 2021 2093 RCB 0.052 -0.005 0.047
Judson SJE 2019 2068 DM -0.005 0.052 0.047
Claudio Bravo POR 2021 2242 LB -0.014 0.061 0.047

Conclusion

Evaluating defense is hard, whether you’re doing it with your eyes or your spreadsheets. g- (“g minus”) is our attempt to better evaluate defense using the common currency of goals, allocated to players all over the field. Goals Added gave us a tool to quantify the value of actions all over the field, and here we inverted those values to debit defenders rather than credit attackers. There’s a nice symmetry to it, but did it work? 

I think there are some positives to be taken from the Results section before we rehash the clear shortcomings. In the normalized defensive ratings–i.e., the combination of g- and Interrupting g+–we got players on good defensive teams. This was expected, because we know that player g- sums to team g+ allowed (basically), and it’s a good thing. Good defensive teams are made up of good defensive players–or at least, players who are so good offensively that it becomes good defense. I actually kind of like that it might be valuing that, too. 

But we also got Bravo from Portland’s 2021 team, 3rd worst by team g+ allowed that season, and Judson from San Jose’s 2019 team, 9th worst by team g+ allowed. These players go against the grain, finding a way to perform well here on bad teams. Yes, their strengths were their interrupting prowess, not necessarily their ability to prevent g+ from flowing through their zones, but I think that was exactly what Tiotal Football was getting at in the introduction. Interrupting g+ is mostly meaningless unless you can control for the opportunities the player had to interrupt in some way–like a denominator to help control the numerator. On the offensive side, if a player completes 100 passes per game, what does it matter if he turns it over 100 times, as well? Passes attempted, and the more advanced expected passes completed, become that control. I think maybe we have created something that can help control for otherwise unmitigated Interrupting g+! 

There were also many shortcomings and challenges presented throughout this deep dive into g-. At its core, determining defensive responsibility is hard no matter how much data you have. Was that player actually covering for another who deserves the blame? Was the attacker out of position because their role is to, well, attack? For this metric to be a meaningful complement to g+ and overall value, should we even care what their role is? Tracking data does not magically solve any of this.

Looking ahead, how are we going to evaluate set pieces? Is there a better approach for debiting attacking players without relying on the positional normalization trick? Should we include goalkeepers? At this point I’m just full of questions with few answers. In the end, it seems like this iteration of g- could be useful, while still falling well short of a silver bullet for defensive evaluation.