Using k-means to learn what soccer passing tells us about playing styles

By Cheuk Hei Ho (@tacticsplatform) and Eliot McKinley (@etmckinley), colloquially known as CheuKinley

When you talk about a soccer team, you almost always talk about its style: high-pressing, possession-heavy, parking-the-bus, etc. A team’s style not only signifies how it plays on the field but also reflects its coaching. Since there are no agreed guidelines for defining a team's style, everyone uses their own rules, and we can't directly compare one another's descriptions.

An accurate quantitative description of style is needed. It can help analysts properly evaluate not only an opponent but also their own team. With an accurate method to describe style, one can scientifically evaluate whether a training exercise serves its purpose. We have previously used a dimension-reduction technique, t-SNE, to find MLS teams with similar styles based on the spatial distribution of activities and pass networks. This time we use a different method, k-means clustering of pass types, to quantitatively measure a team’s style, its tactical specialization, and the influence of coaching on its system.

K-means clustering of passes

We used k-means clustering of pass types to quantify the styles of the teams in MLS. K-means clustering is a machine learning algorithm that separates data points into a user-selected number (k) of clusters based on their similarities. If you think that two clusters define the groups you want, you choose k=2; if you think it is 10, you choose k=10. In our case, after using the elbow method and visual inspection, we chose to classify passes into 64 different groups based on how and where passes were made. We want to note that k-means clustering has been used many times before to describe passing behavior in soccer (and we used it, in part, to classify player positions). We extended previous work by using z-scores to standardize the quantification of each pass group. Then, by filtering pass clusters on their z-scores, we can find characteristic pass patterns for every team.
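To make the clustering step concrete, here is a minimal sketch of k-means and the elbow check, written with the Python standard library only. It is not our production code: the pass features (start and end coordinates), the three synthetic pass "prototypes", and the function names are all assumptions chosen for illustration, and a library implementation such as scikit-learn's KMeans would normally be used instead. The real analysis used k=64; the toy data here only needs a handful of clusters.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random passes
    labels = None
    for _ in range(n_iter):
        # Assignment step: each pass joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Inertia: total squared distance from each pass to its centroid.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Hypothetical passes as (start_x, start_y, end_x, end_y), generated as
# noisy copies of three made-up pass prototypes.
data_rng = random.Random(42)
prototypes = [(20, 34, 35, 34), (50, 10, 70, 60), (85, 60, 95, 40)]
passes = [
    tuple(data_rng.gauss(v, 3.0) for v in proto)
    for proto in prototypes
    for _ in range(100)
]

# Elbow method: compute inertia for a range of k and look for the point
# where adding more clusters stops reducing it substantially.
inertias = {k: kmeans(passes, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting the printed inertia against k and looking for the bend in the curve is the "elbow"; visual inspection of the resulting clusters is still needed to settle on a final k.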

This visualization combines the features of both a pass network and a touch heatmap. It shows which areas a team uses the most and what types of passes it uses to reach them. For example, last season, Atlanta used long horizontal passes to stretch the opponent, while Kansas City camped outside the opponent’s box with its possession dominance. By plotting distinctive pass types this way, we can also see how a team evolves under a coach. For instance, Tata Martino had clearly instructed Atlanta on how to play out from the back, but it was a work in progress in the first year. They got the build-up part right but had trouble transitioning into the attack. With another full season to practice, they exploded into one of the best offensive teams in MLS history in their second season.

By varying the z-score threshold used to filter the data, you can also look at the under-represented pass types and choose the degree of representation. In 2018, Columbus did not utilize long passes out of the back often, LAFC was less likely to cross from the flanks, and Portland didn’t pass from central locations back towards their own goal.
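The standardizing and filtering steps can be sketched as follows. This is an illustration under assumptions, not our exact pipeline: the team names and pass counts are made up, the z-score is taken on each team's share of passes per cluster (rather than raw counts), and the population standard deviation is used.

```python
from statistics import mean, pstdev

# Hypothetical pass counts per team and cluster id.
counts = {
    "Team A": {0: 50, 1: 30, 2: 20},
    "Team B": {0: 30, 1: 30, 2: 40},
    "Team C": {0: 30, 1: 40, 2: 30},
    "Team D": {0: 30, 1: 20, 2: 50},
}

# Share of each team's passes that fall in each cluster.
shares = {
    team: {c: n / sum(by_cluster.values()) for c, n in by_cluster.items()}
    for team, by_cluster in counts.items()
}

# Z-score each cluster's share across teams.
clusters = sorted({c for by_cluster in counts.values() for c in by_cluster})
z_scores = {team: {} for team in counts}
for c in clusters:
    column = [shares[team].get(c, 0.0) for team in counts]
    mu, sigma = mean(column), pstdev(column)
    for team in counts:
        value = shares[team].get(c, 0.0)
        z_scores[team][c] = (value - mu) / sigma if sigma > 0 else 0.0

# Filter: keep (team, cluster) pairs beyond the chosen threshold.
threshold = 1.0
over = {(t, c) for t, zs in z_scores.items() for c, z in zs.items() if z >= threshold}
under = {(t, c) for t, zs in z_scores.items() for c, z in zs.items() if z <= -threshold}

print("over-represented:", sorted(over))
print("under-represented:", sorted(under))
```

Raising the threshold keeps only the most extreme clusters for each team; lowering it shows a fuller but noisier picture.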

Tactical specialization

Using z-scores gives us not only a standardized score for the degree of representation of each pass cluster but also a quantitative measure of a team’s tactical specialization. Each z-score measures how differently a team uses one type of pass compared with everyone else. If we take the median of the absolute values of the z-scores per team (absolute values because both over- and under-representation equate to specialization; thanks for the idea, Dummy Run), we approximate how different a team is from everyone else.
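As a small worked example of that calculation, the sketch below takes the median of the absolute z-scores for two made-up teams. The team names and z-score values are hypothetical, and each team has only five clusters instead of 64.

```python
from statistics import median

# Hypothetical per-cluster z-scores for two teams.
z_scores = {
    "Team A": [2.1, -1.8, 0.3, -2.5, 0.9],   # strongly over- and under-represented clusters
    "Team B": [0.2, -0.1, 0.4, -0.3, 0.6],   # close to the league average everywhere
}

# Specialization: median of the absolute z-scores, so that over- and
# under-representation both count towards being "different".
specialization = {
    team: median(abs(z) for z in zs) for team, zs in z_scores.items()
}

for team, score in specialization.items():
    print(team, score)
```

A median of absolute values cannot be negative, while the table at the bottom of this post contains negative scores, so the published numbers were presumably standardized once more across all team-seasons; that extra step is not shown here.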

Specialization does not necessarily mean a team is good or bad. There is only a weak, but significant, correlation between specialization and expected goal difference (R = 0.24, p = 0.007). In fact, two of the most specialized teams (>99th percentile) in the last seven years are New York City FC in 2016 and Colorado in 2018. Their most over-represented pass types are those that didn’t cross the halfway line. They basically specialized in not passing forward: a non-ideal method of winning games, to say the least. The full table of specialization scores is at the bottom of this post.

The specialization score confirms some eye tests and refutes others. For example, the New York Red Bulls are believed to be the most distinctive franchise in MLS. The top five most distinctive teams from the last seven years include three Red Bulls sides, all under the supervision of Jesse Marsch (and Chris Armas last year). In contrast, many pundits believe that Columbus Crew under Gregg Berhalter played with a unique style. However, their specialization scores suggest that they have been less specialized than most teams in the last four seasons. These are good examples of how an objective measure of style can help judge whether our subjective opinion stands.

Coaching influences tactical systems

The specialization score only tells us whether a team is different from everyone else; it doesn’t tell us whether two teams are similar to each other. Two teams can have very similar specialization scores yet be specialized in different ways. Quantifying the way two teams play can tell us how a coaching change or player turnover affects a team's play style.

To quantify the similarity of a team's play style in two consecutive seasons, we calculate the Euclidean distance between the two seasons' z-scores across all clusters. We then standardize the resulting distances with another z-score and calculate a percentile to show how the change between two years compares with every other transition in the last seven seasons (note: above 50 is a greater-than-average difference, below 50 a less-than-average difference):

[Figure: seasons.png]
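The year-over-year comparison described above can be sketched like this. The team profiles are made-up two-cluster vectors, and converting the standardized distance to a percentile through the normal distribution is an assumption on our part for this example (it is consistent with 50 marking the average change, but a rank-based percentile would also be a reasonable choice).

```python
import math
from statistics import NormalDist, mean, pstdev

# Hypothetical per-cluster z-score vectors by team and season.
profiles = {
    "Team A": {2016: [0.0, 0.0], 2017: [3.0, 4.0], 2018: [3.0, 5.0]},
    "Team B": {2017: [1.0, 1.0], 2018: [1.0, 4.0]},
}

# Euclidean distance between a team's profiles in consecutive seasons.
distances = {}
for team, by_season in profiles.items():
    seasons = sorted(by_season)
    for prev, curr in zip(seasons, seasons[1:]):
        if curr == prev + 1:  # only compare back-to-back seasons
            distances[(team, prev, curr)] = math.dist(by_season[prev], by_season[curr])

# Standardize the distances across all transitions, then map each z-score
# to a percentile with the standard normal CDF (assumption, see above).
values = list(distances.values())
mu, sigma = mean(values), pstdev(values)
percentiles = {
    key: 100 * NormalDist().cdf((d - mu) / sigma if sigma > 0 else 0.0)
    for key, d in distances.items()
}

for key, pct in percentiles.items():
    print(key, round(distances[key], 2), round(pct, 1))
```

In this toy example, Team A's 2016-2017 transition is the largest change and lands above 50, its 2017-2018 transition is the smallest and lands below 50, and Team B's change equals the average and sits at 50.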

A coaching change seems to be the strongest driver in the evolution of play style. Even though the New York Red Bulls are the most distinctive franchise in MLS, their style has been consistent under Marsch since 2016. Large changes in style were seen in Columbus when Gregg Berhalter took over from Robert Warzycha (2013-2014), at NYCFC in the transition from Jason Kreis to Patrick Vieira (2015-2016), and in New England in Brad Friedel's first season (2017-2018) after years of below-average change under Jay Heaps. However, coaching changes don’t always bring change: Portland, San Jose, and LA Galaxy showed less-than-average change when moving to new coaches. Interestingly, since 2015, SKC has shown increased year-over-year differences under Peter Vermes, while Ben Olsen and Pablo Mastroeni showed wild swings from year to year during their respective tenures at DC United and Colorado.

Conclusion

Our next step will be to link our quantitative measurement of style to some form of performance index. For example, some teams may predominantly use a pass type, but at a low success rate. In that case, a coach may want to decide how important that cluster is to the team’s function. He or she may want to introduce a new training regimen to improve the performance of that pass type, use different players in those positions, or even alter the pass routes to bypass it. We can also look at the outcome of a style by linking pass clusters with the pass chain concept and rating them with Expected Goal Chain. This way, we can find the groups of passes that produce the most damage for any team. Imagine three linked forward pass clusters in which the middle cluster is under-represented and sandwiched between two over-represented ones. Immediately you will know that the under-represented cluster is the weakest link; your team may be using other actions such as dribbles or carries to move the ball through that area. The coach may want to instruct the players to pass more in that area than they currently do. The opponent’s coach may want to target that area or player.

Applications like these are just the tip of the iceberg in how this type of analysis can help coaching. They can provide “actionable insights”, the holy grail of soccer analytics.

Below: Over- and under-represented pass clusters for every team in each MLS season since 2013.

Season  Team              Specialization Score  Rank
2013    Chicago            1.47                 10
2013    Colorado           0.08                 48
2013    Columbus           1.41                 12
2013    DC United         -0.99                 107
2013    FC Dallas         -0.35                 71
2013    Houston           -0.99                 108
2013    Kansas City       -0.77                 98
2013    L.A. Galaxy        0.43                 34
2013    Montreal           1.54                 9
2013    New England        0.57                 29
2013    New York           0.27                 40
2013    Philadelphia      -0.06                 53
2013    Portland          -0.35                 70
2013    Salt Lake         -0.74                 96
2013    San Jose           0.53                 31
2013    Seattle           -0.22                 65
2013    Toronto           -0.30                 68
2013    Vancouver          0.22                 44
2013    Chivas             1.04                 17
2014    Chicago           -0.70                 92
2014    Colorado          -0.77                 99
2014    Columbus          -0.11                 59
2014    DC United         -0.46                 80
2014    FC Dallas         -0.71                 93
2014    Houston           -0.46                 79
2014    Kansas City       -1.18                 112
2014    L.A. Galaxy        1.78                 7
2014    Montreal          -0.40                 76
2014    New England        1.67                 8
2014    New York           0.26                 41
2014    Philadelphia       1.13                 16
2014    Portland          -0.57                 84
2014    Salt Lake         -1.41                 121
2014    San Jose           0.05                 49
2014    Seattle            0.39                 36
2014    Toronto           -0.68                 91
2014    Vancouver         -0.93                 104
2014    Chivas            -0.72                 94
2015    Chicago           -1.28                 116
2015    Colorado          -1.14                 109
2015    Columbus           2.17                 6
2015    DC United          0.12                 47
2015    FC Dallas         -0.53                 83
2015    Houston           -0.10                 57
2015    Kansas City       -1.43                 123
2015    L.A. Galaxy       -1.20                 113
2015    Montreal          -0.61                 87
2015    New England        1.19                 15
2015    New York           0.39                 35
2015    New York City FC  -0.39                 75
2015    Orlando City      -0.33                 69
2015    Philadelphia      -0.07                 54
2015    Portland           0.63                 27
2015    Salt Lake         -0.46                 81
2015    San Jose          -0.50                 82
2015    Seattle            0.74                 23
2015    Toronto            0.29                 37
2015    Vancouver         -0.08                 55
2016    Chicago            0.65                 26
2016    Colorado          -1.17                 111
2016    Columbus          -0.20                 64
2016    DC United          0.29                 38
2016    FC Dallas         -0.58                 85
2016    Houston            0.22                 45
2016    Kansas City       -0.72                 95
2016    L.A. Galaxy       -1.35                 120
2016    Montreal          -1.15                 110
2016    New England        1.36                 13
2016    New York           2.48                 4
2016    New York City FC   3.14                 2
2016    Orlando City      -0.61                 86
2016    Philadelphia      -0.16                 60
2016    Portland          -0.42                 77
2016    Salt Lake          0.17                 46
2016    San Jose          -0.80                 100
2016    Seattle           -0.81                 102
2016    Toronto           -1.24                 114
2016    Vancouver         -0.09                 56
2017    Chicago            0.46                 33
2017    Colorado           0.77                 22
2017    Columbus          -0.61                 88
2017    DC United         -0.95                 105
2017    FC Dallas         -0.38                 72
2017    Houston            0.69                 24
2017    Kansas City       -0.10                 58
2017    L.A. Galaxy       -1.33                 119
2017    Montreal           0.57                 30
2017    New England       -0.85                 103
2017    New York           2.31                 5
2017    New York City FC   0.65                 25
2017    Orlando City      -0.38                 73
2017    Philadelphia       0.29                 39
2017    Portland          -1.30                 118
2017    Salt Lake         -0.81                 101
2017    San Jose          -0.96                 106
2017    Seattle           -0.75                 97
2017    Toronto           -0.38                 74
2017    Vancouver         -0.06                 52
2017    Atlanta United     0.24                 43
2017    Minnesota United  -0.17                 62
2018    Chicago            0.82                 21
2018    Colorado           2.91                 3
2018    Columbus          -0.19                 63
2018    DC United         -0.05                 51
2018    FC Dallas         -1.43                 122
2018    Houston           -0.65                 90
2018    Kansas City        0.82                 20
2018    L.A. Galaxy       -1.28                 117
2018    Montreal           0.52                 32
2018    New England       -0.25                 66
2018    New York           3.79                 1
2018    New York City FC   1.43                 11
2018    Orlando City      -0.04                 50
2018    Philadelphia       0.96                 18
2018    Portland          -1.27                 115
2018    Salt Lake          0.58                 28
2018    San Jose          -0.30                 67
2018    Seattle            1.35                 14
2018    Toronto            0.25                 42
2018    Vancouver         -0.44                 78
2018    Atlanta United     0.86                 19
2018    Minnesota United  -0.62                 89