Sporting exceptional at home; RSL lame on the road

It is true that Sporting has had trouble getting points at home. SKC earned 30 points at Sporting Park this year, good for 13th in a league of 19 teams. Based on that information alone, some will argue that Sporting is not a good home team. One of those people is Simon Borg, who justifies his viewpoint by pointing out that SKC lost five times at home, as though that matters. It doesn't.

I've shown that past points simply don't correlate well with future points. With information like shot ratios and expected goal differentials (xGD) available, points are essentially a meaningless indicator of team ability---or at the very least, a meaningless predictor. I see "predictor" and "indicator" as near-synonyms in this instance, but you may not. Regardless, Sporting's home points total should not even be considered in the discussion of who will win on Saturday. Why not? In addition to out-shooting its opponents in every single home game this season, here is how SKC did relative to the league in xGD this season:

Team GF GA GD xGF xGA xGD "Luck"
LA 32 8 24 32.1 11.0 21.1 2.9
SKC 29 15 14 28.7 11.6 17.1 -3.1
PHI 23 17 6 28.8 16.0 12.8 -6.8
NYRB 32 15 17 26.8 15.4 11.4 5.6
SEA 28 15 13 26.5 15.6 10.9 2.1
COL 28 16 12 25.3 14.5 10.7 1.3
HOU 23 16 7 28.0 18.9 9.1 -2.1
RSL 31 16 15 24.6 16.4 8.2 6.8
CHI 28 19 9 27.0 19.4 7.6 1.4
SJ 23 13 10 28.8 21.2 7.6 2.4
POR 28 11 17 23.9 16.6 7.2 9.8
CLB 19 13 6 25.6 18.8 6.8 -0.8
NE 29 15 14 22.2 17.4 4.8 9.2
MTL 31 19 12 25.1 20.5 4.6 7.4
FCD 28 21 7 24.2 19.8 4.4 2.6
VAN 32 18 14 23.7 19.5 4.2 9.8
DCU 16 27 -11 23.5 21.3 2.2 -13.2
TOR 22 21 1 18.5 19.1 -0.6 1.6
CHV 16 28 -12 18.9 26.0 -7.1 -4.9

SKC has a decent goal differential at home, but more importantly, it has the second-best expected goal differential at home. xGD is an excellent predictor of future success, and a better indication in my mind of true team skill.
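The "Luck" column is never defined in the text, but every row of the home table is consistent with reading it as actual goal differential minus expected goal differential. Here is a minimal Python sketch under that assumption, using three rows copied from the table (the function name `luck` is made up for illustration):

```python
# Assumption: "Luck" = GD - xGD. The text never defines the column,
# but this reading reproduces the rows of the home table.
home_rows = {
    # team: (GD, xGD), copied from the home table
    "LA": (24, 21.1),
    "SKC": (14, 17.1),
    "RSL": (15, 8.2),
}


def luck(gd, xgd):
    """Goals of differential above (+) or below (-) expectation."""
    return round(gd - xgd, 1)


home_luck = {team: luck(gd, xgd) for team, (gd, xgd) in home_rows.items()}

for team, value in home_luck.items():
    print(f"{team}: {value:+.1f}")
```

Read this way, a negative number means a team scored or conceded worse than its shot locations would suggest, which is the sense in which SKC's home goal differential undersells it.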

Borg goes on to talk about the "road warriors" from Salt Lake City:

"They love playing on the road. Playing at home is too much pressure; they do it better when they're away from home."

No team is better on the road than at home, but whatever. RSL did tie for third in MLS this season with 22 away points earned, but again, we don't care. RSL out-shot its opponents in just five of 17 road games (29.4%), and, well, this:

Team GF GA GD xGF xGA xGD "Luck"
SKC 16 15 1 19.3 18.2 1.1 -0.1
SJ 11 29 -18 20.5 21.3 -0.8 -17.2
LA 20 30 -10 18.5 20.2 -1.7 -8.3
FCD 18 28 -10 19.9 23.6 -3.7 -6.3
HOU 17 23 -6 21.7 25.4 -3.8 -2.2
POR 25 22 3 18.9 23.5 -4.6 7.6
COL 15 22 -7 19.9 24.9 -5.1 -1.9
NYRB 24 24 0 19.5 25.6 -6.1 6.1
PHI 19 26 -7 19.4 26.7 -7.3 0.3
NE 19 21 -2 16.1 23.7 -7.6 5.6
CLB 22 33 -11 17.1 26.0 -8.9 -2.1
SEA 11 27 -16 17.9 27.2 -9.4 -6.6
CHI 18 30 -12 20.5 30.0 -9.5 -2.5
MTL 19 29 -10 16.1 26.0 -9.9 -0.1
VAN 21 23 -2 16.6 27.7 -11.0 9.0
TOR 6 25 -19 15.9 27.3 -11.4 -7.6
RSL 25 25 0 17.4 29.7 -12.3 12.3
DCU 5 28 -23 11.9 26.2 -14.3 -8.7
CHV 12 38 -26 12.0 28.9 -16.9 -9.1

Real Salt Lake finished 17th in the league in expected goal differential on the road. Ouch. The fact that their actual goal differential was tied for third in MLS means very little, since xGD makes for a much better Nostradamus.

Unless expected goal differential completely falls apart in home-away splits---which is not likely---we can conclude that Sporting is a good home team, and RSL is a bad away team.

Our current model gives Sporting 72 percent probability of a win. An xGD model---which we don't use yet because we only have one season of data---increases those chances to 88 percent. There is a lot of evidence that Sporting is the better team, and that home field advantage still applies to them. Regardless of Saturday's outcome, those two statements are still well supported.

*Note that these goal statistics do not include own goals, which is why my figures may differ slightly from those found at other sites. 

Does last season matter?

We've shown time and time again how helpful a team's shot rates are in projecting how well that team is likely to do going forward. To this point, however, data has always been contained in-season, ignoring what teams did in past seasons. Since most teams keep large percentages of their personnel, it's worth looking into the predictive power of last season. We don't currently have shot locations for previous seasons, but we do have general shot data going back to 2011. This means that I can look at all the 2012 and 2013 teams, and how important their 2011 and 2012 seasons were, respectively. Here goes.

First, I split each of the 2012 and 2013 seasons into two halves, calculating stats from each half. Let's start by leaving out the previous season's data. Here is the predictive power of shot rates and finishing rates, where the response variable is second-half goal differential.

Stat Coefficient P-value
Intercept -28.36792 0.04%
Attempt Diff (first 17) 0.14244 0.00%
Finishing Diff (first 17) 77.06047 1.18%
Home Remaining 3.37472 0.03%

To summarize, I used total shot attempt differential and finishing rate differential from the first 17 games to predict the goal differential for each team in the final 17 games. Also, I controlled for how many home games each team had remaining. The sample size here is the 56 team-seasons from 2011 through 2013. All three variables are significant in the model, though the individual slopes should be interpreted carefully.*
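For readers who want to see what the fitted equation does with a team's numbers, here is a rough Python sketch that plugs inputs into the coefficients listed above. The example team is hypothetical, and I am assuming the finishing differential is expressed as a proportion (0.01 means one percentage point); the text does not state the units.

```python
# Coefficients copied from the regression output above.
INTERCEPT = -28.36792
B_ATTEMPT_DIFF = 0.14244     # per shot attempt of differential, first 17 games
B_FINISHING_DIFF = 77.06047  # per unit of finishing-rate differential (assumed proportion)
B_HOME_REMAINING = 3.37472   # per home game left in the final 17


def projected_second_half_gd(attempt_diff, finishing_diff, home_remaining):
    """Projected goal differential over the final 17 games."""
    return (
        INTERCEPT
        + B_ATTEMPT_DIFF * attempt_diff
        + B_FINISHING_DIFF * finishing_diff
        + B_HOME_REMAINING * home_remaining
    )


# Hypothetical team: +30 attempts, +0.01 finishing, nine home games left.
example_gd = projected_second_half_gd(30, 0.01, 9)
print(f"Projected second-half GD: {example_gd:+.2f}")
```

Keep the residual standard error in mind when reading the output: a projection of a few goals either way is well inside the noise.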

The residual standard error for this model is high at 6.4 goals of differential. Soccer is random, and predicting exact goal differentials is impossible, but that doesn't mean this regression is worthless. The R-squared value is 0.574, though as James Grayson has pointed out to me, the square root of that figure (0.757) makes more intuitive sense. One might say that we are capable of explaining 57.4 percent of the variance in second-half goal differentials, or 75.7 percent of the standard deviation (sort of). Either way, we're explaining something, and that's cool.

But we're here to talk about the effects of last season, so without further mumbo jumbo, the results of a more-involved linear regression:

Stat Coefficient P-value
Intercept -31.3994 1.59%
Attempt Diff (first 17) 0.12426 0.03%
Attempt Diff (last season) 0.02144 28.03%
Finishing Diff (first 17) 93.27359 1.14%
Finishing Diff (last season) 72.69412 12.09%
Home Remaining 3.71992 1.53%

Now we've added teams' shot and finishing differentials from the previous season. Obviously, I had to cut out the 2011 data (since 2010 is not available to me currently), as well as Montreal's 2012 season (since they made no Impact in 2011**). This left me with a sample size of 37 teams. Though the residual standard error was a little higher at 6.6 goals, the regression now explained 65.2 percent of the variance in second-half goal differential. Larger sample sizes would be nice, and I'll work on that, but for now it seems that---even halfway through a season---the previous season's data may improve the projection, especially when it comes to finishing rates.

But what about projecting outcomes for, say, a team's fourth game of the season? Using its rates from just three games of the current season would lead to shaky projections at best. I theorize that, as a season progresses, the current season's data get more and more important for the prediction, while the previous season's data become relatively less important.

My results were most assuredly inconclusive, but leaned in a rather strange direction. The previous season's shot data was seemingly more helpful in predicting outcomes during the second half of the season than it was in the first half---except, of course, the first few weeks of the season. Specifically, the previous season's shot data was more helpful for predicting games from weeks 21 to 35 than it was from weeks 6 to 20. This was true for finishing rates, as well, and led me to recheck my data. The data was errorless, and now I'm left to explain why information from a team's previous season helps project game outcomes in the second half of the current season better than the first half.

Anybody want to take a look? Here are the results of some logistic regression models. Note that the coefficients represent the estimated change in (natural) log odds of a home victory.

Weeks 6 - 20 Coefficient P-value
Intercept 0.052 67.36%
Home Shot Diff 0.139 0.35%
H Shot Diff (previous) -0.073 29.30%
Away Shot Diff -0.079 7.61%
A Shot Diff (previous) -0.052 47.09%

Weeks 21 - 35 Coefficient P-value
Intercept 0.036 78.94%
Home Shot Diff 0.087 19.37%
H Shot Diff (previous) 0.181 6.01%
Away Shot Diff -0.096 15.78%
A Shot Diff (previous) -0.181 4.85%

Later on in the season, during weeks 21 to 35, the previous season's data actually appears to become more important to the prediction than the current season's data---both in statistical significance and actual significance. This despite the current season's shot data being based on an ample sample of at least 19 games (depending on the specific match in the data set). So I guess I'm comfortable saying that last season matters, but I'm still confused---a condition I face daily.
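Since the coefficients are changes in log odds, it may help to see how they turn into a probability. The Python sketch below uses the weeks 21 to 35 coefficients from the text; the input values are hypothetical, and I am assuming the shot differentials enter the model untransformed (the units are not stated).

```python
import math

# Weeks 21 - 35 coefficients, copied from the text.
WEEKS_21_35 = {
    "intercept": 0.036,
    "home_shot_diff": 0.087,
    "home_shot_diff_prev": 0.181,
    "away_shot_diff": -0.096,
    "away_shot_diff_prev": -0.181,
}


def home_win_probability(coefs, **inputs):
    """Convert the model's log odds of a home win into a probability."""
    log_odds = coefs["intercept"] + sum(
        coefs[name] * value for name, value in inputs.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two teams with all-zero differentials (hypothetical inputs).
p_even = home_win_probability(WEEKS_21_35)

# Same matchup, but the home team was +1 in shot differential last season.
p_edge = home_win_probability(WEEKS_21_35, home_shot_diff_prev=1.0)

# exp(coefficient) is the multiplicative change in odds per unit of input.
odds_ratio_prev = math.exp(WEEKS_21_35["home_shot_diff_prev"])

print(f"{p_even:.3f} {p_edge:.3f} {odds_ratio_prev:.3f}")
```

The odds ratio is the easiest way to compare the current-season and previous-season coefficients side by side, since both are on the same scale.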

*The model suggests that each additional home game remaining projects a three-goal improvement in differential (3.37, actually). In a vacuum, that makes no sense. However, we are not vacuuming. Teams that have more home games remaining have also played a tougher schedule. Thus the +3.37 coefficient for each additional home game remaining is also adjusting the projection for teams whose shot rates are suffering due to playing on the road more frequently.

**Drew hates me right now.

What Piquionne's goal means to Portland

Though our game states data set doesn't yet include all of 2013, it still includes 137 games. In those 137 games, only five home teams ever went down three goals, and all five teams lost. There were 24 games in which the home team went down two goals, with only one winner (4.2%) and five ties (20.8%). The sample of two-goal games perhaps gives a little hope to the Timbers, but these small sample sizes lend themselves to large margins of error. It is also important to note that teams that go down two goals at home tend to be bad teams---like Chivas USA, which litters that particular data set. None of the five teams that ever went down three goals at home made the playoffs this year. Only seven of the 24 teams to go down two goals at home made it to the playoffs. Portland is a good team. Depending on your model of preference, the Timbers are somewhere in the top eight. So even if those probabilities up there hypothetically had small margins of error, they still wouldn't necessarily apply to the Timbers.
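To put a number on "large margins of error," here is a small Python sketch of a Wilson score interval for the two-goal sample. The Wilson interval is my choice for illustration, not something the original analysis used, and it treats the 24 games as independent draws, which the paragraph above already warns is a stretch.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Home teams that went down two goals: 1 win and 5 ties in 24 games.
win_lo, win_hi = wilson_interval(1, 24)
tie_lo, tie_hi = wilson_interval(5, 24)

print(f"win: {win_lo:.1%} to {win_hi:.1%}")
print(f"tie: {tie_lo:.1%} to {tie_hi:.1%}")
```

Under these assumptions the upper bound on the win rate lands near 20 percent, so the raw 4.2 percent is only a loose guide.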

Oh, and while we're talking about extra variables, in those games the teams had less time to come back. To work around these confounding variables, I consulted a couple models, and I controlled for team ability using our expected goal differential. Here's what I found.

A logistic model suggests that, for each goal of deficit early in a match, the odds of winning are reduced by a factor of about two or three. A tie, though, would also allow Portland to play on. A home team's chances of winning or tying fall from about 75 percent in a typical game that begins zero-zero, to about 25 percent when down two goals. Down three goals, and that probability plummets to less than 10 percent. But using this particular logistic regression was dangerous, as I was forced to extrapolate for situations that never happen during the regular season---starting a game from behind.
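The jump from "a factor of about two or three" to those percentages is just odds arithmetic. The Python sketch below assumes a factor of exactly three per goal and applies it to the win-or-tie odds; both choices are mine for illustration, since the fitted coefficient is not given in the text.

```python
def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


BASELINE_WIN_OR_TIE = 0.75  # typical home game starting zero-zero (from the text)
FACTOR_PER_GOAL = 3.0       # assumption: upper end of "about two or three"


def chance_when_down(goals, baseline=BASELINE_WIN_OR_TIE, factor=FACTOR_PER_GOAL):
    """Win-or-tie probability after dividing the odds by `factor` per goal of deficit."""
    return odds_to_prob(prob_to_odds(baseline) / factor**goals)


down_two = chance_when_down(2)
down_three = chance_when_down(3)
print(f"down two: {down_two:.0%}, down three: {down_three:.0%}")
```

A factor of three lands on the 25 percent figure for a two-goal deficit and right at the 10 percent mark for three goals; a factor of two would leave the home team with noticeably better chances.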

So I went to a linear model. The linear model expects Portland to win by about 0.4 goals. 15.5 percent of home teams in our model were able to perform at least 1.6 goals above expectation, what the Timbers would need to at least force a draw in regulation. Only 4.6 percent of teams performed 2.6 goals above expectation. If we just compromise between what the two models are telling us, then the Timbers probably have about a 20-percent chance to pull off a draw in regulation. That probability would have been closer to five percent had Piquionne not finished a beautiful header in stoppage time.
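The thresholds in that paragraph follow from simple subtraction, sketched below in Python. The "compromise" is computed as a plain average of the two models' figures, which is my assumption; the text does not say how the two were blended.

```python
EXPECTED_MARGIN = 0.4  # linear model's expected Portland margin (from the text)


def required_overperformance(deficit, expected_margin=EXPECTED_MARGIN):
    """Goals above expectation needed to erase `deficit` and force a draw."""
    return deficit - expected_margin


need_with_goal = required_overperformance(2)     # deficit after Piquionne's goal
need_without_goal = required_overperformance(3)  # deficit had he not scored

# Figures quoted in the text for the two-goal deficit.
logistic_estimate = 0.25
linear_estimate = 0.155

# Assumption: "compromise" means a simple average of the two models.
compromise = (logistic_estimate + linear_estimate) / 2

print(f"{need_with_goal:.1f} {need_without_goal:.1f} {compromise:.4f}")
```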

The Predictive Power of Shot Locations Data

Two articles in particular inspired me this past week---one by Steve Fenn at the Shin Guardian, and the other by Mark Taylor at The Power of Goals. Steve showed us that, during the 2013 season, the expected goal differentials (xGD) derived from the shot locations data were better than any other statistics available at predicting outcomes in the second half of the season. It can be argued that statistics that are predictive are also stable, indicating underlying skill rather than luck or randomness. Mark came along and showed that the individual zones themselves behave differently. For example, Mark's analysis suggested that conversion rates (goal scoring rates) are more skill-driven in zones one, two, and three, but more luck-driven or random in zones four, five, and six. Piecing these fine analyses together, there is reason to believe that a partially regressed version of xGD may be the most predictive. The xGD currently presented on the site regresses all teams fully back to league-average finishing rates. However, one might guess that finishing rates in certain zones may be more skill-driven, and thus predictive. Essentially, we may be losing important information by fully regressing finishing rates to league average within each zone.

I assessed the predictive power of finishing rates within each zone by splitting the season into two halves, and then looking at the correlation between finishing rates in each half for each team. The chart is below:

Zone Correlation P-value
1 0.11 65.6%
2 0.26 28.0%
3 -0.08 74.6%
4 -0.41 8.2%
5 -0.33 17.3%
6 -0.14 58.5%

Wow. This surprised me when I saw it. There are no statistically significant correlations---especially when the issue of multiple testing is considered---and some of the suggested correlations are actually negative. Without more seasons of data (they're coming, I promise), my best guess is that finishing rates within each zone are pretty much randomly driven in MLS over 17 games. Thus full regression might be the best way to go in the first half of the season. But just in case...
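One simple way to account for multiple testing is a Bonferroni correction, sketched below in Python with the six p-values from the table. Bonferroni is my choice for illustration; the text does not name a specific correction.

```python
# P-values by zone, copied from the table above.
zone_p_values = {1: 0.656, 2: 0.280, 3: 0.746, 4: 0.082, 5: 0.173, 6: 0.585}

ALPHA = 0.05
# Bonferroni: divide the significance level by the number of tests run.
bonferroni_threshold = ALPHA / len(zone_p_values)

unadjusted_hits = [zone for zone, p in zone_p_values.items() if p < ALPHA]
adjusted_hits = [zone for zone, p in zone_p_values.items() if p < bonferroni_threshold]

print(f"threshold: {bonferroni_threshold:.4f}")
print(f"significant before correction: {unadjusted_hits}")
print(f"significant after correction: {adjusted_hits}")
```

No zone clears even the unadjusted 5 percent bar, so the correction only widens the gap.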

I grouped zones one, two, and three into the "close-to-the-goal" group, and zones four, five, and six into the "far-from-the-goal" group. The results:

Zone Correlation P-value
Close 0.23 34.5%
Far -0.47 4.1%

Okay, well this is interesting. Yes, the multiple testing problem still exists, but let's assume for a second there actually is a moderate negative correlation for finishing rates in the "far zone." Maybe the scouting report gets out by mid-season, and defenses close out faster on good shooters from distance? Or something else? Or this is all a type-I error---I'm still skeptical of that negative correlation.
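For anyone wanting to check how a correlation maps to its p-value, the usual route is a t-statistic with n - 2 degrees of freedom. The Python sketch below assumes n = 19 teams (one pair of half-season finishing rates per team), which the text does not state for this test, and compares against 2.110, the standard two-sided 5 percent critical value for 17 degrees of freedom.

```python
import math

N_TEAMS = 19           # assumption: one observation per MLS team in 2013
T_CRITICAL_17 = 2.110  # two-sided 5% critical value of Student's t, 17 d.f.


def t_statistic(r, n=N_TEAMS):
    """t-statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r**2))


t_far = t_statistic(-0.47)    # "far-from-the-goal" group
t_close = t_statistic(0.23)   # "close-to-the-goal" group
t_zone4 = t_statistic(-0.41)  # zone four on its own

for label, t in [("far", t_far), ("close", t_close), ("zone 4", t_zone4)]:
    verdict = "beyond" if abs(t) > T_CRITICAL_17 else "inside"
    print(f"{label}: t = {t:.2f} ({verdict} the 5% critical value)")
```

Under that assumption only the far-zone correlation crosses the line, and not by much, which fits the skepticism above.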

Without doing that whole song and dance for finishing rates against, I will say that the results were similar. So full regression on finishing rates for now, more research with more data later!

But now, piggybacking onto what Mark found, there does seem to be skill-based differences in how many total goals are scored by zone. In other words, some teams are designed to thrive off of a few chances from higher-scoring zones, while others perhaps are more willing to go for quantity over quality. The last thing I want to check is whether or not the expected goal differentials separated by zone contain more predictive information than when lumped together.

Like some of Mark's work implied, I found that our expected goal differentials inside the box are very predictive of a team's actual second-half goal differentials inside the box---the correlation coefficient was 0.672, better than simple goal differential which registered a correlation of 0.546. This means that perhaps the expected goal differentials from zones one, two, and three should get more weight in a prediction formula. Additionally, having a better goal differential outside the box, specifically in zones five and six, is probably not a good thing. That would just mean that a team is taking too many shots from poor scoring zones. In the end, I went with a model that used attempt difference from each zone, and here's the best model I found.*

Zone Coefficient P-value
(Intercept) -0.61 0.98
Zones 1, 3, 4 1.66 0.29
Zone 2 6.35 0.01
Zones 5, 6 -1.11 0.41

*Extremely similar results to using expected goal differential, since xGD within each zone is a linear function of attempts.

The R-squared for this model was 0.708, beating out the model that just used overall expected goal differential (0.650). The zone that stabilized fastest was zone two, which makes sense since about a third of all attempts come from zone two. Bigger sample sizes help with stabilization. For those curious, the inputs here were attempt differences per game over the first seventeen games, and the response output is predicted total goal differential in the second half of the season.
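As a rough illustration of how the zone model would be applied, here is a Python sketch that plugs per-game attempt differences into the coefficients from the table. The example team is hypothetical, and the grouping of inputs simply follows the table's rows.

```python
# Coefficients copied from the zone model table.
INTERCEPT = -0.61
B_ZONES_1_3_4 = 1.66
B_ZONE_2 = 6.35
B_ZONES_5_6 = -1.11


def projected_gd(diff_zones_1_3_4, diff_zone_2, diff_zones_5_6):
    """Projected second-half goal differential from per-game attempt differences."""
    return (
        INTERCEPT
        + B_ZONES_1_3_4 * diff_zones_1_3_4
        + B_ZONE_2 * diff_zone_2
        + B_ZONES_5_6 * diff_zones_5_6
    )


# Hypothetical team: +0.5 attempts per game in zones 1, 3, 4; +1.0 in zone 2;
# -0.5 in zones 5 and 6 (it takes fewer long shots than its opponents).
example = projected_gd(0.5, 1.0, -0.5)
print(f"Projected second-half GD: {example:+.2f}")
```

Nearly all of the projection comes from the zone two term, which matches that coefficient being the only one with a small p-value.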

Not that there is a closed-the-door conclusion to this research, but I would suggest that each zone contains unique information, and separating those zones out some could strengthen predictions by a measurable amount. I would also suggest that breaking shots down by angle and distance, and then kicked and headed, would be even better. We all have our fantasies.

Jamison Olave's Value to New York

There was quite a popular tweet from a canine about New York's improved play this season when Jamison Olave was playing. https://twitter.com/GothamistDan/status/397398611438608384

There are obviously confounding factors at play here, not to mention small sample sizes. There were only seven matches this season in which Olave did not start, and eight in which he played 45 minutes or less. Any data obtained from these games is going to be subject to A) small sample sizes, B) lots of variance in the response variable (goals or wins), and C) no control for quality of opponent or location of the match.

To deal with the small sample size/variance problem, I'm going to use our now semi-famous data set on shot location origins. Steven Fenn kindly showed the world their predictive value, and to me that means that expected goals for and against are the most stable stat available for such an analysis. To control for New York's opponents---when Olave was both in and out of the starting XI---I have included each of New York's opponent's expected goals data in the linear regression, while also accounting for whether or not the Red Bulls were at home. Blah, blah, blah, to the results!

Looking at the defensive side, New York allowed shots leading to 0.24 fewer expected goals against in games that Olave started. That seems to indicate New York's need for Olave, but the p-value was a kind-of-high 26 percent. Overall, New York's expected goal differential climbed 0.19 goals in those games that Olave started, though again, the p-value was quite high at 46 percent.*

Now for your shitty conclusion, courtesy of shitty p-values: Olave's influence on New York's level of play this season was questionable. There is some suggestion that he helped reduce goal-scoring against; however, there is a reasonable chance that that difference was due to other, not-measured-here variables. What I am more comfortable claiming is that he does not make a 0.86-goal difference on the defensive side.

The point is this. New York's shot creation and goal scoring ability, for and against, are more a function of whether or not the Red Bulls are home, and against whom they are playing. Not as much whether Olave starts. Obviously putting an inferior player into the starting XI isn't going to help New York out. But, as I always question, do we really know how to value soccer players at all? Maybe Olave just doesn't make that much of a difference. After all, he's only one of eleven players.

*For those curious, the number of minutes Olave played was a worse predictor variable than the simple binary variable of whether or not he started. Controlling for the strength of opponent was necessary since perhaps Mr. Petke was more likely to sit Olave against a worse opponent at home, or something like that.

Rosales, not Dempsey, is the clear choice for Seattle's set-piece crosses

In Seattle’s 2-1 loss to Portland on Saturday, Clint Dempsey took all of the Sounders’ attacking set-pieces in the first half. He was impressive with his free kick shots on goal, clipping the crossbar and forcing Donovan Ricketts into multiple saves. But his corner kicks left much to be desired. Mauro Rosales subbed on in the 63rd minute and took the remainder of the set-piece crosses and created more chances. With Lamar Neagle suspended for yellow card accumulation and Seattle needing goals in leg two, Rosales seems likely to start. Requisite warning about small sample sizes aside, based off of the results in leg one, the data suggest Sigi Schmid would be wise to let Rosales take over set-piece crossing duties in the second leg.

Here's how Dempsey’s nine corners and one free kick cross went in leg one:

3rd minute corner: To the near post, cleared by Diego Chara
6th corner: Near post, cleared by Will Johnson
20th corner: Near post, cleared by Will Johnson
25th corner: Near post, cleared by Chara
32nd corner: Near post, cleared by Chara
38th corner: Top of the six yard box, cleared by Pa-Modou Kah
38th corner: Top of six, cleared by Kah
39th free kick: Cross from 18 yards out on the wing to the top of the six, cleared by Futty Danso
45th corner: Near post, punched clear by Ricketts

In the second half, Rosales took all three Seattle corners and two free kick crosses:

68th minute corner: To the penalty spot, shot by Djimi Traore, saved by Ricketts
69th corner: Top of six, Headed cross by Dempsey blocked by Zemanski and eventually caught by Ricketts
82nd free kick: Cross from 38 yards in the center to the penalty spot, cleared by Danso
86th free kick: Cross from 28 yards on the wing to the edge of the penalty box, headed by Shalrie Joseph across the box
87th corner: Penalty spot, Headed shot by Dempsey off of the crossbar and out

In summary: Dempsey had 10 set-piece crosses, none of which reached a Seattle teammate. Rosales had five set-piece crosses, four of which found a teammate in the box, and three of which led to shots.
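For the statistically inclined, here is a small Python sketch of how unusual that split would be if the two takers were equally good. It uses the counts from the summary above and a one-sided Fisher-style (hypergeometric) calculation, which is my addition rather than anything from the original analysis. It treats every cross as an independent draw, so take it as a back-of-the-envelope check only.

```python
from math import comb

# Counts from the summary above.
dempsey_total, dempsey_found = 10, 0
rosales_total, rosales_found = 5, 4


def p_at_least(k, draws, successes, population):
    """P(X >= k) when `draws` items are taken from `population` holding `successes`."""
    upper = min(draws, successes)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


# If the four crosses that found a teammate were scattered at random among all
# 15 crosses, how often would at least four of them be among Rosales' five?
p_value = p_at_least(
    rosales_found,
    rosales_total,
    dempsey_found + rosales_found,
    dempsey_total + rosales_total,
)
print(f"one-sided p-value: {p_value:.4f}")
```

The result is well under one percent, though game state, fatigue, and Portland's defensive setup all differed between the halves.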

As you can tell, it was a tale of two halves. In the first, Dempsey’s crosses rarely cleared the first defender, and none found another Sounders player. In the second half, four of Rosales’ five crosses created chances, two off of the head of Dempsey himself.

If Seattle is going to win at Jeld-Wen Field on Thursday, they’ll need to do better with their crosses. Based on their chances in game one, it looks to be in the Sounders' best interest to allow Rosales to take the free kick crosses in game two. Not only did his crosses create better chances than Dempsey's did in game one, but Deuce seems to be more dangerous getting on the end of crosses than he is at taking them.

Playoff Probabilities and Seeding

Now that the "Playoff Push" has given way to the actual playoffs, we have included the probabilities of all the various outcomes in this year's edition of the MLS Cup. One sees that our model's darling, Sporting Kansas City, has the best chances at an MLS Cup trophy of all teams, which is not surprising. But what this simulation really articulated to me were the differences between the three, four, and five seeds in each conference, as well as the top seeds overall. Despite the fact that our model thinks that fifth-seeded Colorado is nearly as good as Portland and Real Salt Lake, its chances at the MLS Cup are significantly lower as a five seed. Having to play that extra match on the road essentially chops the Rapids' chances in half right away, and then its slight disadvantage in a home-and-home against Portland---a 42-percent chance in that series---leaves Colorado with just a 3.2-percent chance at the silverware.

In the Eastern Conference, the same issue arises for fourth-seeded Houston. The Dynamo are not thought to be significantly worse than New York---the model projects the Red Bulls to win that potential home-and-home matchup with 59-percent probability---but the additional uncertainty of the play-in game really screws them over.

The three seeds, however, are well-represented in Cup probabilities. Though New England's Cup chances sit at just 5.1 percent, you have to remember that they play Sporting KC in the first round. We love SKC around here, if you weren't aware. And then Los Angeles, the West's third-seeded team, actually has the third-best chance overall at a Cup win---15.0 percent.

Finally, the potential for home-field advantage in an MLS Cup Final really has Sporting and New York drooling. Together those two teams hog nearly 44 percent of all the Cup probability. Given that Sporting makes the finals, the probability that it goes on to win them is about 64 percent (26.5/41.2). New York's conditional probability is similar at about 62 percent (17.4/27.9). That home-field advantage gives each of those teams a huge boost if they can make it that far.
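Those conditional probabilities are the Cup probability divided by the probability of reaching the final, as this short Python sketch shows with the figures from the paragraph above:

```python
# Probabilities quoted in the text, as fractions.
p_cup = {"SKC": 0.265, "NYRB": 0.174}
p_final = {"SKC": 0.412, "NYRB": 0.279}


def p_cup_given_final(team):
    """P(win Cup | reach final) = P(win Cup) / P(reach final)."""
    return p_cup[team] / p_final[team]


conditional = {team: p_cup_given_final(team) for team in p_cup}
combined_share = sum(p_cup.values())

for team, p in conditional.items():
    print(f"{team}: {p:.1%}")
print(f"combined Cup share: {combined_share:.1%}")
```

The division works because a team cannot win the Cup without first reaching the final.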

For Cinderella teams that make it that far, having to play a superior opponent on the road in the championship one-game-off doesn't bode well for a storybook ending.

Keep track of all the playoff outcome probabilities on our Cup Chances 2013 page under MLS Tables.

Supporters' Shield Probabilities

After completing the Eastern and Western Conference playoff scenarios yesterday, it only makes sense to move on to each team's chances at the Supporters' Shield and a potential home-field advantage in the MLS Cup Final. Only four teams could mathematically win the shield: New York, Sporting KC, Portland and Real Salt Lake.

New York has the best chance due to its current lead in the tables and the fact that it's playing a home game. If New York wins at home against Chicago, then it will be the Shield winner, regardless of other outcomes, but that's not the only way the Red Bulls could hoist the trophy. A tie against Chicago would eliminate both RSL and Portland from contention---since New York holds the "wins" tie-breaker over Portland---and then an SKC tie or loss would leave the Red Bulls as Shield winners, as well. In fact, even a loss from New York could leave them in first overall if SKC, Portland and RSL all don't win. However, it's not probable that both Portland and RSL would each earn fewer than three points against Chivas USA. In the end, New York's Shield chances sit at 73.7 percent, with 61.8 percent of that coming from its probability of beating Chicago this weekend.

Sporting Kansas City has the next-best chance at the trophy at 15.5 percent. Obviously it needs New York to lose or tie and then---due to SKC losing the potential tie-breaker to New York---SKC would need to win. The only scenario where SKC ties and still gets the Shield involves crazy scenarios like an 8-to-8 tie with Philly.

Like SKC, Portland and Real Salt Lake both need to win, and then things need to go their way. RSL would (likely) hold tie-breakers over both SKC and New York, so RSL would need SKC to lose or tie, Portland to lose or tie, and then New York has to lose. Portland loses tie-breakers to SKC and New York, so it needs to win, and then have SKC lose or tie and New York lose. In the end, Portland's probability at the Supporters' Shield is just 6.0 percent, while RSL's is 4.8 percent.

Team Shield%
NYRB 0.737
SKC 0.155
POR 0.060
RSL 0.048

Eastern Conference Playoff Seeding

I put together the Western Conference version earlier, but the Eastern Conference and its four million more scenarios are so much more exciting. I used our predictive model to calculate the probabilities of each game's outcome, and then applied those to all possible scenarios. This is going to be fun... New York and Sporting KC have locked up a one-two finish in some order, and then there are five teams with mathematically non-zero chances at the final three playoff spots in the East. There are no San Joses here, as each of Montreal, Chicago, New England, Houston and Philly all have real chances of at least eight percent at a playoff berth. On to the scenarios!

The New York Red Bulls can guarantee themselves both first place in the East and a Supporters' Shield trophy with a win at home against Chicago this weekend, but that's not the only way they could take the top seed. A tie against Chicago coupled with SKC not winning, or losses by both New York and SKC would leave the Red Bulls in first place, as well. Totaled up, the Red Bulls' chances at a top seed sit at 84.5 percent, with the other 15.5 percent going to a second place finish.
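The scenario approach boils down to enumerating every combination of results and adding up the probability of the combinations that produce a given seed. Here is a minimal Python sketch for New York's top-seed rules as described above. Only the 0.618 win probability comes from our model (it is quoted in the Supporters' Shield post below); every other probability is a made-up placeholder, so the total will not match the 84.5 percent figure.

```python
from itertools import product

# Match outcome probabilities from each team's own perspective.
# 0.618 is the model's figure for a New York win; the rest are placeholders.
ny_vs_chicago = {"W": 0.618, "D": 0.200, "L": 0.182}
skc_at_philly = {"W": 0.400, "D": 0.270, "L": 0.330}


def ny_finishes_first(ny_result, skc_result):
    """Top-seed rules for New York as described in the paragraph above."""
    if ny_result == "W":
        return True
    if ny_result == "D":
        return skc_result != "W"
    return skc_result == "L"


p_first = 0.0
p_second = 0.0
for ny_result, skc_result in product(ny_vs_chicago, skc_at_philly):
    p_scenario = ny_vs_chicago[ny_result] * skc_at_philly[skc_result]
    if ny_finishes_first(ny_result, skc_result):
        p_first += p_scenario
    else:
        p_second += p_scenario

print(f"NY first: {p_first:.1%}, NY second: {p_second:.1%}")
```

The full version does the same thing over every remaining match at once, with the tie-breakers deciding each scenario, which is where the millions of combinations come from.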

Sporting Kansas City's outcomes are the exact opposite of those of New York. SKC has to play on the road in Philly while New York plays in a more comfortable home environment, leaving SKC with a 15.5-percent chance of a first place finish. SKC has been our loving model's favorite team in the East since the model was born, and a two-seed shouldn't hurt its chances of a date in the MLS Cup final.

Despite limping into the postseason, its comeback win against the Union has Montreal on firmer ground going into the last week. A win at Toronto guarantees the Impact third place in the East, allowing them to avoid that one-game-off. In total, Montreal has a 53.1-percent chance at third place---remember, the model doesn't think the Impact are all that much better than Toronto, but at least they don't have to go through customs. The sequences leading to fourth or fifth place start to become more complicated, but those probabilities are 31.7 and 13.4 percent, respectively. That leaves the Impact with just a 1.8-percent chance of missing out on the playoffs altogether, a scenario that essentially requires a poor result from Montreal with wins from New England and Houston, and at least a draw for Chicago on the road in New York.

Chicago is in a surprisingly good position going into its game in New York. The outcomes leading to the Fire making the playoffs add up to 93.7 percent. 17.1 percent of that leaves Chicago in third, which would require Montreal to lose or tie and Chicago to subsequently earn a tie or win, depending on Montreal's result. And there's more good news for Chicago. If it gets stuck in the play-in game, it has a better chance of being the home team (43.8 percent) than the away team (32.8 percent). Chicago would win any potential tie-breakers with New England, Houston, and Philly, which is partly why its playoff chances are so high.

The New England Revolution could avoid the play-in game, but that would require a win at Columbus in addition to both Chicago and Montreal not winning. Our model suggests that probability is only 12.5 percent. If it makes the play-in game, New England is more likely to do so as the fifth seed (30.2 percent) than the fourth seed (8.2 percent). Those of you keeping score at home know that the Revolution's chances of missing the playoffs altogether are thus 49.1 percent, the most probable outcome of the four. Though New England holds the goals-for tie-breaker over Houston, Houston has an easier opponent in D.C. United.

Speaking of Houston, due to that easier, aforementioned opponent, Houston has a better shot of claiming third place than New England at 17.3 percent. However, that tie-breaker plays in New England's favor in all of the ways that each team could finish fourth or fifth. The Dynamo have just a 16.3-percent chance at fourth, and a 15.6-percent chance at fifth, leaving them out of the playoffs with 50.8-percent probability.

Philadelphia's best chance at the playoffs comes from the fact that they would almost surely win a tie-breaker over Houston if it came to that. A Philly win coupled with a Houston tie would leave both tied at 49 points and 13 wins. Philly currently leads Houston by two goals scored and holds the third tie-breaker, goal differential, as well. Essentially, in this scenario, Houston would have to tie something like 4 - 4, while Philly slipped past SKC 1 - 0. Not likely, so this scenario would lead to Philly's only real shot at the playoffs, an 8.0-percent chance at fifth place. There is no mathematical way the Union could do better than fifth, as it would lose potential tie-breakers to both Chicago and Montreal.
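The tie-breaker chain used in these scenarios (points, then wins, then goals scored, then goal differential) can be sketched as a sort key. In the Python below, the 49 points, 13 wins, and Philly's two-goal lead in goals scored come from the paragraph above; the baseline goal total and the goal-differential values are placeholders chosen only to keep Philly ahead on that last tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Record:
    team: str
    points: int
    wins: int
    goals_for: int
    goal_diff: int


def rank(records):
    """Order teams by points, then wins, then goals scored, then goal differential."""
    return sorted(
        records,
        key=lambda r: (r.points, r.wins, r.goals_for, r.goal_diff),
        reverse=True,
    )


BASE_GOALS = 40  # placeholder for Houston's current goals scored


def final_order(houston_goals_in_draw):
    """Philly wins 1-0 while Houston draws, scoring the given number of goals."""
    philly = Record("PHI", 49, 13, BASE_GOALS + 2 + 1, goal_diff=1)
    houston = Record("HOU", 49, 13, BASE_GOALS + houston_goals_in_draw, goal_diff=0)
    return [r.team for r in rank([philly, houston])]


print(final_order(1))  # Houston draws 1-1
print(final_order(4))  # Houston draws 4-4
```

Only the high-scoring draw flips the order, which is why the scenario is treated as a long shot for Houston.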

In conclusion, for your viewing pleasure, the table of probabilities:

Team 1st 2nd 3rd 4th 5th Out
NYRB 84.5% 15.5% 0.0% 0.0% 0.0% 0.0%
SKC 15.5% 84.5% 0.0% 0.0% 0.0% 0.0%
MTL 0.0% 0.0% 53.1% 31.7% 13.4% 1.8%
CHI 0.0% 0.0% 17.1% 43.8% 32.9% 6.3%
NE 0.0% 0.0% 12.5% 8.2% 30.2% 49.1%
HOU 0.0% 0.0% 17.3% 16.3% 15.6% 50.8%
PHI 0.0% 0.0% 0.0% 0.0% 8.0% 92.0%

Western Conference Playoff Seeding

For the past few weeks, we have been including each team's probabilities of earning a playoff spot and each team's probabilities of winning the Supporters' Shield (for most points in MLS). Due to the complexity of the tie-breaker system, I have avoided that topic altogether. Until now. Now that every team with at least seven caring fans has just one match remaining---Chivas USA has two---I have recalculated the actual playoff chances below using our model, accounting for the various tie-breakers.

Let's start with the easy part. The five Western Conference playoff teams may not be mathematically determined; however, for all practical purposes, today's top five are certain to remain the same. San Jose could tie Seattle and/or Colorado on points for the fifth-and-final spot, but the first tie-breaker is total wins, and Seattle has that one covered. The second tie-breaker (which would go into effect only against Colorado) is goals scored. Right now San Jose trails Colorado by 12 goals, meaning that San Jose would have to score at least 13 goals in its final match to win all the relevant tie-breakers. So yes, the top five in the West can safely be etched in stone.
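The tie-breaker order used throughout this piece (points, then total wins, then goals scored, then goal differential) can be sketched as a simple sort. This is only an illustration: the Team class, the rank function, and the three sample teams are hypothetical, and the real MLS rules carry further tie-breakers beyond these.

```python
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    points: int
    wins: int
    goals_for: int
    goal_diff: int

def rank(teams):
    """Order teams best-first: points, then wins, goals scored, goal differential."""
    return sorted(
        teams,
        key=lambda t: (t.points, t.wins, t.goals_for, t.goal_diff),
        reverse=True,
    )

def goals_to_overtake(deficit):
    """Goals needed to pass a rival on goals scored, assuming the rival scores none."""
    return deficit + 1

# Hypothetical teams level on points, to show the tie-breakers in action.
sample = [
    Team("A", points=51, wins=14, goals_for=45, goal_diff=5),
    Team("B", points=51, wins=15, goals_for=40, goal_diff=3),
    Team("C", points=51, wins=14, goals_for=47, goal_diff=2),
]
order = [t.name for t in rank(sample)]

# San Jose's 12-goal deficit to Colorado, from the text.
san_jose_needs = goals_to_overtake(12)

print(order, san_jose_needs)
```

Team B takes the top spot on wins despite scoring the fewest goals, and C edges A on goals scored despite the worse goal differential, which is the same logic that leaves San Jose needing a 13-goal outburst.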

Seeds are still important, though. Earning a top-three finish allows a team to avoid that one-game playoff between the fourth and fifth-ranked teams in each conference. Below I have given some relevant probabilities for each team's seeding:

The Portland Timbers could theoretically finish anywhere from first to fourth, though with their final game against Chivas, first place is the most likely. Our model gives Portland a 52-percent chance to beat Chivas on the road and lock up the top spot regardless of other outcomes. However, a tie or loss to Chivas leaves the door wide open for the other playoff teams, and thus Portland's overall chances at first place increase only marginally to about 54 percent. Its chances of a dreaded fourth-place finish require a string of results that has just a four-percent likelihood.

Real Salt Lake guarantees itself at least second place in the West with a win. It has about a 42-percent shot at first place, and only a one-percent shot at fourth place---a result that would require a loss to Chivas at home, a Colorado win at Vancouver, and a tie between Los Angeles and Seattle. Thus neither RSL nor Portland is likely to find itself playing in a one-game playoff, and even in that worst-case scenario, either team would get to play at home.

Probabilities surrounding the LA Galaxy are a little trickier. Finishing in first place would require a win against the Sounders in addition to Chivas getting points in consecutive matches against RSL and the Timbers. That probability is only about three percent. There is, however, a very real chance that the Galaxy finish in fourth or fifth. A loss to Seattle coupled with a Colorado win leaves them in fifth, while a tie in Seattle and a Colorado win would drop LA to fourth. Those probabilities are 11 and 10 percent, respectively. Thus the remaining 76 percent has the Galaxy finishing in either third or second place.

Seattle's football club could finish anywhere between first and fifth. The unlikely sequence that would vault the Sounders into first place includes them beating the Galaxy, Chivas getting points at RSL on Wednesday, and then Chivas returning home to beat Portland on Saturday...but that has just a one-percent likelihood. A loss or tie against the Galaxy locks Seattle into the play-in game. Though the Sounders are playing at home, our model really likes LA and gives them just a 30-percent chance of winning that matchup. That leaves them with a 32-percent chance at fourth and a 38-percent chance at fifth place.

Colorado is in a rough spot, as it would lose potential tie-breakers to Seattle, LA and RSL. Taking first place would require that Colorado win at Vancouver, that Portland and RSL both lose to Chivas, and that Seattle and LA tie. That string of events has virtually zero chance of happening. Colorado is much more likely to find itself in fifth place. A loss would guarantee the Rapids the last playoff spot, and a tie would stick them there, too, so long as Seattle earned at least a point against LA. All that adds up to a 51-percent chance at finishing fifth and having to play a one-game playoff on the road.

Here's the chart that sums up all of the probabilities for each playoff contender:

Team 1st 2nd/3rd 4th 5th
POR 54.7% 41.3% 4.0% 0.0%
RSL 41.7% 57.4% 0.8% 0.0%
LA 2.9% 56.8% 29.2% 11.2%
SEA 0.7% 29.4% 32.4% 37.5%
COL 0.0% 15.0% 33.6% 51.3%
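A quick way to sanity-check a chart like this: every team has to finish somewhere, so each row should sum to 100 percent, and every seed has to go to someone, so each column should sum to 100 percent (200 for the combined 2nd/3rd column, since it covers two seeds). Here is a minimal sketch of that check using the numbers above; the function name and the 0.2-point rounding tolerance are my own assumptions.

```python
# Western Conference seeding probabilities (percent), copied from the chart above.
# Columns: 1st, 2nd/3rd, 4th, 5th.
seeding = {
    "POR": [54.7, 41.3, 4.0, 0.0],
    "RSL": [41.7, 57.4, 0.8, 0.0],
    "LA":  [2.9, 56.8, 29.2, 11.2],
    "SEA": [0.7, 29.4, 32.4, 37.5],
    "COL": [0.0, 15.0, 33.6, 51.3],
}

# Number of seeds each column represents (the 2nd/3rd column covers two).
slots = [1, 2, 1, 1]

def check_table(table, slots, tol=0.2):
    """Return (rows_ok, cols_ok), allowing `tol` points of rounding error."""
    rows_ok = all(abs(sum(row) - 100.0) <= tol for row in table.values())
    columns = zip(*table.values())
    cols_ok = all(
        abs(sum(col) - 100.0 * k) <= tol for col, k in zip(columns, slots)
    )
    return rows_ok, cols_ok

rows_ok, cols_ok = check_table(seeding, slots)
print(rows_ok, cols_ok)
```

Both checks pass here, with the small leftovers attributable to rounding each entry to one decimal place.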