ASA's New Interactive Tables

By Tyler Richardett

Last Saturday, the North Carolina Courage’s Denise O’Sullivan ushered in the return of professional team sports in the United States after several months, with the first strike of the ball in this summer’s NWSL Challenge Cup. At roughly the same time, we haphazardly struck a few keys on our computers, proudly ushering in a set of exciting changes to ASA’s interactive application. The update brought several new features, most importantly NWSL data and our goals added (g+) values, along with some crucial improvements behind the scenes.

NWSL Data

Part of our scramble to get everything ready for last Saturday was driven by the fact that we now had NWSL data, and we couldn’t wait to share it. You can find our NWSL app here!

Our NWSL coverage includes all the same advanced metrics as our MLS data, and all the underlying models have been trained on NWSL data. One caveat here is that we’re using the expected goals (xG) model trained on MLS shots to offset the xG model subsequently trained on NWSL shots. NWSL has fewer teams and shorter seasons, which means that certain types of shots aren’t well represented in the data. For these low-volume shot types, offsetting means that the NWSL model will regress toward the weights of the MLS model, preventing overfitting. Eventually, as we collect more data and the coverage of these low-volume shots improves, the effect of offsetting should diminish, at which point we’ll no longer need to apply the technique.
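The mechanics of offsetting can be sketched as fitting the league-specific model on top of the prior model’s predictions. Below is a minimal illustration in Python (our actual models are more sophisticated and live in a different stack; the feature, data, and penalty values here are hypothetical): the MLS model’s log-odds enter the NWSL fit as a fixed offset, and an L2 penalty shrinks the NWSL-specific coefficients toward zero, i.e. toward the MLS baseline, wherever the NWSL data is thin.

```python
import numpy as np

def fit_with_offset(X, y, offset, l2=1.0, lr=0.1, n_iter=2000):
    """Logistic regression whose linear predictor includes a fixed
    offset (the MLS model's log-odds for each shot). The L2 penalty
    shrinks the NWSL-specific coefficients toward zero, i.e. toward
    the MLS model's predictions."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        grad = X.T @ (p - y) / n + l2 * beta / n
        beta -= lr * grad
    return beta

# Toy example with one hypothetical feature (say, shot distance).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
mls_log_odds = -1.0 + 0.5 * X[:, 0]          # stand-in for the MLS model
y = (rng.random(500) < 1 / (1 + np.exp(-mls_log_odds))).astype(float)

beta_nwsl = fit_with_offset(X, y, offset=mls_log_odds)
# When the data is consistent with the MLS model, the NWSL-specific
# adjustment stays small, so predictions hug the MLS baseline.
print(beta_nwsl)
```

As real NWSL shot volume grows, the data overwhelms the penalty and the fitted adjustment drifts away from zero wherever the leagues genuinely differ.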

That said, across most relevant segments of data, the average xG is closer to NWSL observed finishing rates than to the corresponding MLS offset, indicating that the NWSL data is already rich enough to carry some statistical credibility.

Goals Added (g+)

Back in May, we unveiled our framework for evaluating players’ on-ball actions, known as goals added (g+). And we’re now proud to announce that the g+ above average metric — signaling the goal value each player has added, above or below their “average” counterpart playing in the same position — is available in the app for all MLS and NWSL field players.

If you haven’t already, we encourage you to immerse yourself in our g+ literature, including: Matthias Kullowatz’s methodological deep dive, Kieran Doyle-Davis’s exploration of the industry’s shift to possession-based value frameworks, and John Muller’s recap of our evaluation strategies. If you just want the g+ for dummies version, read John’s straightforward explainer.

Under the Hood

Like plenty of other hobbyists and enthusiasts in their respective fields, we found that months of quarantine had suddenly left us with a dwindling list of excuses for continuing to put off half-finished projects. Many of these improvements were sketched out a year ago, shortly after we realized that our managed hosting fees were about to make a much larger dent in our Mint goal labeled “Burritos.” We decided we’d migrate off shinyapps.io and onto an unmanaged platform like DigitalOcean. This would save us money, yes, but it would also afford us the freedom to rethink how our data got from point A to point B.

What if we automated our ETL process? What if we also automated how our models were periodically retrained and applied to our transformed data? What if we automated everything, quickly getting our data from its rawest form into the app with the help of a relational database and an API? Getting these things right meant new data could be available in the app within a couple of hours of the final whistle, and it meant we could spend a greater share of our time developing new features and new metrics.

We’d like to peel back the curtain on how we made this happen.

Migration and Automated Pipelines

The beauty of a managed service like shinyapps.io is that it handles all the extra nonsense with which we data people simply couldn’t be bothered. It seamlessly takes care of software updates and versioning conflicts, and it offers one-click deployment through the RStudio IDE, which makes it an excellent candidate for one-off projects shared with a small audience. That convenience comes at a price, though: it’s prohibitively expensive to scale projects with humble resources such as ours.

It’s far less expensive to scale with an unmanaged provider such as DigitalOcean (our choice) or AWS, but the primary challenge is that we data people also need to start thinking like traditional software developers. Across all steps, we’ve relied on GitHub for collaboration, version control, and deployment, and on renv for avoiding versioning conflicts among collaborators. And our architecture is simple enough that even we could build it without shedding too many tears.

Beginning on the far-right side of the diagram, our first server is in charge of aggregating the raw data, reshaping it, and loading it into our PostgreSQL database. A cron job runs periodically to search for and collect any new data. Then, using a custom-built library, we clean that raw data based on previously observed patterns, mold it into a shape that we like, and pass it through our pre-trained models. Finally, those outputs get deposited into the database, which, along with our API, serves them up on a platter for users of the app.
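In code, that pipeline boils down to a few composable steps that the cron job triggers on a schedule. Here is a simplified Python sketch; the function names, field names, and the in-memory stand-ins for the database and model are all hypothetical (our real pipeline runs against PostgreSQL), but the extract-transform-score-load shape is the same.

```python
def extract(raw_events):
    """Collect any new raw match data (stub: filter unprocessed events)."""
    return [e for e in raw_events if not e.get("processed")]

def transform(events):
    """Clean and reshape raw events into model-ready rows."""
    return [{"match_id": e["match_id"], "x": e["x"] / 100, "y": e["y"] / 100}
            for e in events]

def score(rows, model):
    """Pass transformed rows through a pre-trained model."""
    return [dict(row, xg=model(row)) for row in rows]

def load(rows, db):
    """Deposit model outputs into the database (stub: in-memory list)."""
    db.extend(rows)

# One pipeline run, as the cron job would trigger it.
raw = [{"match_id": 1, "x": 88.0, "y": 50.0, "processed": False}]
db = []
load(score(transform(extract(raw)), model=lambda r: 0.1), db)
print(db)  # one scored row, ready for the API to serve
```

Keeping each stage a pure function of its input is what makes the whole thing safe to rerun unattended: a failed run can simply be retried on the next cron tick.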

Sitting between the user and the database is our second all-purpose server, which is home to our API and the application itself. Under the hood, when you’re deciding exactly how you want your results sliced, the app uses your inputs to construct an API request. When you click the refresh button, that request is passed along to the corresponding API endpoint, which grabs your data out of our database and hands it back to the application, all in a matter of seconds. The application itself is powered by an open-source version of Shiny Server, the same glue that holds together projects deployed to shinyapps.io.
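That request construction is just a matter of translating the app’s filter widgets into query parameters. A short Python sketch of the idea (the host, endpoint, and parameter names below are placeholders, not our actual API schema):

```python
from urllib.parse import urlencode

def build_request(base_url, endpoint, filters):
    """Turn the app's filter inputs into an API request URL.
    List-valued filters are joined into comma-separated parameters,
    and empty filters are dropped entirely."""
    params = {k: ",".join(v) if isinstance(v, (list, tuple)) else v
              for k, v in filters.items() if v}
    return f"{base_url}/{endpoint}?{urlencode(params)}"

# Hypothetical example: xG leaders for two teams in 2020.
url = build_request(
    "https://example.com/api/v1",        # placeholder host
    "players/xgoals",
    {"season": 2020, "team": ["POR", "CHI"], "minimum_minutes": 300},
)
print(url)
# https://example.com/api/v1/players/xgoals?season=2020&team=POR%2CCHI&minimum_minutes=300
```

The endpoint then maps those parameters onto a parameterized database query and returns the matching rows, which is why a fresh slice of the tables comes back in seconds rather than requiring a redeploy.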

As proud parents, we witnessed it all come together for the first time on Wednesday. The Portland Thorns and Chicago Red Stars played to a 0-0 draw that concluded around 2:36 p.m. ET. By 3:03 p.m. ET, data from that game was already available in the app. And not one of us lifted a finger.

Looking Ahead

In the interest of openness (and of getting others to do our work for us), we’d like to highlight that all of our source code for the interactive application is publicly available on GitHub. If you’d like to report a bug or request a new feature, you can do so here. Or, if you’d like to contribute code, we’ve left some instructions there as well. Additionally, we’ve left our API and accompanying documentation open for anyone who prefers to retrieve data this way. We do use modest compute resources, so we ask that you be mindful when making such requests.

We mentioned upfront that many of these changes will free up our time to develop new features, and we’re excited about the endless possibilities. First, we know we need to reintroduce a few things from the old app, such as position labels and playoff predictions. And it’s likely we’ll get detoured by a few nagging SSL issues and load balancing challenges. Looking beyond that, we’ve got ambitious plans for interactive visualizations, player and team profiles, and more.

In the meantime, we recommend wasting a few hours getting lost in this newer and better version of our interactive tables.