What does SALO mean?

SALO is the SALO Alternative-wise Log Odds model.

  • “Log odds” (also called the logit) is the inverse of the logistic function, which SALO uses to link ratings to shot probability.
  • By “alternative-wise”, I mean that there are more than two mutually exclusive possibilities whose log odds are given by the ratings.
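
As a rough illustration of those two terms (a sketch in Python, not SALO’s actual R code), the snippet below shows log odds as the inverse of the logistic function, and one common way to turn log odds for several mutually exclusive alternatives into probabilities. The alternative names, the rating values, and the choice of a “no shot” baseline with log odds fixed at zero are all assumptions made for the example; they are not taken from SALO itself.

```python
import math


def logistic(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds: the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def alternative_probs(log_odds):
    """Turn per-alternative log odds into probabilities that sum to 1.

    Assumption for this sketch: the log odds are measured against a
    baseline alternative whose own value is 0.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_odds.values())
    weights = {name: math.exp(v - top) for name, v in log_odds.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical ratings: three mutually exclusive outcomes in some time slice.
ratings = {"no_shot": 0.0, "home_shot": -3.0, "away_shot": -3.2}
probs = alternative_probs(ratings)

for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```

With only two alternatives this reduces to the ordinary logistic function, which is why “alternative-wise” is the natural generalization of plain log odds.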

Why so few years?

More years are coming!

Previous versions of this FAQ blamed the short span of years on limited time and RAM. The recently completed port from Stan (see Credits) to pure R has virtually eliminated that constraint. By version 2.0 I expect to cover the full fancy stats era.

Why shots, not attempts?

This is a (conceptual) bug from very early in the development of SALO.

For the time being, the bug is simply retconned as a feature. Shots as the outcome of interest can be seen as a balance between relevance (excluding attempts that by definition can’t directly turn into goals) and sample size (shots outnumber goals by a much larger factor than attempts outnumber shots).

At root, though, misses and blocks are left out of the data because I didn’t remember to include them when doing the very first work on SALO, and putting them back in has never been a higher priority than getting model features working one by one. In a future major version, the outcome considered may change from shots to Corsi or Fenwick.

Why not account for…?

The aspects of the game thus far accounted for in SALO are all of the highest-priority ones I could code up by release time.

Adding complexity to a model like SALO on a data set this size takes a long time – a little to code it, a lot to run the model with it, times however many tries it takes to debug it.

As a result, many variables that other work already adjusts for, or studies through outcomes beyond shots, aren’t included in SALO yet. SALO will acquire new features over time, but it starts with the ones that represent its main contribution to the state of the art.

For current plans around incorporation of particular features, see the roadmap.

Filter by team?

I’m not sure how I’d deal with skaters who changed teams in a given year. Their team would have to be a set-valued column, and I don’t yet know how to make DataTables work well with those.

Credits

  • The model, fitting algorithm, and data prep are coded in the R programming language.
  • Prior to 1.2, SALO was estimated using Stan, which is developed by Andrew Gelman, Bob Carpenter, et al.
  • Data are collected using nhlscrapr, written by A. C. Thomas and Sam Ventura.
  • Data are ultimately provided by the NHL.
  • Player ages are from hockey-reference.

Where else is work like this done?

SALO was conceived (as far back as January 2013), developed, and, apart from the open-source software reuse described above, implemented independently. Other researchers have nonetheless also modeled individual players’ underlying ability levels with shrinkage toward average.

The most comprehensive use of regularized regression techniques in rating players is in the wins above replacement estimates at Emmanuel Perry’s Corsica. As described at the site, Perry models ability to drive not just shot rates, but also shot quality, penalty rates, and zone changes, all with related (albeit distinct) methods. Estimates are published.

Before Corsica, the main example of this kind of model was Thomas and Ventura’s wins above replacement. This work has been abandonware for some time now; war-on-ice has not published estimates since the 2016 offseason at the latest. Perry’s methodology is in some places inspired by this work but elsewhere diverges from it.

While SALO is intended to model more aspects of hockey over time, it is not likely to approach the complexity of Perry’s WAR (or Thomas and Ventura’s) in the near term, and is likely to develop along methodological lines not identical to those chosen in creating WAR.

Thomas himself links to some earlier work that presaged WAR. The most advanced of these efforts don’t appear to have been associated with published results, and were less advanced than SALO in methodology.