What does SALO mean?

SALO is the SALO Alternative-wise Log Odds model.

  • “Log odds” is another name for the logit, the inverse of the logistic function. The logit is the link between ratings and shot probability in SALO.
  • By “alternative-wise”, I mean that there are more than two mutually exclusive possibilities whose log odds are given by the ratings.
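
As a rough illustration of the two points above, here is a minimal Python sketch of log odds and of one common way to extend them to more than two mutually exclusive alternatives (a multinomial logit, with a baseline alternative fixed at zero). The alternative names and numbers are hypothetical, and whether SALO uses exactly this parameterization is an assumption rather than something stated here.

```python
import math

def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic(x):
    """Inverse of logit: maps log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def alternative_probabilities(log_odds):
    """Convert per-alternative log odds into probabilities that sum to 1.

    Each value is the log odds of that alternative relative to a
    baseline alternative whose own entry is 0.0 (an assumed convention).
    """
    peak = max(log_odds.values())  # subtract the max for numerical stability
    weights = {name: math.exp(value - peak) for name, value in log_odds.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical alternatives and log odds, for illustration only.
example = alternative_probabilities(
    {"no_shot": 0.0, "home_shot": -3.0, "away_shot": -3.2}
)
for name, probability in example.items():
    print(f"{name}: {probability:.4f}")
```

With only two alternatives, this reduces to the ordinary logistic function applied to a single log-odds value.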

Why so few years?

The period I can cover is limited by the time and RAM I have to work with.

One year’s worth of one estimation run (one “chain”) takes about half a week to complete on one core and ties up about 4.5 GB of memory. Running two chains at once (more than one is recommended for diagnostic reasons) on four years’ data occupies almost all the RAM I have for over a fortnight.
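
The arithmetic behind those figures can be sketched as a back-of-envelope calculation. This assumes that time and memory both scale linearly with the number of years and that each chain gets its own core; neither assumption is confirmed here, and the function name is made up for illustration.

```python
GB_PER_CHAIN_YEAR = 4.5    # "about 4.5 GB" per chain per year of data
DAYS_PER_CHAIN_YEAR = 3.5  # "about half a week" per chain per year of data

def estimate_run(years, chains):
    """Rough memory and wall-clock estimate for one fitting run.

    Assumes linear scaling in years and one core per chain, so the
    chains run side by side and only add to memory, not to elapsed time.
    """
    memory_gb = GB_PER_CHAIN_YEAR * years * chains
    wall_days = DAYS_PER_CHAIN_YEAR * years
    return memory_gb, wall_days

memory_gb, wall_days = estimate_run(years=4, chains=2)
print(f"{memory_gb} GB for about {wall_days} days")  # 36.0 GB for about 14.0 days
```

Under these assumptions, four years and two chains come to roughly 36 GB for two weeks. The actual runs take over a fortnight, so the scaling is probably somewhat worse than linear.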

Recoding the model (and the algorithm for fitting it!) to run without the Stan software (see Credits) may eventually allow faster, cheaper, better-parallelized computation, but would require quite a bit of careful programming. Stan made model development possible in the first place by vastly simplifying the programming.

Why shots, not attempts?

This is a (conceptual) bug from very early in the development of SALO.

For the time being, the bug is simply retconned as a feature. Shots as the outcome of interest can be seen as a balance between relevance (excluding attempts that by definition can’t directly turn into goals) and sample size (shots outnumber goals by a much bigger factor than attempts outnumber shots).

At root, though, misses and blocks are left out of the data because I didn’t remember to include them when doing the very first work on SALO, and putting them back in has never been a higher priority than getting model features working one by one. In a future major version, the outcome considered may change from shots to Corsi or Fenwick.

Why not account for…?

The aspects of the game thus far accounted for in SALO are all of the highest-priority ones I could code up by release time.

Adding complexity to a model like SALO on a data set this size takes a long time – a little to code it, a lot to run the model with it, times however many tries it takes to debug it.

As a result, many variables that other work already adjusts for, or studies through outcomes beyond shots, aren’t included in SALO yet. SALO will acquire new features over time, but it starts with the ones that represent its main contribution to the state of the art.

For current plans around incorporation of particular features, see the roadmap.

Filter by team?

I don’t yet know how to make DataTables work well with set-valued columns, so I’m not sure how I’d deal with skaters who changed teams in a given year.

Credits

  • SALO is estimated using Stan by Andrew Gelman, Bob Carpenter, et al.
  • Data are prepared in the R programming language.
  • Data are collected using nhlscrapr by A. C. Thomas and Sam Ventura.
  • Data are ultimately provided by the NHL.
  • Player ages are from hockey-reference.

Where else is work like this done?

SALO was conceived (as far back as January 2013), developed, and implemented independently, apart from the open-source software reuse described above. Other researchers, however, have also modeled individual players’ underlying ability levels with shrinkage toward average.

The most comprehensive use of regularized regression techniques in rating players is in the wins above replacement estimates at Emmanuel Perry’s Corsica. As described at the site, Perry models ability to drive not just shot rates, but also shot quality, penalty rates, and zone changes, all with related (albeit distinct) methods. Estimates are published.

Before Corsica, the main example of this kind of model was Thomas and Ventura’s wins above replacement. This work has been abandonware for some time now; war-on-ice has not published estimates since the 2016 offseason at the latest. Perry’s methodology is in some places inspired by this work but elsewhere diverges from it.

While SALO is intended to model more aspects of hockey over time, it is not likely to approach the complexity of Perry’s WAR (or Thomas and Ventura’s) in the near term, and is likely to develop along methodological lines not identical to those chosen in creating WAR.

Thomas himself links to some other work that presaged WAR itself. The most advanced of these don’t appear to have been associated with published results, and were less advanced than SALO in methodology.