SALO makes strong front-page claims. How does it meet them?

SALO treats shot data differently from what has been common until recently in the quantitative analysis of hockey (see Q & A).

SALO’s SF% isn’t a *count* of shots with weights applied; it’s an *estimate* produced by fitting a Bayesian model, expressing a relationship between shot probability for each team and the ability levels of the skaters on ice, to regular-season data.

SALO’s SF%, that is, is the percentage each player would be expected to exhibit in some hypothetical additional minutes with exactly average teammates and opponents, based on their icemates and results so far.

What follows is an in-depth explanation of how SALO comes up with its estimates:

- The first section presents a short outline of the entire method.
- The next introduces the modeling approach in general terms.
- The subsequent two sections detail each part of the model.
- The section after that covers projections for the coming year.
- The last discusses other choices in the design of the model.

For technically inclined readers, the first and last sections may suffice.

SALO is a hierarchical Bayesian model with three parts:

- an ordered logit likelihood for net shots each second due to on-ice skater ability,
- a Gaussian prior for skater ability on the logit scale,
- and a beta-binomial relationship, with a logit link, between ability and games played.

The model is fit to multiple years of regular-season data with a Hamiltonian Monte Carlo (HMC) algorithm. Error is tracked by working with each HMC draw separately and sampling from predictive distributions where needed.

Each skater’s ability estimate is translated to a shots-for percentage by taking the ratio of shots for to total shots predicted by the model at even strength with that player on ice alongside nine average icemates.

Projections are drawn from an aging curve created by regressing year-to-year ability change on age, using the fitted prior for a skater with 0 games played as the next-season ability for those who drop out each offseason.

Hyperparameters are estimated. Hyperpriors are not imposed.

The method of counting frequency-weighted events used to correct shot stats for context features like score effects can’t be extended to adjust for quality of teammates and competition. A different method is needed to account for this central aspect of each skater’s on-ice context.

Likewise, even with a method able to fully account for icemates’ contribution to individuals’ shot stats, analyzing each skater using only their own on-ice data to estimate their true level would yield unreliable values for those who received little ice time.

The first issue can be addressed with a model of (net) shot probability at any time in terms of the ability levels of the skaters on ice.

The second can be resolved by making that model *Bayesian*, allowing information about ability to be responsibly *shared* across skaters based on a couple basic hockey facts beyond shot data alone.

In formal terms, the former is a **likelihood**, and the latter is a **prior**:

- A likelihood is an equation for the probability of observing a particular outcome given what we know about the context it happened in, with coefficients representing the effect of each modeled aspect of that context.
- A prior is an equation for how likely a coefficient of any given value is in the first place. Bayesian models are those that include a prior in addition to a likelihood.

By building a model where each skater gets their own coefficient, and fitting the whole model at once, SALO allows every skater’s individual effect on net shot creation to be estimated net of the players around them.

By including a prior, SALO lets the most-observed skaters contribute to a picture of the overall range of NHL ability levels, which lets those whose own data provide little information be understood mostly through the overall picture.

Once the model is fit, each skater’s ability estimate is converted back to an easily-interpreted 5v5 SF% by:

- plugging it back into the likelihood;
- plugging in four teammates and five opponents each with exactly average talent;
- and finding the fraction of shots that are shots for as predicted by the model under those conditions.
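The steps above can be sketched in a few lines of code. The baseline values here are invented stand-ins for SALO's fitted ones, and the function name is hypothetical; this is just to show the arithmetic of the conversion:

```python
import math

def inv_logit(x):
    """Map any real number (log odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sf_pct(ability, base_for=6.0, base_against=6.0):
    """Convert an ability estimate to a 5v5 SF%.

    The nine other skaters are exactly average (ability 0), so the net
    sum of on-ice abilities is just `ability`. The baselines are
    hypothetical stand-ins for SALO's estimated ones.
    """
    p_for = 1.0 - inv_logit(base_for - ability)      # shot for, this second
    p_against = inv_logit(-base_against - ability)   # shot against
    return p_for / (p_for + p_against)
```

With symmetric baselines, an exactly average skater (`ability = 0`) comes out at exactly 50%, and positive abilities push the percentage above it.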

Fitting the model means plugging in the observed data – which team, if any, took a shot each second and who was on the ice at the time – and searching over all possible values of the model’s coefficients for ones that yield high values for the model’s joint probability (the product of the likelihood and the prior).

This search is performed with a Monte Carlo algorithm, a technique that boils down to the following steps:

- Propose *any old values* for the model's coefficients.
- *Store* the proposed set of values with probability equal to the model's probability given those values. Otherwise, drop them.
- Repeat until some desired number of sets of values have been stored.
- Average the stored values of each coefficient to get an estimate. Take each one’s standard deviation to get a standard error.
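A toy version of that recipe, for a model with a single coefficient whose joint probability (scaled to a maximum of 1) is a Gaussian peaked at 1, might look like the following. This is purely a sketch of the accept-or-drop idea, not SALO's actual sampler:

```python
import math
import random
import statistics

random.seed(0)

def joint_prob(theta):
    # stand-in for the model's joint probability (likelihood x prior),
    # scaled so its maximum value is 1; it peaks at theta = 1
    return math.exp(-0.5 * (theta - 1.0) ** 2)

draws = []
while len(draws) < 5000:
    theta = random.uniform(-4.0, 6.0)        # propose any old value
    if random.random() < joint_prob(theta):  # store with prob = joint prob
        draws.append(theta)                  # ...otherwise drop it

estimate = statistics.mean(draws)    # lands near the peak at 1
std_error = statistics.stdev(draws)  # spread of the stored draws
```

Note that storing a proposal "with probability equal to the model's probability" only works if that probability is scaled to at most 1; real samplers like Stan's work with log probabilities and much cleverer proposals than blind uniform draws.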

There are different versions of this kind of algorithm. One that’s especially efficient for models with very many coefficients is the No-U-Turn Sampler, implemented in the software package Stan. The estimates presented here are obtained by programming the SALO model into Stan.

As stated above, the likelihood is the equation mapping the ability levels (to be estimated by plugging in the data) for each skater on ice at a given time to the probability of a shot for either team.

SALO uses an ordered logit likelihood. In this likelihood, the sum of the current home skaters’ ability levels, minus the sum for the away team, minus an estimated baseline coefficient, maps to a probability of a shot for either team.

A link function translates a sum of coefficients, each anywhere from negative to positive infinity, into a sensible outcome value for a model. SALO uses an (inverse) logit link to turn skater ability levels into shot probabilities.

Probabilities live between 0 and 1, but it’s a very messy business to put any bounds on coefficients. To prevent absurd predictions (like a shot probability less than zero), the link function needs to translate any real input into an output between zero and one.

The canonical link function for this kind of model is the logit link. Using this link means assuming that the net sum of coefficients defined above represents not the probability, but the *log odds* of a (home team) shot:

- For a probability \(p\) between 0 and 1, the equivalent odds ratio equals \(p / (1-p)\). The odds ratio can be between zero and infinity.
- Taking the natural logarithm of the odds ratio \(\log(p / (1-p))\) gives a number that may be anywhere from negative infinity (when \(p = 0\)) to positive infinity (\(p = 1\)).
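In code, the logit and its inverse (helper names here are just illustrative) look like:

```python
import math

def logit(p):
    """Probability -> log odds."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Log odds -> probability; any real input lands strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(logit(0.5))                                     # 0.0: even odds
print(inv_logit(-20.0) > 0.0, inv_logit(20.0) < 1.0)  # True True
```

However extreme the sum of coefficients, the inverse logit never produces a probability at or beyond the 0 and 1 bounds.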

SALO treats the possible outcomes as *ordered*. The net sum of abilities is assumed to have exactly the same effect on the log odds of a home team shot at a given time as on the log odds of *not* observing a shot for the visitors.

Only the baselines – the log odds of each outcome given equal numbers of average players on ice – differ: most seconds we do observe no shots by the visitors, but most seconds we don’t observe any shots by the home team.

SALO estimates the two baselines separately – meaning the model accounts for home-ice advantage by allowing the baseline log odds of a shot for the home team to be higher than those of a shot against.
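Putting the pieces together, the per-second outcome probabilities might be computed as below. The baseline values are illustrative only, chosen so that a home shot is slightly more likely than an away shot with average skaters on both sides:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def outcome_probs(net_ability, base_home=5.8, base_away=6.0):
    """Three-outcome ordered logit for one second of play.

    `net_ability` is the home skaters' summed ability minus the away sum.
    base_home < base_away encodes home-ice advantage.
    """
    p_home = 1.0 - inv_logit(base_home - net_ability)  # home-team shot
    p_away = inv_logit(-base_away - net_ability)       # away-team shot
    p_none = 1.0 - p_home - p_away                     # no shot
    return p_home, p_none, p_away

p_home, p_none, p_away = outcome_probs(0.0)
# the three probabilities sum to one; most seconds, no shot is taken
```

A single shared `net_ability` shifting both thresholds is exactly the ordered assumption described above: the same sum that makes a home shot more likely makes an away shot less likely.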

A model consisting of only the likelihood described above could be estimated by itself. However, the reliability of each estimate from such a model would depend on how many data points we observed for the corresponding skater. For those with relatively little ice time, small sample size would mean potentially wild ability estimates.

In SALO, a two-part *prior* is joined to the likelihood to help stabilize such estimates. A prior, as noted above, is an equation for the probability that a given coefficient takes a given value in the first place, based on things we already understand before looking at the data.

Each part of the prior includes parameters of its own (known as hyperparameters) that describe *how strongly* the prior should act on the model’s estimates. In SALO, these hyperparameters are themselves estimated together with the individual coefficients. The strength of the prior relative to the data is set using information ultimately donated by the data themselves.

This means that:

- the more a given skater plays, the more they shape the overall picture of skaters given by the prior;
- the less they are observed, the more the overall picture given by the prior shapes the estimate of their ability.

The prior in SALO uses two equations to represent two hopefully uncontroversial ideas:

- Players are average on average.
- Better players get more games.

SALO supposes that the overall distribution of player ability levels is Gaussian (or normal). The standard deviation of the overall Gaussian distribution is left to be estimated.

The idea of regression to the mean is familiar to quantitative hockey readers. A player who puts up extreme numbers is expected in the future to return to more pedestrian ones. The more data we have on a given player, the less strongly we expect to see them regress. Quantifying just how severe that regression will most likely be is often left as an exercise, however.

Incorporating a Gaussian prior means that SALO builds in the idea of regression to the mean. Estimating the standard deviation of the Gaussian means allowing skaters for whom larger samples of data are available to determine how strong that regression is expected to be.
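The effect can be sketched with the textbook normal-normal shrinkage formula. The noise and prior scales below are invented for illustration; in SALO the equivalent quantities are estimated, not fixed:

```python
def shrunk_ability(raw_estimate, n_seconds, noise_sd=30.0, prior_sd=0.5):
    """Posterior mean under a Gaussian prior centered at 0 (average).

    More observed seconds -> more weight on the skater's own data;
    fewer -> more regression toward the league-average prior mean.
    """
    data_precision = n_seconds / noise_sd ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return raw_estimate * data_precision / (data_precision + prior_precision)

full_season = shrunk_ability(1.0, n_seconds=50_000)  # barely shrinks
cup_of_coffee = shrunk_ability(1.0, n_seconds=500)   # shrinks hard
```

The same raw number regresses only slightly for a regular but most of the way to average for a player with a few hundred seconds of ice time.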

A player who comes up to the big leagues for a cup of coffee isn’t generally even an average player, of course. It’s a better assumption that they’re somewhat below average. Someone average would play more often. Is there a good way to regress players with the smallest samples toward something other than exactly average?

SALO modifies the Gaussian prior somewhat by including a beta-binomial regression for games played in terms of player ability.

The beta-binomial distribution is a flexible way to model the probability of getting any given number of successes in a given number of chances. For regression purposes, the parameters of the beta-binomial consist of a mean \(\eta\), broken down in terms of the effects of each relevant variable, and a precision \(\phi\).

The mean is a probability – the expected *fraction* of successes, given the input variables. This means using the logit link again! SALO supposes that the *log odds* of a player getting a jersey equals:

- a baseline \(\alpha\) (the log odds for an average player)
- plus a slope \(\gamma\) times their ability.

The baseline and the slope are estimated, meaning the relationship between ability and games played need not come out particularly strong (and it doesn’t).

The precision, between 0 and infinity, adds further flexibility by letting the distribution’s shape vary:

- As \(\phi\) goes to infinity, the distribution of successes looks increasingly like a series of weighted coin flips, one for each chance at success. This doesn’t realistically describe games played; lineups aren’t drawn from a hat each night.
- As \(\phi\) approaches zero, the distribution instead looks more like a *single* weighted coin flip awarding all the possible successes or failures at once. This also doesn’t seem realistic.

In SALO, the precision hyperparameter is estimated as well, letting the beta-binomial shape land somewhere between the two extremes as encouraged by the data.
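The beta-binomial piece can be written out directly using log-gamma functions. The hyperparameter values below (`alpha`, `gamma`, `phi`) are placeholders for the ones SALO estimates:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def games_played_pmf(k, n, ability, alpha=-0.5, gamma=2.0, phi=1.0):
    """Beta-binomial P(k games played out of n) given a skater's ability.

    The mean uses the logit link described above; phi controls how far
    the shape is from a series of independent weighted coin flips.
    """
    mu = inv_logit(alpha + gamma * ability)   # expected fraction of games
    a, b = mu * phi, (1.0 - mu) * phi
    return math.exp(
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + log_beta(k + a, n - k + b) - log_beta(a, b)
    )

# the pmf is a proper distribution over 0..n games
total = sum(games_played_pmf(k, 82, ability=0.2) for k in range(83))
```

With these placeholder values, a better player's expected games played comes out higher, which is the whole point of including the term in the prior.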

Modifying the prior to capture the information about ability contained directly in each player’s sample size is an idea from David Robinson at Variance Explained, although Robinson’s *input* variable is appearances and his *output* is ability rather than the other way around as here.

SALO presents projections for next year’s context-adjusted ability levels as well. These come from a parametric version of the delta method from sabermetrics, applied to the raw ability estimates, with a correction derived from the model for survivor bias.

In sports, the term “delta method” refers to the construction of typical player aging curves by chaining together average year-to-year changes in performance for every observed value of player age.

Estimating the typical change at every single value of age independently of every other is likely an inefficient use of the available information, however, requiring estimation of twenty-odd separate parameters.

In SALO, a modified version of the method is used: the aging curve is built from a *linear regression* of observed year-to-year ability changes on age. This gives an aging curve that is exactly quadratic by design.
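As a sketch with invented numbers (the ages and year-over-year changes below are made up, not fitted values), the modified method amounts to one ordinary least-squares fit followed by chaining:

```python
# hypothetical (age, observed year-over-year ability change) pairs
ages   = [20, 22, 24, 26, 28, 30, 32, 34]
deltas = [0.06, 0.04, 0.02, 0.00, -0.02, -0.04, -0.06, -0.08]

# ordinary least squares for: delta = intercept + slope * age
n = len(ages)
mean_age = sum(ages) / n
mean_delta = sum(deltas) / n
slope = (
    sum((a - mean_age) * (d - mean_delta) for a, d in zip(ages, deltas))
    / sum((a - mean_age) ** 2 for a in ages)
)
intercept = mean_delta - slope * mean_age

# chaining the fitted mean changes yields a quadratic aging curve
curve = {20: 0.0}
for age in range(20, 35):
    curve[age + 1] = curve[age] + intercept + slope * age
```

Because each fitted change is linear in age, the cumulative curve is quadratic: it rises while the predicted change is positive, peaks where the change crosses zero, and declines afterward.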

The ability estimates have both a mean and a standard deviation. There are even more possible sources of error in the projections: there is uncertainty in the linear regression coefficients that define the aging curve, and there is random variation around the mean predicted change.

SALO’s projections account for all these sources of error:

- in estimated abilities, by fitting the aging curve and projecting separately for each Monte Carlo draw;
- in the aging curve, by taking a random draw from the distribution of the slope and intercept for each one;
- in projected abilities, by drawing projected change (above mean) randomly from the distribution of residuals.
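The propagation of all three error sources can be sketched as follows. All the numbers are hypothetical, and where SALO refits the aging curve per Monte Carlo draw, this sketch approximates that by drawing the curve’s coefficients from their estimated distributions:

```python
import random
import statistics

random.seed(2)

# hypothetical inputs, standing in for SALO's per-draw quantities
ability_draws = [random.gauss(0.2, 0.1) for _ in range(4000)]  # HMC draws
slope_mu, slope_se = -0.01, 0.002   # aging-curve slope and its std. error
icpt_mu, icpt_se = 0.26, 0.05       # aging-curve intercept and std. error
resid_sd = 0.03                     # sd of the regression residuals
age = 29

projections = []
for ability in ability_draws:
    # re-draw the curve itself, then add residual noise around its mean
    slope = random.gauss(slope_mu, slope_se)
    icpt = random.gauss(icpt_mu, icpt_se)
    projections.append(ability + icpt + slope * age + random.gauss(0.0, resid_sd))

# the spread of `projections` now reflects all three error sources
proj_sd = statistics.stdev(projections)
```

The projected spread is necessarily wider than the spread of the ability draws alone, since the curve and residual uncertainty stack on top of estimation error.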

Whether in this version or the original, the delta method is subject to survivor bias.

The players who leave the league each year aren’t around to tell us about the change in ability they experienced from the previous year. However, the survivors likely experienced different ability changes on average from those who dropped out! An aging curve constructed using only seasons players actually played will therefore come out biased.

The state-of-the-art adjustment for survivor bias consists of creating a phantom player who can stand in for the unobserved following-year season of anyone who drops out of the league in a given offseason.

The design of such a phantom could end up consisting mostly of ad hoc or arbitrary decisions. Happily, however, the SALO model provides a justified way to design it instead: there is an estimated prior distribution for the ability of an NHL player with any number of games played – including none!

Thus, in the linear regression used to build the aging curve, the following-year ability estimate of a skater who appeared in one season but not the next is replaced with an independent draw from the informed prior (the product of the Gaussian term and the beta-binomial term assuming 0 GP).
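One way to draw such a phantom is rejection sampling: draw from the Gaussian prior, then keep the draw with probability equal to the beta-binomial chance of 0 games played at that ability. The hyperparameter values below are hypothetical stand-ins for SALO's fitted ones:

```python
import math
import random

random.seed(1)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical hyperparameter values standing in for fitted ones
PRIOR_SD = 0.5            # Gaussian prior sd for ability
ALPHA, GAMMA = 0.0, 2.0   # logit-scale baseline and slope for games played

def p_zero_games(theta, n=82, phi=1.0):
    """Beta-binomial probability of 0 GP out of n, given ability theta."""
    mu = inv_logit(ALPHA + GAMMA * theta)
    a, b = mu * phi, (1.0 - mu) * phi
    # P(k = 0) for a beta-binomial reduces to B(a, n + b) / B(a, b)
    return math.exp(
        math.lgamma(b + n) + math.lgamma(a + b)
        - math.lgamma(a + b + n) - math.lgamma(b)
    )

def phantom_draw():
    """Rejection sample from Gaussian prior x P(0 GP | ability)."""
    while True:
        theta = random.gauss(0.0, PRIOR_SD)
        if random.random() < p_zero_games(theta):  # the weight is <= 1
            return theta

draws = [phantom_draw() for _ in range(2000)]
mean_phantom = sum(draws) / len(draws)
```

Because zero games played is itself evidence of low ability, the phantom draws land below average on average, which is exactly what makes this a principled replacement for an ad hoc phantom.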

SALO creates phantom players at the other end of players’ careers as well, using a draw from the informed prior for the previous-year ability of any skater who appeared in a given season but not the one before. There seems little reason to believe that those who lose their big-league jobs differ systematically from those who keep them, but that those who earn call-ups don’t differ in an analogous way from those who stay down.

One other important characteristic of the model is not explicitly mentioned above. SALO estimates each skater’s ability. SALO also estimates the ability of *null* skaters – absences of a skater where there could be one.

- When teams skate five-on-five, each team is treated as sending out one null skater.
- When a team has an extra attacker, none of their skaters are null.
- When a team has four skaters on ice (due to penalties and/or overtime), they have two null skaters.
- When a team has three skaters on ice, they have three null skaters.
- The first, second, and third null skaters get *separate* ability estimates.
- Estimating null skaters’ abilities means that the average real skater’s ability can be set to 0.
- Null skaters’ ability estimates are *not* subject to the prior.
- Each null skater’s ability is treated as the same each year, unlike real skaters’, and as the same for each team.

Lastly, SALO assumes for convenience that it is only possible to get one shot on goal per second. The rare cases where multiple shots are recorded are treated as a single shot. This occurs in only about one half of one percent of all seconds with a shot recorded, so even the baseline log odds of more than one shot for either team in a second would not be estimated very reliably.