Here’s MARKOV

With version 2.0 of the SALO player ability model(s) I introduce a new WAR1 approach, which I’m calling MARKOV (or “Markov Approximation for Reasonable Konstruction of Overall Value”).

MARKOV doesn’t involve explicitly estimating a linear weight to give each separate player ability estimate as a driver of wins. Rather, MARKOV’s WAR values come from an expected distribution of game results directly implied by the underlying ability model itself. The link between models for player and situation effects on event rates and whole-game outcomes is a mathematical object called a Markov chain.

This technique seems potentially of interest to future producers and users of hockey WAR.

This post covers the motivation and theory behind Markov chain-based WAR. Part II (forthcoming) will detail the application of that theory to SALO to get the specific implementation named MARKOV.

Motivation

Getting WAR numbers from play-by-play data ultimately consists of two tasks – analysis followed by synthesis:

  • Analysis: estimate player (and game situation) effects on the rates of on-ice events.
  • Synthesis: combine those estimated effects into a single number denominated in wins.

The way this is done at Corsica2 is exemplary:

  • Analysis: a suite of regression models estimates player and game-situation effects on the rates of various on-ice events.
  • Synthesis: events added are multiplied by league-average values to get goals added,3 and goals added are multiplied by a constant number of goals per win to get wins added.

SALO uses the same kind of regression approach to analysis, but models fewer event types than Corsica does.

Meanwhile, the linear approach to the synthesis task is a great starting point, but leaves two openings for improvement.

First, wins added aren’t linear in goals added. There are diminishing, not constant, marginal returns – each additional goal a player contributes (to an average team) is a bit more likely than the last to come in a game the team is already winning.4
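To see the shape of that effect, here is a minimal sketch using the Pythagenpat formula mentioned in footnote 4. The 82-game season, the 250-goal baseline, and the 0.285 exponent constant are assumptions chosen purely for illustration, not values from SALO, Corsica, or Evolving Wild.

    # Sketch: diminishing marginal wins per additional goal under Pythagenpat.
    # Season length, goal baseline, and exponent constant are illustrative assumptions.

    def pythagenpat_wins(goals_for, goals_against, games=82):
        """Expected wins implied by goals for and against, via Pythagenpat."""
        exponent = ((goals_for + goals_against) / games) ** 0.285
        win_pct = goals_for ** exponent / (goals_for ** exponent + goals_against ** exponent)
        return games * win_pct

    previous = pythagenpat_wins(250, 250)
    for goals_for in (260, 270, 280, 290):
        current = pythagenpat_wins(goals_for, 250)
        print(f"goals {goals_for - 10} -> {goals_for}: {current - previous:.2f} extra wins")
        previous = current
    # The extra wins from each successive block of 10 goals shrink slightly.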

Second, events added aren’t linear in ability – events added due to player effects change the game context (e.g., the score), and game-context effects also add (or remove) events. Players’ estimated abilities are only their direct effects on event rates, not their indirect effects on those rates through changes in the expected game situation. Linear synthesis à la Corsica captures only the former.

To defeat these limitations, I propose to replace all the linear steps in the synthesis task with a Markov chain. If you’re familiar with that concept, skip to Theory. Otherwise, read on.

On Markov chains

Definition

A Markov chain is a mathematical representation of a real-world system, with two parts:

  • A finite space of mutually exclusive states the system can be in; and
  • A square matrix \(M\) giving the probability \(M_{ij}\) of any state \(j\) a moment from now conditional on the current state \(i\).

Example

Picture a cat.5

  • Right now, the cat is either asleep (\(1\)) or awake (\(2\)).
  • The cat’s resting behavior is (realistically) random:
    • If the cat is asleep, with probability \(M_{12} = 0.3\) it will be awake in ten minutes.
    • If the cat is awake, with probability \(M_{21} = 0.2\) it will be asleep in ten minutes.

This gives us a matrix \(M = \left[\begin{matrix}0.7 & 0.3 \\0.2 & 0.8\end{matrix}\right]\).
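As a quick sanity check, that matrix is easy to write down in code. This is just the cat example above, with numpy as an assumed tool:

    import numpy as np

    # Cat states: index 0 = asleep, index 1 = awake.
    M = np.array([
        [0.7, 0.3],  # asleep now -> P(asleep), P(awake) ten minutes from now
        [0.2, 0.8],  # awake now  -> P(asleep), P(awake) ten minutes from now
    ])

    # Each row is a conditional distribution over next states, so it sums to 1.
    assert np.allclose(M.sum(axis=1), 1.0)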

So what good is that?

Usefulness: the Markov property

Consider: if the system is in state \(i\) now, with what probability will it be in state \(j\) in two moments?

We can find it from the matrix. For every possible state \(k\) (including \(i\) and \(j\)), just take:

  • the probability \(M_{ik}\) of state \(k\) in a moment given state \(i\) now, times
  • the probability \(M_{kj}\) of state \(j\) in a moment given state \(k\) now.

Summing those products over every possible \(k\) gives the answer. This works because the probability of the next state depends only on the current state, not on how the system got there, and those conditional probabilities don’t change over time. The first of these facts is referred to as the Markov property.

In the cat example, the probability of still being asleep in twenty minutes if asleep now is 55%:

\(M_{11} \times M_{11} + M_{12} \times M_{21} = 0.7 \times 0.7 + 0.3 \times 0.2 = 0.49 + 0.06 = 0.55\)

By the definition of matrix multiplication, though, this sum is just equal to \((M \times M)_{ij} = M^2_{ij}\).

Indeed, the probability of state \(j\) at any future time \(n\) moments from now given state \(i\) now is just \(M^n_{ij}\).
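Continuing the numpy sketch of the cat example, matrix powers give both the twenty-minute answer above and the cat’s long-run behavior:

    import numpy as np

    M = np.array([[0.7, 0.3],
                  [0.2, 0.8]])  # the cat matrix; index 0 = asleep, 1 = awake

    # Two moments ahead: P(asleep in twenty minutes | asleep now) = (M^2)_{11}
    print(np.linalg.matrix_power(M, 2)[0, 0])  # 0.55

    # Far enough ahead, the starting state stops mattering:
    print(np.linalg.matrix_power(M, 100))
    # Both rows approach [0.4, 0.6]: asleep about 40% of the time either way.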

Let’s make use of that.

Theory

It’s not a stretch, of course, to think of a hockey game as a system transitioning randomly among various possible states – including scores.

If we knew the conditional probability of any game state a short time from now given any state now, then by the Markov property we’d immediately know the probability that either team holds a lead at the time the game ends – their expected winning percentage – just by self-multiplying the matrix of those conditional probabilities enough times.

Contrasting those expected winning percentages for two otherwise average teams, one featuring a given real player and one with a replacement player instead, would give us a WAR value for the player – without intermediate steps to calculate events or goals added.

Over a short enough time interval, an event’s expected rate times the length of the interval is approximately the probability of exactly one such event occurring in that interval (with a negligible chance of two or more).
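To make the mechanics concrete, here is a toy sketch of that whole synthesis step. Every number in it is invented for illustration: the per-second goal rates, the 0.2 goals per game attributed to the hypothetical player, the cap on score differential, and the crude handling of ties. It also ignores every game-state variable except the score. But it shows the chain of reasoning: turn short-interval event rates into transition probabilities, self-multiply the resulting matrix to the end of regulation, read off a winning percentage, and contrast two such percentages.

    import numpy as np

    def toy_win_probability(gf_per_sec, ga_per_sec, seconds=3600, max_lead=10):
        """Toy model: game states are score differentials capped at +/- max_lead."""
        diffs = list(range(-max_lead, max_lead + 1))
        index = {d: i for i, d in enumerate(diffs)}
        M = np.zeros((len(diffs), len(diffs)))
        for d in diffs:
            i = index[d]
            # Over a one-second step, the per-second rates serve as event probabilities.
            M[i, index[min(d + 1, max_lead)]] += gf_per_sec   # we score
            M[i, index[max(d - 1, -max_lead)]] += ga_per_sec  # they score
            M[i, i] += 1.0 - gf_per_sec - ga_per_sec          # nothing happens
        # Self-multiply out to the end of regulation (the Markov property at work).
        final = np.linalg.matrix_power(M, seconds)[index[0]]  # game starts tied
        win = sum(final[index[d]] for d in diffs if d > 0)
        tie = final[index[0]]
        return win + 0.5 * tie  # crudely credit half a win for a tie

    # Invented rates: roughly 3 goals per team per game at the baseline.
    replacement = toy_win_probability(3.0 / 3600, 3.0 / 3600)
    with_player = toy_win_probability(3.2 / 3600, 3.0 / 3600)  # player adds 0.2 goals/game
    print(with_player - replacement)  # per-game win probability added

A real implementation needs a much richer state space than a capped score differential, which is what the rest of this section is about.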

So how about it? If we already did analysis with a suite of regression models that return player and game situation effects on event rates (or probabilities), is that enough to fill out a matrix of short-term state transition probabilities so that we may solve for a winning percentage?

Yes, it is, under two conditions:

  • the game situations the analysis model(s) condition on can be encoded as a finite set of mutually exclusive states; and
  • given the current state, the modeled event probabilities don’t otherwise depend on the history of the game.

Even if the analysis model(s) don’t perfectly meet these conditions, the further assumptions required to fill out a Markov chain picture of hockey can generally be less severe, and less opaque, than those in a purely linear weights-based WAR.

Actually writing down those assumptions and filling out such a matrix, of course, may not be easy even where it is straightforward in principle. Some aspects of game state may always be hard or impossible to represent this way even though they can be readily included in the analysis model(s). All the same, the Markov chain approach may well offer higher fidelity than other known means to derive WAR numbers from a given underlying model.

As stated above, in another post I’ll write up the math used in applying this theory specifically to SALO 2.0 under the name of MARKOV.


  1. I take “WAR” as, by now, the standard term for any single number representing total player value. I’ve gone back and forth in the past on whether to use other terms depending on the nature of the metric, but I’m acclimated to this one by now.

  2. Evolving Wild appears to use goals (GF and xGA) as target variables at analysis time, which saves a step at synthesis time.

  3. The exception is that goals added due to shots added are calculated from expected shots added times players’ own modeled on-ice Sh%, not the league average Sh%.

  4. Evolving Wild accounts for this by plugging goals added into the nonlinear Pythagenpat goals-to-wins formula used in baseball.

  5. I thank my thesis committee member Prof. Jeff Gill for the notion of cat behavior as an illustration of a Markov chain.