# Risk is the ally of the underdog

When talking about sports strategy, there are often ideas that seem counterintuitive and obvious at the same time.

This fact has become painfully clear to me during the last year and a half or so, ever since I first advanced the idea of Braess’s Paradox in basketball. The idea (which eventually became an academic paper and a presentation at last year’s Sloan Sports conference; download the PowerPoint file here) generally prompted one of two responses: “Wow, that’s really cool” or “…why did you need a bunch of confusing diagrams and equations to say something I already knew?”.

Both responses seem perfectly reasonable to me. As I see it, the reason for this dichotomy is that the main idea behind the “price of anarchy in basketball” can be stated in two ways:

1) “A team needs to maintain diversity of its offensive options in order to stay effective.”

OR

2) “In order to play its best possible game, a team often has to *intentionally* run a play that is less likely to succeed than other plays.”

While the second statement seems surprising and counterintuitive, the first seems obvious. Both of these statements, though, are equivalent to saying “running the best possible play each time down the court is not the same as playing the best possible game.”

In general, this is a fairly common predicament: an idea can be stated in two equivalent ways, one of which sounds wrong while the other seems correct. It is my opinion that whenever such situations arise you need a quantitative description that can be carried through to a single logical conclusion. In this way, the logical rigor of mathematics can serve as your King Solomon, adjudicating between right and wrong.

In this post I want to discuss another sports idea that seems both obvious and counterintuitive. The obvious phrasing goes like this:

An underdog needs to accept risk in order to give itself a real chance of winning. A heavily-favored team, on the other hand, should try to minimize risk.

This statement has a much more counterintuitive partner, though. Namely:

A team’s best strategy for success involves intentionally lowering its expected offensive efficiency.

In other words, sometimes an underdog needs to adopt a strategy that will lead, on average, to a worse loss.

In the remainder of this post I’ll discuss why this idea is correct and how it can be made quantitative. Then, as an example, I’ll apply it to the following simple question: how often should a basketball team shoot the three?

**Is risk your enemy or your ally?**

The major point of this post is to assert and then explore the following statement: There is a significant difference between optimizing the efficiency of your offense and optimizing your chance of winning.

As an illustration of why this is true, consider the following simple example. Imagine that you are the coach of a basketball team that trails by three points with only one shot left. Should you tell the team to go for a two- or a three-pointer?

The answer to this question, of course, is obvious: you shoot the three. But to understand the implications of this answer, consider how it sounds in statistical language. Your basic choice is between a high-probability 2-point shot (which the defense will likely allow without a fight) and a low-probability 3-point shot (which they’ll probably defend carefully). Say that you estimate the chance of making a 2-pointer at 80%, while the chance of making a 3-pointer is only 20%. This means that, on average, calling for the two-point shot will yield 0.8 × 2 = 1.6 points while calling for the three-point shot will give only 0.2 × 3 = 0.6 points. So by calling for the three-point shot, you are instructing your team to make a play that is almost three times worse in terms of average number of points scored.
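The arithmetic fits in a few lines. A minimal sketch, using the hypothetical 80% and 20% make probabilities from the scenario above:

```python
# Down 3 with one possession left: compare expected points with the
# chance of staying alive. The 80%/20% make probabilities are the
# hypothetical estimates from the scenario above.
p2, p3 = 0.80, 0.20

ev_two = 2 * p2    # expected points from the two: 1.6
ev_three = 3 * p3  # expected points from the three: ~0.6

# A made two still loses by one; only the made three ties the game.
tie_prob_two, tie_prob_three = 0.0, p3

print(round(ev_two, 2), round(ev_three, 2))  # 1.6 0.6
print(tie_prob_two, tie_prob_three)          # 0.0 0.2
```

The robot coach maximizes the first pair of numbers; a coach who wants to win cares only about the second pair.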

Think of it this way: if your team were coached by a robot that had been programmed to optimize the team’s expected number of points scored, then the robot coach would immediately order your team to go for two. How would you explain to the robot (or its programmers) the flaw in its design?

I might say it this way: when winning is unlikely, you need to be willing to sacrifice from the *average* outcome in order to improve the *best possible* outcome. Or, as a more general principle, an underdog must be willing to accept greater-than-average risk.

Now let me try to translate that statement into mathematical language: A team whose expected scoring output is lower than that of its opponent (an underdog) should pursue strategies with a higher variance (more “risk”), even if these result in a lower mean.

Got that?

Let me try to make the point graphically. When your team decides on a strategy for the remainder of the game, its final number of points scored can be described by some distribution (a “probability density function”). This distribution has some average (the mean) and some width (the standard deviation). Similarly, your opponent has some distribution describing its own final score. If your opponent is generally better than you, then their distribution will have a larger mean than yours.

For you to win the game, two unlikely events have to happen simultaneously: you have to score near the top of your distribution, and your opponent has to score near the bottom of theirs. As such, the probability that you will win is represented graphically by the overlap between your distribution and that of your opponent. More overlap means a better chance for you to pull off the upset.

The figure above shows this principle schematically. The two blue curves represent two hypothetical possible strategies that your team can employ. One of them results in an average final score of 100 points while the other results in an average final score of 92 points. In this case, though, the 92-point strategy is the better one because it is accompanied by a much larger variance, so that the overlap with your opponent’s distribution is greater.

You could also say it this way: against an opponent who will, on average, outscore you under either strategy, the wide distribution centered at 92 points gives you a much better chance of an upset than the narrow distribution centered at 100.
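The overlap argument can be made concrete. A sketch under the simplifying assumption that both final scores are normally distributed; the opponent mean of 107 and the standard deviations below are hypothetical numbers chosen for illustration:

```python
import math

def win_prob(mu_you, sd_you, mu_opp, sd_opp):
    """P(your score > opponent's score), treating both final scores as
    independent normal random variables: their difference is normal with
    mean mu_you - mu_opp and variance sd_you**2 + sd_opp**2."""
    z = (mu_you - mu_opp) / math.hypot(sd_you, sd_opp)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical numbers: the opponent averages 107 points; your two
# candidate strategies average 100 (narrow spread) and 92 (wide spread).
print(round(win_prob(100, 4, 107, 6), 3))   # ~0.17
print(round(win_prob(92, 18, 107, 6), 3))   # ~0.21 -- lower mean, better odds
```

The second strategy sacrifices eight points of expected scoring and still wins more often, which is the whole point of the figure.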

On the other hand, if it’s worthwhile for the underdog to lower its mean in order to *increase* its variance, then it must also be worthwhile for the favored opponent to lower its mean in order to *decrease* its variance. Like this:

In other words, if you’re the favorite to win the game, it can be worthwhile to play conservatively. Such conservative strategies lower your average score, but they also reduce the likelihood of very low scores that would produce an upset.

So now that the tradeoff between large average score and large/small variance in score is apparent, we have an optimization problem. How much should a team be willing to sacrifice from its average score (or average offensive efficiency) in order to increase/reduce its variance? How much risk is the right amount?

In this post I want to show that these questions can be answered quantitatively. As a simple example I’ll use the most straightforward risk/reward question in basketball: how often should my team shoot the 3?

**Live by the three, die by the three?**

In basketball, taking a three-point shot is an inherently risky play: it’s worth more points, but you are more likely to miss. Imagine, for example, the following scenario. You are in the gym, shooting around, when a friend bets you $100 that you can’t score 100 points on 100 shots. You are given the option of shooting either 2’s (from the free throw line) or shooting 3’s. Which would you choose?

The answer, of course, depends on your shooting percentages from each spot. Consider, for example, the following hypothetical cases:

A) You shoot 45% from the free throw line and 30% from the three-point line.

B) You shoot 54% from the free throw line and 36% from the three-point line.

Which option would you pick in each case: 2’s or 3’s?

The numbers in cases A and B are cleverly chosen so that your average number of points scored is the same no matter which option you choose. But the risk associated with the two strategies is different. In case A, for example, there is a 95% chance that your score will fall between 72 and 110 if you shoot 2’s. If you choose to shoot 3’s, however, that same window of probability is between 63 and 120. So your downside (the lowest reasonably-likely score) is worse, but your upside (the highest reasonably-likely score) is better. Which should you choose?

The answer is this: if you are an underdog (scenario A, where your average score is 90 points), go for the three; risk is your ally. If you are favored to win (scenario B, where your average score is 108 points), go for the two; risk is your enemy.
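This answer can be checked directly from the binomial distribution. A sketch; the only extra bookkeeping is the point threshold (reaching 100 points takes 50 made 2’s, or 34 made 3’s):

```python
from math import comb

def p_at_least(n, k, p):
    """P(at least k makes in n independent shots at percentage p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# To reach 100 points you need 50 made 2's, or 34 made 3's (34 * 3 = 102).
# Case A (both options average 90 points -- you're the underdog):
print(p_at_least(100, 50, 0.45))  # ~0.18 shooting 2's
print(p_at_least(100, 34, 0.30))  # ~0.22 shooting 3's -- take the threes
# Case B (both options average 108 points -- you're the favorite):
print(p_at_least(100, 50, 0.54))  # ~0.82 shooting 2's -- take the twos
print(p_at_least(100, 34, 0.36))  # ~0.70 shooting 3's
```

In both cases the mean is identical between the two options; only the variance, and hence the chance of clearing (or falling below) the 100-point bar, differs.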

Now let’s move beyond this contrived example to a slightly less-contrived example: a hypothetical NBA team (let’s call them the Timberwolves) is considering how often they should shoot the three against a highly favored opponent (let’s call them the Lakers). The Timberwolves can score 50% of the time if they go for the two and 30% of the time if they go for the three. How many three-pointers should they shoot?

The Lakers, for their part, are a well-coached team who realize that they should play conservatively. So let’s say (for the sake of argument) that they take only low-risk 2-point shots, which they make 55% of the time. In this case, the distribution of their final score is well-known: it is given by the binomial distribution. [*See the important footnote at the end of the post.]

In deciding how often to go for three, the Timberwolves face the following tradeoff: shooting three’s lowers their average score, but it increases the variance of their score. What is the optimal amount?

The distribution of final scores for the Timberwolves is also given by the binomial distribution (more correctly, a combination of two binomial distributions: one for 2’s and one for 3’s). Since both distributions are known, it is possible to calculate the probability that the Timberwolves will win the game as a function of the number of 3’s they take. From there we can calculate the “optimal strategy” for the T-Wolves.

The result, shown below, depends on two crucial factors: how many possessions are left in the game, and how large a deficit the Timberwolves are facing. When the score is close or there are many possessions left, the Timberwolves should shoot mostly 2’s, thereby optimizing their average output. On the other hand, when the Timberwolves are facing a large deficit with only a relatively short time left, they should shoot mostly 3’s in a (somewhat) desperate attempt to catch up.
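A sketch of the underlying calculation, under the same simplifying assumptions as the text (each possession is one independent shot, the number of 3’s is fixed in advance rather than chosen adaptively, and ties count as losses):

```python
from math import comb

def pmf_binom(n, p):
    """Binomial pmf as a list indexed by number of makes."""
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def score_pmf(n_threes, n_twos, p3=0.30, p2=0.50):
    """Distribution of Timberwolves points: convolve the binomial
    distribution of made 3's with that of made 2's."""
    threes, twos = pmf_binom(n_threes, p3), pmf_binom(n_twos, p2)
    pmf = [0.0] * (3 * n_threes + 2 * n_twos + 1)
    for a, pa in enumerate(threes):
        for b, pb in enumerate(twos):
            pmf[3 * a + 2 * b] += pa * pb
    return pmf

def win_prob(n_poss, deficit, n_threes, p_opp=0.55):
    """P(the Timberwolves erase the deficit and outscore the Lakers over
    the remaining n_poss possessions per team). Ties count as losses,
    a simplification that ignores overtime."""
    ours = score_pmf(n_threes, n_poss - n_threes)
    theirs = pmf_binom(n_poss, p_opp)  # the Lakers take only 2's
    total = 0.0
    for makes, p_m in enumerate(theirs):
        need = 2 * makes + deficit + 1  # points required to win outright
        total += p_m * sum(ours[max(need, 0):])
    return total

def best_n_threes(n_poss, deficit):
    """Number of 3-point attempts that maximizes the win probability."""
    return max(range(n_poss + 1), key=lambda k: win_prob(n_poss, deficit, k))

# Down 10 with 20 possessions left for each team:
print(best_n_threes(20, 10))
```

Sweeping `best_n_threes` over a grid of deficits and possessions-remaining reproduces the two-phase picture described above.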

This result is pretty interesting: there are clearly-separated phases of three-point shooting and two-point shooting. But the result is also pretty unrealistic. For example, it says that a team down by 20 points with 30 possessions left (about a quarter and a half) should take a 3-pointer with every single shot. In real life, such an approach is pretty unlikely to succeed. What the model above fails to take into account, of course, is that if you start relying heavily on a particular play then the defense will start guarding that play more carefully and its effectiveness will drop. This was the essential observation behind the “price of anarchy in basketball” analysis (and, of course, is not originally mine).

So let me present another, very slightly less contrived scenario, this time involving G&L staple Ray Allen. Suppose your team is composed of four decent two-point shooters and one stellar three-point shooter named Ray Allen. The two-point shooters can score 50% of the time. Ray Allen can score 50% of the time when used very rarely, but the more often he is called upon to shoot the three, the more his percentage will decline due to increased pressure from the defense. Let’s say that Ray Allen’s shooting percentage p is related to the fraction f of the team’s shots that he takes by

p(f) = 0.5 - 0.4f.

The strategy that optimizes the team’s average scoring output is for Ray Allen to shoot a fraction f ≈ 0.21 of the team’s shots (almost the same as everyone else), which results in the team scoring about 105 points per 100 possessions. (Details on how to figure this out are here, or in the academic paper I linked to above).
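A sketch of this optimization, assuming a linear skill curve p(f) = 0.5 - 0.4f for Ray Allen (chosen to be consistent with the numbers quoted here: 50% when rarely used, about 105 points per 100 possessions at the optimum, and 10% if he takes every shot):

```python
def points_per_possession(f):
    """Team efficiency when Ray Allen takes a fraction f of the shots,
    assuming the linear skill curve p3(f) = 0.5 - 0.4 * f for his threes
    and 50% shooting on everyone else's 2-point attempts."""
    p3 = 0.5 - 0.4 * f
    return f * 3 * p3 + (1 - f) * 2 * 0.5

# Scan usage fractions to find the efficiency-maximizing f.
best_f = max((k / 1000 for k in range(1001)), key=points_per_possession)
print(round(best_f, 3))                            # ~0.208
print(round(100 * points_per_possession(best_f)))  # ~105 points per 100 possessions
```

The underdog analysis then asks when it pays to push f above this efficiency-maximizing value, trading mean for variance.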

Now let’s say that your team (with Ray Allen) is the underdog against an opponent that scores 110 points per 100 possessions (again, let’s have them be the Lakers). In this scenario your team should carefully weigh how to use Ray Allen. If you have him shoot about 20% of the time, you will optimize the average scoring output of your team. If you push him to shoot more, your average scoring will suffer but the variance of your score will increase. What is your optimal strategy?

Below is the result of the calculation: the team’s optimal strategy as a function of possessions remaining and deficit. I should admit that some of the points in the bottom left-hand corner are questionable (if your team has one shot remaining and Ray Allen takes it, is it fair to set f = 1, so that Ray Allen’s shooting percentage is 10%?). This is an inherent weakness of using deterministic “skill curve” relations like the one I used above.

The general message, though, is pretty clear. When there is a lot of time remaining and the lead is not too big, your team should be looking to optimize its average efficiency: have Ray Allen shoot about 20% of the time. When you are facing a big deficit, however, it’s worth being “risky”. In these situations, having Ray Allen shoot more than 20% of the time lowers your average score, but it increases your chance of winning.

**Toward quantifying risk and reward**

In the end, these are only very simplified examples of what it means to take risk in a sporting contest. There are lots of other ways to take on risk, and these have become particularly popular points of discussion during the last year or two (here, for example). But I really believe that with the right analysis these questions can be answered quantitatively, so that the intuitive notion of “we need to take risk” can be turned into a definite answer to “how much risk, exactly, should we take?”.

*** Footnote**

An underdog needs to increase the variance of its final score. Given that fact, it’s really tempting to think that the solution is for underdog teams to bring in “streaky” players. If you’re down by 20 points, it seems, it’s worth rolling the dice with an unpredictable scorer who might get hot and go 17/20 from the field or might stink up the court and go 4/20. (If you’re a Timberwolves fan, this person is Michael Beasley.)

The problem is that statisticians have found absolutely no evidence for such players. As far as a large number of advanced statistical methods can tell, the shooting patterns of every NBA player are consistent with the idea that all shots are statistically independent of each other. There are no hot hands.

If this is true, then there is a strict relationship between the team’s shooting percentage and the variance of its final score. Namely,

σ² = n V² p (1 - p),

where n is the number of shots taken, V is the point value of each shot (2 or 3), and p is the shooting percentage.

So apparently in basketball the only control a team has over its variance is through the value of the shot it takes (a 2 or a 3).
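In code, that relationship looks like this (each independent shot worth V points, made with probability p, contributes V²p(1 − p) to the variance):

```python
def score_variance(n, value, p):
    """Variance of total points from n independent shots, each worth
    `value` points and made with probability p."""
    return n * value**2 * p * (1 - p)

# Same 90-point average as the case-A bet, very different spreads:
print(score_variance(100, 2, 0.45) ** 0.5)  # ~9.9 points of sd shooting 2's
print(score_variance(100, 3, 0.30) ** 0.5)  # ~13.7 points of sd shooting 3's
```

The V² factor is why the three is the underdog’s lever: at equal expected points, the higher-valued shot always carries more variance.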

Football, on the other hand, may be an entirely different matter, since there is a huge range of “values” for different plays. It’s unfortunate, in that sense, that I grew up a basketball fan. Assessing risk in football might be a much more interesting problem.

UPDATE: Since writing this post, its main idea has evolved to become a talk at the Sloan Sports Analytics Conference and a full-length paper (also available on the arXiv).

I didn’t read the final 4/5 of your post…but the first thing you write about Braess’s paradox does not apply to basketball. You have in your head a deterministic model of offensive basketball strategy where you can simply allocate more or fewer shots to one player or the other. The truth is closer to a model where offensive players respond to randomly drawn scoring opportunities within the flow of an offense. Optimal strategy in the game = optimal strategy on an individual possession. The alternative could only be true if you could trick the defense into suboptimally allocating their energies.

What you’re saying is a common criticism, and probably a fair one. Is it correct to use a deterministic formula for the efficiency of a play as a function of use? That’s a fair question. The “skill curve” is still a pretty theoretical concept (which doesn’t belong to me) that you are not required to buy into.

I think of it this way, though: when the offense comes down the court, they have to try something. High screen and roll? Isolation on the low post? Backdoor baseline cut? The final execution of the play may of course involve some improvising, as dictated by how the defense responds, but the offense still has to make the first move. The question I’m trying to answer is: how often should they try each option? It seems to me very reasonable (likely, even) to assume that if a team tries to run the same thing over and over again, the probability of scoring will decline. You don’t have to agree with my reasoning, but it seemed to me likely to be true, and if it is, there are a variety of strange and interesting consequences.

Maybe we can compromise somewhat by saying that if there is an effect its strength will depend on what kind of offensive system the team is running and how structured it is.

Great post, which adds some important features. I will say this same thesis has been stated a few times, particularly in response to a Malcolm Gladwell article last year. Two good (football related) articles on this can be found here:

http://smartfootball.blogspot.com/2009/02/conservative-and-risky-football.html

http://www.advancednflstats.com/2009/05/are-nfl-coaches-too-timid.html

Thanks for the links, Gregory. Those are good ones, and I appreciate you pointing out that my main point here is not a novel one.

One could easily transfer that to boxing. The boxer with the shorter reach, lower weight, height, whatever makes them nominally inferior, must hope for a few powerful punches. The technically superior fighter can try to boil their opponent tender.

In soccer, smaller teams hope for bigger teams to fall for counters and let themselves be dragged into an open clash with tempo up front, causing late tackles. The best examples I can come up with right now, even though not quite alike, are Greece 2004 and Germany 2010.

I know basically nobody knows jack about handball, but it would be a whole lot more interesting to analyze, given all the parameters one could toy with.

This is eye-opening.

I’ve thought a lot about how a team’s optimal strategy changes with the in-game deficit and that team’s skill level relative to their opponent. However, instead of looking at it from a shot selection point of view, I wanted to know when (if ever) it is favorable to play run-and-gun vs. a slower, more defensive-minded game.

If anyone cares, two things led me to ask this question. For one, I’ve been reading Dean Smith’s Multiple Offense and Defense where, in the opening pages, Smith says he favored a faster style of play when he felt he had a slight edge over the opponent and a more careful style of play when he felt he was over-matched. This makes sense, I suppose, if you assume a slight edge in scoring efficiency can be magnified by playing more possessions, and that playing fewer possessions when you’re outmatched might have the effect of “randomizing” the outcome (the extreme case being that each team plays only one possession and attempts only one shot). (The other reason I was thinking about this is that I’m a Knicks fan and I often wonder where the notion came from that D’Antoni’s style is not optimal for playoff matches.)

But for in-game deficits with the clock ticking away, I think your graphs point to a solution. In the region where it becomes optimal to shoot mostly 3’s, a team could also opt for a faster style, thus in a way moving horizontally to the right (because they would in effect be increasing the number of possessions remaining).

How do you see a team’s chances of winning affecting a team’s pace of play and do you think there’s anything to the claim that a run-and-gun system can’t win an NBA championship? (It certainly won us a couple NCAA championships in ’05 and ’09!)

Hi Jared,

Your points about pace are good ones. In general, an underdog has an incentive to make the game as short as possible.

For example, the Timberwolves have almost no chance to beat the Lakers in a 7-game series, but their chance of outscoring the Lakers in a given quarter is decent and their chance of outscoring them during a given 60-second interval is almost 50%.

So an underdog should be willing to slow the game down (which is why they invented the shot clock in basketball). The question is, to what extent should they do so? If they face a large deficit or if slowing the game hurts their offensive efficiency, then they need to weigh the trade-off.
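The shrinking of the underdog’s chances with game length can be sketched with a normal approximation, using the hypothetical Timberwolves/Lakers percentages from the post (50% vs. 55% on 2-point shots):

```python
import math

def outscore_prob(n_poss, p_you=0.50, p_opp=0.55):
    """Normal approximation to P(underdog outscores the favorite over
    n_poss possessions each), with both teams taking only 2-point shots
    at the post's hypothetical percentages (50% vs. 55%)."""
    mean = n_poss * 2 * (p_you - p_opp)
    var = n_poss * 4 * (p_you * (1 - p_you) + p_opp * (1 - p_opp))
    z = mean / math.sqrt(var)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Roughly: a minute, a quarter, a game, a seven-game series.
for n in (2, 25, 100, 400):
    print(n, round(outscore_prob(n), 3))  # shrinks as n grows
```

The scoring gap grows like n while the spread grows only like the square root of n, so every extra possession favors the better team.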

I’m submitting a paper to the Sloan Sports Conference again this year, and it considers this question in more detail. I might discuss the results in this blog in the near future.

Right, I see what you’re saying. But in the case where a team is behind 10 points with 2 minutes left, they should probably attempt to run up the number of possessions remaining — in essence, giving them more time — even if it will likely aggravate a small disadvantage in efficiency. (This is, like you pointed out, assuming changing the pace of the game does not have any effect on their offensive efficiency.)

By the way, I enjoyed your last paper and I think I recall you saying somewhere it would be nice to have some detailed possession and personnel data to validate the theory. I am working on some scripts to parse NBA play-by-play files (grouping them into possessions, calculating certain attributes like “fast break possession”, and ultimately generating metrics), so if you still have a use for any of that, please feel free to send me an email.

Thanks for the great posts!

Thanks. There are actually a fair number of people who have worked on a similar project — check out basketballgeek.com, for example, and browse through the APBRmetrics forum.

Personally, this is how I imagine the optimal way for a team to keep track of statistics: You memorize the team’s playbook, and every time the team calls one of its plays you record which play was called, how many points it produced, which option it ran, and which players were on the floor. Then I think you could really put together a statistical description that is useful for the coach. Perhaps, even, you could get all the necessary statistical parameters to plug into my “price of anarchy” theory and use its optimality calculations.

This is what I daydream about, anyway. I’m not sure it’s feasible if you don’t have the team’s playbook, or at least a strong knowledge of the game and a lot of patience.

The key piece from the play-by-play that we don’t have is whether each play was a post-up, catch-and-shoot, pick-and-roll, etc. But we do have who was on the floor, the “type” of shot (these things are labeled “running jump shot”, “hook shot” — not terribly useful), time left on the clock, and based on a few of those, you can tell whether it was a transition play or not.

I know there are some teams that keep track of it all. Here’s a pretty interesting Q&A with Erik Spoelstra about the data his staff keeps: http://www.nba.com/2011/news/features/john_schuhmann/01/22/spoelstra-qa/index.html?ls=iref:nbahpt1