
LAST UPDATED

Nov 2nd, 2018

Part 1: Age profiles and adjusting strokes-gained from 1983-present


There is a lot of interesting stuff to get to in this article, so we’ll cut
the fluff and get right into the details. The goal of this statistical exercise
is to project the performance level of golfers several years into the future.
The metric of performance we focus on is adjusted strokes-gained.
We use this because we believe it is the least noisy measure of golfer performance.
Recall that, in a given round of golf, a golfer’s adjusted strokes-gained is the
number of strokes better or worse their score is than some benchmark golfer
(e.g. the average player on the PGA Tour). You can roughly think of this as being
calculated by taking a golfer’s raw score and first subtracting the mean score of
the field on the given day, and then making a further adjustment to correct for
the average skill level of the field.
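
As a rough sketch of the two steps just described (not the exact method used for this article), the calculation could look like the following in Python. The function name, the example scores, and the field-strength value are hypothetical and only there for illustration.

```python
from statistics import mean

def adjusted_strokes_gained(score, field_scores, field_strength):
    """Rough adjusted strokes-gained for a single round.

    score          -- the golfer's raw score for the round
    field_scores   -- raw scores of everyone in the field that day
    field_strength -- assumed average skill of the field, in strokes
                      relative to the benchmark golfer (positive = stronger)
    """
    # Step 1: strokes better (+) or worse (-) than the field's mean score.
    raw_sg = mean(field_scores) - score
    # Step 2: beating a stronger field is worth more against the benchmark.
    return raw_sg + field_strength

field = [68, 70, 71, 72, 74]  # hypothetical scores, field mean = 71
sg = adjusted_strokes_gained(68, field, field_strength=0.5)
print(sg)  # 3.5
```

Here a 68 beats the field mean of 71 by 3 strokes, and the assumed 0.5-stroke field strength lifts the adjusted value to 3.5.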

We are going to be predicting a golfer’s average adjusted strokes-gained in future seasons. Critical to this are two things: 1) properly adjusting scores over long periods of time, and 2) understanding age profiles of golfers. Let’s tackle these two things in turn.

Adjusting scores over many seasons is fundamentally no different than adjusting them across tournaments in a given season. Therefore, we adjust scores from 1983 to 2018 using the same method we’ve used before to adjust scores across tournaments and tours within a given season. To see some details and a link to the academic paper we roughly follow, see the first endnote of this article. The core idea when adjusting scores over time is that by estimating each golfer's ability at each point in time, we can properly assess the difficulty of a course on a given day. A key intuition is the following: to compare the 2009 version of Tiger Woods to the 1992 version of Fred Couples, we are comparing Woods’ and Couples’ performance against common opponents (e.g. they both played against Phil Mickelson). Additionally, every player’s skill level is allowed to vary over time (e.g. Mickelson was likely at a different skill level when he played against Woods than when he played against Couples). The end result is a strokes-gained measure for every round played from 1983-present on the major professional tours (data for European Tour is from 1996-present, and Web.com data is from 1990-present). For a baseline (i.e. a strokes-gained value of 0) we use the average player on the PGA Tour in the 2000 season. Here is the average value for this strokes-gained measure on some of the major tours since 1983 (subject to data availability):

The quality of the average golfer on each of the 3 tours we analyze has steadily
gotten better over time. From 1983 to 2018, we estimate that the average golfer
on the PGA Tour improved by about 1.6 strokes. (In an old article, we used a
different, and likely much less reliable, method and found larger differences; we
place much more faith in the estimates here.)

It is always controversial in sports analytics work to present statistical comparisons of athletes across generations. This is likely because there are always assumptions involved and also because different camps of observers often have strong priors about performance across generations. We won’t spend much time here arguing about the validity of these estimates; as you will see later, the prediction exercise does not hinge on belief in our estimates of changing average skill level over time. That being said, they are important to understanding age profiles of golfers, so that’s why we’re covering it.

Now, let’s talk about age profiles. Like most skills in life, we would expect a professional golfer’s ability to increase as they age before flattening out and eventually declining. It should be clear why the previous analysis and discussion is critical to understanding aging curves: if the quality of the average professional golfer is improving over time, then we may underestimate changes in a golfer’s ability as they age. For example, suppose that your metric for performance is strokes-gained relative to the average professional in the current year. Then, given the results in the first figure, we would wrongly conclude that a golfer who gains 1 stroke over PGA Tour fields in 1990 and does the same in 1995 has not improved. However, after adjusting for the improvements in the average field quality on the PGA Tour from 1990 to 1995, the correct conclusion would be that this golfer improved by about 0.25 strokes per round.
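
The arithmetic behind this example can be written out as a short Python sketch. The tour-average levels below are hypothetical placeholders; only their 0.25-stroke difference is taken from the text, and the function name is ours.

```python
def time_adjusted_sg(relative_sg, tour_average):
    """Convert strokes-gained over the current-year tour average into
    strokes-gained over a fixed baseline.

    relative_sg  -- strokes gained over the average PGA Tour player that year
    tour_average -- that year's average player measured against the baseline
    """
    return relative_sg + tour_average

# Hypothetical tour averages relative to a fixed baseline; the levels are
# made up, the 0.25-stroke improvement from 1990 to 1995 is from the text.
tour_avg = {1990: -0.50, 1995: -0.25}

sg_1990 = time_adjusted_sg(1.0, tour_avg[1990])
sg_1995 = time_adjusted_sg(1.0, tour_avg[1995])
improvement = sg_1995 - sg_1990
print(improvement)  # 0.25
```

The relative measure is 1.0 in both years, so it shows no change, while the time-adjusted measure picks up the 0.25-stroke improvement in the field the golfer kept pace with.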

Regardless of which measure you use, either time-adjusted strokes-gained or current-year-adjusted strokes-gained, it is a bit complicated to properly construct age profiles. The reason it is complicated is that not all golfers should be expected to have the same age profiles. In our view, there are two main variables that will affect a golfer’s expected aging curve: 1) their baseline ability (e.g. are they a +1 strokes-gained player at age 22, or a -1 strokes-gained player at age 22?), and 2) how many years they’ve spent in professional golf. The first point is important because the higher your strokes-gained baseline is, the less room for improvement you have (this follows if we assume that there is some natural upper bound to golfer performance). The second point is important if there is a learning curve in professional golf; a 25-year-old who is in his rookie season on one of the professional tours may not be the same as a 25-year-old who has already spent 5 seasons in professional golf.

Below we first present the aging curve using all golfers in our sample, followed
by a few aging curves for different samples of golfers. Each data point on these
curves is interpreted as a golfer’s expected strokes-gained relative to their
performance at age 21 (or whatever the youngest age is in the sample). See endnote
[1] for a discussion of how these curves are constructed. In all of the figures that
follow, there are two age profiles: the blue line is the “time-adjusted” age profile and
the red line is the “relative” age profile. The difference between them is that
the time-adjusted profile uses a strokes-gained measure that has corrected for the fact
that the skill level of the average professional has changed over time,
while the relative profile uses a strokes-gained measure that is relative
to the average PGA Tour golfer in the current year.

The age profile constructed using all of our data indicates that the typical
professional golfer’s “true” performance improves steadily up to about age 32,
flattens from ages 32-37, and then declines steadily up to age 48. In contrast,
the relative age profile, which is confounded by the fact that the quality of
the average PGA Tour professional has increased steadily over time, indicates
that performance starts declining at age 33 and continues to decrease to age 48,
ultimately reaching a much lower skill level than the starting performance level
at age 21.

There shouldn’t be too much debate around the validity of the relative age profile. While we have adjusted scores within each year for this (across tournaments and tours), there are not any real statistical tricks going on. If you performed the same exercise using average strokes-gained on the PGA Tour (unadjusted for field strength - so, just subtracting off the mean score in each round) you would likely obtain a similar profile. The time-adjusted SG profile, on the other hand, requires believing that we have correctly adjusted scores over this time period. It would be interesting to see if professional golfers agree with the true age profile (i.e. the time-adjusted SG profile) here, given their own experiences. They would have to be able to assess how their performance has evolved over time, irrespective of how their performance has evolved relative to the tour average - a pretty hard thing to do, especially with technological change occurring at the same time.

We provide the other graphs, which use specific subsets of golfers, to highlight the fact that age profiles are likely to differ depending on several factors. For example, it matters how high the baseline ability of the golfer is. This is due to the fact that we always expect some regression to the mean for any player who has a great season (e.g. Spieth may have been performing above his true ability as a young professional), and it is also due to the fact that players who truly have high abilities at young ages have less room to improve (e.g. even if 22-year-old Spieth truly is a +2 SG player, we know from historical data that he is very unlikely to improve beyond +2.5 SG). In looking at plot (a), it doesn’t seem like selecting for players who had unusually great starting seasons is much of an issue, as the aging profile still slopes up beginning at age 21. However, we do see much less of an improvement leading up to peak performance at age 32 than when we examine players who started their careers performing below the PGA Tour average (subplot b), which speaks to the point that there simply is less room for improvement for elite young golfers.

The other dimension of aging profiles we examine is the age at which a golfer entered our dataset (i.e. a proxy for when they started their professional careers). It seems plausible that golfers who start their professional careers at age 20, compared to those who started when they were aged 25, would have different profiles from age 28-32, for example. We do see that the early starters appear to have aging curves that start to decline at slightly younger ages, but overall their aging curves look pretty similar.

A final point is that the older end of the aging profile may be harder to interpret. It is likely that we are selecting for players who did not experience a huge decline in their performance as they moved into their mid-to-late 40s. Players who really drop off are unlikely to show up in our data, even though we do have data for the developmental tours. There are plenty of examples of professionals who end their playing careers in their 40s to pursue other things (e.g. commentating). To the extent that this happens in our data, we may underestimate the drop off in performance that occurs at the tail-end of the aging curve.

(Astute readers may wonder: if aging effects are important, don't you need to
incorporate them to properly adjust scores in the first place? This thinking is correct,
and in theory we have incorporated aging into our score adjustment method [2].)

Part 2: Understanding our career projections

Next, let’s move on to the discussion of how best to predict career trajectories.
As was alluded to earlier, for this prediction exercise we are going to be using
the “relative” strokes-gained measure as the measure of performance. Recall that
the interpretation of this measure is strokes-gained relative to the average PGA Tour professional in
each season, while time-adjusted strokes-gained is a measure of performance
relative to a specific year (e.g. the average PGA Tour player in 2000).
The main reason we focus on relative strokes-gained is that an important part of this project
is finding comparable golfers to the current professionals using our historical database.
If we were to use the time-adjusted strokes-gained measure, it would be the case
that most of the top comparisons for today’s players are recent players. For example,
using the relative measure, we find that Rickie Fowler and Davis Love III were fairly
similar golfers at the start of their careers. They both performed at a level between 1-2
strokes better than the average PGA Tour player (at the time) in most of their seasons
between the ages of 22-27. However, using the time-adjusted measure, we
would say that Fowler was performing at a significantly higher level than Love III
(because we estimate that the PGA Tour was much stronger in 2010 than in 1990),
and that they therefore are not that comparable. Using
the relative measure also has the benefits of being more intuitive and not requiring that readers believe
we have correctly adjusted scores from 1983-present. We also don't think that the projections
would be significantly altered using the time-adjusted strokes-gained measure
instead of the relative one [3].

(Here are rough guidelines for inferring performance level from relative strokes-gained averages:
+2 is typically near the level of a top 5 player in the world, 0 is the PGA Tour average,
and -1 is the European Tour average.)

The historical database includes data from 1983 on the PGA Tour, from 1990 on the Web.com Tour, and from 1996 on the European Tour. To find the top comparisons for each present-day golfer, we use a few (subjectively chosen) characteristics. The most important is age: for a 27-year-old Rickie Fowler, all comparisons are chosen from the 27-year-old versions of each golfer in our database. The characteristics used to calculate similarity amongst this set of 27-year-old golfers are: average strokes-gained and rounds played in each of the previous 3 seasons, career average strokes-gained, number of years as a professional golfer, and the fraction of their career spent on the PGA Tour (as opposed to other tours).
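
The article does not spell out how similarity is calculated, so the following Python sketch is only one plausible way to do it: a Euclidean distance on features scaled by their standard deviation, computed among golfers of the same age. The function name, the three features, and all golfer values are hypothetical.

```python
from math import sqrt
from statistics import pstdev

def most_similar(target, candidates, k=3):
    """Return the names of the k candidates closest to the target.

    target     -- feature list for the present-day golfer
    candidates -- dict of name -> feature list, all at the target's age
    """
    names = list(candidates)
    columns = list(zip(*(candidates[n] for n in names)))
    # Scale each feature by its spread so that no single unit dominates;
    # fall back to 1.0 if a feature has zero spread.
    sds = [pstdev(col) or 1.0 for col in columns]

    def distance(features):
        return sqrt(sum(((f - t) / s) ** 2
                        for f, t, s in zip(features, target, sds)))

    return sorted(names, key=lambda n: distance(candidates[n]))[:k]

# Hypothetical features: [last-season SG, career SG, years as a professional]
target = [1.5, 1.2, 6]
candidates = {
    "Golfer A": [1.4, 1.1, 5],
    "Golfer B": [-0.5, 0.0, 6],
    "Golfer C": [1.6, 1.5, 12],
    "Golfer D": [0.2, 0.3, 2],
}
top_two = most_similar(target, candidates, k=2)
print(top_two)  # ['Golfer A', 'Golfer C']
```

Restricting the candidate set to same-age golfers before computing distances mirrors the point above that age is the most important characteristic.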

To actually form the projections, we rely on regression models. We project
performance 5 years into the future, and for each year we project a golfer’s mean
performance level (i.e. what we expect), their 90th percentile performance level
(i.e. the performance level that is better than 90 percent of the possible
career trajectories they “could” follow), and finally their 10th percentile
performance level (i.e. the performance level that is worse than 90 percent
of the possible trajectories they “could” follow). This therefore involves
estimating 15 regression models: 1-5 years into the future and at 3 points
of the distribution. To project the mean performance level we use the standard
regression model as you know it, and to project performance at different
percentiles we use quantile regression (see endnote [4] for a primer).
The main variables used to predict future performance are a golfer’s
current age, the number of years as a professional, their strokes-gained averages
over the last few years as well as the number of rounds played, and their
career average strokes-gained (as well as various interactions of these variables).
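
For readers unfamiliar with quantile regression, here is a minimal Python sketch of the idea behind it, not the models actually estimated for this article. Quantile regression minimizes the "pinball" loss shown below; to keep the sketch short we fit only a constant rather than a full regression, and the outcome values are made up.

```python
def pinball_loss(actual, predicted, q):
    """Loss minimized by quantile regression at quantile q (0 < q < 1)."""
    error = actual - predicted
    # Under-predictions cost q per stroke, over-predictions cost (1 - q).
    return q * error if error >= 0 else (q - 1) * error

def best_constant(outcomes, q):
    """Observed value with the lowest total pinball loss when used as a
    constant prediction; this is a q-th quantile of the outcomes."""
    return min(outcomes,
               key=lambda c: sum(pinball_loss(y, c, q) for y in outcomes))

# Hypothetical future strokes-gained outcomes for a group of similar golfers.
outcomes = [-1.0, -0.5, 0.0, 0.2, 0.4, 0.6, 0.9, 1.1, 1.5, 2.0, 2.6]
p10 = best_constant(outcomes, 0.10)
p90 = best_constant(outcomes, 0.90)
print(p10, p90)  # -0.5 2.0

# The 15 models: one per (years ahead, point of the distribution) pair.
models = [(years_ahead, target)
          for years_ahead in range(1, 6)
          for target in ("mean", "p10", "p90")]
print(len(models))  # 15
```

Because over- and under-predictions are penalized unequally, the loss is minimized near the bottom of the outcomes for q = 0.10 and near the top for q = 0.90, which is what lets the same predictors produce pessimistic and optimistic projections.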

For intuition into the projections, there are really just two concepts to consider.
The first is the aging profile - recall that, because we are using the relative
strokes-gained measure, the relevant age profiles to look at are the red lines in the figures above.
All else equal, we should expect younger golfers to improve and older golfers’
performance to decline. Using the relative strokes-gained measure, this decline
seems to start around age 32-33. The second concept is regression to the mean:
we always expect present-day performance gaps between golfers to narrow over
time. Why should we expect this? Let's think of golf performance as mainly the
result of golfer skill, but also as something influenced by “luck” (where “luck”
should not be necessarily thought of as fortunate bounces off of trees, but as unusually good
or poor stretches of performance by a golfer, which could occur for any number
of reasons). Under this model, we will observe regression to the mean: if two
golfers are separated by, for example, 3 strokes on average per round in one
season, we expect them to be separated by less than that the following season.
The reasoning is simple: some of that performance gap is likely due to luck
and not true skill differences, and we do not expect luck to persist to the
following season.
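
The skill-plus-luck model above is easy to simulate. The Python sketch below assumes that skill and luck each have a standard deviation of 1 stroke; these are illustrative values, not estimates from our data.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

N = 2000
# Assumed model: observed season average = true skill + luck.
skill = [random.gauss(0, 1) for _ in range(N)]
season1 = [s + random.gauss(0, 1) for s in skill]
season2 = [s + random.gauss(0, 1) for s in skill]  # same skill, fresh luck

# Rank golfers by season-1 performance, then compare top and bottom tenths.
order = sorted(range(N), key=lambda i: season1[i])
bottom, top = order[:N // 10], order[-(N // 10):]

gap1 = mean(season1[i] for i in top) - mean(season1[i] for i in bottom)
gap2 = mean(season2[i] for i in top) - mean(season2[i] for i in bottom)
print(round(gap1, 2), round(gap2, 2))
```

Skill does not change between the two simulated seasons, yet the gap between the groups shrinks in the second season, because part of the first-season gap was luck that does not carry over. With equal skill and luck variances, roughly half of the gap would be expected to persist.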

Another important, and related, point is that the further the projections are into the future,
the closer together they become. A present-day 3 stroke difference between
two golfers might be projected to narrow to 2 strokes by next season and to
just 1 stroke 5 seasons into the future. (The best projection for next season belongs to
Dustin Johnson at +2.3, while the best projection for 5 seasons from now belongs
to Jon Rahm at just +1.76.) Again you could think of several
models of lifetime performance that could produce this pattern. Intuitively,
the further into the future we look, the more opportunity there is for a
golfer’s ability level to change. These changes will exhibit
regression to the mean: golfers with a current ability level that is in
the right tail of the ability distribution will be statistically more
likely to experience changes for the worse. (If you are befuddled by the
concept of regression to the mean, start here, and if after that you want to
take the full dive into the rabbit hole, see [5].)

Finally, we should point out that while the writing above would seem to imply we are making choices about how much regression to the mean to apply to our projections, this is not the case. The projections are data-driven: they are the values that best fit the data, conditional on using the class of models we use (i.e. regression models). The concepts of regression to the mean and aging profiles are useful for understanding the career trajectories of golfers, but they are not imposed on to the data in any way when forming the projections.

Let’s break down a specific example. Here is Justin Thomas’ projection:

Thomas will spend the majority of 2019 as a 26-year-old. This puts him on
the increasing side of the aging curves shown above. However, Thomas has
also performed at a very high level to begin his career, especially in the
last 2 seasons. We have Thomas’ performance projected to be lower next season
than it has been the last two years. Therefore, in his case, the (negative)
effects of regression to the mean outweigh the (positive) aging effects.
Thomas’ projection is also lower due to the fact that in seasons before 2017,
he was not performing at the level he did in 2017 and 2018.
If he had performed at the level of the previous two seasons for each of his first
five seasons on tour, then we would not project as much
regression to the mean (see Jon Rahm’s projection for an example). As the
projection moves farther out, we see more regression to the mean for Thomas,
which is typical of all the top players. In terms of projecting quantiles,
Thomas’ 90th percentile projection basically hovers around 2.5. This is also
characteristic of a lot of the top players: it is simply very rare for a golfer
(other than the younger versions of Tiger) to gain, on average, more than 2.5
strokes per round in a season. The final point, which is common to most golfers’
projections, is that the gap between the 10th and 90th percentile projections
widens as we move further into the future; this is simply because the longer
the timeframe under consideration, the more uncertainty there is around a
player’s future performance. Even though we do project Thomas' performance
to decline slightly in the next 5 years, he still has the 3rd-highest projection
5 seasons from now (behind just Rahm and Spieth).

One interesting question is the following: for good young golfers, at which point do the negative effects of regression to the mean outweigh the positive effects of aging? The answer will depend on several characteristics of the golfer at hand, but a rough answer appears to be around +1 strokes-gained. Take Daniel Berger as an example: he will be 26 years old next season and has averaged around +1 strokes-gained in each of his last four seasons. Our projections for him for the next 5 years are basically flat at +1. You can think of this as the result of the offsetting effects of regression to the mean and aging.

In general, the regression models we use seem to provide a lot of nuance in the
projections. This is mainly due to the inclusion of several interaction terms [6].
For example, the interaction between age and various measures of past strokes-gained
(e.g. last season, two seasons ago, career to that point) allows for separate age
profiles for golfers who have performed differently in the past. We saw this
with Thomas above: young elite golfers are expected to improve less than young
average or below-average golfers (and may even regress). Another important interaction term
in these models is the number of years spent on tour interacted with the golfer’s
career strokes-gained average. Averaging +2 strokes-gained over a career that has
spanned 20 years is worth more than averaging that over a 5-year career. This is
largely what is driving Tiger Woods’ excellent projection for the next few seasons.
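
To see how an interaction term produces separate age profiles, consider this small Python sketch. The coefficients are made up for illustration and are not estimates from our models; the function and key names are likewise hypothetical.

```python
def age_slope(past_sg, coef):
    """Projected change in strokes-gained per extra year of age for a
    golfer with the given past strokes-gained average.

    In a model containing  b_age * age + b_int * age * past_sg,  the effect
    of one more year of age is  b_age + b_int * past_sg,  so it varies with
    past performance.
    """
    return coef["age"] + coef["age_x_past_sg"] * past_sg

# Made-up coefficients, for illustration only.
coef = {"age": 0.04, "age_x_past_sg": -0.03}

slope_below_average = age_slope(-1.0, coef)  # golfer averaging -1 SG
slope_elite = age_slope(2.0, coef)           # golfer averaging +2 SG
print(round(slope_below_average, 2), round(slope_elite, 2))  # 0.07 -0.02
```

Without the interaction term, every golfer would share the same age slope; with it, the below-average golfer in this toy example is projected to improve with age while the elite golfer is projected to decline slightly.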

If you've made it through the document, awesome, hopefully it was useful and insightful. You can explore all of the projections and the historical comparisons for today's players here.
