
# PGA Tour Stat Analytics Part 1 — Are the Strokes Gained PGA Tour stats correlated to scoring average?

Welcome to the GOTM analytics series! Here’s where I’ll keep writing posts about stats from the PGA Tour and how they affect results, courses, and players. I have a background in computer programming, love scraping data, and have already grabbed a bunch of data from pgatour.com, so it’s time for a series of posts where I analyze stats on Tour. If you want to look at the method and code that I wrote to snag this data, you can check out this post on my other blog, Big-Ish Data.

Considering there are tons of different ideas on golf-specific stats, I decided to write golf posts here and leave the other blog for more specific programming topics.

If you have other ideas for Tour stats analytics, hit GOTM up on Twitter and I’ll see what I can do!

And to start out, I just want to say that this post violates Betteridge’s Law because the Strokes Gained stats are *very* impressive at determining scoring averages, and defining which parts of the game players are good at.

### Strokes Gained

Part 1 of the series here is analyzing the importance of Strokes Gained. So what is Strokes Gained? Take a look at the PGA Tour’s press release. If you’re looking for a more technical explanation of strokes gained, read this article.

Basically, they know the average number of shots it takes a Tour player to hole out from a specific distance. They count the number of shots it takes a player to finish the hole from that distance, and then credit him with + or – strokes relative to the average, with a positive number indicating he is that many strokes better than the average. Then they add his strokes gained from the four different parts of a hole: 1) off the tee, 2) approaching the green, 3) around the green, and 4) putting. The sum of those four is his total SG, which means total SG can always be broken back down into the four components.
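To make that arithmetic concrete, here is a minimal Python sketch of crediting each shot against a baseline. The baseline values and the `shot_sg` helper are made up for illustration (they are not real Tour averages or the Tour’s actual code), and the per-shot formula is the usual way strokes gained is described: expected strokes before the shot, minus expected strokes after it, minus the one stroke just played.

```python
# Hypothetical baseline: average strokes needed to hole out from each spot.
# These numbers are illustrative only, not real Tour averages.
baseline = {
    "tee_450yd": 4.1,
    "fairway_160yd": 2.9,
    "green_20ft": 1.9,
    "holed": 0.0,
}

def shot_sg(before, after):
    """Credit for one shot: expected strokes before, minus expected after, minus the shot itself."""
    return baseline[before] - baseline[after] - 1

# A three-shot hole: drive, approach, one putt.
shots = [
    ("tee_450yd", "fairway_160yd"),   # off the tee
    ("fairway_160yd", "green_20ft"),  # approach
    ("green_20ft", "holed"),          # putting
]
per_shot = [shot_sg(before, after) for before, after in shots]
total = sum(per_shot)

# The per-shot credits always add up to (baseline from the tee) - (strokes taken).
print([round(sg, 2) for sg in per_shot], round(total, 2))
```

The useful property is that the individual credits telescope: however the hole is split up, the pieces sum to the same total, which is why the four categories add up to SG: Total.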

As an example, let’s look at Daniel Berger:

- Off the Tee: Berger ranks 39th, with an average of +.416.
- Approach-the-Green: Berger ranks 18th, with an average of +.595.
- Around-the-Green: Berger ranks 152nd, with an average of -.136.
- Putting: Berger ranks 25th, with an average of +.462.

To get Berger’s total SG, we add up all four averages to find a total SG of +1.337, which is exactly what the stats show for him here! (Note that these stats are updated weekly, so depending on when you read this, the numbers may not match exactly.) Super cool, and a fantastic way to show where players are strong and where they probably need to practice more. Based on these stats, Daniel Berger may want to focus more on shots around the green, since his average there is pulling down his total SG.
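The same sum is easy to check in Python. This is just the four Berger averages quoted above; the dictionary keys are my own labels, not official stat names.

```python
# Daniel Berger's four strokes gained averages, as listed above.
berger = {
    "off_the_tee": 0.416,
    "approach": 0.595,
    "around_the_green": -0.136,
    "putting": 0.462,
}

# Total SG is simply the sum of the four components.
total_sg = round(sum(berger.values()), 3)

# The component with the lowest value is the one dragging the total down.
weakest = min(berger, key=berger.get)

print(total_sg, weakest)
```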

The goal of this analysis is to answer questions about the SG stats:

- How good is the Strokes Gained stat at predicting scoring average? (In case you’re wondering right now, it’s really really good).
- Which of the strokes gained values is most correlated with a player’s scoring average?
- If you have four different players who are each +1 strokes gained in a different one of the four SG stats, which one would you expect to have the lowest scoring average?

### Analytics terms defined

Before getting back to the golf stuff, I want to talk quickly about two terms you’ll need to be familiar with for the rest of this article to make sense: normal distributions and r-squared.

The scoring averages on tour follow a normal distribution, like this:

Look at that! So pretty. In this case, the average scoring average is 70.923 with a standard deviation of 0.591. For a normal distribution, that means about 68.27% of players’ scoring averages fall within one standard deviation of the mean, between (70.923 – 0.591 =) 70.332 and (70.923 + 0.591 =) 71.514.
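If you want to verify that one-standard-deviation range yourself, here is a small sketch using `NormalDist` from Python’s standard library. It assumes the scoring averages are exactly normal with the mean and standard deviation quoted above, which is an idealization of the real data.

```python
from statistics import NormalDist

# Idealized model of Tour scoring averages, using the numbers quoted above.
scoring = NormalDist(mu=70.923, sigma=0.591)

# One standard deviation either side of the mean.
low = scoring.mean - scoring.stdev
high = scoring.mean + scoring.stdev

# Share of players expected to fall inside that range.
share = scoring.cdf(high) - scoring.cdf(low)

print(f"{low:.3f} to {high:.3f}: {share:.2%}")
```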

Before running analysis on stats, it’s important to make sure they’re normally distributed. Clearly the scoring average stats are normally distributed, and so are the strokes gained stats, as I’ll show quickly below.

We’re good to go on that front!

One last definition is the coefficient of determination, also known as **r-squared**, which you’ll see me mention a lot. R-squared values range from 0 to 1, and the higher the r-squared value, the more correlated the data sets. And since all the data sets we are using are normally distributed, the r-squared values for the different stats can be compared directly.
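For anyone curious how that number is computed, here is a minimal sketch. For a straight-line fit between two variables, r-squared is the square of the Pearson correlation coefficient, so that is what the hypothetical `r_squared` helper below calculates. The sample data is invented for illustration and is not the scraped Tour data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and the variation of each on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Made-up example: strokes gained totals and scoring averages for five players.
sg_total = [-1.0, -0.5, 0.0, 0.5, 1.0]
scoring_avg = [72.1, 71.5, 71.1, 70.5, 70.2]

print(round(r_squared(sg_total, scoring_avg), 3))
```

A value near 1 means the points sit almost on a straight line; a value near 0 means one stat tells you very little about the other.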

### SG: Total

When looking at Strokes Gained stats for the first time, I decided to check how correlated the SG: Total stat is with Scoring Average. That check gives an initial indication of how correlated these stats can be. When I saw the following graph and how correlated the numbers are, I knew this was a great part of PGA Tour stats to analyze.

Look at how incredibly, dead, freaking accurate this is! The r-squared value is an impressively high 0.926, and just by looking at the graph, you can see how correlated those dots are.

Another quick test of how valid this regression is comes from checking the fitted line on the graph. Using the equation of that line, a player with 0 strokes gained total will probably have a scoring average of ~71.079, which is very close to the average scoring average mentioned above of 70.923. Since the slope of that line is -0.96, gaining one stroke in total drops your expected scoring average by 0.96 strokes. Not exactly 1, but close enough to be a strong sign that the regression is sound.
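Here is that fitted line as a tiny Python function, using the intercept and slope quoted above. The function name is my own, and plugging in Daniel Berger’s +1.337 from earlier is just an illustration of how the line is used, not a claim about his actual scoring average.

```python
# Fitted line from the SG: Total regression described above.
INTERCEPT = 71.079  # predicted scoring average at 0 strokes gained
SLOPE = -0.96       # change in scoring average per stroke gained

def predicted_scoring_average(sg_total):
    """Scoring average the fitted line predicts for a given SG: Total."""
    return INTERCEPT + SLOPE * sg_total

# A perfectly average player, and a player with Berger's +1.337 total.
print(round(predicted_scoring_average(0.0), 3))
print(round(predicted_scoring_average(1.337), 3))
```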

Ok, now time to test the more specific SG stats.

### SG: Off-the-Tee

In case you’re not sure how golf works, the first shot you hit on a hole is off the tee. And in this case, the SG stat measures performance using all tee shots on par 4s and par 5s.