
From the top: a beginner’s guide to baseball stats

I don’t want to rewrite history, but I figured that since there’s no really good blog that starts from square one on baseball stats, here’s a very brief introduction to sabermetrics (the "saber" comes from SABR, the Society for American Baseball Research).

This is going to sound weird, but some stats really are better than others: certain stats correlate better with scoring runs and winning games. Here’s what we know.

In the late 19th century, batting average was created. A man named Henry Chadwick, an amateur statistician and cricket fan, came to America and devised a few simple formulas to evaluate hitters, some of them borrowed from cricket. Hits divided by at-bats became batting average, which told us how good a batter was at getting hits, a very important thing. He also created earned run average and runs scored, which are both still used today.

However, no one seemed to want to improve on these. For years and years they were simply accepted as the standard, even though they’re deeply flawed. Batting average ignores a number of things, chiefly how the batter got on base: it treats doubles, triples and home runs exactly like singles, and it leaves out walks and times reaching on an error entirely. Earned run average excludes runs that score because of errors and doesn’t account for the team’s defense. Of course, at the time, some of this information wasn’t available.

The funny thing is that baseball isn’t like basketball or soccer, where a shot either goes in for points or it doesn’t. In those sports, counting how many shots a shooter makes, or how many a striker scores, makes sense. But counting a batter’s hits per at-bat doesn’t measure his value the same way.

That’s because baseball has many different offensive outcomes that serve the same goal. Batting average treats a single and a home run as equal, even though they’re completely different, and the distortion only grows from there. Three singles in four at-bats (.750) looks better than one home run in four at-bats (.250), even though the home run guarantees at least one run scored and the three singles do not. Not all base hits are created equal, as Saber Library put it (see the link at the bottom). Batting average, it would seem, just doesn’t do a good enough job.
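Here’s a quick Python sketch of that comparison. The function names are made up for this example; it simply puts batting average next to total bases (one base per single, up to four per home run) for the two hypothetical batters described above.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats: every kind of hit counts the same."""
    return hits / at_bats


def total_bases(singles: int, doubles: int, triples: int, homers: int) -> int:
    """Bases gained on hits: 1 per single, 2 per double, 3 per triple, 4 per home run."""
    return singles + 2 * doubles + 3 * triples + 4 * homers


# Batter A: three singles in four at-bats
avg_a = batting_average(hits=3, at_bats=4)
bases_a = total_bases(singles=3, doubles=0, triples=0, homers=0)

# Batter B: one home run in four at-bats
avg_b = batting_average(hits=1, at_bats=4)
bases_b = total_bases(singles=0, doubles=0, triples=0, homers=1)

print(f"A: AVG {avg_a:.3f}, {bases_a} total bases")  # A: AVG 0.750, 3 total bases
print(f"B: AVG {avg_b:.3f}, {bases_b} total bases")  # B: AVG 0.250, 4 total bases
```

Batting average prefers batter A three to one, even though batter B gained more bases and guaranteed a run.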

Now fast-forward some years, and baseball front offices began to overrate these statistics and underrate everything they left out. Some front offices ignored walks completely simply because Chadwick hadn’t included them (Chadwick, being a cricket fan, may also have thought a walk was the pitcher’s fault rather than a product of the batter’s patience). Home runs (a good stat) and runs batted in (a very bad stat) became benchmarks, and together with BA they made up the Triple Crown.

A few people came along and started convincing us that maybe we were doing this all wrong. Branch Rickey and Earl Weaver were two of the first managers in baseball to acknowledge that there was more to hitting than just batting average. Rickey, in interviews, came up with a primitive form of on-base percentage and slugging percentage. Weaver liked high-walk batters and was quoted as saying “God bless the three-run homer.”

In the 1980s, things started to take shape. Bill James wasn’t the first person to say “we’re evaluating things wrong,” and he wasn’t the first to try to correct it, but he was the one who popularized it. The first thing James pushed was on-base percentage, which folds walks (and hit-by-pitches) into batting average. Indeed, hitters have quite a bit of say in walks, since they choose which pitches to swing at. Out of this came ISOD, or Isolated Discipline (on-base percentage minus batting average), which is a good measure of how patient a player is at the plate.
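A minimal Python sketch of on-base percentage and ISOD. It uses the standard OBP formula, (H + BB + HBP) / (AB + BB + HBP + SF), which also counts hit-by-pitches and sacrifice flies; the function names and the season line are hypothetical, invented for illustration.

```python
def batting_average(hits: int, at_bats: int) -> float:
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """Times on base via hit, walk or HBP, over the trips to the plate OBP counts."""
    times_on_base = hits + walks + hit_by_pitch
    denominator = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / denominator


def isolated_discipline(obp: float, avg: float) -> float:
    """ISOD: the part of OBP that batting average doesn't see."""
    return obp - avg


# Hypothetical season line, invented for illustration
avg = batting_average(hits=150, at_bats=500)
obp = on_base_percentage(hits=150, walks=70, hit_by_pitch=5, at_bats=500, sac_flies=5)
isod = isolated_discipline(obp, avg)
print(f"AVG {avg:.3f}  OBP {obp:.3f}  ISOD {isod:.3f}")  # AVG 0.300  OBP 0.388  ISOD 0.088
```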

Later, a distinction was made between at-bats and plate appearances: at-bats exclude walks, hit-by-pitches and sacrifices, while plate appearances include them.
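For reference, here’s a small sketch of counting plate appearances from those pieces. The function name and sample numbers are hypothetical; the optional `interference` argument covers the rare times a batter reaches on catcher’s interference or obstruction, which also count as plate appearances.

```python
def plate_appearances(at_bats: int, walks: int, hit_by_pitch: int,
                      sac_bunts: int, sac_flies: int, interference: int = 0) -> int:
    """Every completed trip to the plate, not just the official at-bats."""
    return at_bats + walks + hit_by_pitch + sac_bunts + sac_flies + interference


# Hypothetical season: 500 at-bats, 70 walks, 5 HBP, 3 sac bunts, 5 sac flies
pa = plate_appearances(at_bats=500, walks=70, hit_by_pitch=5, sac_bunts=3, sac_flies=5)
print(pa)  # 583
```

The gap between 583 plate appearances and 500 at-bats is exactly the walks, hit-by-pitches and sacrifices that batting average never sees.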

Then came slugging percentage, a sort of booster patch for batting average that gives a player credit for the type of hit: it’s total bases divided by at-bats, where a single is worth one base, a double two, a triple three and a home run four. If a batter gets three singles, one double and six outs in 10 at-bats, he has a .500 slugging percentage and a .400 batting average. If a batter gets one single, one double and two home runs with six outs in 10 at-bats, he has a 1.100 slugging percentage and the same .400 batting average. Out of this came ISOP, or Isolated Power (slugging minus batting average), which is a decent measure of how good a hitter is at hitting for extra bases.
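Here’s a Python sketch that reproduces the two 10-at-bat examples above, computing slugging as total bases over at-bats and ISOP as slugging minus batting average. The function names are hypothetical.

```python
def slugging(singles: int, doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """Total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def batting_average(hits: int, at_bats: int) -> float:
    return hits / at_bats


# First batter: three singles, one double, six outs in 10 at-bats
avg_1 = batting_average(hits=4, at_bats=10)
slg_1 = slugging(singles=3, doubles=1, triples=0, homers=0, at_bats=10)

# Second batter: one single, one double, two home runs, six outs in 10 at-bats
avg_2 = batting_average(hits=4, at_bats=10)
slg_2 = slugging(singles=1, doubles=1, triples=0, homers=2, at_bats=10)

# Isolated power: what slugging adds on top of batting average
iso_1 = slg_1 - avg_1
iso_2 = slg_2 - avg_2

print(f"Batter 1: {avg_1:.3f} AVG, {slg_1:.3f} SLG, {iso_1:.3f} ISOP")  # 0.400, 0.500, 0.100
print(f"Batter 2: {avg_2:.3f} AVG, {slg_2:.3f} SLG, {iso_2:.3f} ISOP")  # 0.400, 1.100, 0.700
```

Same batting average, very different hitters, and ISOP is what pulls them apart.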

These two new stats corrected the biggest flaws in batting average.

On-Base Percentage and Slugging Percentage were then combined into one stat, On-Base Plus Slugging, or OPS, simply by adding the two together. That isn’t mathematically tidy (OBP and SLG are computed over different denominators, so their sum isn’t really a rate of anything), but it turned out to be a heck of a lot better measure of offense than batting average.
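To see why simply adding the two is a bit loose, here’s a sketch that computes OPS from raw counting stats for a hypothetical season (all numbers invented, function name hypothetical). OBP is divided by AB + BB + HBP + SF, while SLG is divided by AB alone, so OPS adds two rates measured over different totals.

```python
def ops_from_counts(hits: int, doubles: int, triples: int, homers: int,
                    walks: int, hit_by_pitch: int, at_bats: int, sac_flies: int):
    """Return (OBP, SLG, OPS) from raw season totals."""
    singles = hits - doubles - triples - homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers

    # OBP and SLG use different denominators
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    slg = total_bases / at_bats
    return obp, slg, obp + slg


obp, slg, ops = ops_from_counts(hits=150, doubles=30, triples=3, homers=25,
                                walks=70, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(f"OBP {obp:.3f} + SLG {slg:.3f} = OPS {ops:.3f}")  # OBP 0.388 + SLG 0.522 = OPS 0.910
```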

This was a solid benchmark for a while, and it still is. OBP and SLG are the two basic statistics that have withstood the test of time. But a few people wanted to keep going: they wanted to create the One True Stat.

From OPS came OPS+, which scales a player’s OPS against the league average so that 100 is always league average: anything above 100 is above average and anything below 100 is below average, regardless of whether league-average OPS goes up or down. It’s also adjusted for ballpark, which makes it a really cool stat. Baseball-Reference has OPS+.
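As far as I know, Baseball-Reference computes OPS+ as 100 × (OBP / league OBP + SLG / league SLG minus 1), with the league numbers adjusted for the player’s home park. The sketch below leaves out the park adjustment, and the league averages in it are hypothetical, chosen only to show the scaling.

```python
def ops_plus(obp: float, slg: float, league_obp: float, league_slg: float) -> float:
    """OPS+ without the park adjustment: 100 means exactly league average."""
    return 100 * (obp / league_obp + slg / league_slg - 1)


LEAGUE_OBP = 0.330  # hypothetical league averages, not real figures
LEAGUE_SLG = 0.420

average_hitter = ops_plus(0.330, 0.420, LEAGUE_OBP, LEAGUE_SLG)
better_hitter = ops_plus(0.363, 0.462, LEAGUE_OBP, LEAGUE_SLG)  # 10% above league in both
print(round(average_hitter), round(better_hitter))  # 100 120
```

Because each piece is divided by the league average, the scale re-centers on 100 every season no matter how much offense the league as a whole produces.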

From OPS also came VORP, or Value Over Replacement Player, which compares a player’s production to that of a replacement-level player, the kind of fringe player a team could pick up at little cost. Then came wRC, EqA, wOBA and other stats. Tom Tango, a very awesome guy, used historical play-by-play data to work out the average run value of each type of offensive outcome (a single, a double, a walk and so on), which is the idea behind his wOBA. That’s basically where we are today.
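The idea behind run values like Tango’s is a weighted sum: each offensive outcome gets a run value, and you add them up per trip to the plate. Here’s a minimal sketch of that idea. The weights below are placeholders I picked to show the shape of the calculation, not Tango’s published values (which are recalculated every season), the function name is hypothetical, and dividing by plate appearances is a simplification of the real wOBA denominator.

```python
# Placeholder run values per outcome: illustrative only, NOT official wOBA weights
PLACEHOLDER_WEIGHTS = {
    "walk": 0.70,
    "hit_by_pitch": 0.70,
    "single": 0.90,
    "double": 1.25,
    "triple": 1.60,
    "home_run": 2.00,
}


def weighted_offense_rate(counts: dict[str, int], plate_appearances: int,
                          weights: dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    """Sum each outcome times its run value, then divide by plate appearances."""
    total = sum(weights[outcome] * counts.get(outcome, 0) for outcome in weights)
    return total / plate_appearances


# Hypothetical season line
season = {"walk": 60, "hit_by_pitch": 5, "single": 100,
          "double": 30, "triple": 3, "home_run": 25}
rate = weighted_offense_rate(season, plate_appearances=600)
print(f"{rate:.3f}")  # 0.380
```

The design choice is that every outcome is weighted by how much it tends to contribute to scoring, rather than by bases gained or by simply counting it as a hit.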

I use all of these, but my favorite for individual analysis is “slash stats,” which are just batting average, on-base percentage and slugging percentage arranged like this: AVG/OBP/SLG. OBP is the most important of the three, but slash lines are my favorite because they let you look at the raw numbers and give a pretty complete picture of a batter’s strengths. For example:

2009 Ichiro Suzuki: .352/.386/.465
2009 Ryan Braun: .320/.386/.551
2009 Albert Pujols: .327/.443/.658
2009 Ryan Howard: .279/.360/.571
2009 Nyjer Morgan: .307/.369/.388
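Here’s a short Python sketch that takes the five 2009 slash lines listed above and derives ISOD (OBP minus AVG), ISOP (SLG minus AVG) and OPS for each. The variable and function names are hypothetical; the numbers are just the ones in the list.

```python
# 2009 slash lines (AVG/OBP/SLG), copied from the list above
SLASH_LINES = {
    "Ichiro Suzuki": ".352/.386/.465",
    "Ryan Braun": ".320/.386/.551",
    "Albert Pujols": ".327/.443/.658",
    "Ryan Howard": ".279/.360/.571",
    "Nyjer Morgan": ".307/.369/.388",
}


def parse_slash(line: str) -> tuple[float, float, float]:
    """Split 'AVG/OBP/SLG' into three floats."""
    avg, obp, slg = (float(part) for part in line.split("/"))
    return avg, obp, slg


derived = {}
for player, line in SLASH_LINES.items():
    avg, obp, slg = parse_slash(line)
    derived[player] = {
        "OBP": obp,
        "ISOD": obp - avg,  # patience: what walks and HBPs add
        "ISOP": slg - avg,  # power: what extra-base hits add
        "OPS": obp + slg,
    }
    stats = derived[player]
    print(f"{player:<14} ISOD {stats['ISOD']:.3f}  ISOP {stats['ISOP']:.3f}  OPS {stats['OPS']:.3f}")
```

Only Pujols clears the .100 ISOD mark, and Howard’s ISOP (.292) beats Braun’s (.231) even though Braun’s OBP is higher.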

As you can see, Ichiro is very good at getting base hits, but his lack of patience puts him even with Ryan Braun in on-base percentage. Braun’s slugging percentage makes him a more valuable player.

Then you have Albert Pujols, the consummate professional. Everything about him is perfect. He already has an excellent batting average, but compounded with his elite patience (any ISOD higher than .100 is usually elite), he has one of the best on-base percentages in the game. Add in his incredible slugging and you’ve got the best player in baseball, bar none.

Ryan Howard also has very good patience, but his lackluster batting average leaves his on-base percentage merely above average. However, he still has a higher slugging percentage than Ryan Braun.

And last we have Nyjer Morgan, who’s surprisingly better at getting on base than Howard but has very little power. Howard is the more valuable player because of the gap in power between them.

Now just one final note: runs batted in is the worst of the commonly cited statistics. RBIs depend more on how many runners are already on base than on the batter’s ability. Yes, in theory a batter with a higher batting average and/or slugging percentage will drive in more runs, but RBIs are very unfair to good hitters in bad lineups. There have been plenty of seasons in history where a hitter put up a better-than-average offensive season and still had 60 or fewer RBIs.

This is a good starting point. From here, you can read more at the Saber Library or at Beyond the Box Score.


Filed under baseball stats for beginners, MLB