Plate Discipline and Strikeout and Walk Rate
Right about now everybody is running their big "What's wrong with the Nats?" columns, articles and blog posts. It's the standard narrative-driven tripe that offers up a lot of clichés and little in the way of actual answers. That's incredibly boring and completely below us. So, in the immortal words of Monty Python, "and now for something completely different."
One of the best compliments someone can give a hitter is to say that he has a good eye. Having a superior knowledge of the strike zone is a coveted attribute that not even every Major League hitter possesses. But how can we define having a good eye statistically? FanGraphs, through PITCHF/x data, tracks a number of statistics related to plate discipline that boil down to two main statistics: swing percentage and contact rate. For both stats there is a separate rate for pitches outside of the strike zone and pitches inside of the strike zone. So now we have some stats, but we have no context for what they mean. Sure, it's nice that Player A swings at pitches outside of the strike zone at a below-average rate, but does that actually affect his performance?
To answer that we turn to two familiar statistics: strikeout rate and walk rate, the cornerstones of hitter evaluation. If we could use plate discipline data to predict in-season strikeout and walk rate, we would have a valuable tool to see how hitters are performing at a more basic level. Similar research was done by Jeff Zimmerman at Beyond the Box Score in 2010. However, unlike Zimmerman's multivariable regression, we'll run a simple linear regression on each statistic individually.
Just a couple of housekeeping notes before we get on with the show. First, for all of the analyses below I've used a sample of hitters from 2009 through 2014 with at least 250 plate appearances in a season, with each season counted separately. This should give a healthy sample of hitters, not just the very best who qualify for the batting title, while still keeping each hitter's sample size reasonable. Second, instead of using the standard walk rate I've created a custom walk rate that excludes intentional walks, since those say nothing about a hitter's eye. With that out of the way, let's take a look at the contact rate stats first.
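For anyone who wants to build the same sample, here is a rough Python sketch of the two housekeeping steps. The field names are made up for illustration and are not FanGraphs column names, and the article does not spell out the denominator of the custom walk rate, so dividing by total plate appearances is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """One hitter-season. Field names are illustrative, not FanGraphs columns."""
    name: str
    season: int
    pa: int   # plate appearances
    bb: int   # all walks, intentional ones included
    ibb: int  # intentional walks
    so: int   # strikeouts


MIN_PA = 250  # threshold stated in the article


def unintentional_walk_rate(line: SeasonLine) -> float:
    # Assumption: total plate appearances is the denominator.
    return (line.bb - line.ibb) / line.pa


def strikeout_rate(line: SeasonLine) -> float:
    return line.so / line.pa


def qualifying(lines):
    # Keep each 2009-2014 hitter-season that reaches the PA threshold.
    return [l for l in lines if 2009 <= l.season <= 2014 and l.pa >= MIN_PA]


# Made-up rows, purely to show the filter and the rates.
sample = [
    SeasonLine("Hitter A", 2012, 600, 70, 10, 120),
    SeasonLine("Hitter B", 2013, 249, 20, 0, 60),
]
kept = qualifying(sample)
```

Here only Hitter A survives the filter, because Hitter B falls one plate appearance short.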
To begin with, here are three models of how contact rate affects walk rate.
Well, that's a whole lot of nothing. There isn't a lick of correlation between any of these contact rates and walk rate in a negative or positive direction. I could add a linear regression trend line and report that the r-squared value is some minuscule number, but that's just wasting all of our time. There's no trend here. That's a disappointing start; let's move on to strikeout rate.
That's more like it. With r-squared values above 0.65 for all three rates, we can safely say there's a strong correlation between contact rate, both on pitches outside and inside the strike zone, and strikeout rate. This is a negative relationship: the more contact a hitter makes, the lower his strikeout rate should be. That isn't some sort of earth-shattering news when it comes to baseball, but it's nice to see statistical evidence back it up. But just knowing that there's a correlation isn't all that informative; we want to know how well this model can predict in-season strikeout rate.
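If you'd like to run this kind of fit yourself, here is a minimal Python sketch of a one-variable least-squares regression and its r-squared. The function names are my own and the five data points are invented for illustration; they are not taken from the 2009-2014 sample.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented contact rates and strikeout rates, both in percent.
contact_pct = [70, 75, 80, 85, 90]
k_pct = [28, 24, 19, 15, 11]

slope, intercept = fit_line(contact_pct, k_pct)
r2 = r_squared(contact_pct, k_pct, slope, intercept)
```

A negative slope is the code's version of "more contact, fewer strikeouts." Real hitter data will be far noisier than this toy set, so expect a much lower r-squared than these made-up points produce.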
To do so, I'll take the stat with the strongest correlation, contact rate, and find the root mean square error (RMSE) between the strikeout rates predicted from contact percentage and each hitter's actual strikeout rate. This gives us a number, in units of strikeout rate, that tells us how good a fit this linear regression model is. The RMSE for contact rate is 2.62 percentage points, which means contact rate is a fairly accurate predictor of in-season strikeout rate. Let's see if the swing rate stats can do any better in predicting strikeout rate.
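For reference, root mean square error is just the square root of the average squared miss. Here is a small Python sketch; the three actual and predicted values are invented to show the arithmetic and have nothing to do with the real sample.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between paired actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented strikeout rates in percent: misses of 2, -1 and 3 points.
actual_k = [20.0, 15.0, 25.0]
predicted_k = [18.0, 16.0, 22.0]

error = rmse(actual_k, predicted_k)  # sqrt((4 + 1 + 9) / 3)
```

Because the errors are squared before averaging, a few big misses move RMSE more than a lot of small ones, and the result comes out in the same units as the stat being predicted.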
The answer is a resounding no. Just as with contact rate and walk rate, there's little to no correlation between swing rate and strikeout rate. Let's see if the swing rate stats do any better with walk rate.
Much better. As with contact rate and strikeout rate, we've found a strong correlation between swing rate and walk rate. There's one key difference: instead of the overall swing rate having the best correlation, it is actually the swing rate on pitches outside of the strike zone. When we look at the graph for swing rate inside the strike zone, the reason for this becomes immediately apparent: there is only a weak correlation between the two. This suggests that not all pitches in the strike zone are equal, so swinging at pitches inside the strike zone does not necessarily mean a hitter is swinging at a hittable pitch. The rate at which a hitter swings at pitches in the strike zone shows no meaningful relationship with either strikeout rate or walk rate.
Therefore, the best correlation to walk rate is with O-Swing%, FanGraphs' name for swing rate on pitches outside of the strike zone. As with contact rate and strikeout rate, this is a negative relationship, so the more a player swings at pitches outside of the strike zone, the less he walks. Again, not earth-shattering news, but it is nice to have our notions confirmed statistically. But on to the important question: how well does O-Swing% predict in-season walk rate? The root mean square error for this model is 1.95 percentage points, so we can conclude that O-Swing% is a strong predictor of in-season walk rate.
What we've stumbled upon is what every hitting coach calls being patiently aggressive. Hitters should lay off pitches they cannot hit, and when they do decide to swing they need to make contact. Interestingly enough, being what's often called a free swinger doesn't have a negative effect on strikeout rate on its own. Rather, it's the lack of contact that is the issue: swinging and missing is much worse than not swinging at all. Now we have two stats that directly measure how a hitter performs and that can give us a reasonable prediction of what a player's strikeout rate and walk rate should be in that season.
This is still a Nationals blog, so what does this all mean for the Nats in 2014? Here are two tables for Nats hitters with at least 50 plate appearances this year. The first shows each hitter's actual strikeout rate, his predicted strikeout rate based on contact rate, and the difference between the two. The second does the same for walk rate, with the prediction based on swing rate on pitches outside of the strike zone.
From these tables we can determine some candidates for regression. Kevin Frandsen is striking out at a much lower rate than his contact rate says he should be, while Nate McLouth, Jose Lobaton and Tyler Moore are all striking out more than predicted. Many coaches and teammates have said that McLouth is not getting the results that he should be getting based on the contact he's making, and now we have some statistical evidence to back that claim up. On the walk rate side, the Nats are doing about as well as expected. Only Danny Espinosa's walk rate differs substantially from his expected rate, although his expected rate is also low.
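One way to make "candidate for regression" a little less eyeball-driven is to flag anyone whose actual rate misses the prediction by more than the model's RMSE. The Python sketch below does that. To be clear, the one-RMSE cutoff is my own assumption and not a rule used in the tables above, and the hitters and numbers in the example are invented.

```python
# RMSE values reported in this article, in percentage points.
K_RMSE = 2.62   # strikeout rate predicted from contact rate
BB_RMSE = 1.95  # walk rate predicted from O-Swing%


def flag_candidates(rows, rmse):
    """Return (name, actual - predicted) for rows that miss by more than rmse.

    rows holds (name, actual_pct, predicted_pct) tuples.
    Assumption: one RMSE is the cutoff for a "substantial" difference.
    """
    flagged = []
    for name, actual, predicted in rows:
        diff = actual - predicted
        if abs(diff) > rmse:
            flagged.append((name, round(diff, 2)))
    return flagged


# Invented strikeout-rate rows, not actual Nationals numbers.
rows = [
    ("Hitter A", 12.0, 17.5),  # striking out less than predicted
    ("Hitter B", 21.0, 20.2),  # close to the prediction
    ("Hitter C", 27.3, 23.0),  # striking out more than predicted
]
flagged = flag_candidates(rows, K_RMSE)
```

A negative difference means the hitter is beating his prediction and a positive one means he is trailing it. With only 50 plate appearances behind some of these rates, treat any flag as a reason to look closer rather than a verdict.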
We've now added two new tools to our tool belt for evaluating hitters. They are especially useful early in the season, when a hitter has had only a small number of plate appearances but has seen significantly more pitches in that time. In these situations, by checking his contact rate and his swing rate on pitches outside of the zone, we can get an early look at how a hitter can be expected to perform by season's end. At any point in the season, contact rate and swing rate on pitches outside of the strike zone can lend valuable insight into a hitter's performance.