Friday, December 21, 2007

A Flotsam data special: Tangiblizing the intangible

Chuck Dickens
Idiot Savant


After Tim McCarver’s month-long David Eckstein sploogefest that was October 2006, a serious investigation into 'grit' was long overdue. Despite the penchant of sportswriters and broadcasters to throw the term around willy-nilly, I was hard-pressed to locate a firm definition of grit in the baseball sense. Using lots of laptop science stuff, I think I’ve improved the definition, which isn’t really saying much, since there wasn’t one to begin with.

First, some definitions to help us focus in on what exactly this 'grit' stuff is.

Gritty
1. Containing, covered with, or resembling grit.
2. Showing resolution and fortitude; plucky: Biggio’s gritty 12-pitch at-bat ultimately resulted in a routine 6-3 groundout.

In keeping with those definitions I’m proposing a new composite statistic: General Requirements of Intangible Talent (GRIT). GRIT incorporates four basic components: dirt, determination, talent, and opportunity.

DATA
I used a modified version of the Sean Lahman dataset that includes player statistics from 1871-2006. My dataset includes player-seasons starting in 1955, the first year for which statistical data for intentional walks and GIDP was available. To chop the dataset down to a manageable size, I limited the number of eligible players to those who have at least 100 plate appearances and 81 games played. This removes pitchers from consideration, but also ensures that an adequate indication of a player’s abilities during each season is evident. Statistics for 2007 were compiled from ESPN.

The resulting dataset includes 13,249 player-seasons with 2,385 unique players represented.
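The eligibility cutoffs described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used. The field names (`year`, `G`, `AB`, `BB`, `HBP`, `SH`, `SF`) and the approximation of plate appearances as AB + BB + HBP + SH + SF are assumptions, and the sample rows are made up.

```python
# Cutoffs taken from the text; everything else here is hypothetical.
MIN_YEAR = 1955   # first season with IBB and GIDP data
MIN_PA = 100      # minimum plate appearances
MIN_GAMES = 81    # minimum games played


def plate_appearances(row):
    """Approximate PA as AB + BB + HBP + SH + SF (an assumed definition)."""
    return row["AB"] + row["BB"] + row["HBP"] + row["SH"] + row["SF"]


def is_eligible(row):
    """Return True if the player-season passes all three cutoffs."""
    return (
        row["year"] >= MIN_YEAR
        and row["G"] >= MIN_GAMES
        and plate_appearances(row) >= MIN_PA
    )


def filter_seasons(rows):
    """Keep only the eligible player-seasons."""
    return [row for row in rows if is_eligible(row)]


# Made-up rows for illustration only.
sample = [
    {"player": "Player A", "year": 1954, "G": 150, "AB": 600, "BB": 50, "HBP": 5, "SH": 3, "SF": 4},
    {"player": "Player B", "year": 1988, "G": 35, "AB": 70, "BB": 2, "HBP": 0, "SH": 8, "SF": 0},
    {"player": "Player C", "year": 2002, "G": 152, "AB": 608, "BB": 45, "HBP": 27, "SH": 14, "SF": 8},
]

eligible = filter_seasons(sample)
print([row["player"] for row in eligible])
```

In this toy sample only Player C survives: Player A predates 1955, and Player B falls short on both games and plate appearances.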

HYPOTHESIS
I hold that gritty players are those who sincerely want to win or succeed at baseball (determination), but due to a lack of natural skill (talent), are forced to do so through the least efficient means possible, resulting in an excessive amount of dirt on their uniform.

DIRT
The most important factor in determining a player's GRIT is his uniform. A player who is "containing, covered with, or resembling grit" will show visible signs of his grittiness on his uniform. Dirty uniforms are good; bloody uniforms are better. A true team player, the gritty player is prepared to sacrifice his body at all costs. This is one of the few ways gritty players are efficient, since they probably aren’t as well compensated as their genuinely talented teammates.

The Dirt Formula

HBP: A hit batter produces minimal gains (one base) with relatively high costs in terms of potential bodily injury. The official colors of gritty players may well be black and blue. And red. And maybe some brown with a little purple and some yellow around the edges, depending on the severity of the bruising.

IBB: Next to home runs, intentional walks are probably the most anti-gritty statistic. Intentional walks are indicative that a player has so much talent that the pitcher would rather give him first base than risk an extra-base hit. Gritty players have to earn every base through hard-knocks, moxie, and a heaping helping of some good ol’ fashioned hustle.

CS/SB stuff (SBINEFF): This is a statistic I call Stolen Base Inefficiency (SBINEFF). This looks for players who like to attempt lots of steals but are largely unsuccessful. Stealing bases produces minimal gains (one base) but comes with greater potential costs by raising the likelihood of being thrown out. Base-stealers (successful or not) also have dirty uniforms from sliding.

DID YOU KNOW: Harold Reynolds holds the single-season record for SBINEFF with a stunning 13.385? Harold’s 1988 season saw him tally 35 steals while being caught 29 times. He broke the record set by Will Clark (13.304) during the previous season when Mr. Eyeblack went 5 for 22 on steal attempts. WOW!

DETERMINATION
Gritty players want to succeed. They just happen to not have the talent to actually do so. This results in inefficient baseball plays. For example, Jerry Hairston is gritty. He slides head-first into first base. A true sign of someone gritty enough to want to get to first base, but too shitty to actually get there efficiently.

The Determination Formula

(Outs – SO): As short in stature as they are on talent, gritty players are determined to put the ball in play at all costs. Additionally, the ball looks gigantic to their tiny, elfin eyes and thus they’re less prone to striking out.

(BB+SH+SF): With their microscopic strike zones, gritty players generate walks (the unintentional ones) at a superhuman rate. Sacrificing oneself is an inefficient (read: gritty) method of moving runners along.

GIDP: Double plays are produced by well-struck balls that are able to cut through the infield grass. Aside from a bottle of hard liquor (eh, Mr. Furcal?) gritty players rarely hit anything well.

DID YOU KNOW: 2007 NL MVP Jimmy Rollins produced the sixth-highest season total of outs since 1955? He probably owes a fair share of his award to a trail-blazing fellow Phillie middle-infielder who set a precedent. Juan Samuel, in 1984, produced the second highest number of outs on his way to earning a tie for 21st place in the MVP voting and 2nd in the NL ROY. HOOCHIEMAMA!

DID YOU ALSO KNOW: Pete Rose has only the second highest season total of determination. The real "Charlie Hustle" is actually a "Dick." Dick Howser, that is. Howser’s 1964 season slightly edges out Pete’s numbers from 1974. CRACKER JACK!

WHATCHUKNOWABOUTTHISHERE: Dick Howser’s phone listing reads as "Howser, Dick." This tidbit is worth a few laughs given the right delivery, set-up, and audience. SHABANG!


TALENT
It is my contention that "grittiness" is a subset of talent that cannot translate well statistically. Two players may very well have the same raw amount of grit, but one player may have more tangible talent, making him appear less gritty because the grit is too diluted. Gritty players are those who have the largest concentration of grit. As such, to find the grittiest players, we should look for players who have as little tangible talent as possible.

The Talent Formula

XBH: Extra base hits are über-efficient ways of getting multiple bases.

RBI: Gritty players move runners over, but aren’t talented enough to drive them in.

TB: Total bases is an additional means of counting the overall ability of a player.

(OMS*1000): OMS (OBP minus SLG) is a proprietary statistic I developed for use in GRIT. It rewards players who reach base, but deprecates players who have the talent to get extra bases.
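As a rough sketch of the OMS term, here is how it might be computed in Python, assuming the standard definitions of on-base percentage and slugging percentage (the text does not spell them out). The function and argument names are illustrative, and both stat lines are invented.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0


def slg(tb, ab):
    """Slugging percentage: total bases per at-bat."""
    return tb / ab if ab else 0.0


def oms_term(h, bb, hbp, ab, sf, tb):
    """OMS scaled by 1000, i.e. (OBP - SLG) * 1000."""
    return (obp(h, bb, hbp, ab, sf) - slg(tb, ab)) * 1000


# Invented slap hitter: on base a lot, very few extra bases.
slap = oms_term(h=150, bb=60, hbp=20, ab=500, sf=5, tb=170)
# Invented slugger: piles up total bases faster than he reaches base.
slugger = oms_term(h=150, bb=60, hbp=5, ab=500, sf=5, tb=320)
print(round(slap, 1), round(slugger, 1))
```

The slap hitter comes out positive and the slugger comes out well below zero, which is the direction the description implies. Multiplying by 1000 just turns a three-decimal difference into a whole-number scale.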

OPPORTUNITY
In order for a player to become gritty, they first need to be on the field. In the words of Ted Williams, "Nobody ever became a .400 hitter without taking the bat off their shoulder." To apply the quote more appropriately here, one might attribute it to Williams' quasi-gritty teammate, Milt Bolling, and change it to read "Nobody ever became a .250 hitter by getting splinters in their ass." We simply use plate appearances as a representation of opportunity.

After calculating the four GRIT component values for each player-season, the resulting values are then plugged into this equation:

(Dirt + Determination – Talent) / Opportunity

However, each component has a different scale relative to the others, so I experimented with normalizing the values. This can be accomplished by calculating an average and a standard deviation for the dirt, determination, talent, and opportunity scores of all the player-seasons.

Basic Normalization Formula

This was applied for each of the basic components across all player-seasons. The rationale for normalizing this data is to remove as much bias as possible from the process. As each of the four basic components creates a different range of values, some sort of weighting would be necessary to produce a meaningful list. Normalization automatically weights the components by determining how far a given player-season is above or below the average of all player-seasons.
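A minimal sketch of this step in Python, assuming the normalization is an ordinary z-score (value minus the mean, divided by the standard deviation) and that the normalized components are combined exactly as in the GRIT equation above. Whether the population or the sample standard deviation was used is not stated, so the population version is assumed here, and all the numbers are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def grit(dirt, determination, talent, opportunity):
    """Combine components as (Dirt + Determination - Talent) / Opportunity."""
    return (dirt + determination - talent) / opportunity


# Invented raw component scores for four player-seasons.
raw_dirt = [30.0, 5.0, 12.0, 8.0]
raw_talent = [150.0, 420.0, 260.0, 300.0]

dirt_z = z_scores(raw_dirt)
talent_z = z_scores(raw_talent)
print([round(z, 2) for z in dirt_z])

# Made-up normalized component values plugged into the GRIT equation.
example = grit(dirt=2.0, determination=1.0, talent=-1.0, opportunity=0.5)
print(example)
```

By construction, the normalized values of each component have a mean of 0 and a standard deviation of 1, which is what puts dirt, determination, talent, and opportunity on a comparable scale before they are combined.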

RESULTS
Across 13,249 player-seasons, the data appears to have a relatively normal distribution. The data shows a range of about -50 to +50 with one outlier at -90.011 (see below), and a mean and median extremely close to 0. These numbers are promising for the prospects of GRIT as a statistic, as they suggest that the average player is neither extremely gritty nor extremely talented. The tail at the extreme positive end of the distribution should show the grittiest players, while talented players should appear in the negative tail.

Enough talk; bring on the numbers ...

The Top 50 Grittiest Seasons and the 25 Least Gritty Seasons

| Rank | Year | Player | Team | GRIT |
|---|---|---|---|---|
| 1 | 1971 | Ron Hunt | MON | 52.061 |
| 2 | 2002 | David Eckstein | ANA | 35.963 |
| 3 | 1968 | Ron Hunt | SFN | 34.901 |
| 4 | 1998 | Fernando Vina | MIL | 33.296 |
| 5 | 1996 | Craig Biggio | HOU | 32.251 |
| 6 | 1997 | Craig Biggio | HOU | 27.964 |
| 7 | 2002 | Fernando Vina | SLN | 27.687 |
| 8 | 2005 | Jason Kendall | OAK | 27.373 |
| 9 | 2001 | Jason Kendall | PIT | 27.018 |
| 10 | 1955 | Nellie Fox | CHA | 26.703 |
| 11 | 1986 | Don Baylor | BOS | 26.442 |
| 12 | 2003 | Jason Kendall | PIT | 26.319 |
| 13 | 2000 | Fernando Vina | SLN | 26.064 |
| 14 | 1999 | Chuck Knoblauch | NYA | 25.910 |
| 15 | 2003 | Craig Biggio | HOU | 25.743 |
| 16 | 2001 | David Eckstein | ANA | 25.423 |
| 17 | 1957 | Nellie Fox | CHA | 25.311 |
| 18 | 1975 | Felix Millan | NYN | 25.188 |
| 19 | 1967 | Cesar Tovar | MIN | 25.102 |
| 20 | 1969 | Ron Hunt | SFN | 24.829 |
| 21 | 1968 | Cesar Tovar | MIN | 24.692 |
| 22 | 2005 | Brady Clark | MIL | 24.659 |
| 23 | 1996 | Eric Young | COL | 24.635 |
| 24 | 1998 | Chuck Knoblauch | NYA | 24.558 |
| 25 | 2001 | Craig Biggio | HOU | 24.346 |
| 26 | 1997 | Jason Kendall | PIT | 23.913 |
| 27 | 2004 | Jason Kendall | PIT | 23.717 |
| 28 | 1998 | Jason Kendall | PIT | 23.617 |
| 29 | 1972 | Ron Hunt | MON | 23.580 |
| 30 | 2001 | Fernando Vina | SLN | 23.189 |
| 31 | 2004 | Juan Pierre | FLO | 23.028 |
| 32 | 1980 | Ozzie Smith | SDN | 22.815 |
| 33 | 1976 | Don Baylor | OAK | 22.419 |
| 34 | 2005 | David Eckstein | SLN | 22.402 |
| 35 | 1957 | Minnie Minoso | CHA | 22.188 |
| 36 | 1991 | Brett Butler | LAN | 21.874 |
| 37 | 1961 | Nellie Fox | CHA | 21.834 |
| 38 | 1970 | Ed Brinkman | WS2 | 21.702 |
| 39 | 2006 | Juan Pierre | CHN | 21.334 |
| 40 | 1973 | Ron Hunt | MON | 21.142 |
| 41 | 2002 | Melvin Mora | BAL | 20.893 |
| 42 | 1980 | Alfredo Griffin | TOR | 20.875 |
| 43 | 1993 | Mike Bordick | OAK | 20.719 |
| 44 | 2005 | Juan Pierre | FLO | 20.615 |
| 45 | 1995 | Craig Biggio | HOU | 20.413 |
| 46 | 1990 | Brett Butler | SFN | 20.399 |
| 47 | 1959 | Richie Ashburn | PHI | 20.079 |
| 48 | 1993 | Chuck Knoblauch | MIN | 19.994 |
| 49 | 1993 | Brett Butler | LAN | 19.919 |
| 50 | 1984 | Brett Butler | CLE | 19.816 |






| Rank | Year | Player | Team | GRIT |
|---|---|---|---|---|
| 13234 | 1957 | Ted Williams | BOS | -27.054 |
| 13235 | 2000 | Sammy Sosa | CHN | -27.055 |
| 13236 | 1999 | Mark McGwire | SLN | -27.905 |
| 13237 | 2006 | Albert Pujols | SLN | -27.913 |
| 13238 | 1989 | Kevin Mitchell | SFN | -31.490 |
| 13239 | 1998 | Mark McGwire | SLN | -32.316 |
| 13240 | 1970 | Willie McCovey | SFN | -34.094 |
| 13241 | 2001 | Barry Bonds | SFN | -35.160 |
| 13242 | 2007 | Ryan Howard | PHI | -35.452 |
| 13243 | 1969 | Willie McCovey | SFN | -38.707 |
| 13244 | 2006 | Ryan Howard | PHI | -38.898 |
| 13245 | 1993 | Barry Bonds | SFN | -39.723 |
| 13246 | 2003 | Barry Bonds | SFN | -42.087 |
| 13247 | 2001 | Sammy Sosa | CHN | -42.659 |
| 13248 | 2002 | Barry Bonds | SFN | -50.984 |
| 13249 | 2004 | Barry Bonds | SFN | -90.011 |



50 All-Time Grittiest Players and the 15 All-Time Least Gritty Players

| Rank |  | Full Name | Career GRIT | Yrs | Yearly Avg |
|---|---|---|---|---|---|
| 1 | * | Craig Biggio | 250.22 | 19 | 13.17 |
| 2 |  | Ron Hunt | 236.96 | 11 | 21.54 |
| 3 | * | Jason Kendall | 214.62 | 11 | 19.51 |
| 4 |  | Nellie Fox | 188.42 | 10 | 18.84 |
| 5 |  | Brett Butler | 187.26 | 15 | 12.48 |
| 6 |  | Chuck Knoblauch | 170.67 | 11 | 15.52 |
| 7 | * | Omar Vizquel | 165.22 | 17 | 9.72 |
| 8 |  | Luis Aparicio | 162.49 | 18 | 9.03 |
| 9 |  | Bert Campaneris | 154.16 | 15 | 10.28 |
| 10 |  | Don Baylor | 152.16 | 17 | 8.95 |
| 11 | * | David Eckstein | 146.58 | 7 | 20.94 |
| 12 |  | Pete Rose | 143.91 | 23 | 6.26 |
| 13 |  | Maury Wills | 142.30 | 13 | 10.95 |
| 14 |  | Ozzie Smith | 140.77 | 18 | 7.82 |
| 15 |  | Rickey Henderson | 137.94 | 23 | 6.00 |

16


Cesar Tovar

137.83

10