Thursday, March 31, 2005

First annual PROFITS projections

What were you doing in January and February? Well, let me tell you what I was doing...

One night, not long after the dawn of the new year, I was sitting in a Midtown watering hole messing around with some fielding statistics. Yes, fielding statistics. I do things like that.

The numbers in question were last year's defensive efficiency statistics. Defensive efficiency record (DER) is a wonderful statistic - so simple yet so revealing. And powerful. I still contend that the 2002 Anaheim Angels won the World Series by sprouting the wings of a soaring defensive efficiency record.

One of the problems with DER is that it's strictly a team statistic. While we can assume that the individual fielders on a team with an outstanding DER have done a good job (duh!), the stat provides no easy way to divide up the credit.

There are two defensive statistics in baseball that measure approximately the same thing. DER measures the percentage of balls-in-play that are converted into outs by a team. Zone-rating, meanwhile, makes the same measurement at the individual level: for each defender, we count up the number of opportunities from balls hit into his "zone" and determine how many of those were turned into outs.
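To make the distinction concrete, here is a minimal sketch of both stats in Python. The DER formula below is one common approximation (balls-in-play derived from plate appearances); the function names are mine, not anyone's official code.

```python
def der(plate_appearances, hits, home_runs, walks, strikeouts, hbp):
    """Team defensive efficiency: share of balls in play turned into outs."""
    # Balls in play exclude the "defense-independent" events.
    balls_in_play = plate_appearances - walks - strikeouts - hbp - home_runs
    hits_in_play = hits - home_runs
    return 1.0 - hits_in_play / balls_in_play

def zone_rating(outs_made, chances_in_zone):
    """Individual fielder: share of balls hit into his zone converted to outs."""
    return outs_made / chances_in_zone
```

Both numbers answer the same question, "how often did a ball in play become an out?", just at different levels of the defense.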

Theoretically, if we could find a relationship between DER and zone-rating, then we could bridge the gap between team and individual. There are several problems associated with determining this relationship, not the least of which is the lack of raw data to support zone-rating, which is a proprietary statistic compiled by Stats, Inc.

To bring this discussion back around to the main point, I believe that there is enough visible evidence to justify using a compilation of individual player zone ratings to project an overall team DER. A full essay about how I developed my procedure to do this will be available on this site in short order.
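As a rough illustration of that bridge, the sketch below rolls individual zone ratings up into a projected team DER by weighting each fielder by his expected chances. The weighting scheme and the calibration term are my assumptions; the author's actual procedure is the subject of the promised essay.

```python
def project_team_der(players, calibration=0.0):
    """players: list of (zone_rating, expected_chances) tuples.

    A chances-weighted average of zone ratings, nudged by a calibration
    offset, stands in for the team-level relationship described above.
    """
    total_chances = sum(chances for _, chances in players)
    weighted = sum(zr * chances for zr, chances in players) / total_chances
    return weighted + calibration
```

A fielder who sees 300 chances pulls the team figure three times harder than one who sees 100, which is the intuition behind dividing up the credit.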

For now, let's get back to me, in January, at my watering hole.

Once it occurred to me that it might be possible to take a swing at projecting DER for a team, an entire line of dominoes toppled over in my mind.

By using Voros McCracken's DIPS theory, in conjunction with a projected team DER, a robust model for forecasting a team's runs allowed could be constructed. No, the hits allowed figures yielded by a DIPS system are not perfectly explained by a team's ability to turn balls-in-play into outs. I wish they were. But it's a good start.
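The DIPS half of that idea can be sketched in one line: hits on balls in play follow from (1 - DER), and homers, which the defense never touches, get added back. This is a crude stand-in for a full DIPS runs-allowed model, not the author's actual system.

```python
def projected_hits_allowed(balls_in_play, team_der, home_runs):
    """Estimate hits allowed from a projected team DER.

    Hits on balls in play are the balls the defense fails to convert;
    home runs are defense-independent and are simply added back.
    """
    return balls_in_play * (1.0 - team_der) + home_runs
```

Feed this a projected DER instead of last year's actual DER and the pitching staff's hit totals start to reflect the gloves behind them.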

Forecasting a team's offensive output for a season is a much more straightforward, if imperfect, process. So if you can combine a solid model of runs scored with a good runs allowed model that accurately accounts for the gloves behind the pitching staff, then - voila! You have yourself a team forecasting model.

I have called my team forecasting model the PROFITS system. You have to call it something. It stands for PROjected Fully-Integrated Team System. Yeah, I know, that's pretty contrived. But this is, after all, a highly derivative system.

Now, here's a quote from someone I admire:
"It is a very long and very difficult road from a fact to a conclusion. But it is a million times longer from a theory to a fact." - Bill James
PROFITS is a compilation of theories. A tool for learning. I didn't have to spend 120 hours of my "free" time this winter building this model, compiling raw data and projecting each team, player-by-player. Others have done it better and with more sophisticated methods.

I am, after all, only a writer. But I'm also not somebody who feels comfortable merely piggybacking the work of others. Perhaps PROFITS won't become the ultimate tool in forecasting team performance. It almost certainly won't.

But this system is mine. And I have learned more about baseball in the last three months than I had in the last eight years - ever since I discovered Rob Neyer's work, which, in turn, led me back to the master himself, Mr. James.

By revealing my full methodology, testing out the many moving parts of the system, learning new approaches and - most of all - getting feedback from like-minded individuals, my hope is to make the model the best it can be and to learn as much about the inner dynamics of the baseball statistical universe as is possible.

This version of PROFITS is a beta version. By putting it into place now, the 2005 baseball season will turn into one long debugging process. Much of my work that you see on this site and in The Kansas City Star this season will be based on what I learn from this process.

PROFITS. Get used to it. You'll be hearing about it a lot.

And another thing...

There is another reason I wanted to build a model for team forecasting. One of the themes of my work last year was that each team needed a working model for runs scored and runs allowed as a central tool in making personnel decisions. There should be other factors involved, of course, but a statistical component should be the key part of the process.

There is a chapter in Michael Lewis' Moneyball in which Billy Beane and Paul DePodesta are tackling the problem of replacing departed center fielder Johnny Damon. The chapter goes on to discuss how DePodesta used a system of derivatives that applied an expected run value to each event that takes place on the field - both offensively and defensively. He measured how many runs Damon's presence had been worth the previous season. These were the net runs the A's needed to replace in order to maintain their excellent winning percentage.

Ever since I read that chapter, I've wondered how many teams use a statistical model to guide them. Not many, I suspect. Further, I wondered how such a model might work.

This year's A's have a position battle underway that demonstrates the value of a fully-integrated model. At second base, newly-acquired Keith Ginter is duking it out with Mark Ellis. Ginter has the better bat, Ellis the better glove.

I suspect that most statistically-oriented writers would favor Ginter in this matchup. I also suspect that the others, who are more susceptible to conventional wisdom, would favor Ellis.

Using PROFITS, I was able to look at the issue both ways. The deciding factor is wins. With Ginter projected to get the majority of the playing time, the A's will score more runs and give up more. With Ellis, they will score fewer but save more. Which net effect is greater?

In this instance, according to PROFITS, the A's projected run differential improves by eight runs with Mark Ellis as the starting second baseman. The glove outweighs the bat. That's a win, maybe two. I can assure you that, at this time last year, I would never have arrived at this conclusion.
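The arithmetic behind "that's a win, maybe two" rests on a common sabermetric rule of thumb: roughly ten runs of differential are worth about one win. The eight-run figure is the article's; the conversion rate is an assumption, not part of PROFITS as described.

```python
RUNS_PER_WIN = 10.0  # widely used approximation, not an exact constant

def marginal_wins(run_differential_change):
    """Convert a change in projected run differential into wins."""
    return run_differential_change / RUNS_PER_WIN
```

An eight-run swing works out to about 0.8 wins, which rounds to "a win, maybe two" once you allow for the fuzziness of the conversion.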

In essence, I have constructed a simpler version of the statistical model that each team should rely upon in a more advanced form. Teams have better resources and better data than I have. And they can afford to hire smarter people.

So, now I have a model for each team in the big leagues. Every time they make a move, I'll be watching. Careful, fellas.


PROFITS generates a projected runs differential for each team. That's a good start to determining the likely final standings for a coming season.

The problem is that these run profiles exist in a sort of vacuum. While you can rank teams according to these figures, it's problematic to convert them into final standings. The profiles don't stand alone - they interact with each other in the form of the games we know and love.

So, to take PROFITS one step further, each team has its runs profile compared to the profiles of each team in its schedule. After some fits and starts, a modified Pythagorean formula was developed to generate a likely won-loss record based on each team's runs profile and the teams they actually play.
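Here is one way a schedule-adjusted Pythagorean record could work: scale each team's runs profile against each opponent's, then apply the Pythagorean formula game by game. The exponent, the head-to-head scaling, and the league-average baseline are all my assumptions; the author's actual modification is not specified in the text.

```python
def pyth_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Standard Pythagorean expectation with a commonly used exponent."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

def schedule_adjusted_record(team, schedule, league_avg=4.8):
    """team: (runs_scored_per_game, runs_allowed_per_game).
    schedule: list of (opp_rs_per_game, opp_ra_per_game, games)."""
    rs_pg, ra_pg = team
    wins, total = 0.0, 0
    for opp_rs, opp_ra, games in schedule:
        # Scale our scoring by the opponent's run prevention, and vice versa.
        exp_rs = rs_pg * opp_ra / league_avg
        exp_ra = ra_pg * opp_rs / league_avg
        wins += games * pyth_win_pct(exp_rs, exp_ra)
        total += games
    return wins, total - wins
```

Against a perfectly average schedule the adjustment washes out and the plain Pythagorean record comes back; against an unbalanced schedule, the same runs profile yields a different projected record, which is exactly the point of the exercise.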

PROFITS is, as the name suggests, fully integrated. The player projection pages are interlinked with the team projection page. The team projection pages are interlinked with the overall MLB projection page, which makes these schedule-based calculations.

The result is that I can immediately see the effect a transaction or injury has on the overall projected standings. When Barry Bonds was injured, the Giants' record was hurt, obviously. But, at the same time, the record of every team the Giants played improved.

These changes are often subtle and small. It might take several moves for a team's projected record to change. But the change is in there, somewhere. There is, in essence, a butterfly effect for every player-related move that happens in baseball.

The caveat...

The final standings of the 2005 Major League Baseball season are not going to look exactly like the ones you see below. I think we all understand that.

Team rosters will change. Players will get injured. Some will be traded. Prospects that I haven't accounted for will rise to the big leagues and make a contribution. The playing time estimates used in this process will be obliterated by managerial whims.

Further, one season of big-league baseball isn't really a large sample. If the rosters were all frozen, playing time were allotted exactly according to PROFITS guidelines, and the season played over and over again, say 100,000 times, then the final standings might well look much as you see them below.

But they don't play a season 100,000 times. They play it once. And in that one-season sample, almost anything can happen. And, really, would we want it any other way?
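The 100,000-replays thought experiment is easy to simulate: give a team a fixed "true" win probability, replay a 162-game season many times, and look at the spread. The true-talent figure below is illustrative, and the percentile summary is my addition, not part of PROFITS.

```python
import random

def simulate_seasons(true_win_pct, games=162, seasons=100_000, seed=2005):
    """Replay a season many times and summarize the win distribution."""
    rng = random.Random(seed)
    wins = sorted(
        sum(rng.random() < true_win_pct for _ in range(games))
        for _ in range(seasons)
    )
    return {
        "mean": sum(wins) / seasons,
        "p25": wins[seasons // 4],
        "p75": wins[3 * seasons // 4],
    }
```

Even a true .500 team routinely finishes several games above or below 81 wins in any single replay, which is why one real season can wander well away from a sound projection.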

On the other hand, what we do have is a well-informed expectation of what will happen in this season. It's a starting point. And keeping track of where the method veers off the rails will only serve to make us smarter.

And finally...

As I mentioned, there are many gaps in my explanation of the PROFITS system. These will be filled in as the season goes along.

Let's jump to the end of what has been, for me, a very long road. Over the next few days, I'll be posting my team power rankings with commentary about each team's projection.

For now, I present to you the fruits of my labor. The first-ever PROFITS projected team standings.

NOTES: first off, thanks to for its image hosting service. The categories you see on the standings are, going across, wins, losses, winning percentage, games behind, runs scored, runs allowed, net runs, defensive efficiency record, strength (or power rating) and schedule factor (the higher the number, the tougher the schedule).


Blogger Kyle Hale said...

An additional extra might be for you to post the 25th and 75th percentile Win-Loss records for each team, since presumably the simulation curves for each team are not necessarily normal. A team with more ifs and buts might have a larger deviation which the straight projection average might be masking.

But an interesting analysis, and worth looking into further. Good job!

2:45 PM  
Blogger Steve said...

I am impressed by your efforts. Your standings provide an interesting comparison/contrast with the 2005 projections of the simulation game Diamond Mind (

4:55 PM  
Blogger fjm235 said...

I like your approach, but I question your Schedule Strength ratings. How can you say only one AL team has an easier than average schedule, and even theirs is only 1% below average? The flip side of that question, of course, applies to the NL teams, with only two of them showing a tougher than average schedule.

Take the AL Central, for example. Most people consider it to be the weakest division in MLB, and your own W-L projections seem to confirm that: only one team with a winning record, and they have the worst record of any divisional champion. Yet these teams, who get to play nearly half their games against each other, earn an average SKD of 1.055. Compare that with the NL East: 3 teams well above .500 and another only one game below it, yet their SKD is .976. Please check this out.

10:39 AM  
