Introducing BWRI: an index to choose a good baseball game to rewatch


I really enjoy rewatching games during the offseason, so I built an index that helps you choose a game to rewatch knowing nothing more than the teams and the date. Using R, I wrote an algorithm that tracks changes in win probability during a game to surface exciting games to watch.

The code also takes into account good pitching, no-hitter bids, walk-offs, and rivalries, to get a mix of different kinds of interesting games. BWRI is a percent rank, so 1 is the most exciting game detected by the algorithm and 0 the least exciting. BWRI covers the seasons from 2011 to 2020. You can browse BWRI in list mode if you want, but my suggestion is to use the random mode, filtering for games with a score > 0.95, which is basically like getting the best game from a random day of the season. BWRI doesn’t take season context into account, though you can choose to show only postseason games.

Extended version

Offseasons are too long. I’m not much into the hot stove thing, so I usually spend the time watching games from the past season, eager to discover relievers or just to have fun with exciting games. It’s not half as fun if you know the outcome in advance, but avoiding spoilers is not hard, since there are more than 2,400 games in a normal MLB regular season. Sometimes I watch random games, but then I found, and that saved me during the spring lockdown. Unfortunately, the website hasn’t been updated for the 2020 season.

That’s when I wondered whether it would be possible to build an index that evaluates how much a game is worth rewatching, using only the play-by-play stats of the game. Using Retrosheet data and what I learned from ‘Exploring Baseball Data with R‘, I could easily prepare the basic tool for calculating the index: play-by-play Win Probability Added (WPA), which is how the probability of winning a game changes from one play to the next. That’s the main tool behind what I call the Baseball Worth Rewatch Index (BWRI), though I took other things into account as well.

Total WPA

My first thought was: if I add up the absolute WPA values of every play in a game, the highest totals will point me to exciting games, games where the lead passed from one team to the other several times. Drama, high-leverage situations, and entertainment, especially in the late innings, when a change on the scoreboard carries a higher WPA value. So the first factor of BWRI is Total WPA, though I made some adjustments to it.

The main problem was the “noise” produced by unimportant plays. In close games, those could add up to a huge difference by the end. A boring game decided in the 8th scored too high, so I decided to count only the plays that changed the probability of winning by 8% or more.
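As a rough illustration of the filter described above, here is a minimal Python sketch (not the R code behind BWRI). It assumes each game is a list of win probabilities recorded after every play. The function name `total_wpa`, the sample games, and reading the 8% cutoff as 0.08 on a 0 to 1 scale are assumptions made for the example.

```python
def total_wpa(win_probs, threshold=0.08):
    """Sum the absolute win-probability swings that reach `threshold`.

    `win_probs` holds one team's win probability after each play,
    starting from the pre-game value. Smaller swings are treated as noise.
    """
    swings = [abs(after - before) for before, after in zip(win_probs, win_probs[1:])]
    return sum(s for s in swings if s >= threshold)


# Hypothetical games: one drifts steadily toward a win, the other swings wildly.
quiet_game = [0.50, 0.55, 0.70, 0.75, 0.90, 0.95, 1.00]
wild_game = [0.50, 0.30, 0.65, 0.20, 0.85, 0.40, 0.10, 1.00]

print(round(total_wpa(quiet_game), 2))
print(round(total_wpa(wild_game), 2))
```

With the threshold in place, the four small 0.05 steps in the quiet game no longer count, so only its two larger swings contribute.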

Here are two recent examples of Total WPA from the last World Series. In Game 1, the Dodgers scored 2 runs in the 4th and 4 more in the 5th, so the game gets a low Total WPA. Game 4, on the other hand, gets the highest playoff Total WPA value of the last 10 years.

And as a curiosity, here is the game with the most added WPA since 2011. It’s very recent: late in the 2020 season, Atlanta beat Boston after two extra innings. Atlanta took the lead with 3 runs in the eighth, Boston tied the game with 2 runs in the ninth, each team scored two runs in the tenth, Boston added one more in the eleventh, but Atlanta finally won with two more runs and a walk-off. Wow. And nobody in the stands.

Unexpected outcome (UO)

It’s great when the team that has been losing most of the time wins in the end. I love that kind of game, and those games usually have low added WPA, because for most of the game the eventual winner is trailing by 2 or more runs. At first I thought it would be as easy as averaging the probability of winning and subtracting it from the final outcome. That turned out to fall short for my purpose, so I decided to flag the win probability from the seventh inning to the end, and in the last two extra innings, to detect late changes. The final UO is the sum of all that.
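The text does not spell out the exact UO formula, so the Python sketch below only illustrates the two ideas it mentions. `simple_uo` follows the first attempt (final outcome minus the winner's average win probability). `late_deficit` is a hypothetical stand-in for the late-inning flag: it measures how low the eventual winner's win probability fell from the seventh inning on. Both names and the sample games are assumptions, not the author's implementation.

```python
def simple_uo(winner_probs):
    """First idea: final outcome (1 for the winner) minus the average win probability."""
    return 1.0 - sum(winner_probs) / len(winner_probs)


def late_deficit(plays, from_inning=7):
    """Hypothetical late flag: 1 minus the winner's lowest win probability
    from `from_inning` onward. `plays` is a list of (inning, win_prob) pairs."""
    late = [wp for inning, wp in plays if inning >= from_inning]
    return 1.0 - min(late) if late else 0.0


# Hypothetical games, seen from the eventual winner's side.
comeback = [(1, 0.45), (3, 0.30), (5, 0.20), (7, 0.15), (8, 0.10), (9, 0.05), (9, 1.00)]
wire_to_wire = [(1, 0.60), (4, 0.80), (7, 0.90), (9, 1.00)]

for name, game in [("comeback", comeback), ("wire to wire", wire_to_wire)]:
    probs = [wp for _, wp in game]
    print(name, round(simple_uo(probs), 3), round(late_deficit(game), 3))
```

In this toy data the ninth-inning comeback scores high on both measures, while the game led from start to finish scores low on both.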


It’s not all about runs and action: well-pitched games are really enjoyable. Pitching is evaluated here in two simple ways: how many Ks per inning there are in a game, and how close the game is to a no-hitter. So everything from games that reach the 7th with a no-hitter intact to complete no-hitters gets extra points. It would be really nice to take low-probability catches into account too, but unfortunately I don’t think this data is available on a per-game basis.
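The two pitching signals could be sketched in Python as follows. The point scale for no-hit bids is purely hypothetical (the text only says such games get extra points), and the function names are illustrative.

```python
def ks_per_inning(strikeouts, innings):
    """Strikeouts per inning for the whole game."""
    return strikeouts / innings


def no_hit_bid_points(first_hit_inning):
    """Hypothetical scale: 0 if the first hit came before the 7th,
    1 to 3 if it came in the 7th or later, 4 for a complete no-hitter.

    `first_hit_inning` is None when the team never got a hit.
    """
    if first_hit_inning is None:
        return 4
    return min(3, max(0, first_hit_inning - 6))


print(round(ks_per_inning(24, 9), 2))  # combined strikeouts of both teams
print(no_hit_bid_points(3), no_hit_bid_points(8), no_hit_bid_points(None))
```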

Close games (CG)

Much as we love UO, it’s also nice when a game is evenly matched. For this factor, I took the difference between the win probability and 0.5 on every play and summed them all. Low values mean close games. I thought of it as a main factor in the beginning, but then decided to keep its weight low, since it spoiled the BWRI score of some exciting games.
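The closeness measure is simple enough to sketch directly. This Python version assumes the difference is taken as an absolute value; the name `closeness` and the sample games are made up for the example.

```python
def closeness(win_probs):
    """Sum of the distances between the win probability and 0.5 on every play.
    Lower totals mean the game stayed closer to a coin flip."""
    return sum(abs(wp - 0.5) for wp in win_probs)


# Hypothetical win probabilities after each play.
tight_game = [0.50, 0.55, 0.45, 0.50, 0.60]
blowout = [0.50, 0.80, 0.90, 0.95, 0.99]

print(round(closeness(tight_game), 2))
print(round(closeness(blowout), 2))
```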


There are extra points for walk-off games too. Most of them already grade high on the main indicators, but that extra push helps to highlight happy endings for home teams.


Finally, I added some extra points for rivalry games. For that purpose I use data from

Other features

I decided to grade Added WPA and UO as Z scores, so very lopsided games not only add nothing but actually subtract value from BWRI. On the other hand, I assumed that good pitching and all the other factors should only take positive values and never decrease the final score. BWRI doesn’t take season context into account, though you can choose to show only postseason games, or filter by month.
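To make the scoring scheme concrete, here is a Python sketch of how Z scores, positive-only bonuses, and a percent rank can be combined. The equal weighting, the sample numbers, and the function names are assumptions; the actual BWRI weights are not given as a formula in the text.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: below-average games get negative scores."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def percent_rank(scores):
    """Rank from 0 (lowest score) to 1 (highest); ties share the lowest rank."""
    ordered = sorted(scores)
    return [ordered.index(s) / (len(scores) - 1) for s in scores]


# Hypothetical factor values for four games.
added_wpa = [0.3, 1.2, 3.3, 0.9]
uo = [0.1, 0.2, 0.9, 0.4]
bonus = [0.0, 0.5, 0.2, 0.0]  # pitching, walk-off, rivalry: never negative

raw = [a + u + b for a, u, b in zip(z_scores(added_wpa), z_scores(uo), bonus)]
bwri = percent_rank(raw)
print([round(r, 2) for r in bwri])
```

The first game is below average on both main factors, so its Z scores pull its raw score down, while the bonus terms can only push a score up.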

If you’re wondering how much each factor weighs in the final BWRI score, the answer is that for the top 500 games, Added WPA accounts for about 33% of the score, Unexpected Outcome for 32%, the Ks factor for 17%, close games for 8%, walk-offs for 7%, rivalry for slightly more than 2%, and the no-hitter factor for slightly less than 2%.

Here are the two main factors of BWRI, using only games from the last 10 postseasons. Notice that most of the games have some Added WPA, but many have no Unexpected Outcome score. I’ve highlighted some of the games that get a high score. The most isolated dot is the wild ending of 2020 World Series Game 4. 2011 World Series Game 6 stands out too; you’ll probably remember the game if I give you a name: David Freese. On the UO side, the most remarkable game according to BWRI is Oakland’s ninth-inning comeback in Game 4 of the 2012 ALDS, closely followed by the Cubs’ comeback in Game 4 of the 2016 NLDS. World Series Game 7 from 2016 gets a BWRI over 0.8 too, but ranks as the second most exciting game of the Cubs’ road to the title.

To evaluate win probability play by play I used the method suggested by Max Marchi, Jim Albert and Benjamin S. Baumer.

Thanks for reading. Sorry for my English. I hope you enjoy using BWRI and rewatching baseball. Don’t hesitate to leave a comment or get in touch with any suggestions.

Autumn is not disappearing but has been moved on the calendar

Summer lasts longer. November, December, and January have more mild days than before

You’ve probably heard or chatted recently about the idea that “autumn is disappearing”. People talk about it here in Barcelona. After a mild October, a sudden cold snap brought winter weather for the first time. The feeling is that this is more and more common as the years go by, but is fall really threatened with extinction? The data suggest instead that it has been pushed back and is moving out of October.

Five last minutes are a mine of points for Real Madrid

FC Barcelona would have won two more championships if games ended at minute 85

If you’re a Barça fan this should sound very familiar: it’s Sunday afternoon and you’ve been at the movies. As you leave the theater and switch on your cell phone, you get some messages about Real Madrid’s game. Real isn’t winning and there are just 20 minutes left. You don’t trust it, but you step into a bar to watch the rest of the game anyway. Let’s go, maybe there’s some good news on the way. 75 minutes and still a tie game, 80 minutes, 85 minutes, almost… but in the end Real Madrid scores, and you go home upset because of a game you were not supposed to have watched. Among Barça supporters’ most common mantras is the one that says Real Madrid scores last-gasp winners very often. How much truth is there in this complaint?

Why it’s best to bet for underdogs

The commission that bookmakers charge you for betting varies considerably depending on the chances of winning

A quick search on the net is enough to find thousands of sources explaining how bookmakers make money regardless of the outcomes of sports events. They do it in many different ways, but the most basic one is simple to understand: they charge a kind of commission that is already built into the odds they offer you.

The field factor and the referee’s influence

The referees award almost the same fouls to home teams as to away ones, but away players are sent off more easily

I finished the previous post talking about one of my reference books for this blog: Scorecasting: The hidden influences behind how sports are played and games are won. One of the studies the book mentions was carried out by two Spanish economists who in 2005 set out to see how peer pressure affects human decisions. Luis Garicano and Ignacio Palacios-Huerta counted the extra minutes added by referees in the Spanish league, taking into account the score at the 90th minute.

Playing at home is no longer the advantage it used to be

A study of every match in the 5 most important European football leagues from 1970 to the present. Overall, at the end of the 70’s teams retained almost 70% of home points, whereas in recent seasons this figure has dropped below 60%

To start this blog I’m bringing back an article I published in the newspaper ARA in August 2015: a study of the scores of the 5 main European football leagues, based on a database I built using datasets from I’ve split it into two halves. Here comes the first one: