
Stats Monkey

Imagine that you could push a button and magically create a story about a baseball game. That's what the Stats Monkey system does. Given information that is commonly available online for many games, namely the box score and the play-by-play, the system automatically generates the text of a story that captures the overall dynamic of the game and highlights its key plays and key players. The story includes an appropriate headline and a photo of the most important player in the game.

The system is based on two underlying technologies. First, it uses baseball statistical models to figure out what the news is: by analyzing changes in Win Probability and Game Scores, the system can pick out the key plays and players from any baseball game. Second, the system includes a library of narrative arcs that describe the main dynamics of baseball games (as well as many other competitions). Was it a come-from-behind win? Back-and-forth the whole way? Did one team jump out in front at the beginning and then sit on its lead? The system uses a decision tree to select the appropriate narrative arc. The chosen arc determines the main components of the game story and lets the system assemble them into a cohesive and compelling narrative. Stories can be generated from the point of view of either team.
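The two ideas above, ranking plays by how much they move Win Probability and choosing a narrative arc with a decision tree, can be illustrated with a small sketch. This is not Stats Monkey's code. It assumes that per-play win probabilities are already supplied by some external win-expectancy model. The `Play` schema, the function names, the `early_inning` threshold, and the arc labels are all hypothetical, and the tree is a hand-built stand-in for whatever tree the real system uses. Game Scores are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Play:
    """One play-by-play event (hypothetical schema, not the Stats Monkey format)."""
    description: str
    inning: int
    home_wp_before: float  # home team's win probability before the play, 0..1
    home_wp_after: float   # home team's win probability after the play, 0..1
    home_score: int        # score after the play
    away_score: int


def win_probability_added(play: Play, home: bool = True) -> float:
    """Swing in win probability caused by one play, from one team's point of view."""
    delta = play.home_wp_after - play.home_wp_before
    return delta if home else -delta


def key_plays(plays: List[Play], k: int = 3) -> List[Play]:
    """The k plays with the largest win-probability swing, in either direction."""
    return sorted(plays, key=lambda p: abs(win_probability_added(p)), reverse=True)[:k]


def select_arc(plays: List[Play], early_inning: int = 3) -> str:
    """Hand-built decision tree over the score progression, from the winner's view.

    Assumes the final play leaves a non-tied score.
    """
    last = plays[-1]
    winner_is_home = last.home_score > last.away_score
    # Winner's lead (negative when trailing) after each play.
    margins = [
        (p.home_score - p.away_score) if winner_is_home else (p.away_score - p.home_score)
        for p in plays
    ]
    # Signs of the winner's margin, skipping tied states, to count changes of leader.
    signs = [1 if m > 0 else -1 for m in margins if m != 0]
    lead_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    if lead_changes >= 3:
        return "back-and-forth"
    if any(m < 0 for m in margins):
        return "come-from-behind"
    # Winner never trailed: check how early it first took the lead.
    first_lead_inning = next(p.inning for p, m in zip(plays, margins) if m > 0)
    if first_lead_inning <= early_inning:
        return "jumped-out-early-and-held-on"
    return "other"


# Usage with a made-up four-play game (home team wins 3-1).
game = [
    Play("Smith homers to left", 1, 0.50, 0.62, 1, 0),
    Play("Jones doubles, two runs score", 2, 0.62, 0.78, 3, 0),
    Play("Lee singles, run scores", 6, 0.80, 0.72, 3, 1),
    Play("Diaz strikes out to end the game", 9, 0.97, 1.00, 3, 1),
]
print(select_arc(game))                              # jumped-out-early-and-held-on
print([p.description for p in key_plays(game, k=2)])  # ['Jones doubles, two runs score', 'Smith homers to left']
```

Ranking by the absolute swing surfaces the biggest moments for both teams; passing `home=False` to `win_probability_added` flips the sign, which is one simple way to tell the same game from the other team's point of view.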

The technology underlying the Stats Monkey system applies to any sport or event that produces significant quantitative data. It also applies to other domains with recurring, primarily data-driven story types, including other kinds of sports stories and many kinds of business stories, such as quarterly or annual earnings stories and market updates. The Machine Generated Sports Stories system could be employed by news organizations, or directly by organizations that wish to publish information about their activities, such as college sports teams or businesses.

Ultimately, the system can be extended to generate stories that include quotes from individuals or organizations involved in those stories (when those quotes are available online) as well as stories in different narrative styles for different audiences.

StatsMonkey is a joint project of the Medill School of Journalism and the McCormick School of Engineering at Northwestern University through the Center for Innovation in Technology, Media and Journalism.

Project Papers

  • StatsMonkey: A Data-Driven Sports Narrative Writer.