
MLB migrates to Google BigQuery data warehouse


Major League Baseball (MLB) has announced its migration from Teradata-based on-premises servers to Google BigQuery. The migration began in May 2019 and was completed in November 2019 without any major issues. According to the organization, the move has brought numerous advantages, including a simpler data structure, lower costs, better performance, and lower operational overhead.

The migration follows a technical evaluation that the organization conducted in 2018, which convinced it of the need to move to a more efficient system. MLB found the answer in Google BigQuery, which by the organization's account has delivered on its promise, and has now made the switch from its Teradata-based servers. Robert Goretsky, vice president of data engineering at MLB, detailed how the organization migrated to Google Cloud without any significant impact on its daily operations.

Robert Goretsky said, “With BigQuery’s on-demand pricing model, we were able to run side-by-side performance tests with minimal cost and no commitment. These tests involved taking copies of some of our largest and most diverse datasets and running real-world SQL queries to compare execution time. As MLB underwent the migration effort, BigQuery cost increased linearly with the number of workloads migrated. By switching from on-demand to flat-rate pricing using BigQuery Reservations, we are able to fix our costs and avoid surprise overages (there’s always that one user who accidentally runs a ‘SELECT * FROM’ the largest table), and share unused capacity with other departments in our organization, including our data science and analytics teams.”
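The trade-off Goretsky describes, on-demand cost that grows linearly with usage versus a fixed flat-rate fee, can be illustrated with a little arithmetic. The following is a minimal sketch, not MLB's actual calculation: the `Pricing` class, the helper functions, and the prices of $5 per TB scanned and $10,000 per month are all hypothetical values chosen for illustration, not figures from MLB or current Google list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices for illustration; not MLB's or Google's actual rates."""
    on_demand_usd_per_tb: float      # charged per TB scanned by queries
    flat_rate_usd_per_month: float   # fixed monthly fee for reserved capacity


def on_demand_cost(tb_scanned: float, pricing: Pricing) -> float:
    """On-demand cost grows linearly with the volume of data scanned."""
    return tb_scanned * pricing.on_demand_usd_per_tb


def break_even_tb(pricing: Pricing) -> float:
    """Monthly scan volume at which both pricing models cost the same."""
    return pricing.flat_rate_usd_per_month / pricing.on_demand_usd_per_tb


def cheaper_model(tb_scanned: float, pricing: Pricing) -> str:
    """Name the cheaper model for a given monthly scan volume (ties go to on-demand)."""
    if on_demand_cost(tb_scanned, pricing) > pricing.flat_rate_usd_per_month:
        return "flat-rate"
    return "on-demand"


if __name__ == "__main__":
    # Assumed prices: $5 per TB scanned, $10,000 per month flat.
    pricing = Pricing(on_demand_usd_per_tb=5.0, flat_rate_usd_per_month=10_000.0)
    print(f"Break-even: {break_even_tb(pricing):,.0f} TB scanned per month")
    for tb in (200, 1_000, 2_000, 4_000):
        print(
            f"{tb:>5} TB: on-demand ${on_demand_cost(tb, pricing):>9,.2f} "
            f"vs flat ${pricing.flat_rate_usd_per_month:,.2f} "
            f"-> {cheaper_model(tb, pricing)}"
        )
```

Under these assumed prices, on-demand is cheaper while few workloads have been migrated, and flat-rate wins once monthly scans pass the break-even volume. A flat fee also caps the damage from an accidental full-table scan, which is the overage risk mentioned in the quote.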

On the performance side, Goretsky said MLB can now complete queries on its data 50% faster than before. In addition, BigQuery can handle complex queries that Teradata could not.

Why Does MLB Need Google Cloud?

At MLB, the data generated by fans’ digital and in-stadium transactions and interactions allows the organization to iterate quickly on product features, customize content and offers, and keep fans connected to the sport. The Fan Data Engineering team at MLB is responsible for managing 350+ databases to ingest data from third-party and internal sources and centralize it into an enterprise data warehouse (EDW).


The EDW supports data-related initiatives across internal production, marketing, finance, ticketing, shop, analytics, and data science teams, as well as all 30 MLB clubs. Examples of these efforts include:

  • Customizing articles on MLB.com for fans, based on their favorite teams.
  • Communicating information to the fans about the games they plan to attend.
  • Generating revenue projections and churn-rate analysis for MLB.tv subscribers.
  • Creating ML models to predict future fan purchase behavior.
  • Sharing fan transaction and engagement data from MLB's central office with all 30 MLB clubs so that clubs can make informed local decisions.

How Does Google Cloud Migration Help MLB?

The migration to Google Cloud has enabled MLB to derive more useful insights from its data. Goretsky further explained that MLB uses the BigQuery Data Transfer Service to integrate its data with services such as Google Ads, Google Campaign Manager, and Firebase, Google’s application development platform. BigQuery also integrates with Looker, a Google-owned business intelligence tool used by MLB.


On the business side, MLB can now generate more accurate revenue projections, produce churn-rate analysis for MLB.tv subscribers, and create new machine learning models that predict fans’ future buying behaviour. Finally, MLB’s fan transaction and engagement data in BigQuery is shared with each of its clubs, giving them additional insight into their businesses.

Akarshan Narang
Covering the world of Cloud at CMI.

