Acquiring and Analyzing Baseball Data (2024)

The baseballr package consists of two main sets of functions: data acquisition and metric calculation.

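For example, if you want to see the standings for a specific MLB division on a given date, you can use the bref_standings_on_date() function. Pass the date as a "YYYY-MM-DD" string and the division you want; from = FALSE (the default) returns the standings through that date rather than for games played from that date on: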

library(baseballr)
library(dplyr)
bref_standings_on_date("2015-08-01", "NL East", from = FALSE)
## ── MLB Standings on Date data from baseball-reference.com ─── baseballr 1.5.0 ──
## ℹ Data updated: 2023-12-25 02:24:44 EST
## # A tibble: 5 × 8
##   Tm        W     L `W-L%` GB       RS    RA `pythW-L%`
##   <chr> <int> <int>  <dbl> <chr> <int> <int>      <dbl>
## 1 WSN      54    48  0.529 --      422   391      0.535
## 2 NYM      54    50  0.519 1.0     368   373      0.494
## 3 ATL      46    58  0.442 9.0     379   449      0.423
## 4 MIA      42    62  0.404 13.0    370   408      0.455
## 5 PHI      41    64  0.39  14.5    386   511      0.374

Right now the function works as far back as 1994, when both leagues split into three divisions.
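The derived columns in that standings table (W-L%, GB and pythW-L%) can be reproduced from the raw wins, losses, runs scored (RS) and runs allowed (RA). The base-R sketch below does this for the five rows shown above. The Pythagorean exponent of 1.83 is an assumption on my part, chosen because it reproduces the pythW-L% column; baseball-reference's exact method may differ. The names standings and pyth_wl() are hypothetical and not part of baseballr.

```r
# Recompute W-L%, games behind (GB) and Pythagorean W-L% for the
# NL East rows shown above (standings through 2015-08-01).
standings <- data.frame(
  Tm = c("WSN", "NYM", "ATL", "MIA", "PHI"),
  W  = c(54, 54, 46, 42, 41),
  L  = c(48, 50, 58, 62, 64),
  RS = c(422, 368, 379, 370, 386),
  RA = c(391, 373, 449, 408, 511)
)

# Winning percentage: wins over games played.
standings$WLpct <- standings$W / (standings$W + standings$L)

# Games behind the leader: average of the gap in wins and the gap in losses.
leader <- which.max(standings$WLpct)
standings$GB <- ((standings$W[leader] - standings$W) +
                 (standings$L - standings$L[leader])) / 2

# Pythagorean expectation. ASSUMPTION: exponent 1.83, which reproduces
# the pythW-L% column above; an exponent of 2 is also commonly used.
pyth_wl <- function(rs, ra, exponent = 1.83) {
  rs^exponent / (rs^exponent + ra^exponent)
}
standings$pythWL <- pyth_wl(standings$RS, standings$RA)

print(transform(standings,
                WLpct  = round(WLpct, 3),
                pythWL = round(pythWL, 3)))
```

Because GB averages the leader's edge in wins and in losses, the Phillies (13 fewer wins, 16 more losses than Washington) come out 14.5 games back.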

You can also pull data for all hitters over a specific date range. Here are the results for all hitters from August 1st through October 3rd during the 2015 season:

data <- bref_daily_batter("2015-08-01", "2015-10-03")
data %>% dplyr::glimpse()
## Rows: 764
## Columns: 30
## $ bbref_id <chr> "machama01", "duffyma01", "altuvjo01", "eatonad02", "choosh01…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ Name     <chr> "Manny Machado", "Matt Duffy", "José Altuve", "Adam Eaton", "…
## $ Age      <dbl> 22, 24, 25, 26, 32, 21, 27, 28, 36, 28, 29, 29, 27, 29, 27, 2…
## $ Level    <chr> "Maj-AL", "Maj-NL", "Maj-AL", "Maj-AL", "Maj-AL", "Maj-AL", "…
## $ Team     <chr> "Baltimore", "San Francisco", "Houston", "Chicago", "Texas", …
## $ G        <dbl> 59, 59, 57, 58, 58, 58, 59, 58, 59, 57, 55, 57, 57, 58, 56, 5…
## $ PA       <dbl> 266, 264, 262, 262, 260, 259, 259, 258, 257, 257, 255, 255, 2…
## $ AB       <dbl> 237, 248, 244, 230, 211, 224, 239, 235, 231, 233, 213, 218, 2…
## $ R        <dbl> 36, 33, 30, 37, 48, 35, 32, 29, 37, 27, 50, 37, 36, 25, 38, 4…
## $ H        <dbl> 66, 71, 81, 74, 71, 79, 54, 66, 75, 48, 65, 56, 61, 51, 78, 5…
## $ X1B      <dbl> 43, 54, 53, 56, 47, 51, 34, 37, 48, 30, 34, 32, 35, 33, 66, 2…
## $ X2B      <dbl> 10, 12, 19, 12, 14, 17, 6, 17, 16, 11, 13, 13, 15, 10, 7, 13,…
## $ X3B      <dbl> 0, 2, 3, 1, 1, 4, 1, 0, 2, 1, 2, 4, 0, 1, 3, 0, 4, 0, 1, 1, 0…
## $ HR       <dbl> 13, 3, 6, 5, 9, 7, 13, 12, 9, 6, 16, 7, 11, 7, 2, 20, 9, 8, 8…
## $ RBI      <dbl> 32, 30, 18, 31, 34, 32, 27, 40, 53, 21, 50, 19, 31, 39, 23, 4…
## $ BB       <dbl> 26, 15, 10, 23, 39, 18, 16, 17, 21, 21, 34, 33, 21, 39, 12, 3…
## $ IBB      <dbl> 1, 0, 1, 1, 1, 0, 0, 6, 1, 1, 0, 1, 1, 5, 0, 4, 3, 3, 7, 2, 2…
## $ uBB      <dbl> 25, 15, 9, 22, 38, 18, 16, 11, 20, 20, 34, 32, 20, 34, 12, 35…
## $ SO       <dbl> 42, 35, 28, 55, 51, 38, 68, 56, 29, 53, 46, 62, 41, 48, 27, 7…
## $ HBP      <dbl> 2, 0, 4, 5, 8, 1, 3, 5, 1, 1, 2, 3, 3, 1, 1, 6, 1, 3, 4, 1, 0…
## $ SH       <dbl> 0, 0, 1, 2, 1, 11, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, …
## $ SF       <dbl> 1, 1, 3, 2, 1, 5, 1, 1, 4, 2, 5, 1, 2, 2, 3, 0, 3, 2, 3, 4, 3…
## $ GDP      <dbl> 5, 9, 6, 1, 1, 4, 2, 2, 9, 7, 5, 1, 4, 8, 1, 2, 3, 10, 5, 4, …
## $ SB       <dbl> 6, 8, 11, 9, 2, 10, 0, 0, 0, 3, 3, 4, 5, 4, 24, 2, 1, 0, 6, 0…
## $ CS       <dbl> 4, 0, 4, 4, 0, 2, 0, 0, 0, 1, 0, 1, 3, 2, 7, 2, 3, 0, 2, 0, 0…
## $ BA       <dbl> 0.279, 0.286, 0.332, 0.322, 0.337, 0.353, 0.226, 0.281, 0.325…
## $ OBP      <dbl> 0.353, 0.326, 0.364, 0.392, 0.456, 0.395, 0.282, 0.341, 0.377…
## $ SLG      <dbl> 0.485, 0.387, 0.508, 0.448, 0.540, 0.558, 0.423, 0.506, 0.528…
## $ OPS      <dbl> 0.839, 0.713, 0.872, 0.840, 0.996, 0.953, 0.705, 0.848, 0.906…
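The rate columns at the end of that output (OBP, SLG, OPS) are functions of the counting columns. As a sanity check, here is a small base-R sketch that recomputes them for the first two hitters listed (Manny Machado and Matt Duffy). The helpers obp() and slg() are hypothetical, written for illustration rather than taken from baseballr.

```r
# Recompute OBP, SLG and OPS from the counting columns for the first two
# hitters in the bref_daily_batter() output above.
batters <- data.frame(
  Name = c("Manny Machado", "Matt Duffy"),
  AB  = c(237, 248),
  H   = c(66, 71),
  X1B = c(43, 54),
  X2B = c(10, 12),
  X3B = c(0, 2),
  HR  = c(13, 3),
  BB  = c(26, 15),
  HBP = c(2, 0),
  SF  = c(1, 1)
)

# On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF);
# sacrifice bunts are left out of the denominator.
obp <- function(d) (d$H + d$BB + d$HBP) / (d$AB + d$BB + d$HBP + d$SF)

# Slugging percentage: total bases per at-bat.
slg <- function(d) (d$X1B + 2 * d$X2B + 3 * d$X3B + 4 * d$HR) / d$AB

batters$OBP <- obp(batters)
batters$SLG <- slg(batters)
# OPS adds the unrounded OBP and SLG; rounding happens only for display.
batters$OPS <- batters$OBP + batters$SLG

print(cbind(batters["Name"], round(batters[c("OBP", "SLG", "OPS")], 3)))
```

This gives .353/.485/.839 for Machado and .326/.387/.713 for Duffy, matching the OBP, SLG and OPS columns above.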

In terms of metric calculation, the package allows the user to calculate the consistency of team scoring and run prevention for any year using team_consistency():

team_consistency(2015)
## # A tibble: 30 × 5
##    Team  Con_R Con_RA Con_R_Ptile Con_RA_Ptile
##    <chr> <dbl>  <dbl>       <dbl>        <dbl>
##  1 ARI    0.37   0.36          17           15
##  2 ATL    0.41   0.4           88           63
##  3 BAL    0.4    0.38          70           42
##  4 BOS    0.39   0.4           52           63
##  5 CHC    0.38   0.41          30           85
##  6 CHW    0.39   0.4           52           63
##  7 CIN    0.41   0.36          88           15
##  8 CLE    0.41   0.4           88           63
##  9 COL    0.35   0.34           7            3
## 10 DET    0.39   0.38          52           42
## # ℹ 20 more rows
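The output gives each team a consistency score for runs scored (Con_R) and runs allowed (Con_RA), plus its percentile rank on each. One common way to build such a score is a Gini coefficient over a team's game-by-game run totals. The sketch below assumes that approach purely for illustration; whether team_consistency() uses exactly this formula, and which direction its scores run, is an assumption you should check in ?team_consistency. The gini() helper and the ten-game run totals are hypothetical.

```r
# Gini coefficient of game-by-game run totals. 0 means the same number of
# runs every game; higher values mean runs are bunched into fewer games.
# ILLUSTRATIVE ONLY: not taken from baseballr's source.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  stopifnot(n > 0, all(x >= 0), sum(x) > 0)
  # Closed form for sorted values: 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Hypothetical run totals for two teams over the same ten games
# (both score 41 runs in total).
steady  <- c(4, 5, 4, 3, 5, 4, 4, 5, 3, 4)
streaky <- c(0, 11, 1, 2, 9, 0, 1, 12, 2, 3)

round(c(steady = gini(steady), streaky = gini(streaky)), 3)
```

Both hypothetical teams score the same total, but the streaky one concentrates its runs in a few blowouts, so its Gini value is far higher (about 0.56 versus 0.09).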

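You can also use woba_plus() to calculate wOBA per plate appearance and wOBA on contact for any set of batting data over any date range, provided you have the data available. Here it is applied to the hitters pulled above who had more than 200 plate appearances: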

data %>%
  dplyr::filter(PA > 200) %>%
  woba_plus() %>%
  dplyr::arrange(dplyr::desc(wOBA)) %>%
  dplyr::select(Name, Team, season, PA, wOBA, wOBA_CON) %>%
  dplyr::glimpse()
## Rows: 117
## Columns: 6
## $ Name     <chr> "Edwin Encarnación", "Bryce Harper", "David Ortiz", "Joey Vot…
## $ Team     <chr> "Toronto", "Washington", "Boston", "Cincinnati", "Baltimore",…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ PA       <dbl> 216, 248, 213, 251, 253, 260, 245, 255, 223, 241, 223, 259, 2…
## $ wOBA     <dbl> 0.490, 0.450, 0.449, 0.445, 0.434, 0.430, 0.430, 0.422, 0.410…
## $ wOBA_CON <dbl> 0.555, 0.529, 0.541, 0.543, 0.617, 0.495, 0.481, 0.494, 0.459…
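wOBA weights each way of reaching base by its run value and divides by plate appearances (leaving out intentional walks and sacrifice bunts); wOBA on contact keeps only the batted-ball events. The base-R sketch below spells out both formulas. Several pieces are assumptions: the 2015 linear weights are FanGraphs' published values as I recall them (verify them against FanGraphs' Guts! table), the contact denominator AB - SO + SF is my assumption about how woba_plus() defines contact, and woba() and woba_con() are hypothetical helpers. It uses Manny Machado's counts from the hitter pull above; his row is not among those printed here, so treat the result as illustrative rather than a check.

```r
# wOBA and wOBA on contact from counting stats.
# ASSUMPTION: 2015 linear weights as published by FanGraphs (from memory;
# check FanGraphs' Guts! table before relying on them).
weights_2015 <- c(uBB = 0.687, HBP = 0.718, X1B = 0.881,
                  X2B = 1.256, X3B = 1.594, HR = 2.065)

woba <- function(d, w = weights_2015) {
  runs <- w[["uBB"]] * d$uBB + w[["HBP"]] * d$HBP + w[["X1B"]] * d$X1B +
          w[["X2B"]] * d$X2B + w[["X3B"]] * d$X3B + w[["HR"]] * d$HR
  # Denominator: at-bats + unintentional walks + sacrifice flies + HBP.
  runs / (d$AB + d$uBB + d$SF + d$HBP)
}

woba_con <- function(d, w = weights_2015) {
  runs <- w[["X1B"]] * d$X1B + w[["X2B"]] * d$X2B +
          w[["X3B"]] * d$X3B + w[["HR"]] * d$HR
  # ASSUMPTION: "contact" = at-bats that did not end in a strikeout, plus sac flies.
  runs / (d$AB - d$SO + d$SF)
}

# Manny Machado's counts from the bref_daily_batter() pull above.
machado <- data.frame(AB = 237, uBB = 25, HBP = 2, SF = 1, SO = 42,
                      X1B = 43, X2B = 10, X3B = 0, HR = 13)

round(c(wOBA = woba(machado), wOBA_CON = woba_con(machado)), 3)
```

With these weights Machado comes out at roughly .362 wOBA and .394 wOBA on contact. Passing the weights as an argument makes it easy to swap in another season's values.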

You can also generate these wOBA-based stats, as well as FIP, for pitchers using the fip_plus() function:

bref_daily_pitcher("2015-04-05", "2015-04-30") %>%
  fip_plus() %>%
  dplyr::select(season, Name, IP, ERA, SO, uBB, HBP, HR, FIP, wOBA_against, wOBA_CON_against) %>%
  dplyr::arrange(dplyr::desc(IP)) %>%
  head(10)
## ── MLB Daily Pitcher data from baseball-reference.com ─────── baseballr 1.5.0 ──
## ℹ Data updated: 2023-12-25 02:27:52 EST
## # A tibble: 10 × 11
##    season Name               IP   ERA    SO   uBB   HBP    HR   FIP wOBA_against
##     <int> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
##  1   2015 Johnny Cueto     37    1.95    38     4     2     3  2.62        0.21
##  2   2015 Dallas Keuchel   37    0.73    22    11     0     0  2.84        0.169
##  3   2015 Sonny Gray       36.1  1.98    25     6     1     1  2.69        0.218
##  4   2015 Mike Leake       35.2  3.03    25     7     0     5  4.16        0.24
##  5   2015 Félix Hernández  34.2  1.82    36     6     3     1  2.2         0.225
##  6   2015 Corey Kluber     34    4.24    36     5     2     2  2.4         0.295
##  7   2015 Jake Odorizzi    33.2  2.41    26     8     1     0  2.38        0.213
##  8   2015 Josh Collmenter  32.2  2.76    16     3     0     1  2.82        0.29
##  9   2015 Bartolo Colón    32.2  3.31    25     1     0     4  3.29        0.28
## 10   2015 Zack Greinke     32.2  1.93    27     7     1     2  3.01        0.24
## # ℹ 1 more variable: wOBA_CON_against <dbl>
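FIP uses only the events a pitcher controls most directly: home runs, walks, hit batters and strikeouts. The base-R sketch below writes out the standard FIP formula and checks it against rows in the table above. The 3.134 league constant is an assumption (FanGraphs' 2015 value as I recall it), and fip() and ip_to_innings() are hypothetical helpers, not baseballr functions.

```r
# FIP = (13 * HR + 3 * (uBB + HBP) - 2 * SO) / IP + league constant.
# ASSUMPTION: 3.134 is FanGraphs' 2015 FIP constant (from memory).
fip <- function(hr, ubb, hbp, so, innings, constant = 3.134) {
  (13 * hr + 3 * (ubb + hbp) - 2 * so) / innings + constant
}

# IP is written in thirds of an inning: 35.2 means 35 2/3, not 35.2.
ip_to_innings <- function(ip) {
  whole <- floor(ip)
  outs  <- round((ip - whole) * 10)  # the digit after the point: 0, 1 or 2 outs
  whole + outs / 3
}

# Lines from the fip_plus() table above.
round(fip(hr = 3, ubb = 4,  hbp = 2, so = 38, innings = 37), 2)  # Cueto:   [1] 2.62
round(fip(hr = 0, ubb = 11, hbp = 0, so = 22, innings = 37), 2)  # Keuchel: [1] 2.84
round(fip(hr = 5, ubb = 7,  hbp = 0, so = 25,
          innings = ip_to_innings(35.2)), 2)                     # Leake:   [1] 4.14
```

Cueto and Keuchel, who pitched whole innings, match the table. For Mike Leake, converting 35.2 to 35 2/3 innings gives 4.14, whereas dividing by the raw decimal 35.2 gives the 4.16 shown. That suggests the FIP column above treats IP as an ordinary decimal, so FIP values for pitchers with partial innings may be slightly off.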

FAQs

How is data analysis used in baseball?

Sabermetrics is the empirical, statistics-based analysis of baseball. Teams use it to predict player performance and gain a winning edge, for example by forecasting results from historical data.

How do you get baseball data?

Browsing Baseball Data
  1. ESPN.
  2. MLB.com.
  3. Baseball-Reference.com.
  4. Retrosheet.org.
  5. BaseballProspectus.com.
  6. FanGraphs.com.
  7. Baseball-Almanac.com.
  8. TheBaseballCube.com.

How do you get into baseball analytics?

To become a baseball data analyst, you typically need a bachelor's degree in math, statistics, or a related field. Sports Management Worldwide offers eight-week sports analytics courses that let you focus on baseball and learn the software and tools currently used by professional teams.

Is analytics good for baseball?

It is hard to overstate the significance of analytics in modern sports. It's the game inside the game: a painstaking analysis of each play and each statistic on the field. Analytics is about more than crunching numbers; it's about understanding the complex interplay between performance and strategy.

How is data analysis used in sports?

Sports analytics is the process of plugging statistics into mathematical models to predict the outcome of a given play or game. Coaches rely on analytics to scout opponents and optimize play calls in games, while front offices use it to prioritize player development.

What software is used in baseball analytics?

DakStats Baseball statistics software is ideal for tracking game, season and career stats from high schools to professional stadiums. DakStats Baseball records both baseball and softball stats and is the preferred statistical software in most MLB facilities.

What technology is used in baseball?

Radar systems and cameras work to monitor and analyze every pitch made. This provides key data on pitch velocity, trajectory, and spin. Many players and teams love the ability to refine their techniques by reviewing their pitches. They can alter their grip and release points to improve their performance.

What is the formula for baseball statistics?

A common example is slugging percentage (SLG): total bases divided by official at-bats. The formula is $\mathrm{SLG} = \dfrac{\mathrm{1B} + 2\cdot\mathrm{2B} + 3\cdot\mathrm{3B} + 4\cdot\mathrm{HR}}{\mathrm{AB}}$. The statistic gauges how many bases a player averages per at-bat: a 4.000 SLG means a home run in every at-bat, and a .500 SLG means the batter averages one base every two at-bats.

Who is the official data provider for MLB?

Sportradar is the Official Provider of real-time MLB statistics. The data collection comes direct from the MLB operations teams on-venue. This provides lightning speed and the highest-quality stats available to power your baseball experiences. All MLB games – including Spring Training – feature full coverage.

How do I get into sports data analytics?

Requirements and Qualifications
  1. Bachelor's degree in statistics, data science, sports science, or a related field (Master's degree preferred).
  2. Proven experience as a Data Analyst, preferably in the sports industry.
  3. Proficiency in data analysis tools and programming languages, such as Python, R, SQL, or similar.

Which MLB team uses analytics the most?

By their assessment, the Tampa Bay Rays, New York Yankees, and Los Angeles Dodgers were most dedicated to analytics (see Figure 1).

What degree do you need for baseball analytics?

Sports Statisticians usually have at least a bachelor's degree in some combination of mathematics, statistical analysis and computer science.

How does MLB use data analytics?

The Era of Data-Driven Baseball

Teams now collect an extensive array of data, including player performance metrics, pitch tracking, defensive shifts, and more. This wealth of information is meticulously analyzed by data scientists and analysts to provide actionable insights and predictions.

What are the most important baseball statistics?

Traditionally, statistics such as batting average (the number of hits divided by the number of at bats) and earned run average (the average number of earned runs allowed by a pitcher per nine innings) have dominated attention in the statistical world of baseball.
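Both definitions translate directly into code. A minimal base-R sketch with hypothetical helper names, checked against two lines from the data pulled earlier in this article: José Altuve's 81 hits in 244 at-bats, and 3 earned runs over 37 innings, which is the only whole number of earned runs consistent with Dallas Keuchel's 0.73 ERA in the pitcher table.

```r
# Batting average: hits divided by official at-bats.
batting_average <- function(h, ab) h / ab

# ERA: earned runs per nine innings. 'innings' must be a true number of
# innings (convert x.1 / x.2 notation to thirds first).
era <- function(er, innings) 9 * er / innings

round(batting_average(h = 81, ab = 244), 3)  # Altuve, Aug 1-Oct 3, 2015: [1] 0.332
round(era(er = 3, innings = 37), 2)          # Keuchel, April 2015:       [1] 0.73
```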

How much do MLB analysts make?

Total Salary Range for Major League Baseball (MLB) Analyst

The estimated total pay range for an analyst at Major League Baseball (MLB) is $65K–$110K per year, including base salary and additional pay. The average analyst base salary at MLB is $77K per year.

How are statistics used in baseball?

General managers and baseball scouts have long used the major statistics, among other factors and opinions, to understand player value. Managers, catchers and pitchers use the statistics of batters of opposing teams to develop pitching strategies and set defensive positioning on the field.

How many MLB teams use analytics?

Today every professional baseball team has an analytics department and uses data to inform its decision-making.

How much data does a baseball game use?

Streaming a game uses about 1 GB of data per hour on a smartphone, and up to 3 GB per hour for each HD video stream on a tablet or connected device.

Is baseball based on statistics?

Baseball is a numbers game. All sports have statistics, but something about baseball makes it the perfect sport for number nerds everywhere. So much so that there's even a specific name for the study of baseball statistics: sabermetrics.

Article information

Author: Golda Nolan II
