Fiscal Characteristics of Major US Public Transit Systems
Timbila Nikiema
Introduction
This report provides a comprehensive analysis of transit data from 2022, focusing on several key areas including fare revenues, expenses, transit usage, and performance across various agencies and modes of transportation in the United States. The tasks performed in the analysis explore trends in unlinked passenger trips (UPT), vehicle revenue miles (VRM), fare-box recovery ratios, and cost efficiency in terms of expenses per UPT. Data was collected from public sources, including the U.S. Department of Transportation, and processed using R to draw meaningful insights.
Data Overview
The data used in this analysis includes the following:
Fare Revenue Data: Contains information on fare revenue earned by various transit systems.
Expense Data: Contains financial information related to transit expenses.
Ridership Data: Contains monthly figures on unlinked passenger trips (UPT) and vehicle revenue miles (VRM) for different transit agencies and modes.
Loading and Cleaning FARES and EXPENSES Data The first step in the analysis was downloading the fare revenue and expense data-sets from the DOT website and cleaning them to create tables for further analysis.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Let's start with Fare Revenuelibrary(tidyverse)if(!file.exists("2022_fare_revenue.xlsx")){# This should work _in theory_ but in practice it's still a bit finicky# If it doesn't work for you, download this file 'by hand' in your# browser and save it as "2022_fare_revenue.xlsx" in your project# directory.download.file("http://www.transit.dot.gov/sites/fta.dot.gov/files/2024-04/2022%20Fare%20Revenue.xlsx", destfile="2022_fare_revenue.xlsx", quiet=FALSE, method="auto")}FARES <- readxl::read_xlsx("2022_fare_revenue.xlsx") |>select(-`State/Parent NTD ID`, -`Reporter Type`,-`Reporting Module`,-`TOS`,-`Passenger Paid Fares`,-`Organization Paid Fares`) |>filter(`Expense Type`=="Funds Earned During Period") |>select(-`Expense Type`) |>group_by(`NTD ID`, # Sum over different `TOS` for the same `Mode``Agency Name`, # These are direct operated and sub-contracted `Mode`) |># of the same transit modality# Not a big effect in most munis (significant DO# tends to get rid of sub-contractors), but we'll sum# to unify different passenger experiencessummarize(`Total Fares`=sum(`Total Fares`)) |>ungroup()
`summarise()` has grouped output by 'NTD ID', 'Agency Name'. You can override
using the `.groups` argument.
# Next, expensesif(!file.exists("2022_expenses.csv")){# This should work _in theory_ but in practice it's still a bit finicky# If it doesn't work for you, download this file 'by hand' in your# browser and save it as "2022_expenses.csv" in your project# directory.download.file("https://data.transportation.gov/api/views/dkxx-zjd6/rows.csv?date=20231102&accessType=DOWNLOAD&bom=true&format=true", destfile="2022_expenses.csv", quiet=FALSE, method="auto")}EXPENSES <- readr::read_csv("2022_expenses.csv") |>select(`NTD ID`, `Agency`,`Total`, `Mode`) |>mutate(`NTD ID`=as.integer(`NTD ID`)) |>rename(Expenses = Total) |>group_by(`NTD ID`, `Mode`) |>summarize(Expenses =sum(Expenses)) |>ungroup()
Rows: 3744 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Agency, City, State, NTD ID, Organization Type, Reporter Type, UZA...
dbl (2): Report Year, UACE Code
num (10): Primary UZA Population, Agency VOMS, Mode VOMS, Vehicle Operations...
lgl (7): Vehicle Operations Questionable, Vehicle Maintenance Questionable,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
`summarise()` has grouped output by 'NTD ID'. You can override using the `.groups` argument.
Vehicle Revenue Miles and Transit Trips We will now proceeded by joining ridership (UPT) and miles traveled (VRM) data, creating a comprehensive data set that combined these metrics by transit Agency and Mode.
# Download and clean the ridership datalibrary(tidyverse)if(!file.exists("ridership.xlsx")){download.file("https://www.transit.dot.gov/sites/fta.dot.gov/files/2024-09/July%202024%20Complete%20Monthly%20Ridership%20%28with%20adjustments%20and%20estimates%29_240903.xlsx", destfile="ridership.xlsx", quiet=FALSE, method="auto")}TRIPS <- readxl::read_xlsx("ridership.xlsx", sheet="UPT") |>filter(`Mode/Type of Service Status`=="Active") |>select(-`Legacy NTD ID`, -`Reporter Type`, -`Mode/Type of Service Status`, -`UACE CD`, -`TOS`) |>pivot_longer(-c(`NTD ID`:`3 Mode`), names_to="month", values_to="UPT") |>drop_na() |>mutate(month=my(month))# Download and clean the miles traveled dataMILES <- readxl::read_xlsx("ridership.xlsx", sheet="VRM") |>filter(`Mode/Type of Service Status`=="Active") |>select(-`Legacy NTD ID`, -`Reporter Type`, -`Mode/Type of Service Status`, -`UACE CD`, -`TOS`) |>pivot_longer(-c(`NTD ID`:`3 Mode`), names_to="month", values_to="VRM") |>drop_na() |>group_by(`NTD ID`, `Agency`, `UZA Name`, `Mode`, `3 Mode`, month) |>summarize(VRM =sum(VRM)) |>ungroup() |>mutate(month=my(month))
`summarise()` has grouped output by 'NTD ID', 'Agency', 'UZA Name', 'Mode', '3
Mode'. You can override using the `.groups` argument.
# Merge the trips and miles data, then rename the field UZA Name to metro_areaUSAGE <-inner_join(TRIPS, MILES) |>rename(metro_area =`UZA Name`)
By cleaning, transforming, and merging these data sets, we will now compute key metrics to assess transit system performance.
Results
Transit Agency with the Most Vehicle Revenue Miles (VRM)
The MTA New York City Transit recorded the highest total VRM in 2022 with 10,832,855,350 miles. Demonstrating the scale of New York City’s transit system compared to others.
# Summarize VRM by agency and find the agency with the highest total VRMagency_max_vrm <- USAGE |>group_by(Agency) |>summarize(total_VRM =sum(VRM, na.rm =TRUE)) |>arrange(desc(total_VRM))head(agency_max_vrm, 1) # View the top agency
# A tibble: 1 × 2
Agency total_VRM
<chr> <dbl>
1 MTA New York City Transit 10832855350
Transit Mode with the Most Vehicle Revenue Miles (VRM)
The Motorbus mode had the highest total VRM across all agencies, contributing 49,444,494,088 miles. This indicates that motorbuses are a heavily utilized mode of public transportation nationwide.
mode_max_vrm <- USAGE |>group_by(Mode) |>summarize(total_VRM =sum(VRM, na.rm =TRUE)) |>arrange(desc(total_VRM))head(mode_max_vrm, 1) # View the top Mode
In May 2024, the MTA New York City Transit’s subway system (Heavy Rail) transported 180,458,819 passengers. This highlights the ongoing high demand for subway services in New York City.
# Filter for NYC Subway, Heavy Rail, and May 2024nyc_subway_may_2024 <- USAGE |>filter(Agency =="MTA New York City Transit", Mode =="Heavy Rail", month ==as.Date("2024-05-01"))# Sum the UPT (Unlinked Passenger Trips) for May 2024total_trips_may_2024 <-sum(nyc_subway_may_2024$UPT, na.rm =TRUE)total_trips_may_2024 # Display the total trips
[1] 180458819
Agency with the Most Unlinked Passenger Trips (UPT)
The MTA New York City Transit once again led the nation with the highest total UPT, recording 69,101,730,780 trips. This underscores the significant role that New York City’s public transit plays in daily commuting.
# Let's find which Agency has the most active transit by finding the transit agency with the most number of tripsmost_active_agency <- USAGE |>group_by(Agency) |>summarize(total_UPT =sum(UPT, na.rm =TRUE)) |>arrange(desc(total_UPT))head(most_active_agency, 1)
# A tibble: 1 × 2
Agency total_UPT
<chr> <dbl>
1 MTA New York City Transit 69101730780
The next step of computing the key metrics will creating a table USAGE_2022_ANNUAL with only 2022 riders data. Then merge it to the FINANCIALS table. A quick view of the FINANCIALS table show that the Mode field from both tables is different. One showing the definitions and the other showing the codes. We needs to re-code to match the values in USAGE_2022_ANNUAL.
`summarise()` has grouped output by 'NTD ID', 'month', 'Agency', 'metro_area'.
You can override using the `.groups` argument.
USAGE_2022_ANNUAL
# A tibble: 256,803 × 7
`NTD ID` month Agency metro_area Mode UPT VRM
<chr> <date> <chr> <chr> <chr> <dbl> <dbl>
1 00001 2002-01-01 King County Seattle--Tacoma, WA Demand Res… 1.35e5 7.46e5
2 00001 2002-01-01 King County Seattle--Tacoma, WA Motorbus 6.05e6 2.88e6
3 00001 2002-01-01 King County Seattle--Tacoma, WA Trolleybus 2.04e6 3.15e5
4 00001 2002-01-01 King County Seattle--Tacoma, WA Vanpool 1.74e5 8.67e5
5 00001 2002-02-01 King County Seattle--Tacoma, WA Demand Res… 1.27e5 6.56e5
6 00001 2002-02-01 King County Seattle--Tacoma, WA Motorbus 5.41e6 2.56e6
7 00001 2002-02-01 King County Seattle--Tacoma, WA Trolleybus 1.84e6 2.81e5
8 00001 2002-02-01 King County Seattle--Tacoma, WA Vanpool 1.50e5 7.49e5
9 00001 2002-03-01 King County Seattle--Tacoma, WA Demand Res… 1.36e5 7.27e5
10 00001 2002-03-01 King County Seattle--Tacoma, WA Motorbus 6.00e6 2.84e6
# ℹ 256,793 more rows
# Re-coding the Mode with their definitionsFINANCIALS <- FINANCIALS |>mutate(Mode =case_when( Mode =="DR"~"Demand Response", Mode =="MB"~"Motorbus", Mode =="CR"~"Commuter Rail", Mode =="CC"~"Cable Car", Mode =="LR"~"Light Rail", Mode =="PB"~"Publico", Mode =="SR"~"Streetcar Rail", Mode =="VP"~"Vanpool", Mode =="CB"~"Commuter Bus", Mode =="TB"~"Trolleybus", Mode =="FB"~"Ferryboat", Mode =="TR"~"Aerial Tramway", Mode =="HR"~"Heavy Rail", Mode =="MG"~"Monorail/Automated Guideway", Mode =="RB"~"Bus Rapid Transit", Mode =="OR"~"Other Rail", Mode =="YR"~"Hybrid Rail", Mode =="AR"~"Alaska Railroad", Mode =="IP"~"Inclined Plane",TRUE~"Unknown")) |>mutate(`NTD ID`=as.character(`NTD ID`)) # joining Integer and Character type data is impossible, So I converted NTD ID data type to match USAGE_2022_ANNUAL# We are now merging the USAGE_2022_ANNUAL and FINANCIALS tablesUSAGE_AND_FINANCIALS <-left_join(USAGE_2022_ANNUAL, FINANCIALS, join_by(`NTD ID`, Mode)) |>drop_na()USAGE_AND_FINANCIALS
The MTA New York City Transit’s Heavy Rail system recorded the most UPT in 2022, with 1,793,073,801 trips. This highlights the central role of the NYC subway in the nation’s transit network.
# First let's filter for major transit with more than 400,000 UPT per annum in 2022major_system <- USAGE_AND_FINANCIALS |>filter(year(month) ==2022, UPT >400000)most_UPT <- major_system |>group_by(Agency, Mode) |>summarize(total_UPT =sum(UPT, na.rm =TRUE)) |>arrange(desc(total_UPT))
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
head(most_UPT, 1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode total_UPT
<chr> <chr> <dbl>
1 MTA New York City Transit Heavy Rail 1793073801
Transit System with the Highest Fare-box Recovery
The Anaheim Transportation Network’s Motorbus service achieved the highest fare-box recovery ratio at 0.865 (or 86.5%), indicating that the system covers a significant portion of its operating costs through fare collection.
# Find the transit system with the highest fare-box recovery ratio (Total Fares / Expenses)highest_farebox_recovery <- major_system |>group_by(Agency, Mode) |>summarize(farebox_recovery =sum(`Total Fares`, na.rm =TRUE) /sum(Expenses, na.rm =TRUE)) |>arrange(desc(farebox_recovery))
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
University of Georgia (Motorbus) had the lowest expenses per UPT at $14.90. This suggests that the university’s motorbus system operates at a lower cost per passenger compared to other major systems.
# Find the transit system with the lowest expenses per UPT (Expenses / UPT)lowest_expenses_per_UPT <- major_system |>group_by(Agency, Mode) |>summarize(expenses_per_UPT =sum(Expenses, na.rm =TRUE) /sum(UPT, na.rm =TRUE)) |>arrange(expenses_per_UPT)
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
head(lowest_expenses_per_UPT, 1)
# A tibble: 1 × 3
# Groups: Agency [1]
Agency Mode expenses_per_UPT
<chr> <chr> <dbl>
1 University of Georgia Motorbus 14.9
Transit System with the Highest Fares per UPT
The Metro-North Commuter Railroad (Commuter Rail) had the highest total fares per UPT, with $98.65 per trip. This is indicative of higher fares typically charged on long-distance commuter services.
# Find the transit system with the highest total fares per UPT (Total Fares / UPT)highest_fares_per_UPT <- major_system |>group_by(Agency, Mode) |>summarize(fares_per_UPT =sum(`Total Fares`, na.rm =TRUE) /sum(UPT, na.rm =TRUE)) |>arrange(desc(fares_per_UPT))
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
With a cost of $51.03 per VRM, The Delaware Transit Corporation (Motorbus) haS the lowest expenses. Suggesting that their bus operations are particularly cost-efficient in relation to the distance their vehicles travel.
# Find the transit system with the lowest expenses per VRM (Expenses / VRM)lowest_expenses_per_VRM <- major_system |>group_by(Agency, Mode) |>summarize(expenses_per_VRM =sum(Expenses, na.rm =TRUE) /sum(VRM, na.rm =TRUE)) |>arrange(expenses_per_VRM)
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
The Washington State Ferries (Ferryboat) system had the highest total fares per VRM, with $827.86. Ferries typically have high operating costs and therefore charge higher fares per mile traveled.
# Find the transit system with the highest total fares per VRM (Total Fares / VRM)highest_fares_per_VRM <- major_system |>group_by(Agency, Mode) |>summarize(fares_per_VRM =sum(`Total Fares`, na.rm =TRUE) /sum(VRM, na.rm =TRUE)) |>arrange(desc(fares_per_VRM))
`summarise()` has grouped output by 'Agency'. You can override using the
`.groups` argument.
This analysis highlights the performance of various transit systems across the U.S., with New York City’s MTA consistently leading in both VRM and UPT. The Motorbus mode proves to be the most heavily utilized, while Anaheim’s Transportation Network demonstrated an impressive fare-box recovery ratio, suggesting a highly efficient operation. These findings provide valuable insights for transit agencies in terms of resource allocation, planning, and operational efficiency, while also offering a benchmark for future performance assessments.