Prepare Modeled Probability Data from GeoTIFFs — prepare_prob

Extracts exposure probability estimates from GeoTIFF rasters and aggregates them by geographic unit (e.g., county), producing a table in the CSV format that geoExposeR requires.

Usage

prepare_prob_data(
  prob_rasters,
  boundaries,
  id_col = "GEOID",
  pop_data = NULL,
  pop_id_col = "GEOID",
  pop_well_col = "private_well_pop",
  extraction_method = c("mean", "centroid"),
  cutoffs = c(5, 10),
  output_file = NULL
)

Arguments

prob_rasters: A named list of file paths to GeoTIFF rasters containing exceedance probabilities. Names should indicate thresholds (e.g., list(gt1 = "prob_gt1.tif", gt5 = "prob_gt5.tif", gt10 = "prob_gt10.tif")).
boundaries: An sf object containing polygon boundaries (e.g., counties) for spatial aggregation.
id_col: Character. Column name in boundaries containing the geographic identifier. Supports any identifier format (FIPS codes, census tract IDs, ZIP codes, custom IDs). Default is "GEOID".
pop_data: Optional data frame with population data. Should contain a geographic ID column (named by pop_id_col) and a private well population column (named by pop_well_col).
pop_id_col: Character. Column name in pop_data for geographic ID. Default is "GEOID".
pop_well_col: Character. Column name in pop_data for private well population. Default is "private_well_pop".
extraction_method: Character. Method for extracting raster values for each boundary unit: "mean" (more accurate) or "centroid" (faster). Default is "mean".
cutoffs: Numeric vector of length 2 giving the lower and upper concentration cutoffs in µg/L used to create exposure categories. Default is c(5, 10), which yields the categories <5, 5-10, and >=10.
output_file: Optional file path to save the resulting CSV.
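
The difference between the two extraction_method options can be illustrated with terra directly. This is a minimal sketch on a synthetic 4 x 4 raster, not the function's internal code: the raster values, the square polygon, the layer name prob_gt5, and the CRS are all assumptions made for illustration, and prepare_prob_data may handle edge cells differently (for example, through exactextractr).

```r
library(terra)

# Synthetic probability raster (assumed values, for illustration only)
r <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
          crs = "EPSG:32617")
values(r) <- (1:16 / 16)^2
names(r) <- "prob_gt5"

# One hypothetical boundary unit: a square covering 9 of the 16 cells
unit <- vect("POLYGON ((0 0, 3 0, 3 3, 0 3, 0 0))", crs = "EPSG:32617")

# "mean"-style extraction: average all cells whose centers fall in the polygon
mean_val <- extract(r, unit, fun = mean, na.rm = TRUE)$prob_gt5

# "centroid"-style extraction: sample the single cell under the centroid
cent_val <- extract(r, centroids(unit))$prob_gt5

c(mean = mean_val, centroid = cent_val)
```

The centroid approach reads one cell per unit, which is why it is faster, while the mean reflects the variation of the surface across the whole polygon.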

Value

A data.table with columns for geographic ID and multinomial probabilities for each exposure category.

Details

The function converts exceedance probabilities, P(conc > threshold), into multinomial probabilities that sum to 1. The identities below treat concentration as continuous, so a value exactly equal to a cutoff has probability zero:

Category 1: P(conc < lower) = 1 - P(conc > lower)
Category 2: P(lower <= conc < upper) = P(conc > lower) - P(conc > upper)
Category 3: P(conc >= upper) = P(conc > upper)
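
The conversion above can be sketched in a few lines of base R. The helper name exceedance_to_multinomial and the column names p_cat1, p_cat2, and p_cat3 are hypothetical and are not part of geoExposeR. The clamping step is also an assumption: it guards against inputs where P(conc > upper) exceeds P(conc > lower), which would otherwise make Category 2 negative. How prepare_prob_data itself treats such cases is not documented here.

```r
# Hypothetical helper (not part of geoExposeR): turn two exceedance
# probabilities into three category probabilities that sum to 1.
exceedance_to_multinomial <- function(p_gt_lower, p_gt_upper) {
  stopifnot(length(p_gt_lower) == length(p_gt_upper))
  # Assumption: clamp so that P(conc > upper) never exceeds P(conc > lower)
  p_gt_upper <- pmin(p_gt_upper, p_gt_lower)
  data.frame(
    p_cat1 = 1 - p_gt_lower,           # P(conc < lower)
    p_cat2 = p_gt_lower - p_gt_upper,  # P(lower <= conc < upper)
    p_cat3 = p_gt_upper                # P(conc >= upper)
  )
}

# Three illustrative units; the third violates monotonicity on purpose
probs <- exceedance_to_multinomial(
  p_gt_lower = c(0.40, 0.10, 0.25),
  p_gt_upper = c(0.15, 0.02, 0.30)
)

probs
rowSums(probs)  # each row sums to 1 (up to floating-point error)
```

Because the three terms telescope, (1 - a) + (a - b) + b = 1, the rows sum to 1 for any pair of inputs; the clamp only ensures that no individual category is negative.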

Required Packages

This function requires the terra and sf packages. For the "mean" extraction method, exactextractr is also recommended for better accuracy.

Examples

if (FALSE) { # \dontrun{
# Load required packages
library(terra)
library(sf)

# Define raster paths
rasters <- list(
  gt5 = "path/to/prob_gt5.tif",
  gt10 = "path/to/prob_gt10.tif"
)

# Load county boundaries
counties <- st_read("path/to/counties.shp")

# Extract and format probability data
prob_data <- prepare_prob_data(
  prob_rasters = rasters,
  boundaries = counties,
  id_col = "GEOID",
  cutoffs = c(5, 10),
  output_file = "prob_model_output.csv"
)
} # }