Processes measured concentration data and converts it to the lognormal distribution parameters required by geoExposeR.

Usage

prepare_conc_data(
  conc_data,
  id_col = "county_fips",
  concentration_col = "concentration",
  population_col = NULL,
  private_well_pct_col = NULL,
  input_type = c("raw", "summary"),
  mean_col = "mean_concentration",
  sd_col = "sd_concentration",
  default_sdlog = 1,
  min_concentration = 0.1,
  output_file = NULL
)

Arguments

conc_data

A data frame containing concentration measurements. Can be raw measurements or summary statistics.

id_col

Character. Column name containing geographic identifiers. Default is "county_fips".

concentration_col

Character. Column name containing concentrations. Default is "concentration".

population_col

Optional character. Column name for population served by each water system, used for weighted averaging.

private_well_pct_col

Optional character. Column name containing the private well percentage. If NULL, the private well percentage must be provided separately or merged from another source.

input_type

Character. Type of input data: "raw" for individual measurements, "summary" for pre-aggregated statistics. Default is "raw".

mean_col

Character. For summary input, column name for mean concentration. Default is "mean_concentration".

sd_col

Character. For summary input, column name for standard deviation. Default is "sd_concentration".

default_sdlog

Numeric. Default sdlog value to use when SD cannot be calculated. Default is 1.0.

min_concentration

Numeric. Minimum concentration value to avoid log(0). Default is 0.1.

output_file

Optional. File path at which to save the result as a CSV file.

Value

A data.table with columns for the geographic ID, conc_meanlog (mean of the log-transformed concentrations), conc_sdlog (standard deviation of the log-transformed concentrations), and optionally PWELL_private_pct.

Details

The function converts the arithmetic mean and standard deviation to lognormal parameters using the following formulas:

$$\mathrm{meanlog} = \log\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right)$$

$$\mathrm{sdlog} = \sqrt{\log\left(1 + \frac{\sigma^2}{\mu^2}\right)}$$

where \(\mu\) is the arithmetic mean and \(\sigma\) is the standard deviation of the concentrations.
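The conversion can be checked by hand with a few lines of base R. The sketch below is illustrative only: `lognormal_params()` is a hypothetical helper written for this example, not a function exported by geoExposeR, and the input values are made up. It applies the two formulas above and then confirms that the resulting parameters reproduce the original arithmetic mean and standard deviation.

```r
# Hypothetical helper (not part of geoExposeR): moment-match a lognormal
# distribution to an arithmetic mean `mu` and standard deviation `sigma`.
lognormal_params <- function(mu, sigma) {
  stopifnot(all(mu > 0), all(sigma >= 0))
  meanlog <- log(mu^2 / sqrt(sigma^2 + mu^2))
  sdlog   <- sqrt(log(1 + sigma^2 / mu^2))
  list(meanlog = meanlog, sdlog = sdlog)
}

# Illustrative values only
p <- lognormal_params(mu = 5, sigma = 2)

# Round trip: recover the arithmetic moments from the lognormal parameters
mean_back <- exp(p$meanlog + p$sdlog^2 / 2)
sd_back   <- sqrt((exp(p$sdlog^2) - 1) * exp(2 * p$meanlog + p$sdlog^2))

print(c(mean = mean_back, sd = sd_back))
```

Because the formulas match moments exactly, the round trip should return the input mean and standard deviation up to floating-point error. Note that a zero mean makes both formulas undefined, which is one reason a positive floor such as `min_concentration` is useful.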

Examples

if (FALSE) { # \dontrun{
# Example with raw concentration data
conc_raw <- data.frame(
  county_fips = c("06001", "06001", "06003", "06003"),
  concentration = c(2.5, 3.1, 8.2, 7.5),
  population_served = c(50000, 25000, 10000, 15000)
)

conc_processed <- prepare_conc_data(
  conc_data = conc_raw,
  id_col = "county_fips",
  concentration_col = "concentration",
  population_col = "population_served",
  input_type = "raw",
  output_file = "conc_lognormal.csv"
)
} # }