Prepare Measured Concentration Data with Lognormal Parameters
Processes measured concentration data and converts it to the lognormal distribution parameters required by geoExposeR.
Usage
prepare_conc_data(
conc_data,
id_col = "county_fips",
concentration_col = "concentration",
population_col = NULL,
private_well_pct_col = NULL,
input_type = c("raw", "summary"),
mean_col = "mean_concentration",
sd_col = "sd_concentration",
default_sdlog = 1,
min_concentration = 0.1,
output_file = NULL
)
Arguments
- conc_data
A data frame containing concentration measurements. Can be raw measurements or summary statistics.
- id_col
Character. Column name containing geographic identifiers. Default is "county_fips".
- concentration_col
Character. Column name containing concentrations. Default is "concentration".
- population_col
Optional character. Column name for population served by each water system, used for weighted averaging.
- private_well_pct_col
Optional character. Column name containing private well percentage. If NULL, the private well percentage must be supplied separately or merged in from another source.
- input_type
Character. Type of input data: "raw" for individual measurements, "summary" for pre-aggregated statistics. Default is "raw".
- mean_col
Character. For summary input, column name for mean concentration. Default is "mean_concentration".
- sd_col
Character. For summary input, column name for standard deviation. Default is "sd_concentration".
- default_sdlog
Numeric. Default sdlog value to use when SD cannot be calculated. Default is 1.0.
- min_concentration
Numeric. Minimum concentration value to avoid log(0). Default is 0.1.
- output_file
Optional file path to save the resulting CSV.
Value
A data.table with columns for geographic ID, conc_meanlog (mean of the log-transformed concentration), conc_sdlog (standard deviation of the log-transformed concentration), and optionally PWELL_private_pct.
Details
The function converts the arithmetic mean and standard deviation to lognormal parameters using the following formulas:
$$meanlog = \log(\mu^2 / \sqrt{\sigma^2 + \mu^2})$$
$$sdlog = \sqrt{\log(1 + \sigma^2/\mu^2)}$$
where \(\mu\) is the arithmetic mean and \(\sigma\) is the standard deviation of the concentrations.
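The two formulas can be checked with a short base R sketch. The helper name arith_to_lognormal and the input values below are illustrative assumptions, not part of geoExposeR, and the package's internal implementation may differ. The sketch applies the conversion and then recovers the arithmetic mean and standard deviation from the resulting parameters as a round-trip check.

```r
# Hypothetical helper (not a geoExposeR function): converts an arithmetic
# mean and standard deviation to lognormal parameters by moment matching.
arith_to_lognormal <- function(mu, sigma) {
  stopifnot(all(mu > 0), all(sigma >= 0))
  meanlog <- log(mu^2 / sqrt(sigma^2 + mu^2))
  sdlog <- sqrt(log(1 + sigma^2 / mu^2))
  list(meanlog = meanlog, sdlog = sdlog)
}

# Illustrative inputs only, not taken from any real dataset
p <- arith_to_lognormal(mu = 2.8, sigma = 0.42)

# Round trip: mean and SD of a lognormal with these parameters
mu_back <- exp(p$meanlog + p$sdlog^2 / 2)
sigma_back <- sqrt((exp(p$sdlog^2) - 1) * exp(2 * p$meanlog + p$sdlog^2))

# Both comparisons should be TRUE up to floating-point tolerance
isTRUE(all.equal(mu_back, 2.8))
isTRUE(all.equal(sigma_back, 0.42))
```

Because the conversion divides by \(\mu\), it is only defined for a strictly positive mean, which is why the sketch rejects non-positive values up front.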
Examples
if (FALSE) { # \dontrun{
# Example with raw concentration data
conc_raw <- data.frame(
  county_fips = c("06001", "06001", "06003", "06003"),
  concentration = c(2.5, 3.1, 8.2, 7.5),
  population_served = c(50000, 25000, 10000, 15000)
)

# Convert to lognormal parameters and save the result as a CSV
conc_processed <- prepare_conc_data(
  conc_data = conc_raw,
  id_col = "county_fips",
  concentration_col = "concentration",
  population_col = "population_served",
  input_type = "raw",
  output_file = "conc_lognormal.csv"
)
} # }