Formats and validates health outcome data to ensure compatibility with geoExposeR's requirements.

Usage

format_health_data(
  health_data,
  id_col = "FIPS",
  outcome_cols,
  covariate_cols = NULL,
  geoid_digits = 5,
  validate = TRUE,
  output_file = NULL,
  output_format = c("txt", "csv")
)

Arguments

health_data

A data frame containing health outcomes and covariates.

id_col

Character. Column name containing geographic identifiers (e.g., FIPS codes). Default is "FIPS".

outcome_cols

Character vector. Column names of health outcome variables to include.

covariate_cols

Optional character vector. Column names of covariate variables to include.

geoid_digits

Integer. Number of digits for FIPS code formatting. Default is 5 (county level).

validate

Logical. Whether to perform validation checks. Default is TRUE.

output_file

Optional file path. If supplied, the formatted data are also written to this file.

output_format

Character. Output format: "txt" (tab-delimited) or "csv". Default is "txt".

Value

A data.table containing the formatted identifier column, the selected outcome columns, and any selected covariate columns.

Details

The function performs the following operations:

  • Formats FIPS codes to the specified number of digits (geoid_digits) with zero-padding

  • Validates that outcome columns exist and contain valid data

  • Standardizes missing value representation to NA

  • Optionally validates data ranges for common health outcomes
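The zero-padding and missing-value steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not geoExposeR's internal code: the helper names pad_geoid() and standardize_missing() are hypothetical, and the set of sentinel codes treated as missing is an assumption chosen for the example.

```r
# Hypothetical helpers for illustration; not geoExposeR's internal code.

# Zero-pad geographic identifiers to a fixed width, returning character.
pad_geoid <- function(x, digits = 5) {
  missing <- is.na(x)
  if (is.numeric(x)) {
    x <- sprintf("%.0f", as.numeric(x))  # avoids forms such as "1e+05"
  }
  x <- trimws(as.character(x))
  x[missing] <- ""
  # Prepend as many zeros as each identifier is short of `digits`.
  padded <- paste0(strrep("0", pmax(0L, digits - nchar(x))), x)
  padded[missing] <- NA_character_
  padded
}

# Recode common missing-value sentinels to NA (the codes are assumptions).
standardize_missing <- function(x, codes = c("", "NA", ".", "-999")) {
  x[trimws(as.character(x)) %in% codes] <- NA
  x
}

pad_geoid(c(6001, 6003, 6005))            # "06001" "06003" "06005"
standardize_missing(c(3200, 3150, -999))  # 3200 3150 NA
```

Returning the identifiers as character rather than numeric is what keeps the leading zeros; a numeric column cannot store them.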

Examples

if (FALSE) { # \dontrun{
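# FIPS codes stored as numbers lose their leading zero (06001 becomes 6001);
# format_health_data() pads them back to geoid_digits digits.
health <- data.frame(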
  FIPS = c(6001, 6003, 6005),
  BWT = c(3200, 3150, NA),
  OEGEST = c(39.2, 38.8, 40.1),
  MAGE = c(28, 32, 25)
)

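# Keep two outcomes and one covariate, and save a tab-delimited copy
# (output_format defaults to "txt").
formatted <- format_health_data(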
  health_data = health,
  id_col = "FIPS",
  outcome_cols = c("BWT", "OEGEST"),
  covariate_cols = c("MAGE"),
  output_file = "health_outcomes.txt"
)
} # }
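When a tab-delimited file with zero-padded identifiers is read back into R, default type conversion turns the identifier column into integers and drops the leading zeros. The sketch below shows the effect and one way to avoid it. It writes its own small file with base R rather than calling format_health_data(), and it assumes a header row and a column named FIPS; the layout of the file the function actually writes is not confirmed here.

```r
# Write a small tab-delimited file, then read it back two ways.
path <- tempfile(fileext = ".txt")
write.table(
  data.frame(FIPS = c("06001", "06003"), BWT = c(3200, 3150)),
  file = path, sep = "\t", quote = FALSE, row.names = FALSE
)

# Default type conversion parses FIPS as integer and drops the zeros.
naive <- read.delim(path)

# Declaring the column as character keeps the identifiers intact.
safe <- read.delim(path, colClasses = c(FIPS = "character"))

naive$FIPS  # 6001 6003
safe$FIPS   # "06001" "06003"
```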