UPL calculations for MACT floor standards • UPLforOAR

The goal of UPLforOAR is to provide a set of functions that support National Emission Standards for Hazardous Air Pollutants (NESHAP) analyses. This includes organizing data sets for Maximum Achievable Control Technology (MACT) floor analysis and Upper Predictive Limit (UPL) calculations. The functions select the best and top performing sources from emissions data based on the appropriate Clean Air Act sections, determine the appropriate distributions for the emissions data, and calculate the UPL for Existing source Guidance (EG) and New Source Performance Standards (NSPS).

The UPLforOAR R package replicates all of the functionality of the UPL.xlsx workbook while streamlining its use. Using R instead of Excel avoids common sources of user error such as copy-paste mistakes, cell-dragging, and inter-sheet references. Furthermore, the R package adds clarity to UPL standards calculations by plotting the probability densities of the fitted distributions together with the emissions data underlying the methods. This lets the user verify visually that the emissions data are well represented and that the assumptions of the probability distribution are reasonable.

You can use UPLforOAR in R as demonstrated in the example below. If you are unfamiliar with R, or want quick and reproducible UPL calculations, you can use the UPL Shiny app, which launches an interactive browser session where you can upload your emissions data. The app determines your top performing sources and replicates the results of the Excel workbook, including selecting the best distribution and calculating the corresponding UPL. It also plots your distribution and lets you download the resulting data set of top performers, the UPL calculations, and a PDF report of the results. Using the UPL app still requires installing UPLforOAR and its dependencies.

A more advanced and robust way to calculate UPLs is provided by the Bayesian_UPL() function. Please explore the documentation for all of the functions in UPLforOAR, along with the user guides, references, and worked examples, at the UPLforOAR website.

Installation

You can install the most recent development version of UPLforOAR from GitHub with:

# install.packages("pak")
pak::pak("USEPA/UPLforOAR")

Contact

If you have any questions, please reach out to Ludwig.Ludda@epa.gov.

Example emissions data

This example uses Hg emissions data from the recent EPA rulemaking, NESHAP for Coal- and Oil-fired Electric Utility Steam Generating Units. The data set contains a lot of test report information, but only the columns for emissions and sources are needed for the MACT floor UPL analysis. These two columns must be named exactly emissions and sources. The emissions should all be in consistent units, and the sources should be unique at the unit level (e.g. a single boiler), not including sub-categories.

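library(UPLforOAR)
# read_csv(), the dplyr verbs, tibble(), and ggplot() used below come from
# the tidyverse; attach them explicitly in case UPLforOAR does not.
library(readr)
library(dplyr)
library(tibble)
library(ggplot2)
dat_emiss = read_csv("man/data_example/MATS_Hg.csv", col_names = TRUE)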
dat_emiss$sources = paste0(dat_emiss$`Plant Name`, "_", dat_emiss$`Unit Number`,
                         "_", dat_emiss$boiler_id)
dat_emiss$emissions = dat_emiss$Mercury_min_lb_MMBtu
dat_emiss = subset(dat_emiss, select = c(sources, emissions))
nrow(dat_emiss) # number of tests in data set
#> [1] 387
summary(dat_emiss$emissions)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 2.630e-09 2.805e-07 1.570e-06 2.713e-06 3.820e-06 2.990e-05

dat_EG = MACT_EG(CAA_section=112, dat_emiss)
dat_EG_avg = dat_EG %>%
  group_by(sources) %>%
  summarize(avg = mean(emissions), counts = n())
dat_EG_avg = arrange(dat_EG_avg, avg)
distribution_result_EG = distribution_type(dat_EG)

Source	Average emission	No. of Tests
Spruance Genco, LLC_GEN2_2A	2.63e-09	1
Spruance Genco, LLC_GEN2_2B	2.63e-09	1
Spruance Genco, LLC_GEN3_3A	4.69e-09	1
Spruance Genco, LLC_GEN3_3B	4.69e-09	1
Logan Generating Plant_Unit1_B01	5.33e-09	1
Nucla_001_1	5.33e-09	1

Top 5 of 42 sources for EG standard UPL calculation
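
The selection rule behind a table like this can be sketched in base R. This is only an illustration under stated assumptions, not the implementation of MACT_EG(): the helper name top_performers_sketch, its arguments, and the use of ceiling() for rounding are all hypothetical. It encodes the CAA section 112 convention of keeping the best performing 12% of sources when there are at least 30 sources, and the best five sources otherwise.

```r
# Hypothetical helper, NOT part of UPLforOAR: rank sources by their
# average emission and keep the tests that belong to the best performers.
top_performers_sketch <- function(dat, top_frac = 0.12,
                                  min_sources = 30, small_n = 5) {
  # average emission per source, lowest (best) first
  avg <- aggregate(emissions ~ sources, data = dat, FUN = mean)
  avg <- avg[order(avg$emissions), ]
  n_sources <- nrow(avg)
  # assumption: round the 12% count up to a whole source
  n_keep <- if (n_sources >= min_sources) {
    ceiling(top_frac * n_sources)
  } else {
    min(small_n, n_sources)
  }
  keep <- avg$sources[seq_len(n_keep)]
  # return every test row belonging to a retained source
  dat[dat$sources %in% keep, ]
}

# toy data: 50 sources with one test each
toy <- data.frame(sources = paste0("S", 1:50),
                  emissions = as.numeric(1:50))
top <- top_performers_sketch(toy)
length(unique(top$sources))
```

Because the function returns test rows rather than source averages, its output has the same shape as the input and could be passed on to a distribution test or UPL calculation.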

Since there were more than 30 sources in the emissions data, the top 12% were chosen to represent the top sources. This yielded 47 sources. The data included in this regulatory docket were test averages rather than individual runs. As such, the number of future runs used in the UPL calculations is 1 instead of the default, an average of 3 runs. According to the ratios of skewness and kurtosis, the appropriate distribution for the UPL calculation is Normal. This simple distribution test does not take into account that emissions cannot be negative, so we will also calculate the UPL for the Lognormal distribution, which is strictly positive, for comparison.
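
To make the phrase "ratios of skewness and kurtosis" concrete, the sketch below divides the sample skewness and excess kurtosis by their standard errors. It is an assumption about the general shape of such a test, not a description of what distribution_type() does internally; the helper name skew_kurt_ratios_sketch is hypothetical. The skewness and kurtosis estimators are the bias-adjusted ones used by Excel's SKEW and KURT functions.

```r
# Hypothetical helper, NOT part of UPLforOAR: ratio of sample skewness
# and excess kurtosis to their standard errors.
skew_kurt_ratios_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)                      # needs n >= 4
  z <- (x - mean(x)) / sd(x)          # standardized observations
  # bias-adjusted sample skewness and excess kurtosis
  skew <- n / ((n - 1) * (n - 2)) * sum(z^3)
  kurt <- n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3))
  # standard errors under normality
  se_skew <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
  se_kurt <- 2 * se_skew * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))
  c(skewness_ratio = skew / se_skew, kurtosis_ratio = kurt / se_kurt)
}

x <- c(1, 2, 3, 4, 5)
skew_kurt_ratios_sketch(x)       # ratios on the raw scale
skew_kurt_ratios_sketch(log(x))  # ratios on the log scale
```

A common rule of thumb treats ratios within roughly plus or minus 2 as consistent with a Normal distribution, and comparing the raw-scale and log-scale ratios hints at whether Normal or Lognormal fits better. The thresholds that the package actually applies are not stated here, so treat that cutoff as an assumption.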

UPL1_EG = Normal_UPL(data = dat_EG,
                     future_runs = 1,
                     significance = 0.99)
UPL2_EG = Lognormal_UPL(data = dat_EG,
                        future_runs = 1,
                        significance = 0.99)
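
For intuition, the Normal UPL is commonly written as the sample mean plus a Student's t quantile times a prediction standard error, UPL = mean + t(significance, n - 1) * s * sqrt(1/n + 1/m), where n is the number of observations and m is the number of future runs. The base R sketch below implements that textbook form. Whether Normal_UPL() computes exactly this is an assumption, and the helper name normal_upl_sketch is hypothetical.

```r
# Hypothetical helper, NOT the package's Normal_UPL(): textbook upper
# prediction limit for the mean of `future_runs` future observations.
normal_upl_sketch <- function(x, future_runs = 3, significance = 0.99) {
  x <- x[!is.na(x)]
  n <- length(x)
  t_crit <- qt(significance, df = n - 1)   # one-sided t quantile
  mean(x) + t_crit * sd(x) * sqrt(1 / n + 1 / future_runs)
}

x <- c(1, 2, 3, 4, 5)
normal_upl_sketch(x, future_runs = 1)  # limit for a single future test
normal_upl_sketch(x, future_runs = 3)  # limit for a 3-run average
```

The 1/m term shows why the number of future runs matters: a single future test varies more than a 3-run average, so setting future_runs = 1 gives a higher limit than the default of 3.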

The UPL calculated with the Normal distribution, the one selected by the distribution test, results in a MACT floor standard of 4.8582101 × 10^{-8} lb/MMBtu in Hg emissions. Lastly, we plot the observation density of the emissions data together with the Normal and Lognormal distributions that were used as the probability density functions for the UPL calculations. From the figure below, the Lognormal appears to better represent the data, which would yield a UPL MACT floor standard of 8.4825717 × 10^{-8} lb/MMBtu in Hg emissions.

# make an ordered sequence of emissions 
# for which we will define the probability density
x_hat = seq(0, 3*max(dat_EG$emissions), length.out = 1024)
# next define the probability density along x_hat
# and at each emission observation.
obs_dens_results = obs_density(dat_EG, xvals = x_hat)
Obs_onPoint = obs_dens_results$Obs_onPoint
obs_den_df = obs_dens_results$obs_den_df
# create a probability density function along the same x_hat
# based on estimated distribution parameters
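pdf_n = dnorm(x_hat, mean = mean(dat_EG$emissions, na.rm = TRUE),
              sd = sd(dat_EG$emissions, na.rm = TRUE))
# dlnorm() is parameterized on the log scale, so both parameters
# are estimated from log(emissions)
pdf_ln = dlnorm(x_hat, meanlog = mean(log(dat_EG$emissions), na.rm = TRUE),
                sdlog = sd(log(dat_EG$emissions), na.rm = TRUE))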
pred_dat = tibble(x_hat, pdf_ln, pdf_n)

ggplot()+
  geom_line(data = obs_den_df, linewidth = 0.75,
            aes(y = ydens, x = x_hat, color = 'a'))+
  geom_area(data = obs_den_df, alpha=0.25,
            aes(y = ydens, x = x_hat, fill = 'a'))+
  geom_point(aes(y = ydens, x = emissions), data = Obs_onPoint,
             size = 3, alpha = 0.5, shape = 19, color = 'black')+
  geom_line(aes(y = pdf_n, x = x_hat, color = 'b'),
               data = pred_dat, linewidth = 0.75, linetype = 2)+
  geom_area(aes(y = pdf_n, x = x_hat, fill = 'b'), alpha = 0.25,
            data = pred_dat)+
  geom_line(aes(y = pdf_ln, x = x_hat, color = 'c'),
               data = pred_dat, linewidth = 0.75, linetype = 3)+
  geom_area(aes(y = pdf_ln, x = x_hat, fill = 'c'), alpha = 0.25,
            data = pred_dat)+
  ylab("Density")+xlab("Hg emissions (lb/MMBtu)")+
  ggtitle("Overall observed population")+
  pop_distr_theme()+
  geom_vline(aes(xintercept = (mean(dat_EG$emissions)),
                 color = 'a'), linewidth = 1, linetype = 1)+
  geom_vline(aes(xintercept = UPL1_EG, color = 'b'), 
             linewidth = 1, linetype = 2)+
  geom_vline(aes(xintercept = UPL2_EG, color = 'c'), 
             linewidth = 1, linetype = 3)+
  scale_x_continuous(expand = expansion(mult = c(0, 0.05)))+
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))+
  coord_cartesian(clip = 'off')+
  labs(color = 'Distribution:', fill = 'Distribution:')+
  geom_rug(sides = 'b', aes(x = emissions), data = dat_EG,
           alpha = 0.5, outside = TRUE, color = 'black')+
  scale_color_manual(values = c('black', '#FF7F00', '#984EA3'),
                     labels = c('Observations', 'Normal', 'Lognormal'))+
  scale_fill_manual(values = c('black', '#FF7F00', '#984EA3'),
                    labels = c('Observations', 'Normal', 'Lognormal'))

Observation density of Hg for the overall population. The observation data are shown in black as points and as a rug along the axis, with the observation density as a black line. The fitted Normal distribution is colored orange and the fitted Lognormal distribution is colored purple, and each is the basis of the corresponding UPL estimate. The vertical black line is the average of the Hg emissions, and the vertical orange and purple lines are the Normal and Lognormal UPL results.

Disclaimer

The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.