User Guide

Automated Probabilistic Co-Occurrence Assessment Tool

Version 1.0 - 2022

Stone Environmental

Table of Contents

  1. Release Notes
  2. Introduction
  3. Software and Data Download
  4. System Requirements
  5. Running the Software
    1. Project Management
    2. Pesticide Use Footprints
    3. Species Distribution Modeling
    4. Co-Occurrence Assessment
  6. Example Assessment
    1. Pesticide Use Footprints Example
    2. Species Distribution Modeling Example
    3. Co-Occurrence Assessment Example
  7. Citation
  8. References

Foreword

 In response to the need for efficient production of advanced geospatial analyses of co-occurrence between pesticide use and species of interest as required by the Endangered Species Act, Stone Environmental, with the support of Syngenta Crop Protection, developed the Automated Probabilistic Co-Occurrence Assessment Tool (APCOAT) in early 2022. APCOAT is designed to produce probabilistic spatial models of both pesticide use and species distributions, and combine the models for co-occurrence assessments. Each of the models may also be run independently. The pesticide use models are represented by probabilistic crop footprints[1, 2] and statistical measures of the Percent Crop Treated (PCT) derived from freely available pesticide usage data[3], or from pesticide usage data provided by the user. The species distribution models (SDMs) are produced using maximum entropy methods[4] analyzing the statistical fit between species presence location records and geographic predictor rasters for environmental variables. Probabilistic co-occurrence between pesticide usage and species distributions is calculated by multiplying the two model output rasters. For planning and conservation purposes, the co-occurrence statistics may be summarized by state, crop reporting district, county, or watershed.



1. Release Notes

Overview
 In this initial release of APCOAT, the core functionality includes:

Bug Fixes
 If you encounter any bugs in this release, please use the contact form at https://www.stone-env.com/APCOAT to submit a detailed report, including a description of the inputs used, error messages received, and contact information for follow-up communications.

Coming Soon
 The next version of APCOAT to be released in 2022 will include:


2. Introduction

APCOAT workflow diagram

 More than 1,600 animal, plant, and other species are listed as threatened or endangered under the Endangered Species Act (ESA) by the US Fish and Wildlife Service (USFWS), whose mission is to prevent the extinction of these sensitive species and take actions to allow population recovery and eventual delisting. Under Section 7 of the Act, federal agencies are required to consult with USFWS and the National Marine Fisheries Service (NMFS) to evaluate whether their actions may affect listed species protected by the ESA[5]. For example, the US Environmental Protection Agency (EPA) regulates the sale of more than 1,000 pesticide chemicals, each of which is required by law to go through this consultation process, in which the USFWS and/or NMFS must issue a Biological Opinion regarding potential risks to listed species. Despite this legal requirement, few pesticides have undergone the consultation process, perhaps due to interagency disagreements about interpretation of the law or poorly defined methods. A 2013 National Research Council[6] study described a method for conducting these consultations, known as Biological Evaluations (BE), in which a preliminary determination for each listed species is made, in part, based on the degree of spatial overlap between pesticide usage areas and the species' range or Critical Habitat maps. Following publication of this study, USFWS conducted several preliminary consultations on pesticides regulated by EPA[7, 8, 9], later clarifying that >1% spatial concurrence between pesticide usage areas and listed species habitats was a threshold for a 'may affect' determination[10]. In the event of such a 'may affect' decision, BEs use a weight-of-evidence approach to evaluate exposure and toxicity data in consultation with USFWS.

 The current BE model has not been effective for regulatory agencies or industry and may not be appropriately assessing potential risks to listed species. The method is deterministic (i.e., presence/absence), identifying simple spatial overlap between listed species ranges and crop footprints. The species range datasets are often poorly modeled or of low spatial resolution, and the crop footprints do not reflect variable patterns of crop production or pesticide usage. Such methods may either under- or over-predict pesticide exposure risk, benefiting neither listed species nor the parties involved. In recent work, a team of scientists at Syngenta Crop Protection and Stone Environmental has addressed some of these challenges by developing probabilistic models of pesticide use patterns and species distributions[1, 2, 11]. To address the high number of species protected by the ESA and pesticide products regulated by EPA, in ongoing research in 2019-2020 we apply these methods to suites of listed species co-occurring in delimited geographic areas (i.e., watersheds). USFWS recently published standard operating procedures for characterizing listed species distributions that are expected to improve the rigor of co-occurrence analysis, and follow similar methods to those used in our own work[12].

 The scope and aim of this software is to provide users with the ability to rapidly implement the methods we have developed to assess probabilistic co-occurrence between pesticide use and species of interest for the purposes of regulatory review. With APCOAT, users are able to use pesticide application rates and species locations to generate automated reports detailing pesticide usage, species distribution modeling, and co-occurrence between the two (Figure 1). This guide walks users through each APCOAT function in a step-by-step manner and includes descriptions of the methods and processing involved in each model. These methods have been peer-reviewed and published in several research manuscripts, and more detailed descriptions of the models are cited in the References section.


3. Software and Data Download

The APCOAT software package, user guide, and associated probabilistic crop footprints can be downloaded in a single package at http://stone-env.com/APCOAT.


4. System Requirements

4.1 Installation


There are three software packages that must be installed before APCOAT can be deployed:
  1. Install 64-bit Java for Windows version 8.x - https://www.java.com/en/download/.
  2. Install 64-bit R version 4.x - https://cran.r-project.org/bin/windows/base/.
    When APCOAT is installed or run with administrator privileges, it will attempt to install the following R libraries, but they may need to be installed manually:
    • data.table
    • dismo
    • GGally
    • maptools
    • raster
    • reshape2
    • rgdal
    • rgeos
    • rJava
    • tidyverse
  3. Install the 32-bit version of the Microsoft Access database driver - http://www.microsoft.com/en-us/download/details.aspx?id=13255.
  4. Install APCOAT.

4.2 Disk Space Requirements




Component                                   Space Required
64-bit Java for Windows                     81 MB
64-bit R version 4.x                        86 MB
32-bit Microsoft Access driver              25 MB
APCOAT and Probabilistic Crop Footprints    7.5 GB
Recommended SDM Predictor Variables         13.9 GB
TOTAL                                       21.6 GB

4.3 Processing

APCOAT will run on most 64-bit Windows 10 machines. However, generating use footprints, species distribution models, and co-occurrence summaries may require several hours of processing time. We recommend beginning co-occurrence assessments using a subset of species of interest to determine the general rate of processing time before completing large batches of species assessments.

Example processing times for an analysis of one pesticide applied to two crops, two SDMs, and the four resulting co-occurrence reports:

Computer          Processor                                   RAM         Pesticide Usage Modeling   Species Distribution Modeling   Co-Occurrence Modeling
Laptop            Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz    8.00 GB     70 minutes                 80 minutes                      36 minutes
Work Station 1    Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz     16.00 GB    41 minutes                 37 minutes                      27 minutes
Work Station 2    Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz    32.00 GB    45 minutes                 42 minutes                      30 minutes


5. Running the Software

  Running APCOAT to generate co-occurrence reports consists of four separate processes:
    5.1 Project Management to create, save, and load the project database and associated folders
    5.2 Pesticide Use Footprints to generate probabilistic maps of the likelihood that a location will be planted with a given crop, and that the given crop will be treated with a pesticide of interest
    5.3 Species Distribution Modeling to generate probabilistic maps of the likelihood that a location will provide suitable habitat for a given species
    5.4 Co-Occurrence Assessment to multiply the pesticide use footprints and species distribution models and produce a co-occurrence raster and report summarizing model inputs and outputs


5.1 Project Management

Methods
  An APCOAT project consists of a project folder paired with a project database. Do not rename, edit, or move any files created by APCOAT in that folder. To further process or analyze outputs created by APCOAT, including intermediate outputs, we strongly recommend copying the data and analyzing the copy in a different folder. Sub-folders within the project folder store the raster and statistical outputs.

To create a project database and associated output sub-folders:

  1. - Click the "File" menu on the upper left corner of the window.
  2. - Select "New".
  3. - Select an empty destination folder.
  4. - Provide a project file name. APCOAT will generate the following files and folders, named as shown below:
    • A project database used to store user inputs
      • [Project Name].mdb
    • A folder used to store results of co-occurrence analyses
      • \CoOccurrence\
    • A folder used to store results of species distribution modeling
      • \SpeciesDistModels\
    • A folder used to store results of pesticide usage modeling
      • \UseFootprints\
  5. - Project progress may be saved at any time that APCOAT is not actively processing data. The file menu is also used to load previously saved projects.
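
The project layout listed above can be checked with a short script before loading a saved project. This is a minimal sketch, not part of APCOAT: the function name `missing_project_items` is hypothetical, and only the database file name pattern and the three sub-folder names come from the list above.

```python
from pathlib import Path

# Sub-folder names taken from the list above; the check itself is illustrative.
EXPECTED_SUBFOLDERS = ("CoOccurrence", "SpeciesDistModels", "UseFootprints")


def missing_project_items(project_dir, project_name):
    """Return the expected database file and sub-folders that are absent."""
    root = Path(project_dir)
    missing = []
    if not (root / f"{project_name}.mdb").is_file():
        missing.append(f"{project_name}.mdb")
    for folder in EXPECTED_SUBFOLDERS:
        if not (root / folder).is_dir():
            missing.append(folder)
    return missing
```

An empty list indicates that the database and all three output folders are where APCOAT created them. The script only reads the folder, so it does not conflict with the advice not to rename, edit, or move project files.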

5.2 Pesticide Use Footprints

Methods
 APCOAT pesticide use footprints are generated by multiplying probabilistic crop footprints by estimates of the Percent Crop Treated (PCT)(Figure 1). The probabilistic crop footprints have already been produced and are included in the software package. The method for producing the probabilistic crop footprints incorporates best available information at the time of analysis from the Cropland Data Layer (CDL)[13] for 6 years (2015 - 2020), the 2016 National Land Cover Database (NLCD)[14], and 6 years of the NASS Agricultural Survey (2015 - 2020)[15]. The method also accounts for misclassification of crop classes in the CDL by incorporating accuracy assessment information[16] by state, year, and crop. The NLCD provides additional information to improve the CDL crop probability through an adjustment based on the NLCD accuracy assessment data using the principles of Bayes' Theorem. Finally, annual crop probabilities are scaled at the state level by comparing against NASS surveys of reported planted acres by crop, and the average annual probability is calculated*.

 The PCT rasters are created by first calculating a time series of the maximum potential annual usage in a region. Maximum annual usage is calculated by multiplying the regional crop acreage measured from CDL for 6 years by the specified application rates. Regional pesticide usage data is then divided by the maximum potential usage in each year to generate a time series of annual PCT calculations for each region. Finally, a user-specified statistic is calculated from these regional time series and converted to raster format. The PCT rasters are then multiplied by probabilistic crop footprint rasters.
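
The PCT calculation described above can be illustrated for a single region and crop. This is a minimal sketch under stated assumptions, not APCOAT's implementation: the function names, the metric units, the cap at 100%, the linear-interpolation percentile, and all numbers are hypothetical.

```python
def annual_pct(usage_kg, crop_area_ha, rate_kg_per_ha):
    """PCT for one region and year: reported usage / maximum potential usage."""
    max_potential_kg = crop_area_ha * rate_kg_per_ha
    if max_potential_kg <= 0:
        return 0.0
    # Capping at 1.0 (100% treated) is an assumption of this sketch.
    return min(usage_kg / max_potential_kg, 1.0)


def percentile(values, pct):
    """Percentile by linear interpolation between ranked values (assumed method)."""
    ranked = sorted(values)
    position = (len(ranked) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ranked) - 1)
    return ranked[lower] + (ranked[upper] - ranked[lower]) * (position - lower)


# Hypothetical six-year record for one region and crop.
usage_kg = {2015: 100.0, 2016: 200.0, 2017: 300.0, 2018: 400.0, 2019: 500.0, 2020: 600.0}
crop_area_ha = {year: 1000.0 for year in usage_kg}
rate_kg_per_ha = 1.0

pct_series = [annual_pct(usage_kg[y], crop_area_ha[y], rate_kg_per_ha) for y in sorted(usage_kg)]
pct_90th = percentile(pct_series, 90)  # about 0.55 for these made-up numbers
```

The resulting regional statistic is the value that would be written to every cell of that region in the PCT raster before multiplication by the probabilistic crop footprint.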

*Due to large discrepancies between the alfalfa acreage reported by CDL and NASS Agricultural Survey data, the Agricultural Survey data adjustment has been omitted for the alfalfa probabilistic crop footprint.

To create probabilistic pesticide use footprints:
  1. - Click the "Pesticide Use" tab.
  2. - Select one of the following pesticide use data sources. Note that your selection will modify the appearance of the subsequent input fields and may not match the example screenshot:
    1. - USGS ePest: Usage estimates compiled by USGS for years 2012 - 2019[3]
    2. - 100% Crop Treated: No usage data is used and all usage sites are assumed to be treated
    3. - Custom (CSV file): Usage data provided by the user. See below for Preparing Custom Usage Data
  3. - Select the desired pesticide use statistic. This will be calculated for each region of interest over the specified annual time series.
  4. - Select the desired spatial resolution, which is dependent on the pesticide use data source.
    1. - USGS ePest estimates are published by state, so "State" is the only available spatial resolution when USGS ePest is selected as the pesticide use data source.
    2. - Custom (CSV file) data may be provided at the county, Crop Reporting District, or state scale, and the resolution will be automatically detected from the file header. If you wish to summarize the data you have provided at a larger spatial scale, select that scale from the Upscale options (not visible in the example screenshot).
  5. - Select or name the pesticide of interest, depending on pesticide use data source:
    1. - USGS ePest: select one of the 315 pesticides available from the ePest database.
    2. - Custom (CSV file): enter the name of the selected pesticide.
  6. - Specify the units and annual application rates for each crop of interest where the pesticide may be applied.
  7. - Click "Generate Pesticide Use Footprint". APCOAT will generate the following files for each crop in the "\UseFootprints\" folder, named as shown below:
    • A CSV file containing the time series of Percent Crop Treated (PCT) calculations
      • [Pesticide]_PCT_[CropApplication Rate]_[Usage Data Source]_[Usage Data Resolution]_[UsageStatistic].csv
    • A CSV file containing the statistic calculated from the PCT time series
      • [Pesticide]_PCTstatsRep_[Crop]_[Application Rate]_[Usage Data Source]_[Usage Data Resolution]_[UsageStatistic].csv
    • A raster showing the PCT values applied for each region
      • [Pesticide]_PCT_[Crop]_[ApplicationRate]_[Usage Data Source]_[Usage Data Resolution]_[UsageStatistic].tif
    • Probabilistic crop usage footprints.
      • [Pesticide]_ProbUseFP_[Crop]_[ApplicationRate]_[Usage Data Source]_[Regional Resolution]_[UsageStatistic].tif
      • Downsampled versions of the crop usage footprints for display and review are also created during co-occurrence processing:
        • [Pesticide]_ProbUseFP_[Crop]_[ApplicationRate]_[Usage Data Source]_[Usage Data Resolution]_[UsageStatistic]_[2-128]x.tif

Preparing Custom Usage Data

If you wish to load custom pesticide use data, it must be formatted as a comma-separated values (CSV) file conforming to the following requirements. Templates for county, CRD, and state level pesticide use data files are available and installed by default in "C:\Program Files (x86)\APCOAT\UserDataTemplates".

Field Title                          Data Type   Data Requirements
"State_FIPS" for state resolution,   Text        • County and state FIPS codes must match those published by the US Census Bureau[17].
or "CRD_STASD" for CRD resolution,               • Crop Reporting District, or Agricultural Statistics District, codes must match those published by the National Agricultural Statistics Service[18].
or "FIPS" for county resolution                  • A lookup table (tbl_FIPS_CRD) joining each of these code sets is available in each project database.
Year                                 Integer     May be 2012 - 2020. Years outside of this range will be ignored.
Crop_Code                            Integer     Crop codes must conform to the following USGS crop groups:
                                                   1 - corn
                                                   2 - soybeans
                                                   3 - wheat
                                                   4 - cotton
                                                   6 - rice
                                                   8 - alfalfa
Pesticide_Use_kg                     Float       No blanks; must include 0 for years and regions with no use.

Screenshot of example custom user data
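
A custom usage file can be checked against the requirements above before it is loaded. The following is a minimal sketch, not APCOAT's own validation: the function name `validate_usage_csv` and the sample rows are hypothetical, while the field titles, year range, and crop codes come from the requirements table. It does not check FIPS or CRD codes against the published code lists.

```python
import csv
import io

REGION_FIELDS = {"State_FIPS": "state", "CRD_STASD": "CRD", "FIPS": "county"}
REQUIRED_FIELDS = ("Year", "Crop_Code", "Pesticide_Use_kg")
VALID_CROP_CODES = {1, 2, 3, 4, 6, 8}
VALID_YEARS = range(2012, 2021)


def validate_usage_csv(handle):
    """Return (resolution, problems) for an open custom usage CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    regions = [name for name in header if name in REGION_FIELDS]
    if len(regions) != 1:
        return None, ["header needs exactly one of State_FIPS, CRD_STASD, FIPS"]
    resolution = REGION_FIELDS[regions[0]]
    problems = [f"missing column {name}" for name in REQUIRED_FIELDS if name not in header]
    if problems:
        return resolution, problems
    for line_no, row in enumerate(reader, start=2):
        year = (row["Year"] or "").strip()
        crop = (row["Crop_Code"] or "").strip()
        use = (row["Pesticide_Use_kg"] or "").strip()
        if not year.isdigit() or int(year) not in VALID_YEARS:
            problems.append(f"line {line_no}: year {year!r} is outside 2012 - 2020 and would be ignored")
        if not crop.isdigit() or int(crop) not in VALID_CROP_CODES:
            problems.append(f"line {line_no}: unrecognized crop code {crop!r}")
        try:
            float(use)
        except ValueError:
            problems.append(f"line {line_no}: Pesticide_Use_kg is blank or not numeric")
    return resolution, problems


# Made-up rows: one valid, one with a bad crop code, one with a bad year and a blank.
sample = io.StringIO(
    "State_FIPS,Year,Crop_Code,Pesticide_Use_kg\n"
    "19,2015,1,1250.5\n"
    "19,2016,5,980.0\n"
    "19,2011,1,\n"
)
resolution, problems = validate_usage_csv(sample)
```

For a file on disk, pass `open(path, newline="")` in place of the `io.StringIO` object.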

5.3 Species Distribution Modeling

Methods
 APCOAT uses Maxent software [6] to compare species location records to environmental variables and locate areas of similar habitat suitability as the basis for generating Species Distribution Models. Specifically, Maxent uses presence-only species records to "minimize the relative entropy between two probability densities (one estimated from the presence data and one, from the landscape) defined in covariate space"[16] and generate probabilistic models. The user provides the species location records as a CSV file, and the environmental variables that comprise the covariate space as raster images. In the sections below we have provided recommended sources for each data type.

 Since the probabilistic models vary slightly with each solution, APCOAT generates 5 models at a time and evaluates the average of the 5 models iteratively. Iterative evaluation of the averaged models follows a set of best practice methods to select the best-fit model that uses the fewest predictor variables and minimizes correlation among these predictors [11]. Eighty percent of the species location records are used in the initial model training iterations, and the remaining twenty percent of location records are used for evaluation of the final model.

To create probabilistic species distribution models:

  1. - Click the "Species Distribution" tab.
  2. - Click "Browse ..." and navigate to a folder containing either a single CSV file or multiple CSV files composed of species location data. See below for recommended species location data sources and formatting requirements.
    • As the species location data is loaded, APCOAT performs quality checks to confirm that each species has at least 5 records (80% of records are used for model training and 20% for model validation), that there are no formatting errors, and that any duplicate records are identified.
  3. - Click "Browse ..." and navigate to a folder containing SDM predictor variable rasters. A single land use/land cover categorical predictor variable ('USGS_LULC.tif') is included with APCOAT, and by default will be installed at C:\Program Files (x86)\APCOAT\maxentPredictors. It is recommended to either place additional compatible SDM predictor variable rasters in this folder, or place "USGS_LULC.tif" in a folder containing additional compatible SDM predictor variable rasters. See below for additional recommended compatible SDM predictor variable rasters and formatting requirements for creating additional compatible rasters. Users may also select different predictor variable rasters, but they must all conform to the listed formatting requirements.
    • If a custom predictor variable is selected that contains categorical data such as land cover, the user must double click the name of the predictor in the selection window and add '(factor)' to the end of the name. This is done automatically for "USGS_LULC.tif", the land use/land cover raster included with APCOAT.
  4. - Select the species you would like to model. All selected species will be modeled using the same SDM predictor variables. Hold Ctrl to add single species to the selection, or Shift to add multiple species to the selection.
  5. - Select the SDM predictor variables you would like to be included in SDM generation and evaluation. Hold Ctrl to add single variables to the selection, or Shift to add multiple variables to the selection.
  6. - Specify the correlation threshold. If predictor variables are correlated above this threshold, only the variable with the highest contribution to the distribution model will be kept. All other correlated variables will be excluded.
  7. - Specify the minimum predictor contribution threshold used to exclude SDM predictor variables.
  8. - Specify the minimum species distribution modeling threshold. This value is used to filter out low quality habitat values.
  9. - Click "Generate Species Distribution Models". APCOAT will generate the following files in the "\SpeciesDistModels\" folder, named as shown below:
    • Final SDM raster without the modeling threshold applied
      • [Scientific Name]_sdm_final_raster_raw.tif
    • SDM raster showing only values that exceed the modeling threshold
      • [Scientific Name]_sdm_final_raster_nan_[threshold].tif
    • SDM raster classifying locations where final values exceed the modeling threshold as 1
      • [Scientific Name]_sdm_final_raster_msk_[threshold].tif
    • Polygon shapefile showing the coverage of the raw SDM raster
      • [Scientific Name]_sdm_final_range_raw.shp
    • Polygon shapefile showing the coverage of the SDM where the modeling threshold is exceeded
      • [Scientific Name]_sdm_final_range_[threshold].shp

    The following files will also be created when a species is included in a co-occurrence assessment:
    • SDM raster rescaled to match the resolution, projection, and cell alignment of the CDL
      • [Scientific Name]_sdm_final_raster_[threshold]_res.tif
    • Polygon shapefile showing the coverage of the SDM where the modeling threshold is exceeded, matching the projection of the CDL
      • [Scientific Name]_sdm_final_range_[threshold]_prj.shp
    • Captions for diagnostic image and text files detailing model iteration and production which are included in the assessment report.
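
The correlation and contribution thresholds in steps 6 and 7 can be illustrated with a simplified, single-pass filter. This is a sketch under stated assumptions and not APCOAT's iterative model selection: the function name, the input dictionaries, and all values are hypothetical, and contributions are assumed to be expressed in percent.

```python
def select_predictors(contribution, correlation, cor_threshold, min_contribution):
    """Keep the highest-contribution predictor from each correlated group.

    contribution: {predictor name: percent contribution to the model}
    correlation:  {frozenset({name_a, name_b}): correlation coefficient}
    """
    kept = []
    # Visit predictors from highest to lowest contribution.
    for name in sorted(contribution, key=contribution.get, reverse=True):
        if contribution[name] < min_contribution:
            continue  # excluded by the minimum contribution threshold
        too_correlated = any(
            abs(correlation.get(frozenset((name, other)), 0.0)) > cor_threshold
            for other in kept
        )
        if not too_correlated:
            kept.append(name)
    return kept


# Made-up contributions (percent) and pairwise correlations.
contribution = {"BIO1": 40.0, "BIO10": 25.0, "BIO12": 30.0, "elevation": 0.5}
correlation = {
    frozenset(("BIO1", "BIO10")): 0.9,
    frozenset(("BIO1", "BIO12")): 0.2,
    frozenset(("BIO10", "BIO12")): 0.1,
}
selected = select_predictors(contribution, correlation, cor_threshold=0.67, min_contribution=1.0)
# selected == ["BIO1", "BIO12"]
```

Here BIO10 is dropped because it is correlated with the higher-contribution BIO1 above the threshold, and elevation is dropped because its contribution falls below the minimum.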

Species Location Data
 The user may provide their own species occurrence records, or collect them from third parties including:

  • GBIF.org - freely available batches of species occurrence records.
  • FESTF.org - batches of species occurrence records may be accessed through membership.
  • iNaturalist.org - individual species occurrence records. Locations of threatened species are obscured and require additional authorization for access at high resolution.
 When compiling species location data into a single CSV file or multiple CSV files, records must conform to the following requirements:

Field Title                 Data Type   Data Requirements
Scientific Name             Text
Latitude_DecimalDegrees     Float       Must be in decimal degrees
Longitude_DecimalDegrees    Float       Must be in decimal degrees

 An example species location dataset[20] is installed by default in C:\Program Files (x86)\APCOAT\SpeciesLocationData\
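
Species location files can be screened for the same kinds of issues that APCOAT checks on load (record counts, formatting errors, and duplicates). The sketch below is illustrative and is not APCOAT's code: the function name and sample records are hypothetical, and counting only unique records toward the 5-record minimum is an assumption.

```python
import csv
import io
from collections import Counter

MIN_RECORDS = 5  # minimum number of records per species stated in this guide


def check_species_records(handle):
    """Summarize record counts, duplicates, and formatting errors per species."""
    seen = set()
    counts = Counter()
    duplicates = 0
    errors = []
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        name = (row.get("Scientific Name") or "").strip()
        try:
            lat = float(row.get("Latitude_DecimalDegrees") or "")
            lon = float(row.get("Longitude_DecimalDegrees") or "")
        except ValueError:
            errors.append(f"line {line_no}: coordinates are not numeric")
            continue
        if not name or not -90 <= lat <= 90 or not -180 <= lon <= 180:
            errors.append(f"line {line_no}: missing name or coordinates out of range")
            continue
        key = (name, lat, lon)
        if key in seen:
            duplicates += 1  # assumption: duplicates do not count toward the minimum
            continue
        seen.add(key)
        counts[name] += 1
    too_few = sorted(name for name, n in counts.items() if n < MIN_RECORDS)
    return {"counts": dict(counts), "duplicates": duplicates, "errors": errors, "too_few": too_few}


# Made-up records: "Species A" and "Species B" are placeholder names.
rows = ["Scientific Name,Latitude_DecimalDegrees,Longitude_DecimalDegrees"]
rows += [f"Species A,{30.0 + i / 10},-97.7" for i in range(5)]
rows += ["Species A,30.0,-97.7"]  # duplicate of the first record
rows += ["Species B,41.0,-88.0", "Species B,41.5,-88.2", "Species A,abc,-97.7"]
summary = check_species_records(io.StringIO("\n".join(rows) + "\n"))
```

In this made-up file, "Species B" would be flagged because it has only two records.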

Species Distribution Predictor Variables
 Species distribution predictor variables used in Maxent are formatted as rasters and must be of identical spatial resolution, projection, and cell alignment. The recommended source for these rasters is WorldClim[21]. The WorldClim datasets are available at 30 second spatial resolution and include climate data averaged over 1970 - 2000, "bioclimatic" variables derived from the same climate data specifically for species distribution modeling, and elevation data. The land cover raster included with APCOAT is installed by default in C:\Program Files (x86)\APCOAT\maxentPredictors and was developed by USGS[22] and processed to match the formatting of the 30 second resolution WorldClim rasters. The user may process additional predictor variables to match these layers, or the recommended layers may be ignored as long as the additional layers have matching resolution, projection, and cell alignment. Resampling, reprojection, and cell alignment functions are available in GIS software such as ArcGIS or QGIS.

Reference table for WorldClim bioclimatic variables
BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 = Isothermality ((BIO2/BIO7) x100)
BIO4 = Temperature Seasonality (standard deviation x100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter
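
Two of the variables in the reference table are derived from others, which is one reason bioclimatic predictors are often strongly correlated. A short worked example of the BIO7 and BIO3 definitions, using made-up values for a single cell:

```python
# Made-up values for one raster cell, in degrees Celsius.
bio2 = 12.0   # Mean Diurnal Range
bio5 = 30.0   # Max Temperature of Warmest Month
bio6 = -10.0  # Min Temperature of Coldest Month

bio7 = bio5 - bio6         # Temperature Annual Range (BIO5-BIO6)
bio3 = 100.0 * bio2 / bio7  # Isothermality ((BIO2/BIO7) x100)
# bio7 == 40.0 and bio3 == 30.0 for these values
```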

5.4 Co-Occurrence Assessment

Methods
 With APCOAT, co-occurrence is measured by multiplying batches of SDM rasters by pesticide use footprints and summarizing the resulting rasters by zones of interest for ease of interpretation. When calculating co-occurrence, APCOAT also produces a report detailing the input variables, methods, and outputs involved in each step of the analysis.
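
The multiplication and zonal summary described above can be illustrated on small grids. This is a minimal sketch using plain Python lists in place of rasters: the function names, the 0 to 1 value scale, the zone labels, and the use of the mean as the summary statistic are assumptions made for illustration.

```python
def co_occurrence(use_prob, sdm_prob):
    """Cell-by-cell product of two aligned grids of probabilities (0 to 1)."""
    return [[u * s for u, s in zip(use_row, sdm_row)]
            for use_row, sdm_row in zip(use_prob, sdm_prob)]


def zonal_mean(values, zones):
    """Mean cell value per zone label (for example, a county or watershed ID)."""
    totals, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Made-up 2 x 2 grids that share the same resolution and alignment.
use_footprint = [[0.5, 0.0], [0.25, 1.0]]
sdm = [[0.5, 0.75], [0.0, 0.5]]
zones = [["A", "A"], ["B", "B"]]

product = co_occurrence(use_footprint, sdm)  # [[0.25, 0.0], [0.0, 0.5]]
summary = zonal_mean(product, zones)         # {"A": 0.125, "B": 0.25}
```

A cell contributes to co-occurrence only when both the use probability and the habitat probability are non-zero, which is why the two grids must first be brought to the same resolution, projection, and cell alignment.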

To create probabilistic co-occurrence assessments:

  1. - Click the "Co-Occurrence" tab.
  2. - Confirm that the path to the use footprints folder is correctly associated with the current project. If not, click "Browse..." and navigate to the "UseFootprints" folder associated with the currently loaded assessment project database.
  3. - Select the crop footprints of interest.
  4. - Confirm that the path to the SDM folder is correctly associated with the current project. If not, click "Browse..." and navigate to the "SpeciesDistModels" folder associated with the currently loaded assessment project database.
  5. - Select the SDMs of interest.
  6. - Select the desired summary resolution.
  7. - Click "Generate Co-Occurrence Report". APCOAT will generate the following files for each combination of crop and species analyzed in "[Project Folder]\CoOccurrence\[Usage Footprint]\[Species name]\[SDM run number]\...", named as shown below:



6. Example Assessment

 This section shows the expected modeling outputs using an example co-occurrence assessment between two endangered species and applications of a pesticide to two crops. For each modeling tab the inputs, intermediate results, and selected report document figures are shown:
    6.1 Pesticide Use Footprints Example
    6.2 Species Distribution Modeling Example
    6.3 Co-Occurrence Modeling Example

6.1 Pesticide Use Footprints Example

Inputs

  • The desired output is the probabilistic usage of malathion on alfalfa and cotton.
  • The usage data statistic of interest is the 90th percentile of annual usage as measured by USGS ePest estimates.
  • Application rates are provided in imperial units, 7.5 lbs/acre for both alfalfa and cotton.
  • Processing time was 70 minutes using a laptop with an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz processor and 8 GB of RAM.
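
For reference, the imperial application rate above can be converted to metric units as follows. The token 84064 in the example output file names appears consistent with this rate expressed as 8.4064 kg/ha multiplied by 10,000; that reading is an inference from the example, not a documented naming rule, and the function name is hypothetical.

```python
KG_PER_LB = 0.45359237        # exact by definition
HA_PER_ACRE = 0.40468564224   # exact for the international acre


def lb_per_acre_to_kg_per_ha(rate_lb_per_acre):
    """Convert an application rate from lb/acre to kg/ha."""
    return rate_lb_per_acre * KG_PER_LB / HA_PER_ACRE


rate_kg_per_ha = lb_per_acre_to_kg_per_ha(7.5)   # about 8.4064 kg/ha
rate_token = int(round(rate_kg_per_ha * 10000))  # 84064
```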

Intermediate Results
* indicates data will be included in co-occurrence report

  • Malathion_PCT_A84064T84064_ePest_state_90thpercentile.csv
    • Pesticide usage, maximum potential usage, and PCT for all crops, years, and regions
  • Malathion_PCTstats_A84064T84064_ePest_state_90thpercentile.csv
    • The annual usage statistic (90th percentile) calculated for each crop and region
  • * Malathion_PCTstatsRep_Alfalfa_84064_ePest_state_90thpercentile.csv
    • A range of statistics from minimum to maximum for a single crop and each region
  • Malathion_PCT_Alfalfa_84064_ePest_state_90thpercentile.tif
    • The national PCT raster generated for the specified usage statistic and crop at 30m resolution. The raster is in integer format and multiplied by a factor of 10,000 such that 100% PCT = 10,000 and 50% PCT = 5,000.
  • Malathion_ProbUseFp_Alfalfa_84064_ePest_state_90thpercentile.tif
    • The national probabilistic usage raster for the specified usage statistic and crop at 30m resolution. The raster is in integer format and multiplied by a factor of 10,000 such that 100% usage probability = 10,000 and 50% usage probability = 5,000. It is produced by multiplying the PCT raster by the probabilistic crop footprint included in the APCOAT installation.
  • Malathion_ProbUseFp_Alfalfa_84064_ePest_state_90thpercentile_64x.tif
    • The national probabilistic usage raster for the specified usage statistic and crop, showing the maximum values from the above raster at 1/64th the original spatial resolution (1,920m). The raster is in integer format and multiplied by a factor of 10,000 such that 100% usage probability = 10,000 and 50% usage probability = 5,000.
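
The integer scaling and the maximum-value downsampling described above can be sketched on a small grid. This illustration uses plain Python lists rather than rasters: the function names and grid values are hypothetical, and a factor of 2 stands in for the factor of 64 that turns 30m cells into 1,920m cells.

```python
SCALE = 10_000  # 100% = 10,000 and 50% = 5,000, as described above


def to_scaled_int(probability):
    """Store a probability (0 to 1) as an integer on the 0 to 10,000 scale."""
    return int(round(probability * SCALE))


def downsample_max(grid, factor):
    """Aggregate a grid by taking the maximum of each factor x factor block."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(0, n_rows, factor):
        out_row = []
        for c in range(0, n_cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, n_rows))
                     for j in range(c, min(c + factor, n_cols))]
            out_row.append(max(block))
        result.append(out_row)
    return result


# Made-up 4 x 4 grid of scaled usage probabilities.
fine = [[to_scaled_int(p) for p in row] for row in [
    [0.10, 0.20, 0.00, 0.05],
    [0.30, 0.40, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.50],
    [0.00, 0.25, 0.75, 0.50],
]]
coarse = downsample_max(fine, 2)  # [[4000, 500], [2500, 10000]]
```

Taking the maximum rather than the mean keeps small, high-probability areas visible in the reduced-resolution raster used for display and review.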
Co-Occurrence Report Result

 When a given probabilistic crop footprint is selected for use in a co-occurrence report, the map figure shown here will be automatically generated and included in the report document and co-occurrence folder ([Project Folder]\CoOccurrence\Malathion_ProbUseFp_Alfalfa_84064_ePest_state_90thpercentile.png). This example is for the 90th percentile of annual malathion usage by state as measured by USGS ePest estimates, multiplied by the probability that a given 30m pixel will be planted with alfalfa as measured by the USDA Cropland Data Layer.


Figure 1. Probabilistic use footprint for malathion applications on alfalfa.

6.2 Species Distribution Modeling Example

Inputs
  • The desired outputs are Species Distribution Models (SDM) for two species, Eurycea tonkawae and Lycaeides melissa samuelis.
  • The selected species distribution predictor variables include all of the recommended bioclimatic and elevation predictors, as well as the land use/land cover predictor included with APCOAT.
  • The correlation threshold used to exclude combinations of predictors that are strongly correlated with one another is left at the default value of 0.67.
  • The predictor contribution threshold used to exclude predictors during iterative model evaluation is left at the default value of 1 percent.
  • The species distribution modeling threshold used to filter out regions of low quality habitat is specified at 0.2. Regions of the final SDM with probabilistic habitat quality below 20% will not be included in the output raster.
  • Processing time was 80 minutes using a laptop with an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz processor and 8 GB of RAM.
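
The effect of the species distribution modeling threshold listed in the inputs above can be sketched on a small grid. This is an illustration, not APCOAT's code: the function name and grid values are hypothetical, treating values equal to the threshold as retained is an assumption, and so is writing 0 to mask cells that fall below the threshold. The "200" in the example output file names appears to correspond to the 0.2 threshold multiplied by 1,000, which is an inference from this example.

```python
def apply_threshold(raw, threshold):
    """Return (thresholded, mask) grids for a raw SDM grid of 0 to 1 values."""
    # Values below the threshold are set to 0 in the thresholded grid.
    thresholded = [[v if v >= threshold else 0.0 for v in row] for row in raw]
    # The mask marks retained cells with 1 (0 elsewhere is an assumption).
    mask = [[1 if v >= threshold else 0 for v in row] for row in raw]
    return thresholded, mask


# Made-up raw SDM values.
raw_sdm = [[0.05, 0.20, 0.45],
           [0.80, 0.19, 0.00]]
thresholded, mask = apply_threshold(raw_sdm, 0.2)
# thresholded == [[0.0, 0.2, 0.45], [0.8, 0.0, 0.0]]
# mask == [[0, 1, 1], [1, 0, 0]]
file_token = round(0.2 * 1000)  # 200
```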
Intermediate Results
* indicates data will be included in co-occurrence report
  • * Eurycea tonkawae_run001Parameters.txt
    • Each modeling parameter is listed on a separate line: all species predictor variable file names, discrete predictor variable file names, predictor correlation threshold, predictor contribution threshold
  • * Eurycea tonkawae_PredCor.png
    • Correlation between predictor variables
  • Eurycea tonkawae_sdm_iter[1...5]_real[1-5].rds
    • Intermediate modeling SDM results. Five model realizations are generated for each iteration of predictor variable combinations. In this example five iterations were run before iteration number 4 was chosen.
  • Eurycea tonkawae_Contribution_Iter[1-X]_Real[1-5].png
    • The contribution of each predictor variable for all 5 realizations of each model iteration.
  • Eurycea tonkawae_ResTable_Iter[1-5].rds
    • Diagnostic statistics for all 5 realizations of each model iteration.
  • Eurycea tonkawae_ResVarsTable_Iter[1-5].rds
    • Table showing the contribution of each predictor variable for all 5 realizations of each iteration.
  • * Eurycea tonkawae_ModelFitAUC_iter[3-5].png
    • Model fit as measured by total Area Under Curve for each model iteration and all previous iterations.
  • Eurycea tonkawae_sdm_final_iter4_real[1-5].rds
    • Five realizations of the final selected SDM iteration.
  • Eurycea tonkawae_sdm_final_raster.png
    • A map showing the average of the five final SDM realizations, and species locations. Be sure to consider any restrictions associated with redistributing species location data when publishing this figure.
  • Eurycea tonkawae_sdm_sd_final_raster.png
    • A map showing the standard deviation of the five final SDM realizations, and species locations. Be sure to consider any restrictions associated with redistributing species location data when publishing this figure.
  • Eurycea tonkawae_sdm_final_raster_raw.tif
    • The final SDM raster without the species distribution model threshold applied.
  • * Eurycea tonkawae_final_stats.csv
    • CSV version of ResTable_Iter[1-5] for the final selected model iteration.
  • * Eurycea tonkawae_sdm_final_auc.csv
    • Model fit as measured by Area Under Curve for each of the five realizations of the final model iteration.
  • * Eurycea tonkawae_sdm_final_contribution.png
    • Contribution to model of each predictor variable included in the final model iteration.
  • * Eurycea tonkawae_sdm_final_contvariablefit.png
    • The distributions of values sampled randomly and from species observations for all continuous variables included in the final model iteration.
  • * Eurycea tonkawae_sdm_final_discvariablefit.png
    • The distributions of values sampled randomly and from species observations for all discrete variables included in the final model iteration.
  • Eurycea tonkawae_sdm_final_range_200.shp (and associated .dbf, .prj, .shx files)
    • A shapefile showing the shape of the final model iteration with the species distribution model threshold applied.
  • Eurycea tonkawae_sdm_final_range_200_prj.shp (and associated .dbf, .prj, .shx files)
    • The shapefile described above, projected in the Albers Conical Equal Area coordinate system to match APCOAT pesticide usage footprints.
  • Eurycea tonkawae_sdm_final_range_raw.shp (and associated .dbf, .prj, .shx files)
    • A shapefile showing the extent of the final model iteration.
  • Eurycea tonkawae_sdm_final_raster_200.tif
    • A raster of the final model iteration with the species distribution model threshold applied. Values below the threshold are set to 0.
  • Eurycea tonkawae_sdm_final_raster_200_res.tif
    • The raster described above, resampled and reprojected to match the projection, resolution, and alignment of APCOAT pesticide usage footprints.
  • Eurycea tonkawae_sdm_final_raster_msk_200.tif
    • A raster showing the extent of the final model iteration with the species distribution model threshold applied.
  • Eurycea tonkawae_sdm_final_raster_nan_200.tif
    • A raster of the final model iteration with the species distribution model threshold applied. Values below the threshold are set to NoData.
  • Eurycea tonkawae_sdm_sd_final_raster.tif
    • A raster showing the standard deviation of the five realizations of the final model iteration.
  • Eurycea tonkawae__[County, CRD, HUC8, HUC12, State].shp (and associated .dbf, .prj, .shx files)
    • The zones of interest within the extent of the final model iteration. These are generated when a species is selected for inclusion in a co-occurrence report.
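
The relationship between the mean, standard deviation, thresholded, and NoData rasters listed above can be sketched in a few lines. This is a minimal illustration only, not APCOAT's implementation: the 2 x 2 rasters are made-up values, suitability is assumed to be stored as a probability between 0 and 1, the "_200" suffix is assumed to correspond to the 20% threshold, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Five hypothetical realizations of a 2 x 2 suitability raster (values 0-1).
realizations = [
    [[0.10, 0.30], [0.50, 0.90]],
    [[0.20, 0.30], [0.40, 0.80]],
    [[0.10, 0.20], [0.60, 0.90]],
    [[0.10, 0.30], [0.50, 0.70]],
    [[0.00, 0.40], [0.50, 0.70]],
]
THRESHOLD = 0.20  # assumed to correspond to the "_200" file suffix (20%)


def cellwise(func, rasters):
    """Apply func to the stack of values found at each cell position."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [func([raster[i][j] for raster in rasters]) for j in range(cols)]
        for i in range(rows)
    ]


mean_raster = cellwise(mean, realizations)  # analogous to the final mean raster
sd_raster = cellwise(stdev, realizations)   # sample SD; an assumption

# Cells below the threshold set to 0 versus set to NoData (None here).
zero_filled = [[v if v >= THRESHOLD else 0.0 for v in row] for row in mean_raster]
nodata_filled = [[v if v >= THRESHOLD else None for v in row] for row in mean_raster]
```

Here the upper-left cell has a mean suitability of about 0.1, so it becomes 0 in the zero-filled raster and NoData in the other, while the remaining cells keep their mean values.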
Co-Occurrence Report Result

 When a given species distribution model is selected for use in a co-occurrence report, the map figure shown here is automatically generated and included in the report document and co-occurrence folder ([Project Folder]\CoOccurrence\HabitatSuitability_200.png). This example is for Eurycea tonkawae, modeled with all of the recommended bioclimatic and elevation predictors, as well as the land use/land cover predictor included with APCOAT, and a model threshold of 20%.


Figure 8. Final mean SDM output for Eurycea tonkawae.

6.3 Co-Occurrence Assessment Example

Inputs

  • The desired outputs are reports on probabilistic spatial co-occurrence between applications of malathion to cotton and alfalfa, and areas where the probability of suitable habitat is greater than 20% for Eurycea tonkawae and Lycaeides melissa samuelis.
  • The spatial co-occurrence is to be summarized by HUC8 watershed for visualization purposes.
  • Processing time was 36 minutes using a laptop with an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz processor and 8 GB of RAM.

Intermediate Results
* indicates data will be included in co-occurrence report
  • [Project Folder]\CoOccurrence\Malathion\Cotton_84064_ePest_state_90thpercentile\Eurycea tonkawae\run001_200
    • Directory for results of a single co-occurrence assessment. Separate directories are created for each combination of species and crop.
  • * Eurycea tonkawae_run001Parameters.txt
    • A copy of the SDM input parameters file. Each modeling parameter is listed on a separate line: all species predictor variable file names, discrete predictor variable file names, predictor correlation threshold, predictor contribution threshold.
  • ProbUseFp.tif
    • Probabilistic use footprint raster clipped to the extent of the species distribution model. The raster is in integer format and multiplied by a factor of 10,000 such that 100% usage probability = 10,000 and 50% usage probability = 5,000.
  • Coo_200.tif
    • Raster showing values of probabilistic co-occurrence between the pesticide usage raster and the SDM raster at 30 m resolution.
  • * Coo_200__HUC8_NHDPlusV21.csv
    • Zonal summary of average co-occurrence raster values
  • Coo_200__HUC8_NHDPlusV21.shp (and associated .dbf, .prj, .shx files)
    • Shapefile containing zonal averages of co-occurrence raster values
  • Eurycea tonkawae_sdm_final_range_200.shp (and associated .dbf, .prj, .shx files)
    • Shapefile of the SDM used for the co-occurrence assessment. It contains the average co-occurrence raster value for each individual SDM polygon.
  • Eurycea tonkawae_sdm_final_range_200.csv
    • Summary of the average co-occurrence raster value for each individual SDM polygon
  • * SdmOverview.png
    • Map of the extent of the SDM
  • Coo_200__HUC8_NHDPlusV21_wgs84.shp (and associated .dbf, .prj, .shx files)
    • Shapefile containing zonal averages of co-occurrence raster values, projected to the WGS 1984 datum
  • * Coo_200__HUC8_NHDPlusV21.png
    • Map of zonal co-occurrence averages
  • * Malathion_ProbUseFp_Cotton_84064_ePest_state_90thpercentile.png
    • Map of pesticide usage probability
  • * HabitatSuitability_200.png
    • Map of habitat suitability derived from SDM
  • APCOAT_report_Coo_200__HUC8_NHDPlusV21.html
    • Probabilistic co-occurrence report containing modeling inputs, outputs, and assessment
  • Coo_200_2x.tif
    • Co-occurrence raster averaged at 1/2 resolution (60 m)
  • Coo_200_4x.tif
    • Co-occurrence raster averaged at 1/4 resolution (120 m)
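
The way these outputs relate to one another can be sketched as follows: the co-occurrence raster is the cell-by-cell product of the usage probability and the SDM value, the coarser rasters are block averages, and the zonal summaries are per-zone means. This is a minimal sketch, not APCOAT's implementation. The 4 x 4 grids and zone IDs are made up, the SDM is assumed to be stored as a probability between 0 and 1, and the handling of NoData cells and raster edges is not addressed.

```python
SCALE = 10_000  # use footprint is stored as integers; 10,000 = 100% usage probability

# Hypothetical inputs on the same 30 m grid.
use_fp_scaled = [
    [10000, 5000, 0, 0],
    [10000, 5000, 0, 0],
    [2500, 2500, 0, 0],
    [2500, 2500, 0, 0],
]
sdm = [  # thresholded habitat suitability, assumed to be in the range 0-1
    [0.8, 0.8, 0.4, 0.0],
    [0.8, 0.8, 0.4, 0.0],
    [0.4, 0.4, 0.0, 0.0],
    [0.4, 0.4, 0.0, 0.0],
]
zones = [  # hypothetical zone ID per cell, for example one ID per HUC8 watershed
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]

# Cell-by-cell product of usage probability and habitat suitability.
coo = [
    [(use / SCALE) * suit for use, suit in zip(use_row, sdm_row)]
    for use_row, sdm_row in zip(use_fp_scaled, sdm)
]


def block_mean(raster, factor):
    """Average non-overlapping factor x factor blocks (factor 2: 30 m -> 60 m).

    Assumes the raster dimensions are exact multiples of factor.
    """
    rows, cols = len(raster), len(raster[0])
    result = []
    for i in range(0, rows, factor):
        out_row = []
        for j in range(0, cols, factor):
            block = [raster[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result


def zonal_mean(raster, zone_ids):
    """Return the mean raster value for each zone ID."""
    totals, counts = {}, {}
    for value_row, zone_row in zip(raster, zone_ids):
        for value, zone in zip(value_row, zone_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


coo_2x = block_mean(coo, 2)          # analogous to Coo_200_2x.tif
coo_by_zone = zonal_mean(coo, zones)  # analogous to the HUC8 zonal summary
```

In this toy grid, a cell with 50% usage probability and a suitability of 0.8 has a co-occurrence value of about 0.4, and zone 2, which has no pesticide use, averages 0.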
Co-Occurrence Report Results

 A report containing model inputs and outputs will be generated for each selected combination of species and crop and located in
[Project Folder]\CoOccurrence\[Pesticide]\[Crop]_[Application Rate]_[Usage Data Source]_[Usage Data Summary Resolution]_[Usage Data Statistic]\[Species]\[SDM Version]\.
 The example table shows statistics regarding co-occurrence over the species range, and over the extent of the SDM. The example figure shows probabilistic co-occurrence between the extent of the SDM of Eurycea tonkawae and malathion applications on cotton, summarized at the HUC8 scale. An example case study composed of ten compiled co-occurrence reports is available at https://stone-env.com/APCOAT.
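
For users who post-process results with scripts, the report folder pattern above can be assembled programmatically. The sketch below is illustrative only: the function name and the project folder are hypothetical, and the mapping of the example folder name (Cotton_84064_ePest_state_90thpercentile) onto the pattern fields is inferred by matching the example in this guide against the pattern.

```python
from pathlib import PureWindowsPath


def coo_report_dir(project, pesticide, crop, rate, source,
                   resolution, statistic, species, sdm_version):
    """Build the co-occurrence results folder path from its components."""
    # [Crop]_[Application Rate]_[Usage Data Source]_[Summary Resolution]_[Statistic]
    crop_folder = "_".join([crop, str(rate), source, resolution, statistic])
    return PureWindowsPath(project, "CoOccurrence", pesticide,
                           crop_folder, species, sdm_version)


example = coo_report_dir(
    project=r"C:\APCOAT\MyProject",  # hypothetical project folder
    pesticide="Malathion",
    crop="Cotton",
    rate=84064,                      # field mapping inferred from the example
    source="ePest",
    resolution="state",
    statistic="90thpercentile",
    species="Eurycea tonkawae",
    sdm_version="run001_200",
)
print(example)
```

PureWindowsPath is used so that the backslash-separated path is produced on any operating system.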

Table 6. Statistics of probabilistic co-occurrence between the range of Eurycea tonkawae and malathion applications on cotton.
Average co-occurrence for species range            8.18e-09%
Number of HUC8 polygons included in assessment     92
Maximum average HUC8 co-occurrence                 2.18e-09%
Minimum average HUC8 co-occurrence                 0.00e+00%



Figure 9. Map showing probabilistic co-occurrence between the range of Eurycea tonkawae and malathion applications on cotton, summarized at the HUC8 scale.

7. Citation

Suggested citation for this manual:
Dunne, J., Richardson, L., Rathjens, H., Winchell, M. (2022). User Guide Automated Probabilistic Co-Occurrence Assessment Tool Version 1.0. Stone Environmental Inc., Montpelier, Vermont.

Suggested citation for APCOAT software:
Dunne, J., Richardson, L., Rathjens, H., Winchell, M. (2022). Automated Probabilistic Co-Occurrence Assessment Tool. Stone Environmental Inc., Montpelier, Vermont.


8. References

1. Budreski, K., Winchell, M., Padilla, L., Bang, J., & Brain, R. A. (2016). A probabilistic approach for estimating the spatial extent of pesticide agricultural use sites and potential co-occurrence with listed species for use in ecological risk assessments. Integrated environmental assessment and management, 12(2), 315-327. https://doi.org/10.1002/ieam.1677

2. Richardson, L., Bang, J., Budreski, K., Dunne, J., Winchell, M., Brain, R. A., & Feken, M. (2019). A probabilistic co-occurrence approach for estimating likelihood of spatial overlap between listed species distribution and pesticide use patterns. Integrated environmental assessment and management, 15(6), 936-947. https://doi.org/10.1002/ieam.4191

3. Wieben, C.M. (2019). Estimated Annual Agricultural Pesticide Use by Major Crop or Crop Group for States of the Conterminous United States, 1992-2017 (ver. 2.0, May 2020): U.S. Geological Survey data release. https://doi.org/10.5066/P9HHG3CT.

4. Phillips SJ, Anderson RP, Dudík M, Schapire RE, Blair ME. (2017). Opening the black box: an open-source release of Maxent. Ecography. 40(7):887-893. https://doi.org/10.1111/ecog.03049

5. USFWS. (2019). Endangered Species Act Section 7 consultation. https://www.fws.gov/endangered/laws-policies/section-7.html.

6. National Research Council. (2013). Assessing risks to endangered and threatened species from pesticides. https://doi.org/10.17226/18344

7. US EPA. (2016). Biological evaluation chapters for diazinon ESA assessment. US EPA https://www.epa.gov/endangered-species/biological-evaluation-chapters-diazinon-esa-assessment.

8. US EPA. (2016). Biological evaluation chapters for malathion ESA assessment. US EPA https://www.epa.gov/endangered-species/biological-evaluation-chapters-malathion-esa-assessment

9. US EPA. (2016). Biological evaluation chapters for chlorpyrifos ESA assessment. US EPA https://www.epa.gov/endangered-species/biological-evaluation-chapters-chlorpyrifos-esa-assessment.

10. US EPA. (2020). Revised method for national level endangered species risk assessment process for biological evaluations of pesticides. US EPA https://www3.epa.gov/pesticides/nas/revised/revised-method-march2020.pdf

11. Richardson, L. L., Dunne, J., Feken, M., Brain, R., Ghebremichael, L., & Winchell, M. (2021). Probabilistic co-occurrence assessment for suites of listed species. Integrated environmental assessment and management. https://doi.org/10.1002/ieam.4542

12. Moskwik, M., Mainali, K., Pavelka, M., Nicolaysen, T., Juliusson, L., Chhetri, D., Shultz, M. (2019). USFWS refined range maps for threatened and endangered species--standard operating procedures. V. 3.0. https://ecos.fws.gov/docs/SR_SOP/SDM_SOP_Final_14Nov2019.pdf

13. United States Department of Agriculture, National Agricultural Statistics Service. (2019). Cropland data layer. https://nassgeodata.gmu.edu/CropScape/

14. Multi-Resolution Land Characteristics Consortium. (2021). NLCD 2016 Land Cover (CONUS) | Multi-Resolution Land Characteristics (MRLC) Consortium. https://www.mrlc.gov/data/nlcd-2016-land-cover-conus

15. USDA - National Agricultural Statistics Service - Census of Agriculture. (2021). https://www.nass.usda.gov/AgCensus/

16. United States Department of Agriculture, National Agricultural Statistics Service. (2020). Cropland Data Layer - Metadata. https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/meta.php

17. United States Census Bureau. (2021). ANSI and FIPS codes. https://www.census.gov/library/reference/code-lists/ansi.html

18. USDA National Agricultural Statistics Service. (2018). USDA - National Agricultural Statistics Service - Data and Statistics - County Data FAQs. https://www.nass.usda.gov/Data_and_Statistics/County_Data_Files/Frequently_Asked_Questions/

19. Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and distributions, 17(1), 43-57. https://doi.org/10.1111/j.1472-4642.2010.00725.x

20. GBIF.org. (2022). GBIF Occurrence Download. https://doi.org/10.15468/dl.yu2nqj

21. Fick SE, Hijmans RJ. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology. 37(12):4302-4315. https://doi.org/10.1002/joc.5086

22. Brown, J. F., Loveland, T. R., Merchant, J. W., Reed, B. C., & Ohlen, D. O. (1993). Using multisource data in global land-cover characterization: Concepts, requirements, and methods. Photogrammetric Engineering and Remote Sensing, 59(6), 977-987. https://pubs.er.usgs.gov/publication/70187631