BIRDIE: Species abundance
v02-birdie-spp-abundance.Rmd
Introduction
There are two main analytical modules in BIRDIE: distribution and abundance.
In this document we will see how to use the abundance (ABU) module. To do this, we will look at the control script used to run the module and then break it down into its different sections to understand what each of the functions involved in the analysis does.
The control script
The control script can be found at `/analysis/scripts/pipeline_script_abu.R`. This script has three parts:
- Configuration
- Create logs
- Run modules
We will go through each of these parts below.
The script is really just a for loop over species in which the main pipeline function of the abundance module, `ppl_run_pipe_abu1()`, is executed. This means that if we are only interested in one species, we can use the `ppl_run_pipe_abu1()` function directly.
To run the abundance module of the pipeline, it is enough to run the script with `source()`. However, with the default values the script runs the full pipeline for all species and for several years, which might take a long time. We will now go through the different steps of the pipeline to better understand how to configure it to do what we want.
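The loop structure described above can be sketched without BIRDIE installed. In this minimal sketch, `run_species_stub()` is a hypothetical stand-in for `ppl_run_pipe_abu1()` that only returns a status string, the species codes are made up, and we assume `config$species` is a plain vector of species codes (as the loop in the control script suggests), so restricting it to a subset limits the run to those species.

```r
# Hypothetical stand-in for ppl_run_pipe_abu1(): the real function prepares
# data and fits models; this stub only returns a status string so that the
# loop structure can be run without BIRDIE installed.
run_species_stub <- function(sp_code, config) {
  paste0("done_", sp_code)
}

# Hypothetical config: in BIRDIE this object comes from configPipeline().
# The species codes below are made up for illustration.
config <- list(species = c(101, 102, 103))

# Restrict the run to a subset of species instead of running all of them
config$species <- config$species[config$species %in% c(102, 103)]

statuses <- character(length(config$species))
for (i in seq_along(config$species)) {
  sp_code <- config$species[i]
  message("Working on species ", sp_code, " (", i, " of ", length(config$species), ")")
  statuses[i] <- run_species_stub(sp_code, config)
}
names(statuses) <- config$species
print(statuses)
```

Using `seq_along()` rather than `1:length()` means the loop simply does nothing if the species vector ends up empty, instead of iterating over `1:0`.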
Note that while this is the default script we use for running the pipeline, there is nothing special about it, and we could use something different that suits our needs. The functionality of the pipeline comes from its functions.
Configuration
```r
# Configure pipeline
config <- configPipeline(
  year = 2021,
  dur = 29,
  mod_file = "cwac_ssm_two_season_mean_rev_jump.R",
  package = "jagsUI",
  data_dir = "analysis/data",   # adapt to your local directory structure if needed
  out_dir = "analysis/output",  # adapt to your local directory structure if needed
  server = FALSE
)

# Read in catchment data (TODO: this should go as an argument)
catchment <- sf::read_sf(file.path(config$data_dir, "quinary_catchmt_22.shp"))

# Keep relevant columns, simplify geometries and re-project to WGS84 (EPSG:4326)
catchment <- catchment %>%
  dplyr::select(QUATERNARY, Province, UNIT_ID) %>%
  sf::st_simplify(preserveTopology = TRUE, dTolerance = 1000) %>%
  sf::st_transform(crs = sf::st_crs(4326))
```
In the configuration section above, we create a `config` object using the `configPipeline()` function. Several functions use this object throughout the pipeline.
The `configPipeline()` function allows us to tell the pipeline which models we want to run, for which years, which packages we want to use, etc. For detailed information, see `?configPipeline`.
Rather than specifying the covariates that we will use in our models, as we did for the distribution module, here we only pass the model file name to the function. `configPipeline()` will look for the model file in `analysis/models`.
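Because only the file name is passed, a typo in `mod_file` would only surface when the model file is looked up. A minimal sketch of resolving a model file name against a models directory and failing early is shown below; `find_model_file()` is a hypothetical helper, not part of BIRDIE, and a temporary directory stands in for `analysis/models`.

```r
# Hypothetical helper (not part of BIRDIE): resolve a model file name against
# a models directory and fail early if the file does not exist.
find_model_file <- function(mod_file, models_dir = file.path("analysis", "models")) {
  path <- file.path(models_dir, mod_file)
  if (!file.exists(path)) {
    stop("Model file not found: ", path, call. = FALSE)
  }
  normalizePath(path)
}

# Demonstration with a temporary directory standing in for analysis/models
models_dir <- file.path(tempdir(), "models")
dir.create(models_dir, showWarnings = FALSE, recursive = TRUE)
invisible(file.create(file.path(models_dir, "cwac_ssm_two_season_mean_rev_jump.R")))

mod_path <- find_model_file("cwac_ssm_two_season_mean_rev_jump.R", models_dir)
print(mod_path)
```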
The only supported package at the moment is `jagsUI`, and all package-related functions written for BIRDIE are stored in the `R/utils-jags.R` file.
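Keeping package-specific code in one place makes it straightforward to check the requested package once and dispatch to the right fitting code. The sketch below illustrates that idea only; `get_fit_backend()` and the placeholder backend it returns are hypothetical and do not reproduce the functions in `R/utils-jags.R`.

```r
# Hypothetical sketch (not BIRDIE code): pick the model-fitting backend named
# in the configuration and stop early for packages that are not supported.
get_fit_backend <- function(package) {
  supported <- c("jagsUI")
  if (!package %in% supported) {
    stop("Package '", package, "' is not supported; use one of: ",
         paste(supported, collapse = ", "), call. = FALSE)
  }
  switch(package,
    jagsUI = function(data, model_file) {
      # Placeholder: a real backend would call jagsUI::jags() here
      list(package = "jagsUI", model_file = model_file)
    }
  )
}

fit_fun <- get_fit_backend("jagsUI")
res <- fit_fun(data = list(y = 1:10), model_file = "cwac_ssm_two_season_mean_rev_jump.R")
str(res)
```

Adding support for another package would then mean adding one entry to `supported` and one branch to the `switch()`.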
(Note: there might be a better way of doing this, such as using environment variables, but this works for now.)
The abundance module allows us to incorporate environmental covariates into the models, although we are not using any at the moment. When we use covariates, we do so at the quinary catchment level: rather than taking the value of a covariate at a specific location, we take the average value observed across the quinary catchment at a given time. We will see more of this in the BIRDIE ABU: Data preparation vignette. The important point here is that we need to load the quinary catchment spatial object in the control script and pass it to the main pipeline function, so that it can be used during data preparation (edit the path to the file as needed).
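The catchment-level averaging described above boils down to a grouped mean. In BIRDIE the values are obtained through Google Earth Engine during data preparation; this minimal base-R sketch only shows the arithmetic, using made-up covariate values (a hypothetical `ndvi` column) sampled at several points inside two catchments identified by `UNIT_ID`.

```r
# Hypothetical covariate values sampled at several points inside two catchments
# (catchment IDs, months and values are made up for illustration)
cov_points <- data.frame(
  UNIT_ID = c("A", "A", "A", "B", "B", "B"),
  month   = c(1, 1, 7, 1, 7, 7),
  ndvi    = c(0.2, 0.4, 0.6, 0.5, 0.1, 0.3)
)

# Average the covariate across each catchment for each month
cov_catchment <- aggregate(ndvi ~ UNIT_ID + month, data = cov_points, FUN = mean)
print(cov_catchment)
```

Each catchment then has a single covariate value per time step (for example, catchment A in month 1 gets the mean of 0.2 and 0.4), which is what the models would use instead of point values.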
Create logs
```r
# Create a log file
createLog(config, log_file = NULL, date_time = NULL, species = NA, model = NA,
          year = NA, data = NA, fit = NA, diagnose = NA, summary = NA,
          package = NA, notes = "Log file created")
```
The pipeline has a system to log all its activity. This is useful to keep track of which species and years the pipeline has run for and whether there have been any problems (e.g., species with too few data points or model fit errors). All activity is stored in .csv files that are saved to `analysis/output/reports`.
For more information, check the logging functions of the BIRDIE package, which are stored in the `utils.R` file; in particular, see `?createLog` for general logs and `?logFitStatus` for logs of model runs.
Note that if we run the pipeline for several species, it makes sense to create a single log file before processing the first species, and then use this file to store information for all species, each one in a row of the .csv file. After this first setup phase, the pipeline will look for the most recent log file and add information to it, regardless of whether information is already present or not.
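The "create once, then append to the most recent file" pattern can be sketched in base R. The helpers below (`create_log()`, `latest_log()`, `append_log()`) are hypothetical and much simpler than BIRDIE's `createLog()` and `logFitStatus()`; the column names and species codes are made up, and a temporary directory stands in for `analysis/output/reports`.

```r
# Hypothetical logging helpers (not BIRDIE's createLog()/logFitStatus()):
# a log file is created once, and later calls append one row per species to
# the most recently modified log file in the reports directory.
create_log <- function(log_dir) {
  dir.create(log_dir, showWarnings = FALSE, recursive = TRUE)
  log_file <- file.path(log_dir,
                        paste0("pipe_log_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".csv"))
  # Write only the header row
  header <- data.frame(date_time = character(0), species = character(0),
                       status = character(0))
  write.csv(header, log_file, row.names = FALSE)
  log_file
}

# Find the most recently modified log file in the directory
latest_log <- function(log_dir) {
  files <- list.files(log_dir, pattern = "^pipe_log_.*\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No log file found in ", log_dir, call. = FALSE)
  files[which.max(file.mtime(files))]
}

# Append one row to the most recent log file, without repeating the header
append_log <- function(log_dir, species, status) {
  row <- data.frame(date_time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    species = species, status = status)
  write.table(row, latest_log(log_dir), sep = ",", append = TRUE,
              col.names = FALSE, row.names = FALSE)
}

# Usage: one log file, one row per (made-up) species code
log_dir <- file.path(tempdir(), "reports")
log_file <- create_log(log_dir)
for (sp in c(101, 102)) append_log(log_dir, species = sp, status = "fit_ok")
log_df <- read.csv(latest_log(log_dir))
print(log_df)
```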
Run modules
```r
for (i in seq_along(config$species)) {

  sp_code <- config$species[i]

  message(paste0("Working on species ", sp_code, " (", i, " of ", length(config$species), ")"))

  # Run abundance pipeline module 1
  status_abu1 <- ppl_run_pipe_abu1(sp_code, config,
                                   steps = c("fit", "diagnose", "summary"),
                                   prep_data_steps = c("subset", "missing", "gee", "model"),
                                   summary_scale = "model",
                                   catchment = catchment,
                                   force_gee_upload = FALSE,
                                   force_gee = FALSE,
                                   monitor = TRUE)

  message(paste("ABU1 status =", status_abu1))
}
```
The final part of the script runs the main function of the abundance module, looping through all species. `ppl_run_pipe_abu1()` can run all the steps of the module (data preparation, model fitting, model diagnostics and model summary) or just some of them. It uses the `config` object to know which package to use for model fitting and where to store model-ready data and model outputs. Note that some of the steps may take quite long. For example, to prepare data, the pipeline connects to Google Earth Engine and annotates data with environmental covariates, which can take a while and also requires an internet connection. Keep this in mind, and note that we can skip some of the steps using the right arguments. For more information, see `?ppl_run_pipe_abu1`.
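To illustrate how a vector of step names lets us skip slow steps, here is a hypothetical stub, `prep_data_stub()`, that reuses the data-preparation step names passed in the control script (`"subset"`, `"missing"`, `"gee"`, `"model"`). It is a sketch of the general pattern only; whether the real `ppl_run_pipe_abu1()` can run later steps without the `"gee"` step (for example, by relying on previously saved covariates) is an assumption to check in `?ppl_run_pipe_abu1`.

```r
# Hypothetical stub (not BIRDIE code): shows how a character vector of step
# names can be used to skip slow steps such as the Google Earth Engine one.
prep_data_stub <- function(prep_data_steps = c("subset", "missing", "gee", "model")) {
  all_steps <- c("subset", "missing", "gee", "model")
  # Validate the requested steps against the known ones
  prep_data_steps <- match.arg(prep_data_steps, choices = all_steps, several.ok = TRUE)
  done <- character(0)
  for (step in all_steps) {  # always run steps in pipeline order
    if (step %in% prep_data_steps) {
      # A real pipeline would do the work for this step here
      done <- c(done, step)
    }
  }
  done
}

prep_data_stub()                                 # all four steps
prep_data_stub(c("model", "subset", "missing"))  # skips "gee", keeps pipeline order
```

Iterating over the full list of steps, rather than over the user's vector, keeps the execution order fixed no matter how the argument is written.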