Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the stantargets R package leverages targets and cmdstanr to ease these burdens. stantargets makes it straightforward to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than with targets alone. stantargets can access all of cmdstanr's major algorithms (MCMC, variational Bayes, and optimization), and it supports both single-fit workflows and multi-rep simulation studies.
https://docs.ropensci.org/stantargets/, tar_stan_mcmc()
tar_stan_compile() creates a target to compile a Stan model on the local file system and return the original Stan model file. It does not compile the model if the compilation is already up to date.
tar_stan_compile( name, stan_file, quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, name of the target. A target
name must be a valid name for a symbol in R, and it
must not start with a dot. Subsequent targets
can refer to this name symbolically to induce a dependency relationship:
e.g. |
stan_file |
(string) The path to a |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile() method of the CmdStanModel class. For details, visit https://mc-stan.org/cmdstanr/reference/.
tar_stan_compile() returns a target object to compile a Stan file. The return value of this target is a character vector containing the Stan model source file and compiled executable file. A change in either file will cause the target to rerun in the next run of the pipeline. See the "Target objects" section for background.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(tar_stan_compile(compiled_model, path))
    })
    targets::tar_make()
  })
}
An example dataset compatible with the model file from tar_stan_example_file().
tar_stan_example_data(n = 10L)
n |
Integer of length 1, number of data points. |
A list with the following elements:
n: integer, number of data points.
x: numeric, covariate vector.
y: numeric, response variable.
true_beta: numeric of length 1, value of the regression coefficient beta used in simulation.
.join_data: a list of simulated values to be appended as a .join_data column in the output of targets generated by functions such as tar_stan_mcmc_rep_summary(). Contains the regression coefficient beta (numeric of length 1) and prior predictive data y (numeric vector).
The tar_stan_example_data() function draws a Stan dataset from the prior predictive distribution of the model from tar_stan_example_file(). First, the regression coefficient beta is drawn from its standard normal prior, and the covariate x is computed. Then, conditional on the beta draw and the covariate, the response vector y is drawn from its Normal(x * beta, 1) likelihood.
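The two-step draw described above can be sketched in base R. This is an illustrative re-implementation, not the package source: the function name simulate_example_data() is hypothetical, and the evenly spaced covariate grid on [-1, 1] is an assumption, since the text only says that x "is computed".

```r
# Hypothetical re-implementation for illustration; not the stantargets source.
simulate_example_data <- function(n = 10L) {
  # Step 1: draw the regression coefficient from its standard normal prior.
  beta <- stats::rnorm(n = 1L, mean = 0, sd = 1)
  # Assumption: the covariate is an evenly spaced grid on [-1, 1].
  x <- seq(from = -1, to = 1, length.out = n)
  # Step 2: draw the response from the Normal(x * beta, 1) likelihood.
  y <- stats::rnorm(n = n, mean = x * beta, sd = 1)
  list(
    n = n,
    x = x,
    y = y,
    true_beta = beta,
    # Simulated values kept alongside the data, mirroring .join_data.
    .join_data = list(beta = beta, y = y)
  )
}

set.seed(1)
example_data <- simulate_example_data(n = 5L)
str(example_data)
```

Keeping true_beta and .join_data in the list lets a simulation study compare posterior summaries against the values that generated each dataset.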
List, dataset compatible with the model file from tar_stan_example_file().
Other examples:
tar_stan_example_file()
tar_stan_example_data()
Overwrites the file at path with a built-in example Stan model file.
tar_stan_example_file(path = tempfile(pattern = "", fileext = ".stan"))
path |
Character of length 1, file path to write the model file. |
NULL (invisibly).
Other examples:
tar_stan_example_data()
path <- tempfile(pattern = "", fileext = ".stan")
tar_stan_example_file(path = path)
writeLines(readLines(path))
tar_stan_gq() creates targets to run the generated quantities of a Stan model and save draws and summaries separately.
tar_stan_gq( name, stan_files, data = list(), fitted_params, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, output_dir = NULL, sig_figs = NULL, parallel_chains = getOption("mc.cores", 1), threads_per_chain = NULL, variables = NULL, variables_fit = NULL, summaries = list(), summary_args = list(), return_draws = TRUE, return_summary = TRUE, draws = NULL, summary = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of Stan model files. If you
supply multiple files, each model will run on the one shared dataset
generated by the code in |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
fitted_params |
Symbol, name of a |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
variables |
(character vector) The variables to include. |
variables_fit |
Character vector of variables to include in the
big |
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
return_draws |
Logical, whether to create a target for posterior draws.
Saves |
return_summary |
Logical, whether to create a target for
|
draws |
Deprecated on 2022-07-22. Use |
summary |
Deprecated on 2022-07-22. Use |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile(), $generate_quantities(), and $summary() methods of the CmdStanModel class. If you previously compiled the model in an upstream tar_stan_compile() target, then the model should not recompile.
tar_stan_gq() returns a list of target objects. See the "Target objects" section for background.
The target names use the name argument as a prefix, and the individual elements of stan_files appear in the suffixes where applicable. As an example, the specific target objects returned by tar_stan_gq(name = x, stan_files = "y.stan", ...) are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with the paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: run the R expression in the data argument to produce a Stan dataset for the model. Returns a Stan data list.
x_gq_y: run generated quantities on the model and the dataset. Returns a cmdstanr CmdStanGQ object with all the results.
x_draws_y: extract draws from x_gq_y. Omitted if return_draws = FALSE. Returns a tidy data frame of draws.
x_summary_y: extract compact summaries from x_gq_y. Omitted if return_summary = FALSE. Returns a tidy data frame of summaries.
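The naming scheme above can be sketched as a small base R helper. gq_target_names() is hypothetical and not part of stantargets. It assumes the suffix is the Stan file name without its directory and extension, and it uses the return_draws and return_summary arguments from the usage above to decide which targets are omitted.

```r
# Hypothetical helper for illustration; not part of stantargets.
gq_target_names <- function(name, stan_files, compile = "original",
                            return_draws = TRUE, return_summary = TRUE) {
  # Assumption: the suffix is the file name without directory or extension,
  # e.g. "models/y.stan" becomes "y".
  suffixes <- tools::file_path_sans_ext(basename(stan_files))
  # Steps created once per model file. Skipped steps evaluate to NULL
  # and are dropped by c().
  steps <- c(
    "file",
    if (compile == "copy") "lines",
    "gq",
    if (return_draws) "draws",
    if (return_summary) "summary"
  )
  per_model <- unlist(lapply(suffixes, function(suffix) {
    paste(name, steps, suffix, sep = "_")
  }))
  # The dataset target is shared across models, so it has no suffix.
  c(paste(name, "data", sep = "_"), per_model)
}

target_names <- gq_target_names("x", "y.stan")
print(target_names)
```

Downstream targets can then refer to these names symbolically, as the example further down does with your_model_mcmc_x and your_model_data.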
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other generated quantities:
tar_stan_gq_rep_draws()
,
tar_stan_gq_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc(
          your_model,
          stan_files = c(x = path),
          data = tar_stan_example_data(),
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        ),
        tar_stan_gq(
          custom_gq,
          stan_files = path, # Can be a different model.
          fitted_params = your_model_mcmc_x,
          data = your_model_data, # Can be a different dataset.
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_gq_rep_draws() creates targets to run generated quantities multiple times and save only the draws from each run.
tar_stan_gq_rep_draws( name, stan_files, data = list(), fitted_params, batches = 1L, reps = 1L, combine = FALSE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, output_dir = NULL, sig_figs = NULL, parallel_chains = getOption("mc.cores", 1), threads_per_chain = NULL, variables = NULL, data_copy = character(0), transform = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = "transient", garbage_collection = TRUE, deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
fitted_params |
(multiple options) The parameter draws to use. One of the following:
NOTE: if you plan on making many calls to |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
data_copy |
Character vector of names of scalars in |
transform |
Symbol or |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile() and $generate_quantities() methods of the CmdStanModel class. If you previously compiled the model in an upstream tar_stan_compile() target, then the model should not recompile.
tar_stan_gq_rep_draws() returns a list of target objects. See the "Target objects" section for background.
The target names use the name argument as a prefix, and the individual elements of stan_files appear in the suffixes where applicable. As an example, the specific target objects returned by tar_stan_gq_rep_draws(name = x, stan_files = "y.stan") are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with the paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: use dynamic branching to generate multiple datasets by repeatedly running the R expression in the data argument. Each dynamic branch returns a batch of Stan data lists that x_y supplies to the model.
x_y: dynamic branching target to run generated quantities once per dataset. Each dynamic branch returns a tidy data frame of draws corresponding to a batch of Stan data from x_data.
x: combine all branches of x_y into a single non-dynamic target. Suppressed if combine is FALSE. Returns a long tidy data frame of draws.
Rep-specific random number generator seeds for the data and models are automatically set based on the seed argument, batch, rep, parent target name, and tar_option_get("seed"). This ensures the rep-specific seeds do not change when you change the batching configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20 reps each). Each data seed is in the .seed list element of the output, and each Stan seed is in the .seed column of each Stan model output.
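One way to see why seeds can stay fixed under re-batching is to key each seed on the position of the rep across all batches instead of on the batch layout. The base R sketch below is a toy illustration of that idea only. toy_hash() and rep_seeds() are hypothetical, and the arithmetic hash is an assumption standing in for whatever hashing targets actually uses.

```r
# Toy string-to-integer hash. Assumption: a stand-in for real hashing.
toy_hash <- function(text) {
  codes <- utf8ToInt(text)
  sum(codes * seq_along(codes)) %% 100000L
}

# Hypothetical helper: one seed per rep, keyed on the global rep index.
rep_seeds <- function(name, batches, reps, seed = 0L) {
  # expand.grid() varies its first column fastest, so rows are ordered
  # rep-within-batch.
  grid <- expand.grid(rep = seq_len(reps), batch = seq_len(batches))
  # Global rep index: position of the rep across all batches.
  grid$index <- (grid$batch - 1L) * reps + grid$rep
  grid$seed <- vapply(
    grid$index,
    function(index) toy_hash(paste(name, seed, index)),
    integer(1)
  )
  grid
}

# Same total number of reps, two different batching configurations.
seeds_a <- rep_seeds("x", batches = 4, reps = 5)
seeds_b <- rep_seeds("x", batches = 2, reps = 10)
stopifnot(identical(seeds_a$seed, seeds_b$seed))
```

Because the seed depends only on the target name, the base seed, and the global index, the twentieth rep gets the same seed whether it sits in batch 4 of 4 or batch 2 of 2.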
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other generated quantities:
tar_stan_gq()
,
tar_stan_gq_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc(
          your_model,
          stan_files = c(x = path),
          data = tar_stan_example_data(),
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile(),
          refresh = 0
        ),
        tar_stan_gq_rep_draws(
          generated_quantities,
          stan_files = path,
          data = tar_stan_example_data(),
          fitted_params = your_model_mcmc_x,
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_gq_rep_summary() creates targets to run generated quantities multiple times and save only the summaries from each run.
tar_stan_gq_rep_summary( name, stan_files, data = list(), fitted_params, batches = 1L, reps = 1L, combine = TRUE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, output_dir = NULL, sig_figs = NULL, parallel_chains = getOption("mc.cores", 1), threads_per_chain = NULL, data_copy = character(0), variables = NULL, summaries = list(), summary_args = list(), tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
fitted_params |
(multiple options) The parameter draws to use. One of the following:
NOTE: if you plan on making many calls to |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
and $generate_quantities()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_gq_rep_summary()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_gq_rep_summary(name = x, stan_files = "y.stan")
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with the paths to the
model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: use dynamic branching to generate multiple datasets
by repeatedly running the R expression in the data
argument.
Each dynamic branch returns a batch of Stan data lists that x_y
supplies to the model.
x_y
: dynamic branching target to run generated quantities
once per dataset.
Each dynamic branch returns a tidy data frame of summaries
corresponding to a batch of Stan data from x_data
.
x
: combine all branches of x_y
into a single non-dynamic target.
Suppressed if combine
is FALSE
.
Returns a long tidy data frame of summaries.
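The naming convention listed above can be spelled out in a few lines of base R. This is only a sketch: expected_target_names() is a hypothetical helper, not part of stantargets, and it assumes a single unnamed Stan file whose suffix is the file name without its directory or extension.

```r
# Hypothetical helper that reproduces the target names listed above for
# tar_stan_gq_rep_summary(name = x, stan_files = "y.stan").
expected_target_names <- function(name, stan_files, compile = "original",
                                  combine = TRUE) {
  # Assumed suffix: the model file name without directory or extension.
  model <- tools::file_path_sans_ext(basename(stan_files))
  c(
    paste0(name, "_file_", model),
    # The *_lines_* target is omitted when compile = "original".
    if (compile != "original") paste0(name, "_lines_", model),
    paste0(name, "_data"),
    paste0(name, "_", model),
    # The combined target is suppressed when combine is FALSE.
    if (combine) name
  )
}

expected_target_names("x", "y.stan")
expected_target_names("x", "y.stan", compile = "copy", combine = FALSE)
```

The first call lists the targets for the default settings; the second shows how the list changes when the model file is copied to workers and the combined target is switched off.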
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
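The batching invariance described above can be illustrated with a small base R sketch. It is not the package's actual seed algorithm: rep_seed() and all_seeds() are hypothetical helpers, and the way they mix the inputs is an assumption made for illustration. The point is only that a seed derived from the global position of a rep, rather than from the batch layout, stays the same when reps are regrouped.

```r
# Hypothetical helper: derive one seed per rep from a base seed, the parent
# target name, and the global rep index (not the batch layout itself).
rep_seed <- function(base_seed, name, batch, rep, reps_per_batch) {
  index <- (batch - 1L) * reps_per_batch + rep  # global position of the rep
  name_code <- sum(utf8ToInt(name))             # crude stand-in for a hash
  (base_seed + 31L * name_code + index) %% .Machine$integer.max
}

# Enumerate the seeds of every rep under a given batching configuration.
all_seeds <- function(batches, reps, base_seed = 1L, name = "x") {
  grid <- expand.grid(rep = seq_len(reps), batch = seq_len(batches))
  mapply(
    rep_seed,
    batch = grid$batch,
    rep = grid$rep,
    MoreArgs = list(base_seed = base_seed, name = name, reps_per_batch = reps)
  )
}

seeds_a <- all_seeds(batches = 40L, reps = 10L)  # 40 batches of 10 reps
seeds_b <- all_seeds(batches = 20L, reps = 20L)  # 20 batches of 20 reps
identical(seeds_a, seeds_b)
```

Both configurations cover the same 400 reps, so the two seed vectors match element by element.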
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other generated quantities:
tar_stan_gq()
,
tar_stan_gq_rep_draws()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
targets::tar_dir({ # tar_dir() runs code from a temporary directory.
targets::tar_script({
library(stantargets)
# Do not use temporary storage for stan files in real projects
# or else your targets will always rerun.
path <- tempfile(pattern = "", fileext = ".stan")
tar_stan_example_file(path = path)
list(
  tar_stan_mcmc(
    your_model,
    stan_files = c(x = path),
    data = tar_stan_example_data(),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  ),
  tar_stan_gq_rep_summary(
    generated_quantities,
    stan_files = path,
    data = tar_stan_example_data(),
    fitted_params = your_model_mcmc_x,
    batches = 2,
    reps = 2,
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  )
)
}, ask = FALSE)
targets::tar_make()
})
}
tar_stan_mcmc()
creates targets to run one MCMC
per model and separately save summaries, draws, and diagnostics.
tar_stan_mcmc( name, stan_files, data = list(), compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, output_basename = NULL, sig_figs = NULL, chains = 4, parallel_chains = getOption("mc.cores", 1), chain_ids = seq_len(chains), threads_per_chain = NULL, opencl_ids = NULL, iter_warmup = NULL, iter_sampling = NULL, save_warmup = FALSE, thin = NULL, max_treedepth = NULL, adapt_engaged = TRUE, adapt_delta = NULL, step_size = NULL, metric = NULL, metric_file = NULL, inv_metric = NULL, init_buffer = NULL, term_buffer = NULL, window = NULL, fixed_param = FALSE, show_messages = TRUE, diagnostics = c("divergences", "treedepth", "ebfmi"), variables = NULL, variables_fit = NULL, inc_warmup = FALSE, inc_warmup_fit = FALSE, summaries = list(), summary_args = list(), return_draws = TRUE, return_diagnostics = TRUE, return_summary = TRUE, draws = NULL, summary = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of Stan model files. If you
supply multiple files, each model will run on the one shared dataset
generated by the code in |
data |
Code to generate the |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
output_basename |
(string) A string to use as a prefix for the names of
the output CSV files of CmdStan. If |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
chains |
(positive integer) The number of Markov chains to run. The default is 4. |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
chain_ids |
(integer vector) A vector of chain IDs. Must contain as many
unique positive integers as the number of chains. If not set, the default
chain IDs are used (integers starting from |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
opencl_ids |
(integer vector of length 2) The platform and
device IDs of the OpenCL device to use for fitting. The model must
be compiled with |
iter_warmup |
(positive integer) The number of warmup iterations to run
per chain. Note: in the CmdStan User's Guide this is referred to as
|
iter_sampling |
(positive integer) The number of post-warmup iterations
to run per chain. Note: in the CmdStan User's Guide this is referred to as
|
save_warmup |
(logical) Should warmup iterations be saved? The default
is |
thin |
(positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem. |
max_treedepth |
(positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan User's Guide for more details. |
adapt_engaged |
(logical) Do warmup adaptation? The default is |
adapt_delta |
(real in |
step_size |
(positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup. |
metric |
(string) One of |
metric_file |
(character vector) The paths to JSON or
Rdump files (one per chain) compatible with CmdStan that contain
precomputed inverse metrics. The |
inv_metric |
(vector, matrix) A vector (if |
init_buffer |
(nonnegative integer) Width of initial fast timestep adaptation interval during warmup. |
term_buffer |
(nonnegative integer) Width of final fast timestep adaptation interval during warmup. |
window |
(nonnegative integer) Initial width of slow timestep/metric adaptation interval. |
fixed_param |
(logical) When |
show_messages |
(logical) When |
diagnostics |
(character vector) The diagnostics to automatically check
and warn about after sampling. Setting this to an empty string These diagnostics are also available after fitting. The
Diagnostics like R-hat and effective sample size are not currently
available via the |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
variables_fit |
Character vector of variables to include in the
big |
inc_warmup |
(logical) Should warmup draws be included? Defaults to
|
inc_warmup_fit |
Logical of length 1, whether to include
warmup draws in the big MCMC object (the target with |
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
return_draws |
Logical, whether to create a target for posterior draws.
Saves |
return_diagnostics |
Logical, whether to create a target for
|
return_summary |
Logical, whether to create a target for
|
draws |
Deprecated on 2022-07-22. Use |
summary |
Deprecated on 2022-07-22. Use |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the non-data-frame
targets such as the Stan data and any CmdStanFit objects.
Please choose an all-purpose
format such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
,
$sample()
, and $summary()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mcmc()
returns a list of target objects.
See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mcmc(name = x, stan_files = "y.stan", ...)
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with paths to the
model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: run the R expression in the data
argument to produce
a Stan dataset for the model. Returns a Stan data list.
x_mcmc_y
: run MCMC on the model and the dataset.
Returns a cmdstanr
CmdStanMCMC
object with all the results.
x_draws_y
: extract draws from x_mcmc_y
.
Omitted if return_draws = FALSE
.
Returns a tidy data frame of draws.
x_summary_y
: extract compact summaries from x_mcmc_y
.
Returns a tidy data frame of summaries.
Omitted if return_summary = FALSE
.
x_diagnostics_y
: extract HMC diagnostics from x_mcmc_y
.
Returns a tidy data frame of HMC diagnostics.
Omitted if return_diagnostics = FALSE
.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other MCMC:
tar_stan_mcmc_rep_diagnostics()
,
tar_stan_mcmc_rep_draws()
,
tar_stan_mcmc_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
targets::tar_dir({ # tar_dir() runs code from a temporary directory.
targets::tar_script({
library(stantargets)
# Do not use temporary storage for stan files in real projects
# or else your targets will always rerun.
path <- tempfile(pattern = "", fileext = ".stan")
tar_stan_example_file(path = path)
list(
  tar_stan_mcmc(
    your_model,
    stan_files = path,
    data = tar_stan_example_data(),
    variables = "beta",
    summaries = list(~quantile(.x, probs = c(0.25, 0.75))),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  )
)
}, ask = FALSE)
targets::tar_make()
})
}
tar_stan_mcmc_rep_diagnostics()
creates targets
to run MCMC multiple times per model and save only the sampler
diagnostics from each run.
tar_stan_mcmc_rep_diagnostics( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = FALSE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, output_basename = NULL, sig_figs = NULL, chains = 4, parallel_chains = getOption("mc.cores", 1), chain_ids = seq_len(chains), threads_per_chain = NULL, opencl_ids = NULL, iter_warmup = NULL, iter_sampling = NULL, save_warmup = FALSE, thin = NULL, max_treedepth = NULL, adapt_engaged = TRUE, adapt_delta = NULL, step_size = NULL, metric = NULL, metric_file = NULL, inv_metric = NULL, init_buffer = NULL, term_buffer = NULL, window = NULL, fixed_param = FALSE, show_messages = TRUE, diagnostics = c("divergences", "treedepth", "ebfmi"), inc_warmup = FALSE, data_copy = character(0), tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = "transient", garbage_collection = TRUE, deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
Code to generate a single replication of a simulated dataset.
The workflow simulates multiple datasets, and each
model runs on each dataset. To join data on to the model
summaries, include a |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
output_basename |
(string) A string to use as a prefix for the names of
the output CSV files of CmdStan. If |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
chains |
(positive integer) The number of Markov chains to run. The default is 4. |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
chain_ids |
(integer vector) A vector of chain IDs. Must contain as many
unique positive integers as the number of chains. If not set, the default
chain IDs are used (integers starting from |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
opencl_ids |
(integer vector of length 2) The platform and
device IDs of the OpenCL device to use for fitting. The model must
be compiled with |
iter_warmup |
(positive integer) The number of warmup iterations to run
per chain. Note: in the CmdStan User's Guide this is referred to as
|
iter_sampling |
(positive integer) The number of post-warmup iterations
to run per chain. Note: in the CmdStan User's Guide this is referred to as
|
save_warmup |
(logical) Should warmup iterations be saved? The default
is |
thin |
(positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem. |
max_treedepth |
(positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan User's Guide for more details. |
adapt_engaged |
(logical) Do warmup adaptation? The default is |
adapt_delta |
(real in |
step_size |
(positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup. |
metric |
(string) One of |
metric_file |
(character vector) The paths to JSON or
Rdump files (one per chain) compatible with CmdStan that contain
precomputed inverse metrics. The |
inv_metric |
(vector, matrix) A vector (if |
init_buffer |
(nonnegative integer) Width of initial fast timestep adaptation interval during warmup. |
term_buffer |
(nonnegative integer) Width of final fast timestep adaptation interval during warmup. |
window |
(nonnegative integer) Initial width of slow timestep/metric adaptation interval. |
fixed_param |
(logical) When |
show_messages |
(logical) When |
diagnostics |
(character vector) The diagnostics to automatically check
and warn about after sampling. Setting this to an empty string These diagnostics are also available after fitting. The
Diagnostics like R-hat and effective sample size are not currently
available via the |
inc_warmup |
(logical) Should warmup draws be included? Defaults to
|
data_copy |
Character vector of names of scalars in |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Saved diagnostics could get quite large in storage, so please use thinning if necessary.
Most of the arguments are passed to the $compile()
and $sample()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mcmc_rep_diagnostics()
returns
a list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mcmc_rep_diagnostics(name = x, stan_files = "y.stan")
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with the paths to the
model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: use dynamic branching to generate multiple datasets
by repeatedly running the R expression in the data
argument.
Each dynamic branch returns a batch of Stan data lists that x_y
supplies to the model.
x_y
: dynamic branching target to run MCMC once per dataset.
Each dynamic branch returns a tidy data frame of HMC diagnostics
corresponding to a batch of Stan data from x_data
.
x
: combine all branches of x_y
into a single non-dynamic target.
Suppressed if combine
is FALSE
.
Returns a long tidy data frame of HMC diagnostics.
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
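The invariance to batching described above can be illustrated with a minimal base R sketch. It assumes each rep gets a seed that depends only on the parent target name and the rep's overall position in the simulation. The helpers `rep_seed()` and `seeds_for()` and the mixing arithmetic are hypothetical and are not the functions stantargets uses internally.

```r
# Hypothetical mixing function: maps a target name and an overall rep index
# to an integer seed. This is NOT the algorithm stantargets uses.
rep_seed <- function(name, index, base_seed = 0L) {
  codes <- utf8ToInt(name)
  key <- sum(codes * seq_along(codes))
  as.integer((key * 31 + index * 7919 + base_seed) %% 2147483647)
}

# Enumerate every (batch, rep) pair and convert it to an overall rep index,
# so the seed does not depend on how reps are grouped into batches.
seeds_for <- function(name, batches, reps) {
  grid <- expand.grid(rep = seq_len(reps), batch = seq_len(batches))
  index <- (grid$batch - 1L) * reps + grid$rep
  vapply(index, function(i) rep_seed(name, i), integer(1))
}

a <- seeds_for("x_data", batches = 4, reps = 5)
b <- seeds_for("x_data", batches = 2, reps = 10)
identical(a, b)
```

Both configurations cover the same 20 overall rep indices, so the two seed vectors match even though the batch sizes differ.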
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other MCMC:
tar_stan_mcmc()
,
tar_stan_mcmc_rep_draws()
,
tar_stan_mcmc_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc_rep_diagnostics(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_mcmc_rep_draws()
creates targets
to run MCMC multiple times per model and
save only the draws from each run.
tar_stan_mcmc_rep_draws( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = FALSE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, output_basename = NULL, sig_figs = NULL, chains = 4, parallel_chains = getOption("mc.cores", 1), chain_ids = seq_len(chains), threads_per_chain = NULL, opencl_ids = NULL, iter_warmup = NULL, iter_sampling = NULL, save_warmup = FALSE, thin = NULL, max_treedepth = NULL, adapt_engaged = TRUE, adapt_delta = NULL, step_size = NULL, metric = NULL, metric_file = NULL, inv_metric = NULL, init_buffer = NULL, term_buffer = NULL, window = NULL, fixed_param = FALSE, show_messages = TRUE, diagnostics = c("divergences", "treedepth", "ebfmi"), inc_warmup = FALSE, variables = NULL, data_copy = character(0), transform = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = "transient", garbage_collection = TRUE, deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
Code to generate a single replication of a simulated dataset.
The workflow simulates multiple datasets, and each
model runs on each dataset. To join data on to the model
summaries, include a |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
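Character of length 1, either "original" or "copy". The default is "original", in which case the target that reads the model file lines (x_lines_y in the example below) is omitted. |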
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is TRUE. |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
FALSE. |
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since it was last compiled? The default is FALSE. |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
output_basename |
(string) A string to use as a prefix for the names of
the output CSV files of CmdStan. If |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
chains |
(positive integer) The number of Markov chains to run. The default is 4. |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
chain_ids |
(integer vector) A vector of chain IDs. Must contain as many
unique positive integers as the number of chains. If not set, the default
chain IDs are used (integers starting from 1). |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
opencl_ids |
(integer vector of length 2) The platform and
device IDs of the OpenCL device to use for fitting. The model must
be compiled with |
iter_warmup |
(positive integer) The number of warmup iterations to run
per chain. Note: in the CmdStan User's Guide this is referred to as
|
iter_sampling |
(positive integer) The number of post-warmup iterations
to run per chain. Note: in the CmdStan User's Guide this is referred to as
|
save_warmup |
(logical) Should warmup iterations be saved? The default
is FALSE. |
thin |
(positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem. |
max_treedepth |
(positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan User's Guide for more details. |
adapt_engaged |
(logical) Do warmup adaptation? The default is TRUE. |
adapt_delta |
(real in |
step_size |
(positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup. |
metric |
(string) One of |
metric_file |
(character vector) The paths to JSON or
Rdump files (one per chain) compatible with CmdStan that contain
precomputed inverse metrics. The |
inv_metric |
(vector, matrix) A vector (if |
init_buffer |
(nonnegative integer) Width of initial fast timestep adaptation interval during warmup. |
term_buffer |
(nonnegative integer) Width of final fast timestep adaptation interval during warmup. |
window |
(nonnegative integer) Initial width of slow timestep/metric adaptation interval. |
fixed_param |
(logical) When |
show_messages |
(logical) When |
diagnostics |
(character vector) The diagnostics to automatically check
and warn about after sampling. Setting this to an empty string disables these automatic checks. These diagnostics are also available after fitting.
Diagnostics like R-hat and effective sample size are not currently
available via the diagnostics argument. |
inc_warmup |
(logical) Should warmup draws be included? Defaults to
FALSE. |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
data_copy |
Character vector of names of scalars in |
transform |
Symbol or |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Draws could take up a lot of storage. If storage becomes
excessive, please consider thinning the draws or using
tar_stan_mcmc_rep_summary()
instead.
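As a rough guide to how thinning affects storage, the sketch below estimates the number of saved post-warmup draws (rows of the combined draws data frame) across a whole simulation. The helper `approx_draw_rows()` is hypothetical. It assumes roughly `iter_sampling / thin` saved iterations per chain and uses 1000 sampling iterations per chain, which is CmdStan's default when `iter_sampling` is NULL.

```r
# Hypothetical helper: approximate number of saved draws across all reps.
# Assumes about ceiling(iter_sampling / thin) saved iterations per chain.
approx_draw_rows <- function(batches, reps, chains = 4,
                             iter_sampling = 1000, thin = 1) {
  batches * reps * chains * ceiling(iter_sampling / thin)
}

# 40 batches of 10 reps each, 4 chains, no thinning: 1.6 million rows.
approx_draw_rows(batches = 40, reps = 10)

# The same study with thin = 10: 160 thousand rows.
approx_draw_rows(batches = 40, reps = 10, thin = 10)
```

Each row also carries one column per model variable, so the actual storage footprint depends on the size of the model as well.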
Most of the arguments are passed to the $compile()
and $sample()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mcmc_rep_draws()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mcmc_rep_draws(name = x, stan_files = "y.stan")
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with the paths to the
model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: use dynamic branching to generate multiple datasets
by repeatedly running the R expression in the data
argument.
Each dynamic branch returns a batch of Stan data lists that x_y
supplies to the model.
x_y
: dynamic branching target to run MCMC once per dataset.
Each dynamic branch returns a tidy data frame of draws
corresponding to a batch of Stan data from x_data
.
x
: combine all branches of x_y
into a single non-dynamic target.
Suppressed if combine
is FALSE
.
Returns a long tidy data frame of draws.
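The naming scheme listed above can be sketched in base R. The helper `target_names()` is hypothetical and only mirrors the pattern described in this section (prefix from `name`, suffix from the Stan file name without its extension). It is not part of stantargets, and the real package may treat unusual file names differently.

```r
# Hypothetical helper that reproduces the naming pattern described above.
target_names <- function(name, stan_files, compile = "original",
                         combine = FALSE) {
  # Suffix: file name without directory or extension, e.g. "y" for "y.stan".
  model <- tools::file_path_sans_ext(basename(stan_files))
  c(
    paste0(name, "_file_", model),
    # The lines target only exists when compile = "copy".
    if (compile == "copy") paste0(name, "_lines_", model),
    paste0(name, "_data"),
    paste0(name, "_", model),
    # The combined target only exists when combine = TRUE.
    if (combine) name
  )
}

target_names("x", "y.stan")
target_names("x", "y.stan", compile = "copy", combine = TRUE)
```

The first call lists the targets for `compile = "original"` with no combined target. The second adds `x_lines_y` and the combined target `x`.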
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other MCMC:
tar_stan_mcmc()
,
tar_stan_mcmc_rep_diagnostics()
,
tar_stan_mcmc_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc_rep_draws(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_mcmc_rep_summary() creates targets to run MCMC multiple times and save only the summary output from each run.
tar_stan_mcmc_rep_summary( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = TRUE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, output_basename = NULL, sig_figs = NULL, chains = 4, parallel_chains = getOption("mc.cores", 1), chain_ids = seq_len(chains), threads_per_chain = NULL, opencl_ids = NULL, iter_warmup = NULL, iter_sampling = NULL, save_warmup = FALSE, thin = NULL, max_treedepth = NULL, adapt_engaged = TRUE, adapt_delta = NULL, step_size = NULL, metric = NULL, metric_file = NULL, inv_metric = NULL, init_buffer = NULL, term_buffer = NULL, window = NULL, fixed_param = FALSE, show_messages = TRUE, diagnostics = c("divergences", "treedepth", "ebfmi"), data_copy = character(0), variables = NULL, summaries = NULL, summary_args = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
Code to generate a single replication of a simulated dataset.
The workflow simulates multiple datasets, and each
model runs on each dataset. To join data on to the model
summaries, include a |
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
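Character of length 1, either "original" or "copy". The default is "original", in which case the target that reads the model file lines (x_lines_y in the example below) is omitted. |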
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is TRUE. |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
FALSE. |
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since it was last compiled? The default is FALSE. |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
output_basename |
(string) A string to use as a prefix for the names of
the output CSV files of CmdStan. If |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
chains |
(positive integer) The number of Markov chains to run. The default is 4. |
parallel_chains |
(positive integer) The maximum number of MCMC chains
to run in parallel. If |
chain_ids |
(integer vector) A vector of chain IDs. Must contain as many
unique positive integers as the number of chains. If not set, the default
chain IDs are used (integers starting from 1). |
threads_per_chain |
(positive integer) If the model was
compiled with threading support, the number of
threads to use in parallelized sections within an MCMC chain (e.g., when
using the Stan functions |
opencl_ids |
(integer vector of length 2) The platform and
device IDs of the OpenCL device to use for fitting. The model must
be compiled with |
iter_warmup |
(positive integer) The number of warmup iterations to run
per chain. Note: in the CmdStan User's Guide this is referred to as
|
iter_sampling |
(positive integer) The number of post-warmup iterations
to run per chain. Note: in the CmdStan User's Guide this is referred to as
|
save_warmup |
(logical) Should warmup iterations be saved? The default
is FALSE. |
thin |
(positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem. |
max_treedepth |
(positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan User's Guide for more details. |
adapt_engaged |
(logical) Do warmup adaptation? The default is TRUE. |
adapt_delta |
(real in |
step_size |
(positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup. |
metric |
(string) One of |
metric_file |
(character vector) The paths to JSON or
Rdump files (one per chain) compatible with CmdStan that contain
precomputed inverse metrics. The |
inv_metric |
(vector, matrix) A vector (if |
init_buffer |
(nonnegative integer) Width of initial fast timestep adaptation interval during warmup. |
term_buffer |
(nonnegative integer) Width of final fast timestep adaptation interval during warmup. |
window |
(nonnegative integer) Initial width of slow timestep/metric adaptation interval. |
fixed_param |
(logical) When |
show_messages |
(logical) When |
diagnostics |
(character vector) The diagnostics to automatically check
and warn about after sampling. Setting this to an empty string disables these automatic checks. These diagnostics are also available after fitting.
Diagnostics like R-hat and effective sample size are not currently
available via the diagnostics argument. |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
and $sample()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mcmc_rep_summary()
returns a list of target objects.
See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mcmc_rep_summary(name = x, stan_files = "y.stan")
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with the paths to the
model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: use dynamic branching to generate multiple datasets
by repeatedly running the R expression in the data
argument.
Each dynamic branch returns a batch of Stan data lists that x_y
supplies to the model.
x_y
: dynamic branching target to run MCMC once per dataset.
Each dynamic branch returns a tidy data frame of summaries
corresponding to a batch of Stan data from x_data
.
x
: combine all branches of x_y
into a single non-dynamic target.
Suppressed if combine
is FALSE
.
Returns a long tidy data frame of summaries.
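One common use of the combined long data frame is to check interval coverage across reps. The sketch below uses a simulated stand-in for that data frame, not real stantargets output. The columns `variable`, `q5`, `q95`, and `.rep` are assumed to resemble typical summary output, and the `truth` column is a hypothetical value joined from the simulated data.

```r
set.seed(1)
n_rep <- 200

# Mock summaries: one row per rep for a single hypothetical variable "beta".
mock <- data.frame(
  variable = "beta",
  .rep = seq_len(n_rep),
  truth = rnorm(n_rep)  # hypothetical true parameter value for each rep
)

# Simulate a noisy estimate and a 90% interval around it.
est <- mock$truth + rnorm(n_rep, sd = 0.5)
mock$q5 <- est - qnorm(0.95) * 0.5
mock$q95 <- est + qnorm(0.95) * 0.5

# Coverage: share of reps whose interval contains the true value.
mock$covered <- mock$q5 < mock$truth & mock$truth < mock$q95
coverage <- tapply(mock$covered, mock$variable, mean)
coverage
```

With well-calibrated 90% intervals, the coverage should land near 0.9. Large departures in a real simulation study would suggest a problem with the model or the data-generating code.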
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other MCMC:
tar_stan_mcmc(), tar_stan_mcmc_rep_diagnostics(), tar_stan_mcmc_rep_draws()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mcmc_rep_summary(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_mle()
creates targets to optimize a Stan model once
per model and separately save draws-like output and summary-like output.
tar_stan_mle( name, stan_files, data = list(), compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, algorithm = NULL, init_alpha = NULL, iter = NULL, tol_obj = NULL, tol_rel_obj = NULL, tol_grad = NULL, tol_rel_grad = NULL, tol_param = NULL, history_size = NULL, sig_figs = NULL, variables = NULL, variables_fit = NULL, summaries = list(), summary_args = list(), return_draws = TRUE, return_summary = TRUE, draws = NULL, summary = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of Stan model files. If you
supply multiple files, each model will run on the one shared dataset
generated by the code in |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The optimization algorithm. One of |
init_alpha |
(positive real) The initial step size parameter. |
iter |
(positive integer) The maximum number of iterations. |
tol_obj |
(positive real) Convergence tolerance on changes in objective function value. |
tol_rel_obj |
(positive real) Convergence tolerance on relative changes in objective function value. |
tol_grad |
(positive real) Convergence tolerance on the norm of the gradient. |
tol_rel_grad |
(positive real) Convergence tolerance on the relative norm of the gradient. |
tol_param |
(positive real) Convergence tolerance on changes in parameter value. |
history_size |
(positive integer) The size of the history used when approximating the Hessian. Only available for L-BFGS. |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
variables |
(character vector) The variables to include. |
variables_fit |
Character vector of variables to include in the
big |
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
return_draws |
Logical, whether to create a target for posterior draws.
Saves |
return_summary |
Logical, whether to create a target for
|
draws |
Deprecated on 2022-07-22. Use |
summary |
Deprecated on 2022-07-22. Use |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile(), $optimize(), and $summary() methods of the CmdStanModel class.
If you previously compiled the model in an upstream tar_stan_compile() target, then the model should not recompile.
tar_stan_mle()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mle(name = x, stan_files = "y.stan", ...)
are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: run the R expression in the data argument to produce a Stan dataset for the model. Returns a Stan data list.
x_mle_y: run optimization on the model and the dataset. Returns a cmdstanr CmdStanMLE object with all the results.
x_draws_y: extract maximum likelihood estimates from x_mle_y in draws format. Omitted if return_draws = FALSE. Returns a wide data frame of MLEs.
x_summary_y: extract MLEs from x_mle_y in summary format. Omitted if return_summary = FALSE. Returns a long data frame of MLEs.
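The difference between the wide draws format and the long summary format can be shown with a toy base R example. The parameter names and the estimate column below are made up for illustration; the actual columns in stantargets output depend on the model and the chosen summaries and are not reproduced here.

```r
# Toy values standing in for maximum likelihood estimates.
# The parameter names alpha and beta are hypothetical.
mle <- c(alpha = 0.5, beta = 1.2)

# Draws-like (wide) layout: one column per parameter, one row per fit.
wide <- as.data.frame(as.list(mle))

# Summary-like (long) layout: one row per parameter.
long <- data.frame(variable = names(mle), estimate = unname(mle))

print(wide)
print(long)
```

The wide layout is convenient for working with parameters as columns, while the long layout is easier to filter and join by variable name.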
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other optimization:
tar_stan_mle_rep_draws(), tar_stan_mle_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mle(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_mle_rep_draws()
creates targets
to run maximum likelihood multiple times per model and
save the MLEs in a wide-form draws-like data frame.
tar_stan_mle_rep_draws( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = TRUE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, algorithm = NULL, init_alpha = NULL, iter = NULL, sig_figs = NULL, tol_obj = NULL, tol_rel_obj = NULL, tol_grad = NULL, tol_rel_grad = NULL, tol_param = NULL, history_size = NULL, data_copy = character(0), variables = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The optimization algorithm. One of |
init_alpha |
(positive real) The initial step size parameter. |
iter |
(positive integer) The maximum number of iterations. |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
tol_obj |
(positive real) Convergence tolerance on changes in objective function value. |
tol_rel_obj |
(positive real) Convergence tolerance on relative changes in objective function value. |
tol_grad |
(positive real) Convergence tolerance on the norm of the gradient. |
tol_rel_grad |
(positive real) Convergence tolerance on the relative norm of the gradient. |
tol_param |
(positive real) Convergence tolerance on changes in parameter value. |
history_size |
(positive integer) The size of the history used when approximating the Hessian. Only available for L-BFGS. |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
and $optimize()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mle_rep_draws()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_mle_rep_draws(name = x, stan_files = "y.stan")
are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: use dynamic branching to generate multiple datasets by repeatedly running the R expression in the data argument. Each dynamic branch returns a batch of Stan data lists that x_y supplies to the model.
x_y: dynamic branching target to run maximum likelihood once per dataset. Each dynamic branch returns a tidy data frame of maximum likelihood estimates corresponding to a batch of Stan data from x_data.
x: combine all branches of x_y into a single non-dynamic target. Suppressed if combine is FALSE. Returns a long tidy data frame of maximum likelihood estimates.
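Once the reps are combined into one data frame, a simulation study typically aggregates across reps. The base R sketch below uses a toy data frame as a stand-in for the combined target; the column names .rep and beta, the estimates, and the true value are all assumptions for illustration rather than actual stantargets output.

```r
# Toy stand-in for a combined data frame of MLEs with one row per rep.
# Column names and values are hypothetical.
true_beta <- 2
results <- data.frame(
  .rep = sprintf("rep_%02d", 1:4),
  beta = true_beta + c(-0.1, 0.2, 0.0, 0.1)
)

# Average estimate and bias of the estimator across reps.
mean_beta <- mean(results$beta)
bias_beta <- mean_beta - true_beta

print(c(mean = mean_beta, bias = bias_beta))
```

In a real pipeline, the data frame would come from reading the combined target (for example with targets::tar_read()) instead of being constructed by hand.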
Rep-specific random number generator seeds for the data and models are automatically set based on the seed argument, batch, rep, parent target name, and tar_option_get("seed"). This ensures the rep-specific seeds do not change when you change the batching configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20 reps each). Each data seed is in the .seed list element of the output, and each Stan seed is in the .seed column of each Stan model output.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other optimization:
tar_stan_mle(), tar_stan_mle_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mle_rep_draws(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_mle_rep_summary()
creates targets
to run maximum likelihood multiple times per model and
save the MLEs in a long-form summary-like data frame.
tar_stan_mle_rep_summary( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = TRUE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, algorithm = NULL, init_alpha = NULL, iter = NULL, tol_obj = NULL, tol_rel_obj = NULL, tol_grad = NULL, tol_rel_grad = NULL, tol_param = NULL, history_size = NULL, sig_figs = NULL, data_copy = character(0), variables = NULL, summaries = list(), summary_args = list(), tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The optimization algorithm. One of |
init_alpha |
(positive real) The initial step size parameter. |
iter |
(positive integer) The maximum number of iterations. |
tol_obj |
(positive real) Convergence tolerance on changes in objective function value. |
tol_rel_obj |
(positive real) Convergence tolerance on relative changes in objective function value. |
tol_grad |
(positive real) Convergence tolerance on the norm of the gradient. |
tol_rel_grad |
(positive real) Convergence tolerance on the relative norm of the gradient. |
tol_param |
(positive real) Convergence tolerance on changes in parameter value. |
history_size |
(positive integer) The size of the history used when approximating the Hessian. Only available for L-BFGS. |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
and $optimize()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_mle_rep_summary()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
The specific target objects returned by
tar_stan_mle_rep_summary(name = x, stan_files = "y.stan")
are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: use dynamic branching to generate multiple datasets by repeatedly running the R expression in the data argument. Each dynamic branch returns a batch of Stan data lists that x_y supplies to the model.
x_y: dynamic branching target to run maximum likelihood once per dataset. Each dynamic branch returns a tidy data frame of maximum likelihood estimates corresponding to a batch of Stan data from x_data.
x: combine all branches of x_y into a single non-dynamic target. Suppressed if combine is FALSE. Returns a long tidy data frame of maximum likelihood estimates.
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
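The batching invariance described above can be illustrated with a small base R sketch. This is a toy scheme, not the algorithm stantargets actually uses: the helpers global_rep_index(), toy_rep_seed(), and all_seeds() are hypothetical names introduced here. The idea is only that a seed derived from a rep's position in the whole simulation, rather than its position within a batch, stays the same however the reps are split into batches.

```r
# Hypothetical illustration only: NOT the seed algorithm used by stantargets.
# Map a (batch, rep) pair to a global rep index that does not depend on
# how the reps are divided into batches.
global_rep_index <- function(batch, rep_in_batch, reps_per_batch) {
  (batch - 1) * reps_per_batch + rep_in_batch
}

# Derive a toy seed from the parent target name, a base seed,
# and the global rep index. Doubles are used to avoid integer overflow.
toy_rep_seed <- function(name, base_seed, index) {
  codes <- utf8ToInt(name)
  name_code <- sum(codes * seq_along(codes))
  (name_code + base_seed * 7919 + index * 104729) %% 2147483647
}

# Compute the toy seeds for every rep of a batching configuration.
all_seeds <- function(name, base_seed, batches, reps) {
  grid <- expand.grid(
    rep_in_batch = seq_len(reps),
    batch = seq_len(batches)
  )
  index <- global_rep_index(grid$batch, grid$rep_in_batch, reps)
  toy_rep_seed(name, base_seed, index)
}

# 400 reps in total under both configurations.
seeds_40x10 <- all_seeds("x", base_seed = 1, batches = 40, reps = 10)
seeds_20x20 <- all_seeds("x", base_seed = 1, batches = 20, reps = 20)

# The per-rep seeds agree under this toy scheme.
identical(seeds_40x10, seeds_20x20)
```

In a real pipeline, the seeds that were actually used can be inspected in the .seed element of each dataset and the .seed column of the model output, as described above.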
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other optimization: tar_stan_mle(), tar_stan_mle_rep_draws()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_mle_rep_summary(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
CmdStanFit
object
Create a target to run the $summary()
method of a CmdStanFit
object.
tar_stan_summary( name, fit, data = NULL, variables = NULL, summaries = NULL, summary_args = NULL, format = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
fit |
Symbol, name of a |
data |
Code to generate the |
variables |
(character vector) The variables to include. |
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
tar_stan_mcmc()
and related functions with return_summary = TRUE
already give you a
target with output from the $summary()
method.
Use tar_stan_summary()
to create additional specialized summaries.
tar_stan_summary()
returns a target object to
summarize a CmdStanFit
object. The return value of the target
is a tidy data frame of summaries returned by the $summary()
method of the CmdStanFit
object.
See the "Target objects" section for background.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
# First, write your Stan model file, e.g. model.stan.
# Then in _targets.R, write a pipeline like this:
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    # Running inside a temporary directory to avoid
    # modifying the user's file space. The file "model.stan"
    # created below lives in a temporary directory.
    # This satisfies CRAN policies.
    tar_stan_example_file("model.stan")
    targets::tar_script({
      library(stantargets)
      list(
        # Run a model and produce default summaries.
        tar_stan_mcmc(
          your_model,
          stan_files = "model.stan",
          data = tar_stan_example_data()
        ),
        # Produce a more specialized summary
        tar_stan_summary(
          your_summary,
          fit = your_model_mcmc_model,
          data = your_model_data_model,
          variables = "beta",
          summaries = list(~quantile(.x, probs = c(0.25, 0.75)))
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_vb() creates targets to run a Stan model once with variational Bayes and save multiple outputs.
tar_stan_vb( name, stan_files, data = list(), compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, algorithm = NULL, iter = NULL, grad_samples = NULL, elbo_samples = NULL, eta = NULL, adapt_engaged = NULL, adapt_iter = NULL, tol_rel_obj = NULL, eval_elbo = NULL, output_samples = NULL, sig_figs = NULL, variables = NULL, variables_fit = NULL, summaries = list(), summary_args = list(), return_draws = TRUE, return_summary = TRUE, draws = NULL, summary = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = targets::tar_option_get("memory"), garbage_collection = targets::tar_option_get("garbage_collection"), deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of Stan model files. If you
supply multiple files, each model will run on the one shared dataset
generated by the code in |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The algorithm. Either |
iter |
(positive integer) The maximum number of iterations. |
grad_samples |
(positive integer) The number of samples for Monte Carlo estimate of gradients. |
elbo_samples |
(positive integer) The number of samples for Monte Carlo estimate of ELBO (objective function). |
eta |
(positive real) The step size weighting parameter for adaptive step size sequence. |
adapt_engaged |
(logical) Do warmup adaptation? The default is |
adapt_iter |
(positive integer) The maximum number of adaptation iterations. |
tol_rel_obj |
(positive real) Convergence tolerance on the relative norm of the objective. |
eval_elbo |
(positive integer) Evaluate ELBO every Nth iteration. |
output_samples |
(positive integer) Use |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
variables |
(character vector) The variables to include. |
variables_fit |
Character vector of variables to include in the
big |
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
return_draws |
Logical, whether to create a target for posterior draws.
Saves |
return_summary |
Logical, whether to create a target for
|
draws |
Deprecated on 2022-07-22. Use |
summary |
Deprecated on 2022-07-22. Use |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
,
$variational()
, and $summary()
methods of the CmdStanModel
class.
If you previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_vb()
returns a list of target objects.
See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_vb(name = x, stan_files = "y.stan", ...)
are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: run the R expression in the data argument to produce a Stan dataset for the model. Returns a Stan data list.
x_vb_y: run variational Bayes on the model and the dataset. Returns a cmdstanr CmdStanVB object with all the results.
x_draws_y: extract draws from x_vb_y. Omitted if return_draws = FALSE. Returns a tidy data frame of draws.
x_summary_y: extract compact summaries from x_vb_y. Omitted if return_summary = FALSE. Returns a tidy data frame of summaries.
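The naming convention above can be sketched in base R. The helper expected_vb_targets() is hypothetical and not part of stantargets; it assumes simple file names that are already valid R names once the .stan extension is dropped, and it only mirrors the rules stated in this section.

```r
# Hypothetical helper (not part of stantargets): list the target names
# described above for a given `name` prefix and set of Stan files.
expected_vb_targets <- function(name,
                                stan_files,
                                compile = "original",
                                return_draws = TRUE,
                                return_summary = TRUE) {
  # "y.stan" contributes the suffix "y".
  model <- tools::file_path_sans_ext(basename(stan_files))
  c(
    paste0(name, "_file_", model),
    # The lines target is omitted when compile = "original".
    if (compile == "copy") paste0(name, "_lines_", model),
    paste0(name, "_data"),
    paste0(name, "_vb_", model),
    if (return_draws) paste0(name, "_draws_", model),
    if (return_summary) paste0(name, "_summary_", model)
  )
}

expected_vb_targets("x", "y.stan")
```

After targets::tar_make() finishes, a name such as x_summary_y is what you would pass to targets::tar_read() to load the corresponding result.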
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other variational Bayes: tar_stan_vb_rep_draws(), tar_stan_vb_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_vb(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          variables = "beta",
          summaries = list(~quantile(.x, probs = c(0.25, 0.75))),
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_vb_rep_draws()
creates targets to run
variational Bayes multiple times per model and
save only the draws from each run.
tar_stan_vb_rep_draws( name, stan_files, data = list(), batches = 1L, reps = 1L, combine = FALSE, compile = c("original", "copy"), quiet = TRUE, stdout = NULL, stderr = NULL, dir = NULL, pedantic = FALSE, include_paths = NULL, cpp_options = list(), stanc_options = list(), force_recompile = FALSE, seed = NULL, refresh = NULL, init = NULL, save_latent_dynamics = FALSE, output_dir = NULL, algorithm = NULL, iter = NULL, grad_samples = NULL, elbo_samples = NULL, eta = NULL, adapt_engaged = NULL, adapt_iter = NULL, tol_rel_obj = NULL, eval_elbo = NULL, output_samples = NULL, sig_figs = NULL, data_copy = character(0), variables = NULL, transform = NULL, tidy_eval = targets::tar_option_get("tidy_eval"), packages = targets::tar_option_get("packages"), library = targets::tar_option_get("library"), format = "qs", format_df = "fst_tbl", repository = targets::tar_option_get("repository"), error = targets::tar_option_get("error"), memory = "transient", garbage_collection = TRUE, deployment = targets::tar_option_get("deployment"), priority = targets::tar_option_get("priority"), resources = targets::tar_option_get("resources"), storage = targets::tar_option_get("storage"), retrieval = targets::tar_option_get("retrieval"), cue = targets::tar_option_get("cue"), description = targets::tar_option_get("description") )
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The algorithm. Either |
iter |
(positive integer) The maximum number of iterations. |
grad_samples |
(positive integer) The number of samples for Monte Carlo estimate of gradients. |
elbo_samples |
(positive integer) The number of samples for Monte Carlo estimate of ELBO (objective function). |
eta |
(positive real) The step size weighting parameter for adaptive step size sequence. |
adapt_engaged |
(logical) Do warmup adaptation? |
adapt_iter |
(positive integer) The maximum number of adaptation iterations. |
tol_rel_obj |
(positive real) Convergence tolerance on the relative norm of the objective. |
eval_elbo |
(positive integer) Evaluate ELBO every Nth iteration. |
output_samples |
(positive integer) Use |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
transform |
Symbol or |
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Draws could take up a lot of storage. If storage becomes
excessive, please consider thinning the draws or using
tar_stan_vb_rep_summary()
instead.
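One way to thin draws is a small function that keeps every k-th row of a data frame of draws. The sketch below is base R only. The function name thin_draws() and its by argument are hypothetical, and the (data, draws) signature is an assumption about what the transform argument expects, since its description is truncated in the argument table above; check the package documentation before relying on it.

```r
# Hypothetical thinning function. The (data, draws) signature is an
# assumption about the `transform` argument, not confirmed by this page.
thin_draws <- function(data, draws, by = 10L) {
  # Keep rows 1, 1 + by, 1 + 2 * by, ...; works for empty data frames too.
  keep <- (seq_len(nrow(draws)) - 1L) %% by == 0L
  draws[keep, , drop = FALSE]
}

# Toy stand-in for a data frame of draws with one row per draw.
draws <- data.frame(beta = seq_len(100) / 100, .draw = seq_len(100))
thinned <- thin_draws(data = list(), draws = draws)
nrow(thinned)
```

If the assumption about the signature holds, a function like this would be defined in _targets.R and supplied as transform = thin_draws, so that each branch stores only the thinned draws.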
Most of the arguments are passed to the $compile()
and $variational()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_vb_rep_draws()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_vb_rep_draws(name = x, stan_files = "y.stan")
are as follows.
x_file_y: reproducibly track the Stan model file. Returns a character vector with paths to the model file and compiled executable.
x_lines_y: read the Stan model file for safe transport to parallel workers. Omitted if compile = "original". Returns a character vector of lines in the model file.
x_data: use dynamic branching to generate multiple datasets by repeatedly running the R expression in the data argument. Each dynamic branch returns a batch of Stan data lists that x_y supplies to the model.
x_y: dynamic branching target to run variational Bayes once per dataset. Each dynamic branch returns a tidy data frame of draws corresponding to a batch of Stan data from x_data.
x: combine all branches of x_y into a single non-dynamic target. Suppressed if combine is FALSE. Returns a long tidy data frame of draws.
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other variational Bayes: tar_stan_vb(), tar_stan_vb_rep_summary()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_vb_rep_draws(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}
tar_stan_vb_rep_summary()
creates targets
to run variational Bayes multiple times and
save only the summary output from each run.
tar_stan_vb_rep_summary(
  name,
  stan_files,
  data = list(),
  batches = 1L,
  reps = 1L,
  combine = TRUE,
  compile = c("original", "copy"),
  quiet = TRUE,
  stdout = NULL,
  stderr = NULL,
  dir = NULL,
  pedantic = FALSE,
  include_paths = NULL,
  cpp_options = list(),
  stanc_options = list(),
  force_recompile = FALSE,
  seed = NULL,
  refresh = NULL,
  init = NULL,
  save_latent_dynamics = FALSE,
  output_dir = NULL,
  algorithm = NULL,
  iter = NULL,
  grad_samples = NULL,
  elbo_samples = NULL,
  eta = NULL,
  adapt_engaged = NULL,
  adapt_iter = NULL,
  tol_rel_obj = NULL,
  eval_elbo = NULL,
  output_samples = NULL,
  sig_figs = NULL,
  data_copy = character(0),
  variables = NULL,
  summaries = list(),
  summary_args = list(),
  tidy_eval = targets::tar_option_get("tidy_eval"),
  packages = targets::tar_option_get("packages"),
  library = targets::tar_option_get("library"),
  format = "qs",
  format_df = "fst_tbl",
  repository = targets::tar_option_get("repository"),
  error = targets::tar_option_get("error"),
  memory = targets::tar_option_get("memory"),
  garbage_collection = targets::tar_option_get("garbage_collection"),
  deployment = targets::tar_option_get("deployment"),
  priority = targets::tar_option_get("priority"),
  resources = targets::tar_option_get("resources"),
  storage = targets::tar_option_get("storage"),
  retrieval = targets::tar_option_get("retrieval"),
  cue = targets::tar_option_get("cue"),
  description = targets::tar_option_get("description")
)
name |
Symbol, base name for the collection of targets. Serves as a prefix for target names. |
stan_files |
Character vector of paths to known existing Stan model files created before running the pipeline. |
data |
(multiple options) The data to use for the variables specified in the data block of the Stan program. One of the following:
|
batches |
Number of batches. Each batch is a sequence of branch targets containing multiple reps. Each rep generates a dataset and runs the model on it. |
reps |
Number of replications per batch. |
combine |
Logical, whether to create a target to combine all the model results into a single data frame downstream. Convenient, but duplicates data. |
compile |
(logical) Do compilation? The default is |
quiet |
(logical) Should the verbose output from CmdStan during
compilation be suppressed? The default is |
stdout |
Character of length 1, file path to write the stdout stream
of the model when it runs. Set to |
stderr |
Character of length 1, file path to write the stderr stream
of the model when it runs. Set to |
dir |
(string) The path to the directory in which to store the CmdStan
executable (or |
pedantic |
(logical) Should pedantic mode be turned on? The default is
|
include_paths |
(character vector) Paths to directories where Stan
should look for files specified in |
cpp_options |
(list) Any makefile options to be used when compiling the
model ( |
stanc_options |
(list) Any Stan-to-C++ transpiler options to be used
when compiling the model. See the Examples section below as well as the
|
force_recompile |
(logical) Should the model be recompiled even if it was
not modified since last compiled. The default is |
seed |
(positive integer(s)) A seed for the (P)RNG to pass to CmdStan.
In the case of multi-chain sampling the single |
refresh |
(non-negative integer) The number of iterations between
printed screen updates. If |
init |
(multiple options) The initialization method to use for the variables declared in the parameters block of the Stan program. One of the following:
|
save_latent_dynamics |
(logical) Should auxiliary diagnostic information
about the latent dynamics be written to temporary diagnostic CSV files?
This argument replaces CmdStan's |
output_dir |
(string) A path to a directory where CmdStan should write
its output CSV files. For interactive use this can typically be left at
|
algorithm |
(string) The algorithm. Either |
iter |
(positive integer) The maximum number of iterations. |
grad_samples |
(positive integer) The number of samples for Monte Carlo estimate of gradients. |
elbo_samples |
(positive integer) The number of samples for Monte Carlo estimate of ELBO (objective function). |
eta |
(positive real) The step size weighting parameter for adaptive step size sequence. |
adapt_engaged |
(logical) Do warmup adaptation? |
adapt_iter |
(positive integer) The maximum number of adaptation iterations. |
tol_rel_obj |
(positive real) Convergence tolerance on the relative norm of the objective. |
eval_elbo |
(positive integer) Evaluate ELBO every Nth iteration. |
output_samples |
(positive integer) Use |
sig_figs |
(positive integer) The number of significant figures used
when storing the output values. By default, CmdStan represents the output
values with 6 significant figures. The upper limit for |
data_copy |
Character vector of names of scalars in |
variables |
(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.
|
summaries |
Optional list of summary functions passed to |
summary_args |
Optional list of summary function arguments passed to
|
tidy_eval |
Logical, whether to enable tidy evaluation
when interpreting |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Character of length 1, storage format of the data frame
of posterior summaries. We recommend efficient data frame formats
such as |
format_df |
Character of length 1, storage format of the data frame
targets such as posterior draws. We recommend efficient data frame formats
such as |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Most of the arguments are passed to the $compile()
and $variational()
methods of the CmdStanModel
class. If you
previously compiled the model in an upstream tar_stan_compile()
target, then the model should not recompile.
tar_stan_vb_rep_summary()
returns a
list of target objects. See the "Target objects" section for
background.
The target names use the name
argument as a prefix, and the individual
elements of stan_files
appear in the suffixes where applicable.
As an example, the specific target objects returned by
tar_stan_vb_rep_summary(name = x, stan_files = "y.stan")
are as follows.
x_file_y
: reproducibly track the Stan model file. Returns
a character vector with paths to
the model file and compiled executable.
x_lines_y
: read the Stan model file for safe transport to
parallel workers. Omitted if compile = "original"
.
Returns a character vector of lines in the model file.
x_data
: use dynamic branching to generate multiple datasets
by repeatedly running the R expression in the data
argument.
Each dynamic branch returns a batch of Stan data lists that x_y
supplies to the model.
x_y
: dynamic branching target to run variational Bayes
once per dataset.
Each dynamic branch returns a tidy data frame of summaries
corresponding to a batch of Stan data from x_data
.
x
: combine all branches of x_y
into a single non-dynamic target.
Suppressed if combine
is FALSE
.
Returns a long tidy data frame of summaries.
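The naming scheme listed above can be sketched in base R. expected_targets() is a hypothetical helper written only for illustration; it is not a stantargets function, it handles a single Stan file as in the example, and it simply mirrors the prefixes and suffixes described in the list.

```r
# Hypothetical helper (not part of stantargets): build the target names
# described above for one Stan model file.
expected_targets <- function(name, stan_file, compile = "original",
                             combine = TRUE) {
  # Model label: file name without directory or extension,
  # for example "y" for "y.stan".
  model <- tools::file_path_sans_ext(basename(stan_file))
  out <- paste0(name, "_file_", model)
  # The lines target is omitted when compile = "original".
  if (compile == "copy") {
    out <- c(out, paste0(name, "_lines_", model))
  }
  out <- c(out, paste0(name, "_data"), paste0(name, "_", model))
  # The combined target is suppressed when combine is FALSE.
  if (combine) {
    out <- c(out, name)
  }
  out
}

default_names <- expected_targets("x", "y.stan")
copy_names <- expected_targets("x", "y.stan", compile = "copy", combine = FALSE)
print(default_names)
print(copy_names)
```

In a real pipeline, the names of the targets actually created can be checked with targets::tar_manifest() instead of a helper like this one.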
Rep-specific random number generator seeds for the data and models
are automatically set based on the seed
argument, batch, rep,
parent target name, and tar_option_get("seed")
. This ensures
the rep-specific seeds do not change when you change the batching
configuration (e.g. 40 batches of 10 reps each vs 20 batches of 20
reps each). Each data seed is in the .seed
list element of the output,
and each Stan seed is in the .seed column of each Stan model output.
Most stantargets
functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one, which generate targets), and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
Other variational Bayes:
tar_stan_vb()
,
tar_stan_vb_rep_draws()
if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      library(stantargets)
      # Do not use temporary storage for stan files in real projects
      # or else your targets will always rerun.
      path <- tempfile(pattern = "", fileext = ".stan")
      tar_stan_example_file(path = path)
      list(
        tar_stan_vb_rep_summary(
          your_model,
          stan_files = path,
          data = tar_stan_example_data(),
          batches = 2,
          reps = 2,
          stdout = R.utils::nullfile(),
          stderr = R.utils::nullfile()
        )
      )
    }, ask = FALSE)
    targets::tar_make()
  })
}