This vignette demonstrates how a user can edit, run, and stop a Terra / AnVIL workflow from within their R session. The configuration of the workflow can be retrieved and edited, and the new configuration can then be sent back to the Terra / AnVIL workspace for future use. With the new configuration defined, the user can run the workflow and stop any running jobs.
AnVIL 1.14.2
Install the AnVIL package with
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos = "https://cran.r-project.org")
BiocManager::install("AnVIL")
Once installed, load the package with
library(AnVIL)
The first step will be to define the namespace (billing project) and name of the workspace to be used with the functions. In our case we will be using the Bioconductor AnVIL namespace and a DESeq2 workflow as the intended workspace.
avworkspace("bioconductor-rpci-anvil/Bioconductor-Workflow-DESeq2")
Each workspace can have zero or more workflows. Like workspaces, each workflow has a name and a namespace. Discover the workflows available in a workspace with
avworkflows()
From the table returned by avworkflows(), record the namespace and name of the workflow of interest using avworkflow().
avworkflow("bioconductor-rpci-anvil/AnVILBulkRNASeq")
Each workflow defines inputs, outputs, and the code it executes. This workflow ‘configuration’ can be retrieved with avworkflow_configuration_get().
config <- avworkflow_configuration_get()
config
This function uses the workspace namespace, workspace name, workflow namespace, and workflow name we recorded above with avworkspace() and avworkflow().
The configuration contains a lot of information, but the only components of interest to most users are the inputs and outputs. In our case the inputs and outputs are pre-defined, so we don’t have to do anything to them. For some workflows, however, the inputs or outputs may be blank and need to be defined by the user. We will change one of our input values to show how this is done.
Two functions help users easily see the content of the inputs and outputs: avworkflow_configuration_inputs() and avworkflow_configuration_outputs(). These functions display the information in a tibble structure, which most users are likely familiar with.
inputs <- avworkflow_configuration_inputs(config)
inputs
outputs <- avworkflow_configuration_outputs(config)
outputs
Let’s change the salmon.transcriptome_index_name field; this is an arbitrary string identifier in our workflow.
library(dplyr)  # provides mutate() and, below, pull()

inputs <-
    inputs |>
    mutate(
        attribute = ifelse(
            name == "salmon.transcriptome_index_name",
            '"new_index_name"',
            attribute
        )
    )
inputs
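One caveat of the mutate() / ifelse() approach above is that a misspelled input name silently leaves the table unchanged. The following is a minimal sketch of a guard against that, assuming only that the inputs table has the name and attribute columns used above. The helper set_input() is hypothetical (it is not part of AnVIL), and the mock table, including its second row, is made up to stand in for the result of avworkflow_configuration_inputs(config).

```r
# Mock inputs table; the real one comes from avworkflow_configuration_inputs().
# Only the 'name' and 'attribute' columns used in this vignette are modeled,
# and the second row is a made-up placeholder.
inputs <- data.frame(
    name = c("salmon.transcriptome_index_name", "salmon.hypothetical_input"),
    attribute = c('"old_index_name"', "this.placeholder"),
    stringsAsFactors = FALSE
)

# Hypothetical helper (not an AnVIL function): set one input by name,
# failing loudly unless the name occurs exactly once.
set_input <- function(inputs, input_name, value) {
    idx <- which(inputs$name == input_name)
    if (length(idx) != 1L)
        stop(
            "expected exactly one input named '", input_name,
            "', found ", length(idx)
        )
    inputs$attribute[idx] <- value
    inputs
}

updated <- set_input(
    inputs, "salmon.transcriptome_index_name", '"new_index_name"'
)
updated
```

Because the helper only indexes columns by name, the same function should also work on the tibble returned for a real workflow.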
Since the inputs have been modified, we need to put this information into the configuration of the workflow. We can do this with avworkflow_configuration_update(). By default, this function takes the inputs and outputs from the original configuration, so a component that was not changed (like the outputs in our example) does not need to be supplied.
new_config <- avworkflow_configuration_update(config, inputs)
new_config
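The "defaults come from the original configuration" behavior described above can be pictured with ordinary R default arguments. This is a toy sketch only: update_config() and the list-based config are hypothetical stand-ins, not AnVIL's implementation or its configuration format.

```r
# Toy configuration: a plain list standing in for the object returned by
# avworkflow_configuration_get(). Field names and values are illustrative.
config <- list(
    name = "toy-workflow",
    inputs = c(index_name = '"old_index_name"'),
    outputs = c(result = "this.result")
)

# Hypothetical updater (not AnVIL's implementation): 'inputs' and 'outputs'
# default to the values already stored in 'config', so callers supply only
# the component they changed.
update_config <- function(config,
                          inputs = config$inputs,
                          outputs = config$outputs) {
    # Evaluate both (possibly default) arguments before modifying 'config'.
    force(inputs)
    force(outputs)
    config$inputs <- inputs
    config$outputs <- outputs
    config
}

new_config <- update_config(
    config,
    inputs = c(index_name = '"new_index_name"')
)
new_config$inputs   # the replaced inputs
new_config$outputs  # carried over unchanged from 'config'
```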
Use avworkflow_configuration_set() to permanently update the workflow to the new parameter values.
avworkflow_configuration_set(new_config)
Actually, the previous command only validates new_config; to update the configuration in AnVIL (i.e., to replace the values in the workspace workflow graphical user interface), add the argument dry = FALSE.
## avworkflow_configuration_set(new_config, dry = FALSE)
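The dry-run convention used here (validate by default, act only when dry = FALSE) can be sketched with a small stand-alone function. Everything in this block is hypothetical: set_config() and the store environment merely imitate the pattern and do not call or reproduce AnVIL.

```r
# 'store' is an environment standing in for the remote workspace.
store <- new.env()
store$config <- list(index_name = "old")

# Hypothetical function illustrating a dry-run guard.
set_config <- function(new_config, dry = TRUE) {
    # Validation happens on every call, dry or not.
    stopifnot(is.list(new_config), !is.null(names(new_config)))
    if (dry) {
        message("'dry = TRUE'; configuration validated but not updated")
        return(invisible(FALSE))
    }
    store$config <- new_config
    invisible(TRUE)
}

set_config(list(index_name = "new"))               # validates only
store$config$index_name                            # still "old"
set_config(list(index_name = "new"), dry = FALSE)  # applies the change
store$config$index_name                            # now "new"
```

Making the safe behavior the default means a call copied from documentation cannot modify shared state by accident; the change has to be requested explicitly.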
To finally run the new workflow we need to know the name of the data set to be used in the workflow. This can be discovered by looking at the table of interest and using the name of the data set; here we take the first identifier in the participant_set table.
entityName <- avtable("participant_set") |>
    pull(participant_set_id) |>
    head(1)
avworkflow_run(new_config, entityName)
Again, actually running the new configuration requires the argument dry = FALSE.
## avworkflow_run(new_config, entityName, dry = FALSE)
The config argument is used only to set the rootEntityType and the workflow method name and namespace; other components of config are ignored (they are read by Terra / AnVIL from the values updated with avworkflow_configuration_set()).
We can see that the workflow is running by using the avworkflow_jobs
function. The elements of the table are ordered chronologically, with
the most recent submission (most likely the job we just started!)
listed first.
avworkflow_jobs()
Use avworkflow_stop() to stop a currently running workflow. This will change the status of the job, reported by avworkflow_jobs(), from ‘Submitted’ to ‘Aborted’.
avworkflow_stop() # dry = FALSE to stop
avworkflow_jobs()
Workflows can generate a large number of intermediate files (including diagnostic logs), as well as final outputs for more interactive analysis. Use the submissionId from avworkflow_jobs() to discover files produced by a submission; the default behavior lists files produced by the most recent job.
submissionId <- "fb8e35b7-df5d-49e6-affa-9893aaeebf37"
avworkflow_files(submissionId)
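Rather than hard-coding the identifier, the submissionId could be pulled from the jobs table. The sketch below uses a mock table in place of avworkflow_jobs(): the submissionId and status columns follow the text above, while the submissionDate column name, the dates, and the two placeholder identifiers are assumptions made for illustration. latest_submission() is a hypothetical helper, not an AnVIL function.

```r
# Mock jobs table; the real one comes from avworkflow_jobs().
# 'submissionDate' and all values other than the first id are made up.
jobs <- data.frame(
    submissionId = c(
        "fb8e35b7-df5d-49e6-affa-9893aaeebf37",
        "00000000-0000-0000-0000-000000000001",
        "00000000-0000-0000-0000-000000000002"
    ),
    submissionDate = as.POSIXct(
        c("2021-01-03 12:00:00", "2021-01-01 12:00:00", "2021-01-02 12:00:00"),
        tz = "UTC"
    ),
    status = c("Aborted", "Submitted", "Aborted"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: id of the most recent submission, optionally
# restricted to one status. Sorts explicitly rather than relying on row order.
latest_submission <- function(jobs, status = NULL) {
    if (!is.null(status))
        jobs <- jobs[jobs$status == status, , drop = FALSE]
    if (nrow(jobs) == 0L)
        stop("no matching submissions")
    jobs <- jobs[order(jobs$submissionDate, decreasing = TRUE), , drop = FALSE]
    jobs$submissionId[1]
}

submissionId <- latest_submission(jobs)
submissionId
```

With a real jobs table, the resulting submissionId could then be passed to avworkflow_files() as shown above.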
Workflow files are stored in the workspace bucket. The files can be localized to the persistent disk of the current runtime using avworkflow_localize(); the default is again to localize files from the most recently submitted job. Use type= to influence which files (‘control’, e.g., log files; ‘output’; or ‘all’) are localized.
avworkflow_localize(
    submissionId,
    type = "output"
    ## dry = FALSE to localize
)
sessionInfo()