BiodbMain-class {biodb} | R Documentation |
The main class of the biodb package.
In order to use the biodb package, you first need to create an instance of this class.
The constructor takes a single argument, autoloadExtraPkgs, to enable (TRUE or default) or disable (FALSE) autoloading of extra biodb packages.
Once the instance is created, some other important classes (BiodbFactory, BiodbPersistentCache, BiodbConfig, ...) are instantiated (just once) and their instances are later accessible through get*() methods.
addColsToDataframe(x, id.col, db, fields, limit = 3, prefix = ""):
Using the entry IDs found in the column 'id.col' of 'x', appends new columns containing the values of the requested entry fields.
x: A data frame containing at least one column with Biodb entry IDs identified by the parameter 'id.col'.
id.col: The name of the column containing IDs inside the input data frame.
db: The biodb database name for the entry IDs, or a connector ID, as a single character value.
fields: A character vector containing entry fields to add.
limit: The maximum number of field values to write into new columns. Used for fields that can contain more than one value. Set it to 0 to get all values.
prefix: Insert a prefix at the start of all field names.
Returned value: A data frame containing 'x' and new columns appended for the fields requested.
addObservers(observers):
Adds new observers. Observers are called each time an event occurs. This is how biodb provides feedback about what is going on inside its code.
observers: Either a BiodbObserver instance or a list of BiodbObserver instances.
Returned value: None.
collapseRows(x, sep = "|", cols = 1L):
Collapses rows of a data frame by looking for duplicated values in the reference columns (parameter 'cols'). The values in the reference columns are expected to be ordered inside the data frame, in the sense that all duplicated values directly follow the original value. For each run of rows sharing the same reference values, the other columns are examined, and in every column that contains differing values those values are concatenated.
x: A data frame.
cols: The indices or the names of the columns used as reference.
sep: The separator to use when concatenating values in collapsed rows.
Returned value: A data frame, with rows collapsed.
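The collapsing rule described above can be illustrated with a small base R sketch. This is not the biodb implementation: the helper name collapse_rows_sketch is hypothetical, and the sketch assumes character columns and that duplicated reference values sit on consecutive rows, as the description requires.

```r
# Hypothetical helper illustrating the collapsing rule; NOT biodb's own code.
collapse_rows_sketch <- function(x, sep = "|", cols = 1L) {
  if (nrow(x) == 0L) return(x)

  # Build one key per row from the reference columns.
  key <- do.call(paste, c(unname(as.list(x[cols])), sep = "\r"))

  # Start a new group each time the key differs from the previous row.
  group <- cumsum(c(TRUE, key[-1L] != key[-length(key)]))

  pieces <- lapply(split(x, group), function(block) {
    out <- block[1L, , drop = FALSE]
    for (col in names(block)) {
      vals <- unique(block[[col]])
      # Concatenate only where the column holds differing values.
      if (length(vals) > 1L) out[[col]] <- paste(vals, collapse = sep)
    }
    out
  })

  res <- do.call(rbind, unname(pieces))
  rownames(res) <- NULL
  res
}

df <- data.frame(
  id   = c("A", "A", "B"),
  name = c("x", "y", "z"),
  db   = c("d1", "d1", "d2"),
  stringsAsFactors = FALSE
)
res <- collapse_rows_sketch(df)
print(res)
```

In this toy input the two rows with id "A" are merged into one row whose name column holds both values joined by the separator, while the db column, identical in both rows, is left unchanged.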
computeFields(entries):
Computes missing fields in entries, for those fields that are computable.
entries: A list of BiodbEntry instances.
Returned value: None.
convertEntryIdFieldToDbClass(entry.id.field):
Gets the database class name corresponding to an entry ID field.
entry.id.field: The name of an ID field. It must end with ".id".
copyDb(conn.from, conn.to, limit = 0):
Copies all entries of a database into another database. The connector of the destination database must be editable.
conn.from: The connector of the source database to copy.
conn.to: The connector of the destination database.
limit: The number of entries of the source database to copy. If set to NULL, copy the whole database.
Returned value: None.
entriesFieldToVctOrLst(entries, field, flatten = FALSE, compute = TRUE, limit = 0, withNa = TRUE):
Extracts the value of a field from a list of entries. Returns either a vector or a list depending on the type of the field.
entries: A list of BiodbEntry instances.
field: The name of a field.
flatten: If set to TRUE and the field has a cardinality greater than one, then the values will be converted into a vector of class character in which the values of each entry are collapsed.
compute: If set to TRUE, computable fields will be output.
limit: The maximum number of values to retrieve for each entry. Set to 0 to get all values.
withNa: If set to TRUE, keep NA values. Otherwise filter out NA values in vectors.
Returned value: A vector if the field is atomic or flatten is set to TRUE, otherwise a list.
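The effect of 'flatten' and 'limit' on a multi-valued field can be pictured with a base R sketch. It is only an illustration under assumptions: the helper flatten_values_sketch is hypothetical, the ";" separator is chosen for the example (the separator biodb actually uses is not stated here), and entries without a value are represented by NULL.

```r
# Hypothetical helper; it only mimics the documented behaviour on plain lists.
flatten_values_sketch <- function(values, sep = ";", limit = 0) {
  vapply(values, function(v) {
    # limit > 0 keeps at most 'limit' values per entry; 0 keeps them all.
    if (limit > 0 && length(v) > limit) v <- v[seq_len(limit)]
    # An entry without any value yields NA.
    if (length(v) == 0L) NA_character_ else paste(v, collapse = sep)
  }, character(1))
}

# One element per entry: three values, one value, no value.
values <- list(c("a", "b", "c"), "d", NULL)

flat_all <- flatten_values_sketch(values)
flat_two <- flatten_values_sketch(values, limit = 2)
print(flat_all)
print(flat_two)
```

Without flattening, the same data would stay a list with one element per entry, which corresponds to the list return value documented above.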
entriesToDataframe(entries, only.atomic = TRUE, null.to.na = TRUE, compute = TRUE, fields = NULL, limit = 0, drop = FALSE, sort.cols = FALSE, flatten = TRUE, only.card.one = FALSE, own.id = TRUE, prefix = ""):
Converts a list of entries or a list of lists of entries (BiodbEntry objects) into a data frame.
entries: A list of BiodbEntry instances or a list of lists of BiodbEntry instances.
only.atomic: If set to TRUE, output only atomic fields, i.e. the fields whose value type is one of integer, numeric, logical or character.
null.to.na: If set to TRUE, each NULL entry in the list is converted into a row of NA values.
compute: If set to TRUE, computable fields will be output.
fields: A character vector of field names to output. The data frame output will be restricted to this list of fields.
limit: The maximum number of field values to write into new columns. Used for fields that can contain more than one value. Set it to 0 to get all values.
drop: If set to TRUE and the resulting data frame has only one column, a vector will be output instead of a data frame.
sort.cols: Sort columns in alphabetical order.
flatten: If set to TRUE, then each field with a cardinality greater than one will be converted into a vector of class character whose values are collapsed.
only.card.one: Output only fields whose cardinality is one.
own.id: If set to TRUE, includes the database ID field named '<database_name>.id', whose values are the same as those of the 'accession' field.
prefix: Insert a prefix at the start of all field names.
Returned value: A data frame containing the entries. Columns are named according to field names.
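The 'own.id' and 'prefix' parameters only affect column naming, which a base R sketch on a toy data frame can make concrete. The database name "mydb" and the prefix "ref." are hypothetical, and applying the prefix to the ID column as well is an assumption drawn from the wording "all field names".

```r
# Toy data frame standing in for the output of entriesToDataframe().
df <- data.frame(
  accession = c("1018", "1390"),
  name      = c("entry one", "entry two"),
  stringsAsFactors = FALSE
)

db_name <- "mydb"  # hypothetical database name
prefix  <- "ref."  # hypothetical prefix

# own.id = TRUE: add a '<database_name>.id' column mirroring 'accession'.
df[[paste0(db_name, ".id")]] <- df$accession

# prefix: prepend the prefix to every column name
# (assumed here to include the ID column).
names(df) <- paste0(prefix, names(df))

print(names(df))
```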
entriesToJson(entries, compute = TRUE):
Converts a list of BiodbEntry objects into JSON. Returns a vector of characters.
entries: A list of BiodbEntry instances.
compute: If set to TRUE, computable fields will be added to the JSON too.
Returned value: A list of JSON strings, the same length as the entries list.
entryIdsToDataframe(ids, db, fields = NULL, limit = 3, prefix = "", own.id = FALSE):
Constructs a data frame using entry IDs and field values of the corresponding entries.
ids: A character vector of entry IDs or a list of character vectors of entry IDs.
db: The biodb database name for the entry IDs, or a connector ID, as a single character value.
fields: A character vector containing entry fields to add.
limit: The maximum number of field values to write into new columns. Used for fields that can contain more than one value. Set it to 0 to get all values.
own.id: If set to TRUE, includes the database ID field named '<database_name>.id', whose values are the same as those of the 'accession' field.
prefix: Insert a prefix at the start of all field names.
Returned value: A data frame containing the requested field values in columns, with one entry per row, in the same order as in the 'ids' vector.
fieldIsAtomic(field):
DEPRECATED method to test if a field is an atomic field. The new method is BiodbEntryField::isVector().
getConfig():
Returns the single instance of the BiodbConfig class.
Returned value: The instance of the BiodbConfig class attached to this BiodbMain instance.
getDbsInfo():
Returns the single instance of the BiodbDbsInfo class.
Returned value: The instance of the BiodbDbsInfo class attached to this BiodbMain instance.
getEntryFields():
Returns the single instance of the BiodbEntryFields class.
Returned value: The instance of the BiodbEntryFields class attached to this BiodbMain instance.
getFactory():
Returns the single instance of the BiodbFactory class.
Returned value: The instance of the BiodbFactory class attached to this BiodbMain instance.
getFieldClass(field):
DEPRECATED method to get the class of a field. The new method is BiodbMain::getEntryFields()$get(field)$getClass().
getObservers():
Gets the list of registered observers.
Returned value: The list of registered observers.
getPersistentCache():
Returns the single instance of the BiodbPersistentCache class.
Returned value: The instance of the BiodbPersistentCache class attached to this BiodbMain instance.
getRequestScheduler():
Returns the single instance of the BiodbRequestScheduler class.
Returned value: The instance of the BiodbRequestScheduler class attached to this BiodbMain instance.
loadDefinitions(file, package = "biodb"):
Loads database and entry field definitions from a YAML file.
file: The path to a YAML file containing definitions for BiodbMain (databases, fields or configuration keys).
package: The package to which the new definitions belong.
Returned value: None.
saveEntriesAsJson(entries, files, compute = TRUE):
Saves a list of entries in JSON format. Each entry will be saved in a separate file.
entries: A list of BiodbEntry instances.
files: A character vector of file paths, the same length as the entries list.
compute: If set to TRUE, computable fields will be saved too.
Returned value: None.
terminate():
Closes the BiodbMain instance. Call this method when you are done with your BiodbMain instance.
Returned value: None.
BiodbFactory, BiodbPersistentCache, BiodbConfig, BiodbObserver, BiodbEntryFields, BiodbDbsInfo.
# Create an instance:
mybiodb <- biodb::newInst()

# Get the factory instance
fact <- mybiodb$getFactory()

# Terminate instance.
mybiodb$terminate()
mybiodb <- NULL