HSC_population_size_estimate {ISAnalytics} | R Documentation |
Hematopoietic stem cells population size estimate with capture-recapture
models.
HSC_population_size_estimate(
  x,
  metadata,
  stable_timepoints = NULL,
  aggregation_key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  blood_lineages = blood_lineages_default(),
  timepoint_column = "TimePoint",
  seqCount_column = "seqCount_sum",
  fragmentEstimate_column = "fragmentEstimate_sum",
  seqCount_threshold = 3,
  fragmentEstimate_threshold = 3,
  nIS_threshold = 5,
  cell_type = "MYELOID",
  tissue_type = "PB"
)
x |
An aggregated integration matrix. See details. |
metadata |
An aggregated association file. See details. |
stable_timepoints |
A numeric vector or NULL if there are no stable time points. |
aggregation_key |
A character vector indicating the key used for aggregating x and metadata. Note that x and metadata should always be aggregated with the same key. |
blood_lineages |
A data frame containing information on the blood
lineages. Users can supply their own, provided the columns |
timepoint_column |
What is the name of the time point column to use? Note that this column must be present in the key. |
seqCount_column |
What is the name of the column in x containing the values of sequence count quantification? |
fragmentEstimate_column |
What is the name of the column in x
containing the values of fragment estimate quantification? If fragment
estimate is not present in the matrix, param should be set to |
seqCount_threshold |
A single numeric value. After re-aggregating |
fragmentEstimate_threshold |
A single numeric value. Threshold value for fragment estimate, see details. |
nIS_threshold |
A single numeric value. If a group (row) in the metadata data frame has a count of distinct integration sites strictly greater than this number, it is kept; otherwise it is discarded. |
cell_type |
The cell types to include in the models. Note that the matching is case-insensitive. |
tissue_type |
The tissue types to include in the models. Note that the matching is case-insensitive. |
A data frame with the results of the estimates.
Both x and metadata should be supplied to the function in aggregated format (ideally through the use of aggregate_metadata and aggregate_values_by_key).
Note that the aggregation_key, aka the vector of column names used for aggregation, must contain at least the columns SubjectID, CellMarker, Tissue and a time point column (the user can specify the name of the column in the argument timepoint_column).
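The key requirement can be checked before calling the function. The following is a minimal sketch in base R, not the package's own validation; the variable names required, missing_cols and key_is_valid are hypothetical and used only for illustration.

```r
# Hypothetical pre-flight check: does the aggregation key contain the
# columns that the details section lists as mandatory?
aggregation_key <- c("SubjectID", "CellMarker", "Tissue", "TimePoint")
timepoint_column <- "TimePoint"

# Mandatory columns: the three fixed names plus the chosen time point column
required <- c("SubjectID", "CellMarker", "Tissue", timepoint_column)

# Any required column that is absent from the key
missing_cols <- setdiff(required, aggregation_key)
key_is_valid <- length(missing_cols) == 0

if (!key_is_valid) {
  stop("Aggregation key is missing: ", paste(missing_cols, collapse = ", "))
}
```

With a custom time point column, timepoint_column would be changed accordingly and the same name would have to appear in aggregation_key.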
If stable_timepoints is a vector with length > 1, the function will look for the first available stable time point and slice the data from that time point onward. If NULL is supplied instead, it means there are no stable time points available. Note that time points equal to 0 are ALWAYS discarded. Also, to be included in the analysis, a group must have at least 2 distinct non-zero time points.
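The time point rules above can be illustrated on toy data. This is a sketch in base R under stated assumptions, not the package's internal code: the toy data frame is made up, grouping is done on SubjectID only for brevity, and dropping a group that has no stable time point available is an assumption, since the text does not say what happens in that case.

```r
# Made-up toy data: two subjects with different sets of time points
toy <- data.frame(
  SubjectID = c("P1", "P1", "P1", "P1", "P2", "P2"),
  TimePoint = c(0, 30, 180, 360, 0, 90)
)
stable_timepoints <- c(90, 180, 360)

groups <- split(toy, toy$SubjectID)
sliced <- lapply(groups, function(g) {
  # Time points equal to 0 are always discarded
  g <- g[g$TimePoint != 0, , drop = FALSE]
  # A group needs at least 2 distinct non-zero time points
  if (length(unique(g$TimePoint)) < 2) {
    return(NULL)
  }
  # First stable time point actually present in this group
  available <- sort(intersect(stable_timepoints, g$TimePoint))
  if (length(available) == 0) {
    return(NULL) # assumption: no stable time point means nothing to slice
  }
  # Keep data from the first available stable time point onward
  g[g$TimePoint >= available[1], , drop = FALSE]
})
sliced <- do.call(rbind, Filter(Negate(is.null), sliced))
```

In this toy case P1 keeps time points 180 and 360 (90 is not available for it, so 180 is the first stable one), while P2 is dropped because it has a single non-zero time point.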
If fragment estimate is present in the input matrix, the filtering logic changes slightly: rows in the original matrix are kept if the sequence count value is greater than or equal to the seqCount_threshold AND the fragment estimate value is greater than or equal to the fragmentEstimate_threshold IF PRESENT (non-zero value). This means that for rows that lack a fragment estimate, the filter is applied to the sequence count only. To disable the combined filtering with fragment estimate, set fragmentEstimate_threshold = 0.
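The combined filter described above can be written as a single logical condition. The sketch below uses base R on made-up values and is not the package's internal code; the column names follow the defaults in the usage section, and treating NA the same as a zero (missing) fragment estimate is an assumption.

```r
# Made-up toy matrix with the default quantification column names
toy <- data.frame(
  seqCount_sum = c(10, 2, 5, 8),
  fragmentEstimate_sum = c(4, 6, 0, 1)
)
seqCount_threshold <- 3
fragmentEstimate_threshold <- 3

fe <- toy$fragmentEstimate_sum
# Fragment estimate counts as "not present" when zero (NA is an assumption)
fe_missing <- is.na(fe) | fe == 0

# Keep a row if sequence count passes AND fragment estimate either is
# missing or passes its own threshold
keep <- toy$seqCount_sum >= seqCount_threshold &
  (fe_missing | fe >= fragmentEstimate_threshold)
filtered <- toy[keep, ]
```

Here the first row passes both thresholds, the second fails on sequence count, the third is kept because its fragment estimate is missing, and the fourth fails on fragment estimate. Setting fragmentEstimate_threshold to 0 makes the second half of the condition always true, which reduces the filter to sequence count alone.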
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
aggreg <- aggregate_values_by_key(
    x = integration_matrices,
    association_file = association_file,
    value_cols = c("seqCount", "fragmentEstimate")
)
aggreg_meta <- aggregate_metadata(association_file = association_file)
estimate <- HSC_population_size_estimate(
    x = aggreg,
    metadata = aggreg_meta,
    fragmentEstimate_column = NULL,
    stable_timepoints = c(90, 180, 360),
    cell_type = "Other"
)