outliers_by_pool_fragments {ISAnalytics}    R Documentation

Identify and flag outliers based on pool fragments.

Description

[Experimental] Identify and flag outliers based on pool fragments.

Usage

outliers_by_pool_fragments(
  metadata,
  key = "BARCODE_MUX",
  outlier_p_value_threshold = 0.05,
  normality_test = FALSE,
  normality_p_value_threshold = 0.05,
  transform_log2 = TRUE,
  per_pool_test = TRUE,
  pool_col = "PoolID",
  min_samples_per_pool = 5,
  flag_logic = "AND",
  keep_calc_cols = TRUE,
  save_widget_path = NULL
)

Arguments

metadata

The metadata data frame

key

A character vector of numeric column names

outlier_p_value_threshold

The p value threshold for a read to be considered an outlier

normality_test

Perform a normality test? Normality is assessed for each column in the key using the Shapiro-Wilk test. If the values do not follow a normal distribution, the other calculations are skipped

normality_p_value_threshold

The p value threshold for the normality test - relevant only if normality_test = TRUE

transform_log2

Perform a log2 transformation on the values prior to the actual calculations?

per_pool_test

Perform the test for each pool?

pool_col

A character vector of the names of the columns that uniquely identify a pool

min_samples_per_pool

The minimum number of samples that a pool needs to contain in order to be processed - relevant only if per_pool_test = TRUE

flag_logic

A character vector of logic operators used to build a global flag formula - only relevant if the key contains more than one column. All operators must be chosen among: AND, OR, XOR, NAND, NOR, XNOR

keep_calc_cols

Keep the calculation columns in the output data frame?

save_widget_path

Either NULL or a string containing the path on disk where the report should be saved

Details

For each column in the key, this test calculates a z-score of the values (zscore) and a probability derived from the Student's t distribution (tdist).

Optionally, the test can be performed for each pool, and a normality test can be run prior to the actual calculations. Samples are flagged if this condition holds: tdist < outlier_p_value_threshold & zscore < 0
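The per-column rule (a low tdist together with a negative zscore) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. The helpers flag_column and flag_by_pool, the toy column reads, the Grubbs-style t statistic, and the two-sided tail probability are all assumptions made for the example. Pools with fewer than min_samples rows are simply left unflagged here, and the input is assumed to contain positive values with no missing data.

```r
# Hypothetical helper, not part of ISAnalytics: flags low-side outliers
# in a single numeric vector (assumes positive values and no NAs).
flag_column <- function(x, p_threshold = 0.05, transform_log2 = TRUE) {
  if (transform_log2) x <- log2(x)
  n <- length(x)
  z <- (x - mean(x)) / stats::sd(x)
  # Assumed Grubbs-style conversion of the z-score into a t statistic
  t_stat <- sqrt((n * (n - 2) * z^2) / ((n - 1)^2 - n * z^2))
  # Assumed two-sided tail probability with n - 2 degrees of freedom
  tdist <- 2 * stats::pt(t_stat, df = n - 2, lower.tail = FALSE)
  data.frame(zscore = z, tdist = tdist, flag = tdist < p_threshold & z < 0)
}

# Hypothetical per-pool wrapper: pools that are too small are skipped
# and their samples stay unflagged (an assumption of this sketch).
flag_by_pool <- function(df, col, pool_col = "PoolID", min_samples = 5) {
  flagged <- rep(FALSE, nrow(df))
  for (idx in split(seq_len(nrow(df)), df[[pool_col]])) {
    if (length(idx) >= min_samples) {
      flagged[idx] <- flag_column(df[[col]][idx])$flag
    }
  }
  flagged
}

# Toy data: POOL1 has one very low value, POOL2 is too small to be tested
toy <- data.frame(
  PoolID = c(rep("POOL1", 7), rep("POOL2", 3)),
  reads = c(1000, 1100, 950, 1050, 980, 1020, 10, 500, 520, 5)
)
toy$flag <- flag_by_pool(toy, "reads")
```

In this toy data only the seventh row is flagged: it is far below the other POOL1 values, while the low value in POOL2 is ignored because the pool has fewer than five samples. Because the condition requires zscore < 0, unusually high values are never flagged.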

If the key contains more than one column, an additional flag logic can be specified to combine the results. For example, suppose the key contains the names of two columns, X and Y: key = c("X", "Y"). If we specify the argument flag_logic = "AND", the reads will be flagged based on this global condition:

(tdist_X < outlier_p_value_threshold & zscore_X < 0) AND (tdist_Y < outlier_p_value_threshold & zscore_Y < 0)

The user can specify one or more logical operators that will be applied in sequence.
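Applying operators in sequence can be pictured as a left-to-right fold over the per-column flags. The sketch below is illustrative only: logic_ops and combine_flags are hypothetical names, and both the left-to-right evaluation order and the recycling of a single operator across all columns are assumptions, not documented behaviour.

```r
# Hypothetical lookup table mapping operator names to vectorised functions
logic_ops <- list(
  AND  = function(a, b) a & b,
  OR   = function(a, b) a | b,
  XOR  = function(a, b) xor(a, b),
  NAND = function(a, b) !(a & b),
  NOR  = function(a, b) !(a | b),
  XNOR = function(a, b) !xor(a, b)
)

# Hypothetical helper: combine a list of logical vectors (one per key
# column) by applying the operators from left to right.
combine_flags <- function(flags, flag_logic = "AND") {
  stopifnot(length(flags) >= 1, all(flag_logic %in% names(logic_ops)))
  # Assumption: a single operator is recycled across all column pairs
  ops <- rep_len(flag_logic, length(flags) - 1)
  result <- flags[[1]]
  for (i in seq_along(ops)) {
    result <- logic_ops[[ops[i]]](result, flags[[i + 1]])
  }
  result
}

# Per-column flags for three hypothetical key columns X, Y and Z
flag_X <- c(TRUE, TRUE, FALSE, FALSE)
flag_Y <- c(TRUE, FALSE, TRUE, FALSE)
flag_Z <- c(FALSE, FALSE, FALSE, TRUE)

both   <- combine_flags(list(flag_X, flag_Y), "AND")
either <- combine_flags(list(flag_X, flag_Y), "XOR")
# Evaluated as (flag_X AND flag_Y) OR flag_Z
chain  <- combine_flags(list(flag_X, flag_Y, flag_Z), c("AND", "OR"))
```

With a key of length three, two operators are needed, and under the assumed order the first one combines X and Y before the second one brings in Z.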

Value

A data frame of metadata with the column to_remove

See Also

Other Outlier tests: available_outlier_tests()

Examples

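# Disable widget reports for this example, saving the previous options
op <- options(ISAnalytics.widgets = FALSE)

# Import the example association file shipped with the package
path_AF <- system.file("extdata", "ex_association_file.tsv",
    package = "ISAnalytics"
)
association_file <- import_association_file(path_AF, root = NULL,
    dates_format = "dmy"
)
# Flag outliers using the VCN column as key
filtered_af <- outliers_by_pool_fragments(association_file, key = "VCN")
# Restore the previous options
options(op)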

[Package ISAnalytics version 1.2.1 Index]