.corr_distribution {proBatch}    R Documentation

Calculate the correlation of the data matrix and the correlation distribution for all pairs of replicated samples

Description

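Calculates the sample-to-sample correlation of data_matrix and returns the distribution of correlations for all pairs of replicated samples, annotated with the biospecimen and batch information of each sample in the pair.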
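As a rough illustration of the first step, the base-R sketch below computes the correlation matrix of a features-by-samples matrix and reshapes it into one row per sample pair. The helper name corr_pairs_sketch is hypothetical and not part of proBatch; using Pearson correlation with pairwise-complete observations, and keeping each unordered pair only once, are assumptions rather than documented behaviour of .corr_distribution.

```r
# Hypothetical helper (not part of proBatch): long-format sample-pair correlations
corr_pairs_sketch <- function(data_matrix) {
  # Pearson correlation between sample columns; NAs handled pairwise (assumption)
  corr_matrix <- cor(data_matrix, use = "pairwise.complete.obs")
  # Keep each unordered pair of samples once: upper triangle, diagonal excluded
  idx <- which(upper.tri(corr_matrix), arr.ind = TRUE)
  data.frame(
    sample_id_1 = colnames(corr_matrix)[idx[, "row"]],
    sample_id_2 = colnames(corr_matrix)[idx[, "col"]],
    correlation = corr_matrix[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data: 5 features (rows) by 3 samples (columns), then log2-transformed
set.seed(1)
toy <- matrix(2^rnorm(15, mean = 10), nrow = 5,
              dimnames = list(paste0("feature_", 1:5), c("s1", "s2", "s3")))
pairs <- corr_pairs_sketch(log2(toy))
print(pairs)
```

With three samples this yields three rows, one for each of the pairs s1/s2, s1/s3 and s2/s3.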

Usage

.corr_distribution(data_matrix, repeated_samples, sample_annotation,
  biospecimen_id_col, sample_id_col, batch_col)

Arguments

data_matrix

Matrix of features (in rows) by samples (in columns), with feature IDs as row names and file/sample names as column names. Usually the log-transformed version of the original data.

repeated_samples

If NULL, only the correlations of repeated samples are plotted.

sample_annotation

Annotation table (data frame) with 1) the sample_id_col column (which may also be repeated as row names), 2) biological covariates and 3) technical covariates (batches, etc.).

biospecimen_id_col

Column in sample_annotation that defines a unique biospecimen ID, which is usually a combination of conditions or groups. Tip: if no such ID exists but one can be derived from several columns, create a new biospecimen_id column.

sample_id_col

Name of the column in sample_annotation that contains the file names (the column names of data_matrix).

batch_col

Column in sample_annotation that should be used for batch comparison.
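The arguments above can be prepared along these lines. This is a minimal base-R sketch; the column names FullRunName, strain, diet and MS_batch, and the peptide and run names, are hypothetical placeholders chosen for illustration. It shows a log-transformed features-by-samples matrix, a biospecimen_id column derived from several condition columns (following the tip for biospecimen_id_col), and a check that the column names of data_matrix appear in sample_id_col.

```r
# Hypothetical annotation: one row per sample (i.e. per column of data_matrix)
sample_annotation <- data.frame(
  FullRunName = c("run_A1", "run_A2", "run_B1", "run_B2"),
  strain      = c("wt", "wt", "mut", "mut"),
  diet        = c("chow", "chow", "chow", "chow"),
  MS_batch    = c("b1", "b2", "b1", "b2"),
  stringsAsFactors = FALSE
)

# Derive a biospecimen ID by combining several condition columns
sample_annotation$biospecimen_id <- paste(sample_annotation$strain,
                                          sample_annotation$diet, sep = "_")

# data_matrix: features in rows, samples in columns, log-transformed
raw <- matrix(c(100, 120, 90, 110,
                 50,  55, 60,  58,
                 10,  12,  9,  11),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("pep_1", "pep_2", "pep_3"),
                              sample_annotation$FullRunName))
data_matrix <- log2(raw)

# Every column name of data_matrix should be found in the sample ID column
stopifnot(all(colnames(data_matrix) %in% sample_annotation$FullRunName))
```

With this toy input, sample_id_col would be "FullRunName", biospecimen_id_col "biospecimen_id" and batch_col "MS_batch".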

Value

A data frame with the following columns, which are suggested as values of plot_param when plotting with plot_sample_corr_distribution:

  1. replicate

  2. batch_the_same

  3. batch_replicate

  4. batches

Other columns are:

  1. sample_id_1 and sample_id_2, both derived from the sample_id_col column

  2. correlation: the correlation between the two corresponding samples

  3. batch_1 and batch_2 (or analogous names), created in the same way as sample_id_1 and sample_id_2
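To make the Value columns concrete, the sketch below annotates a table of sample pairs with biospecimen and batch information and then compares correlations of replicate and non-replicate pairs. It is a hedged illustration, not the proBatch implementation: the helper annotate_pairs_sketch is hypothetical, naming the batch columns after batch_col (e.g. MS_batch_1) is an assumed reading of "or analogous", and the batch_replicate labels and the ":" separator in batches are assumed encodings that may differ from what proBatch produces.

```r
# Hypothetical helper (not part of proBatch): add annotation-derived pair columns
annotate_pairs_sketch <- function(pairs, sample_annotation, sample_id_col,
                                  biospecimen_id_col, batch_col) {
  # Row of sample_annotation for each member of every pair
  i1 <- match(pairs$sample_id_1, sample_annotation[[sample_id_col]])
  i2 <- match(pairs$sample_id_2, sample_annotation[[sample_id_col]])
  bio_1 <- sample_annotation[[biospecimen_id_col]][i1]
  bio_2 <- sample_annotation[[biospecimen_id_col]][i2]
  batch_1 <- sample_annotation[[batch_col]][i1]
  batch_2 <- sample_annotation[[batch_col]][i2]
  # Batch columns named after batch_col, e.g. MS_batch_1 (assumed naming)
  pairs[[paste0(batch_col, "_1")]] <- batch_1
  pairs[[paste0(batch_col, "_2")]] <- batch_2
  pairs$replicate <- bio_1 == bio_2
  pairs$batch_the_same <- batch_1 == batch_2
  # Assumed labelling of the batch/replicate combination
  pairs$batch_replicate <- paste(
    ifelse(pairs$batch_the_same, "same_batch", "different_batch"),
    ifelse(pairs$replicate, "same_biospecimen", "different_biospecimen"),
    sep = "_")
  pairs$batches <- paste(batch_1, batch_2, sep = ":")
  pairs
}

# Toy pair table and annotation (illustrative values only)
pairs <- data.frame(
  sample_id_1 = c("run_A1", "run_A1", "run_A2"),
  sample_id_2 = c("run_A2", "run_B1", "run_B1"),
  correlation = c(0.95, 0.40, 0.35),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  FullRunName    = c("run_A1", "run_A2", "run_B1"),
  biospecimen_id = c("wt", "wt", "mut"),
  MS_batch       = c("b1", "b2", "b1"),
  stringsAsFactors = FALSE
)
res <- annotate_pairs_sketch(pairs, annotation, "FullRunName",
                             "biospecimen_id", "MS_batch")

# Median correlation of non-replicate (FALSE) vs replicate (TRUE) pairs
print(tapply(res$correlation, res$replicate, median))
```

Grouping by replicate or batch_the_same in this way is the kind of comparison the plot_param columns are meant to support.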


[Package proBatch version 1.0.0 Index]