Main Functions
KBoost(X, TFs, prior_weights, g, v, ite)
Function to infer gene regulatory network from gene expression data.
Input:
X
: an NxG matrix where N is the number of observations and G the number of genes.
TFs
: a vector of numerical indexes of the K genes in X that are TFs (default 1:G).
prior_weights
: a GxK matrix with the prior probabilities of each interaction (default is 0.5 for all values).
g
: a positive scalar that corresponds to the width parameter in the RBF Kernel (default 40).
v
: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration (default 0.1).
ite
: an integer that represents the maximum number of iterations (default 3).
Output:
List with the following fields:
GRN
: A matrix with the gene regulatory network.
GRN_UP
: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.
prior
: The prior for the best model at each iteration.
model
: the transcription factors with the highest posteriors at each iteration per gene.
prior_weights
: a GxK matrix with the prior probabilities of each interaction.
g
: a positive scalar that corresponds to the width parameter in the RBF Kernel.
v
: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration.
ite
: an integer that represents the maximum number of iterations.
KBoost_human_symbol(X, gen_names, g, v, ite, pos_weight, neg_weight)
Function to infer gene regulatory network from human cell lines or patient samples. This function automatically builds a prior from Gerstein et al. (2012) and uses the list of TFs from Lambert et al. (2018). The gene expression data needs to be a numerical matrix.
Input:
X
: an NxG numeric matrix with the expression values of G genes and N observations. The gene names can be specified as column names.
gen_names
: a set of SYMBOL gene names that correspond to the names of the columns of X. Not required if the column names of X are already gene names.
g
: a positive scalar with the width parameter for the RBF kernel (default = 40).
v
: a number between 0 and 1 with the shrinkage parameter (default = 0.1).
ite
: an integer with the number of iterations (default = 3).
pos_weight
: the prior weight for edges that were previously found in the Gerstein et al. network (default = 0.6).
neg_weight
: the prior weight for edges that were not found in the Gerstein et al. network (default = 0.5).
Output:
List with the following fields:
GRN
: A matrix with the gene regulatory network.
GRN_UP
: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.
prior
: The prior for the best model at each iteration.
model
: the transcription factors with the highest posteriors at each iteration per gene.
prior_weights
: a GxK matrix with the prior probabilities of each interaction.
g
: a positive scalar that corresponds to the width parameter in the RBF Kernel.
v
: a positive scalar smaller than 1 that is the shrinkage parameter for each boosting iteration.
ite
: an integer that represents the maximum number of iterations.
AUPR_AUROC_matrix(Net, G_mat, auto_remove, TFs, upper_limit)
Function to calculate the AUROC and AUPR of an inferred network against a known (gold standard) network.
Input:
Net
: An inferred network with the predictive probabilities that each transcription factor regulates each gene.
G_mat
: A matrix with the gold standard network.
auto_remove
: TRUE if the auto-regulation is to be discarded.
TFs
: the indexes of the rows of Net that are TFs.
upper_limit
: Max number of edges to use (default = all possible edges).
Output:
List with the following fields:
AUPR
: the area under the precision-recall (PR) curve.
AUROC
: the area under the receiver operator characteristic (ROC) curve.
th
: All the unique values of Net.
Prec
: The precision at each value of th.
Rec
: The recall at each value of th.
FPR
: The false positive rate at each value of th.
TP
: The true positives at each value of th.
FP
: The false positives at each value of th.
TN
: The true negatives at each value of th.
FN
: The false negatives at each value of th.
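As a rough illustration of how the fields above relate to each other, the following base R sketch sweeps over the unique scores of a toy network and integrates the resulting curves with the trapezoidal rule. The function name `pr_roc_sketch`, the `>=` thresholding and the choice of curve start points are assumptions made for this example; it is not the package's implementation and ignores `auto_remove`, `TFs` and `upper_limit`.

```r
# Sketch only: threshold-sweep metrics for a scored network against a binary
# gold standard. pr_roc_sketch is a hypothetical name, not a KBoost function.
pr_roc_sketch <- function(scores, truth) {
  s <- as.vector(scores)
  y <- as.vector(truth) == 1
  th <- sort(unique(s), decreasing = TRUE)          # thresholds, strictest first
  TP <- sapply(th, function(t) sum(s >= t & y))     # predicted and true
  FP <- sapply(th, function(t) sum(s >= t & !y))    # predicted but not true
  FN <- sum(y) - TP
  TN <- sum(!y) - FP
  Prec <- TP / (TP + FP)
  Rec <- TP / (TP + FN)
  FPR <- FP / (FP + TN)
  # Trapezoidal rule over consecutive points of a curve
  trapz <- function(x, y) sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
  # Assumed start points: (0, 0) for ROC and (0, Prec[1]) for PR
  AUROC <- trapz(c(0, FPR), c(0, Rec))
  AUPR <- trapz(c(0, Rec), c(Prec[1], Prec))
  list(AUPR = AUPR, AUROC = AUROC, th = th, Prec = Prec, Rec = Rec,
       FPR = FPR, TP = TP, FP = FP, TN = TN, FN = FN)
}

# Toy data: the two true edges receive the two highest scores
scores <- matrix(c(0.9, 0.8, 0.3, 0.1), nrow = 2)
truth <- matrix(c(1, 1, 0, 0), nrow = 2)
res <- pr_roc_sketch(scores, truth)
```

Because the toy ranking is perfect, both areas come out as 1; with real data the two curves usually tell different stories, which is why both are reported.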
d4_mfac(v, g, ite)
Function to produce the KBoost AUPR and AUROC results on the DREAM4 Multifactorial Challenge.
Input:
g
: a number larger than 0 that is the width parameter for the RBF Kernel.
v
: a number between 0 and 1 that is the shrinkage parameter.
ite
: an integer with the number of iterations.
Output:
auprs
: a matrix with the AUPR per D4 multifactorial dataset.
aurocs
: a matrix with the AUROC per D4 multifactorial dataset.
get_prior_Gerstein(gen_names, TFs, pos_weight, neg_weight)
Function to build a prior from the network that Gerstein et al. (2012) previously built from ChIP-Seq data.
Input:
gen_names
: the gene names of the G genes in the user’s subset in Symbol nomenclature.
TFs
: the indexes of the K genes in the user’s subset which are TFs.
pos_weight
: the prior weight for edges that were previously found in the Gerstein et al. network.
neg_weight
: the prior weight for edges that were not found in the Gerstein et al. network.
Output:
prior_weights
: a GxK matrix with prior weights that a TF regulates a gene given the network published by Gerstein et al.
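The idea of such a prior can be sketched in base R as follows. This is only an illustration under stated assumptions: `build_prior_sketch`, the `known_edges` data frame and its `tf` and `target` columns are hypothetical, the gene names are placeholders, and the edge list is toy data rather than the Gerstein et al. network. The default weights are the ones documented for KBoost_human_symbol.

```r
# Sketch only: fill a G x K prior with neg_weight, then raise known edges to
# pos_weight. build_prior_sketch and known_edges are hypothetical names.
build_prior_sketch <- function(gen_names, TFs, known_edges,
                               pos_weight = 0.6, neg_weight = 0.5) {
  prior <- matrix(neg_weight, nrow = length(gen_names), ncol = length(TFs),
                  dimnames = list(gen_names, gen_names[TFs]))
  tf <- as.character(known_edges$tf)
  target <- as.character(known_edges$target)
  # Keep only edges whose TF and target are both in the user's subset
  keep <- tf %in% colnames(prior) & target %in% rownames(prior)
  # A two-column character matrix indexes by (row name, column name)
  prior[cbind(target[keep], tf[keep])] <- pos_weight
  prior
}

gen_names <- c("g1", "g2", "g3", "g4")   # placeholder gene symbols
TFs <- c(1, 2)                           # g1 and g2 are the TFs
known_edges <- data.frame(tf = c("g1", "g2", "g9"),
                          target = c("g3", "g1", "g2"))
prior_weights <- build_prior_sketch(gen_names, TFs, known_edges)
```

The edge from `g9` is silently dropped because `g9` is not in the subset, so only two entries of the 4x2 matrix are raised to 0.6.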
grid_search_kboost(dataset, vs, gs, ite)
Function to perform a grid search and find the best hyperparameters.
Input:
dataset
: One of the three datasets in the package, 1 for IRMA, 2 for DREAM4 multifactorial and 3 for DREAM5.
vs
: The range of values of v. All values need to be between 0 and 1.
gs
: The range of values of g. All values need to be larger than 0.
ite
: An integer that is the number of iterations.
Output:
List with the following fields:
aurocs
: a 3-dimensional array with the AUROCs. Columns correspond to the values of gs, rows to the values of vs, and the last dimension to the individual datasets within the chosen dataset.
auprs
: a 3-dimensional array with the AUPRs. Columns correspond to the values of gs, rows to the values of vs, and the last dimension to the individual datasets within the chosen dataset.
irma_check(g, v, ite)
Function to produce the AUPR and AUROC results on the IRMA datasets.
Input:
g
: a number larger than 0 that is the width parameter for the RBF Kernel.
v
: a number between 0 and 1 that is the shrinkage parameter.
ite
: an integer with the number of iterations.
Output:
auprs
: a matrix with the AUPR per IRMA dataset.
aurocs
: a matrix with the AUROC per IRMA dataset.
net_dist_bin(GRN,TFs,thr)
Function to calculate the shortest distance between nodes.
Input:
GRN
: An inferred network with the predictive probabilities that a transcription factor regulates a gene.
TFs
: A vector with indexes of the rows of GRN which correspond to TFs.
thr
: A scalar between 0 and 1 that is used to select the edges with large posterior probabilities.
Output:
dist_mat
: A matrix with the shortest distances between TFs (columns) and all genes (rows).
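One way to picture this computation is a breadth-first search on the thresholded network, sketched below in base R. The function name `net_dist_sketch` is hypothetical, and several details are assumptions made for the example: GRN is taken to be GxK with TFs in the columns, an edge is kept when its probability is at least thr, a TF is at distance 0 from itself, and unreachable genes get Inf. The package may handle these cases differently.

```r
# Sketch only: breadth-first search from each TF over edges with GRN >= thr.
# net_dist_sketch is a hypothetical name, not the package function.
net_dist_sketch <- function(GRN, TFs, thr) {
  G <- nrow(GRN)
  K <- length(TFs)
  A <- GRN >= thr                        # A[i, k] is TRUE if TF k -> gene i
  dist_mat <- matrix(Inf, nrow = G, ncol = K)
  for (k in seq_len(K)) {
    d <- rep(Inf, G)
    d[TFs[k]] <- 0                       # assumed: a TF is 0 steps from itself
    frontier <- TFs[k]
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      tf_cols <- which(TFs %in% frontier)   # only TFs have outgoing edges
      if (length(tf_cols) == 0) break
      reached <- which(rowSums(A[, tf_cols, drop = FALSE]) > 0)
      frontier <- reached[d[reached] == Inf]   # genes seen for the first time
      d[frontier] <- step
    }
    dist_mat[, k] <- d
  }
  dist_mat
}

# Toy network with 4 genes, of which genes 1 and 2 are TFs:
# TF 1 regulates gene 2, TF 2 regulates gene 3, gene 4 is unreachable
GRN <- matrix(c(0.0, 0.9, 0.1, 0.1,
                0.1, 0.0, 0.8, 0.2), nrow = 4)
dist_mat <- net_dist_sketch(GRN, TFs = c(1, 2), thr = 0.5)
```

In the toy network gene 3 is two steps from TF 1 (through TF 2) and one step from TF 2.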
net_summary_bin(GRN,TFs,thr,a,b)
Function to summarize the GRN filtered with a threshold.
Input:
GRN
: An inferred network with the predictive probabilities that a transcription factor regulates a gene.
TFs
: A vector with indexes of the rows of GRN which correspond to TFs.
thr
: a scalar between 0 and 1; edges with posterior probabilities lower than thr will be discarded.
a
: a scalar for the Katz and PageRank centrality measures. Default is the inverse of the largest eigenvalue of GRN.
b
: a scalar for the Katz and PageRank centrality measures. Default is 1.
Output:
List with the following fields:
GRN_table
: a sorted table version of the GRN.
Outdegree
: the outdegree of each TF.
Indegree
: the indegree of each gene.
Close_centr
: A matrix with the closeness centrality measure per TF.
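The table and degree fields can be illustrated with a few lines of base R. This is a minimal sketch, assuming a GxK matrix with TFs in the columns and that edges with probability at least thr are kept; `net_summary_sketch` and the column names of the table are hypothetical, and the centrality measures are left out.

```r
# Sketch only: threshold a GRN, list the surviving edges, and count degrees.
# net_summary_sketch is a hypothetical name, not the package function.
net_summary_sketch <- function(GRN, TFs, thr) {
  kept <- GRN >= thr                           # edges that survive the threshold
  pos <- which(kept, arr.ind = TRUE)           # (row, col) of each kept edge
  tab <- data.frame(TF = TFs[pos[, "col"]],    # regulator, as a gene index
                    target = pos[, "row"],     # regulated gene, as a gene index
                    prob = GRN[pos])
  tab <- tab[order(tab$prob, decreasing = TRUE), ]
  list(GRN_table = tab,
       Outdegree = colSums(kept),              # number of targets per TF
       Indegree = rowSums(kept))               # number of regulators per gene
}

# Toy network with 4 genes, of which genes 1 and 2 are TFs
GRN <- matrix(c(0.0, 0.9, 0.1, 0.1,
                0.1, 0.0, 0.8, 0.2), nrow = 4)
s <- net_summary_sketch(GRN, TFs = c(1, 2), thr = 0.5)
```

Raising thr shrinks the table and the degrees together, so it is worth inspecting a few thresholds before interpreting hub TFs.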
net_refine(Net)
Function to do a heuristic post-processing suggested by Slawek and Arodz that improves accuracy. Each column is multiplied by its variance.
Input:
Net
: a GRN with TFs in the columns.
Output:
Net
: a refined GRN.
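The column-wise scaling described above can be written in a couple of lines of base R. The sketch below assumes the sample variance as computed by `var`; the name `net_refine_sketch` is hypothetical and the package may compute the variance differently.

```r
# Sketch only: multiply each column (TF) of a network by that column's variance.
# net_refine_sketch is a hypothetical name, not the package function.
net_refine_sketch <- function(Net) {
  col_var <- apply(Net, 2, var)     # one sample variance per column
  sweep(Net, 2, col_var, `*`)       # scale column j by col_var[j]
}

# Toy network: the first TF has varied scores, the second scores every gene alike
Net <- matrix(c(0.1, 0.9,
                0.5, 0.5), nrow = 2)
refined <- net_refine_sketch(Net)
```

A TF whose scores are the same for every gene has zero variance, so its column is pushed to zero, while TFs with a few clearly preferred targets keep relatively larger values.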
write_GRN_D4(GRN,TFs, filename)
Function to write output in DREAM4 Challenge Format.
Input:
GRN
: a GxK gene regulatory network.
TFs
: a set of K indexes of the G genes that are TFs.
filename
: a string with the name of the file to store the GRN.