padog {PADOG} | R Documentation |
This is a general purpose gene set analysis method that downplays the importance of genes that appear often across the sets of genes analyzed. The package also provides a benchmark for gene set analysis in terms of sensitivity and ranking using 24 public datasets.
padog(esetm=NULL, group=NULL, paired=FALSE, block=NULL, gslist="KEGG.db",
      organism="hsa", annotation=NULL, gs.names=NULL, NI=1000, plots=FALSE,
      targetgs=NULL, Nmin=3, verbose=TRUE, parallel=FALSE, dseed=NULL, ncr=NULL)
esetm |
A matrix containing log transformed and normalized gene expression data. Rows correspond to genes and columns to samples. |
group |
A character vector with the class labels of the samples. It can only contain "c" for control samples or "d" for disease samples. |
paired |
A logical value to indicate if the samples in the two groups are paired. |
block |
A character vector indicating the block ids of the samples classified by the group variable, if |
gslist |
Either the value "KEGG.db" or a list with the gene sets. If set to "KEGG.db", then gene sets will be made of all KEGG pathways for the |
annotation |
A valid chip annotation package if the rownames of |
organism |
A three letter string giving the name of the organism supported by the "KEGG.db" package. |
gs.names |
Character vector with the names of the gene sets. If specified, must have the same length as gslist. |
NI |
Number of iterations to determine the gene set score significance p-values. |
plots |
If set to TRUE then the distribution of the PADOG scores with and without weighting the genes in raw and standardized form are shown using boxplots.
A pdf file will be created in the current directory having the name provided in the |
targetgs |
The identifier of a target gene set for which the scores will be highlighted in the plots produced if |
Nmin |
The minimum size of gene sets to be included in the analysis. |
verbose |
If set to TRUE, the number of iterations elapsed is displayed. |
parallel |
If set to TRUE, the |
dseed |
Optional initial seed for random number generator (integer). |
ncr |
The number of CPU cores used when |
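As a rough illustration of the argument shapes described above, the following sketch builds toy inputs using base R only. All gene identifiers, sample names, set names, and values are made up. The named-list-of-character-vectors layout for a custom gslist and the way Nmin is applied here are assumptions for illustration, not something this page states. padog itself is not called.

```r
set.seed(1)

# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are hypothetical
esetm <- matrix(rnorm(24, mean = 8), nrow = 6,
                dimnames = list(paste0("g", 1:6), paste0("s", 1:4)))

# Class labels: only "c" (control) and "d" (disease) are allowed
group <- c("c", "c", "d", "d")

# Hypothetical block ids for a paired design: s1 pairs with s3, s2 with s4
block <- c("p1", "p2", "p1", "p2")

# Custom gene sets (assumed layout: named list of character vectors of gene ids)
gslist <- list(setA = c("g1", "g2", "g3"),
               setB = c("g3", "g4", "g5", "g6"),
               setC = c("g1", "g6"))

# One display name per gene set, same length as gslist
gs.names <- c("Toy set A", "Toy set B", "Toy set C")

# Basic consistency checks before running an analysis
stopifnot(ncol(esetm) == length(group),
          all(group %in% c("c", "d")),
          length(block) == length(group),
          length(gs.names) == length(gslist))

# Assumed reading of Nmin: keep sets with at least Nmin genes present in esetm
Nmin <- 3
sizes <- sapply(gslist, function(g) sum(g %in% rownames(esetm)))
keep <- names(sizes)[sizes >= Nmin]
print(keep)
```

Checking lengths and labels up front is cheap and tends to catch most input mistakes before a long permutation run starts.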
See cited documents for more details.
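To make the idea of genes that appear often across gene sets more concrete, the sketch below counts how many toy gene sets each gene belongs to and derives an illustrative weight that decreases with that count. The gene sets are invented, and the weight formula is a stand-in chosen for illustration; it is not necessarily the one padog uses, so the cited paper remains the reference for the actual definition.

```r
# Hypothetical gene sets with overlapping members
gene_sets <- list(setA = c("g1", "g2", "g3"),
                  setB = c("g3", "g4"),
                  setC = c("g3", "g1", "g5"))

# Frequency: number of sets in which each gene appears (each set counted once)
freq <- table(unlist(lapply(gene_sets, unique)))
f <- as.numeric(freq)
names(f) <- names(freq)

# Illustrative weight in [1, 2]: the rarest genes get 2, the most frequent get 1.
# Assumes max(f) > min(f); otherwise the ratio is undefined.
w <- 1 + sqrt((max(f) - f) / (max(f) - min(f)))

print(round(w, 3))
```

In this toy case g3 belongs to all three sets and receives the smallest weight, while genes unique to one set receive the largest.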
A data frame containing the ranked pathways and various statistics:
Name is the name of the gene set;
ID is the gene set identifier;
Size is the number of genes in the gene set;
meanAbsT0 is the mean of absolute t-scores;
padog0 is the mean of weighted absolute t-scores;
PmeanAbsT is the significance of the meanAbsT0 score;
Ppadog is the significance of the padog0 score.
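A small sketch of how a result with these columns might be inspected, using base R only. The data frame below is a mock with invented values that mimics the documented column names; it is not output from padog, and the 0.05 cutoff is an arbitrary choice for illustration.

```r
# Mock result with the documented columns; all values are invented
res <- data.frame(
  Name      = c("Toy pathway A", "Toy pathway B", "Toy pathway C"),
  ID        = c("00001", "00002", "00003"),
  Size      = c(40, 12, 85),
  meanAbsT0 = c(1.9, 0.8, 1.2),
  padog0    = c(2.4, 0.7, 1.5),
  PmeanAbsT = c(0.010, 0.600, 0.200),
  Ppadog    = c(0.004, 0.550, 0.030),
  stringsAsFactors = FALSE
)

# Rank gene sets by the PADOG p-value, smallest first
ranked <- res[order(res$Ppadog), ]

# Identifiers of gene sets below the arbitrary 0.05 cutoff
hits <- ranked$ID[ranked$Ppadog < 0.05]
print(hits)
```

Comparing the PmeanAbsT and Ppadog columns side by side for the same gene set shows how much the gene weighting changes its significance.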
Adi Laurentiu Tarca <atarca@med.wayne.edu>
Adi L. Tarca, Sorin Draghici, Gaurav Bhatti, Roberto Romero, Down-weighting overlapping genes improves gene set analysis, BMC Bioinformatics, 2012, submitted.
# Run padog on GSE9348, a colorectal cancer dataset from the 24-dataset benchmark.
# NI=25 keeps the example fast; use NI=1000 for accurate results.
library(PADOG)
set = "GSE9348"
data(list = set, package = "KEGGdzPathwaysGEO")
x = get(set)

# Extract the required info from the dataset
exp = experimentData(x)
dataset = exp@name
dat.m = exprs(x)
ano = pData(x)
design = notes(exp)$design
annotation = paste(x@annotation, ".db", sep = "")
targetGeneSets = notes(exp)$targetGeneSets

# Serial run
myr = padog(
  esetm = dat.m, group = ano$Group, paired = design == "Paired",
  block = ano$Block, targetgs = targetGeneSets, annotation = annotation,
  gslist = "KEGG.db", organism = "hsa", verbose = TRUE,
  Nmin = 3, NI = 25, plots = FALSE, dseed = 1)

# Same analysis run in parallel on 2 cores, with the same seed
myr2 = padog(
  esetm = dat.m, group = ano$Group, paired = design == "Paired",
  block = ano$Block, targetgs = targetGeneSets, annotation = annotation,
  gslist = "KEGG.db", organism = "hsa", verbose = TRUE,
  Nmin = 3, NI = 25, plots = FALSE, dseed = 1,
  parallel = TRUE, ncr = 2)

# Show the top 20 gene sets and check whether the two runs agree
myr[1:20, ]
all.equal(myr, myr2)