The TCGA tumor types cover many anatomical compartments, and organizing them into groups of related compartments may be fruitful. We use the oncotree OBO representation derived from an NCI Thesaurus (NCIT) OBO distribution, as provided in the Bioconductor 3.9 version of ontoProc.
The TCGA-to-NCIT mapping table used below, map_tcga_ncit, was constructed by hand on Oct 10 2019 using materials in the ontoProc package.
We drop the CNTL (control) class and, where a TCGA code lists more than one matching NCIT term, keep only the first.
keep = map_tcga_ncit[,1] != "CNTL"         # drop the CNTL (control) class
tcgacodes = map_tcga_ncit[keep,1]          # TCGA study codes
ncitsites = map_tcga_ncit[keep,3]          # NCIT term(s), "|"-separated
ssi = strsplit(ncitsites, "\\|")           # split on a literal "|"
sites = sapply(ssi, "[", 1)                # keep only the first NCIT term
simpmap = data.frame(code=tcgacodes, oncotr_site=otree$name[sites], ncit=sites,
  stringsAsFactors=FALSE)
simpmap[sample(seq_len(nrow(simpmap)),5),] # show 5 randomly chosen rows
##             code                                              oncotr_site        ncit
## NCIT:C8851  DLBC                            Diffuse Large B-Cell Lymphoma  NCIT:C8851
## NCIT:C40217 UCEC Uterine Corpus Endometrial Stromal and Related Neoplasms NCIT:C40217
## NCIT:C2919  PRAD                                  Prostate Adenocarcinoma  NCIT:C2919
## NCIT:C9118  SARC                                                  Sarcoma  NCIT:C9118
## NCIT:C39851 BLCA                             Bladder Urothelial Carcinoma NCIT:C39851
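The filter-split-first step can be checked in isolation. The sketch below uses a small hypothetical stand-in for map_tcga_ncit: the codes AAA and BBB, the labels in column 2 and the NCIT:TOY term IDs are all made up, and only the layout (code in column 1, "|"-separated NCIT terms in column 3) is taken from the code above. It uses a logical index to drop CNTL and vapply() to guarantee a character result.

```r
# Hypothetical stand-in for map_tcga_ncit (all values made up):
# column 1 = TCGA code, column 2 = a label, column 3 = "|"-separated NCIT terms.
toy_map = cbind(
  c("CNTL", "AAA", "BBB"),
  c("controls", "toy tumor A", "toy tumor B"),
  c("NCIT:TOY1", "NCIT:TOY2|NCIT:TOY3", "NCIT:TOY4")
)

# Drop the control class with a logical index; unlike -which(...),
# this keeps every row if no CNTL row happens to be present.
keep = toy_map[, 1] != "CNTL"
codes = toy_map[keep, 1]

# strsplit() treats its split argument as a regex, so "|" must be escaped.
parts = strsplit(toy_map[keep, 3], "\\|")

# Keep the first NCIT term of each entry; vapply() enforces character(1).
first_term = vapply(parts, "[", character(1), 1)
names(first_term) = codes
first_term
```

Here AAA, which lists two terms, keeps only NCIT:TOY2, matching the "first mapping" rule described above.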
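We now have a one-to-one mapping from TCGA code to NCIT site. These sites can be grouped by organ system using the fact that NCIT:C3263 is the ‘neoplasm by site’ category (which really should be ‘system’): its children are the candidate systems, and each site is assigned the first of its ancestors that is one of those children.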
poss_sys = otree$children["NCIT:C3263"][[1]]   # all possible systems
allanc = otree$ancestors[simpmap$ncit]         # ancestors of each site
# first ancestor that is a system; NA if there is none (multiplicities ignored)
specific = sapply(allanc, function(x) intersect(x, poss_sys)[1])
sys = unlist(otree$name[specific])
systab = cbind(simpmap, sys=sys)
datatable(systab)                              # interactive table (DT package)
Neither thymoma (THYM) nor mesothelioma (MESO) has an NCIT organ-system mapping per se, so sys is NA for these two codes.
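The grouping logic, including the NA case, can be seen on a miniature ontology. The sketch below is hypothetical throughout: the IDs ROOT, SYS_A, SYS_B and T1 to T3 and their labels are made up, and the plain named vector and lists only mimic the shape of the name, children and ancestors fields of otree used above (each ancestor vector is assumed to include the term itself). T3 has no system-level ancestor, so, like thymoma and mesothelioma, it ends up with NA.

```r
# Hypothetical miniature ontology (all IDs and labels made up).
name = c(ROOT = "Neoplasm by Site",
         SYS_A = "System A Neoplasm", SYS_B = "System B Neoplasm",
         T1 = "Tumor 1", T2 = "Tumor 2", T3 = "Tumor 3")
children = list(ROOT = c("SYS_A", "SYS_B"))
ancestors = list(T1 = c("T1", "SYS_A", "ROOT"),
                 T2 = c("T2", "SYS_B", "ROOT"),
                 T3 = c("T3", "ROOT"))          # no system-level ancestor

poss_sys = children[["ROOT"]]                   # candidate systems
sites = c("T1", "T2", "T3")

# First ancestor that is a candidate system. intersect() returns
# character(0) when nothing matches, and character(0)[1] is NA.
specific = vapply(ancestors[sites],
                  function(x) intersect(x, poss_sys)[1], character(1))

# Indexing a named vector with NA yields NA, so T3 gets no system.
sys = unname(name[specific])
sys
```

If a site had two system-level ancestors, only the one appearing first in its ancestor vector would be kept; this is what "ignore multiplicities" means in the code above.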
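We now have 12 categories for 33 tumor types. To list the TCGA codes for a given system, subset systab on its sys column; wrapping the comparison in which() drops the rows whose sys is NA, which a plain logical index would return as all-NA rows. For the reproductive system:
systab[which(systab$sys == "Reproductive System Neoplasm"), ]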
##             code                   oncotr_site        ncit                          sys
## NCIT:C40195 CESC    Cervical Squamous Neoplasm NCIT:C40195 Reproductive System Neoplasm
## NCIT:C7550    OV Ovarian Serous Adenocarcinoma  NCIT:C7550 Reproductive System Neoplasm
## NCIT:C2919  PRAD       Prostate Adenocarcinoma  NCIT:C2919 Reproductive System Neoplasm
## NCIT:C8591  TGCT    Testicular Germ Cell Tumor  NCIT:C8591 Reproductive System Neoplasm
## NCIT:C42700  UCS        Uterine Carcinosarcoma NCIT:C42700 Reproductive System Neoplasm
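The same pattern can be wrapped in a helper and extended to all systems at once with split(). This is a sketch on a hypothetical miniature of systab that keeps only the code and sys columns: the five reproductive-system rows come from the output above, and MESO and THYM carry NA as noted earlier. The helper name codes_for_system is made up for illustration.

```r
# Hypothetical miniature of systab (code and sys columns only).
repro = "Reproductive System Neoplasm"
toy_systab = data.frame(
  code = c("CESC", "MESO", "OV", "PRAD", "TGCT", "THYM", "UCS"),
  sys  = c(repro, NA, repro, repro, repro, NA, repro),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TCGA codes for one system. which() discards NA
# comparisons, so unmapped codes never show up as NA entries.
codes_for_system = function(tab, system) {
  tab$code[which(tab$sys == system)]
}
codes_for_system(toy_systab, repro)

# All systems at once: split() leaves out rows whose sys is NA ...
by_sys = split(toy_systab$code, toy_systab$sys)
by_sys

# ... so the unmapped codes are listed separately.
unmapped = toy_systab$code[is.na(toy_systab$sys)]
unmapped
```

Applied to the full systab, split() would return one element per system, which is a convenient way to check the category count stated above.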