1 What `curatedMetagenomicData` provides

curatedMetagenomicData provides 6 types of data for each dataset:

species-level taxonomic profiles, expressed as relative abundance from phylum to strain level.
presence of unique, clade-specific markers,
abundance of unique, clade-specific markers,
abundance of gene families,
metabolic pathway coverage, and
metabolic pathway abundance.

Types 1-3 are generated by MetaPhlAn2; 4-6 are generated by HUMAnN2.

Currently, curatedMetagenomicData provides:

2,875 samples from 12 datasets, primarily of the human gut but including body sites profiled in the Human Microbiome Project
processed data from whole-metagenome shotgun metagenomics, with manually-curated metadata, as integrated and documented Bioconductor ExpressionSet objects
~200 fields of specimen metadata from original papers, supplementary files, and websites, with manual curation to standardize annotations
processing of data through the MetaPhlAn2 pipeline for taxonomic abundance, and HUMAnN2 pipeline for metabolic analysis
This effort required analyzing ~30T of raw sequencing data, generating ~52T of intermediate files

These datasets are documented in the reference manual.

2 Using `curatedMetagenomicData` Resources

Use of the resources in curatedMetagenomicData is simplified with the use of Bioconductor’s ExperimentHub platform, which allows for the accessing of data through an intuitive interface. First, curatedMetagenomicData is installed using BiocInstaller and then called as a library - the process allows for the user to simply call datasets as functions because the package is aware of the resources present in ExperimentHub S3 buckets.

# BiocInstaller::biocLite("curatedMetagenomicData")  #Bioconductor version
# BiocInstaller::biocLite("vobencha/curatedMetagenomicData")  #bleeding edge version
library(curatedMetagenomicData)

3 Browsing ExperimentHub

In the next section we will demonstrate convenience functions that make ExperimentHub transparent, but it can be useful and powerful to interact directly with ExperimentHub. A “hub” connects you to the ExperimentHub server and its metadata, without downloading any data:

suppressPackageStartupMessages(library(ExperimentHub))
eh = ExperimentHub()

## snapshotDate(): 2016-10-01

The following queries ExperimentHub for all records matching “curatedMetagenomicData”:

myquery = query(eh, "curatedMetagenomicData")

This is an abbreviated list of what was found:

myquery

## ExperimentHub with 186 records
## # snapshotDate(): 2016-10-01 
## # $dataprovider: Department of Psychology, Abdul Haq Campus, Federal Ur...
## # $species: Homo Sapiens
## # $rdataclass: ExpressionSet
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## #   tags, rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH178"]]' 
## 
##           title                                              
##   EH178 | HMP_2012.genefamilies_relab.anterior_nares         
##   EH179 | HMP_2012.genefamilies_relab.buccal_mucosa          
##   EH180 | HMP_2012.genefamilies_relab.hard_palate            
##   EH181 | HMP_2012.genefamilies_relab.keratinized_gingiva    
##   EH182 | HMP_2012.genefamilies_relab.l_retroauricular_crease
##   ...     ...                                                
##   EH359 | ZellerG_2014.marker_abundance.stool                
##   EH360 | ZellerG_2014.marker_presence.stool                 
##   EH361 | ZellerG_2014.metaphlan_bugs_list.stool             
##   EH362 | ZellerG_2014.pathabundance_relab.stool             
##   EH363 | ZellerG_2014.pathcoverage.stool

Note that the first column, “EH178” etc, are unique IDs for each dataset. For a full list, mcols(myquery) produces a data.frame. The following is unevaluated in this script, but can be used to display an interactive spreadsheet of the results:

View(mcols(myquery))

Or to write the search results to a csv file:

write.csv(mcols(myquery), file="curatedMetagenomicData_allrecords.csv")

4 Advanced searching of ExperimentHub

Tags can (eventually) help identify useful datasets:

head(myquery$tags)

## [1] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"
## [2] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"
## [3] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"
## [4] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"
## [5] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"
## [6] "Homo_sapiens_Data, ReproducibleResearch, MicrobiomeData"

Titles can be used to select body site and data type. Here are the MetaPhlan2 taxonomy tables for all the available Human Microbiome Project (HMP) body sites, found using a Perl regular expression to find “HMP” at the beginning of a string, followed by any number of characters “.*“, followed by the word”metaphlan“:

grep("(?=HMP)(?=.*metaphlan)", myquery$title, perl=TRUE, val=TRUE)

##  [1] "HMP_2012.metaphlan_bugs_list.anterior_nares"         
##  [2] "HMP_2012.metaphlan_bugs_list.buccal_mucosa"          
##  [3] "HMP_2012.metaphlan_bugs_list.hard_palate"            
##  [4] "HMP_2012.metaphlan_bugs_list.keratinized_gingiva"    
##  [5] "HMP_2012.metaphlan_bugs_list.l_retroauricular_crease"
##  [6] "HMP_2012.metaphlan_bugs_list.mid_vagina"             
##  [7] "HMP_2012.metaphlan_bugs_list.palatine_tonsils"       
##  [8] "HMP_2012.metaphlan_bugs_list.posterior_fornix"       
##  [9] "HMP_2012.metaphlan_bugs_list.r_retroauricular_crease"
## [10] "HMP_2012.metaphlan_bugs_list.saliva"                 
## [11] "HMP_2012.metaphlan_bugs_list.stool"                  
## [12] "HMP_2012.metaphlan_bugs_list.subgingival_plaque"     
## [13] "HMP_2012.metaphlan_bugs_list.supragingival_plaque"   
## [14] "HMP_2012.metaphlan_bugs_list.throat"                 
## [15] "HMP_2012.metaphlan_bugs_list.tongue_dorsum"          
## [16] "HMP_2012.metaphlan_bugs_list.vaginal_introitus"

Here are all the metabolic pathway abundance for the stool body site:

grep("(?=.*stool)(?=.*metaphlan)", myquery$title, perl=TRUE, val=TRUE)

##  [1] "HMP_2012.metaphlan_bugs_list.stool"           
##  [2] "KarlssonFH_2013.metaphlan_bugs_list.stool"    
##  [3] "LeChatelierE_2013.metaphlan_bugs_list.stool"  
##  [4] "LomanNJ_2013_Hi.metaphlan_bugs_list.stool"    
##  [5] "LomanNJ_2013_Mi.metaphlan_bugs_list.stool"    
##  [6] "NielsenHB_2014.metaphlan_bugs_list.stool"     
##  [7] "Obregon_TitoAJ_2015.metaphlan_bugs_list.stool"
##  [8] "QinJ_2012.metaphlan_bugs_list.stool"          
##  [9] "QinN_2014.metaphlan_bugs_list.stool"          
## [10] "RampelliS_2015.metaphlan_bugs_list.stool"     
## [11] "ZellerG_2014.metaphlan_bugs_list.stool"

Or, here are all the data products available for the HiSeq component of the “Loman” study:

(lomannames = grep("LomanNJ_2013_Hi", myquery$title, perl=TRUE, val=TRUE))

## [1] "LomanNJ_2013_Hi.genefamilies_relab.stool" 
## [2] "LomanNJ_2013_Hi.marker_abundance.stool"   
## [3] "LomanNJ_2013_Hi.marker_presence.stool"    
## [4] "LomanNJ_2013_Hi.metaphlan_bugs_list.stool"
## [5] "LomanNJ_2013_Hi.pathabundance_relab.stool"
## [6] "LomanNJ_2013_Hi.pathcoverage.stool"

We could create a list of ExpressionSet objects containing all of the Loman HiSeq products as follows (not evaluated):

(idx = grep("LomanNJ_2013_Hi", myquery$title, perl=TRUE))
loman.list = lapply(idx, function(i){
  return(myquery[[i]])
})
names(loman.list) = lomannames
loman.list

5 Convenience functions

Individual data projects can be fetched without having to think about ExperimentHub. A function call to a dataset name returns a Bioconductor ExpressionSet object:

suppressPackageStartupMessages(library(curatedMetagenomicData))
loman.eset = LomanNJ_2013_Hi.metaphlan_bugs_list.stool()

## snapshotDate(): 2016-10-01

## see ?curatedMetagenomicData and browseVignettes('curatedMetagenomicData') for documentation

## loading from cache '/home/biocbuild//.ExperimentHub/289'

6 Features of `ExpressionSet` Objects

All datasets are represented as ExpressionSet objects because of the integrative nature of the class and its ability to bind data and metadata. There are three main functions, from the Biobase package, that provide access to experiment-level metadata, subject-level metadata, and the data itself.

To access the experiment-level metadata the experimentData() function is used to return a MIAME (Minimum Information About a Microarray Experiment) object.

experimentData( loman.eset )

## Experiment data
##   Experimenter name: Loman NJ, Constantinidou C, Christner M, Rohde H, Chan JZ, Quick J, Weir JC, Quince C, Smith GP, Betley JR, Aepfelbacher M, Pallen MJ 
##   Laboratory: Institute of Microbiology and Infection, University of Birmingham, Birmingham, England. 
##   Contact information:  
##   Title: A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. 
##   URL:  
##   PMIDs: 23571589 
## 
##   Abstract: A 308 word abstract is available. Use 'abstract' method.
##   notes:
##    Sequencing technology:     
##       Illumina

To access the subject-level metadata the pData() function is used to return a data.frame containing subject-level variables of study.

head( pData( loman.eset ) )

##          subjectID first repeat stooltexture daysafteronset  hus
## OBK1122    OBK1122  1122      0         <NA>             NA <NA>
## OBK1196    OBK1196  1196      0         <NA>             NA <NA>
## OBK1253    OBK1253  1253      0         <NA>             NA <NA>
## OBK2535    OBK2535  2535      0       smooth              3    y
## OBK2535b      <NA>    NA     NA         <NA>             NA <NA>
## OBK2638    OBK2638  2638      0       bloody              5    y
##          stec_count shigatoxin2elisa readsmillions nonhuman stec_coverage
## OBK1122        <NA>             <NA>          93.0       46          <NA>
## OBK1196        <NA>             <NA>          73.6      >99          <NA>
## OBK1253        <NA>             <NA>          82.4      >99          <NA>
## OBK2535    moderate         positive          55.0      >99           619
## OBK2535b       <NA>             <NA>            NA     <NA>          <NA>
## OBK2638        high         positive          30.8       32            29
##          stxab_detected stx_ratio typingdata c_difficile_frequency
## OBK1122               y        NA       <NA>              positive
## OBK1196               n        NA       <NA>              negative
## OBK1253               n        NA       <NA>              positive
## OBK2535               y         5          y                0.0014
## OBK2535b           <NA>        NA       <NA>                  <NA>
## OBK2638               y         2          y              negative
##                 disease bodysite country number_reads
## OBK1122  stec2-positive    stool germany       464060
## OBK1196  stec2-positive    stool germany     30181380
## OBK1253  stec2-positive    stool germany     65704818
## OBK2535  stec2-positive    stool germany     43597736
## OBK2535b stec2-positive    stool germany     22253314
## OBK2638  stec2-positive    stool germany      1105492

To access the data itself (in this case relative abundance), the exprs() function returns a variables by samples (rows by columns) numeric matrix. Note the presence of “synthetic” clades at all levels of the taxonomy, starting with kingdom, e.g. k__bacteria here:

exprs( loman.eset )[1:6, 1:5]  #first 6 rows and 5 columns

##                                               OBK1122   OBK1196   OBK1253
## k__Bacteria                                 100.00000 100.00000 100.00000
## k__Bacteria|p__Bacteroidetes                 83.55662  30.41764  79.41203
## k__Bacteria|p__Firmicutes                    15.14819  68.30191  19.28784
## k__Bacteria|p__Proteobacteria                 1.29520   0.06303   0.97218
## k__Bacteria|p__Bacteroidetes|c__Bacteroidia  83.55662  30.41764  79.41203
## k__Bacteria|p__Firmicutes|c__Clostridia      14.93141  33.22720  11.26228
##                                              OBK2535 OBK2535b
## k__Bacteria                                 99.39953 99.63084
## k__Bacteria|p__Bacteroidetes                15.14525  7.03101
## k__Bacteria|p__Firmicutes                   25.71252 45.17325
## k__Bacteria|p__Proteobacteria               58.53298 47.29051
## k__Bacteria|p__Bacteroidetes|c__Bacteroidia 15.14525  7.03101
## k__Bacteria|p__Firmicutes|c__Clostridia     16.50313 13.97996

Bioconductor provides further documentation of the ExpressionSet class and has published an excellent introduction.

6.1 E. coli prevalence

Here’s a direct, exploratory analysis of E. coli prevalence in the Loman dataset using the ExpressionSet object. More elegant solutions will be provided later using subsetting methods provided by the phyloseq package, but for users familiar with grep() and the ExpressionSet object, such manual methods may suffice.

First, which E. coli-related taxa are available?

grep("coli", rownames(loman.eset), value=TRUE)

## [1] "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli"                                 
## [2] "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli|t__Escherichia_coli_unclassified"
## [3] "k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Anaerotruncus|s__Anaerotruncus_colihominis"                                          
## [4] "k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Anaerotruncus|s__Anaerotruncus_colihominis|t__GCF_000154565"

Create a vector of E. coli relative abundances. This grep call with a “$” at the end selects the only row that ends with “s__Escherichia_coli“:

x = exprs( loman.eset )[grep("s__Escherichia_coli$", rownames( loman.eset)), ]
summary( x )

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.5484  2.3059 13.3786 11.6938 90.4474

This could be plotted as a histogram:

hist( x, xlab = "Relative Abundance", main="Prevalence of E. Coli",
      breaks="FD")

7 Taxonomy-Aware Analysis using `phyloseq`

For the MetaPhlAn2 bugs datasets (but not other data types), you gain a lot of taxonomy-aware, ecological analysis and plotting by conversion to a phyloseq class object. curatedMetagenomicData provides the ExpressionSet2phyloseq() function to make this easy:

loman.pseq = ExpressionSet2phyloseq( loman.eset )

7.1 Estimating Absolute Raw Count Data

Absolute raw count data can be estimated from the relative count data by multiplying the columns of the ExpressionSet data by the number of reads for each sample, as found in the pData column “number_reads”. You could do this manually using the sweep() function (dividing by 100 to convert relative abundance percentages to fractions):

loman.counts = sweep(exprs( loman.eset ) / 100, 2, loman.eset$number_reads, "*")
loman.counts[1:6, 1:5]

##                                                OBK1122     OBK1196
## k__Bacteria                                 464060.000 30181380.00
## k__Bacteria|p__Bacteroidetes                387752.851  9180463.52
## k__Bacteria|p__Firmicutes                    70296.691 20614459.00
## k__Bacteria|p__Proteobacteria                 6010.505    19023.32
## k__Bacteria|p__Bacteroidetes|c__Bacteroidia 387752.851  9180463.52
## k__Bacteria|p__Firmicutes|c__Clostridia      69290.701 10028427.50
##                                                OBK1253  OBK2535 OBK2535b
## k__Bacteria                                 65704818.0 43335945 22171164
## k__Bacteria|p__Bacteroidetes                52177529.8  6602986  1564633
## k__Bacteria|p__Firmicutes                   12673040.2 11210077 10052545
## k__Bacteria|p__Proteobacteria                 638769.1 25519054 10523706
## k__Bacteria|p__Bacteroidetes|c__Bacteroidia 52177529.8  6602986  1564633
## k__Bacteria|p__Firmicutes|c__Clostridia      7399860.6  7194991  3111004

or just set the relab argument in Expressionset2phyloseq() to FALSE:

loman.pseq2 = ExpressionSet2phyloseq( loman.eset, relab=FALSE )
otu_table( loman.pseq2 )[1:6, 1:5]

## OTU Table:          [6 taxa and 5 samples]
##                      taxa are rows
##                   OBK1122  OBK1196  OBK1253  OBK2535 OBK2535b
## k__Bacteria        464060 30181380 65704818 43335945 22171164
## p__Bacteroidetes   387753  9180464 52177530  6602986  1564633
## p__Firmicutes       70297 20614459 12673040 11210077 10052545
## p__Proteobacteria    6011    19023   638769 25519054 10523706
## c__Bacteroidia     387753  9180464 52177530  6602986  1564633
## c__Clostridia       69291 10028427  7399861  7194991  3111004

Note the simplified row names of the OTU table, resulting from the default argument simplify=TRUE. The simplified names are convenient, and lossless because taxonomic information is now attainable by tax_table(loman.pseq2)

7.2 Components of a phyloseq object

The phyloseq objects currently contain 3 components, with extractor functions hinted at by its show method:

loman.pseq

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 736 taxa and 44 samples ]
## sample_data() Sample Data:       [ 44 samples by 19 sample variables ]
## tax_table()   Taxonomy Table:    [ 736 taxa by 8 taxonomic ranks ]

otu_table() returns the same thing as exprs( loman.eset ) did, the Operational Taxanomic Unit (OTU) table. Here are the first 6 rows and 5 columns:

otu_table( loman.pseq )[1:6, 1:5]

## OTU Table:          [6 taxa and 5 samples]
##                      taxa are rows
##                     OBK1122   OBK1196   OBK1253  OBK2535 OBK2535b
## k__Bacteria       100.00000 100.00000 100.00000 99.39953 99.63084
## p__Bacteroidetes   83.55662  30.41764  79.41203 15.14525  7.03101
## p__Firmicutes      15.14819  68.30191  19.28784 25.71252 45.17325
## p__Proteobacteria   1.29520   0.06303   0.97218 58.53298 47.29051
## c__Bacteroidia     83.55662  30.41764  79.41203 15.14525  7.03101
## c__Clostridia      14.93141  33.22720  11.26228 16.50313 13.97996

The same patient or participant data that was available from pData( loman.eset ) is now availble using sample_data() on the phyloseq object:

sample_data( loman.pseq )[1:6, 1:5]

##          subjectID first repeat. stooltexture daysafteronset
## OBK1122    OBK1122  1122       0         <NA>             NA
## OBK1196    OBK1196  1196       0         <NA>             NA
## OBK1253    OBK1253  1253       0         <NA>             NA
## OBK2535    OBK2535  2535       0       smooth              3
## OBK2535b      <NA>    NA      NA         <NA>             NA
## OBK2638    OBK2638  2638       0       bloody              5

But this object also is aware of the taxonomic structure, which will enable the powerful subsetting methods of the phyloseq package.

head( tax_table( loman.pseq ) )

## Taxonomy Table:     [6 taxa by 8 taxonomic ranks]:
##                   Kingdom    Phylum           Class         Order Family
## k__Bacteria       "Bacteria" NA               NA            NA    NA    
## p__Bacteroidetes  "Bacteria" "Bacteroidetes"  NA            NA    NA    
## p__Firmicutes     "Bacteria" "Firmicutes"     NA            NA    NA    
## p__Proteobacteria "Bacteria" "Proteobacteria" NA            NA    NA    
## c__Bacteroidia    "Bacteria" "Bacteroidetes"  "Bacteroidia" NA    NA    
## c__Clostridia     "Bacteria" "Firmicutes"     "Clostridia"  NA    NA    
##                   Genus Species Strain
## k__Bacteria       NA    NA      NA    
## p__Bacteroidetes  NA    NA      NA    
## p__Firmicutes     NA    NA      NA    
## p__Proteobacteria NA    NA      NA    
## c__Bacteroidia    NA    NA      NA    
## c__Clostridia     NA    NA      NA

7.3 Subseting / Pruning

The process of subsetting begins with the names of phylogenetic ranks:

rank_names( loman.pseq )

## [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
## [8] "Strain"

Taxa can be filtered by these rank names. For example, to return an object with only species and strains:

subset_taxa( loman.pseq, !is.na(Species))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 535 taxa and 44 samples ]
## sample_data() Sample Data:       [ 44 samples by 19 sample variables ]
## tax_table()   Taxonomy Table:    [ 535 taxa by 8 taxonomic ranks ]

To keep only phylum-level data (not class or finer, and not kingdom-level):

subset_taxa( loman.pseq, is.na(Class) & !is.na(Phylum))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 8 taxa and 44 samples ]
## sample_data() Sample Data:       [ 44 samples by 19 sample variables ]
## tax_table()   Taxonomy Table:    [ 8 taxa by 8 taxonomic ranks ]

Or to keep only Bacteroidetes phylum. Note that taxa names have been shortened from the rownames of the ExpressionSet object, for nicer plotting.

loman.bd = subset_taxa( loman.pseq, Phylum == "Bacteroidetes")
head( taxa_names( loman.bd ) )

## [1] "p__Bacteroidetes"      "c__Bacteroidia"        "o__Bacteroidales"     
## [4] "f__Bacteroidaceae"     "f__Porphyromonadaceae" "f__Rikenellaceae"

7.4 Advanced Pruning

The phyloseq package provides advanced pruning of taxa, such as the following which keeps only taxa that are among the most abundant 5% in at least five samples:

keepotu = genefilter_sample(loman.pseq, filterfun_sample(topp(0.05)), A=5)
summary(keepotu)

##    Mode   FALSE    TRUE 
## logical     583     153

subset_taxa(loman.pseq, keepotu)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 153 taxa and 44 samples ]
## sample_data() Sample Data:       [ 44 samples by 19 sample variables ]
## tax_table()   Taxonomy Table:    [ 153 taxa by 8 taxonomic ranks ]

Note that phyloseq also provides topk() for selecting the most abundant k taxa, and other functions for advanced pruning of taxa.

7.5 Taxonomy Heatmap

The phyloseq package provides the plot_heatmap() function to create heatmaps using a variety of built-in dissimilarity metrics for clustering. Here, we apply the same abundance filter as above, keep only strain-level OTUs. This function supports a large number of distance and ordination methods, here we use Bray-Curtis dissimilarity for distance and PCoA as the ordination method for organizing the heatmap.

loman.filt = subset_taxa(loman.pseq, keepotu & !is.na(Strain))
plot_heatmap(loman.filt, method="PCoA", distance="bray")

7.6 Taxonomy Histogram

Here we plot the top 20 most abundant species (not strains), defined by the sum of abundance across all samples in the dataset:

loman.sp = subset_taxa(loman.pseq, !is.na(Species) & is.na(Strain))
par(mar = c(20, 4, 0, 0) + 0.15) #increase margin size on the bottom
barplot(sort(taxa_sums(loman.sp), TRUE)[1:20] / nsamples(loman.sp),
        ylab = "Total counts", las = 2)

7.7 Alpha Diversity Estimation

The phyloseq package calculates numerous alpha diversity measures. Here we compare three diversity in the species-level data, stratifying by stool texture:

alphas = c("Shannon", "Simpson", "InvSimpson")
plot_richness(loman.sp, "stooltexture", measures = alphas)

Let’s compare these three alpha diversity measures:

pairs( estimate_richness(loman.sp, measures = alphas) )

7.8 Beta Diversity / Dissimilarity Clustering

Numerous beta diversity / dissimilarity are provided by the distance() function when provided a phyloseq object, and these can be used for any kind of clustering or classification scheme. For example, here is a hierarchical clustering dendrogram produced by the hclust() from the base R stats package with “Ward” linkage:

mydist = distance(loman.sp, method="bray")
myhclust = hclust( mydist )
plot(myhclust, main="Bray-Curtis Dissimilarity", 
     method="ward.D", xlab="Samples", sub = "")

7.9 Ordination Analysis

The phyloseq package provides a variety of ordination methods, with convenient options for labelling points. Here is a Principal Coordinates Analysis plot of species-level taxa from the Loman dataset, using Bray-Curtis distance:

ordinated_taxa = ordinate(loman.sp, method="PCoA", distance="bray")
plot_ordination(loman.sp, ordinated_taxa, color="stooltexture", 
                title = "Bray-Curtis Principal Coordinates Analysis")

plot_scree(ordinated_taxa, title="Screeplot")

8 Adding Datasets

Authors welcome the addition of new datasets provided they can be run through the MetaPhlAn2 and HUMAnN2 pipelines and have curated per sample metadata. To have a dataset considered for addition please contact the maintainer and provide a TSV file containing curated per sample metadata, along with information about how the raw data can be transfered (SCP, SFTP, etc.) for processing. Provided the dataset is a curated human metagenomic study that can be run through the MetaPhlAn2 and HUMAnN2 pipelines it will be added by the process outlined in the wiki as soon as is practically possible.

9 Reporting Bugs

Development of the curatedMetagenomicData package occurs on GitHub and bugs reported via GitHub issues will recieve the quickest response. Please visit the project repository and file an issue should you find one.

10 Other Issues

Visit the Bioconductor support site at https://support.bioconductor.org/, breifly describe your issue, and add the curatedmetagenomicdata tag.

curatedMetagenomicData vignette: microbial taxonomy and functional data for the human microbiome

Abstract

Contents

1 What `curatedMetagenomicData` provides

2 Using `curatedMetagenomicData` Resources

3 Browsing ExperimentHub

4 Advanced searching of ExperimentHub

5 Convenience functions

6 Features of `ExpressionSet` Objects

6.1 E. coli prevalence

7 Taxonomy-Aware Analysis using `phyloseq`

7.1 Estimating Absolute Raw Count Data

7.2 Components of a phyloseq object

7.3 Subseting / Pruning

7.4 Advanced Pruning

7.5 Taxonomy Heatmap

7.6 Taxonomy Histogram

7.7 Alpha Diversity Estimation

7.8 Beta Diversity / Dissimilarity Clustering

7.9 Ordination Analysis

8 Adding Datasets

9 Reporting Bugs

10 Other Issues

curatedMetagenomicData vignette: microbial taxonomy and functional data for the human microbiome

Abstract

Contents

1 What curatedMetagenomicData provides

2 Using curatedMetagenomicData Resources

3 Browsing ExperimentHub

4 Advanced searching of ExperimentHub

5 Convenience functions

6 Features of ExpressionSet Objects

6.1 E. coli prevalence

7 Taxonomy-Aware Analysis using phyloseq

7.1 Estimating Absolute Raw Count Data

7.2 Components of a phyloseq object

7.3 Subseting / Pruning

7.4 Advanced Pruning

7.5 Taxonomy Heatmap

7.6 Taxonomy Histogram

7.7 Alpha Diversity Estimation

7.8 Beta Diversity / Dissimilarity Clustering

7.9 Ordination Analysis

8 Adding Datasets

9 Reporting Bugs

10 Other Issues

1 What `curatedMetagenomicData` provides

2 Using `curatedMetagenomicData` Resources

6 Features of `ExpressionSet` Objects

7 Taxonomy-Aware Analysis using `phyloseq`