Contents

1 Overview

GWENA (Gene Whole co-Expression Network Analysis) is an R package to perform gene co-expression network analysis in a single pipeline. This pipeline includes functional enrichment of modules of co-expressed genes, phenotypcal association, topological analysis and comparisons of networks between conditions.

Using transcriptomics data from either RNA-seq or microarray, the package follows the steps displayed in Figure 1:

  1. Input: data is provided as a data.frame or a matrix of expression intensities (pre-normalized).
  2. Gene filtering: data is filtered according to the transcriptomic technology used.
  3. Network building: a matrix of similarity score is computed between each gene with Spearman correlation, then transformed into an adjacency matrix, and finally into a topological overlap matrix.
  4. Modules detection: groups of genes with closest similarity scores are detected as modules.
  5. Biological integration: gene set enrichment analysis and phenotypic association (if phenotypes are provided) are performed on modules.
  6. Graph visualization and topological analysis: hub genes are identified, as well as visualization of modules.
  7. Networks comparison: if multiple conditions are available (time points, treatments, phenotype, etc.), analysis of modules preservation/non-preservation between conditions can be performed.

This document gives a brief tutorial using a subset of a microarray data set to show the content and value of each step in the pipeline.

.

Figure 1. Analysis pipeline of GWENA, from expression data to characterization of the modules and comparison of conditions.

Figure 1. Analysis pipeline of GWENA, from expression data to characterization of the modules and comparison of conditions.

2 Main steps of the pipeline

2.1 Starting with GWENA

Installation can either be from:

  1. the official version of the last Bioconductor release (recommended).
  2. the last stable version from the Bioc Devel branch.
  3. the day-to-day development version from the Github repository.
if (!requireNamespace("BiocManager", quietly=TRUE))
  install.packages("BiocManager")

# 1. From Bioconductor release
BiocManager::install("GWENA")

# 2. From Bioconductor devel
BiocManager::install("GWENA", version = "devel")

# 3. From Github repository
BiocManager::install("Kumquatum/GWENA")
# OR
if (!requireNamespace("devtools", quietly=TRUE))
  install.packages("devtools")
devtools::install_github("Kumquatum/GWENA")

Package loading:

library(GWENA)
library(magrittr) # Not mandatory, but in this tutorial we use the pipe `%>%` to ease readability.

threads_to_use <- 1

2.2 Input data

2.2.1 The expression data

GWENA support expression matrix data coming from either RNA-seq or microarray experiments. Expression data have to be stored as text or spreadsheet files and formatted with genes as columns and samples as rows. To read this file with R, use the appropriate function according to the data separator (e.g. read.csv, read.table). Moreover, the expression data have to be normalized and transcripts expression reduced to the gene level (See How can I reduce my transcriptomic data to the gene level ? since GWENA is designed to build gene co-expression networks.

In this vignette, we use the microarray data set GSE85358 from the Kuehne et al. study. This data was gathered from a skin ageing study and has been processed and normalized with the R script provided in Additional data n°10 of the corresponding article.

# Import expression table
data("kuehne_expr")
# If kuehne_expr was in a file :
# kuehne_expr = read.table(<path_to_file>, header=TRUE, row.names=1)

# Number of genes
ncol(kuehne_expr)
#> [1] 15801
# Number of samples
nrow(kuehne_expr)
#> [1] 48

# Overview of expression table
kuehne_expr[1:5,1:5]
#>                  A_19_P00325768 A_19_P00800244 A_19_P00801821 A_19_P00802027
#> 253949420929_1_1       10.27450       5.530172       10.75672       16.78277
#> 253949420929_1_2       10.23440       5.712894       11.05393       16.25480
#> 253949420929_1_3       10.54336       5.889068       10.92150       16.39615
#> 253949420929_1_4       10.32649       5.646343       10.55770       16.37210
#> 253949420929_2_1       10.13626       5.726866       11.23012       16.41413
#>                  A_19_P00802201
#> 253949420929_1_1       8.549254
#> 253949420929_1_2       8.313369
#> 253949420929_1_3       8.469018
#> 253949420929_1_4       7.983723
#> 253949420929_2_1       7.521542

# Checking expression data set is correctly defined
is_data_expr(kuehne_expr)
#> $bool
#> [1] TRUE
#> 
#> $reason
#> NULL

2.2.2 The metadata

To be able to perform the phenotypic association step of the pipeline (optional), we need to specify in another matrix the information associated with each sample (e.g. condition, treatment, phenotype, experiment date…). This information is often provided in a separate file (also text or spreadsheet) and can be read in R with read.csv or read.table functions.

# Import phenotype table (also called traits)
data("kuehne_traits")
# If kuehne_traits was in a file :
# kuehne_traits = read.table(<path_to_file>, header=TRUE, row.names=1)

# Phenotype
unique(kuehne_traits$Condition)
#> [1] "young" "old"

# Overview of traits table
kuehne_traits[1:5,]
#>          Slide Array Exp Condition Age
#> 1 253949420929     1 1_1     young  23
#> 2 253949420929     2 1_2       old  66
#> 3 253949420929     3 1_3     young  21
#> 4 253949420929     4 1_4       old  62
#> 5 253949420929     5 2_1     young  25

2.2.3 Using SummarizedExperiment object

GWENA is also compatible with the use of SummarizedExperiment. The previous dataset can therefore be transformed as one and used in the next steps

se_kuehne <- SummarizedExperiment::SummarizedExperiment(
  assays = list(expr = t(kuehne_expr)),
  colData = S4Vectors::DataFrame(kuehne_traits)
)

S4Vectors::metadata(se_kuehne) <- list(
  experiment_type = "Expression profiling by array",
  transcriptomic_technology = "Microarray",
  GEO_accession_id = "GSE85358",
  overall_design = paste("Gene expression in epidermal skin samples from the",
                         "inner forearms 24 young (20 to 25 years) and 24 old",
                         "(55 to 66 years) human volunteers were analysed", 
                         "using Agilent Whole Human Genome Oligo Microarrays",
                         "8x60K V2."),
  contributors = c("Kuehne A", "Hildebrand J", "Soehle J", "Wenck H", 
                   "Terstegen L", "Gallinat S", "Knott A", "Winnefeld M", 
                   "Zamboni N"),
  title = paste("An integrative metabolomics and transcriptomics study to",
                "identify metabolic alterations in aged skin of humans in", 
                "vivo"),
  URL = "https://www.ncbi.nlm.nih.gov/pubmed/28201987",
  PMIDs = 28201987
)

2.3 Gene filtering

Although the co-expression method implemented within GWENA is designed to manage and filter out low co-expressed genes, it is advisable to first reduce the dataset size. Indeed, loading a full expression matrix without filtering for uninformative data will result in excessive processing time, CPU and memory usage, and data storage. However, the author urges the users to proceed carefully during the filtering as it will impact the gene network building.

Multiple filtration methods have been natively implemented :

  • For RNA-seq and microarray:
    • filter_low_var : Filtering on low variation of expression
  • For RNA-seq data:
    • filter_RNA_seq(<...>, method = "at least one"): only one sample needs to have a value above the minimal count threshold in the gene
    • filter_RNA_seq(<...>, method = "mean"): the means of all samples for the gene needs to be above min_count
    • filter_RNA_seq(<...>, method = "all"): all samples for the gene need to be above min_count

NB: The authors of WGCNA (used in GWENA for network building) advise against using differentially expressed (DE) genes as a filter since its module detection method is based on unsupervised clustering. Moreover, using DE genes will break the scale-free property (small-world network) on which the adjacency matrix is calculated.

In this example, we will be filtering the low variable genes with filter_low_var function.

kuehne_expr_filtered <- filter_low_var(kuehne_expr, pct = 0.7, type = "median")

# Remaining number of genes
ncol(kuehne_expr_filtered)
#> [1] 11060

2.4 Network building

Gene co-expression networks are an ensemble of genes (nodes) linked to each other (edges) according to the strength of their relation. In GWENA, this strength is estimated by the computation of a (dis)similarity score which can start with a distance (euclidian, minkowski, …) but is usually a correlation. Among these, Pearson’s one is the most popular, however in GWENA we use Spearman correlation by default. It is less sensitive to outliers which are frequent in transcriptomics datasets and does not assume that the data follows the normal distribution.

The co-expression network is built according to the following sub-steps :

  1. A correlation (or distance) between each pair of genes is computed.
  2. The correlation distributions are fitted to a power law.
  3. An adjacency score is computed by adjusting previous correlations by the fitted power law.
  4. A topological overlap score is computed by accounting for the network’s topology.

These successive adjustments improve the detection of modules for the next step.

# In order to fasten the example execution time, we only take an 
# arbitary sample of the genes. 
kuehne_expr_filtered <- kuehne_expr_filtered[, 1:1000]

net <- build_net(kuehne_expr_filtered, cor_func = "spearman", 
                 n_threads = threads_to_use)

# Power selected :
net$metadata$power
#> [1] 8

# Fit of the power law to data ($R^2$) :
fit_power_table <- net$metadata$fit_power_table
fit_power_table[fit_power_table$Power == net$metadata$power, "SFT.R.sq"]
#> [1] 0.917733

2.5 Modules detection

At this point, the network is a complete graph: all nodes are connected to all other nodes with different strengths. Because gene co-expression networks have a scale free property, groups of genes are strongly linked with one another. In co-expression networks these groups are called modules and assumed to be representative of genes working together to a common set of functions.

Such modules can be detected using unsupervised learning or modeling. GWENA use the hierarchical clustering but other methods can be used (kmeans, Gaussian mixture models, etc.).

detection <- detect_modules(kuehne_expr_filtered, 
                            net$network, 
                            detailled_result = TRUE,
                            merge_threshold = 0.25)

Important: Module 0 contains all genes that did not fit into any modules.

Since this operation tends to create multiple smaller modules with highly similar expression profile (based on the eigengene of each), they are usually merged into one.

# Number of modules before merging :
length(unique(detection$modules_premerge))
#> [1] 10
# Number of modules after merging: 
length(unique(detection$modules))
#> [1] 4

plot_modules_merge(modules_premerge = detection$modules_premerge, 
                   modules_merged = detection$modules)

#>             [,1] [,2]
#>  [1,] -1.0000000    1
#>  [2,] -0.7777778    1
#>  [3,]  0.1111111    1
#>  [4,]  0.3333333    1
#>  [5,]  0.5555556    1
#>  [6,] -0.5555556    1
#>  [7,] -0.3333333    1
#>  [8,] -0.1111111    1
#>  [9,]  0.7777778    1
#> [10,]  1.0000000    1
#> [11,] -1.0000000   -1
#> [12,] -0.4444444   -1
#> [13,]  0.4444444   -1
#> [14,]  1.0000000   -1

Resulting modules contain more genes whose repartition can be seen by a simple barplot.

ggplot2::ggplot(data.frame(detection$modules %>% stack), 
                ggplot2::aes(x = ind)) + ggplot2::stat_count() +
  ggplot2::ylab("Number of genes") +
  ggplot2::xlab("Module")

Each of the modules presents a distinct profile, which can be plotted in two figures to separate the positive (+ facet) and negative (- facet) correlations profile. As a summary of this profile, the eigengene (red line) is displayed to act as a signature.

# plot_expression_profiles(kuehne_expr_filtered, detection$modules)

2.6 Biological integration

2.6.1 Functional enrichment

A popular way to explore the modules consists of linking them with a known biological function by using currated gene sets. Among the available ones, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), WikiPathways, Reactome, Human Phenotype Ontology (HPO) put modules into a broader systemic perspective.

In oppositions, databases references like TRANSFAC, miRTarBase, Human Protein Atlas (HPA), and CORUM give more details about tissue/cell/condition information.

Using the over-representation analysis (ORA) tool GOSt from g:Profiler, we can retrieve the biological association for each module and plot it as follows.

enrichment <- bio_enrich(detection$modules)
plot_enrichment(enrichment)

2.6.2 Phenotypic association

If phenotypic information is available about the samples provided, an association test can help to determine if a module is specifically linked to a trait. In this case, module 1 seems to be strongly linked to Age.

# With data.frame/matrix
phenotype_association <- associate_phenotype(
  detection$modules_eigengenes, 
  kuehne_traits %>% dplyr::select(Condition, Age, Slide))

# With SummarizedExperiment
phenotype_association <- associate_phenotype(
  detection$modules_eigengenes, 
  SummarizedExperiment::colData(se_kuehne) %>% 
    as.data.frame %>% 
    dplyr::select(Condition, Age, Slide))

plot_modules_phenotype(phenotype_association)

Combination of phenotypic information with the previous functional enrichment can guide further analysis.

2.7 Graph visualization and topological analysis

Information can be retrieved from the network topology itself. For example, hub genes are highly connected genes known to be associated with key biological functions. They can be detected by different methods :

  • get_hub_high_co: Highest connectivity, select the top n (n depending on parameter given) highest connected genes. Similar to WGCNA’s selection of hub genes
  • get_hub_degree: Superior degree, select genes whose degree is greater than the average connection degree of the network. Definition from network theory.
  • get_hub_kleinberg: Kleinberg’s score, select genes whose Kleinberg’s score is superior to the provided threshold.

Manipulation of graph objects can be quite demanding in memory and CPU usage. Caution is advised when choosing to plot networks larger than 100 genes. Since co-expression networks are complete graphs, readability is hard because all genes are connected with each other. In order to clarity visualization, edges with a similarity score below a threshold are removed.

#>            [,1]     [,2]
#>   [1,] 64.93975 139.7672
#>   [2,] 70.50936 135.9482
#>   [3,] 54.69798 140.9025
#>   [4,] 55.06715 142.1749
#>   [5,] 65.86808 154.2860
#>   [6,] 74.13471 141.8777
#>   [7,] 70.02242 151.6104
#>   [8,] 49.71984 151.1883
#>   [9,] 69.32566 146.7384
#>  [10,] 54.30070 143.4047
#>  [11,] 61.82025 145.2963
#>  [12,] 69.69771 145.0437
#>  [13,] 63.89665 151.6343
#>  [14,] 50.13254 142.2581
#>  [15,] 73.26960 158.0052
#>  [16,] 78.02433 149.0874
#>  [17,] 69.39525 161.3114
#>  [18,] 58.38538 158.7596
#>  [19,] 76.50788 156.1431
#>  [20,] 53.06427 140.0784
#>  [21,] 69.56874 148.5078
#>  [22,] 77.68244 141.3868
#>  [23,] 50.10714 143.5711
#>  [24,] 66.30372 161.1658
#>  [25,] 77.82097 153.4496
#>  [26,] 68.46983 153.5390
#>  [27,] 60.44078 137.1752
#>  [28,] 75.41343 153.3169
#>  [29,] 64.79610 160.8814
#>  [30,] 66.24112 137.3566
#>  [31,] 52.79108 139.1238
#>  [32,] 56.02567 140.4775
#>  [33,] 62.86100 163.7421
#>  [34,] 47.87982 156.3714
#>  [35,] 71.88949 135.6381
#>  [36,] 78.38624 144.1900
#>  [37,] 70.70979 155.8096
#>  [38,] 67.54512 146.2192
#>  [39,] 72.07175 154.1351
#>  [40,] 75.57990 154.7700
#>  [41,] 69.92907 154.4787
#>  [42,] 64.29323 158.6168
#>  [43,] 75.56114 144.4986
#>  [44,] 66.35684 147.3058
#>  [45,] 47.59274 142.9105
#>  [46,] 62.90396 148.5501
#>  [47,] 79.25634 154.3687
#>  [48,] 51.85751 157.1011
#>  [49,] 61.07592 147.6620
#>  [50,] 60.02754 159.2988
#>  [51,] 51.79356 143.4852
#>  [52,] 62.55506 136.8334
#>  [53,] 56.59801 158.5448
#>  [54,] 62.66736 140.0713
#>  [55,] 74.49683 162.4187
#>  [56,] 65.03467 138.2366
#>  [57,] 49.79077 139.8167
#>  [58,] 54.58427 142.0561
#>  [59,] 59.57587 133.4174
#>  [60,] 56.93637 137.6822
#>  [61,] 49.53540 149.7155
#>  [62,] 75.93098 146.1382
#>  [63,] 72.35904 155.6316
#>  [64,] 58.90832 161.8631
#>  [65,] 69.44432 141.6461
#>  [66,] 60.70902 155.6064
#>  [67,] 55.43655 157.0876
#>  [68,] 60.40966 163.6698
#>  [69,] 50.40070 145.2693
#>  [70,] 46.93788 150.7456
#>  [71,] 76.65733 143.6839
#>  [72,] 58.42192 160.3868
#>  [73,] 51.91589 153.3511
#>  [74,] 52.61557 145.2729
#>  [75,] 47.80529 154.6699
#>  [76,] 60.70831 152.4185
#>  [77,] 61.84442 138.0557
#>  [78,] 74.76947 149.1100
#>  [79,] 79.17218 145.3227
#>  [80,] 76.62392 152.5621
#>  [81,] 50.13206 156.6566
#>  [82,] 68.63779 160.1226
#>  [83,] 66.59949 156.3417
#>  [84,] 54.79778 165.0452
#>  [85,] 64.73194 153.1769
#>  [86,] 51.46111 144.2989
#>  [87,] 56.79237 143.5313
#>  [88,] 54.13432 160.8415
#>  [89,] 59.60473 154.0368
#>  [90,] 53.41310 144.8968
#>  [91,] 63.08264 132.9900
#>  [92,] 72.32947 141.9487
#>  [93,] 79.71040 149.8943
#>  [94,] 54.15831 141.8028
#>  [95,] 77.13903 154.6346
#>  [96,] 55.64960 161.6795
#>  [97,] 67.33359 154.8298
#>  [98,] 56.20835 141.6988
#>  [99,] 71.64606 139.9066
#> [100,] 49.06588 159.7045
#> [101,] 63.40335 138.6262
#> [102,] 67.19810 136.2840
#> [103,] 74.12053 139.3135
#> [104,] 60.46794 160.7641
#> [105,] 51.06586 141.1621
#> [106,] 51.19893 150.6626
#> [107,] 78.08851 157.7247
#> [108,] 71.65990 157.2912
#> [109,] 60.09158 157.7350
#> [110,] 55.71904 151.0855
#> [111,] 64.52079 143.9718
#> [112,] 55.39359 136.6014
#> [113,] 79.75994 148.5003
#> [114,] 59.88230 148.9413
#> [115,] 60.70905 166.0820
#> [116,] 78.30810 150.4413
#> [117,] 54.72811 158.3382
#> [118,] 70.16648 149.9620
#> [119,] 54.39097 163.8237
#> [120,] 73.58818 156.5269
#> [121,] 68.00358 166.2761
#> [122,] 50.32429 141.2508
#> [123,] 69.54794 164.1881
#> [124,] 61.60956 167.1667
#> [125,] 62.60151 158.3336
#> [126,] 51.89209 142.8268
#> [127,] 50.53429 158.0884
#> [128,] 56.05161 137.8622
#> [129,] 59.15830 156.2122
#> [130,] 72.38442 152.5508
#> [131,] 57.91372 150.3994
#> [132,] 65.00343 142.4329
#> [133,] 52.08423 139.1018
#> [134,] 49.04863 154.4980
#> [135,] 58.06974 153.3365
#> [136,] 77.25941 145.1910
#> [137,] 46.63365 155.7025
#> [138,] 48.27512 150.9941
#> [139,] 70.77385 161.1177
#> [140,] 70.78976 143.6093
#> [141,] 76.46050 157.7312
#> [142,] 52.66264 147.4326
#> [143,] 52.09517 163.0230
#> [144,] 57.88259 155.0108
#> [145,] 68.89029 135.4433
#> [146,] 71.96586 160.5747
#> [147,] 69.41571 162.7521
#> [148,] 58.57420 165.5380
#> [149,] 67.35606 150.5748
#> [150,] 66.32709 140.6039
#> [151,] 75.85079 139.2184
#> [152,] 73.65158 150.8363
#> [153,] 47.58758 157.9957
#> [154,] 68.60555 136.8430
#> [155,] 53.41260 143.2040
#> [156,] 57.42202 166.2797
#> [157,] 53.51933 145.4695
#> [158,] 50.72197 143.9413
#> [159,] 77.77233 146.4119
#> [160,] 64.08581 141.0494
#> [161,] 53.10965 150.4707
#> [162,] 51.04264 160.8884
#> [163,] 66.66307 144.0663
#> [164,] 59.69563 164.8280
#> [165,] 73.05977 143.2438
#> [166,] 65.94331 135.1157
#> [167,] 56.54419 144.1391
#> [168,] 59.36948 135.4137
#> [169,] 53.04163 136.9916
#> [170,] 62.02779 149.9330
#> [171,] 68.06609 148.4107
#> [172,] 63.95027 136.8413
#> [173,] 55.53891 163.1959
#> [174,] 68.14038 144.6357
#> [175,] 53.38934 138.2887
#> [176,] 64.06603 134.7641
#> [177,] 50.43533 152.4511
#> [178,] 65.63815 145.3838
#> [179,] 73.31002 161.3971
#> [180,] 56.81470 163.5542
#> [181,] 71.58262 158.8342
#> [182,] 64.51397 155.3871
#> [183,] 56.16096 154.3155
#> [184,] 49.48276 143.2633
#> [185,] 64.43928 163.8240
#> [186,] 53.08676 164.0321
#> [187,] 58.19346 148.6438
#> [188,] 79.78580 151.4169
#> [189,] 72.72762 138.7467
#> [190,] 56.68606 156.0718
#> [191,] 52.43079 160.1931
#> [192,] 72.40272 144.4682
#> [193,] 68.69131 133.9509
#> [194,] 68.12981 164.7049
#> [195,] 62.54838 146.7435
#> [196,] 67.19323 159.9509
#> [197,] 51.46019 139.7604
#> [198,] 75.59721 141.8316
#> [199,] 52.49846 151.8709
#> [200,] 64.57663 149.7815
#> [201,] 67.92348 141.7306
#> [202,] 74.84358 158.0110
#> [203,] 79.46266 152.9471
#> [204,] 54.75763 155.5463
#> [205,] 53.89849 139.6584
#> [206,] 53.92846 162.3882
#> [207,] 64.20143 156.9325
#> [208,] 51.85745 155.5437
#> [209,] 74.29284 145.3060
#> [210,] 70.27378 134.5865
#> [211,] 68.74004 155.8657
#> [212,] 74.90557 161.1270
#> [213,] 72.39214 149.7165
#> [214,] 56.28091 149.7678
#> [215,] 46.60526 147.8622
#> [216,] 76.42647 149.3218
#> [217,] 77.89928 156.4106
#> [218,] 46.15498 152.9359
#> [219,] 71.79156 147.1290
#> [220,] 73.69079 137.6171
#> [221,] 65.65959 133.7874
#> [222,] 61.42197 141.0468
#> [223,] 62.57239 161.3043
#> [224,] 65.04617 165.3766
#> [225,] 76.62265 140.3783
#> [226,] 73.29036 163.2507
#> [227,] 56.59989 142.3204
#> [228,] 63.09212 153.7864
#> [229,] 74.53011 143.4999
#> [230,] 70.10263 137.2462
#> [231,] 64.85809 166.8124
#> [232,] 74.37522 159.7400
#> [233,] 58.88142 151.7350
#> [234,] 64.02274 162.3282
#> [235,] 50.85623 162.2104
#> [236,] 70.95789 141.9612
#> [237,] 75.10152 156.4539
#> [238,] 54.63822 136.9309
#> [239,] 50.93128 143.0457
#> [240,] 56.38555 137.0227
#> [241,] 50.92879 159.4387
#> [242,] 48.45363 140.7396
#> [243,] 53.53192 157.0198
#> [244,] 67.82439 157.5057
#> [245,] 55.48493 160.0146
#> [246,] 51.50741 136.7822
#> [247,] 56.27159 144.7513
#> [248,] 59.67195 146.8163
#> [249,] 50.92775 146.5839
#> [250,] 70.90202 164.8735
#> [251,] 65.84334 157.6442
#> [252,] 54.76027 145.5601
#> [253,] 64.19959 146.2964
#> [254,] 57.25236 162.0127
#> [255,] 57.26411 138.5911
#> [256,] 59.44913 166.7597
#> [257,] 65.99652 164.0151
#> [258,] 76.17058 160.6132
#> [259,] 53.81144 143.9639
#> [260,] 72.89099 146.0023
#> [261,] 46.64143 154.1161
#> [262,] 62.69114 155.7184
#> [263,] 61.75350 159.7520
#> [264,] 69.48068 165.6278
#> [265,] 74.77912 151.8786
#> [266,] 53.28665 141.4225
#> [267,] 75.74799 159.2715
#> [268,] 54.96233 153.2476
#> [269,] 75.02467 138.0281
#> [270,] 57.85350 157.2757
#> [271,] 62.99095 144.3506
#> [272,] 69.89407 157.2925
#> [273,] 73.03540 159.5769
#> [274,] 66.50981 149.1145
#> [275,] 63.57259 160.0070
#> [276,] 50.25808 155.3056
#> [277,] 61.40662 142.7626
#> [278,] 60.26631 150.5852
#> [279,] 53.26337 155.2740
#> [280,] 72.03970 162.3205
#> [281,] 55.73866 145.2912
#> [282,] 63.04631 142.3156
#> [283,] 63.55100 165.2695
#> [284,] 68.54909 151.4564
#> [285,] 48.85456 157.0244
#> [286,] 52.40075 142.1041
#> [287,] 66.57192 142.3584
#> [288,] 66.00309 159.1782
#> [289,] 77.18690 159.1157
#> [290,] 71.05277 138.4947
#> [291,] 72.07778 164.0896
#> [292,] 67.60119 163.3854
#> [293,] 52.88738 142.1929
#> [294,] 54.26449 151.6733
#> [295,] 61.69099 157.1743
#> [296,] 64.70364 148.0013
#> [297,] 60.68739 134.3035
#> [298,] 58.16702 140.1616
#> [299,] 54.94367 143.5182
#> [300,] 71.26274 148.4406
#> [301,] 50.73133 139.3993
#> [302,] 62.28270 165.9028
#> [303,] 55.56131 137.2461
#> [304,] 70.57960 153.0264
#> [305,] 50.58565 153.9364
#> [306,] 54.46087 147.3239
#> [307,] 78.33770 151.9457
#> [308,] 49.46422 145.6403
#> [309,] 56.60713 138.6043
#> [310,] 52.29234 140.7732
#> [311,] 65.83666 162.5876
#> [312,] 66.94232 152.5541
#> [313,] 71.88275 137.2195
#> [314,] 57.42844 164.6665
#> [315,] 53.55954 147.5777
#> [316,] 54.80995 139.0255
#> [317,] 68.27408 139.9915
#> [318,] 73.35061 148.5638
#> [319,] 69.44195 138.7516
#> [320,] 58.43745 134.2060
#> [321,] 45.81761 149.3812
#> [322,] 74.04418 147.1159
#> [323,] 47.49328 152.4222
#> [324,] 74.80579 140.4990
#> [325,] 78.52735 147.7288
#> [326,] 61.38018 133.1092
#> [327,] 52.33345 158.5204
#> [328,] 65.63847 151.2710
#> [329,] 67.43728 134.7356
#> [330,] 49.13212 141.5750
#> [331,] 67.97687 138.2250
#> [332,] 63.43860 166.9369
#> [333,] 76.93951 147.8077
#> [334,] 73.99457 154.8821
#> [335,] 66.36810 166.6022
#> [336,] 54.88459 139.9339
#> [337,] 60.87358 135.7028
#> [338,] 66.72574 165.2850
#> [339,] 52.56030 161.6778
#> [340,] 79.52078 146.7388
#> [341,] 48.82647 152.9398
#> [342,] 48.25940 148.9247
#> [343,] 76.91771 151.0549
#> [344,] 48.19736 142.1766
#> [345,] 73.75163 153.2451
#> [346,] 51.04126 138.5722
#> [347,] 48.90827 158.4941
#> [348,] 70.23468 159.2593
#> [349,] 60.97406 139.2961
#> [350,] 71.05333 145.6678
#> [351,] 47.05433 149.1693
#> [352,] 68.95524 143.2153
#> [353,] 61.96044 162.6012
#> [354,] 62.64307 134.2503
#> [355,] 67.79387 162.0256
#> [356,] 52.91378 140.6860
#> [357,] 61.55540 164.5944
#> [358,] 64.54899 133.1667
#> [359,] 56.45953 141.1195
#> [360,] 73.23456 136.3858
#> [361,] 52.81028 144.6291
#> [362,] 68.72907 158.5735
#> [363,] 56.75906 152.4522
#> [364,] 75.48507 150.5798
#> [365,] 70.12906 140.3231
#> [366,] 58.57946 163.4370
#> [367,] 60.38865 162.2488
#> [368,] 72.92452 140.5826
#> [369,] 60.54962 144.5550
#> [370,] 66.77113 139.0944
#> [371,] 65.04669 136.1064
#> [372,] 78.62491 155.4259
#> [373,] 56.92315 160.3026
#> [374,] 45.85359 151.4148
#> [375,] 50.09159 146.3123
#> [376,] 49.62175 160.8523
#> [377,] 56.10427 165.4204
#> [378,] 53.47266 153.7006
#> [379,] 61.49995 153.9977
#> [380,] 62.38192 151.5880
#> [381,] 70.84890 163.0841
#> [382,] 75.47188 147.5199
#> [383,] 53.65971 159.2716
#> [384,] 62.38355 135.5550
#> [385,] 76.75604 142.3606
#> [386,] 71.81630 151.0794
#> [387,] 78.28782 142.9112
#> [388,] 66.95145 133.3518

2.8 Networks comparison

A co-expression network can be built for each of the experimental conditions studied (e.g. control/test) and then be compared with each other to detect differences of patterns in co-expression. These may indicate breaks of inhibition, inefficiency of a factor of transcription, etc. These analyses can focus on preserved modules between conditions (e.g. to detect housekeeping genes), or unpreserved modules (e.g. to detect genes contributing to a disease).

GWENA uses a comparison test based on random re-assignment of gene names inside modules to see whether patterns inside modules change (from NetRep package). This permutation test is repeated a large number of times to evaluate the significance of the result obtained.

To perform the comparison, all previous steps leading to modules detection need to be done for each condition. To save CPU, memory and time, the parameter keep_cor_mat from the build_net function can be switched to TRUE so the similarity matrix is kept and can be passed to compare_conditions. If not, the matrix is re-computed in compare_conditions.

# Expression by condition with data.frame/matrix
samples_by_cond <- lapply(kuehne_traits$Condition %>% unique, function(cond){
  df <- kuehne_traits %>% 
    dplyr::filter(Condition == cond) %>%
    dplyr::select(Slide, Exp)
  apply(df, 1, paste, collapse = "_")
}) %>% setNames(kuehne_traits$Condition %>% unique)

expr_by_cond <- lapply(samples_by_cond %>% names, function(cond){
  samples <- samples_by_cond[[cond]]
  kuehne_expr_filtered[which(rownames(kuehne_expr_filtered) %in% samples),]
}) %>% setNames(samples_by_cond %>% names)


# Expression by condition with SummarizedExperiment
se_expr_by_cond <- lapply(unique(se_kuehne$Condition), function(cond){
     se_kuehne[, se_kuehne$Condition == cond]
}) %>% setNames(unique(se_kuehne$Condition))


# Network building and modules detection by condition
net_by_cond <- lapply(expr_by_cond, build_net, cor_func = "spearman", 
                      n_threads = threads_to_use, keep_matrices = "both")

mod_by_cond <- mapply(detect_modules, expr_by_cond, 
                      lapply(net_by_cond, `[[`, "network"), 
                      MoreArgs = list(detailled_result = TRUE), 
                      SIMPLIFY = FALSE)


comparison <- compare_conditions(expr_by_cond, 
                                 lapply(net_by_cond, `[[`, "adja_mat"), 
                                 lapply(net_by_cond, `[[`, "cor_mat"),  
                                 lapply(mod_by_cond, `[[`, "modules"), 
                                 pvalue_th = 0.05)

The final object contains a table summarizing the comparison of the modules, directly available with the comparison$result$young$old$comparison command. The comparison take into account the permutation test result and the z summary.

(#tab:condition_comparison)Modules preservation with young as reference
comparison
preserved
preserved
inconclusive
preserved
moderately preserved
preserved

The detail of the pvalues can also be seen as a heatmap. Since all evaluation metrics of compare_conditions need to be significant to consider a module preserved/unpreserved/one of them, it could be interesting to see which metrics prevented a module to be significant.

plot_comparison_stats(comparison$result$young$old$p.values)

3 Frequently asked questions

1. How can I reduce my transcriptomic data to the gene level?

Microarray probes are not reduced to gene level the same way RNA-seq transcripts are. But in both cases, the optimal collapsing strategy depends on the analysis goal, here co-expression network analysis. * For microarray, the highest mean expression is the most robust regarding the expression correlation. You can use the collapseRows R function available in the WGCNA package which also allow to use other methods like median. * For RNA-seq, it is recommended to sum the transcripts counts for a gene

2. What is an eigengene?

A module’s eigengene is a gene (real or estimated) whose expression profile summarizes the profile of expression of the whole module. In WGCNA, it is the first component of an SVD performed on the module’s expression matrix.

3. What should I do if I get warning/error “No fitting power could be found for provided fit_cut_off” ?

You should first verify your data. This implies : * Your data are RNA-seq or microarray data * You have gene names as columns and samples as row or if you use get_fit.cor you had it when you computed your correlation matrix on it * You didn’t filtered your data in a way that breaks the scale-free property. Classic wrong filter is usign only differentially expressed genes (see question 2. from WGCNA package FAQ) If you verified these causes, you may have set a fit_cut_off too high (default is 0.9 default).

4. Why do the first modules have so many genes as the last ones have very few?

5. Why GWENA doesn’t provide normalization methods ?

GWENA is design to support both RNA-seq and microarray data. However each of these technologies have its own normalization methods, partly because of the distribution of the expression (discrete against continuous). Also microarrays normalization steps aim to remove noise like mRNA quality variability, batch effect, background effect, etc. While RNA-seq normalization steps aim to account for differences of gene length, sequencing depths, GC content, etc. but can also take into account classic noise like batch effect.

Some of this normalization methods require metadata and specific constructor package in the case of microarrays. To avoid over-complexification of the input for this pipeline and since the experimenters are in the best position to know the best normalisations to apply, we prefer to ask for already normalized methods.

6. Why forcing expression datasets to have no NA values?

As you may know in R, missing values in cor and cov function by default propagate missing values in each column and row where there are found. Multiple options are available in these function to manage missing values. However some of them are not available for all type of correlations available, and not all imputations methods are wise to use (take a look at this article: Pairwise-complete correlation considered dangerous). Since WGCNA running requires no missing values, I prefere forcing to have a complete dataset. You have therefore the full understanding of the imputation you compute for your missing values. If you have no idea how to do it, see Dealing with missing data: Key assumptions and methods for applied analysis for general information about missing values imputation, and Dealing with missing values in large-scale studies: microarray data imputation and beyond for more transcriptomic-specific imputation.

7. Why did you wrapped multiple functions of WGCNA like pickSoftThreshold or adjacency ? Wasn’t they already working?

Short answer is “Yes they were”. However, the parameters of these functions and their syntax didn’t eased tehir use. Moreover, the current succession of functions to built modules repeated multiple times the correlation computation which takes quite some time. Also, I took the opportunity to integrate native sperman correlation.

8. Why didn’t you use S3 class to create objects usable by your functions ?

GWENA architecture is designed to be modular, meaning each step method is easily exchangeable for another method (i.e. changing the detection step which is a hierarchical cultering to a kmeans) as long as the input and output format are the same. Also using native data types ensure the ease of compatibility with other tools (i.e. using cytoscape for the visualization step instead of the GWENA’s one).

9. Why the arg network_type = "signed" implies a modification of the similarity score even though it is already signed?

Since a correlation matrix is already signed, one could as what is this similarity <- (1 + cor_mat) / 2 operation. It is simply because ulterior steps of estimating a scale-free index in WGCNA implies a log10 transformation. Therefore you can’t have negative numbers. Because a correlation matrix have values in [0;1], the operation will keep the distribution and avoid negative values.

10. Why plot_expression_profiles() computes a PCA for eigengenes instead of using eigengenes given by detect_modules()?

Eigenenes provided by detect_modules are from a SVD, and PCA is equivalent to performing the SVD on the centered data (see ‘Running PCA and SVD in R’). Because expression profiles are using centered values (for clarity of representation), the PCA to represent eigengenes is more accurate.


If you have a question or a misunderstanding, send an email to lemoine.gwenaelle[@t)gmail{d0t]com