1 Temporal data: Simulate an example data set

2 MEFISTO framework

2.1 Create a MOFA object with covariates
2.2 Prepare a MOFA object
2.3 Run MOFA
2.4 Down-stream analysis

2.4.1 Variance explained per factor
2.4.2 Relate factors to the covariate
2.4.3 Exploration of weights
2.4.4 Interpolation

library(MOFA2)
library(tidyverse)
library(pheatmap)

1 Temporal data: Simulate an example data set

To illustrate the MEFISTO method in MOFA2 we simulate a small example data set with 4 different views and one covariates defining a timeline using make_example_data. The simulation is based on 4 factors, two of which vary smoothly along the covariate (with different lengthscales) and two are independent of the covariate.

set.seed(2020)

# set number of samples and time points
N <- 200
time <- seq(0,1,length.out = N)

# generate example data
dd <- make_example_data(sample_cov = time, n_samples = N,
                        n_factors = 4, n_features = 200, n_views = 4,
                        lscales = c(0.5, 0.2, 0, 0))
# input data
data <- dd$data

# covariate matrix with samples in columns
time <- dd$sample_cov
rownames(time) <- "time"

Let’s have a look at the simulated latent temporal processes, which we want to recover:

df <- data.frame(dd$Z, t(time))
df <- gather(df, key = "factor", value = "value", starts_with("simulated_factor"))
ggplot(df, aes(x = time, y = value)) + geom_point() + facet_grid(~factor)

2 MEFISTO framework

Using the MEFISTO framework is very similar to using MOFA2. In addition to the omics data, however, we now additionally specify the time points for each sample. If you are not familiar with the MOFA2 framework, it might be helpful to have a look at MOFA2 tutorials first.

2.1 Create a MOFA object with covariates

To create the MOFA object we need to specify the training data and the covariates for pattern detection and inference of smooth factors. Here, sample_cov is a matrix with samples in columns and one row containing the time points. The sample order must match the order in data columns. Alternatively, a data frame can be provided containing one sample columns with samples names matching the sample names in the data.

First, we start by creating a standard MOFA model.

sm <- create_mofa(data = dd$data)

## Creating MOFA object from a list of matrices (features as rows, sample as columns)...

Now, we can add the additional temporal covariate, that we want to use for training.

sm <- set_covariates(sm, covariates = time)
sm

## Untrained MEFISTO model with the following characteristics: 
##  Number of views: 4 
##  Views names: view_1 view_2 view_3 view_4 
##  Number of features (per view): 200 200 200 200 
##  Number of groups: 1 
##  Groups names: group1 
##  Number of samples (per group): 200 
##  Number of covariates per sample: 1 
##

We now successfully created a MOFA object that contains 4 views, 1 group and 1 covariate giving the time point for each sample.

2.2 Prepare a MOFA object

Before training, we can specify various options for the model, the training and the data preprocessing. If no options are specified, the model will use the default options. See also get_default_data_options, get_default_model_options and get_default_training_options to have a look at the defaults and change them where required. For illustration, we only use a small number of iterations.

Importantly, to activate the use of the covariate for a functional decomposition (MEFISTO) we now additionally to the standard MOFA options need to specify mefisto_options. For this you can just use the default options (get_default_mefisto_options), unless you want to make use of advanced options such as alignment across groups.

data_opts <- get_default_data_options(sm)

model_opts <- get_default_model_options(sm)
model_opts$num_factors <- 4

train_opts <- get_default_training_options(sm)
train_opts$maxiter <- 100

mefisto_opts <- get_default_mefisto_options(sm)

sm <- prepare_mofa(sm, model_options = model_opts,
                   mefisto_options = mefisto_opts,
                   training_options = train_opts,
                   data_options = data_opts)

2.3 Run MOFA

Now, the MOFA object is ready for training. Using run_mofa we can fit the model, which is saved in the file specified as outfile. If none is specified the output is saved in a temporary location.

outfile = file.path(tempdir(),"model.hdf5")
sm <- run_mofa(sm, outfile, use_basilisk = TRUE)

## 
##         #########################################################
##         ###           __  __  ____  ______                    ### 
##         ###          |  \/  |/ __ \|  ____/\    _             ### 
##         ###          | \  / | |  | | |__ /  \ _| |_           ### 
##         ###          | |\/| | |  | |  __/ /\ \_   _|          ###
##         ###          | |  | | |__| | | / ____ \|_|            ###
##         ###          |_|  |_|\____/|_|/_/    \_\              ###
##         ###                                                   ### 
##         ######################################################### 
##        
##  
##         
## use_float32 set to True: replacing float64 arrays by float32 arrays to speed up computations...
## 
## Successfully loaded view='view_1' group='group1' with N=200 samples and D=200 features...
## Successfully loaded view='view_2' group='group1' with N=200 samples and D=200 features...
## Successfully loaded view='view_3' group='group1' with N=200 samples and D=200 features...
## Successfully loaded view='view_4' group='group1' with N=200 samples and D=200 features...
## 
## 
## Loaded 1 covariate(s) for each sample...
## 
## 
## Model options:
## - Automatic Relevance Determination prior on the factors: False
## - Automatic Relevance Determination prior on the weights: True
## - Spike-and-slab prior on the factors: False
## - Spike-and-slab prior on the weights: False
## Likelihoods:
## - View 0 (view_1): gaussian
## - View 1 (view_2): gaussian
## - View 2 (view_3): gaussian
## - View 3 (view_4): gaussian
## 
## 
## 
## 
## ######################################
## ## Training the model with seed 42 ##
## ######################################
## 
## 
## ELBO before training: -668156.67 
## 
## Iteration 1: time=0.07, ELBO=-112628.52, deltaELBO=555528.150 (83.14339606%), Factors=4
## Iteration 2: time=0.05, Factors=4
## Iteration 3: time=0.05, Factors=4
## Iteration 4: time=0.05, Factors=4
## Iteration 5: time=0.05, Factors=4
## Iteration 6: time=0.06, ELBO=-60038.13, deltaELBO=52590.398 (7.87096802%), Factors=4
## Iteration 7: time=0.05, Factors=4
## Iteration 8: time=0.05, Factors=4
## Iteration 9: time=0.05, Factors=4
## Iteration 10: time=0.05, Factors=4
## Iteration 11: time=0.06, ELBO=-59874.46, deltaELBO=163.670 (0.02449573%), Factors=4
## Iteration 12: time=0.05, Factors=4
## Iteration 13: time=0.05, Factors=4
## Iteration 14: time=0.05, Factors=4
## Iteration 15: time=0.05, Factors=4
## Iteration 16: time=0.06, ELBO=-59706.33, deltaELBO=168.127 (0.02516275%), Factors=4
## Iteration 17: time=0.05, Factors=4
## Iteration 18: time=0.05, Factors=4
## Iteration 19: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 20: time=5.03, Factors=4
## Iteration 21: time=0.06, ELBO=-58693.09, deltaELBO=1013.239 (0.15164696%), Factors=4
## Iteration 22: time=0.05, Factors=4
## Iteration 23: time=0.05, Factors=4
## Iteration 24: time=0.05, Factors=4
## Iteration 25: time=0.05, Factors=4
## Iteration 26: time=0.06, ELBO=-58472.73, deltaELBO=220.363 (0.03298078%), Factors=4
## Iteration 27: time=0.05, Factors=4
## Iteration 28: time=0.05, Factors=4
## Iteration 29: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 30: time=4.01, Factors=4
## Iteration 31: time=0.06, ELBO=-58287.07, deltaELBO=185.655 (0.02778614%), Factors=4
## Iteration 32: time=0.05, Factors=4
## Iteration 33: time=0.05, Factors=4
## Iteration 34: time=0.05, Factors=4
## Iteration 35: time=0.05, Factors=4
## Iteration 36: time=0.06, ELBO=-58181.29, deltaELBO=105.779 (0.01583143%), Factors=4
## Iteration 37: time=0.05, Factors=4
## Iteration 38: time=0.05, Factors=4
## Iteration 39: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 40: time=3.75, Factors=4
## Iteration 41: time=0.06, ELBO=-58026.89, deltaELBO=154.399 (0.02310820%), Factors=4
## Iteration 42: time=0.05, Factors=4
## Iteration 43: time=0.05, Factors=4
## Iteration 44: time=0.05, Factors=4
## Iteration 45: time=0.05, Factors=4
## Iteration 46: time=0.06, ELBO=-57938.97, deltaELBO=87.921 (0.01315869%), Factors=4
## Iteration 47: time=0.05, Factors=4
## Iteration 48: time=0.05, Factors=4
## Iteration 49: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 50: time=4.02, Factors=4
## Iteration 51: time=0.06, ELBO=-57831.57, deltaELBO=107.408 (0.01607527%), Factors=4
## Iteration 52: time=0.05, Factors=4
## Iteration 53: time=0.05, Factors=4
## Iteration 54: time=0.05, Factors=4
## Iteration 55: time=0.05, Factors=4
## Iteration 56: time=0.06, ELBO=-57760.44, deltaELBO=71.127 (0.01064523%), Factors=4
## Iteration 57: time=0.05, Factors=4
## Iteration 58: time=0.05, Factors=4
## Iteration 59: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 60: time=3.36, Factors=4
## Iteration 61: time=0.06, ELBO=-57683.56, deltaELBO=76.879 (0.01150616%), Factors=4
## Iteration 62: time=0.05, Factors=4
## Iteration 63: time=0.05, Factors=4
## Iteration 64: time=0.05, Factors=4
## Iteration 65: time=0.05, Factors=4
## Iteration 66: time=0.06, ELBO=-57646.16, deltaELBO=37.400 (0.00559749%), Factors=4
## Iteration 67: time=0.05, Factors=4
## Iteration 68: time=0.05, Factors=4
## Iteration 69: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 70: time=3.59, Factors=4
## Iteration 71: time=0.06, ELBO=-57618.65, deltaELBO=27.506 (0.00411674%), Factors=4
## Iteration 72: time=0.05, Factors=4
## Iteration 73: time=0.05, Factors=4
## Iteration 74: time=0.05, Factors=4
## Iteration 75: time=0.05, Factors=4
## Iteration 76: time=0.06, ELBO=-57601.36, deltaELBO=17.292 (0.00258806%), Factors=4
## Iteration 77: time=0.05, Factors=4
## Iteration 78: time=0.05, Factors=4
## Iteration 79: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 80: time=4.46, Factors=4
## Iteration 81: time=0.06, ELBO=-57581.23, deltaELBO=20.130 (0.00301271%), Factors=4
## Iteration 82: time=0.05, Factors=4
## Iteration 83: time=0.05, Factors=4
## Iteration 84: time=0.05, Factors=4
## Iteration 85: time=0.05, Factors=4
## Iteration 86: time=0.06, ELBO=-57565.76, deltaELBO=15.472 (0.00231561%), Factors=4
## Iteration 87: time=0.05, Factors=4
## Iteration 88: time=0.05, Factors=4
## Iteration 89: time=0.05, Factors=4
## Optimising sigma node...
## Iteration 90: time=4.69, Factors=4
## Iteration 91: time=0.06, ELBO=-57548.34, deltaELBO=17.420 (0.00260714%), Factors=4
## Iteration 92: time=0.05, Factors=4
## Iteration 93: time=0.05, Factors=4
## Iteration 94: time=0.05, Factors=4
## Iteration 95: time=0.05, Factors=4
## Iteration 96: time=0.06, ELBO=-57534.17, deltaELBO=14.170 (0.00212078%), Factors=4
## Iteration 97: time=0.05, Factors=4
## Iteration 98: time=0.05, Factors=4
## Iteration 99: time=0.05, Factors=4
## 
## 
## #######################
## ## Training finished ##
## #######################
## 
## 
## Saving model in /home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpsXf5Mb/model.hdf5...

2.4 Down-stream analysis

2.4.1 Variance explained per factor

Using plot_variance_explained we can explore which factor is active in which view. plot_factor_cor shows us whether the factors are correlated.

plot_variance_explained(sm)

r <- plot_factor_cor(sm)

2.4.2 Relate factors to the covariate

The MOFA model has learnt scale parameters for each factor, which give us an indication of the smoothness per factor along the covariate (here time) and are between 0 and 1. A scale of 0 means that the factor captures variation independent of time, a value close to 1 tells us that this factor varys very smoothly along time.

get_scales(sm)

##    Factor1    Factor2    Factor3    Factor4 
## 0.00000000 0.01082415 0.99875934 0.99602045

In this example, we find two factors that are non-smooth and two smooth factors. Using plot_factors_vs_cov we can plot the factors along the time line, where we can distinguish smooth and non smooth variation along time.

plot_factors_vs_cov(sm, color_by = "time")

For more customized plots, we can extract the underlying data containing the factor and covariate values for each sample.

df <- plot_factors_vs_cov(sm, color_by = "time",
                    legend = FALSE, return_data = TRUE)
head(df)

##      sample covariate value.covariate  factor value.factor  group   color_by
## 1  sample_1      time      0.00000000 Factor1   -4.7768631 group1 0.00000000
## 2  sample_1      time      0.00000000 Factor2    0.5450538 group1 0.00000000
## 3  sample_1      time      0.00000000 Factor4   -1.4026462 group1 0.00000000
## 4  sample_1      time      0.00000000 Factor3   -2.5425448 group1 0.00000000
## 5 sample_10      time      0.04522613 Factor4   -1.4980541 group1 0.04522613
## 6 sample_10      time      0.04522613 Factor2    0.1091705 group1 0.04522613
##   shape_by
## 1        1
## 2        1
## 3        1
## 4        1
## 5        1
## 6        1

We can compare the above plots to the factors that were simulated above and find that the model recaptured the two smooth as well as two non-smooth patterns in time. Note that factors are invariant to the sign, e.g. Factor 4 is the negative of the simulated factor but we can simply multiply the factors and its weights by -1 to obtain exactly the simulated factor.

2.4.3 Exploration of weights

As with standard MOFA, we can now look deeper into the meaning of these factors by exploring the weights or performing feature set enrichment analysis.

plot_weights(sm, factors = 4, view = 1)

plot_top_weights(sm, factors = 3, view = 2)

In addition, we can take a look at the top feature values per factor along time and see that their patterns are in line with the pattern of the corresponding Factor (here Factor 3).

plot_data_vs_cov(sm, factor=3,
                         features = 2,
                         color_by = "time",
                         dot_size = 1)

2.4.4 Interpolation

Furthermore, we can interpolate or extrapolate a factor to new values. Here, we only show the mean of the prediction, to obtain uncertainties you need to specify the new values before training in get_default_mefisto_options(sm)$new_values.

sm <- interpolate_factors(sm, new_values = seq(0,1.1,0.01))
plot_interpolation_vs_covariate(sm, covariate = "time",
                                factors = "Factor3")

Session Info

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] pheatmap_1.0.12  lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1   
##  [5] dplyr_1.1.4      purrr_1.0.2      readr_2.1.5      tidyr_1.3.1     
##  [9] tibble_3.2.1     ggplot2_3.5.1    tidyverse_2.0.0  MOFA2_1.16.0    
## [13] BiocStyle_2.34.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1      farver_2.1.2          filelock_1.0.3       
##  [4] fastmap_1.2.0         digest_0.6.37         timechange_0.3.0     
##  [7] lifecycle_1.0.4       magrittr_2.0.3        compiler_4.4.1       
## [10] rlang_1.1.4           sass_0.4.9            tools_4.4.1          
## [13] utf8_1.2.4            yaml_2.3.10           corrplot_0.95        
## [16] knitr_1.48            S4Arrays_1.6.0        labeling_0.4.3       
## [19] reticulate_1.39.0     DelayedArray_0.32.0   plyr_1.8.9           
## [22] RColorBrewer_1.1-3    abind_1.4-8           HDF5Array_1.34.0     
## [25] Rtsne_0.17            withr_3.0.2           BiocGenerics_0.52.0  
## [28] grid_4.4.1            stats4_4.4.1          fansi_1.0.6          
## [31] colorspace_2.1-1      Rhdf5lib_1.28.0       scales_1.3.0         
## [34] tinytex_0.53          cli_3.6.3             mvtnorm_1.3-1        
## [37] rmarkdown_2.28        crayon_1.5.3          generics_0.1.3       
## [40] reshape2_1.4.4        tzdb_0.4.0            cachem_1.1.0         
## [43] rhdf5_2.50.0          zlibbioc_1.52.0       parallel_4.4.1       
## [46] BiocManager_1.30.25   XVector_0.46.0        matrixStats_1.4.1    
## [49] basilisk_1.18.0       vctrs_0.6.5           Matrix_1.7-1         
## [52] jsonlite_1.8.9        dir.expiry_1.14.0     bookdown_0.41        
## [55] IRanges_2.40.0        hms_1.1.3             S4Vectors_0.44.0     
## [58] ggrepel_0.9.6         magick_2.8.5          jquerylib_0.1.4      
## [61] glue_1.8.0            cowplot_1.1.3         uwot_0.2.2           
## [64] stringi_1.8.4         gtable_0.3.6          munsell_0.5.1        
## [67] pillar_1.9.0          basilisk.utils_1.18.0 htmltools_0.5.8.1    
## [70] rhdf5filters_1.18.0   R6_2.5.1              evaluate_1.0.1       
## [73] lattice_0.22-6        highr_0.11            png_0.1-8            
## [76] bslib_0.8.0           Rcpp_1.0.13           SparseArray_1.6.0    
## [79] xfun_0.48             MatrixGenerics_1.18.0 pkgconfig_2.0.3

Illustration of MEFISTO on simulated data with a temporal covariate

2024-10-29