1 Introduction

The GSgalgoR package provides a simple callback mechanism for adapting different stages of the galgo() execution to the user's needs. Callbacks are custom functions that change the behavior of galgo() through minor modifications to its workflow. Common applications include producing personalized reports, saving partial results during the evolution process, and measuring execution time.

There are five points where users can hook their own code into the galgo() execution process.

Each of the five hooks is accessed through a galgo() parameter with the _callback suffix.

galgo(...,
    start_galgo_callback = callback_default, # `galgo()` is about to start.
    end_galgo_callback = callback_default,   # `galgo()` is about to finish.
    start_gen_callback = callback_default,   # At the beginning of each generation.
    end_gen_callback = callback_default,     # At the end of each generation.
    report_callback = callback_default,      # In the middle of the generation,
                                             #  right after the new mating pool
                                             #  has been created.
    ...)
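
To make the five hook points concrete, the following toy driver mimics the order in which the callbacks fire. It is only a sketch: toy_driver(), noop() and trace_callback() are hypothetical names that are not part of GSgalgoR, the arguments passed to each hook are simplified, and the real galgo() internals may differ in the details.

```r
# Hypothetical stand-in for galgo(): it only shows the order in which the
# five hooks are invoked and performs no real optimization.
noop <- function(...) invisible(NULL)

toy_driver <- function(generations,
                       start_galgo_callback = noop,
                       end_galgo_callback = noop,
                       start_gen_callback = noop,
                       end_gen_callback = noop,
                       report_callback = noop) {
    # Assumption: the start hook sees generation 0
    start_galgo_callback("start_galgo", 0L)
    for (g in seq_len(generations)) {
        start_gen_callback("start_gen", g)
        # ... a new mating pool would be created here ...
        report_callback("report", g)
        # ... the rest of the generation would run here ...
        end_gen_callback("end_gen", g)
    }
    # The value of the last hook becomes the value of the driver
    end_galgo_callback("end_galgo", generations)
}

# Record every hook invocation in order
calls <- character(0)
trace_callback <- function(hook, generation) {
    calls <<- c(calls, paste0(hook, ":", generation))
    invisible(NULL)
}

toy_driver(2,
    start_galgo_callback = trace_callback,
    end_galgo_callback = trace_callback,
    start_gen_callback = trace_callback,
    end_gen_callback = trace_callback,
    report_callback = trace_callback)
print(calls)
```

Printing calls shows the start hook once, then the start_gen, report and end_gen hooks for each generation, and finally the end hook.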

2 Example 1: A simple custom callback function definition

A callback function definition can be any R function accepting six parameters.

-userdir: the directory ("character") where the user can save information to the local filesystem.

-generation: the number ("integer") of the current generation/iteration.

-pop_pool: the data.frame containing the resulting solutions for the current iteration.

-pareto: the solutions found by galgo() across all generations in the solution space.

-prob_matrix: the expression set ("matrix") where features are in rows and samples in columns.

-current_time: the current time (an object of class "POSIXct").

The following example callback prints the generation number and the current time every two iterations:

library(GSgalgoR)


my_callback <-
    function(userdir = "",
            generation,
            pop_pool,
            pareto,
            prob_matrix,
            current_time) {
        # Report only on even-numbered generations
        if (generation %% 2 == 0) {
            message(paste0("generation: ", generation,
                        " current_time: ", current_time))
        }
    }

Then, my_callback() needs to be assigned to one of the hooks provided by galgo(). An example of such an assignment and the resulting output is shown in the two snippets below.

A reduced version of the TRANSBIG dataset is used to set up the expression and clinical information required by the galgo() function.

library(breastCancerTRANSBIG)
data(transbig)
train <- transbig
rm(transbig)
expression <- Biobase::exprs(train)
clinical <- Biobase::pData(train)
OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs)
# use a reduced dataset for the example
expression <- expression[sample(1:nrow(expression), 100), ]
# scale the expression matrix
expression <- t(scale(t(expression)))

Then, the galgo() function is invoked and the recently defined function my_callback() is assigned to the report_callback hook-point.

library(GSgalgoR)
# Running galgo
GSgalgoR::galgo(generations = 6, 
            population = 15, 
            prob_matrix = expression, 
            OS = OS,
    start_galgo_callback = GSgalgoR::callback_default, 
    end_galgo_callback = GSgalgoR::callback_default,
    report_callback = my_callback,      # call `my_callback()` in the middle
                                        # of each generation/iteration.
    start_gen_callback = GSgalgoR::callback_default,
    end_gen_callback = GSgalgoR::callback_default) 
#> Using CPU for computing pearson distance
#> generation: 2 current_time: 2021-05-23 17:24:27
#> generation: 4 current_time: 2021-05-23 17:24:28
#> generation: 6 current_time: 2021-05-23 17:24:30
#> NULL
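
Measuring execution time, one of the applications mentioned in the introduction, can be sketched with a pair of closures that share a start time. make_timer() is a hypothetical helper, not part of GSgalgoR, and the two callbacks are exercised here by hand with made-up timestamps rather than through galgo().

```r
# Hypothetical helper: builds two callbacks that share one start time.
make_timer <- function() {
    started <- NULL
    start_cb <- function(userdir = "", generation, pop_pool, pareto,
                         prob_matrix, current_time) {
        # Remember when the run started
        started <<- current_time
        invisible(NULL)
    }
    end_cb <- function(userdir = "", generation, pop_pool, pareto,
                       prob_matrix, current_time) {
        # Seconds elapsed since start_cb() was called
        elapsed <- as.numeric(difftime(current_time, started, units = "secs"))
        message("elapsed seconds: ", round(elapsed, 2))
        invisible(elapsed)
    }
    list(start = start_cb, end = end_cb)
}

# Manual check with made-up timestamps three seconds apart
timer <- make_timer()
t0 <- as.POSIXct("2021-05-23 17:24:27", tz = "UTC")
timer$start("", 0L, NULL, NULL, NULL, t0)
elapsed <- timer$end("", 6L, NULL, NULL, NULL, t0 + 3)
```

In a real run, timer$start and timer$end would presumably be assigned to the start_galgo_callback and end_galgo_callback hooks. Replacing end_galgo_callback also changes what galgo() returns.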

3 Example 2: Saving partial population pool using custom callback function

The following callback function saves the solutions obtained every two generations/iterations in a temporary directory. A file named after the generation number, with an .rda extension, is written to the directory returned by the tempdir() function.

my_save_pop_callback <-
    function(userdir = "",
            generation,
            pop_pool,
            pareto,
            prob_matrix,
            current_time) {
        directory <- paste0(tempdir(), "/")
        if (!dir.exists(directory)) {
            dir.create(directory, recursive = TRUE)
        }
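        filename <- paste0(directory, generation, ".rda")
        # Save only on even-numbered generations
        if (generation%%2 == 0){
            save(file = filename, pop_pool)
        }
        # Note: this message is emitted on every generation, including
        # those where no file was written
        message(paste("solution file saved in",filename))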
    }

As before, the galgo() function is invoked and the newly defined my_save_pop_callback() is assigned to the end_gen_callback hook point. As a result, the solutions of every second generation/iteration are saved to a file.

# Running galgo
GSgalgoR::galgo(
    generations = 6, 
    population = 15, 
    prob_matrix = expression, 
    OS = OS,
    start_galgo_callback = GSgalgoR::callback_default, 
    end_galgo_callback = GSgalgoR::callback_default,   
    report_callback = my_callback,# call `my_callback()` 
                                #  in the middle of each generation/iteration.
    start_gen_callback = GSgalgoR::callback_default,
    end_gen_callback = my_save_pop_callback # call `my_save_pop_callback()` 
                                            # at the end of each 
                                            #   generation/iteration
    ) 
#> Using CPU for computing pearson distance
#> solution file saved in /tmp/RtmpEPzWtn/1.rda
#> generation: 2 current_time: 2021-05-23 17:24:37
#> solution file saved in /tmp/RtmpEPzWtn/2.rda
#> solution file saved in /tmp/RtmpEPzWtn/3.rda
#> generation: 4 current_time: 2021-05-23 17:24:39
#> solution file saved in /tmp/RtmpEPzWtn/4.rda
#> solution file saved in /tmp/RtmpEPzWtn/5.rda
#> generation: 6 current_time: 2021-05-23 17:24:40
#> solution file saved in /tmp/RtmpEPzWtn/6.rda
#> NULL
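
A file written this way can be read back with load(), which restores the object under the name it was saved with (pop_pool). The sketch below is independent of galgo(): it uses a small made-up data.frame and a hypothetical generation number, only to illustrate the round trip.

```r
# Made-up stand-in for a population pool; the real columns will differ.
pop_pool <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
generation <- 2L  # hypothetical generation number

# Same naming scheme as the callback above: <tempdir>/<generation>.rda
filename <- file.path(tempdir(), paste0(generation, ".rda"))
save(pop_pool, file = filename)

# Load into a separate environment to avoid overwriting workspace objects
restored_env <- new.env()
loaded_names <- load(filename, envir = restored_env)
restored <- restored_env$pop_pool

print(loaded_names)
print(all.equal(restored, pop_pool))
```

Loading into a dedicated environment is a design choice: load() otherwise silently replaces any object of the same name in the calling environment.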

4 Callbacks implemented in GSgalgoR

By default, GSgalgoR implements four callback functions:

-callback_default(): a simple callback that does nothing at all. It is used only to set the default behavior of some of the hook points inside galgo().

-callback_base_report(): a report callback that prints basic information about the solutions provided by galgo(), such as fitness and crowding distance.

-callback_no_report(): a report callback that informs the user that galgo is running. No other information is shown.

-callback_base_return_pop(): a callback function that builds and returns the galgo.Obj object.

In the default definition of the galgo() function, the hook points are defined as follows:

-start_galgo_callback = callback_default

-end_galgo_callback = callback_base_return_pop

-report_callback = callback_base_report

-start_gen_callback = callback_default

-end_gen_callback = callback_default

Notice that the callback mechanism makes it possible to modify even the return value of the galgo() function. The default callback_base_return_pop() returns a galgo.Obj object; however, it would be simple to change that behavior by assigning something like my_save_pop_callback() instead, in which case galgo() will no longer return the galgo.Obj object.

# Running galgo
GSgalgoR::galgo(
    generations = 6, 
    population = 15, 
    prob_matrix = expression, 
    OS = OS,
    start_galgo_callback = GSgalgoR::callback_default, 
    end_galgo_callback = my_save_pop_callback,
    report_callback = my_callback,  # call `my_callback()` 
                                    # in the middle of each generation/iteration
    start_gen_callback = GSgalgoR::callback_default,
    end_gen_callback = GSgalgoR::callback_default
    ) 
#> Using CPU for computing pearson distance
#> generation: 2 current_time: 2021-05-23 17:24:47
#> generation: 4 current_time: 2021-05-23 17:24:49
#> generation: 6 current_time: 2021-05-23 17:24:51
#> solution file saved in /tmp/RtmpEPzWtn/6.rda

To preserve the return behavior of the galgo() function, callback_base_return_pop() should be called inside the custom callback. An example of such a situation is shown below:


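another_callback <-
    function(userdir = "",
            generation,
            pop_pool,
            pareto,
            prob_matrix,
            current_time) {
        # code starts here

        # code ends here
        # Delegate to the default callback so galgo() still returns a galgo.Obj.
        # Arguments are passed by name so that none is shifted out of position.
        callback_base_return_pop(userdir = userdir,
                                generation = generation,
                                pop_pool = pop_pool,
                                pareto = pareto,
                                prob_matrix = prob_matrix,
                                current_time = current_time)
    }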
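
The same delegation pattern can be written once as a wrapper that adds a side effect to any callback while passing its return value through. This is a sketch: with_side_effect() and fake_return_pop() are hypothetical names, and the latter merely stands in for callback_base_return_pop() so that the example runs without GSgalgoR.

```r
# Hypothetical wrapper: runs `side_effect` first, then returns whatever
# the wrapped callback returns.
with_side_effect <- function(callback, side_effect) {
    function(userdir = "", generation, pop_pool, pareto,
             prob_matrix, current_time) {
        side_effect(generation)
        callback(userdir = userdir, generation = generation,
                 pop_pool = pop_pool, pareto = pareto,
                 prob_matrix = prob_matrix, current_time = current_time)
    }
}

# Stand-in for callback_base_return_pop(); the real return value differs.
fake_return_pop <- function(userdir = "", generation, pop_pool, pareto,
                            prob_matrix, current_time) {
    list(generation = generation, solutions = pop_pool)
}

wrapped <- with_side_effect(
    fake_return_pop,
    function(generation) message("finished at generation ", generation))

# Manual call with made-up arguments
result <- wrapped("", 6L, data.frame(id = 1:2), NULL, NULL, Sys.time())
str(result)
```

Under these assumptions, a wrapped callback could be assigned to end_galgo_callback in the same way as another_callback().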

5 Session info

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] survminer_0.4.9             ggpubr_0.4.0               
#>  [3] ggplot2_3.3.3               genefu_2.24.2              
#>  [5] AIMS_1.24.0                 e1071_1.7-7                
#>  [7] iC10_1.5                    iC10TrainingData_1.3.1     
#>  [9] impute_1.66.0               pamr_1.56.1                
#> [11] cluster_2.1.2               biomaRt_2.48.0             
#> [13] survcomp_1.42.0             prodlim_2019.11.13         
#> [15] survival_3.2-11             Biobase_2.52.0             
#> [17] BiocGenerics_0.38.0         GSgalgoR_1.2.1             
#> [19] breastCancerUPP_1.30.0      breastCancerTRANSBIG_1.30.0
#> [21] BiocStyle_2.20.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] readxl_1.3.1           backports_1.2.1        BiocFileCache_2.0.0   
#>   [4] splines_4.1.0          GenomeInfoDb_1.28.0    digest_0.6.27         
#>   [7] SuppDists_1.1-9.5      foreach_1.5.1          htmltools_0.5.1.1     
#>  [10] magick_2.7.2           fansi_0.4.2            magrittr_2.0.1        
#>  [13] memoise_2.0.0          doParallel_1.0.16      openxlsx_4.2.3        
#>  [16] limma_3.48.0           Biostrings_2.60.0      prettyunits_1.1.1     
#>  [19] colorspace_2.0-1       blob_1.2.1             rappdirs_0.3.3        
#>  [22] haven_2.4.1            xfun_0.23              dplyr_1.0.6           
#>  [25] crayon_1.4.1           RCurl_1.98-1.3         jsonlite_1.7.2        
#>  [28] zoo_1.8-9              iterators_1.0.13       glue_1.4.2            
#>  [31] gtable_0.3.0           zlibbioc_1.38.0        XVector_0.32.0        
#>  [34] car_3.0-10             abind_1.4-5            scales_1.1.1          
#>  [37] DBI_1.1.1              rstatix_0.7.0          Rcpp_1.0.6            
#>  [40] gridtext_0.1.4         xtable_1.8-4           progress_1.2.2        
#>  [43] foreign_0.8-81         bit_4.0.4              proxy_0.4-25          
#>  [46] mclust_5.4.7           km.ci_0.5-2            stats4_4.1.0          
#>  [49] lava_1.6.9             httr_1.4.2             ellipsis_0.3.2        
#>  [52] pkgconfig_2.0.3        XML_3.99-0.6           farver_2.1.0          
#>  [55] sass_0.4.0             dbplyr_2.1.1           utf8_1.2.1            
#>  [58] tidyselect_1.1.1       labeling_0.4.2         rlang_0.4.11          
#>  [61] AnnotationDbi_1.54.0   munsell_0.5.0          cellranger_1.1.0      
#>  [64] tools_4.1.0            cachem_1.0.5           cli_2.5.0             
#>  [67] generics_0.1.0         RSQLite_2.2.7          broom_0.7.6           
#>  [70] evaluate_0.14          stringr_1.4.0          fastmap_1.1.0         
#>  [73] yaml_2.2.1             bootstrap_2019.6       knitr_1.33            
#>  [76] bit64_4.0.5            zip_2.1.1              survMisc_0.5.5        
#>  [79] purrr_0.3.4            KEGGREST_1.32.0        xml2_1.3.2            
#>  [82] rstudioapi_0.13        compiler_4.1.0         filelock_1.0.2        
#>  [85] curl_4.3.1             png_0.1-7              ggsignif_0.6.1        
#>  [88] tibble_3.1.2           bslib_0.2.5.1          stringi_1.6.2         
#>  [91] highr_0.9              forcats_0.5.1          lattice_0.20-44       
#>  [94] Matrix_1.3-3           markdown_1.1           survivalROC_1.0.3     
#>  [97] KMsurv_0.1-5           vctrs_0.3.8            pillar_1.6.1          
#> [100] lifecycle_1.0.0        BiocManager_1.30.15    jquerylib_0.1.4       
#> [103] data.table_1.14.0      bitops_1.0-7           R6_2.5.0              
#> [106] bookdown_0.22          KernSmooth_2.23-20     gridExtra_2.3         
#> [109] rio_0.5.26             IRanges_2.26.0         codetools_0.2-18      
#> [112] assertthat_0.2.1       withr_2.4.2            S4Vectors_0.30.0      
#> [115] GenomeInfoDbData_1.2.6 ggtext_0.1.1           hms_1.1.0             
#> [118] grid_4.1.0             nsga2R_1.0             tidyr_1.1.3           
#> [121] class_7.3-19           rmarkdown_2.8          carData_3.0-4         
#> [124] rmeta_3.0