SimFFPE-package {SimFFPE}    R Documentation

NGS Read Simulator for FFPE Tissue

Description

The NGS (Next-Generation Sequencing) reads from FFPE (Formalin-Fixed Paraffin-Embedded) samples contain numerous artifact chimeric reads (ACRs), which can lead to false-positive structural variant calls. These ACRs arise when two single-stranded DNA (ss-DNA) fragments combine through short reverse complementary regions (SRCRs). This package simulates these artifact chimeric reads, as well as normal reads, for FFPE samples on the whole genome, several chromosomes, or large regions.

Details

Package: SimFFPE
Type: Package
Title: NGS Read Simulator for FFPE Tissue
Version: 1.4.0
Authors@R: person("Lanying", "Wei", email="lanying.wei@uni-muenster.de", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4281-8017"))
Description: The NGS (Next-Generation Sequencing) reads from FFPE (Formalin-Fixed Paraffin-Embedded) samples contain numerous artifact chimeric reads (ACRs), which can lead to false positive structural variant calls. These ACRs are derived from the combination of two single-stranded DNA (ss-DNA) fragments with short reverse complementary regions (SRCRs). This package simulates these artifact chimeric reads as well as normal reads for FFPE samples on the whole genome / several chromosomes / large regions.
License: LGPL-3
Encoding: UTF-8
Depends: Biostrings
Imports: dplyr, foreach, doParallel, truncnorm, GenomicRanges, IRanges, Rsamtools, parallel, graphics, stats, utils, methods
Suggests: BiocStyle
biocViews: Sequencing, Alignment, MultipleComparison, SequenceMatching, DataImport
git_url: https://git.bioconductor.org/packages/SimFFPE
git_branch: RELEASE_3_13
git_last_commit: 13e0b8e
git_last_commit_date: 2021-05-19
Date/Publication: 2021-05-19
Author: Lanying Wei [aut, cre] (<https://orcid.org/0000-0002-4281-8017>)
Maintainer: Lanying Wei <lanying.wei@uni-muenster.de>

The NGS (Next-Generation Sequencing) reads from FFPE (Formalin-Fixed Paraffin-Embedded) samples contain numerous artifact chimeric reads (ACRs), which can lead to false-positive structural variant calls. These ACRs arise when two single-stranded DNA (ss-DNA) fragments combine through short reverse complementary regions (SRCRs). This package simulates these artifact chimeric reads, as well as normal reads, for FFPE samples.

To simplify the simulation, the genome is divided into small windows, and SRCRs are searched for within the same window (adjacent ss-DNA combination) or between different windows (distant ss-DNA combination). For adjacent ss-DNA combination events, the original genomic distance between the two combined SRCRs and their strands are also simulated based on real data.

The simulation can cover the whole genome, several chromosomes, large regions, the whole exome, or targeted regions. It also supports enzymatic or random fragmentation and paired-end or single-end sequencing. Fine-tuning is possible by adjusting the parameters, and multi-threading is supported. See the package vignette for guidance on fine-tuning.

Index of help topics:

SimFFPE-package         NGS Read Simulator for FFPE Tissue
calcPhredScoreProfile   Estimate Phred score profile for FFPE read
                        simulation
readSimFFPE             Simulate normal and artifact chimeric reads in
                        NGS data of FFPE samples for whole genome /
                        several chromosomes / large regions
targetReadSimFFPE       Simulate normal and artifact chimeric reads in
                        NGS data of FFPE samples for exonic / targeted
                        regions
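
The idea of two ss-DNA fragments sharing a short reverse complementary region can be illustrated with a small base-R sketch. This is not the package's search routine: the helpers revComp, kmers and findSRCR, the fixed k-mer length, and the toy fragments are all assumptions made for illustration.

```r
# Illustrative only: base-R helpers, not functions from SimFFPE.

# Reverse complement of a DNA string (upper-case A/C/G/T only).
revComp <- function(x) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# All k-mers of a string, named by their 1-based start position.
kmers <- function(x, k) {
    starts <- seq_len(max(0, nchar(x) - k + 1))
    stats::setNames(substring(x, starts, starts + k - 1), starts)
}

# k-mers of fragA whose reverse complement occurs in fragB
# (only the first occurrence in fragB is reported).
findSRCR <- function(fragA, fragB, k = 5) {
    a <- kmers(fragA, k)
    b <- kmers(fragB, k)
    rcA <- vapply(a, revComp, character(1))
    hit <- match(rcA, b)
    keep <- !is.na(hit)
    data.frame(startA = as.integer(names(a))[keep],
               startB = as.integer(names(b))[hit[keep]],
               seqA = unname(a[keep]),
               stringsAsFactors = FALSE)
}

# Toy fragments: "GACCAT" in fragA is the reverse complement of
# "ATGGTC" in fragB.
fragA <- "TTTTGACCATTTT"
fragB <- "CCCCATGGTCCCC"
srcr <- findSRCR(fragA, fragB, k = 5)
```

Each row of srcr pairs a start position in fragA with the start position of its reverse complement in fragB. Applied to two fragments from the same window, this would correspond loosely to the adjacent case; fragments from different windows would correspond to the distant case.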


Three functions are available for NGS read simulation of FFPE samples:

1. calcPhredScoreProfile: Calculate a positional Phred score profile from a BAM file for read simulation.

2. readSimFFPE: Simulate normal and artifact chimeric reads on the whole genome, several chromosomes, or large regions.

3. targetReadSimFFPE: Simulate normal and artifact chimeric reads in exonic / targeted regions.
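
To give a feel for what a positional Phred score profile is, the following base-R sketch tabulates, for each read position, the proportion of reads carrying each Phred score. It is not calcPhredScoreProfile: the helper phredProfile is hypothetical, it takes Phred+33 quality strings of equal length rather than a BAM file, and the matrix layout (positions in rows, scores in columns) is an assumption.

```r
# Hypothetical helper; not SimFFPE's calcPhredScoreProfile.
# quals: character vector of equal-length Phred+33 quality strings.
phredProfile <- function(quals) {
    # Decode each string to integer Phred scores (ASCII code minus 33)
    # and stack the reads as rows: a reads x positions matrix.
    scores <- do.call(rbind, lapply(quals, function(q) utf8ToInt(q) - 33L))
    scoreLevels <- sort(unique(as.vector(scores)))
    prof <- matrix(0, nrow = ncol(scores), ncol = length(scoreLevels),
                   dimnames = list(NULL, as.character(scoreLevels)))
    # For each read position, the proportion of reads with each score.
    for (pos in seq_len(ncol(scores))) {
        counts <- tabulate(match(scores[, pos], scoreLevels),
                           nbins = length(scoreLevels))
        prof[pos, ] <- counts / nrow(scores)
    }
    prof
}

# Four toy reads of length 5 ('I' = 40, '5' = 20, '#' = 2).
quals <- c("IIII5", "III5#", "II55#", "IIII#")
prof <- phredProfile(quals)
```

Each row of prof sums to 1, so a simulator could sample a quality score per position from the corresponding row.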

Author(s)

Lanying Wei

Maintainer: Lanying Wei <lanying.wei@uni-muenster.de>

See Also

calcPhredScoreProfile, readSimFFPE, targetReadSimFFPE

Examples


## Load an example Phred score profile; the first line of the file holds
## the tab-separated column names

PhredScoreProfilePath <- system.file("extdata", "PhredScoreProfile2.txt",
                                     package = "SimFFPE")
PhredScoreProfile <- as.matrix(read.table(PhredScoreProfilePath, skip = 1))
colnames(PhredScoreProfile) <-
    strsplit(readLines(PhredScoreProfilePath)[1], "\t")[[1]]

## Load the example reference (readDNAStringSet comes from Biostrings,
## which SimFFPE depends on)

referencePath <- system.file("extdata", "example.fasta", package = "SimFFPE")
reference <- readDNAStringSet(referencePath)

## Simulate reads of the first three sequences of the reference genome

sourceSeq <- reference[1:3]
outFile1 <- file.path(tempdir(), "sim1")
readSimFFPE(sourceSeq, referencePath, PhredScoreProfile, outFile1,
            coverage = 80, enzymeCut = TRUE, threads = 2)

## Simulate reads for targeted regions

## Estimate the Phred score profile from a BAM file, restricted to the
## regions listed in regionsBam.txt
bamFilePath <- system.file("extdata", "example.bam", package = "SimFFPE")
regionPath <- system.file("extdata", "regionsBam.txt", package = "SimFFPE")
regions <- read.table(regionPath)
PhredScoreProfile <- calcPhredScoreProfile(bamFilePath, targetRegions = regions)

## Read the regions to simulate
regionPath <- system.file("extdata", "regionsSim.txt", package = "SimFFPE")
targetRegions <- read.table(regionPath)

outFile <- file.path(tempdir(), "sim2")
targetReadSimFFPE(referencePath, PhredScoreProfile, targetRegions, outFile,
                  coverage = 80, readLen = 100, meanInsertLen = 180,
                  sdInsertLen = 50, enzymeCut = FALSE)
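
The parameters coverage, readLen, meanInsertLen and sdInsertLen used above can be related to each other with a little arithmetic. The sketch below is not taken from the package: pairsForCoverage and sampleInsertLen are hypothetical helpers, the 10,000-base region is made up, and the lower bound on insert length and the use of a lower-truncated normal distribution are assumptions.

```r
# Hypothetical helpers for illustration; none of these are SimFFPE functions.

# Read pairs needed so that paired-end reads of length readLen give the
# requested mean coverage (counted per sequenced base) over regionLen bases.
pairsForCoverage <- function(regionLen, coverage, readLen) {
    ceiling(coverage * regionLen / (2 * readLen))
}

# Draw n insert lengths from a normal distribution, discarding values
# below `lower` (simple rejection sampling for a lower-truncated normal).
sampleInsertLen <- function(n, meanLen, sdLen, lower) {
    out <- numeric(0)
    while (length(out) < n) {
        draw <- round(rnorm(n, mean = meanLen, sd = sdLen))
        out <- c(out, draw[draw >= lower])
    }
    out[seq_len(n)]
}

set.seed(1)
nPairs <- pairsForCoverage(regionLen = 10000, coverage = 80, readLen = 100)
insertLen <- sampleInsertLen(nPairs, meanLen = 180, sdLen = 50, lower = 100)
```

With these toy numbers, 80x coverage of 10,000 bases at 2 x 100 bases per pair needs 4000 read pairs. Because the mean insert length (180) is shorter than two read lengths, many pairs would overlap, which is one reason the insert length parameters are worth tuning.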
