Sequencerr

Last updated: 4 years ago (view history), Time to read: 6 mins

Measuring and suppressing sequencer errors in next-generation sequencing data

Authors Eric M Davis, Yu Sun, Yanling Liu, Pandurang Kolekar, Ying Shao, Karol Szlachta, Heather L Mulder, Dongren Ren, Stephen V Rice, Zhaoming Wang, Joy Nakitandwe, Alex Gout, Bridget Shaner, Salina Hall, Leslie L Robison, Stanley Pounds, Jefferey Klco, John Easton, Xiaotu Ma*
Publication Davis, E.M. et al. SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data. Genome Biol 22(1):37 (2021). doi: 10.1186/s13059-020-02254-2
Technical Support Contact Us

Overview

There is currently no method to precisely measure the errors that occur in the sequencing instrument, which is critical for next generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. We propose a novel computational method, SequencErr, to address this challenge by measuring base concordance in overlapping region between forward and reverse reads. Analysis of 3,777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates >100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and >90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tiles. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and that removal of outlier error-prone tiles improves sequencing accuracy. Our study revealed novel insights into the nature DNA sequencing errors incurred in sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets

Inputs

Name Type Description Example
BAM file Input file Binary version of the SAM file format (*.bam) Sample.bam
BAM index file Input file Index file for the BAM file (*.bai) Sample.bam.bai

Outputs

Name Format Description
PairError file .txt Base concordance/discordance counts
Counts file .txt Base call frequencies for each genomic coordinate

Details of SequencErr Input Files and Parameters

  1. BAM file name [Required]

    A BAM file to process.

    • This app currently only supports DNA sequencing.
    • The bam file should be generated by “bwa aln”. It works on “bwa MEM” but may take a lot more resources.
    Notes on preparing the BAM file
    • Read names must have all the 7 fields as described below,
    • [instrument]:[run number]:[flowcell ID]:[lane]:[tile]:[x-pos]:[y-pos]
    • Example: A041:30:HHTYVDSXX:1:2242:28366:18897
    • BAM format details here
    • See this page for details of the fields in readname
  2. BAM index file name [Required]

    The BAM index for your BAM file

  3. sample id [Required]

    A unique name or identifier for the sample

  4. trimming length [optional]

    Number of bases to trim off the 5’ and 3’ of the read. Default: 5

  5. hard quality cutoff [optional]

    A hard threshold for discarding reads. If the fraction of bases with quality scores falling below this value exceeds fcut, the read will be filtered. Default: 20

  6. don’t report base counts [optional]

    Set this to prevent large count files from being generated Default: true

Running the Analysis

note

This tool is intended free-of-charge for non-profit usages. Please contact Dr. Xiaotu Ma for for-profit usages and modifications

Please refer to the following steps to learn how to launch the workflow, hook up input files, adjust parameters, run analysis and inspect output files.

Log in and Launch the SequencErr

SequencErr application can be accessed from https://platform.stjude.cloud/workflows/sequencerr

Sequencerr log in 1

Please refer to the following instructions.

Sequencerr log in 2

Sequencerr log in 3

Sequencerr launch 4

Choose Input BAM and Index Files

Users can upload and download data files with command line interactions as described here

Sequencerr BAM 5

Sequencerr BAM 6

Sequencerr BAM 7

Sequencerr BAM 8

Follow the similar steps to choose and select the corresponding BAM index file

Sequencerr BAM Index 9

Sequencerr BAM Index 10

Provide Input Parameters

Sequencerr Parameters 11

Sequencerr Parameters 12

Run the Analysis

Sequencerr Run Analysis 13

Sequencerr Run Status 14

Locate Output File(s) after Completion of the Analysis

Sequencerr Output 15

Interpreting results

  • *.pairError.txt file: A text file containing Instrument, Flowcell, Lane and Tile level base concordance/discordance counts

pairErrorOut

  • *.counts.txt file [optional] : A text file containing the base call frequencies and coverage for each genomic coordinate at specified base quality cut-off

testCountOut

Similar Topics

Running our Workflows
Working with our Data Overview
Upload/Download Data (local)