DNA Sequencing and Analysis Methods

What Are The Different DNA Sequencing And Analysis Methods?

We have come a long way since the discovery of the double-helix structure of DNA. We have read nature’s blueprint: the complete genetic code of a human being. We are now investigating variations in that code to understand how different types of mutations and modifications within the human genome influence phenotypic diversity, disease progression, and treatment.

DNA sequencing methods have evolved significantly since Frederick Sanger determined the complete amino acid sequence of insulin. The Sanger method was inspired by the initial efforts of R. Padmanabhan, Wu and colleagues, whose work paved the way for primer-based extension. Sanger and his team at the MRC Center, Cambridge, then introduced DNA sequencing by the chain-termination method in 1977. Whichever DNA sequencing method researchers have adopted over the past decades, one working principle has remained constant: the process aims to determine the order of the nucleotides in the DNA of a given sample.

What Are The Classical DNA Sequencing Techniques?

Here’s a brief look at the different DNA sequencing methods that have defined the frontiers of genomics and metagenomics over the last four decades.

The classical sequencing methods

1. Maxam-Gilbert Sequencing

1977 was a defining year for genomics. Allan Maxam and Walter Gilbert are credited with publishing the first standardized DNA sequencing method, which relies on chemical modification of DNA. The method uses radioactive labeling at the 5′-end of the DNA molecule. Chemical modification of the bases results in cleavage at specific nucleotides. By varying the concentration of the modifying agent, researchers can control the cleavage and generate a family of differently sized DNA fragments. After the fragments are separated on a gel, visualization is easy: the gel is placed on an X-ray film, which produces dark bands corresponding to the radiolabeled DNA. The analysis is relatively simple as long as the fragments are small and non-repetitive.

2. Sanger Sequencing

The chain-termination method was also introduced in 1977, by Frederick Sanger et al. It remained the most popular DNA sequencing technique until other, more advanced methods were standardized. Sanger sequencing uses single-stranded DNA as the template, a DNA primer, deoxynucleoside triphosphates (dNTPs), DNA polymerase, and di-deoxynucleoside triphosphates (ddNTPs) as chain terminators. Because it relied less on toxic chemicals and radioactivity, the Sanger method became more favorable than the Maxam-Gilbert method. After controlled DNA extension, the strands are heat-denatured and separated by size using gel electrophoresis.
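The chain-termination principle can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the newly synthesized strand is given directly, every position produces exactly one terminated fragment, and the gel resolves fragments perfectly by length. The function names `chain_termination_fragments` and `read_gel` are hypothetical and do not come from any sequencing library.

```python
def chain_termination_fragments(synthesized_strand):
    """Return {base: [fragment lengths]} for the four ddNTP reactions.

    Toy assumption: a ddNTP terminates extension once at every position,
    so the reaction for base X holds one fragment per occurrence of X.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(position)  # fragment length = position of the terminator
    return lanes


def read_gel(lanes):
    """Rebuild the sequence by reading fragments from shortest to longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


strand = "GATTACA"
lanes = chain_termination_fragments(strand)
recovered = read_gel(lanes)
print(lanes)
print(recovered)
```

Reading the lanes from the shortest fragment to the longest recovers the original order of bases, which is the idea behind reading a sequencing gel from bottom to top.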

The Sanger method of DNA sequencing and analysis is straightforward and fast for short DNA sequences. Laboratories combine the standard DNA dye-terminator sequencing with high-throughput automated DNA sequence analyzers for the quick determination of DNA sequence.

What Are The Challenges Of Classical DNA Sequencing And Analysis Techniques?

  • In the Sanger method, the initial lag period for primer binding leads to poor sequence quality in the first 15 to 40 bases.
  • The efficiency of the process decreases after 700 to 900 bases.
  • The power of resolution is insufficient for larger DNA sequences.
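In practice, the limits listed above mean that only the middle of a Sanger read is usually trusted. The sketch below keeps a fixed window of each read. The cutoffs of 40 and 700 are assumptions taken from the ranges above, and the function name `trim_sanger_read` is hypothetical. Real pipelines typically trim on per-base quality scores rather than fixed positions.

```python
def trim_sanger_read(read, skip_start=40, max_end=700):
    """Keep only the bases between skip_start and max_end.

    Positions are 0-based and the end is exclusive, so the defaults drop
    the first 40 bases and everything from base 701 onward.
    """
    return read[skip_start:max_end]


# Hypothetical read: 40 unreliable leading bases followed by 800 called bases.
read = "N" * 40 + "ACGT" * 200
trimmed = trim_sanger_read(read)
print(len(read), len(trimmed))
```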

What Is Next-Generation Sequencing?

Next-generation sequencing (NGS) does not refer to a single method of DNA sequencing. Several standardized methods of DNA sequencing and analysis fall under the NGS category, and almost all of them share a few features. They are:

  1. Fast
  2. Low-cost
  3. Capable of sequencing medium to large DNA fragments
  4. Highly reliable
  5. Capable of running massively parallel sequencing reactions

All NGS methods are high-throughput techniques.

What Are The High-Throughput DNA Sequencing And Analysis Technologies?

454 Sequencing

454 sequencing is a large-scale pyrosequencing technique that can sequence around 400 to 600 megabases within a 10-hour run. Its effectiveness is limited by the size of the individual sequence reads, which makes genome assembly a challenge. Citing high costs and a lack of demand, Roche announced in 2013 that it would discontinue 454 pyrosequencing.

Illumina: Solexa Sequencing By Synthesis

The Illumina sequencing technique uses reversible terminators for the sequencing-by-synthesis process. Illumina currently offers several proprietary sequencer systems for high-throughput DNA sequencing. HiSeq is one of the most widely used sequencing platforms from Illumina. It offers more than 3 billion reads per flow cell within 72 hours. The MiSeq is the cost-effective alternative that delivers 25 million reads with a read length of 300 base pairs from each end.
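The MiSeq figures above translate into a total yield with simple arithmetic. The sketch below assumes that the 25 million reads are fragments sequenced from both ends, and it uses a hypothetical 5-megabase genome to show how yield relates to average coverage. The function names are illustrative only.

```python
def total_yield_bases(fragments, read_length, paired=True):
    """Total sequenced bases; paired-end runs read each fragment twice."""
    return fragments * read_length * (2 if paired else 1)


def mean_coverage(yield_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return yield_bases / genome_size


# 25 million fragments, 300 bp from each end (figures quoted above).
miseq_yield = total_yield_bases(25_000_000, 300)

# Hypothetical 5 Mb genome, chosen only for illustration.
coverage = mean_coverage(miseq_yield, 5_000_000)
print(miseq_yield, coverage)
```

Under these assumptions the run yields 15 gigabases, far more than a small genome needs, which is why many samples are often pooled into one run.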

Ion Torrent Semiconductor Sequencing

Ion Torrent Systems Inc. developed a DNA sequencing and analysis system that combines sequencing chemistry with a semiconductor-based detection system. The method relies on the detection of hydrogen ions released during the DNA polymerization reaction. A hypersensitive ion sensor detects the hydrogen ions, and the intensity of the electronic signal is proportional to the number of identical nucleotides incorporated in a homopolymer run.
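The proportionality between signal and homopolymer length can be sketched with an idealized model. The code below assumes noise-free signals and a simple repeating flow order of `TACG`. Both are simplifications, and the function names `simulate_flows` and `call_bases` are hypothetical rather than part of any vendor software.

```python
def simulate_flows(sequence, flow_order="TACG"):
    """Return one idealized signal per nucleotide flow.

    Each signal equals the number of bases incorporated in that flow,
    so a homopolymer run of length 3 gives a signal of 3.0.
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    position = 0
    flow = 0
    while position < len(sequence):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(sequence) and sequence[position] == base:
            run += 1
            position += 1
        signals.append(float(run))
        flow += 1
    return signals


def call_bases(signals, flow_order="TACG"):
    """Convert flow signals back to bases by rounding each signal."""
    called = []
    for flow, signal in enumerate(signals):
        base = flow_order[flow % len(flow_order)]
        called.append(base * round(signal))
    return "".join(called)


signals = simulate_flows("TTAGGGC")
print(signals)
print(call_bases(signals))
```

With real, noisy signals the rounding step becomes less reliable as runs get longer, which is why long homopolymers are a common source of errors in this kind of chemistry.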

Nanopore Sequencing

The nanopore method is relatively new. It depends on the electrical signals detected as each nucleotide passes through a pore, such as a cyclodextrin-bound alpha-hemolysin pore. There are currently two main types: protein and solid-state nanopore sequencing. Each pore has a detection region that can recognize the charge difference between the four nucleotide bases.

What Is DNA Sequencing Data Analysis?

The analysis of data recorded from DNA sequencing helps the user transform raw data into useful information. DNA sequencing data analysis typically involves four steps:

  1. Trimming of the overlapping sequence data
  2. Multiple alignments of obtained DNA sequences
  3. Checking of consistency between chromatogram peak data and reading text
  4. Correction of any misreads due to software limitation
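One reading of the first step is that overlapping reads are joined and the duplicated overlap is removed. The sketch below does this for two reads using exact matching only. The function name `merge_overlapping_reads` and the `min_overlap` parameter are hypothetical, and real assemblers tolerate mismatches and use quality scores.

```python
def merge_overlapping_reads(left, right, min_overlap=3):
    """Join two reads where a suffix of `left` equals a prefix of `right`.

    Tries the longest possible overlap first and returns None when no
    exact overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]  # keep the overlap only once
    return None


merged = merge_overlapping_reads("ACGTTGCA", "TGCAGGTC")
print(merged)
print(merge_overlapping_reads("AAAA", "CCCC"))
```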

The analysis of NGS data is particularly complicated because of the volume of raw data generated in each complete cycle. The results depend on the adaptor addition process as well as on variation in DNA library construction. Simple quality control checks run by commercially available DNA sequencing and analysis tools are essential for reducing errors in the final results. Always opt for NGS data analysis software that generates detailed QC reports after each step.
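A basic quality control check can be sketched from the per-base quality string that accompanies each read in a FASTQ file. The code below assumes the common Phred+33 encoding, where each character's ASCII code minus 33 gives the quality score. The threshold of 20 and the function names are assumptions chosen for illustration, not settings from any particular tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(character) - offset for character in quality_string]


def mean_quality(quality_string, offset=33):
    """Average Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def passes_qc(quality_string, min_mean=20, offset=33):
    """True when the read's mean quality reaches the assumed threshold."""
    return mean_quality(quality_string, offset) >= min_mean


for quality in ["IIII", "I!I!", "!!!!"]:
    print(quality, mean_quality(quality), passes_qc(quality))
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100 for a base call, and a score of 30 to 1 in 1000.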

Optimal analysis of NGS data should yield information on the target DNA sequence or gene and aid in the discovery of new genes. It can also help in the prediction of advanced structures. DNA sequence data analytics now plays a vital role in the development of personalized medicine and targeted therapy based on single nucleotide polymorphism data and other forms of gene polymorphism.