Tools for analysis of DNA sequence

“There is a long history of how DNA sequencing can bring certainty to people’s lives.” ~ Craig Venter

Going back to basics, we all know that deoxyribonucleic acid (DNA) is the molecule that carries an organism’s genetic information. (In some cases, such as certain viruses, ribonucleic acid (RNA) plays this role instead.) The fundamental units of DNA molecules are nucleotides, represented by the letters A, C, G, and T. So how do we identify the precise order of these nucleotides in a DNA molecule? This is where DNA sequencing comes in.

WHAT IS DNA SEQUENCING?

Now, this is simple. The process of identifying the exact order of nucleotides in a DNA molecule is known as sequencing. It determines the order of the four bases in a strand of DNA: adenine (A), guanine (G), cytosine (C), and thymine (T). Individual genes, whole chromosomes, and entire genomes of organisms can be sequenced this way. DNA sequencing has also become the most practical route to RNA and protein sequences: RNA is usually converted to complementary DNA (cDNA) before sequencing, and protein sequences are often inferred from the DNA that encodes them. Currently available sequencing methods can generate millions of DNA reads in a reasonable amount of time and at a moderate cost. Consider this: the cost of sequencing a human genome has dropped to roughly $1,000 or less and the job can be completed in a few days, whereas the first human genome sequence cost about $2.7 billion and took more than a decade to finish.

BACKGROUND AND HISTORY OF SEQUENCING

In earlier years, researchers used manual methods to analyze DNA sequencing data, which was very time-consuming and labour-intensive. Fortunately, with advancements in technology, large datasets can now be stored and analyzed efficiently. Frederick Sanger, a British biochemist, laid the groundwork with protein sequencing. By 1955, he had completed the sequence of all the amino acids in insulin. His research proved that proteins are chemical entities with a defined sequence rather than a random mixture. In 1977, Sanger and his colleagues introduced the technique now called Sanger sequencing, which reads DNA by generating fragments that terminate at specific bases. This was a remarkable achievement in the history of genomics. For around 40 years, it was the most extensively used sequencing method.

METHODS OF DNA SEQUENCING

Now you are probably wondering how DNA can be sequenced. There are three major steps in DNA sequencing: cloning, sequencing, and analyzing. The traditional methods of DNA sequencing are the chemical sequencing method (Maxam-Gilbert sequencing) and the chain termination method (Sanger sequencing). In the chemical sequencing method, a radioactive label is attached to the 5’ end of the DNA, and the strand is then chemically cleaved at specific nucleotides. The resulting fragments are separated by size, and autoradiography is used to read the sequence.

On the other hand, in the Sanger sequencing technique, modified nucleotides called dideoxynucleoside triphosphates (ddNTPs) are added to the reaction. DNA polymerase I cannot differentiate between the usual deoxynucleoside triphosphates (dNTPs) and ddNTPs, but a strand that incorporates a ddNTP lacks the 3’-OH group required to form a phosphodiester bond with the next nucleotide, so elongation stops. By tagging the ddNTPs, we can read off the DNA sequence from the terminated fragments. But here’s the problem: this method is time-consuming and poorly suited to sequencing large fragments. After decades of its dominance, newer approaches, including the shotgun strategy and bridge PCR, were developed and have been widely adopted.
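The chain-termination idea can be mimicked in a few lines of Python. This is only a toy sketch, not a model of the real chemistry: it assumes every possible terminated fragment is produced exactly once, and the function names `sanger_fragments` and `read_sequence` are made up for illustration.

```python
def sanger_fragments(new_strand):
    """Return {base: [fragment lengths]} for the four ddNTP reactions.

    A fragment of length n ends with a ddNTP at position n of the newly
    synthesized strand, so each reaction yields one fragment length per
    occurrence of its base.
    """
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        reactions[base].append(position)
    return reactions


def read_sequence(reactions):
    """Read the sequence by ordering labelled fragments from shortest to longest."""
    labelled = [(length, base)
                for base, lengths in reactions.items()
                for length in lengths]
    return "".join(base for _, base in sorted(labelled))


synthesized = "GATTACA"  # illustrative strand, not real data
reactions = sanger_fragments(synthesized)
recovered = read_sequence(reactions)
print(reactions)   # fragment lengths ending in each labelled base
print(recovered)   # GATTACA
```

Sorting the fragments by length plays the role of the size separation step: the label on each successive fragment gives the next base.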

Next-generation sequencing (NGS), often known as high-throughput sequencing, is a broad term that covers a variety of modern sequencing techniques, including:

  • Illumina (Solexa) sequencing
  • Roche 454 sequencing
  • Ion Torrent Proton / PGM sequencing
  • SOLiD sequencing

DATA ANALYSIS OF DNA SEQUENCING

The four steps of DNA sequencing data analysis are as follows:

  • Overlapping sequences are trimmed.
  • Reads are aligned to the template sequence, often several times.
  • The read text is checked for consistency against the chromatogram peak data.
  • Errors made by the software are reviewed and corrected.
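The list above does not say how trimming is done. One common form, assumed here purely for illustration, is to cut low-quality bases from both ends of a read using per-base quality scores. The function name `trim_low_quality_ends` and the threshold of 20 are assumptions, not values from any particular tool.

```python
def trim_low_quality_ends(bases, qualities, min_quality=20):
    """Trim bases from both ends until a base meets min_quality.

    min_quality=20 is an illustrative threshold, not a value taken
    from the article or from a specific program.
    """
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    start = 0
    end = len(bases)
    # Walk inward from the left while quality is too low.
    while start < end and qualities[start] < min_quality:
        start += 1
    # Walk inward from the right while quality is too low.
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return bases[start:end], qualities[start:end]


read = "NNACGTACGTTN"                                      # made-up read
quals = [5, 8, 30, 32, 35, 36, 34, 33, 31, 30, 12, 4]      # made-up scores
trimmed_read, trimmed_quals = trim_low_quality_ends(read, quals)
print(trimmed_read)  # ACGTACGT
```

Only the ends are trimmed in this sketch; a low-quality base in the middle of the read is left for the later review step.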

The procedure for NGS data analysis includes generating quality control reports and using software such as Illumina’s BaseSpace Sequence Hub. Data generated by Illumina sequencing instruments can be sent directly to BaseSpace Sequence Hub and stored securely.

Analyzing nucleic acid sequences with a computer program involves two stages:

  • The first stage is a simple search for sequences with known attributes, including determining their positions.
  • The second stage seeks to detect subtler, less obvious sequence patterns, such as promoters, which are regulatory elements. The results can be presented as catalogs of sequence patterns.
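The first stage, searching for a known sequence and reporting where it occurs, can be sketched as follows. The helper name `find_motif` is hypothetical, and the motif `TATAAT` (the consensus of the bacterial -10 promoter element) and the test sequence are chosen only as an example.

```python
def find_motif(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)              # convert to 1-based position
        start = sequence.find(motif, start + 1)  # allow overlapping matches
    return positions


sequence = "GGTATAATCCGTATAATGC"  # made-up sequence
hits = find_motif(sequence, "TATAAT")
print(hits)  # [3, 12]
```

Exact matching like this only covers the simple first stage. Real promoters vary from the consensus, which is why the second stage needs pattern models rather than a plain string search.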

The two categories of computational alignment approaches are:

  • Global alignment is a form of global optimization in which the alignment is “forced” to span the entire length of every query sequence.
  • Local alignment finds similar regions within long sequences that are often very divergent overall.
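The difference between the two approaches shows up clearly in a small dynamic-programming sketch. The version below computes only the best score, using the classic Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty. The function name `alignment_score` and the scoring values are illustrative assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming.

    local=False: Needleman-Wunsch, both sequences aligned end to end.
    local=True:  Smith-Waterman, best-scoring pair of substrings.
    The scoring values are illustrative, not standard defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment has to pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Two made-up sequences that share "ACGTACGT" but differ elsewhere.
seq_a = "CCCCACGTACGT"
seq_b = "ACGTACGTGGGG"
global_score = alignment_score(seq_a, seq_b)
local_score = alignment_score(seq_a, seq_b, local=True)
print(global_score, local_score)
```

Here the local score is 8, one point for each base of the shared region, while the global score is lower because the unrelated flanks must also be aligned or gapped.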

METHODS AND TOOLS FOR DNA SEQUENCE ANALYSIS

Sequence alignment is a technique for identifying regions of similarity in DNA, RNA, or protein sequences. The similarity between sequences may reflect functional, structural, or evolutionary relationships. Comparing two sequences is called pairwise sequence alignment; comparing more than two is called multiple sequence alignment.
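Once two sequences have been aligned, their similarity is often summarized as a percent identity. The sketch below assumes the alignment has already been produced, with `-` marking gaps, and counts gap columns as non-identical. Other conventions exist, and the name `percent_identity` is just for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns where both sequences carry the same residue.

    Columns that contain a gap count as non-identical (one of several
    conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


# A made-up pairwise alignment: one mismatch and one gap in 8 columns.
identity = percent_identity("ACGTACGT", "ACGTTCG-")
print(identity)  # 75.0
```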

Some of the tools used for data analysis are:

  • BLAST (Basic Local Alignment Search Tool): compares a query sequence against a database of primary biological sequences to find regions of local similarity.
  • Clustal: a widely used tool for multiple sequence alignment of nucleic acid and protein sequences.
  • MEGA (Molecular Evolutionary Genetics Analysis): helps users carry out statistical analyses of biological macromolecules to study molecular evolution and build phylogenetic trees.
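To give a feel for why a tool like BLAST is fast, here is a toy version of its word-seeding idea: short exact word matches between query and subject are located first, and only those are worth extending into longer alignments. This is not BLAST's actual code or API. The functions `index_words` and `find_seeds` are hypothetical, the extension and scoring steps are left out, and the word size of 4 is chosen only to suit the tiny example.

```python
from collections import defaultdict


def index_words(subject, word_size):
    """Map every word (k-mer) of the subject to its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(subject) - word_size + 1):
        index[subject[start:start + word_size]].append(start)
    return index


def find_seeds(query, subject, word_size=4):
    """Return (query_start, subject_start) pairs where a word matches exactly."""
    index = index_words(subject, word_size)
    seeds = []
    for q_start in range(len(query) - word_size + 1):
        word = query[q_start:q_start + word_size]
        for s_start in index.get(word, []):
            seeds.append((q_start, s_start))
    return seeds


# Made-up query and subject sequences.
seeds = find_seeds("GATTACA", "CCGATTACCG", word_size=4)
print(seeds)  # [(0, 2), (1, 3), (2, 4)]
```

The three seeds lie on the same diagonal (subject position minus query position is always 2), which is the kind of signal a seed-and-extend search would grow into a local alignment.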

APPLICATIONS OF DNA SEQUENCING DATA ANALYSIS

Some of the applications of DNA sequencing data analysis are:

  • Obtaining the information encoded in a gene
  • Detecting new mutations
  • Discovering new genes
  • Predicting higher-order structures
  • Achieving personalized medicine
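As a small illustration of mutation detection, the sketch below compares a sample sequence against a reference and lists the point substitutions. It assumes both sequences are already aligned and of equal length, so insertions and deletions are out of scope. The function name `find_substitutions` and both sequences are made up.

```python
def find_substitutions(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch, 1-based.

    Assumes the sequences are already aligned and equal in length, so
    only point substitutions are reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(position, ref_base, sample_base)
            for position, (ref_base, sample_base)
            in enumerate(zip(reference, sample), start=1)
            if ref_base != sample_base]


reference = "ATGGCCATTGTA"  # made-up reference
sample = "ATGGCTATTGAA"     # made-up sample with two substitutions
variants = find_substitutions(reference, sample)
print(variants)  # [(6, 'C', 'T'), (11, 'T', 'A')]
```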

CONCLUDING THOUGHTS

We cannot ignore the fact that sequence data analysis has grown increasingly significant in genomics. Bioinformatics has made analysis considerably easier for biologists by providing a variety of software solutions that eliminate the need for tedious manual labour.

Written by: Lina Fatima.M, 3rd year B.Tech Biotechnology
