What is DNA?
DNA is the unique genetic code found in most cells in humans as well as in organisms such as bacteria, many viruses, parasites, and plants. It is structured like a twisted ladder, with two sides and rungs made up of two pieces that fit together. These rungs are created from two of four possible bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Because of their shape, adenine always pairs with thymine and guanine always pairs with cytosine to form a complete rung. These bases are bonded at the sides of the ladder to a sugar and phosphate, which form the vertical backbone of the DNA double helix (the “sides” of the ladder). The base, sugar, and phosphate form a unit called a nucleotide, which is referred to by the base (A, C, T, or G) it contains.
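The pairing rule described above can be written as a small lookup. This is only an illustrative sketch: the function name `complement` and the example strand are made up for this example and are not part of any laboratory method.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base found on the opposite side of each rung, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("ATGC"))  # TACG
```

Because each base has exactly one partner, applying the rule twice returns the original strand.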
Specific segments of DNA called genes serve as templates to make (transcribe) RNA. The information contained in the RNA is then often translated by tiny molecular machines into proteins. There are approximately 20,000 genes in the human genome. The information contained within these genes allows our cells to produce an enormous variety of proteins that serve as the building blocks for our bodies and that govern how the body works.
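As a rough sketch of the two steps described above, the example below copies a short, made-up DNA segment into RNA (where the base U takes the place of T) and then reads the RNA three bases at a time to build a chain of amino acids. The table lists only five of the 64 three-base "words" of the standard genetic code, and the function names and the example segment are assumptions chosen for illustration.

```python
# A small excerpt of the standard genetic code (five of the 64 codons).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "UAA": "Stop"}

def transcribe(coding_dna: str) -> str:
    """Copy the coding strand into RNA: same letters, with U in place of T."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> list:
    """Read the RNA three bases at a time until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in this excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

mrna = transcribe("ATGTTTGGCAAATAA")  # made-up example segment
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys']
```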
What is DNA sequencing?
DNA sequencing is a laboratory method used to determine the order of the bases within the DNA. The human genome contains about 3 billion base pairs, and differences in their sequence lead to each person’s unique genetic makeup. In medicine, DNA sequencing is used for a range of purposes, including diagnosis and treatment of diseases. In general, sequencing allows health care practitioners to determine if a gene or the region that regulates a gene contains changes, called variants or mutations, that are linked to a disorder.
When considering or undergoing genetic testing, it is important to seek help from a genetics expert, such as a medical geneticist or genetic counselor, to better understand the test results, their implications, and any potential risk of having a genetic condition or passing it on to your children.
How is DNA sequencing performed?
While methods for DNA sequencing have evolved over the years, the technique generally consists of breaking long strands of DNA into many small pieces, using one of several types of tests to determine the order of the nucleotide bases that make up those pieces, and then reassembling the data back in the order of the original DNA strand.
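The break-apart-and-reassemble idea can be sketched with a toy example. The code below cuts a made-up 20-base sequence into overlapping pieces, shuffles them, and joins them back together by repeatedly merging the two pieces that overlap the most. This greedy approach and all of the names in it are assumptions chosen for illustration; real sequencing software is far more sophisticated and must cope with read errors and repeated stretches of DNA.

```python
import random

def fragment(dna, read_len=8, step=4):
    """Cut the sequence into overlapping reads of read_len bases."""
    return [dna[i:i + read_len] for i in range(0, len(dna) - read_len + 1, step)]

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads):
    """Greedily merge the two pieces with the largest overlap until none overlap."""
    pieces = list(reads)
    while len(pieces) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join
        merged = pieces[best_i] + pieces[best_j][best_n:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

original = "ATGGCGTACCTGAAGTTCGA"  # made-up 20-base example with no repeats
reads = fragment(original)
random.Random(0).shuffle(reads)    # the pieces arrive in no particular order
print(assemble(reads) == [original])
```

The example works because this short sequence contains no repeated stretches, so every overlap between pieces is a true one.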
Sanger Sequencing
Sanger sequencing, developed in the 1970s, is the method that was used in the Human Genome Project from 1990 to 2003 to completely sequence the DNA of a human for the first time. It relies on chemicals called dideoxynucleotides, which are also known as ‘chain terminating’ nucleotides. When one is incorporated into a growing copy of a DNA sequence, no other nucleotide can be added onto the chain after it. Each of the four dideoxynucleotides carries its own fluorescent “tag” that allows A, T, C, and G to be clearly identified.
Incorporation of a dideoxynucleotide occurs at random, resulting in many copies of the DNA template that stop at different lengths. These fluorescently labeled DNA fragments are then separated by size in a process called electrophoresis. Because each fragment stops in a slightly different spot based on how many nucleotides are in the chain, the color at the end of each fragment shows exactly which base is in each position along the DNA sequence.
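The logic of reading a sequence from chain-terminated fragments can be sketched as follows. This simplified model assumes that exactly one fragment of every possible length is produced and that the last base of each fragment is the tagged one; the function names and the example sequence are made up for illustration.

```python
import random

def chain_terminated_fragments(sequence):
    """One copy stopped at every position, as if each length occurred exactly once."""
    return [sequence[:n] for n in range(1, len(sequence) + 1)]

def read_sequence(fragments):
    """Sort by length (a stand-in for electrophoresis) and read each final, tagged base."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))

sequence = "GATTACA"  # made-up example
fragments = chain_terminated_fragments(sequence)
random.Random(1).shuffle(fragments)  # the fragments start out mixed together
print(read_sequence(fragments))  # GATTACA
```

Sorting by size is what restores the order: the shortest fragment reveals the first base, the next shortest reveals the second, and so on.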
For many years, Sanger sequencing has been the gold standard for clinical DNA sequencing to look at single genes or a few genes at a time. Sanger sequencing is reliable, but it can only read one short section of DNA from one person at a time. Sanger sequencing also has a limited ability to detect changes if they are greatly outnumbered by normal copies of a gene, which can happen when some cells have a variant or mutation (disease-causing variant) and some don’t. For example:
- If a cell has undergone a genetic change that allows it to grow in an uncontrolled fashion (a tumor), the genetic code of this cell and any that come from division of this cell will have a variant that is not present in any of the other cells in a person’s body.
- A person may have two different genetic codes in normally dividing, non-tumor cells that are present in mixed proportions throughout the body; this is a situation referred to as somatic mosaicism.
In order for Sanger sequencing to be able to tell that there is more than one variation of the genetic code present, at least 15-20% of the DNA tested needs to contain the same variant or mutation (disease-causing variant).
Next-Generation Sequencing (NGS)
The Human Genome Project, completed in 2003, took over a decade using Sanger sequencing to determine a single individual’s genome. It is now possible to sequence the human genome in a matter of days. Sequencing time and cost have declined dramatically thanks to a group of sequencing technologies called next-generation sequencing (NGS). NGS methods are faster than Sanger sequencing because they sequence millions of small DNA fragments from many different parts of the genome all at the same time, rather than reading a single DNA fragment from one region of the genome. Because all of these reactions are happening “in parallel” (at the same time), NGS is also sometimes referred to as massively parallel sequencing.
An additional benefit of NGS is that its sensitivity for detecting alterations that are present at a very low level is much better than that of Sanger sequencing. For NGS, as little as 2-5% of the DNA tested needs to contain the same variant or mutation (disease-causing variant) to be detected.
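To make the two detection limits concrete, the sketch below works out what fraction of the DNA copies carry a variant and checks that fraction against each method. The cutoffs use the lower ends of the ranges quoted in this article (15% for Sanger sequencing, 2% for NGS); the function names and the counts are assumptions for illustration, and actual limits vary by laboratory and test.

```python
SANGER_MIN_FRACTION = 0.15  # lower end of the 15-20% range quoted for Sanger sequencing
NGS_MIN_FRACTION = 0.02     # lower end of the 2-5% range quoted for NGS

def variant_fraction(variant_copies, total_copies):
    """Share of the tested DNA copies that carry the variant."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return variant_copies / total_copies

def detectable_by(fraction):
    """List the methods whose assumed cutoff this fraction meets."""
    methods = []
    if fraction >= NGS_MIN_FRACTION:
        methods.append("NGS")
    if fraction >= SANGER_MIN_FRACTION:
        methods.append("Sanger")
    return methods

print(detectable_by(variant_fraction(40, 1000)))   # ['NGS']
print(detectable_by(variant_fraction(250, 1000)))  # ['NGS', 'Sanger']
```

In the first case only 4% of the copies carry the variant, which is above the assumed NGS cutoff but well below the one for Sanger sequencing.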
Within our DNA, there are sections that code for proteins and there are areas both within genes and between different genes that do not. The protein-coding sections of genes are called exons, and the intervening areas that separate the exons within a single gene are called introns. The areas between different genes are referred to as “non-coding regions.”
- The collection of all the exons of the approximately 20,000 known genes in humans is referred to as the exome, and sequencing of this set of information by NGS is called whole exome sequencing.
- In contrast, if we take all DNA, including exons, introns, and non-coding regions, this set of information is referred to as the genome. When NGS is used to evaluate the entire human genome, it is called whole genome sequencing.
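The difference between reading only the exons and reading everything can be sketched with a miniature, made-up gene. Here the exons are written in uppercase and the introns in lowercase, and the exon positions are hypothetical values chosen for this example.

```python
# A made-up gene: three exons (uppercase) separated by two introns (lowercase).
gene = "ATGGCC" + "gtaagt" + "GAATTC" + "gtcagt" + "TGGTAA"
exons = [(0, 6), (12, 18), (24, 30)]  # hypothetical (start, end) positions

def exon_sequence(gene, exons):
    """Keep only the protein-coding pieces, as exome sequencing would target."""
    return "".join(gene[start:end] for start, end in exons)

def intron_spans(exons):
    """The gaps between consecutive exons within the gene."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

print(exon_sequence(gene, exons))  # ATGGCCGAATTCTGGTAA
print(intron_spans(exons))         # [(6, 12), (18, 24)]
```

Whole genome sequencing corresponds to reading the full `gene` string, introns included, along with the regions between genes.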
With this ability to read vast sections of DNA, the results of DNA sequencing must be interpreted carefully, as not all changes to the DNA sequence have a known effect.
- Some changes are known to cause problems with the structure or function of a gene product (e.g., protein), and these are referred to as disease-causing, or “pathogenic” variants.
- Other changes are known to have no effect at all on the final gene product and are considered “benign” (harmless) variants.
- Some changes don’t have clear evidence either way (of being pathogenic or benign) and are called “variants of uncertain significance.”
How is DNA sequencing used?
DNA sequencing has a wide variety of medical applications. These techniques can be used to test one gene or several genes to help diagnose medical conditions. Some examples include:
- Targeted sequencing: sequencing of select variants or areas within a gene’s exons, or of full exons (the segments of DNA that code for proteins). When certain changes to one or more genes have a known effect, testing only for those known changes may help guide medical care. One example is testing a tissue biopsy sample from a melanoma to determine whether the cells have a mutation (disease-causing variant) in the BRAF gene. A mutation in BRAF is found in more than 50% of melanomas, and people with advanced melanoma who have BRAF mutations may respond to drugs that target these mutations, a treatment referred to as targeted cancer therapy. Targeted drugs work differently than standard chemotherapy and may have fewer side effects.
- Single gene sequencing: sequencing all exons of a gene, often including parts of the non-coding areas (e.g., the sequence before a gene [promoter] or between exons [introns]). An example of this is sequencing the FBN1 gene. Mutations (disease-causing variants) in this gene cause Marfan syndrome, a disorder that affects the connective tissue that makes up many parts of the body, including bones, muscles, ligaments, blood vessels, and heart valves. Over 1,000 different mutations have been found in FBN1, so it is important to evaluate the entire sequence of the FBN1 gene.
- Multi-gene panel sequencing: sequencing parts or all of several genes to detect mutations (disease-causing variants) that can cause a genetic disorder. An example of this is a panel to test for mutations in the MLH1, MSH2, MSH6, PMS2, and EPCAM genes. Mutations in these genes can cause Lynch syndrome, an inherited disorder that increases the risk of many types of cancer, especially colon cancer and endometrial cancer.
- Whole genome sequencing or whole exome sequencing (described above): examples include sequencing the genome or exome of infants with rare metabolic disorders or children with developmental delays and/or intellectual disabilities. This method of testing may be used after other testing has failed to reveal a diagnosis.
- Whole genome sequencing of microbes: in addition to sequencing the genome of humans, whole genome sequencing can be used to sequence the genomes of other organisms. An example is sequencing the genomes of bacteria in suspected outbreaks. By comparing sequences of bacteria and identifying differences, public health scientists can determine how closely related the bacteria are and how likely it is that they are part of the same outbreak. (For more information on this topic, read Infectious Disease Genetic Testing.)
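As a simplified illustration of comparing bacterial sequences, the sketch below counts the positions at which two equal-length sequences differ. The sample names and sequences are invented, and the function name `count_differences` is an assumption; actual outbreak investigations compare whole genomes with validated methods rather than short strings like these.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Invented samples for illustration only.
samples = {
    "patient_1": "ATGGCGTACCTGAAGT",
    "patient_2": "ATGGCGTACCTGAAGT",
    "patient_3": "ATGGCGTACTTGAAGT",
    "unrelated": "ATGACGTTCCTAAAGC",
}

for name, sequence in samples.items():
    print(name, count_differences(samples["patient_1"], sequence))
```

Samples with few or no differences are more likely to share a common source than samples with many differences.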
As DNA sequencing technology advances, a broader range of applications for these techniques will continue to make their way into clinical and laboratory testing settings.