DNA, RNA, and Proteins: The Central Dogma
The One-Line Version
DNA --(transcription)--> RNA --(translation)--> Protein
DNA stores the information. RNA carries it to where it's needed. Proteins do the actual work.
This flow is called the central dogma of molecular biology, a phrase Francis Crick introduced in 1958. The word "dogma" was not meant literally. Crick used it for a grand hypothesis that had little direct evidence at the time, and he later admitted he had misunderstood the word. The flow itself is real, widespread, and essential.
Almost every biotech concept you'll encounter is either using, modifying, or reading from some point in this pipeline.
DNA
DNA (deoxyribonucleic acid) is the archive. It's a long molecule that stores the information needed to build and run an organism.
Structure
DNA is a double helix: two strands wound around each other. The strands are made of repeating units called nucleotides, each containing one of four bases:
| Letter | Base |
|---|---|
| A | Adenine |
| T | Thymine |
| G | Guanine |
| C | Cytosine |
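The strands pair up in a specific way: A always pairs with T, and G always pairs with C. This is complementary base pairing. The two strands run in opposite directions (they're antiparallel), so if one strand reads ATGCC from its 5' end to its 3' end, the partner reads TACGG when lined up base for base. Read in its own 5'→3' direction, that partner strand is GGCAT.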
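Complementary pairing fits in a few lines of code. The sketch below is a minimal Python illustration. It assumes DNA is written as an uppercase string and that the given strand is read 5'→3'. The names `complement` and `reverse_complement` are our own and do not come from any library.

```python
# Watson-Crick pairing rules for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Partner strand lined up base for base (so it reads 3'->5', left to right)."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

top = "ATGCC"  # assumed to be written 5'->3'
print(complement(top))          # TACGG
print(reverse_complement(top))  # GGCAT
```

Taking the reverse complement twice gives back the original strand. Tools that work with double-stranded DNA rely on this symmetry.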
Why this matters: complementary base pairing is the basis for everything else. DNA replicates by opening the helix and using each strand as a template for a new complementary strand. Transcription copies DNA into RNA using the same pairing rules. PCR, CRISPR, and sequencing all depend on this pairing.
The 4-letter alphabet
DNA information is stored in sequences of the four bases. "Three billion base pairs" means three billion of these letter positions in the human genome. Measured in bits, that capacity is modest: at 2 bits per base it comes to about 750 megabytes, slightly more than a CD (roughly 700 MB).
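The 750 MB figure is back-of-envelope arithmetic. Four possible letters means each position carries log2(4) = 2 bits. Here is a minimal sketch of the calculation. It assumes a round three billion base pairs for one copy of the genome and no compression.

```python
import math

base_pairs = 3_000_000_000     # rough size of one copy of the human genome
bits_per_base = math.log2(4)   # 4 equally likely letters -> 2 bits each
total_bytes = base_pairs * bits_per_base / 8  # 8 bits per byte

print(f"{total_bytes / 1e6:.0f} MB")  # 750 MB
```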
But that information is heavily used. Only about 1 to 2 percent of it encodes proteins. Other stretches regulate when, where, and how much each protein is made.
How DNA is stored in cells
In eukaryotes, DNA is wound around proteins called histones, forming bead-like units called nucleosomes. Strings of these beads coil into fibres, which fold further into chromosomes. A typical human cell has 46 chromosomes: 23 pairs, with one chromosome of each pair inherited from each parent.
Unwound, the DNA in a single human cell would stretch about two metres. It fits into a nucleus about 6 micrometres across because of the folding.
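To see why folding matters, compare the two lengths quoted above. This is rough arithmetic using the section's own round numbers, not a measurement.

```python
dna_length_m = 2.0          # unwound DNA in one human cell, about two metres
nucleus_diameter_m = 6e-6   # nucleus about 6 micrometres across

ratio = dna_length_m / nucleus_diameter_m
print(f"{ratio:,.0f}")  # 333,333
```

The DNA is a few hundred thousand times longer than the nucleus is wide. That is why several levels of packing are needed.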
RNA
RNA (ribonucleic acid) is similar to DNA with three key differences:
- Single-stranded, usually
- Has ribose sugar instead of deoxyribose (hence the name)
- Uses uracil (U) instead of thymine (T)
RNA has many jobs. The most famous is messenger RNA (mRNA), which carries a protein-coding message from the DNA to the ribosomes where proteins are made.
Kinds of RNA
| Type | Full name | Role |
|---|---|---|
| mRNA | messenger RNA | carries the protein-coding message |
| tRNA | transfer RNA | brings amino acids to the ribosome during translation |
| rRNA | ribosomal RNA | structural part of the ribosome; also catalyses the linking of amino acids |
| miRNA | microRNA | regulates gene expression by binding to mRNAs |
| siRNA | small interfering RNA | silences specific mRNAs by guiding their destruction (the basis of RNA interference) |
| lncRNA | long non-coding RNA | various regulatory jobs |
| snRNA | small nuclear RNA | involved in RNA splicing |
Most RNA in a cell is not mRNA. Ribosomes are mostly rRNA by mass. RNA does far more than "be a messenger"; the central dogma's simple picture is an accurate summary but a simplification.
mRNA in the news
mRNA vaccines (COVID-19 vaccines from Moderna and Pfizer/BioNTech) are one current biotech success story. They work by delivering an mRNA that encodes a viral protein; your cells read the mRNA and make the protein, which your immune system learns to recognise. The mRNA degrades within days.
This is biotech using the central dogma directly: give a cell an mRNA, let the cell do what cells do with mRNA.
Proteins
Proteins are the workers of the cell. Most of the work done inside a cell is done by proteins:
- Enzymes catalyse chemical reactions
- Structural proteins hold cells and tissues together
- Receptors detect signals
- Antibodies defend against invaders
- Transporters carry molecules across membranes
- Motors move things around
- Signalling proteins coordinate behaviour
Structure
A protein is a chain of amino acids, folded into a specific 3D shape. The genetic code specifies 20 standard amino acids, plus two rare additions (selenocysteine and pyrrolysine). Each has different chemical properties.
| Level | Description |
|---|---|
| Primary structure | the linear sequence of amino acids |
| Secondary structure | local patterns: helices and sheets |
| Tertiary structure | the overall 3D shape |
| Quaternary structure | arrangement of multiple protein chains |
The sequence of amino acids determines the shape (mostly; chaperones help). The shape determines the function. A misfolded protein usually doesn't work, and sometimes actively harms the cell (prions, Alzheimer's plaques).
Chapter 5 covers proteins in more depth.
Transcription: DNA to RNA
Transcription is the process that copies a stretch of DNA into an RNA molecule.
- An enzyme called RNA polymerase binds to a promoter, a DNA sequence at the start of a gene
- The DNA helix unwinds locally
- The polymerase reads one strand (the template strand) in the 3'→5' direction and builds a complementary RNA 5'→3', placing U opposite each A in the template
- At the end of the gene, the RNA is released and the DNA zips back up
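Transcription follows the same pairing rules as replication, with U in place of T. Below is a minimal Python sketch. It assumes both DNA strands are written as strings in the 5'→3' direction. `transcribe` and the example sequence are hypothetical. The polymerase reads the template 3'→5', so the code walks the template string backwards.

```python
# Which RNA base the polymerase adds opposite each template DNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') built on a template strand given 5'->3'."""
    # Walk the template backwards (3'->5') and add the complementary RNA base.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3))

coding = "ATGGCCAAATGA"    # hypothetical coding (non-template) strand, 5'->3'
template = "TCATTTGGCCAT"  # its complementary template strand, 5'->3'

rna = transcribe(template)
print(rna)  # AUGGCCAAAUGA
```

The RNA matches the coding strand with every T replaced by U. This is why the coding strand is the one usually written down for a gene.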
In eukaryotes, the RNA is then processed:
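- Splicing: intervening sequences (introns) are cut out, and the remaining pieces (exons) are joined together
- 5' cap: a modified guanine nucleotide added to the front end, which protects the mRNA from degradation and helps ribosomes recognise it
- Poly-A tail: a string of hundreds of adenines (A's) added to the back end for stability and export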
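Splicing, capping, and tailing can be modelled as string operations. This toy sketch assumes the exon positions are already known; real cells locate them using sequence signals at the intron boundaries. The sequence, coordinates, `[cap]` placeholder, and default tail length are all hypothetical. The example intron begins with GU and ends with AG, as most introns do.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon pieces given as half-open [start, end) positions; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def process(pre_mrna: str, exons: list[tuple[int, int]], tail_length: int = 200) -> str:
    """Toy model of eukaryotic mRNA processing: splice, cap, then add a poly-A tail."""
    cap = "[cap]"  # placeholder for the modified guanine cap, not a real base
    return cap + splice(pre_mrna, exons) + "A" * tail_length

# Hypothetical pre-mRNA: exon 1 + intron + exon 2
pre = "AUGGCC" + "GUAAGUCUUUAG" + "AAAUGA"
exons = [(0, 6), (18, 24)]

print(splice(pre, exons))  # AUGGCCAAAUGA
```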
Processed mRNA leaves the nucleus through nuclear pores and heads to the ribosomes in the cytoplasm.
Translation: RNA to Protein
Translation is where the mRNA's sequence is read and turned into a protein.
Codons
mRNA is read in groups of three letters (codons). Each codon specifies one amino acid or a stop signal. With 4 letters in groups of 3, there are 4 × 4 × 4 = 64 possible codons but only 20 amino acids:
| Codon | Meaning |
|---|---|
| AUG | start (also codes for methionine) |
| UAA | stop |
| UAG | stop |
| UGA | stop |

All 61 codons that are not stop codons, AUG included, code for amino acids.
Multiple codons can code for the same amino acid (the code is "redundant", or in biologists' usual term "degenerate"). This is one reason mutations sometimes have no effect.
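The 64-codon count and the redundancy can be checked directly. This Python sketch builds the standard genetic code from a compact 64-letter string. It uses one-letter amino acid codes, with `*` for stop, and lists codons in the conventional U, C, A, G order. Some mitochondria and microbes use slightly different codes, which this sketch ignores.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for UUU, UUC, UUA, UUG, UCU, ... in that order; '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
# product() varies the last position fastest, matching the order above.
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stops = [c for c, aa in CODE.items() if aa == "*"]
sense = [c for c, aa in CODE.items() if aa != "*"]
amino_acids = {CODE[c] for c in sense}

print(len(CODE), len(sense), len(stops))  # 64 61 3
print(len(amino_acids))                   # 20
print(stops)                              # ['UAA', 'UAG', 'UGA']
# Lysine (K) has two codons, so changing AAA to AAG leaves the protein unchanged.
print([c for c, aa in CODE.items() if aa == "K"])  # ['AAA', 'AAG']
```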
The ribosome
Ribosomes are the machines that do the reading. They:
- Bind to an mRNA at the start codon (AUG)
- Read codon by codon, bringing in the right amino acid each time on a tRNA whose anticodon pairs with the codon
- Link the amino acids together into a growing chain
- Stop at a stop codon and release the finished protein
The protein then folds into its shape, often with help from chaperone proteins, and goes to do its job.
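The ribosome's loop can be sketched the same way. This is a simplification with several assumptions. Translation starts at the first AUG in the string; real eukaryotic ribosomes also weigh the sequence around the start codon. tRNAs and folding are ignored entirely. `translate` is a hypothetical helper, and the table is the standard genetic code in one-letter amino acid codes.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Start at the first AUG, then read codon by codon until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    # Step three letters at a time; stop before running off the end.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the chain
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGGCCAAAUGACC"))  # MAK
```

Here M, A, and K are methionine, alanine, and lysine. The letters before the AUG and after the stop codon are never read.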
Reverse Transcription and Retroviruses
The central dogma says DNA → RNA → protein. But some viruses run part of the pipeline backwards. Retroviruses like HIV have an RNA genome and an enzyme called reverse transcriptase that copies RNA back into DNA. That DNA then inserts into the host genome.
This is not a violation of the central dogma; it's an extension. Crick's original formulation said that once information has passed into protein, it cannot flow back out to nucleic acids, and that remains true. RNA-to-DNA flow was added to the picture after reverse transcriptase was discovered in 1970.
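Reverse transcriptase is biotech's friend: it's how you make cDNA (complementary DNA) from mRNA. cDNA underpins RT-PCR and most RNA sequencing. Because it is copied from spliced mRNA, it carries a gene's coding sequence without the introns.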
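Reverse transcription uses the transcription pairing rules in the other direction: each RNA base is matched with its DNA partner, A with T and U with A. Below is a minimal sketch of making first-strand cDNA. It assumes the mRNA is written 5'→3'. `first_strand_cdna` is a hypothetical name. Real protocols often go on to synthesise a second strand, giving double-stranded cDNA.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to an mRNA, written 5'->3' (so the mRNA is read backwards)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

mrna = "AUGGCCAAAUGA"  # hypothetical mature mRNA, 5'->3'
print(first_strand_cdna(mrna))  # TCATTTGGCCAT
```

The cDNA is the reverse complement of the mature mRNA. If the gene had introns, they were already spliced out before this copy was made, so they are absent here.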
Why the Central Dogma Matters
Almost every biotech tool operates on this pipeline:
- PCR copies DNA
- Sequencing reads DNA
- CRISPR edits DNA
- RNA interference blocks mRNA
- mRNA vaccines deliver mRNA
- Protein drugs, such as insulin and therapeutic antibodies, are proteins themselves
- AlphaFold predicts protein structure from sequence
- Drug discovery often targets proteins
Understand the pipeline and you understand, in broad strokes, what these tools are doing. The rest of the tutorial covers the specifics.
Common Pitfalls
- "DNA is the most important molecule." It's the archive, not the worker. Cells spend most of their energy making and running proteins; DNA just stores the plans.
- "The genome is the organism's blueprint." Blueprint is too literal. The genome is a set of parts lists and instructions that are interpreted in context. The same genome can produce very different cells depending on which genes are expressed.
- "Mutations always change the protein." Often they don't, because of codon redundancy. For example, AAA and AAG both code for lysine, so a mutation that changes AAA to AAG leaves the protein identical. Such mutations are called synonymous or silent.
- "Genes are the only interesting part of DNA." Non-coding DNA (about 98% of the human genome) includes regulatory sequences, RNA-encoding regions, and structural elements. "Junk DNA" was an early dismissal that has aged badly.
- "Proteins fold once and stay." Many proteins flex and change shape as part of their job. Folding is dynamic; shape is not always fixed.
Next Steps
Continue to 04-genes-and-genomes.md for how DNA is actually organised.