DNA, RNA, and Proteins: The Central Dogma

The One-Line Version

DNA   →   RNA   →   Protein
      transcription    translation

DNA stores the information. RNA carries it to where it's needed. Proteins do the actual work.

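This flow is called the central dogma of molecular biology, a phrase Francis Crick introduced in 1958. The word "dogma" was an unfortunate choice: Crick later explained that he had already used "hypothesis" for a related idea, wanted something that sounded more central, and did not mean a belief beyond doubt. The flow itself is real, widespread, and essential.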

Almost every biotech concept you'll encounter uses, modifies, or reads from some point in this pipeline.

DNA

DNA (deoxyribonucleic acid) is the archive. It's a long molecule that stores the information needed to build and run an organism.

Structure

DNA is a double helix: two strands wound around each other. The strands are made of repeating units called nucleotides, each containing one of four bases:

A   Adenine
T   Thymine
G   Guanine
C   Cytosine

The strands pair up in a specific way: A always pairs with T, G always pairs with C. This is complementary base pairing. If one strand reads ATGCC, the strand paired with it reads TACGG, position by position. The two strands run in opposite directions (they're antiparallel), so read in its own direction the partner strand is GGCAT.
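
The pairing rule is simple enough to write down as a lookup table. The sketch below is illustrative only; the function names are invented for this example and it assumes an uppercase sequence containing only A, T, G and C.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Partner bases, position by position (same left-to-right order)."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (antiparallel)."""
    return complement(strand)[::-1]

print(complement("ATGCC"))          # TACGG
print(reverse_complement("ATGCC"))  # GGCAT
```

Taking the reverse complement twice returns the original strand, which is why either strand can serve as a template for rebuilding the other.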

Why this matters: complementary base pairing is the basis for everything else. DNA replicates by opening the helix and using each strand as a template for the new one. Transcription copies DNA into RNA using the same pairing rules. PCR, CRISPR, and sequencing all depend on this pairing.

The 4-letter alphabet

DNA information is stored in sequences of the four bases. "Three billion base pairs" means three billion of these letter positions in the human genome. The information capacity is, by information-theory standards, modest: about 750 megabytes (three billion positions at 2 bits each), slightly more than a CD holds.

But that information is densely functional. About 2% of it encodes proteins. Much of the rest helps regulate when, where, and how much of each protein is made.
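
The 750 megabyte figure is a short calculation. The sketch below assumes an uncompressed encoding of exactly 2 bits per base and decimal megabytes (one million bytes); the variable names are chosen for this example.

```python
import math

BASE_PAIRS = 3_000_000_000        # figure quoted in the text
BITS_PER_BASE = math.log2(4)      # four possible letters, so 2 bits each

total_bits = BASE_PAIRS * BITS_PER_BASE
megabytes = total_bits / 8 / 1_000_000   # bits to bytes to decimal megabytes

print(f"{megabytes:.0f} MB")      # 750 MB
```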

How DNA is stored in cells

In eukaryotes, DNA is wound around proteins called histones, forming a bead-like structure, and those beads are wound into fibres, which are wound further into chromosomes. A human has 46 chromosomes (23 pairs, with one chromosome of each pair inherited from each parent).

Unwound, the DNA in a single human cell would stretch about two metres. It fits into a nucleus about 6 micrometres across because of the folding.
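
The two-metre figure can be checked with a rough estimate. The sketch below assumes a spacing of about 0.34 nanometres between neighbouring base pairs (the usual textbook value for the common form of the double helix, not stated in this chapter) and counts both copies of the genome, since most human cells carry two.

```python
RISE_PER_BASE_PAIR_NM = 0.34          # assumption: textbook spacing between base pairs
HAPLOID_BASE_PAIRS = 3_000_000_000    # one copy of the genome, as quoted earlier
DIPLOID_BASE_PAIRS = 2 * HAPLOID_BASE_PAIRS   # one copy from each parent

length_m = DIPLOID_BASE_PAIRS * RISE_PER_BASE_PAIR_NM * 1e-9   # nm to m
nucleus_diameter_m = 6e-6             # figure quoted in the text

print(f"{length_m:.2f} m")            # 2.04 m
print(f"about {length_m / nucleus_diameter_m:,.0f} times the nucleus diameter")
```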

RNA

RNA (ribonucleic acid) is similar to DNA with three key differences:

  1. Usually single-stranded
  2. Has ribose sugar instead of deoxyribose (hence the name)
  3. Uses uracil (U) instead of thymine (T)

RNA has many jobs. The most famous is messenger RNA (mRNA), which carries a protein-coding message from the DNA to the ribosomes where proteins are made.

Kinds of RNA

mRNA    messenger RNA: carries the protein-coding message
tRNA    transfer RNA: brings amino acids to the ribosome during translation
rRNA    ribosomal RNA: structural part of the ribosome itself
miRNA   microRNA: regulates gene expression by binding to mRNAs
siRNA   small interfering RNA: triggers destruction of matching mRNAs (RNA interference)
lncRNA  long non-coding RNA: various regulatory jobs
snRNA   small nuclear RNA: involved in RNA splicing

Most RNA in a cell is not mRNA. Ribosomes are mostly rRNA by mass. RNA does far more than "be a messenger"; the central dogma's picture is accurate as a summary, but it leaves most of these roles out.

mRNA in the news

mRNA vaccines (COVID-19 vaccines from Moderna and Pfizer/BioNTech) are one current biotech success story. They work by delivering an mRNA that encodes a viral protein; your cells read the mRNA and make the protein, which your immune system learns to recognise. The mRNA degrades within days.

This is biotech using the central dogma directly: give a cell an mRNA, let the cell do what cells do with mRNA.

Proteins

Proteins are the workers of the cell. Most jobs that get done inside a cell are done by proteins.

  • Enzymes catalyse chemical reactions
  • Structural proteins hold cells and tissues together
  • Receptors detect signals
  • Antibodies defend against invaders
  • Transporters carry molecules across membranes
  • Motors move things around
  • Signalling proteins coordinate behaviour

Structure

A protein is a chain of amino acids, folded into a specific 3D shape. There are 20 standard amino acids in biology (plus a couple of rare additions, such as selenocysteine), each with different chemical properties.

Primary structure      the linear sequence of amino acids
Secondary structure    local patterns: helices and sheets
Tertiary structure     the overall 3D shape
Quaternary structure   arrangement of multiple protein chains

The sequence of amino acids determines the shape (mostly; chaperones help). The shape determines the function. A misfolded protein usually doesn't work, and sometimes actively harms the cell (prions, Alzheimer's plaques).

Chapter 5 covers proteins in more depth.

Transcription: DNA to RNA

Transcription is the process that copies a stretch of DNA into an RNA molecule.

  1. An enzyme called RNA polymerase binds to the DNA at the start of a gene (at a promoter)
  2. The DNA helix unwinds locally
  3. The polymerase reads one strand (the template strand) and builds a complementary RNA
  4. The RNA is released; the DNA zips back up
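
Step 3 above can be sketched as a base-by-base lookup. This is a toy model, not a description of the enzyme: the function name is invented for this example, the template is assumed to be written 3' to 5' so that the RNA comes out 5' to 3', and promoters and termination are ignored.

```python
# DNA base on the template strand -> RNA base added opposite it.
# Same pairing rules as DNA, except RNA uses U where DNA would use T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build the RNA complementary to a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

template = "TACGGCATT"      # template strand, written 3'->5'
coding = "ATGCCGTAA"        # the other DNA strand, written 5'->3'

rna = transcribe(template)
print(rna)                              # AUGCCGUAA
print(rna == coding.replace("T", "U"))  # True
```

The second line of output shows a handy shortcut: the RNA has the same sequence as the non-template DNA strand, with U in place of T.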

In eukaryotes, the RNA is then processed:

  • Splicing: non-coding parts (introns) are removed, leaving only coding parts (exons)
  • 5' cap: added to the front to mark it as legitimate mRNA
  • Poly-A tail: hundreds of A's added to the 3' end for stability and export

Processed mRNA leaves the nucleus through nuclear pores and heads to the ribosomes in the cytoplasm.
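
Splicing and tailing can be pictured as string operations. The sketch below is a toy: the sequence, the exon coordinates and the function names are made up for illustration, introns are written in lowercase only to make them visible, the tail is kept short (real tails run to hundreds of A's), and the 5' cap is not modelled because it is a chemical modification rather than a sequence letter.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon slices, given as (start, end) with end excluded."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def add_poly_a_tail(mrna: str, tail_length: int) -> str:
    """Append a run of A's to the 3' end."""
    return mrna + "A" * tail_length

# Hypothetical pre-mRNA: exon, intron (lowercase), exon.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UGGUGA"
exons = [(0, 6), (18, 24)]

spliced = splice(pre_mrna, exons)
mature = add_poly_a_tail(spliced, tail_length=10)

print(spliced)       # AUGGCCUGGUGA
print(len(mature))   # 22
```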

Translation: RNA to Protein

Translation is where the mRNA's sequence is read and turned into a protein.

Codons

mRNA is read in groups of three letters (codons). Each codon specifies one amino acid. With 4 letters in groups of 3, there are 4 × 4 × 4 = 64 possible codons for 20 amino acids. Four of them have special roles:

AUG    starts translation (and codes for methionine)
UAA    stop
UAG    stop
UGA    stop
(the other 60 codons also code for amino acids, so 61 of the 64 codons are coding, counting AUG)

Multiple codons can code for the same amino acid (the code is "redundant", or "degenerate" in the technical literature), which is one reason mutations sometimes have no effect.
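
The counts above can be checked by building the codon table. The sketch below packs the standard genetic code into a 64-letter string of one-letter amino acid symbols ("*" marks a stop), ordered by cycling the bases as U, C, A, G. That compact layout and the variable names are conveniences for this example, not something defined elsewhere in the tutorial.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                         # 64
print(stop_codons)                              # ['UAA', 'UAG', 'UGA']
print(sum(codons_per_amino_acid.values()))      # 61
print(len(codons_per_amino_acid))               # 20
print(codons_per_amino_acid["L"])               # 6 (leucine has six codons)
print(CODON_TABLE["AAA"], CODON_TABLE["AAG"])   # K K (both lysine)
```

The last line is the redundancy in action: a change from AAA to AAG swaps one letter of the mRNA but leaves the amino acid, lysine, unchanged.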

The ribosome

Ribosomes are the machines that do the reading. They:

  1. Bind to an mRNA and locate the start codon (AUG)
  2. Read codon by codon, bringing in the right amino acid each time via a tRNA whose anticodon pairs with the codon
  3. Link the amino acids together into a growing chain
  4. Stop at a stop codon and release the finished protein

The protein then folds into its shape, often with help from chaperone proteins, and goes to do its job.
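
The four ribosome steps map onto a short loop. This is a simplified sketch: the function name is invented for this example, it starts at the first AUG it finds, it uses the same compact encoding of the standard genetic code (one-letter amino acid symbols, "*" for stop), and it ignores tRNAs, folding and everything else a real ribosome deals with.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCAAGUGGUAACC"))   # MAKW
```

In the example, the reading frame is set by the AUG at position 2, so the codons are AUG, GCC, AAG, UGG and then UAA, which stops translation.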

Reverse Transcription and Retroviruses

The central dogma says DNA → RNA → protein. But some viruses run the first step backwards: retroviruses like HIV have an RNA genome and an enzyme called reverse transcriptase that copies RNA back into DNA. The DNA then inserts into the host genome.

This is not a violation of the central dogma; it's an extension. The original dogma said "information, once in protein, cannot go back to nucleic acids", which remains true. RNA-to-DNA flow was added to the picture in the 1970s.

Reverse transcriptase is biotech's friend: it's how you make cDNA (complementary DNA) from mRNA, which lets you study expressed genes with DNA tools such as PCR and sequencing.
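
In sequence terms, making cDNA is the same pairing rule run from RNA to DNA. The sketch below is illustrative only: the function name is invented for this example, and it returns just the first DNA strand, written 5' to 3', which is the reverse complement of the mRNA.

```python
# RNA base on the template -> DNA base added opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA for an mRNA written 5'->3', itself written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

print(reverse_transcribe("AUGGCCAAGUGGUAA"))   # TTACCACTTGGCCAT
```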

Why the Central Dogma Matters

Almost every biotech tool operates on this pipeline:

  • PCR copies DNA
  • Sequencing reads DNA
  • CRISPR edits DNA
  • RNA interference blocks mRNA
  • mRNA vaccines deliver mRNA
  • Protein drugs are proteins
  • AlphaFold predicts protein structure from sequence
  • Drug discovery often targets proteins

Understand the pipeline and you understand what these tools are doing in broad strokes. The specifics are covered in the remaining chapters of the tutorial.

Common Pitfalls

"DNA is the most important molecule." It's the archive, not the worker. Cells spend most of their energy making and running proteins. DNA just stores the plans.

"The genome is the organism's blueprint." Blueprint is too literal. The genome is a set of parts lists and instructions that are interpreted in context. The same genome can produce very different cells depending on which genes are expressed.

"Mutations always change the protein." Often they don't, because of codon redundancy. A mutation that changes AAA to AAG leaves the protein identical, because both codons code for lysine. Such mutations are called synonymous or silent.

"Genes are the only interesting part of DNA." Non-coding DNA (about 98% of the human genome) includes regulatory sequences, RNA-encoding regions, and structural elements. "Junk DNA" was an early dismissal that has aged badly.

"Proteins fold once and stay." Many proteins flex and change shape as part of their job. Folding is dynamic; shape is not always fixed.

Next Steps

Continue to 04-genes-and-genomes.md for how DNA is actually organised.