Core Lab Tools: PCR, Cloning, Sequencing, Gels

The Problem

DNA is tiny. One copy of a gene in one cell is not enough to do anything with. To study, modify, or use DNA, you need to:

  1. Copy it (to have enough material)
  2. Cut it (to manipulate specific regions)
  3. Sort it (to isolate pieces by size or sequence)
  4. Read it (to know what sequence you have)
  5. Insert it (to put modified DNA back into cells)

The tools below correspond to these jobs. Most modern biotech is built on some combination of them.

PCR: Polymerase Chain Reaction

PCR is the "copy" tool. Given a tiny amount of DNA, PCR can produce millions of copies of a specific region in a few hours.

Invented by Kary Mullis in 1983 (Nobel Prize, 1993), PCR is one of the most important inventions in biotech history. Before PCR, getting more of a gene meant cloning it into bacteria and growing colonies. After PCR, a handful of starting copies could be amplified in a test tube in a few hours.

How it works

PCR uses the same basic process cells use to copy DNA, just in a tube with controlled heating and cooling:

  1. Denaturation (94 to 95°C): the double-stranded DNA splits into single strands
  2. Annealing (50 to 65°C): short DNA primers bind to the regions flanking your target
  3. Extension (72°C): a heat-stable DNA polymerase extends from each primer, copying the region between them
  4. Repeat: each cycle doubles the amount of DNA

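In theory, thirty cycles turn one molecule into roughly a billion (2^30 ≈ 1.07 × 10^9). Real reactions fall short of perfect doubling and eventually plateau, but the yield is still plenty to work with.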
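The doubling arithmetic is easy to check. The sketch below is a minimal Python model, assuming every cycle multiplies the template by (1 + efficiency); the `efficiency` parameter and the 90% figure are illustrative assumptions, and the model ignores the plateau that real reactions reach once reagents run low.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of PCR.

    Each cycle multiplies the amount of template by (1 + efficiency):
    efficiency = 1.0 is perfect doubling. The plateau that real reactions
    reach when primers or nucleotides run low is not modelled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(f"{pcr_copies(1, 30):.3e}")       # perfect doubling: 1.074e+09
    print(f"{pcr_copies(1, 30, 0.9):.3e}")  # assumed 90% efficiency per cycle
```

Even a modest drop in per-cycle efficiency compounds over thirty cycles, which is one reason optimising conditions matters.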

Why it needs a heat-stable polymerase

DNA polymerases from most organisms, including those in human cells, would be destroyed by the 95°C denaturation step. PCR instead uses Taq polymerase, from Thermus aquaticus, a bacterium found in Yellowstone hot springs. Taq survives repeated 95°C steps and is active at 72°C. Biotechnology often borrows from extremophiles.

What PCR is used for

  • Testing a patient sample for a specific pathogen (the COVID-19 PCR test is technically RT-qPCR, because SARS-CoV-2 has an RNA genome)
  • Amplifying a gene before cloning it
  • Quantifying DNA (quantitative PCR, qPCR)
  • Checking whether a specific sequence is present
  • Forensic DNA analysis from tiny samples

Variants

  • qPCR (real-time PCR): measures the amount of DNA during amplification; useful for quantification
  • RT-PCR (reverse transcription PCR): converts RNA to DNA first, then amplifies it; used for RNA viruses and gene expression studies
  • Digital PCR: divides the sample into thousands of partitions, counts the positive ones, and gives absolute quantification
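Simply counting positive partitions in digital PCR undercounts slightly, because some partitions receive more than one molecule. A common correction assumes molecules are distributed at random (a Poisson model), so the mean number of copies per partition is λ = -ln(fraction of negative partitions). A minimal sketch, with `dpcr_copies` as a hypothetical helper name and illustrative counts:

```python
import math


def dpcr_copies(positive: int, total: int) -> float:
    """Estimate how many target copies were loaded into a digital PCR run.

    Assumes molecules land in partitions at random (Poisson), so some positive
    partitions held more than one copy. The mean copies per partition is
    lambda = -ln(fraction of negative partitions).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    negative_fraction = (total - positive) / total
    copies_per_partition = -math.log(negative_fraction)
    return copies_per_partition * total


if __name__ == "__main__":
    # Illustrative counts, not real data.
    print(round(dpcr_copies(5_000, 20_000)))  # about 5754, not 5000
```

A run where every partition is positive is saturated and carries no usable count, which is why the sketch rejects it.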

Gel Electrophoresis

Gel electrophoresis is the "sort" tool. Given a mix of DNA fragments of different sizes, it separates them so you can see which sizes are present.

How it works

  1. Mix your DNA with a loading buffer that makes it dense enough to sink into the well and contains a tracking dye
  2. Load into wells at one end of an agarose gel (a Jell-O-like material)
  3. Apply voltage; DNA is negatively charged and moves toward the positive electrode
  4. Smaller fragments move faster; larger ones move slower
  5. Stain and image; visible bands appear at each size

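The result: a pattern of bands at different positions, each corresponding to a different fragment size. A ladder of fragments of known size, run in a neighbouring lane, lets you read off approximate sizes.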
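Reading a size off a gel usually relies on the observation that, over a gel's useful range, migration distance is roughly linear in the logarithm of fragment size. A minimal sketch, assuming you have measured how far each ladder band travelled (the distances and sizes below are hypothetical), fits that line and interpolates an unknown band:

```python
import math


def fit_ladder(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    n = len(distances_mm)
    if n < 2 or n != len(sizes_bp):
        raise ValueError("need at least two ladder bands with matching sizes")
    logs = [math.log10(size) for size in sizes_bp]
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Predicted fragment size (bp) for a band that travelled distance_mm."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical ladder: band distances (mm) and their known sizes (bp).
    slope, intercept = fit_ladder([10, 25, 40], [10_000, 1_000, 100])
    print(round(estimate_size(32.5, slope, intercept)))  # about 316 bp
```

Interpolating between ladder bands is more trustworthy than extrapolating beyond the largest or smallest band, where the log-linear relationship breaks down.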

Why it matters

It's the quickest way to know what sizes of DNA you have. After PCR, you run a gel to confirm the product is the expected size. After a restriction digest, you run a gel to see what pieces you got.

Similar techniques exist for protein (SDS-PAGE, a polyacrylamide gel) and for RNA. Same principle: separate by size under an electric field.

Sanger Sequencing

Sanger sequencing is the "read" tool (the original one). Given a DNA fragment, Sanger sequencing produces the actual sequence of letters.

Invented by Fred Sanger in 1977 (his second Nobel Prize), Sanger sequencing was the workhorse for decades and is still used today for reading short fragments.

How it works

  1. Denature the DNA
  2. Add primer, polymerase, normal nucleotides, and a small amount of chain-terminating nucleotides (dideoxynucleotides, ddNTPs), each base labelled with a different fluorescent dye
  3. Polymerase extends from the primer; occasionally it incorporates a terminator, which stops extension and tags the fragment's last base with a base-specific colour
  4. Run the products through a capillary gel; they come out in order of size
  5. A detector reads the colours as they pass, giving the sequence letter by letter

Sanger produces reads of up to about 1,000 base pairs with high accuracy. For short fragments, it's cheap and precise.
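The ordering step is what turns fragments into a sequence: each fragment ends at the base where a terminator was incorporated, so sorting fragments by length and reading their end colours spells out the new strand. A toy sketch, assuming one fragment per length and a hypothetical dye-to-base mapping (real instruments use their own dye chemistry and base-calling software):

```python
# Hypothetical dye-to-base mapping; real instruments define their own.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_bases(fragments):
    """Reconstruct a read from (length, dye) pairs of terminated fragments.

    Each fragment ends at the base where a terminator was incorporated, so
    sorting by length (as the capillary does) gives the new strand 5' to 3'.
    Assumes exactly one fragment per length; real data needs peak calling.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


if __name__ == "__main__":
    # Toy fragments in the order they might come out of the reaction tube.
    print(call_bases([(3, "red"), (1, "green"), (4, "blue"), (2, "yellow")]))  # AGTC
```

The result is the newly synthesised strand, which is complementary to the template; the template sequence is its reverse complement.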

Next-Generation Sequencing

For large-scale sequencing, the field moved to next-generation sequencing (NGS) in the late 2000s. NGS sequences millions of fragments in parallel.

The dominant platform is Illumina:

  1. DNA is fragmented into short pieces (typically a few hundred bp)
  2. Fragments are attached to a flow cell and amplified in place into clusters
  3. Each cluster is sequenced by "sequencing-by-synthesis": each cycle adds one fluorescently labelled nucleotide, the flow cell is imaged, then the label is removed
  4. Up to billions of short reads are produced per run on the largest instruments
  5. Software assembles or maps the reads to a reference

Illumina reads are short but very cheap. A full human genome at 30x coverage (each position read 30 times) costs under $500 in high-throughput runs.
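Coverage is simple arithmetic: total sequenced bases divided by genome size. A sketch, assuming a genome of about 3.1 billion bp and 150 bp reads (both illustrative), estimates how many reads 30x needs; the second function uses the idealised Lander-Waterman (Poisson) estimate that a fraction e^(-coverage) of bases receives no reads at all, which real libraries only approximate because coverage is uneven.

```python
import math


def reads_needed(genome_bp: float, coverage: float, read_len_bp: int) -> int:
    """Reads required so that total sequenced bases = coverage x genome size."""
    return math.ceil(genome_bp * coverage / read_len_bp)


def fraction_uncovered(coverage: float) -> float:
    """Idealised Lander-Waterman (Poisson) estimate of bases with zero reads.

    Assumes reads land uniformly at random; real libraries have uneven
    coverage, so true gaps are larger than this suggests.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    # Assumed values: ~3.1 Gb human genome, 150 bp reads, 30x coverage.
    print(reads_needed(3.1e9, 30, 150))  # 620000000
    print(fraction_uncovered(30))
```

Six hundred million reads is a fraction of one high-throughput run, which is why per-genome cost has fallen so far.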

Long-read sequencing

Illumina reads are short (150-300 bp), which makes assembly hard in repetitive regions. Long-read sequencing technologies (Oxford Nanopore, PacBio) produce reads of 10,000+ bp, long enough to span repeats and structural variants.

Nanopore: DNA passes through a nanopore, and changes in the electrical current across the pore indicate which bases are passing through. It can run on a USB-sized device. Accuracy is lower than Illumina but improving; read length is the advantage.

What sequencing is used for

  • Research (comparing genomes, finding disease variants)
  • Clinical diagnostics (whole-genome sequencing for rare disease, cancer genome profiling)
  • Microbiome studies (sequencing all the bacteria in a sample)
  • Forensic identification
  • Ancient DNA (Neanderthal genome, historical specimens)
  • SARS-CoV-2 tracking during the pandemic

Cloning

Cloning (in biotech) means getting a specific DNA fragment into a vector and reproducing it in cells. This is different from cloning a whole animal, a separate and far more elaborate process.

The basic workflow

  1. Cut out your DNA of interest with restriction enzymes (or amplify it by PCR)
  2. Cut your vector (usually a plasmid, a small circular DNA that replicates in bacteria) with the same restriction enzymes
  3. Ligate (join) the insert into the vector using DNA ligase
  4. Transform the vector into bacteria (the DNA enters some cells)
  5. Select for cells that took up the vector (usually using antibiotic resistance)
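  6. Grow colonies; each colony grows from a single transformed cell, so all its cells carry the same plasmid (ideally with your insert, though some colonies carry empty vector and need screening)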

The end result: lots of cells, all containing your DNA fragment. You can then isolate the fragment in large amounts, modify it, express the encoded protein, or do further engineering.
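One calculation that comes up at the ligation step is how much insert to add. Equal masses of DNA contain fewer molecules of the longer piece, so insert and vector are usually mixed at a chosen molar ratio. A sketch, assuming a 3:1 insert-to-vector molar ratio (a common starting point, not something this text specifies); `insert_ng_for_ligation` is a hypothetical helper:

```python
def insert_ng_for_ligation(vector_ng: float, vector_bp: int, insert_bp: int,
                           molar_ratio: float = 3.0) -> float:
    """Mass of insert giving `molar_ratio` insert molecules per vector molecule.

    Equal masses of DNA contain fewer molecules of the longer piece, so the
    insert mass is scaled by the length ratio as well as the molar ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("lengths must be positive")
    return vector_ng * insert_bp * molar_ratio / vector_bp


if __name__ == "__main__":
    # Hypothetical amounts: 50 ng of a 3,000 bp vector, 1,000 bp insert, 3:1.
    print(insert_ng_for_ligation(50, 3000, 1000))  # 50.0
```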

Restriction enzymes and ligases

  • Restriction enzymes: cut DNA at specific sequences (e.g. EcoRI cuts at GAATTC). They evolved in bacteria as defence against viral DNA. Biotech borrowed them in the 1970s
  • DNA ligase: joins DNA ends together

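Both are standard laboratory reagents. Modern cloning often uses more flexible methods: Gibson assembly joins fragments through overlapping ends without restriction enzymes, and Golden Gate cloning uses Type IIS enzymes that cut outside their recognition site, so the junctions are not limited to the enzyme's own sequence.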
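A classic restriction digest can be simulated by scanning for the recognition site. A minimal sketch for linear DNA, assuming a palindromic site cut at a fixed offset on the top strand; EcoRI cuts between the G and the first A of GAATTC, which is the default below. `digest` is a hypothetical helper, not a library function:

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut linear DNA at every occurrence of a palindromic recognition site.

    cut_offset is where the enzyme cuts the top strand, counted from the start
    of the site; EcoRI cuts G^AATTC, so the default offset is 1. Only the top
    strand is returned, and the DNA is treated as linear, not circular.
    """
    seq = seq.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)  # keep scanning past this site
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    pieces = digest("TTGAATTCAAAAGAATTCTT")
    print(pieces)                    # ['TTG', 'AATTCAAAAG', 'AATTCTT']
    print([len(p) for p in pieces])  # fragment sizes you would expect on a gel
```

For a circular plasmid, the first and last fragments would really be one piece, so a sketch like this would need to join them.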

Why this is central

Almost any biotech application that involves a specific DNA sequence starts with a cloned version of it. Want to make a recombinant protein? Clone the gene. Want to make a CRISPR knockout? Clone the guide RNA into a plasmid. Want to engineer a pathway? Clone its genes into a vector.

Mass Spectrometry (for Proteins)

Mass spec identifies and quantifies proteins.

How it works (simplified):

  1. Digest proteins into peptides with a protease (usually trypsin, which cuts after lysine and arginine)
  2. Ionise the peptides and accelerate them through a mass spectrometer
  3. Measure the mass-to-charge ratio of each ion
  4. Match the masses against a database of known peptides

Mass spec identifies proteins in a sample, quantifies their relative amounts, and can detect modifications like phosphorylation.
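Steps 1, 3 and 4 can be illustrated with an in-silico digest. The sketch assumes trypsin, which cleaves after lysine (K) or arginine (R) unless the next residue is proline, and uses standard monoisotopic residue masses; the tolerance and the toy protein are hypothetical, and real search engines match fragment spectra rather than whole-peptide masses alone.

```python
# Standard monoisotopic residue masses (Da) for the 20 amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini
PROTON = 1.007276  # mass added per positive charge


def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R, except when the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """Mass-to-charge ratio of the peptide carrying `charge` extra protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def match_peaks(observed_mz: float, charge: int, proteins: dict[str, str],
                tol: float = 0.01) -> list[tuple[str, str]]:
    """Return (protein name, peptide) pairs whose predicted m/z is within tol."""
    hits = []
    for name, sequence in proteins.items():
        for peptide in trypsin_digest(sequence):
            if abs(mz(peptide, charge) - observed_mz) <= tol:
                hits.append((name, peptide))
    return hits


if __name__ == "__main__":
    toy_db = {"toy_protein": "AKPGRFK"}  # hypothetical database entry
    print(trypsin_digest("AKPGRFK"))     # ['AKPGR', 'FK']
    print(match_peaks(mz("FK", 2), 2, toy_db))  # [('toy_protein', 'FK')]
```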

Proteomics is to proteins what genomics is to DNA. It's less mature: proteins can't be amplified like DNA, so sensitivity is the limit. Even so, mass spec has become much more sensitive and much higher-throughput over the last decade.

Cell Culture

Growing cells outside an organism. Two broad kinds:

Primary cells

Taken directly from tissue. Limited lifespan (usually 20-50 divisions before senescence). More biologically relevant but less convenient.
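That division limit is usually tracked as cumulative population doublings. A sketch, assuming cells are counted when seeded and when harvested at each passage (a common practice not described above; the helper names are hypothetical). It ignores cells that died during a passage, so it slightly underestimates true divisions.

```python
import math


def population_doublings(cells_seeded: float, cells_harvested: float) -> float:
    """Doublings during one passage: log2(harvested / seeded).

    Cells that died during the passage are not counted, so this slightly
    underestimates the true number of divisions.
    """
    if cells_seeded <= 0 or cells_harvested <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(cells_harvested / cells_seeded)


def cumulative_doublings(passages) -> float:
    """Sum doublings over a list of (seeded, harvested) counts, one per passage."""
    return sum(population_doublings(seeded, harvested) for seeded, harvested in passages)


if __name__ == "__main__":
    # Hypothetical counts for two passages.
    print(cumulative_doublings([(1e5, 8e5), (2e5, 8e5)]))  # 5.0
```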

Immortalised cell lines

Able to divide indefinitely, either because they came from a tumour (like HeLa) or because they were deliberately modified. Convenient but can drift genetically over time.

Famous cell lines:

  • HeLa cells: cervical cancer cells from Henrietta Lacks, taken in 1951 without her consent. Still widely used; their ethics remain contested
  • HEK 293: human embryonic kidney cells. Workhorse for recombinant protein expression
  • CHO (Chinese hamster ovary): the dominant host for therapeutic antibody production
  • E. coli: strictly a bacterium rather than a cell line, but the go-to host for cloning and simple protein expression; cheap and easy

Cell culture is the substrate for most molecular biology experiments. Modern biotech lives in dishes and flasks.

Western Blot

A classic technique for detecting specific proteins:

  1. Run protein samples on an SDS-PAGE gel (separates by size)
  2. Transfer the proteins to a membrane
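  3. Block the membrane's remaining protein-binding sites (e.g. with milk protein or BSA) so antibodies don't stick nonspecifically, then probe with a primary antibody specific to the protein of interest
  4. Wash, then add a secondary antibody that recognises the primary and is tagged with an enzyme or fluorophore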
  5. Image; the protein shows up as a band at its expected size

Western blots answer "is my protein present, and in what amount?". Not as quantitative as mass spec but cheaper and more common.
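In practice, "in what amount" is usually answered relatively: the target band's intensity is divided by a loading-control band from the same lane (for example a housekeeping protein), then compared with a reference lane. The loading control is an assumption not covered above, and the intensities below are hypothetical:

```python
def relative_expression(target, loading_control, reference_lane=0):
    """Relative amount of the target protein in each lane.

    Each lane's target band intensity is divided by its loading-control band
    (correcting for how much total protein was loaded), then every lane is
    expressed relative to the reference lane.
    """
    if len(target) != len(loading_control):
        raise ValueError("need one loading-control value per lane")
    normalised = [t / c for t, c in zip(target, loading_control)]
    reference = normalised[reference_lane]
    return [value / reference for value in normalised]


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) for three lanes.
    print(relative_expression([100, 300, 150], [50, 50, 100]))  # [1.0, 3.0, 0.75]
```

Band intensity is only roughly proportional to protein amount, which is part of why Westerns are less quantitative than mass spec.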

Related techniques:

  • Northern blot: for RNA
  • Southern blot: for DNA (rare these days; PCR replaced most uses)

The Workflow in Practice

A typical recombinant-protein experiment might involve:

  1. Clone a target gene into an expression vector
  2. Transform into E. coli; select colonies
  3. Extract plasmid; sequence it to confirm the insert is correct
  4. Transfect into mammalian cells (the term for introducing DNA into animal cells); induce protein expression
  5. Purify the protein; run a gel to confirm size and purity
  6. Mass spec to confirm identity
  7. Functional assays: enzyme activity, binding, cell effects

Each step draws on several of the tools above. A PhD student might spend months on one experiment involving all of them.
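The sequence check in step 3 often comes down to comparing the sequencing read with the expected insert. A minimal sketch, assuming the two sequences are already aligned and the same length (real tools align the read first and handle insertions and deletions); `mismatches` is a hypothetical helper:

```python
def mismatches(expected: str, read: str) -> list[tuple[int, str, str]]:
    """Differences between an expected insert and an aligned read of equal length.

    Returns (1-based position, expected base, observed base) for each difference.
    An uncalled base (N) in the read is reported too, since it is not confirmed.
    """
    if len(expected) != len(read):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return [
        (position, want, got)
        for position, (want, got) in enumerate(zip(expected.upper(), read.upper()), start=1)
        if want != got
    ]


if __name__ == "__main__":
    print(mismatches("ATGGCCAAG", "ATGACCANG"))  # [(4, 'G', 'A'), (8, 'A', 'N')]
```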

Common Pitfalls

"PCR is simple." In principle. In practice, designing good primers, avoiding contamination, optimising conditions for a specific sequence, and interpreting results take training

"Sequencing gives the answer." Sequencing gives data. Interpreting it (variant calling, structural variants, epigenetic context) is a whole field. Raw sequence is a starting point

"Cloning is a single technique." There are dozens of variants. The right one depends on your insert, your vector, and what you're trying to do

"Gel bands are clean." They often aren't. Smears, non-specific bands, and degraded DNA are common. Interpreting gels takes practice

"Immortalised cells are 'normal'." They aren't. Decades in culture plus the immortalising mutations make them unreliable stand-ins for the original tissue. Results from cell lines don't always replicate in primary cells or in animals

Next Steps

Continue to 09-crispr-and-gene-editing.md for the tool that has changed what's routinely possible in a lab.