Proteins in Action: Enzymes, Structure, Signals
Proteins Do the Work
If DNA is the archive and RNA is the courier, proteins are the workforce. Almost every active process in a cell is carried out by a protein. A typical human cell makes many thousands of different proteins, some of which are made only in particular cell types.
Proteins are made from chains of amino acids, and how they fold determines what they do.
The 20 Amino Acids
Biology uses 20 standard amino acids (plus two rare additions, selenocysteine and pyrrolysine, used in specific contexts). Each shares the same backbone but has a different side chain that gives it distinct chemistry:
Hydrophobic ("water-avoiding"): Ala, Val, Leu, Ile, Met, Phe, Trp, Pro
Polar / uncharged: Gly, Ser, Thr, Cys, Tyr, Asn, Gln
Positively charged: Lys, Arg, His
Negatively charged: Asp, Glu
The three-letter codes are standard, and biochemists usually recognise them on sight; you don't need to memorise them. It is worth knowing, though, that the chemistry of the side chain (charged, hydrophobic, or polar) largely determines how an amino acid interacts with others in a folded protein.
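As a small illustration of how these groupings can be used, the Python sketch below tallies the side-chain classes in a sequence written with three-letter codes. The class assignments copy the lists above; the helper name `side_chain_profile` is hypothetical, and real analyses often treat borderline residues such as His, Gly, Cys, and Tyr differently.

```python
from collections import Counter

# Side-chain classes, copied from the groupings listed above.
SIDE_CHAIN_CLASS = {
    **{aa: "hydrophobic" for aa in ("Ala", "Val", "Leu", "Ile", "Met", "Phe", "Trp", "Pro")},
    **{aa: "polar" for aa in ("Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln")},
    **{aa: "positive" for aa in ("Lys", "Arg", "His")},
    **{aa: "negative" for aa in ("Asp", "Glu")},
}


def side_chain_profile(residues):
    """Count how many residues fall into each side-chain class (hypothetical helper)."""
    counts = Counter(SIDE_CHAIN_CLASS[r] for r in residues)  # KeyError on unknown codes
    # Report every class, including ones that do not occur in the sequence.
    return {cls: counts.get(cls, 0) for cls in ("hydrophobic", "polar", "positive", "negative")}


if __name__ == "__main__":
    peptide = ["Lys", "Ala", "Glu", "Leu", "Ser"]
    print(side_chain_profile(peptide))
    # {'hydrophobic': 2, 'polar': 1, 'positive': 1, 'negative': 1}
```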
The Four Levels of Structure
Primary: the linear sequence of amino acids, as encoded in DNA
Secondary: local folding patterns such as alpha-helices and beta-sheets
Tertiary: the overall 3D shape of one protein chain
Quaternary: how multiple protein chains fit together (when they do)
Primary
Just the sequence, written left to right from the amino end (N-terminus) to the carboxyl end (C-terminus). Because an average amino acid residue weighs about 110 daltons, a sequence of, say, 300 amino acids corresponds to a protein of roughly 33 kilodaltons.
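The arithmetic behind that estimate can be sketched in Python. The rough version multiplies the residue count by about 110 Da; the finer version sums standard average residue masses (rounded to two decimals) and adds one water molecule (about 18.02 Da) for the free termini. The function names are illustrative, and the sketch ignores modifications such as glycosylation or disulphide bonds.

```python
# Average residue masses in daltons (free amino acid minus one water), one-letter codes.
RESIDUE_MASS_DA = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_DA = 18.02  # the N- and C-termini add back one water's worth of mass


def rough_mass_kda(n_residues, avg_residue_da=110.0):
    """Back-of-envelope mass: n residues at ~110 Da each, plus one water, in kDa."""
    return (n_residues * avg_residue_da + WATER_DA) / 1000.0


def sequence_mass_da(sequence):
    """Sum of average residue masses for a one-letter sequence, plus one water, in Da."""
    return sum(RESIDUE_MASS_DA[aa] for aa in sequence.upper()) + WATER_DA


if __name__ == "__main__":
    print(round(rough_mass_kda(300), 1))     # 300 residues -> about 33.0 kDa
    print(round(sequence_mass_da("GA"), 2))  # Gly-Ala dipeptide, in Da
```

The rough estimate is usually good to within a few percent for typical proteins; the per-residue sum matters more for short peptides or unusual compositions.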
The primary sequence determines the folded shape (Anfinsen's dogma, 1961): given the right conditions, a protein folds to its native shape spontaneously.
Secondary
Local patterns stabilised by hydrogen bonds between backbone atoms:
- Alpha helix: a right-handed spiral
- Beta sheet: extended strands lying side by side, forming a flat or slightly twisted, pleated sheet
- Turns and loops: the connectors
These patterns are the building blocks of the larger fold.
Tertiary
The overall 3D shape of a single chain, held together by many weak interactions: hydrogen bonds, hydrophobic interactions, salt bridges, sometimes disulphide bonds between cysteines. The shape is what determines the function.
Quaternary
Some proteins work as assemblies of multiple chains. Haemoglobin has four subunits (two alpha, two beta). Antibodies have four chains (two heavy, two light). The ribosome has dozens of protein chains, assembled together with its RNA. The assembly structure is the quaternary level.
The Folding Problem
For decades, predicting a protein's folded shape from its sequence was an open problem. Sequences are easy to read; structures were hard to solve.
Experimental methods:
- X-ray crystallography: grow a crystal of the protein, shoot X-rays at it, and work back from the diffraction pattern to the structure. The historical workhorse, but crystallising a protein can be difficult and slow
- NMR spectroscopy: works in solution; limited to smaller proteins
- Cryo-electron microscopy (cryo-EM): freeze samples, image with electron microscopes; revolutionised the field in the 2010s
Computational prediction:
- AlphaFold (DeepMind, 2020-2022): a deep learning model that predicts protein structure from sequence, often with accuracy comparable to experimentally determined structures
- AlphaFold 3 (2024): extends prediction to complexes of proteins with other proteins, nucleic acids, and small molecules
- RoseTTAFold and other models: alternatives with various strengths
This is one of the most important biotech developments of the last decade. Before AlphaFold, figuring out a protein structure could take years. Now, a reasonable prediction takes minutes. Experimental methods are still needed for many cases, but the landscape has shifted.
What Proteins Do
Roughly sorted by function:
Enzymes
Biological catalysts: proteins that speed up specific chemical reactions by many orders of magnitude without being consumed.
Examples:
- DNA polymerase: copies DNA
- ATP synthase: makes ATP, the cell's energy currency, from ADP and phosphate
- Lactase: breaks down lactose in milk
- Amylase (found in saliva): breaks down starch
- Pepsin, trypsin, chymotrypsin: digestive enzymes
- Kinases: add phosphate groups to other proteins (there are 500+ human kinases; many are drug targets)
Enzymes work by binding their substrate (the molecule they act on) in a specific pocket, stabilising the transition state of the reaction, and releasing the product. Enzyme specificity is extraordinary: lactase splits lactose into glucose and galactose while leaving most other sugars untouched.
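One standard way to put numbers on "bind the substrate, then release the product" is the Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]). It is not covered in the text above and is added here only as an illustration of saturation: once most enzyme pockets are occupied, adding more substrate barely raises the rate. The parameter values below are hypothetical.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial reaction rate v = Vmax * [S] / (Km + [S]).

    substrate: substrate concentration [S] (same units as km)
    vmax: maximum rate when the enzyme is saturated
    km: substrate concentration at which the rate is half of vmax
    """
    if substrate < 0 or km <= 0:
        raise ValueError("need substrate >= 0 and km > 0")
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    vmax, km = 10.0, 2.0  # hypothetical values
    for s in (0.5, 2.0, 20.0, 200.0):
        v = michaelis_menten_rate(s, vmax, km)
        print(f"[S] = {s:>6}: v = {v:.2f} ({v / vmax:.0%} of Vmax)")
```

At [S] = Km the rate is exactly half of Vmax; at 100 × Km it is about 99% of Vmax, which is the saturation behaviour the model is meant to show.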
Structural proteins
Physical scaffolding:
- Collagen: the most abundant protein in mammals; forms tendons, skin, and connective tissue
- Keratin: hair, nails, the outer layer of skin
- Actin and tubulin: major building blocks of the cell's cytoskeleton
- Elastin: gives skin and lungs elasticity
Transport proteins
Carry things:
- Haemoglobin: carries oxygen in blood
- Albumin: carries various hydrophobic molecules in blood
- Membrane transporters: move specific molecules across cell membranes
- Ion channels: let specific ions through
Signalling proteins
The cell's communication system:
- Receptors: most sit on the cell surface and detect signals from outside (hormones, neurotransmitters, cytokines); some, such as steroid hormone receptors, sit inside the cell
- G proteins and other intracellular messengers
- Protein hormones: insulin, growth hormone, leptin
Roughly a third of approved drugs target G-protein-coupled receptors (GPCRs), a superfamily with over 800 members.
Motor proteins
Proteins that convert chemical energy (usually ATP) into mechanical work:
- Myosin: drives muscle contraction
- Kinesin: walks along microtubules, usually toward the plus end, hauling cargo in cells
- Dynein: also walks along microtubules, but toward the minus end (the opposite direction)
Defensive proteins
- Antibodies (immunoglobulins): bind specifically to molecular targets (antigens) on invaders
- Complement system proteins: part of the innate immune response
Regulation: Proteins Get Turned On and Off
Proteins don't just exist; they're regulated constantly. A protein is often inactive until modified:
- Phosphorylation: adding a phosphate group (done by kinases, reversed by phosphatases). Flips many proteins on or off
- Ubiquitination: attaching ubiquitin, a small protein, as a tag; chains of ubiquitin usually mark a protein for destruction by the proteasome
- Cleavage: some proteins are made as inactive precursors and activated by cleaving off a piece (insulin, many digestive enzymes)
- Allostery: the protein's shape changes when something binds to it, changing its activity
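Because kinases add phosphate and phosphatases remove it, the share of a protein that is "on" often reflects a balance between the two. A minimal sketch, assuming simple first-order rates (a simplification of real enzyme kinetics) and hypothetical rate constants: the phosphorylated fraction p changes as dp/dt = k_kin·(1 − p) − k_phos·p, which settles at p = k_kin / (k_kin + k_phos). The code integrates this with small Euler steps and compares the result with that formula.

```python
def simulate_phospho_fraction(k_kin, k_phos, p0=0.0, dt=0.001, steps=20000):
    """Euler integration of dp/dt = k_kin * (1 - p) - k_phos * p.

    p is the fraction of protein molecules currently phosphorylated.
    k_kin, k_phos: hypothetical first-order rate constants (per unit time)
    for the kinase and the phosphatase.
    """
    p = p0
    for _ in range(steps):
        p += dt * (k_kin * (1.0 - p) - k_phos * p)
    return p


def steady_state_fraction(k_kin, k_phos):
    """Analytical steady state, where kinase and phosphatase fluxes are equal."""
    return k_kin / (k_kin + k_phos)


if __name__ == "__main__":
    for k_kin in (0.1, 1.0, 10.0):  # turning the kinase activity up
        sim = simulate_phospho_fraction(k_kin, k_phos=1.0)
        exact = steady_state_fraction(k_kin, 1.0)
        print(f"k_kin={k_kin}: simulated {sim:.3f}, steady state {exact:.3f}")
```

Raising kinase activity relative to phosphatase activity shifts the balance toward the phosphorylated state, which is one simple way to picture a protein being "switched on".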
Whole signalling cascades work on phosphorylation: a signal hits a receptor, the receptor activates a kinase, the kinase phosphorylates another kinase, and so on. Some cascades amplify signals millions of times.
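The amplification is multiplicative: if each active molecule at one stage activates many molecules at the next, the per-stage gains multiply. The gains below are hypothetical round numbers, chosen to show how three stages of about 100-fold each already reach a million; real cascades vary widely.

```python
import math


def cascade_amplification(gains):
    """Overall amplification of a cascade: the product of per-stage gains.

    gains[i] is the (hypothetical) number of downstream molecules activated
    by one active molecule at stage i.
    """
    return math.prod(gains)


if __name__ == "__main__":
    stages = [100, 100, 100]  # e.g. receptor -> kinase A -> kinase B -> kinase C
    print(f"{cascade_amplification(stages):,}-fold")  # 1,000,000-fold
```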
Misfolding and Disease
When proteins misfold, things go wrong:
- Alzheimer's disease: amyloid-beta plaques and tau tangles, both misfolded proteins
- Parkinson's disease: misfolded alpha-synuclein aggregates
- Prion diseases (CJD, BSE or "mad cow disease"): misfolded prion proteins that induce normal prion proteins to misfold; functionally "infectious" proteins
- Cystic fibrosis: misfolded CFTR protein, degraded before it reaches the cell surface
- Sickle cell disease: a single amino acid change (glutamate to valine at position 6 of the beta chain) makes haemoglobin clump into long polymers when it gives up its oxygen
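Sickle cell disease shows how much one substitution can matter: in sickle cell haemoglobin, the glutamate at position 6 of the beta chain is replaced by valine. The Python sketch below compares a normal and a variant sequence, reports substitutions, and labels each residue with the side-chain classes from "The 20 Amino Acids". The fragment is the start of the mature beta chain (Val-His-Leu-Thr-Pro-Glu-Glu-Lys); the helper names are hypothetical, and the comparison assumes the two sequences are already aligned.

```python
# One-letter side-chain classes, following the groupings in "The 20 Amino Acids".
CLASS_BY_CODE = {
    **dict.fromkeys("AVLIMFWP", "hydrophobic"),
    **dict.fromkeys("GSTCYNQ", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def find_substitutions(reference, variant):
    """Return (position, ref_aa, var_aa) for each mismatch, using 1-based positions.

    Assumes both sequences are aligned and the same length (no insertions/deletions).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


if __name__ == "__main__":
    normal = "VHLTPEEK"  # start of the mature beta chain (HbA)
    sickle = "VHLTPVEK"  # same fragment in sickle cell haemoglobin (HbS)
    # prints "E6V: negative -> hydrophobic"
    for pos, ref, var in find_substitutions(normal, sickle):
        print(f"{ref}{pos}{var}: {CLASS_BY_CODE[ref]} -> {CLASS_BY_CODE[var]}")
```

The change swaps a negatively charged side chain for a hydrophobic one, which is the kind of shift the side-chain classes help flag.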
Treating misfolding is hard. Some drugs work as "chaperones" to help folding. Some clear aggregates. Some prevent aggregation. It's an active area with mixed success.
Proteins as Drug Targets
Most drugs act on proteins. The classic categories:
- Enzyme inhibitors: block an enzyme's active site (aspirin inhibits COX enzymes; statins inhibit HMG-CoA reductase)
- Receptor agonists: bind to receptors and activate them (opioid drugs like morphine)
- Receptor antagonists: bind to receptors and block them (beta-blockers block beta-adrenergic receptors)
- Ion channel modulators: change how ion channels open or close (many anaesthetics)
And the biologics:
- Monoclonal antibodies: custom antibodies that bind specific targets (trastuzumab, nivolumab, adalimumab)
- Protein drugs: replacement proteins like insulin, growth hormone, erythropoietin
- Fusion proteins: engineered proteins combining functional domains (etanercept)
A majority of pharmaceutical R&D is about finding, optimising, and testing molecules that act on specific proteins.
The Proteome
The proteome is the full set of proteins produced by an organism, cell, or tissue. It's far more dynamic than the genome:
- Genome: roughly fixed (mutations aside)
- Proteome: changes minute by minute, and differs between tissues and conditions
Studying the proteome is harder than studying the genome because you can't amplify proteins the way you can amplify DNA. The main tools:
- Mass spectrometry: digest proteins into peptides (typically with trypsin), measure their masses and fragmentation patterns, and match these against sequence databases
- Antibody-based assays (Western blots, ELISAs, immunohistochemistry)
- Protein arrays: thousands of antibodies in a grid, used to profile samples
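Mass spectrometry workflows typically start by cutting proteins into peptides, most often with trypsin, which is commonly described as cleaving after lysine (K) or arginine (R) unless the next residue is proline (P). The sketch applies that simplified rule to a one-letter sequence; real search tools also allow for missed cleavages and other exceptions. The function name and the example sequence are made up for illustration.

```python
def tryptic_digest(sequence):
    """Split a one-letter protein sequence using the simplified trypsin rule:
    cut after K or R, except when the next residue is P.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide without a C-terminal K/R
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    toy = "AKPLRGKEELR"  # made-up sequence; the K before P is not cut
    print(tryptic_digest(toy))  # ['AKPLR', 'GK', 'EELR']
```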
Proteomics has not yet had the "genomics moment" where the tools got so cheap that everyone could do it. It's closer than it was.
Common Pitfalls
"All proteins do one thing." Many proteins are multi-functional, especially in different contexts. A protein might work as an enzyme in one cell type and a transcription factor in another
"Protein sequence determines structure." It does, at the fundamental level. It does not always in practice, because cellular context matters: chaperones, crowding, modifications. AlphaFold predicts the native fold; it doesn't tell you what happens in vivo all the time
"Bigger proteins are more sophisticated." Not particularly. Titin is the largest known protein (34,000+ amino acids) and its job is mechanical: it's muscle tissue scaffolding. Small proteins can be highly sophisticated
"Enzymes work in one direction only." Many reactions are reversible. The enzyme favours the direction determined by substrate concentrations, not anything built into itself
"Folding is fast." Small proteins fold in milliseconds. Large ones can take much longer and may need chaperones to reach their native shape. Misfolding is common
Next Steps
Continue to 06-genetics.md for how traits pass between generations.