The Competitiveness of Nations in a Global Knowledge-Based Economy

Matt Ridley

Genome: The Autobiography of a Species in 23 Chapters

HarperCollins, NYC, 1999

Content

The human genome - the complete set of human genes - comes packaged in twenty-three separate pairs of chromosomes. Of these, twenty-two pairs are numbered in approximate order of size, from the largest (number 1) to the smallest (number 22), while the remaining pair consists of the sex chromosomes: two large X chromosomes in women, one X and one small Y in men. In size, the X comes between chromosomes 7 and 8, whereas the Y is the smallest.

The number 23 is of no significance. Many species, including our closest relatives among the apes, have more chromosomes, and many have fewer. Nor do genes of similar function and type necessarily cluster on the same chromosome. So a few years ago, leaning over a lap-top computer talking to David Haig, an evolutionary biologist, I was slightly startled to hear him say that chromosome 19 was his favourite chromosome. It has all sorts of mischievous genes on it, he explained. I had never thought of chromosomes as having personalities before. They are, after all, merely arbitrary collections of genes. But Haig’s chance remark planted an idea in my head and I could not get it out. Why not try to tell the unfolding story of the human genome, now being discovered in detail for the first time, chromosome by chromosome, by picking a gene from each chromosome to fit the story as it is told? Primo Levi did something similar with the periodic table of the elements in his autobiographical short stories. He related each chapter of his life to an element, one that he had had some contact with during the period he was describing.

I began to think about the human genome as a sort of autobiography in its own right - a record, written in ‘genetish’, of all the vicissitudes and inventions that had characterised the history of our species and its ancestors since the very dawn of life. There are genes that have not changed much since the very first single-celled creatures populated the primeval ooze. There are genes that were developed when our ancestors were worm-like. There are genes that must have first appeared when our ancestors were fish. There are genes that exist in their present form only because of recent epidemics of disease. And there are genes that can be used to write the history of human migrations in the last few thousand years. From four billion years ago to just a few hundred years ago, the genome has been a sort of autobiography for our species, recording the important events as they occurred.

I wrote down a list of the twenty-three chromosomes and next to each I began to list themes of human nature. Gradually and painstakingly I began to find genes that were emblematic of my story. There were frequent frustrations when I could not find a suitable gene, or when I found the ideal gene and it was on the wrong chromosome. There was the puzzle of what to do with the X and Y chromosomes, which I have placed after chromosome 7, as befits the X chromosome’s size. You now know why the last chapter of a book that boasts in its subtitle that it has twenty-three chapters is called Chapter 22.

It is, at first glance, a most misleading thing that I have done. I may seem to be implying that chromosome 1 came first, which it did not. I may seem to imply that chromosome 11 is exclusively concerned with human personality, which it is not. There are probably 60,000-80,000 genes in the human genome and I could not tell you about all of them, partly because fewer than 8,000 have been found (though the number is growing by several hundred a month) and partly because the great majority of them are tedious biochemical middle managers.

But what I can give you is a coherent glimpse of the whole: a whistle-stop tour of some of the more interesting sites in the genome and what they tell us about ourselves. For we, this lucky generation, will be the first to read the book that is the genome. Being able to read the genome will tell us more about our origins, our evolution, our nature and our minds than all the efforts of science to date. It will revolutionise anthropology, psychology, medicine, palaeontology and virtually every other science. This is not to claim that everything is in the genes, or that genes matter more than other factors. Clearly, they do not. But they matter, that is for sure.

This is not a book about the Human Genome Project - about mapping and sequencing techniques - but a book about what that project has found. Some time in the year 2000, we shall probably have a rough first draft of the complete human genome. In just a few short years we will have moved from knowing almost nothing about our genes to knowing everything. I genuinely believe that we are living through the greatest intellectual moment in history. Bar none. Some may protest that the human being is more than his genes. I do not deny it. There is much, much more to each of us than a genetic code. But until now human genes were an almost complete mystery. We will be the first generation to penetrate that mystery. We stand on the brink of great new answers but, even more, of great new questions. This is what I have tried to convey in this book.

Primer

The second part of this preface is intended as a brief primer, a sort of narrative glossary, on the subject of genes and how they work. I hope that readers will glance through it at the outset and return to it at intervals if they come across technical terms that are not explained. Modern genetics is a formidable thicket of jargon. I have tried hard to use the bare minimum of technical terms in this book, but some are unavoidable.

The human body contains approximately 100 trillion (million million) CELLS, most of which are less than a tenth of a millimetre across. Inside each cell there is a black blob called a NUCLEUS. Inside the nucleus are two complete sets of the human GENOME (except in egg cells and sperm cells, which have one copy each, and red blood cells, which have none). One set of the genome came from the mother and one from the father. In principle, each set includes the same 60,000-80,000 GENES on the same twenty-three CHROMOSOMES. In practice, there are often small and subtle differences between the paternal and maternal versions of each gene, differences that account for blue eyes or brown, for example. When we breed, we pass on one complete set, but only after swapping bits of the paternal and maternal chromosomes in a procedure known as RECOMBINATION.

Imagine that the genome is a book.

There are twenty-three chapters, called CHROMOSOMES.

Each chapter contains several thousand stories, called GENES.

Each story is made up of paragraphs, called EXONS, which are interrupted by advertisements called INTRONS.

Each paragraph is made up of words, called CODONS. Each word is written in letters called BASES.

There are one billion words in the book, which makes it longer than 5,000 volumes the size of this one, or as long as 800 Bibles. If I read the genome out to you at the rate of one word per second for eight hours a day, it would take me a century. If I wrote out the human genome, one letter per millimetre, my text would be as long as the River Danube. This is a gigantic document, an immense book, a recipe of extravagant length, and it all fits inside the microscopic nucleus of a tiny cell that fits easily upon the head of a pin.
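The figures in this paragraph can be checked with a little arithmetic. The sketch below is only a rough check, not part of the original text: it assumes three letters per word (as the primer goes on to explain), a 365.25-day year, and uses the hypothetical helper names `reading_years` and `length_km`.

```python
# Rough check of the reading-time and length figures quoted in the text.
WORDS = 1_000_000_000            # "one billion words"
LETTERS = WORDS * 3              # assumption: every word has three letters
SECONDS_PER_DAY = 8 * 60 * 60    # reading for eight hours a day


def reading_years(words: int = WORDS) -> float:
    """Years needed to read the genome aloud at one word per second."""
    days = words / SECONDS_PER_DAY
    return days / 365.25


def length_km(letters: int = LETTERS, mm_per_letter: float = 1.0) -> float:
    """Length of the written-out genome, converted from millimetres to km."""
    return letters * mm_per_letter / 1_000_000


print(f"{reading_years():.0f} years")   # about 95 years, close to a century
print(f"{length_km():.0f} km")          # 3000 km
```

The Danube is usually given as roughly 2,850 km long, so both comparisons hold up as round-number approximations.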

The idea of the genome as a book is not, strictly speaking, even a metaphor. It is literally true. A book is a piece of digital information, written in linear, one-dimensional and one-directional form and defined by a code that transliterates a small alphabet of signs into a large lexicon of meanings through the order of their groupings. So is a genome. The only complication is that all English books read from left to right, whereas some parts of the genome read from left to right, and some from right to left, though never both at the same time.

(Incidentally, you will not find the tired word ‘blueprint’ in this book, after this paragraph, for three reasons. First, only architects and engineers use blueprints and even they are giving them up in the computer age, whereas we all use books. Second, blueprints are very bad analogies for genes. Blueprints are two-dimensional maps, not one-dimensional digital codes. Third, blueprints are too literal for genetics, because each part of a blueprint makes an equivalent part of the machine or building; each sentence of a recipe book does not make a different mouthful of cake.)

Whereas English books are written in words of variable length using twenty-six letters, genomes are written entirely in three-letter words, using only four letters: A, C, G and T (which stand for adenine, cytosine, guanine and thymine). And instead of being written on flat pages, they are written on long chains of sugar and phosphate called DNA molecules to which the bases are attached as side rungs. Each chromosome is one pair of (very) long DNA molecules.

The genome is a very clever book, because in the right conditions it can both photocopy itself and read itself. The photocopying is known as REPLICATION, and the reading as TRANSLATION. Replication works because of an ingenious property of the four bases: A likes to pair with T, and G with C. So a single strand of DNA can copy itself by assembling a complementary strand with Ts opposite all the As, As opposite all the Ts, Cs opposite all the Gs and Gs opposite all the Cs. In fact, the usual state of DNA is the famous DOUBLE HELIX of the original strand and its complementary pair intertwined.

To make a copy of the complementary strand therefore brings back the original text. So the sequence ACGT becomes TGCA in the copy, which transcribes back to ACGT in the copy of the copy. This enables DNA to replicate indefinitely, yet still contain the same information.
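The pairing rule can be sketched in a few lines of Python. This is a minimal illustration rather than a model of real replication: the names `PAIRS` and `complement` are invented for the example, and, like the text, it ignores the fact that the two strands of a double helix run in opposite directions.

```python
# Base-pairing rule from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Build the complementary strand one letter at a time."""
    return "".join(PAIRS[base] for base in strand)


original = "ACGT"
copy = complement(original)
copy_of_copy = complement(copy)
print(copy, copy_of_copy)  # TGCA ACGT
```

Because each letter has exactly one partner, copying twice always restores the starting sequence, which is why the information survives any number of rounds.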

Translation is a little more complicated. First the text of a gene is TRANSCRIBED into a copy by the same base-pairing process, but this time the copy is made not of DNA but of RNA, a very slightly different chemical. RNA, too, can carry a linear code and it uses the same letters as DNA except that it uses U, for uracil, in place of T. This RNA copy, called the MESSENGER RNA, is then edited by the excision of all introns and the splicing together of all exons (see above).

The messenger is then befriended by a microscopic machine called a RIBOSOME, itself made partly of RNA. The ribosome moves along the messenger, translating each three-letter codon in turn into one letter of a different alphabet, an alphabet of twenty different AMINO ACIDS, each brought by a different version of a molecule called TRANSFER RNA. Each amino acid is attached to the last to form a chain in the same order as the codons. When the whole message has been translated, the chain of amino acids folds itself up into a distinctive shape that depends on its sequence. It is now known as a PROTEIN.
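The two steps just described, transcription and then translation, can be sketched as follows. This is a simplified illustration under stated assumptions: the gene is given as its coding strand, so the messenger is obtained simply by writing U for T; the editing out of introns is skipped; and the twelve-letter `gene` is made up for the example. The table is the standard genetic code in one-letter amino-acid abbreviations, with `*` marking STOP; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, one letter per amino acid and "*" for STOP,
# listed in TCAG order for the first, second and third base of each codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    (a + b + c).replace("T", "U"): AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Messenger RNA spelling of a coding strand: U replaces T."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons three letters at a time until a STOP codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


gene = "ATGGCCAAATGA"        # made-up sequence, not a real gene
mrna = transcribe(gene)
print(mrna)                  # AUGGCCAAAUGA
print(translate(mrna))       # MAK
```

Here M, A and K stand for methionine, alanine and lysine; the final codon, UGA, is a STOP and adds nothing to the chain.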

Almost everything in the body, from hair to hormones, is either made of proteins or made by them. Every protein is a translated gene. In particular, the body’s chemical reactions are catalysed by proteins known as ENZYMES. Even the processing, photocopying, error-correction and assembly of DNA and RNA molecules themselves - the replication and translation - are done with the help of proteins. Proteins are also responsible for switching genes on and off, by physically attaching themselves to PROMOTER and ENHANCER sequences near the start of a gene’s text. Different genes are switched on in different parts of the body.

When genes are replicated, mistakes are sometimes made. A letter (base) is occasionally missed out or the wrong letter inserted. Whole sentences or paragraphs are sometimes duplicated, omitted or reversed. This is known as MUTATION. Many mutations are neither harmful nor beneficial, for instance if they change one codon to another that has the same amino acid ‘meaning’: there are sixty-four different codons and only twenty amino acids, so many DNA ‘words’ share the same meaning. Human beings accumulate about one hundred mutations per generation, which may not seem much given that there are more than a million codons in the human genome, but in the wrong place even a single one can be fatal.
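The counting behind this argument can be reproduced directly. The sketch below is illustrative only: it assumes the standard genetic code (written with one-letter amino-acid abbreviations and `*` for a STOP signal), and the helper `is_silent` is a hypothetical name introduced for the example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a STOP signal.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = ["".join(letters) for letters in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))

counts = Counter(CODE.values())
stops = counts.pop("*")
print(len(CODE), len(counts), stops)   # 64 20 3


def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if changing one letter leaves the amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODE[mutated] == CODE[codon]


print(is_silent("GCC", 2, "A"))   # True: GCC and GCA both mean alanine
print(is_silent("GCC", 0, "A"))   # False: ACC means threonine
```

Four letters taken three at a time give 4 × 4 × 4 = 64 possible words, so with only twenty amino acids to spell, most amino acids are written by more than one codon.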

All rules have exceptions (including this one). Not all human genes are found on the twenty-three principal chromosomes; a few live inside little blobs called mitochondria and have probably done so ever since mitochondria were free-living bacteria. Not all genes are made of DNA: some viruses use RNA instead. Not all genes are recipes for proteins. Some genes are transcribed into RNA but not translated into protein; the RNA goes directly to work instead either as part of a ribosome or as a transfer RNA. Not all reactions are catalysed by proteins; a few are catalysed by RNA instead. Not every protein comes from a single gene; some are put together from several recipes. Not all of the sixty-four three-letter codons specify an amino acid: three signify STOP commands instead. And finally, not all DNA spells out genes. Most of it is a jumble of repetitive or random sequences that is rarely or never transcribed: the so-called junk DNA.

That is all you need to know. The tour of the human genome can begin.

Chromosome 1

Life

All forms that perish other forms supply,

(By turns we catch the vital breath and die)

Like bubbles on the sea of matter borne,

They rise, they break, and to that sea return.

Alexander Pope, An Essay on Man

In the beginning was the word. The word proselytised the sea with its message, copying itself unceasingly and forever. The word discovered how to rearrange chemicals so as to capture little eddies in the stream of entropy and make them live. The word transformed the land surface of the planet from a dusty hell to a verdant paradise. The word eventually blossomed and became sufficiently ingenious to build a porridgy contraption called a human brain that could discover and be aware of the word itself.

My porridgy contraption boggles every time I think this thought. In four thousand million years of earth history, I am lucky enough to be alive today. In five million species, I was fortunate enough to be born a conscious human being. Among six thousand million people on the planet, I was privileged enough to be born in the country where the word was discovered. In all of the earth’s history, biology and geography, I was born just five years after the moment when, and just two hundred miles from the place where, two members of my own species discovered the structure of DNA and hence uncovered the greatest, simplest and most surprising secret in the universe. Mock my zeal if you wish; consider me a ridiculous materialist for investing such enthusiasm in an acronym. But follow me on a journey back to the very origin of life, and I hope I can convince you of the immense fascination of the word.

‘As the earth and ocean were probably peopled with vegetable productions long before the existence of animals; and many families of these animals long before other families of them, shall we conjecture that one and the same kind of living filaments is and has been the cause of all organic life?’ asked the polymathic poet and physician Erasmus Darwin in 1794. [1] It was a startling guess for the time, not only in its bold conjecture that all organic life shared the same origin, sixty-five years before his grandson Charles’ book on the topic, but for its weird use of the word ‘filaments’. The secret of life is indeed a thread.

Yet how can a filament make something live? Life is a slippery thing to define, but it consists of two very different skills: the ability to replicate, and the ability to create order. Living things produce approximate copies of themselves: rabbits produce rabbits, dandelions make dandelions. But rabbits do more than that. They eat grass, transform it into rabbit flesh and somehow build bodies of order and complexity from the random chaos of the world. They do not defy the second law of thermodynamics, which says that in a closed system everything tends from order towards disorder, because rabbits are not closed systems. Rabbits build packets of order and complexity called bodies but at the cost of expending large amounts of energy. In Erwin Schrodinger’s phrase, living creatures ‘drink orderliness’ from the environment.

The key to both of these features of life is information. The ability to replicate is made possible by the existence of a recipe, the information that is needed to create a new body. A rabbit’s egg carries the instructions for assembling a new rabbit. But the ability to create order through metabolism also depends on information - the instructions for building and maintaining the equipment that creates the order. An adult rabbit, with its ability to both reproduce and metabolise, is prefigured and presupposed in its living filaments in the same way that a cake is prefigured and presupposed in its recipe. This is an idea that goes right back to Aristotle, who said that the ‘concept’ of a chicken is implicit in an egg, or that an acorn was literally ‘informed’ by the plan of an oak tree. When Aristotle’s dim perception of information theory, buried under generations of chemistry and physics, re-emerged amid the discoveries of modern genetics, Max Delbruck joked that the Greek sage should be given a posthumous Nobel prize for the discovery of DNA. [2]

The filament of DNA is information, a message written in a code of chemicals, one chemical for each letter. It is almost too good to be true, but the code turns out to be written in a way that we can understand. Just like written English, the genetic code is a linear language, written in a straight line. Just like written English, it is digital, in that every letter bears the same importance. Moreover, the language of DNA is considerably simpler than English, since it has an alphabet of only four letters, conventionally known as A, C, G and T.

Now that we know that genes are coded recipes, it is hard to recall how few people even guessed such a possibility. For the first half of the twentieth century, one question reverberated unanswered through biology: what is a gene? It seemed almost impossibly mysterious. Go back not to 1953, the year of the discovery of DNA’s symmetrical structure, but ten years further, to 1943. Those who will do most to crack the mystery, a whole decade later, are working on other things in 1943. Francis Crick is working on the design of naval mines near Portsmouth. At the same time James Watson is just enrolling as an undergraduate at the precocious age of fifteen at the University of Chicago; he is determined to devote his life to ornithology. Maurice Wilkins is helping to design the atom bomb in the United States. Rosalind Franklin is studying the structure of coal for the British government.

In Auschwitz in 1943, Josef Mengele is torturing twins to death in a grotesque parody of scientific inquiry. Mengele is trying to understand heredity, but his eugenics proves not to be the path to enlightenment. Mengele’s results will be useless to future scientists.

In Dublin in 1943, a refugee from Mengele and his ilk, the great physicist Erwin Schrodinger, is embarking on a series of lectures at Trinity College entitled ‘What is life?’ He is trying to define a problem. He knows that chromosomes contain the secret of life, but he cannot understand how: ‘It is these chromosomes... that contain in some kind of code-script the entire pattern of the individual’s future development and of its functioning in the mature state.’ The gene, he says, is too small to be anything other than a large molecule, an insight that will inspire a generation of scientists, including Crick, Watson, Wilkins and Franklin, to tackle what suddenly seems like a tractable problem. Having thus come tantalisingly close to the answer, though, Schrodinger veers off track. He thinks that the secret of this molecule’s ability to carry heredity lies in his beloved quantum theory, and is pursuing that obsession down what will prove to be a blind alley. The secret of life has nothing to do with quantum states. The answer will not come from physics. [3]

In New York in 1943, a sixty-six-year-old Canadian scientist, Oswald Avery, is putting the finishing touches to an experiment that will decisively identify DNA as the chemical manifestation of heredity. He has proved in a series of ingenious experiments that a pneumonia bacterium can be transformed from a harmless to a virulent strain merely by absorbing a simple chemical solution. By 1943, Avery has concluded that the transforming substance, once purified, is DNA. But he will couch his conclusions in such cautious language for publication that few will take notice until much later. In a letter to his brother Roy written in May 1943, Avery is only slightly less cautious: [4]

If we are right, and of course that’s not yet proven, then it means that nucleic acids [DNA] are not merely structurally important but functionally active substances in determining the biochemical activities and specific characteristics of cells - and that by means of a known chemical substance it is possible to induce predictable and hereditary changes in cells. That is something that has long been the dream of geneticists.

Avery is almost there, but he is still thinking along chemical lines. ‘All life is chemistry’, said Jan Baptista van Helmont in 1648, guessing. At least some life is chemistry, said Friedrich Wöhler in 1828 after synthesising urea from ammonium chloride and silver cyanate, thus breaking the hitherto sacrosanct divide between the chemical and biological worlds: urea was something that only living things had produced before. That life is chemistry is true but boring, like saying that football is physics. Life, to a rough approximation, consists of the chemistry of three atoms, hydrogen, carbon and oxygen, which between them make up ninety-eight per cent of all atoms in living beings. But it is the emergent properties of life - such as heritability - not the constituent parts that are interesting. Avery cannot conceive what it is about DNA that enables it to hold the secret of heritable properties. The answer will not come from chemistry.

In Bletchley, in Britain, in 1943, in total secrecy, a brilliant mathematician, Alan Turing, is seeing his most incisive insight turned into physical reality. Turing has argued that numbers can compute numbers. To crack the Lorenz encoding machines of the German forces, a computer called Colossus has been built based on Turing’s principles: it is a universal machine with a modifiable stored program. Nobody realises it at the time, least of all Turing, but he is probably closer to the mystery of life than anybody else. Heredity is a modifiable stored program; metabolism is a universal machine. The recipe that links them is a code, an abstract message that can be embodied in a chemical, physical or even immaterial form. Its secret is that it can cause itself to be replicated. Anything that can use the resources of the world to get copies of itself made is alive; the most likely form for such a thing to take is a digital message - a number, a script or a word. [5]

In New Jersey in 1943, a quiet, reclusive scholar named Claude Shannon is ruminating about an idea he had first had at Princeton a few years earlier. Shannon’s idea is that information and entropy are opposite faces of the same coin and that both have an intimate link with energy. The less entropy a system has, the more information it contains. The reason a steam engine can harness the energy from burning coal and turn it into rotary motion is because the engine has high information content - information injected into it by its designer. So does a human body. Aristotle’s information theory meets Newton’s physics in Shannon’s brain. Like Turing, Shannon has no thoughts about biology. But his insight is of more relevance to the question of what is life than a mountain of chemistry and physics. Life, too, is digital information written in DNA. [6]

In the beginning was the word. The word was not DNA. That came afterwards, when life was already established, and when it had divided the labour between two separate activities: chemical work and information storage, metabolism and replication. But DNA contains a record of the word, faithfully transmitted through all subsequent aeons to the astonishing present.

Imagine the nucleus of a human egg beneath the microscope. Arrange the twenty-three chromosomes, if you can, in order of size, the biggest on the left and the smallest on the right. Now zoom in on the largest chromosome, the one called, for purely arbitrary reasons, chromosome 1. Every chromosome has a long arm and a short arm separated by a pinch point known as a centromere. On the long arm of chromosome 1, close to the centromere, you will find, if you read it carefully, that there is a sequence of 120 letters - As, Cs, Gs and Ts - that repeats over and over again. Between each repeat there lies a stretch of more random text, but the 120-letter paragraph keeps coming back like a familiar theme tune, in all more than 100 times. This short paragraph is perhaps as close as we can get to an echo of the original word.

This ‘paragraph’ is a small gene, probably the single most active gene in the human body. Its 120 letters are constantly being copied into a short filament of RNA. The copy is known as 5S RNA. It sets up residence with a lump of proteins and other RNAs, carefully intertwined, in a ribosome, a machine whose job is to translate DNA recipes into proteins. And it is proteins that enable DNA to replicate. To paraphrase Samuel Butler, a protein is just a gene’s way of making another gene; and a gene is just a protein’s way of making another protein. Cooks need recipes, but recipes also need cooks. Life consists of the interplay of two kinds of chemicals: proteins and DNA.

Protein represents chemistry, living, breathing, metabolism and behaviour - what biologists call the phenotype. DNA represents information, replication, breeding, sex - what biologists call the genotype. Neither can exist without the other. It is the classic case of chicken and egg: which came first, DNA or protein? It cannot have been DNA, because DNA is a helpless, passive piece of mathematics, which catalyses no chemical reactions. It cannot have been protein, because protein is pure chemistry with no known way of copying itself accurately. It seems impossible either that DNA invented protein or vice versa. This might have remained a baffling and strange conundrum had not the word left a trace of itself faintly drawn on the filament of life. Just as we now know that eggs came long before chickens (the reptilian ancestors of all birds laid eggs), so there is growing evidence that RNA came before proteins.

RNA is a chemical substance that links the two worlds of DNA and protein. It is used mainly in the translation of the message from the alphabet of DNA to the alphabet of proteins. But in the way it behaves, it leaves little doubt that it is the ancestor of both. RNA was Greece to DNA’s Rome: Homer to her Virgil.

RNA was the word. RNA left behind five little clues to its priority over both protein and DNA. Even today, the ingredients of DNA are made by modifying the ingredients of RNA, not by a more direct route. Also DNA’s letter Ts are made from RNA’s letter Us. Many modern enzymes, though made of protein, rely on small molecules of RNA to make them work. Moreover, RNA, unlike DNA and protein, can copy itself without assistance: give it the right ingredients and it will stitch them together into a message. Wherever you look in the cell, the most primitive and basic functions require the presence of RNA. It is an RNA-dependent enzyme that takes the message, made of RNA, from the gene. It is an RNA-containing machine, the ribosome, that translates that message, and it is a little RNA molecule that fetches and carries the amino acids for the translation of the gene’s message. But above all, RNA - unlike DNA - can act as a catalyst, breaking up and joining other molecules including RNAs themselves. It can cut them up, join the ends together, make some of its own building blocks, and elongate a chain of RNA. It can even operate on itself, cutting out a chunk of text and splicing the free ends together again. [7]

The discovery of these remarkable properties of RNA in the early 1980s, made by Thomas Cech and Sidney Altman, transformed our understanding of the origin of life. It now seems probable that the very first gene, the ‘ur-gene’, was a combined replicator-catalyst, a word that consumed the chemicals around it to duplicate itself. It may well have been made of RNA. By repeatedly selecting random RNA molecules in the test tube based on their ability to catalyse reactions, it is possible to ‘evolve’ catalytic RNAs from scratch - almost to rerun the origin of life. And one of the most surprising results is that these synthetic RNAs often end up with a stretch of RNA text that reads remarkably like part of the text of a ribosomal RNA gene such as the 5S gene on chromosome 1.

Back before the first dinosaurs, before the first fishes, before the first worms, before the first plants, before the first fungi, before the first bacteria, there was an RNA world - probably somewhere around four billion years ago, soon after the beginning of planet earth’s very existence and when the universe itself was only ten billion years old. We do not know what these ‘ribo-organisms’ looked like. We can only guess at what they did for a living, chemically speaking. We do not know what came before them. We can be pretty sure that they once existed, because of the clues to RNA’s role that survive in living organisms today. [8]

These ribo-organisms had a big problem. RNA is an unstable substance, which falls apart within hours. Had these organisms ventured anywhere hot, or tried to grow too large, they would have faced what geneticists call an error catastrophe - a rapid decay of the message in their genes. One of them invented by trial and error a new and tougher version of RNA called DNA and a system for making RNA copies from it, including a machine we’ll call the proto-ribosome. It had to work fast and it had to be accurate. So it stitched together genetic copies three letters at a time, the better to be fast and accurate. Each threesome came flagged with a tag to make it easier for the proto-ribosome to find, a tag that was made of amino acid. Much later, those tags themselves became joined together to make proteins and the three-letter word became a form of code for the proteins - the genetic code itself. (Hence to this day, the genetic code consists of three-letter words, each spelling out a particular one of twenty amino acids as part of a recipe for a protein.) And so was born a more sophisticated creature that stored its genetic recipe on DNA, made its working machines of protein and used RNA to bridge the gap between them.

Her name was Luca, the Last Universal Common Ancestor. What did she look like, and where did she live? The conventional answer is that she looked like a bacterium and she lived in a warm pond, possibly by a hot spring, or in a marine lagoon. In the last few years it has been fashionable to give her a more sinister address, since it became clear that the rocks beneath the land and sea are impregnated with billions of chemical-fuelled bacteria. Luca is now usually placed deep underground, in a fissure in hot igneous rocks, where she fed on sulphur, iron, hydrogen and carbon. To this day, the surface life on earth is but a veneer. Perhaps ten times as much organic carbon as exists in the whole biosphere is in thermophilic bacteria deep beneath the surface, where they are possibly responsible for generating what we call natural gas. [9]

There is, however, a conceptual difficulty about trying to identify the earliest forms of life. These days it is impossible for most creatures to acquire genes except from their parents, but that may not always have been so. Even today, bacteria can acquire genes from other bacteria merely by ingesting them. There might once have been widespread trade, even burglary, of genes. In the deep past chromosomes were probably numerous and short, containing just one gene each, which could be lost or gained quite easily. If this was so, Carl Woese points out, the organism was not yet an enduring entity. It was a temporary team of genes. The genes that ended up in all of us may therefore have come from lots of different ‘species’ of creature and it is futile to try to sort them into different lineages. We are descended not from one ancestral Luca, but from the whole community of genetic organisms. Life, says Woese, has a physical history, but not a genealogical one. [10]

You can look on such a conclusion as a fuzzy piece of comforting, holistic, communitarian philosophy - we are all descended from society, not from an individual species - or you can see it as the ultimate proof of the theory of the selfish gene: in those days, even more than today, the war was carried on between genes, using organisms as temporary chariots and forming only transient alliances; today it is more of a team game. Take your pick.

Even if there were lots of Lucas, we can still speculate about where they lived and what they did for a living. This is where the second problem with the thermophilic bacteria arises. Thanks to some brilliant detective work by three New Zealanders published in 1998, we can suddenly glimpse the possibility that the tree of life, as it appears in virtually every textbook, may be upside down. Those books assert that the first creatures were like bacteria, simple cells with single copies of circular chromosomes, and that all other living things came about when teams of bacteria ganged together to make complex cells. It may much more plausibly be the exact reverse. The very first modern organisms were not like bacteria; they did not live in hot springs or deep-sea volcanic vents. They were much more like protozoa: with genomes fragmented into several linear chromosomes rather than one circular one, and ‘polyploid’ - that is, with several spare copies of every gene to help with the correction of spelling errors. Moreover, they would have liked cool climates. As Patrick Forterre has long argued, it now looks as if bacteria came later, highly specialised and simplified descendants of the Lucas, long after the invention of the DNA-protein world. Their trick was to drop much of the equipment of the RNA world specifically to enable them to live in hot places. It is we that have retained the primitive molecular features of the Lucas in our cells; bacteria are much more ‘highly evolved’ than we are.

This strange tale is supported by the existence of molecular ‘fossils’ - little bits of RNA that hang about inside the nucleus of your cells doing unnecessary things such as splicing themselves out of genes: guide RNA, vault RNA, small nuclear RNA, small nucleolar RNA, self-splicing introns. Bacteria have none of these, and it is more parsimonious to believe that they dropped them rather than we invented them. (Science, perhaps surprisingly, is supposed to treat simple explanations as more probable than complex ones unless given reason to think otherwise; the principle is known in logic as Occam’s razor.) Bacteria dropped the old RNAs when they invaded hot places like hot springs or subterranean rocks where temperatures can reach 170 °C - to minimise mistakes caused by heat, it paid to simplify the machinery. Having dropped the RNAs, bacteria found their new streamlined cellular machinery made them good at competing in niches where speed of reproduction was an advantage - such as parasitic and scavenging niches. We retained those old RNAs, relics of machines long superseded, but never entirely thrown away. Unlike the massively competitive world of bacteria, we - that is all animals, plants and fungi - never came under such fierce competition to be quick and simple. We put a premium instead on being complicated, in having as many genes as possible, rather than a streamlined machine for using them. [11]

The three-letter words of the genetic code are the same in every creature. CGA means arginine and GCG means alanine - in bats, in beetles, in beech trees, in bacteria. They even mean the same in the misleadingly named archaebacteria living at boiling temperatures in sulphurous springs thousands of feet beneath the surface of the Atlantic ocean or in those microscopic capsules of deviousness called viruses. Wherever you go in the world, whatever animal, plant, bug or blob you look at, if it is alive, it will use the same dictionary and know the same code. All life is one. The genetic code, bar a few tiny local aberrations, mostly for unexplained reasons in the ciliate protozoa, is the same in every creature. We all use exactly the same language.
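The idea of a shared dictionary of three-letter words can be made concrete with a short sketch in Python. It is an illustration only, not a bioinformatics tool: the table below holds just five of the sixty-four codons (CGA and GCG come from the text; ATG, TGG and the stop word TAA are added from the standard genetic code), and the names CODON_TABLE and read_codons are invented for this example.

```python
# A tiny, deliberately incomplete slice of the genetic code's dictionary.
# The full dictionary has 64 three-letter words; only five are listed here.
CODON_TABLE = {
    "CGA": "arginine",    # given in the text
    "GCG": "alanine",     # given in the text
    "ATG": "methionine",  # standard code; added for illustration
    "TGG": "tryptophan",  # standard code; added for illustration
    "TAA": "STOP",        # one of the words that ends a recipe
}


def read_codons(dna):
    """Read a DNA string three letters at a time and look each word up.

    Words missing from the partial table are reported as 'unknown'.
    Reading ends at a STOP word; trailing letters that do not fill a
    complete three-letter word are ignored.
    """
    usable_length = len(dna) - len(dna) % 3
    amino_acids = []
    for start in range(0, usable_length, 3):
        word = dna[start:start + 3]
        meaning = CODON_TABLE.get(word, "unknown")
        if meaning == "STOP":
            break
        amino_acids.append(meaning)
    return amino_acids


print(read_codons("ATGCGAGCGTGGTAA"))
```

The same function, with the same table, would serve for a bat, a beetle or a beech tree, which is the point of the passage: the dictionary does not change from creature to creature.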

This means - and religious people might find this a useful argument - that there was only one creation, one single event when life was born. Of course, that life might have been born on a different planet and seeded here by spacecraft, or there might even have been thousands of kinds of life at first, but only Luca survived in the ruthless free-for-all of the primeval soup. But until the genetic code was cracked in the 1960s, we did not know what we now know: that all life is one; seaweed is your distant cousin and anthrax one of your advanced relatives. The unity of life is an empirical fact. Erasmus Darwin was outrageously close to the mark: ‘One and the same kind of living filaments has been the cause of all organic life.’

In this way simple truths can be read from the book that is the genome: the unity of all life, the primacy of RNA, the chemistry of the very earliest life on the planet, the fact that large, single-celled creatures were probably the ancestors of bacteria, not vice versa. We have no fossil record of the way life was four billion years ago. We have only this great book of life, the genome. The genes in the cells of your little finger are the direct descendants of the first replicator molecules; through an unbroken chain of tens of billions of copyings, they come to us today still bearing a digital message that has traces of those earliest struggles of life. If the human genome can tell us things about what happened in the primeval soup, how much more can it tell us about what else happened during the succeeding four million millennia. It is a record of our history written in the code for a working machine.

NOTES
CHROMOSOME 1

The idea that the gene and indeed life itself consists of digital information is found in Richard Dawkins’s River out of Eden (Weidenfeld and Nicolson, 1995) and in Jeremy Campbell’s Grammatical man (Allen Lane, 1983). An excellent account of the debates that still rage about the origin of life is found in Paul Davies’s The fifth miracle (Penguin, 1998). For more detailed information on the RNA world, see Gesteland, R. F. and Atkins, J. F. (eds) (1993). The RNA world. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

[1] Darwin, E. (1794). Zoonomia: or the laws of organic life. Vol. II, p. 244. Third edition (1801). J. Johnson, London.
[2] Campbell, J. (1983). Grammatical man: information, entropy, language and life. Allen Lane, London.
[3] Schrödinger, E. (1967). What is life? Mind and matter. Cambridge University Press, Cambridge.
[4] Quoted in Judson, H. F. (1979). The eighth day of creation. Jonathan Cape, London.
[5] Hodges, A. (1997). Turing. Phoenix, London.
[6] Campbell, J. (1983). Grammatical man: information, entropy, language and life. Allen Lane, London.
[7] Joyce, G. F. (1989). RNA evolution and the origins of life. Nature 338: 217-24; Unrau, P. J. and Bartel, D. P. (1998). RNA-catalysed nucleotide synthesis. Nature 395: 260-63.
[8] Gesteland, R. F. and Atkins, J. F. (eds) (1993). The RNA world. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
[9] Gold, T. (1992). The deep, hot biosphere. Proceedings of the National Academy of Sciences of the USA 89: 6045-49; Gold, T. (1997). An unexplored habitat for life in the universe? American Scientist 85: 408-11.
[10] Woese, C. (1998). The universal ancestor. Proceedings of the National Academy of Sciences of the USA 95: 6854-9.
[11] Poole, A. M., Jeffares, D. C. and Penny, D. (1998). The path from the RNA world. Journal of Molecular Evolution 46: 1-17; Jeffares, D. C., Poole, A. M. and Penny, D. (1998). Relics from the RNA world. Journal of Molecular Evolution 46: 18-36.
