Late last year, we described a genome sequencing technique that brought the price of consumables down to under $5,000. That technique, offered by Complete Genomics, has now been put to use: complete genomes have been obtained for every member of a family of four in which both children suffer from two genetic disorders. In addition to identifying likely causative mutations, the full family pedigree has produced new measures of human mutation and recombination.
So far, as each genome has been completed, it's typically been compared to a reference genome that's meant to represent a "typical" human. But the human population is large and diverse, and the differences between a typical person and the reference may have been present in our population for thousands of years. In contrast, by knowing the sequence of a child and both its parents, the changes in DNA that occur as a result of recombinations and mutations in each parent's germ cells can be tracked in exquisite detail.
Of course, you first have to get rid of the errors. Any method of sequencing DNA has a known error rate, and there are certain sequences in the genome that are more prone to these mistakes than others.
Altogether, sequencing the genomes identified over 4 million bases in which at least one of the genomes differed from the reference sequence. But nearly a million of these were identical in all four individuals; another 3.4 million had been identified as sites of common variations within the human genome. When all of these were eliminated, there were only 323,255 base changes that appeared to be distinct to this family. The authors of the paper then focused on getting rid of some of the errors.
Some of these errors are caused by repetitive sequences, which sometimes cause genome assembly algorithms to delete portions of the repeat. Filtering these out dropped the number to about 50,000 differences that were either new mutations or sequencing errors. The authors resequenced every one of these regions and found only 28 that appeared to be new mutations in the offspring; each of these was directly confirmed by mass spectrometry of the relevant stretch of DNA. The authors estimate that, all told, they've eliminated approximately 70 percent of the sequencing errors, producing an accuracy of 99.999 percent.
After estimating the false negative rate, the authors concluded that humans have a mutation rate of 1.1×10⁻⁸ at each base, which means that every individual is likely to have been born with approximately 70 new mutations. That's a bit less than half of previous estimates, but it's within the range defined by our differences with the chimp genome and an estimated time of divergence of 5 million years.
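The arithmetic behind that figure of roughly 70 is easy to check with a back-of-the-envelope sketch. The genome size used here is an assumption rather than a number taken from the paper: about 3.2 billion bases per haploid genome, doubled because each person inherits one copy from each parent.

```python
# Back-of-the-envelope check: expected new mutations per person.
MUTATION_RATE_PER_BASE = 1.1e-8  # per base, as reported in the article
HAPLOID_GENOME_BASES = 3.2e9     # assumption: approximate human haploid genome size


def expected_new_mutations(rate_per_base, haploid_bases):
    """Expected new mutations summed over both inherited genome copies."""
    diploid_bases = 2 * haploid_bases
    return rate_per_base * diploid_bases


if __name__ == "__main__":
    estimate = expected_new_mutations(MUTATION_RATE_PER_BASE, HAPLOID_GENOME_BASES)
    print(f"about {estimate:.0f} new mutations per person")
```

With those inputs the estimate lands at about 70, in line with the authors' figure.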
The precise map of differences also enabled the authors to track where pieces of the original parental genomes had been swapped by recombination. The exact bases can't be identified, given that most of the genome is identical in all four individuals, but they were able to identify 155 crossover sites with a median precision of 2,600 bases. Most of these occurred within known "hotspots" of recombination that had been identified previously.
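One way to picture how a crossover gets localized: at each position where a parent's two chromosome copies differ, you can label which copy the child received. A crossover must lie somewhere between the last marker carrying one label and the first marker carrying the other, so the gap between those two markers sets the precision. The sketch below assumes those labels have already been worked out; the function name, the "A"/"B" labels, and the positions are invented for illustration and are not the authors' method.

```python
def find_crossover_intervals(markers):
    """Return (start, end) position pairs that bracket each crossover.

    markers: list of (position, label) tuples sorted by position, where
    label says which parental chromosome copy the child inherited there.
    """
    intervals = []
    # Walk over neighbouring marker pairs; a label switch means a crossover
    # happened somewhere between the two positions.
    for (prev_pos, prev_label), (pos, label) in zip(markers, markers[1:]):
        if label != prev_label:
            intervals.append((prev_pos, pos))
    return intervals


if __name__ == "__main__":
    # Hypothetical markers along one chromosome.
    markers = [(1_000, "A"), (5_200, "A"), (7_800, "B"), (12_000, "B"), (30_500, "A")]
    for start, end in find_crossover_intervals(markers):
        print(f"crossover between {start} and {end} (window: {end - start} bases)")
```

The denser the informative markers, the narrower each window, which is why long stretches that are identical in all four family members limit how precisely a crossover can be placed.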
As if all of this data wasn't enough, the family itself had been chosen because both children (and neither parent) suffer from two genetic diseases: Miller syndrome and primary ciliary dyskinesia. The former has not had a gene definitively associated with it; the latter has had a number.
The simplest explanation for two affected children with unaffected parents would be a single recessive mutation, with the parents heterozygous and the offspring homozygous. Since the diseases are rare, the authors excluded any base differences that have been previously identified as common within the human population. Only a single gene fit this pattern when identical mutations were considered. But it's possible for different mutations in the same gene to cause a single phenotype, with each parent carrying a distinct base change. Three additional genes matched this pattern.
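The two inheritance patterns described here can be written down as a simple filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: each variant is a dictionary recording how many copies of the variant allele (0, 1, or 2) each family member carries, the `common` flag stands in for a lookup against known population variation, and the gene names are placeholders.

```python
def recessive_candidates(variants):
    """Return (same_mutation_genes, different_mutation_genes).

    Each variant is a dict with keys: gene, common (bool), and the number
    of copies of the variant allele in mother, father, child1 and child2.
    """
    same_mutation = set()
    from_mother = set()  # genes with a rare variant passed on by the mother only
    from_father = set()  # genes with a rare variant passed on by the father only

    for v in variants:
        if v["common"]:
            continue  # the diseases are rare, so skip known common variation
        parents = (v["mother"], v["father"])
        kids = (v["child1"], v["child2"])
        if parents == (1, 1) and kids == (2, 2):
            # Both parents carry one copy; both children carry two.
            same_mutation.add(v["gene"])
        elif kids == (1, 1) and parents == (1, 0):
            from_mother.add(v["gene"])
        elif kids == (1, 1) and parents == (0, 1):
            from_father.add(v["gene"])

    # A gene hit by a different rare variant from each parent.
    different_mutations = from_mother & from_father
    return same_mutation, different_mutations


if __name__ == "__main__":
    # Hypothetical genotypes, for illustration only.
    variants = [
        {"gene": "GENE_A", "common": False, "mother": 1, "father": 1, "child1": 2, "child2": 2},
        {"gene": "GENE_B", "common": False, "mother": 1, "father": 0, "child1": 1, "child2": 1},
        {"gene": "GENE_B", "common": False, "mother": 0, "father": 1, "child1": 1, "child2": 1},
        {"gene": "GENE_C", "common": True, "mother": 1, "father": 1, "child1": 2, "child2": 2},
    ]
    print(recessive_candidates(variants))
```

In this toy input, GENE_A fits the identical-mutation pattern, GENE_B fits the pattern where each parent contributes a distinct change, and GENE_C is dropped because its variant is common.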
So the authors sequenced these genes in two other individuals with Miller syndrome, and identified DHODH, a gene previously suggested to be associated with the disease, as its likely cause. One surprise is that the primary ciliary dyskinesia is likely to be caused by a completely separate mutation. DNAH5, another of the four genes to come through this analysis, had previously been identified as a cause of that disorder. So, the family appears to be unlucky enough to be dealing with two rare, recessive mutations.
That we've reached the point where this work is even possible is simply amazing. I've really got no words for the fact that it was done by a mere 15-author collaboration, only two of whom hail from Complete Genomics. As the authors point out, we're at the point where it may be cheaper and easier to sequence entire pedigrees than hunt down enough affected individuals to identify a Mendelian trait by traditional methods.
The fact that the approach generates additional useful data—things like human mutation rates and recombination locations have always been based on much rougher estimates—is really quite a significant bonus. I anticipate that this won't be the first paper of its sort, and these estimates will continue to be refined as more family pedigrees are available at the genome level.
Science, 2010. DOI: 10.1126/science.1186802