Two main methods of genome mapping
| Genetic mapping | Physical mapping |
|---|---|
| Establishment of relative positions of genes. | Assigning genes to particular positions along a DNA strand (along a chromosome) |
| Measurement of a tendency of genes to segregate together through meiosis. | Studying somatic cells and their genetic material |
| Performed in family studies. | Involves some physical measurements/procedures |
Genetic mapping
Definitions
Locus (pl. loci) - the position of a gene along a chromosome.
Allele - one form of a gene; most genes exist in several forms with different phenotypic effects.
Genetic mapping
- The basis was established at the beginning of the 20th century by Morgan.
- Allows identification of genes that are detectable only by their phenotypic effect.
One can map a gene not knowing its function nor sequence.
Meiosis and cross-over - basis for linkage
Recombination can be detected indirectly
Recombination itself cannot be seen, because it is a molecular event. However, analysis of the progeny phenotypes can tell whether or not a recombination has occurred.
Genetic linkage
- Assuming that the cross-over is a random process, the frequency of recombination between two loci would reflect the distance between two gene loci.
- Genetic linkage is the tendency of two genes not to recombine (in other words, to pass together) through meiosis.
- Measuring this tendency makes it possible to estimate the genetic distance (or closeness) between two loci.
- Important: two genes on the same chromosome (syntenic) are not necessarily linked.
Units of linkage - ‘Morgans’
- The unit of genetic linkage, the Morgan, is the genetic length of a chromosome over which one recombination event is observed, on average, per meiosis
- This means that over such a distance one cross-over is expected per meiosis
- A centiMorgan (cM) corresponds to a 1% chance of recombination
Strategies for genetic mapping
- Linkage analysis in organisms such as the fruit fly or mouse, which can be subjected to experimental test crosses
- Linkage analysis in humans, who cannot be test crossed; analysis of family pedigrees is used instead
- Linkage analysis in bacteria, which do not go through meiosis
Pedigrees for linkage analysis
- A family in which we intend to map a gene must fulfill the following criteria:
- It must be informative, that is, the parent who carries the disease allele is heterozygous at both the disease locus and the marker locus
- The phase must be known, that is, which marker allele lies on the same chromosome as the disease allele
Example of pedigree analysis in human
Possible interpretations of the pedigree
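Mother’s chromosomes:
| | Hypothesis I | Hypothesis II |
|---|---|---|
| Chromosome 1 | disease, M1 | healthy, M1 |
| Chromosome 2 | healthy, M2 | disease, M2 |

Children:
| Child | Alleles inherited from mother | Hypothesis I | Hypothesis II |
|---|---|---|---|
| 1 | disease, M1 | parental | recombined |
| 2 | healthy, M2 | parental | recombined |
| 3 | disease, M1 | parental | recombined |
| 4 | disease, M1 | parental | recombined |
| 5 | healthy, M2 | parental | recombined |
| 6 | disease, M2 | recombined | parental |
| Recombination frequency | | 1/6 = 16.7% | 5/6 = 83.3% |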
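The counting in the pedigree above can be sketched in a few lines of Python. This is only an illustration: the function name and the way the children are encoded are assumptions made for this example, not part of the original analysis.

```python
# Each child: (phenotype, marker allele inherited from the mother),
# copied from the six children of the example pedigree.
children = [
    ("disease", "M1"),
    ("healthy", "M2"),
    ("disease", "M1"),
    ("disease", "M1"),
    ("healthy", "M2"),
    ("disease", "M2"),
]

def recombination_fraction(children, disease_marker):
    """Fraction of recombinant children, given the marker allele that is
    assumed to lie on the mother's disease-carrying chromosome (the phase)."""
    other = {"M1": "M2", "M2": "M1"}
    parental = {("disease", disease_marker), ("healthy", other[disease_marker])}
    recombinants = sum(1 for child in children if child not in parental)
    return recombinants / len(children)

rf_hyp1 = recombination_fraction(children, "M1")  # Hypothesis I: disease with M1
rf_hyp2 = recombination_fraction(children, "M2")  # Hypothesis II: disease with M2
print(f"Hypothesis I:  {rf_hyp1:.1%}")
print(f"Hypothesis II: {rf_hyp2:.1%}")
```

The two hypotheses always give complementary fractions, because a child that is parental under one phase is recombinant under the other.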
Nail patella syndrome and ABO blood group
Nail-patella syndrome is an autosomal dominant (AD) syndrome that always shows some expression and is closely linked to the ABO blood group locus (10 cM)
It involves multiple dysplasias of osseous and other mesenchymal tissues (hypoplastic and split nails, hypoplastic to absent patella, dark cloverleaf pigmentation at the inner margin of the iris)
LOD score
To estimate whether two loci are linked we need to:
- Calculate a series of likelihoods that the two loci are linked at various values of θ (theta), from θ = 0.00 (no recombination) up to θ = 0.50 (random assortment)
- Calculate the likelihood that these loci are not linked (θ = 0.50)
The logarithm of the ratio of those two values gives the LOD (logarithm of odds) score (Z)
Maximum likelihood estimation
LOD >= 3 (equivalent to odds of at least 1000:1 in favor of linkage) is considered proof of linkage between two loci (at the given θ)
The θ at which the LOD score is greatest is the best estimate of the genetic distance between the two loci
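As a rough sketch of the calculation, the snippet below computes Z(θ) for the simplest case: phase-known meioses in which each child is scored as recombinant or non-recombinant. The function name, the grid of θ values and the use of the counts from the example pedigree (1 recombinant in 6 meioses) are assumptions made for illustration; real linkage programs handle unknown phase and more complex pedigrees.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD score Z(theta) = log10( L(theta) / L(0.5) ) for phase-known meioses.

    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants)
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Evaluate Z on a grid of theta values and pick the maximum.
thetas = [i / 100 for i in range(1, 51)]           # 0.01, 0.02, ..., 0.50
scores = {theta: lod_score(1, 6, theta) for theta in thetas}
best_theta = max(scores, key=scores.get)
print(f"max Z = {scores[best_theta]:.2f} at theta = {best_theta:.2f}")
```

With only six meioses the maximum LOD score stays well below 3, which is why data from several families are usually combined before linkage is accepted.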
Genetic distance
- The human genome is about 3000 cM long and consists of 3 billion base pairs (bps).
- 1 million bp corresponds roughly to 1 cM.
- Chromosomes are about 100 - 300 cM long, which means 1 - 3 crossovers per chromosome.
- Average recombination rate increases as the length of the chromosome arm decreases.
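The arithmetic behind the figures above can be checked with a short sketch. The constant and function names are made up for this example, and the conversion is only a genome-wide average.

```python
GENOME_LENGTH_CM = 3000            # genetic length of the human genome
GENOME_LENGTH_BP = 3_000_000_000   # physical length of the human genome

# Genome-wide average number of base pairs per centiMorgan.
bp_per_cm = GENOME_LENGTH_BP / GENOME_LENGTH_CM

def expected_crossovers(length_cm):
    """Mean crossovers per meiosis: 100 cM = 1 Morgan = 1 crossover."""
    return length_cm / 100

def approx_physical_size_bp(length_cm):
    """Very rough physical size, using the genome-wide average only."""
    return length_cm * bp_per_cm

print(f"{bp_per_cm:,.0f} bp per cM")
for length_cm in (100, 300):
    print(length_cm, "cM:", expected_crossovers(length_cm), "crossover(s),",
          f"about {approx_physical_size_bp(length_cm):,.0f} bp")
```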
Some remarks on genetic distance
- Genetic distances are approximately additive.
- Genetic distance and physical distance are not the same!
- The frequency of crossover during oogenesis is roughly twice that during spermatogenesis, so the genetic distances of “female” chromosomes are longer than those of “male” ones.
Rate of recombination and length of chromosome arm
For large chromosomes, the average recombination rates are very similar, but as chromosome arm length decreases, average recombination rates rise markedly.
Physical mapping
- Uses molecular biology techniques to establish position of characteristic sequences in DNA molecules
- The ultimate goal of physical mapping is the complete sequence of a genome – this corresponds to mapping with 1 base pair resolution
Physical mapping - road map
<picture>
FISH method of physical mapping
<picture>
Methods of genome sequencing
- Hierarchical shotgun method - sequencing of overlapping large-insert clones spanning the genome; applied by the Human Genome Project, an international, publicly funded effort
- Whole genome shotgun sequencing - applied by Celera Genomics of Rockville, Maryland
The hierarchical shotgun sequencing strategy
Genome fragmentation
<picture>
It is impossible to sequence a whole genome in one piece
Genome fragmentation is an initial stage in preparing a library of clones
Clones in a library?
After genome fragmentation, the fragments are separated and inserted into vectors that allow manipulation and cloning in host cells (E. coli). BACs (bacterial artificial chromosomes) serve as a large-insert cloning system.
A clone library is a set of vectors with inserts.
Genome-wide physical map of clones
Clones in a given library have to be aligned in the proper order
Several methods can be used for this, among others:
- STS mapping
- Restriction enzyme fingerprinting
These methods rely on finding unique sequences in a clone
STS mapping
STS (sequence tagged sites) are short DNA regions that are unique within the whole genome
This means that two clones containing the same STS must overlap
An STS is detected by PCR using a pair of primers specific for that region
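The overlap rule can be illustrated with a small sketch. The clone names, STS names and PCR results below are invented for the example; the only idea taken from the text is that two clones sharing an STS must overlap.

```python
from itertools import combinations

# Hypothetical PCR results: for each clone, the STS markers that amplified.
sts_content = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts3", "sts4"},
    "cloneD": {"sts7"},
}

def overlapping_pairs(sts_content):
    """Return {(clone1, clone2): shared STSs} for every pair of clones
    that has at least one STS in common."""
    overlaps = {}
    for a, b in combinations(sorted(sts_content), 2):
        shared = sts_content[a] & sts_content[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

overlaps = overlapping_pairs(sts_content)
for (a, b), shared in overlaps.items():
    print(a, "overlaps", b, "via", sorted(shared))
```

Here cloneA, cloneB and cloneC chain together into one contiguous stretch, while cloneD shares no STS with the others and remains unplaced.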
Restriction enzyme fingerprinting
DNA from each clone is digested with a restriction enzyme
Sizes of the resulting fragments are measured by agarose gel electrophoresis
Banding patterns are compared; clones that share many bands are likely to overlap
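One simple way to compare banding patterns is to count fragments of matching size. The sketch below is a hypothetical illustration: the 3% size tolerance, the fragment sizes and the similarity measure are all assumptions, not the procedure used by any particular mapping project.

```python
def shared_bands(sizes_a, sizes_b, tolerance=0.03):
    """Count bands of clone A that match a band of clone B within a relative
    size tolerance. Each band of clone B is matched at most once."""
    remaining = sorted(sizes_b)
    matches = 0
    for size in sorted(sizes_a):
        for candidate in remaining:
            if abs(size - candidate) <= tolerance * size:
                remaining.remove(candidate)  # safe: we leave the loop right away
                matches += 1
                break
    return matches

def fingerprint_similarity(sizes_a, sizes_b, tolerance=0.03):
    """Shared bands as a fraction of the smaller fingerprint."""
    if not sizes_a or not sizes_b:
        return 0.0
    return shared_bands(sizes_a, sizes_b, tolerance) / min(len(sizes_a), len(sizes_b))

# Hypothetical fragment sizes in bp, as read from a gel.
clone_x = [1200, 2500, 4100, 6000, 8300]
clone_y = [2520, 4080, 6000, 9500, 11000]
clone_z = [700, 1500, 3300]

print(fingerprint_similarity(clone_x, clone_y))  # many shared bands
print(fingerprint_similarity(clone_x, clone_z))  # no shared bands
```

A tolerance is needed because fragment sizes estimated from a gel are never exact.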
The maximum length of a single sequencing read is ~750 bp
Finding genes
Comparison of functional cloning and positional cloning
Positional cloning
Applications of human gene mapping
- Allows mapping and cloning of disease genes,
- Testing hypotheses about genetic background of diseases
- Diagnostic information in genetic counseling
Human Genome Project (HGP)
Began in the USA in 1990 when the National Institutes of Health and the Department of Energy joined forces
HGP scientists:
- Mapped & sequenced the genomes of important experimental organisms
- Completed working draft covering 90% of the genome in 2000 (published February 2001)
- Completed in 2003, the Human Genome Project (HGP) was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China, and others.
Project goals were to
- identify all the approximately 20,000-25,000 genes in human DNA,
- determine the sequences of the 3 billion chemical base pairs that make up human DNA,
- store this information in databases,
- improve tools for data analysis,
- map and sequence the genomes of important model organisms,
- transfer related technologies to the private sector,
- address the ethical, legal, and social issues (ELSI) that may arise from the project.
HGP goal - map and sequence the human genome:
- To construct detailed genetic and physical maps of the human genome;
- To determine the complete nucleotide sequence of human DNA to greater than 99.99% accuracy;
- Map all the human genes
- Chart variations in DNA spelling among human beings
The ethical, legal, and social implications (ELSI)
Three to five percent of the HGP budget funded research on the ethical, legal, and social implications (ELSI) of having so much new genetic information about our species
The mutation rate is about twice as high in males as in females
Draft and finished genome sequence
Generating a sequence of the human genome involved three steps:
- Selecting the BAC clones to be sequenced,
- Sequencing them,
- And assembling the individual sequenced clones into an overall genome sequence.
For the draft sequence, 4-fold average sequence coverage was required, with no clone below 3-fold (corresponding to 99% accuracy).
For the finished sequence it is about 9-fold (99.99%).
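To get a feeling for what these coverage figures mean in practice, the sketch below estimates how many reads are needed per clone. It assumes that every read reaches the ~750 bp maximum mentioned in these notes and that a BAC insert is 150 kb long; the insert size and the function names are illustrative assumptions.

```python
import math

READ_LENGTH_BP = 750      # approximate maximum read length
BAC_INSERT_BP = 150_000   # assumed BAC insert size (illustrative only)

def reads_needed(target_bp, fold_coverage, read_length_bp=READ_LENGTH_BP):
    """Number of reads giving the requested average coverage,
    assuming every read has the full read length."""
    return math.ceil(fold_coverage * target_bp / read_length_bp)

def average_coverage(n_reads, target_bp, read_length_bp=READ_LENGTH_BP):
    """Average fold coverage = total sequenced bases / target length."""
    return n_reads * read_length_bp / target_bp

draft_reads = reads_needed(BAC_INSERT_BP, 4)      # draft: 4-fold
finished_reads = reads_needed(BAC_INSERT_BP, 9)   # finished: about 9-fold
print("draft:", draft_reads, "reads; finished:", finished_reads, "reads")
```

Real reads are often shorter than the maximum and are not spread evenly over the clone, so the actual number of reads required is higher.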
Draft human genome sequence
Published in February 2001 by the HGP in Nature (Feb. 15, 2001) and by Celera Genomics in Science (Feb. 16, 2001)
Both are freely accessible on the Internet
Human Genome on Nature web pages - free
The Sequence of the Human Genome, Venter et al.
Conclusions from draft sequence of human genome
- The human genome contains 3164.7 million bases
- The average gene consists of 3000 bases
- 30,000 – 40,000 protein-coding genes, more than 50% of them of unknown function
- The full set of proteins (the ‘proteome’) encoded by human genome is more complex than those of invertebrates
The wheat from the chaff
- Only 2% of the genome encodes proteins
- Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria
- About half of the human genome derives from transposable elements (but in the human genome most of them are inactive)
- Segmental duplications are much more frequent in humans than in yeast, fly or worm (the pericentric and telomeric regions are filled with them)
HGP and medicine
- Will help reveal which genes contribute to the risks for common diseases.
- Bring to light the molecular processes that normally maintain the human body in good working order.
- Allow the prediction of individuals' responsiveness to particular drugs (pharmacogenomics).
Single nucleotide polymorphism (SNP)
- Single nucleotide polymorphism is a polymorphism caused by the change of a single nucleotide.
- Most genetic variation between individual humans is believed to be due to SNPs.
- Over 1.42 million SNPs have been identified.
- The average density is one SNP every 1.9 kb.
- The order of almost all (99.9%) nucleotide bases is exactly the same in all people
Human Genome Organization (HUGO)
- Organization of scientists involved in the HGP
- Fosters the exchange of data and biochemicals
- Encourages the spreading and sharing of technologies
- Provides information on aspects of human genome projects
- Serves as an interface between the community of researchers and funding agencies