Genomes are interspersed with simple repeats, in which a unit of one to several base pairs is repeated in tandem. Based on the length of the repeated stretch, they are classified as:
In the literature, microsatellites are variously referred to as simple sequence repeats (SSR), short tandem repeats (STR), and variable number tandem repeats (VNTR). Although microsatellites occur in coding regions, they exist predominantly in non-coding regions, where they evolve neutrally. In coding regions, selection against frameshift mutations prevents their expansion, except in the case of tri-nucleotide repeats. Such expansions are important for various reasons. For instance, in humans tri-nucleotide expansion is associated with diseases such as Fragile X syndrome (expansion of CGG repeats to >200 copies in the 5' UTR of the FMR1 gene) and Friedreich ataxia (expansion of a GAA repeat in the FRDA gene).
Microsatellites as markers
Microsatellites for comparative genomic analysis of insects
Apart from being the source of popular genetic markers, microsatellites per se have attracted a lot of attention with respect to their evolution, distribution, expansion, mutation, and disintegration. Questions are also asked about the functional role and the broader biological significance of microsatellites. Genetic studies and whole genome sequence analysis have established various characteristics of microsatellites as listed below.
Insects exhibit the greatest genetic diversity on earth, which has long puzzled mankind. Biologists have relied on insects to unravel many fundamental tenets of biology. Whole genome sequences of insects have lived up to this reputation and have revealed immense variability in genome size and organization. Among others, five fully sequenced genomes are available: Drosophila melanogaster (a model organism that provides the most annotated data), Anopheles gambiae (another Dipteran, economically highly important as a vector), Tribolium castaneum (from the relatively older insect order Coleoptera), Apis mellifera (from a relatively recent insect order, Hymenoptera) and Bombyx mori (a Lepidopteran, an order whose members include crop pests). Using these five fully sequenced insect genomes, the following questions may be addressed:
Extraction of Microsatellites
Microsatellites were extracted from five whole genome sequences, namely Bombyx mori, Drosophila melanogaster, Apis mellifera, Anopheles gambiae and Tribolium castaneum.
Source of Sequences
We extracted perfect and imperfect (caused by substitutions and indels) microsatellites.
Example:
Perfect microsatellite:
consensus pattern (2 bp): AT
microsatellite sequence: AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT
consensus sequence: AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT
Imperfect microsatellite:
consensus pattern (4 bp): GATA
microsatellite sequence: GATA GATC GATA GAT- GATA GATA
consensus sequence: GATA GATA GATA GATA GATA GATA
Size of microsatellites was restricted as follows
The whole genome sequences were submitted to Tandem Repeats Finder (TRF) version 4 (http://tandem.bu.edu/trf/trf.html) to extract microsatellites. Input to the program consists of the following parameters:
Detection Parameters: Matching probability Pm = 0.80 and indel probability Pi = 0.10.
Minimum alignment score: 30. Microsatellites that meet or exceed the alignment score are reported.
The microsatellites were extracted using two sets of parameters, 2,-4,-5 and 2,-5,-7, to maximise the number of microsatellites within the defined limits. The two result sets were then combined and redundancy was removed. Since TRF extracts microsatellites with a constraint to generate the best possible score for a repeat stretch, it often returns repeats such as "(AAAAA)5" for a nucleotide stretch of 25 'A's, instead of (A)25. Such errors were corrected during the verification stage.
Extraction of Repetitive Elements and Genes
Repetitive elements: Short stretches of DNA with the capacity to move between different points within a genome. The insect genomes were submitted to RepeatMasker (http://www.repeatmasker.org/) to extract the repetitive elements.
Genes: A unit of hereditary information. A gene is a piece of a DNA molecule that specifies the production of a particular protein. The masked sequence from RepeatMasker was submitted to GENSCAN (http://genes.mit.edu/GENSCAN.html) to extract genes.
Locations of microsatellites were determined from the coordinates of the repetitive elements and genes. If a microsatellite overlaps both a gene and a repetitive element, it is reported as located in the repetitive element.
Extraction of Compound Repeats
A compound microsatellite repeat is defined here as an occurrence of two or more microsatellites contiguously, with intervening non-repeat sequence of <= 70 bp.
Extraction of microsatellite families
A microsatellite family consists of all those microsatellites occurring in a genome that possess highly matching flanking sequences, the stringency of the sequence match being: percentage match=95% and alignment length=85%. Please note that various authors describe a microsatellite family as a set of microsatellites with similar features, most often the repeat motif. However, in the context of the present study, microsatellite family members are paralogues with respect to their flanking sequences.
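Two of the post-processing rules described above can be illustrated with a short sketch: reducing a reported motif to its shortest period (so that "(AAAAA)5" becomes "(A)25"), and chaining neighbouring microsatellites into compound repeats when the intervening sequence is at most 70 bp. This is not the code used to build the database. The function names, the (start, end) tuple layout, and the use of 1-based inclusive, non-overlapping coordinates on a single sequence are all assumptions made for illustration.

```python
from typing import List, Tuple


def minimal_motif(motif: str, copies: int) -> Tuple[str, int]:
    """Reduce a repeat unit to its shortest period.

    Example: ("AAAAA", 5) becomes ("A", 25).
    """
    n = len(motif)
    for period in range(1, n + 1):
        # A valid period divides the motif length and rebuilds the motif.
        if n % period == 0 and motif[:period] * (n // period) == motif:
            return motif[:period], copies * (n // period)
    return motif, copies  # only reached for an empty motif


def compound_repeats(
    loci: List[Tuple[int, int]], max_gap: int = 70
) -> List[List[Tuple[int, int]]]:
    """Chain microsatellites separated by at most max_gap bp.

    loci holds (start, end) pairs on one sequence; coordinates are
    assumed to be 1-based, inclusive and non-overlapping.
    """
    groups: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    for start, end in sorted(loci):
        # Intervening bases between the previous locus and this one.
        if current and start - current[-1][1] - 1 <= max_gap:
            current.append((start, end))
        else:
            if len(current) >= 2:
                groups.append(current)
            current = [(start, end)]
    if len(current) >= 2:
        groups.append(current)
    return groups
```

For example, `compound_repeats([(1, 20), (50, 80), (200, 230)])` chains the first two loci (29 intervening bases) and leaves the third on its own, because 119 bases separate it from its neighbour.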
For each microsatellite, 100 bp of 5' and 3' flanking sequence were extracted. Sequence matching was done by performing an all-versus-all BLAST with parameters e=1, match=95% and alignment length=85%. Grouping of microsatellites based on sequence match (both +/+ and +/- alignments) was done using a Perl script.