ANALYSIS

Microsatellites are simple sequence repeats that exhibit complex patterns in their frequency of occurrence, genomic distribution, mutability, function and evolution. Apart from being the source of popular genetic markers, microsatellites themselves have attracted considerable attention with respect to their origin, distribution, expansion, mutation and disintegration. Questions also remain about their functional role and their broader biological significance.

Genetic studies and whole-genome sequence analyses have established non-random distribution, variability and high mutability as characteristics of microsatellites, as summarised in the sections below. Evidence is accruing that supports a role for microsatellites in gene regulation, transcription and protein function. Qualitative and quantitative differences between the microsatellites of different genomes, and their role in adaptive evolution, have also been theorized. However, such studies require information on the type (mono to hexa), motif (GC%), abundance (motif preferences), frequency, distribution (linkage group-wise and chromosomal position), location (exon, intron, regulatory element, transposon), nature (perfect, imperfect and compound) and copy number (existence of paralogs) of microsatellites, not only on a whole-genome basis but also as a comparative analysis of multiple genomes defined by phylogeny.

Insects exhibit the greatest genetic diversity on earth, a diversity that has long puzzled mankind. Biologists have relied on insects to unravel many fundamental tenets of biology. Whole-genome sequences of insects have lived up to this reputation and have revealed immense variability in genome size and organization. Among others, five fully sequenced genomes are available: Drosophila melanogaster (a model organism that provides the most annotated data), Anopheles gambiae (another dipteran, economically highly important as a vector), Tribolium castaneum (from Coleoptera, a relatively older insect order), Apis mellifera (from Hymenoptera, a relatively recent insect order) and Bombyx mori (a lepidopteran, an order whose members include crop pests). Using these five fully sequenced insect genomes, the following questions may be addressed:

  • Are the microsatellites equally common everywhere in the genome?
  • Does the length of microsatellites have any relationship with their number?
  • Do the sequences flanking microsatellites have anything to do with the origin of microsatellites?
  • Does the microsatellite size affect microsatellite mutation rate?
  • Does the GC content of the microsatellite motif affect the length, repeat units, or mutation rate of microsatellites?
  • Do genomes possess hotspots and islands of microsatellites? In other words, do microsatellites occur as clusters (compound microsatellites)? Is there any favoured association of microsatellites in the compound repeats?
  • Do microsatellites occur in the genomes as families that share common flanking sequences?

InSatDb, with its interactive interface, allows users to obtain genome-level information on the frequency and distribution of microsatellites, motif-wise or across the board, in a single genome or for comparative genomic analysis. Users can also access information on microsatellite clusters (compound repeats) and on microsatellites with common flanking sequences (microsatellite families). The following section gives a flavour of the types of analysis that can be carried out using data obtained from InSatDb.

 

| Scientific name | Common name | Order | Chr. no. | Genome size (Mb) | GC% | Microsatellite content (% genome) | Microsatellites per Mb genome |
|---|---|---|---|---|---|---|---|
| Bombyx mori | Silkworm | Lepidoptera | 28 | 397.71 | 37.33 | 0.72 | 280 |
| Drosophila melanogaster | Fruit fly | Diptera | 4 | 118.36 | 42.45 | 1.56 | 538 |
| Anopheles gambiae | Mosquito | Diptera | 3 | 287.79 | 40.51 | 1.58 | 525 |
| Apis mellifera | Honey bee | Hymenoptera | 16 | 228.45 | 32.28 | 3.4 | 1035 |
| Tribolium castaneum | Red flour beetle | Coleoptera | 10 | 198.06 | 25.53 | 0.41 | 122 |

 

CHARACTERISTICS OF INSECT GENOMES

FREQUENCY OF MICROSATELLITE OCCURRENCE

TYPE OF MICROSATELLITES IN INSECT GENOMES

 

| Insect | No. of repeats | Perfect (%) | Imperfect (%) | Compound (%) |
|---|---|---|---|---|
| Bombyx mori | 111006 | 28.08 | 71.91 | 2.41 |
| Drosophila melanogaster | 63637 | 18.9 | 81 | 5.39 |
| Anopheles gambiae | 150936 | 30.48 | 69.51 | 4.7 |
| Apis mellifera | 236480 | 23.37 | 76.62 | 6.75 |
| Tribolium castaneum | 24246 | 14.03 | 85.96 | 2.55 |
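
The per-megabase densities in the genome table can be cross-checked against the repeat counts listed here. The following is a minimal Python sketch, assuming that density is simply the number of repeats divided by the genome size (the page does not state how the density was computed); the dictionary and function names are illustrative only.

```python
# Genome size (Mb) and number of repeats, copied from the two tables above.
GENOMES = {
    "Bombyx mori": (397.71, 111006),
    "Drosophila melanogaster": (118.36, 63637),
    "Anopheles gambiae": (287.79, 150936),
    "Apis mellifera": (228.45, 236480),
    "Tribolium castaneum": (198.06, 24246),
}


def repeats_per_mb(genome_size_mb, n_repeats):
    """Assumed definition: microsatellite count divided by genome size in Mb."""
    return n_repeats / genome_size_mb


densities = {
    name: repeats_per_mb(size_mb, n_repeats)
    for name, (size_mb, n_repeats) in GENOMES.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.1f} microsatellites per Mb")
```

Under this assumption, the computed values agree with the tabulated densities to within about one microsatellite per Mb.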

 


MOTIF WISE DISTRIBUTION OF MICROSATELLITES

 

| Insect | Mono | Di | Tri | Tetra | Penta | Hexa |
|---|---|---|---|---|---|---|
| Bombyx mori | 27.31 | 13.97 | 26.98 | 23.29 | 7.21 | 2.59 |
| Drosophila melanogaster | 18.98 | 27.81 | 25.34 | 12.30 | 5.48 | 10.04 |
| Anopheles gambiae | 8.7 | 48.09 | 31.26 | 8.51 | 1.99 | 1.40 |
| Apis mellifera | 23.88 | 36.10 | 20.77 | 12.34 | 4.18 | 2.7 |
| Tribolium castaneum | 21.29 | 4.32 | 38.97 | 23.73 | 7.74 | 3.93 |

 

TYPICAL MICROSATELLITES IN INSECT GENOMES

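| Insect | Type | Longest repeat motif | Number of repeat motifs | Length (bp) | Consensus match |
|---|---|---|---|---|---|
| Bombyx mori | Perfect | TATTC | 64 | 320 | 100 |
| Bombyx mori | Imperfect | AC | 298 | 596 | 75 |
| Drosophila melanogaster | Perfect | ACAGAT | 72.7 | 436 | 100 |
| Drosophila melanogaster | Imperfect | CCCAGT | 331 | 1986 | 99 |
| Anopheles gambiae | Perfect | AAGAAC | 83 | 498 | 100 |
| Anopheles gambiae | Imperfect | TTA | 466 | 1398 | 86 |
| Apis mellifera | Perfect | GCGAAG | 45 | 270 | 100 |
| Apis mellifera | Imperfect | AT | 1187 | 2374 | 66 |
| Tribolium castaneum | Perfect | AAAGAT | 77 | 462 | 100 |
| Tribolium castaneum | Imperfect | TATTCC | 217 | 1302 | 87 |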
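
The lengths in this table appear to follow from motif size multiplied by the number of repeat motifs. The short Python sketch below recomputes the copy number from the tabulated length, assuming exactly that relation; the function name is illustrative, and only the perfect repeats are included for brevity.

```python
# (motif, length in bp) of the longest perfect repeat per genome, from the table above.
LONGEST_PERFECT = {
    "Bombyx mori": ("TATTC", 320),
    "Drosophila melanogaster": ("ACAGAT", 436),
    "Anopheles gambiae": ("AAGAAC", 498),
    "Apis mellifera": ("GCGAAG", 270),
    "Tribolium castaneum": ("AAAGAT", 462),
}


def copy_number(motif, length_bp):
    """Assumed relation: number of repeat motifs = length / motif size."""
    return length_bp / len(motif)


for species, (motif, length_bp) in LONGEST_PERFECT.items():
    print(f"{species}: {motif} x {copy_number(motif, length_bp):.1f}")
```

A fractional copy number, as for the Drosophila ACAGAT repeat, indicates that the repeat ends with a partial motif.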

 

Does GC content influence microsatellite frequency?

  • On the whole, AT-rich microsatellites are abundant in the 5 insect genomes
  • Most of the microsatellites fall within the 20% GC bracket
  • In Apis, the 1-12% GC range holds as many as 43% of the microsatellites, mainly mono- and dinucleotide repeats, while the 45-55% GC range holds 12% of the microsatellites, of which 84% are dinucleotide repeats
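
The GC percentage used to bracket microsatellites in the points above is straightforward to compute. The following is a minimal Python sketch, assuming that GC% is the share of G and C bases in the repeat sequence; the function name is illustrative and is not part of InSatDb.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence (assumed definition of GC%)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Motifs taken from the table of longest repeats.
for motif in ("AT", "TATTC", "AAGAAC", "GCGAAG"):
    print(f"{motif}: {gc_percent(motif):.1f}% GC")
```

For a perfect repeat the GC% of the whole microsatellite equals that of its motif, so the motif alone is enough to place it in a GC bracket.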

Does length influence the microsatellite frequency?

  • On the whole, shorter microsatellites are abundant in the 5 insect genomes
  • As microsatellite length increases, the number of microsatellites decreases logarithmically, as typified by Bombyx
  • The number of microsatellites decreases drastically with increasing length in Drosophila, whereas Anopheles and Tribolium have longer microsatellites at a relatively high frequency
  • Across the 5 genomes analysed, more than 83% of the microsatellites are shorter than 50 bp; the figure is as high as 93% in Bombyx and 91% in Drosophila
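
A length profile of this kind can be obtained by binning microsatellite lengths and computing the share that falls below a cutoff. The Python sketch below is illustrative only: the toy lengths are invented, and the 10 bp bin width is an assumption rather than the binning used for the analysis.

```python
from collections import Counter


def length_profile(lengths, bin_width=10, cutoff=50):
    """Return (counts per length bin, fraction of repeats shorter than cutoff)."""
    bins = Counter((length // bin_width) * bin_width for length in lengths)
    share_short = sum(1 for length in lengths if length < cutoff) / len(lengths)
    return dict(sorted(bins.items())), share_short


# Hypothetical microsatellite lengths in bp, not taken from any genome.
toy_lengths = [12, 14, 15, 18, 21, 24, 27, 33, 48, 120]
bins, share_short = length_profile(toy_lengths)
print(bins)
print(f"{share_short:.0%} shorter than 50 bp")
```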

COMPOSITION OF THE MICROSATELLITES

Does GC content influence length of microsatellite?

  • The distribution of microsatellite lengths in relation to GC content is uniform across the five insect genomes
  • The average microsatellite length across GC content is 37±9 bp
  • Between 0-5% GC content, microsatellites tend to be longer than 60 bp
  • In Anopheles, very long microsatellites (104, 167 and 457 bp) were either AT rich (1% and 2% GC) or GC rich (97% GC), respectively
  • In Bombyx, between 48% and 89% GC, the average microsatellite size is significantly larger (48 bp) owing to longer tetra- and hexanucleotide repeats

Does GC content influence occurrence of imperfections in a microsatellite?

  • The relationship between the GC content of microsatellites and their imperfections mirrors the number of microsatellites at that GC content
  • The 40-60% GC range holds as many as 53% of Anopheles microsatellites, which are mainly dinucleotide repeats
  • The 40-70% GC range holds as many as 45% of Drosophila microsatellites, which are mainly dinucleotide repeats
  • The 1-12% GC range holds as many as 43% of Apis microsatellites, which are mainly mono- and dinucleotide repeats


Compound microsatellite analysis (length)

 

  • Red dotted lines indicate genome GC%
  • GC values are of microsatellite (m) and intervening (i) sequences

Combination of compound repeats

Top five compound repeats (in number)

 

| Rank | Bombyx mori | Apis mellifera | Anopheles gambiae | Drosophila melanogaster | Tribolium castaneum |
|---|---|---|---|---|---|
| 1 | tri-di (254) | di-di (1499) | di-di (1431) | tri-tri (522) | tri-tri (122) |
| 2 | mono-tri (240) | di-tri (1112) | tri-tri (1256) | di-di (362) | tetra-tri (51) |
| 3 | tri-tri (226) | di-mono (1107) | di-tetra (815) | tri-di (222) | tri-tetra (43) |
| 4 | mono-mono (185) | mono-mono (1055) | di-tri (791) | di-tri (209) | mono-tri (37) |
| 5 | tri-tetra (158) | tri-di (1051) | di-tetra (277) | mono-di (116) | mono-mono (36) |

 


  • Microsatellites occurring contiguously account for nearly 3.2% in the insect genomes analysed
  • Owing to its high density of microsatellites, Apis has a higher number of compound loci (6.12% of the total microsatellites)
  • The intervening sequences are longer than the microsatellites themselves
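
Combination labels such as "di-tri" can be derived by pairing neighbouring microsatellites on the same sequence. The Python sketch below is one possible way to do this, not the procedure used for InSatDb: the `max_gap` threshold for calling two repeats a compound locus is a hypothetical parameter, and the coordinates are invented.

```python
from collections import Counter

SIZE_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def compound_combinations(repeats, max_gap=100):
    """Count motif-size combinations of adjacent repeats.

    repeats: list of (start, end, motif) on one sequence, end exclusive.
    max_gap: hypothetical maximum length (bp) of the intervening sequence.
    """
    counts = Counter()
    ordered = sorted(repeats)
    for (_, end1, motif1), (start2, _, motif2) in zip(ordered, ordered[1:]):
        gap = start2 - end1
        if 0 <= gap <= max_gap:
            label = f"{SIZE_NAMES[len(motif1)]}-{SIZE_NAMES[len(motif2)]}"
            counts[label] += 1
    return counts


# Invented coordinates for illustration.
toy_repeats = [
    (100, 120, "AC"),
    (125, 146, "TTA"),
    (500, 520, "A"),
    (523, 540, "AG"),
    (900, 930, "GAT"),
]
combos = compound_combinations(toy_repeats)
print(combos.most_common())
```

Ranking the resulting counter with `most_common(5)` gives a top-five list of the kind tabulated above.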

Do longer microsatellites tend to have more decomposed repeat sequences?

 

  • Shorter microsatellites not only predominate in the microsatellite population of the 5 insect genomes, they also seem to carry a higher number of imperfect bases
  • Nearly 90% of the microsatellites are 50 bp or shorter, and these constitute the bulk of imperfect microsatellites
  • The majority of imperfect microsatellites in this range are dinucleotide repeats in Drosophila, Apis and Anopheles, and tri- and tetranucleotide repeats in Bombyx and Tribolium
  • A higher frequency of imperfect microsatellites is found in Drosophila, and the lowest in Anopheles
  • Beyond 100 bp, microsatellites are mostly of the perfect type

Microsatellite Families

For each microsatellite, 100 bp of 5' and 3' flanking sequence were extracted. Sequences were matched by an all-versus-all BLAST with the parameters e=1, match=95% and alignment length=85%. Microsatellites were then grouped on the basis of sequence match (both +/+ and +/- alignments) using a Perl script.
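
The grouping step can be illustrated in Python. This is a sketch, not the original Perl script: it swaps in a union-find structure to merge microsatellites linked by qualifying hits, and it assumes that "alignment length=85%" means the alignment covers at least 85% of the 100 bp flank. The hit tuples, identifiers and function names are hypothetical; strand is ignored because both +/+ and +/- alignments were accepted.

```python
def flanks(sequence, start, end, flank=100):
    """Return the (5' flank, 3' flank) of a microsatellite at [start, end)."""
    return sequence[max(0, start - flank):start], sequence[end:end + flank]


def group_families(hits, min_identity=95.0, min_coverage=0.85, flank_len=100):
    """Group microsatellites whose flanks match into families.

    hits: iterable of (query_id, subject_id, percent_identity, alignment_length).
    """
    parent = {}

    def find(node):
        # Find the family representative, compressing the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for query, subject, identity, aln_len in hits:
        if query == subject:
            continue  # skip self-hits from the all-versus-all search
        if identity >= min_identity and aln_len / flank_len >= min_coverage:
            parent[find(query)] = find(subject)

    families = {}
    for member in parent:
        families.setdefault(find(member), set()).add(member)
    return sorted(sorted(members) for members in families.values())


# Hypothetical BLAST hits for illustration.
toy_hits = [
    ("ms1", "ms1", 100.0, 100),
    ("ms1", "ms2", 97.0, 92),
    ("ms2", "ms3", 96.0, 88),
    ("ms4", "ms5", 99.0, 60),   # fails the coverage threshold
    ("ms6", "ms7", 90.0, 100),  # fails the identity threshold
]
families = group_families(toy_hits)
print(families)
```

Because membership is transitive in this sketch, ms1 and ms3 end up in the same family even though they are linked only through ms2; whether the original script behaved the same way is not stated.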

Examples of microsatellite families of different species (figures): Bombyx mori, Drosophila melanogaster, Anopheles gambiae, Apis mellifera and Tribolium castaneum.

 

ORIGIN OF MICROSATELLITES

Neutrality of microsatellite origin



  • Microsatellites in insects are AT rich (23.4% GC on average)
  • However, they exist within regions that are not always AT rich
  • In Tribolium and Apis, the microsatellites exist in regions that have GC content less than that of the genome; in Bombyx, Drosophila and Anopheles, GC content of the flanking regions is greater than that of the genome
  • In Drosophila and Anopheles, average GC content of the microsatellites is almost equivalent to that of the genome

Does GC content influence microsatellite expansion?

  • Microsatellite expansion takes place during DNA replication, repair or recombination and hence GC content could have a bearing on the expansion process
  • Among the 5 insects, there was no linear correlation between GC content and the average number of repeat units
  • There are three peaks: 30% of Tribolium microsatellites (at 5-20% GC, nearly half of which are mono repeats); about 7% of Bombyx microsatellites (at 45-65% GC, more than half are dinucleotide repeats); Bombyx microsatellites (at 75-95% GC, half are trinucleotide repeats)

Do imperfections occur all along the microsatellite?

    • The existence, expansion and replication of a microsatellite depend upon boundary delimitation
    • Imperfect repeat units originate through substitutions and indels
    • Analysis of the 5 genomes suggests that interruptions, if any, occur mainly in the middle region of the repeat sequence, and that the ends seem to be selected against decomposition
    • Anopheles has the fewest non-consensus bases in its microsatellites. Although Tribolium has the most imperfect bases throughout, they are negligible at the 5' and 3' extremes
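
One simple way to examine where imperfections fall is to compare a microsatellite against a perfect tiling of its motif and tally mismatches by relative position. The Python sketch below makes two loud assumptions: only substitutions are considered (indels would shift the reading frame and need an alignment step), and the repeat is split into three equal regions. The function name and the example sequence are illustrative.

```python
def mismatch_profile(seq, motif, n_bins=3):
    """Count non-consensus bases in n_bins equal regions from the 5' to the 3' end.

    Assumes seq starts in phase with motif and contains substitutions only.
    """
    counts = [0] * n_bins
    for i, base in enumerate(seq):
        if base != motif[i % len(motif)]:
            region = min(i * n_bins // len(seq), n_bins - 1)
            counts[region] += 1
    return counts


# An (AC)n repeat with one C->T substitution near the middle.
profile = mismatch_profile("ACACACATACACACAC", "AC")
print(profile)  # [5' region, middle region, 3' region]
```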

    DISTRIBUTION OF MICROSATELLITES

    The availability of physical maps of the chromosomes facilitates visualising microsatellite distribution at a macro scale.

    Legend
    X axis: size in base pairs     Y axis: frequency per 1000 bp window
    Green bar: genes (Genscan predictions), Red bar: microsatellites (Tandem Repeats Finder predictions), Black bar: repetitive elements (RepeatMasker predictions)
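
The per-window frequency on the Y axis can be computed by assigning each feature to a fixed-size window along the chromosome. The following is a minimal Python sketch, assuming each feature is counted once in the window that contains its start position; the function name and the coordinates are hypothetical.

```python
def window_counts(starts, chrom_length, window=1000):
    """Number of features starting in each consecutive window of `window` bp."""
    n_windows = (chrom_length + window - 1) // window  # round up
    counts = [0] * n_windows
    for pos in starts:
        counts[pos // window] += 1
    return counts


# Invented 0-based start positions on a 5000 bp sequence.
toy_starts = [120, 980, 1001, 1500, 1999, 4200]
counts = window_counts(toy_starts, chrom_length=5000)
print(counts)
```

Running the same function on gene, microsatellite and repetitive-element coordinates gives three series that can be plotted against the same X axis.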

     

     

    Occurrence of microsatellites in Introns and Exons

     

    | | Bombyx | Apis | Drosophila | Anopheles | Tribolium |
    |---|---|---|---|---|---|
    | % of microsatellites on exons | 1.3 | 2.2 | 7.1 | 6.7 | 3.2 |
    | % of microsatellites on introns | 27.3 | 16.01 | 15.45 | 19.82 | 18.73 |