Distribution
WGS = Whole Genome Shotgun.
EST = Expressed Sequence Tag.
BAC = Bacterial Artificial Chromosome.
Total-Genomic refers to the 22.43 Mb of Genomic and Zchr-BAC sequences used for analysis.
Genomic refers to 21.76 Mb of WGS sequences from chromosomes other than Z.
Zchr-BAC refers to 0.67 Mb of Z chromosome-derived BAC sequences.
EST refers to 6.3 Mb of sequence from ~9300 non-redundant ESTs.
Microsatellites are widely distributed in the B. mori genome, with about 3 kb of repeats per Mb of Genomic and EST sequence.
Microsatellite repeats make up 0.31% of the silkworm genome: mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats represent 0.110, 0.116, 0.053, 0.018, 0.006 and 0.003% of the genome, respectively.
Among trinucleotide repeats, TAA was the most abundant in the genomic sequences, comprising almost 50% of trinucleotide repeats, followed by GTA and TGA. Except for these three trinucleotide repeat types, all the remaining repeat types were over-represented in ESTs.
Of the 530 Mb silkworm genome, microsatellite repeats account for 1.63 Mb, equivalent to 0.31% of the genome.
The total numbers of mono-, di-, tri-, tetra-, penta- and hexanucleotide repeat units in the genome are 5.9 (0.59 Mb), 4.4 (0.62 Mb), 1.5 (0.28 Mb), 0.02 (0.10 Mb), 0.006 (0.03 Mb) and 0.002 (0.02 Mb) million, respectively.
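The genome fractions quoted above follow from dividing the megabases occupied by each repeat class by the 530 Mb genome size. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative and not part of the original analysis.

```python
# Values quoted in the text; names are illustrative assumptions.
GENOME_SIZE_MB = 530.0

# Megabases occupied by each repeat class, as quoted in the text.
REPEAT_MB = {
    "mono": 0.59,
    "di": 0.62,
    "tri": 0.28,
    "tetra": 0.10,
    "penta": 0.03,
    "hexa": 0.02,
}


def percent_of_genome(size_mb, genome_mb=GENOME_SIZE_MB):
    """Return size_mb expressed as a percentage of the genome size."""
    return 100.0 * size_mb / genome_mb


for repeat_class, size_mb in REPEAT_MB.items():
    print(f"{repeat_class:>5}: {percent_of_genome(size_mb):.3f}% of genome")

# Sum of the per-class values and its share of the genome.
total_mb = sum(REPEAT_MB.values())
print(f"total: {total_mb:.2f} Mb = {percent_of_genome(total_mb):.2f}% of genome")
```

Because the per-class values are rounded to two decimals, their sum comes to about 1.64 Mb rather than the 1.63 Mb quoted; both round to 0.31% of the genome.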
Among trinucleotide repeat tracks, TAA repeats were significantly over-represented in Zchr-BAC compared to Genomic and EST sequences. GCA, CGA and CGG were significantly more frequent in Zchr-BAC than in Genomic sequences, whereas GAA and GGA were completely absent. Tetra-, penta- and hexanucleotide repeat motifs were very scarce in the Zchr-BACs.
A/T stretches are far more common than C/G stretches.
A/T tracks of more than 20 repeat units are more common in Genomic and EST sequences.
Among dinucleotides, CG repeats are the least abundant. In ESTs, CA and GA repeats are as abundant as TA repeats, and CG repeats are relatively more frequent than in Genomic and Zchr-BAC sequences.
The number of trinucleotide repeats drops drastically as repeat length increases.
Tetra-, penta- and hexanucleotide repeats comprise a large number of repeat types, so we classified them by AT percentage, from 0 to 100. For example, ATTT is a 100% AT-rich tetranucleotide repeat, CTTT is a 75% AT-rich tetranucleotide repeat, ATCG is a 50% AT-rich tetranucleotide repeat, and so on. The largest number of tetranucleotide repeats fell in the 75% AT-rich (single C/G) category, followed by the 100% AT-rich (no C/G) category.
The 80% (single C/G) and 100% (no C/G) AT-rich repeat types constitute more than 60% of the total number of pentanucleotide repeats. The 83.3% (single C/G) and 100% (no C/G) AT-rich repeat types made up more than 50% of the total number of hexanucleotide repeats.
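The AT-percentage classification used for tetra-, penta- and hexanucleotide motifs can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names are assumptions, and the penta- and hexanucleotide motifs in the loop (ATTTC, ATTTTC) are hypothetical examples chosen to contain a single C/G.

```python
def at_percent(motif):
    """Percentage of A/T bases in a repeat motif, e.g. 'CTTT' -> 75.0."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    at_count = sum(base in "AT" for base in motif)
    return 100.0 * at_count / len(motif)


def at_category(motif):
    """Label a motif by its AT percentage, rounded to one decimal place."""
    return f"{at_percent(motif):.1f}% AT"


# The first three motifs are the examples from the text;
# the last two are hypothetical single-C/G motifs.
for motif in ("ATTT", "CTTT", "ATCG", "ATTTC", "ATTTTC"):
    print(motif, at_category(motif))
```

A single C/G in a motif of length 4, 5 or 6 gives 75%, 80% and 83.3% AT respectively, which is where the category names in the text come from.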
We observed a common phenomenon in which repeat length and repeat number were inversely related. This was more pronounced in trinucleotide repeats, especially because of their abundance in both the genome and ESTs.
For all repeat types, a large number of repeats are shorter than 15 bp; repeat tracks longer than 15 bp are very few.
The average length of mononucleotide repeats was greater than that of the other repeat classes.
Flanking sequences
To test the base composition surrounding the microsatellite repeats, we calculated the GC percentage of the 100 bases upstream and 100 bases downstream flanking different classes of repeat motifs.
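The flanking-sequence calculation described above can be sketched as below. The function names, the 0-based half-open coordinates, and the truncation of flanks at sequence ends are assumptions for illustration; the source does not say how repeats near sequence ends were handled.

```python
def gc_percent(seq):
    """GC percentage of a sequence; returns None for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return None
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)


def flanking_gc(sequence, start, end, flank=100):
    """GC percentage of the flanks around a repeat at sequence[start:end].

    Flanks are truncated at the sequence ends (an assumption).
    Returns (upstream_gc, downstream_gc).
    """
    upstream = sequence[max(0, start - flank):start]
    downstream = sequence[end:end + flank]
    return gc_percent(upstream), gc_percent(downstream)


# Toy sequence: a (TAA)5 repeat between a GC-rich and an AT-rich flank.
toy = "GCGCATGC" + "TAA" * 5 + "ATATATGC"
up, down = flanking_gc(toy, start=8, end=23, flank=8)
print(up, down)  # 75.0 25.0
```

With real data, `flank` would stay at its default of 100 to match the window size stated in the text.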
No significant differences in AT/GC composition were observed between upstream and downstream flanking sequences, except in the case of 100% AT-rich pentanucleotide repeats, where the downstream flanking sequence showed 5% lower GC content than the upstream sequence in both Genomic and EST sequences.
We observed a positive correlation between the GC content of the repeat and the GC content of the flanking sequences in all repeat types. As the GC content of the repeat increased, the GC content of the flanking sequence also increased, from a minimum of 30% to a maximum of 60%.
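One way to quantify such a relationship is a Pearson correlation coefficient between repeat GC content and flanking GC content. The sketch below is only an illustration: the source does not state which statistic was used, and the input values are made-up placeholders, not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (hypothetical, not data from the study):
# GC percentage of repeat motifs and of their flanking sequences.
repeat_gc = [0.0, 25.0, 50.0, 75.0]
flank_gc = [31.0, 38.0, 47.0, 52.0]
print(f"r = {pearson_r(repeat_gc, flank_gc):.2f}")
```

A value of r close to +1 would indicate that flanking GC content rises with repeat GC content, as described in the text.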