BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a very fast search algorithm that can be used to search either protein or DNA sequence databases. BLAST is best suited to sequence similarity searching rather than motif searching.
A fairly complete online guide to BLAST searching can be found in the NCBI BLAST Help Manual.
BLAST searches offered by Wild Silkbase allow users to compare any query sequence to Antheraea assama, Samia cynthia ricini and Antheraea mylitta EST sequence datasets.
Program
Wild Silkbase offers these three BLAST programs to accommodate different types of searches:
BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset.
TBLASTX compares the six-frame translations of a DNA query sequence to the six-frame translations of a nucleotide sequence dataset.
TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands).
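The six-frame translation that TBLASTX and TBLASTN rely on can be illustrated with a minimal Python sketch. It assumes the standard genetic code and is not the code BLAST itself uses; the function names are chosen here for illustration only.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate frames +1..+3 (given strand) and -1..-3 (reverse strand)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Calling six_frame_translations on a nucleotide string returns a dictionary keyed by frame label ("+1" to "-3"); stop codons appear as "*".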
Query sequence
A sequence can be submitted for a BLAST search in two different ways: it can be typed or pasted into the text box, or it can be uploaded from a text file on your computer. All sequences must be in FASTA format, i.e., each sequence begins with the ">" character in the first position, followed by descriptive text (the "definition line"). One or more lines containing the sequence then follow. These lines may be of varying length and should contain only sequence characters that are valid to BLAST.
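Before submitting, a query can be checked against the FASTA rules described above. The following is a minimal sketch, not part of Wild Silkbase; the function names are hypothetical, and the set of accepted nucleotide characters (the IUPAC codes) is an assumption about what BLAST considers valid.

```python
# Assumed alphabet: IUPAC nucleotide codes. Adjust for protein queries.
VALID_NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before a '>' definition line")
        else:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_characters(sequence, alphabet=VALID_NUCLEOTIDES):
    """Return the sorted characters of `sequence` that are not in `alphabet`."""
    return sorted(set(sequence.upper()) - alphabet)
```

Sequence lines of varying length are simply concatenated, which matches the format description above.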
Wild Silkbase offers a selection of sequence databases that can be searched, depending on the user's requirements.
"All ESTs"
"Antheraea assama (All)"
"Antheraea assama Embryo (96 Hrs)"
"Antheraea assama Brain"
"Antheraea assama Testis"
"Antheraea assama Ovary"
"Antheraea assama Midgut"
"Antheraea assama Fatbody"
"Antheraea assama Middle Silkgland"
"Antheraea assama Posterior Silkgland"
"Antheraea assama Epidermis"
"Antheraea assama Compound Eye"
"Samia cynthia ricini (All)"
"Samia cynthia ricini Embryo (96 Hrs)"
"Samia cynthia ricini Fatbody I+II"
"Samia cynthia ricini Fatbody I"
"Samia cynthia ricini Fatbody II"
"Antheraea mylitta (All)"
"Antheraea mylitta Fatbody"
Options
Changing the E-value determines the stringency of a BLAST search. A lower E-value increases the stringency (to be used if short and/or very A/T-rich sequences are submitted); a higher E-value decreases the stringency of a search. The default is 0.1, which means no alignment with an E-value higher than 0.1 is displayed.
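The effect of the E-value cutoff can be pictured as a simple filter over the list of hits. The sketch below assumes hits are held as Python dictionaries with an "evalue" key; that layout and the hit names are hypothetical and are not the Wild Silkbase result format.

```python
def filter_hits(hits, max_evalue):
    """Keep hits whose E-value does not exceed max_evalue, best hit first."""
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])


# Hypothetical hits, for illustration only.
example_hits = [
    {"subject": "hit_B", "evalue": 0.05},
    {"subject": "hit_C", "evalue": 3.0},
    {"subject": "hit_A", "evalue": 2e-30},
]

strict = filter_hits(example_hits, 1e-10)  # only very strong matches remain
loose = filter_hits(example_hits, 10)      # weak matches are also reported
```

Lowering the cutoff shortens the list to the most significant matches; raising it lets weaker matches through.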
The number of Alignments to show determines how many alignments are displayed.
The number of Descriptions to show determines how many one-line descriptions are displayed.
The default Word Size is 11 nucleotides for DNA and 3 amino acids for proteins.
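BLAST starts its search from short words taken from the query, so the word size controls how many such words a query yields. A minimal sketch of how a query is broken into overlapping words follows; the function name is chosen here for illustration, and the real program does considerably more (for example, scoring neighbouring words for proteins).

```python
def query_words(sequence, word_size):
    """Return every overlapping word of length word_size in the sequence."""
    seq = sequence.upper()
    return [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]


dna_words = query_words("ACGTACGTACGTA", 11)  # nucleotide default word size
protein_words = query_words("MKVLA", 3)       # protein default word size
```

A query shorter than the word size yields no words at all, which is one reason very short queries may return no hits.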
The Matrix option selects the substitution matrix used to score protein alignments. The BLOSUM matrix assigns a probability score for each position in an alignment that is based on the frequency with which that substitution is known to occur among consensus blocks within related proteins. BLOSUM 62, the default, is among the best of the available matrices for detecting weak similarities. Other supported options are BLOSUM 45, BLOSUM 80, PAM 30, and PAM 70. Adjustments to the matrix may be in order when searching for very distant relatives of the query.
Filtering is ON by default and masks low-complexity regions of the query sequence. In a protein search, low-complexity regions appear as X's in the alignment, while in a nucleotide search they appear as n's. The score and E-value of a match may be affected slightly by filtering, since it effectively shortens the query length. The DUST (nucleotide) and SEG (protein) algorithms are used.
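To show what masking does to a nucleotide query, the sketch below replaces long single-base runs with n's. This is a deliberately simplified stand-in and is not the DUST algorithm, which scores low-complexity regions more generally; the function names and the run-length threshold are assumptions made for this example.

```python
import re


def mask_homopolymers(sequence, min_run=8):
    """Replace runs of at least min_run identical bases with 'n' characters."""
    seq = sequence.upper()
    # One base followed by (min_run - 1) or more copies of the same base.
    pattern = re.compile(r"([ACGT])\1{" + str(min_run - 1) + r",}")
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


def effective_length(masked_sequence):
    """Count the positions that were not masked."""
    return sum(1 for base in masked_sequence if base != "n")
```

The drop in effective length after masking illustrates why the score and E-value of a match can shift slightly when filtering is on.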
Results
BLAST search results are returned directly to the user's web browser in HTML format. The sequence IDs on the BLAST result page are further linked to information such as organism name, tissue type, sequence length, Unigene ID and sequence.
A link to a Clustal W alignment file of the sequences matched in the databases is also provided on the result page.
The keyword search option allows users to search the database by keyword. The user can choose from three search options: GO Terms, EST Clone ID and Unigene ID. The wildcard character "%" can be used in the search term to broaden the search results.
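The "%" wildcard behaves like the wildcard of the SQL LIKE operator, where it matches any run of characters. The sketch below demonstrates this with an in-memory SQLite table; it is an assumption that the keyword search works this way internally, and the table name, column names and unigene IDs are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table with hypothetical unigene annotations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unigene (unigene_id TEXT, go_term TEXT)")
    conn.executemany(
        "INSERT INTO unigene VALUES (?, ?)",
        [
            ("UG0001", "protein binding"),
            ("UG0002", "ATP binding"),
            ("UG0003", "protein kinase activity"),
        ],
    )
    return conn


def search_go_terms(conn, keyword):
    """Return rows whose GO term matches keyword; '%' matches any characters."""
    cursor = conn.execute(
        "SELECT unigene_id, go_term FROM unigene "
        "WHERE go_term LIKE ? ORDER BY unigene_id",
        (keyword,),
    )
    return cursor.fetchall()
```

With this demo table, a search for "%binding" matches every term ending in "binding", while "protein%" matches every term starting with "protein".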
The Homolog Finder lets users find the homolog of a query sequence in six whole insect genomes (Aedes aegypti, Anopheles gambiae, Apis mellifera, Bombyx mori, Drosophila melanogaster and Tribolium castaneum). The result page shows the single sequence matched in the selected database.
SSR Finder provides tabulated data on the SSRs of the selected organism and repeat type. Tandem Repeat Finder is used to extract SSRs with specific parameter settings (match = 2, mismatch = 3, indel = 5, match probability = 0.8, indel probability = 0.1, minimum score = 25, maximum period = 10).
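For readers unfamiliar with SSRs, the sketch below finds perfect tandem repeats with a regular expression. It is not Tandem Repeat Finder and does not use the scoring parameters listed above, so it cannot detect repeats containing mismatches or indels; the function names and the min_copies and min_length thresholds are assumptions made for this example. Only the maximum period of 10 is taken from the settings above.

```python
import re


def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_perfect_ssrs(sequence, max_period=10, min_copies=3, min_length=12):
    """Return sorted (start, repeat_unit, copies) for perfect tandem repeats."""
    seq = sequence.upper()
    found = []
    for period in range(1, max_period + 1):
        # A unit of `period` bases followed by at least min_copies - 1 copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1)
        )
        for match in pattern.finditer(seq):
            unit = match.group(1)
            length = len(match.group(0))
            # Skip units like "ACAC", which are reported under period 2.
            if length >= min_length and is_primitive(unit):
                found.append((match.start(), unit, length // period))
    return sorted(found)
```

Each result gives the zero-based start position, the repeat unit and the number of copies, which corresponds to the kind of information a tabulated SSR listing contains.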
GO Viewer allows users to view the EST unigene sequences according to Gene Ontology (GO) terms. GO terms are assigned to sequences by BLAST similarity search against the GO seqdblite database, which contains GO terms, gene products and the sequences associated with these gene products. The seqdblite database is the same as seqdb, except that all IEA (Inferred from Electronic Annotation) associations have been removed. The IEA associations provide relatively little value compared to the curated associations, and they slow querying down immensely.
cSNPs are detected with the SEAN software (D. Huntley, A. Baldo, S. Johri, and M. Sergot (February 15, 2006) SEAN: SNP prediction and display program utilizing EST sequence clusters. Bioinformatics, 22(4): 495-496). At present only cSNPs of Antheraea mylitta are available; the data for Antheraea assama and Samia cynthia ricini will be added soon.
The Download page is password protected to prevent unauthorised use of the data. Users are requested to e-mail wildsilkbase@cdfd.org.in for the login ID and password. All sequence and annotation data are available for download. The sequence files are in FASTA format and zipped, while the annotation data files are in CSV format, which can be opened directly in Excel.