The GABI-Kat "SimpleSearch" database contains information about the FSTs (flanking sequence tags) that have been produced in the context of GABI (renamed to PLANT 2030). The sequences included show significant similarity to sequences from the Arabidopsis thaliana genome. The sequences have been quality-trimmed, and the T-DNA part of the sequence has been removed.
SimpleSearch lets you find GK insertion alleles of A. thaliana lines, describes and visualizes the insertion position relative to the genome, links to the NASC stock centre if an allele is available from there, and lets you place orders for lines (see here).
Collection of links to the GABI-Kat database (GABI-Kat database front end)
- Search for T-DNA insertions in your genes of interest in our FST-based database SimpleSearch. For detailed information on how to perform the searches please see SimpleSearch help.
- Submit your request details online in the web order form. This is one of the required steps for completing your order.
- Check the status of your request at the request tracking page. You can only do this after you have received the e-mail confirming the receipt of your order.
- Get confirmation sequences electronically if you would like to have the electronic version of the sequences on the report sheet which you received together with the seeds.
There are four kinds of search possible:
1a) search for gene hits (up to 20 AGI-codes at a time, with SPACE as separator),
1b) text-based search for words (substrings) of the annotation text,
2) search for line ID or GenBank accession number,
3) pseudochromosome position based search for a genome area (range of positions),
4) sequence-based search using BLAST.
The "gene hit search" addresses FSTs which qualify as gene hits (see GABI-Kat FAQ for our definition of a "gene hit"). Genomic DNA is considered, so you will also find hits in the introns of these genes. You can either enter the AGI gene code, or a keyword to search the gene annotation text (you need to choose one of them). If you search by gene code, you can enter up to 20 A. thaliana gene codes at the same time. Separate the codes by SPACE. The keyword search allows you to do a substring search of the gene annotation text (see here for details on the data source).
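The input rule for the gene code search (at most 20 codes per query, separated by SPACE) can be prepared in a small script before pasting into the form. The following is a minimal sketch, not part of SimpleSearch itself: the function name build_gene_queries is hypothetical, and the regular expression assumes the usual AGI pattern (for example AT1G01010, or ATCG/ATMG codes for organellar genes).

```python
import re

# Assumption: AGI gene codes follow the pattern AT<1-5|C|M>G<5 digits>,
# e.g. AT1G01010. This pattern is not taken from the SimpleSearch help text.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")
MAX_CODES_PER_QUERY = 20  # limit stated in the help text


def build_gene_queries(codes):
    """Return SPACE-separated query strings with at most 20 codes each."""
    cleaned = []
    for code in codes:
        code = code.strip().upper()
        if not AGI_PATTERN.match(code):
            raise ValueError(f"not a valid AGI gene code: {code!r}")
        if code not in cleaned:  # drop duplicates, keep input order
            cleaned.append(code)
    # Split the cleaned list into chunks of 20 and join each with a space.
    return [
        " ".join(cleaned[i:i + MAX_CODES_PER_QUERY])
        for i in range(0, len(cleaned), MAX_CODES_PER_QUERY)
    ]
```

For example, build_gene_queries(["AT1G01010", "at2g01020"]) returns one query string, "AT1G01010 AT2G01020"; a list of more than 20 codes is split into several strings that can be submitted one after another.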
Searching for a line ID or the GenBank accession number of an FST will lead directly to the FST page ("Line and FST details"). At the top of the FST page, the line-specific information is displayed, such as availability and segregation analysis, followed by information for one or more FST hits found for the given line. For each hit the FST sequence, the respective sequenced BAC (more exactly: annotation unit), and information about the predicted insertion position are displayed. A link to confirmation sequences is given for lines that have been confirmed for an insertion and are available from NASC. Also, a link to "Primer and wt amplicons" is presented. The primer described is the one that we used for the confirmation of the predicted insertion in the line. The wt amplicon - if available - results from the usage of the line in the project GABI DUPLO. More information about this project can be found on the DUPLOdb help page.
The "Search for genome range" will list all GABI-Kat hits in the defined genome region. The input values should be pseudochromosome positions from TAIRv10.
The sequence-based search covers all FST sequences included in the database. Both nucleotide and protein sequences are accepted. You can limit the sequence divergence between input sequence and FST by decreasing the expect value.
The gene hit search result page shows the corresponding GABI-Kat line ID with a link to the FST sequence, the AGI gene code, and a link to a visualisation of the gene (or genome region) at the respective insertion position. Different hits (FST-based predictions) are distinguished using different colors:
Confirmed: The insertion is confirmed and has been donated to NASC.
Unknown: Work on this insertion has not been done yet, the insertion can be ordered.
Failed: The insertion could not be confirmed.
Outdated: The insertion prediction results from an original FST and has been updated with data from the confirmation sequence.
The visualisation tool in SimpleSearch (online since Dec 2012) displays more information than the previous version.
- Annotated genes are shown as bars below the genomic sequence representation. The annotation of the genes is colour coded. Coding sequences of protein coding genes are shown in dark blue, 5'- and 3'-UTRs in light blue. Introns are shown as thin lines. Transposable element genes are symbolised in light purple.
- If an insertion is confirmed in a specific line and the visualisation is accessed by clicking the graphic view icon from the "Line and FST details" page, more details are presented. Primers used for confirmation attempts are shown as brown triangles directly below the genomic sequence. Confirmation sequences are represented by orange bars.
- An example of a confirmed line with primers and confirmation sequences can be found here.
- In some cases, confirmation sequences can be found on both sides of an insertion. This is more frequently the case when a line (an insertion allele) has been processed in the context of the project GABI DUPLO. More information can be found on the DUPLOdb help page.
- To simplify access to genomic regions of interest it is possible (since July 2014) to jump to a specific gene of interest by entering the AGI code of the gene in the respective field in the visualisation tool.
Possible error messages from the web order form
Error messages like this:
- Firstname is empty or has invalid characters.
- E-mail address is empty or invalid.
- Valid input for institution is required.
... indicate that you need to add input or to correct the term you put into the respective field.
An error message like this:
- VAT number is required for the selected country.
... indicates that you selected an EU country but did not provide the VAT ID of your institute or institution.
An error message like this:
- VAT number is invalid, a valid VAT number looks like FR1234567890.
... indicates that you provided an invalid VAT ID.
Other, hopefully self-explanatory, error messages are:
- No gene or BAC code for line ID <lineid> entered.
- No line ID for gene or BAC code <gene/bac-code> entered.
- Line ID <lineid> has multiple genecodes. Please use one of them instead of the BAC-ID.
- Line ID <lineid> has been recently donated to NASC, but we have not received a NASC ID yet. Please order from NASC when the line becomes available from them.
- Line ID <lineid> is available from NASC (<nascid>), please order it from NASC.
- Line ID and gene or BAC code <lineid - gene/bac-code> don't match (the line has no insertion at that locus).
- Line ID <lineid> died - no seeds available.
- Confirmation of the insertion in <lineid - gene/bac-code> failed, it can not be ordered.
Possible error messages when searching for GenBank IDs or gene hits
Error messages like this after using the search form:
- Error, GenBank accession number FX892764 not found!
... indicate that there is no FST in the SimpleSearch database that hits the respective genomic region.
- Error, the FST represented by AccNo BX892764 has been withdrawn due to uselessness of this data.
... indicate that the respective FST has been withdrawn from the SimpleSearch database. In such cases an FST that hit the genomic region was available, but it turned out that our old insertion site prediction was invalid. Since we do not want to offer lines that have no chance of being confirmed, we removed these FSTs from SimpleSearch.
Overview of the structure of the SimpleSearch website (dynamic part)
The following picture gives an overview of how the different pages of SimpleSearch can be accessed.
How to link SimpleSearch pages from your website
If you want to link information from SimpleSearch on your website you can use the following links to access the data:
where [lineid] is the line that you are interested in.
where [sequenceid] is the sequence name from the report sheet.
- Segregation Results:
where [lineid] is the line that you are interested in.
where [lineid] is the line that you are interested in and [genecode] specifies the insertion.
- Graphic View (new version from December 2012):
where [lineid] is the line that you are interested in and [genecode] specifies the insertion in that line.
- Graphic View (old version):
where [genecode] is the Genecode you are interested in.
What is FASTA format
>name of the query sequence
TTCTAGGGGTTCTCTCAAATCTGCTCTTCAACCATGGCGGACGAATCTCAATACTCATCGGATACTTACTCCAACAAACG
CAAATACGAAGAACCAACCGCTCCTCCTCCATCAACTCGCAGACCTACCGGCTTCTCTTCTGGTCCGATCCCATCTGCTT
CAGTTGATCCCACCGCACCTACCGGTCTTCCACCTTCTTCTTACAACAGCGTTCCTCCTCCGATGGATGAAATCCAGATT
GCTAAACAAAAAGCACAAGAAATCGCTGCTCGTCTTCTTAATAGCGCTGATGCTAAACGTCCTCGTGTTGACAATGGTGC
TTCTTATGATTATGGTGACAACAAAGGATTTAGCTCATATCCCTCTGGTTCGTTCTTTAAAATCTCTTTTAACTTCTTTT
GTTTATGGAATTTACGGTTTGGAATTGAAAACTTACTGATTGTGATTTGATCTTGATTTAGAGGGTAAGCAGATGTC
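As the example above shows, a FASTA record consists of a description line starting with ">" followed by one or more lines of sequence data. The following minimal sketch shows how such text could be read in a script; the function name parse_fasta is hypothetical and the code is not part of SimpleSearch.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name = None   # description of the record currently being read
    chunks = []   # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

The sequence lines of each record are joined into one string, so the line length used in the input file does not matter.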
The nucleic acid codes supported are:
A --> adenosine            M --> A C (amino)
C --> cytidine             S --> G C (strong)
G --> guanine              W --> A T (weak)
T --> thymidine            B --> G T C
U --> uridine              D --> G A T
R --> G A (purine)         H --> A C T
Y --> T C (pyrimidine)     V --> G C A
K --> G T (keto)           N --> A G C T (any)
-     gap of indeterminate length
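To check a nucleotide sequence before pasting it into the search form, the codes listed above can be transcribed into a lookup table. This is a minimal sketch; the names NUCLEOTIDE_CODES, invalid_nucleotides and matches are hypothetical, and the actual input checking done by the SimpleSearch BLAST form may differ.

```python
# Codes transcribed from the table above: each code maps to the bases it
# can stand for. The gap character "-" is handled separately.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT", "B": "GTC", "D": "GAT",
    "H": "ACT", "V": "GCA", "N": "AGCT",
}


def invalid_nucleotides(sequence):
    """Return the sorted list of characters that are not supported codes."""
    allowed = set(NUCLEOTIDE_CODES) | {"-"}
    return sorted(set(sequence.upper()) - allowed)


def matches(code, base):
    """Return True if the concrete base is one the given code stands for."""
    return base.upper() in NUCLEOTIDE_CODES[code.upper()]
```

An empty list from invalid_nucleotides means that every character of the sequence is one of the supported codes.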
The accepted amino acid codes are:
A  alanine                   P  proline
B  aspartate or asparagine   Q  glutamine
C  cysteine                  R  arginine
D  aspartate                 S  serine
E  glutamate                 T  threonine
F  phenylalanine             U  selenocysteine
G  glycine                   V  valine
H  histidine                 W  tryptophan
I  isoleucine                Y  tyrosine
K  lysine                    Z  glutamate or glutamine
L  leucine                   X  any
M  methionine                *  translation stop
N  asparagine                -  gap of indeterminate length
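A protein query can be checked against the accepted codes in the same way. The sketch below reports where unsupported characters occur; the names AMINO_ACID_CODES and check_protein are hypothetical, and the set simply transcribes the list above (note that J and O are not in it).

```python
# One-letter codes transcribed from the list above, plus "*" (translation
# stop) and "-" (gap). J and O are not in the list and are rejected.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def check_protein(sequence):
    """Return (position, character) pairs, 1-based, for unsupported characters."""
    return [
        (pos, char)
        for pos, char in enumerate(sequence.upper(), start=1)
        if not char.isspace() and char not in AMINO_ACID_CODES
    ]
```

An empty result means the sequence only uses accepted codes; otherwise the positions point to the characters that need to be corrected.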