Expert download
1. Multiple FASTA file of all FSTs
NOTE: The format changed between release versions 23 and 24. Field 5 is different from before.
File description:
A text file in multiple FASTA format with all GABI-Kat FSTs with BLAST hits to the Arabidopsis thaliana genome. Each sequence was trimmed and/or masked to remove T-DNA derived sequences. In every sequence entry, the header line (the line that starts with ">") consists of five fields that are separated by a pipe "|":
1. internal sequence name,
2. EMBL/Genbank accession number,
3. sequence name in EMBL/Genbank,
4. GABI-Kat line ID, and
5. predicted T-DNA insertion position on TAIRv10 pseudochromosome.
In this dataset, the internal sequence name is a unique identifier, whereas the sequence name in EMBL/Genbank is not unique. The predicted T-DNA insertion position is now given on the TAIRv10 pseudochromosomes (these are the same as in TAIRv9). Example: "chr1:21514044".
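As an illustration of the header layout described above, here is a minimal Python sketch that splits a header line into its five pipe-separated fields and breaks the insertion position into chromosome and coordinate. The names `FstHeader`, `parse_fst_header` and `split_position` are hypothetical helpers, not part of any GABI-Kat tool, and the sketch assumes that field 5 always has the "chrN:position" form shown in the example.

```python
from typing import NamedTuple, Tuple


class FstHeader(NamedTuple):
    """The five pipe-separated header fields (field names are our own)."""
    internal_name: str   # field 1: internal sequence name (unique)
    accession: str       # field 2: EMBL/Genbank accession number
    embl_name: str       # field 3: sequence name in EMBL/Genbank (not unique)
    line_id: str         # field 4: GABI-Kat line ID
    insertion: str       # field 5: predicted insertion position, e.g. "chr1:21514044"


def parse_fst_header(line: str) -> FstHeader:
    """Split one FASTA header line (starting with '>') into its five fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = [f.strip() for f in line[1:].rstrip("\r\n").split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return FstHeader(*fields)


def split_position(insertion: str) -> Tuple[str, int]:
    """Split 'chr1:21514044' into ('chr1', 21514044).

    Assumption: field 5 always looks like '<chromosome>:<integer>'.
    """
    chrom, sep, pos = insertion.partition(":")
    if not sep:
        raise ValueError(f"no ':' in insertion position {insertion!r}")
    return chrom, int(pos)
```

For example, `parse_fst_header(">seq0001|AJ000001|FST0001|000A01|chr1:21514044")` returns a record whose `insertion` field can be passed to `split_position`. Only the position string comes from the description above; the other four values are made-up placeholders.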
Download: GK_tdna_v24.txt
(Multiple FASTA file of all FSTs, release 24, size 60 MB, download will open in a new tab/window.)
2. List of confirmed insertions
File description:
This text file includes a list of (almost) all GABI-Kat insertions that have been confirmed and donated to NASC. The file is tab-delimited and the columns are from left to right:
- LineID: the GK LineID,
- NASC-ID: the ID of the T3-Set at NASC,
- Accession-Number: EMBL/GenBank accession number for the FST that corresponds to this insertion,
- Hit: either the AGI gene code (for gene hits) or the BAC-ID is given,
- Position of insertion: the predicted T-DNA insertion position (position on TAIRv9 pseudochromosome)
Note: Because the predicted insertion sites differ in some cases between the TIGRv5 and TAIRv10 genome sequences, some lines are missing from this file. These will be added soon.
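The five-column layout described above can be read with Python's standard `csv` module. The sketch below is only an illustration: the column keys in `COLUMNS` and the function `read_confirmed` are names chosen here, not names used in the file. It assumes that blank lines and lines starting with "#" can be skipped. Whether the file begins with a column-title row is not stated above, so a caller may need to drop the first record.

```python
import csv
from typing import Dict, Iterable, Iterator

# Hypothetical keys for the five columns, in left-to-right order.
COLUMNS = ["line_id", "nasc_id", "accession", "hit", "position"]


def read_confirmed(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-delimited data row.

    Assumption: blank lines and lines starting with '#' carry no data.
    """
    data = (l for l in lines if l.strip() and not l.startswith("#"))
    for row in csv.reader(data, delimiter="\t"):
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}: {row!r}")
        yield dict(zip(COLUMNS, (field.strip() for field in row)))
```

A typical call would be `with open("gk_confirmed_insertions_v20110217.txt") as fh: rows = list(read_confirmed(fh))`, after which the rows can be filtered, for instance by the `hit` value.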
Download: gk_confirmed_insertions_v20110217.txt
(Tab-delimited text file with all confirmed insertions, size 390 kB)
3. Tab-delimited file of GABI-Kat confirmation primers
File description:
The file includes the primer information for the confirmed GABI-Kat lines available at NASC. Essentially, the same information is also available from the SimpleSearch pages for individual insertion alleles.
For example: http://www.gabi-kat.de/db/getseq.php?plantid=048E04&genecode=at5g43175
The tab-delimited file has seven fields. These are from left to right:
- LineID,
- gene ID or BAC (annotation unit) code,
- NASC-ID,
- gene specific primer sequence 1,
- border 1,
- gene specific primer sequence 2, and
- border 2.
The border field gives either LB or RB for the respective border. The respective T-DNA primer sequences used in the PCR for the two borders can be found in the header part of the file (lines beginning with #). The primer sequences are all in 5' to 3' direction. The majority of the insertions have only one confirmed border. As a result, the primer sequence 2 and border 2 fields are empty in many cases. In some cases, two insertions are confirmed for a single line. An example is 033D05, which contains confirmed insertion alleles for At1g11920 and At3g54350. Obviously, these insertions are segregating in the T2 families (T3 sets) available from NASC.
Note: Because the predicted insertion sites differ in some cases between the TIGRv5 and TAIRv9 genome sequences, primer data for some lines are missing. These will be added soon.
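To make the seven-field layout concrete, the following Python sketch reads the primer records, skips the "#" header lines, treats empty primer 2 and border 2 fields as missing, and groups records by LineID so that lines with two confirmed insertions (such as 033D05) come out together. `PrimerRecord`, `read_primers` and `group_by_line` are hypothetical names. The sketch also assumes that trailing empty fields might be dropped from a row, which the description above does not confirm.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional


class PrimerRecord(NamedTuple):
    """One data row of the primer file (field names are our own)."""
    line_id: str
    gene_or_bac: str
    nasc_id: str
    primer1: str
    border1: str                # "LB" or "RB"
    primer2: Optional[str]      # None when only one border is confirmed
    border2: Optional[str]


def read_primers(lines: Iterable[str]) -> Iterator[PrimerRecord]:
    """Yield primer records, skipping blank lines and '#' header lines."""
    data = (l for l in lines if l.strip() and not l.startswith("#"))
    for row in csv.reader(data, delimiter="\t"):
        row = [field.strip() for field in row]
        if len(row) < 5:
            raise ValueError(f"expected at least 5 fields, got {len(row)}: {row!r}")
        # Assumption: trailing empty fields may be absent, so pad to seven.
        row += [""] * (7 - len(row))
        yield PrimerRecord(*row[:5], row[5] or None, row[6] or None)


def group_by_line(records: Iterable[PrimerRecord]) -> Dict[str, List[PrimerRecord]]:
    """Collect records per LineID; a line may carry more than one insertion."""
    grouped: Dict[str, List[PrimerRecord]] = defaultdict(list)
    for rec in records:
        grouped[rec.line_id].append(rec)
    return dict(grouped)
```

With the real file, `group_by_line(read_primers(open("gk_primer_v20110217.txt")))["033D05"]` would be expected to hold one record per confirmed insertion allele of that line. The T-DNA border primer sequences themselves are in the "#" header lines, which this sketch deliberately ignores.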
Download: gk_primer_v20110217.txt
(Tab-delimited file of primers, size 520 kB)
Although these text files are big, they are easy to save: right-click the download link and choose "Save Link As" or "Save Target As".
Neither the use for commercial purposes, nor the redistribution of any data from the GABI-Kat SimpleSearch database to third parties nor the distribution of parts of files or derivative products to any third parties is permitted.
© Copyright 2002 - 2011 gabi-kat.de