1. Multiple FASTA file of all FSTs
NOTE-1: the format changed between release versions 23 and 24! Field 5 differs from earlier releases.
NOTE-2: the annotation basis for gene structures is now Araport11 (as of 20160823), while the genome sequence is still TAIR9.

File description:
A text file in multiple FASTA format containing all GABI-Kat FSTs with BLAST hits to the Arabidopsis thaliana genome. Each sequence was trimmed and/or masked to remove T-DNA-derived sequences. In every sequence entry, the header line (the line that starts with ">") consists of five fields that are separated by a pipe character ("|"):

1. internal sequence name,
2. EMBL/GenBank accession number,
3. sequence name in EMBL/GenBank,
4. GABI-Kat line ID, and
5. predicted T-DNA insertion based on original FST (position on TAIR9 pseudochromosome).

In this dataset, the internal sequence name is a unique identifier, whereas the sequence name in EMBL/GenBank is not unique. The file includes FSTs generated from about 10,000 GABI-Kat lines by the Ecker lab (O'Malley et al., in preparation; see the News & history entry of 2016-01-09). The predicted T-DNA insertion is given as a position on the TAIR9 pseudochromosomes, for example "chr1:21514044".
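The header layout described above can be split with a few lines of Python. This is a minimal sketch, not an official parser: the names `FstHeader` and `parse_fst_header` are made up for illustration, and it assumes that field 5 always has the form "chrN:position" shown in the example. Entries that deviate from that form keep the raw text and get no numeric position.

```python
from typing import NamedTuple, Optional


class FstHeader(NamedTuple):
    internal_name: str       # field 1, unique within the file
    accession: str           # field 2, EMBL/GenBank accession number
    embl_name: str           # field 3, not unique
    line_id: str             # field 4, GABI-Kat line ID
    insertion: str           # field 5 as written in the file
    chromosome: Optional[str]
    position: Optional[int]


def parse_fst_header(header: str) -> FstHeader:
    """Split one FASTA header line into its five pipe-separated fields."""
    fields = [f.strip() for f in header.strip().lstrip(">").split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, found {len(fields)}: {header!r}")
    # Assumption: field 5 looks like "chr1:21514044".
    chrom, sep, pos = fields[4].partition(":")
    if sep and pos.isdigit():
        return FstHeader(*fields, chrom, int(pos))
    return FstHeader(*fields, None, None)
```

For a header such as ">seq1|ACC0001|name1|033D05|chr1:21514044" (all values except the position are placeholders), the function yields the line ID "033D05", the chromosome "chr1" and the position 21514044 as an integer.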

Download: GK_tdna_v28.txt
(Multiple FASTA file of all FSTs, release 28, size 64 Mb, download will open in a new tab/window.)



2. List of confirmed insertions

File description:
This text file lists all GABI-Kat insertions and insertion positions that have been confirmed and donated to NASC. It includes the "2nd borders" of insertions that have been studied at the "north" as well as the "south" junction of the inserted T-DNA with the genome (see Kleinboelting et al., 2015). In very few cases, "all" means "almost all": a line might be available from NASC although we lack a GenBank/ENA accession number for its FST. The file is tab-delimited, and the columns from left to right are:

- LineID: the GK LineID,
- NASC-ID: the ID of the T3-Set at NASC,
- Accession-Number: EMBL/GenBank accession number for the FST that corresponds to this insertion,
- Hit: either the AGI gene code (for gene hits) or the BAC-ID (annotation unit) is given,
- Position of insertion: the T-DNA insertion position as deduced from confirmation sequencing.
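A tab-delimited file with these five columns can be read with Python's standard csv module. The sketch below is only an illustration under assumptions: the dictionary keys and the function name `read_confirmed` are invented here, the file is assumed to contain no column-header row, and lines that are blank or start with "#" are skipped in case the file carries comments. The position is kept as text because its exact format is not specified above.

```python
import csv
from typing import Dict, Iterable, Iterator

# Hypothetical key names for the five columns, in file order.
COLUMNS = ("line_id", "nasc_id", "accession", "hit", "position")


def read_confirmed(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of the confirmed-insertions file."""
    data = (l for l in lines if l.strip() and not l.startswith("#"))
    for row in csv.reader(data, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {row!r}")
        yield dict(zip(COLUMNS, row))
```

Typical use would be `with open("gk_confirmed_insertions_v20160829.txt") as fh: rows = list(read_confirmed(fh))`, after which the rows can be filtered, for example by the AGI gene code in the "hit" key.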

Download: gk_confirmed_insertions_v20160829.txt
(Tab delimited text file with confirmed insertions, size 670 kb, download will open in a new tab/window.)



3. Tab-delimited file of GABI-Kat confirmation primers

File description:
The file includes the primer information for the confirmed GABI-Kat lines available at NASC. Essentially, the same information is also available from the SimpleSearch pages for individual insertion alleles.
The fields of the tab-delimited file are, from left to right:

- LineID,
- AGI gene code or BAC-ID (annotation unit) code,
- gene specific primer sequence 1,
- border 1,
- gene specific primer sequence 2, and
- border 2.

The border field gives either LB or RB for the respective border. The T-DNA primer sequences used in the PCR for the two borders can be found in the header part of the file (lines beginning with #). All primer sequences are given in 5' to 3' direction. The majority of the insertions have only one confirmed border. As a result, the primer sequence 2 and border 2 fields are empty in many cases. In some cases, two insertions are confirmed for a single line. An example is 033D05, which contains confirmed insertion alleles for At1g11920 and At3g54350. Such insertions are segregating in the T2 families (T3 sets) available from NASC.
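Because a line can carry more than one confirmed insertion and the second primer pair is often empty, it can help to group the rows by LineID while reading. The following sketch assumes the six fields listed above in that order; the key names and the function `read_primers` are illustrative only. Header lines starting with "#" are collected separately, since they hold the T-DNA primer sequences, and short rows are padded so that missing primer 2 and border 2 values become empty strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical key names for the listed fields, in file order.
FIELDS = ("line_id", "hit", "primer1", "border1", "primer2", "border2")


def read_primers(
    lines: Iterable[str],
) -> Tuple[List[str], Dict[str, List[Dict[str, str]]]]:
    """Return the '#' header lines and the primer records grouped by LineID."""
    header: List[str] = []
    by_line: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            header.append(line)  # T-DNA primer information
            continue
        if not line.strip():
            continue
        parts = line.split("\t")
        # Pad rows whose trailing (empty) fields were dropped.
        parts += [""] * (len(FIELDS) - len(parts))
        record = dict(zip(FIELDS, parts))
        by_line[record["line_id"]].append(record)
    return header, dict(by_line)
```

With this grouping, a line such as 033D05 would map to two records, one per confirmed insertion allele, and records with only one confirmed border have empty "primer2" and "border2" values.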

Download: gk_primer_v20160829.txt (Tab-delimited file of primers, size 800kb)



Although these text files are large, they are easy to save: right-click the download link and choose "Save Link As" or "Save Target As".



