DUPLOdb help

 

The "DUPLOdb" database contains information about paralogous gene pairs in the Arabidopsis thaliana genome, which were identified by the GABI-DUPLO project (see partners). Each pair of protein-coding genes that displays more than 60% similarity AND less than 20% gaps (in a pairwise amino acid sequence alignment) was considered and received a unique pairID. Starting from the Complete pair list, we used several criteria to select those gene pairs for which construction of a double mutant is feasible. The first subset identified was the Genetically unlinked pairs, and from these the DUPLO gene pairs. These lists are explained on the start page of DUPLOdb. In brief, T-DNA insertion alleles from the GABI-Kat and SALK collections were assigned to the respective genes in the pairs wherever useful alleles are available.

If a pair is marked green in any of the lists or in the search results, the respective double mutant (DM) is available at NASC. If the pair ID is dark green, a phenotype description is available.

For some of the "DUPLO gene pairs" listed, we were able to create double mutants during the project. Unfortunately, the funding was not continued, so work on further double mutants could not be finished.

 

Search options in DUPLOdb:

 

There are four kinds of search possible in DUPLOdb:
1) Search for gene hits and filter for (DUPLO) pairs
2) Search for pair ID
3) Search for line ID
4) Search in double mutant description
A DUPLO pair is a true pair of exactly two genetically unlinked genes for which we have been able to assign a useful insertion allele to BOTH "parental" genes.

 

In the gene hits search, one or several AGI code(s) of interest can be entered. If the respective gene is represented by a GABI-Kat or SALK line in DUPLOdb, the output list shows line ID, pair ID, gene annotation and status of the insertion in the line. A second option is to search the gene annotation text (imported from TAIRv10). This is a substring search: if it is successful, all lines in DUPLOdb whose annotation contains the search text are listed.
In addition, there are four filtering options. The search can be restricted to
- all (predicted) insertions / alleles (no filter)
  (this allows access to ALL insertions in DUPLOdb, no matter if they are assigned to pairs or not),
- (predicted) insertions / alleles assigned to pairs
  (filters for predicted insertions and confirmed insertion alleles that are assigned to pairs),
- (predicted) insertions / alleles in DUPLO pairs 
  (filters for predicted insertions and confirmed insertion alleles that are assigned to DUPLO pairs), and
- alleles in pairs that have a DM at NASC
  (filters for availability of a line set at NASC containing both parents and the double mutant).
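
The four filtering options can be read as increasingly strict conditions on an insertion record. The following is a minimal sketch of that reading, not the DUPLOdb implementation. The record fields (`pair_id`, `in_duplo_pair`, `dm_at_nasc`), the option names and the function `apply_filter` are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InsertionFilter(Enum):
    """Hypothetical names for the four filtering options listed above."""
    ALL = "all (predicted) insertions / alleles"
    IN_PAIRS = "assigned to pairs"
    IN_DUPLO_PAIRS = "in DUPLO pairs"
    DM_AT_NASC = "in pairs that have a DM at NASC"


@dataclass
class Insertion:
    """Illustrative record; the field names are assumptions, not the DUPLOdb schema."""
    line_id: str
    pair_id: Optional[int] = None  # None: the insertion is not assigned to a pair
    in_duplo_pair: bool = False    # the assigned pair is a DUPLO pair
    dm_at_nasc: bool = False       # a line set with the DM is available at NASC


def apply_filter(records: List[Insertion], option: InsertionFilter) -> List[Insertion]:
    """Return the records that pass the selected filtering option."""
    if option is InsertionFilter.ALL:
        return list(records)
    in_pairs = [r for r in records if r.pair_id is not None]
    if option is InsertionFilter.IN_PAIRS:
        return in_pairs
    if option is InsertionFilter.IN_DUPLO_PAIRS:
        return [r for r in in_pairs if r.in_duplo_pair]
    return [r for r in in_pairs if r.dm_at_nasc]


# Made-up example records (the line IDs are placeholders, not real lines).
records = [
    Insertion("line-A"),
    Insertion("line-B", pair_id=1),
    Insertion("line-C", pair_id=2, in_duplo_pair=True),
    Insertion("line-D", pair_id=3, in_duplo_pair=True, dm_at_nasc=True),
]
```

With these placeholder records, `apply_filter(records, InsertionFilter.IN_DUPLO_PAIRS)` keeps `line-C` and `line-D`, while the `DM_AT_NASC` option keeps only `line-D`.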

For the lines listed with either of the last two options selected, further valuable information concerning double mutant generation is available in DUPLOdb. This information, which includes a link to NASC for donated DMs, can be accessed by following the pairID link (example: pairID = 49).

In the pair ID search, the ID (a simple number) of a pair can be entered. The respective numbers are also found in the different pair lists mentioned above. The pairID is stable and can be re-used at a later visit on the DUPLOdb page for conveniently finding a pair of interest.
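
Because the pairID is stable, a link to a pair can be stored and rebuilt later. The sketch below assumes the URL pattern of the example link given in the publication section of this page (`duplopair.php?pair_id=...`). The function name `pair_url` and the check for a positive integer are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the example pair link on this help page.
PAIR_PAGE = "http://www.gabi-kat.de/db/duplopair.php"


def pair_url(pair_id: int) -> str:
    """Build the link to the pair details page for a given pairID."""
    # Assumption: a pairID is a positive whole number ("a simple number").
    if not isinstance(pair_id, int) or pair_id <= 0:
        raise ValueError("pair_id must be a positive integer")
    return f"{PAIR_PAGE}?{urlencode({'pair_id': pair_id})}"


print(pair_url(216))
```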

In the line ID search, GABI-Kat or SALK line IDs can be entered and the respective lines are found IF the line is available in DUPLOdb. If a line has already been analyzed in the context of GABI-DUPLO, information about the confirmation process and the wildtype amplicon is available for the respective (predicted) insertion. For SALK lines that have not been addressed in the context of GABI-DUPLO, no information is presented. There is also an option to search for GenBank accession numbers of FSTs of interest, similar to the one available in SimpleSearch.

In the search in double mutant descriptions, keywords that relate to the phenotype of the double mutants can be entered. The search returns those pairs for which we have observed and documented a double mutant phenotype matching the respective keyword. See this example for a pair that can be found with the search term "dwarf".

 

Pair view:

 

The gene pairs in GABI-DUPLO are combinations of exactly two genes, and these are represented by two parental insertion lines that each contain an insertion in the respective gene. For each parent, the available information is outlined in the detailed view of the respective pair.

To qualify as a pair in DUPLOdb, the open reading frames of the respective genes have to show a similarity of more than 60% with less than 20% gaps. The component value in the pair view describes the number of "related" genes (homologous according to the same similarity parameters). We attempted to work only with pairs that have a component value of two. However, since the first evaluations were performed using TIGRv5 data before 2008, some of the DMs generated at the beginning of the project no longer qualify under the TAIRv10 annotation. Pairs with a component value of two offer the best prospects for a new mutant phenotype in the DM.

We considered a gene pair to be "Genetically unlinked" if the genes are either on two different chromosomes or more than 7.5 Mbp apart. This is required to make the generation of double mutants feasible, because the mutants are generated by crossing the assigned insertion lines, and only a comparatively low number of offspring is genotyped for each cross.
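
The selection criteria above can be summarised in a short sketch. This is an illustration under stated assumptions, not the code used in GABI-DUPLO. The thresholds are taken from the text. The function names are hypothetical, distance is assumed to be the difference between two gene positions in base pairs, and the component value is assumed to be the size of the connected group of genes linked by qualifying pairs.

```python
from typing import Dict, Iterable, Set, Tuple

MIN_SIMILARITY = 0.60         # more than 60% similarity
MAX_GAP_FRACTION = 0.20       # less than 20% gaps
MIN_DISTANCE_BP = 7_500_000   # more than 7.5 Mbp on the same chromosome


def qualifies_as_pair(similarity: float, gap_fraction: float) -> bool:
    """Both alignment criteria must hold (values given as fractions)."""
    return similarity > MIN_SIMILARITY and gap_fraction < MAX_GAP_FRACTION


def genetically_unlinked(chrom_a: str, pos_a: int, chrom_b: str, pos_b: int) -> bool:
    """Different chromosomes, or more than 7.5 Mbp apart on the same one."""
    if chrom_a != chrom_b:
        return True
    return abs(pos_a - pos_b) > MIN_DISTANCE_BP


def component_values(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each gene to the size of its group of related genes.

    Assumption: "related" is read as connected through qualifying pairs.
    """
    neighbours: Dict[str, Set[str]] = {}
    for gene_a, gene_b in pairs:
        neighbours.setdefault(gene_a, set()).add(gene_b)
        neighbours.setdefault(gene_b, set()).add(gene_a)

    sizes: Dict[str, int] = {}
    for start in neighbours:
        if start in sizes:
            continue
        # Depth-first walk over all genes reachable from `start`.
        seen = {start}
        stack = [start]
        while stack:
            gene = stack.pop()
            for other in neighbours[gene]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        for gene in seen:
            sizes[gene] = len(seen)
    return sizes
```

In this reading, two genes that are related only to each other get a component value of two, while a gene that belongs to a chain of three related genes gets a value of three.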

The references listed on the pair details page (see this example) were collected manually during the DUPLO project. We explicitly make no claim that the list is complete or exhaustive.

 

Graphic view icons:

 

For alleles represented in the DUPLOdb, a graphic view has been implemented, which is accessible from the line and pair view pages.

The graphic view page displays a genome fragment around the gene or FST as an image, with all the genes and FSTs in this region. The FSTs are represented by triangles that mark the deduced insertion position. Both the FSTs and the genes are clickable, and the image can be zoomed in and out several times. See the legend for details about the symbols used. The FSTs are distinguished by five different colors:

Confirmed: The insertion is confirmed and has been donated to NASC.

Unknown: No work has been done on this insertion yet; the insertion can be ordered.

Failed: The insertion could not be confirmed.

Outdated: The insertion prediction results from an original FST and has been updated with data from the confirmation sequence.

No useful allele available: There is no insertion prediction for a CDSi allele of the respective gene in either SALK or GABI-Kat. Still, clicking the symbol will lead to the respective gene.


 

Access to genotyping data:

 

Confirmation data for lines/alleles that were analyzed in the context of GABI-DUPLO are found via the "show confirmation sequences" link. The data presented in DUPLOdb is similar to the data presented in SimpleSearch. For further information on SimpleSearch, please visit the SimpleSearch help page.

The primers used for genotyping are accessible via the "show primer details" link, which is found on the pair view or the line view if we have generated data for the respective alleles. This data is available for most of the lines on the DUPLO gene pairs list, even if the homozygous single mutants have not been generated. The work to confirm additional parent alleles is still ongoing at GABI-Kat (as of January 2013). We present wet-lab verified primer data that can be used for genotyping. For most of the DUPLO lines, at least the predicted insertion allele has been confirmed and a suitable wildtype amplicon for genotyping has been checked experimentally.

 

Details on the genotyping process and the genotyping primers:

The genotyping workflow in GABI-DUPLO was as follows.

First, the predicted insertions that are assigned to a pair were confirmed by PCR and amplicon sequencing. In the PCR for the "insertion specific amplicon", a T-DNA border specific primer (T-DNA primer) was used together with a primer specific for the genomic A. thaliana DNA at the predicted insertion site (locus specific primer). In general, the T-DNA primer in the PCR was 8474 for GABI-Kat LB-FSTs, 3144 for GABI-Kat RB-FSTs, and LBb3 for SALK FSTs (all LB). Amplicon sequencing was performed using the locus specific primer and 8409 for GABI-Kat LB-FSTs, 3144 for GABI-Kat RB-FSTs, and LBb3 for SALK FSTs. Early in the DUPLO project a different T-DNA primer (R204) was used in the PCR and sequencing for the SALK lines.

Second, a wildtype (wt) amplicon was established using the locus specific primer from the confirmation process and a second locus specific primer in reverse orientation on the other side of the insertion site. The wt amplicon was verified using genomic Col-0 DNA and the two locus specific primers by comparing the observed amplicon size with the expected size predicted from the genomic sequence according to TAIRv10. In some cases, the second locus specific primer was also used to perform a second confirmation of the predicted insertion position on the other side of the T-DNA. An example of such a case can be found here.

The insertion specific amplicon and the wt amplicon were used for the genotyping of the lines. For GABI-Kat LB-FSTs, 8474 was used as T-DNA primer in the initial analysis of the parental lines, and 8409 was used as T-DNA primer in the analysis of the double mutants. We recommend using 8474 for genotyping purposes, but both primers should work. For SALK lines we recommend using LBb3 as T-DNA primer, since an unspecific product is frequently observed when using R204.
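
The two amplicons are typically read together to assign a genotype. The sketch below shows the common interpretation of such a two-amplicon assay. This interpretation is an assumption and is not spelled out in this help text. The function names and the genotype labels are hypothetical.

```python
def call_genotype(insertion_amplicon: bool, wt_amplicon: bool) -> str:
    """Interpret the presence or absence of the two PCR products for one gene."""
    if insertion_amplicon and wt_amplicon:
        return "heterozygous"           # both alleles are present
    if insertion_amplicon:
        return "homozygous insertion"   # only the insertion allele is detected
    if wt_amplicon:
        return "wildtype"               # only the wt allele is detected
    return "inconclusive"               # no product; the PCR should be repeated


def is_double_mutant(call_gene_a: str, call_gene_b: str) -> bool:
    """A plant counts as a DM here if both genes are homozygous for the insertion."""
    return call_gene_a == "homozygous insertion" and call_gene_b == "homozygous insertion"


# Example: a plant homozygous for the first insertion, heterozygous for the second.
first = call_genotype(insertion_amplicon=True, wt_amplicon=False)
second = call_genotype(insertion_amplicon=True, wt_amplicon=True)
print(first, second, is_double_mutant(first, second))
```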

 

Details on the exact primer sequences used for specific lines can be found on the "genotyping and primer details" page, which is linked for each line on the "pair details" page and on the "line and FST details" page.

The sequences of the T-DNA primers are:

8474: 5'-ATAATAACGCTGCGGACATCTACATTTT-3'
8409: 5'-ATATTGACCATCATACTCATTGC-3'
3144: 5'-GTGGATTGATGTGATATCTCC-3'
LBb3: 5'-ATTTTGCCGATTTCGGAAC-3'
R204: 5'-GCGTGGACCGCTTGCTGCAACT-3'

Note: The primer LBb3 is the one that is recommended by SALK. It is also used in the SALK genotyping project. The primer is found on the SALK website and is named LBb1.3 there.
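
For scripted primer lookups, the T-DNA primers listed above can be kept in a small table. This is a sketch only. The sequences and the recommendations for GABI-Kat LB-FSTs and SALK lines are taken from this page. Using 3144 for GABI-Kat RB-FSTs in genotyping is an assumption based on its use in the confirmation PCR. The names `T_DNA_PRIMERS`, `GENOTYPING_PRIMER` and `genotyping_t_dna_primer` are hypothetical.

```python
from typing import Tuple

# T-DNA primer sequences (5' to 3') as listed on this page.
T_DNA_PRIMERS = {
    "8474": "ATAATAACGCTGCGGACATCTACATTTT",
    "8409": "ATATTGACCATCATACTCATTGC",
    "3144": "GTGGATTGATGTGATATCTCC",
    "LBb3": "ATTTTGCCGATTTCGGAAC",
    "R204": "GCGTGGACCGCTTGCTGCAACT",
}

# (collection, T-DNA border of the FST) -> primer name.
GENOTYPING_PRIMER = {
    ("GABI-Kat", "LB"): "8474",  # recommended on this page
    ("GABI-Kat", "RB"): "3144",  # assumption: same primer as in the confirmation PCR
    ("SALK", "LB"): "LBb3",      # recommended on this page
}


def genotyping_t_dna_primer(collection: str, border: str) -> Tuple[str, str]:
    """Return (primer name, sequence) for a collection and FST border."""
    try:
        name = GENOTYPING_PRIMER[(collection, border)]
    except KeyError:
        raise ValueError(f"no T-DNA primer known for {collection} {border}-FSTs") from None
    return name, T_DNA_PRIMERS[name]


print(genotyping_t_dna_primer("SALK", "LB"))
```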

 

Access to data from SALK lines:

 

We do not attempt to duplicate databases that allow access to SALK and SALKc lines, but we want to make the data we have generated for SALK lines available. The SALK parents list shows all SALK lines for which we have data. In DUPLOdb (including the visualisation), in general only those SALK alleles are shown that we have assigned to a DUPLO pair as a "parent allele" AND for which we actually have data. Note that some pairs have SALK parents assigned on which we have not done any work (yet).

 

Access to publication data:


During the project we collected publications linked to gene pairs in order to focus our work on double mutants that had not yet been described. Most of the information about known double mutants comes from the SeedGenes Project (www.seedgenes.org); other publications are linked on the corresponding pair page. For example:
http://www.gabi-kat.de/db/duplopair.php?pair_id=216
You can access all pairs for which publication data is available with the DUPLOdb search. Use the "Search in double mutant description" and select "List pairs that have a paper assigned".
The papers were collected manually, so the list is not exhaustive. The references have not been updated since March 2013.