Plant Pathogen Transcript Assemblies
The clustering of ESTs and the construction of plant pathogen transcript assemblies (TAs) closely follow the procedure used for the DFCI/TIGR Gene Indices. Both use the TGI clustering tools (available at http://compbio.dfci.harvard.edu/tgi/software/) for sequence screening and cleaning, clustering, and assembly. However, several differences are worth highlighting:
- TAs are assembled only from expressed sequences in dbEST (ESTs) and GenBank (cDNAs); "virtual" transcript sequences derived from genomic sequences are excluded.
- The TA clusters are assembled into consensus sequences by the program CAP3 rather than by the Paracel Transcript Assembler.
- More stringent criteria are applied in the construction of TAs: 50 bp minimum match, 95% minimum identity in the overlap region, 20 bp maximum unmatched overhang.
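The three thresholds in the last item can be read as a simple acceptance test on a pairwise overlap. The following is a minimal sketch in Python, not the code used by the TGI tools or CAP3: the `Overlap` record, its field names, and the function `passes_ta_criteria` are hypothetical, and treating the thresholds as inclusive bounds is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_MATCH_BP = 50         # minimum length of the matching region, in bp
MIN_IDENTITY_PCT = 95.0   # minimum percent identity in the overlap region
MAX_OVERHANG_BP = 20      # maximum unmatched overhang, in bp


@dataclass(frozen=True)
class Overlap:
    """Hypothetical summary of one pairwise overlap between two sequences."""
    match_length: int     # length of the aligned (matching) region, in bp
    identity_pct: float   # percent identity within the overlap region
    overhang: int         # longest unmatched overhang, in bp


def passes_ta_criteria(ov: Overlap) -> bool:
    """Return True if the overlap meets all three thresholds.

    Assumption: the bounds are inclusive (exactly 50 bp, 95%, or 20 bp passes).
    """
    return (
        ov.match_length >= MIN_MATCH_BP
        and ov.identity_pct >= MIN_IDENTITY_PCT
        and ov.overhang <= MAX_OVERHANG_BP
    )


if __name__ == "__main__":
    candidate = Overlap(match_length=120, identity_pct=97.5, overhang=5)
    print(passes_ta_criteria(candidate))
```

An overlap must satisfy all three conditions at once; failing any single threshold is enough to reject it in this sketch.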
Plant pathogen TA identifiers are of the form TA#_####, where # is the number of the transcript assembly and #### is the NCBI taxon ID.
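As an illustration of the identifier format, the sketch below splits a TA identifier into its two numeric parts. It assumes both parts are plain decimal integers of any length; the names `parse_ta_id` and `TAIdentifier` are hypothetical, and the identifier used in the example is made up rather than taken from the database.

```python
import re
from typing import NamedTuple

# "TA", the assembly number, an underscore, then the NCBI taxon ID.
# Assumption: both numeric parts are unsigned decimal integers.
TA_ID_PATTERN = re.compile(r"TA(\d+)_(\d+)")


class TAIdentifier(NamedTuple):
    assembly_number: int
    taxon_id: int


def parse_ta_id(identifier: str) -> TAIdentifier:
    """Parse an identifier of the form TA#_#### into its components."""
    match = TA_ID_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a TA identifier: {identifier!r}")
    return TAIdentifier(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    # Made-up identifier, for illustration only.
    parsed = parse_ta_id("TA42_1234")
    print(parsed.assembly_number, parsed.taxon_id)
```

Keeping the taxon ID as a separate field makes it straightforward to group assemblies by species.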