FT(ID:6704/ft:002)


for Features Table




Related languages
DELILA => FT   Influence

References:
  • Fristensky B "Feature expressions: creating and manipulating sequence datasets" Nucleic Acids Res. 1993 Dec 25;21(25):5997-6003
    Abstract: Annotation of features, such as introns, exons and protein coding regions in GenBank/EMBL/DDBJ entries is now standardized through use of the Features Table (FT) language. The essence of the FT language is described by the relation 'expression-->sequence', meaning that each FT expression evaluates to a sequence. For example, the expression M74750:1..50 evaluates to the first 50 bases of the sequence with accession number M74750. Because FT is intrinsic to the database definition, it can serve as a software- and platform-independent lingua franca for sequence manipulation. The XYLEM package makes it possible to create and manipulate sequence datasets using FT expressions. FEATURES is a program that resolves FT expressions into their corresponding sequences. Annotated features can be retrieved either by feature key or by expression. Even unannotated portions of a sequence can be retrieved by user-generated FT expressions. Applications of the FT language include retrieval of subsequences from large sequence entries, generation of chromosome models or artificial DNA constructs, and representation of restriction maps or mutants.
    Extract: INTRODUCTION
    While the widespread availability of sequence databases has been
    of great value to molecular biologists, most database usage is
    limited to a few simple tasks: searching for entries by keyword,
    retrieval of entries, and sequence similarity searches. More
    sophisticated projects often require the creation of large database
    subsets, representing particular taxa, organs, tissues, or other
    groupings which merit comparison. One of the earliest studies
    of this type analyzed 124 mRNA sequences from E. coli to infer
    a set of rules for identification of ribosome binding sites [1]. More
    recently, 369 Alu dispersed repetitive elements were categorized
    into subfamilies to enable reconstruction of their evolutionary
    history [2]. Such projects require not only the ability to organize
    sequences into discrete groups, but also to extract specific
    subsequences from each database entry for analysis of comparable
    features.
    A sequence query language, that is, a language in which
    expressions, upon evaluation, yield sequence, would offer many
    advantages in dataset construction. The sequences themselves
    need not be stored, but rather, the instructions necessary to
    recreate the dataset. Interestingly, the most ambitious attempts
    at writing sequence query languages predate GenBank [3] itself.
    Schroeder and Blattner [4] described DNA*, which permitted
    concatenation and complementation of DNA sequences using a
    terse syntax. Another approach was that of DELILA
    (DEoxyribonucleic acid LIbrary LAnguage) [5]. DELILA
    encompassed both a hierarchical syntax for description of
    genomes, as well as a query language in which named features
    served as reference points within a coordinate system. Because
    both languages predated the current databases, they do not contain
    syntax for reference to database entries. While more recent tools
    have been able to parse GenBank entries for direct use of data
    fields by other programs [6], automated access to the features
    annotated in the Features Table has been difficult to realize.
    The development of the Feature Table language (FT) [7] as
    an integral part of database annotation was a fundamental step
    in making sequence data more useable because each feature in
    a GenBank entry is now annotated in a standard, machine-parsable
    syntax. The universality of this language now makes
    it possible to specify any DNA sequence using an expression,
    as given by the relation
    expression --> sequence
    This task has been implemented in the FEATURES program,
    which is part of the XYLEM package, to be described in this paper
    (Table I). While fully accessible through a menu-driven interface,
    the simplest form of the FEATURES command is
    features expression > sequence
    meaning that FEATURES can take an FT expression as input and
    write a sequence to the output. For example, given the following
    feature annotated in the GenBank entry with primary accession
    number M74750:
    terminator 609..650
    /label=T7-terminator
    typing the command
    features M74750:T7-terminator
    would return the sequence
    ataaccccttggggcctctaaacgggtcttgaggggttttt
    representing that part of the sequence spanning bases 609 to 650,
    as identified by the field 'label=T7-terminator'.
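
    The lookup described above can be sketched in Python. This is only
    an illustration under stated assumptions, not the FEATURES
    implementation: the resolve function, the in-memory ENTRIES
    dictionary, the accession DEMO01 and its placeholder sequence are
    all hypothetical, and only two expression forms are handled
    (accession:start..end and accession:label).

```python
import re

# Hypothetical in-memory stand-in for a database entry: a sequence plus
# labelled features stored as 1-based, inclusive (start, end) pairs.
ENTRIES = {
    "DEMO01": {
        "sequence": "aaaccgggtttacgtacgta",
        "features": {"demo-label": (5, 12)},
    }
}

# Matches a simple location such as "1..50".
LOCATION = re.compile(r"^(\d+)\.\.(\d+)$")


def resolve(expression, entries=ENTRIES):
    """Evaluate 'accession:start..end' or 'accession:label' to a sequence."""
    accession, _, target = expression.partition(":")
    entry = entries[accession]
    match = LOCATION.match(target)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
    else:
        # Not a numeric range, so treat the target as a feature label.
        start, end = entry["features"][target]
    # Feature coordinates are 1-based and inclusive; Python slices are
    # 0-based and end-exclusive.
    return entry["sequence"][start - 1:end]


print(resolve("DEMO01:1..4"))
print(resolve("DEMO01:demo-label"))
```

    Both forms reduce to the same coordinate lookup, so a label is
    simply a named shortcut for a range already recorded in the entry.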

    Table I. List of XYLEM programs and functions

    High-level tools
    FINDKEY     Search for one or more keywords in database
    FETCH       Retrieve one or more entries from database
    FEATURES    Extract features by feature key or expression

    Low-level tools
    SPLITDB     Split a database into annotation, sequence and index
    IDENTIFY    Used by FINDKEY to identify entries containing keywords
    GETLOC      Used by FETCH to retrieve entries from a split database
    GETOB       Used by FEATURES to parse Feature Table expressions
    UDS         Update an existing dataset with new versions of entries
    DBSTAT      Calculate amino acid frequencies in a protein database
    RIBOSOME    Translate file of nucleic acid sequences into protein
    SHUFFLE     Given a random seed, shuffles each sequence in a file
    REFORM      Multiple alignment printing tool
    GBUPDATE    Download GenBank database by FTP; calls SPLITDB
    PIRUPDATE   Download PIR database by FTP; calls SPLITDB

    The XYLEM tools (Table I) automate the management of
    online databases, as well as the construction of sequence database
    subsets. Even non-expert users should be able to create datasets
    for use in multiple alignments, phylogenetic studies, structure
    comparisons and other types of analyses.
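
    The idea raised earlier, that a dataset can be stored as the
    instructions needed to recreate it rather than as the sequences
    themselves, might look like the following Python sketch. The
    DATABASE contents, the accessions DEMO01 and DEMO02, and the
    functions evaluate and build_dataset are hypothetical and are not
    part of XYLEM; only the accession:start..end form is supported.

```python
# Hypothetical toy database: accession -> sequence.
DATABASE = {
    "DEMO01": "aaaccgggtttacgtacgta",
    "DEMO02": "ttgacaattaatcatcgaac",
}

# The dataset is stored as a list of expressions, not as sequences.
DATASET = ["DEMO01:1..6", "DEMO02:3..8"]


def evaluate(expression, database):
    """Evaluate 'accession:start..end' (1-based, inclusive) to a sequence."""
    accession, _, span = expression.partition(":")
    start, end = (int(n) for n in span.split(".."))
    return database[accession][start - 1:end]


def build_dataset(expressions, database):
    """Recreate the dataset by evaluating each stored expression."""
    return {expr: evaluate(expr, database) for expr in expressions}


# Print the rebuilt dataset, one record per expression.
for expr, seq in build_dataset(DATASET, DATABASE).items():
    print(f">{expr}\n{seq}")
```

    Because only the expressions are kept, re-running build_dataset
    against an updated database regenerates the dataset from the
    current versions of the entries.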