FT(ID:6704/ft:002)

Country: us
Began: 1993

for Features Table

Influence

References:

Fristensky B "Feature expressions: creating and manipulating sequence datasets" Nucleic Acids Res. 1993 Dec 25;21(25):5997-6003
Abstract: Annotation of features, such as introns, exons and protein coding regions in GenBank/EMBL/DDBJ entries is now standardized through use of the Features Table (FT) language. The essence of the FT language is described by the relation 'expression-->sequence', meaning that each FT expression evaluates to a sequence. For example, the expression M74750:1..50 evaluates to the first 50 bases of the sequence with accession number M74750. Because FT is intrinsic to the database definition, it can serve as a software- and platform-independent lingua franca for sequence manipulation. The XYLEM package makes it possible to create and manipulate sequence datasets using FT expressions. FEATURES is a program that resolves FT expressions into their corresponding sequences. Annotated features can be retrieved either by feature key or by expression. Even unannotated portions of a sequence can be retrieved by user-generated FT expressions. Applications of the FT language include retrieval of subsequences from large sequence entries, generation of chromosome models or artificial DNA constructs, and representation of restriction maps or mutants.
Extract: INTRODUCTION
While the widespread availability of sequence databases has been
of great value to molecular biologists, most database usage is
limited to a few simple tasks: searching for entries by keyword,
retrieval of entries, and sequence similarity searches. More
sophisticated projects often require the creation of large database
subsets, representing particular taxa, organs, tissues, or other
groupings which merit comparison. One of the earliest studies
of this type analyzed 124 mRNA sequences from E. coli to infer
a set of rules for identification of ribosome binding sites [1]. More
recently, 369 AluI dispersed repetitive elements were categorized
into subfamilies to enable reconstruction of their evolutionary
history [2]. Such projects require not only the ability to organize
sequences into discrete groups, but also to extract specific
subsequences from each database entry for analysis of comparable
features.
A sequence query language, that is, a language in which
expressions, upon evaluation, yield sequence, would offer many
advantages in dataset construction. The sequences themselves
need not be stored, but rather, the instructions necessary to
recreate the dataset. Interestingly, the most ambitious attempts
at writing sequence query languages predate GenBank [3] itself.
Schroeder and Blattner [4] described DNA*, which permitted
concatenation and complementation of DNA sequences using a
terse syntax. Another approach was that of DELILA
(DEoxyribonucleic acid LIbrary LAnguage) [5]. DELILA
encompassed both a hierarchical syntax for description of
genomes, as well as a query language in which named features
served as reference points within a coordinate system. Because
both languages predated the current databases, they do not contain
syntax for reference to database entries. While more recent tools
have been able to parse GenBank entries for direct use of data
fields by other programs [6], automated access to the features
annotated in the Features Table has been difficult to realize.
The development of the Feature Table language (FT) [7] as
an integral part of database annotation was a fundamental step
in making sequence data more useable because each feature in
a GenBank entry is now annotated in a standard, machine-parsable
syntax. The universality of this language now makes
it possible to specify any DNA sequence using an expression,
as given by the relation
expression --> sequence
This task has been implemented in the FEATURES program,
which is part of the XYLEM package, to be described in this paper
(Table I). While fully accessible through a menu-driven interface,
the simplest form of the FEATURES command is
features expression > sequence
meaning that FEATURES can take an FT expression as input and
write a sequence to the output. For example, given the following
feature annotated in the GenBank entry with primary accession
number M74750:
terminator 609..650
/label=T7-terminator
typing the command
features M74750:T7-terminator
would return the sequence
ataaccccttggggcctctaaacgggtcttgaggggttttt
representing that part of the sequence spanning bases 609 to 650,
as identified by the field 'label=T7-terminator'.
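
The relation expression --> sequence can be illustrated with a minimal sketch. This is not the FEATURES program or any part of XYLEM; it only mimics the two forms of expression mentioned here, a base range (as in M74750:1..50) and a feature label (as in M74750:T7-terminator). The accession DEMO01, its bases, the label demo-feature, and the names SEQUENCES, LABELS and resolve are all hypothetical, introduced purely for illustration.

```python
import re

# Hypothetical stand-in for a sequence database: accession -> bases (made-up data).
SEQUENCES = {"DEMO01": "acgtttgacc"}

# Hypothetical labelled features: (accession, label) -> (start, end),
# using 1-based inclusive coordinates as in a feature table location.
LABELS = {("DEMO01", "demo-feature"): (5, 8)}

# Matches a simple location of the form start..end
_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def resolve(expression):
    """Evaluate 'ACCESSION:start..end' or 'ACCESSION:label' to a sequence."""
    accession, _, locator = expression.partition(":")
    sequence = SEQUENCES[accession]
    match = _RANGE.match(locator)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
    else:
        # Not a range, so treat the locator as a feature label.
        start, end = LABELS[(accession, locator)]
    # Locations are 1-based and inclusive; Python slices are 0-based, end-exclusive.
    return sequence[start - 1:end]


print(resolve("DEMO01:1..4"))          # acgt
print(resolve("DEMO01:demo-feature"))  # ttga
```

Only the expressions need to be stored in this sketch; the subsequences are recomputed from the database each time, which is the property the text attributes to a sequence query language.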

Table I. List of XYLEM programs and functions

High-level tools
FINDKEY    Search for one or more keywords in database
FETCH      Retrieve one or more entries from database
FEATURES   Extract features by feature key or expression

Low-level tools
SPLITDB    Split a database into annotation, sequence and index
IDENTIFY   Used by FINDKEY to identify entries containing keywords
GETLOC     Used by FETCH to retrieve entries from a split database
GETOB      Used by FEATURES to parse Feature Table expressions
UDS        Update an existing dataset with new versions of entries
DBSTAT     Calculate amino acid frequencies in a protein database
RIBOSOME   Translate file of nucleic acid sequences into protein
SHUFFLE    Given a random seed, shuffles each sequence in a file
REFORM     Multiple alignment printing tool
GBUPDATE   Download GenBank database by FTP; calls SPLITDB
PIRUPDATE  Download PIR database by FTP; calls SPLITDB

The XYLEM tools (Table I) automate the management of
online databases, as well as the construction of sequence database
subsets. Even non-expert users should be able to create datasets
for use in multiple alignments, phylogenetic studies, structure
comparisons and other types of analyses.