BNA(ID:8245/)


for Biological Sequence Analysis

Gene sequence storage and querying language for a relational system, written as a VSAPL workspace

German Cancer Research Center, Institute for Documentation, Information and Statistics


Related languages
VSAPL => BNA   Written using

References:
  • Osterburg, G., Sommer, R., Glatting, K.H. "BNA". Techn. Rep. No. 17, German Cancer Research Center, Institute for Documentation, Information and Statistics, Heidelberg, 1979 (in German)
  • Osterburg, G., Glatting, K.H. and Sommer, R. "Computer programs for the analysis and the management of DNA sequences", pp. 207-216.
    Abstract: A program package is described for the management and the analysis of DNA sequence data. The programs, with the exception of a few Fortran routines, are written in the programming language APL. They are best used interactively, although batch processing is possible. The package has been in constant use for about 3 years and contains programs for most of the routine problems presently found in a DNA sequencing laboratory.
    Extract: Data organization and management

    As a conceptual frame for organization we used the relational data model [13]. A table or relation is defined by giving a table name and a list of field names. For example
    VIRUSDNA (SEQUENCE,NAME,GENES,REFERENCE)
    might specify a table of virus DNA sequences with four fields.
    An additional field DATE is supplied automatically; it contains the date of the last change to the corresponding sequence data. Several tables can be grouped into a workspace.
    Several workspaces may form a library. Workspaces are identified by names of up to 8 characters. Libraries are identified by numbers from 0 to 99999. There is one special library, denoted by PUBLIC, which contains workspaces (i.e. sequence data) accessible to each user. All other libraries are private in the sense that only the owner has access.
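    The table definition described above can be illustrated with a minimal Python sketch. It is not the original APL implementation: the class name Table, the insert method, and the sample values are hypothetical, and only the field list of VIRUSDNA and the automatic DATE field come from the text.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Table:
    """A relation: a table name plus an ordered list of field names."""
    name: str
    fields: list
    rows: list = field(default_factory=list)

    def __post_init__(self):
        # The text says DATE is supplied automatically, so the user
        # never lists it when defining the table.
        self.fields = list(self.fields)
        if "DATE" not in self.fields:
            self.fields.append("DATE")

    def insert(self, today=None, **values):
        """Store one row and stamp it with the date of the change."""
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        row = {name: values.get(name) for name in self.fields}
        row["DATE"] = today or date.today()  # overwritten on every change
        self.rows.append(row)
        return row


# VIRUSDNA (SEQUENCE,NAME,GENES,REFERENCE) from the text; the row
# contents are placeholder values, not real sequence data.
virusdna = Table("VIRUSDNA", ["SEQUENCE", "NAME", "GENES", "REFERENCE"])
row = virusdna.insert(SEQUENCE="ATGC", NAME="example", today=date(1982, 1, 11))
```

    Fields that are not supplied are stored as None here; how the original system represented missing values is not stated in the extract.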
    After entering the BSA system, the user gets an empty workspace (called the open workspace). The following commands for data management are available.
    REL Relname (list of field names) defines a new table;
    ADD Relname (list of field names) adds new fields to an existing table;
    DEL Relname (list of field names) deletes fields of a table, or the complete table if the list is missing;
    LST lists the names of the tables in the current workspace;
    LOA Libno WS/Table1 Table2 .../ loads the workspace WS from library Libno (or only Table1, Table2, ... from WS) into the open workspace;
    COP Libno WS/Table1 Table2 .../ same as LOA, but without first scratching the open workspace;
    SAV writes the contents of the open workspace to secondary storage. For security reasons, the user is explicitly asked to enter the library number and workspace name where the data should be stored;
    UPD TableX [Query] allows either insertion of new sequence data into TableX or correction of existing data. Query (the general format will be explained below) denotes an expression used by the system to identify the sequences to be updated;
    OUT TE/PR directs the output of programs executed later on to the terminal (TE) or to a high-speed printer (PR).
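    The behaviour of the schema and loading commands above can be sketched in Python as follows. This is an illustration under stated assumptions, not the VSAPL code: the Workspace class and its method names are hypothetical, library numbers and secondary storage are left out, the HOST field and PHAGEDNA table are invented examples, and details such as rejecting a duplicate table name or sorting the LST output are guesses.

```python
class Workspace:
    """In-memory stand-in for the open workspace (hypothetical class)."""

    def __init__(self):
        self.tables = {}  # table name -> list of field names

    def rel(self, name, fields):
        """REL: define a new table."""
        if name in self.tables:
            raise ValueError(f"table {name} already exists")  # assumption
        self.tables[name] = list(fields)

    def add(self, name, fields):
        """ADD: append new fields to an existing table."""
        for f in fields:
            if f not in self.tables[name]:
                self.tables[name].append(f)

    def delete(self, name, fields=None):
        """DEL: drop the listed fields, or the whole table if no list is given."""
        if not fields:
            del self.tables[name]
        else:
            self.tables[name] = [f for f in self.tables[name] if f not in fields]

    def lst(self):
        """LST: names of the tables in the workspace (sorted here by choice)."""
        return sorted(self.tables)

    def cop(self, other, names=None):
        """COP: copy tables from another workspace without clearing first."""
        for name in (names or list(other.tables)):
            self.tables[name] = list(other.tables[name])

    def loa(self, other, names=None):
        """LOA: same as COP, but scratch the open workspace first."""
        self.tables.clear()
        self.cop(other, names)


ws = Workspace()
ws.rel("VIRUSDNA", ["SEQUENCE", "NAME", "GENES", "REFERENCE"])
ws.add("VIRUSDNA", ["HOST"])       # HOST is an invented field
ws.delete("VIRUSDNA", ["GENES"])

stored = Workspace()               # stands in for a saved workspace
stored.rel("PHAGEDNA", ["SEQUENCE"])
ws.cop(stored)                     # open workspace now holds both tables
after_cop = ws.lst()
ws.loa(stored)                     # open workspace is scratched first
after_loa = ws.lst()
```

    The only difference between loa and cop in this sketch is the initial clear, which mirrors the wording of the COP entry.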
    Extract: transformations
    A transformation may be used to transform data from a table
    before they are passed to the requested program. A few examples
    of transformations may demonstrate the usefulness of this feature:
    - SEQUENCE ← CSTRANG SEQUENCE
    converts a DNA sequence into its complementary strand. Used in
    conjunction with the program TRANSLATE, this means that the
    complementary strand will be translated into amino acid
    sequences. (← is the APL symbol for assignment.)
    - SEQUENCE ← ASKURZ TRANSL SEQUENCE
    translates (Program TRANSL) a DNA sequence into the
    amino acid sequence in a 3 letter code (starting from
    the first base) while ASKURZ translates a 3 letter code
    into a 1 letter code. This transformation can be used
    for finding homologies between amino acid sequences
    when only DNA sequences are stored. No additional
    storage into the database of the derived sequences
    is necessary.
    - SEQUENCE ← 100 ↑ 200 ↓ TAKEGENE SEQUENCE
    selects only the bases between base 201 and base 300 of
    the original sequence. ↑ (Take) and ↓ (Drop) are
    special APL operation symbols.
          in Nucleic Acids Res. 10(1) Jan 11 1982 Special issue "devoted to the applications of computers to research on nucleic acids"
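    The three transformations in the extract can be approximated in Python as below. The function names cstrang, transl and askurz only echo the APL names; their bodies are a sketch, not the original code. Assumptions: sequences are upper-case strings over A, C, G, T; the complementary strand is returned reversed so that it reads 5' to 3'; the standard genetic code is used; a stop codon is written "End" in the 3 letter code and "*" in the 1 letter code.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",  # label for stop codons is an assumption
}
CODONS = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}
ONE_FROM_THREE = {three: one for one, three in THREE_LETTER.items()}


def cstrang(seq):
    """Complementary strand, reversed to read 5'->3' (assumption)."""
    return seq.translate(COMPLEMENT)[::-1]


def transl(seq):
    """Translate from the first base into a list of 3 letter codes.

    A trailing incomplete codon is ignored.
    """
    return [CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)]


def askurz(amino_acids):
    """Convert a list of 3 letter codes into a 1 letter code string."""
    return "".join(ONE_FROM_THREE[aa] for aa in amino_acids)


def take_drop(seq, take, drop):
    """Counterpart of APL `take ↑ drop ↓ seq`: drop first, then take.

    Unlike APL Take, a Python slice does not pad a short result.
    """
    return seq[drop:drop + take]


dna = "ATGGCCTAA"
protein = askurz(transl(dna))                   # ASKURZ TRANSL SEQUENCE
reverse_protein = askurz(transl(cstrang(dna)))  # translate the other strand
window = take_drop("A" * 200 + "C" * 100 + "G" * 50, 100, 200)  # bases 201..300
```

    As in the extract, the derived amino acid sequences are computed on demand and never stored, so a homology search over proteins would only need the DNA table.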