IDL(ID:6006/idl005)

Information Description Language 


Jerome Sable, RCA 1962

Lispy language designed to store information (data) using semantic structures




Related languages
IDL => DM-1   Influence

References:
  • Sable, J. D. "Use of semantic structure in information systems" pp40-42 Abstract: This paper explores semi-automatic techniques
    that can be brought to bear on the problem of semantic analysis
    and how semantic structure, once determined, can be effectively
    used in the information retrieval system. The following points
    will be covered: (1) The vocabulary of a data base is not the same
    as the vocabulary of inquiries made on that data base; the question
    of the relevance of information to a request for information
    can be stated in terms of the existence of a scope relationship between
    terms in the request and terms in the data. (2) Scope relationships,
    related to meaning, can be represented by associating
    terms with nodes on a directed graph. (3) The directed graph can
    be constructed from data obtained by a linguistic experiment with
    the users of the information system. (4) An IR language, for the
    expression of data and queries, can be based upon this structured
    vocabulary.
    Extract: Scope as a binary relation
    By semantic structure is meant, a mapping of an aspect of language related to meaning. Semantic structure will be based on a basic binary relation called scope. The semantic structure of a vocabulary can be represented by associating terms in the vocabulary with nodes on a directed graph. The scope of a node is the set of nodes that can be reached from it on the directed graph. In a tree structure the property of scope is displayed as a subtree, that is, the scope of a node is that part of the structure having the node as a head. The motivation is to create an index in which under each term is listed only those texts that use it, yet have the term denote all those texts listed under it or any term within its scope. These ideas can perhaps be clarified with reference to figure 2. The nodes in a tree can be named according to the unique path from the apex to the node in question, using what has been called a tree code. For example, the tree code of the node labeled "A" is 212. The scope of node "A" is the indicated subtree. An example of a sample vocabulary that has been placed in a tree structure is shown in Figure 3.
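
As an illustration only (it is not taken from Sable's paper), the scope idea can be sketched in Python. The vocabulary, the document names, and the helper names `scope` and `denoted_texts` are hypothetical; Figures 2 and 3 are not reproduced here, so the tree below is invented. The sketch assumes one digit per branch, so that a node lies in the scope of another exactly when its tree code starts with the other's code, and it treats the head node as part of its own scope.

```python
# Hypothetical vocabulary keyed by tree code (NOT the paper's Figure 3).
VOCAB = {
    "1": "PERSON",
    "11": "OFFICER",
    "12": "ENLISTED",
    "121": "SERGEANT",
    "122": "PRIVATE",
    "2": "RELATION",
    "21": "IS-SUPERIOR-TO",
}

# Hypothetical index: under each term, only the texts that use that exact term.
INDEX = {
    "11": ["doc-A"],
    "121": ["doc-B"],
    "122": ["doc-C"],
    "21": ["doc-A"],
}


def scope(code, vocab):
    """Tree codes in the subtree headed by `code` (the head itself included)."""
    return sorted(c for c in vocab if c.startswith(code))


def denoted_texts(code, vocab, index):
    """Texts listed under `code` or under any term within its scope."""
    texts = set()
    for c in scope(code, vocab):
        texts.update(index.get(c, []))
    return sorted(texts)


print(scope("12", VOCAB))                 # ['12', '121', '122']
print(denoted_texts("12", VOCAB, INDEX))  # ['doc-B', 'doc-C']
```

A query on the general term "12" therefore retrieves texts indexed only under the more specific terms "121" and "122", even though no text is listed under "12" itself.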

    The corpus upon which the semantic analysis will be made is composed of the information carrying documents (texts, messages), and the queries (requests for information) of the users. Queries will be made up in general with words of low discrimination (high in the tree) while information carrying words from relevant documents will contain words of higher discrimination (low in the tree).
    Extract: The Information Description Language
    A formal language will be described which will be expressive enough to paraphrase active declarative sentences that describe the objects of concern in a positive way, yet will be simple enough to permit straight-forward processing and have properties of self-indexing that make it useful for information retrieval. It will be called Information Description Language (IDL).

    The IDL vocabulary is constructed by putting all words that can serve as descriptors or connectives (relation words) on a directed graph so that scope relationships are shown. For example, the vocabulary shown in Figure 3 might be part of the vocabulary for an army personnel file. Terms are named by giving the numbers of the path from the apex (*) to the word desired. For example, the statement "an officer named Jones is superior to Sgt. John Smith" is written as (:122, 111, 22 * (1212, 1:121)).
    An IDL vocabulary can be viewed as a discrimination tree, with greater degrees of discrimination represented by longer terms (greater depth). This ability to operate at varying levels of discrimination is useful in a system that must handle inputs from a variety of human sources at different degrees of precision.

    Each primitive term has a semantic interpretation if it names a tree path in the IDL vocabulary. Each digit in the term will then stand for a word. The semantic interpretation of a term is that each word is a modifier to the previous words. When two terms are joined by "*", the second modifies the first, and when one statement is imbedded in another it is subordinate to it.
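
A minimal Python sketch of this interpretation follows; it is an illustration, not code from the paper. The vocabulary and the function names `has_interpretation` and `decode` are hypothetical, and the sketch assumes that each digit selects one branch of the tree, so every prefix of a term names a node on the path from the apex.

```python
# Hypothetical vocabulary keyed by tree code (NOT the paper's Figure 3).
VOCAB = {
    "1": "PERSON",
    "11": "OFFICER",
    "12": "ENLISTED",
    "121": "SERGEANT",
    "122": "PRIVATE",
}


def has_interpretation(term, vocab):
    """A term is interpretable only if every prefix names a node in the tree."""
    return len(term) > 0 and all(term[:i] in vocab for i in range(1, len(term) + 1))


def decode(term, vocab):
    """One word per digit; each word modifies the words before it."""
    if not has_interpretation(term, vocab):
        raise ValueError(f"{term!r} does not name a path in the vocabulary")
    return [vocab[term[:i]] for i in range(1, len(term) + 1)]


print(decode("121", VOCAB))              # ['PERSON', 'ENLISTED', 'SERGEANT']
print(has_interpretation("129", VOCAB))  # False
```

Reading the decoded list left to right gives progressively finer discrimination: a person, specifically enlisted, specifically a sergeant.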

    It is possible to write IDL primitive terms using the words that are implied by the digits rather than the digits themselves. This will customarily be done when the statements are generated by humans or are intended for human reading. It is also possible to abbreviate the term by the last word in the string and any previous words which may be necessary to imply a unique path in the vocabulary tree or clarify the reading, and the asterisk may be omitted.
    The previously given sentence can therefore be written as follows:
    (OFFICER, JONES, IS-SUPERIOR-TO (SERGEANT, SMITH JOHN))
    The grammar given above is, of course, only one of several that can be proposed. It represents a compromise between a grammar allowing no phrase structure (subordination) and one permitting all of the complexity of a natural language. The optimum grammar is the one that permits adequate expressiveness for the particular problem at hand and is simple enough to permit efficient processing of its resulting sentences.
    The choosing of an appropriate grammar is an important task in the design of the information system. The grammar chosen here permits lattice-structured relationships between statements, the modification of one monadic predicate by another, and a tree-structured vocabulary of predicates.
    The data section of the information system is composed of two main sections, the thesaurus and the record file. The record file contains the records describing the items of the system, and the thesaurus serves as the link between man and machine that permits the entry of verbal information into the record file, and the efficient retrieval and decoding of information for human consumption. The formal information description language IDL serves to efficiently describe the information in the system and, at the same time, as an indexing language that brings together terms that denote items with related properties. The thesaurus is composed of two files, an index and a glossary. The index is an ordered listing (in the IDL alphabet) of IDL terms and the associated glossary words together with the addresses of all records that use each term. The glossary is an alphabetical listing of all English words that are used in the description vocabulary, together with the IDL terms that use it.
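
The two thesaurus files might be modeled as follows. This is a hedged Python sketch, not the RCA design: the entries, record addresses, and the names `build_glossary` and `records_in_scope` are hypothetical. It assumes the "IDL alphabet" orders terms like digit strings, and that a glossary word is "used" by the terms whose final digit stands for it. Because the index is ordered, all terms within a term's scope sit next to each other, which is one way an indexing language can bring related items together.

```python
from bisect import bisect_left

# Hypothetical index: (IDL term, glossary word, record addresses), ordered by term.
INDEX = sorted([
    ("1", "PERSON", []),
    ("11", "OFFICER", [1001]),
    ("12", "ENLISTED", []),
    ("121", "SERGEANT", [1002, 1003]),
    ("122", "PRIVATE", [1004]),
])
TERMS = [term for term, _, _ in INDEX]


def build_glossary(index):
    """Alphabetical listing: English word -> IDL terms that use it."""
    glossary = {}
    for term, word, _ in index:
        glossary.setdefault(word, []).append(term)
    return dict(sorted(glossary.items()))


def records_in_scope(term):
    """Addresses of records using `term` or any term in its scope."""
    addresses = []
    # Terms sharing the prefix `term` form one contiguous run in the ordered index.
    for t, _, addrs in INDEX[bisect_left(TERMS, term):]:
        if not t.startswith(term):
            break
        addresses.extend(addrs)
    return addresses


GLOSSARY = build_glossary(INDEX)
for term in GLOSSARY["ENLISTED"]:   # English word -> IDL term(s)
    print(term, records_in_scope(term))  # 12 [1002, 1003, 1004]
```

The glossary handles the human-facing direction (word to term), while the ordered index handles retrieval (term to record addresses).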

          in [ACM] CACM 5(01) January 1962 "Design, Implementation and Application of IR-Oriented Languages," ACM Computer Language Committee on Information Retrieval on 20-21 October 1961 in Princeton, N. J.
  • Simmons, R. F. "Answering English Questions by Computer - A Survey" SDC Report SP-1536 Santa Monica, Calif.; April, 1964
  • Simmons, R. F. "Answering English questions by computer: a survey" p53-70 Abstract: Fifteen experimental English language question-answering systems which are programmed and operating are described and reviewed. The systems range from a conversation machine to programs which make sentences about pictures and systems which translate from English into logical calculi. Systems are classified as list-structured data-based, graphic data-based, text-based and inferential. Principles and methods of operations are detailed and discussed.

    It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures.
          in [ACM] CACM 8(01) Jan 1965