WHIRL(ID:8182/)


Word-based Heterogeneous Information Representation Language

William W. Cohen, AT&T Shannon Labs, 1997


References:
  • Cohen, William W. "The WHIRL approach to integration: An overview" in Proceedings of the AAAI-98 Workshop on AI and Information Integration. AAAI Press, 1998
    Abstract: We describe a new integration system, in which information
    sources are converted into a highly structured
    collection of small fragments of text. Database-like
    queries to this structured collection of text fragments
    are approximated using a novel logic called WHIRL,
    which combines inference in the style of deductive
    databases with ranked retrieval methods from information
    retrieval. WHIRL allows queries that integrate
    information from information sources, without requiring
    the extraction and normalization of object identifiers
    that can be used as keys; instead, operations that
    in conventional databases require equality tests on keys
    are approximated using IR similarity metrics for text.
    This leads to a reduction in the amount of human engineering
    required to field an integration system.
  • Cohen, W.W. "WHIRL: A word-based information representation language" Artificial Intelligence, 2000
    Abstract: We describe WHIRL, an "information representation language"
    that synergistically combines properties of logic-based and
    text-based representation systems. WHIRL is a subset of non-recursive
    Datalog that has been extended by introducing an atomic type for
    textual entities, an atomic operation for computing textual
    similarity, and a "soft" semantics; that is, inferences in
    WHIRL are associated with numeric scores, and presented to the user
    in decreasing order by score. We show that WHIRL strictly generalizes
    both ranked retrieval of documents, and logical deduction; that
    non-trivial queries about large databases can be answered
    efficiently; that WHIRL can be used to accurately integrate data from
    heterogeneous information sources, such as those found on the Web;
    that WHIRL can be used effectively for inductive classification of
    text; and finally, that WHIRL can be used to semi-automatically
    generate extraction programs for structured documents.
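
Illustration: the abstracts describe replacing equality tests on keys with IR similarity metrics, and returning inferences ranked by numeric score. The Python sketch below shows that idea as a "similarity join" between two lists of names. It is not WHIRL itself or Cohen's implementation. It assumes TF-IDF weighted cosine similarity as the IR metric (the abstracts do not name a specific one), and the function names, the whitespace tokenizer, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed weighting: log-scaled term frequency times smoothed IDF.
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, top_k=3):
    """Pair up entries of two name lists, ranked by decreasing similarity.

    This stands in for a join on a shared key: no normalized identifier
    is needed, only the raw text of each entry.
    """
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    scored = [(cosine(a, b), l, r)
              for l, a in zip(left, lv)
              for r, b in zip(right, rv)]
    scored = [s for s in scored if s[0] > 0]  # drop pairs sharing no term
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical data: the same organizations written differently.
    left = ["AT&T Labs Research", "Carnegie Mellon University"]
    right = ["att labs", "Carnegie Mellon Univ.", "Stanford University"]
    for score, l, r in similarity_join(left, right):
        print(f"{score:.3f}  {l!r} ~ {r!r}")
```

In this sketch every candidate pair is scored exhaustively, which is only practical for small inputs. The second abstract states that non-trivial queries over large databases can be answered efficiently, which this example makes no attempt to reproduce.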