WHIRL(ID:8182/)

Country: us
Began: 1997

Word-based Heterogeneous Information Representation Language

William W. Cohen, AT&T Shannon Labs, 1997

References:

Cohen, William W. "The WHIRL approach to integration: An overview" in Proceedings of the AAAI-98 Workshop on AI and Information Integration. AAAI Press, 1998
Abstract: We describe a new integration system, in which information
sources are converted into a highly structured
collection of small fragments of text. Database-like
queries to this structured collection of text fragments
are approximated using a novel logic called WHIRL,
which combines inference in the style of deductive
databases with ranked retrieval methods from information
retrieval. WHIRL allows queries that integrate
information from information sources, without requiring
the extraction and normalization of object identifiers
that can be used as keys; instead, operations that
in conventional databases require equality tests on keys
are approximated using IR similarity metrics for text.
This leads to a reduction in the amount of human engineering
required to field an integration system.
External link: Online copy
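
The abstract's central idea, replacing equality tests on keys with an IR similarity metric, can be illustrated with a small similarity join. The following is a minimal sketch and not Cohen's implementation: the function names (`tokenize`, `tfidf_vectors`, `similarity_join`), the sample relations, and the exact TF-IDF weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; a real system would also stem terms).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed weighting chosen for this sketch so that no weight is zero.
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def similarity_join(left, right, top_k=3):
    """Pair up text fields by cosine similarity instead of key equality."""
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            # Dot product of unit vectors equals their cosine similarity.
            score = sum(w * b.get(t, 0.0) for t, w in a.items())
            if score > 0:
                pairs.append((score, left[i], right[j]))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top_k]


# Hypothetical name columns from two sources that share no common key.
source_a = ["AT&T Labs Research", "Carnegie Mellon University", "Stanford University"]
source_b = ["Carnegie Mellon Univ.", "AT&T Labs", "MIT"]

matches = similarity_join(source_a, source_b)
for score, a, b in matches:
    print(f"{score:.2f}  {a!r} ~ {b!r}")
```

The join returns candidate pairs in decreasing order of similarity rather than a set of exact matches, so no hand-written normalization of the name fields is needed before the sources are combined.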

Cohen, W.W. "WHIRL: A word-based information representation language" Artificial Intelligence, 2000
Abstract: We describe WHIRL, an "information representation language"
that synergistically combines properties of logic-based and
text-based representation systems. WHIRL is a subset of non-recursive
Datalog that has been extended by introducing an atomic type for
textual entities, an atomic operation for computing textual
similarity, and a "soft" semantics; that is, inferences in
WHIRL are associated with numeric scores, and presented to the user
in decreasing order by score. We show that WHIRL strictly generalizes
both ranked retrieval of documents, and logical deduction; that
non-trivial queries about large databases can be answered
efficiently; that WHIRL can be used to accurately integrate data from
heterogeneous information sources, such as those found on the Web;
that WHIRL can be used effectively for inductive classification of
text; and finally, that WHIRL can be used to semi-automatically
generate extraction programs for structured documents.
External link: Online copy
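
The "soft" semantics described in this abstract can be sketched as follows. The sketch assumes a scoring rule in which one derivation of an answer scores the product of its literal scores (1.0 for an ordinary database literal, a similarity value for a similarity literal) and several derivations of the same answer are combined by noisy-or; that rule is stated here as an assumption and should be checked against the paper. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from math import prod


def conjunction_score(literal_scores):
    # Score of one derivation: product of the scores of its literals.
    return prod(literal_scores)


def rank_answers(derivations, r=10):
    """Combine derivations per answer with noisy-or and return the r best.

    derivations: iterable of (answer, [literal scores]) pairs.
    """
    # miss[a] accumulates the product of (1 - score) over a's derivations.
    miss = defaultdict(lambda: 1.0)
    for answer, scores in derivations:
        miss[answer] *= 1.0 - conjunction_score(scores)
    ranked = sorted(((1.0 - m, a) for a, m in miss.items()), reverse=True)
    return ranked[:r]


# Hypothetical derivations: each list holds the literal scores of one proof.
derivations = [
    ("Acme Corp", [1.0, 0.9]),
    ("Acme Corp", [1.0, 0.5]),
    ("Apex Ltd", [1.0, 0.8, 0.5]),
]

ranked = rank_answers(derivations)
for score, answer in ranked:
    print(f"{score:.2f}  {answer}")
```

Under this rule an answer supported by several weak derivations can outrank one supported by a single derivation, and answers are presented in decreasing order by score, as the abstract describes.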