IDEA(ID:4520/ide002)

Visual data language

Began: 1996

for Interactive Data Exploration and Analysis

People:

Divesh Srivastava

References:

Selfridge, Peter G. and Srivastava, Divesh "A visual language for interactive data exploration and analysis" Abstract: The analysis of large amounts of data to extract generalizations, exceptions, trends, and hidden relationships is a common activity in the business and scientific communities. While some kinds of "knowledge" can be extracted automatically with preselected algorithms or data mining techniques, others require an experienced human, often an expert in analysis, the business or scientific context, or both. We have found that such humans combine exploration, the search for a relevant subset or view of the data, with analysis, statistical or other techniques for measurement. We designed and implemented a visual language, IDEA, to assist the data analyst in these two intertwined tasks.

The language is a convenient representation for data analysis and provides environmental support for keeping track of sequences of operations, reuse of the data analysis itself, and enforced semantics between operations and data.
in Proceedings of the IEEE Symposium on Visual Languages (VL), 1996

Selfridge, Peter G. and Srivastava, Divesh "A visual language for interactive data exploration" Abstract: IDEA is a typed, graph-based, visual language. An IDEA program is a directed acyclic graph (DAG) that represents the actions performed by the BDA during a data exploration and analysis session, as nodes of the DAG, and various relationships between the actions, as edges of the DAG. It is important to note that the IDEA visual language does not assist the BDA in the visual construction of the action itself; this is accomplished through action-specific tools, for example, visual query languages.
Extract: The IDEA Visual Language
After empirical study of the BDA's actions we designed and implemented an environment called IDEA (IDEA stands for Interactive Database Exploration and Analysis). IDEA is a typed, graph-based, visual language. An IDEA program is a directed acyclic graph (DAG) that represents the actions performed by the data analyst during a data exploration and analysis session, as nodes of the DAG, and various relationships between the actions, as edges of the DAG. Each interaction with an external tool, for example, composing a query, segmenting a relation, or viewing summary information, is an action. Edges can represent derivation (corresponding to data flow), temporal (representing a total ordering of actions by the BDA), and implicit computation (representing a possible data dependency between nodes).
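
The graph structure described in this extract can be illustrated with a minimal Python sketch. It is not taken from the IDEA implementation: the names SessionGraph, EdgeKind, add_action, add_edge, and derived_from are hypothetical, and the sketch models only actions as nodes, the three edge kinds, and the acyclicity constraint. It does not model IDEA's type system or its external tools.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    # The three relationships named in the extract (labels are illustrative).
    DERIVATION = "derivation"  # data flows from source to target
    TEMPORAL = "temporal"      # source was performed before target
    IMPLICIT = "implicit"      # possible data dependency between nodes


@dataclass
class SessionGraph:
    """Hypothetical model of an analysis session as a DAG of actions."""
    actions: dict = field(default_factory=dict)  # action name -> description
    edges: list = field(default_factory=list)    # (source, target, kind)

    def add_action(self, name, description):
        self.actions[name] = description

    def add_edge(self, source, target, kind):
        if source not in self.actions or target not in self.actions:
            raise KeyError("both endpoints must be existing actions")
        # Reject any edge that would close a cycle, so the graph stays a DAG.
        if self._reachable(target, source):
            raise ValueError("edge would create a cycle")
        self.edges.append((source, target, kind))

    def _reachable(self, start, goal):
        # Depth-first search over edges of every kind.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(t for s, t, _ in self.edges if s == node)
        return False

    def derived_from(self, name):
        # Actions whose output feeds directly into `name` via data flow.
        return [s for s, t, k in self.edges
                if t == name and k is EdgeKind.DERIVATION]


# Example session using the three kinds of action mentioned in the extract.
g = SessionGraph()
g.add_action("query", "compose a query")
g.add_action("segment", "segment a relation")
g.add_action("summary", "view summary information")
g.add_edge("query", "segment", EdgeKind.DERIVATION)
g.add_edge("segment", "summary", EdgeKind.DERIVATION)
g.add_edge("query", "summary", EdgeKind.TEMPORAL)
```

Keeping the edge kind on each edge, rather than in separate graphs, lets a single cycle check cover all three relationships. That is a design choice of this sketch, not something the source specifies.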

The implementation of IDEA is based on a client-server architecture. It allows a BDA to explore a subset of the data and to construct re-usable IDEA programs, and it intuitively captures the notion of an analysis session in a form that can be run on larger data sets, shared, and re-used. Figure 2 illustrates a snapshot of an IDEA session. More details on the database aspects of this work can be found in [4], and more details on this work from the perspective of visual languages and knowledge discovery can be found in [5].
in Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996

Selfridge, Peter G.; Srivastava, Divesh and Lynn O. Wilson "IDEA: Interactive Data Exploration and Analysis" pp24-34 Abstract: The analysis of business data is often an ill-defined task characterized by large amounts of noisy data. Because of this, business data analysis must combine two kinds of intertwined tasks: exploration and analysis. Exploration is the process of finding the appropriate subset of data to analyze, and analysis is the process of measuring the data to provide the business answer. While there are many tools available both for exploration and for analysis, a single tool or set of tools may not provide full support for these intertwined tasks. We report here on a project that set out to understand a specific business data analysis problem and build an environment to support it. The results of this understanding are, first of all, a detailed list of requirements of this task; second, a set of capabilities that meet these requirements; and third, an implemented client-server solution that addresses many of these requirements and identifies others for future work. Our solution incorporates several novel perspectives on data analysis and combines a history mechanism with a graphical, re-usable representation of the analysis and exploration process. Our approach emphasizes using the database itself to represent as many of these functions as possible.
in Proceedings of the 1996 ACM SIGMOD Conference on Management of Data