IDEA (ID:4520/ide002)
Visual data language for Interactive Data Exploration and Analysis

References:

The language is a convenient representation for data analysis and provides environmental support for keeping track of sequences of operations, reuse of the data analysis itself, and enforced semantics between operations and data.
in Proceedings of the IEEE Symposium on Visual Languages (VL), 1996

The IDEA Visual Language
After an empirical study of the business data analyst's (BDA's) actions, we designed and implemented an environment called IDEA (Interactive Database Exploration and Analysis). IDEA is a typed, graph-based, visual language. An IDEA program is a directed acyclic graph (DAG) whose nodes represent the actions performed by the data analyst during a data exploration and analysis session and whose edges represent various relationships between those actions. Each interaction with an external tool, for example composing a query, segmenting a relation, or viewing summary information, is an action. Edges can represent derivation (corresponding to data flow), temporal order (a total ordering of the BDA's actions), or implicit computation (a possible data dependency between nodes). The implementation of IDEA is based on a client-server architecture. It allows a BDA to explore a subset of the data and to construct re-usable IDEA programs, and it intuitively captures the notion of an analysis session in a form that can be run on larger data sets, shared, and re-used. Figure 2 illustrates a snapshot of an IDEA session. More details on the database aspects of this work can be found in [4], and more details from the perspective of visual languages and knowledge discovery can be found in [5].
in Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996

[...] characterized by large amounts of noisy data.
Because of this, business data analysis must combine two kinds of intertwined tasks: exploration and analysis. Exploration is the process of finding the appropriate subset of data to analyze, and analysis is the process of measuring the data to provide the business answer. While many tools are available both for exploration and for analysis, a single tool or set of tools may not provide full support for these intertwined tasks. We report here on a project that set out to understand a specific business data analysis problem and build an environment to support it. The results of this understanding are, first, a detailed list of requirements for this task; second, a set of capabilities that meet these requirements; and third, an implemented client-server solution that addresses many of these requirements and identifies others for future work. Our solution incorporates several novel perspectives on data analysis and combines a history mechanism with a graphical, re-usable representation of the analysis and exploration process. Our approach emphasizes using the database itself to represent as many of these functions as possible.
in Proceedings of the 1996 ACM SIGMOD Conference on Management of Data
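The DMKD workshop extract above describes an IDEA program as a DAG whose nodes are analyst actions and whose edges are derivation, temporal, or implicit-computation relationships. The Python sketch below is a minimal, hypothetical model of that structure, not the IDEA implementation. The names Session, Action, EdgeKind, add, mark_implicit and replay are invented for illustration. Actions are assumed to be plain Python functions over lists of dicts rather than database queries, and replay is assumed to follow derivation edges only, treating temporal and implicit edges as recorded metadata.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter
from typing import Callable

Rows = list[dict]  # assumption: a "relation" is just a list of dict rows


class EdgeKind(Enum):
    DERIVATION = "derivation"  # data flow: the target consumes the source's output
    TEMPORAL = "temporal"      # the order in which the analyst performed the actions
    IMPLICIT = "implicit"      # a possible, unconfirmed data dependency


@dataclass
class Action:
    name: str
    # Hypothetical: an action maps its input relations to one output relation.
    run: Callable[[list[Rows]], Rows]


@dataclass
class Session:
    actions: dict[str, Action] = field(default_factory=dict)
    edges: list[tuple[str, str, EdgeKind]] = field(default_factory=list)

    def add(self, action: Action, derived_from: tuple[str, ...] = ()) -> None:
        if action.name in self.actions:
            raise ValueError(f"duplicate action: {action.name}")
        for src in derived_from:
            if src not in self.actions:
                # Only earlier actions may feed this one, which keeps the graph acyclic.
                raise ValueError(f"unknown source action: {src}")
        if self.actions:
            # Chain a temporal edge from the most recent action (a total order).
            last = next(reversed(self.actions))
            self.edges.append((last, action.name, EdgeKind.TEMPORAL))
        self.actions[action.name] = action
        for src in derived_from:
            self.edges.append((src, action.name, EdgeKind.DERIVATION))

    def mark_implicit(self, src: str, dst: str) -> None:
        # Record a suspected dependency without using it for replay.
        self.edges.append((src, dst, EdgeKind.IMPLICIT))

    def replay(self, source: Rows) -> dict[str, Rows]:
        """Re-run every action on a new data set, following derivation edges only."""
        deps: dict[str, set[str]] = {name: set() for name in self.actions}
        for src, dst, kind in self.edges:
            if kind is EdgeKind.DERIVATION:
                deps[dst].add(src)
        results: dict[str, Rows] = {}
        for name in TopologicalSorter(deps).static_order():
            # Root actions read the source data; others read their parents' outputs.
            inputs = [results[p] for p in sorted(deps[name])] or [source]
            results[name] = self.actions[name].run(inputs)
        return results


# Usage: record a small query -> segment -> summary session, then replay it.
orders = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]
session = Session()
session.add(Action("query", lambda ins: [r for r in ins[0] if r["sales"] > 6]))
session.add(Action("segment_east", lambda ins: [r for r in ins[0] if r["region"] == "east"]),
            derived_from=("query",))
session.add(Action("summary", lambda ins: [{"total": sum(r["sales"] for r in ins[0])}]),
            derived_from=("segment_east",))
results = session.replay(orders)
print(results["summary"])  # [{'total': 17}]
```

Because add() only lets an action derive from actions recorded earlier, the graph cannot acquire a cycle. Calling replay() on a larger list reruns the whole recorded session unchanged, which loosely mirrors the extract's point that a session captured on a subset of the data can be run on larger data sets.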