Legal Retrieval (ID:5303/leg001)
Information retrieval system for legal documents

Legal information retrieval language. Kehl, Horty, Bacon, Mitchell, University of Pittsburgh, PA, 1961

References:

The phrase "information retrieval" covers such a myriad of topics that it is necessary to outline the problems in which we are interested. Our programming research was initiated by the needs of lawyers for high speed computer assistance in their studies. There are many different problems in this work. The body of text includes statutes, regulations, and cases on a variety of topics. Statutes were selected for the first phase of the project. These involved the use of a technical vocabulary (not to the extent of a scientific vocabulary). However, the use of words is not primarily metaphorical. In addition, lawyers have a variety of questions beyond the simple identification of a document by a search. They are sometimes interested in the use of a particular wording or phrase. Other times they are interested only in citations. Sometimes, having pursued a search and received results, they wish to probe deeper into the same question. This meant that not just one search program but an integrated family of programs was needed. Because of the difficulties of meaning (after all, this is what lawyers are for, not leafing pages in a library), neither traditional legal index methods nor some of the more modern linguistic coding devices were adequate. With this background, the technical assumptions of our programming research will seem more natural.

2. Assumptions
a. The body of text is large and will continue to grow. This assumes automatic updating procedures.
b. The text itself consists of alphabetical or numerical characters and punctuation, but does not include graphs, formulas, or pictures.
c. There is no pre-coding. The entire text is keypunched and transferred to magnetic tape.
This assumption permits continued progress of the project with the development of page reader scanning devices.
d. There are several programs to be used in an interrelated manner. The search language is supplemented by the availability of a key word in context program as well as programs for statistical studies of the vocabulary, and special thesaurus studies. This assumption requires the use of a large scale general purpose computer, in our case an IBM 7070 with 10,000-word core storage and 10 magnetic tape drives.
e. The search language itself is a higher level language (such as the formula translation languages for mathematicians) rather than a machine level language. That is, it assumes input questions are in English so that the user may participate directly without technical training. It assumes the possibility of processing statements sequentially with logical decisions such as "if A, then skip B" controlling the sequence of execution. This permits the user to probe deeper into his inquiry automatically when desirable.

The remainder of this paper is concerned with the actual programs in use and examples of their application. They illustrate some of the considerations and decisions that have to be made in the process of developing a text processing system.
in [ACM] CACM 4(09) (September 1961)

LEGAL RETRIEVAL: An Information Retrieval Language for Legal Studies. The needs of lawyers for high speed computer assistance in retrieval of information for legal studies. A variety of problems are involved, such as statutes, regulations, wording, citations, etc. Sometimes, a particular word or phrase is needed, and other times, a citation is sufficient. Studies are being made using the statutes of Pennsylvania, and the statutes of states are being prepared.
in [ACM] CACM 5(01) January 1962, "Design, Implementation and Application of IR-Oriented Languages," ACM Computer Language Committee on Information Retrieval, 20-21 October 1961, Princeton, N. J.
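
Assumption (e) in the 1961 abstract above describes statements processed in sequence, with decisions such as "if A, then skip B" controlling execution. The original statement syntax is not given here, so the following is only a minimal sketch of that control idea in Python. The `Statement` class, the `skip_on_hit` field, the all-terms-must-occur matching rule, and the sample texts are all hypothetical and are not taken from the Pittsburgh system.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One search statement (hypothetical form, not the original syntax)."""
    label: str
    terms: list[str]                      # every term must occur in a document
    skip_on_hit: list[str] = field(default_factory=list)  # labels to skip if this finds anything


def matching(docs: dict[str, str], terms: list[str]) -> list[str]:
    """Return ids of documents whose text contains every term (case-insensitive)."""
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(term.lower() in words for term in terms):
            hits.append(doc_id)
    return hits


def run_program(statements: list[Statement], docs: dict[str, str]) -> dict[str, list[str]]:
    """Execute statements in order; a statement with hits may cause later ones to be skipped."""
    results: dict[str, list[str]] = {}
    skipped: set[str] = set()
    for stmt in statements:
        if stmt.label in skipped:
            continue                      # "if A, then skip B"
        hits = matching(docs, stmt.terms)
        results[stmt.label] = hits
        if hits:
            skipped.update(stmt.skip_on_hit)
    return results


# Invented sample texts, for illustration only.
docs = {
    "s1": "No hospital shall employ a minor.",
    "s2": "Every school shall keep records of each minor.",
}
program = [
    Statement("A", ["hospital", "minor"], skip_on_hit=["B"]),
    Statement("B", ["minor"]),            # broader fallback, run only if A finds nothing
    Statement("C", ["records"]),
]
results = run_program(program, docs)
```

In this sketch the narrow statement A succeeds, so the broader fallback B is never run. Reversing the roles (skipping a refinement when the first search is empty) would be an equally plausible reading of the abstract.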

This system is of interest on several counts. It was one of the earliest attempts to use natural language for IR purposes. It is also one of the largest, as far as volume of text is concerned. This year the Health Science Center reported a corpus of 50,000,000 words on magnetic tape. The Center's attack on the problem has been blunt and massive, eschewing the various statistical and linguistic approaches to classification or analysis, which most researchers consider essential for effective exploitation of natural language for IR. The burden of semantics is put on the user, who selects the words of his query from the concordance and KWIC displays of the text to be searched. Some of the refinements planned for the system after 1961 are mentioned.
in ACM Computing Reviews 5(05) September-October 1964
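
The review above notes that the user picks query words from concordance and KWIC (key word in context) displays of the text. As a rough illustration of what those two aids provide, here is a minimal Python sketch. The function names, the tokenization rule, the display format, and the sample texts are assumptions made for this example and do not reproduce the original IBM 7070 programs.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_concordance(docs: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map every word to the (document id, word position) pairs where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return dict(index)


def kwic(docs: dict[str, str], keyword: str, width: int = 2) -> list[str]:
    """List each occurrence of keyword with up to `width` words of context per side."""
    key = keyword.lower()
    lines = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        for pos, word in enumerate(words):
            if word == key:
                left = " ".join(words[max(0, pos - width):pos])
                right = " ".join(words[pos + 1:pos + 1 + width])
                lines.append(f"{doc_id}: {left} [{word}] {right}".rstrip())
    return lines


# Invented sample texts, for illustration only.
docs = {
    "s1": "No hospital shall employ a minor.",
    "s2": "Every school shall keep records of each minor.",
}
concordance = build_concordance(docs)
for line in kwic(docs, "minor"):
    print(line)
```

The concordance tells the user which words exist in the corpus and where, and the KWIC lines show how each word is actually used, which is how the burden of semantics can be left to the person writing the query.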