Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries

  • Simonetta Montemagni
  • Lucy Vanderwende

Proceedings of the Fourteenth International Conference on Computational Linguistics

Published by the Association for Computational Linguistics

As research on extracting semantic information from on-line dictionaries proceeds, most progress has been made in extracting genus terms. Two methods are in use, pattern matching at the string level and pattern matching at the level of structural analysis, and both seem to yield equally promising results. Little theoretical work, however, is being done to determine the set of possible differentiae to be identified, and therefore the set of possible semantic relations that can be extracted from them. In fact, Wilks et al. remark that, as far as identifying the differentiae and organizing that information into a list of properties is concerned, “such demands are beyond the abilities of the best current extraction techniques” (Wilks et al., 1989, p. 227). Nevertheless, the current state of the art in computational linguistics demands that semantic information beyond genus terms be available now, and on a large scale, in order to push current theories forward, whether these involve knowledge-based parsing or parsing first with a syntactic component followed by a semantic component.

In this paper, we focus on analyzing definitions not for their genus terms but for the semantic relations that can be extracted from the differentiae (Calzolari 1984). Although many researchers have used syntactic analyses for this purpose for some time (for example, Jensen and Binot 1987, Klavans 1990, Ravin 1990 and Vanderwende 1990, all of which use the PLNLP English Parser to provide the structural information), many others still do not. We will demonstrate with examples why only patterns based on syntactic information (henceforth, structural patterns) provide reliable semantic relations for the differentiae. Patterns that match definition text at the string level (henceforth, string patterns) can be written, but they cannot capture variation in the differentiae as easily as structural patterns can. In addition, although it is possible to parse definition texts using a grammar designed for one dictionary (e.g. a grammar of “Longmanese”, see Alshawi 1989), we have found that a general, broad-coverage grammar of English or of Italian provides a level of analysis that is as good as, and possibly superior to, that of a dictionary-specific grammar.
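To make the contrast between string patterns and structural patterns concrete, the following is a minimal Python sketch for a single relation, PURPOSE. It is an illustration under stated assumptions, not the authors' method: the three definitions, the `Node` class, and its dependency labels are hypothetical and invented for this sketch, and the trees are built by hand rather than produced by the PLNLP English Parser or any other parser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical parse node: a lemma plus dependents keyed by a label.

    The label is a preposition for PP dependents and "mod" otherwise; this
    toy format is invented for illustration and is not any parser's output.
    """
    head: str
    deps: Dict[str, "Node"] = field(default_factory=dict)


# String pattern: the words "used for" must be adjacent in the definition text.
USED_FOR = re.compile(r"\bused for (\w+)")


def purpose_by_string(definition: str) -> Optional[str]:
    """Return the word after a literal "used for", or None if absent."""
    match = USED_FOR.search(definition)
    return match.group(1) if match else None


def purpose_by_structure(node: Node) -> Optional[str]:
    """Structural pattern: a "use" node with a "for" PP yields PURPOSE.

    The match depends only on the tree, so intervening modifiers in the
    surface string (adverbials, agent phrases) do not block it.
    """
    if node.head == "use" and "for" in node.deps:
        return node.deps["for"].head
    # Otherwise search the dependents depth-first.
    for child in node.deps.values():
        found = purpose_by_structure(child)
        if found is not None:
            return found
    return None


# Invented definitions paired with hand-built trees (lemma "use" for "used").
EXAMPLES = [
    ("an instrument used for cutting",
     Node("instrument", {"mod": Node("use", {"for": Node("cutting")})})),
    ("an instrument used, especially in kitchens, for cutting",
     Node("instrument", {"mod": Node("use", {"in": Node("kitchens"),
                                             "for": Node("cutting")})})),
    ("an instrument used by cooks for cutting",
     Node("instrument", {"mod": Node("use", {"by": Node("cooks"),
                                             "for": Node("cutting")})})),
]

for text, tree in EXAMPLES:
    print(f"{text!r}: string={purpose_by_string(text)}, "
          f"structure={purpose_by_structure(tree)}")
```

Running the sketch, the string pattern finds the purpose only in the first definition, while the structural pattern finds it in all three. Loosening the regular expression to skip the words between "used" and "for" would recover these cases, but it would then also pick up unrelated later "for" phrases (in a hypothetical definition such as "a rope used to tie boats for sailors" it would return "sailors"), which is the kind of surface variation the structural pattern avoids.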