
The READERS project proposes new unsupervised computational models to automatically extract background knowledge after reading large amounts of unstructured text. This knowledge will take the form of classes, categorized entities, and predicates whose arguments are typified by probability distributions over classes. The classes themselves will be automatically organized into taxonomies related to the predicates in which they participate. In this way, new methods and models based on extensional definitions of concepts will be developed and deployed for the automatic creation of knowledge bases. Importantly, these knowledge bases will be closely related to textual representations and instrumental in enabling textual inferences. The extracted knowledge will also be linked to external human-made resources such as Freebase, DBpedia and WordNet, and the knowledge bases will be interfaced with several engines for performing disambiguation, relation extraction, term expansion, and relatedness measurement.

A key part of the project will be the development of a reading machine that will use all these resources and tools. The purpose of our reading machine is to answer queries about a given text. Texts are never self-contained, and their interpretation always requires recovering large amounts of background knowledge. Thus, the Machine Reading technology under development must incorporate not only language processing but also the recovery and use of large amounts of background knowledge. This Machine Reading technology will be evaluated through Multiple-Choice Reading Comprehension (MRC) tests developed by humans over unseen documents. MRC tests enable objective and reproducible evaluation experiments, and will be fully reusable as benchmarks available to the international community.
Interestingly, the industrial partner in charge of the Machine Reading system development will apply the reverse technology to automatically generate MRC tests for the automatic assessment of children’s reading abilities. This reading machine will work with at least two languages, English and French. The support and coordination of an international evaluation campaign for Machine Reading in multiple languages (English, Spanish, French, German, Italian, Romanian, Bulgarian and Arabic) is part of the proposal. This evaluation campaign will serve to measure progress in the development of the Machine Reading technology in a comparative and competitive environment. Evaluation exercises in specific domains such as biomedicine will also provide a venue for technology transfer and allow us to assess the portability of the proposed technology.
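
To make the idea of predicates whose arguments are typified by probability distributions over classes more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the project's method: the project's models are unsupervised and are not specified in this summary, whereas the sketch uses plain relative-frequency counting over a hand-written entity-to-class mapping. The function name `argument_class_distributions`, the slot labels, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def argument_class_distributions(observations, entity_classes):
    """Estimate P(class | predicate, argument slot) by relative frequency.

    observations: iterable of (predicate, slot, entity) triples.
    entity_classes: dict mapping an entity to the classes it belongs to
    (an extensional view: a class is described by its member entities).
    """
    counts = defaultdict(Counter)
    for predicate, slot, entity in observations:
        # Entities with no known class contribute nothing.
        for cls in entity_classes.get(entity, ()):
            counts[(predicate, slot)][cls] += 1

    distributions = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        distributions[key] = {cls: n / total for cls, n in counter.items()}
    return distributions


# Toy data, invented for illustration only.
entity_classes = {
    "Madrid": ["city"],
    "Edinburgh": ["city"],
    "Spain": ["country"],
}
observations = [
    ("visit", "object", "Madrid"),
    ("visit", "object", "Edinburgh"),
    ("visit", "object", "Spain"),
    ("visit", "object", "Madrid"),
]

dists = argument_class_distributions(observations, entity_classes)
print(dists[("visit", "object")])
```

In this toy setting, the object slot of the hypothetical predicate `visit` is described by a distribution over the classes `city` and `country` rather than by a single fixed type, which is the kind of soft typing the description above refers to.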

Call Topic: From Data to New Knowledge (D2K), Call 2011
Start date: (36 months)
Funding support: 800,000 €

Project partners

  • Universidad Nacional de Educación a Distancia (Spain)
  • UPV/EHU (Spain)
  • Synapse Développement (France)
  • University of Edinburgh (United Kingdom)