Distributed Representation of Entity Mentions Within and Across Multiple Text Documents
Keywords:
Coreference Resolution, Cross-Document Coreference Resolution, Distributed Representation of Words, Information Extraction, Natural Language Processing

Abstract
Given the importance of entities as a source of information for several NLP applications, Cross-Document Coreference Resolution (CDCR) provides techniques for identifying textual mentions of entities and clustering co-referent mentions across multiple documents. In this context, prior works employ Knowledge Bases (KBs) as a structured information resource to enrich the context of mentions; however, these methods struggle with entities unknown to the KB, which affects the accuracy and performance of the task. Accordingly, this paper presents a new approach that improves on the state of the art by concentrating on the knowledge provided by the input text of the mentions, independent of any external knowledge resource. For this purpose, we first construct the context of each mention from the sequence of informative words around it (known as content-words). Furthermore, by abstracting the mention representation to a fixed-size vector using a neural technique for continuous word representation (i.e., Word2Vec), we reduce the computational cost of the mention co-reference sub-task. Experiments on two datasets show significant gains in both CDCR accuracy and run-time efficiency compared to the best prior methods.
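The idea of representing a mention by its surrounding content-words, abstracted to a fixed-size vector, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the context window size, and the toy embedding table (standing in for pre-trained Word2Vec vectors) are all assumptions made for the example.

```python
import numpy as np

def mention_vector(tokens, mention_idx, embeddings, window=3):
    """Average the embeddings of content-words in a window around a mention.

    tokens      : list of tokens in the document
    mention_idx : index of the mention's head token
    embeddings  : dict mapping word -> fixed-size numpy vector
    window      : tokens considered on each side of the mention (assumed value)
    """
    lo = max(0, mention_idx - window)
    hi = min(len(tokens), mention_idx + window + 1)
    vecs = [embeddings[t] for t in tokens[lo:hi] if t in embeddings]
    if not vecs:  # no known content-words: fall back to a zero vector
        return np.zeros(next(iter(embeddings.values())).shape)
    return np.mean(vecs, axis=0)

def cosine(u, v):
    """Cosine similarity between two mention vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 50-dimensional embedding table; in practice these would come from
# a Word2Vec model trained on a large corpus.
rng = np.random.default_rng(0)
vocab = ["president", "obama", "spoke", "senate", "barack", "addressed"]
emb = {w: rng.normal(size=50) for w in vocab}

doc1 = ["president", "obama", "spoke", "senate"]
doc2 = ["barack", "obama", "addressed", "senate"]
v1 = mention_vector(doc1, 1, emb)  # mention "obama" in document 1
v2 = mention_vector(doc2, 1, emb)  # mention "obama" in document 2
sim = cosine(v1, v2)  # high similarity is evidence of co-reference
```

Because every mention is reduced to one fixed-size vector, comparing two mentions costs a single vector operation regardless of how long their original contexts were, which is the source of the run-time saving the abstract refers to.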