xLiD-Lexica: Cross-lingual Linked Data Lexica

Overview: In this work, we construct cross-lingual linked data lexica, called xLiD-Lexica, by exploiting the multilingual Wikipedia and linked data sources, especially DBpedia. First, we provide the reference association between entities and labels, where labels are phrases that can be used to refer to entities. The reference association of a label-entity pair captures the extent to which the label refers to the entity, that is, how likely the entity is the intended sense of the label. In addition, we provide the co-occurrence association between entities and labels, derived from how frequently a label co-occurs with an entity in its immediate context. Beyond labels, the Wikipedia editions in different languages contain many more words, which can be an important resource for many tasks. Therefore, we also derive the co-occurrence association between entities and words. To derive such associations between entities and words / labels across languages, we employ the cross-language links that connect Wikipedia articles describing equivalent entities.
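
As a rough illustration of the reference association, the following Python sketch estimates how strongly a label refers to each candidate entity as a relative frequency over label-entity counts. This is only one plausible estimator, not necessarily the measure used in xLiD-Lexica; the input format, the toy counts, the placeholder entity "Other_Entity" and the function name reference_association are all hypothetical. The measures actually provided are defined in the LREC 2014 paper listed under Publications.

```python
from collections import defaultdict


def reference_association(counts):
    """Turn raw (label, entity) counts into per-label relative frequencies.

    counts: dict mapping (label, entity) -> number of times the label
            was used to refer to the entity (hypothetical input format).
    Returns a dict mapping label -> {entity: share of the label's uses}.
    """
    # Total number of uses of each label, over all entities.
    totals = defaultdict(int)
    for (label, _entity), n in counts.items():
        totals[label] += n

    # Share of each label's uses that point to a given entity.
    scores = defaultdict(dict)
    for (label, entity), n in counts.items():
        if totals[label] > 0:
            scores[label][entity] = n / totals[label]
    return dict(scores)


# Toy counts, invented purely for illustration.
toy_counts = {
    ("mercedes", "Mercedes-Benz"): 8,
    ("mercedes", "Other_Entity"): 2,
    ("benz", "Mercedes-Benz"): 5,
}
scores = reference_association(toy_counts)
```

With these toy counts, the scores for each label sum to one, so an ambiguous label such as "mercedes" spreads its weight over several entities while an unambiguous one assigns all of it to a single entity.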



User Interface:

http://km.aifb.kit.edu/services/xlid-lexica-ui/

Query Examples:

Find the associations between English labels / words and "Mercedes-Benz"



Find the associations between German labels / words and "Mercedes-Benz"



Find the associations between Chinese labels / words and "Mercedes-Benz"




SPARQL Endpoint:

http://km.aifb.kit.edu/services/xlid-lexica-sparql/

Query Examples:

Find all entities with a surface form that contains "iPhone"
SELECT ?resource ?label ?probability
FROM <http://www.xlid-lexica.org> WHERE {
  ?resource <http://www.xlid-lexica.org/block> ?b1 .
  ?b1 <http://www.xlid-lexica.org/res#sf> ?sf .
  ?b1 <http://www.xlid-lexica.org/res#priorProbability> ?probability .
  ?sf <http://www.xlid-lexica.org/block> ?b2 .
  ?b2 <http://www.xlid-lexica.org/sf#label> ?label .
  ?label bif:contains "iPhone" .
}
ORDER BY DESC(?probability) LIMIT 100

Retrieve the top 100 resources for the given surface form "iphone"
SELECT ?resource ?probability
FROM <http://www.xlid-lexica.org> WHERE {
  ?resource <http://www.xlid-lexica.org/block> ?b1 .
  ?b1 <http://www.xlid-lexica.org/res#sf> ?sf .
  ?b1 <http://www.xlid-lexica.org/res#priorProbability> ?probability .
  ?sf <http://www.xlid-lexica.org/block> ?b2 .
  ?b2 <http://www.xlid-lexica.org/sf#label> "iphone"@en .
}
ORDER BY DESC(?probability) LIMIT 100
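
The surface-form lookup can also be sent to the endpoint from a script. The sketch below uses only the Python standard library and assumes the endpoint follows the standard SPARQL protocol (a `query` parameter on an HTTP GET, with JSON results requested through the `Accept` header); whether the service honours this has not been verified here. The helper names build_query, build_url and run_query are illustrative and not part of xLiD-Lexica.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://km.aifb.kit.edu/services/xlid-lexica-sparql/"


def build_query(surface_form, lang="en", limit=100):
    """Build the 'top resources for a surface form' query as a string."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    literal = surface_form.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?resource ?probability\n"
        "FROM <http://www.xlid-lexica.org> WHERE {\n"
        "  ?resource <http://www.xlid-lexica.org/block> ?b1 .\n"
        "  ?b1 <http://www.xlid-lexica.org/res#sf> ?sf .\n"
        "  ?b1 <http://www.xlid-lexica.org/res#priorProbability> ?probability .\n"
        "  ?sf <http://www.xlid-lexica.org/block> ?b2 .\n"
        f'  ?b2 <http://www.xlid-lexica.org/sf#label> "{literal}"@{lang} .\n'
        "}\n"
        f"ORDER BY DESC(?probability) LIMIT {int(limit)}"
    )


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return (resource, probability) pairs.

    Assumes the endpoint answers in the SPARQL JSON results format.
    """
    request = urllib.request.Request(
        build_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return [
        (row["resource"]["value"], float(row["probability"]["value"]))
        for row in data["results"]["bindings"]
    ]
```

Calling run_query(build_query("iphone")) would then issue the request. Building the query is kept separate from the network call so that the generated SPARQL can be inspected or pasted into the endpoint's web form even when the service is unreachable.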


Datasets:

http://people.aifb.kit.edu/lzh/xlid-lexica-datasets/


Publications:

  • xLiD-Lexica: Cross-lingual Linked Data Lexica
    Lei Zhang, Michael Färber, Achim Rettinger
    LREC 2014: 2101-2105
    (If you use the above datasets in scientific works, please cite this paper)
  • Bridging the Gap between Cross-lingual NLP and DBpedia by Exploiting Wikipedia
    Lei Zhang, Achim Rettinger, Steffen Thoma
    NLP&DBpedia 2014




(c) 2015-2016 Lei Zhang, Institute AIFB, KIT