X-LiSA: Cross-lingual Semantic Annotation


In recent years, large repositories of structured knowledge, such as Wikipedia and Linked Open Data (LOD) sources including DBpedia, Freebase and YAGO, have become valuable resources for natural language processing, especially for the automatic aggregation of knowledge from textual data. One essential component that leverages such knowledge bases (KBs) is semantic annotation: the linking of words or phrases in text documents with elements from the KBs. At the same time, if speakers of different languages are to have access to the same information, there is a pressing need for systems that help overcome language barriers by facilitating multilingual and cross-lingual access to information originally produced for a different culture and language. This poses new challenges to semantic annotation tools, which are typically language-dependent and link documents in one language to a KB grounded in the same language. Ultimately, the goal is to construct cross-lingual semantic annotation tools that can link words or phrases in unstructured text in one language to resources in structured KBs in any other language, or to language-independent representations.


On one side, we have a knowledge base KB containing a set of entities, each of which has a description in language L, together with the relations between these entities. On the other side, we have a document in language L' containing a set of name mentions. The task of cross-lingual semantic annotation is to link these name mentions in documents in language L' to their referent entities in the KB in language L.
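The task definition above can be made concrete with a small data model. The following is only a minimal sketch and not X-LiSA's actual method: the class names, the `labels` dictionary of cross-lingual names, and the exact string matching that stands in for real mention detection and disambiguation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    uri: str                 # identifier of the entity in the KB
    description: str         # description written in the KB language L
    labels: Dict[str, str]   # hypothetical: language code -> name in that language


@dataclass
class Mention:
    text: str    # surface form as it appears in the document
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention


def annotate(document: str, doc_lang: str,
             kb: List[Entity]) -> List[Tuple[Mention, Entity]]:
    """Link mentions in a document in language L' to entities of a KB in language L.

    Toy stand-in: exact matching of each entity's label in the document language.
    """
    links = []
    for entity in kb:
        label = entity.labels.get(doc_lang)
        if not label:
            continue  # no known name for this entity in the document language
        pos = document.find(label)
        while pos != -1:
            links.append((Mention(label, pos, pos + len(label)), entity))
            pos = document.find(label, pos + len(label))
    # Return the links in document order.
    return sorted(links, key=lambda pair: pair[0].start)
```

For instance, with an English KB entry for `http://dbpedia.org/resource/Munich` whose assumed German label is "München", calling `annotate("Angela Merkel besuchte München.", "de", kb)` would link the German mention to the English resource.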


- http://km.aifb.kit.edu/services/text-annotation/
Input Parameters:
- source: the URL of a web page or raw text
- model: the NLP model used for mention detection (NER, POS or NGRAM)
- lang1: the language of the input source
- lang2: the language of the output knowledge base resources
- kb: the knowledge base used for annotation (DBpedia or Wikipedia)
Output:
- XML output containing the augmented text with links to a list of relevant resources in the knowledge base specified in the kb parameter
- The web page at the input URL with annotations inserted, based on the resources in the knowledge base specified in the kb parameter
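As an illustration of how the input parameters fit together, the sketch below assembles a request URL for the service. It rests on assumptions not stated on this page: that the parameters are passed as a GET query string, and that values are spelled as shown (`NER`, `en`, `zh`, `wikipedia`). The helper name `build_request_url` is hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

# Service address as given on this page.
ENDPOINT = "http://km.aifb.kit.edu/services/text-annotation/"


def build_request_url(source: str, model: str, lang1: str, lang2: str, kb: str) -> str:
    """Build a query URL from the five documented input parameters.

    Assumption: the service accepts them as GET query parameters.
    """
    params = {
        "source": source,  # URL of a web page or raw text
        "model": model,    # mention detection model (assumed spelling, e.g. "NER")
        "lang1": lang1,    # language of the input source (assumed code, e.g. "en")
        "lang2": lang2,    # language of the KB resources (assumed code, e.g. "zh")
        "kb": kb,          # knowledge base (assumed spelling, e.g. "wikipedia")
    }
    # urlencode percent-encodes the values, so a page URL can be passed safely.
    return ENDPOINT + "?" + urlencode(params)


# An English news page annotated with named entities from Chinese Wikipedia.
url = build_request_url("http://www.cnn.com/", "NER", "en", "zh", "wikipedia")
print(url)
```

Raw text can be passed as `source` in the same way, since `urlencode` also escapes spaces and non-ASCII characters.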


- Annotate a “Tagesschau” news page in German with both named and nominal entities based on English DBpedia
- Annotate a “CNN” news page in English with only named entities based on Chinese Wikipedia 
- Annotate raw text in Chinese using English DBpedia resources


(c) 2015-2016 Lei Zhang, Institute AIFB, KIT