In recent years, large repositories of structured knowledge, such as Wikipedia and Linked Open Data (LOD) sources including DBpedia, Freebase and YAGO, have become valuable resources for natural language processing, especially for the automatic aggregation of knowledge from textual data. One essential component that leverages such knowledge bases (KBs) is the linking of words or phrases in text documents to elements of the KBs, which we call semantic annotation. At the same time, so that speakers of different languages can access the same information, there is a pressing need for systems that help overcome language barriers by facilitating multilingual and cross-lingual access to information originally produced for a different culture and language. This poses new challenges for semantic annotation tools, which are typically language dependent and link documents in one language to a KB grounded in the same language. Ultimately, the goal is to build cross-lingual semantic annotation tools that can link words or phrases in unstructured text in one language to resources in structured KBs in any other language, or to language-independent representations.
On one side, we have a knowledge base KB containing a set of entities, each with a description in language L, together with the relations between these entities. On the other side, we have a document containing a set of name mentions in language L'. The task of cross-lingual semantic annotation is to link (annotate) the name mentions in documents in language L' with their referent entities in the KB in language L.
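To make the task definition concrete, the following is a minimal sketch of its inputs and output: KB entities described in language L, mentions found in a document in language L', and a mapping from each mention to an entity (or to nothing). It is not the service's actual method. The `translate` function, the toy dictionary and the naive "take the first candidate" disambiguation are hypothetical stand-ins for the cross-lingual candidate generation and ranking a real system would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    uri: str          # identifier of the entity in the KB
    label: str        # entity name in the KB language L
    description: str  # entity description in language L


@dataclass(frozen=True)
class Mention:
    text: str   # surface form as written in the document (language L')
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def link_mentions(mentions, kb, translate):
    """Map each L' mention to a KB entity in L, or None if nothing matches.

    `translate` (hypothetical) turns an L' surface form into candidate
    names in L; the first matching entity is taken as the referent.
    """
    by_label = {}
    for entity in kb:
        by_label.setdefault(entity.label.lower(), []).append(entity)

    links = {}
    for mention in mentions:
        candidates = []
        for name in translate(mention.text):
            candidates.extend(by_label.get(name.lower(), []))
        links[mention] = candidates[0] if candidates else None
    return links


# Toy example (hypothetical data): an English KB (L) and a German document (L').
kb = [
    Entity("http://dbpedia.org/resource/Germany", "Germany", "Country in Central Europe"),
    Entity("http://dbpedia.org/resource/Berlin", "Berlin", "Capital of Germany"),
]
doc = "Berlin ist die Hauptstadt von Deutschland und Europa."
mentions = [
    Mention(s, doc.find(s), doc.find(s) + len(s))
    for s in ("Berlin", "Deutschland", "Europa")
]
toy_dictionary = {"deutschland": ["Germany"], "berlin": ["Berlin"]}


def translate(surface_form):
    # Fall back to the untranslated form when the toy dictionary has no entry.
    return toy_dictionary.get(surface_form.lower(), [surface_form])


links = link_mentions(mentions, kb, translate)
for mention, entity in links.items():
    print(mention.text, "->", entity.uri if entity else None)
```

"Europa" stays unlinked here because the toy KB has no matching entity; a real system would also need to decide when a mention has no referent in the KB at all.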
We annotated sample data extracted from online news feeds and social media using our service. Because the annotated data are modeled in RDF, complex questions about them can be answered with SPARQL queries (see the example queries).
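Once annotations are stored as RDF triples, a SPARQL SELECT query is answered by matching its basic graph pattern against those triples and collecting the variable bindings. The sketch below imitates only that core step in plain Python, as an illustration rather than a description of the service. The `ann:` vocabulary, the document IDs and the triples are hypothetical. A real deployment would use an RDF store with a full SPARQL engine, which also supports OPTIONAL, FILTER, aggregation and more.

```python
# Hypothetical annotation triples (subject, predicate, object).
triples = [
    ("doc:1", "ann:mentions", "dbr:Berlin"),
    ("doc:1", "ann:language", "de"),
    ("doc:2", "ann:mentions", "dbr:Berlin"),
    ("doc:2", "ann:language", "en"),
    ("doc:3", "ann:mentions", "dbr:Germany"),
    ("doc:3", "ann:language", "es"),
]


def is_var(term):
    # SPARQL-style variables start with "?".
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(patterns, triples):
    """Evaluate a basic graph pattern (a conjunction of triple patterns)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?doc ?lang WHERE { ?doc ann:mentions dbr:Berlin . ?doc ann:language ?lang }
results = query(
    [("?doc", "ann:mentions", "dbr:Berlin"), ("?doc", "ann:language", "?lang")],
    triples,
)
print(results)
```

The shared variable `?doc` is what joins the two patterns. This join is what lets a query combine annotations (which entity a document mentions) with document metadata (its language) across sources written in different languages.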