Automatic text understanding has been an unsolved research problem for many years. This is partly due to the dynamic and diverging nature of human languages, which gives rise to many different varieties of natural language. These variations range from the individual level, through regional and social dialects, up to seemingly separate languages and language families.
However, in recent years there have been considerable achievements in data-driven approaches to computational linguistics that exploit the redundancy in the encoded information and in the structures used. Most of those approaches are not language-specific, and some can even exploit redundancies across languages.
This progress in cross-lingual technologies is largely due to the increased availability of multilingual data in the form of static repositories or streams of documents. In addition, parallel and comparable corpora like Wikipedia are easily available and constantly updated. Finally, cross-lingual knowledge bases like DBpedia can be used as an interlingua to connect structured information across languages. This helps to scale traditionally monolingual tasks, such as information retrieval and intelligent information access, to multilingual and cross-lingual applications.
From the application side, there is a clear need for such cross-lingual technology and services. Systems available on the market typically focus on multilingual tasks, such as machine translation, and do not deal with cross-linguality. A good example is Google News, one of the most popular news aggregators, which collects news separately for each individual language. The ability to cross the border of a particular language would help many users to consume the full breadth of news reporting by joining information in their mother tongue with information from the rest of the world.
Invited speakers will each give a 30-minute talk followed by 10 minutes of questions. Each paper will get a 10-minute talk followed by 5 minutes of questions. All accepted papers will also be presented as posters (we recommend DIN A1 portrait format). The rest of the time will be allocated to poster sessions and discussions.
INVITED TALKS:
‣ Bill Dolan - Natural Language Processing Group - Microsoft Research - USA
‣ Ivan Titov - Machine Learning for Natural Language Processing - Saarland University - Germany
‣ Ryan McDonald - Google Research - USA
‣ Abe Hsuan - Irwin & Hsuan LLP - USA
SCHEDULE (preliminary):
07:30 - 09:05
‣ Invited talk: Bill Dolan. Modeling Multilingual Grounded Language.
‣ Paper presentation: Ivan Vulić, Wim De Smet, Jie Tang and Marie-Francine Moens. Probabilistic Topic Modeling in Multilingual Settings: A Short Overview of Its Methodology and Applications
‣ Paper presentation: Alistair Kennedy and Graeme Hirst. Measuring Semantic Relatedness Across Languages
09:05 - 09:30
‣ Coffee Break
09:30 - 10:30
‣ Paper presentation: Philipp Petrenz and Bonnie Webber. Label Propagation for Fine-Grained Cross-Lingual Genre Classification
‣ Paper presentation: Yuhong Guo and Min Xiao. Cross Language Text Classification via Multi-view Subspace Learning
‣ Paper presentation: Andrej Muhic, Jan Rupnik and Primoz Skraba. Cross-Lingual Document Retrieval through Hub Languages
‣ Paper presentation: Mikhail Kozhevnikov and Ivan Titov. Cross-lingual Bootstrapping for Semantic Role Labeling
‣ Poster spotlight: Rishabh Mehrotra, Dat Chu, Syed Aqueel Haider and Ioannis Kakadiaris. Towards Learning Coupled Representations for Cross Lingual Information Retrieval
‣ Group discussions and poster session I (open end)
10:30 - 15:30
‣ Break
15:30 - 17:05
‣ Invited talk: Ryan McDonald. Advances in Cross-Lingual Syntactic Transfer.
‣ Invited talk: Ivan Titov. Inducing Cross-Lingual Semantic Representations of Words, Phrases, Sentences and Events.
‣ Paper presentation: Stephan Gouws, Gj van Rooyen and Yoshua Bengio. Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
17:05 - 17:30
‣ Coffee Break
17:30 - 18:30
‣ Invited talk: Abe Hsuan. "Et tu, ©ompute?" Or, Is Text Translation by Machines an Act of Breach or Revolution?
‣ Group discussions and poster session II (open end)
The workshop on cross-lingual technologies (xLiTe) offers a platform for discussing algorithms and applications for statistical analysis of language resources covering many languages.
The xLiTe workshop is aimed at techniques that strive for flexibility, making them applicable across languages and language varieties with less manual effort and less manually labeled training data. Such approaches might also be beneficial for the pressing task of analyzing continuously evolving natural language varieties that are not well formed. Such data typically originates from social media, such as text messages, forum posts or tweets, and is often highly domain-dependent.
Ideal contributions cover one or more of the topics listed below:
‣ Unsupervised and weakly supervised learning methods for cross-lingual technologies
‣ Cross-lingual technologies beyond statistical machine translation
‣ Cross-lingual representations of linguistic structure
And cover cross-lingual tasks, such as:
‣ Information diffusion across languages
‣ Cross-lingual document linking and comparison
‣ Cross-lingual topic modeling
‣ Cross-lingual information extraction
‣ Cross-lingual semantic distances
‣ Cross-lingual semantic parsing
‣ Cross-lingual disambiguation
‣ Cross-lingual semantic annotation
‣ Cross-lingual language resources and knowledge bases
SUBMISSION INSTRUCTIONS:
We suggest keeping papers under 4 pages (NOT including references). For projects that require more room for descriptions, we encourage the authors to include details of the work as an appendix and/or other supplementary materials. Please use the NIPS style files and formatting instructions. Submissions should include the authors' names and affiliations, since the review process will not be double-blind (use \nipsfinalcopy). Topics that were recently published or presented elsewhere are allowed, provided that the extended abstract mentions this explicitly; topics that were presented at non-machine-learning conferences are especially encouraged. Accepted submissions will be presented either as contributed talks or as posters.
There will be TWO SUBMISSION DEADLINES:
‣ The first round of reviews will be for papers submitted before September 16, 2012 11:59PM PST. Those papers will receive proper reviews (early notification on Oct 7, 2012) and, in case of a weak reject, the chance to resubmit a revised version before the late submission deadline. This early notification might also be helpful for people who require a US visa.
‣ Papers submitted late (before Oct 21, 2012 11:59PM PST) will only get a quick review and a late notification on Oct 28, 2012.
In both cases, use the EasyChair submission site to upload your paper. We will pursue a journal special issue on the topics of the workshop if we receive an appropriate number of high-quality submissions.
ORGANIZERS:
‣ Achim Rettinger, AIFB, Karlsruhe Institute of Technology, Germany
‣ Xavier Carreras, Technical University of Catalunya, Spain
‣ Marko Grobelnik, Jozef Stefan Institute, Slovenia
‣ Juanzi Li, Tsinghua University, China
‣ Blaz Fortuna, Jozef Stefan Institute, Slovenia
PROGRAM COMMITTEE:
‣ Philipp Cimiano, Semantic Computing Group, University of Bielefeld
‣ David Mimno, Department of Computer Science, Princeton University, USA