Semantic Search Workshop

Co-located with the 19th International World Wide Web Conference (WWW2010)
April 26, 2010 (Workshop Day), Raleigh, NC, USA

News

2010-05-20: All talks held at SemSearch2010 can be watched at http://videolectures.net/www2010_raleigh/
2010-04-24: Results for the Evaluation Track are available and uploaded to the webpage!
2010-04-24: Camera-ready versions of the papers are available and uploaded to the webpage!
2010-04-24: We will have a best paper award!
2010-04-22: Proceedings of the workshop will be published as part of the ACM International Conference Proceedings Series!
2010-04-19: Updated workshop program!
2010-04-08: Queries for the Evaluation Track are available! Please submit your results now!
2010-04-08: Submission deadline for entity search results extended to April 17th!

Important Dates

Deadline for standard paper submissions:
March 6th, 2010 (12:00 AM, GMT)

Notification of acceptance for standard papers:
April 8th, 2010

Camera-ready versions of standard papers:
April 22nd, 2010

Optional deadline for Entity Search system description submissions:
April 17th, 2010 (12:00 AM, GMT)

Deadline for Entity Search Evaluation results:
April 17th, 2010 (12:00 AM, GMT)

Notification of acceptance for Entity Search system papers:
April 22nd, 2010

WWW'10 Conference:
April 26th-30th, 2010

Workshop Day: April 26th, 2010

Workshop Support


SEALS
The SEALS initiative is sponsoring and supporting the evaluation of search tools. The goal of the SEALS project is to provide an independent, open, scalable, extensible and sustainable infrastructure (the SEALS Platform) that allows the remote evaluation of semantic technologies, thereby providing an objective comparison of existing semantic technologies. This allows researchers and users to compare the available technologies effectively, helping them to select appropriate technologies and advancing the state of the art through continuous evaluation.

Proceedings published by

ACM

Objectives

In recent years we have witnessed tremendous interest in, and substantial economic exploitation of, search technologies at both web and enterprise scale. However, existing search appliances still represent user queries and resource content almost exclusively through simple syntax-based descriptions of the content and the information need, as in the predominant keyword-centric paradigm (i.e., keyword queries matched against bag-of-words document representations).

On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge in a formal manner at a high level of expressivity. At the same time, semantic repositories and reasoning engines have only now advanced to a state where querying and processing of this knowledge can scale to realistic IR scenarios.

In parallel to these developments, in the past years we have also seen the emergence of important results in adapting ideas from IR to the problem of search in RDF/OWL data, folksonomies, microformat collections or semantically tagged natural text. Common to these scenarios is that the search is focused not on a document collection, but on metadata (which may be linked to or embedded in textual information). Search and ranking in metadata stores is another key topic addressed by the workshop.

As such, semantic technologies are now in a position to make significant contributions to IR problems.

Challenges

In this context, challenges for Semantic Search research will include, among others:
  • How can semantic technologies be applied to IR problems?
  • How can the scalability and effectiveness of data Web search be addressed (e.g., by applying IR technologies)?
  • How can web users exploit the expressiveness of semantic data on the Web? That is, how can the technical barriers be lowered so that users can ask complex questions and interact with web data to obtain concrete answers to complex needs?
  • And most importantly, how can this new generation of search systems, which exploit semantics for IR or for data Web search, be evaluated and compared (with standard IR systems or semantic repositories)?

Program

Each presentation is 20 minutes + 5 minutes for questions.

09:00 - 10:30 Session 1

09:05 Invited Talk: Why users need semantic search by Barney Pell (Bing)
Abstract: While users' dependence on search continues to increase, user satisfaction is not improving. This is partly because search is hard, and partly because users are becoming more demanding and pushing search beyond the traditional scope of information retrieval. Our research reveals three key problems of search: imprecise results, the need for query refinement, and the need to support complex tasks and decisions. Semantic technologies can help address these problems by improving core search results and by enabling richer user experiences such as faceted navigation, entity-centered experiences, and task completion and decision tools.

10:05 Paraphrasing Invariance Coefficient: Measuring Para-Query Invariance of Search Engines
Tomasz Imielinski and Jinyun Yan

10:30-11:00 Coffee Break

11:00 - 12:15 Session 2

Using BM25F for Semantic Search
José R. Pérez-Agüera, Javier Arroyo, Jane Greenberg, Joaquin Perez-Iglesias and Victor Fresno

Distributed Indexing for Semantic Search
Peter Mika

Dear Search Engine: What’s your opinion about...? - Sentiment Analysis for Semantic Enrichment of Web Search Results
Gianluca Demartini and Stefan Siersdorfer

12:15-14:00 Lunch

14:00-15:40 Session 3

Automatic Modeling of User's Real World Activities from the Web for Semantic IR
Yusuke Fukazawa and Jun Ota

The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams
Claudia Wagner and Markus Strohmaier

A Large-Scale System for Annotating and Querying Quotations in News Feeds
Jisheng Liang, Navdeep Dhillon and Krzysztof Koperski

Semantically Enabled Exploratory Video Search
Joerg Waitelonis, Harald Sack, Zalan Kramer and Johannes Hercher

15:40-16:00 Coffee Break

16:00-17:30 Session 4

Entity Search: Building Bridges between Two Worlds
Krisztian Balog, Edgar Meij and Maarten de Rijke

Methodology and Campaign Design for the Evaluation of Semantic Search Tools
Stuart Wrigley, Dorothee Reinhard, Khadija Elbedweihy, Abraham Bernstein and Fabio Ciravegna

Discussion on the Entity Search Track (40')

Topics of Interest

Semantic Search is defined along two main directions. The first is Semantic-driven IR, the application of semantic technologies to the IR problem. The second is Semantic Data Search, which deals mainly with the retrieval of semantic data. The main topics of interest for workshop contributions include (but are not limited to) the following:

Semantic-driven IR
  • Expressive Document Models
  • Knowledge Extraction for Building Expressive Document Representation
  • Matching and Ranking based on Expressive Document Representation
  • Infrastructure for Semantic-driven IR
Semantic Data Search
  • Crawling, Storage and Indexing of Semantic Data
  • Semantic Data Search and Ranking
  • Data Web Search: Search in Multi-Data-Source, Multi-Repository Scenarios
  • Dealing with Vague, Incomplete and Dirty Semantic Data
  • Infrastructure for Searching Semantic Data on the Web
Interaction Paradigms for Semantic Search
  • Natural Language Interfaces
  • Keyword-based Query Interfaces
  • Hybrid Query Interfaces (A Combination of NL, Keywords, Forms, Facets, and Formal Queries)
  • Visualization of Semantic Data and Expressive Document Representation on the Web
Evaluation of Semantic Search
  • Evaluation Methodologies for Semantic Search
  • Standard Datasets and Benchmarks for Semantic Search
  • Infrastructure for Semantic Search Evaluation

Evaluation for Entity Search Track

Our ultimate goal is to develop a benchmark based on which semantic search systems can be compared and analysed in a systematic fashion. Clearly, semantics can be used for different tasks (document vs. data retrieval) and can be exploited throughout the search process (for more usable query construction, for better matching and ranking, for richer results presentation, etc.). Hence, such a benchmark shall enable the study of different aspects of semantic search systems.

For this workshop, we will initially focus on the aspects of matching and ranking in the semantic data search scenario. In particular, we aim to analyze the effectiveness, efficiency and robustness of those features of semantic search systems that are ready to be applied to the Web today: a large share of Web search queries issued today are about entities, i.e. they are entity search queries, and there is a large and increasing amount of semantic data about entities on the Web. The research questions we aim to tackle are:

  • How well do semantic data search engines perform on the task of Entity Search on the Web?
  • What are the underlying concepts and techniques that account for the differences?

To answer these questions, we provide the following guidelines and support for evaluating entity search systems:

Queries: We provide a set of queries focused on the task of entity search. These queries are a sample extracted from the Yahoo Web search query log. Every query is a plain list of keywords that refers to one particular entity; in other words, each query asks for one particular entity (as opposed to a set of entities).

Some sample queries can be downloaded from this link: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/samplequeries.
Access to the evaluation set of queries and thus participation in the evaluation requires the signing of a license agreement: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/agreement.pdf.

The FINAL QUERIES for evaluation are now available at http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalqueries.

Data: We provide a corpus of datasets containing entity descriptions in the form of RDF. They represent a sample of Web data crawled from publicly available sources. For this evaluation, we use the Billion Triple Challenge 2009 dataset. Further information and detailed statistics can be found here: http://vmlion25.deri.ie/

The original Billion Triple Challenge 2009 dataset contains blank nodes. We will not deal with blank nodes in this evaluation and thus require participants to encode them according to the following rule: each blank node id BNID maps to http://example.org/URLEncode(BNID). Since the blank node ids in the dataset are unique, this convention suffices to map blank nodes to distinct URIs. Instead of encoding the blank nodes themselves, participants can also download the following version of the Billion Triple Challenge 2009 dataset in which blank nodes have already been converted to URIs:
http://km.aifb.uni-karlsruhe.de/ws/dataset_semsearch2010/000-CONTENTS
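
For illustration, here is a minimal Python sketch of this encoding rule. It is our own illustration: the function name is hypothetical, and we assume the blank node id is passed without the leading "_:" prefix; the http://example.org/ prefix and the URL-encoding come from the rule stated above.

    from urllib.parse import quote

    def bnode_to_uri(bnid: str) -> str:
        # Map a blank node id to a distinct URI per the convention above:
        # http://example.org/URLEncode(BNID). quote() with safe=""
        # percent-encodes every reserved character in the id.
        return "http://example.org/" + quote(bnid, safe="")

    # e.g. bnode_to_uri("node123") -> "http://example.org/node123"

Because the blank node ids are unique and percent-encoding maps distinct ids to distinct strings, the resulting URIs remain distinct.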

Relevance Judgement: The search systems produce lists of at most 10 entities ordered by relevance. These results have to be drawn from data in the corpus. Results will be evaluated on a three-point scale: (0) Not Relevant, (1) Relevant and (2) Perfect Match. A perfect match is a description of a resource that matches the entity to be retrieved by the query. A relevant result is a resource description that is related to the entity, i.e. the entity is contained in the description of that resource. Otherwise, a resource description is not relevant.

In the current evaluation we assess only individual results, as they are found in the original data set. We do not assess the potential of semantic search systems for disambiguating and merging resources. In other words, only resources appearing in the original data set may be returned as results.

Evaluation Process: To participate, each system has to run the provided queries on the corpus. The results have to be submitted in one file following the TREC format:
http://www.ir.iit.edu/~dagr/cs529/files/project_files/trec_eval_desc.htm
If needed, participants may submit results for up to three runs. The design decisions and rationales for these configurations should be explained in the system description. Please verify that your result file can be read with the TREC evaluation tool available at: http://trec.nist.gov/trec_eval/index.html
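
For convenience, the sketch below shows one way to write results in the six-column TREC run format (query id, the literal "Q0", result id, rank, score, run tag), assuming Python; the function name and the query id/URI in the usage comment are hypothetical placeholders.

    def write_trec_run(results, run_tag, path):
        # results: {query_id: [(uri, score), ...]}, each list already
        # sorted by descending score and truncated to at most 10 entries.
        with open(path, "w") as out:
            for qid, ranked in results.items():
                for rank, (uri, score) in enumerate(ranked, start=1):
                    out.write(f"{qid} Q0 {uri} {rank} {score:.4f} {run_tag}\n")

    # Hypothetical usage:
    # write_trec_run({"1": [("http://example.org/node123", 13.7)]},
    #                "myRun1", "run1.txt")

Before submitting, check that the resulting file is accepted by the trec_eval tool linked above.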

The assessment of the results will be performed manually using Amazon Mechanical Turk. Based on the relevance judgments, recall, precision, F-measure and mean average precision will be computed and used as the basis for comparing the search systems' performance. With the participants' permission, the results of the assessment and the evaluation feedback will be made publicly available on the workshop's website.
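
To make the reported measures concrete, the following Python sketch computes precision, recall, F-measure and average precision for a single query. It is our own illustration and assumes the graded judgments (0/1/2) are binarized by treating any grade of 1 or higher as relevant; trec_eval remains the authoritative implementation.

    def prf_ap(ranked, judgments):
        # ranked: result ids in rank order (at most 10 per query).
        # judgments: {result_id: grade} with grades 0, 1 or 2.
        relevant_total = sum(1 for g in judgments.values() if g >= 1)
        hits, precision_sum = 0, 0.0
        for rank, doc in enumerate(ranked, start=1):
            if judgments.get(doc, 0) >= 1:
                hits += 1
                precision_sum += hits / rank  # precision at this rank
        p = hits / len(ranked) if ranked else 0.0
        r = hits / relevant_total if relevant_total else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ap = precision_sum / relevant_total if relevant_total else 0.0
        return p, r, f, ap

    # Mean average precision (MAP) is the mean of ap over all queries.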

The FINAL RESULTS for evaluation are now available:
  • PDF: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalresultsPDF
  • Excel: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalresultsEXCEL
  • Assessments: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/assess
More information can be found in the participants' system descriptions.

The SEALS initiative is sponsoring the award for the best paper concerning the evaluation of search tools (see Workshop Support above for a description of the SEALS project).

This evaluation campaign is discussed in the paper by Halpin et al., "Evaluating Ad-Hoc Object Retrieval", presented at IWEST 2010.



Organizers

Program Committee

  • Bettina Berendt, Katholieke Universiteit Leuven, Belgium
  • Paul Buitelaar, DFKI Saarbruecken, Germany
  • Wray Buntine, NICTA Canberra, Australia
  • Pablo Castells, Universidad Autónoma de Madrid, Spain
  • Gong Cheng, Southeast University, Nanjing, China
  • Mathieu d'Aquin, KMi, Open University, England
  • Miriam Fernandez, KMI, Open University, England
  • Blaz Fortuna, Jožef Stefan Institute, Slovenia
  • Norbert Fuhr, Universitaet Duisburg-Essen, Germany
  • Lise Getoor, University of Maryland, USA
  • Rayid Ghani, Accenture Labs, USA
  • Peter Haase, Fluid Operations, Walldorf, Germany
  • Harry Halpin, University of Edinburgh, Scotland
  • Andreas Harth, Institute AIFB, Karlsruhe Institute of Technology, Germany
  • Michiel Hildebrand, Centre for Mathematics and Computer Science Amsterdam, Netherlands
  • Wei Jin, North Dakota State University, USA
  • Guenter Ladwig, Institute AIFB, Karlsruhe Institute of Technology, Germany
  • Yuzhong Qu, Nanjing University, Nanjing, China
  • Sergej Sizov, University of Koblenz-Landau, Germany
  • Kavitha Srinivas, IBM Research, Hawthorne, USA
  • Nenad Stojanovic, FZI Karlsruhe, Germany
  • Rudi Studer, Institute AIFB, University of Karlsruhe, Germany
  • Cao Hoang Tru, HCMC University of Technology, HCMC, Vietnam
  • Giovanni Tummarello, DERI, Galway, Ireland
  • Yong Yu, Apex Lab, Shanghai Jiao Tong University, China
  • Valentin Zacharias, FZI, Germany
  • Ilya Zaihrayeu, University of Trento, Italy
  • Hugo Zaragoza, Yahoo! Research Barcelona, Spain
  • Lei Zhang, IBM Research, China

Submission and Proceedings

For submissions, the following rules apply:

  • Full technical papers (March 6th): up to 10 pages in ACM format
  • Short position or demo papers (March 6th): up to 5 pages in ACM format
For the Entity Search Track at SemSearch, participants
  • can submit a short system description paper (April 17th): up to 5 pages in ACM format
  • have to submit evaluation results (April 17th): results in TREC format (up to 3 runs)
Submitting a short system description is optional. Participants can register for the workshop and ask for a presentation slot without having submitted a system description paper.
Submissions must be formatted using the WWW2010 templates available here. Submissions will be peer reviewed by three independent reviewers. Accepted papers will be presented at the workshop and included in the workshop proceedings. We will pursue a journal special issue on the topics of the workshop if we receive an appropriate number of high-quality submissions. Details on the proceedings and camera-ready formatting will be announced upon notification of the authors. Please submit your paper via the EasyChair Submission System for SemSearch10.
For standard papers and system descriptions, the system accepts PDF. Evaluation results should be uploaded as ZIP or TGZ.

Contact

For news and discussions related to SemSearch and its evaluation, please register at http://tech.groups.yahoo.com/group/semsearcheval/. The organizing committee can be reached via the contact data available on their web pages (or at semsearch10@easychair.org).