News
2010-05-20: All talks held at SemSearch2010 can be watched at http://videolectures.net/www2010_raleigh/
2010-04-24: Results for the Evaluation Track are available and uploaded to the webpage!
2010-04-24: Camera-ready versions of the papers are available and uploaded to the webpage!
2010-04-24: We will have a best paper award!
2010-04-08: Queries for the Evaluation Track are available! Please submit your results now!
2010-04-08: Submission deadline for entity search results extended to April 17th!
2010-04-19: Updated workshop program!
2010-04-22: Proceedings of the workshop will be published as part of the ACM International Conference Proceedings Series!
Important Dates
Deadline for standard paper submissions:
March 6th, 2010 (12.00 AM, GMT)
Notification of acceptance standard papers:
April 8th, 2010
Camera-ready versions of standard papers:
April 22nd, 2010
Optional deadline for Entity Search system description submissions:
April 17th, 2010 (12.00 AM, GMT)
Deadline for Entity Search Evaluation results:
April 17th, 2010 (12.00 AM, GMT)
Notification of acceptance for Entity Search system papers:
April 22nd, 2010
WWW'10 Conference:
April 26th-30th, 2010
Workshop Day: April 26th, 2010
Workshop Support
The SEALS initiative is sponsoring and supporting the evaluation of search tools. The goal of the SEALS project is to provide an independent, open, scalable, extensible and sustainable infrastructure (the SEALS Platform) that allows the remote evaluation of semantic technologies, thereby providing an objective comparison of the different existing semantic technologies. This will allow researchers and users to effectively compare the available technologies, helping them to select appropriate technologies and advancing the state of the art through continuous evaluation.
Objectives
In recent years we have witnessed tremendous interest and substantial economic exploitation of search technologies, both at web and enterprise scale. However, the representation of user queries and resource content in existing search appliances is still almost exclusively achieved by simple syntax-based descriptions of the resource content and the information need such as in the predominant keyword-centric paradigm (i.e. keyword queries matched against bag-of-words document representation).
On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge in a formal manner at a high level of expressivity. At the same time, semantic repositories and reasoning engines have only now advanced to a state where querying and processing of this knowledge can scale to realistic IR scenarios.
In parallel to these developments, in the past years we have also seen the emergence of important results in adapting ideas from IR to the problem of search in RDF/OWL data, folksonomies, microformat collections or semantically tagged natural text. Common to these scenarios is that the search is focused not on a document collection, but on metadata (which may be linked to or embedded in textual information). Search and ranking in metadata stores is another key topic addressed by the workshop.
As such, semantic technologies are now in a state to provide significant contributions to IR problems.
Challenges
In this context, challenges for Semantic Search research include, among others:
- How can semantic technologies be applied to IR problems?
- How to address scalability and effectiveness of data Web search (by applying IR technologies)?
- How to allow Web users to exploit the expressiveness of the semantic data on the Web? That is, how to lower the technical barriers for users to ask complex questions and to interact with Web data to obtain concrete answers to complex needs?
- And most importantly, how can this new generation of search systems, which exploit semantics for IR or for data Web search, be evaluated and compared (with standard IR systems or semantic repositories)?
Program
Each presentation is 20 minutes + 5 minutes for questions.
09:00 - 10:30 Session 1
09:05 Invited Talk: Why users need semantic search by Barney Pell (Bing)
10:05 Paraphrasing Invariance Coefficient: Measuring Para-Query Invariance of Search Engines
Tomasz Imielinski and Jinyun Yan
10:30 - 11:00 Coffee Break
11:00 - 12:15 Session 2
Using BM25F for Semantic Search
José R. Pérez-Agüera, Javier Arroyo, Jane Greenberg, Joaquin Perez-Iglesias and Victor Fresno
Distributed Indexing for Semantic Search
Peter Mika
Dear Search Engine: What’s your opinion about...? - Sentiment Analysis for Semantic Enrichment of Web Search Results
Gianluca Demartini and Stefan Siersdorfer
12:15 - 14:00 Lunch
14:00 - 15:40 Session 3
Automatic Modeling of User's Real World Activities from the Web for Semantic IR
Yusuke Fukazawa and Jun Ota
The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams
Claudia Wagner and Markus Strohmaier
A Large-Scale System for Annotating and Querying Quotations in News Feeds
Jisheng Liang, Navdeep Dhillon and Krzysztof Koperski
Semantically Enabled Exploratory Video Search
Joerg Waitelonis, Harald Sack, Zalan Kramer and Johannes Hercher
15:40 - 16:00 Coffee Break
16:00 - 17:30 Session 4
Entity Search: Building Bridges between Two Worlds
Krisztian Balog, Edgar Meij and Maarten de Rijke
Methodology and Campaign Design for the Evaluation of Semantic Search Tools
Stuart Wrigley, Dorothee Reinhard, Khadija Elbedweihy, Abraham Bernstein and Fabio Ciravegna
Discussion on the Entity Search Track (40')
Topics of Interest
Semantic Search is defined along two main directions. The first is Semantic-driven IR, the application of semantic technologies to the IR problem. The second is Semantic Data Search, which deals mainly with the retrieval of semantic data. Main topics of interest for the envisioned workshop contributions include (but are not limited to) the following:
Semantic-driven IR
- Expressive Document Models
- Knowledge Extraction for Building Expressive Document Representation
- Matching and Ranking based on Expressive Document Representation
- Infrastructure for Semantic-driven IR
- Crawling, Storage and Indexing of Semantic Data
- Semantic Data Search and Ranking
- Data Web Search: Search in Multi-Data-Source, Multi-Repository Scenarios
- Dealing with Vague, Incomplete and Dirty Semantic Data
- Infrastructure for Searching Semantic Data on the Web
- Natural Language Interfaces
- Keyword-based Query Interfaces
- Hybrid Query Interfaces (A Combination of NL, Keywords, Forms, Facets, and Formal Queries)
- Visualization of Semantic Data and Expressive Document Representation on the Web
- Evaluation Methodologies for Semantic Search
- Standard Datasets and Benchmarks for Semantic Search
- Infrastructure for Semantic Search Evaluation
Evaluation for Entity Search Track
Our ultimate goal is to develop a benchmark based on which semantic search systems can be compared and analysed in a systematic fashion. Clearly, semantics can be used for different tasks (document vs. data retrieval) and can be exploited throughout the search process (for more usable query construction, for better matching and ranking, for richer results presentation, etc.). Hence, such a benchmark shall enable the study of different aspects of semantic search systems.
For this workshop, we will initially focus on the aspects of matching and ranking in the semantic data search scenario. In particular, we aim to analyze the effectiveness, efficiency and robustness of those features of semantic search systems which are ready to be applied to the Web today: a large share of Web search queries issued today are about entities (i.e. they are entity search queries), and there is a large and increasing amount of semantic data about entities on the Web. The research questions we aim to tackle are:
- How well do semantic data search engines perform on the task of Entity Search on the Web?
- What are the underlying concepts and techniques that make up the differences?
For answering these questions, we provide the following guidelines and support for evaluating entity search systems:
Queries: We provide a set of queries that are focused on the task of entity search. These queries represent a sample extracted from the Yahoo Web search query log. Every query is a plain list of keywords which refer to one particular entity. In other words, the queries ask for one particular entity (as opposed to a set of entities).
Some sample queries can be downloaded from this link: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/samplequeries.
Access to the evaluation set of queries, and thus participation in the evaluation, requires the signing of a license agreement: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/agreement.pdf.
The FINAL QUERIES for evaluation are now available at http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalqueries.
Data: We provide a corpus of datasets which contain entity descriptions in the form of RDF. They represent a sample of Web data crawled from publicly available sources. For this evaluation, we use the Billion Triple Challenge 2009 dataset.
Further information and detailed statistics can be found here:
http://vmlion25.deri.ie/
The original Billion Triple Challenge 2009 dataset contains blank nodes. We will not deal with blank nodes in this evaluation and thus require participants to encode blank nodes according to the following rule: a blank node with id BNID maps to http://example.org/URLEncode(BNID). Since the blank node ids in that dataset are unique, this convention is sufficient to map blank nodes to distinct URIs. Instead of encoding the blank nodes using this convention, participants can also download the following version of the Billion Triple Challenge 2009 dataset, where blank nodes have already been converted to URIs:
http://km.aifb.uni-karlsruhe.de/ws/dataset_semsearch2010/000-CONTENTS
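As a minimal sketch, the blank-node encoding rule above could be applied to N-Triples data as follows. The function names, the use of Python's urllib quoting, and the simplified blank-node label pattern are our own assumptions; the rule itself only fixes the http://example.org/ prefix and the URL-encoding of the id.

```python
import re
from urllib.parse import quote

def bnode_to_uri(bnid: str) -> str:
    """Map a blank node id per the rule: BNID -> http://example.org/URLEncode(BNID)."""
    return "http://example.org/" + quote(bnid, safe="")

def rewrite_line(line: str) -> str:
    """Replace every blank node token (_:id) in an N-Triples line with its URI form.

    Assumes simple alphanumeric blank node labels, as in the BTC 2009 dataset.
    """
    return re.sub(r"_:([A-Za-z0-9]+)",
                  lambda m: "<" + bnode_to_uri(m.group(1)) + ">",
                  line)
```

For example, `_:b1 <http://xmlns.com/foaf/0.1/name> "Alice" .` would be rewritten with `<http://example.org/b1>` as its subject.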
Relevance Judgement: The search systems produce lists of at most 10 entities ordered by relevance. These results have to be drawn from data in the corpus. Results will be evaluated on a three-point scale: (0) Not Relevant, (1) Relevant and (2) Perfect Match. A perfect match is a description of a resource that matches the entity to be retrieved by the query. A relevant result is a resource description that is related to the entity, i.e. the entity is contained in the description of that resource. Otherwise, a resource description is not relevant.
In the current evaluation we only assess individual results, as they are found in the original data set. We do not assess the potential of semantic search systems for disambiguating and merging resources. In other words, only resources appearing in the original data set may be returned as results.
Evaluation Process:
To participate, each system has to run the provided queries on the corpus. The results have to be submitted in one file following the TREC format:
http://www.ir.iit.edu/~dagr/cs529/files/project_files/trec_eval_desc.htm
If needed, participants are allowed to submit the results of up to three runs. The design decisions and rationales behind these configurations should be explained in the system description.
Please verify that your result file can be read with the TREC evaluation
tool available at:
http://trec.nist.gov/trec_eval/index.html
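A standard TREC run file has six whitespace-separated fields per result line: query id, the literal "Q0", result id, rank, score, and a run tag. A small sketch of emitting such lines (the query ids, entity URIs and run tag below are invented for illustration):

```python
def trec_line(qid: str, doc: str, rank: int, score: float, tag: str) -> str:
    """Format one result as a TREC run file line: qid Q0 doc rank score tag."""
    return f"{qid} Q0 {doc} {rank} {score:.4f} {tag}"

# Hypothetical ranked results for query "1".
results = [("1", "http://example.org/entity/42", 1, 3.72),
           ("1", "http://example.org/entity/7", 2, 2.10)]
lines = [trec_line(q, d, r, s, "myrun1") for q, d, r, s in results]
print("\n".join(lines))
# 1 Q0 http://example.org/entity/42 1 3.7200 myrun1
# 1 Q0 http://example.org/entity/7 2 2.1000 myrun1
```

Note that trec_eval ranks results by score, so scores should decrease down each ranked list.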
The assessment of the results will be performed manually using Amazon
Mechanical Turk.
Based on the relevance judgments, recall, precision, F-measure and mean average precision will be computed and used as the basis for comparing the search systems' performance.
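As an illustration of these measures, they could be computed from binarized judgments as follows (the function names are ours, and treating both "Relevant" and "Perfect Match" grades as relevant is an assumption of this sketch):

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list, given the set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i  # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0

def precision_recall(ranked, relevant):
    """Set-based precision and recall over the returned results."""
    found = sum(1 for d in ranked if d in relevant)
    precision = found / len(ranked) if ranked else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall

# Mean average precision (MAP) is AP averaged over all queries.
runs = {"q1": (["a", "b", "c"], {"a", "c"})}  # made-up query and judgments
mean_ap = sum(average_precision(r, rel) for r, rel in runs.values()) / len(runs)
```

Since result lists are capped at 10 entities, precision here is effectively precision at the submitted cutoff.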
With the participants' permission, the results of the assessment and the evaluation feedback will be made publicly available at the workshop's website.
The FINAL RESULTS for evaluation are now available:
- PDF: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalresultsPDF
- Excel: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/finalresultsEXCEL
- Assessments: http://km.aifb.uni-karlsruhe.de/ws/semsearch10/Files/assess
More information can be found in the system descriptions of the participants:
- Submission 27: Sindice
- Submission 28: Delaware
- Submission 29: L3S
- Submission 30: Yahoo!
- Submission 31: UMass
- Submission 32: KIT
The SEALS initiative (see Workshop Support above) is sponsoring the award for the best paper concerning the evaluation of search tools.
This evaluation campaign is discussed in the paper "Evaluating Ad-Hoc Object Retrieval" by Halpin et al., presented at IWEST 2010.
Organizers
- Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
- Peter Mika, Yahoo! Research, Barcelona, Spain
- Thanh Tran Duc, Institute AIFB, University of Karlsruhe (TH), Germany
- Haofen Wang, Apex Lab, Shanghai Jiao Tong University, China
Program Committee
- Bettina Berendt, Katholieke Universiteit Leuven, Belgium
- Paul Buitelaar, DFKI Saarbruecken, Germany
- Wray Buntine, NICTA Canberra, Australia
- Pablo Castells, Universidad Autónoma de Madrid, Spain
- Gong Cheng, Southeast University, Nanjing, China
- Mathieu d'Aquin, KMi, Open University, England
- Miriam Fernandez, KMI, Open University, England
- Blaz Fortuna, Jožef Stefan Institute, Slovenia
- Norbert Fuhr, Universitaet Duisburg-Essen, Germany
- Lise Getoor, University Maryland, USA
- Rayid Ghani, Accenture Labs, USA
- Peter Haase, Fluid Operations, Walldorf, Germany
- Harry Halpin, University of Edinburgh, Scotland
- Andreas Harth, Institute AIFB, Karlsruhe Institute of Technology, Germany
- Michiel Hildebrand, Centre for Mathematics and Computer Science Amsterdam, Netherlands
- Wei Jin, North Dakota State University, USA
- Guenter Ladwig, Institute AIFB, Karlsruhe Institute of Technology, Germany
- Yuzhong Qu, Nanjing University, Nanjing, China
- Sergej Sizov, University of Koblenz-Landau, Germany
- Kavitha Srinivas, IBM Research, Hawthorne, USA
- Nenad Stojanovic, FZI Karlsruhe, Germany
- Rudi Studer, Institute AIFB, University of Karlsruhe, Germany
- Cao Hoang Tru, HCMC University of Technology, HCMC, Vietnam
- Giovanni Tummarello, DERI, Galway, Ireland
- Yong Yu, Apex Lab, Shanghai Jiao Tong University, China
- Valentin Zacharias, FZI, Germany
- Ilya Zaihrayeu, University of Trento, Italy
- Hugo Zaragoza, Yahoo! Research Barcelona, Spain
- Lei Zhang, IBM Research, China
Submission and Proceedings
For submissions, the following rules apply:
- Full technical papers (March 6th): up to 10 pages in ACM format
- Short position or demo papers (March 6th): up to 5 pages in ACM format
- Entity Search participants may submit a short system description paper (April 17th): up to 5 pages in ACM format
- Entity Search participants must submit evaluation results (April 17th): results in TREC format (up to 3 runs)
Submissions must be formatted using the WWW2010 templates available here. Submissions will be peer reviewed by three independent reviewers. Accepted papers will be presented at the workshop and included in the workshop proceedings. We will pursue a journal special issue on the topics of the workshop if we receive an appropriate number of high-quality submissions. Details on the proceedings and camera-ready formatting will be announced upon notification of the authors. Please submit your paper via the Easychair Submission System for SemSearch10.
For standard papers and system descriptions, the submission system accepts PDF. Evaluation results should be uploaded as ZIP or TGZ.
Contact
For news and discussions related to SemSearch and its evaluation track, please register at http://tech.groups.yahoo.com/group/semsearcheval/. The organization committee can be reached via the contact details on their web pages (or semsearch10@easychair.org).