4th International

Semantic Search Workshop

March 29, 2011

Co-located with the 20th International World Wide Web Conference (WWW2011)

Objectives

In recent years we have witnessed substantial exploitation of search technologies, both at web and enterprise scale. However, the representation of user queries and information in existing search appliances is still almost exclusively achieved by simple syntax-based descriptions (i.e. keyword queries matched against bag-of-words document representations). While these systems have been shown to work well for many common search needs, they work on the basis of rough approximations and usually fail to address more complex tasks such as aggregation and information analytics. On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge at a high level of expressivity. Semantic repositories and reasoning engines have now advanced to a state where querying and processing of this knowledge can scale to large scenarios. As such, semantic technologies are poised to provide significant contributions to IR problems. More expressive descriptions of resources are achieved through the representation of the resource content in terms of concepts and structured data (OWL, RDF). The recent media interest around Wolfram Alpha, PowerSet (acquired by Microsoft Bing) and Yahoo! SearchMonkey shows the expectations regarding the impact of semantic search.

Conversely, we have also seen the successful adoption of ideas from IR for the problem of search in semantic (Web) data, driven by the increasing size of the Semantic Web. Popular examples include the Linking Open Data project and the large body of data in the form of Microformats and RDFa associated with text. Common to these scenarios is that the search is focused not on a document collection, but on semantic data (which may be linked to or embedded in textual information). Searching and ranking large amounts of semantic data on the Web is another key topic addressed by this workshop.

Challenges

In this context, challenges for Semantic Search research will include, among others:

How can semantic technologies be applied to IR problems?
How can the scalability and effectiveness of data Web search be addressed (by applying IR technologies)?
How can web users be enabled to exploit the expressiveness of the semantic data on the Web? That is, how can the technical barriers be lowered so that users can ask complex questions and interact with web data to obtain concrete answers for complex needs?
And most importantly, how can this new generation of search systems that successfully exploit semantics for IR or for data Web search be evaluated and compared (with standard IR systems or semantic repositories)?

Topics of Interest

Semantic Search is defined through two main directions. The first is Semantic-driven IR, the application of semantic technologies to the IR problem. The second is Semantic Data Search, which mainly deals with the retrieval of semantic data. Main topics of interest for the envisioned workshop contributions include (but are not limited to) the following:

Semantic-driven IR

Expressive Document Models
Knowledge Extraction for Building Expressive Document Representation
Matching and Ranking based on Expressive Document Representation
Infrastructure for Semantic-driven IR

Semantic Data Search

Crawling, Storage and Indexing of Semantic Data
Semantic Data Search and Ranking
Data Web Search: Search in Multi-Data-Source, Multi-Repository Scenarios
Dealing with Vague, Incomplete and Dirty Semantic Data
Infrastructure for Searching Semantic Data on the Web

Interaction Paradigms for Semantic Search

Natural Language Interfaces
Keyword-based Query Interfaces
Hybrid Query Interfaces (A Combination of NL, Keywords, Forms, Facets, and Formal Queries)
Visualization of Semantic Data and Expressive Document Representation on the Web

Evaluation of Semantic Search

Evaluation Methodologies for Semantic Search
Standard Datasets and Benchmarks for Semantic Search
Infrastructure for Semantic Search Evaluation

Program

Each presentation is 25 minutes + 5 minutes for questions.

09:00 - 10:30 Session 1

09:00 Welcome
09:15 Keynote 1: Anatomy of the long tail: on satisfying niche interests by Bo Pang (Yahoo! Research)
10:00 Algorithm for answer graph construction for keyword queries on RDF data [PDF]
Parthasarathy K, Sreenivasa Kumar P and Dominic Damien

10:30 - 11:00 Coffee Break

11:00 - 12:00 Session 2

11:00 Using Personalized PageRank for Keyword Based Sensor Retrieval [PDF]
Lorand Dali, Alexandra Moraru and Dunja Mladenic
11:30 RDF Visualization using a Three-Dimensional Adjacency Matrix [PDF]
Mario Arias Gallego, Javier D. Fernández, Miguel A. Martinez-Prieto and Pablo De La Fuente

12:00 - 13:30 Lunch

13:30 - 15:00 Session 3

13:30 Learning to Rank for Semantic Search [PDF]
Lorand Dali and Blaz Fortuna
14:00 Semantic Information Filtering - Beyond Collaborative Filtering [PDF]
Ivo Lašek and Peter Vojtáš
14:30 Two-layered architecture for peer-to-peer concept search [PDF]
Janakiram Dharanipragada, Fausto Giunchiglia, Harisankar Haridas and Uladzimir Kharkevich

15:00 - 15:30 Coffee Break

15:30 - 17:15 Session 4

15:30 Keynote 2: Integrating and Browsing Linked Data at Web Scale with SWSE by Andreas Harth (AIFB)
16:15 Discussion on Semantic Search evaluation
17:15 Close

Semantic Search Challenge

Building on the success of the previous year's Semantic Search evaluation, we call for participation in the Semantic Search Challenge 2011, a competition that requires participants to answer queries of varying complexity based on a set of structured data collected from the Web. This year's competition will consist of two tracks. The first, the "Entity Search Track", consists of queries that refer to one particular entity. The second, the "List Search Track", consists of complex queries with multiple possible answers. The winners of each track will receive a prize of $500 (sponsored by Yahoo!).

The results of the evaluation will be presented at the 4th International Semantic Search Workshop (SemSearch 2011), co-located with the World Wide Web Conference (WWW 2011) in Hyderabad, India. However, you don't need to attend the workshop to participate in the Challenge.

Datasets

Each track of the competition requires participants to rank objects in the same collection, but using different types of queries.

Collection

The collection is a sample of Linked Data crawled from publicly available sources. The dataset consists of a set of RDF triples with provenance information for each triple. This dataset was previously made available as part of the Billion Triple Challenge 2009; see some statistics of the dataset and details of the format. We have modified this dataset because the original Billion Triple Challenge 2009 dataset contains blank nodes. We will not deal with blank nodes in this evaluation and thus require participants to encode blank nodes according to the following rule: BNID maps to http://example.org/URLEncode(BNID), where BNID is the blank node id. Since the blank node ids in that dataset are unique, this convention is sufficient to map blank nodes to distinct URIs. Instead of encoding the blank nodes using this convention, participants can also download the following version of the Billion Triple Challenge 2009 dataset, in which blank nodes have already been converted to URIs: Download dataset.
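
The blank node rule above can be sketched in Python as follows. This is only an illustration, not the organizers' conversion script: the function name bnode_to_uri is hypothetical, and the sketch assumes that "URLEncode" means standard percent-encoding of every reserved character and that the id is passed in exactly as it appears in the dataset.

```python
from urllib.parse import quote

# Base URI taken from the rule stated above.
BNODE_BASE = "http://example.org/"


def bnode_to_uri(bnid: str) -> str:
    """Map a blank node id to a URI following the BNID rule.

    Assumption: URLEncode is plain percent-encoding; safe="" makes
    quote() escape "/" and ":" as well, so the id stays a single
    path segment.
    """
    return BNODE_BASE + quote(bnid, safe="")


if __name__ == "__main__":
    # Hypothetical ids, used only to show the mapping.
    for bnid in ["genid42", "node/1 a"]:
        print(bnid, "->", bnode_to_uri(bnid))
```

Because percent-encoding is injective, distinct blank node ids still yield distinct URIs, which is the property the rule relies on.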

Query set #1
(Entity Search track)

For the competition in this track, we provide a set of queries that are focused on the task of entity search. These queries are a subset of the queries in the Yahoo! Search Query Tiny Sample dataset available through Yahoo! Webscope. From this dataset we selected a number of entity queries, i.e. queries that refer to one particular entity. See sample queries.

Query set #2
(List Search track)

The goal of this track is to select objects that match particular criteria. These queries have been hand-written by the organizing committee. See sample queries.

The format of these query sets is one query per line in plain text format using UTF-8 encoding.
The final queries will be posted when the competition begins on March 1st, 2011. The deadline for submitting results is March 21st, 2011. For further news and discussions related to SemSearch 2011 and the Semantic Search Challenge, please register at http://tech.groups.yahoo.com/group/semsearcheval/. For further details on this challenge, please refer to http://semsearch.yahoo.com/ (this site will go online on the 21st of February).
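
As an illustration of the query set format described above (one query per line, plain text, UTF-8), a minimal Python reader might look like the sketch below. The function names and the file name in the usage comment are hypothetical, and skipping empty lines is an assumption of this sketch rather than part of the stated format.

```python
from typing import List


def parse_queries(text: str) -> List[str]:
    """Split query-set text into queries, one per line.

    Assumption: surrounding whitespace is not significant and
    empty lines carry no query, so both are dropped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_queries(path: str) -> List[str]:
    """Read a query set file using the UTF-8 encoding named above."""
    with open(path, encoding="utf-8") as handle:
        return parse_queries(handle.read())


# Usage (hypothetical file name):
#   queries = load_queries("entity-queries.txt")
```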

Organizers

Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
Peter Mika, Yahoo! Research, Barcelona, Spain
Thanh Tran Duc, Institute AIFB, University of Karlsruhe (TH), Germany
Haofen Wang, Apex Lab, Shanghai Jiao Tong University, China

Program Committee

Bettina Berendt, Katholieke Universiteit Leuven, Belgium
Pablo Castells, Universidad Autonoma de Madrid, Spain
Philipp Cimiano, Semantic Computing Group, Bielefeld University, Germany
Gong Cheng, Nanjing University, China
Mathieu d'Aquin, KMI, Open University, England
Miriam Fernandez, KMI, Open University, England
Blaz Fortuna, Jožef Stefan Institute, Slovenia
Norbert Fuhr, Universität Duisburg-Essen, Germany
Lise Getoor, University of Maryland, USA
Peter Haase, Fluid Operations, Walldorf, Germany
Harry Halpin, University of Edinburgh, Scotland
Andreas Harth, Institute AIFB, Karlsruhe Institute of Technology, Germany
Michiel Hildebrand, Centre for Mathematics and Computer Science Amsterdam, Netherlands
Aidan Hogan, DERI, Galway, Ireland
Guenter Ladwig, Institute AIFB, Karlsruhe Institute of Technology, Germany
Axel Polleres, DERI, Galway, Ireland
Yuzhong Qu, Nanjing University, China
Danh Lephuoc, DERI, Galway, Ireland
Daniel Schwabe, Departamento de Informática, Brazil
Sergej Sizov, University of Koblenz-Landau, Germany
Rudi Studer, Institute AIFB, University of Karlsruhe, Germany
Kavitha Srinivas, IBM Research, Hawthorne, USA
Cao Hoang Tru, HCMC University of Technology, HCMC, Vietnam
Giovanni Tummarello, DERI, Galway, Ireland
Yong Yu, Apex Lab, Shanghai Jiao Tong University, China
Ilya Zaihrayeu, University of Trento, Italy
Hugo Zaragoza, Yahoo! Research Barcelona, Spain

Submission and Proceedings

For submissions, the following rules apply:

Full technical papers: up to 10 pages in ACM format
Short position or demo papers: up to 5 pages in ACM format

Submissions must be formatted using the WWW2011 templates.

Submissions will be peer reviewed by three independent reviewers. Accepted papers will be presented at the workshop and included in the workshop proceedings. We will pursue a journal special issue on the topics of the workshop if we receive an appropriate number of high-quality submissions. Details on the proceedings and camera-ready formatting will be announced when authors are notified. Please use the following link to the submission system to submit your paper: EasyChair Submission System for SemSearch11 at http://www.easychair.org/conferences/?conf=semsearch11

Contact

The organizing committee can be reached using the contact data available on their web pages. The workshop website is at http://km.aifb.uni-karlsruhe.de/ws/semsearch11.

News

2011-2-15: Sample queries are posted!
2011-2-15: Semantic Search Challenge 2011 (sponsored by Yahoo!) is open

Important Dates

Deadline for paper submission:
February 26th, 2011 (12.00 AM, GMT)

Deadline for submitting results to the challenge:
March 21st, 2011 (12.00 AM, GMT)

Notification of acceptance:
March 14th, 2011

Camera-ready versions:
March 21st, 2011

WWW'11 Conference:
March 28th - April 1st, 2011

Workshop Day:
March 29, 2011

Workshop Support

SEALS

The SEALS initiative is sponsoring and supporting the evaluation of search tools. The goal of the SEALS project is to provide an independent, open, scalable, extensible and sustainable infrastructure (the SEALS Platform) that allows the remote evaluation of semantic technologies, thereby providing an objective comparison of the different existing semantic technologies. This will allow researchers and users to effectively compare the available technologies, helping them to select appropriate technologies and advancing the state of the art through continuous evaluation.