Abstract

Semantic search has attracted a large community of researchers, practitioners and industry professionals. As a result of its popularity, the term semantic search is used in different contexts with different meanings. Taking the variety of recently published work into account, semantic search is not a single type of application but rather a broad range of systems that involve the use of semantics. In this tutorial, we aim to provide a comprehensive overview of the different types of semantic search systems and to discuss the differences in the techniques underlying them. The tutorial covers both the application of Semantic Web technologies to the IR problem and, vice versa, the application of IR techniques to Semantic Web problems. In particular, focus is given to four topics of semantic search that have attracted much interest recently. The first is Semantic-enabled Document Retrieval, i.e. the application of Semantic Web technologies to the IR problem. The second is Semantic Data Retrieval, which concerns (the application of IR techniques to) the retrieval of semantic data. While the use of semantics is the essential theme in these two major components of the tutorial, Hybrid Search is a complementary part that illustrates the convergence of search paradigms. Last but not least, the user factor in search will be discussed. The aim of the tutorial is to shed some light on semantic search and provoke ideas for future development.

Current Relevance

In recent years we have witnessed tremendous interest in, and substantial economic exploitation of, semantic search technologies, in both academia and business.

Recent advances in the field of semantic search from the semantic technologies research area have resulted in tools and standards that allow search and ranking in large metadata stores (data retrieval).

In parallel to these developments, recent years have also seen important results in semantically enhancing traditional keyword-based IR models to improve textual and multimedia information retrieval (document retrieval).

An intermediate approach also exists that focuses on the integration of textual and structured semantic data (hybrid search).

These different semantic search research lines create a need for broad audiences to acquire a solid understanding of current semantic search technologies.

Tutorial Description

Semantic search can be seen as a retrieval paradigm that is centered on the use of semantics. When a system incorporates the semantics entailed by the query and/or the resources into the matching process, it essentially performs semantic search. By this definition, there is no single type of semantic search but rather a wide range of systems that employ and exploit semantic models of varying expressivity.
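
As a rough illustration of this definition, the following minimal sketch contrasts plain keyword matching with matching that first maps terms to concepts. It is not taken from any system covered in the tutorial: the documents, the hand-made concept map and all function names are hypothetical, and the "semantic model" here is about as inexpressive as such a model can be.

```python
# Toy illustration only: documents, concept map and names are hypothetical.
DOCS = {
    "d1": "cheap flights to paris",
    "d2": "inexpensive airfare for paris",
    "d3": "paris museum guide",
}

# Hypothetical semantic model: surface terms mapped to shared concepts.
CONCEPTS = {
    "cheap": "LowPrice",
    "inexpensive": "LowPrice",
    "flights": "AirTravel",
    "airfare": "AirTravel",
}


def keyword_search(query):
    """Return ids of documents that contain every query term literally."""
    terms = set(query.split())
    return sorted(d for d, text in DOCS.items() if terms <= set(text.split()))


def to_concepts(text):
    """Map each term to its concept; unmapped terms stand for themselves."""
    return {CONCEPTS.get(term, term) for term in text.split()}


def semantic_search(query):
    """Return ids of documents whose concepts cover the query's concepts."""
    wanted = to_concepts(query)
    return sorted(d for d, text in DOCS.items() if wanted <= to_concepts(text))


if __name__ == "__main__":
    print(keyword_search("cheap flights"))   # ['d1']
    print(semantic_search("cheap flights"))  # ['d1', 'd2']
```

Because both the query and the documents are interpreted through the concept map before matching, the second call also retrieves the document that uses different words for the same concepts.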

Several commercial initiatives have also appeared on the market that aim to exploit semantics in order to enhance traditional keyword-based search paradigms. Among them are search engines such as Hakia, PowerSet, FreeBase, AskMeKnow, Digger, Bibtext, Wolfram Alpha, etc.

In this tutorial, participants will learn about the different models embraced under the umbrella of semantic search, test several queries on publicly available semantic search systems and discuss the advantages and drawbacks of this new paradigm, as well as its future directions. We will explain the theoretical background and give hands-on, step-by-step instructions on different semantic search systems.

3.1 Aims and Learning Goals

The goal of the tutorial is to provide an overview of the different existing perspectives on semantic search, with special focus on three semantic search models: a) those focused on document retrieval, b) those focused on data retrieval and c) those focused on hybrid search. During the tutorial, several publicly available semantic search systems will also be presented in a hands-on session, so that the audience can discuss the capabilities and advantages of these systems with respect to traditional keyword-based search methodologies.

3.2 Target Audience

The tutorial is suited for anybody with a basic understanding of Information Retrieval and Semantic Web technologies. It is well suited for practitioners and researchers from adjacent fields who are seeking a self-contained, concise, and hands-on introduction to semantic search technologies. For experienced Semantic Web researchers, the tutorial will provide a framework for the development of a basic semantic search engine.

3.3 Presentation Method

We will use a combination of:
* Presentations
* Hands-on exercises
* A final group project

3.4 Technical Requirements

All participants should bring their own computer. The required software will be made available on this Web page prior to ASWC 2009.
Important: You will need Internet access to use the tools and to complete the exercises.

Outline and Schedule

09:00-11:00 Part 1

  • Overview and Motivation: What is semantic search (30 min)
  • Semantic-enabled document retrieval. Models and systems (60 min)

11:00-11:30 Coffee Break

11:30-13:00 Part 2

  • Semantic data retrieval. Models and systems (60min)
  • Hybrid search. Models and systems (30min)

13:00-14:00 Lunch Break

14:00-15:30 Part 3

  • Interaction paradigms for semantic search (90min)
    • Query interfaces (45min)
    • Visualization of results (45min)

15:30-16:00 Coffee Break

16:00-18:00 Part 4

  • Exercises (60min)
  • Discussion of the results (30min)
  • Global Workshop discussion (30min)

Materials

5.1 Software

  • Hakia
  • Freebase

5.3 Slides

6.1 Thanh Tran Duc

Webpage: http://sites.google.com/site/kimducthanh
"Thanh Tran is currently leading the AIFB semantic search special interest group. He has worked as a consultant and software engineer for IBM and Capgemini. His research interests are centered around next generation search applications on the Web, where topics range from formal models for web resources (KR) and storage and query processing concepts for RDF and text (DB & IR) to user interfaces for web search. His work is published in numerous top-level conference proceedings (SIGMOD, ICDE, WWW, CIKM, ISWC) and earned the second prize at the Billion Triple Challenge 2008. He runs the series of workshops on semantic search and is currently working as a researcher and project leader at AIFB for the EU project X-Media and the German projects iGreen and CollabKloud."

6.2 Haofen Wang

Webpage: http://apex.sjtu.edu.cn/apex_wiki/whfcarter
"Haofen is a research associate and PhD student at the Apex Data & Knowledge Management Lab, Shanghai Jiao Tong University. His research interests include semantic data creation and integration, Semantic Web data indexing and search, and query interfaces and user interaction for the Semantic Web. He has published several high-quality papers and has served as program committee member and reviewer for various top conferences and journals on these topics. Haofen has also successfully led several joint research projects with IBM China Research Laboratory and Intel Research China. As one of the organizers, he successfully held the second workshop on Semantic Search at WWW 2009 and the 2nd China Semantic Web Symposium (CSWS 2008)."

6.3 Miriam Fernandez

Webpage: http://kmi.open.ac.uk/people/member/miriam-fernandez
"Dr Miriam Fernandez received her BS, MSc (first in her class) and PhD (cum laude, European mention) at Universidad Autonoma de Madrid, Spain. Her research interests are focused on Information Retrieval and the Semantic Web, and more specifically on how semantic technologies can help to enhance current keyword-based retrieval models, including personalization and contextualization techniques. During her PhD she completed several internships in academia (Knowledge Media Institute, UK, 6 months) and industry (Google Zurich, Switzerland, 6 months), where she continued her research in semantic search and search quality respectively. She has actively participated in several European projects (aceMedia, Mesh, XMedia) and has published in top-level conference proceedings (ECIR, ESWC, WWW, ICSC) and journals (TKDE, TCSVT). Miriam joined the Knowledge Media Institute (KMi) in April 2009, where she is working on semantic search models and evaluation methodologies."