Introduction
Short Definition and Example
Further Examples
Publications
Related work

This page serves as an entry point for our research efforts on bridge patterns. It provides a general introduction, basic explanations (definition, examples) and links to further material. For further information, or to share your comments, please contact Basil Ell (AIFB / KIT).

Bridge Patterns

Introduction

Bridge patterns connect human-understandable language and machine-understandable data. A gap exists between the two: because natural language is complex, vague, and ambiguous, content expressed in it cannot easily be interpreted by machines. To use machines for solving complex problems, knowledge representation formalisms have been created that make it possible to express entities, their properties, and their relations, so that machines can reason about them. Machines can easily access information encoded in these formalisms, but because of their formality, this information is generally difficult for humans to access.

Bridging the gap would mean: i) making content hitherto only accessible to humans accessible to machines, and ii) making content hitherto only accessible to machines accessible to humans.

A large fraction of the knowledge available to humanity exists in the form of text. If this knowledge could be made accessible to machines by representing it in a knowledge representation formalism, machines could access this plethora of knowledge. Machines could then be used for tightly focused retrieval of knowledge, for reasoning on this data (e.g., to derive new knowledge), or for detecting inconsistencies. Once a machine has performed a task on this data, the results can be transformed into text, making them easily accessible to humans.

We strive to reduce this gap in two ways. i) We introduce the notion of a bridge pattern. A bridge pattern makes it possible to transform a natural language sentence that belongs to a certain class of sentences in a certain language into a formal representation in the knowledge representation formalism RDF. A bridge pattern also makes it possible to transform knowledge encoded in RDF into a sentence of a certain language. ii) We provide an approach that automatically creates bridge patterns given a set of texts and an RDF knowledge base.

In more concrete terms, applying a bridge pattern to bridge the gap from a natural language sentence to an RDF representation can be referred to as Semantic Parsing, Information Extraction, Information Translation, or Natural Language Understanding. Bridging in the opposite direction can be referred to as Natural Language Generation or Text Generation. Given bridge patterns for multiple natural languages, e.g., English and German, applying an English bridge pattern to represent an English sentence as RDF and then applying a German bridge pattern to express this data as a German sentence can be referred to as Machine Translation. Translating content from sentences to RDF and back into sentences of the same language can be referred to as Paraphrasing.
Note that while bridge patterns can play a role in each of these processes, applying a bridge pattern is only one of several tasks in each process.

Completely bridging the gap would mean that anything that can be expressed in a certain language can be expressed in RDF, and everything that can be expressed in RDF can be expressed in that language. Whether this is achievable is questionable for several reasons:

  1. When applying bridge patterns for Information Extraction, the results can only belong to the following classes: 1) entities, 2) literals, 3) relations between two entities, and 4) relations between an entity and a literal. New types of relations cannot be learned; that is, a knowledge base cannot be populated at the schema level. Each bridge pattern is tied to a certain schema, which makes the set of available schemata the limiting factor for what can be added to the knowledge base.
  2. Each bridge pattern is applicable only to a certain class of sentences. For example, a given bridge pattern would only be applicable to sentences such as:
    • "A Descent into the Maelström" is a short story by Edgar Allan Poe.
    • "The New Lieutenant's Rap" is a short story by Stephen King.
    • "Odour of Chrysanthemums" is a short story by D. H. Lawrence.
    • "A Christmas Memory" is a short story by Truman Capote.
    Since there are theoretically infinitely many classes, and in practice the number of classes is very large, a very large number of bridge patterns is necessary. However, depending on the concrete use case, several thousand bridge patterns may already be sufficient. Once the number of bridge patterns available to a system is very large, it may become relevant to move from bridge patterns towards grammars.
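
To illustrate what "applicable only to a certain class of sentences" means in practice, the following Python sketch approximates such a class with a regular expression. The expression and the function name match_short_story are assumptions made for illustration; they are not the sentence pattern syntax used on this page, nor the authors' implementation.

```python
import re

# Hypothetical approximation of the sentence class
# '"<title>" is a short story by <author>.'
SHORT_STORY = re.compile(r'^"(?P<title>.+)" is a short story by (?P<author>.+)\.$')

def match_short_story(sentence):
    """Return the variable bindings if the sentence belongs to the class, else None."""
    m = SHORT_STORY.match(sentence)
    return m.groupdict() if m else None

sentences = [
    '"A Descent into the Maelström" is a short story by Edgar Allan Poe.',
    '"Odour of Chrysanthemums" is a short story by D. H. Lawrence.',
    # A sentence from a different class: the pattern does not apply.
    'Flash is a science-fiction novel by L. E. Modesitt published in 2004.',
]
for s in sentences:
    print(match_short_story(s))
```

The first two sentences yield bindings for title and author, while the third yields None, which is why a separate pattern is needed for each class of sentences.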

Short Definition and Example

  • A bridge pattern is a tuple (sp, gp) where sp is a sentence pattern and gp is a graph pattern.
  • A sentence pattern (SP) is a string that consists of terminals, variables, and modifiers. Within an SP, a (variable, modifier) tuple (v,m) is denoted as {M(v|m)M}, where {M( and )M} serve as the delimiters of the tuple.
  • A graph pattern is a set of triple patterns (s,p,o) where s is an identifier or variable, p is an identifier or variable, and o is a variable or a literal.
  • A modifier is a function applicable to the value of a variable v.
    Examples:
    • lcfirst - Sets the first char to lower case if that char is upper case.
    • -1r - Removes the rightmost char.
    • -1l - Removes the leftmost char.
    • rm() - If a string ends with a substring in parentheses, e.g. "Dublin (Ohio)", that part is cut off.
    • enInt_sep - Adds English thousands separators, e.g., 10,000.
    • id - Does not change the string.
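
The modifiers listed above can be read as plain string functions. The following Python sketch is one possible reading under stated assumptions: the Python function names, the render helper, the assumption that the delimiters are {M( and )M}, and the example sentence pattern about a city are all hypothetical and serve only to show how a sentence pattern might be filled from variable bindings.

```python
import re

def lcfirst(s):       # "lcfirst": lower-case the first character
    return s[:1].lower() + s[1:]

def drop_right(s):    # "-1r": remove the rightmost character
    return s[:-1]

def drop_left(s):     # "-1l": remove the leftmost character
    return s[1:]

def rm_parens(s):     # "rm()": cut off a trailing part in parentheses
    return re.sub(r"\s*\([^()]*\)$", "", s)

def en_int_sep(s):    # "enInt_sep": add English thousands separators
    return f"{int(s):,}"

def identity(s):      # "id": leave the string unchanged
    return s

MODIFIERS = {
    "lcfirst": lcfirst, "-1r": drop_right, "-1l": drop_left,
    "rm()": rm_parens, "enInt_sep": en_int_sep, "id": identity,
}

# Assumed placeholder syntax: {M(variable|modifier)M}
PLACEHOLDER = re.compile(r"\{M\((\w+)\|(.+?)\)M\}")

def render(sentence_pattern, bindings):
    """Replace each placeholder with the modified value of its variable."""
    def fill(m):
        variable, modifier = m.group(1), m.group(2)
        return MODIFIERS[modifier](bindings[variable])
    return PLACEHOLDER.sub(fill, sentence_pattern)

# Hypothetical sentence pattern and bindings, for illustration only.
sp = "{M(city|rm())M} has {M(pop|enInt_sep)M} inhabitants."
print(render(sp, {"city": "Dublin (Ohio)", "pop": "10000"}))
```

Keeping the modifiers in a lookup table means a new modifier only requires one more entry, without touching the rendering logic.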

    The following picture shows a simple bridge pattern:
    The graph pattern can be transformed into a SPARQL query:
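
    As a rough illustration of this transformation, the Python sketch below turns a list of triple patterns into a SPARQL SELECT query string. The triple patterns, the ex: prefix, and the predicate names are hypothetical and are not taken from the bridge pattern discussed here; the function to_sparql is likewise an assumption.

```python
# Hypothetical graph pattern: the predicates are illustrative only.
GRAPH_PATTERN = [
    ("?book", "rdfs:label", "?title"),
    ("?book", "ex:author", "?person"),
    ("?person", "rdfs:label", "?author"),
    ("?book", "ex:publicationYear", "?year"),
]

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",  # placeholder namespace
}

def to_sparql(graph_pattern, prefixes):
    """Build a SELECT query that projects every variable in the pattern."""
    variables = []
    for triple in graph_pattern:
        for term in triple:
            if term.startswith("?") and term not in variables:
                variables.append(term)
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append("SELECT " + " ".join(variables) + " WHERE {")
    lines += [f"  {s} {p} {o} ." for s, p, o in graph_pattern]
    lines.append("}")
    return "\n".join(lines)

print(to_sparql(GRAPH_PATTERN, PREFIXES))
```
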
    Querying a knowledge base with that query results in the following graph:
    The result of verbalizing this graph given the bridge pattern is:
        Flash is a science-fiction novel by L. E. Modesitt published in 2004.
    Given this sentence, the bridge pattern could be applied for Information Extraction which leads to the same RDF graph.
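
    The two directions can be sketched in Python as follows. The sentence pattern SP is a hypothetical reconstruction that uses only the id modifier, and the functions verbalize and extract are assumptions. The sketch works on variable bindings only; instantiating the graph pattern to obtain actual RDF triples is omitted.

```python
import re

# Hypothetical sentence pattern, assuming the {M(variable|modifier)M} syntax.
SP = ("{M(title|id)M} is a science-fiction novel by "
      "{M(author|id)M} published in {M(year|id)M}.")
PLACEHOLDER = re.compile(r"\{M\((\w+)\|id\)M\}")

def verbalize(sp, bindings):
    """Data to text: fill each placeholder with the value of its variable."""
    return PLACEHOLDER.sub(lambda m: bindings[m.group(1)], sp)

def extract(sp, sentence):
    """Text to data: match the sentence against the pattern, return bindings."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(sp):
        parts.append(re.escape(sp[pos:m.start()]))   # terminals, matched literally
        parts.append(f"(?P<{m.group(1)}>.+?)")       # variable, as a named group
        pos = m.end()
    parts.append(re.escape(sp[pos:]))
    match = re.fullmatch("".join(parts), sentence)
    return match.groupdict() if match else None

bindings = {"title": "Flash", "author": "L. E. Modesitt", "year": "2004"}
sentence = verbalize(SP, bindings)
print(sentence)
print(extract(SP, sentence))
```

    Because both functions are driven by the same sentence pattern, extracting from a verbalized sentence returns the bindings that produced it.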

    Further Examples

    The following bridge patterns have been induced from a parallel text-data corpus with the approach introduced in our INLG 2014 publication.

    Publications

    1. Basil Ell and Andreas Harth. 2014. A language-independent method for the extraction of RDF verbalization templates. 8th International Conference on Natural Language Generation (INLG 2014).

    Related Work

    In the near future we will add a more extensive list of related works and provide details on how these approaches relate to our work.

    • Chris Welty, James Fan, David Gondek, and Andrew Schlaikjer. 2010. Large scale relation detection. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 24-33. Association for Computational Linguistics.
    • Daniel Duma and Ewan Klein. 2013. Generating Natural Language from Linked Data: Unsupervised template extraction, pages 83-94. Association for Computational Linguistics, Potsdam, Germany.
    • Daniel Gerber and A-C Ngonga Ngomo. 2011. Bootstrapping the linked data web. In 1st Workshop on Web Scale Knowledge Extraction @ International Semantic Web Conference, volume 2011.
    • Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: A Taxonomy of Relational Patterns with Semantic Types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.

By Basil Ell and Andreas Harth, AIFB, 2014