Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger 

Supplementary Material for the SWJ Submission
Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO

Abstract: In recent years, several noteworthy large, cross-domain, and openly available knowledge graphs (KGs) have been created. These include DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Although extensively in use, these KGs have not been subject to an in-depth comparison so far. In this survey, we provide data quality criteria according to which KGs can be analyzed, and we use them to analyze and compare the above-mentioned KGs. Furthermore, we propose a framework for finding the most suitable KG for a given setting.

The latest submission can be found at
http://www.semantic-web-journal.net/content/linked-data-quality-dbpedia-freebase-opencyc-wikidata-and-yago-0
The final print version of this article will be published soon.


1. KG Recommendation Framework

The KG Recommendation Framework, provided as an Excel sheet, can be used to select the most suitable KG for a given setting. Given the metric values of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO (see our article), users can weight the metrics according to their needs and obtain an indication of which KGs might fit best.
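The exact formula is the one defined in the Excel sheet. As a minimal sketch, assuming the score of a KG is the weighted mean of its metric values (each in [0, 1]), the weighting step could look like this. The function name rank_kgs and the KG-A/KG-B inputs are hypothetical and are not results from the article.

```python
from typing import Dict, List, Tuple


def rank_kgs(metric_values: Dict[str, Dict[str, float]],
             weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank KGs by the weighted mean of their metric values (highest first).

    metric_values: KG name -> {metric name -> value in [0, 1]}
    weights: metric name -> non-negative weight chosen by the user
    Every weighted metric must be present for every KG (else KeyError).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    # Weighted mean of the metric values per KG.
    scores = {
        kg: sum(w * values[metric] for metric, w in weights.items()) / total_weight
        for kg, values in metric_values.items()
    }
    # Sort by score, best-fitting KG first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input with made-up KG names and values (not results from the article).
values = {
    "KG-A": {"mavail": 1.0, "mcPop": 0.5},
    "KG-B": {"mavail": 0.5, "mcPop": 1.0},
}
weights = {"mavail": 1.0, "mcPop": 3.0}
print(rank_kgs(values, weights))  # [('KG-B', 0.875), ('KG-A', 0.625)]
```

Raising the weight of one metric shifts the ranking toward the KGs that score well on it, which is the behavior the framework is meant to expose.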


2. Key Statistics

 

3. Quality Assessment

We provide our data both as tab-separated files (.tsv) and as Excel files (.xlsx).

  • Data for mavail: Contains downtime and uptime measurements per KG

    • Availability of DBpedia
    • Availability of Freebase
    • Availability of OpenCyc
    • Availability of Wikidata
    • Availability of YAGO


  • Data for msemTriple: Contains, per KG, the checked triples about names of people (semantic accuracy); for dates and population, only the errors are listed (i.e., wrong birth dates in Wikidata and YAGO)
  • Data for msynLit: Contains, per KG, the checked literals (syntactic accuracy) for date of birth, ISBN, and population
  • Data for mcCol: Contains, per KG, the considered combinations of classes and relations and their filling degrees for measuring column completeness
  • Data for mconClass: Contains all owl:disjointWith class restrictions and their inconsistencies
  • Data for mconRelatFunctional: Contains the owl:FunctionalProperty relation restrictions and how often each such relation is used more than once per resource
  • Data for mconRelatRange: Contains the considered combinations of relations and datatypes and how often the datatype used on instance level differs from the defined one
  • Data for mcPop: Contains the gold standard (entities) that was checked and the result per KG
  • Data for mcSchema: Contains the gold standard (classes, relations) that was checked and the result per KG
  • Data for mDeref: Contains the results of dereferencing internal URIs, by (s,p,o) position, and the corresponding HTTP status codes
  • Data for mURIs: Contains the results regarding the validity of external URIs and the corresponding HTTP status codes
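To illustrate what the mconRelatFunctional data counts, here is a minimal sketch, assuming the triples are available in memory as (subject, predicate, object) tuples: for each relation declared as owl:FunctionalProperty, it counts the resources that have more than one distinct value for it. The function name and the "ex:" data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def functional_violations(triples: Iterable[Triple],
                          functional_props: Set[str]) -> Dict[str, int]:
    """Count, per functional relation, the resources that use it more than once."""
    # Collect the distinct objects of each (subject, functional relation) pair.
    objects: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p in functional_props:
            objects[(s, p)].add(o)
    # A resource violates the restriction if it has more than one value.
    violations = {p: 0 for p in functional_props}
    for (_, p), objs in objects.items():
        if len(objs) > 1:
            violations[p] += 1
    return violations


# Hypothetical triples; "ex:" is a placeholder namespace.
triples = [
    ("ex:a", "ex:birthDate", "1900-01-01"),
    ("ex:a", "ex:birthDate", "1901-01-01"),
    ("ex:b", "ex:birthDate", "1950-05-05"),
    ("ex:b", "ex:name", "B"),
]
print(functional_violations(triples, {"ex:birthDate"}))  # {'ex:birthDate': 1}
```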



4. Gold Standard

We use the following gold standard for the metrics mcSchema, mcCol, and mcPop.
Note that "wn:" stands for the namespace http://wordnet-rdf.princeton.edu/wn31/
The gold standard is also contained in the data for mcPop.

For each domain, the main classes are listed with their WordNet IDs and, in brackets, their subclasses, followed by the relations considered for mcCol.

  • Person
    Main classes: Person (wn:100007846-n) (Musician, Athlete, Writer, Politician)
    Relations: Person: Date of birth, Place of birth, Gender, Parents (or Mother and Father)
  • Media
    Main classes: Show (wn:100521313-n) (Film, TV series); Literary Composition (wn:106375736-n) (Book, Newspaper, Magazine); Musical Composition (wn:107051211-n) (Album, Song)
    Relations: Film: Director, Runtime, Release date; TV series: Producer; Book: Author, ISBN, Number of pages; Album: Artist, Tracklist
  • Organisation
    Main classes: Company (wn:108074934-n) (Bank, Airline); Educational Institution (wn:108293263-n) (University, School); Social Groups (wn:107967506-n) (Sports Club, Political Party)
    Relations: Company, University: Location, Founding year; Company: Revenue
  • Geography
    Main classes: Topography (wn:106132185-n) (Lake, River, Mountain); Geographical Area (wn:108693705-n) (Country, Region, City)
    Relations: Country: Capital, Area, GDP; City: Population, Geolocation
  • Biology
    Main classes: Animal (wn:100015568-n) (Mammal, Bird, Fish, Insect); Plant (wn:100017402-n) (Tree, Grass, Shrub)
    Relations: Animal, Plant: Taxonomic rank
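The relations above are used to compute filling degrees for mcCol. As a minimal sketch, assuming the filling degree of a class-relation combination is the fraction of the class's instances that have at least one value for the relation, it could be computed like this. The function name and the "ex:" data are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def filling_degree(triples: Iterable[Triple], instances: Set[str],
                   relation: str) -> float:
    """Fraction of the class instances that have at least one value for relation."""
    if not instances:
        raise ValueError("the class has no instances")
    # Instances with at least one triple for the relation (duplicates count once).
    filled = {s for s, p, _ in triples if p == relation and s in instances}
    return len(filled) / len(instances)


# Hypothetical data: four instances of a Person class, two with a birth date.
persons = {"ex:p1", "ex:p2", "ex:p3", "ex:p4"}
triples = [
    ("ex:p1", "ex:birthDate", "1900-01-01"),
    ("ex:p2", "ex:birthDate", "1900-02-02"),
    ("ex:p2", "ex:birthDate", "1900-02-03"),
    ("ex:p5", "ex:birthDate", "1900-03-03"),  # not an instance of the class; ignored
]
print(filling_degree(triples, persons, "ex:birthDate"))  # 0.5
```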


The gold-standard entities checked for mcPop are listed below per domain / main class and subclass. Each entity is marked (s) for short or (l) for long; EX and CR denote the IUCN Red List categories extinct and critically endangered.

  • Person / Person
    • Businessman: Carlos Slim (s), Bill Gates (s), Karl Albrecht (l), Michael Otto (l)
    • Musician: Elvis Presley (s), Michael Jackson (s), Peter Maffay (l), Xavier Naidoo (l)
    • Athlete: Michael Phelps (s), Larisa Latynina (s), Maria Riesch/Höfl-Riesch (l), Magdalena Neuner (l)
    • Writer: William Shakespeare (s), Agatha Christie (s), Tommy Jaud (l), Dora Heldt (l)
    • Politician: Barack Obama (s), George W. Bush (s), Stefan Mappus (l), Günther Oettinger (l)
  • Media / Show
    • Film: Avatar (s), Titanic (s), Friendship! (l), Konferenz der Tiere/Animals United (l)
    • TV Series: The Walking Dead (s), Full House (s), Tatort (l), Alarm für Cobra 11 (l)
  • Media / Literary Composition
    • Book: The Lord of the Rings (Tolkien) (s), The Hobbit (Tolkien) (s), Hummeldumm (Jaud) (l), nothing-avail Wort zu Papa (Heldt) (l)
    • Magazine: Time (s), Newsweek (s), TV14 (l), TV Spielfilm (l)
  • Media / Musical Composition
    • Album: M. Jackson - Thriller (s), AC/DC - Back in Black (s), Unheilig - Grosse Freiheit (l), Ich+Ich - Gute Reise (l)
  • Organisation / Company
    • Bank: JP Morgan Chase (s), Bank of America (s), Landesbank Baden-Württemberg (l), BayernLB (l)
    • Airline: Delta Air Lines (s), United Airlines (s), Lufthansa (l), Air Berlin (l)
  • Organisation / Educational Institution
    • University: California Institute of Technology (s), Harvard University (s), LMU München / Ludwig Maximilian University of Munich (l), University of Göttingen (l)
  • Organisation / Social Groups
    • Sports club: Real Madrid (s), FC Barcelona (s), Hertha BSC (l), FC Augsburg (l)
    • Political Party: Republican Party (U.S.) (s), Democratic Party (U.S.) (s), SPD / Social Democratic Party of Germany (l), CDU / Christian Democratic Union of Germany (l)
  • Geography / Topography
    • Lake: Caspian Sea (s), Lake Superior (US) (s), Bodensee / Lake Constance (l), Müritz (l)
    • River: Nile (s), Amazon River (s), Rhine (l), Elbe (l)
    • Mountain: Mount Everest (s), K2 (s), Zugspitze (l), Schneefernerkopf (l)
  • Geography / Geographical Area
    • Country: Russia (s), Canada (s), Vatican City (l), Monaco (l)
    • City: Mexico City (s), Beijing (s), Stuttgart (l), Karlsruhe (l)
  • Biology / Animal
    • Mammal: Blue whale (Balaenoptera musculus) (s), African bush elephant (Loxodonta africana) (s), Nullarbor dwarf bettong (Bettongia pusilla) (EX) (l), Oriente cave rat (Boromys offella) (EX) (l)
    • Bird: King penguin (Aptenodytes patagonicus) (s), Blue-and-yellow macaw (Ara ararauna) (s), Bermuda saw-whet owl (Aegolius gradyi) (EX) (l), Mauritius blue pigeon (Alectroenas nitidissimus) (EX) (l)
    • Fish: Great white shark (Carcharodon carcharias) (s), Red-bellied piranha (Pygocentrus nattereri) (s), Ornate sleeper-ray (Electrolux addisoni) (CR) (l), Irrawaddy river shark (Glyphis siamensis) (CR) (l)
  • Biology / Plant
    • Tree: Giant Sequoia (Sequoiadendron giganteum) (s), Apple Tree (Malus domestica) (s), Araucaria nemorosa (CR) (l), Dacrydium guillauminii (CR) (l)
    • Grass: Corn/Maize (Zea mays) (s), Rye (Secale cereale) (s), Cyperus flavoculmis (CR) (l), Cyperus microumbellatus (CR) (l)
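The entity list above feeds mcPop. As a minimal sketch, assuming mcPop is the share of gold-standard entities that could be found in a KG, the score can be computed overall and separately for the short (s) and long (l) entities. The function name, the entity placeholders, and the lookup result are hypothetical and are not data from the article.

```python
from typing import Dict, List, Set, Tuple


def population_completeness(gold: List[Tuple[str, str]],
                            found: Set[str]) -> Dict[str, float]:
    """Share of gold-standard entities found in a KG, overall and per s/l group.

    gold: (entity label, "s" or "l") pairs, mirroring the Short/long marks.
    found: labels of the entities that could be located in the KG.
    """
    groups = {
        "all": [e for e, _ in gold],
        "s": [e for e, kind in gold if kind == "s"],
        "l": [e for e, kind in gold if kind == "l"],
    }
    # Skip empty groups to avoid division by zero.
    return {name: sum(e in found for e in entities) / len(entities)
            for name, entities in groups.items() if entities}


# Hypothetical gold standard and lookup result (not data from the article).
gold = [("entity-1", "s"), ("entity-2", "s"), ("entity-3", "l"), ("entity-4", "l")]
found = {"entity-1", "entity-2", "entity-3"}
print(population_completeness(gold, found))  # {'all': 0.75, 's': 1.0, 'l': 0.5}
```

Splitting the score by s and l makes it visible whether a KG covers only the short entities or also the long ones.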


All provided data regarding the key statistics and quality assessment is also available as a single archive file here.
 






April 2016