Querying Wikidata with SPARQL

An introduction at the Semantic Web Engineering lecture.
October 12, 2018. Zurich, Switzerland.
This notebook is licensed by CC-BY-SA
Cristina Sarasua (University of Zurich)

Introduction

Wikidata's data can be accessed in many different ways. Depending on what exactly you would like to access (e.g., item descriptions, or page edit history) you may:

  • query the knowledge base live using the RDF-based query language SPARQL
  • download a data dump (i.e., containing a snapshot of the data at a certain point in time)
  • query the knowledge base and its edit history live via the Wikidata's instance of the Media Wiki API
  • retrieve concrete item descriptions as Linked Data executing an HTTP GET request

In this hands-on we will focus on querying data items of the knowledge base using the SPARQL query language --- the language that is used to query Linked Data and Semantic Web data in general.

Wikidata Documentation

If you would like to look up further information about Wikidata, I would recommend you read the article at the Communications of the ACM journal, by Denny Vrandečić and Markus Krötzsch: Wikidata: a Free Collaborative, Knowledge Base, and the following information:

Wikidata's data (model and data access)

Querying Wikidata with SPARQL

Querying from Python

I won't showcase these due to time restrictions, but they are also useful tools to work with SPARQL and RDF data in Python:

When you try out queries, you often need to look up the names/IDs of items, properties, or maybe you need to look up the way qualifiers and references are stated. In that case, I highly recommend you to look at examples of item descriptions

E.g. https://www.wikidata.org/wiki/Q72 E.g. https://www.wikidata.org/wiki/Q11943

Tip! If you would like to keep up-to-date and discover new very sophisticated SPARQL queries that you can execute, you should not miss the updatey by Wikidata-Facts in Twitter! =) https://twitter.com/WikidataFacts

                #                    ###                                       ### 
                #       ###### ##### ###  ####      ####   ####  #####  ###### ### 
                #       #        #    #  #         #    # #    # #    # #      ### 
                #       #####    #   #    ####     #      #    # #    # #####   #  
                #       #        #            #    #      #    # #    # #          
                #       #        #       #    #    #    # #    # #    # #      ### 
                ####### ######   #        ####      ####   ####  #####  ###### ### 


Part I - SPARQL and Wikidata Query Service

The query service is a Web service that accesses Wikidata's data and can be used by humans through this GUI and by programs to execute data queries written in SPARQL.

https://query.wikidata.org

Exploring SPARQL features

Let's look at the kind of things we can do in SPARQL when we want to query data.

> Get items (ID and label) of a particular type

Example: query to retrieve the ID and label of train stations in Zurich

select ?item ?itemLabel
where{
  ?item wdt:P31 wd:Q55488 .
  ?item wdt:P131 wd:Q72 .
  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

What if someone described the station using a more specific type of item than train (e.g. dead-end station)? We can use property paths and ask for the items that have been defined as train stations or subclasses of train stations:

select ?item ?itemLabel
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

We can also order the results by e.g. the label.

`
select ?item ?itemLabel
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}
ORDER BY ?itemLabel

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Sometimes, when you ask for some information (e.g., opening date) it can happen that not all stations have that information. So this query will be restrictive and will give you only 16 results (those that have that information):

select ?item ?itemLabel ?openingDate
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  ?item wdt:P1619 ?openingDate .

  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

What you can do is to "relax" the condition and ask for the stations, and get the opening date (in case they have it). You will still get the other information in the cases where the station does not have the opening date.

select ?item ?itemLabel ?openingDate
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  OPTIONAL{?item wdt:P1619 ?openingDate .}

  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Maybe you would like to retrieve the train stations and bus stations. In that case you can use UNION.

`
select ?item ?itemLabel ?openingDate
where{
  {?item wdt:P31/wdt:P279* wd:Q55488 .} UNION {?item wdt:P31/wdt:P279* wd:Q494829.} 

   ?item wdt:P131 wd:Q72 .
  OPTIONAL{?item wdt:P1619 ?openingDate .}

  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

You can also count, limit and many more things!

> Get Wikipedia and Wikimedia Commons URLs

Example: retrieve the id, label of things located in neighbourhoods of Zurich, with their Wikipedia article and Wikimedia commons URL

SELECT ?quartier ?item ?itemLabel ?article ?image ?prop
{?item wdt:P131 ?quartier .
 ?quartier wdt:P31 wd:Q19644586 .
  OPTIONAL {?article schema:about ?item .
    ?article schema:isPartOf <https://en.wikipedia.org/>.
 ?item wdt:P18 ?image}

 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Federated queries

Example: query example from from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federated_queries cited: ''find a value for TOID (P3120) from the UK Ordnance Survey for items that have a GSS code (2011) (P836) but no TOID (P3120):''

SELECT ?item ?itemLabel ?gss ?os_toid WHERE {
      ?item p:P836 ?gss_stmt .
    ?gss_stmt ps:P836 ?gss .
    FILTER NOT EXISTS { ?gss_stmt pq:P582 [] } .
    FILTER NOT EXISTS { ?item wdt:P3120 [] } .

    SERVICE <http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql> {
      ?os_toid <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Exploring features of the Wikidata Query Service

> Queries using / retrieving qualifiers and references --- two valuable features in Wikidata

Example: query to the population of the neighbourhoods of Zurich at a particular point in time, and the source of the information.

PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?itemLabel ?itemDescription ?item ?population ?time ?ref WHERE {
  ?item wdt:P625 ?coord.
  ?item wdt:P31 wd:Q19644586.
  ?item wdt:P131 wd:Q72.
  OPTIONAL {
    ?item p:P1082 ?statement.
    ?statement prov:wasDerivedFrom ?source.
    ?source pr:P854 ?ref.
    ?statement ps:P1082 ?population.
    ?statement pq:P585 ?time.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Display query results in a map

Example: query to show the mountains of Switzerland in a map, by height (meters above the sea level).

#defaultView:Map
SELECT ?item ?itemLabel ?coord ?height ?layer ?image 
WHERE { ?item wdt:P31 wd:Q8502. ?item wdt:P625 ?coord. 
       ?item wdt:P17 wd:Q39.  ?item wdt:P2044 ?height . 
       BIND( IF(?height < 1000, "<1000 metres", 
                IF(?height < 2000, "1000 - 2000 metres", 
                   IF(?height < 3000, "2000 - 3000 metres", 
                      IF(?height < 4000, "3000 - 4000 metres", "> 4000 metres")))) AS ?layer). 
       OPTIONAL {?item wdt:P18 ?image.} 

       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
`

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Retrieve things around / in the surrounding area

Example: query to retrieve max. 100 libraries around the UZH Zentrum building, with a radius of 2km.

#defaultView:Map
SELECT ?thing ?location ?distance ?thingLabel WHERE {

    SERVICE wikibase:around { 
      ?thing wdt:P625 ?location . 
      bd:serviceParam wikibase:center "Point(8.548333 47.374722)"^^geo:wktLiteral .
      bd:serviceParam wikibase:radius "2" . 
      bd:serviceParam wikibase:distance ?distance .
    } 
    ?thing wdt:P31/wdt:P279* wd:Q7075

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?distance LIMIT 100
`

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Example: query to retrieve max. 100 fountains that we know have been added from a specific catalogue, the one maintained by the City of Zurich drinking water supply (https://www.wikidata.org/wiki/Q53629101)

#defaultView:Map
SELECT ?thing ?location ?distance ?thingLabel WHERE {

    SERVICE wikibase:around { 
      ?thing wdt:P625 ?location . 
      bd:serviceParam wikibase:center "Point(8.548333 47.374722)"^^geo:wktLiteral .
      bd:serviceParam wikibase:radius "2" . 
      bd:serviceParam wikibase:distance ?distance .
    } 
    ?thing p:P528 ?statement . ?statement pq:P972 wd:Q53629101 .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?distance LIMIT 100

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Display query results in a bubble-chart

Example: query to plot in a bubble chart the number of floods per country in Europe since 1980.

#defaultView:BubbleChart
SELECT ?country ?countryLabel (count(distinct ?item) as ?count) 
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q8068 .#wd:Q8065 .
  ?item wdt:P17 ?country .
  ?country wdt:P30 wd:Q46 .
  ?item wdt:P580 ?starttime .
  FILTER(?starttime >= "1980-00-00T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
GROUP BY ?country ?countryLabel

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Part II - Importing and using SPARQL query results in your Jupyter-Python Notebook

# In case you don't have Gastrodon installed
#!pip install gastrodon
import matplotlib.pyplot as plt
import sys
from gastrodon import RemoteEndpoint,QName,ttl,URIRef,inline
import pandas as pd
pd.options.display.width=120
pd.options.display.max_colwidth=100
prefixes=inline("""
    @prefix wd: <http://www.wikidata.org/entity/> .
    @prefix wdt: <http://www.wikidata.org/prop/direct/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
""").graph

endpoint=RemoteEndpoint(
    "http://query.wikidata.org/sparql" 
    ,prefixes=prefixes
)
count=endpoint.select("""
    SELECT (COUNT(*) AS ?count) { ?s ?p ?o .}
""").at[0,"count"]
count
5923171575
predicates=endpoint.select("""
    SELECT ?p (COUNT(*) AS ?count) { ?s ?p ?o . ?s wdt:P131 wd:Q72 } GROUP BY ?p ORDER BY DESC(?count)
""")
predicates
count
p
rdfs:label 16379
http://schema.org/description 15764
http://www.w3.org/2004/02/skos/core#altLabel 5955
wdt:P131 4229
http://www.wikidata.org/prop/P131 4229
wdt:P31 3993
http://www.wikidata.org/prop/P31 3993
http://wikiba.se/ontology#identifiers 3801
http://schema.org/version 3801
http://schema.org/dateModified 3801
http://wikiba.se/ontology#sitelinks 3801
http://wikiba.se/ontology#statements 3800
wdt:P17 3790
http://www.wikidata.org/prop/P17 3790
wdt:P625 3407
http://www.wikidata.org/prop/P625 3407
wdt:P1705 2510
http://www.wikidata.org/prop/P1705 2510
wdt:P1945 2508
http://www.wikidata.org/prop/P1945 2508
http://www.wikidata.org/prop/P1082 1629
wdt:P528 586
http://www.wikidata.org/prop/P528 586
wdt:P186 541
http://www.wikidata.org/prop/P186 541
http://www.wikidata.org/prop/P18 528
wdt:P18 515
http://www.wikidata.org/prop/P571 489
wdt:P571 487
wdt:P170 484
... ...
http://www.wikidata.org/prop/P1621 1
wdt:P1552 1
http://www.wikidata.org/prop/P1552 1
wdt:P1454 1
http://www.wikidata.org/prop/P1454 1
wdt:P452 1
http://www.wikidata.org/prop/P452 1
wdt:P3642 1
http://www.wikidata.org/prop/P3642 1
wdt:P950 1
wdt:P1005 1
wdt:P4146 1
wdt:P4702 1
wdt:P5490 1
http://www.wikidata.org/prop/direct-normalized/P950 1
http://www.wikidata.org/prop/P950 1
http://www.wikidata.org/prop/P1005 1
http://www.wikidata.org/prop/P4146 1
http://www.wikidata.org/prop/P4702 1
http://www.wikidata.org/prop/P5490 1
wdt:P2924 1
http://www.wikidata.org/prop/P2924 1
wdt:P51 1
http://www.wikidata.org/prop/P51 1
wdt:P1711 1
http://www.wikidata.org/prop/P1711 1
wdt:P1684 1
http://www.wikidata.org/prop/P1684 1
wdt:P730 1
http://www.wikidata.org/prop/P730 1

448 rows × 1 columns

mostused_predicates = predicates.head(100)
mostused_predicates.plot(kind='bar',fontsize=18,figsize=(100,100))
<matplotlib.axes._subplots.AxesSubplot at 0x7f2984bf9c50>