Tasks using pywikibot and Wikidata

Let's begin writing simple scripts that pull data from existing structured or semi-structured sources and put it into Wikidata using the pywikibot API.

Some possible tasks are:

  • Software versions from GitHub - Get the list of releases of a software project from GitHub and add it to the corresponding Wikidata item.
  • Software versions from FTP - Many software projects run FTP servers where they store their released versions. These can be queried and the versions added to Wikidata.
  • Population data - Get census data from a public source and add it to Wikidata.

1. Software versions from GitHub

GitHub releases are publicly visible on a project's Releases page. Even projects that are not developed on GitHub often keep a mirror repository there, which syncs the commits, branches, and tags. Hence, there is a lot of data on GitHub which can be used in Wikidata!

Related info to this task:

Examples with lots of versions: systemd (Q286124), Debian (Q7593), Linux kernel (Q14579)

Possible things to work on:

import json
from datetime import datetime
from pprint import pprint
from urllib.request import urlopen

import pywikibot  # not used yet; kept for the step that writes to WIKIDATA_ITEM

WIKIDATA_ITEM = 'Q48464'
GITHUB_PAGE = 'https://github.com/joshumax/hurd'
GITHUB_DATE_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def get_api_url(link, endpoint):
    """Turn a GitHub project URL into the URL of one of its REST API endpoints."""
    link = link.replace('github.com', 'api.github.com/repos', 1)
    return link.rstrip('/') + '/' + endpoint


def get_releases(link):
    """Return the API URL that lists the project's releases."""
    return get_api_url(link, 'releases')


def get_tags(link):
    """Return the API URL that lists the project's tags."""
    return get_api_url(link, 'tags')


def get_json(url):
    """Fetch a URL and decode its body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))


def get_tag_versions(tags):
    """Yield the name and date of each tag, using the tagged commit's author date."""
    for tag in tags:
        print("Working on tag:", tag['name'])
        # The tag list carries no dates, so fetch the commit each tag points to.
        commit_info = get_json(tag['commit']['url'])
        date = datetime.strptime(commit_info['commit']['author']['date'], GITHUB_DATE_FORMAT)
        yield {"name": tag['name'], "date": date}


def get_release_versions(releases):
    """Yield the name and publication date of each release."""
    for release in releases:
        # A release title is optional, so fall back to the tag it was made from.
        name = release['name'] or release['tag_name']
        print("Working on release:", name)
        date = datetime.strptime(release['published_at'], GITHUB_DATE_FORMAT)
        yield {"name": name, "date": date}


print(get_releases(GITHUB_PAGE))
tags = get_json(get_tags(GITHUB_PAGE))
releases = get_json(get_releases(GITHUB_PAGE))
pprint(list(get_tag_versions(tags)))
pprint(list(get_release_versions(releases)))
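
Once the versions are collected, they still have to reach Wikidata. The following is a minimal sketch of that step, not a tested bot. It assumes that versions are stored with the property P348 (software version identifier) and dated with the qualifier P577 (publication date), and that pywikibot is installed and configured with a user-config.py. The helper names missing_versions and add_versions are made up for this example.

```python
def missing_versions(existing_names, versions):
    """Return the version dicts whose name is not yet in existing_names."""
    seen = set(existing_names)
    result = []
    for version in versions:
        if version["name"] not in seen:
            seen.add(version["name"])  # also drops duplicates within the input
            result.append(version)
    return result


def add_versions(item_id, versions):
    """Add each new version as a P348 claim with a P577 date qualifier."""
    # Imported here so that missing_versions() works without a pywikibot setup.
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    item = pywikibot.ItemPage(repo, item_id)
    item.get()  # load the item's current claims
    existing = [claim.getTarget() for claim in item.claims.get("P348", [])]

    for version in missing_versions(existing, versions):
        claim = pywikibot.Claim(repo, "P348")  # assumed property: software version identifier
        claim.setTarget(version["name"])
        item.addClaim(claim, summary="Add software version " + version["name"])

        date = version["date"]
        qualifier = pywikibot.Claim(repo, "P577")  # assumed qualifier: publication date
        qualifier.setTarget(pywikibot.WbTime(year=date.year, month=date.month, day=date.day))
        claim.addQualifier(qualifier, summary="Add publication date")
```

With the names defined earlier, a call could look like add_versions(WIKIDATA_ITEM, get_release_versions(releases)). Checking the existing claims first keeps a second run of the script from adding the same version twice.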

2. Software versions from FTP

Some organizations, such as GNOME, host all released versions of their applications on an FTP server. This gives us a single place from which the versions of many different programs can be scraped.
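
A rough sketch of how such a scrape could work, using only the standard library. It assumes an anonymous FTP server and tarballs named like project-1.2.3.tar.xz. Both the naming scheme and the helper names extract_versions and list_ftp_directory are assumptions for illustration, and real servers may lay out their files differently.

```python
import re
from ftplib import FTP


def extract_versions(filenames, project):
    """Pull version numbers out of names like '<project>-<version>.tar.xz'."""
    pattern = re.compile(
        r'^' + re.escape(project) + r'-(\d+(?:\.\d+)*)\.tar\.(?:gz|bz2|xz)$'
    )
    versions = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            versions.add(match.group(1))
    # Sort numerically, so that 3.36.10 comes after 3.36.2.
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split('.')))


def list_ftp_directory(host, directory):
    """Return the file names in one directory of an anonymous FTP server."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        return ftp.nlst()
```

The two pieces combine as extract_versions(list_ftp_directory(host, directory), project), where host and directory have to be filled in for the server in question. Keeping the parsing separate from the network call makes it easy to try the pattern on a hand-written list of file names first.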

Related info to the task:

 

3. Software versions from Changelog

Use the CHANGELOG shipped with the software to find its versions and update Wikidata with them.
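
Changelog formats differ from project to project, so any parser has to be adapted to the file at hand. As an illustration only, the sketch below assumes headings of the form "## [1.2.3] - 2020-01-15" (version, then an ISO date). The pattern and the function name parse_changelog are assumptions for this example, not a format that any particular project is known to use.

```python
import re
from datetime import date

# Assumed heading shape: optional '#' marks, optional brackets, version, ' - ', ISO date.
HEADING = re.compile(
    r'^#*\s*\[?v?(\d+(?:\.\d+)+)\]?\s*-\s*(\d{4})-(\d{2})-(\d{2})\s*$'
)


def parse_changelog(text):
    """Return a list of {'name', 'date'} dicts, one per version heading found."""
    versions = []
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            name, year, month, day = match.groups()
            versions.append({"name": name, "date": date(int(year), int(month), int(day))})
    return versions
```

The result has the same name/date shape as the version dictionaries produced from the GitHub data, so later steps can treat both sources alike.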

Related info to the task:

Possible things to work on:

- Scribus - source -> target

4. Population data from Census

Population data is publicly available for most countries. This task aims to add that data to the respective city/district/state/country items in Wikidata, along with the source and the time at which the data was taken.
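
A possible shape for this task, sketched under several assumptions: the census data has already been exported to CSV with the columns item, population and date (a made-up layout), the figure is stored with P1082 (population), dated with the qualifier P585 (point in time), and sourced with P854 (reference URL). The function names are hypothetical, and pywikibot must be installed and configured before add_population can run.

```python
import csv
import io
from datetime import date


def parse_census_csv(text):
    """Parse CSV text with the columns item, population and date (YYYY-MM-DD)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "item": row["item"].strip(),
            "population": int(row["population"]),
            "date": date.fromisoformat(row["date"].strip()),
        })
    return rows


def add_population(row, source_url):
    """Add one population statement with its date and source to Wikidata."""
    # Imported here so that parse_census_csv() works without a pywikibot setup.
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    item = pywikibot.ItemPage(repo, row["item"])

    claim = pywikibot.Claim(repo, "P1082")  # assumed property: population
    claim.setTarget(pywikibot.WbQuantity(amount=row["population"], site=repo))
    item.addClaim(claim, summary="Add population from census data")

    when = row["date"]
    point_in_time = pywikibot.Claim(repo, "P585")  # assumed qualifier: point in time
    point_in_time.setTarget(pywikibot.WbTime(year=when.year, month=when.month, day=when.day))
    claim.addQualifier(point_in_time, summary="Add census date")

    reference = pywikibot.Claim(repo, "P854")  # assumed reference: reference URL
    reference.setTarget(source_url)
    claim.addSources([reference], summary="Add census source")
```

Each row from parse_census_csv can then be passed to add_population together with the URL of the census publication it came from.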

Related info to this task: