Fair-use image usage on biographies of living persons (BLP) at id.wikipedia

import pywikibot
import time
site = pywikibot.Site('id', 'wikipedia')

Load the list of fair-use images

Notes:

  • List generated from PetScan: link
  • "fairuse.txt" contains around 30k lines, each holding the name of one file
fileList = []
with open("fairuse.txt", encoding="utf-8") as f:
    for line in f:
        # Drop the trailing newline and skip blank lines
        name = line.strip()
        if name:
            fileList.append(name)

Find the articles that use these files

pageList = []
# Warning! Long running script!
for fileName in fileList:
    file = pywikibot.FilePage(site, fileName)
    pageList += [(page, fileName) for page in site.imageusage(file, namespaces=0)]
print(len(pageList))
44960
# (Optional) Save this list so it can be revisited later
with open("fairuse-usage2.txt", "w", encoding="utf-8") as f:
    # strip() keeps a trailing newline in fileName from splitting a record across lines
    f.write("\n".join([page.title() + ',' + fileName.strip() for page, fileName in pageList]))
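Article titles and file names may themselves contain commas, which makes a plain comma-joined file ambiguous to read back. A minimal sketch of one alternative, using the standard `csv` module so such fields are quoted automatically. The helper names `save_pairs` and `load_pairs` are hypothetical and not part of the original notebook.

```python
import csv


def save_pairs(path, pairs):
    """Write (article title, file name) pairs as CSV rows."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(pairs)


def load_pairs(path):
    """Read the rows back as a list of (article title, file name) tuples."""
    with open(path, encoding="utf-8", newline="") as f:
        # Skip empty rows, e.g. from a trailing blank line
        return [tuple(row) for row in csv.reader(f) if row]
```

In this notebook the pairs could be built as `[(page.title(), fileName.strip()) for page, fileName in pageList]` before being passed to `save_pairs`.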

Find the categories these articles are in and keep the articles in a "BLP" category (category titles starting with "Orang hidup")

blpList = set()
# Another warning: very slow, because every entry in pageList needs at least one API request
for page, fileName in pageList:
    categories = site.pagecategories(page)
    for category in categories:
        # with_ns replaces the deprecated withNamespace argument in current Pywikibot
        if category.title(with_ns=False).startswith("Orang hidup"):
            blpList.add((fileName, page, category))
print(len(blpList))
2847
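Because one article can use several fair-use files, the same page may appear many times among the 44960 (page, file) pairs, and each appearance triggers its own category lookup. A minimal sketch of one way to reduce the number of requests: group the file names by article title first, then look up each article once. The helper name `group_files_by_page` is hypothetical, and the sketch assumes titles were extracted beforehand with `page.title()`.

```python
from collections import defaultdict


def group_files_by_page(pairs):
    """Map each article title to the list of file names it uses.

    `pairs` is an iterable of (article title, file name) tuples.
    """
    files_by_page = defaultdict(list)
    for title, file_name in pairs:
        files_by_page[title].append(file_name)
    # Return a plain dict so missing keys raise KeyError as usual
    return dict(files_by_page)
```

Each key of the result can then be turned back into a page with `pywikibot.Page(site, title)` and passed to `site.pagecategories` once, with the match recorded for every file name in that key's list.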
# Print out one example
for fileName, page, category in blpList:
    print(fileName.strip(), page.title(), category.title())
    break
Berkas:Tjetjep-Muchtar-Solehbptcianjur.jpg Tjetjep Muchtar Soleh Kategori:Orang hidup berusia 64
# Save the matches, one per line: file name, article title, category
with open("fairuse-blp3.txt", "w", encoding="utf-8") as f:
    f.write("\n".join([fileName.strip() + ',' + page.title() + ',' + category.title() for fileName, page, category in blpList]))