WikiProject India vs Africa Quality Dynamics

After comparing WikiProject India articles to all English Wikipedia articles, I wanted to compare WikiProject India articles to those of another regional WikiProject. In this notebook, I looked at quality trends for WikiProject Africa and compared them with the results for India. I also started to look at the quality trends of WikiProject Nepal to provide a comparison with another South Asian country.

1. Load data

I queried the publicly available quality dataset (via Quarry) to obtain quality data for all articles tagged with the "WikiProject Africa" template, and then loaded the necessary libraries along with the India, Africa, and Nepal query results.

%matplotlib inline
import csv
import datetime
import time

import matplotlib.pyplot as plt
import requests

# decode_unicode is a boolean flag; the text encoding comes from
# response.encoding, so set it explicitly to get UTF-8-decoded str lines,
# which csv.DictReader requires.
indiawikiresponse = requests.get('https://quarry.wmflabs.org/run/195707/output/0/csv?download=true', stream=True)
indiawikiresponse.encoding = 'utf-8'
indiawikirows = csv.DictReader(indiawikiresponse.iter_lines(decode_unicode=True))

africawikiresponse = requests.get('https://quarry.wmflabs.org/run/196601/output/0/csv?download=true', stream=True)
africawikiresponse.encoding = 'utf-8'
africawikirows = csv.DictReader(africawikiresponse.iter_lines(decode_unicode=True))

nepalwikiresponse = requests.get('https://quarry.wmflabs.org/run/197918/output/0/csv?download=true', stream=True)
nepalwikiresponse.encoding = 'utf-8'
nepalwikirows = csv.DictReader(nepalwikiresponse.iter_lines(decode_unicode=True))
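Depending on the response headers, `iter_lines()` can yield `bytes` instead of `str`, and `csv.DictReader` in Python 3 only accepts text. The sketch below shows an offline way to handle both cases, assuming the Quarry export is UTF-8 CSV with a header row. The helper name `rows_from_lines` and the sample lines are hypothetical (made-up values, not real Quarry output).

```python
import csv

def rows_from_lines(lines, encoding="utf-8"):
    """Return a csv.DictReader over lines that may be str or bytes.

    Hypothetical helper: accepts any iterable of lines, such as the
    output of response.iter_lines().
    """
    # Decode bytes lines lazily; str lines pass through unchanged
    text_lines = (
        line.decode(encoding) if isinstance(line, bytes) else line
        for line in lines
    )
    return csv.DictReader(text_lines)

# Illustrative input only (made-up numbers)
sample = [b"timestamp,stub_n,start_n", b"20050101000000,10,5"]
rows = list(rows_from_lines(sample))
print(rows[0]["stub_n"])  # prints 10
```

Applied to the notebook, `rows_from_lines(africawikiresponse.iter_lines())` would work whether or not the response encoding was set.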

2. Generate aggregated quality measures

I then calculated two aggregated quality measures: 1) the mean weighted sum and 2) the proportion of articles in each predicted class. The mean weighted sum is the weighted sum incremented by 1, divided by max(n), the total number of articles in the aggregate at the last month. I used max(n) as the denominator for every quality measure, including the class proportions.
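Both measures reduce to a few lines of arithmetic per monthly row. A minimal sketch with made-up counts; the function name `aggregate_row` is hypothetical, and the column names follow the ones the notebook reads (`weighted_sum`, `stub_n`, ..., `fa_n`). The `ws_column` parameter exists because the India query uses a differently spelled weighted-sum column.

```python
CLASS_COLUMNS = ["stub_n", "start_n", "c_n", "b_n", "ga_n", "fa_n"]

def aggregate_row(row, total, ws_column="weighted_sum"):
    """Return the mean weighted sum and per-class proportions for one month.

    `total` is max(n), the article count at the last month, and is the
    denominator for every measure.
    """
    measures = {"mean_ws": (float(row[ws_column]) + 1) / total}
    for col in CLASS_COLUMNS:
        measures[col] = float(row[col]) / total
    return measures

# Made-up month: 200 articles so far, out of a final total max(n) = 400
row = {"weighted_sum": "299", "stub_n": "120", "start_n": "50",
       "c_n": "20", "b_n": "8", "ga_n": "2", "fa_n": "0"}
m = aggregate_row(row, total=400)
print(m["mean_ws"])  # prints 0.75  ((299 + 1) / 400)
print(m["stub_n"])   # prints 0.3   (120 / 400)
```

Because the denominator is the final max(n) rather than the current month's count, the class proportions in early months sum to less than 1 (here 0.5), so the curves also reflect growth in article counts.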

# Monthly series for WikiProject India, each normalized by max(n),
# the article count in the final month
indiawiki_ts = []     # Unix timestamps
indiawiki_ws = []     # mean weighted sum
indiawiki_stub = []   # proportion of articles in each predicted class
indiawiki_start = []
indiawiki_c = []
indiawiki_b = []
indiawiki_ga = []
indiawiki_fa = []

indiawiki_total = 134043  # max(n) for WikiProject India

for row in indiawikirows:
    indiawiki_ts.append(time.mktime(datetime.datetime.strptime(row['timestamp'], "%Y%m%d%H%M%S").timetuple()))
    # The India query names this column 'weighed_sum' (the Africa and
    # Nepal queries read 'weighted_sum')
    indiawiki_ws.append((float(row['weighed_sum']) + 1) / indiawiki_total)
    indiawiki_stub.append(float(row['stub_n']) / indiawiki_total)
    indiawiki_start.append(float(row['start_n']) / indiawiki_total)
    indiawiki_c.append(float(row['c_n']) / indiawiki_total)
    indiawiki_b.append(float(row['b_n']) / indiawiki_total)
    indiawiki_ga.append(float(row['ga_n']) / indiawiki_total)
    indiawiki_fa.append(float(row['fa_n']) / indiawiki_total)

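The loop above converts timestamps with `time.mktime`, which interprets the parsed time as local time, so the resulting seconds shift with the machine's time zone. Assuming the Quarry timestamps are UTC (MediaWiki stores its timestamps in UTC), `calendar.timegm` gives a zone-independent conversion. A sketch; `to_unix` is a hypothetical helper name.

```python
import calendar
import datetime

def to_unix(mw_timestamp):
    """Convert a YYYYMMDDHHMMSS timestamp, read as UTC, to Unix seconds."""
    parsed = datetime.datetime.strptime(mw_timestamp, "%Y%m%d%H%M%S")
    # timegm treats the struct_time as UTC, unlike time.mktime (local time)
    return calendar.timegm(parsed.timetuple())

print(to_unix("20010115000000"))  # prints 979516800
```

At the scale of years the difference is at most a few hours, but the UTC version gives the same numbers on every machine.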
# Monthly series for WikiProject Africa, normalized by its max(n)
africawiki_ts = []
africawiki_ws = []
africawiki_stub = []
africawiki_start = []
africawiki_c = []
africawiki_b = []
africawiki_ga = []
africawiki_fa = []

africawiki_total = 69622  # max(n) for WikiProject Africa

for row in africawikirows:
    africawiki_ts.append(time.mktime(datetime.datetime.strptime(row['timestamp'], "%Y%m%d%H%M%S").timetuple()))
    africawiki_ws.append((float(row['weighted_sum']) + 1) / africawiki_total)
    africawiki_stub.append(float(row['stub_n']) / africawiki_total)
    africawiki_start.append(float(row['start_n']) / africawiki_total)
    africawiki_c.append(float(row['c_n']) / africawiki_total)
    africawiki_b.append(float(row['b_n']) / africawiki_total)
    africawiki_ga.append(float(row['ga_n']) / africawiki_total)
    africawiki_fa.append(float(row['fa_n']) / africawiki_total)

# Monthly series for WikiProject Nepal, normalized by its max(n)
nepalwiki_ts = []
nepalwiki_ws = []
nepalwiki_stub = []
nepalwiki_start = []
nepalwiki_c = []
nepalwiki_b = []
nepalwiki_ga = []
nepalwiki_fa = []

nepalwiki_total = 7507  # max(n) for WikiProject Nepal

for row in nepalwikirows:
    nepalwiki_ts.append(time.mktime(datetime.datetime.strptime(row['timestamp'], "%Y%m%d%H%M%S").timetuple()))
    nepalwiki_ws.append((float(row['weighted_sum']) + 1) / nepalwiki_total)
    nepalwiki_stub.append(float(row['stub_n']) / nepalwiki_total)
    nepalwiki_start.append(float(row['start_n']) / nepalwiki_total)
    nepalwiki_c.append(float(row['c_n']) / nepalwiki_total)
    nepalwiki_b.append(float(row['b_n']) / nepalwiki_total)
    nepalwiki_ga.append(float(row['ga_n']) / nepalwiki_total)
    nepalwiki_fa.append(float(row['fa_n']) / nepalwiki_total)

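The three loops differ only in the input rows, the denominator, and (for India) the weighted-sum column name. The sketch below folds them into one helper that returns all eight series for any project. `load_series` is a hypothetical name; it uses `calendar.timegm` (UTC) instead of the notebook's `time.mktime`, and deriving max(n) from the last row's six class counts is an assumption that only holds if every article has a predicted class. Pass the known total to override it.

```python
import calendar
import datetime

CLASSES = ["stub", "start", "c", "b", "ga", "fa"]

def load_series(rows, total=None, ws_column="weighted_sum"):
    """Return a dict of lists: 'ts', 'ws', and one list per quality class.

    If `total` is None, max(n) is taken as the sum of the six class counts
    in the final row (an assumption; pass the known total to override).
    """
    rows = list(rows)  # csv.DictReader is a one-shot iterator
    if total is None:
        total = sum(float(rows[-1][c + "_n"]) for c in CLASSES)
    series = {"ts": [], "ws": []}
    series.update({c: [] for c in CLASSES})
    for row in rows:
        parsed = datetime.datetime.strptime(row["timestamp"], "%Y%m%d%H%M%S")
        series["ts"].append(calendar.timegm(parsed.timetuple()))
        series["ws"].append((float(row[ws_column]) + 1) / total)
        for c in CLASSES:
            series[c].append(float(row[c + "_n"]) / total)
    return series

# Made-up two-month example
rows = [
    {"timestamp": "20050101000000", "weighted_sum": "3", "stub_n": "2",
     "start_n": "1", "c_n": "0", "b_n": "0", "ga_n": "0", "fa_n": "0"},
    {"timestamp": "20050201000000", "weighted_sum": "7", "stub_n": "2",
     "start_n": "1", "c_n": "1", "b_n": "0", "ga_n": "0", "fa_n": "0"},
]
s = load_series(rows)
print(s["ws"])  # prints [1.0, 2.0]
```

With the notebook's data this would read, for example, `india = load_series(indiawikirows, total=134043, ws_column='weighed_sum')`, after which `india['ws']` plays the role of `indiawiki_ws`.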
# Mean weighted sum over time for all three projects
plt.plot(indiawiki_ts, indiawiki_ws, '-', label="WikiProject India")
plt.plot(africawiki_ts, africawiki_ws, '--', label="WikiProject Africa")
plt.plot(nepalwiki_ts, nepalwiki_ws, '-.', label="WikiProject Nepal")
plt.xlabel('Time (Unix seconds)')
plt.ylabel('Mean Weighted Sum')
plt.legend(loc='upper left')

Overall, all three WikiProjects show growth in quality. WikiProject Africa articles are improving at a slower rate than WikiProject India articles: although WikiProject Africa articles started with a higher mean weighted sum, India articles surpassed them around 2006. WikiProject Africa articles also appear to dip slightly around 2012. WikiProject Nepal articles show a sudden increase around 2008 but stay well below the quality of India and Africa articles.
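The discussion refers to years, but the x-axis shows raw Unix seconds. One way to get readable dates is to convert the timestamp lists to `datetime` objects before plotting, since matplotlib places `datetime` values on a date axis. A sketch; `to_datetimes` is a hypothetical helper name.

```python
import datetime

def to_datetimes(unix_seconds):
    """Convert Unix timestamps (seconds) to timezone-aware UTC datetimes."""
    return [datetime.datetime.fromtimestamp(t, tz=datetime.timezone.utc)
            for t in unix_seconds]

dates = to_datetimes([1104537600])
print(dates[0].year)  # prints 2005
```

For example, `plt.plot(to_datetimes(indiawiki_ts), indiawiki_ws, '-', label="WikiProject India")`. Because the notebook's timestamps were produced with `time.mktime` (local time), reading them back as UTC can shift each point by the local UTC offset, which is negligible at the scale of years.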

# One panel per predicted quality class: India solid, Africa dashed, Nepal dash-dot
panels = [
    ("Stub", indiawiki_stub, africawiki_stub, nepalwiki_stub),
    ("Start", indiawiki_start, africawiki_start, nepalwiki_start),
    ("C", indiawiki_c, africawiki_c, nepalwiki_c),
    ("B", indiawiki_b, africawiki_b, nepalwiki_b),
    ("GA", indiawiki_ga, africawiki_ga, nepalwiki_ga),
    ("FA", indiawiki_fa, africawiki_fa, nepalwiki_fa),
]

fig, axes = plt.subplots(nrows=2, ncols=3)
# axes.flat walks the grid row by row, matching the panel order above
for ax, (title, india, africa, nepal) in zip(axes.flat, panels):
    ax.plot(indiawiki_ts, india, '-', label="WikiProject India")
    ax.plot(africawiki_ts, africa, '--', label="WikiProject Africa")
    ax.plot(nepalwiki_ts, nepal, '-.', label="WikiProject Nepal")
    ax.set_title(title)

plt.tight_layout()
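Because every class proportion shares the denominator max(n), the six proportions in the final month should sum to roughly 1 if max(n) matches the final article count and every article has a predicted class. A quick sanity check, as a sketch; `final_share` is a hypothetical name and the sample values are made up.

```python
def final_share(*class_series):
    """Sum the last value of each per-class proportion series."""
    return sum(series[-1] for series in class_series)

# Made-up proportions for two months and six classes
stub, start, c = [0.3, 0.5], [0.1, 0.3], [0.0, 0.1]
b, ga, fa = [0.0, 0.05], [0.0, 0.05], [0.0, 0.0]
print(final_share(stub, start, c, b, ga, fa))
```

With the notebook's data, `final_share(indiawiki_stub, indiawiki_start, indiawiki_c, indiawiki_b, indiawiki_ga, indiawiki_fa)` well below or above 1 would suggest the hard-coded total differs from the final count in the query output.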

# Stub proportions for all three projects on their own figure
plt.figure()
plt.plot(indiawiki_ts, indiawiki_stub, '-', label="WikiProject India")
plt.plot(africawiki_ts, africawiki_stub, '--', label="WikiProject Africa")
plt.plot(nepalwiki_ts, nepalwiki_stub, '-.', label="WikiProject Nepal")
plt.title("Stub")
plt.legend(loc='upper left')
plt.xlabel('Time (Unix seconds)')
plt.ylabel('Proportion of all "possible" articles')

The proportion of articles in the "Stub" class shows a distinctive trend for WikiProject Nepal. The Nepal stub proportion jumps suddenly and substantially around 2008, rising above the stub proportions of both WikiProject Africa and WikiProject India.
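To pin down when a "sudden jump" like the Nepal stub increase happens, one option is to find the largest rise between consecutive months. A sketch; `largest_jump` is a hypothetical name and the sample series is made up.

```python
def largest_jump(ts, values):
    """Return (timestamp, rise) for the largest increase between consecutive points.

    Returns (None, float('-inf')) if fewer than two points are given.
    """
    best_ts, best_rise = None, float("-inf")
    # Pair each value with the next timestamp and value, so each rise is
    # attributed to the later month
    for prev, t, cur in zip(values, ts[1:], values[1:]):
        if cur - prev > best_rise:
            best_ts, best_rise = t, cur - prev
    return best_ts, best_rise

# Made-up monthly stub proportions with one obvious jump
ts = [1, 2, 3, 4]
stub = [0.10, 0.10, 0.40, 0.45]
print(largest_jump(ts, stub))
```

Applied as `largest_jump(nepalwiki_ts, nepalwiki_stub)`, the returned timestamp can be turned into a date with `datetime.datetime.fromtimestamp(t)`, which, like `time.mktime`, uses local time.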

5. Plot Quality Gap

from operator import sub

# Quality gap = Africa minus India mean weighted sum. map() pairs the two
# series by position, which assumes both queries cover the same months in
# the same order.
wsdiff = list(map(sub, africawiki_ws, indiawiki_ws))
plt.plot(africawiki_ts, wsdiff, '-')
plt.axhline(0, color="black")
plt.xlabel('Time (Unix seconds)')
plt.ylabel('Quality gap (Africa minus India)')

The plot above shows that the mean weighted sum of WikiProject Africa articles exceeded that of WikiProject India articles until around 2006, when Africa's relative quality dropped sharply and the gap turned negative.
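The crossover is read off the plot as "around 2006". A small helper can locate the exact month at which the gap series first changes sign. A sketch; `first_sign_change` is a hypothetical name and the sample values are made up.

```python
def first_sign_change(ts, gap):
    """Return the first timestamp where a strictly positive or negative gap
    reaches zero or crosses to the other side; None if it never does."""
    # Compare each gap value with the next one; report the later timestamp
    for t, prev, cur in zip(ts[1:], gap, gap[1:]):
        if prev > 0 >= cur or prev < 0 <= cur:
            return t
    return None

# Made-up gap series: Africa ahead, then India ahead
ts = [10, 20, 30, 40]
gap = [0.2, 0.1, -0.05, -0.1]
print(first_sign_change(ts, gap))  # prints 30
```

With the notebook's data, pass the same timestamp list used for the x-axis of the gap plot together with `wsdiff`.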