This notebook describes the first part of the "understanding thanks" research project, whose overall goal was to explore both how the thanks feature is used and how it affects editor activity. In this notebook, we characterize usage of the thanks feature.
The chart below shows how the usage of the thanks feature has changed in the past few years.
Typically, Wikimedia defines an active editor in a month as one who has made 5+ edits. Because all people have the potential to receive thanks (even those who have made very few edits), we define an active editor as anybody who has made 1+ edits in that month.
from IPython.display import Image
Image("figures/thanks-usage-rates.png") #code to generate data in thanker-network.ipynb
Summary: As the table shows, even in languages where the number of active editors has decreased, the rate of thanks usage has increased. In other words, a greater percentage of people use the thanks feature now compared to two years ago.
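As a rough illustration of the rate discussed above, the sketch below computes, per month, the fraction of active editors (1+ edits) who sent at least one thank. The function name, the input shapes, and the choice to count only thankers who are also active editors are assumptions made for this example; the actual computation lives in thanker-network.ipynb and may differ.

```python
from collections import defaultdict


def thanks_usage_rate(edits, thanks, min_edits=1):
    """Return {month: fraction of active editors who sent >= 1 thank}.

    edits  : iterable of (month, editor) pairs, one per edit (hypothetical shape)
    thanks : iterable of (month, sender) pairs, one per thank (hypothetical shape)
    """
    # Count edits per editor per month.
    edit_counts = defaultdict(lambda: defaultdict(int))
    for month, editor in edits:
        edit_counts[month][editor] += 1

    # Collect the set of thank senders per month.
    thankers = defaultdict(set)
    for month, sender in thanks:
        thankers[month].add(sender)

    rates = {}
    for month, counts in edit_counts.items():
        active = {e for e, n in counts.items() if n >= min_edits}
        if active:
            # Assumption: only thankers who are also active editors are counted.
            rates[month] = len(active & thankers[month]) / len(active)
    return rates
```

Passing `min_edits=5` would switch to the usual Wikimedia definition of an active editor.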
This graph uses the same definitions as above but covers all data since the thanks feature was rolled out. The purpose is to show the percentage of editors in different communities who have been affected by the thanks feature. (Note: thanks givers and thanks receivers are not mutually exclusive.)
Image("figures/thank-users-population.png") #should be thanks-users-population.png #code to generate data in thanker-network.ipynb
Summary: The data shows that the majority of active editors have never been touched by the thanks feature. However, the percentage of editors who have is sufficient for the feature to have had a measurable impact.
The table below shows the percentage of editors responsible for 80%/20% of thanks given.
Image("figures/icdf-thanker-population.png") #code to generate data in percent-editors-by-percent-thanks.ipynb
Summary: The data shows that there exists a small group of editors who thank disproportionately. We would expect there to be four times as many editors responsible for four times as many thanks, but the numbers are closer to two times as many editors for four times as many thanks. The order of the projects by multiplication factor does not seem to correspond to their original ranks, but it is possible that some trend would become apparent with more data.
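To make the 80%/20% comparison concrete, here is a minimal sketch of how the fraction of editors responsible for a given share of thanks could be computed. It assumes the input is a list of thanks-given counts, one per editor in whatever population is being considered; the function name is hypothetical and this is not necessarily how percent-editors-by-percent-thanks.ipynb does it.

```python
def fraction_of_editors_for_share(thanks_per_editor, share):
    """Smallest fraction of editors (most prolific first) whose thanks
    add up to at least `share` of all thanks given."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    counts = sorted(thanks_per_editor, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no thanks in the input")

    running = 0
    for n_editors, count in enumerate(counts, start=1):
        running += count
        if running / total >= share:
            return n_editors / len(counts)


# Toy data (made up): one editor sends half of all thanks.
toy = [50, 30, 10, 5, 5]
top_20 = fraction_of_editors_for_share(toy, 0.2)  # editors behind 20% of thanks
top_80 = fraction_of_editors_for_share(toy, 0.8)  # editors behind 80% of thanks
```

In this toy data, four times as many thanks (80% versus 20%) require only twice as many editors, which is the kind of disproportion the summary describes.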
This study is built on a previous paper from 2015. The goal is to look at all thanks in some timeframe (May 2018 in this case), take the average of the senders for some trait, and compare it to the average of the receivers for that same trait. We do this for two traits: total edit count and tenure (number of days since registration). The data for the top 20% of editors (those with the highest edit counts) is examined separately from the data for the bottom 20% of editors.
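The comparison described above can be sketched as follows. The sketch assumes each thank is a (sender, receiver) pair and that the trait (edit count or tenure) is available in a lookup table; it averages over thank events, so a user who sends many thanks is counted once per thank. Both the names and that weighting choice are assumptions, and senders-vs-receivers-stats.ipynb may do this differently.

```python
from statistics import mean


def sender_receiver_means(thanks, trait):
    """Return (mean trait of senders, mean trait of receivers).

    thanks : list of (sender, receiver) pairs, one per thank
    trait  : dict mapping user -> numeric trait (e.g. edit count or tenure in days)
    """
    sender_values = [trait[sender] for sender, _ in thanks]
    receiver_values = [trait[receiver] for _, receiver in thanks]
    return mean(sender_values), mean(receiver_values)


# Toy data (made up) in which thanks flow towards users with more edits.
toy_thanks = [("a", "b"), ("a", "c"), ("b", "c")]
toy_edit_counts = {"a": 10, "b": 100, "c": 1000}
sender_avg, receiver_avg = sender_receiver_means(toy_thanks, toy_edit_counts)
```

A receiver average above the sender average corresponds to thanks being sent "upwards" for that trait.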
Image("figures/sr-novice-edits.png") #code to generate data in senders-vs-receivers-stats.ipynb
Image("figures/sr-experienced-edits.png") #code to generate data in senders-vs-receivers-stats.ipynb
Summary: The two graphs above make clear that thanks receivers on average have higher edit counts than thanks senders, meaning that thanks are generally sent "upwards". This could be reflective of more experienced editors typically having higher edit quality and thus receiving more thanks, but it could also just be because people with higher edit counts are statistically more likely to receive a thank.
Image("figures/sr-novice-tenure.png") #code to generate data in senders-vs-receivers-stats.ipynb
Image("figures/sr-experienced-tenure.png") #code to generate data in senders-vs-receivers-stats.ipynb
Summary: The two graphs above, which have tenure, not edit count, on the y-axis, uphold the previous trend of thanks being sent upwards, though to a lesser degree. Again, this is logically expected because editors who have been part of a project for longer tend to have higher edit counts, increasing the likelihood that they will receive a thank. Note: The division between novice and experienced for these last two graphs was based on edit count (even though the variable being compared was tenure) in order to keep the groups consistent across all the sender-receiver graphs.
This study determines the average number of thanks an editor receives in a year, a month, and a day.
Image('figures/thanks-avgs.png') #code to generate data in thanks-timeframe.ipynb
Summary: The bottom 20% of editors received far fewer thanks than the top 20%. Also, there is a statistically significant difference between the average number of thanks a person receives per month and the average number of thanks a person receives per month if we count only months where they received at least one thank. This holds true for the average number of thanks a person receives per day as well.
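The two monthly averages compared above differ only in their denominator. A minimal sketch, assuming one thanks count per month of the observation window for a single editor (zeros included); the function name is hypothetical:

```python
def monthly_averages(thanks_by_month):
    """Return (average over all months, average over months with >= 1 thank).

    thanks_by_month : list of thanks received, one entry per month,
                      including months with zero thanks
    """
    if not thanks_by_month:
        raise ValueError("need at least one month")
    total = sum(thanks_by_month)
    over_all_months = total / len(thanks_by_month)

    months_with_thanks = [c for c in thanks_by_month if c > 0]
    # If the editor never received a thank, report 0.0 rather than dividing by zero.
    over_thanked_months = total / len(months_with_thanks) if months_with_thanks else 0.0
    return over_all_months, over_thanked_months
```

The same function applies to daily counts by passing one entry per day instead of per month.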
The table below was constructed using a simple metric for determining how clustered thanks are.
Note: Dif = the difference between two samples in the number of months (or days) over which thanks are spread
Note: If the setup of the graph needs more clarification, please see the code in thanks-timeframe.ipynb
Image("figures/thanks-timeframe.png") #code to generate data in thanks-timeframe.ipynb
Summary: The data is less clustered than it would be if we assigned each thank to a random month, but more clustered than it would be if we spread thanks out as much as possible.
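One way to read the comparison above is sketched below. It assumes that "spread" means the number of distinct months (or days) in which an editor received at least one thank, and compares the observed value with two baselines: thanks assigned to months uniformly at random, and thanks spread as widely as possible. All names are hypothetical and the metric in thanks-timeframe.ipynb may be defined differently.

```python
import random


def months_spanned(months):
    """Number of distinct months in which at least one thank was received."""
    return len(set(months))


def random_baseline(n_thanks, n_months, trials=1000, seed=0):
    """Average number of distinct months hit when each thank is assigned
    to a month uniformly at random (simulated, seeded for repeatability)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n_months) for _ in range(n_thanks)})
    return total / trials


def max_spread(n_thanks, n_months):
    """Distinct months covered if thanks are spread out as much as possible."""
    return min(n_thanks, n_months)


# Toy example (made up): 6 thanks over a 12-month window, received in 3 months.
observed = months_spanned([1, 1, 2, 5, 5, 5])
dif_vs_random = random_baseline(6, 12) - observed
dif_vs_max = max_spread(6, 12) - observed
```

Under this reading, a smaller number of months spanned means more clustering, so an observed value below the maximum-spread baseline indicates that thanks are more clustered than an even spread.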
The first graph shows the ratio of thankers/editors and the second graph shows the ratio of thanks/editors (for a 10-project sample, though this data has been calculated for all projects).
For a full ranking, see the projects_by_thankers_ratio.csv file.
Image("figures/thankers-to-editors.png") #code to generate data in thanks-and-thankers-all-wikis.py
Image("../figures/thanks-to-editors.png") #code to generate data in thanks-and-thankers-all-wikis.py
Summary: Although both datasets have a lot of variation, the thankers/editors data is decisively more consistent. This implies that projects with very different amounts of thanks sent per editor will have more similar percentages of editors involved in sending thanks.
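The two ratios and one possible way to quantify "consistency" across projects can be sketched as follows. Using the coefficient of variation (standard deviation divided by mean) is an assumption introduced for this example, not a measure taken from the analysis, and the per-project numbers below are made up.

```python
from statistics import mean, pstdev


def project_ratios(n_editors, n_thankers, n_thanks):
    """Return (thankers/editors, thanks/editors) for one project."""
    return n_thankers / n_editors, n_thanks / n_editors


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; lower means
    the values are more consistent relative to their size."""
    return pstdev(values) / mean(values)


# Hypothetical projects: (editors, thankers, thanks sent).
toy_projects = [(200, 20, 100), (1000, 120, 2000), (500, 55, 500)]
ratios = [project_ratios(*p) for p in toy_projects]
cv_thankers = coefficient_of_variation([r[0] for r in ratios])
cv_thanks = coefficient_of_variation([r[1] for r in ratios])
```

With these toy numbers `cv_thankers` is smaller than `cv_thanks`, mirroring the pattern described in the summary.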
The goal of this study was to see which types of editors (experienced vs novice) the majority of thanks are coming from, both in absolute numbers and as a fraction of edit count.
Image("../figures/thanks-given-to-edits.png") #code to generate data in thanks-by-editor-type.py
Image("../figures/thanks-given-to-edits-ratios.png") #code to generate data in thanks-by-editor-type.py
Summary: The top 5% of editors (the ones with the highest edit counts) give the most thanks in absolute terms but the least thanks relative to their edit count.
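The distinction between absolute and relative thanks can be illustrated with a short sketch. It assumes each editor is represented as an (edit count, thanks given) pair and splits editors into just two groups, the top fraction by edit count and everyone else; the grouping in thanks-by-editor-type.py is likely finer, and all names here are hypothetical.

```python
def thanks_per_edit_by_group(editors, top_fraction=0.05):
    """Return {group: (total thanks given, thanks given per edit)}.

    editors : list of (edit_count, thanks_given) pairs, one per editor
    """
    ranked = sorted(editors, key=lambda e: e[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    groups = {"top": ranked[:cutoff], "rest": ranked[cutoff:]}

    result = {}
    for name, members in groups.items():
        edits = sum(edit_count for edit_count, _ in members)
        thanks = sum(thanks_given for _, thanks_given in members)
        result[name] = (thanks, thanks / edits if edits else 0.0)
    return result


# Toy data (made up): one very active editor and 19 occasional editors.
toy_editors = [(10000, 50)] + [(10, 1)] * 19
summary = thanks_per_edit_by_group(toy_editors)
```

In the toy data the top group gives more thanks in total but far fewer per edit, which is the pattern the summary reports.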
The goal of this study was to see which types of editors (experienced vs novice) the majority of thanks are going to, both in absolute numbers and as a fraction of edit count.
Image("../figures/thanks-received-to-edits.png") #code to generate data in thanks-by-editor-type.py
Image("../figures/thanks-received-to-edits-ratios.png") #code to generate data in thanks-by-editor-type.py