Introduction

This notebook describes the first part of the understanding thanks research project, whose overall goal was to explore how the thanks feature is used and how it affects editor activity. In this notebook, we characterize usage of the thanks feature.

Data and Analysis

Thanks Feature Usage Over Time

The table below shows how usage of the thanks feature changed between 2016 and 2018.

  • Language = the Wikipedia Project the data was collected from
  • Thanks Givers 2018 = the number of editors who sent at least one thank between Jan-Jul 2018 (6-month timespan)
  • Thanks Givers 2016 = the number of editors who sent at least one thank between Jan-Jul 2016 (6-month timespan)
  • % Thanks Givers 2018 = Thanks Givers 2018 / the number of editors between Jan-Jul 2018 * 100
  • % Thanks Givers 2016 = Thanks Givers 2016 / the number of editors between Jan-Jul 2016 * 100

Typically, Wikimedia defines an active editor in a month as one who has made 5+ edits. Because all people have the potential to receive thanks (even those who have made very few edits), we define an active editor as anybody who has made 1+ edits.
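As a minimal sketch of the percentage definition above, the rate can be computed as follows. The function name and the counts are hypothetical and chosen only for illustration; the actual data is generated in thanker-network.ipynb.

```python
def thanks_giver_rate(thanks_givers, active_editors):
    """Return thanks givers as a percentage of active editors (1+ edits)."""
    if active_editors <= 0:
        raise ValueError("active_editors must be positive")
    return thanks_givers / active_editors * 100


# Hypothetical counts for one project (not real data).
rate_2016 = thanks_giver_rate(thanks_givers=150, active_editors=10_000)
rate_2018 = thanks_giver_rate(thanks_givers=180, active_editors=9_000)
print(f"2016: {rate_2016:.2f}%  2018: {rate_2018:.2f}%")
```

In this made-up example the number of active editors falls while the rate of thanks givers rises, because the rate depends on the ratio and not on either count alone.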

from IPython.display import Image
Image("figures/thanks-usage-rates.png")
#code to generate data in thanker-network.ipynb

Summary: As the table shows, even in languages where the number of active editors has decreased, the rate of thanks usage has increased. In other words, a greater percentage of people use the thanks feature now compared to two years ago.

Population of Editors Who Use the Thanks Feature

This graph employs the same definitions as above but uses all data since the thanks feature was rolled out. The purpose is to show the percentage of editors in different communities that have been affected by the thanks feature. (Note: The thanks givers and thanks receivers are not mutually exclusive.)

Image("figures/thank-users-population.png") #should be thanks-users-population.png
#code to generate data in thanker-network.ipynb

Summary: The data shows that the majority of active editors have never been touched by the thanks feature. However, the percentage of editors who have is sufficient for the feature to have had a measurable impact.

Percentage of Editors Responsible for some Percentage of Thanks

The table below shows the percentage of editors responsible for 80%/20% of thanks given.

  • Language = the Wikipedia Project the data was collected from
  • Es 20% Thanks = the percentage of editors responsible for 20% of the thanks given (timeframe is June 2017-June 2018)
  • Es 80% Thanks = the percentage of editors responsible for 80% of the thanks given (timeframe is June 2017-June 2018)
  • Mult Factor (multiplication factor) = Es 80% Thanks / Es 20% Thanks
  • Original Rank = the rank of the project when ordered by number of editors with 5+ edits per month (snapshot June 2017 at https://stats.wikimedia.org/EN/Sitemap.htm)

Image("figures/icdf-thanker-population.png")
#code to generate data in percent-editors-by-percent-thanks.ipynb

Summary: The data shows that a small group of editors gives a disproportionate share of thanks. We would expect four times as many thanks to require four times as many editors, but the numbers are closer to two times as many editors for four times as many thanks. The order of the projects by multiplication factor does not seem to correspond to their original ranks, but it is possible that some trend would become apparent with more data.
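The quantities in this table can be sketched as below. This is an illustration under stated assumptions, not the code in percent-editors-by-percent-thanks.ipynb: it assumes editors are ranked by thanks given in descending order and that percentages are taken over editors who gave at least one thank. The function name and the sample counts are hypothetical.

```python
def percent_editors_for_share(thanks_per_editor, share_percent):
    """Percentage of thankers needed to reach share_percent of all thanks.

    Editors are taken in descending order of thanks given, so the result is
    the smallest group that accounts for the requested share.
    """
    counts = sorted(thanks_per_editor, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no thanks in the sample")
    running = 0
    for n_editors, count in enumerate(counts, start=1):
        running += count
        # Integer comparison avoids floating-point edge cases.
        if running * 100 >= share_percent * total:
            return 100 * n_editors / len(counts)
    return 100.0  # only reached if share_percent exceeds 100


# Hypothetical thanks counts for five thankers on one project (not real data).
thanks_given = [50, 30, 10, 5, 5]
es_20 = percent_editors_for_share(thanks_given, 20)
es_80 = percent_editors_for_share(thanks_given, 80)
mult_factor = es_80 / es_20
print(es_20, es_80, mult_factor)
```

With these made-up counts, one of five thankers covers 20% of thanks and two of five cover 80%, giving a multiplication factor of 2.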

Characterizations of Thanks Senders vs Receivers

This study builds on a previous paper from 2015. The goal is to look at all thanks in some timeframe (May 2018 in this case), average a given trait over the senders, and compare it to the average of the same trait over the receivers. We do this for two traits: total edit count and tenure (number of days since registration). The data for the top 20% of editors (those with the highest edit counts) is examined separately from the data for the bottom 20% of editors.
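The comparison can be sketched as follows. This is a simplified illustration, not the code in senders-vs-receivers-stats.ipynb: it assumes each thank contributes one sender value and one receiver value to the averages (so a user who appears in several thanks is counted several times), and the user names and edit counts are hypothetical.

```python
from statistics import mean

# Hypothetical thanks log as (sender, receiver) pairs (not real data).
thanks = [("ana", "bo"), ("cy", "bo"), ("bo", "dee"), ("ana", "dee")]

# Hypothetical trait values: total edit count per user.
edit_count = {"ana": 40, "bo": 900, "cy": 15, "dee": 2500}


def sender_receiver_means(thanks, trait):
    """Average a trait over the senders and over the receivers of all thanks."""
    sender_values = [trait[sender] for sender, _ in thanks]
    receiver_values = [trait[receiver] for _, receiver in thanks]
    return mean(sender_values), mean(receiver_values)


sender_mean, receiver_mean = sender_receiver_means(thanks, edit_count)
print(sender_mean, receiver_mean)
```

The same function works for tenure by passing a mapping from user to days since registration instead of edit counts.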

Image("figures/sr-novice-edits.png")
#code to generate data in senders-vs-receivers-stats.ipynb
Image("figures/sr-experienced-edits.png")
#code to generate data in senders-vs-receivers-stats.ipynb

Summary: The two graphs above make clear that thanks receivers on average have higher edit counts than thanks senders, meaning that thanks are generally sent "upwards". This could be reflective of more experienced editors typically having higher edit quality and thus receiving more thanks, but it could also just be because people with higher edit counts are statistically more likely to receive a thank.

Image("figures/sr-novice-tenure.png")
#code to generate data in senders-vs-receivers-stats.ipynb
Image("figures/sr-experienced-tenure.png")
#code to generate data in senders-vs-receivers-stats.ipynb

Summary: The two graphs above, which have tenure, not edit count, on the y-axis, uphold the previous trend of thanks being sent upwards, though to a lesser degree. Again, this is logically expected because editors who have been part of a project for longer tend to have higher edit counts, increasing the likelihood that they will receive a thank. Note: The division between novice and experienced for these last two graphs was based on edit count (even though the plotted variable was tenure) in order to keep the groups consistent across all the sender-receiver graphs.

Average Number of Thanks Received

This is a study to determine the average number of thanks received in a year, a month, and a day.

  • Language = the Wikipedia Project the data was collected from
  • Sample = the sample of editors (by edit count) being examined
  • Thanks in Year = the average number of thanks received in a year (data from June 2017-June 2018)
  • Thanks in Month = the average number of thanks received per month (counting only months where at least one thank was received)
  • Thanks in Day = the average number of thanks received per day (counting only days where at least one thank was received)

Image('figures/thanks-avgs.png')
#code to generate data in thanks-timeframe.ipynb

Summary: The bottom 20% of editors received far fewer thanks than the top 20%. Also, there is a statistically significant difference between the average number of thanks a person receives per month and the average number of thanks a person receives per month if we count only months where they received at least one thank. This holds true for the average number of thanks a person receives per day as well.
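The difference between the two kinds of monthly average can be sketched as below. This is an illustration under assumptions, not the code in thanks-timeframe.ipynb: the function name, the 12-month window, and the sample months are hypothetical.

```python
from collections import Counter


def monthly_averages(thank_months, months_in_window=12):
    """Return (average over all months, average over months with 1+ thanks)."""
    per_month = Counter(thank_months)
    total = sum(per_month.values())
    avg_all = total / months_in_window
    # Only months in which at least one thank was received count here.
    avg_active = total / len(per_month) if per_month else 0.0
    return avg_all, avg_active


# Hypothetical months (YYYY-MM) in which one editor received a thank.
months = ["2017-07", "2017-07", "2017-11", "2018-02", "2018-02", "2018-02"]
avg_all, avg_active = monthly_averages(months)
print(avg_all, avg_active)
```

Because months without thanks are dropped from the denominator, the second average can never be lower than the first. The same approach applies to days by replacing the month labels with dates.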

Distribution of Thanks

The table below was constructed using a simple metric for determining how clustered thanks are.

  • Language = the Wikipedia Project the data was collected from
  • Sample = the sample of editors (by edit count) being examined
  • Timeframe = the unit of time being used to log when a thank occurred

Note: Dif = the difference between two samples in the number of months (or days) over which thanks are spread

  • Dif Actual = actual data - data with thanks at randomly assigned times
  • Dif Constant = data with thanks maximally spread - data with thanks at randomly assigned times
  • Dif Random = data with thanks at randomly assigned times - a second set of data with thanks at randomly assigned times

Note: If the setup of the table needs more clarification, please see the code in thanks-timeframe.ipynb.

Image("figures/thanks-timeframe.png")
#code to generate data in thanks-timeframe.ipynb

Summary: The data is less clustered than it would be if we assigned each thank to a random month, but more clustered than it would be if we spread thanks out as much as possible.
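One way to read the Dif columns is sketched below. This is an assumption-laden illustration, not the code in thanks-timeframe.ipynb: it assumes the spread of a sample is measured as the number of distinct months in which at least one thank occurred, and all function names and the sample data are hypothetical.

```python
import random


def months_covered(months):
    """Number of distinct months in which at least one thank occurred."""
    return len(set(months))


def random_months(n_thanks, n_months, rng):
    """Assign each thank to a uniformly random month index."""
    return [rng.randrange(n_months) for _ in range(n_thanks)]


def spread_months(n_thanks, n_months):
    """Spread thanks over as many different months as possible."""
    return [i % n_months for i in range(n_thanks)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
actual = [0, 1, 2, 3, 4, 5, 6, 6]  # hypothetical month index of each thank
n_thanks, n_months = len(actual), 12

rand_a = random_months(n_thanks, n_months, rng)
rand_b = random_months(n_thanks, n_months, rng)

dif_actual = months_covered(actual) - months_covered(rand_a)
dif_constant = months_covered(spread_months(n_thanks, n_months)) - months_covered(rand_a)
dif_random = months_covered(rand_b) - months_covered(rand_a)
print(dif_actual, dif_constant, dif_random)
```

Under this reading, Dif Random serves as a noise baseline, and Dif Constant is an upper bound because no assignment can cover more months than the maximally spread one.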

Thankers to Editors and Thanks to Editors

The first graph shows the ratio of thankers/editors and the second graph shows the ratio of thanks/editors (for a 10-project sample, though this data has been calculated for all projects).

For a full ranking, see the projects_by_thankers_ratio.csv file.

Image("figures/thankers-to-editors.png")
#code to generate data in thanks-and-thankers-all-wikis.py
Image("../figures/thanks-to-editors.png")
#code to generate data in thanks-and-thankers-all-wikis.py
  • The coefficient of variation for the thanks-to-editors dataset is 188.15%
  • The coefficient of variation for the thankers-to-editors dataset is 72.49%
  • The mean for the thanks-to-editors dataset is 0.29 and the standard deviation is 0.54
  • The mean for the thankers-to-editors dataset is 0.03 and the standard deviation is 0.02
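For reference, the coefficient of variation is the standard deviation divided by the mean, expressed here as a percentage. The sketch below assumes the population standard deviation; whether thanks-and-thankers-all-wikis.py uses the population or the sample form is not stated, and the ratios in the example are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / mu * 100


# Hypothetical thankers/editors ratios for five projects (not real data).
ratios = [0.01, 0.02, 0.03, 0.04, 0.05]
cv = coefficient_of_variation(ratios)
print(f"{cv:.2f}")
```

Because the measure is scale-free, it allows the two ratios to be compared even though their means differ by roughly a factor of ten. Applying the formula to the rounded mean and standard deviation listed above gives about 186 for the thanks-to-editors dataset, close to the reported value; the remaining gap is consistent with rounding.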

Summary: Although both ratios vary widely across projects, the thankers/editors ratio is markedly more consistent. This implies that projects with very different numbers of thanks sent per editor still have similar percentages of editors involved in sending thanks.

Thanks Given by Edits

The goal of this study was to see which types of editors (experienced vs novice) the majority of thanks are coming from, both in absolute numbers and as a fraction of edit count.

Image("../figures/thanks-given-to-edits.png")
#code to generate data in thanks-by-editor-type.py
Image("../figures/thanks-given-to-edits-ratios.png")
#code to generate data in thanks-by-editor-type.py

Summary: The top 5% of editors (the ones with the highest edit counts) give the most thanks in absolute terms but the least thanks relative to their edit count.
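The two views in this summary (absolute thanks versus thanks relative to edit count) can be sketched as follows. This is an illustration under assumptions, not the code in thanks-by-editor-type.py: it assumes the relative figure is a group's total thanks divided by its total edits, and the function names and editor records are hypothetical.

```python
def split_top(editors, top_fraction=0.05):
    """Split editors into the top fraction by edit count and the rest."""
    ranked = sorted(editors, key=lambda e: e["edits"], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return ranked[:k], ranked[k:]


def thanks_summary(group):
    """Return (total thanks given, thanks given per edit) for a group."""
    thanks = sum(e["thanks_given"] for e in group)
    edits = sum(e["edits"] for e in group)
    return thanks, thanks / edits


# Hypothetical sample: one very active editor and 19 occasional editors.
editors = [{"edits": 10_000, "thanks_given": 50}]
editors += [{"edits": 10, "thanks_given": 1} for _ in range(19)]

top, rest = split_top(editors)
top_thanks, top_ratio = thanks_summary(top)
rest_thanks, rest_ratio = thanks_summary(rest)
print(top_thanks, top_ratio, rest_thanks, rest_ratio)
```

In this made-up sample the single top editor gives more thanks than the other 19 combined, yet gives far fewer thanks per edit, which is the pattern the summary describes.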

Thanks Received by Edits

The goal of this study was to see which types of editors (experienced vs novice) the majority of thanks are going to, both in absolute numbers and as a fraction of edit count.

Image("../figures/thanks-received-to-edits.png")
#code to generate data in thanks-by-editor-type.py
Image("../figures/thanks-received-to-edits-ratios.png")
#code to generate data in thanks-by-editor-type.py