Analysing the Results of the Abandonment Controlled Experiment

# Auxiliary Formatting Cell -------------------------------------------
from IPython.display import HTML
style = "<style>div.info { background-color: #DCD8D7;border-color: #dFb5b4; border-left: 5px solid #DCD8D7  ; padding: 0.5em;}</style>"
HTML(style)
# ---------------------------------------------------------------------

Notebook Metadata

Author Cristina Sarasua
Last Update 01.08.2018

Purpose Analyse the results of the controlled experiment run for the abandonment project, where the same batch of HITs was deployed several times on CrowdFlower (now Figure Eight / F8), controlling for two variables:

  1. the reward (B1: \$0.10 per HIT; B2: \$0.30 per HIT)
  2. the length (B1: 3 docs, \$0.15 per HIT; B2: 6 docs, \$0.30 per HIT), i.e., 5 cents per document

Work Description

Reminder: Abandonment is defined in the paper as "workers previewing or starting a HIT and later on deciding to drop it before completion, thus giving up the reward."

Data

Data Files

  1. task_res: JSON files from F8
  2. logs: JSON files generated by Kevin et al. with the logger

Data Explanation

  • Experiment 1: REWARD
    • A: \$0.10 per HIT, 6 documents
    • B: \$0.30 per HIT, 6 documents
    • C: \$0.30 per HIT (like B), with quality checks
  • Experiment 2: LENGTH
    • A: 3 documents, \$0.15 per HIT
    • B: 6 documents, \$0.30 per HIT
  • Ground truth for Topic 418

```
SET 1
d1 -- LA010790-0121 -- REL
d2 -- LA010289-0001 -- NOT REL
d3 -- LA010289-0021 -- NOT REL
d4 -- LA011190-0156 -- REL
d5 -- LA010289-0060 -- NOT REL
d6 -- LA012590-0067 -- REL

SET 2
d1 -- LA052189-0009 -- REL
d2 -- LA052189-0189 -- NOT REL
d3 -- LA052189-0196 -- NOT REL
d4 -- LA052589-0174 -- REL
d5 -- LA052590-0132 -- NOT REL
d6 -- LA052590-0204 -- REL

EXP 1A and 2A --> SET 1
EXP 1B and 2B --> SET 2
```

  • Each worker works on only ONE unit/HIT. Therefore, the worker-unit analysis does not apply here. We can instead do, e.g. for time, a worker-document analysis.

Findings

Data Quality / General Things
  • All workers were logged, no errors
Abandonment
  • Lower reward --> more people abandoned. Shorter length --> more people abandoned. But in both cases the difference is small. (See also "Abandonment Stats")

(A and B comparisons)

  • Comparison of means of sessionCount between the two populations, both in experiment 1 (1A and 1B) and experiment 2 (2A and 2B), indicate that the means are equal (or that there is not a significant difference between them). That is, increasing the reward or the length of documents (in these experiments) did not change the distribution. (See also "1. Work Done")
  • Comparison of means of number of messages between the two populations, both in experiment 1 (1A and 1B) and experiment 2 (2A and 2B), indicate that the means are equal (or that there is not a significant difference between them). That is, increasing the reward or the length of documents (in these experiments) did not change the distribution. (See also "1. Work Done")
  • Comparison of means of time invested in session between the pairs of populations: 1A and 1B are significantly different, while 2A and 2B are not. (Note: I updated the 1A and 1B comparison after correcting a variable name.) (See also "2. Time Invested")

(C comparisons with quality checks)

  • Comparison of means of sessionCount between the pairs of populations (2B and 2C) indicates that the two samples are significantly different. Between 1B and 2C we don’t see a significant difference. (See also "1. Work Done")
  • Comparison of means of number of messages between the pairs of populations, (2B and 2C) indicates that the two samples are significantly different and (1B and 2C) are significantly different too. (See also "1. Work Done")
  • Comparison of means of time invested in session between the pairs of populations, (2B and 2C) indicates that the two samples are significantly different and (1B and 2C) are significantly different too. (See also "2. Time Invested")

Notes

  • I changed from the last version:
    • exp1 is exp2 and vice versa, because the original task and log files had inverted names.

Discussion

  • Interpretation: TBC
  • Implications: TBC
  • The limitations of this analysis are ... TBC

Code

#!pip install statsmodels
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
%matplotlib inline
import seaborn as sns
from statsmodels.sandbox.stats.multicomp import multipletests 
# Data Directories
dataDir = ""
taskDir = dataDir + "F8tasks/"
logDir = dataDir + "logs/"


#dataDir = "/Users/sarasua/Documents/RESEARCH/collab_Abandonment/controlledexperiment/results/"
#taskDir = dataDir + "task_res/"
#logDir = dataDir + "logs/"

# Concrete Task Files 
#The original files are inverted: what is called LEN is REWARD and what is called REWARD is LEN! Looking at the number of judgments per worker we can see that.
#REWARD
fExp1A = taskDir + "CONTROLLED_EXPFIXED_LEN_PAY10_DOCSET1.json"
fExp1B = taskDir + "CONTROLLED_EXPFIXED_LEN_PAY30_DOCSET2.json"
#LENGTH
fExp2A = taskDir + "CONTROLLED_EXPFIXED_REWARD_PAY15_DOCSET1.json" 
fExp2B = taskDir + "CONTROLLED_EXPFIXED_REWARD_PAY30_DOCSET2.json"

# Concrete Logged Events Files
#The original files are inverted: what is called LEN is REWARD and what is called REWARD is LEN! Looking at the number of judgments per worker we can see that.

#REWARD
fLog1A = logDir + "CONTROLLED_EXPFIXED_LEN_PAY10_DOCSET1.json"
fLog1B = logDir + "CONTROLLED_EXPFIXED_LEN_PAY30_DOCSET2.json"
fLog1C = logDir + "CONTROLLED_EXPFIXED_LEN_QUALITY_PAY30_DOCSET1.json" 
#LENGTH
fLog2A = logDir + "CONTROLLED_EXPFIXED_REWARD_PAY15_DOCSET1.json"
fLog2B = logDir + "CONTROLLED_EXPFIXED_REWARD_PAY30_DOCSET2.json"

Data Preprocessing

TODO: Data Preprocessing Tasks

  • Load data
  • Separate abandoned and submitted workers, as Lei did
  • If we want to compare with the other batch of tasks (aka the experiment in the wild), we need to rescale the relevance judgements, because the first experiment used a 4-level scale and this one a 2-level scale (see the sketch after this list)
  • Merge the files into a DF with the columns I am interested in
  • Do the split of groups in a similar way - but how to analyse them later?
  • Cross-check: did they ensure that experiments A and B always have disjoint workers? That would be a between-subjects experiment design
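A minimal sketch of the rescaling step mentioned above (the threshold mapping, levels {0,1} --> not relevant and {2,3} --> relevant, is an assumption, not something fixed by the experiment files):

# Hypothetical rescaling of a 4-level relevance scale {0,1,2,3} to the
# binary scale used in this experiment; the threshold (>= 2 means relevant) is an assumption.
def rescale_judgment(level4):
    return 1 if int(level4) >= 2 else 0

# quick self-check
assert [rescale_judgment(v) for v in [0, 1, 2, 3]] == [0, 0, 1, 1]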
exp1A = pd.read_json(path_or_buf = fExp1A, lines = True, encoding = 'utf-8', orient = "records")
exp1B = pd.read_json(path_or_buf = fExp1B, lines = True, encoding = 'utf-8', orient = "records")
exp2A = pd.read_json(path_or_buf = fExp2A, lines = True, encoding = 'utf-8', orient = "records")
exp2B = pd.read_json(path_or_buf = fExp2B, lines = True, encoding = 'utf-8', orient = "records")

log1A = pd.read_json(path_or_buf = fLog1A, lines = True, encoding = 'utf-8', orient = "records")
log1B = pd.read_json(path_or_buf = fLog1B, lines = True, encoding = 'utf-8', orient = "records")
log1C = pd.read_json(path_or_buf = fLog1C, lines = True, encoding = 'utf-8', orient = "records")
log2A = pd.read_json(path_or_buf = fLog2A, lines = True, encoding = 'utf-8', orient = "records")
log2B = pd.read_json(path_or_buf = fLog2B, lines = True, encoding = 'utf-8', orient = "records")
 
# Data Format & Content Exploration - with example exp1A 
exp1A.head()
agreement created_at data gold_pool id job_id judgments_count missed_count results state updated_at
0 1 2018-07-25 11:15:26 {'unit_id': '1'} NaN 1829425306 1286626 1 0 {'judgments': [{'id': 3893910391, 'created_at'... finalized 2018-07-25 12:08:15
1 1 2018-07-25 11:15:26 {'unit_id': '2'} NaN 1829425307 1286626 1 0 {'judgments': [{'id': 3893923120, 'created_at'... finalized 2018-07-25 12:16:04
2 1 2018-07-25 11:15:26 {'unit_id': '3'} NaN 1829425308 1286626 1 0 {'judgments': [{'id': 3893912122, 'created_at'... finalized 2018-07-25 12:09:02
3 1 2018-07-25 11:15:26 {'unit_id': '4'} NaN 1829425309 1286626 1 0 {'judgments': [{'id': 3894037230, 'created_at'... finalized 2018-07-25 13:29:20
4 1 2018-07-25 11:15:26 {'unit_id': '5'} NaN 1829425310 1286626 1 0 {'judgments': [{'id': 3893910638, 'created_at'... finalized 2018-07-25 12:08:20
# Create the column unit_id extracted from the data dict
def extractUnitId(row):
    resDic = row['data']    
    unitId = resDic['unit_id']
    return unitId
exp1A['unit_id'] = exp1A.apply(extractUnitId,axis=1)
exp1B['unit_id'] = exp1B.apply(extractUnitId,axis=1)
exp2A['unit_id'] = exp2A.apply(extractUnitId,axis=1)
exp2B['unit_id'] = exp2B.apply(extractUnitId,axis=1)
# Create the column worker_id extracted from the results dict
def extractWorkerId(row):
    resDic = row['results']
    workerId = resDic['judgments'][0]['worker_id']
    if(len(resDic['judgments']) > 1):
        print('One worker with more than one judgment! '+ str(workerId))
    
    return workerId
    
exp1A['worker_id'] = exp1A.apply(extractWorkerId,axis=1)
exp1B['worker_id'] = exp1B.apply(extractWorkerId,axis=1)
exp2A['worker_id'] = exp2A.apply(extractWorkerId,axis=1)
exp2B['worker_id'] = exp2B.apply(extractWorkerId,axis=1)
# Data Format & Content Exploration - with example log1A 
log1A.head()
judgments message pay server_time session_id step steps task_id time times worker_id
0 [None, -1, -1, -1, -1, -1, -1] Start 10 2018-07-25 11:27:40.008932 0.f03mqh9l 1 [1] 1286626.0 1532518056947 [None, 0, 0, 0, 0, 0, 0] 44031296
1 [None, -1, -1, -1, -1, -1, -1] Start 10 2018-07-25 11:28:02.781684 0.euhmtag5 1 [1] 1286626.0 1532518078966 [None, 0, 0, 0, 0, 0, 0] 43415523
2 [None, -1, -1, -1, -1, -1, -1] Start 10 2018-07-25 11:28:09.870873 0.ir8afh08 1 [1] 1286626.0 1532518090184 [None, 0, 0, 0, 0, 0, 0] 6335115
3 [None, -1, -1, -1, -1, -1, -1] Start 10 2018-07-25 11:28:13.668804 0.mlxv7rso 1 [1] 1286626.0 1532518080321 [None, 0, 0, 0, 0, 0, 0] 44643986
4 [None, -1, -1, -1, -1, -1, -1] Start 10 2018-07-25 11:28:18.000698 0.mfy9iuvp 1 [1] 1286626.0 1532518095276 [None, 0, 0, 0, 0, 0, 0] 42787196

Explanation by Kevin:

var final_log = {
    "session_id": session_id, // unique session id (to capture page refresh)
    "message": String(message), // message that triggered the log -- see below
    "worker_id": worker_id, // worker id
    "task_id": task_id, // task_id
    "time": Date.now(), // time of sending the log
    "step": step, // step into the task (i.e., 1,2,3,4... no_docs, paystep)
    "judgments": judgments, // array of judgments -- starts at 1 (0 is null)
    "times": times, // array of times for the judgments -- starts at 1 (0 is null)
    "steps": steps // array of steps into the task; e.g., if the worker pressed back at step 2 the array is 1,2,1,2,3,...
};

the message-set is:

  • nextButton
  • backButton
  • Final_OK --> task concluded successfully
  • paying --> paying the worker
  • Start --> start task
  • 'MW Worker Rejected:'+worker_id --> blacklisted worker that tried to start the task
  • MWorker ok --> opposite of the previous (not sure if present)
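Before looking at the unique message values below, a quick way to see how often each message type occurs in a log (a one-line sketch):

# Frequency of each log message type
log1A['message'].value_counts()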
log1A.message.unique()
array(['Start', 'nextButton', 'backButton', 'Final_OK', 'paying'],
      dtype=object)
sessions = log1A[['worker_id', 'session_id']]
sessions.groupby(['worker_id']).size().unique()
array([ 1,  9, 11,  2, 10,  8, 19, 17,  6, 13, 14])
import json
def getJudgments(row):
    text_result = row['results']['judgments'][0]['data']['text_result']
    textrjson = json.loads(text_result)
    judgments = textrjson['judgments']
     # returning pd.Series(judgments) would also work, but then apply expects the shape of the original data frame
    return len(judgments)
# Helpers
def countJudgments(row):
    #return len(row['judgments'])
    return row['judgments_count'] # it's wrapped in the judgments - data
# Cross checks & Basic stats - units per people etc. Global and separating people? 
def checkTask(taskDf):
    
    # checking published config
    print('total number of HITs: ' + str(len(taskDf)))
    # KO print('number of judgments per HIT' + str(taskDf.results.map(lambda x: len(x)).max()))   
    nulls = pd.isnull(taskDf)
    
    # missing values
       
    print('Empty value in data column: ' + str(len(nulls.loc[nulls['data'] == True])) + ' out of '+ str(len(nulls['data'])))
    print('Empty value in results column: ' + str(len(nulls.loc[nulls['results'] == True])) + ' out of '+ str(len(nulls['results'])))
    print('Empty value in created_at column: ' + str(len(nulls.loc[nulls['created_at'] == True])) + ' out of '+ str(len(nulls['created_at'])))
    print('Empty value in updated_at column: ' + str(len(nulls.loc[nulls['updated_at'] == True])) + ' out of '+ str(len(nulls['updated_at'])))
    print('Empty value in id column: ' + str(len(nulls.loc[nulls['id'] == True])) + ' out of '+ str(len(nulls['id'])))
    print('Empty value in job_id column: ' + str(len(nulls.loc[nulls['job_id'] == True])) + ' out of '+ str(len(nulls['job_id'])))
    print('Empty value in worker_id column: ' + str(len(nulls.loc[nulls['worker_id'] == True])) + ' out of '+ str(len(nulls['worker_id'])))
    print('Empty value in unit_id column: ' + str(len(nulls.loc[nulls['unit_id'] == True])) + ' out of '+ str(len(nulls['unit_id'])))
    
    
    
    # counts
    print('Total number of workers: ' + str(taskDf['worker_id'].nunique()))
    print('Total number of units - they are judgments: ' + str(taskDf['unit_id'].nunique())) 
    print('AVG Number of units per worker: '+ str(taskDf.groupby(['worker_id'])['unit_id'].nunique().mean()) + ' Max Number of units per worker: '+ str(taskDf.groupby(['worker_id'])['unit_id'].nunique().max()) )
    print('Number of judgments per worker: ' )
    # when returning an array, apply expects it to match the shape of the original data frame - print(len(taskDf.columns))
    judgmentsCount = taskDf.apply(getJudgments,axis=1)
    print(judgmentsCount.describe())
        
   
exp1A['results'][0]['judgments'][0]['data']['text_result'] # this one gives an array of 4?
'{"session_id":"0.23pxaqv8","message":"paying","worker_id":37101159,"task_id":1286626,"time":1532520481330,"step":8,"judgments":[null,"0","0","0","0","0","0"],"times":[null,149.016,15.209,12.168,25.372,63.684,20.503],"steps":[1,2,3,4,5,6,7,8],"pay":10}'
checkTask(exp1A) # is the title of the files misleading? From the number of judgments sent by workers it looks like exp1A is the length one
checkTask(exp1B)
checkTask(exp2A)
checkTask(exp2B)
total number of HITs: 100
Empty value in data column: 0 out of 100
Empty value in results column: 0 out of 100
Empty value in created_at column: 0 out of 100
Empty value in updated_at column: 0 out of 100
Empty value in id column: 0 out of 100
Empty value in job_id column: 0 out of 100
Empty value in worker_id column: 0 out of 100
Empty value in unit_id column: 0 out of 100
Total number of workers: 100
Total number of units - they are judgments: 100
AVG Number of units per worker: 1.0 Max Number of units per worker: 1
Number of judgments per worker: 
count    100.0
mean       7.0
std        0.0
min        7.0
25%        7.0
50%        7.0
75%        7.0
max        7.0
dtype: float64
total number of HITs: 100
Empty value in data column: 0 out of 100
Empty value in results column: 0 out of 100
Empty value in created_at column: 0 out of 100
Empty value in updated_at column: 0 out of 100
Empty value in id column: 0 out of 100
Empty value in job_id column: 0 out of 100
Empty value in worker_id column: 0 out of 100
Empty value in unit_id column: 0 out of 100
Total number of workers: 100
Total number of units - they are judgments: 100
AVG Number of units per worker: 1.0 Max Number of units per worker: 1
Number of judgments per worker: 
count    100.0
mean       7.0
std        0.0
min        7.0
25%        7.0
50%        7.0
75%        7.0
max        7.0
dtype: float64
total number of HITs: 100
Empty value in data column: 0 out of 100
Empty value in results column: 0 out of 100
Empty value in created_at column: 0 out of 100
Empty value in updated_at column: 0 out of 100
Empty value in id column: 0 out of 100
Empty value in job_id column: 0 out of 100
Empty value in worker_id column: 0 out of 100
Empty value in unit_id column: 0 out of 100
Total number of workers: 100
Total number of units - they are judgments: 100
AVG Number of units per worker: 1.0 Max Number of units per worker: 1
Number of judgments per worker: 
count    100.0
mean       4.0
std        0.0
min        4.0
25%        4.0
50%        4.0
75%        4.0
max        4.0
dtype: float64
total number of HITs: 100
Empty value in data column: 0 out of 100
Empty value in results column: 0 out of 100
Empty value in created_at column: 0 out of 100
Empty value in updated_at column: 0 out of 100
Empty value in id column: 0 out of 100
Empty value in job_id column: 0 out of 100
Empty value in worker_id column: 0 out of 100
Empty value in unit_id column: 0 out of 100
Total number of workers: 100
Total number of units - they are judgments: 100
AVG Number of units per worker: 1.0 Max Number of units per worker: 1
Number of judgments per worker: 
count    100.0
mean       7.0
std        0.0
min        7.0
25%        7.0
50%        7.0
75%        7.0
max        7.0
dtype: float64
def checkLog(logDf):
    # missing values
    nulls = pd.isnull(logDf)
    print('Empty value in message column: ' + str(len(nulls.loc[nulls['message'] == True])) + ' out of '+ str(len(nulls['message'])))
    print('Empty value in session_id column: ' + str(len(nulls.loc[nulls['session_id'] == True])) + ' out of '+ str(len(nulls['session_id'])))
    print('Empty value in task_id column: ' + str(len(nulls.loc[nulls['task_id'] == True])) + ' out of '+ str(len(nulls['task_id'])))
    print('Empty value in time column: ' + str(len(nulls.loc[nulls['time'] == True])) + ' out of '+ str(len(nulls['time'])))
    print('Empty value in times column: ' + str(len(nulls.loc[nulls['times'] == True])) + ' out of '+ str(len(nulls['times'])))
    print('Empty value in worker_id column: ' + str(len(nulls.loc[nulls['worker_id'] == True])) + ' out of '+ str(len(nulls['worker_id'])))
    print('Empty value in pay column: ' + str(len(nulls.loc[nulls['pay'] == True])) + ' out of '+ str(len(nulls['pay'])))
    print('Empty value in judgments column: ' + str(len(nulls.loc[nulls['judgments'] == True])) + ' out of '+ str(len(nulls['judgments'])))

    # counts
    print('Total number of workers: ' + str(logDf['worker_id'].nunique()))
    print('Total number of tasks: ' + str(logDf['task_id'].nunique())) # task = unit
    print('AVG Number of sessions per worker: '+ str(logDf.groupby(['worker_id'])['session_id'].nunique().mean()) + ' Max Number of sessions per worker: '+ str(logDf.groupby(['worker_id'])['session_id'].nunique().max()) )
    print('AVG Number of tasks per worker: '+ str(logDf.groupby(['worker_id'])['task_id'].nunique().mean()) + ' Max Number of tasks per worker: '+ str(logDf.groupby(['worker_id'])['task_id'].nunique().max()) )
checkLog(log1A)
checkLog(log1B)
checkLog(log1C)
checkLog(log2A)
checkLog(log2B)
Empty value in message column: 0 out of 1089
Empty value in session_id column: 0 out of 1089
Empty value in task_id column: 1 out of 1089
Empty value in time column: 0 out of 1089
Empty value in times column: 0 out of 1089
Empty value in worker_id column: 0 out of 1089
Empty value in pay column: 0 out of 1089
Empty value in judgments column: 0 out of 1089
Total number of workers: 207
Total number of tasks: 1
AVG Number of sessions per worker: 1.1159420289855073 Max Number of sessions per worker: 6
AVG Number of tasks per worker: 0.9951690821256038 Max Number of tasks per worker: 1
Empty value in message column: 0 out of 1146
Empty value in session_id column: 0 out of 1146
Empty value in task_id column: 1 out of 1146
Empty value in time column: 0 out of 1146
Empty value in times column: 0 out of 1146
Empty value in worker_id column: 0 out of 1146
Empty value in pay column: 0 out of 1146
Empty value in judgments column: 0 out of 1146
Total number of workers: 178
Total number of tasks: 1
AVG Number of sessions per worker: 1.1629213483146068 Max Number of sessions per worker: 4
AVG Number of tasks per worker: 0.9943820224719101 Max Number of tasks per worker: 1
Empty value in message column: 0 out of 3592
Empty value in session_id column: 0 out of 3592
Empty value in task_id column: 1 out of 3592
Empty value in time column: 0 out of 3592
Empty value in times column: 0 out of 3592
Empty value in worker_id column: 1 out of 3592
Empty value in pay column: 0 out of 3592
Empty value in judgments column: 0 out of 3592
Total number of workers: 318
Total number of tasks: 1
AVG Number of sessions per worker: 1.8962264150943395 Max Number of sessions per worker: 210
AVG Number of tasks per worker: 1.0 Max Number of tasks per worker: 1
Empty value in message column: 0 out of 790
Empty value in session_id column: 0 out of 790
Empty value in task_id column: 0 out of 790
Empty value in time column: 0 out of 790
Empty value in times column: 0 out of 790
Empty value in worker_id column: 0 out of 790
Empty value in pay column: 0 out of 790
Empty value in judgments column: 0 out of 790
Total number of workers: 209
Total number of tasks: 1
AVG Number of sessions per worker: 1.1961722488038278 Max Number of sessions per worker: 6
AVG Number of tasks per worker: 1.0 Max Number of tasks per worker: 1
Empty value in message column: 0 out of 1129
Empty value in session_id column: 0 out of 1129
Empty value in task_id column: 0 out of 1129
Empty value in time column: 0 out of 1129
Empty value in times column: 0 out of 1129
Empty value in worker_id column: 0 out of 1129
Empty value in pay column: 0 out of 1129
Empty value in judgments column: 0 out of 1129
Total number of workers: 202
Total number of tasks: 1
AVG Number of sessions per worker: 1.2227722772277227 Max Number of sessions per worker: 8
AVG Number of tasks per worker: 1.0 Max Number of tasks per worker: 1
def checkTaskJobJointly(taskDf, logDf):
    
    abandonedDf = logDf[~logDf['worker_id'].isin(taskDf['worker_id'])]
    completedDf = logDf[logDf['worker_id'].isin(taskDf['worker_id'])]
    
    # all the answers in the task completion report are also in the log data set
    print('Number of people who abandoned: ' + str(len(logDf['worker_id'][~logDf['worker_id'].isin(taskDf['worker_id'])].unique())) ) #+ ' and they are IDs: '+  str(logDf['worker_id'][~logDf['worker_id'].isin(taskDf['worker_id'])])
    print('Number of people who submitted: ' + str(len(logDf['worker_id'][logDf['worker_id'].isin(taskDf['worker_id'])].unique())) ) #+ ' and they are IDs: '+  str(logDf['worker_id'][logDf['worker_id'].isin(taskDf['worker_id'])])
    print('Number of people who were not logged: ' + str(len(taskDf['worker_id'][~taskDf['worker_id'].isin(logDf['worker_id'])].unique())) )
    print('*Total number of workers in Task*: '+ str(taskDf['worker_id'].nunique()))
    print('*Total number of workers in Log*: '+ str(logDf['worker_id'].nunique()))
    
    return abandonedDf, completedDf
   
    

Abandonment Stats

print('--- Experiment 1  ------------------------')
print('--- (A) ------------------------')
aban_1A, complet_1A = checkTaskJobJointly(exp1A, log1A)
print('--- (B) ------------------------')
aban_1B, complet_1B = checkTaskJobJointly(exp1B, log1B)
print('--- (C) ------------------------')
aban_1C, complet_1C = checkTaskJobJointly(exp1B, log1C) 
print('--- Experiment 2  ------------------------')
print('--- (A) ------------------------')
aban_2A, complet_2A = checkTaskJobJointly(exp2A, log2A)
print('--- (B) ------------------------')
aban_2B, complet_2B = checkTaskJobJointly(exp2B, log2B)
print('--- (C) ------------------------')
aban_2C, complet_2C = checkTaskJobJointly(exp2B, log1B)
--- Experiment 1  ------------------------
--- (A) ------------------------
Number of people who abandoned: 107
Number of people who submitted: 100
Number of people who were not logged: 0
*Total number of workers in Task*: 100
*Total number of workers in Log*: 207
--- (B) ------------------------
Number of people who abandoned: 78
Number of people who submitted: 100
Number of people who were not logged: 0
*Total number of workers in Task*: 100
*Total number of workers in Log*: 178
--- (C) ------------------------
Number of people who abandoned: 292
Number of people who submitted: 27
Number of people who were not logged: 73
*Total number of workers in Task*: 100
*Total number of workers in Log*: 318
--- Experiment 2  ------------------------
--- (A) ------------------------
Number of people who abandoned: 109
Number of people who submitted: 100
Number of people who were not logged: 0
*Total number of workers in Task*: 100
*Total number of workers in Log*: 209
--- (B) ------------------------
Number of people who abandoned: 102
Number of people who submitted: 100
Number of people who were not logged: 0
*Total number of workers in Task*: 100
*Total number of workers in Log*: 202
--- (C) ------------------------
Number of people who abandoned: 177
Number of people who submitted: 1
Number of people who were not logged: 99
*Total number of workers in Task*: 100
*Total number of workers in Log*: 178

Building the 4 groups of people:

The focus is on the log files, filtered in one way or the other.

# Get the two subgroups of abandoned workers, who either abandoned right away or abandoned after restarting -- more than one session
def abandSpec(df):
    # (!!) Pandas passes through the first group twice
    dfG = df.groupby(['worker_id'])
    abanA = dfG.filter(lambda x: len(x['session_id'].unique()) == 1)
    abanB = dfG.filter(lambda x: len(x['session_id'].unique()) > 1)
    return abanA,abanB
# Get the two subgroups of completed workers, who either submitted their answers right away or submitted after restarting -- more than one session
# Coded in a different method for extensibility reasons
def completSpec(df):
    # (!!) Pandas passes through the first group twice
    dfG = df.groupby(['worker_id'])
    complA = dfG.filter(lambda x: len(x['session_id'].unique()) == 1)
    complB = dfG.filter(lambda x: len(x['session_id'].unique()) > 1)
    return complA,complB
    
    
# Get all the concrete subsets for all versions of the two controlled experiments.

# Experiment 1 (A,B settings)
abanA_1A,abanB_1A = abandSpec(aban_1A)
completA_1A,completB_1A = completSpec(complet_1A)

abanA_1B,abanB_1B = abandSpec(aban_1B)
completA_1B,completB_1B = completSpec(complet_1B)

# Experiment 2 (A,B settings)
abanA_2A,abanB_2A = abandSpec(aban_2A)
completA_2A,completB_2A = completSpec(complet_2A)

abanA_2B,abanB_2B = abandSpec(aban_2B)
completA_2B,completB_2B = completSpec(complet_2B)
# Cross-check - CORRECT
print('abandoned subgroups 1A')
print(abanA_1A.worker_id.nunique() + abanB_1A.worker_id.nunique())
print('abandoned subgroups 1B')
print(abanA_1B.worker_id.nunique() + abanB_1B.worker_id.nunique())
print('abandoned subgroups 2A')
print(abanA_2A.worker_id.nunique() + abanB_2A.worker_id.nunique())
print('abandoned subgroups 2B')
print(abanA_2B.worker_id.nunique() + abanB_2B.worker_id.nunique())

print('completed subgroups 1A')
print(completA_1A.worker_id.nunique() + completB_1A.worker_id.nunique())
print('completed subgroups 1B')
print(completA_1B.worker_id.nunique() + completB_1B.worker_id.nunique())
print('completed subgroups 2A')
print(completA_2A.worker_id.nunique() + completB_2A.worker_id.nunique())
print('completed subgroups 2B')
print(completA_2B.worker_id.nunique() + completB_2B.worker_id.nunique())
abandoned subgroups 1A
107
abandoned subgroups 1B
78
abandoned subgroups 2A
109
abandoned subgroups 2B
102
completed subgroups 1A
100
completed subgroups 1B
100
completed subgroups 2A
100
completed subgroups 2B
100
# ----- Testing Pandas
#log1A[log1A['worker_id']==41202032]
#d = log1A.sort_values(by=['worker_id'])
#d.head(100)
#log1Ag = log1A.groupby(['worker_id'])
#abb = log1Ag.filter(lambda x: len(x['session_id'].unique()) > 1)
#abb
#abb.groupby(['worker_id']).get_group(41202032)
#abb.groupby(['worker_id']).get_group(6476374) #- does not find it - it's correct
# --
# a = [1,2,3]
# b = [2,3,4]
# data = pd.DataFrame()
# data['a'] = pd.Series(a)
# data['b'] = pd.Series(b)
# data.head()
# data['a'][~data['a'].isin(data['b'])]
# data['a'][data['a'].isin(data['b'])]
# data['a'].isin(data['b'])
# data[~data['a'].isin(data['b'])]
# ----------- end of testing Pandas

Experiment-based Hypotheses

Normality tests and statistical tests to analyse the difference between the means (in measurement X) of two populations (experiment in setting A and experiment in setting B).

# There is no worker that appears in both settings (A and B)
print(len(exp1A[exp1A['worker_id'].isin(exp1B['worker_id'])]))
print(len(exp2A[exp2A['worker_id'].isin(exp2B['worker_id'])]))
0
0
print(len(log1A[log1A['worker_id'].isin(log1B['worker_id'])]))
print(len(log2A[log2A['worker_id'].isin(log2B['worker_id'])]))
print(len(log1B[log1B['worker_id'].isin(log1C['worker_id'])]))
0
0
0
from scipy.stats import shapiro
from scipy.stats import normaltest
from scipy.stats import anderson

# Input: series has the sample whose distribution we want to test
# Output: gaussian boolean, True if it is a normal distribution and False otherwise
def testNormality(series):
    
    alpha = 0.05
    
    # Only if all tests say normal do we call it normal. If one test is not passed, then it is NOT normal.
    
    # Shapiro-Wilk Test - for smaller data sets, around thousands of records
    print('length of series in Shapiro is: '+ str(len(series)))
    stats1, p1 = shapiro(series)
    print('Statistics Shapiro-Wilk Test =%.3f, p=%.3f' % (stats1, p1))
    shapiroNormal = p1 > alpha
    print('Shapiro.Wilk says it is Normal '+ str(shapiroNormal))
    
    # D'Agostino and Pearson's Test
    stats2, p2 = normaltest(series)
    print('Statistics D\'Agostino and Pearson\'s Test=%.3f, p=%.3f' % (stats2, p2))
    agostinoNormal = p2 > alpha
    if agostinoNormal:
        print('D\'Agostino and Pearson\'s says it is Normal '+ str(agostinoNormal))
    
    # Anderson-Darling Test (disabled)
    '''result = anderson(series) 
    print('Statistic: %.3f' % result.statistic)
    for i in range(len(result.critical_values)):
        sl, cv = result.significance_level[i], result.critical_values[i]
        if result.statistic > result.critical_values[i]:
            gaussian = False'''
    
    return shapiroNormal and agostinoNormal
    
'''
from scipy.stats import ttest_ind
from scipy.stats import mannwhitneyu

# Input:
# series1 is the series with the set of measurements for every single worker in case A of controlled experiment
# series2 is the series with the set of measurements for every single worker in case B of controlled experiment
# gaussian is the boolean value indicating if the samples have passed the test of normality or not (True is apply parametric test)
# Output:
# stats of statistical test 
# p-value 
# acceptHo (True if we fail to reject it and False if we reject it) 
# See also all tables for picking the tests (e.g., https://help.xlstat.com/customer/en/portal/articles/2062457-which-statistical-test-should-you-use-)
def compareTwoSamples(series1,series2, gaussian):
    # Tests to compare two samples (H0: they have equal distribution; H1: they have different distribution)
    
    alpha = 0.05
    acceptH0 = False
    
    if (gaussian == True):
        # Run Student's T-test
        stats, p = ttest_ind(series1, series2)
        print('Statistics=%.3f, p=%.3f' % (stats, p))
        
    else:
        
        # Run Mann-Whitney; Kruskal-Wallis test is for more samples.
        stats, p = mannwhitneyu(series1, series2)
        print('Statistics=%.3f, p=%.3f' % (stats, p))
        
        # result - hypothesis testing
   
    if p > alpha:
        acceptH0 = True
    
    print('The two samples have the same distribution: ' + str(acceptH0))
    return stats,p,acceptH0        
        
    
'''
# This is the implementation I had, with acceptH0 True (fail to reject) and False (reject). Since multipletests from statsmodels and other libraries use True for reject and False for fail to reject, I changed it to avoid confusion.
    
"\nfrom scipy.stats import ttest_ind\nfrom scipy.stats import mannwhitneyu\n\n# Input:\n# series1 is the series with the set of measurements for every single worker in case A of controlled experiment\n# series2 is the series with the set of measurements for every single worker in case B of controlled experiment\n# gaussian is the boolean value indicating if the samples have passed the test of normality or not (True is apply parametric test)\n# Output:\n# stats of statistical test \n# p-value \n# acceptHo (True if we fail to reject it and False if we reject it) \n# See also all tables for picking the tests (e.g., https://help.xlstat.com/customer/en/portal/articles/2062457-which-statistical-test-should-you-use-)\ndef compareTwoSamples(series1,series2, gaussian):\n    # Tests to compare two samples (H0: they have equal distribution; H1: they have different distribution)\n    \n    alpha = 0.05\n    acceptH0 = False\n    \n    if (gaussian == True):\n        # Run Student's T-test\n        stats, p = ttest_ind(series1, series2)\n        print('Statistics=%.3f, p=%.3f' % (stats, p))\n        \n    else:\n        \n        # Run Mann-Whitney; Kruskal-Wallis test is for more samples.\n        stats, p = mannwhitneyu(series1, series2)\n        print('Statistics=%.3f, p=%.3f' % (stats, p))\n        \n        # result - hypothesis testing\n   \n    if p > alpha:\n        acceptH0 = True\n    \n    print('The two samples have the same distribution: ' + str(acceptH0))\n    return stats,p,acceptH0        \n        \n    \n"
from scipy.stats import ttest_ind
from scipy.stats import mannwhitneyu

# Input:
# series1 is the series with the set of measurements for every single worker in case A of controlled experiment
# series2 is the series with the set of measurements for every single worker in case B of controlled experiment
# gaussian is the boolean value indicating if the samples have passed the test of normality or not (True is apply parametric test)
# Output:
# stats of statistical test 
# p-value 
# rejectH0 (True if we reject it and False if we fail to reject it (i.e., accept)) 
# See also all tables for picking the tests (e.g., https://help.xlstat.com/customer/en/portal/articles/2062457-which-statistical-test-should-you-use-)
def compareTwoSamples(series1,series2, gaussian):
    # Tests to compare two samples (H0: they have equal distribution; H1: they have different distribution)
    
    alpha = 0.05
    rejectH0 = True
    
    if (gaussian == True):
        # Run Student's T-test
        stats, p = ttest_ind(series1, series2)
        print('Statistics=%.3f, p=%.3f' % (stats, p))
        
    else:
        
        # Run Mann-Whitney; Kruskal-Wallis test is for more samples.
        stats, p = mannwhitneyu(series1, series2)
        print('Statistics=%.3f, p=%.3f' % (stats, p))
        
        # result - hypothesis testing
   
    if p > alpha:
        rejectH0 = False
    
    print('The two samples have a statistically different distribution: ' + str(rejectH0))
    # reject True means we go for H1 which is that the populations do not have the same means
    return stats,p,rejectH0        
        
    
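Since rejectH0 now follows the statsmodels convention (True = reject), the p-values collected from the comparisons below could be fed directly into multipletests for a multiple-comparison correction. A minimal sketch, with placeholder values standing in for the p-values computed later (p1sc, p2sc, ...):

from statsmodels.sandbox.stats.multicomp import multipletests

# Placeholder p-values standing in for the ones computed below (p1sc, p2sc, ...)
pvals = [0.482, 0.602, 0.006, 0.384]

# Holm correction; reject is a boolean array with True = reject H0,
# the same convention as compareTwoSamples above
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method='holm')
print(reject)
print(pvals_corrected)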

1. Work Done

Idea: People who abandon *try longer* when they see more value / potential in the HIT. The more reward or the more documents, the more value the HIT has for a worker. Workers may abandon because they fear getting a "bad reputation", but when the reward is higher the extrinsic motivation is stronger, so one could expect them to try longer (either clicking through the answers or restarting the process after having closed it).

  • We have two pairs of populations to compare:
    • Experiment 1 (A and B) and
    • Experiment 2 (A and B)
  • We measure "trying longer" using two different measurements:
    • The number of sessions: start, leave, start again
    • The number of messages: how far they get in the process (e.g., they click on many answers instead of staying at Start)

Functions to compute the measurements

def getSessionCount(df):
    dfG = df.groupby(['worker_id'])
    sessionCounts = dfG.apply(lambda x: len(x['session_id'].unique()))
    sessionCountsRI = sessionCounts.reset_index()
    del(sessionCountsRI['worker_id'])
    sessionCountsRI.columns=['sessionCount']
    return sessionCountsRI
def getMessageCount(df):
    dfG = df.groupby(['worker_id'])
    messageCounts = dfG.apply(lambda x: len(x['message']))
    messageCountsRI = messageCounts.reset_index()
    del(messageCountsRI['worker_id'])
    messageCountsRI.columns=['messageCount']
    return messageCountsRI

SessionCount

sessionC_aban_1A= getSessionCount(aban_1A)
sessionC_aban_1B= getSessionCount(aban_1B)
sessionC_aban_1C= getSessionCount(aban_1C)

sessionC_aban_2A= getSessionCount(aban_2A)
sessionC_aban_2B= getSessionCount(aban_2B)

sessionC_aban_1B2B = pd.concat([sessionC_aban_1B,sessionC_aban_2B], ignore_index=True) #latest experiment analysis
print(sessionC_aban_1A.describe())
print(sessionC_aban_1B.describe())
print(sessionC_aban_1C.describe())

print(sessionC_aban_2A.describe())
print(sessionC_aban_2B.describe())
       sessionCount
count    107.000000
mean       1.102804
std        0.530834
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        6.000000
       sessionCount
count     78.000000
mean       1.102564
std        0.444000
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        4.000000
       sessionCount
count    291.000000
mean       1.951890
std       12.251576
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max      210.000000
       sessionCount
count    109.000000
mean       1.192661
std        0.440477
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        3.000000
       sessionCount
count    102.000000
mean       1.196078
std        0.783988
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        8.000000

Are the populations (in both pairs) normally distributed?

norm_sessionC_aban_1A = testNormality(sessionC_aban_1A)
print("final: " + str(norm_sessionC_aban_1A))
norm_sessionC_aban_1B = testNormality(sessionC_aban_1B)
print("final: " + str(norm_sessionC_aban_1B))
norm_sessionC_aban_1C = testNormality(sessionC_aban_1C)
print("final: " + str(norm_sessionC_aban_1C))
norm_sessionC_aban_2A = testNormality(sessionC_aban_2A)
print("final: " + str(norm_sessionC_aban_2A))
norm_sessionC_aban_2B = testNormality(sessionC_aban_2B)
print("final: " + str(norm_sessionC_aban_2B))
length of series in Shapiro is: 107
Statistics Shapiro-Wilk Test =0.190, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=197.283, p=0.000
final: False
length of series in Shapiro is: 78
Statistics Shapiro-Wilk Test =0.249, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=115.498, p=0.000
final: False
length of series in Shapiro is: 291
Statistics Shapiro-Wilk Test =0.045, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=668.252, p=0.000
final: False
length of series in Shapiro is: 109
Statistics Shapiro-Wilk Test =0.476, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=60.346, p=0.000
final: False
length of series in Shapiro is: 102
Statistics Shapiro-Wilk Test =0.261, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=176.924, p=0.000
final: False

Exp1_H0: means of sessionCount are equal in both populations

Exp1_H1: means of sessionCount are not equal in both populations

normal = norm_sessionC_aban_1A and norm_sessionC_aban_1B
print('Abandoned 1A and 1B')
normal = True #forced
#usual #stats, p1sc, reject1sc = compareTwoSamples(sessionC_aban_1A, sessionC_aban_1B, normal ) 
stats, p1sc, reject1sc = compareTwoSamples(sessionC_aban_1A, sessionC_aban_1B2B, normal ) # forcing t-test
Abandoned 1A and 1B
Statistics=-0.704, p=0.482
The two samples have a statistically different distribution: False
ax = sns.distplot(sessionC_aban_1A)
ax.set_xlabel("Number of sessions per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting A")
ax = sns.distplot(sessionC_aban_1B)
ax.set_xlabel("Number of sessions per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting B")

Exp2_H0: means of sessionCount are equal in both populations

Exp2_H1: means of sessionCount are not equal in both populations

normal = norm_sessionC_aban_2A and norm_sessionC_aban_2B
print('Abandoned 2A and 2B')
normal = True #forced
# usual  stats,p2sc,reject2sc = compareTwoSamples(sessionC_aban_2A, sessionC_aban_2B,normal)
stats,p2sc,reject2sc = compareTwoSamples(sessionC_aban_2A, sessionC_aban_1B2B, normal ) # forcing t-test
Abandoned 2A and 2B
Statistics=0.522, p=0.602
The two samples have a statistically different distribution: False
ax = sns.distplot(sessionC_aban_2A)
ax.set_xlabel("Number of sessions per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 2 - setting A")

ExpQ_H0: means of sessionCount are equal in both populations

ExpQ_H1: means of sessionCount are not equal in both populations

ax = sns.distplot(sessionC_aban_1B)
ax.set_xlabel("Number of sessions per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting B")
normal = norm_sessionC_aban_1C and norm_sessionC_aban_1B
print('Abandoned 1C and 1B')
normal = True #forced
#usual #stats,pqsc,rejectqsc= compareTwoSamples(sessionC_aban_1C, sessionC_aban_1B, normal)
stats,pqsc,rejectqsc= compareTwoSamples(sessionC_aban_1C, sessionC_aban_1B2B, normal) #forcing t-test
Abandoned 1C and 1B
Statistics=0.871, p=0.384
The two samples have a statistically different distribution: False
ax = sns.distplot(sessionC_aban_1C)
ax.set_xlabel("Number of sessions per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting C")
normal = norm_sessionC_aban_2B and norm_sessionC_aban_1C
print('Abandoned 2B and 1C')
normal = True #forced
#usual #stats,pq2sc,rejectq2sc = compareTwoSamples(sessionC_aban_2B, sessionC_aban_1C, normal)
stats,pq2sc,rejectq2sc = compareTwoSamples(sessionC_aban_1B2B, sessionC_aban_1C, normal) #forcing t-test
Abandoned 2B and 1C
Statistics=-0.871, p=0.384
The two samples have a statistically different distribution: False
 

Number of Messages

messageC_aban_1A= getMessageCount(aban_1A)
messageC_aban_1B= getMessageCount(aban_1B)
messageC_aban_1C= getMessageCount(aban_1C)

messageC_aban_2A= getMessageCount(aban_2A)
messageC_aban_2B= getMessageCount(aban_2B)

messageC_aban_1B2B = pd.concat([messageC_aban_1B,messageC_aban_2B], ignore_index=True)
print(messageC_aban_1A.describe())
print(messageC_aban_1B.describe())
print(messageC_aban_1C.describe())

print(messageC_aban_2A.describe())
print(messageC_aban_2B.describe())
       messageCount
count    107.000000
mean       1.158879
std        0.568837
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        6.000000
       messageCount
count     78.000000
mean       1.487179
std        1.585188
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max       11.000000
       messageCount
count    291.000000
mean      10.701031
std       17.677669
min        1.000000
25%        1.000000
50%        4.000000
75%       12.000000
max      229.000000
       messageCount
count    109.000000
mean       1.522936
std        0.834403
min        1.000000
25%        1.000000
50%        1.000000
75%        2.000000
max        6.000000
       messageCount
count    102.000000
mean       1.705882
std        1.668510
min        1.000000
25%        1.000000
50%        1.000000
75%        2.000000
max       15.000000

Normality

norm_messageC_aban_1A = testNormality(messageC_aban_1A)
print("final: " + str(norm_messageC_aban_1A))
norm_messageC_aban_1B = testNormality(messageC_aban_1B)
print("final: " + str(norm_messageC_aban_1B))
norm_messageC_aban_1C = testNormality(messageC_aban_1C)
print("final: " + str(norm_messageC_aban_1C))

norm_messageC_aban_2A = testNormality(messageC_aban_2A)
print("final: " + str(norm_messageC_aban_2A))
norm_messageC_aban_2B = testNormality(messageC_aban_2B)
print("final: " + str(norm_messageC_aban_2B))
length of series in Shapiro is: 107
Statistics Shapiro-Wilk Test =0.290, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=172.215, p=0.000
final: False
length of series in Shapiro is: 78
Statistics Shapiro-Wilk Test =0.349, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=105.635, p=0.000
final: False
length of series in Shapiro is: 291
Statistics Shapiro-Wilk Test =0.499, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=428.798, p=0.000
final: False
length of series in Shapiro is: 109
Statistics Shapiro-Wilk Test =0.613, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=83.161, p=0.000
final: False
length of series in Shapiro is: 102
Statistics Shapiro-Wilk Test =0.438, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=152.159, p=0.000
final: False

Exp1_H0: means of messageCount are equal in both populations

Exp1_H1: means of messageCount are not equal in both populations

normal = norm_messageC_aban_1A and norm_messageC_aban_1B
print('Abandoned 1A and 1B')
normal = True #forced
#usual #stats,p1mc,reject1mc = compareTwoSamples(messageC_aban_1A, messageC_aban_1B, normal )
stats,p1mc,reject1mc = compareTwoSamples(messageC_aban_1A, messageC_aban_1B2B, normal ) #t-test
Abandoned 1A and 1B
Statistics=-2.766, p=0.006
The two samples have a statistically different distribution: True
ax = sns.distplot(messageC_aban_1A)
ax.set_xlabel("Number of messages per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting A")
ax = sns.distplot(messageC_aban_1B)
ax.set_xlabel("Number of messages per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 1 - setting B")

Exp2_H0: means of messageCount are equal in both populations

Exp2_H1: means of messageCount are not equal in both populations

normal = norm_messageC_aban_2A and norm_messageC_aban_2B
print('Abandoned 2A and 2B')
normal = True #forced
#usual #stats,p2mc,reject2mc = compareTwoSamples(messageC_aban_2A, messageC_aban_2B, normal)
stats,p2mc,reject2mc = compareTwoSamples(messageC_aban_2A, messageC_aban_1B2B, normal) #forcing t-test
Abandoned 2A and 2B
Statistics=-0.524, p=0.601
The two samples have a statistically different distribution: False
ax = sns.distplot(messageC_aban_2A)
ax.set_xlabel("Number of messages per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 2 - setting A")
ax = sns.distplot(messageC_aban_2B)
ax.set_xlabel("Number of messages per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 2 - setting B")
print('values in population A')
messageC_aban_2A['messageCount'].value_counts()
values in population A
1    65
2    38
4     2
3     2
6     1
5     1
Name: messageCount, dtype: int64
print('values in population B')
messageC_aban_2B['messageCount'].value_counts()
values in population B
1     66
2     24
4      7
3      2
15     1
6      1
5      1
Name: messageCount, dtype: int64
messageC_aban_2A.hist(log=True)

ExpQ_H0: means of messageCount are equal in both populations

ExpQ_H1: means of messageCount are not equal in both populations

normal = norm_messageC_aban_1C and norm_messageC_aban_1B
print('Abandoned 1C and 1B')
normal = True #forced
#usual #stats,pqmc,rejectqmc = compareTwoSamples(messageC_aban_1C, messageC_aban_1B, normal)
stats,pqmc,rejectqmc = compareTwoSamples(messageC_aban_1C, messageC_aban_1B2B, normal) #forcing t-test
Abandoned 1C and 1B
Statistics=6.878, p=0.000
The two samples have a statistically different distribution: True
ax = sns.distplot(messageC_aban_1C)
ax.set_xlabel("Number of messages per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of session count abandoned group Experiment 2 - setting C")
normal = norm_messageC_aban_2B and norm_messageC_aban_1C
print('Abandoned 2B and 1C')
normal = True #forced
#usual #stats,pq2mc,rejectq2mc = compareTwoSamples(messageC_aban_2B, messageC_aban_1C, normal)
stats,pq2mc,rejectq2mc = compareTwoSamples(messageC_aban_1B2B, messageC_aban_1C, normal) #forcing t-test
Abandoned 2B and 1C
Statistics=-6.878, p=0.000
The two samples have a statistically different distribution: True

2. Time invested

Functions to compute the measurements

group = log1A.groupby(['worker_id','session_id'])['server_time']
time = group.apply(lambda x: (x.max() - x.min()).total_seconds())
# testing
# group.get_group((6476374, '0.8lvbip6m'))
# group.get_group((6476374, '0.8lvbip6m')).max()-group.get_group((6476374, '0.8lvbip6m')).min()
def getTimePerSession(df):
   
    '''
    dfG = df.groupby(['worker_id','session_id'])['server_time']
    times = dfG.apply(lambda x: (x.max() - x.min()).total_seconds())
    timesRI = times.reset_index()
    del(timesRI['worker_id'])
    del(timesRI['session_id'])
    timesRI.columns=['time']
    
    print('original df shape:')
    print(df.shape)
    
    print('timeRI  shape:')
    print(timesRI.shape)
    return timesRI
    '''

    dfG = df.groupby(['worker_id','session_id'])['server_time']
    times = dfG.apply(lambda x: (x.max() - x.min()).total_seconds())
   
    timesRI = times.reset_index()
    timesRI.columns=['worker_id','session_id','time_spent']
    
    
    timesPerSession = timesRI.groupby(['worker_id'])['time_spent'].mean()
    timesPerSRI = timesPerSession.reset_index()
   
    del(timesPerSRI ['worker_id'])
   
    timesPerSRI.columns=['avgtimesession']
   
    return timesPerSRI
    
    
# more?
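Another measurement that could complement the per-session average is the total time a worker spent across all of their sessions; a sketch mirroring getTimePerSession (the name getTotalTimePerWorker is mine):

def getTotalTimePerWorker(df):
    # Same session-duration computation as above, but summing over a
    # worker's sessions instead of averaging
    dfG = df.groupby(['worker_id','session_id'])['server_time']
    times = dfG.apply(lambda x: (x.max() - x.min()).total_seconds())
    timesRI = times.reset_index()
    timesRI.columns = ['worker_id','session_id','time_spent']
    totals = timesRI.groupby(['worker_id'])['time_spent'].sum().reset_index()
    del(totals['worker_id'])
    totals.columns = ['totaltime']
    return totals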

Time per session

sessionTime_aban_1A= getTimePerSession(aban_1A)
sessionTime_aban_1B= getTimePerSession(aban_1B)
sessionTime_aban_1C= getTimePerSession(aban_1C)

sessionTime_aban_2A= getTimePerSession(aban_2A)
sessionTime_aban_2B= getTimePerSession(aban_2B)

sessionTime_aban_1B2B = pd.concat([sessionTime_aban_1B,sessionTime_aban_2B], ignore_index=True)
print(sessionTime_aban_1A.describe())
print(sessionTime_aban_1B.describe())
print(sessionTime_aban_1C.describe())

print(sessionTime_aban_2A.describe())
print(sessionTime_aban_2B.describe())
       avgtimesession
count      107.000000
mean        15.688014
std        114.352099
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max       1147.664558
       avgtimesession
count       78.000000
mean        48.022728
std        248.813323
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max       2122.246052
       avgtimesession
count      291.000000
mean       223.142203
std        272.955536
min          0.000000
25%          0.000000
50%        116.441946
75%        370.051424
max       1215.445574
       avgtimesession
count      109.000000
mean         6.483197
std         42.161366
min          0.000000
25%          0.000000
50%          0.000000
75%          0.011079
max        414.607930
       avgtimesession
count      102.000000
mean        21.247507
std        155.519444
min          0.000000
25%          0.000000
50%          0.000000
75%          0.057470
max       1466.078933

Normality

norm_sessionTime_aban_1A = testNormality(sessionTime_aban_1A)
print("final: " + str(norm_sessionTime_aban_1A))
norm_sessionTime_aban_1B = testNormality(sessionTime_aban_1B)
print("final: " + str(norm_sessionTime_aban_1B))
norm_sessionTime_aban_1C = testNormality(sessionTime_aban_1C)
print("final: " + str(norm_sessionTime_aban_1C))

norm_sessionTime_aban_2A = testNormality(sessionTime_aban_2A)
print("final: " + str(norm_sessionTime_aban_2A))
norm_sessionTime_aban_2B = testNormality(sessionTime_aban_2B)
print("final: " + str(norm_sessionTime_aban_2B))
length of series in Shapiro is: 107
Statistics Shapiro-Wilk Test =0.121, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=219.866, p=0.000
final: False
length of series in Shapiro is: 78
Statistics Shapiro-Wilk Test =0.190, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=157.498, p=0.000
final: False
length of series in Shapiro is: 291
Statistics Shapiro-Wilk Test =0.808, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=71.794, p=0.000
final: False
length of series in Shapiro is: 109
Statistics Shapiro-Wilk Test =0.142, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=213.928, p=0.000
final: False
length of series in Shapiro is: 102
Statistics Shapiro-Wilk Test =0.122, p=0.000
Shapiro.Wilk says it is Normal False
Statistics D'Agostino and Pearson's Test=200.247, p=0.000
final: False
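testNormality is defined earlier in the notebook; judging from the output above, a minimal sketch of an equivalent check (a hypothetical re-implementation), assuming it requires both the Shapiro-Wilk and the D'Agostino-Pearson test to retain H0 at α = 0.05:

import numpy as np
from scipy import stats as st

def test_normality_sketch(sample, alpha=0.05):
    # hypothetical stand-in for the notebook's testNormality
    x = np.ravel(sample)
    print('length of series in Shapiro is: %d' % len(x))
    s_stat, s_p = st.shapiro(x)
    d_stat, d_p = st.normaltest(x)  # D'Agostino and Pearson's test
    return (s_p > alpha) and (d_p > alpha)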

Exp1_H0: means of time per session are equal in both populations

Exp1_H1: means of time per session are not equal in both populations

normal = norm_sessionTime_aban_1A and norm_sessionTime_aban_1B
print('Abandoned 1A and 1B')
normal = True  #forced so that compareTwoSamples runs the t-test
#usual #stats,p1ts,reject1ts = compareTwoSamples(sessionTime_aban_1A, sessionTime_aban_1B, normal )
stats,p1ts,reject1ts = compareTwoSamples(sessionTime_aban_1A, sessionTime_aban_1B2B, normal)  # B samples pooled across experiments (1B+2B)
Abandoned 1A and 1B
Statistics=-0.808, p=0.420
The two samples have a statistically different distribution: False
ax = sns.distplot(sessionTime_aban_1A)
ax.set_xlabel("AVG Time per session per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of AVG time per session, abandoned group, Experiment 1 - setting A")
ax = sns.distplot(sessionTime_aban_1B)
ax.set_xlabel("AVG Time per session per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of AVG time per session, abandoned group, Experiment 1 - setting B")

Exp2_H0: means of time per session are equal in both populations

Exp2_H1: means of time per session are not equal in both populations

normal = norm_sessionTime_aban_2A and norm_sessionTime_aban_2B
print('Abandoned 2A and 2B')
normal = True  #forced so that compareTwoSamples runs the t-test
#usual #stats,p2ts,reject2ts = compareTwoSamples(sessionTime_aban_2A, sessionTime_aban_2B, normal)
stats,p2ts,reject2ts = compareTwoSamples(sessionTime_aban_2A, sessionTime_aban_1B2B, normal)  # B samples pooled across experiments (1B+2B)
Abandoned 2A and 2B
Statistics=-1.350, p=0.178
The two samples have a statistically different distribution: False
ax = sns.distplot(sessionTime_aban_2A)
ax.set_xlabel("AVG Time per session per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of AVG time per session, abandoned group, Experiment 2 - setting A")
ax = sns.distplot(sessionTime_aban_2B)
ax.set_xlabel("AVG Time per session per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of AVG time per session, abandoned group, Experiment 2 - setting B")

ExpQ_H0: means of time per session are equal in both populations

ExpQ_H1: means of time per session are not equal in both populations

normal = norm_sessionTime_aban_1C and norm_sessionTime_aban_1B
print('Abandoned 1C and 1B')
normal = True  #forced so that compareTwoSamples runs the t-test
#usual #stats,pqts,rejectqts = compareTwoSamples(sessionTime_aban_1C, sessionTime_aban_1B, normal)
stats,pqts,rejectqts = compareTwoSamples(sessionTime_aban_1C, sessionTime_aban_1B2B, normal)  # B samples pooled across experiments (1B+2B)
Abandoned 1C and 1B
Statistics=8.091, p=0.000
The two samples have a statistically different distribution: True
ax = sns.distplot(sessionTime_aban_1C)
ax.set_xlabel("AVG Time per session per worker")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of AVG time per session, abandoned group, Experiment 1 - setting C")
normal = norm_sessionTime_aban_2B and norm_sessionTime_aban_1C
print('Abandoned 2B and 1C')
normal = True  #forced so that compareTwoSamples runs the t-test
#usual #stats,pq2ts,rejectq2ts = compareTwoSamples(sessionTime_aban_2B, sessionTime_aban_1C, normal)
stats,pq2ts,rejectq2ts = compareTwoSamples(sessionTime_aban_1B2B, sessionTime_aban_1C, normal)  # B samples pooled across experiments (1B+2B) vs 1C
Abandoned 2B and 1C
Statistics=-8.091, p=0.000
The two samples have a statistically different distribution: True
sessionTime_aban_1C.hist(log=True)
sessionTime_aban_2B.hist(log=True)

Plots distributions together

print(messageC_aban_1A.head())

sns.distplot(messageC_aban_1A)
sns.distplot(messageC_aban_1B)

plt.show()

Corrections

#ps = pd.Series([p1sc,p2sc,pqsc,pq2sc,p1mc,p2mc,pqmc,pq2mc,p1ts,p2ts,pqts,pq2ts])
# after merging the qxxx and q2xxx are the same
ps = pd.Series([p1sc,p2sc,pqsc,p1mc,p2mc,pqmc,p1ts,p2ts,pqts])
ps.head(11)
0       [0.48226690063038824]
1         [0.602350300857029]
2       [0.38424557870904463]
3      [0.006036840259370842]
4        [0.6007672398883928]
5    [1.9490898739061408e-11]
6        [0.4197374019089245]
7       [0.17810653524304262]
8     [5.100165683157102e-15]
dtype: object
corrected_p = multipletests(ps, alpha=0.05, method='sidak')
print('AIM: REJECT TRUE -- WOULD SHOW STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE TWO POPULATIONS')
#print(str(reject1sc)+' '+str(reject2sc)+' '+str(rejectqsc)+' '+str(rejectq2sc)+' '+str(reject1mc)+' '+str(reject2mc)+' '+str(rejectqmc)+' '+str(rejectq2mc)+' '+str(reject1ts)+' '+str(reject2ts)+' '+str(rejectqts)+' '+str(rejectq2ts))
print(str(reject1sc)+' '+str(reject2sc)+' '+str(rejectqsc)+' '+str(reject1mc)+' '+str(reject2mc)+' '+str(rejectqmc)+' '+str(reject1ts)+' '+str(reject2ts)+' '+str(rejectqts))

print(corrected_p)
print('p < 0.05 means reject H0, i.e. accept H1')
AIM: REJECT TRUE -- WOULD SHOW STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE TWO POPULATIONS
False False False True False True False False True
(array([False, False, False, False, False,  True, False, False,  True]), array([array([0.99732728]), array([0.9997514]), array([0.98727471]),
       array([0.05303791]), array([0.99974235]), array([1.7541768e-10]),
       array([0.99254202]), array([0.82886464]), array([4.59632332e-14])],
      dtype=object), 0.005683044988048058, 0.005555555555555556)
p < 0.05 means reject H0, i.e. accept H1
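With the Šidák method, each of the m = 9 raw p-values is adjusted as

$$p_i^{\mathrm{adj}} = 1 - (1 - p_i)^m,$$

and H0 is rejected where the adjusted p-value stays below 0.05; equivalently, the per-test threshold reported in the tuple above is $1 - (1 - 0.05)^{1/9} \approx 0.00568$.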

ANOVA

Building the data for ANOVA

```
sessionC | exp | variant
   1     |  1  |   A
   3     |  1  |   A
   1     |  1  |   A
  ...
   1     |  1  |   B
   2     |  1  |   B
   1     |  1  |   B
  ...
   1     |  1  |   C
   2     |  1  |   C
   1     |  1  |   C
  ...
   1     |  2  |   A
   1     |  2  |   A
  ...
   1     |  2  |   B
   1     |  2  |   B
  ...
```

def add_effect_size(aov, conf_value=0.05):
    # omega-squared effect size per factor; the last row of aov is the residual
    mse = aov['sum_sq'][-1]/aov['df'][-1]
    aov['omega_sq'] = 'NaN'
    aov['omega_sq'] = (aov[:-1]['sum_sq']-(aov[:-1]['df']*mse))/(sum(aov['sum_sq'])+mse)
    aov['passed?'] = aov['PR(>F)'] < conf_value
    return aov
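add_effect_size implements the omega-squared effect size: with the residual in the last row of the ANOVA table, the code above computes, for each factor,

$$\omega^2 = \frac{SS_{\mathrm{effect}} - df_{\mathrm{effect}} \cdot MS_{\mathrm{res}}}{SS_{\mathrm{total}} + MS_{\mathrm{res}}}, \qquad MS_{\mathrm{res}} = \frac{SS_{\mathrm{res}}}{df_{\mathrm{res}}},$$

where $SS_{\mathrm{total}}$ is the sum of all sum_sq entries including the residual.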
def buildDataForAnova(series1A,series1B,series1C,series2A,series2B,withC):
    # one row per worker, annotated with the design factors of its setting
    s1A = pd.DataFrame(series1A)
    s1A['qual'] = '0'
    s1A['pay_per_doc'] = 0.017
    s1A['length'] = 6
    s1A['doc_set'] = '1'

    s1B = pd.DataFrame(series1B)
    s1B['qual'] = '0'
    s1B['pay_per_doc'] = 0.05
    s1B['length'] = 6
    s1B['doc_set'] = '2'

    s1C = pd.DataFrame(series1C)
    s1C['qual'] = '1'
    s1C['pay_per_doc'] = 0.05
    s1C['length'] = 6
    s1C['doc_set'] = '2'

    s2A = pd.DataFrame(series2A)
    s2A['qual'] = '0'
    s2A['pay_per_doc'] = 0.05
    s2A['length'] = 3
    s2A['doc_set'] = '1'

    s2B = pd.DataFrame(series2B)
    s2B['qual'] = '0'
    s2B['pay_per_doc'] = 0.05
    s2B['length'] = 6
    s2B['doc_set'] = '2'

    data_anova = pd.concat([s1A.reset_index(),s1B.reset_index(),s1C.reset_index(),s2A.reset_index(),s2B.reset_index()])
    print(data_anova.shape)
    print(data_anova.columns)
    print(data_anova.isnull().values.any())

    if withC==True:
        return data_anova
    else:
        return data_anova[~(data_anova['qual']=='1')]
    
# SessionC
data_anova_wC_sessionC = buildDataForAnova(sessionC_aban_1A,sessionC_aban_1B,sessionC_aban_1C,sessionC_aban_2A,sessionC_aban_2B, True)
data_anova_withoutC_sessionC = buildDataForAnova(sessionC_aban_1A,sessionC_aban_1B,sessionC_aban_1C,sessionC_aban_2A,sessionC_aban_2B, False)
(687, 6)
Index(['index', 'sessionCount', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
(687, 6)
Index(['index', 'sessionCount', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
# MessageC
data_anova_wC_messageC = buildDataForAnova(messageC_aban_1A,messageC_aban_1B,messageC_aban_1C,messageC_aban_2A,messageC_aban_2B, True)
data_anova_withoutC_messageC = buildDataForAnova(messageC_aban_1A,messageC_aban_1B,messageC_aban_1C,messageC_aban_2A,messageC_aban_2B, False)
(687, 6)
Index(['index', 'messageCount', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
(687, 6)
Index(['index', 'messageCount', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
# time
data_anova_wC_time = buildDataForAnova(sessionTime_aban_1A,sessionTime_aban_1B,sessionTime_aban_1C,sessionTime_aban_2A,sessionTime_aban_2B, True)
data_anova_withoutC_time = buildDataForAnova(sessionTime_aban_1A,sessionTime_aban_1B,sessionTime_aban_1C,sessionTime_aban_2A,sessionTime_aban_2B, False)
(687, 6)
Index(['index', 'avgtimesession', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
(687, 6)
Index(['index', 'avgtimesession', 'qual', 'pay_per_doc', 'length', 'doc_set'], dtype='object')
False
d = pd.concat([data_anova_wC_sessionC['sessionCount'],
               data_anova_wC_messageC['messageCount'],
               data_anova_wC_time['avgtimesession'],
               data_anova_wC_sessionC[['qual','pay_per_doc','length','doc_set']]],
              axis=1, sort=False)
d_withoutC = d[~(d['qual']=='1')]
print(len(d),len(d_withoutC))
687 396
d.head(20)
| | sessionCount | messageCount | avgtimesession | qual | pay_per_doc | length | doc_set |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 1 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 2 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 3 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 4 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 5 | 2 | 2 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 6 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 7 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 8 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 9 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 10 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 11 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 12 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 13 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 14 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 15 | 1 | 2 | 90.132755 | 0 | 0.017 | 6 | 1 |
| 16 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 17 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 18 | 1 | 1 | 0.000000 | 0 | 0.017 | 6 | 1 |
| 19 | 1 | 2 | 29.126187 | 0 | 0.017 | 6 | 1 |
#d_withoutC = d[~(d['qual']=='1')]
#d_withoutC.qual.unique()
#d_withoutC.pay_per_doc.unique()
subd = d_withoutC[['sessionCount','messageCount','avgtimesession','pay_per_doc','length']]
 
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

model_sessionC = ols(formula='sessionCount ~ C(pay_per_doc) * C(length)', data=subd).fit()
aov_table_sessionC = anova_lm(model_sessionC, typ=2)
aov_table_sessionC = add_effect_size(aov_table_sessionC)
print(aov_table_sessionC)

model_messageC = ols(formula='messageCount ~ C(pay_per_doc) * C(length)', data=subd).fit()
aov_table_messageC = anova_lm(model_messageC, typ=2)
aov_table_messageC = add_effect_size(aov_table_messageC)  # was assigned to aov_table_sessionC by mistake
print(aov_table_messageC)

model_time = ols(formula='avgtimesession ~ C(pay_per_doc) * C(length)', data=subd).fit()
aov_table_time = anova_lm(model_time, typ=2)
aov_table_time = add_effect_size(aov_table_time)
print(aov_table_time)
                                sum_sq     df             F    PR(>F)  \
C(pay_per_doc)           -4.357911e-13    1.0 -1.326728e-12  1.000000   
C(length)                 5.421047e+00    1.0  1.650391e+01  0.000059   
C(pay_per_doc):C(length)  1.132725e-02    1.0  3.448482e-02  0.852775   
Residual                  1.287604e+02  392.0           NaN       NaN   

                          omega_sq  passed?  
C(pay_per_doc)           -0.002442    False  
C(length)                 0.037857     True  
C(pay_per_doc):C(length) -0.002358    False  
Residual                       NaN    False  
                                sum_sq     df             F        PR(>F)  \
C(pay_per_doc)           -1.484763e-11    1.0 -9.921434e-12  1.000000e+00   
C(length)                 9.470537e+02    1.0  6.328369e+02  8.030777e-84   
C(pay_per_doc):C(length)  2.700155e-02    1.0  1.804288e-02  8.932155e-01   
Residual                  5.866362e+02  392.0           NaN           NaN   

                          omega_sq  passed?  
C(pay_per_doc)           -0.000975    False  
C(length)                 0.615913     True  
C(pay_per_doc):C(length) -0.000957    False  
Residual                       NaN    False  
                                sum_sq     df             F         PR(>F)  \
C(pay_per_doc)           -1.627843e-08    1.0 -7.234986e-13   1.000000e+00   
C(length)                 4.204623e+07    1.0  1.868755e+03  3.137003e-151   
C(pay_per_doc):C(length)  2.637931e+02    1.0  1.172435e-02   9.138297e-01   
Residual                  8.819841e+06  392.0           NaN            NaN   

                          omega_sq  passed?  
C(pay_per_doc)           -0.000442    False  
C(length)                 0.825795     True  
C(pay_per_doc):C(length) -0.000437    False  
Residual                       NaN    False  
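The C(...) wrappers in the formulas above ask patsy (the formula engine behind statsmodels) to dummy-code the numeric pay_per_doc and length columns as categorical factors. A quick, hypothetical way to inspect the resulting design columns:

import patsy

# hypothetical check of the dummy coding used by the two-way models above
dm = patsy.dmatrix('C(pay_per_doc) * C(length)', data=subd)
print(dm.design_info.column_names)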
data_anova_wC_sessionC.head()
| | index | sessionCount | qual | pay_per_doc | length | doc_set |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0.017 | 6 | 1 |
| 1 | 1 | 1 | 0 | 0.017 | 6 | 1 |
| 2 | 2 | 1 | 0 | 0.017 | 6 | 1 |
| 3 | 3 | 1 | 0 | 0.017 | 6 | 1 |
| 4 | 4 | 1 | 0 | 0.017 | 6 | 1 |
 
# Plots
import seaborn as sns


def plot_anova(data, y):
    sns.set(style="whitegrid")
    # pointplot of the measure y as a function of pay_per_doc, one panel per length
    g = sns.catplot(x="pay_per_doc", y=y, col="length", capsize=.2,
                    palette="YlGnBu_d", height=6, aspect=.75, kind="point", data=data)
    g.despine(left=True)
#!pip install --upgrade seaborn
plot_anova(subd,y="sessionCount")
plot_anova(subd,y="messageCount")
plot_anova(subd,y="avgtimesession")
# Improved: full factorial model on the joined df, including the quality-check runs
model_session = ols(formula='messageCount ~ C(pay_per_doc) * C(length) * C(doc_set) * C(qual)', data=data_anova_wC_messageC).fit()
aov_table_session = anova_lm(model_session, typ=2)
aov_table_session = add_effect_size(aov_table_session)
print(aov_table_session)
                                                   sum_sq     df  \
C(pay_per_doc)                              -9.457383e+02    1.0   
C(length)                                    4.273441e+03    1.0   
C(doc_set)                                   2.608301e+01    1.0   
C(qual)                                      5.111264e+03    1.0   
C(pay_per_doc):C(length)                    -0.000000e+00    1.0   
C(pay_per_doc):C(doc_set)                    3.269672e-10    1.0   
C(length):C(doc_set)                         6.484428e-12    1.0   
C(pay_per_doc):C(qual)                      -5.869172e-13    1.0   
C(length):C(qual)                           -0.000000e+00    1.0   
C(doc_set):C(qual)                          -1.911536e-12    1.0   
C(pay_per_doc):C(length):C(doc_set)          2.857362e+03    1.0   
C(pay_per_doc):C(length):C(qual)            -8.588413e-12    1.0   
C(pay_per_doc):C(doc_set):C(qual)            1.341940e-13    1.0   
C(length):C(doc_set):C(qual)                 5.367758e-13    1.0   
C(pay_per_doc):C(length):C(doc_set):C(qual)  9.188925e+03    1.0   
Residual                                     9.121126e+04  683.0   

                                                        F        PR(>F)  \
C(pay_per_doc)                              -7.081793e+00  1.000000e+00   
C(length)                                    3.200000e+01  2.268253e-08   
C(doc_set)                                   1.953125e-01  6.586713e-01   
C(qual)                                      3.827371e+01  1.059246e-09   
C(pay_per_doc):C(length)                    -0.000000e+00  1.000000e+00   
C(pay_per_doc):C(doc_set)                    2.448366e-12  9.999987e-01   
C(length):C(doc_set)                         4.855611e-14  1.000000e+00   
C(pay_per_doc):C(qual)                      -4.394901e-15  1.000000e+00   
C(length):C(qual)                           -0.000000e+00  1.000000e+00   
C(doc_set):C(qual)                          -1.431379e-14  1.000000e+00   
C(pay_per_doc):C(length):C(doc_set)          2.139624e+01  4.468542e-06   
C(pay_per_doc):C(length):C(qual)            -6.431099e-14  1.000000e+00   
C(pay_per_doc):C(doc_set):C(qual)            1.004859e-15  1.000000e+00   
C(length):C(doc_set):C(qual)                 4.019437e-15  1.000000e+00   
C(pay_per_doc):C(length):C(doc_set):C(qual)  6.880768e+01  5.780115e-16   
Residual                                              NaN           NaN   

                                             omega_sq  passed?  
C(pay_per_doc)                              -0.009649    False  
C(length)                                    0.037011     True  
C(doc_set)                                  -0.000961    False  
C(qual)                                      0.044501     True  
C(pay_per_doc):C(length)                    -0.001194    False  
C(pay_per_doc):C(doc_set)                   -0.001194    False  
C(length):C(doc_set)                        -0.001194    False  
C(pay_per_doc):C(qual)                      -0.001194    False  
C(length):C(qual)                           -0.001194    False  
C(doc_set):C(qual)                          -0.001194    False  
C(pay_per_doc):C(length):C(doc_set)          0.024351     True  
C(pay_per_doc):C(length):C(qual)            -0.001194    False  
C(pay_per_doc):C(doc_set):C(qual)           -0.001194    False  
C(length):C(doc_set):C(qual)                -0.001194    False  
C(pay_per_doc):C(length):C(doc_set):C(qual)  0.080956     True  
Residual                                          NaN    False  
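Note: most interaction rows above have sum_sq at machine precision (≈0 or slightly negative) and p = 1. This is likely because the factors are heavily confounded by design, e.g. every qual = 1 row comes from 1C, which also fixes pay_per_doc = 0.05, length = 6 and doc_set = 2, so many terms are aliased and the Type II decomposition degenerates; the length and qual main effects are the most interpretable rows.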
# Alex tests --------------
d_withoutC[['sessionCount','messageCount','length']] = d_withoutC[['sessionCount','messageCount','length']].apply(lambda x: pd.DataFrame.astype(x,dtype=np.float64))
d[['sessionCount','messageCount','length']] = d[['sessionCount','messageCount','length']].apply(lambda x: pd.DataFrame.astype(x,dtype=np.float64))
/srv/paws/lib/python3.6/site-packages/pandas/core/frame.py:3140: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]
d.dtypes, d_withoutC.dtypes
(sessionCount      float64
 messageCount      float64
 avgtimesession    float64
 qual               object
 pay_per_doc       float64
 length            float64
 doc_set            object
 dtype: object, sessionCount      float64
 messageCount      float64
 avgtimesession    float64
 qual               object
 pay_per_doc       float64
 length            float64
 doc_set            object
 dtype: object)
mod = ols(formula='sessionCount ~ pay_per_doc * length', data=d_withoutC)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:           sessionCount   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                 -0.002
Method:                 Least Squares   F-statistic:                    0.6751
Date:                Tue, 14 Aug 2018   Prob (F-statistic):              0.510
Time:                        15:53:13   Log-Likelihood:                -339.00
No. Observations:                 396   AIC:                             684.0
Df Residuals:                     393   BIC:                             696.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept              1.2246      0.118     10.387      0.000       0.993       1.456
pay_per_doc            0.1030      0.056      1.835      0.067      -0.007       0.213
length                -0.0248      0.023     -1.079      0.281      -0.070       0.020
pay_per_doc:length     0.2493      0.343      0.726      0.468      -0.425       0.924
==============================================================================
Omnibus:                      550.462   Durbin-Watson:                   1.946
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            76827.361
Skew:                           6.982   Prob(JB):                         0.00
Kurtosis:                      69.792   Cond. No.                     3.84e+16
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 7.93e-30. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
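The singular-design warning is expected here: pay_per_doc and length occur in only three distinct combinations across these settings (0.017·6, 0.05·3, 0.05·6), so the interaction column is linearly dependent on the main effects and the four-parameter model is rank-deficient. A sketch of a well-posed re-fit that drops the interaction (same data, hypothetical follow-up):

# main-effects-only re-fit; the interaction is not estimable with only
# three distinct (pay_per_doc, length) design points
res_main = ols(formula='sessionCount ~ pay_per_doc + length', data=d_withoutC).fit()
print(res_main.summary())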
from statsmodels.multivariate import manova
#ALEX MAN(C)OVA also this fails!
'''
mod = manova.MANOVA.from_formula(formula='messageCount+avgtimesession+sessionCount ~ pay_per_doc * length', data=d_withoutC)
#x = d_withoutC[['pay_per_doc','length']]
#x = sm.add_constant(x, prepend=False)
#mod = manova.MANOVA(endog=d_withoutC[['avgtimesession','messageCount']],exog=x)
res = mod.fit()
'''
"\nmod = manova.MANOVA.from_formula(formula='messageCount+avgtimesession+sessionCount ~ pay_per_doc * length', data=d_withoutC)\n#x = d_withoutC[['pay_per_doc','length']]\n#x = sm.add_constant(x, prepend=False)\n#mod = manova.MANOVA(endog=d_withoutC[['avgtimesession','messageCount']],exog=x)\nres = mod.fit()\n"
#TODO: change here var name aov table session
#without C, categorical
for y in ['messageCount','sessionCount','avgtimesession']:
    model_session = ols(formula= y + ' ~ C(pay_per_doc) * C(length) ', data=d_withoutC).fit()
    aov_table_session = anova_lm(model_session, typ=2)
    aov_table_session = add_effect_size(aov_table_session)
    print('------------')
    print('RESULTS for ', y)
    display(aov_table_session)
    p = aov_table_session['PR(>F)'] * 6  # Bonferroni correction for 6 tests (was /6, which inflates rejections)
    print(p < 0.05)
    print(model_session)
    
------------
RESULTS for  messageCount
| | sum_sq | df | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|
| C(pay_per_doc) | -1.484763e-11 | 1.0 | -9.921434e-12 | 1.000000e+00 | -0.000975 | False |
| C(length) | 9.470537e+02 | 1.0 | 6.328369e+02 | 8.030777e-84 | 0.615913 | True |
| C(pay_per_doc):C(length) | 2.700155e-02 | 1.0 | 1.804288e-02 | 8.932155e-01 | -0.000957 | False |
| Residual | 5.866362e+02 | 392.0 | NaN | NaN | NaN | False |
C(pay_per_doc)              False
C(length)                    True
C(pay_per_doc):C(length)    False
Residual                    False
Name: PR(>F), dtype: bool
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7feeab78d7b8>
------------
RESULTS for  sessionCount
| | sum_sq | df | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|
| C(pay_per_doc) | -4.357911e-13 | 1.0 | -1.326728e-12 | 1.000000 | -0.002442 | False |
| C(length) | 5.421047e+00 | 1.0 | 1.650391e+01 | 0.000059 | 0.037857 | True |
| C(pay_per_doc):C(length) | 1.132725e-02 | 1.0 | 3.448482e-02 | 0.852775 | -0.002358 | False |
| Residual | 1.287604e+02 | 392.0 | NaN | NaN | NaN | False |
C(pay_per_doc)              False
C(length)                    True
C(pay_per_doc):C(length)    False
Residual                    False
Name: PR(>F), dtype: bool
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7feeab77c4a8>
------------
RESULTS for  avgtimesession
| | sum_sq | df | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|
| C(pay_per_doc) | -1.627843e-08 | 1.0 | -7.234986e-13 | 1.000000e+00 | -0.000442 | False |
| C(length) | 4.204623e+07 | 1.0 | 1.868755e+03 | 3.137003e-151 | 0.825795 | True |
| C(pay_per_doc):C(length) | 2.637931e+02 | 1.0 | 1.172435e-02 | 9.138297e-01 | -0.000437 | False |
| Residual | 8.819841e+06 | 392.0 | NaN | NaN | NaN | False |
C(pay_per_doc)              False
C(length)                    True
C(pay_per_doc):C(length)    False
Residual                    False
Name: PR(>F), dtype: bool
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7feeab78d0f0>
d_withoutC.head()
| | sessionCount | messageCount | avgtimesession | qual | pay_per_doc | length | doc_set |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 0 | 0.017 | 6.0 | 1 |
| 1 | 1.0 | 1.0 | 0.0 | 0 | 0.017 | 6.0 | 1 |
| 2 | 1.0 | 1.0 | 0.0 | 0 | 0.017 | 6.0 | 1 |
| 3 | 1.0 | 1.0 | 0.0 | 0 | 0.017 | 6.0 | 1 |
| 4 | 1.0 | 1.0 | 0.0 | 0 | 0.017 | 6.0 | 1 |
print(d_withoutC.pay_per_doc.unique())
print(d_withoutC.length.unique())
[0.017 0.05 ]
[6. 3.]

(np.array([5.100166e-15,1.949090e-11,1.773891e-124,3.137003e-151,2.977822e-08])*6)<0.05  # Bonferroni correction (was /6)

p = aov_table_session['PR(>F)']
print(p.head(3))

corrected_p_anova = multipletests(p, alpha = 0.05, method = 'holm')
corrected_p_anova

# ONE-WAY ANOVA for the quality-check effect over 1B, 2B (qual=0) and 1C (qual=1),
# i.e. the settings that share the same pay per document and length
d_forC = d[(d['pay_per_doc']==0.05) & (d['length']==6.0)]
#display(d_forC.head())
#display(d_forC.describe())
for y in ['messageCount','sessionCount','avgtimesession']:
    model_session = ols(formula= y + ' ~ C(qual)', data=d_forC).fit()
    aov_table_session = anova_lm(model_session, typ=1)
    aov_table_session = add_effect_size(aov_table_session)
    print('------------')
    print('RESULTS for ', y)
    display(aov_table_session)
    p = aov_table_session['PR(>F)'] * 6  # Bonferroni correction (was /6)
    print(p < 0.05)
------------
RESULTS for  messageCount
| | df | sum_sq | mean_sq | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|---|
| C(qual) | 1.0 | 9188.924676 | 9188.924676 | 47.305401 | 1.949090e-11 | 0.089513 | True |
| Residual | 469.0 | 91101.767468 | 194.246839 | NaN | NaN | NaN | False |
C(qual)      True
Residual    False
Name: PR(>F), dtype: bool
------------
RESULTS for  sessionCount
| | df | sum_sq | mean_sq | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|---|
| C(qual) | 1.0 | 70.523787 | 70.523787 | 0.758495 | 0.384246 | -0.000513 | False |
| Residual | 469.0 | 43606.970905 | 92.978616 | NaN | NaN | NaN | False |
C(qual)     False
Residual    False
Name: PR(>F), dtype: bool
------------
RESULTS for  avgtimesession
| | df | sum_sq | mean_sq | F | PR(>F) | omega_sq | passed? |
|---|---|---|---|---|---|---|---|
| C(qual) | 1.0 | 4.027041e+06 | 4.027041e+06 | 65.4706 | 5.100166e-15 | 0.1204 | True |
| Residual | 469.0 | 2.884780e+07 | 6.150916e+04 | NaN | NaN | NaN | False |
C(qual)      True
Residual    False
Name: PR(>F), dtype: bool
pq = aov_table_session['PR(>F)'] * 6  # Bonferroni correction (was /6)
pq.head(10)
C(qual)     3.060100e-14
Residual             NaN
Name: PR(>F), dtype: float64

pq = aov_table_session['PR(>F)']
print(pq.head(3))

corrected_p_anovaq = multipletests(pq, alpha = 0.05, method = 'holm')
corrected_p_anovaq


Discussion of these results

For all experiments without quality control, we performed a two-way ANOVA (factors: pay_per_doc and length) on messageCount, sessionCount and avgtimesession. Length significantly affects (p < 0.05) all three measures, with a large effect size (ω² > 0.06) for messageCount and avgtimesession and a medium one (ω² > 0.01) for sessionCount; pay_per_doc shows no significant effect, and no significant interaction effects were measured. Regarding the quality control, we performed a one-way ANOVA over experiments 1B, 1C and 2B (the ones that share the same length and pay per document). The quality check significantly affected (p < 0.05) messageCount and avgtimesession, with large effect sizes (ω² > 0.06), but not sessionCount (p = 0.38).

#Latest
def plot_anova(data, y):
    sns.set(style="whitegrid")
    # pointplot per number of documents, split by payment, colored by quality control
    g = sns.catplot(x="# documents", y=y, hue="Quality control", col="Payment per document", #capsize=.2,
                 palette="YlGnBu_d", height=6, aspect=.75,
                kind="point", data=data.rename(columns={'messageCount':'Message count [log]','sessionCount':'Session count [log]','avgtimesession':'Time per session [log]',"qual": "Quality control", "pay_per_doc": "Payment per document", "length":'# documents'}))
    g.despine(left=True)

dtemp = d.copy()  # work on a copy so the log transform does not mutate d
dtemp['messageCount'] = np.log(d['messageCount']+1)
dtemp['sessionCount'] = np.log(d['sessionCount']+1)
dtemp['avgtimesession'] = np.log(d['avgtimesession']+1)
for y in ['Message count [log]','Session count [log]','Time per session [log]']:
    plot_anova(dtemp,y)

Discuss: a prediction task? E.g., a classification task: worker_to_abandon (C1) vs. worker_to_complete (C2)?

#when it's done we want to make both the comparisons 1C-1B and 1C-2B
pd.__version__
'0.23.4'
np.__version__
'1.15.0'