Replica Databases Helper

This is a importable notebook that provides simple helpers for performing queries on the labsdb replica databases from PAWS. It is stateful and designed to be easy to use in an interactive setup.

Import behavior

As soon as you import this notebook, it sets up a simple mysql connection pool to the replica. No actual connections are made until you try to use any of the functions.

It provides the following functions:

sql(query, *args, dbname=None)

This runs the given query (with passed in parameters) on the currently selected db. It returns an iterator


In [11]:
import os
from sqlalchemy import create_engine
In [42]:
host = os.environ['MYSQL_HOST']
user = os.environ['MYSQL_USERNAME']
password = os.environ['MYSQL_PASSWORD']
connection_string = 'mysql+pymysql://{user}:{password}@{host}/?charset=utf8mb4'.format(
In [63]:
engine = create_engine(connection_string, pool_recycle=300)
In [64]:
cur_db = 'enwiki_p'
In [65]:
def use_db(db):
    global cur_db
    if not db.endswith('_p'):
        db = db + '_p'
    cur_db = db
In [66]:
def sql(query, db=None):
    def encode_if_necessary(b):
        if type(b) is bytes:
            return b.decode('utf8')
        return b
    if db is None:
        db = cur_db
    con = engine.connect()
    con.execute('use %s' % db)
    for r in con.execute(query):
        d = dict(r)
        # This is unfortunately necessary since I can not seem to get sqlalchemy
        # nor pymysql to co-erce all strings to unicode values. This might be
        # because our dbs don't have proper encoding set.
        unicode_d = {k: encode_if_necessary(v) for k, v in d.items()}
        yield unicode_d