Jupyter Notebooks

Jupyter notebooks are a coding environment for Python (and several other programming languages). As in a code editor such as Sublime Text (or your Terminal or PowerShell), you can run code in a notebook; the output is printed directly below the code.

Jupyter notebooks have several other benefits:

  1. You can run individual blocks of code one at a time.
  2. If you run code that prints any output, that output will be saved until the next time you run that block of code.
  3. You can publish your notebook so that others can see your code and output in one place, at a stable URL.
  4. You can easily import someone else's notebook and customize their code.

Running code

Any Python code you can run in your text editor or terminal will also run in a Jupyter notebook.

Press SHIFT+ENTER to run the code in the current block. Anything the code prints, and any error it raises, is displayed below the code block.

pythonistas = ["John", "Graham", "Eric", "Michael", "Terry J.", "Terry G."]

for p in pythonistas:
    print(p)
John
Graham
Eric
Michael
Terry J.
Terry G.
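Besides anything you print, a notebook also displays the value of the last expression in a code block (labelled Out[n]), so you can inspect a variable just by typing its name on the last line. A minimal sketch reusing the list above; note that in a plain Python script the final bare expression does nothing.

```python
pythonistas = ["John", "Graham", "Eric", "Michael", "Terry J.", "Terry G."]

# print() output appears below the code block, one line per call
for number, p in enumerate(pythonistas, start=1):
    print(number, p)

# In a notebook, the value of the block's last expression is shown too
# (as Out[n]), without calling print()
terries = [p for p in pythonistas if p.startswith("Terry")]
terries
```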

Formatting text

You can insert blocks of text in your notebook and format them using a plain-text formatting language called Markdown. To do this, change the block's cell type from "Code" to "Markdown".

Heading one

"# Heading one"

Heading two

"## Heading two"

Heading three

"### Heading three"

Heading four

"#### Heading four"

Numbered lists:

  1. item one
  2. item two
  3. item three

bold text

**bold text**

italic text

*italic text*

The full Markdown syntax is described here: https://daringfireball.net/projects/markdown/syntax
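The heading rule above is mechanical: the number of # characters sets the heading level, from 1 to 6. The helpers below are hypothetical (md_heading and md_numbered_list are illustrative names, not part of any library); they simply build Markdown strings that follow these rules.

```python
def md_heading(level, text):
    """Return Markdown for a heading: one '#' per level (1 to 6)."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown headings go from level 1 to level 6")
    return "#" * level + " " + text


def md_numbered_list(items):
    """Return a Markdown numbered list, one 'N. item' line per item."""
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


print(md_heading(2, "Heading two"))
print(md_numbered_list(["item one", "item two", "item three"]))
print("**bold text** and *italic text*")  # bold uses two asterisks, italic one
```

In a notebook, one way to render such a string from a code block is IPython's display helpers: `from IPython.display import Markdown, display`, then `display(Markdown(text))`.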

Forking (copying) a Notebook

  1. Get the URL of another public PAWS notebook (example: https://paws.wmflabs.org/paws/user/Jtmorgan/notebooks/DS4UX%20Jupyter%20intro.ipynb)
  2. Add the parameter "?format=raw" to the end of the URL to download the raw .ipynb file: https://paws.wmflabs.org/paws/user/Jtmorgan/notebooks/DS4UX%20Jupyter%20intro.ipynb?format=raw
  3. Log in to your PAWS account and use "Upload" to add this copy to your own directory.
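Steps 1 and 2 can be scripted. A sketch using only the standard library; raw_notebook_url and download_notebook are hypothetical helper names, and the example URL is the one from the steps above, which may no longer resolve.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve


def raw_notebook_url(notebook_url):
    """Return the notebook URL with format=raw added to its query string."""
    parts = urlsplit(notebook_url)
    query = dict(parse_qsl(parts.query))  # keep any parameters already present
    query["format"] = "raw"
    return urlunsplit(parts._replace(query=urlencode(query)))


def download_notebook(notebook_url, filename):
    """Save the raw .ipynb file locally so it can be uploaded to PAWS."""
    urlretrieve(raw_notebook_url(notebook_url), filename)
    return filename


example = ("https://paws.wmflabs.org/paws/user/Jtmorgan/notebooks/"
           "DS4UX%20Jupyter%20intro.ipynb")
print(raw_notebook_url(example))
```

Calling `download_notebook(example, "DS4UX Jupyter intro.ipynb")` would save the file; step 3 (uploading it to PAWS) is still done by hand.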

Publishing a Notebook

(This part is a little bit manual)

All notebooks are technically public by default. To share the public (non-executable) version of a notebook on PAWS, you need to change its URL by hand.

  1. Go to a Notebook (example: https://paws.wmflabs.org/paws/user/Jtmorgan/notebooks/DS4UX%20Jupyter%20intro.ipynb)
  2. Change "paws" to "paws-public" in both places where it appears in the URL, and change "user/Jtmorgan/notebooks" to "User:Jtmorgan" (using your own username). Keep the URL-encoded file name (e.g. "%20" for spaces) as it is. For the example above, the public URL is: https://paws-public.wmflabs.org/paws-public/User:Jtmorgan/DS4UX%20Jupyter%20intro.ipynb
  3. Share the new URL with anyone you want to be able to view the notebook. Every time you save the original notebook, the public version will reflect your changes.
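The URL rewrite in step 2 can be automated. This is a sketch under a loud assumption: every PAWS notebook URL follows the same pattern as the single example above (the pattern is inferred from that one example, and PAWS URLs may differ or have changed). public_notebook_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def public_notebook_url(notebook_url):
    """Turn a PAWS notebook URL into its read-only paws-public URL.

    Assumes the path looks like /paws/user/<username>/notebooks/<file>,
    as in the example above.
    """
    parts = urlsplit(notebook_url)
    segments = parts.path.split("/")  # ['', 'paws', 'user', name, 'notebooks', file...]
    if len(segments) < 6 or segments[1:3] != ["paws", "user"] or segments[4] != "notebooks":
        raise ValueError("not a PAWS notebook URL: " + notebook_url)
    username = segments[3]
    filename = "/".join(segments[5:])  # keep the URL-encoded name, e.g. %20
    netloc = parts.netloc
    if netloc.startswith("paws."):
        netloc = "paws-public." + netloc[len("paws."):]
    path = "/paws-public/User:" + username + "/" + filename
    return urlunsplit((parts.scheme, netloc, path, "", ""))


example = ("https://paws.wmflabs.org/paws/user/Jtmorgan/notebooks/"
           "DS4UX%20Jupyter%20intro.ipynb")
print(public_notebook_url(example))
```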

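Query an API

This example uses the requests library to call the Wikipedia (MediaWiki) API and count the revisions made to the "Panama Papers" article in its first 24 hours. The code asks for up to 500 revisions per request (rvlimit), so the loop keeps requesting the next batch until the response no longer contains a "continue" value.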

import requests

ENDPOINT = 'https://en.wikipedia.org/w/api.php'

parameters = {'action': 'query',
              'prop': 'revisions',
              'titles': 'Panama_Papers',
              'format': 'json',
              'rvdir': 'newer',                   # list oldest revisions first
              'rvlimit': 500,                     # up to 500 revisions per request
              'rvstart': '2016-04-03T17:59:05Z',  # start of the 24-hour window
              'rvend': '2016-04-04T17:59:05Z',    # end of the 24-hour window
              'continue': ''}                     # use the API's "continue" paging

num_revisions = 0

done = False
while not done:
    wp_call = requests.get(ENDPOINT, params=parameters)
    wp_call.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    response = wp_call.json()

    pages = response['query']['pages']

    for page_id in pages:
        page = pages[page_id]
        # A page with no revisions in the window has no 'revisions' key
        revisions = page.get('revisions', [])
        num_revisions += len(revisions)

    print('Done one query, num revisions is now ' + str(num_revisions))

    # If more results remain, copy every value in the 'continue' block
    # (e.g. 'rvcontinue') into the parameters to request the next batch
    if 'continue' in response:
        parameters.update(response['continue'])
    else:
        done = True

print(parameters['titles'] + ' had ' + str(num_revisions) + ' revisions in the first 24 hours')
Done one query, num revisions is now 500
Done one query, num revisions is now 607
Panama_Papers had 607 revisions in the first 24 hours
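The paging loop above can be pulled out into a reusable generator. In this sketch, query_all and count_revisions are hypothetical helpers, and fetch is whatever function actually calls the API. Passing fetch in lets the loop be tried offline with a fake API (fake_fetch, which returns made-up batches of 3 and 2 revisions, not real Wikipedia data).

```python
def query_all(fetch, params):
    """Yield the 'query' part of each response, following 'continue' blocks.

    fetch is any function that takes a dict of API parameters and returns
    the decoded JSON response (a dict).
    """
    params = {**params, "continue": ""}  # opt in to "continue"-style paging
    while True:
        response = fetch(params)
        if "query" in response:
            yield response["query"]
        if "continue" not in response:
            break
        # Merge every continuation value (e.g. rvcontinue) into the next request
        params = {**params, **response["continue"]}


def count_revisions(fetch, params):
    """Count revisions across all pages and all batches of results."""
    total = 0
    for query in query_all(fetch, params):
        for page in query["pages"].values():
            total += len(page.get("revisions", []))
    return total


# Offline stand-in for the API: serves two made-up batches (3, then 2 revisions)
requests_seen = []

def fake_fetch(params):
    requests_seen.append(dict(params))
    batch = len(requests_seen) - 1
    sizes = [3, 2]
    response = {"query": {"pages": {"123": {"revisions": [{}] * sizes[batch]}}}}
    if batch + 1 < len(sizes):
        response["continue"] = {"rvcontinue": "next-" + str(batch + 1), "continue": "||"}
    return response


total = count_revisions(fake_fetch, {"action": "query", "prop": "revisions"})
print(total)  # 5
```

With real data, fetch could be `lambda p: requests.get(ENDPOINT, params=p).json()` and params the parameters dictionary from the example above.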

Import files

Once you've uploaded a file to your PAWS home directory, you can open it from your Python code the usual way, as long as it is in the same directory as your notebook.

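NAMES_LIST = "yob2011_short.txt"

boys = {}
girls = {}

# Each line of the file looks like "Sophia,F,21799": name, gender, count
with open(NAMES_LIST, 'r') as names_file:
    for line in names_file:
        print(line)  # the line keeps its trailing newline, so print() adds a blank line
        name, gender, count = line.strip().split(",")
        count = int(count)

        # Store counts under the lowercase name, separately for each gender
        if gender == "F":
            girls[name.lower()] = count
        elif gender == "M":
            boys[name.lower()] = count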
Sophia,F,21799

Isabella,F,19850

Emma,F,18761

Olivia,F,17286

Ava,F,15471

Emily,F,14228

Abigail,F,13221

Madison,F,12351

Mia,F,11503

Chloe,F,10966

Elizabeth,F,10050

Ella,F,9567

Addison,F,9286

Natalie,F,8620

Lily,F,8164

Grace,F,7613

Samantha,F,7375

Avery,F,7331

Sofia,F,7314

Aubrey,F,7167

Brooklyn,F,7151

Lillian,F,6900

Victoria,F,6874

Evelyn,F,6695

Hannah,F,6547

Alexis,F,6508

Charlotte,F,6414

Zoey,F,6388

Leah,F,6372

Amelia,F,6356

Zoe,F,6287

Hailey,F,6258

Gabriella,F,6079

Layla,F,6071

Nevaeh,F,6068

Kaylee,F,6027

Alyssa,F,5996

Anna,F,5641

Sarah,F,5532

Allison,F,5447

Savannah,F,5433

Ashley,F,5392

Audrey,F,5206

Taylor,F,5184

Brianna,F,5171

Aaliyah,F,5102

Riley,F,5026

Camila,F,4965

Khloe,F,4942

Zakarri,M,5

Zakhar,M,5

Zakhari,M,5

Zakry,M,5

Zalynn,M,5

Zaman,M,5

Zamaree,M,5

Zamarius,M,5

Zamiel,M,5

Zamiere,M,5

Zandar,M,5

Zandre,M,5

Zandyn,M,5

Zanthony,M,5

Zari,M,5

Zarrion,M,5

Zaryn,M,5

Zathan,M,5

Zaviyon,M,5

Zaya,M,5

Zayen,M,5

Zayir,M,5

Zayvien,M,5

Zecheriah,M,5

Zeid,M,5

Zeik,M,5

Zell,M,5

Zeph,M,5

Zephram,M,5

Zephyrus,M,5

Zepplin,M,5

Zerik,M,5

Zeryk,M,5

Zeyd,M,5

Zeyden,M,5

Zhair,M,5

Zhi,M,5

Zidaan,M,5

Zihan,M,5

Zihao,M,5

Ziheir,M,5

Zimri,M,5

Zyerre,M,5

Zykell,M,5

Zylar,M,5

Zylas,M,5

Zyran,M,5

Zyshawn,M,5

Zytavion,M,5

print(girls['sophia'])
21799
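The same parsing can be written with Python's built-in csv module, which splits each line on commas for you. This sketch feeds it a few lines copied from the output above (via io.StringIO) instead of the uploaded file, so it runs anywhere; parse_names is a hypothetical helper name.

```python
import csv
import io


def parse_names(lines):
    """Split 'name,gender,count' rows into girls and boys dictionaries."""
    girls, boys = {}, {}
    for row in csv.reader(lines):
        if len(row) != 3:  # skip blank or malformed lines
            continue
        name, gender, count = row
        if gender == "F":
            girls[name.lower()] = int(count)
        elif gender == "M":
            boys[name.lower()] = int(count)
    return girls, boys


SAMPLE = """Sophia,F,21799
Isabella,F,19850
Emma,F,18761
Zakarri,M,5
Zytavion,M,5
"""

girls, boys = parse_names(io.StringIO(SAMPLE))
print(girls["sophia"])          # 21799
print(girls.get("khloe", 0))    # 0, because Khloe is not in this small sample
top_two = sorted(girls, key=girls.get, reverse=True)[:2]
print(top_two)                  # ['sophia', 'isabella']
```

With the uploaded file, you would pass an open file instead: `with open("yob2011_short.txt", newline="") as f: girls, boys = parse_names(f)`. Using `.get()` with a default avoids a KeyError for names that are not in the file.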

IMPORTANT: data licensing and privacy

The site that hosts these notebooks (called "WMF Labs") is run by the Wikimedia Foundation and governed by the following Terms of Use: https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use

Of these terms, the most relevant to us are the rules about the data that can be hosted on WMF Labs servers. Please do NOT place any of the following types of data in your notebook or your home directory:

  • Private data: Data that contains private information about people
  • Proprietary data: Data that is governed under a copyright license that prohibits open sharing or re-use

This means you should NOT upload (e.g. from a CSV) or download (e.g. a JSON dump from an API query) the following types of data to your PAWS notebooks or home directory:

  • data from many for-profit websites, like Yelp, Twitter, Goodreads, NBA.com, etc.
  • private data with PII ("personally identifiable information")
  • any data which you think may be private, or for which you don't know the copyright status

Failure to comply with these rules may lead to your data or notebooks being deleted and/or your Wikipedia account being blocked.

Remember: everything you put into your notebook is publicly accessible!

Jupyter notebooks on GitHub

If you are working with proprietary and/or private data, or you simply don't want to use PAWS, you can run Jupyter on your own computer instead. GitHub can also display (but not run) Jupyter notebooks that you share there. More information here: https://github.com/blog/1995-github-jupyter-notebooks-3

Some example notebooks