!pip install dask
Collecting dask
  Downloading dask-0.12.0-py2.py3-none-any.whl (399kB)
Installing collected packages: dask
Successfully installed dask-0.12.0
!pip install toolz
Collecting toolz
  Downloading toolz-0.8.1.tar.gz (44kB)
Installing collected packages: toolz
  Running setup.py install for toolz ... done
Successfully installed toolz-0.8.1
!pip install cloudpickle
Collecting cloudpickle
  Downloading cloudpickle-0.2.1-py2.py3-none-any.whl
Installing collected packages: cloudpickle
Successfully installed cloudpickle-0.2.1
import dask.dataframe as dd
filename = 'spanish/eswiki_20161101_headings.tsv'
df = dd.read_csv(filename, sep='\t')
df.head()
   page_id  page_title  page_ns  heading_level  heading_text
0        7     Andorra        0              2     Toponimia
1        7     Andorra        0              2      Símbolos
2        7     Andorra        0              3        Escudo
3        7     Andorra        0              3       Bandera
4        7     Andorra        0              3         Himno
df.page_id.count()
dd.Scalar<series-..., dtype=int64>
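# This crashed the kernel, most likely because compute() on the full
# DataFrame loads every row into memory as a single pandas DataFrame.
df.compute()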