%pylab inline
Populating the interactive namespace from numpy and matplotlib
!pip install nltk
Collecting nltk
  Downloading nltk-3.2.5.tar.gz (1.2MB)
Requirement already satisfied: six in /srv/paws/lib/python3.6/site-packages (from nltk)
Building wheels for collected packages: nltk
  Running setup.py bdist_wheel for nltk ... error
  error: invalid command 'bdist_wheel'
  Failed building wheel for nltk
  Running setup.py clean for nltk
Failed to build nltk
Installing collected packages: nltk
  Running setup.py install for nltk ... done
Successfully installed nltk-3.2.5
import nltk
nltk.download(['stopwords','punkt','reuters','book'])
[nltk_data] Downloading package stopwords to /home/paws/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/paws/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package reuters to /home/paws/nltk_data...
[nltk_data]   Package reuters is already up-to-date!
[nltk_data] Downloading collection 'book'
[nltk_data]    | ... (downloads and unzips each corpus, grammar,
[nltk_data]    |      and tagger in the collection) ...
[nltk_data]  Done downloading collection book
True
from nltk.book import text4
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text4.concordance("America")
Displaying 25 of 192 matches:
posed in me by the people of united America . Previous to the execution of any 
y times , that no middle course for America remained between unlimited submissi
this dangerous crisis the people of America were not abandoned by their usual g
ay be exposed ) which the people of America have exhibited to the admiration an
ty toward the aboriginal nations of America , and a disposition to meliorate th
onor and integrity of the people of America and the internal sentiment of their
een Spain and the colonies in South America , which had commenced many years be
 Britannic Majesty ' s dominions in America , with other differences on importa
 generations yet to come , and that America will present to every friend of man
route across the isthmus of Central America . It is impossible to conceive that
thmus that connects North and South America as will protect our national intere
, in the Philippines , and in South America are known to everyone who has given
of steamers between North and South America has been brought to the attention o
ates and the western coast of South America , and , indeed , with some of the i
nt ports on the east coast of South America reached by rail from the west coast
action will avail , is the unity of America -- an America united in feeling , i
ail , is the unity of America -- an America united in feeling , in purpose and 
friendship and harbor no hate . But America , our America , the America builded
 harbor no hate . But America , our America , the America builded on the founda
e . But America , our America , the America builded on the foundation laid by t
ligent , dependable popular will of America . In a deliberate questioning of a 
mandate in manifest understanding . America is ready to encourage , eager to in
lization , and we hold a maintained America , the proven Republic , the unshake
ome , it also revealed the heart of America as sound and fearless , and beating
lective strength and consecrate all America , materially and spiritually , body
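Under the hood, concordance just scans the token stream for matches and prints a window of context around each one. A minimal sketch of the idea (a hypothetical helper, not NLTK's actual implementation):

```python
def concordance(tokens, word, width=40, lines=25):
    """Show each occurrence of `word` (case-insensitive) in context.
    A sketch only; NLTK's Text.concordance also centers each line."""
    offsets = [i for i, t in enumerate(tokens) if t.lower() == word.lower()]
    print(f"Displaying {min(lines, len(offsets))} of {len(offsets)} matches:")
    shown = []
    for i in offsets[:lines]:
        left = " ".join(tokens[:i])[-width:]     # context before the hit
        right = " ".join(tokens[i + 1:])[:width] # context after the hit
        row = f"{left} {tokens[i]} {right}"
        shown.append(row)
        print(row)
    return shown
```

Calling it as `concordance(list(text4), "America")` should reproduce a listing like the one above.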
#Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.
text4.similar("citizen")
people states executive world union constitution country time
government nation future president right power effort man one hand
duties administration
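similar works distributionally: two words count as similar when they occur between the same (left, right) neighbor pairs. A rough sketch of that idea (hypothetical helper; NLTK's ContextIndex is more elaborate):

```python
from collections import defaultdict

def similar(tokens, word, num=10):
    """Rank words by how many (left, right) contexts they share with `word`."""
    # Map each word to the set of (left, right) neighbor pairs it appears in.
    contexts = defaultdict(set)
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[cur.lower()].add((prev.lower(), nxt.lower()))
    target = contexts[word.lower()]
    # Score every other word by shared contexts; keep only nonzero scores.
    scores = {w: len(c & target)
              for w, c in contexts.items() if w != word.lower()}
    ranked = sorted((w for w, s in scores.items() if s > 0),
                    key=lambda w: -scores[w])
    return ranked[:num]
```

On text4 this style of counting is why "people", "states", etc. surface: they fill the same slots "citizen" does.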
#Examine just the contexts that are shared by two or more words:
text4.common_contexts(["war", "freedom"])
of_citizens of_and the_to in_and of_in of_the of_of of_that the_of
of_may the_the of_but the_and for_we of_s of_we that_is of_to of_have
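common_contexts keeps only the (left, right) neighbor pairs in which every listed word occurs, which is why the output reads as pairs like of_citizens. A simplified sketch (hypothetical helper):

```python
def common_contexts(tokens, words):
    """Return the (left, right) context pairs shared by all of `words`."""
    # Collect a context set for each requested word.
    ctx = {w.lower(): set() for w in words}
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if cur.lower() in ctx:
            ctx[cur.lower()].add((prev.lower(), nxt.lower()))
    # Intersect: keep only contexts in which every word appears.
    return set.intersection(*ctx.values())
```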
len(text4)
145735
text4.count("democracy")
52
# Distinct words (types)
print(len(set(text4)))
# Average number of uses per type: a rough measure of the text's lexical richness.
len(text4) / len(set(text4))
9754
14.941049825712529
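The token/type ratio computed above is worth wrapping in a reusable helper; the NLTK book defines one roughly like this:

```python
def lexical_diversity(tokens):
    """Average number of times each distinct word (type) is used."""
    return len(tokens) / len(set(tokens))
```

For text4 this gives the 14.94 above: 145,735 tokens spread over 9,754 types. It works on any sequence of hashable items, e.g. `lexical_diversity("a b a b a c".split())`.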