This is the first part of a series that will introduce you to the NLTK module. In this tutorial, you will learn how to set up NLTK and get started with some of the functions in the module.
What is Natural Language Processing (NLP)?
NLP is the ability of a computer program to understand human language as it is spoken and written. NLP is a component of artificial intelligence (AI). Even if you don’t notice it, NLP is widely used in applications all around us. Some examples are personal assistants like Siri, Cortana, and Alexa, search engines like Google, and spam detection systems.
NLP goes hand in hand with machine learning, which is often used to work out what a sentence means.
Installing and Setting up NLTK
In this tutorial, we will be using PyCharm. To install the dependencies in PyCharm, create a file named “requirements.txt” in the root directory of your project with the following content:
    nltk
    matplotlib
    numpy
A message will appear in the top right corner asking you to install the missing libraries. Click on “Install requirements” and all libraries will be installed.
After this, create a file named “setting-up.py” with the following content:
    import nltk

    nltk.download()
Run this file and a new window will open. Select “all” and click the download button. If you are using the command-line mode instead, type “d” and press Enter to choose “download”, then type “all”.
The download may take some time depending on your internet connection. After the download is complete, you can move to the next section.
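If downloading everything is more than you need, a smaller download may be enough for this tutorial. The sketch below assumes the "book" collection identifier, which NLTK offers for the data used by nltk.book. The function name download_book_data is only illustrative and is not part of NLTK.

```python
def download_book_data():
    """Fetch only the data used by nltk.book instead of the full "all" set."""
    # Imported inside the function so this file can be loaded
    # even before NLTK itself has been installed.
    import nltk

    # Passing an identifier skips the interactive downloader window.
    # nltk.download returns True when the download succeeds.
    return nltk.download("book")
```

Calling download_book_data() once from a script or the Python console should be enough; an internet connection is still required.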
Resources of NLTK Module
To introduce you to some of the resources of the NLTK module, we will start with a simple script named “accessing-texts.py”. In this file, write:
    from nltk.book import *
If you run this, your code will output a list of the introductory texts, labelled text1 to text9.
These are the introductory texts that come with the nltk.book module. They are loaded into memory when you import the module; to print the list again, you can call the texts() function.
To access the texts individually, you can use text1 for the first text, text2 for the second, and so on. Each text has several functions associated with it, which we will talk about in the next section.
Functions of Class Text
If you type print(type(text1)) in your code, you can see that these variables are instances of the Text class from the nltk.text module. This class has some functions that are used extensively in the analysis of sentences.
The first function we will discuss is the concordance function. This function receives a single word as its parameter and prints the occurrences of that word in your text, each with its surrounding context (up to 25 matches by default).
To give you an example, create a file named “basic-functions.py” and copy the following code:
    from nltk.book import *

    # concordance() prints its results itself, so no print() call is needed
    text1.concordance("man")
The output lists each match on its own line, with the word “man” lined up in the middle.
The concordance function shows some words before the word “man” and some words after it. This is useful when you need to know in which context the word appears.
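To make the idea of a concordance concrete, here is a minimal sketch in plain Python that works on a small, made-up token list. It is not NLTK's implementation; the function name keyword_in_context and the window parameter are assumptions chosen for illustration.

```python
def keyword_in_context(tokens, word, window=3):
    """Return each occurrence of `word` with up to `window` tokens on either side."""
    lines = []
    for i, token in enumerate(tokens):
        # Compare in lowercase so "Man" and "man" both match.
        if token.lower() == word.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [token] + right))
    return lines


tokens = "the old man saw the sea and the man smiled".split()
for line in keyword_in_context(tokens, "man"):
    print(line)
```

Each printed line is one occurrence of the word together with its neighbours, which is the same kind of information the concordance output gives you.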
The next function is the similar function. It receives a single word as its parameter and prints the words that appear in similar contexts. If you replace concordance with similar in the code above, you will get a list of words that are used in the same kinds of context as “man”.
Don’t worry if your output is different; it may vary depending on your environment.
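One way to picture "similar context" is to look at the words directly before and after each token. The sketch below is a simplified illustration in plain Python, not the scoring NLTK actually uses; similar_words is a hypothetical helper, and the sample sentence is made up.

```python
from collections import Counter


def similar_words(tokens, word):
    """Rank words that share a (previous, next) context with `word`."""
    # Record the neighbouring words of every token that has both neighbours.
    contexts = {}
    for i in range(1, len(tokens) - 1):
        contexts.setdefault(tokens[i], set()).add((tokens[i - 1], tokens[i + 1]))

    target = contexts.get(word, set())
    scores = Counter()
    for other, seen in contexts.items():
        shared = len(seen & target)  # number of contexts in common
        if other != word and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common()]


tokens = "the man ran . the woman ran . the dog ran . a man sat".split()
print(similar_words(tokens, "man"))
```

Here “woman” and “dog” are reported because, like “man”, they appear between “the” and “ran”.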
The next function is dispersion_plot. To use this function, write the following line in your code:
    text1.dispersion_plot(["man", "woman"])
The function receives a list of words as its parameter and displays a plot with the words on the y-axis, while the x-axis shows in which part of the text each word occurs. In our example, the word “man” is used throughout almost the whole text, whereas “woman” is used mainly around word offsets 5000, 15000 and 25000.
Note: You need to have Tkinter installed in order to use the dispersion_plot function.
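The data behind such a plot is simply the position of each occurrence in the token list. As a rough sketch in plain Python (word_offsets is a hypothetical helper, not an NLTK function), you could compute those positions yourself, which also works when no plot window is available:

```python
def word_offsets(tokens, words):
    """Map each word to the token positions where it occurs."""
    return {w: [i for i, tok in enumerate(tokens) if tok == w] for w in words}


tokens = "a man and a woman met a man".split()
print(word_offsets(tokens, ["man", "woman"]))
```

Each number is an offset along the x-axis of the dispersion plot, and each key corresponds to one row on the y-axis.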
The last function in this tutorial is the count function. This function expects a word as its parameter and returns the number of occurrences of that word. So if you want to see, for example, how many times the word “whale” appears in your text, you can type:
    print(text1.count("whale"))
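Keep in mind that count looks for exact token matches, so “Whale” and “whale” are counted separately. The sketch below shows the difference on a small, made-up token list in plain Python; count_word and its ignore_case parameter are assumptions for illustration, not part of NLTK.

```python
def count_word(tokens, word, ignore_case=False):
    """Count occurrences of `word`, optionally ignoring upper/lower case."""
    if ignore_case:
        word = word.lower()
        return sum(1 for tok in tokens if tok.lower() == word)
    # Exact matching: "Whale" and "whale" are different tokens.
    return tokens.count(word)


tokens = "Whale ! the whale saw another whale".split()
print(count_word(tokens, "whale"))                    # 2
print(count_word(tokens, "whale", ignore_case=True))  # 3
```

If you want a case-insensitive count on a real text, one option is to lowercase the tokens first and count on the result.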
These are the basic functions of the Text class. If you have any questions, leave them in the comments below.