Edit Distance and Jaccard Distance Calculation with NLTK

Edit Distance

Edit Distance (a.k.a. Levenshtein Distance) is a measure of similarity between two strings referred to as the source string and the target string.

The distance between the source string and the target string is the minimum number of edit operations (deletions, insertions, or substitutions) required to transform the source into the target. The lower the distance, the more similar the two strings. 

Continue reading “Edit Distance and Jaccard Distance Calculation with NLTK”

Data Extraction from APIs with Python – Currency Exchange

There are several popular platforms that give developers access to their “web services”, aka “APIs” (Application Programming Interface). So using APIs is the official way for data extraction and doing other stuff allowed by such applications. You can even benefit from some APIs to build other applications. REST APIs usually generate output in JSON or XML format because most of programming languages can handle these formats easily. In fact, JSON (JavaScript Object Notation) is very similar to data types in programming languages; for example, it is very similar to Python dictionaries. If a REST API allows you to get the data you want to retrieve, then you do not need regular web scraping.

Some APIs require authentication (API Key or Client ID and Client Secret, similar to a username and password, so to speak) to control their usage, and some do not. We will explain this later in multiple APIs. For the purpose of clarifying the basics, we will start with a very simple currency rate conversion API that does not require any authentication.

In this tutorial, you will learn how to use Python to extract data from ExchangeRatesAPI.io which is -according to its official website- “a free service for current and historical foreign exchange rates published by the European Central Bank.” Continue reading “Data Extraction from APIs with Python – Currency Exchange”