Modules, packages, and repositories are three terms that often cause confusion for analysts as they leave BI tools behind and begin to write their own code. I’ll reference these terms in many future posts, and you’ll continue seeing them throughout your career. We’ll cover the basics of each and go over a few examples in Python and R.
Here’s how to think about the three terms from a high level of abstraction: modules are simple files with code revolving around a specific task. Packages expand that concept and contain many modules all designed to help solve a larger problem. Think of modules as a single page of information, packages as folders containing many pages, and repositories as a filing cabinet.
The terms modules and packages are often used interchangeably. While there is a difference, we won’t concern ourselves with it here. I’ll use ‘package’ as the catchall term. You’ll see it used in Python, R, and other programming languages.
Packages
Think of packages as an organizational unit of code. They are fundamental building blocks of programs and an invaluable way to arrange code that you or someone else has written. Packages contain function definitions or constant values with a shared purpose.
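To make the idea concrete, here is a minimal sketch that builds a tiny package on disk and then imports it. The package name shapes, the module circle, and the area function are all hypothetical names made up for illustration, and the package is written to a temporary folder so nothing on your machine is changed permanently.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary folder to hold a hypothetical package called "shapes".
root = Path(tempfile.mkdtemp())
package_dir = root / "shapes"
package_dir.mkdir()

# An __init__.py file marks the folder as a package.
(package_dir / "__init__.py").write_text("")

# A module is a single file: here, one constant and one function.
(package_dir / "circle.py").write_text(
    "PI = 3.14159\n"
    "\n"
    "def area(radius):\n"
    "    return PI * radius ** 2\n"
)

# Tell Python where to look, then import the module from the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from shapes import circle

circle_area = circle.area(2)
print(circle_area)
```

The folder is the package and the file inside it is the module, which is the same page-and-folder picture described earlier.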
Programs need various packages to function. Sometimes even packages need packages to function. It’s not uncommon for different programs to share packages, and when two of them need incompatible versions of the same package you end up in Dependency Hell. It is the worst kind of hell. Thankfully we have package managers to help save us.
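If you are curious which packages a package relies on, Python can tell you. The sketch below assumes Python 3.8 or newer, where the standard library includes importlib.metadata. The helper name direct_dependencies is my own, and requests is only used as an example that may or may not be installed on your machine.

```python
from importlib import metadata


def direct_dependencies(package_name):
    """Return the requirement strings an installed package declares.

    Returns an empty list if the package is not installed or
    declares no requirements.
    """
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []
    return requirements or []


# Prints one requirement per line, or nothing if requests is not installed.
for requirement in direct_dependencies("requests"):
    print(requirement)
```

This only lists direct requirements. Each of those can have requirements of its own, which is how the chains behind Dependency Hell get started.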
Package Managers and Repositories
To use a package in your code, you’ll first need that package on your machine. This is the job of a package manager, which downloads, updates, and generally manages packages. Python’s package manager is called pip. It’s included with your Python installation and it downloads packages from the Python Package Index (PyPI), which acts as a repository for hundreds of thousands of Python packages.
To install a Python package using pip, first open the command line. On Linux and OS X run:
pip3 install package_name
Many older Linux distributions and OS X releases come with Python 2.7 already installed on the system. On those systems, if you simply run:
pip install package_name
the package will be installed for Python 2.7 and you won’t be able to use it in Python 3+. Using pip3 installs the package for Python 3+ instead.
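One way to sidestep the pip versus pip3 question is to ask a specific Python interpreter to run pip for you. The sketch below builds that command from inside Python using sys.executable, which is the path of the interpreter running the script. The function names pip_install_command and install are my own, and the install function is defined but not called, so running this example downloads nothing.

```python
import subprocess
import sys


def pip_install_command(package_name):
    """Build a pip command tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package_name]


def install(package_name):
    """Run the install; raises CalledProcessError if pip reports a failure."""
    subprocess.run(pip_install_command(package_name), check=True)


# Show the command without running it.
command = pip_install_command("requests")
print(" ".join(command))
```

Because the command starts with the interpreter itself, the package lands in the same Python installation that will later import it.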
On a Windows machine run:
py -m pip install package_name
For example, to install the popular requests package you would run:
py -m pip install requests
You’ll see pip download and install the package along with everything it requires to work properly.
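After an install finishes, you can confirm the result from Python itself. This is a small sketch, assuming Python 3.8 or newer for importlib.metadata. The helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


requests_version = installed_version("requests")
print(requests_version or "requests is not installed")
```

If this prints a version number, the package is available to the interpreter you ran the script with.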
If you don’t already have Python 3+ installed on your Windows machine you can follow my instructions here.
Packages in R
R has an embedded way of installing packages. Instead of using a system-level command line, we use the R Console. This makes the process the same on all operating systems.
From the R Console run:
install.packages("package_name")
CRAN is the main repository for R, and during the install process you’ll be asked to select a Secure CRAN Mirror. Choose whichever is closest to you. During installation the console shows its progress while all the requirements are downloaded.
Using Packages in Code
Once a package is installed, it’s easy to incorporate into your code.
In Python we use the import statement to access packages.
import package_name
Once done, you can access any of that package’s functionality by writing:
package_name.value
For instance, using the math package in Python 3.7, math.sqrt(16) returns 4.0 and math.pi gives you the constant π.
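Written out as a short script, the math example might look like the sketch below. It also shows two other common ways to write an import. The variable names are my own choices for illustration.

```python
import math

# Access values with the package_name.value pattern.
square_root = math.sqrt(16)
circle_area = math.pi * 2 ** 2
rounded_up = math.ceil(2.1)

print(square_root)  # 4.0
print(rounded_up)   # 3

# Give the package a shorter alias.
import math as m

same_root = m.sqrt(16)

# Or import a single name so no prefix is needed.
from math import sqrt

also_same_root = sqrt(16)
```

All three styles reach the same function. The plain import keeps it obvious where each name comes from, which helps once a script uses several packages.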
The syntax in R is slightly different, but the concept is the same. First we use the library function to access the package (the R equivalent of Python’s import):
library("package_name")
Then we can access anything contained in the package directly by name, or spell out the package explicitly by using:
package_name::value
For example, the dplyr package ships with a starwars dataset, so we could take a quick look at Star Wars data with dplyr::glimpse(dplyr::starwars).
Final Thoughts
Packages and package managers are a tremendous help to developers. There are packages for everything from 3D graphics processing to database connectors. If you understand your problem and the functionality you need to solve it, there’s likely a package to help you.
There’s no need to reinvent the wheel every time you sit down to write code, so take advantage of packages!