Anaconda is the most popular Python distribution among data scientists, mostly because it makes their lives easier in multiple ways (which we will discuss below).
In this article, we will give you a high-level overview of the benefits of using the Anaconda Python distribution for machine learning, and we will guide you step by step through installing it on your Mac.
What is Anaconda Python?
Anaconda is a Python distribution maintained by Anaconda, Inc. (formerly Continuum Analytics), a company founded by two Python veterans: Peter Wang and Travis Oliphant.
Besides Python, it also supports R, and it runs on Windows, macOS, and Linux.
What problem does Anaconda solve?
Most importantly, Anaconda tackles a frequent package management challenge faced by data scientists.
Package management with pip
Installing Python packages requires a package manager, the most popular being pip. Older pip releases (before the dependency resolver was reworked in version 20.3) install dependencies without checking whether they conflict with existing packages, so they might break them. Even worse, they can make existing packages produce different results without the user’s knowledge.
To give an example, let’s say we are already using a machine learning package depending on NumPy (the most widely used numerical computation package for Python). Now, when we install some other package that depends on a different version of NumPy, it could break our machine learning library or generate inconsistent results.
Imagine having this in production code.
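To see where such a conflict could come from, it helps to know which installed packages declare a dependency on NumPy and which versions they ask for. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `normalize` and `requirements_on` are our own, chosen for illustration, and the requirement parsing is deliberately simplified.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name so 'Scikit_Learn' and 'scikit-learn' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_on(package_name):
    """Map each installed distribution to the requirement strings it declares on package_name."""
    target = normalize(package_name)
    found = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        dist_name = meta.get("Name", "<unknown>") if meta is not None else "<unknown>"
        for requirement in dist.requires or []:
            # A requirement string starts with the project name, followed by
            # optional extras, version specifiers, or environment markers.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
            if match and normalize(match.group(0)) == target:
                found.setdefault(dist_name, []).append(requirement)
    return found


if __name__ == "__main__":
    # Print every installed package that depends on NumPy, with its version constraint.
    for name, requirements in sorted(requirements_on("numpy").items()):
        print(f"{name}: {', '.join(requirements)}")
```

If two of the listed packages ask for non-overlapping NumPy versions, installing or upgrading one of them is exactly the situation described above.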
Package management with conda
For this reason, Anaconda uses its own package management system, conda. When installing or updating a package, it takes into account the whole environment and the user-defined package preferences. Then, it either tries to work out a configuration compatible with all dependencies or, at least, makes the conflicts explicit.
As a result, you do not have to worry that installing a new package will silently disrupt your entire data analytics workflow.
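The idea of "work out a compatible configuration or make the conflict explicit" can be illustrated with a toy example. This is only a sketch of the principle and not how conda is implemented: the real solver handles many packages at once, while this one looks at a single shared dependency. The function names, the package names (`ml_lib`, `plot_lib`, `legacy_lib`), and the version list are all hypothetical.

```python
import operator

# Comparison operators supported by this toy constraint syntax.
# Longer symbols come first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '1.21.6' into (1, 21, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """Check a version string against a single constraint such as '>=1.19'."""
    for symbol, compare in OPERATORS:
        if constraint.startswith(symbol):
            bound = parse_version(constraint[len(symbol):])
            return compare(parse_version(version), bound)
    raise ValueError(f"Unsupported constraint: {constraint!r}")


def resolve(dependency, available, constraints):
    """Return the newest available version that every package accepts.

    `constraints` maps a (hypothetical) package name to its constraint on
    `dependency`. If no version fits, raise an error that names every
    requirement involved instead of silently picking one.
    """
    compatible = [
        version
        for version in available
        if all(satisfies(version, c) for c in constraints.values())
    ]
    if compatible:
        return max(compatible, key=parse_version)
    details = "; ".join(
        f"{package} needs {dependency}{c}" for package, c in constraints.items()
    )
    raise RuntimeError(f"No compatible version of {dependency}: {details}")


AVAILABLE = ["1.18.5", "1.19.5", "1.21.6", "1.24.4"]  # made-up release list

if __name__ == "__main__":
    print(resolve("numpy", AVAILABLE, {"ml_lib": ">=1.19", "plot_lib": "<1.22"}))
    try:
        resolve("numpy", AVAILABLE, {"ml_lib": ">=1.22", "legacy_lib": "<1.20"})
    except RuntimeError as error:
        print(error)
```

In the first call a version acceptable to both packages exists, so it is selected. In the second call the requirements cannot be met together, and the error message spells out which package asks for what.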
Package management, therefore, is a core feature of the Anaconda distribution. Besides this, however, there are further functionalities with their pros and cons.
We will review them in the next section.
The main differences between Anaconda and system Python
So, besides the different approach to package management, what are the main differences between the Anaconda distribution and the default system version of Python?
First, a side note. People often treat Anaconda and pip as mutually exclusive alternatives, but it is not that simple: Anaconda is not only a package manager but offers other functionality as well.
Because pip is the default package manager of Python, the comparison below also covers pip versus conda.
Pros of using system Python
- pip is the most straightforward way to install Python packages
- pip is often faster than conda (the package manager of Anaconda)
- You get the most recent version of each package
- No company is involved
Cons of using system Python
- Requires caution and possibly manual handling of dependencies
- Relying on pip can be unstable and break dependencies
- Cannot install non-Python packages
Pros of using Anaconda
- Package and dependency management is more extensive, as it can also track non-Python dependencies
- It has its own native virtual environment management system which can integrate different Python versions
- conda environments are isolated environments
- Anaconda is backed by a company, which ensures consistency and support for commercial applications
- It comes with many useful and integrated applications for data science (most importantly Jupyter Notebook)
- It is perhaps the most popular Python distribution nowadays, and the number one in data science development
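Because each conda environment has its own interpreter and packages, it is useful to be able to check which environment a piece of code is actually running in. Here is a minimal sketch; `describe_environment` is our own helper name. It relies on the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that `conda activate` sets, so both values are simply `None` when no conda environment is active.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter and the active conda environment."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Set by `conda activate`; None outside an activated conda environment.
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in two different environments should show different interpreter paths and, if the environments were created with different Python versions, different version numbers.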
Cons of using Anaconda
- The package manager tends to be slow (although the Anaconda team has been actively working on this issue)
- Often the packages are not the most recent version because they undergo more thorough dependency checking
- Not every package accessible through pip is also in conda; however, some packages can be found in conda-forge, which is maintained by the user community
- Installing packages using both pip and conda within the same environment can generate conflicts
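If you do mix pip and conda in one environment, it helps to know which tool installed which package. The sketch below groups installed distributions by the optional `INSTALLER` metadata file. pip records its name there; we assume that conda-built Python packages usually record `conda`, but this is not guaranteed, so treat the output as a hint rather than a definitive inventory. `packages_by_installer` is our own helper name.

```python
from collections import defaultdict
from importlib import metadata


def packages_by_installer():
    """Group installed distribution names by the tool recorded in their INSTALLER file."""
    groups = defaultdict(list)
    for dist in metadata.distributions():
        # read_text returns None when the distribution has no INSTALLER file.
        raw = dist.read_text("INSTALLER") or ""
        installer = raw.strip().lower() or "unknown"
        meta = dist.metadata
        name = meta.get("Name", "<unknown>") if meta is not None else "<unknown>"
        groups[installer].append(name)
    return {installer: sorted(names) for installer, names in groups.items()}


if __name__ == "__main__":
    for installer, names in sorted(packages_by_installer().items()):
        print(f"{installer}: {len(names)} package(s)")
```

A common way to limit conflicts is to install as much as possible with conda first and use pip only for the packages conda cannot provide.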
Overall, we could say that it is best to use Anaconda for data-science-related projects, especially when managing a full ecosystem that includes different Python versions and non-Python elements.
It might be better to stick to system Python, however, for building a customized and robust environment with a narrow focus on a particular Python version.
How to install Anaconda Python on macOS
The most straightforward way to install Anaconda is to download the appropriate installer from the official macOS download page. There you can download it by clicking on the link for your preferred version.
Although it is still possible to download the Python 2.7 version, we recommend going for Python 3.7, as Python 2 reached its end of life in January 2020 and is no longer supported.
There are both a graphical and a command-line installer available. By default, you should be fine with the graphical version. In this tutorial, we also follow that path.
After downloading the package, run it and follow its instructions.
Most of the steps are straightforward; the two main things you can alter are:
- The install location
- Whether to install the PyCharm IDE as well
If you do not have specific preferences regarding these options, you can use the default location and skip the PyCharm installation.
After going through all the steps, you should see a screen confirming that the installation is complete.
Congratulations, you have successfully installed the Anaconda Python distribution!
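If you would like to double-check the installation, one option is to ask conda for its version. The following is a small sketch that does this from Python. `conda_version` is our own helper name, and the check assumes the installer has added conda to your `PATH` (you may need to open a new terminal window first).

```python
import shutil
import subprocess


def conda_version():
    """Return the output of `conda --version`, or None if conda is not on the PATH."""
    executable = shutil.which("conda")
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Some older releases print the version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    version = conda_version()
    print(version if version else "conda was not found on the PATH")
```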
Now, how should you proceed from here? In the next section, we will give you a quick overview of the things which you can do with Anaconda right away.
Getting started with Anaconda Python on macOS
You can use Anaconda either through a GUI (Anaconda Navigator) or through the command line (conda). If you are just getting started with it, we advise you to use the GUI version.
On macOS, you can open the Navigator by opening the Launchpad, and then clicking the Anaconda Navigator icon.
Applications coming with the Anaconda distribution
The home screen of the Navigator shows you the applications which come with the Anaconda distribution.
We do not have space to discuss them in detail, but here is a quick overview of them:
- Jupyter Notebook: A popular web-based interactive environment frequently used by data scientists.
- JupyterLab: The next-generation user interface extending Jupyter Notebook with multiple functionalities.
- Qt Console: A lightweight terminal-like environment supporting many of the IPython-based GUI features of Jupyter Notebook.
- Spyder: A Python IDE explicitly designed for data science purposes.
- Glueviz: A linked-view data visualization package.
- Orange: A component-based visual programming software package designed for data analytics and machine learning applications.
- RStudio: An R IDE popular among data scientists using R.
- Visual Studio Code: The popular IDE. If you launch it from the Navigator, it runs with the configurations of the currently selected environment.
If you are using Anaconda Python, you probably want to use it with Jupyter Notebook.
(If you are not familiar with Jupyter Notebook, you might want to try it, as it is a handy tool. We’ve also written a specific guide about it on our blog.)
You can start it directly by clicking on the “Launch” button in the Navigator’s Home screen (see above), or with the jupyter notebook command from the terminal.
This should open a new tab in your browser, similar to the one below. It will show you files in your home folder.
You can create a new notebook by clicking on the “New” button at the top-right.
This will open a new tab with the new notebook, where you can start working on your data science project.
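As a first cell in the new notebook, you could check which interpreter the notebook is using and whether the libraries you plan to work with are available. This is just a sketch: `available_packages` is our own helper name, and the list of package names is an assumption about what a typical data science project might need.

```python
import importlib.util
import sys


def available_packages(names):
    """Report, for each top-level module name, whether it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    # Hypothetical wish list; adjust it to your own project.
    wanted = ["numpy", "pandas", "matplotlib", "sklearn"]
    for name, found in available_packages(wanted).items():
        print(f"{name}: {'available' if found else 'missing'}")
```

If a package shows up as missing, you can install it into the environment selected in the Navigator and then restart the notebook kernel.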
In this article, we reviewed what the Anaconda Python distribution is and its main benefits.
Its core advantage (especially when compared to pip) is its robust package management system. It also provides an excellent environment management process and several tools (including Jupyter Notebook), all tailored for data science functionalities.
We also reviewed the main steps of installing Anaconda and starting the Navigator and Jupyter Notebook.
As you can see, Anaconda is a well-thought-out and carefully designed distribution. It provides a sound development environment and frees up time you otherwise would have spent on unnecessary grunt work.
Are you interested in learning more about it? We can teach you the intricacies of Anaconda Python and how you can use it for machine learning and data science.
- How to install Anaconda Python on your Mac - June 15, 2020