Python Multithreading Tutorial: Concurrency and Parallelism

Discussions criticizing Python often talk about how it is difficult to use Python for multithreaded work, pointing fingers at what is known as the global interpreter lock (affectionately referred to as the “GIL”) that prevents multiple threads of Python code from running simultaneously. Due to this, the threading module doesn’t quite behave the way you would expect it to if you’re not a Python developer and you are coming from other languages such as C++ or Java. It must be made clear that one can still write code in Python that runs concurrently or in parallel and make a stark difference resulting performance, as long as certain things are taken into consideration. If you haven’t read it yet, I suggest you take a look at Eqbal Quran’s article on concurrency and parallelism in Ruby on the Toptal blog.

In this Python concurrency tutorial, we will write a small Python script to download the top popular images from Imgur. We will start with a version that downloads images sequentially, or one at a time. As a prerequisite, you will have to register an application on Imgur. If you do not have an Imgur account already, please create one first.

The scripts in this tutorial has been tested with Python 3.4.2. With some changes, they should also run with Python 2 – urllib is what has changed the most between these two versions of Python.

P.S – I tested with Python 3.6.0 and works fine

Getting Started with Multithreading in Python

Let us start by creating a Python module, named “”. This file will contain all the functions necessary to fetch the list of images and download them. We will split these functionalities into three separate functions:

  • get_links
  • download_link
  • setup_download_dir

The third function, “setup_download_dir”, will be used to create a download destination directory if it doesn’t already exist.

Imgur’s API requires HTTP requests to bear the “Authorization” header with the client ID. You can find this client ID from the dashboard of the application that you have registered on Imgur, and the response will be JSON encoded. We can use Python’s standard JSON library to decode it. Downloading the image is an even simpler task, as all you have to do is fetch the image by its URL and write it to a file.

This is what the script looks like:

Next, we will need to write a module that will use these functions to download the images, one by one. We will name this “”. This will contain the main function of our first, naive version of the Imgur image downloader. The module will retrieve the Imgur client ID in the environment variable “IMGUR_CLIENT_ID”. It will invoke the “setup_download_dir” to create the download destination directory. Finally, it will fetch a list of images using the get_links function, filter out all GIF and album URLs, and then use “download_link” to download and save each of those images to the disk. Here is what “” looks like:

On my laptop, this script took 37.25 seconds to download 82 images.

Please do note that these numbers may vary based on the network you are on. 37.25 seconds isn’t terribly long, but what if we wanted to download more pictures? Perhaps 900 images, instead of 90. Maybe 9000 pictures. It would take too long. The good news is that by introducing concurrency or parallelism, we can speed this up dramatically.

All subsequent code examples will only show import statements that are new and specific to those. For convenience, all of these Python scripts can be found in this GitHub repository.

P.S – To make things easier set the IMGUR_CLIENT_ID variable in your environment:

Using Threads for Concurrency and Parallelism

Threading is one of the most well known approaches to attaining Python concurrency and parallelism. Threading is a feature usually provided by the operating system. Threads are lighter than processes, and share the same memory space.

In our Python thread tutorial, we will write a new module to replace “”. This module will create a pool of 8 threads, making a total of 9 threads including the main thread. I chose 8 worker threads, because my computer has 8 CPU cores and one worker thread per core seemed a good number for how many threads to run at once. In practice, this number is chosen much more carefully based on other factors, such as other applications and services running on the same machine.

This is almost the same as the previous one, with the exception that we now have a new class, DownloadWorker, that is a descendent of the Thread class. The run method has been overridden, which runs an infinite loop. On every iteration, it calls “self.queue.get()” to try and fetch an URL to from a thread-safe queue. It blocks until there is an item in the queue for the worker to process. Once the worker receives an item from the queue, it then calls the same “download_link” method that was used in the previous script to download the image to the images directory. After the download is finished, the worker signals the queue that that task is done. This is very important, because the Queue keeps track of how many tasks were enqueued. The call to “queue.join()” would block the main thread forever if the workers did not signal that they completed a task.

Running this script on the same machine used earlier results in a download time of 7.66 seconds!

Thats 4.8 times faster than the previous example. While this is much faster, it is worth mentioning that only one thread was executing at a time throughout this process due to the GIL. Therefore, this code is concurrent but not parallel. The reason it is still faster, is because this is an IO bound task. The processor is hardly breaking a sweat while downloading these images, and the majority of the time is spent waiting for the network. This is why threading can provide a large speed increase. The processor can switch between the threads whenever one of them is ready to do some work. Using the threading module in Python or any other interpreted language with a GIL can actually result in reduced performance. If your code is performing a CPU bound task, such as decompressing gzip files, using the threading module will result in a slower execution time. For CPU bound tasks and truly parallel execution, we can use the multiprocessing module.

While the de facto reference Python implementation – CPython – has a GIL, this is not true of all Python implementations. For example, IronPython, a Python implementation using the .NET framework does not have a GIL, and neither does Jython, the Java based implementation. You can find a list of working Python implementations here.

Spawning Multiple Processes

The multiprocessing module is easier to drop in than the threading module, as we don’t need to add a class like the threading example. The only changes we need to make are in the main function.

To use multiple processes we create a multiprocessing Pool. With the map method it provides, we will pass the list of URLs to the pool, which in turn will spawn 8 new processes and use each one to download the images in parallel. This is true parallelism, but it comes with a cost. The entire memory of the script is copied into each subprocess that is spawned. In this simple example it isn’t a big deal, but it can easily become serious overhead for non-trivial programs.


It took 7.56 seconds to download 34 images.

Distributing to Multiple Workers

While the threading and multiprocessing modules are great for scripts that are running on your personal computer, what should you do if you want the work to be done on a different machine, or you need to scale up to more than the CPU on one machine can handle? A great use case for this is long-running back-end tasks for web applications. If you have some long running tasks, you don’t want to spin up a bunch of subprocesses or threads on the same machine that need to be running the rest of your application code. This will degrade the performance of your application for all of your users. What would be great is to be able to run these jobs on another machine, or many other machines.

A great Python library for this task is RQ, a very simple yet powerful library. You first enqueue a function and its arguments using the library. This pickles the function call representation, which is then appended to a Redis list. Enqueueing the job is the first step, but will not do anything yet. We also need at least one worker to listen on that job queue.

The first step is to install and run a Redis server on your computer, or have access to a running Redis server. After that, there are only a few small changes made to the existing code. We first create an instance of an RQ Queue and pass it an instance of a Redis server from the redis-py library. Then, instead of just calling our “download_link” method, we call “q.enqueue(download_link, download_dir, link)”. The enqueue method takes a function as its first argument, then any other arguments or keyword arguments are passed along to that function when the job is actually executed.

One last step we need to do is to start up some workers. RQ provides a handy script to run workers on the default queue. Just run “rqworker” in a terminal window and it will start a worker listening on the default queue. Please make sure your current working directory is the same as where the scripts reside in. If you want to listen to a different queue, you can run “rqworker queue_name” and it will listen to that named queue. The great thing about RQ is that as long as you can connect to Redis, you can run as many workers as you like on as many different machines as you like; therefore, it is very easy to scale up as your application grows. Here is the source for the RQ version:

However, RQ is not the only Python job queue solution. RQ is easy to use and covers simple use cases extremely well, but if more advanced options are required, other job queue solutions (such as Celery) can be used.


If your code is IO bound, both multiprocessing and multithreading in Python will work for you. Multiprocessing is a easier to just drop in than threading, but has a higher memory overhead. If your code is CPU bound, multiprocessing is most likely going to be the better choice – especially if the target machine has multiple cores or CPUs. For web applications, and when you need to scale the work across multiple machines, RQ is going to be better for you.

This article was originally published in:

Credits: Image by Joe Armstrong –