MJC brought to my attention the following piece of code which attempts to provide some empirical data comparing the use of threads vs processes in CPython.
This code, which I understand is kept for historical purposes, has some bugs, which I have pointed out in the comment section; but picking those apart is not the point of this article (that would not be very constructive). I believe that a far better contribution would be to write a short piece pointing out a few “gotchas”.
Bit rot took hold and the post above has disappeared, but the rest of the article still applies…
For the rest of this small article, I assume that you are familiar with how threads work in CPython, especially the GIL “issue”. If you are not, then, in layman’s terms, the GIL is one big lock that ensures that only one thread executes Python bytecode per interpreter at any given time; this keeps the interpreter internals (notably reference counting, which is not thread-safe) simple and fast. It should be clarified that the GIL applies to CPython and not to Jython, which maps Python threads to real OS threads that can run in parallel, one per core, the way threads used to work. Jython comes with a performance penalty of its own though, so YMMV. (NB: Ruby 1.8.x suffers from the same “issue”, so no smug Ruby fans need apply 🙂 ) To sum it up: in CPython, only one thread is allowed to use the Python interpreter at any given time. “What?!” I can hear you lamenting, “this will kill my threading performance!” Keep on reading, though; all is not lost.
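To see the effect for yourself, here is a minimal sketch (the function name and iteration count are my own, illustrative choices): two threads grinding through a pure-Python loop are typically no faster than doing the same work sequentially, because the GIL serializes the bytecode.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop: holds the GIL for its entire run
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential baseline: two runs, one after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# "Parallel" version: two threads, but only one executes bytecode at a time
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```

On CPython the threaded run is usually no faster, and can even be slower due to lock contention; the exact numbers depend on your machine.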
First of all, the GIL only ensures that one thread at a time is executing Python bytecode. This is obviously a performance killer in CPU-intensive applications; if you are solving an I/O-bound problem, the effect is much smaller, since CPython releases the GIL around blocking I/O calls. Similarly, if you are using CPython to interface with C code (something that CPython does implicitly for certain library calls), a well-behaved C extension can release the GIL while it does its work. To overcome this limitation for CPU-bound code, the multiprocessing module was introduced, a module that deliberately mirrors the threading API. Multiprocessing works around the GIL by spawning a new Python interpreter in a separate process, each with its own GIL. On a multicore system, this allows for true parallelism, much better than threads. However, someone might assume at first that “wow, since a design goal of the API was to be nearly identical, now I can do something along the lines of :s/threading.Thread/multiprocessing.Process/g and I am home free to reap all the performance benefits”.
Not so fast, junior.
Access to global variables works the same as ever when using threading; with multiprocessing, however, you have to specify explicitly what is shared between the various Python interpreters, which adds a bit of code complexity. A nice security side-effect of using multiprocessing, as opposed to threads, is that by controlling what is really shared between processes you get a poor man’s sandbox for free. (I will update this blog post with some practical examples a bit later)
One might be able to get a few small speedups and proper process control by using a process pool (e.g. you definitely do not want your program to start creating processes uncontrollably; you might hit OS limits, or worse, have a HIDS flag it as privilege abuse or something similar). Given that creating a new process is a somewhat costly operation, in the simplest form you just create a bunch of them up front, leave them “sleeping”, and assign tasks to them as needed. Given that there are a lot of thread pool implementations out there and that the multiprocessing API closely matches, one should be able to code something like that in a small amount of time (in fact, multiprocessing already ships one as multiprocessing.Pool). A nice trick that I have seen in practice is to fire up X processes, where X is the number of cores, and then assign a thread pool to each of them.
In closing: on a multicore system, when solving CPU-intensive problems, you are likely to get speed improvements by using multiprocessing as opposed to threads. As with all forms of parallel programming, subtle bugs might be introduced along the way. I hope that my post gave you a nice starting point for some “gotchas”.