Background
If you have been using Python, you have probably come across the term GIL, or the claim that Python is effectively a single-threaded language. In this post, we will see what the Global Interpreter Lock (GIL) in Python is.
Understanding Global Interpreter Lock (GIL) in Python
The GIL, or Global Interpreter Lock, is a mutex that allows only one thread at a time to hold control of the Python interpreter. This means that only one thread can be executing Python code at any given moment, so a multi-threaded Python program cannot use more than one CPU core at a time for Python code.
This prevents Python's internal memory bookkeeping from being corrupted. With the GIL, reference counts stay consistent, which avoids the dangling pointers and memory leaks described later in this post. Think of it as the alternative to giving every Python object its own lock that must be acquired before the object is accessed. Managing many such locks can cause deadlocks; a single global lock avoids that, but it effectively makes execution of Python code single-threaded.
However, note that even with the GIL there can be race conditions, because not every operation is atomic. Say we have a method that takes an object and adds it at the end of an array. Thread 1 can enter the method, read the index at which it needs to add the new object, and then be suspended (releasing the GIL) before it inserts the object. Thread 2 acquires the GIL, reads the same index, writes its own object there, and is suspended. When thread 1 resumes, it writes to the same index that thread 2 wrote to, overwriting thread 2's data. That lost update is a race condition.
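The scenario above can be reproduced deterministically with a small sketch. The `SlotList` class and its method names are hypothetical, made up for this illustration, and a `threading.Barrier` is used to force the unlucky thread switch that would otherwise happen only occasionally. The second method shows one possible fix, which is to guard the whole read-then-write sequence with a lock.

```python
import threading


class SlotList:
    """Hypothetical fixed-size container with a hand-written append."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_index = 0
        self.lock = threading.Lock()

    def unsafe_append(self, item, pause):
        index = self.next_index      # step 1: read the index
        pause.wait()                 # force the switch: both threads have now read
        self.slots[index] = item     # step 2: write at the (stale) index
        self.next_index = index + 1  # step 3: publish the new index

    def safe_append(self, item):
        # The lock makes the three steps behave as one indivisible operation.
        with self.lock:
            index = self.next_index
            self.slots[index] = item
            self.next_index = index + 1


def run_threads(target, args_per_thread):
    threads = [threading.Thread(target=target, args=args) for args in args_per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo_unsafe():
    box = SlotList(2)
    barrier = threading.Barrier(2)
    run_threads(box.unsafe_append, [("x", barrier), ("y", barrier)])
    return box


def demo_safe():
    box = SlotList(2)
    run_threads(box.safe_append, [("x",), ("y",)])
    return box


unsafe = demo_unsafe()
safe = demo_safe()
print("unsafe:", unsafe.slots, unsafe.next_index)  # one item lost, slot 1 stays None
print("safe:  ", safe.slots, safe.next_index)      # both items stored
```

In the unsafe run both threads write to index 0, so one of the two items is lost even though the GIL was in place the whole time.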
How does GIL prevent memory corruption?
Python manages memory by keeping a count of the references to each object. When this count reaches 0, the memory held by that object is freed.
Let's see an example of this:
import sys

a = ["A", "B"]
b = a
print(f"References to a : {sys.getrefcount(a)}")
The output is: References to a : 3
It is 3 because the first reference is a, the second is b, and the third is a temporary reference created when the list is passed as an argument to sys.getrefcount. When the count reaches 0, the memory associated with this list object is released.
Now, if multiple threads were allowed to run at once, two threads could increment or decrement the count simultaneously. This can lead to two problems:
- There are no actual references to the object, but the count is 1 due to a race condition. This is essentially a memory leak: the object is no longer referenced, but it can never be freed because its reference count is wrong.
- There are still references to the object, but the count is 0 due to a race condition, so Python frees the object's memory. This leads to a dangling pointer.
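To watch the count move in both directions, here is a minimal sketch. The function name `refcount_changes` is made up for this example. Because the absolute number reported by sys.getrefcount can vary between Python versions, the sketch only compares differences against a baseline.

```python
import sys


def refcount_changes():
    a = ["A", "B"]
    base = sys.getrefcount(a)         # baseline count for the list

    b = a                             # a second name now refers to the same list
    after_alias = sys.getrefcount(a)

    del b                             # dropping that name removes one reference
    after_del = sys.getrefcount(a)

    return after_alias - base, after_del - base


print(refcount_changes())  # (1, 0)
```

Adding the alias raises the count by one, and deleting it brings the count back to the baseline.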
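Both failure modes can be traced by hand with a toy model. This is not how the interpreter is implemented; it is only a sketch that spells out an unsynchronized read-modify-write as separate steps, with a hypothetical function name `interleaved_update` and two imaginary threads A and B.

```python
def interleaved_update(count, delta_a, delta_b):
    """Simulate two unsynchronized updates of one reference count."""
    read_a = count             # thread A loads the count
    read_b = count             # thread B loads the same, now stale, value
    count = read_a + delta_a   # thread A stores its result
    count = read_b + delta_b   # thread B overwrites it, losing A's update
    return count


# Leak: both remaining references are dropped, but only one decrement survives.
# Real references: 0, stored count: 1, so the object is never freed.
leaked = interleaved_update(2, -1, -1)

# Dangling pointer in the making: two references are added, one increment is lost.
# Real references: 3, stored count: 2, so the count hits 0 one release too early.
undercounted = interleaved_update(1, +1, +1)

print(leaked, undercounted)  # 1 2
```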
One way to handle the above case is to lock each object in Python before its reference count is updated. But locking comes with drawbacks. With many locks, deadlock becomes possible: if two threads each hold one lock and wait for the lock held by the other, neither can ever proceed. It can also hurt performance, as threads would constantly acquire and release locks.
The alternative is the GIL: a single lock on the interpreter itself. Any thread that wants to execute Python code must first acquire the GIL. With only one lock there is no lock ordering to get wrong, so this kind of deadlock cannot occur, but it effectively makes Python single-threaded for CPU-bound code.
NOTE: Many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.
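A quick way to see this is to run a blocking call in several threads. The sketch below uses time.sleep as a stand-in for blocking I/O, since it releases the GIL while waiting. The function names and the chosen durations are arbitrary, picked only for this illustration.

```python
import threading
import time


def blocking_task(seconds):
    time.sleep(seconds)  # the GIL is released while the thread sleeps


def timed_run(num_threads, seconds):
    threads = [
        threading.Thread(target=blocking_task, args=(seconds,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = timed_run(4, 0.2)
print(f"4 threads, each sleeping 0.2s, took {elapsed:.2f}s in total")
```

If the threads could not overlap, the total would be about 0.8 seconds. Because the waits happen outside the GIL, the total should stay close to 0.2 seconds.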