As everything in Python is an object, classes themselves are objects, created by instantiating a special class called a metaclass. Metaprogramming is about overriding some behavior of that metaclass. It is often used to follow the DRY principle (Don’t Repeat Yourself), i.e. to avoid repeating the same or similar code throughout a program. Metaprogramming is thus frequently used by frameworks such as Django, as it helps make their APIs much easier to use.
Imagine we want a given class to keep the old values of some of its attributes. There are several ways to achieve this, but one of them is to use a descriptor – a class that controls access to an attribute.
class Descriptor:
    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        # store the value in the attribute
        instance.__dict__[self.name] = value
        # if there is an attribute [name]_history, add the new value
        if hasattr(instance, self.name + '_history'):
            getattr(instance, self.name + '_history').insert(0, value)
        # otherwise create that attribute
        else:
            setattr(instance, self.name + '_history', [value])

    def __delete__(self, instance):
        del instance.__dict__[self.name]
        # also delete the history attribute if it exists
        if hasattr(instance, self.name + '_history'):
            delattr(instance, self.name + '_history')
The descriptor above overrides the default update behavior, keeping all the values of the desired attribute in a list named “<attribute name>_history”. This makes updating the value slower than usual, but because the descriptor does not override the __get__ action, reads (the most common access) are as fast as usual. A more complex descriptor could also record who made the updates and when.
Note that one can access instance attributes either through the instance dictionary (“instance.__dict__”) or by calling the built-in functions getattr(), hasattr(), setattr() and delattr(). The built-ins are more human-readable but probably not as fast as accessing the dictionary directly. One cannot however use setattr() or delattr() on the tracked attribute itself (the assignment in __set__ and the del in __delete__), as that would end up calling Descriptor.__set__() or Descriptor.__delete__() again, triggering infinite recursion.
Once the descriptor is defined, it can be used as follows:
>>> class MyClass(object):
... attr = Descriptor('attr')
...
>>> obj = MyClass()
>>> obj.attr = "Value"
>>> obj.attr = "New value"
>>> obj.attr
'New value'
>>> obj.attr_history
['New value', 'Value']
Because data descriptors defined at the class level take precedence over instance attributes of the same name (as we have previously seen), we can declare “attr” at the class level. That does not however mean that two instances of MyClass will share the same values: the descriptor’s __set__ method modifies the instance dictionary, not the class attributes.
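A quick check in the interpreter (reusing the Descriptor and MyClass definitions above) confirms that each instance keeps its own history:

>>> obj1 = MyClass()
>>> obj2 = MyClass()
>>> obj1.attr = 1
>>> obj1.attr = 2
>>> obj2.attr = "independent"
>>> obj1.attr_history
[2, 1]
>>> obj2.attr_history
['independent']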
But what if we want a more automatic way to declare such attributes? What if, in a DRY fashion, we want to avoid typing the attribute name twice?
One way is to use metaprogramming. Let’s define a convention where a class attribute “attr_version” is a list of attribute names that should be declared as versioned attributes. And let’s use metaprogramming as the plumbing to make it work.
Each class is created by instantiating the metaclass “type”. It is indeed possible to create a class entirely dynamically by explicitly instantiating “type”. The following class:
class MyClass(object):
    attr = 42
    def display(self):
        print('Value: ' + str(self.attr))
can be written 100% dynamically by using the following code (*), even though the result is not as human-readable:
exec("def display(self):print('Value: '+str(self.attr))")
globals()['MyClass'] =
type('MyClass',
(object,),
{ 'attr': 42, 'display': globals()['display'] })
The above code first defines a function “display” (note that it calls exec() and passes a string, making the creation truly dynamic). It then instantiates “type”, passing 1) the class name, 2) the base class(es) and 3) a dictionary of class attributes (remember, a method is just a callable attribute). It also uses globals() instead of hard-coding variable names (“MyClass = type(…)”, “'display': display”).
It is also possible to create your own metaclass by deriving from “type”:
class MyType(type):
    def __new__(cls, name, bases, clsdict):
        for attr_name in clsdict['attr_version']:
            clsdict[attr_name] = Descriptor(attr_name)
        clsobj = super().__new__(cls, name, bases, clsdict)
        return clsobj
The __new__ method is called when the new class is being created (you can also override __init__, which is called after the class has been created). It receives as parameters the metaclass, the new class’s name, its base class(es) and the new class dictionary, which contains all its attributes. From here, it is easy to fetch the ‘attr_version’ attribute (line 3) and add new class attributes to the dictionary (line 4).
Once this is done, you can use this metaclass the following way:
>>> class MyClass(metaclass=MyType):
...     attr_version = ['attr1', 'attr2']
...
>>> obj = MyClass()
>>> obj.attr1 = "Value"
>>> obj.attr1 = "Other value"
>>> obj.attr1_history
['Other value', 'Value']
>>> obj.attr2 = 42
>>> obj.attr2 = 43
>>> obj.attr2_history
[43, 42]
Pretty much any class has a constructor such as:
class MyClass(object):
    def __init__(self, attr1, attr2, attr3, ...):
        self.attr1 = attr1
        self.attr2 = attr2
        self.attr3 = attr3
        (...)
How could we simplify this boring, repetitive task? We can define another convention where the class attribute “attr_init” contains the list of attributes that should be passed to the constructor, and a metaclass that generates the proper constructor.
class MyType(type):
    def __new__(cls, name, bases, clsdict):
        attr_names = clsdict['attr_init']
        params = str.join('', [', ' + attr_name + '=None' for attr_name in attr_names])
        code = 'def __init__(self' + params + '):\n'
        for attr_name in attr_names:
            code += '    if ' + attr_name + \
                    ' != None: self.' + \
                    attr_name + '=' + \
                    attr_name + '\n'
        exec(code, clsdict)
        clsobj = super().__new__(cls, name, bases, clsdict)
        return clsobj

class MyClass(metaclass=MyType):
    attr_init = ['attr1', 'attr2']
This metaclass dynamically builds a string containing the constructor code, then turns that string into actual Python code by calling exec() with the class dictionary as namespace. Defining MyClass(metaclass=MyType) as shown above thus implicitly generates the following constructor:
def __init__(self, attr1=None, attr2=None):
    if attr1 != None: self.attr1 = attr1
    if attr2 != None: self.attr2 = attr2
In terms of performance, generating MyClass takes more time than writing the same constructor explicitly, as Python needs to parse the code passed to exec() in order to build the constructor. But classes are not generated often, so this shouldn’t be a problem. And once MyType.__new__() completes, the resulting constructor is as fast as if it had been written by hand, as it has been fully compiled into bytecode.
Metaprogramming can be very useful to simplify code. It should however be used carefully: it is easy to forget our own conventions once they become too numerous or complex, making the code harder to maintain.
For those who are interested in further metaprogramming examples, I recommend this video (from which this article is heavily inspired).
(*) If you’re not convinced, run the following commands for both classes and check that the output is the same:
>>> import dis
>>> dis.dis(MyClass)    # looks at the generated bytecode
>>> MyClass.__mro__     # looks at the parent classes
>>> dir(MyClass)        # looks at the class dictionary
Python, like most languages, comes with several built-in collections. Let’s review the various types of collections and how you can create your own.
The first type of collection is iterable types, which means they can be used in a “for elt in collection” loop. Pretty much all the collections in Python are iterable. You can create your own iterable type by implementing __iter__() – which returns an iterator object – and, on the iterator itself, __next__() – which returns the next element and raises StopIteration when there is nothing left.
Here is a simple example of a custom iterable type:
>>> class MyIter(object):
...     def __init__(self, max):
...         self.max = max
...         self.number = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         if self.number >= self.max:
...             raise StopIteration
...         self.number += 1
...         return self.number
...
>>> class MyRange(object):
...     def __init__(self, max):
...         self.max = max
...     def __iter__(self):
...         return MyIter(self.max)
...
>>> obj = MyRange(4)
>>> for nb in obj:
...     print(nb)
...
1
2
3
4
An iterator that supports key/value iterations (e.g. “for key, value in my_collection”) works the same way, except __next__() returns a tuple (key, value) instead of just the value.
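For instance, here is a minimal sketch of such an iterator (the class name PairIter is made up for illustration), fed with a list of (key, value) pairs:

>>> class PairIter(object):
...     def __init__(self, pairs):
...         self.pairs = list(pairs)
...         self.index = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         if self.index >= len(self.pairs):
...             raise StopIteration
...         pair = self.pairs[self.index]
...         self.index += 1
...         return pair
...
>>> for key, value in PairIter([('a', 1), ('b', 2)]):
...     print(key, value)
...
a 1
b 2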
The slice notation “collection[start:end:step]” describes parts of a collection (e.g. my_collection[3:10]). In order to implement this notation, a type must define __getitem__() – plus __setitem__() and __delitem__() if the collection is mutable.
The slice argument is either an integer if the traditional index notation is used (“my_collection[3]”) or a slice object when the slice notation is used. For example, “obj[1:10]” calls “obj.__getitem__(slice(1, 10, None))” under the hood.
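As a sketch (the class is hypothetical, only meant as an illustration), a __getitem__ that accepts both an integer and a slice object could look like this:

>>> class MySequence(object):
...     def __init__(self, data):
...         self.data = list(data)
...     def __getitem__(self, index):
...         if isinstance(index, slice):
...             # Python passed a slice object for obj[start:stop:step]
...             return self.data[index.start:index.stop:index.step]
...         # plain integer index
...         return self.data[index]
...
>>> seq = MySequence(range(10))
>>> seq[3]
3
>>> seq[1:10:2]
[1, 3, 5, 7, 9]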
Python provides several built-in sequences that implement the common sequence operations: membership testing with “in”, concatenation, len(), indexing and slicing with [], min(), max(), index() and count().
Mutable sequences, on top of that, support modification through the [] and del operators, as well as append(), clear(), copy(), extend(), insert(), pop(), remove() and reverse().
The most popular built-in collections in Python are lists, dictionaries, sets, bytes, bytearrays, memoryviews, tuples, ranges, strings and generators. Of these, lists, dictionaries, sets and bytearrays are mutable.
Lists are arrays of elements – and by arrays I mean a pre-allocated contiguous structure. This has several implications. The upsides are a limited memory footprint and very fast random access.
There are however downsides. The first is that if the program appends an element to an array which is full, Python needs to allocate a new, larger one and copy all the elements from the old array to the new one. In order to minimize this, lists track two sizes: the one that len() returns, and the actual allocated size. The two may be equal right after initialization, but the allocated size grows faster than the “official” one: if adding an element requires rebuilding the array, it is rebuilt with room for more than one extra element, as chances are that other elements will be added soon.
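You can observe this over-allocation from Python with sys.getsizeof(): the reported size of the list only jumps every few append() calls, whenever a new, larger array is allocated (the exact figures depend on the CPython version and platform):

import sys

lst = []
previous_size = sys.getsizeof(lst)
for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != previous_size:
        # a new, larger array was just allocated
        print(len(lst), size)
        previous_size = size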
Another consequence is that a list is designed to add elements at the end rather than at the beginning of the list. Consider the following code which is adding new elements at the beginning of the list:
>>> import time
>>> def add1(arr, nb):
...     r = range(nb)
...     t1 = time.time()
...     for _ in r:
...         arr.insert(0, 1)
...     t2 = time.time()
...     print(t2 - t1)
...
>>> add1([], 1000)
0.0019998550415039062
>>> add1([], 10000)
0.06000399589538574
>>> add1([], 100000)
4.750272035598755
>>> add1([], 200000)
18.93508291244507
As we can see, the time to insert elements at the beginning of the list explodes as the number of elements increases: inserting 200,000 objects takes four times as long as inserting 100,000. The reason is that, for each insertion, Python needs to copy all the existing elements, shifting them by one place. This gets expensive when each and every insertion requires copying over 100,000 objects. By comparison, adding a million or even 10 million objects using append() is much faster:
>>> def add2(arr, nb):
...     r = range(nb)
...     t1 = time.time()
...     for _ in r:
...         arr.append(1)
...     t2 = time.time()
...     print(t2 - t1)
...
>>> add2([], 1000000)
0.16200900077819824
>>> add2([], 10000000)
1.3950800895690918
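As an aside, if a program really needs fast insertions at the front, the standard library provides collections.deque, which is optimized for appends and pops at both ends. A rough sketch of the same test (timings will obviously vary from machine to machine):

import time
from collections import deque

def add3(arr, nb):
    t1 = time.time()
    for _ in range(nb):
        arr.appendleft(1)   # O(1) insertion at the front
    t2 = time.time()
    print(t2 - t1)

add3(deque(), 200000)       # stays fast instead of exploding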
Tuples can be seen as immutable lists. They are also allocated as an array of elements, which makes sense since the main downside of an array only shows up when it mutates.
Interestingly, CPython may sometimes reuse tuples (like the empty tuple) to avoid duplicates, or even “resize” them in place instead of deallocating a tuple and allocating a new one.
With Python 3, ranges are always generated dynamically. The only things a range stores are the start point, the end point and the step. This is a change from Python 2, where range() actually pre-computed the numbers (except for xrange, which behaved like Python 3’s range). This means that creating a range of 2 elements is no faster than creating a range of 2 million (iterating through it is another story).
Ranges are immutable in the sense that they cannot be modified after they are created. When a slice is applied to a range (e.g. range(1, 10)[2:3]), a new range gets created. Note that Python does NOT go through an iterator to find the start and end of the new range, but rather computes the new values directly. It uses a similar shortcut when evaluating something like “24 in range(1, 100, 4)”.
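A couple of quick checks in the interpreter illustrate this:

>>> range(1, 10)[2:5]       # a new range is computed, no iteration involved
range(3, 6)
>>> 24 in range(1, 100, 4)  # computed from start/stop/step, not by iterating
False
>>> 25 in range(1, 100, 4)
True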
There are things that vary greatly from language to language. After variable scope, let’s look at object comparison. That is, how are objects compared to each other? How are they evaluated as a condition?
The first question is: how are certain objects converted when evaluated as a condition? If you have:
>>> value = ...
>>> True if value else False
What is the result? Under the hood Python calls “bool(value)” to convert the value to a boolean. In CPython, PyObject_IsTrue() is the function that performs that evaluation:
int
PyObject_IsTrue(PyObject *v)
{
    Py_ssize_t res;
    if (v == Py_True)
        return 1;
    if (v == Py_False)
        return 0;
    if (v == Py_None)
        return 0;
    else if (v->ob_type->tp_as_number != NULL &&
             v->ob_type->tp_as_number->nb_bool != NULL)
        res = (*v->ob_type->tp_as_number->nb_bool)(v);
    else if (v->ob_type->tp_as_mapping != NULL &&
             v->ob_type->tp_as_mapping->mp_length != NULL)
        res = (*v->ob_type->tp_as_mapping->mp_length)(v);
    else if (v->ob_type->tp_as_sequence != NULL &&
             v->ob_type->tp_as_sequence->sq_length != NULL)
        res = (*v->ob_type->tp_as_sequence->sq_length)(v);
    else
        return 1;
    /* if it is negative, it should be either -1 or -2 */
    return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
}
The code tells us the following: True evaluates to true and False to false, None evaluates to false, number-like objects defer to their nb_bool slot, mappings and sequences are true when their length is non-zero, and any other object evaluates to true.
In other words, None, zero, and objects whose length is zero (whether lists, tuples, maps, strings or custom objects) are interpreted as False. Everything else is interpreted as True. Note that in Python 2.x, user classes could define a method __nonzero__; in Python 3 it has been replaced by __bool__.
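For example, in Python 3 a user class can control how it evaluates as a condition by defining __bool__ or, failing that, __len__ – a minimal sketch:

>>> class Box(object):
...     def __init__(self, items):
...         self.items = items
...     def __len__(self):
...         return len(self.items)
...
>>> True if Box([]) else False
False
>>> True if Box([1]) else False
True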
The other evaluation mechanism is when you compare two objects. This may seem like a no-brainer, but it can hide a few surprises.
First of all, Python has two comparison operators: “is” and “==”. The “is” operator is a pure object identity comparison. Considering that Python sometimes reuses built-in objects (but not always), you should be careful with this operator. In the case of strings for example, using “is” can bring unexpected results:
>>> string2 = "ThisIsATest"
>>> string1 = "ThisIsATest"
>>> string1 is string2
True
>>> string1 = "This is a test"
>>> string2 = "This is a test"
>>> string1 is string2
False
As we have previously seen, strings with no space are interned and reused in order to avoid duplicates. As a result, you are looking at the same object on line 3 but at two different objects on line 7. Likewise, comparing numbers with “is” will return a different result depending on whether the number is pre-allocated or not:
>>> nb = 200
>>> nb is 200
True
>>> nb = 1000
>>> nb is 1000
False
The “==” comparison operator (or rich comparison) is a bit more complex and relies on the do_richcompare function, which calls the type method tp_richcompare when defined, as is the case for several built-in types. User classes can define their own comparison operator by defining an __eq__ method.
If both objects being compared have a comparison operator defined, Python will generally use the one from the object on the left-hand side. If only one of them has it defined, Python will use that one.
>>> class MyClass(object):
...     def __init__(self, nb):
...         self.number = nb
...     def __eq__(self, nb):
...         return self.number == nb
...
>>> obj = MyClass(42)
>>> obj == 41
False
>>> obj == 42
True
>>> 42 == obj
True
In general, two objects of different built-in types do not compare equal by default. Two notable exceptions are True and False, which are considered equal to 1 and 0 respectively. Even though a number like 3 evaluates to True as a condition, “3 == True” returns False whereas “1 == True” returns True. Under the hood, True and False are implemented as the numbers one and zero, albeit as objects distinct from the actual numbers 1 and 0. As a result, when combined with numbers they behave as 1 and 0 – whether the operation is a comparison, an ordering, a binary operator, etc. (see Objects/boolobject.c).
/* The objects representing bool values False and True */
struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};
struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};
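A few quick checks show True and False behaving like the numbers 1 and 0 when mixed with integers:

>>> True + True
2
>>> 1 == True
True
>>> 3 == True
False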
In most cases, the “==” operator is preferred, in particular when dealing with numbers or strings. There are however cases where the “is” operator can come in handy.
A first case is when you want to test whether you are dealing with a particular list.
>>> list_ref = [1, 2, 3]
>>> def add_elt(elt):
...     global list_ref
...     if elt is not list_ref:
...         list_ref.append(elt)
...     return list_ref
...
>>> add_elt(list_ref)
[1, 2, 3]
>>> add_elt([1, 2, 3])
[1, 2, 3, [1, 2, 3]]
In the above case, we use “is not” to make sure that “list_ref” is not appended to itself. Using “if elt != list_ref” could work if the function did not mutate anything and just returned “list_ref + [elt]” – who cares whether we pass “list_ref” itself or an exact copy as the function argument? But because the function does mutate “list_ref”, the actual object passed matters.
Another use is when a function can return either zero or False, each value having a different meaning. In PHP this is the case of strpos() (the equivalent of str.find()), which forces you to use “=== false” (the PHP near-equivalent of the “is” operator) to distinguish between the substring being at the very beginning of the string (offset 0) and not being found at all.
While str.find() does not work that way in Python, some functions may. This is where the “is” operator comes in handy. The values None, True and False are singleton objects (i.e. there is only one instance of each). Using the “is” operator checks unambiguously whether a value is False itself and not just something that evaluates to False. Let’s define a function strpos() that behaves just like its PHP counterpart and try to come up with a test that correctly checks whether a string is contained in another – for example, checking that strpos(“This is a test”, “This”) returns the number 0 and that strpos(“This is a test”, “No”) returns False.
>>> def strpos(string, substring):
...     res = string.find(substring)
...     return False if res < 0 else res
...
>>> True if strpos('This is a test', 'This') else False
False # wrong result, it should be True
>>> not (strpos('This is a test', 'This') == False)
False # wrong result, it should be True
>>> strpos('This is a test', 'This') >= 0
True # right result...
>>> strpos('This is a test', 'No') >= 0
True # ... except it always returns True
As we can see, comparing the result with False or with a number (the two kinds of result returned by the function) using the “==” or “>=” operators does not work. Let’s now try with the “is” operator:
>>> not(strpos('This is a test', 'This') is False)
True
>>> strpos('This is a test', 'This') is not False
True
>>> strpos('This is a test', 'No') is not False
False
Using “is not False” is not only accurate, but is also very human-readable.
A non-technical post today: a reflection on the aspects of Python we have studied so far – dynamic typing, immutability, the garbage collector – and their impact on performance.
One of the recurring criticisms of Python and other so-called scripting languages is indeed that they are slow. Interpreted languages are slower than compiled languages, dynamically-typed languages are slower than statically-typed languages (1), etc.
One note about compiled languages, though. Technically, a programming language is neither compiled nor interpreted. Its implementation provides a compiler and/or an interpreter. For instance, CPython (the most popular Python implementation) compiles the code on the fly into bytecode. But there are other Python implementations, some of which compile the code into Java bytecode. And nothing prevents someone from writing an interpreter for any “compiled” language.
What matters most in that regard is statically-typed vs dynamically-typed: as we’ve previously seen, the latter requires much more processing than the former. But this matters less and less, thanks to Moore’s Law.
Moore’s Law states that the number of transistors per square millimeter roughly doubles every 18 months, which is generally interpreted as computing power doubling every 18 months. Nobody knows how long this empirical law will remain valid, but in the meantime, computing power and capacity have been increasing at an exponential rate.
Granted, programs have been consuming more and more resources as well. But the key fact is that resource consumption has been increasing at a slower pace than hardware capacity. As a result, as time goes by, programs need to worry less and less about resources (2).
In the mid ’80s, game programmers were pushing the limits of 8-bit computers (e.g. Apple ][, Commodore 64, Atari 800), to the point where they had to squeeze 12 bytes out of the sound code and data to fit the animation code (yes, twelve bytes!). When the computer had at most 64 KB of RAM (that’s 65,536 bytes) and its CPU could only natively handle integers between 0 and 255, you had to use every optimization you could get. Assembly language was the only way to achieve decent performance.
While modern video games consume far more resources than in the ’80s, hardware capacity has grown at an even faster pace, so there is no need to scramble to save a few bytes of memory. If you consider video games on the iPad (which has much lower hardware specs than a PC or a gaming console), I am not aware of any that requires the latest iPad to run – even resource-hungry games such as the latest installment of Infinity Blade.
That does not mean that resources do not matter. Some large websites still use C++ in the backend. Google is famous for caring a lot about performance – but that is because they deploy their code on hundreds of thousands of servers, and very few companies are in that situation. The point remains that computer resources matter less and less.
Pretty much any language looks painfully slow compared to a lower-level one. Python looks slow compared to Java or .NET: because Python is dynamically typed, its compiler cannot do as much preparatory work as the Java or C# compiler can, and has to delegate a lot of work to the runtime. And because the CPython bytecode compiler runs on the fly, it cannot afford time-consuming optimizations. Java and .NET in turn look like resource hogs compared to C/C++, which compiles directly to machine code rather than to bytecode; C++ has no fancy but resource-hungry VM or garbage collector, and no overhead from immutable strings. And C/C++ itself can sometimes look slow compared to assembly language when it comes to memory management.
But because hardware resources are less and less of an issue, we’ve been able to move to higher-level languages. Another contributing factor is that software complexity has been increasing. Considering the cost of program maintenance, code maintainability takes more and more precedence over performance – especially when you can scale horizontally by adding more servers and processing more work in parallel. C++ strings are extremely efficient compared to Python strings, but wild writes are a plague in C++ (3). On the other hand, Python has very powerful capabilities when it comes to manipulating and transforming strings.
The reason the languages of choice for writing websites have been Perl, PHP, Python or Ruby rather than C++ is that the performance those languages offer is good enough for more and more websites, and that it is much easier to write – and maintain – a website in Python than in C++, let alone in assembly.
This is also why immutability has been used more and more: it helps avoid concurrency problems and makes complex code easier to reason about. In C/C++, there is no real concept of immutability – you can update just about anything you want (and even things you shouldn’t). With Java, strings are immutable. With Python, even numbers are immutable. With functional languages, collections are immutable – to insert a new element you create a copy of the collection containing the new element. While functional languages are nothing new (LISP was released in 1958), machines are now powerful enough to allow the use of immutable collections. This is why a large company such as Twitter switched to Scala, a functional programming language running on top of the Java virtual machine.
So yes, there are cases where Python is not fast enough. But the tradeoff (code which is easier to write and to maintain at the expense of performance) is acceptable in more and more cases.
(1) Things are however not always as straightforward as they seem. In the 2000s, the social site Friendster overcame its plaguing performance problems by rearchitecting its website… and by switching from Java to PHP. Likewise, the V8 JavaScript engine offers impressive performance, and in some cases can run code only 30% slower than equivalent C++ code.
(2) The critical resources have also been shifting. The CPU is indeed less and less the bottleneck, as it is more and more waiting for other resources such as disk or the network. In a lot of scenarios, memory consumption matters more than CPU consumption.
(3) A wild write is when a program, because of a bug, writes to areas of memory it shouldn’t. Pretty much every C/C++ developer has dealt with wild writes at one point or another – particularly when creating or updating strings.
In the first part we looked at some of the consequences of Python being a dynamically-typed language. Let’s now look at other implications, such as the concept of class, how things work under the hood, and performance.
We have previously seen how an object returns an attribute from its dictionary if available, and if not returns the class attribute (still if available). There is however an exception to that attribute lookup rule. If a class attribute is a data descriptor, it will be used even if the object dictionary contains an attribute of the same name. A data descriptor is an object which has both a __get__ and a __set__ method defined.
>>> class Attr(object):
...     def __get__(self, obj, type=None):
...         return 42
...     def __set__(self, obj, value):
...         pass
...
>>> class MyClass(object):
...     attr = Attr()
...
>>> obj = MyClass()
>>> obj.__dict__['attr'] = 'New value'
>>> obj.__dict__
{'attr': 'New value'}
>>> obj.attr
42
We’re using a hack on line 11 to update the object dictionary as using “obj.attr = ‘New value'” would call Attr.__set__(). But even though we’ve modified the object dictionary, calling “obj.attr” returns the result from the class attribute lookup.
The implication is that, no matter what, Python needs to perform a class attribute lookup even if the object dictionary contains the attribute name.
More generally, the concept of class is also affected by the type system.
In a statically-typed language such as C++ or Java, one of the class’s roles is to define the attributes of its instances – whether or not they have a default value. As a result, each instance has its own copy of all the attributes defined in the class. If a class defines an integer attribute “attr”, instantiating this class a million times will create a million copies of this attribute – all with the same default value. The methods, however, are shared among all instances (i.e. it’s the exact same code which is executed regardless of the instance).
In a dynamically-typed language, there is no need to set such attributes at the class level as it can be done after the objects are created (whether in the constructor or later on). In Python, if a class defines an integer attribute “attr”, instantiating this class a million times will not create copies of the attributes. The million instances will share this attribute – unless the constructor sets a value to this attribute, in which case each instance will have its own copy.
Python creates the illusion that a class attribute is defining an individual attribute for each object with a default value. After all, if “MyClass” has an “attr” attribute, “my_object.attr” will return the default value and can be updated at will, independently of other instances. However, “my_object.attr = … ” will create a new value in the object dictionary that will override the class attribute.
This illusion can however be broken in the case of mutable attributes:
>>> class MyClass(object):
...     attr = [1, 2]
...
>>> obj1 = MyClass()
>>> obj2 = MyClass()
>>> obj1.attr.append(3)
>>> obj2.attr
[1, 2, 3]
“obj1.attr” and “obj2.attr” look like object attributes until the array is modified.
Let’s now look at how attribute lookup is implemented in the CPython source code. In particular, let’s look at what happens at line 8 in the code below, when we evaluate the attribute “attr” of the object. Note how “attr” is defined both at the class level and at the object level:
>>> class MyClass(object):
...     attr = 42
...
>>> obj = MyClass()
>>> obj.attr               # MyClass.attr
42
>>> obj.attr = "New value"
>>> obj.attr               # obj.attr
'New value'
When we step through the source code, here is what happens. The bytecode instruction behind “obj.attr” that interests us is LOAD_ATTR (Python/ceval.c, lines 2415-2424), whose purpose is to find the desired attribute at runtime. Following a series of function calls (*), _PyObject_GenericGetAttrWithDict gets called. This function (defined in Objects/object.c, lines 1013-1098) is the heart of attribute lookup: it first walks the class hierarchy (following the MRO) looking for the attribute; if it finds a data descriptor there, the descriptor wins; otherwise it looks in the object dictionary; and only if that fails does it fall back on whatever the class lookup returned (or raise AttributeError).
The first reaction one may have when seeing all the processing performed here is: what about performance? Not only are dictionary lookups expensive, but CPython goes through the MRO chain no matter what, even if the result ends up being discarded in favor of what is in the object dictionary.
This is why I ran a performance test: how long does it take to loop through 10 million attribute lookups, and does it matter whether the object’s class has a deep hierarchy or not?
In the following tests, I checked that the same test with 1 million elements was roughly 10 times faster than with 10 million, to make sure there was no caching of some sort.
>>> import time
>>> class A(object): a = 1
...
>>> class B(A): b = 2
...
>>> class C(B): c = 3
...
>>> class D(C): d = 4
...
>>> class E(D): e = 5
...
>>> class F(E): f = 6
...
>>> def speed(myclass, nb):
...     objects = []
...     for _ in range(nb):
...         objects.append(myclass.__new__(myclass))
...     t1 = time.time()
...     for o in objects:
...         tmp = o.a
...     t2 = time.time()
...     print(t2 - t1)
...
>>> speed(A, 10000000)
0.4630260467529297
>>> speed(F, 10000000)
0.4720268249511719
As we can see, there is not much difference whether we’re using an object with a large hierarchy (F) or only “object” as a parent (A). Around 460-470 milliseconds for 10 million objects.
As a comparison, let’s see how long it takes to go through 10 million simple assignments, without any attribute lookup:
>>> def speed0(myclass, nb):
...     objects = []
...     for _ in range(nb):
...         objects.append(myclass.__new__(myclass))
...     t1 = time.time()
...     for o in objects:
...         tmp = 42
...     t2 = time.time()
...     print(t2 - t1)
...
>>> speed0(A, 10000000)
0.30701780319213867
>>> speed0(A, 10000000)
0.2980170249938965
So around 300 milliseconds. If we replace line 7 with “pass” to get an empty loop, the test takes around 180 milliseconds. So to summarize: an empty loop over 10 million objects costs about 180 ms, a loop doing a plain assignment about 300 ms, and a loop doing an attribute lookup about 460-470 ms – the lookups themselves account for roughly 160 ms, i.e. about 16 nanoseconds per lookup.
As a comparison, an “equivalent” program in C++ takes 7 milliseconds and 14 milliseconds in Java.
So yes, dynamic typing has an impact on performance. Does it matter? Not always, as we will see in the next post.
(*) Here is how it goes: the LOAD_ATTR opcode calls PyObject_GetAttr(), which dispatches to the type’s tp_getattro slot; for ordinary classes this slot is PyObject_GenericGetAttr(), which in turn calls _PyObject_GenericGetAttrWithDict().
Python is a dynamically-typed language, just like PHP, JavaScript or Ruby, and contrary to statically-typed languages such as C/C++, Java or Scala.
In a dynamically-typed language, the code can never know in advance what the type of a variable will be. You can define a function to perform a particular operation on, say, a collection, but unless you explicitly filter out arguments that are not collections, the code is never certain that it is indeed getting one – and the bytecode compiler certainly cannot be certain either.
Using dynamic types has consequences for the language – both from a conceptual standpoint and from an implementation standpoint.
From a conceptual perspective, dynamically-typed languages care more about the attributes and methods supported by an object than its actual type. This is the concept of duck typing: if it quacks like a duck then we consider it’s a duck – whether or not it is actually a duck.
An example of duck typing in statically-typed languages would be Java interfaces. When a Java class implements an interface, it needs to implement certain methods. Once it does, a program can use this class without knowing -nor caring about- its actual type. The difference of course is that this type is statically defined, and the Java compiler will not let you compile your code unless your classes implement all the methods required by the interfaces they use.
Duck typing however has its downsides. Namely, it forces callers to know details about the implementation. Consider the case where we want to create a function “double”:
def double(elt):
    return elt + elt
So far, so good. This function may have been designed with numbers in mind, but it also works with other types such as strings. In fact, it works with any type that defines an __add__ method. And that is where the problem lies: one needs some detail about the underlying implementation of double(). What if we later change that implementation to something like:
def double(elt):
    return 2 * elt
Will it work for all existing uses of double()? For types like numbers or strings, sure. But for other types, who knows…
From an implementation standpoint, Python needs to be able to dynamically check whether a given object has a specific attribute or function. Even in the case of CPython where the code is being compiled into bytecode on the fly, there is only so much the compiler can do. The bytecode instruction that is involved here is LOAD_ATTR whose job is to determine at runtime whether the object has the desired attribute and push it onto the stack if it exists (as previously shown, a method is just an attribute of type function). Since an object in Python is really a map from strings to values, the logical implementation choice for such a lookup is a dictionary(1).
I do not know whether this is a consequence of using dictionaries for attribute lookup or the cause of it, but dynamically-typed languages typically allow adding new attributes to existing objects and defining new methods on already defined classes – something that is easy to implement when you are already using a dictionary to map attribute names to values.
To recap: in Python an object is essentially a dictionary of attributes, a method is just a callable attribute, and every attribute access is resolved at runtime through dictionary lookups – which is also what makes it possible to add attributes and methods on the fly.
This is in stark contrast with statically-typed languages, where the compiler knows exactly which attribute or method needs to be accessed. In a language such as C++, objects are pretty much an array of attributes. All the instances of the same class have the exact same size. The C++ compiler knows exactly where the attribute will be in memory (e.g. at offset 8 from the beginning of the object). Likewise, it knows exactly which method is being called by every line of code (2).
Dynamic typing thus has an impact on performance as CPython must 1) use a dictionary lookup each time an attribute needs to be accessed and 2) perform that lookup at runtime.
Inheritance complicates attribute lookup even further: the runtime may need to check for an attribute in the class object and in any parent class. As a general rule though, Python returns the attribute value from the object dictionary if there is one. If there isn’t, it returns the result of a class attribute lookup following the Method Resolution Order (MRO).
Let’s consider the following code:
>>> class MyClass1(object):
...     attr1 = 1
...     attr2 = 2
...
>>> class MyClass2(MyClass1):
...     attr2 = 3
...
>>> obj = MyClass2()
>>> obj.attr1              # MyClass1.attr1
1
>>> MyClass1.attr2 = 42
>>> MyClass1.attr3 = 43
>>> obj.attr3              # MyClass1.attr3
43
>>> obj.attr2              # MyClass2.attr2
3
>>> obj.attr2 = 44         # obj.attr2
As you can see, you can dynamically add new attributes to MyClass1 (line 12) and they are automatically reflected in the object (lines 13, 14). You can define a new class method this way, by adding a lambda function as a class attribute. Updating “MyClass1.attr2” doesn’t however affect “obj” (lines 15, 16), as MyClass2 overrides it (line 6). Finally, you can override any class attribute in the object itself (line 17).
You can see the result by looking at the dictionaries of the various objects and classes involved. The object dictionary contains the value set at line 17. MyClass2’s dictionary contains the value for “attr2” set in its class definition. And MyClass1’s dictionary contains the values for “attr1” (set at line 2), “attr2” and “attr3” (set at lines 11 and 12):
>>> obj.__dict__
{'attr2': 44}
>>> MyClass2.__dict__
mappingproxy({'__module__': '__main__', 'attr2': 3, '__doc__': None})
>>> MyClass1.__dict__
mappingproxy({'__module__': '__main__',
'attr3': 43, 'attr2': 42, 'attr1': 1,
'__dict__': <attribute '__dict__' of 'MyClass1' objects>,
'__weakref__': <attribute '__weakref__' of 'MyClass1' objects>,
'__doc__': None})
Note that the MRO here is pretty simple. It is a bit more complex when using mixins. In any case, a class “__mro__” attribute tells what classes to look at and in what order:
>>> MyClass2.__mro__
(<class '__main__.MyClass2'>, <class '__main__.MyClass1'>, <class 'object'>)
Last but not least, dir(obj) gives all the attributes visible for “obj”. This includes the attributes declared in “obj” as well as those declared in MyClass1 and MyClass2:
>>> dir(obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'attr1', 'attr2', 'attr3']
(1) This is not always the case. V8, the JavaScript engine used in Chrome and Node.js, is sometimes using the concept of hidden class and can represent objects as an array of values, avoiding costly dictionary lookups.
(2) A notable exception is virtual functions, where the C++ compiler has no way of knowing which actual method will be called at runtime. But even in this case, the way the C++ runtime gets around it (virtual tables) is pretty efficient and does not involve a dictionary lookup.
In Python, pretty much everything is an object, whether it is a number, a function… or even None.
>>> type(1)
<class 'int'>
>>> type('test')
<class 'str'>
>>> type('test'.__str__)
<class 'method-wrapper'>
>>> def func():
...     pass
...
>>> type(func)
<class 'function'>
>>>
>>> class MyClass(object):
... pass
...
>>> obj = MyClass()
>>> type(MyClass)
<class 'type'>
>>> type(obj)
<class '__main__.MyClass'>
>>> type(None)
<class 'NoneType'>
Python is using a pure object model where classes are instances of a meta-class “type” (in Python, the terms “type” and “class” are synonyms). And “type” is the only class which is an instance of itself:
>>> type(42)
<class 'int'>
>>> type(int)     # same as type(type(42))
<class 'type'>
>>> type(type)    # same as type(type(type(42)))
<class 'type'>
This object model can be useful when we want information about a particular resource in Python. Except for Python keywords (e.g. if, def, global), using “type(<name>)” or “dir(<name>)” – or just typing the resource name and pressing enter – works on pretty much anything.
>>> super
<class 'super'>
>>> abs
<built-in function abs>
>>> str
<class 'str'>
>>> dir
<built-in function dir>
Objects can have methods and attributes. Let’s look at how the two differ by examining the following code and the disassembled bytecode behind method2():
>>> class MyClass(object):
...     attr = 42
...     def method1(self):
...         pass
...     def method2(self):
...         self.method1()
...         return self.attr
...
>>> dis.dis(MyClass.method2)
6 0 LOAD_FAST 0 (self)
3 LOAD_ATTR 0 (method1)
6 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
9 POP_TOP
7 10 LOAD_FAST 0 (self)
13 LOAD_ATTR 1 (attr)
16 RETURN_VALUE
The only difference between “self.method1()” and “self.attr” is the CALL_FUNCTION instruction, which simply tells the CPython runtime to execute a function based on what is on the stack. From an object perspective, a method is just an attribute whose type is “method” or “method-wrapper” (1). You can actually make an attribute callable by defining __call__ on its class:
>>> class Attr(object):
...     value = 42
...
>>> class MyClass(object):
...     attr = Attr()
...
>>> obj = MyClass()
>>> Attr.__call__ = lambda self: self.value
>>> obj.attr.value
42
>>> obj.attr()
42
In the above example we did not even define __call__ inside the Attr class definition but afterwards, by simply adding a class attribute that is a lambda function. The method however needs a “self” argument; otherwise it would have to be applied to the class (like a static method) and not to an instance.
To take another example, “type(42)” is not a call to a built-in function like “dir(42)”, but is the same as “type.__call__(type, 42)”. In other words, we are calling the __call__ method of the class “type”, passing as arguments the instance “type” (self) and the object 42 (2).
The fact that methods are just a special type of attribute helps explain why the function dir() returns a list of all the combined attributes and methods for a given object:
>>> class MyClass(object):
...     attr = 42
...     def test(self):
...         pass
...
>>> obj = MyClass()
>>> dir(obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'attr', 'test']
You can see that the last two elements in the list are the “attr” attribute and the “test” method defined in the class. All the other attributes use the __<name>__ notation, which is used for descriptors – object attributes with binding behavior. For example, obj.__dir__() is called under the hood by dir(obj). Likewise, __le__ is a method that can be overridden in MyClass to implement obj1 <= obj2.
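For example, a minimal sketch (the Money class is made up for illustration) overriding __le__ so that instances support the <= operator:

>>> class Money(object):
...     def __init__(self, amount):
...         self.amount = amount
...     def __le__(self, other):
...         return self.amount <= other.amount
...
>>> Money(5) <= Money(10)
True
>>> Money(15) <= Money(10)
False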
One could think that number literals do not behave like regular objects. Even though dir(42) shows a method __add__, trying to call 42.__add__ fails:
>>> 42.__add__(4)
File "<stdin>", line 1
42.__add__(4)
^
SyntaxError: invalid syntax
But the error has to do with parsing and not how the object 42 behaves. Python indeed interprets the first three characters (“42.”) as a float, and as a result does not understand what to do with “__add__”. You can however call the method on 42 by using a few tricks:
>>> 42 .__add__(4)     # inserting a space after 42
46
>>> (42).__add__(4)    # wrapping 42 inside ()
46
>>> 42..__add__(4)     # using 42.0 instead of 42
46.0
Let’s now dive in the C implementation to see how are objects represented.
Those objects are manipulated under the hood as a C structure called PyObject. Ironically, the CPython object model is implemented in C, a language that is not object-oriented.
typedef struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
You will notice the two following attributes: ob_refcnt, the object’s reference count, and ob_type, a pointer to the object’s type.
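You can peek at ob_refcnt from Python itself with sys.getrefcount(). Note that the reported value is one higher than you might expect, because the argument passed to the function is itself a temporary reference:

>>> import sys
>>> data = [1, 2, 3]
>>> sys.getrefcount(data)
2
>>> alias = data
>>> sys.getrefcount(data)
3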
Python comes with some built-in classes, such as int, str, list, but also function or class.
Contrary to a language such as Ruby, where everything is also an object, Python does not allow adding new attributes or methods to built-in types such as int or str, let alone redefining existing methods:
>>> number = 10
>>> number.attr = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'attr'
The declarations of these objects are in the Include directory, and you can find in the Objects directory the implementations of several types: int (Objects/longobject.c), str (Objects/unicodeobject.c), list (Objects/listobject.c), user-defined classes (Objects/classobject.c), functions (Objects/funcobject.c), etc.
Each of those files defines a PyTypeObject instance which represents the type. Each PyTypeObject instance mostly contains functions that describe the behavior of the type. For example, tp_getattro and tp_setattro, when defined, are the functions used respectively to read and to assign a value to an attribute. The absence of tp_setattro for the “int” type explains why it is not possible to add or change an attribute on an integer. tp_as_sequence and tp_as_mapping point to tables of functions implementing the standard operations for sequences and mappings respectively.
PyTypeObject PyLong_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"int", /* tp_name */
offsetof(PyLongObject, ob_digit), /* tp_basicsize */
sizeof(digit), /* tp_itemsize */
long_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
long_to_decimal_string, /* tp_repr */
&long_as_number, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
(hashfunc)long_hash, /* tp_hash */
0, /* tp_call */
long_to_decimal_string, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE |
Py_TPFLAGS_LONG_SUBCLASS, /* tp_flags */
long_doc, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
long_richcompare, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
long_methods, /* tp_methods */
0, /* tp_members */
long_getset, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
long_new, /* tp_new */
PyObject_Del, /* tp_free */
};
When the program is defining a user class, the runtime creates a new type for that class. Here is an example of such a type inside the C debugger:
(gdb) print *tp
$67 = {ob_base = {ob_base = {_ob_next = 0x98a6034, _ob_prev = 0x98a3058, ob_refcnt = 6,
ob_type = 0x82a9d60}, ob_size = 0},
tp_name = 0x98a3070 "MyClass",
tp_basicsize = 24,
tp_itemsize = 0,
tp_dealloc = 0x8073ac6 <subtype_dealloc>,
tp_print = 0,
tp_getattr = 0,
tp_setattr = 0,
tp_reserved = 0x0,
tp_repr = 0x8078847 <object_repr>,
tp_as_number = 0x98563c8,
tp_as_sequence = 0x985645c,
tp_as_mapping = 0x9856450,
tp_hash = 0x805f28f <PyObject_HashNotImplemented>,
tp_call = 0,
tp_str = 0x8078a2e <object_str>,
tp_getattro = 0x805ff70 <PyObject_GenericGetAttr>,
tp_setattro = 0x806029e <PyObject_GenericSetAttr>,
tp_as_buffer = 0x9856484,
tp_flags = 808448,
tp_doc = 0x0,
tp_traverse = 0x8073801 <subtype_traverse>,
tp_clear = 0x8073a31 <subtype_clear>,
tp_richcompare = 0x8080b08 <slot_tp_richcompare>,
tp_weaklistoffset = ,
tp_iter = 0,
tp_iternext = 0x805fa55 <_PyObject_NextNotImplemented>,
tp_methods = 0x0,
tp_members = 0x9856494,
tp_getset = 0x82a9b40,
tp_base = 0x82aa020,
tp_dict = ,
tp_descr_get = 0,
tp_descr_set = 0,
tp_dictoffset = 16,
tp_init = 0x8078462 <object_init>,
tp_alloc = 0x807360b <PyType_GenericAlloc>,
tp_new = 0x8078515 <object_new>,
tp_free = 0x810b295 <PyObject_GC_Del>,
tp_is_gc = 0,
tp_bases = (<type at remote 0x82aa020>,),
tp_mro = (<type at remote 0x98562fc>, <type at remote 0x82aa020>),
tp_cache = 0x0,
tp_subclasses = 0x0,
tp_weaklist = <weakref at remote 0x98a0974>,
tp_del = 0,
tp_version_tag = 189
}
Note how the tp_getattro and tp_setattro point to generic methods PyObject_GenericGetAttr and PyObject_GenericSetAttr.
In the next article we will see how attribute lookup works.
(1) While in CPython methods are just a particular type of attribute, this is not the case for every dynamically-typed language. In Ruby for example you only access attributes through methods: something like “my_obj.my_attribute = 42” is syntactic sugar for calling the object’s setter method to modify its private attribute “my_attribute”.
(2) In most object-oriented languages, “my_object.my_method(argument)” is the same as “MyObjectClass.my_method(my_object, argument)”. In the case of type(int), the confusing part is that “type” is both the class and the instance.
Python, like most modern languages, has its own garbage collector (GC). But how does the CPython GC work?
First of all, does it really matter? After all, a GC is a GC, right? Well, not exactly. The Java GC has evolved quite a bit since the days of Java 1.0 (you can find a great deal of information about the various algorithms used here). In an interview, three Twitter developers explained the move from Ruby to Scala (a language that runs on top of the Java VM), and one of the reasons was that “because Ruby’s garbage collector is not quite as good as Java’s, each process uses up a lot of memory”. So yes, the GC can matter.
In the rest of this post, the term “collection” refers to the action of garbage-collecting objects which have become unreachable. It does NOT refer to structures such as lists or dictionaries.
The main algorithm used for CPython is based on reference counting. Each object in Python contains a reference count – that is, the number of times it is referenced by another object or by a variable. Once this reference count reaches zero, it is deleted on the spot. Because that object may reference other objects (thus decreasing their reference count), this can trigger a chain reaction where a lot of objects are being deleted in a row.
Whereas traditional GC algorithms look for the objects which are reachable (collecting the objects that were not marked as reachable once all the reachable objects have been accounted for), Python looks for the objects which have become unreachable.
This approach however generates some overhead in memory and CPU. First of all, every single object – number, string, etc. – must carry a reference count. There is also a CPU overhead, as that reference count must be properly maintained. Even for code as simple as:
number = 42
number += 1
Python needs to first increment 42’s reference count (line 1) then in line 2 decrement it and increment 43’s reference count. But this extra processing is actually minor compared to the processing that Python goes through just to find the “number” variable, determine its type, etc.
The one thing reference counting does not handle is reference cycles. Imagine a circular linked list, or an object referencing itself: even if these objects are unreachable, their reference count is still at least 1.
This is why CPython has an algorithm to detect reference cycles, implemented in the function collect(). It only considers container objects (i.e. objects that can contain a reference to one or more other objects): lists, dictionaries, user class instances, etc. As an extra optimization, the GC ignores tuples containing only immutable types (ints, strings… or tuples containing only immutable types).
For this, CPython maintains two doubly-linked lists: a list of objects to be scanned, and a tentatively unreachable list.
Let’s take the case of a circular linked list which has one link referenced by a variable A, and one self-referencing object which is completely unreachable.
1. When the GC starts, it has all the container objects it wants to scan in a first list (more on that later). Because most objects turn out to be reachable, it is more efficient to assume all objects are reachable and move them to an unreachable list if need be, rather than the other way around. On top of its reference count (the number of objects referencing it), each container object also has a gc_ref field initially set to that reference count:
2. The GC then goes through each container object and decrements by 1 the gc_ref of any other object it is referencing. In other words, we only care about the references from outside the “objects to scan” list (like variables) and not references from other container objects inside that list.
3. The GC scans again the container objects. The objects whose gc_ref is zero are marked as GC_TENTATIVELY_REACHABLE and moved to the tentatively unreachable list. In the following graph, the GC processed the “link 3” and “link 4” objects but hasn’t processed “link 1” and “link 2” yet.
4. Let’s assume that the GC scans next the “link 1” object. Because its gc_ref is equal to 1, it is marked as GC_REACHABLE.
5. When the GC encounters an object which is reachable, it traverses its references to find all the objects that are reachable from it, marking them as well as GC_REACHABLE. This is what happens to “link 2” and “link 3” below as they are reachable from “link 1”. Because “link 3” is reachable after all, it is moved back to the original list.
6. When all the objects are scanned, the container objects in the tentatively unreachable list are really unreachable and can thus be garbage collected.
In order to limit the time each garbage collection takes, the CPython GC is using the concept of object generation. The idea is that most objects have a very short lifespan and can thus be collected shortly after their creation. The flip side is that the older an object, the less likely it is to be unreachable.
So CPython defines three generations of objects (0, 1 and 2). Every new object starts in generation 0. If it survives a garbage collection, it is moved to generation 1, where it will be surveyed for collection less often. If it survives another GC round, it is moved to generation 2, where it is surveyed the least often.
The concept of object generation is a common optimization nowadays. The garbage collectors for Java and V8 (the JavaScript engine used in Chrome and Node.js) use a similar concept: young generation / old generation for Java, new-space / old-space for V8.
The next question is: what triggers garbage collection? Whenever the number of objects in a particular generation exceeds a certain threshold (which can be configured through gc.set_threshold()), collection starts. Note that when the GC decides to collect a given generation, it also collects the younger generations (e.g. it would never collect only generation 1 without also collecting generation 0).
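The gc module exposes these knobs directly; the default thresholds in CPython are (700, 10, 10):

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.set_threshold(1000, 15, 15)   # make generation-0 collections less frequent
>>> gc.get_threshold()
(1000, 15, 15)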
As a second optimization, CPython also limits the number of “full” collections, i.e. collections that go through all three generations. A full collection is only run when the number of objects that have survived since the last “non-full” collection (that is, a collection of generation 0 or 1) exceeds 25% of the number of objects that survived the last full collection. This 25% ratio is hard-coded and cannot be configured.
As written in a comment in the GC source code: “each full garbage collection is more and more costly as the number of objects grows, but we do fewer and fewer of them”
There is however another issue. In any language, the garbage collector’s worst enemy is finalizers – user-defined methods which are called when the GC wants to collect an object (in Python a finalizer is created by defining the method __del__). What if a finalizer makes one or more objects reachable again? This is why, up to Python 3.3, container objects that are part of a reference cycle and have a finalizer are never garbage-collected.
In the code below, we define a class with a finalizer. We then instantiate that class and add a self-reference to the object in order to test the GC.
>>> class MyClass(object):
...     def __del__(self):
...         pass
...
>>> my_obj = MyClass()
>>> my_obj.ref = my_obj
>>> my_obj
<__main__.MyClass object at 0x00000000025FCEF0>
We then remove the reference to the object and run the garbage collector.
>>> import gc
>>> del my_obj
>>> gc.collect()
2
>>> gc.garbage
[<__main__.MyClass object at 0x00000000025FCEF0>]
gc.garbage returns a list of uncollectable objects (objects that are marked as unreachable but cannot be collected) which now contains the object we tried to delete.
PEP 442 (a Python Enhancement Proposal), implemented in Python 3.4, however changes that behavior. Starting with Python 3.4, during a collection, the finalizers of all the objects in an unreachable group of circular references are called. Objects which become reachable again are “revived”. Then, provided those finalizers did not make the group reachable again, all references are cleared and all the objects are collected. Note that this does not work if you set the GC debug mode (thank you to Antoine Pitrou, the author of PEP 442, for the tip).
But Python 3.4 is not a license to do anything you like with finalizers. If a finalizer makes an object reachable again, its side effects (like writing to a file) will not be rolled back.
So the moral of the story is that you should be very careful with finalizers and only use them when you cannot do otherwise. Remember as well that you cannot tell when the object will really be deleted, even when you explicitly run the GC (the object could still be reachable from somewhere else). Likewise, the fact that the finalizer has been called is not a guarantee that the object will actually be collected.
With all this, what can you do if you believe garbage collection is hampering your application performance?
An immutable structure is a structure that does not change once it has been created. While the concept is nothing new, it is being used more and more for its benefits. Let’s review how Python handles immutability.
This has already been addressed in a previous post: numbers in Python are immutable. You do not change the content of a variable, you make it point to a new number object. Several other languages also manage numbers as objects (Smalltalk, Objective-C, Ruby); I do not know, however, whether they all make numbers immutable (Ruby seems to).
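A quick way to see this is to look at the id of a name before and after an “update” (the id values below are purely illustrative):
>>> number = 42
>>> id(number)
1633976784
>>> number += 1        # "number" now points to a brand new int object
>>> id(number)
1633976816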
Strings are typically the first immutable type a programming language implements. Once a string is created in Python, it is never updated again – only deleted when it is no longer used. You want to change a single character of that string? A new string will be created. What happens to the old string? It gets garbage-collected – provided it is not used somewhere else.
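For example (again, the id values are illustrative):
>>> text = 'hello'
>>> text[0] = 'H'                 # strings cannot be modified in place
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> id(text)
42899312
>>> text = 'H' + text[1:]         # a new string object is created instead
>>> id(text)                      # different object; the old one becomes garbage
42899424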
Immutable strings can however generate a lot of garbage objects (in a future post we will analyze how the CPython garbage collector works). Imagine you want to read a file and store the result in a variable.
file = open('myfile.txt', 'r')
text = ""
for line in file:
    text += line
Every loop iteration generates two garbage objects: one for the line just read from the file, and one for the old version of “text”. If you are reading a file composed of 100 lines of 10 bytes each (so a 1000-byte file), the whole operation generates 100 garbage objects for the lines read (for a total of 1000 bytes) and 99 garbage objects for all the previous versions of “text” (for a total of 10 + 20 + 30 + … + 990 = 49500 bytes), for a grand total of 1000 + 49500 = 50500 bytes worth of garbage objects. That is over 50 times the size of the file (not even counting the per-object overhead)!
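This is why the idiomatic way to build a string from many pieces is to accumulate them in a mutable list and join them once at the end, or simply to read the whole file at once. A minimal sketch, assuming the same myfile.txt:
file = open('myfile.txt', 'r')
# accumulate the lines in a (mutable) list, then build the final string only once
lines = []
for line in file:
    lines.append(line)
text = ''.join(lines)

# or, even simpler, read everything in one go:
# text = open('myfile.txt', 'r').read()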
Because strings are immutable, the same string object can however be shared by several variables. For example:
>>> def func():
... print(id('Hello' + 'World'))
... print(id('HelloWorld'))
...
>>> func()
42900400
42900400
The two strings generated have the same id, which means they point to the exact same object (in CPython, id() actually returns the memory address of the object). Since strings are immutable, it does not matter to the user whether two strings that contain the same text share the same object or not (unless you compare ids or use “string1 is string2”). Now, how does Python detect that these are the same string? Let’s disassemble the function:
>>> import dis
>>> dis.dis(func)
2 0 LOAD_GLOBAL 0 (print)
3 LOAD_GLOBAL 1 (id)
6 LOAD_CONST 4 ('HelloWorld')
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
15 POP_TOP
3 16 LOAD_GLOBAL 0 (print)
19 LOAD_GLOBAL 1 (id)
22 LOAD_CONST 3 ('HelloWorld')
25 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 POP_TOP
32 LOAD_CONST 0 (None)
35 RETURN_VALUE
>>>
>>> func.__code__.co_consts
(None, 'Hello', 'World', 'HelloWorld', 'HelloWorld')
As you can see, the compiler detects the ‘Hello’ + ‘World’ operation and compiles it as ‘HelloWorld’. Interestingly, it nonetheless stores two different constants for the two strings (indices 4 and 3, as shown by co_consts above). But Python has another optimization:
>>> string1 = 'HelloWorld'
>>> string2 = 'HelloWorld'
>>> id(string1)
35924656
>>> id(string2)
35924656
CPython keeps a dictionary of so-called interned strings, which it reuses for variable names, attribute/method names and some constants (here, the string ‘HelloWorld’ is compiled as a constant). This does not however work with strings that contain white space:
>>> string1 = 'Hello W'
>>> string2 = 'Hello W'
>>> id(string1)
35934312
>>> id(string2)
35934256
When it comes to constants, CPython only interns strings made of “valid name characters”, that is, strings containing only alphanumeric and/or underscore characters(1). I assume the rationale is to only try to reuse strings used for symbols, as they are the most likely to be reused. More complex strings, on the other hand, are more likely to be unique, so there is less to be gained by interning them.
It is however possible to force Python to use the interned dictionary for any type of string:
>>> from sys import intern
>>> a = intern('Hello W')
>>> b = intern('Hello W')
>>> id(a)
42651352
>>> id(b)
42651352
Immutable collections are primarily used by functional programming languages (e.g. Scala, Haskell, F#). You do not insert an element into a collection, you create a new collection which contains the new element. In the case of a tree structure, you do not even need to recreate a complete new tree and can reuse whole parts of the previous tree (the subtrees not affected by the change).
Python has some immutable collections: tuples and frozensets. Once you create a tuple, you cannot add elements to it, nor replace its elements. You can concatenate tuples, but this only creates a new tuple and does not modify any of the source tuples. The only “modification” you can make is to mutate a mutable object contained in the tuple, such as a list or a user-class instance.
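For example, an element of a tuple cannot be rebound, but a mutable object it contains can still be mutated:
>>> t = (1, 2, ['John'])
>>> t[2].append('Mary')      # mutating the list inside the tuple is allowed
>>> t
(1, 2, ['John', 'Mary'])
>>> t[2] = []                # rebinding an element of the tuple is not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment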
Because dictionary keys in Python are required to be hashable (which in practice means immutable), you can use a tuple or a frozenset as a dictionary key, but not a list or a set.
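For instance:
>>> coordinates = {(48.85, 2.35): 'Paris', (40.71, -74.00): 'New York'}
>>> coordinates[(48.85, 2.35)]
'Paris'
>>> {['a', 'list']: 'value'}      # a list cannot be used as a key
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'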
Using immutable structures has some implications. Consider the following code:
>>> class MyClass1(object):
... allusers = ['John']
... def add_user(self, user):
... self.allusers.append(user)
... def get_users(self):
... return self.allusers
...
>>> obj = MyClass1()
>>> users = obj.get_users()
>>> users
['John']
>>> obj.add_user('Mary')
>>> users
['John', 'Mary']
>>> users.clear()
>>> obj.get_users()
[]
The get_users() method returns a live list of users: adding a new user afterwards gets reflected in “users”. However, because the result is mutable, the caller of get_users() can also call methods such as append() or clear() that mutate the list. Let’s now look at a similar class which uses tuples instead:
>>> class MyClass2(object):
... allusers = ('John',)
... def add_user(self, user):
... self.allusers += (user,)
... def get_users(self):
... return self.allusers
...
>>> obj = MyClass2()
>>> users = obj.get_users()
>>> users
('John',)
>>> obj.add_user('Mary')
>>> users
('John',)
>>> obj.get_users()
('John', 'Mary')
The caller of get_users() cannot update the result anymore, as it is immutable. The “+=” operation does not mutate “users”: it creates a new tuple and makes “obj.allusers” point to that new structure. However, get_users() returns a snapshot of “obj.allusers” at the time it was called. That result may not even be up-to-date by the time the next instruction is executed: adding new users indeed does not update “users”.
Immutability is mostly used when one wants to prevent different parts of a program from stepping on each other’s toes – in particular in highly concurrent programs(2). The caller of MyClass2.get_users() knows that, no matter what, the result will not change. It can for example store “len(users)” in a variable and use it later on, knowing it will always be consistent with “users”.
The flip side is that the result may not be up-to-date, but in some cases that is OK – for example when rendering a Web page. After all, a Web page can be seen as an immutable view of the server: it is a snapshot taken at the time it was generated, and it may no longer be up-to-date by the time it has finished downloading (unless you are using Ajax, but that’s another story).
Note that you do not necessarily need immutable structures to get an immutable effect. If you are getting the list through a SQL query, the result can be considered immutable in the sense that any change to the database after the query is performed will not be reflected in the result. Sure, you can modify the result, but those changes will not be seen by anybody else.
But in the case of a resource in memory (like a list that stays in memory and which can be updated by several threads), it may be useful to use an immutable structure.
Last but not least, for those who are interested in the use of immutability in high-concurrency environments, Pat Helland (who worked at Microsoft, Amazon and Salesforce.com) gave a great talk called Immutability Changes Everything. He argues that, a few decades ago, computer resources were expensive but there was no problem grabbing a lock. Nowadays, hardware resources are pretty cheap (provided you don’t care about the quality). But try to grab a lock and all hell breaks loose.
(1) For those of you who really want to know how it is implemented, check out Objects/codeobject.c in the CPython source code.
(2) There is always some level of concurrency and mutability. At the very least the program needs to update the variable that references the immutable structure – and you want this operation to be atomic. But immutability nevertheless strongly reduces concurrency problems.
One of the important things to know when learning a programming language is variable scoping – in other words, which variables can be seen from where. This can indeed vary greatly from language to language.
Even though we always talk about variables, Python does not really have variables in the traditional sense (see Variables and integers). Rather, it has names in namespaces. In the current namespace, a certain number of names are available; you can use dir() to see what is in the current namespace.
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> number1 = 42
>>> number2 = 43
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'number1', 'number2']
>>> vars()
{'__doc__': None, '__package__': None, '__loader__':
<class '_frozen_importlib.BuiltinImporter'>, 'number1':
42, 'number2': 43, '__spec__': None, '__builtins__':
<module 'builtins' (built-in)>, '__name__': '__main__'}
>>>
After we initialize the two global variables number1 and number2, we find them in the global namespace. Note also that dir(__builtins__) shows all the names which are available wherever you are in the code.
A namespace can be modified in several ways: assigning a value to a name adds that name to the namespace (or rebinds it if it already exists), the del statement removes a name, and an import statement adds the imported names.
Note that the del statement does NOT delete an object. Rather, it removes the name that references it from the current namespace. It is up to the Python runtime to delete the object (immediately or later on) if need be. A future post will look at the CPython garbage collector in more detail.
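A small illustration; the point is that the object survives as long as another name still references it:
>>> data = [1, 2, 3]
>>> alias = data             # two names for the same list object
>>> del data                 # removes the name "data" from the namespace...
>>> data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'data' is not defined
>>> alias                    # ...but the list object itself is still alive
[1, 2, 3]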
Just like there is a global namespace, there is also a local namespace (accessible through locals()). Consider the following code:
>>> number = 42
>>> def func1():
...     return number + 1
...
>>> def func2():
...     number += 1
...     return number
...
>>> func1()
43
>>> func2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in func2
UnboundLocalError: local variable 'number' referenced before assignment
How come func1() can access the global variable number but func2() cannot? Disassembling the code provides some answers:
>>> import dis
>>> dis.dis(func1)
2 0 LOAD_GLOBAL 0 (number)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 RETURN_VALUE
>>> dis.dis(func2)
2 0 LOAD_FAST 0 (number)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (number)
3 10 LOAD_FAST 0 (number)
13 RETURN_VALUE
func1 considers “number” to be a global variable (it uses LOAD_GLOBAL) whereas func2 considers it local (it uses LOAD_FAST), and thus tries to increment a local variable which has not been initialized yet. This is because func1() only reads the variable, whereas func2() tries to update it. This is a safety feature that prevents functions from unknowingly updating global variables.
As a general rule, any variable assigned inside a function is handled as a local variable (the “global” statement can override that), and all the other variables are handled as global variables. If we add “global number” then the function works without error. Note that, depending on the Python version, placing the global statement after the variable has already been assigned triggers a warning or even a SyntaxError, so it is best placed at the top of the function.
>>> def func3():
...     global number
...     number += 1
...     return number
...
>>> func3()
43
>>> func3()
44
This can however have some unintended consequences when trying to modify the local namespace through locals():
>>> globals()['var1'] = 42
>>> var1
42
>>> def func():
...     locals()['var2'] = 42
...     print(dir())
...     print(var2)
...
>>> func()
['var2']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in func
NameError: global name 'var2' is not defined
A call to dir() inside func() shows that var2 has been added to the local namespace. However, trying to print that variable fails. This is because the compiler does not know that var2 has been initialized (how would it know? it was done dynamically), so it considers it a global variable.
The import statement gives access to libraries by adding names to the namespace:
>>> import string
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'number1', 'number2', 'string']
>>> dir(string)
['ChainMap', 'Formatter', 'Template', '_TemplateMetaclass',
'__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__', '_re',
'_string', 'ascii_letters', 'ascii_lowercase',
'ascii_uppercase', 'capwords', 'digits', 'hexdigits',
'octdigits', 'printable', 'punctuation', 'whitespace']
The official Style Guide for Python, known as PEP 8, discourages the use of wildcard imports. Not only can they overcrowd the namespace, they can also lead to name clashes:
>>> whitespace = ' '
>>> whitespace
' '
>>> from string import *
>>> whitespace
' \t\n\r\x0b\x0c'
A closure is a variable environment attached to a function, holding the non-local variables that function refers to. Closures are supported by a lot of languages these days, including Python. Let’s see an example:
>>> def incrementer(number1):
...     return lambda number2: number1 + number2
...
>>> inc4 = incrementer(4)
>>> inc4(8)
12
>>> inc6 = incrementer(6)
>>> inc6(8)
14
The incrementer() function returns a function which has its own variable context. In the code above, inc4 sees number1 as set to 4 whereas inc6 sees it as set to 6, even when the functions are called later. We can use the Python bytecode disassembler to verify a closure is being defined:
>>> import dis
>>> dis.dis(incrementer)
2 0 LOAD_CLOSURE 0 (number1)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object <lambda> at 0x0000000002A0B390, file "<stdin>", line 2>)
9 LOAD_CONST 2 ('incrementer.<locals>.<lambda>')
12 MAKE_CLOSURE 0
15 RETURN_VALUE
You can also check the closure at runtime. The inc4 variable has a __closure__ attribute:
>>> dir(inc4)
['__annotations__', '__call__', '__class__', '__closure__',
'__code__', '__defaults__', ... '__str__', '__subclasshook__']
>>> inc4.__closure__
(<cell at 0x00000000021C7708: int object at 0x000000001E39C770>,)
>>> inc4.__closure__[0].cell_contents
4
There is however an interesting twist with closures. Consider the following code:
>>> def incrementer(number1):
...     def inc(number2):
...         return number1 + number2
...     number1 += 1
...     return inc
...
>>> inc4 = incrementer(4)
>>> inc4(8)
13
The value kept for number1 is not the value at the time inc() was defined (even though that is when the closure was created) but the value of number1 at the end of the incrementer() function. One can deduce that the closure merely references the variable number1 rather than making a copy of it. Even though Python does not let us modify what is in the closure from inside inc(), number1 lives on after inc() is defined, until incrementer() completes.
When you assign a local variable for the first time, it is implicitly declared for the whole function. This is different from languages such as C++ where it is possible to declare a variable only inside a “{ }” block:
int number = 42;
if (true)
{
    int number = 43;
}
cout << number << endl; // 42
When “number” is declared again inside the braces, that declaration (and its associated value) is only valid inside the surrounding “{ }” block. This is why the “number” printed on the last line is equal to 42 and not 43. The notion of block scope exists in languages such as C++, in the latest JavaScript specification (with the “let” keyword), and in Java or C# (although in the latter two cases, the compiler seems to refuse to declare the same variable several times in the same function).
In Python however, all declarations are implicit, and there is no concept of block scope.
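For instance, a variable assigned inside a for loop or an if block is still visible after the block, because it simply lives in the enclosing function (or module) namespace:
>>> for i in range(3):
...     last = i
...
>>> last        # still accessible after the loop: no block scope
2
>>> i           # the loop variable survives as well
2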