Just like Java or C#, CPython compiles code into bytecode, which is then interpreted by a virtual machine. The Python module dis lets you disassemble Python code and see how things are compiled under the hood. Consider the following code:
>>> def test():
...     for i in range(10):
...         print(i)
...
You can call dis.dis(test) to display the compiled bytecode, and dis.show_code(test) to understand the symbols referenced by that bytecode.
>>> import dis
>>> dis.dis(test)
2 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
3 19 LOAD_GLOBAL 1 (print)
22 LOAD_FAST 0 (i)
25 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
28 POP_TOP
29 JUMP_ABSOLUTE 13
>> 32 POP_BLOCK
>> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
>>> dis.show_code(test)
Name: test
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 1
Stack size: 3
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
1: 10
Names:
0: range
1: print
Variable names:
0: i
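dis.dis() and dis.show_code() print to standard output. When you want the same information as a string (for example, to search or log it), dis.code_info() returns the text that show_code() prints, and dis.Bytecode(...).dis() returns the disassembly listing. A minimal sketch; the exact listing depends on your Python version, so only names that appear in every version are worth relying on.

```python
import dis

def test():
    for i in range(10):
        print(i)

# Same text that dis.show_code(test) prints, returned as a string
info = dis.code_info(test)

# Same listing that dis.dis(test) prints, returned as a string
listing = dis.Bytecode(test).dis()

print(info)
print(listing)
```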
Understanding the bytecode
The output of dis.dis() is split into four columns:
- The first column represents the line number (line 2 is “for i in range(10):”, line 3 is “print(i)”)
- The second column is the bytecode offset in bytes. In this version of CPython, an instruction takes 1 byte, or 3 bytes when it has an argument (from Python 3.6 on, instructions use a fixed 2-byte format).
- The third column is the bytecode instruction (see the dis documentation for the instruction reference)
- The fourth column is the instruction argument, if any; the text in parentheses is a human-readable interpretation of that argument.

A >> before an offset marks an instruction that is the target of a jump.
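The same columns are available programmatically through dis.get_instructions(), which yields dis.Instruction objects. The sketch below prints the offset, the opcode name and the human-readable argument; line numbers are left out because the attribute that holds them changed in Python 3.13. On a modern interpreter the offsets and opcode names will differ from the Python 3.5 listing above.

```python
import dis

def test():
    for i in range(10):
        print(i)

rows = []
for ins in dis.get_instructions(test):
    # offset = second column, opname = third column, argrepr = text in parentheses
    rows.append((ins.offset, ins.opname, ins.argrepr))
    print(f"{ins.offset:>4} {ins.opname:<20} {ins.argrepr}")
```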
CPython bytecode resembles assembly languages such as x86 or ARM assembly, with one major difference: instead of registers, it relies heavily on a stack. Instructions such as LOAD_GLOBAL, LOAD_CONST and LOAD_FAST push values on top of it, and instructions such as POP_TOP pop them off. Some instructions also push a result onto the stack.
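dis.stack_effect() reports the net change in stack depth caused by one instruction, which makes this push/pop behavior concrete. A small sketch: opcode numbers are looked up in dis.opmap because they change between Python versions, and instructions that take no argument (such as POP_TOP) must be given None as their argument.

```python
import dis

effects = {}
for name, oparg in [("LOAD_CONST", 0), ("LOAD_FAST", 0), ("POP_TOP", None)]:
    opcode = dis.opmap[name]          # numeric opcode on this interpreter
    effects[name] = dis.stack_effect(opcode, oparg)

print(effects)  # {'LOAD_CONST': 1, 'LOAD_FAST': 1, 'POP_TOP': -1}
```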
Finally, like regular assembly languages, the control flow of CPython bytecode is driven by absolute jumps (JUMP_ABSOLUTE 13 jumps to offset 13) and relative jumps (FOR_ITER 16 jumps 16 bytes past the next instruction).
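Jump targets are the instructions marked with >> in the listing, and dis.Instruction exposes this as is_jump_target. The sketch below lists them for the same loop. Recent Python versions use different jump instructions (for example JUMP_BACKWARD instead of JUMP_ABSOLUTE from 3.11), but the head of the loop, FOR_ITER, is still where the backward jump lands.

```python
import dis

def test():
    for i in range(10):
        print(i)

targets = [(ins.offset, ins.opname)
           for ins in dis.get_instructions(test)
           if ins.is_jump_target]   # True when some jump lands here (the ">>" marker)
print(targets)
```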
However, CPython bytecode also contains high-level instructions such as GET_ITER or MAKE_FUNCTION, which have no equivalent in traditional assembly languages.
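MAKE_FUNCTION, for instance, appears whenever a def statement runs inside another function: the inner function's code object is a constant, and MAKE_FUNCTION turns it into a function object at run time. A quick check, using a hypothetical outer() function:

```python
import dis

def outer():
    # The nested def compiles to a code object plus MAKE_FUNCTION
    def inner():
        return 42
    return inner

outer_ops = {ins.opname for ins in dis.get_instructions(outer)}
print("MAKE_FUNCTION" in outer_ops)  # True
```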
If you want to learn more about what each bytecode instruction does, look at the C source code in Python/ceval.c.
Loops
The dis.dis() output above contains several bytecode instructions that implement a loop:
| Instruction | Description |
|---|---|
| SETUP_LOOP 30 | Pushes a loop block onto the block stack. The argument 30 means that the loop ends 30 bytes after the next instruction, i.e. at offset 3 + 30 = 33. |
| GET_ITER | Replaces the object on top of the stack (the range(10) object computed just before; see the next section for how function calls are implemented) with an iterator over it, as iter() would. |
| FOR_ITER 16 | Calls next() on the iterator on top of the stack and pushes the resulting value. When the iterator is exhausted, it pops the iterator and jumps 16 bytes past the next instruction, i.e. to offset 16 + 16 = 32. |
| STORE_FAST 0 | Pops the value on top of the stack and stores it in local variable #0 (dis.show_code() indicates this is the variable i). |
| JUMP_ABSOLUTE 13 | Jumps to offset 13, i.e. back to FOR_ITER. |
| POP_BLOCK | The loop has ended, remove the loop block from the block stack |
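These instructions map onto Python's iterator protocol. As a rough Python-level equivalent (a sketch for illustration, not what CPython literally executes), GET_ITER corresponds to iter(), FOR_ITER to next() with StopIteration ending the loop, STORE_FAST to the assignment to i, and the jump back to repeating the while loop. The hypothetical test_desugared() collects the values into a list instead of printing them.

```python
def test_desugared():
    """Hypothetical hand-written equivalent of 'for i in range(10): ...'."""
    seen = []
    it = iter(range(10))        # GET_ITER: replace the iterable with an iterator
    while True:                 # JUMP_ABSOLUTE 13 jumps back to this point
        try:
            i = next(it)        # FOR_ITER + STORE_FAST: fetch and store the next value
        except StopIteration:   # FOR_ITER: leave the loop once the iterator is exhausted
            break
        seen.append(i)          # loop body (print(i) in the original)
    return seen

print(test_desugared())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```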
Function calls
A Python function call is implemented by pushing the function and its arguments onto the stack, using instructions such as LOAD_GLOBAL (push a global variable), LOAD_CONST (push a constant) and LOAD_FAST (push a local variable). Let’s look at how print(i) is implemented:
3 19 LOAD_GLOBAL 1 (print)
22 LOAD_FAST 0 (i)
25 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
28 POP_TOP
| Instruction | Description | Stack (top first) |
|---|---|---|
| LOAD_GLOBAL 1 | Pushes global name #1 onto the stack. dis.show_code() indicates that name #1 is “print”, so this pushes a reference to the print function. | print |
| LOAD_FAST 0 | Pushes local variable #0 onto the stack. dis.show_code() indicates that variable #0 is i. | i, print |
| CALL_FUNCTION 1 | Calls a function with one positional argument (in this version, 256 is added to the argument for each keyword argument, e.g. foo(x=10)). Here it calls print with i as its argument: the two elements are popped from the stack and replaced by the result, even if it is None. | None |
| POP_TOP | Because the value returned by print() is not used, it is popped off the stack. | (empty) |
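The table can be replayed with a plain Python list standing in for the value stack. This is a hypothetical model written for illustration (simulate_print_call() is not part of CPython); the func parameter lets you substitute a recording function for print.

```python
def simulate_print_call(i, func=print):
    """Hypothetical model of the value stack while print(i) runs."""
    stack = []
    stack.append(func)          # LOAD_GLOBAL 1 (print) -> [print]
    stack.append(i)             # LOAD_FAST 0 (i)       -> [print, i]
    args = [stack.pop()]        # CALL_FUNCTION 1: pop the one positional argument
    f = stack.pop()             #   ... then pop the function itself
    stack.append(f(*args))      #   ... and push its return value -> [None]
    stack.pop()                 # POP_TOP: discard the unused result -> []
    return stack
```

For example, passing calls.append for a list named calls records the argument instead of printing it, and the function returns an empty stack.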
Actual binary code
The dis module works on a function’s __code__ attribute (its code object), but it is much easier to use: instead of the raw bytes returned by __code__.co_code, dis.dis() displays the instructions by name.
>>> test.__code__
<code object test at 0x0000000002258270, file "<stdin>", line 1>
>>> test.__code__.co_code
b'x\x1e\x00t\x00\x00d\x01\x00\x83\x01\x00D]\x10\x00}\x00\x00t\x01\x00|\x00\x00\x83\x01\x00\x01q\r\x00Wd\x00\x00S'
>>> list(test.__code__.co_code)
[120, 30, 0, 116, 0, 0, 100, 1, 0, 131, 1, 0, 68, 93, 16, 0, 125, 0, 0, 116, 1, 0, 124, 0, 0, 131, 1, 0, 1, 113, 13, 0, 87, 100, 0, 0, 83]
>>> test.__code__.co_varnames
('i',)
>>> test.__code__.co_nlocals
1
>>> test.__code__.co_names
('range', 'print')
The output of list(test.__code__.co_code) above is the actual compiled binary. The first number (120) is the opcode for SETUP_LOOP (look for 120 in Include/opcode.h). The next two numbers (30 and 0) are its argument, stored as a 16-bit little-endian value: 30 + 0 * 256 = 30. Likewise, the 13 towards the end of the list is the argument of the JUMP_ABSOLUTE 13 instruction. But using dis.dis() is definitely easier!
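To make the byte layout explicit, here is a minimal decoder for the byte list printed above. It assumes the pre-3.6 format shown in this post and hard-codes the CPython 3.5 opcode numbers for just the instructions used by test(); it will not decode the co_code of a Python 3.6+ interpreter, which uses 2-byte instructions.

```python
# Raw bytes of test.__code__.co_code as printed above (CPython 3.5 format)
CO_CODE = bytes([120, 30, 0, 116, 0, 0, 100, 1, 0, 131, 1, 0, 68, 93, 16, 0,
                 125, 0, 0, 116, 1, 0, 124, 0, 0, 131, 1, 0, 1, 113, 13, 0,
                 87, 100, 0, 0, 83])

# CPython 3.5 opcode numbers (only the subset needed for this function)
OPNAMES = {1: "POP_TOP", 68: "GET_ITER", 83: "RETURN_VALUE", 87: "POP_BLOCK",
           93: "FOR_ITER", 100: "LOAD_CONST", 113: "JUMP_ABSOLUTE",
           116: "LOAD_GLOBAL", 120: "SETUP_LOOP", 124: "LOAD_FAST",
           125: "STORE_FAST", 131: "CALL_FUNCTION"}
HAVE_ARGUMENT = 90  # in 3.5, opcodes >= 90 are followed by a 2-byte argument

def decode_py35(code):
    """Yield (offset, opname, arg) tuples for pre-3.6 bytecode."""
    offset = 0
    while offset < len(code):
        op = code[offset]
        if op >= HAVE_ARGUMENT:
            arg = code[offset + 1] | (code[offset + 2] << 8)  # little-endian
            yield offset, OPNAMES[op], arg
            offset += 3
        else:
            yield offset, OPNAMES[op], None
            offset += 1

for offset, name, arg in decode_py35(CO_CODE):
    print(offset, name, "" if arg is None else arg)
```

The offsets and arguments it prints line up with the dis.dis(test) listing at the top of this post.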
When dis.dis() is useful
We will see more uses of dis.dis() in future posts. For now, here is one: understanding why a piece of code does not behave as expected. Consider the following code:
>>> a / b / c / d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
We have a division by zero, but which variable caused it? In more complex code, finding out would normally mean stepping through a debugger. dis.dis() can shed some light on what happened instead: called without an argument, it disassembles the last traceback.
>>> dis.dis()
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
--> 6 BINARY_TRUE_DIVIDE
7 LOAD_NAME 2 (c)
10 BINARY_TRUE_DIVIDE
11 LOAD_NAME 3 (d)
14 BINARY_TRUE_DIVIDE
15 PRINT_EXPR
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
The line marked with the arrow (-->) shows where the exception was raised: the first BINARY_TRUE_DIVIDE, which divides a by b (both have just been pushed onto the stack). So we know that b is equal to zero.
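The same investigation can be done outside the interactive prompt. dis.Bytecode.from_traceback() builds a disassembly from a traceback, and its current_offset attribute holds the offset of the instruction that raised, the one dis.dis() marks with -->. A sketch using hypothetical divide_chain() and failing_division() helpers; it matches both BINARY_TRUE_DIVIDE and BINARY_OP, which replaced the individual binary-operator opcodes in Python 3.11.

```python
import dis

def divide_chain(a, b, c, d):
    return a / b / c / d

def failing_division(*args):
    """Return which '/' raised ZeroDivisionError (0 = a/b, 1 = .../c, 2 = .../d)."""
    try:
        divide_chain(*args)
    except ZeroDivisionError as exc:
        tb = exc.__traceback__
        while tb.tb_next is not None:   # walk to the innermost frame (divide_chain)
            tb = tb.tb_next
        bc = dis.Bytecode.from_traceback(tb)
        # BINARY_TRUE_DIVIDE before 3.11, BINARY_OP with argrepr "/" afterwards
        divisions = [ins.offset for ins in bc
                     if ins.opname == "BINARY_TRUE_DIVIDE"
                     or (ins.opname == "BINARY_OP" and ins.argrepr == "/")]
        return divisions.index(bc.current_offset)
    return None  # no division by zero happened

print(failing_division(1, 0, 2, 3))  # 0: b is zero
print(failing_division(1, 2, 0, 3))  # 1: c is zero
```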