This repository contains parsers that convert Python code to XML/JSON and back. It includes a parser for Python 2 (see pythonparser, source code from this repository) and a parser for Python 3 (see pythonparser3, source code from this repository and this project).
We are going to support Python 3.8 in the python3 parser.
Here you can read about all the new features that Python 3.8 provides.
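For context, two of the Python 3.8 additions that change the syntax tree are positional-only parameters and assignment expressions. The sketch below is not part of the parser; it only uses the standard ast module to show where these constructs appear in the tree. It assumes Python 3.8 or newer, and the function name f is made up for illustration.

```python
import ast

# Python 3.8 syntax: "/" marks positional-only parameters,
# ":=" is an assignment expression.
SOURCE = "def f(a, /, b):\n    return (n := a + b)\n"

tree = ast.parse(SOURCE)
func = tree.body[0]

# Positional-only parameters are stored separately from regular ones.
posonly_names = [arg.arg for arg in func.args.posonlyargs]
regular_names = [arg.arg for arg in func.args.args]

# Assignment expressions show up as NamedExpr nodes.
walrus_nodes = [n for n in ast.walk(tree) if isinstance(n, ast.NamedExpr)]

print(posonly_names, regular_names, len(walrus_nodes))
```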
- python2:
  pip install -r requirements.txt
- python3:
  pip3 install -r requirements.txt
- python3 tests:
  pip3 install -r requirements-test.txt
- python2:
  python pythonparser_2 path_to_src_file.py
- python3:
  python3 pythonparser_3 path_to_src_file.py
To run the tests for the python3 parser:
python3 -m pytest
Here are several examples of using the python3 parser.
First example
a = 5
b = 16.5
print(a + b)
<Module lineno="1" col="0" end_line_no="3" end_col="12">
<Assign lineno="1" col="0" end_line_no="1" end_col="5">
<Name_Store value="a" lineno="1" col="0" end_line_no="1" end_col="1">
</Name_Store>
<Constant-int value="5" lineno="1" col="4" end_line_no="1" end_col="5">
</Constant-int>
</Assign>
<Assign lineno="2" col="0" end_line_no="2" end_col="8">
<Name_Store value="b" lineno="2" col="0" end_line_no="2" end_col="1">
</Name_Store>
<Constant-float value="16.5" lineno="2" col="4" end_line_no="2" end_col="8">
</Constant-float>
</Assign>
<Expr lineno="3" col="0" end_line_no="3" end_col="12">
<Call lineno="3" col="0" end_line_no="3" end_col="12">
<Name_Load value="print" lineno="3" col="0" end_line_no="3" end_col="5">
</Name_Load>
<BinOp_Add lineno="3" col="6" end_line_no="3" end_col="11">
<Name_Load value="a" lineno="3" col="6" end_line_no="3" end_col="7">
</Name_Load>
<Name_Load value="b" lineno="3" col="10" end_line_no="3" end_col="11">
</Name_Load>
</BinOp_Add>
</Call>
</Expr>
</Module>
Second example
# Test example

from ast import NodeVisitor


class Example(NodeVisitor):
    def generic_visit(self, node):
        print(type(node).__name__)
        NodeVisitor.generic_visit(self, node)
<Module lineno="1" col="0" end_line_no="9" end_col="45">
<ImportFrom-0 value="ast" lineno="3" col="0" end_line_no="3" end_col="27">
<alias value="NodeVisitor" lineno="3" col="0" end_line_no="3" end_col="4">
</alias>
</ImportFrom-0>
<ClassDef value="Example" lineno="6" col="0" end_line_no="9" end_col="45">
<bases lineno="6" col="0" end_line_no="9" end_col="45">
<Name_Load value="NodeVisitor" lineno="6" col="14" end_line_no="6" end_col="25">
</Name_Load>
</bases>
<keywords lineno="6" col="0" end_line_no="9" end_col="45">
</keywords>
<body lineno="6" col="0" end_line_no="9" end_col="45">
<FunctionDef value="generic_visit" lineno="7" col="4" end_line_no="9" end_col="45">
<arguments lineno="7" col="22" end_line_no="7" end_col="32">
<posonlyargs lineno="7" col="22" end_line_no="7" end_col="32">
</posonlyargs>
<args lineno="7" col="22" end_line_no="7" end_col="32">
<arg value="self" lineno="7" col="22" end_line_no="7" end_col="26">
</arg>
<arg value="node" lineno="7" col="28" end_line_no="7" end_col="32">
</arg>
</args>
<kwonlyargs lineno="7" col="22" end_line_no="7" end_col="32">
</kwonlyargs>
<kw_defaults lineno="7" col="22" end_line_no="7" end_col="32">
</kw_defaults>
<defaults lineno="7" col="22" end_line_no="7" end_col="32">
</defaults>
</arguments>
<body lineno="7" col="4" end_line_no="9" end_col="45">
<Expr lineno="8" col="8" end_line_no="8" end_col="34">
<Call lineno="8" col="8" end_line_no="8" end_col="34">
<Name_Load value="print" lineno="8" col="8" end_line_no="8" end_col="13">
</Name_Load>
<Attribute_Load lineno="8" col="14" end_line_no="8" end_col="33">
<Call lineno="8" col="14" end_line_no="8" end_col="24">
<Name_Load value="type" lineno="8" col="14" end_line_no="8" end_col="18">
</Name_Load>
<Name_Load value="node" lineno="8" col="19" end_line_no="8" end_col="23">
</Name_Load>
</Call>
<attr value="__name__" lineno="8" col="14" end_line_no="8" end_col="33">
</attr>
</Attribute_Load>
</Call>
</Expr>
<Expr lineno="9" col="8" end_line_no="9" end_col="45">
<Call lineno="9" col="8" end_line_no="9" end_col="45">
<Attribute_Load lineno="9" col="8" end_line_no="9" end_col="33">
<Name_Load value="NodeVisitor" lineno="9" col="8" end_line_no="9" end_col="19">
</Name_Load>
<attr value="generic_visit" lineno="9" col="8" end_line_no="9" end_col="33">
</attr>
</Attribute_Load>
<Name_Load value="self" lineno="9" col="34" end_line_no="9" end_col="38">
</Name_Load>
<Name_Load value="node" lineno="9" col="40" end_line_no="9" end_col="44">
</Name_Load>
</Call>
</Expr>
</body>
<decorator_list lineno="7" col="4" end_line_no="9" end_col="45">
</decorator_list>
</FunctionDef>
</body>
<decorator_list lineno="6" col="0" end_line_no="9" end_col="45">
</decorator_list>
</ClassDef>
</Module>
This section describes the format of the tree that the python3 parser produces.
The produced tree is a valid XML document. Each node in the document corresponds to a node
of the Python abstract syntax tree (AST).
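Because the output is plain XML, it can be inspected with any XML library. The following is a minimal sketch using the standard xml.etree.ElementTree module. The SAMPLE string is a fragment of the first example trimmed to the single statement a = 5 (the Module end position is adjusted accordingly), and iter_nodes is a hypothetical helper name, not something the repository provides.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the first example: only the statement "a = 5".
SAMPLE = """
<Module lineno="1" col="0" end_line_no="1" end_col="5">
<Assign lineno="1" col="0" end_line_no="1" end_col="5">
<Name_Store value="a" lineno="1" col="0" end_line_no="1" end_col="1">
</Name_Store>
<Constant-int value="5" lineno="1" col="4" end_line_no="1" end_col="5">
</Constant-int>
</Assign>
</Module>
"""


def iter_nodes(xml_text):
    """Yield (label, value, (lineno, col)) for every node in document order."""
    root = ET.fromstring(xml_text.strip())
    for elem in root.iter():
        start = (int(elem.get("lineno")), int(elem.get("col")))
        # "value" is absent on nodes such as Module or Assign, so get() returns None.
        yield elem.tag, elem.get("value"), start


for label, value, start in iter_nodes(SAMPLE):
    print(label, value, start)
```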
Since the second version of the GumTree library takes into account only the label of the node,
the value attribute, and the token position attributes, we have to include
additional information about some nodes in their labels.
So, it is necessary to note several nuances of the format:
- Operations are included directly in the label of the node. They follow the underscore.
  Example: a node with the BinOp_Add label is a BinOp (binary operation) node, and the operation of that node is addition.
- Expression context is included directly in the label of the node. It follows the underscore.
  Example: a node with the Name_Load label is a Name node, and the context of that Name is Load, which means that we "load" or "read" the content held by the Name node.
- The type of the value contained in a constant node (Constant, Num, Str) is included directly in the label of the node. It follows the hyphen.
  Example: a node with the Constant-float label is a Constant node, and the value contained in it has the float type.
- The import level is included directly in the ImportFrom node label. It follows the hyphen.
  Example: a node with the ImportFrom-3 label is an ImportFrom node, and the import level is 3.
Note: Token position attributes are: lineno, col, end_line_no, end_col. They exist in order to determine the position of the token.
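The labeling rules above can be sketched with the standard ast module. This is an illustration under assumptions, not the repository's implementation: node_label is a hypothetical helper, and treating every node with an op attribute as an "operation" node is a guess about which node types the real parser covers. It assumes Python 3.8 or newer, where literals are parsed as Constant.

```python
import ast


def node_label(node: ast.AST) -> str:
    """Build a label for an AST node following the rules described above."""
    name = type(node).__name__

    # Operation follows an underscore, e.g. BinOp_Add.
    op = getattr(node, "op", None)
    if op is not None:
        return f"{name}_{type(op).__name__}"

    # Expression context follows an underscore, e.g. Name_Load.
    ctx = getattr(node, "ctx", None)
    if ctx is not None:
        return f"{name}_{type(ctx).__name__}"

    # Type of the constant value follows a hyphen, e.g. Constant-float.
    if isinstance(node, ast.Constant):
        return f"{name}-{type(node.value).__name__}"

    # Import level follows a hyphen, e.g. ImportFrom-3.
    if isinstance(node, ast.ImportFrom):
        return f"{name}-{node.level}"

    return name


tree = ast.parse("a = 5\nprint(a + 16.5)\n")
labels = [
    node_label(n)
    for n in ast.walk(tree)
    # Skip the helper nodes that only carry the context or the operator.
    if not isinstance(n, (ast.expr_context, ast.operator))
]
print(labels)
```

Encoding the extra information in the label means that two nodes which differ only in their operation or context, such as Name_Load and Name_Store, get different labels.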
This is a guide on how to use GumTree with Pythonparser.
- Download GumTree
  The stable version of GumTree can be found here. The version is 2.1.2, and you should download the source code. Do not clone the repository, because that will give you an unstable version.
- Build GumTree
  After you have downloaded and extracted the archive, open it as a new IDEA project. From the project root, open the IDEA terminal (console) and build the project by running ./gradlew build -x test on UNIX systems or gradlew.bat build -x test on Windows (the build can have some trouble on Windows, see this issue).
  To check that this step succeeded, verify that a new build folder has appeared in the /dist directory. To get the runnable bash script, extract the archive gumtree-2.1.2.zip in build/distributions/. Do it manually and put all the files in the same directory. Create two files that you want to compare as a test. The resulting directory tree should look like this:
  Now you can check that the bash script works: run the ./gumtree command in the terminal with no parameters. If you receive this message, then everything is fine:
- Add Pythonparser
  Originally pythonparser came from this repository. We have modified it, and you can now use the version from the current repository.
  - First, download requirements.txt from the repository here and put it in the root of your GumTree project. Install all the requirements by running pip3 install -r requirements.txt in the IDEA terminal.
  - Second, take pythonparser_3.py and place it into the /tmp directory on your laptop. Rename the file to "pythonparser", without any extension such as ".py". The type of this file should be "Python 3 script (text/x-python3)". If it is different, check the header of the file. The first line should be "#!/usr/bin/env python3". Make this file executable by running chmod +x /pathToYourFile/pythonparser.
  - Next, add the /tmp directory to the PATH used by the GumTree project. To do it temporarily, open the terminal in the /dist/build/distributions/gumtree-2.1.2/bin directory of the GumTree project and run: export PATH=$PATH:/tmp. This command adds /tmp to the list of directories where your project will look for the parser file, but only for the current terminal session. You can check that it was added to the PATH by running echo $PATH.
- Run GumTree with the Python parser
  Now everything is set up and you can run the project using ./gumtree diff file1.py file2.py.
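If GumTree does not pick up the parser, the checks from the "Add Pythonparser" step can be scripted. The sketch below uses only the standard library; find_parser and has_expected_shebang are hypothetical helper names, and the script assumes a UNIX-like system where the executable bit matters.

```python
import os
import shutil
from typing import Optional

EXPECTED_SHEBANG = "#!/usr/bin/env python3"


def find_parser(name: str = "pythonparser",
                search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable called `name`, or None.

    With search_path=None the current PATH environment variable is used.
    """
    return shutil.which(name, mode=os.F_OK | os.X_OK, path=search_path)


def has_expected_shebang(script_path: str) -> bool:
    """Check that the first line of the file is the python3 shebang."""
    with open(script_path, "r", encoding="utf-8") as handle:
        return handle.readline().rstrip("\r\n") == EXPECTED_SHEBANG


if __name__ == "__main__":
    location = find_parser()
    if location is None:
        print("pythonparser was not found on PATH or is not executable")
    elif not has_expected_shebang(location):
        print(f"{location}: the first line is not {EXPECTED_SHEBANG!r}")
    else:
        print(f"pythonparser is ready at {location}")
```

Run it from the same terminal in which you exported PATH, since the export does not carry over to other sessions.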

