I posted a comment on the PR showing its performance improvement, and I have pasted it below. I think the results are not bad.
-------------
I used timeit to measure the time cost, so no other operators or calls are included. For each test case, the first result line is dcd2796 and the second is this PR's base, 036bbb1.
64-bit Release build, run on Windows 10 1709 (64-bit).
python -m timeit " i = 1; i <<= 3; i >>= 3" # small value (cost down by 36%)
5000000 loops, best of 5: 92.7 nsec per loop
2000000 loops, best of 5: 145 nsec per loop
python -m timeit " i = 1; i <<= 10; i >>= 10" # medium value (-25%)
2000000 loops, best of 5: 114 nsec per loop
2000000 loops, best of 5: 151 nsec per loop
python -m timeit " i = 1; i <<= 100; i >>= 100" # big value (-12%)
1000000 loops, best of 5: 209 nsec per loop
1000000 loops, best of 5: 238 nsec per loop
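
For anyone who wants to check the percentages quoted in the comments above, here is a minimal sketch that recomputes them from the reported timings. The helper name `reduction_percent` and the dictionary labels are only illustrative; the numbers are copied from the timeit output above.

```python
# Reported best-of-5 timings in nanoseconds, copied from the results above:
# (first line = dcd2796, second line = base 036bbb1).
reported = {
    "small value (shift by 3)": (92.7, 145.0),
    "medium value (shift by 10)": (114.0, 151.0),
    "big value (shift by 100)": (209.0, 238.0),
}


def reduction_percent(new_ns, base_ns):
    """Return the time saved relative to the base, in percent."""
    return (base_ns - new_ns) / base_ns * 100.0


for name, (new_ns, base_ns) in reported.items():
    print(f"{name}: {reduction_percent(new_ns, base_ns):.1f}% less time")
```

The rounded values agree with the 36%, 25%, and 12% figures given next to each command.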