Now that we are going to be pushing hard on the proxy point for the reverse bridge, I need to improve the efficiency of the path. Reviewing the code, I see two key points that can be improved.
- The method lookup currently goes through a very slow path: jstring -> UTF8 translator -> C++ string -> PyString -> dict lookup. I can put a direct PyObject representation in the JPMethodDispatch, precached with the name of the method. As the methods are always static and we eagerly create them, the string representation can already be available on proxy instantiation. That would then turn a proxy downcall into a single long reference to the lookup point.
- The main argument path requires a tuple of wrapped types. This requires one upcall to findClassForObject per item, but we just left the Java side, which already has that information available. This one would be a bit harder to pull off without creating GC pressure on the Java side, as I would need a thread-local workspace to represent the array traffic for the exchange. That saves one trip per argument passed. We still pay for a lot of Python wrapping though, so the cost savings are not as clear cut as for the string lookup.
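
To make the first point concrete, here is a rough Python model of the idea, not the actual C++ code. `SlowDispatch`, `CachedDispatch`, and `demo` are hypothetical names made up for this sketch; the `bytes.decode` call only stands in for the jstring -> UTF8 -> C++ string -> PyString chain, and the integer handle stands in for the long reference. Since counting operations is the only visibility I have right now, the sketch counts conversions on each path.

```python
import sys


class Counter:
    """Counts how many name conversions each path performs."""

    def __init__(self):
        self.decodes = 0


class SlowDispatch:
    """Hypothetical model of the current path: the name is converted on every call."""

    def __init__(self, target, counter):
        self.target = target      # dict of method name -> callable
        self.counter = counter

    def call(self, raw_name, *args):
        # Stand-in for jstring -> UTF8 -> C++ string -> PyString.
        self.counter.decodes += 1
        name = raw_name.decode("utf-8")
        return self.target[name](*args)


class CachedDispatch:
    """Hypothetical model of the proposal: names are converted once, up front."""

    def __init__(self, target, raw_names, counter):
        self.target = target
        self.names = []
        for raw in raw_names:
            # One conversion per method at proxy instantiation.
            counter.decodes += 1
            self.names.append(sys.intern(raw.decode("utf-8")))

    def call(self, handle, *args):
        # The handle (an index here) points straight at the prebuilt name.
        return self.target[self.names[handle]](*args)


def demo(calls=1000):
    target = {"run": lambda x: x + 1, "stop": lambda: "stopped"}
    slow_count, fast_count = Counter(), Counter()
    slow = SlowDispatch(target, slow_count)
    fast = CachedDispatch(target, [b"run", b"stop"], fast_count)
    for i in range(calls):
        # Both paths must reach the same method and return the same result.
        assert slow.call(b"run", i) == fast.call(0, i)
    return slow_count.decodes, fast_count.decodes
```

In this model the slow path converts the name once per call, while the cached path converts each name once at construction regardless of how many calls follow. It says nothing about the real cost of each conversion, only about how often it happens.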

I wish I had a better way to benchmark this point, as currently I have limited visibility other than counting operations. I may need a separate PR since, like the previous proxy improvements, this is a lot of structural work.