Add monkey patching of the automl clients#613
Conversation
patches/kaggle_gcp.py
Outdated
| kwargs['credentials'] = kaggle_kernel_credentials | ||
| # Note: This is only here so that unit tests can check whether | ||
| # credentials were set properly. | ||
| self._kaggle_credentials = kwargs['credentials'] |
There was a problem hiding this comment.
I don't like this, but it's the only way that I've been able to verify that the patch actually runs and does something.
There was a problem hiding this comment.
Another possible way to test is to mock patch the transport (https://github.com/googleapis/google-cloud-python/blob/master/automl/google/cloud/automl_v1beta1/gapic/auto_ml_client.py#L264) and verify that the credentials passed in are the ones you're expecting, sort of like in https://github.com/Kaggle/docker-python/blob/master/tests/test_bigquery.py#L108
There was a problem hiding this comment.
Done. Thanks for the tip! As it turns out I had to do something similar in my PR for the AutoML client libraries, so I've spent a lot of time with patch these last few days.
ifigotin
left a comment
There was a problem hiding this comment.
LGTM
Yes, I can see that credentials are many layers deep. I thought about other possible ways to test the monkey patching for AutoML, but nothing comes to mind yet. Maybe other reviewers will have some ideas.
Also, it is a pity that a project id cannot be passed to internal GcsClient - it would have been nice! Have you tried reaching out to the team that own AutoML library? Since it is v1beta1 0.5 :) maybe the method signatures could still change, etc.
Thanks for the reminder! I just sent a message to the author about these issues. These clients are new in the 0.5.0 version of the library and seem like they still have a few issues. The only signature change required would be for the GcsClient constructor to accept an optional project parameter and pass it on. Hopefully we can get that fixed, but maybe not in time for our initial release. |
Is this the issue you hit? googleapis/google-cloud-python#8108 |
patches/kaggle_gcp.py
Outdated
| return automl_tablesclient_init(self, *args, **kwargs) | ||
|
|
||
| if (not has_been_monkeypatched(automl.TablesClient.__init__)): | ||
| automl.TablesClient.__init__ = monkeypatch_tablesclient |
There was a problem hiding this comment.
Totally a nit, and no change needed, but may make sense to do this monkeypatch_* as as higher level function so we don't need to duplicate that so many times?
There was a problem hiding this comment.
Done. Good idea! It cleaned up a lot of code.
mcollins42
left a comment
There was a problem hiding this comment.
The automl_v1beta1.GcsClient can't be patched because no project is passed in anywhere and setting the credentials without a project causes the underlying storage.Client to try to find a default project in the environment which fails. Passing a storage.Client into the GcsClient constructor works as expected and we've already patched the GcsClient, so that would be the preferred way to set up TablesClient and GcsClient.
Is this the issue you hit? googleapis/google-cloud-python#8108
Not exactly. The AutoML Tables GcsClient didn't accept a project as a parameter and was passing credentials to the GCS storage.Client without a project which causes it to look for a PROJECT_ID environment variable.
I'm not sure how best to test the credential patching. The AutoML clients don't have a credentials attribute. They pass the credentials 5 layers deep into the gRPC library and it's not clear that the credentials end up as an attribute on any of the objects. I think they just end up captured in a closure.
I added a _kaggle_credentials attribute so that the unit tests can at least verify that the credentials were set in the monkeypatch_XXX functions, but it seems questionable to leave that there in the final image.
The automl_v1beta1.GcsClient can't be patched because no project is passed in anywhere and setting the credentials without a project causes the underlying storage.Client to try to find a default project in the environment which fails. Passing a storage.Client into the GcsClient constructor works as expected and we've already patched the GcsClient, so that would be the preferred way to set up TablesClient and GcsClient.
I'm checking for the 0.5.0 or higher library version because that is the latest version and the first version that includes the TablesClient and GcsClient.