Skip to content

Add monkey patching of the automl clients#613

Merged
mcollins42 merged 4 commits intomasterfrom
patch-automl
Sep 26, 2019
Merged

Add monkey patching of the automl clients#613
mcollins42 merged 4 commits intomasterfrom
patch-automl

Conversation

@mcollins42
Copy link
Copy Markdown
Contributor

I'm not sure how best to test the credential patching. The AutoML clients don't have a credentials attribute. They pass the credentials 5 layers deep into the gRPC library and it's not clear that the credentials end up as an attribute on any of the objects. I think they just end up captured in a closure.

I added a _kaggle_credentials attribute so that the unit tests can at least verify that the credentials were set in the monkeypatch_XXX functions, but it seems questionable to leave that there in the final image.

The automl_v1beta1.GcsClient can't be patched because no project is passed in anywhere and setting the credentials without a project causes the underlying storage.Client to try to find a default project in the environment which fails. Passing a storage.Client into the GcsClient constructor works as expected and we've already patched the GcsClient, so that would be the preferred way to set up TablesClient and GcsClient.

I'm checking for the 0.5.0 or higher library version because that is the latest version and the first version that includes the TablesClient and GcsClient.

kwargs['credentials'] = kaggle_kernel_credentials
# Note: This is only here so that unit tests can check whether
# credentials were set properly.
self._kaggle_credentials = kwargs['credentials']
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this, but it's the only way that I've been able to verify that the patch actually runs and does something.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible way to test is to mock patch the transport (https://github.com/googleapis/google-cloud-python/blob/master/automl/google/cloud/automl_v1beta1/gapic/auto_ml_client.py#L264) and verify that the credentials passed in are the ones you're expecting, sort of like in https://github.com/Kaggle/docker-python/blob/master/tests/test_bigquery.py#L108

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the tip! As it turns out I had to do something similar in my PR for the AutoML client libraries, so I've spent a lot of time with patch these last few days.

Copy link
Copy Markdown
Contributor

@ifigotin ifigotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Yes, I can see that credentials are many layers deep. I thought about other possible ways to test the monkey patching for AutoML, but nothing comes to mind yet. Maybe other reviewers will have some ideas.

Also, it is a pity that a project id cannot be passed to internal GcsClient - it would have been nice! Have you tried reaching out to the team that own AutoML library? Since it is v1beta1 0.5 :) maybe the method signatures could still change, etc.

@mcollins42
Copy link
Copy Markdown
Contributor Author

Also, it is a pity that a project id cannot be passed to internal GcsClient - it would have been nice! Have you tried reaching out to the team that own AutoML library? Since it is v1beta1 0.5 :) maybe the method signatures could still change, etc.

Thanks for the reminder! I just sent a message to the author about these issues. These clients are new in the 0.5.0 version of the library and seem like they still have a few issues. The only signature change required would be for the GcsClient constructor to accept an optional project parameter and pass it on. Hopefully we can get that fixed, but maybe not in time for our initial release.

@vimota
Copy link
Copy Markdown
Contributor

vimota commented Sep 20, 2019

The automl_v1beta1.GcsClient can't be patched because no project is passed in anywhere and setting the credentials without a project causes the underlying storage.Client to try to find a default project in the environment which fails. Passing a storage.Client into the GcsClient constructor works as expected and we've already patched the GcsClient, so that would be the preferred way to set up TablesClient and GcsClient.

Is this the issue you hit? googleapis/google-cloud-python#8108

return automl_tablesclient_init(self, *args, **kwargs)

if (not has_been_monkeypatched(automl.TablesClient.__init__)):
automl.TablesClient.__init__ = monkeypatch_tablesclient
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally a nit, and no change needed, but may make sense to do this monkeypatch_* as as higher level function so we don't need to duplicate that so many times?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Good idea! It cleaned up a lot of code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🙌

Copy link
Copy Markdown
Contributor

@vimota vimota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@vimota vimota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice update!

Copy link
Copy Markdown
Contributor Author

@mcollins42 mcollins42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The automl_v1beta1.GcsClient can't be patched because no project is passed in anywhere and setting the credentials without a project causes the underlying storage.Client to try to find a default project in the environment which fails. Passing a storage.Client into the GcsClient constructor works as expected and we've already patched the GcsClient, so that would be the preferred way to set up TablesClient and GcsClient.

Is this the issue you hit? googleapis/google-cloud-python#8108

Not exactly. The AutoML Tables GcsClient didn't accept a project as a parameter and was passing credentials to the GCS storage.Client without a project which causes it to look for a PROJECT_ID environment variable.

Copy link
Copy Markdown
Contributor

@ifigotin ifigotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mcollins42 mcollins42 merged commit 89f1e09 into master Sep 26, 2019
@rosbo rosbo deleted the patch-automl branch December 10, 2019 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants