
Fix downloading MLmodel files from alias-based models:/ URIs #8764

Merged
smurching merged 3 commits into mlflow:master from smurching:fix-mlmodel-file-download on Jun 16, 2023

@smurching
Collaborator

@smurching smurching commented Jun 15, 2023

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Follow-up to #8728. This PR fixes downloading MLmodel files when loading models with alias-based URIs such as models:/mymodel@alias. The current logic appends the MLmodel file name to the model URI and then downloads from the resulting artifact URI, e.g. models:/mymodel@alias/MLmodel, which is not a valid URI and cannot be parsed.
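
To make the failure mode concrete, here is a minimal sketch of a models:/ URI parser. The grammar, the regular expressions, and the name parse_models_uri are assumptions made up for illustration (for one thing, this toy grammar forbids "@" in model names); it is not MLflow's actual parser. It only shows why an alias URI stops parsing once a file name is appended to it.

```python
import re

# Hypothetical, simplified grammar; MLflow's real parser differs in detail.
_SUFFIX_FORM = re.compile(r"^models:/(?P<name>[^/@]+)/(?P<suffix>[^/]+)$")
_ALIAS_FORM = re.compile(r"^models:/(?P<name>[^/@]+)@(?P<alias>[A-Za-z0-9]+)$")


def parse_models_uri(uri):
    """Return (name, suffix, alias); raise ValueError if the URI fits neither form."""
    match = _SUFFIX_FORM.match(uri)
    if match:
        return match["name"], match["suffix"], None
    match = _ALIAS_FORM.match(uri)
    if match:
        return match["name"], None, match["alias"]
    raise ValueError(f"Not a proper models:/ URI: {uri}")


model_uri = "models:/mymodel@alias"
parsed = parse_models_uri(model_uri)  # the alias URI on its own is valid

# Appending the file name produces a string that fits neither form.
appended_uri = model_uri + "/MLmodel"
try:
    parse_models_uri(appended_uri)
    appended_is_valid = True
except ValueError:
    appended_is_valid = False
```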

How is this patch tested?

Updated unit tests. Also manually verified the PR fixes a bug in a DAIS demo 😅

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests (describe details, including test results, below)

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

  )
- ml_model_file = _download_artifact_from_uri(
-     artifact_uri=append_to_uri_path(model_uri, MLMODEL_FILE_NAME)
+ ml_model_file = get_artifact_repository(artifact_uri=model_uri).download_artifacts(
Collaborator Author

@smurching smurching Jun 15, 2023

Instead of assuming that appending /MLmodel to an artifact URI yields a valid artifact URI (which is not generally true), we construct an ArtifactRepository against the original artifact URI (e.g. models:/mymodel@alias) and then call download_artifacts with the actual file path (MLmodel). We should make similar fixes elsewhere; I'll do that in a follow-up PR.
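
The pattern described above can be sketched with a toy, in-memory repository. ToyModelsRepository, alias_table, and storage are hypothetical names invented for this illustration, and the toy returns file contents where a real artifact repository would return a local download path. The point is only that the alias is resolved while the URI is still in its valid form, and the file path is passed separately.

```python
import posixpath


class ToyModelsRepository:
    """Hypothetical stand-in for an artifact repository rooted at a model version."""

    def __init__(self, artifact_uri, alias_table, storage):
        # Resolve the alias once, using the original, unmodified URI.
        name, _, alias = artifact_uri[len("models:/"):].partition("@")
        self._root = alias_table[(name, alias)]
        self._storage = storage

    def download_artifacts(self, artifact_path):
        # The file path is joined to the resolved root, never to the models:/ URI.
        return self._storage[posixpath.join(self._root, artifact_path)]


# Made-up registry state: the alias points at a storage location for one version.
alias_table = {("mymodel", "alias"): "store/mymodel/version-3"}
storage = {"store/mymodel/version-3/MLmodel": "flavors: {}\n"}

repo = ToyModelsRepository("models:/mymodel@alias", alias_table, storage)
ml_model_contents = repo.download_artifacts("MLmodel")
```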

Collaborator Author

Actually, another alternative could be to support URIs like models:/mymodel@alias/path/to/file. Technically this is possible since aliases can only contain alphanumeric characters. It would reduce the set of changes needed in the immediate term and would allow for things like mlflow.artifacts.download_artifacts(artifact_uri="models:/mymodel@alias/path/to/file"). @harupy I saw you made a similar fix in #5921.

The main downside I think is that there may be artifact store types for which appending a /path/to/file at the end of the URI isn't valid (e.g. any artifact store where there are query parameters in the URI). But we could deal with that if/when we need to
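
As an illustration of the query-parameter concern, the sketch below contrasts plain string concatenation with a path-aware append built on urllib.parse. The URI and the helper names naive_append and path_aware_append are made up for this example; it does not describe how any MLflow helper behaves.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def naive_append(uri, path):
    """Append by string concatenation, ignoring URI structure."""
    return uri.rstrip("/") + "/" + path


def path_aware_append(uri, path):
    """Append to the path component only, leaving the query string intact."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=posixpath.join(parts.path, path)))


# Hypothetical artifact URI that carries a query parameter.
uri = "https://example.com/artifacts/model?token=abc"

naive = naive_append(uri, "MLmodel")
aware = path_aware_append(uri, "MLmodel")

# With the naive append, the file name ends up inside the query string.
naive_query = urlsplit(naive).query
```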

Collaborator Author

@smurching smurching Jun 16, 2023

Hm, I'm not sure my suggestion will work. The problem is that we don't restrict the set of allowed characters in registered model names in OSS. That means a URI like models:/mymodel@myalias/1/to/file is ambiguous: it could be interpreted as (1) targeting the file at path 1/to/file under the version of mymodel associated with the alias myalias, or (2) targeting the model named mymodel@myalias, version 1, file path to/file. So unfortunately I think we can't support this style of URI for aliases, and we should adopt the fix in this PR to download specific files from model versions referenced by alias.
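
The two readings can be spelled out in a few lines. The functions alias_reading and version_reading are hypothetical helpers written for this illustration, not MLflow code; each applies one of the two interpretations to the same string.

```python
def alias_reading(uri):
    """Read the URI as models:/<name>@<alias>/<file path>."""
    body = uri[len("models:/"):]
    name, _, rest = body.partition("@")
    alias, _, file_path = rest.partition("/")
    return {"name": name, "alias": alias, "path": file_path}


def version_reading(uri):
    """Read the URI as models:/<name>/<version>/<file path>."""
    body = uri[len("models:/"):]
    name, version, file_path = body.split("/", 2)
    return {"name": name, "version": version, "path": file_path}


uri = "models:/mymodel@myalias/1/to/file"

reading_1 = alias_reading(uri)    # model "mymodel", alias "myalias"
reading_2 = version_reading(uri)  # model "mymodel@myalias", version "1"
```

Both readings are well-formed when "@" is allowed in model names, so the URI alone cannot say which one was intended.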

Member

It seems the safer option. Just checked cloud provider docs for object store and they really don't have many restrictions on key contents anyway.

def _improper_model_uri_msg(uri):
    return (
        "Not a proper models:/ URI: %s. " % uri
        + "Models URIs must be of the form 'models:/<model_name>/<suffix>' "
Collaborator Author

This error message doesn't render well in notebooks, since <model_name> is interpreted as an HTML tag rather than shown as text; fix it by removing the angle brackets.
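
A rough way to see the effect is to feed both variants of the message to Python's standard html.parser and collect what would remain as visible text. This is only an approximation of what an HTML-rendering output cell might do; the actual notebook behavior is an assumption here, and VisibleText and render are names made up for the sketch.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect start tags and text separately, roughly as an HTML renderer would."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def render(message):
    """Return (visible text, list of tag names) for a message treated as HTML."""
    parser = VisibleText()
    parser.feed(message)
    parser.close()
    return "".join(parser.text), parser.tags


# With angle brackets, the placeholders are consumed as unknown tags.
bracket_text, bracket_tags = render("models:/<model_name>/<suffix>")

# Without angle brackets, the whole message survives as text.
plain_text, plain_tags = render("models:/model_name/suffix")
```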

Member

@harupy harupy Jun 15, 2023

I didn't know that, makes sense!

Member

@harupy harupy Jun 15, 2023

image

lol (looks like a bug, not MLflow's)

Member

We should probably look for other cases of this... I had no idea that it would parse as a tag.

Member

@harupy harupy Jun 16, 2023

Found another trick:

image

Signed-off-by: Sid Murching <[email protected]>
@mlflow-automation
Contributor

mlflow-automation commented Jun 15, 2023

Documentation preview for 4bbdd07 will be available here when this CircleCI job completes successfully.


@smurching added the rn/none label (List under Small Changes in Changelogs.) on Jun 16, 2023
Signed-off-by: Sid Murching <[email protected]>
Member

@BenWilson2 BenWilson2 left a comment

LGTM!

Member

@harupy harupy left a comment

LGTM!

@smurching smurching merged commit 6c1b84b into mlflow:master Jun 16, 2023
BenWilson2 pushed a commit to BenWilson2/mlflow that referenced this pull request Jul 5, 2023
…8764)

* Fix downloading MLmodel files from alias URIs

Signed-off-by: Sid Murching <[email protected]>

* Fix test

Signed-off-by: Sid Murching <[email protected]>

* use variable

Signed-off-by: Sid Murching <[email protected]>

---------

Signed-off-by: Sid Murching <[email protected]>
BenWilson2 pushed a commit to BenWilson2/mlflow that referenced this pull request Jul 7, 2023
…8764)

* Fix downloading MLmodel files from alias URIs

Signed-off-by: Sid Murching <[email protected]>

* Fix test

Signed-off-by: Sid Murching <[email protected]>

* use variable

Signed-off-by: Sid Murching <[email protected]>

---------

Signed-off-by: Sid Murching <[email protected]>
BenWilson2 pushed a commit that referenced this pull request Jul 7, 2023
* Fix downloading MLmodel files from alias URIs

Signed-off-by: Sid Murching <[email protected]>

* Fix test

Signed-off-by: Sid Murching <[email protected]>

* use variable

Signed-off-by: Sid Murching <[email protected]>

---------

Signed-off-by: Sid Murching <[email protected]>

Labels

rn/none List under Small Changes in Changelogs.

4 participants