Parallelize predict in classifiers #7448

@lesshaste

Description

Sometimes one trains a classifier on a sample and then has to run it on a massive dataset, which can be very slow. It would be great if predict and predict_proba had an n_jobs parameter so that this could be done in parallel on multi-core machines, or alternatively if there were a standard copy-and-pasteable solution.

My guess is that one challenge in doing this efficiently is avoiding copying large sections of the training matrix, but other than that it should be embarrassingly parallel.

http://stackoverflow.com/questions/31449291/how-to-parallelise-predict-method-of-a-scikit-learn-svm-svc-classifier suggests a workaround, although it is untested.

For random forests I am aware of https://github.com/ajtulloch/sklearn-compiledtrees, which also speeds up prediction, although it does not yet work for classification.
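
To make the request concrete, here is a rough sketch of the kind of copy-and-pasteable helper I have in mind. It is not an existing scikit-learn API: the name `parallel_predict` and its parameters are made up for illustration, and it assumes the estimator's prediction method works row by row, so that results from row blocks can simply be concatenated. It uses joblib with threads by default so that each worker gets a view of a slice of `X` rather than a copy. Whether threads give any speed-up depends on whether the estimator releases the GIL while predicting, which I have not checked for each classifier.

```python
import numpy as np
from joblib import Parallel, delayed, effective_n_jobs


def parallel_predict(estimator, X, n_jobs=-1, method="predict", prefer="threads"):
    """Hypothetical helper (not part of scikit-learn).

    Splits the rows of X into contiguous blocks, calls
    getattr(estimator, method) on each block in parallel, and
    concatenates the results in the original row order.
    Assumes X is a NumPy array with at least one row.
    """
    predict_fn = getattr(estimator, method)
    n_samples = X.shape[0]

    # Resolve n_jobs=-1 to a concrete worker count, and never create
    # more blocks than there are rows.
    n_blocks = max(1, min(effective_n_jobs(n_jobs), n_samples))

    # Block boundaries, e.g. 10 rows in 3 blocks -> [0, 3, 6, 10].
    bounds = np.linspace(0, n_samples, n_blocks + 1, dtype=int)

    # Basic slicing of a NumPy array returns a view, so with the
    # thread-based backend no block of X is copied here.
    results = Parallel(n_jobs=n_jobs, prefer=prefer)(
        delayed(predict_fn)(X[start:stop])
        for start, stop in zip(bounds[:-1], bounds[1:])
    )
    return np.concatenate(results)
```

With a fitted classifier `clf` and a large array `X_big`, this would be called as `parallel_predict(clf, X_big, n_jobs=4)` or, for probabilities, `parallel_predict(clf, X_big, n_jobs=4, method="predict_proba")`. Passing `prefer="processes"` sidesteps the GIL, but the estimator and the blocks then have to be shipped to the worker processes, which is exactly the copying cost mentioned above.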
