Description
Sometimes one trains a classifier on a sample and then has to run it on a massive dataset, which can be very slow. It would be great if predict and predict_proba had an n_jobs parameter so that prediction could be done in parallel on multi-core machines, or, alternatively, if there were a standard copy-and-pasteable solution.
My guess is that one challenge in doing this efficiently is avoiding copies of large sections of the training matrix. Other than that, it should be embarrassingly parallel, since each sample's prediction is independent of the others.
http://stackoverflow.com/questions/31449291/how-to-parallelise-predict-method-of-a-scikit-learn-svm-svc-classifier suggests a workaround, although I have not tested it.
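Until something like an n_jobs argument exists, a chunk-and-join workaround can be sketched with joblib, which scikit-learn already depends on. This is only a minimal sketch under some assumptions: parallel_predict is a hypothetical helper (not a scikit-learn API), the default of one chunk per CPU core is an arbitrary choice, and the estimator is assumed to be already fitted. It splits the rows of X into chunks, calls the estimator's predict-like method on each chunk in a separate worker, and concatenates the results in the original order.

```python
import numpy as np
from joblib import Parallel, cpu_count, delayed
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def parallel_predict(estimator, X, method="predict", n_jobs=-1, n_chunks=None):
    """Hypothetical helper (not part of scikit-learn): apply a fitted
    estimator's predict-like method to row-wise chunks of X in parallel."""
    n_samples = X.shape[0]
    if n_chunks is None:
        # Assumption: one chunk per CPU core is a reasonable default.
        n_chunks = cpu_count()
    # Never create empty chunks; scikit-learn rejects arrays with 0 samples.
    n_chunks = max(1, min(n_chunks, n_samples))
    chunks = np.array_split(X, n_chunks)
    # Each chunk is scored independently. With joblib's default
    # process-based backend, the estimator (including an SVC's support
    # vectors) is serialized and sent to every worker.
    results = Parallel(n_jobs=n_jobs)(
        delayed(getattr(estimator, method))(chunk) for chunk in chunks
    )
    # Parallel returns results in input order, so this restores row order.
    return np.concatenate(results, axis=0)


# Usage: train on a small sample, then score a larger dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X[:500], y[:500])

labels = parallel_predict(clf, X, method="predict", n_jobs=2)
proba = parallel_predict(clf, X, method="predict_proba", n_jobs=2)
```

Because each row is scored independently, the chunked output should match a single serial call. The trade-off is the copying cost mentioned above: every worker process receives its own copy of the fitted model, so for models that store many training samples (such as an SVC with many support vectors) fewer jobs may be preferable.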
For random forests, I am aware of https://github.com/ajtulloch/sklearn-compiledtrees, which also speeds up prediction, although it does not yet work for classification.