Description
Context
I am conducting a scientific experiment on AutoML reproducibility across five TPOT versions (0.11.6, 0.11.7, 0.12.1, 0.12.2, and 1.0.0). I am running these on a local machine (no cloud infrastructure) using Docker containers (Linux).
The Issue
I have observed a significant runtime regression in v1.0.0 compared to all previous versions, despite using the same dataset and hyperparameter configuration.
Dataset: Regression task, 638 samples, 13 features.
Dependencies: xgboost is installed and available in all environments.
Hardware: Local PC (Docker), identical resources allocated for all runs.
Observed Results
TPOT v0.11.6 / v0.11.7 (Python 3.8): Runtime ~4-5 minutes.
TPOT v0.12.1 / v0.12.2 (Python 3.10): Runtime ~4-5 minutes.
TPOT v1.0.0 (Python 3.10): Runtime ~30 minutes.
Configuration
The code logic is identical across versions (with the v1.0.0 run wrapped in `if __name__ == '__main__':` to support the Dask backend).
```python
# Standard configuration used in all tests
pipeline_optimizer = TPOTRegressor(
    generations=20,
    population_size=20,
    cv=5,
    random_state=seed,
    verbose=0
)

# In v1.0.0, this call is placed inside `if __name__ == "__main__":`
pipeline_optimizer.fit(X_train, y_train)
```
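For context on the guard: process-based backends such as Dask (like Python's own spawn-based multiprocessing) re-import the main module in each worker, so top-level code must be protected. A minimal stand-in illustration using only the standard library (not TPOT or Dask themselves):

```python
# Why the __main__ guard matters for process-based backends.
# Stand-in: stdlib ProcessPoolExecutor instead of TPOT's Dask backend.
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    # Without this guard, spawn-based workers would re-execute the
    # pool-creation code on import and recursively start new pools.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(square, range(5)))
    print(results)  # [0, 1, 4, 9, 16]
```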
Questions
Is this drastic slowdown expected for small datasets due to the overhead of the Dask backend (introduced in v1.0.0) compared to the older multiprocessing backend?
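To make the overhead hypothesis concrete: when individual tasks are cheap (as with 5-fold CV on 638 samples), per-task scheduling and serialization cost can dominate total runtime. A stand-in measurement using the stdlib `ProcessPoolExecutor` in place of Dask (the assumption being that any process-based scheduler pays an analogous per-task cost):

```python
# Compare serial execution vs. a process pool on deliberately tiny tasks,
# where dispatch overhead can exceed the useful work per task.
import time
from concurrent.futures import ProcessPoolExecutor

def tiny_task(x):
    # Cheap work, so scheduling overhead dominates.
    return sum(i * i for i in range(100)) + x

if __name__ == "__main__":
    n = 200

    t0 = time.perf_counter()
    serial = [tiny_task(i) for i in range(n)]
    serial_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel = list(pool.map(tiny_task, range(n)))
    pool_s = time.perf_counter() - t0

    assert serial == parallel
    # On workloads this small, the pool is frequently slower than serial.
    print(f"serial: {serial_s:.4f}s  pool: {pool_s:.4f}s")
```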
Has the default config_dict changed in v1.0.0 to prioritize significantly heavier estimators (e.g., more aggressive use of XGBoost or stacking) compared to v0.12.x, even though XGBoost was present in the older environments as well?
Is there a recommended configuration to restore the runtime profile of the older versions for benchmarking purposes?
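In the meantime, one hedged starting point is to keep the search budget identical and pin down the parallelism. This is a sketch only: every parameter other than those already shown above (`generations`, `population_size`, `cv`, `random_state`, `verbose`) is an assumption about the v1.0.0 API and should be verified against the current docs.

```python
from tpot import TPOTRegressor

pipeline_optimizer = TPOTRegressor(
    generations=20,
    population_size=20,
    cv=5,
    random_state=seed,  # same seed as the runs on the older versions
    verbose=0,
    n_jobs=1,           # assumption: a single local worker, to minimize
                        # scheduler/worker fan-out overhead for benchmarking
)

if __name__ == "__main__":
    pipeline_optimizer.fit(X_train, y_train)
```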