Configure benchmarks by editing the config.json file.
You can configure algorithm parameters, datasets, the list of frameworks to use, and the usage of some environment variables.
Refer to the tables below for descriptions of all fields in the configuration file.

Root object fields:

| Field Name | Type | Description |
|---|---|---|
| common | Common Object | REQUIRED Settings shared by all benchmarks: frameworks and input data settings. |
| cases | List[Case Object] | REQUIRED List of algorithms, their parameters, and training data. |
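
A minimal sketch of how the two REQUIRED top-level fields fit together, built as a Python dictionary and written out as config.json. The benchmark file name `logreg`, the dataset name, the CSV paths, and the sample count are hypothetical placeholders, not values from a real benchmark suite.

```python
import json

# Minimal config with the two REQUIRED top-level fields: "common" and "cases".
# All concrete values below (algorithm name, dataset name, paths, sizes) are
# hypothetical placeholders chosen for illustration.
config = {
    "common": {
        "data-format": "pandas",  # numpy, pandas, or cudf
        "data-order": "F",        # C (row-major) or F (column-major)
        "dtype": "float64",       # float64 or float32
    },
    "cases": [
        {
            "lib": "sklearn",      # one of sklearn, daal4py, cuml, xgboost
            "algorithm": "logreg", # hypothetical benchmark file name
            "dataset": [
                {
                    "source": "csv",
                    "name": "my_dataset",  # hypothetical dataset name
                    "training": {
                        "n_samples": 10000,
                        "x": "data/my_dataset_x_train.csv",
                        "y": "data/my_dataset_y_train.csv",
                    },
                }
            ],
        }
    ],
}

# Write the configuration with indentation so it stays easy to edit by hand.
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```

Because no `testing` object is given, this dataset would fall back to its training data for testing.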

Common Object fields:

| Field Name | Type | Description |
|---|---|---|
| data-format | Union[str, List[str]] | REQUIRED Input data format: numpy, pandas, or cudf. |
| data-order | Union[str, List[str]] | REQUIRED Input data order: C (row-major, default) or F (column-major). |
| dtype | Union[str, List[str]] | REQUIRED Input data type: float64 (default) or float32. |
| check-finitness | List[] | Check finiteness during the scikit-learn input check (disabled by default). |
| device | List[str] | For scikit-learn only. The list of devices to run the benchmarks on. Each entry can be None (default: run on CPU without a SYCL context) or one of the SYCL device types: cpu, gpu, or host. Refer to the SYCL specification for details. |
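
The `Union[str, List[str]]` types suggest that a field may hold either one value or a list of values. The sketch below assumes, without confirmation from this page, that list values are expanded into one run per combination (a Cartesian product); `expand_runs` is a hypothetical helper, not part of the benchmark tooling.

```python
from itertools import product


def expand_runs(settings):
    """Expand list-valued settings into one dict per combination (hypothetical helper).

    Assumption: every list-valued field contributes one run per value, and
    values of different fields are combined as a Cartesian product.
    """
    keys = list(settings)
    # Wrap scalar values in a one-element list so every field can be iterated.
    value_lists = [v if isinstance(v, list) else [v] for v in settings.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


common = {
    "data-format": "pandas",
    "data-order": ["C", "F"],
    "dtype": ["float64", "float32"],
}

for run in expand_runs(common):
    print(run)
```

Under this assumption, two orders times two data types yield four runs for a single data format.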

Case Object fields:

| Field Name | Type | Description |
|---|---|---|
| lib | Union[str, List[str]] | REQUIRED A test framework or a list of frameworks. Each must be one of sklearn, daal4py, cuml, or xgboost. |
| algorithm | str | REQUIRED Benchmark file name. |
| dataset | List[Dataset Object] | REQUIRED Input data specifications. |
| specific algorithm parameters | Union[int, float, str, List[int], List[float], List[str]] | Other algorithm-specific parameters. |

Important: You can move any parameter from "cases" to "common" if that parameter is shared by all cases.
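
A minimal sketch of what moving a parameter into "common" amounts to: each case behaves as if the common parameters were copied into it. The precedence rule here (a value set in a case overrides the same key in "common") is an assumption, and `merge_common` and the algorithm names are hypothetical.

```python
def merge_common(config):
    """Return the cases with every "common" parameter copied in (hypothetical helper).

    Assumption: a value set in a case overrides the same key in "common".
    """
    common = config.get("common", {})
    # Later keys win in a dict merge, so case values take precedence.
    return [{**common, **case} for case in config["cases"]]


config = {
    "common": {"lib": "sklearn", "dtype": "float64"},
    "cases": [
        {"algorithm": "kmeans"},                 # hypothetical benchmark name
        {"algorithm": "pca", "dtype": "float32"},  # hypothetical benchmark name
    ],
}

for case in merge_common(config):
    print(case)
```

Here `lib` is written once in "common" instead of being repeated in every case.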

Dataset Object fields:

| Field Name | Type | Description |
|---|---|---|
| source | str | REQUIRED Data source: synthetic, csv, or npy. |
| type | str | REQUIRED for synthetic data. The type of task for which the dataset is generated: classification, blobs, or regression. |
| n_classes | int | For synthetic data of the classification type only. The number of classes (or labels) in the classification problem. |
| n_clusters | int | For synthetic data of the blobs type only. The number of centers to generate. |
| n_features | int | REQUIRED for synthetic data. The number of features to generate. |
| name | str | Name of the dataset. |
| training | Training Object | REQUIRED An object with the paths to the training datasets. |
| testing | Testing Object | An object with the paths to the testing datasets. If not provided, the training datasets are used. |
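
The rules in the Dataset Object table can be checked mechanically. The sketch below encodes only the constraints listed in that table (allowed sources and types, fields required for synthetic data, and the type-specific `n_classes` and `n_clusters`); `check_dataset` is a hypothetical helper, not a function provided by the benchmarks.

```python
def check_dataset(ds):
    """Return a list of problems with a Dataset Object (hypothetical helper).

    Only the constraints from the Dataset Object table are checked.
    """
    problems = []
    source = ds.get("source")
    if source not in ("synthetic", "csv", "npy"):
        problems.append(f"unknown source: {source!r}")
    if "training" not in ds:
        problems.append("missing training")
    if source == "synthetic":
        # type and n_features are REQUIRED for synthetic data.
        for key in ("type", "n_features"):
            if key not in ds:
                problems.append(f"synthetic data requires {key}")
        kind = ds.get("type")
        if kind not in (None, "classification", "blobs", "regression"):
            problems.append(f"unknown type: {kind!r}")
        # Type-specific parameters.
        if "n_classes" in ds and kind != "classification":
            problems.append("n_classes applies to the classification type only")
        if "n_clusters" in ds and kind != "blobs":
            problems.append("n_clusters applies to the blobs type only")
    return problems


blobs = {
    "source": "synthetic",
    "type": "blobs",
    "n_clusters": 10,
    "n_features": 50,
    "training": {"n_samples": 1000},
}
print(check_dataset(blobs))
```

An empty list means the dataset satisfies every rule in the table.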

Training Object fields:

| Field Name | Type | Description |
|---|---|---|
| n_samples | int | REQUIRED The total number of training samples. |
| x | str | REQUIRED The path to the training samples. |
| y | str | REQUIRED The path to the training labels. |

Testing Object fields:

| Field Name | Type | Description |
|---|---|---|
| n_samples | int | REQUIRED The total number of testing samples. |
| x | str | REQUIRED The path to the testing samples. |
| y | str | REQUIRED The path to the testing labels. |
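
The sketch below shows how the Training and Testing Objects relate. It checks the REQUIRED fields exactly as listed in the two tables above and falls back to the training object when `testing` is omitted, as the Dataset Object table describes. `split_parts` and the CSV paths are hypothetical.

```python
def split_parts(ds):
    """Return the (training, testing) objects of a dataset (hypothetical helper).

    Falls back to the training object when "testing" is not provided.
    """
    training = ds["training"]
    testing = ds.get("testing", training)
    for name, part in (("training", training), ("testing", testing)):
        # n_samples, x, and y are REQUIRED in both objects.
        missing = [k for k in ("n_samples", "x", "y") if k not in part]
        if missing:
            raise ValueError(f"{name} object is missing {missing}")
    return training, testing


ds = {
    "source": "csv",
    "training": {
        "n_samples": 100,
        "x": "data/train_x.csv",  # hypothetical path
        "y": "data/train_y.csv",  # hypothetical path
    },
}
training, testing = split_parts(ds)
print(testing["x"])
```

Because `ds` has no `testing` object, the testing data resolves to the training files.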