ComputeLocator

A Kubernetes-based platform that collects runtime metrics from heterogeneous containerized workloads for execution performance prediction experiments, supporting dynamic compute placement decisions.

Research context

This project extends the work of Kimovski et al. in "Cloud, Fog, or Edge: Where to Compute?" (IEEE Internet Computing, 2021) by implementing a dynamic, data-driven approach to compute placement. Where the paper provides static placement recommendations based on application and infrastructure characteristics, this system deploys real workloads across resource-constrained environments, collects execution data, and trains predictive models from observed performance.

Architecture

How it works

1. Deploy benchmark workloads on Kubernetes pods with different resource constraints (small: 100m CPU / 1Gi RAM, medium: 250m CPU / 2Gi RAM, large: 500m CPU / 4Gi RAM).
2. Execute diverse compute functions and collect 14+ process metrics at 1-second resolution via Prometheus, plus structured request logs with nanosecond timestamps via Loki.
3. Aggregate and temporally align metrics with request windows through a custom aggregation API.
4. Train an XGBoost multi-output regression model to predict execution performance from pod size, function type, and initial metric state.
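The alignment step can be illustrated with a minimal sketch. It assumes each metric arrives as a sorted list of nanosecond timestamps with a parallel list of values; the function name `align_window` and the returned keys are hypothetical and are not taken from the aggregation API's code.

```python
from bisect import bisect_left, bisect_right


def align_window(timestamps_ns, values, start_ns, end_ns):
    """Pick the metric samples that belong to one request window.

    timestamps_ns must be sorted ascending and parallel to values.
    Returns the last sample at or before the request start (the
    "initial state") and all samples inside [start_ns, end_ns].
    """
    lo = bisect_left(timestamps_ns, start_ns)
    hi = bisect_right(timestamps_ns, end_ns)
    # Index of the last sample taken at or before the request started.
    init_idx = bisect_right(timestamps_ns, start_ns) - 1
    initial = values[init_idx] if init_idx >= 0 else None
    return {"initial": initial, "window": values[lo:hi]}
```

With samples scraped once per second, a request lasting under a second may have an empty `window`; in that case only the `initial` snapshot is available.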

Benchmark workloads

| Workload | Endpoint | Description | Key tech |
|---|---|---|---|
| Dense Network | `/dense` | MNIST classifier with configurable layers and units, memory-efficient streaming batch loading | TensorFlow |
| Pi Estimation | `/pi` | Monte Carlo method, 100M vectorized samples in batches of 100K | NumPy |
| Video Encoding | `/video` | H.264 encoding at 720p / 1080p / 1440p profiles from a 4-second segment | FFmpeg |
| CPU Stress | `/ml` | Parameterized arithmetic loop for variable-load testing | Python |
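As an illustration of the Pi Estimation workload, here is a minimal sketch of a batched, vectorized Monte Carlo estimate. It is not the code in `pi_estimation.py`; the function name, the `seed` parameter, and the defaults are assumptions that mirror the figures in the table above.

```python
import numpy as np


def estimate_pi(num_samples=100_000_000, batch_size=100_000, seed=None):
    """Estimate pi by sampling points in the unit square in batches."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    rng = np.random.default_rng(seed)
    inside = 0
    remaining = num_samples
    while remaining > 0:
        n = min(batch_size, remaining)  # the last batch may be smaller
        x = rng.random(n)
        y = rng.random(n)
        # Count the points that fall inside the quarter unit circle.
        inside += int(np.count_nonzero(x * x + y * y <= 1.0))
        remaining -= n
    return 4.0 * inside / num_samples
```

Processing in batches keeps peak memory bounded by `batch_size` rather than `num_samples`, which matters on the smaller pods.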

Collected metrics

All metrics are collected at 1-second resolution by Prometheus, which scrapes psutil-based exporters:

| Category | Metrics |
|---|---|
| CPU | Usage %, user time, system time |
| Memory | RSS, VMS, usage % |
| I/O | Read bytes, write bytes |
| Threading | Thread count, voluntary context switches, involuntary context switches |
| Network | Open connections |
| Runtime | Process uptime |

Each API request also produces a structured JSON log (collected via Promtail → Loki) containing start/end timestamps in nanoseconds, endpoint, method, payload, and status.
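A request log of this shape could be produced roughly as follows. This is a sketch only: the JSON key names (`start_ns`, `end_ns`, and so on) and the helper `run_with_log` are hypothetical, and the actual `server.py` may structure its logging differently.

```python
import json
import time


def run_with_log(endpoint, method, payload, handler):
    """Run handler(payload) and return (result, json_log_line)."""
    start_ns = time.time_ns()  # wall-clock time in nanoseconds
    try:
        result = handler(payload)
        status = 200
    except Exception:
        # For this sketch, any failure is recorded as a server error.
        result = None
        status = 500
    end_ns = time.time_ns()
    log_line = json.dumps({
        "start_ns": start_ns,
        "end_ns": end_ns,
        "endpoint": endpoint,
        "method": method,
        "payload": payload,
        "status": status,
    })
    return result, log_line
```

Emitting one JSON object per line lets a log shipper forward each entry unchanged, and the two timestamps define the window against which metrics are later aligned.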

ML pipeline

1. Data collection: The aggregation API joins Prometheus metric timeseries with Loki request logs, producing per-request records with aligned metric snapshots.
2. Feature engineering: Flatten nested metric dictionaries, compute deltas for duration and CPU time, drop constant columns, one-hot encode pod size and function type.
3. Training: XGBoost MultiOutputRegressor (100 estimators, max depth 5, learning rate 0.1) with 80/20 train-test split.
4. Evaluation: MAE, RMSE, R², and MAPE per output metric; feature importance ranking; actual-vs-predicted scatter plots.
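The feature-engineering and evaluation steps can be sketched in plain Python. The project itself lists pandas in its stack; this sketch deliberately avoids it to stay dependency-free. The record layout and field names (`start_ns`, `metrics.start.cpu_user`, `pod_size`, `function`) are assumptions for illustration, not the schema returned by the aggregation API.

```python
import math


def flatten(record, parent="", sep="."):
    """Flatten nested dictionaries into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def build_features(records, categorical=("pod_size", "function")):
    """Flatten, add deltas, drop constant columns, then one-hot encode."""
    rows = [flatten(r) for r in records]
    for row in rows:
        # Deltas replace the raw end values (hypothetical field names).
        row["duration_ns"] = row.pop("end_ns") - row.pop("start_ns")
        row["cpu_user_delta"] = (
            row.pop("metrics.end.cpu_user") - row["metrics.start.cpu_user"]
        )
    # Drop columns that hold the same value in every row.
    constant = [k for k in rows[0] if len({row[k] for row in rows}) == 1]
    for row in rows:
        for key in constant:
            del row[key]
    # One-hot encode whichever categorical columns survived.
    for column in categorical:
        if column not in rows[0]:
            continue
        categories = sorted({row[column] for row in rows})
        for row in rows:
            value = row.pop(column)
            for category in categories:
                row[f"{column}_{category}"] = int(value == category)
    return rows


def regression_metrics(actual, predicted):
    """MAE, RMSE, R² and MAPE for one output; assumes no zero actuals."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }
```

In a multi-output setting, `regression_metrics` would be called once per predicted column. MAPE is undefined when an actual value is zero, so targets that can be zero need a different relative-error measure.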

Project structure

ComputeLocator/
├── kubernetes/
│   ├── target_apis/                  # Benchmark workload deployments
│   │   ├── api-small.yaml            # Pod: 100m CPU, 1Gi RAM
│   │   ├── api-medium.yaml           # Pod: 250m CPU, 2Gi RAM
│   │   ├── api-large.yaml            # Pod: 500m CPU, 4Gi RAM
│   │   └── api_code/
│   │       ├── server.py             # Flask API exposing /ml, /pi, /dense, /video
│   │       ├── functions/            # Benchmark implementations
│   │       │   ├── ML.py
│   │       │   ├── pi_estimation.py
│   │       │   ├── dense_network.py
│   │       │   └── video_encoding.py
│   │       ├── resources/            # MNIST dataset, source video
│   │       └── Dockerfile
│   ├── service_registry/             # Observability stack
│   │   ├── prometheus-server.yaml    # Prometheus, Loki, Promtail, aggregation API
│   │   └── custom_aggregation_api/
│   │       ├── server.py             # /history and /endpoints aggregation API
│   │       └── Dockerfile
│   └── load-balancer/                # Client gateway and ML training
│       ├── client-gateway.yaml       # Kubernetes Job manifest
│       └── code/
│           ├── main.py               # Orchestrates benchmark execution
│           ├── model_exploration.py   # XGBoost training and evaluation
│           ├── tested_api_client.py   # API client for benchmark functions
│           ├── cluster_data_client.py # Client for metrics aggregation
│           └── Dockerfile
└── project_resources/
    └── img.png                       # Architecture diagram

Current tech stack

Python, Flask, pandas, Kubernetes, Docker, Prometheus, Loki, Promtail

Getting started

Prerequisites

minikube
Docker
kubectl

Build images

docker build -t sample_api ./kubernetes/target_apis/api_code/
minikube image load sample_api

docker build -t cluster_api ./kubernetes/service_registry/custom_aggregation_api/
minikube image load cluster_api

docker build -t client-gateway ./kubernetes/load-balancer/code/
minikube image load client-gateway

Deploy

kubectl apply -f ./kubernetes/target_apis/
kubectl apply -f ./kubernetes/service_registry/
kubectl apply -f ./kubernetes/load-balancer/

Access

Forward the aggregation API to your host:

kubectl port-forward deployment/prometheus-server 5000:5000

Retrieve collected data:

curl http://localhost:5000/history

Send a request to a benchmark pod:

kubectl port-forward deployment/python-api-large 8000:8000
curl -X POST http://localhost:8000/ml -H 'Content-Type: application/json' -d '{"input": 15}'
curl -X POST http://localhost:8000/pi -H 'Content-Type: application/json' -d '{"num_samples": 100000000}'
curl -X POST http://localhost:8000/dense -H 'Content-Type: application/json' -d '{"epochs": 5, "num_layers": 2}'
curl -X POST http://localhost:8000/video -H 'Content-Type: application/json' -d '{"profile": 1}'
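
The same requests can be issued from Python using only the standard library. This is a sketch that assumes the port-forward above is active; `build_request` and `call` are hypothetical helper names and are unrelated to the project's `tested_api_client.py`. Because the response format is not documented here, the body is returned as raw text.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # matches the port-forward shown above


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for a benchmark endpoint such as /pi."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=f"{base_url}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint, payload, base_url=BASE_URL, timeout=600):
    """Send the request and return (HTTP status, raw response text)."""
    req = build_request(endpoint, payload, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

For example, `call("/video", {"profile": 1})` mirrors the last curl command. The generous default timeout is a guess, chosen because workloads may run for a long time on the small pod.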
