Sets up Consul, Nomad & Vault servers & clients given a cloud-specific config, with full observability via Grafana, Prometheus, Loki & Tempo, plus Fabio LB as an ingress. Currently supports Hetzner; AWS, Azure & GCP are coming soon.
Networking is set up so your bastion host (`allowed_ips`) has full access to the cluster, while Cloudflare IPs have access to ports 80 and 443.
Below is a reference architecture of what is created and how it should be used. Velodrome concerns itself with the left side of the diagram; the GitOps repo and operator are for you to implement (we may build something for this later):
Your machine/operator node will need the following pre-installed (velodrome will check for their presence before execution):
- `nomad`
- `consul`
- `vault`
- `ansible`
- `cfssl` & `cfssljson`
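The presence check can be approximated with a small shell loop (a sketch; the assumption here is that velodrome's own check also just probes for the binaries on `PATH`):

```shell
# Sketch: verify the required tools are on PATH before running velodrome.
missing=""
for tool in nomad consul vault ansible cfssl cfssljson; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing tools:$missing"
else
  echo "all required tools present"
fi
```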
You probably also want to use git-secret to protect the `[base_dir]/secrets` directory in the generated files.
Additionally, direnv will make life easier, as `velodrome genenv --config.file [config]` will generate a direnv-compatible `.envrc` file for you.
To ensure other env variables are preserved with velodrome genenv, just add this line into an existing .envrc file:
```shell
### GENERATED CONFIG BELOW THIS LINE, DO NOT EDIT!
```
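For example, a hand-maintained `.envrc` might look like this (the variable names and values below are illustrative; `velodrome genenv` regenerates everything under the marker):

```shell
# Your own variables stay above the marker and survive regeneration.
export EDITOR=vim

### GENERATED CONFIG BELOW THIS LINE, DO NOT EDIT!
# Everything below is rewritten by `velodrome genenv` (example value).
export CONSUL_HTTP_TOKEN="example-token"
```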
- An SSL certificate with cert, key and CA file. This can easily be generated with, for instance, Cloudflare (the network setup has been tested primarily with Cloudflare).
- An SSH key and a project already set up in Hetzner (when using Hetzner).
- The following 4 environment variables set in your environment (the S3 settings can be any S3-compatible store, including Cloudflare R2; this is used for Observability stack long-term storage):
  - `S3_ENDPOINT`
  - `S3_ACCESS_KEY`
  - `S3_SECRET_KEY`
  - `HETZNER_TOKEN` (generated from your Hetzner account)
- A `config.yaml` file. Please review the similarly named file in the root of this directory for options. Ensure that the IP of your machine/bastion host is in the `allowed_ips` section.
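The four environment variables from the list above can simply be exported in your shell (all values below are placeholders; any S3-compatible endpoint works):

```shell
# Placeholder values -- substitute your own credentials.
export S3_ENDPOINT="https://s3.example.com"   # or a Cloudflare R2 endpoint
export S3_ACCESS_KEY="my-access-key"
export S3_SECRET_KEY="my-secret-key"
export HETZNER_TOKEN="my-hetzner-api-token"   # from your Hetzner account
```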
Once all of the above is set up, just run `velodrome sync --config.file [config file]`. If no cluster exists, it will be created for you. If one exists, it will be synced with your config, setting up the entire cluster.
Loki: http://loki-http.service.consul:3100
Prometheus: http://prometheus.service.consul:9000
Tempo: http://tempo.service.consul:3200
- Nomad: add dashboard ID 15764
- Nodes: add dashboard ID 12486
This linking assumes your app is set up like this example: demo-app. Importantly, logs must also be in JSON format. Add derived fields:
- Name: `trace_id`
- Regex: `"trace_id":"([A-Za-z0-9]+)"` (this is for JSON format)
- Query: `${__value.raw}`
- URL label: `Trace`
- Internal link: Tempo
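The same derived field can also be set up through Grafana's datasource provisioning instead of the UI. A sketch (the file location and the `tempo` datasource UID are assumptions about your Grafana install):

```shell
# Writes a Loki datasource provisioning file with the trace_id derived
# field linking logs to Tempo. Adjust the path/UID for your setup.
cat > loki-datasource.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki-http.service.consul:3100
    jsonData:
      derivedFields:
        - name: trace_id
          matcherRegex: '"trace_id":"([A-Za-z0-9]+)"'
          url: '${__value.raw}'
          urlDisplayLabel: Trace
          datasourceUid: tempo
EOF
```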
Load balancing has been tested with Cloudflare Load Balancer. Simply put the public IPs of your client nodes (found in config/inventory) into a Cloudflare LB. This can be automated with Terraform, or simply done manually through the Cloudflare interface.
To make DNS work, you will need your root domain set up, as well as CNAMEs for any additional domains or subdomains.
By default, the cluster will try to set up the following:
- `grafana.[your management_domain]` (you still need to set up DNS with Cloudflare)
  - Once public, please change the default password immediately!
- `consul.[your management_domain]` (you still need to set up DNS with Cloudflare)
  - Username/password will be `consul` and the `CONSUL_HTTP_TOKEN` you get from `velodrome genenv`
There are examples in the examples/ folder of this repo.
To make specific app policies, create `access.hcl`:

```hcl
path "secret/*" { # some path in secrets
  capabilities = ["read"]
}
```

```shell
vault policy write backend access.hcl
```
Then, in the Nomad task definition:

```hcl
vault {
  policies      = ["backend"] # policy given above
  change_mode   = "signal"
  change_signal = "SIGUSR1"
}
```
```shell
export NOMAD_DATA_ROOT=«Path to your Nomad data_dir»
for ALLOC in "$NOMAD_DATA_ROOT"/alloc/*; do
  for JOB in $(ls "$ALLOC" | grep -v alloc); do
    umount "$ALLOC/$JOB/secrets"
    umount "$ALLOC/$JOB/dev"
    umount "$ALLOC/$JOB/proc"
    umount "$ALLOC/$JOB/alloc"
  done
done
```
- Harden servers
- Add SSH Key login
- Setup UFW firewall rules
- Template to allow hosts in cluster access to all ports
- Restart firewall
- Disable password login
- Run firewall script
- Install all required software
- Consul setup
- Setup cluster secrets
- Template configs
- Add configs to cluster
- Systemctl script & startup
- Verify cluster setup
- Automate consul ACL bootstrap
- Allow anonymous DNS access and default Consul as DNS for `.consul` domains
- Nomad setup
- Setup cluster secrets
- Template configs
- Add configs to cluster
- Systemctl scripts and startup
- Nomad & Consul bootstrap-expect values based on inventory
- Vault setup
- Setup cluster secrets
- Template configs
- Systemctl script & startup
- Auto-unlock with script/ansible/terraform
- Integrate with Nomad
- Observability
- Server health
- CPU monitor
- RAM usage monitor
- HD usage monitor
- Nomad metrics
- Consul metrics
- Log aggregation of jobs
- Metrics produced by jobs
- Job tracing
- Host monitoring (disk, cpu, memory)
- Server health
- Networking
- Understand service mesh/ingress etc from consul
- Ingress to outside world with http/https
- Use consul as DNS
- Pull private docker images
- Observability ingress
- Auto-accept server signatures on first time connect
- Overall setup
- Terraform var generation
- Generate Ansible inventory from Terraform output
- Grafana/Dashboards
- Dashboards
- Consul health
- Nomad health
- Vault health
- Host health
- SLO templates
- Web/api service
- Headless backend service
- Alerts
- Consul health
- Nomad health
- Vault health
- Host health (CPU, memory, disk)
