refactor exoscale driver to use egoscale v3 #353

Merged
jiaqiluo merged 12 commits into rancher:master from exoscale:sauterp-xzlwzokozmkx on Nov 6, 2025

Conversation


@sauterp sauterp commented Oct 3, 2025

The current exoscale driver is based on v1 of the egoscale package. We update the driver to use v3.
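For context, a minimal sketch of what constructing an egoscale v3 client can look like; this is illustrative only, and the package paths, constructor, and ListZones call are assumptions about egoscale v3's usual shape rather than code copied from this PR:

package main

import (
	"context"
	"fmt"
	"log"

	v3 "github.com/exoscale/egoscale/v3"
	"github.com/exoscale/egoscale/v3/credentials"
)

func main() {
	// Assumed v3 usage: build static credentials from the same API key and
	// secret that the driver flags provide.
	creds := credentials.NewStaticCredentials("EXOxxxx", "api-secret")

	client, err := v3.NewClient(creds)
	if err != nil {
		log.Fatalf("creating egoscale v3 client: %v", err)
	}

	// Roughly analogous to the "Availability zone" lookups in the test log below.
	zones, err := client.ListZones(context.Background())
	if err != nil {
		log.Fatalf("listing zones: %v", err)
	}
	fmt.Println(zones)
}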

Testing

❯ go run cmd/rancher-machine/machine.go create --driver exoscale test-rancher-machine1
Creating CA: /var/home/sauterp/.docker/machine/certs/ca.pem
Creating client certificate: /var/home/sauterp/.docker/machine/certs/cert.pem
Running pre-create checks...
Creating machine...
(test-rancher-machine1) Querying exoscale for the requested parameters...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Image Linux Ubuntu 24.04 LTS 64-bit(10) = 07c7bed3-343e-4483-a8df-b6c08dccd0cc ()
(test-rancher-machine1) Profile Small = {0xc000c06ce0 2 standard 0 21624abb-764e-4def-81d7-9fc54b5957fb 2147483648 small [ch-dk-2 de-fra-1 hr-zag-1 at-vie-2 de-muc-1 ch-gva-2 at-vie-1 bg-sof-1]}
(test-rancher-machine1) Security group docker-machine does not exist. Creating it...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Security group docker-machine = 201a8d78-ef72-48a3-b413-eb593cd27f36
(test-rancher-machine1) Generate an SSH keypair...
(test-rancher-machine1) Spawn exoscale host...
(test-rancher-machine1) Using the following cloud-init file:
(test-rancher-machine1) #cloud-config
(test-rancher-machine1) manage_etc_hosts: localhost
(test-rancher-machine1)
(test-rancher-machine1) Deploying test-rancher-machine1...
(test-rancher-machine1) IP Address: 91.92.140.69, SSH User: ubuntu
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
(test-rancher-machine1) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
(test-rancher-machine1) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
Waiting for machine to be running, this may take a few minutes...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker from: https://get.docker.com
Copying certs to the local machine directory...
Copying certs to the remote machine...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Docker is up and running!
to see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: /var/home/sauterp/.cache/go-build/a5/a5fe62551378e52daa0b64ec194b4d9200366e73f43330a1afc38248f244d05a-d/machine env test-rancher-machine1
❯ go run cmd/rancher-machine/machine.go create --driver exoscale test-rancher-machine2
Running pre-create checks...
Creating machine...
(test-rancher-machine2) Querying exoscale for the requested parameters...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Image Linux Ubuntu 24.04 LTS 64-bit(10) = 07c7bed3-343e-4483-a8df-b6c08dccd0cc ()
(test-rancher-machine2) Profile Small = {0xc0008932e0 2 standard 0 21624abb-764e-4def-81d7-9fc54b5957fb 2147483648 small [ch-dk-2 de-fra-1 hr-zag-1 at-vie-2 de-muc-1 ch-gva-2 at-vie-1 bg-sof-1]}
(test-rancher-machine2) Security group docker-machine = 201a8d78-ef72-48a3-b413-eb593cd27f36
(test-rancher-machine2) Generate an SSH keypair...
(test-rancher-machine2) Spawn exoscale host...
(test-rancher-machine2) Using the following cloud-init file:
(test-rancher-machine2) #cloud-config
(test-rancher-machine2) manage_etc_hosts: localhost
(test-rancher-machine2)
(test-rancher-machine2) Deploying test-rancher-machine2...
(test-rancher-machine2) IP Address: 91.92.140.191, SSH User: ubuntu
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
(test-rancher-machine2) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
(test-rancher-machine2) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
Waiting for machine to be running, this may take a few minutes...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker from: https://get.docker.com
Copying certs to the local machine directory...
Copying certs to the remote machine...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Docker is up and running!
to see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: /var/home/sauterp/.cache/go-build/a5/a5fe62551378e52daa0b64ec194b4d9200366e73f43330a1afc38248f244d05a-d/machine env test-rancher-machine2

❯ go run cmd/rancher-machine/machine.go restart test-rancher-machine2
Restarting "test-rancher-machine2"...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Waiting for SSH to be available...
Detecting the provisioner...
Restarted machines may have new IP addresses. You may need to re-run the `docker-machine env` command.

❯ go run cmd/rancher-machine/machine.go stop test-rancher-machine2
Stopping "test-rancher-machine2"...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Machine "test-rancher-machine2" was stopped.
(test-rancher-machine2) Closing plugin on server side

❯ go run cmd/rancher-machine/machine.go kill test-rancher-machine1
Killing "test-rancher-machine1"...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Machine "test-rancher-machine1" was killed.

❯ go run cmd/rancher-machine/machine.go rm test-rancher-machine1
About to remove test-rancher-machine1
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) The Anti-Affinity group and Security group were not removed
Successfully removed test-rancher-machine1

❯ go run cmd/rancher-machine/machine.go ls
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
NAME                    ACTIVE   DRIVER     STATE     URL   SWARM   DOCKER    ERRORS
test-rancher-machine2   -        exoscale   Stopped                 Unknown 

❯ go run cmd/rancher-machine/machine.go rm test-rancher-machine2
About to remove test-rancher-machine2
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) The Anti-Affinity group and Security group were not removed
Successfully removed test-rancher-machine2
(test-rancher-machine2) Closing plugin on server side
image

@sauterp sauterp force-pushed the sauterp-xzlwzokozmkx branch 2 times, most recently from 532c5ec to 0e4bd91 Compare October 6, 2025 17:11
@sauterp sauterp marked this pull request as ready for review October 6, 2025 17:15
d.Password = res.Password
}

// Destroy the SSH key from CloudStack


Suggested change
// Destroy the SSH key from CloudStack
// Destroy the SSH key

}
}

// Destroy the virtual machine


Suggested change
// Destroy the virtual machine
// Destroy the Instance


@pierre-emmanuelJ pierre-emmanuelJ left a comment


LGTM 👍 nice work!

Name: group,
Description: "created by docker-machine",
})
func (d *Driver) createDefaultSecurityGroup(ctx context.Context, sgName string) (v3.UUID, error) {


Does this SG ever need to be cleaned up at some point?


Addressed in 580c27d; not urgent for now, since the driver has lived without it until now.

@snasovich snasovich requested a review from a team October 24, 2025 15:34
@snasovich
Collaborator

@sauterp, thank you for submitting this PR. I'll have @rancher/rancher-team-2-hostbusters-dev review the change, but in the meantime could you please resolve the merge conflicts when you have a chance?

Collaborator

@snasovich snasovich left a comment


I added a couple of comments based on a quick review, but I'm deferring to the team for the deeper review; we will mostly review it from a "general sanity" and "core Rancher/machine concepts" standpoint rather than the specifics of interacting with the Exoscale APIs.

Please note that we (SUSE) don't officially support the Exoscale driver and won't do testing on it. I see testing was done with machine directly, but ideally it would be good to have some testing done in Rancher with these changes; refer to #151 (comment) for tips on this.

@snasovich snasovich requested a review from a team October 24, 2025 15:59
sauterp and others added 7 commits October 30, 2025 14:46
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks for your reviews @snasovich and @jiaqiluo.
I addressed all your review comments; please take another look. :)

Regarding the URL discussion, I think we're good; we can continue it in the GitHub comment thread if needed.

I started testing following the approach described in #151 (comment).

My Dockerfile for Rancher with my machine binary:

FROM rancher/rancher:latest
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine

In the logs I can see my binary being used, and in the UI I can see my changes (for example, if I update a flag's usage text), but when I try to use the driver I run into an issue.

I created a cloud credential with apiKey and apiSecretKey and then created a machine with some values, and I got this error in Rancher's debug logs:

 Trying to access option  which does not exist\n      THIS ***WILL*** CAUSE UNEXPECTED BEHAVIOR\n    * Type assertion did not go smoothly to string for key \n    * error setting machine configuration from flags provided: missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)\n

My understanding is that my keys are not set; it sounds like the cloud credential is not up to date and has the wrong key names.

Oddly, on the cloud credential form I can only add a field for the secret, not for the apiKey.

image

and at pool creation it detects only apiKey, not apiSecretKey:

image

If any of you are available, I'd appreciate a quick debug session over Google Meet; you can reach me at pej@exoscale . ch.

@jiaqiluo
Member

jiaqiluo commented Oct 30, 2025

Hi @pierre-emmanuelJ, after rebuilding the node driver with the fix for issue 2 below, updating Rancher with the new version of rancher-machine, and applying the workaround for issue 1 to add the publicCredentialFields annotation, you should see both fields (apiKey and apiSecretKey) when creating an Exoscale credential, and provisioning should work.

Hi @sauterp, we need to fix issue 2 in this PR. See below for details.

There are two issues:

1. Missing apiKey Field

On the Create Exoscale Credential page, only the apiSecretKey field is available — the apiKey field is missing.

Fix:
We need a change in the Rancher backend to include the apiKey field in the driver data config for ExoscaleDriver.

Here is the GH issue for tracking it in the rancher/rancher repo

Workaround:
For existing Rancher setups (especially for testing), manually edit the Exoscale node driver to add the missing annotation:

"publicCredentialFields": "apiKey"

To do this:

1. View the Exoscale node driver in the API.
Screenshot 2025-10-30 at 1 34 23 PM
2. Click Edit.
Screenshot 2025-10-30 at 1 36 05 PM
3. Add the missing annotation.
Screenshot 2025-10-30 at 1 36 26 PM
4. Scroll to the end of the form, click Show Request, then Send Request to apply the update.
5. Deactivate and then activate the Exoscale node driver again.

After reactivating, Rancher logs should show the following info-level messages:

2025/10/30 18:54:18 [INFO] uploading exoscaleConfig to nodeconfig schema
2025/10/30 18:54:18 [INFO] uploading exoscaleConfig to nodetemplateconfig schema
2025/10/30 18:54:18 [INFO] uploading exoscalecredentialConfig to credentialconfig schema

Now, return to the Rancher UI → Cloud Credentials → Exoscale.
You should see both fields — apiKey and apiSecretKey — in the UI.
You may need to refresh the page for the changes to appear.

Screenshot 2025-10-30 at 1 40 03 PM

2. Incorrect Environment Variable Name

When generating the rancher-machine create command, the key apiSecretKey is converted to the environment variable EXOSCALE_API_SECRET_KEY.
However, the Exoscale node driver expects EXOSCALE_API_SECRET.
See code

		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_SECRET",
			Name:   "exoscale-api-secret-key",
			Usage:  "exoscale API secret key",
		},

This mismatch results in the following error:

missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)

Fix:
Update the Exoscale node driver to use the correct environment variable name — EXOSCALE_API_SECRET_KEY instead of EXOSCALE_API_SECRET.
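For illustration, a sketch of the corrected flag definition, assuming the usual rancher-machine driver layout (GetCreateFlags on the driver type, mcnflag from libmachine); only the EnvVar of the secret-key flag changes:

package exoscale

import "github.com/rancher/machine/libmachine/mcnflag"

// GetCreateFlags sketch: the api-key flag is unchanged; the secret-key flag
// now reads EXOSCALE_API_SECRET_KEY, which is the variable Rancher injects.
func (d *Driver) GetCreateFlags() []mcnflag.Flag {
	return []mcnflag.Flag{
		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_KEY",
			Name:   "exoscale-api-key",
			Usage:  "exoscale API key",
		},
		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_SECRET_KEY", // was EXOSCALE_API_SECRET
			Name:   "exoscale-api-secret-key",
			Usage:  "exoscale API secret key",
		},
		// ...other flags unchanged...
	}
}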

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks a lot @jiaqiluo for all this well-detailed info!

I fixed the driver: a45a2ac

I got the Cloud Credential working as you demonstrated, thanks.

Unfortunately, I still get the error: missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key) from the driver.

Here is what I did to debug and make sure my compiled version is up to date and receives the right environment variables.

1 - I customized the error from the driver with:

errors.New("TEST=missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key): " + strings.Join(os.Environ(), ", "))

The goal is to catch which environment variables are passed to my rancher-machine.

2 - For visual confirmation, I also usually modify the description of a field, for example:

-Usage:  "exoscale instance profile (Small, Medium, Large, ...)",
+Usage:  "TESTTTTTTTT1111 exoscale instance profile (Small, Medium, Large, ...)",

Using this latest built binary I can see TESTTTTTTTT1111 appear in the Rancher driver UI 🎉, but I still get the error missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key), without my customized message containing the os.Environ() dump, even though TESTTTTTTTT1111 shows up in the UI.

I don't know what I'm doing wrong; I'd like to confirm which environment variables are passed to the program so I can debug this.

@jiaqiluo
Member

jiaqiluo commented Oct 31, 2025

Hi @pierre-emmanuelJ

Since we’re still encountering the error “missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)”, it’s important to determine which environment variables are set when Rancher runs rancher-machine to provision a machine. The changes you made to the driver are helpful because they allow us to inspect all the environment variables being passed in.

We care about these environment variables because Rancher retrieves credential values from a secret and injects them as environment variables into the Pod that runs the rancher-machine command. See code

In this case, we expect to see EXOSCALE_API_KEY and EXOSCALE_API_SECRET_KEY to be set.
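To make that check concrete, here is a small stdlib-only sketch in the spirit of the os.Environ() dump above; it only reports which EXOSCALE_-prefixed variables are set, without printing their values, and is an illustration rather than code from this PR:

package main

import (
	"fmt"
	"os"
	"strings"
)

// Report which EXOSCALE_* environment variables are set, without leaking
// their values, to confirm that Rancher injected EXOSCALE_API_KEY and
// EXOSCALE_API_SECRET_KEY into the provisioning Pod.
func main() {
	for _, kv := range os.Environ() {
		name, _, _ := strings.Cut(kv, "=")
		if strings.HasPrefix(name, "EXOSCALE_") {
			fmt.Printf("%s is set\n", name)
		}
	}
}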

Each time you create or add a new node to the cluster, Rancher launches a Pod in the fleet-default namespace of the local cluster to execute the rancher-machine create command. You can go to that namespace to locate the Pod and check its logs. You can also review the Pod’s YAML manifest to see exactly how Rancher sets the various flags and environment variables.

Note: Rancher automatically deletes this Pod a few minutes after the execution completes (whether it succeeds or fails), so there’s only a short window of time to inspect it.

Can you give it a try and share the pod's YAML file as well as findings from the logs?

Screenshot 2025-10-31 at 9 16 00 AM

@jiaqiluo
Member

@pierre-emmanuelJ, the CI failed at the unit test TestUnmarshalJSON: https://github.com/rancher/machine/actions/runs/18968955722/job/54205162687?pr=353#step:4:12

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

I finally got it running successfully! 🎉 😄
Here is my config:

rancher-machine Dockerfile

FROM golang:1.24 AS builder
ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
WORKDIR /go/src/github.com/rancher/machine
COPY . . 
RUN go build -o rancher-machine ./cmd/rancher-machine
FROM rancher/machine:v0.15.0-rancher135
COPY --from=builder /go/src/github.com/rancher/machine/rancher-machine /usr/local/bin/rancher-machine

Rancher Dockerfile

FROM rancher/rancher:head
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine
ENV CATTLE_MACHINE_PROVISION_IMAGE=<my_personal_public_registry>/rancher-machine:latest

From the Exoscale side, I can see the instance and security group being created:
image
image

From Rancher, I can then destroy the cluster and resources are correctly cleaned up on the Exoscale side.

On the Rancher side, I created a cluster of 3 nodes and saw the 3 rancher-machine Pods creating my Exoscale resources (one was already gone afterwards, but all 3 completed 👍):

image

After inspecting the manifests, my machine image was set correctly and the secret was right, as expected. Thanks also for the fix.

Last point: after more than 30 minutes the cluster is only partially ready (1/3 nodes ready).

image

The Cluster API logs are still just waiting for nodes (no errors):
image

One node is successfully provisioned via RKE over SSH, but the other two are still pending.

I will investigate tomorrow and retry.

@jiaqiluo
Member

jiaqiluo commented Nov 4, 2025

@pierre-emmanuelJ, thank you for the updates. I’m glad to hear the driver is now working!

Just a quick note: the fix for the missing apiKey field when creating an Exoscale credential has been merged into the main branch of rancher/rancher (commit). If you’re still using the workaround mentioned in my previous comment, you can now rebuild your Rancher image using the latest main tag to remove the need for it.

For debugging RKE2 clusters, here are some documentation links for finding logs and CLI tools. When troubleshooting, I usually SSH into the node and check the logs for the rke2-server.service using:

journalctl -u rke2-server -f
# or
systemctl status rke2-server.service

If rke2-server.service works fine, then I usually check the logs of each kube-x component for errors.

RKE2 references:

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

pierre-emmanuelJ commented Nov 5, 2025

Everything is working like a charm now! 🎉

I've reworked the firewalling, which was causing issues for RKE2, since the previous rules were legacy and not adapted to it.

image

See my last commit: 62c8797. I set up the firewalling to be compatible with RKE2 and Calico, based on the rules recommended in the RKE2 docs.

image

Last thing: with the Rancher node driver it's possible to provision nodes with RKE2 or K3s; do I need to add the K3s firewall rules as well to make sure it's compatible?
https://ranchermanager.docs.rancher.com/getting-started/installation-and-upgrade/installation-requirements/port-requirements#ports-for-rancher-server-nodes-on-k3s

After that, I think we are ready to merge it :)

cc @jiaqiluo

@jiaqiluo
Member

jiaqiluo commented Nov 5, 2025

Hi @pierre-emmanuelJ,

The link you shared refers to the port requirements for the local cluster where Rancher itself is installed, so you don’t need to add those K3s-specific firewall rules.

However, the Exoscale node driver can be used for both RKE2 and K3s downstream clusters, which means the default firewall rules should cover the requirements of both distributions. In addition, Rancher is adding support for IPv6 downstream clusters, so the firewall rules will also need to be updated if the Exoscale node driver is intended to support creating IPv6-only or dual-stack clusters.

To make updating the firewall rules easier, you can refer to the built-in ingress and egress rules used by the AWS EC2 node driver. Specifically, check the ingressPermissions and egressPermissions functions in the drivers/amazonec2/amazonec2.go file for details.
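As a purely illustrative sketch of that approach (the rule type, helper name, and exact port list below are assumptions, not code from this PR or from the amazonec2 driver), a rule table for downstream RKE2/K3s nodes could look like this:

package exoscale

// portRule is a hypothetical helper describing one ingress rule; the real
// driver may model this differently.
type portRule struct {
	Protocol  string
	StartPort uint16
	EndPort   uint16
	Comment   string
}

// downstreamIngressRules lists commonly documented RKE2/K3s/Calico ports for
// downstream cluster nodes (illustrative, not exhaustive).
func downstreamIngressRules() []portRule {
	return []portRule{
		{"tcp", 22, 22, "SSH provisioning"},
		{"tcp", 6443, 6443, "Kubernetes API server"},
		{"tcp", 9345, 9345, "RKE2 supervisor API"},
		{"tcp", 2379, 2380, "etcd client/peer"},
		{"tcp", 10250, 10250, "kubelet"},
		{"udp", 8472, 8472, "Flannel/Canal VXLAN"},
		{"tcp", 179, 179, "Calico BGP"},
		{"udp", 4789, 4789, "Calico VXLAN"},
		{"tcp", 30000, 32767, "NodePort services"},
	}
}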

Could you please confirm whether you’ve covered these rules and made the necessary updates? Feel free to ping me if you have any questions.

@jiaqiluo
Member

jiaqiluo commented Nov 5, 2025

Hi @pierre-emmanuelJ,

Since I have you here, could you please confirm whether the apiKey is considered sensitive for Exoscale?

Currently, we treat the apiKey as a PublicCredentialField, which means the Rancher UI will display the key value when listing credentials. commit

If the apiKey should actually be treated as sensitive, we’ll update it to a PrivateCredentialField so that the UI does not display it.

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks @jiaqiluo

I adapted the implementation based on the AWS EC2 one, supporting dual-stack of course: 5c3e0e5

I confirm everything is working as intended:
image

We don't block egress by default in the security group, so I'll keep it as before and not touch anything on that side.

For the apiKey, you can keep it as a PublicCredentialField. The sensitive one is the secret, which is already a PrivateCredentialField.

Thanks for your help and reviews. I'll let you review the last change; to me, it's ready to be merged.

Member

@jiaqiluo jiaqiluo left a comment


Great work! 👍 Everything looks good to me.
Thanks for your contribution!
I’ll go ahead and get the PR merged once my team signs off again.

@jiaqiluo jiaqiluo requested a review from snasovich November 6, 2025 19:06
Collaborator

@snasovich snasovich left a comment


Approving since my concerns from initial review got addressed and I trust @jiaqiluo's review for the rest. 🚀

@jiaqiluo jiaqiluo merged commit 89f3079 into rancher:master Nov 6, 2025
1 check passed