refactor exoscale driver to use egoscale v3 #353

Merged
jiaqiluo merged 12 commits into rancher:master from exoscale:sauterp-xzlwzokozmkx on Nov 6, 2025

Conversation


@sauterp sauterp commented Oct 3, 2025

The current exoscale driver is based on v1 of the egoscale package. We update the driver to use v3.
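For context, a minimal sketch of what constructing an egoscale v3 client can look like; this is illustrative only, and the package paths, constructor, and ListZones call are assumptions about egoscale v3's usual shape rather than code copied from this PR:

package main

import (
	"context"
	"fmt"
	"log"

	v3 "github.com/exoscale/egoscale/v3"
	"github.com/exoscale/egoscale/v3/credentials"
)

func main() {
	// Assumed v3 usage: build static credentials from the same API key and
	// secret that the driver flags provide.
	creds := credentials.NewStaticCredentials("EXOxxxx", "api-secret")

	client, err := v3.NewClient(creds)
	if err != nil {
		log.Fatalf("creating egoscale v3 client: %v", err)
	}

	// Roughly analogous to the "Availability zone" lookups in the test log below.
	zones, err := client.ListZones(context.Background())
	if err != nil {
		log.Fatalf("listing zones: %v", err)
	}
	fmt.Println(zones)
}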

Testing

❯ go run cmd/rancher-machine/machine.go create --driver exoscale test-rancher-machine1
Creating CA: /var/home/sauterp/.docker/machine/certs/ca.pem
Creating client certificate: /var/home/sauterp/.docker/machine/certs/cert.pem
Running pre-create checks...
Creating machine...
(test-rancher-machine1) Querying exoscale for the requested parameters...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Image Linux Ubuntu 24.04 LTS 64-bit(10) = 07c7bed3-343e-4483-a8df-b6c08dccd0cc ()
(test-rancher-machine1) Profile Small = {0xc000c06ce0 2 standard 0 21624abb-764e-4def-81d7-9fc54b5957fb 2147483648 small [ch-dk-2 de-fra-1 hr-zag-1 at-vie-2 de-muc-1 ch-gva-2 at-vie-1 bg-sof-1]}
(test-rancher-machine1) Security group docker-machine does not exist. Creating it...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Security group docker-machine = 201a8d78-ef72-48a3-b413-eb593cd27f36
(test-rancher-machine1) Generate an SSH keypair...
(test-rancher-machine1) Spawn exoscale host...
(test-rancher-machine1) Using the following cloud-init file:
(test-rancher-machine1) #cloud-config
(test-rancher-machine1) manage_etc_hosts: localhost
(test-rancher-machine1)
(test-rancher-machine1) Deploying test-rancher-machine1...
(test-rancher-machine1) IP Address: 91.92.140.69, SSH User: ubuntu
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
(test-rancher-machine1) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
(test-rancher-machine1) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine1) Getting to WaitForSSH function...
(test-rancher-machine1) Using SSH client type: external
(test-rancher-machine1) Using SSH hostname: 91.92.140.69, port: 22
(test-rancher-machine1) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine1) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa (-rw-------)
(test-rancher-machine1) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine1/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine1) About to run SSH command: [exit 0]
(test-rancher-machine1) SSH cmd output: []
Waiting for machine to be running, this may take a few minutes...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker from: https://get.docker.com
Copying certs to the local machine directory...
Copying certs to the remote machine...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Docker is up and running!
to see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: /var/home/sauterp/.cache/go-build/a5/a5fe62551378e52daa0b64ec194b4d9200366e73f43330a1afc38248f244d05a-d/machine env test-rancher-machine1
❯ go run cmd/rancher-machine/machine.go create --driver exoscale test-rancher-machine2
Running pre-create checks...
Creating machine...
(test-rancher-machine2) Querying exoscale for the requested parameters...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Image Linux Ubuntu 24.04 LTS 64-bit(10) = 07c7bed3-343e-4483-a8df-b6c08dccd0cc ()
(test-rancher-machine2) Profile Small = {0xc0008932e0 2 standard 0 21624abb-764e-4def-81d7-9fc54b5957fb 2147483648 small [ch-dk-2 de-fra-1 hr-zag-1 at-vie-2 de-muc-1 ch-gva-2 at-vie-1 bg-sof-1]}
(test-rancher-machine2) Security group docker-machine = 201a8d78-ef72-48a3-b413-eb593cd27f36
(test-rancher-machine2) Generate an SSH keypair...
(test-rancher-machine2) Spawn exoscale host...
(test-rancher-machine2) Using the following cloud-init file:
(test-rancher-machine2) #cloud-config
(test-rancher-machine2) manage_etc_hosts: localhost
(test-rancher-machine2)
(test-rancher-machine2) Deploying test-rancher-machine2...
(test-rancher-machine2) IP Address: 91.92.140.191, SSH User: ubuntu
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
(test-rancher-machine2) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
(test-rancher-machine2) Error getting SSH command 'exit 0' : failed to run SSH command [exit 0]: exit status 255
(test-rancher-machine2) Getting to WaitForSSH function...
(test-rancher-machine2) Using SSH client type: external
(test-rancher-machine2) Using SSH hostname: 91.92.140.191, port: 22
(test-rancher-machine2) proxy_url: ; ncBinaryPath: /usr/sbin/nc
(test-rancher-machine2) Using SSH private key: /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa (-rw-------)
(test-rancher-machine2) &{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -o IdentitiesOnly=yes -i /var/home/sauterp/.docker/machine/machines/test-rancher-machine2/id_rsa -p 22] /usr/sbin/ssh <nil>}
(test-rancher-machine2) About to run SSH command: [exit 0]
(test-rancher-machine2) SSH cmd output: []
Waiting for machine to be running, this may take a few minutes...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker from: https://get.docker.com
Copying certs to the local machine directory...
Copying certs to the remote machine...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Docker is up and running!
to see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: /var/home/sauterp/.cache/go-build/a5/a5fe62551378e52daa0b64ec194b4d9200366e73f43330a1afc38248f244d05a-d/machine env test-rancher-machine2

❯ go run cmd/rancher-machine/machine.go restart test-rancher-machine2
Restarting "test-rancher-machine2"...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Waiting for SSH to be available...
Detecting the provisioner...
Restarted machines may have new IP addresses. You may need to re-run the `docker-machine env` command.

❯ go run cmd/rancher-machine/machine.go stop test-rancher-machine2
Stopping "test-rancher-machine2"...
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Machine "test-rancher-machine2" was stopped.
(test-rancher-machine2) Closing plugin on server side

❯ go run cmd/rancher-machine/machine.go kill test-rancher-machine1
Killing "test-rancher-machine1"...
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
Machine "test-rancher-machine1" was killed.

❯ go run cmd/rancher-machine/machine.go rm test-rancher-machine1
About to remove test-rancher-machine1
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
(test-rancher-machine1) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine1) The Anti-Affinity group and Security group were not removed
Successfully removed test-rancher-machine1

❯ go run cmd/rancher-machine/machine.go ls
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
NAME                    ACTIVE   DRIVER     STATE     URL   SWARM   DOCKER    ERRORS
test-rancher-machine2   -        exoscale   Stopped                 Unknown 

❯ go run cmd/rancher-machine/machine.go rm test-rancher-machine2
About to remove test-rancher-machine2
WARNING: This action will delete both local reference and remote instance.
Are you sure? (y/n): y
(test-rancher-machine2) Availability zone ch-dk-2 = {https://api-ch-dk-2.exoscale.com/v2 ch-dk-2 https://sos-ch-dk-2.exo.io}
(test-rancher-machine2) The Anti-Affinity group and Security group were not removed
Successfully removed test-rancher-machine2
(test-rancher-machine2) Closing plugin on server side
image

@sauterp sauterp force-pushed the sauterp-xzlwzokozmkx branch 2 times, most recently from 532c5ec to 0e4bd91 Compare October 6, 2025 17:11
@sauterp sauterp marked this pull request as ready for review October 6, 2025 17:15
d.Password = res.Password
}

// Destroy the SSH key from CloudStack


Suggested change
// Destroy the SSH key from CloudStack
// Destroy the SSH key

}
}

// Destroy the virtual machine


Suggested change
// Destroy the virtual machine
// Destroy the Instance


@pierre-emmanuelJ pierre-emmanuelJ left a comment


LGTM 👍 nice work!

Name: group,
Description: "created by docker-machine",
})
func (d *Driver) createDefaultSecurityGroup(ctx context.Context, sgName string) (v3.UUID, error) {


Does this SG ever need to be cleaned up at some point?


Addressed in 580c27d; not urgent for now, since the driver has lived without it until now.

@snasovich snasovich requested a review from a team October 24, 2025 15:34
@snasovich
Collaborator

@sauterp, thank you for submitting this PR. I'll have @rancher/rancher-team-2-hostbusters-dev review the change, but in the meantime could you please resolve the merge conflicts when you have a chance?

Collaborator

@snasovich snasovich left a comment


I added a couple of comments based on a quick review, but I'm deferring to the team for the deeper review; we will mostly review it from a "general sanity" and "core Rancher/machine concepts" standpoint rather than the specifics of interacting with the Exoscale APIs.

Please note that we (SUSE) don't officially support the Exoscale driver and won't do testing on it. I see testing was done with machine directly, but ideally it would be good to have some testing done in Rancher with these changes; refer to #151 (comment) for tips on this.

@snasovich snasovich requested a review from a team October 24, 2025 15:59
sauterp and others added 7 commits October 30, 2025 14:46
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks for your reviews @snasovich and @jiaqiluo.
I addressed all your review comments; please take another look. :)

Regarding the URL discussion, I think we're good; we can continue it in the GitHub comment thread if needed.

I started testing following the approach described in #151 (comment).

My Dockerfile for Rancher with my machine binary:

FROM rancher/rancher:latest
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine

In the logs I can see my binary being used, and in the UI I can see my changes (for example, if I update a flag's usage text), but when I try to use the driver I run into an issue.

I created a cloud credential with apiKey and apiSecretKey and then created a machine with some values, and I got this error in Rancher's debug logs:

 Trying to access option  which does not exist\n      THIS ***WILL*** CAUSE UNEXPECTED BEHAVIOR\n    * Type assertion did not go smoothly to string for key \n    * error setting machine configuration from flags provided: missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)\n

My understanding is that my keys are not set; it sounds like the cloud credential is not up to date and has the wrong key names.

Oddly, on the cloud credential form I can only add a field for the secret, not for the apiKey.

image

and at pool creation it detects only apiKey, not apiSecretKey:

image

If any of you are available, I'd appreciate a quick debug session over Google Meet; you can reach me at pej@exoscale . ch.

@jiaqiluo
Member

jiaqiluo commented Oct 30, 2025

Hi @pierre-emmanuelJ, after rebuilding the node driver with the fix for issue 2 below, updating Rancher with the new version of rancher-machine, and applying the workaround for issue 1 to add the publicCredentialFields annotation, you should see both fields (apiKey and apiSecretKey) when creating an Exoscale credential, and provisioning should work.

Hi @sauterp, we need to fix issue 2 in this PR. See below for details.

There are two issues:

1. Missing apiKey Field

On the Create Exoscale Credential page, only the apiSecretKey field is available — the apiKey field is missing.

Fix:
We need a change in the Rancher backend to include the apiKey field in the driver data config for ExoscaleDriver.

Here is the GH issue for tracking it in the rancher/rancher repo

Workaround:
For existing Rancher setups (especially for testing), manually edit the Exoscale node driver to add the missing annotation:

"publicCredentialFields": "apiKey"

To do this:

1. View the Exoscale node driver in the API.
Screenshot 2025-10-30 at 1 34 23 PM
2. Click Edit.
Screenshot 2025-10-30 at 1 36 05 PM
3. Add the missing annotation.
Screenshot 2025-10-30 at 1 36 26 PM
4. Scroll to the end of the form, click Show Request, then Send Request to apply the update.
5. Deactivate and then activate the Exoscale node driver again.

After reactivating, Rancher logs should show the following info-level messages:

2025/10/30 18:54:18 [INFO] uploading exoscaleConfig to nodeconfig schema
2025/10/30 18:54:18 [INFO] uploading exoscaleConfig to nodetemplateconfig schema
2025/10/30 18:54:18 [INFO] uploading exoscalecredentialConfig to credentialconfig schema

Now, return to the Rancher UI → Cloud Credentials → Exoscale.
You should see both fields — apiKey and apiSecretKey — in the UI.
You may need to refresh the page for the changes to appear.

Screenshot 2025-10-30 at 1 40 03 PM

2. Incorrect Environment Variable Name

When generating the rancher-machine create command, the key apiSecretKey is converted to the environment variable EXOSCALE_API_SECRET_KEY.
However, the Exoscale node driver expects EXOSCALE_API_SECRET.
See code

		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_SECRET",
			Name:   "exoscale-api-secret-key",
			Usage:  "exoscale API secret key",
		},

This mismatch results in the following error:

missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)

Fix:
Update the Exoscale node driver to use the correct environment variable name — EXOSCALE_API_SECRET_KEY instead of EXOSCALE_API_SECRET.
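For illustration, a sketch of the corrected flag definition, assuming the usual rancher-machine driver layout (GetCreateFlags on the driver type, mcnflag from libmachine); only the EnvVar of the secret-key flag changes:

package exoscale

import "github.com/rancher/machine/libmachine/mcnflag"

// GetCreateFlags sketch: the api-key flag is unchanged; the secret-key flag
// now reads EXOSCALE_API_SECRET_KEY, which is the variable Rancher injects.
func (d *Driver) GetCreateFlags() []mcnflag.Flag {
	return []mcnflag.Flag{
		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_KEY",
			Name:   "exoscale-api-key",
			Usage:  "exoscale API key",
		},
		mcnflag.StringFlag{
			EnvVar: "EXOSCALE_API_SECRET_KEY", // was EXOSCALE_API_SECRET
			Name:   "exoscale-api-secret-key",
			Usage:  "exoscale API secret key",
		},
		// ...other flags unchanged...
	}
}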

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks a lot @jiaqiluo for all this well-detailed info!

I fixed the driver: a45a2ac

I got the Cloud Credential working as you demonstrated, thanks.

Unfortunately, I still get the error: missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key) from the driver.

Here is what I did to debug and make sure my compiled version is up to date and receives the right environment variables.

1 - I customized the error from the driver with:

errors.New("TEST=missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key): " + strings.Join(os.Environ(), ", "))

The goal is to catch which environment variables are passed to my rancher-machine.

2 - For visual confirmation, I also usually modify the description of a field, for example:

-Usage:  "exoscale instance profile (Small, Medium, Large, ...)",
+Usage:  "TESTTTTTTTT1111 exoscale instance profile (Small, Medium, Large, ...)",

Using this latest built binary I can see TESTTTTTTTT1111 appear in the Rancher driver UI 🎉, but I still get the error missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key), without my customized message containing the os.Environ() dump, even though TESTTTTTTTT1111 shows up in the UI.

I don't know what I'm doing wrong; I'd like to confirm which environment variables are passed to the program so I can debug this.

@jiaqiluo
Member

jiaqiluo commented Oct 31, 2025

Hi @pierre-emmanuelJ

Since we’re still encountering the error “missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)”, it’s important to determine which environment variables are set when Rancher runs rancher-machine to provision a machine. The changes you made to the driver are helpful because they allow us to inspect all the environment variables being passed in.

We care about these environment variables because Rancher retrieves credential values from a secret and injects them as environment variables into the Pod that runs the rancher-machine command. See code

In this case, we expect to see EXOSCALE_API_KEY and EXOSCALE_API_SECRET_KEY to be set.
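To make that check concrete, here is a small stdlib-only sketch in the spirit of the os.Environ() dump above; it only reports which EXOSCALE_-prefixed variables are set, without printing their values, and is an illustration rather than code from this PR:

package main

import (
	"fmt"
	"os"
	"strings"
)

// Report which EXOSCALE_* environment variables are set, without leaking
// their values, to confirm that Rancher injected EXOSCALE_API_KEY and
// EXOSCALE_API_SECRET_KEY into the provisioning Pod.
func main() {
	for _, kv := range os.Environ() {
		name, _, _ := strings.Cut(kv, "=")
		if strings.HasPrefix(name, "EXOSCALE_") {
			fmt.Printf("%s is set\n", name)
		}
	}
}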

Each time you create or add a new node to the cluster, Rancher launches a Pod in the fleet-default namespace of the local cluster to execute the rancher-machine create command. You can go to that namespace to locate the Pod and check its logs. You can also review the Pod’s YAML manifest to see exactly how Rancher sets the various flags and environment variables.

Note: Rancher automatically deletes this Pod a few minutes after the execution completes (whether it succeeds or fails), so there’s only a short window of time to inspect it.

Can you give it a try and share the pod's YAML file as well as findings from the logs?

Screenshot 2025-10-31 at 9 16 00 AM

@jiaqiluo
Member

@pierre-emmanuelJ, the CI failed at the unit test TestUnmarshalJSON: https://github.com/rancher/machine/actions/runs/18968955722/job/54205162687?pr=353#step:4:12

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

I finally got it running successfully! 🎉 😄
Here is my config:

rancher-machine Dockerfile

FROM golang:1.24 AS builder
ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
WORKDIR /go/src/github.com/rancher/machine
COPY . . 
RUN go build -o rancher-machine ./cmd/rancher-machine
FROM rancher/machine:v0.15.0-rancher135
COPY --from=builder /go/src/github.com/rancher/machine/rancher-machine /usr/local/bin/rancher-machine

Rancher Dockerfile

FROM rancher/rancher:head
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine
ENV CATTLE_MACHINE_PROVISION_IMAGE=<my_personal_public_registry>/rancher-machine:latest

From the Exoscale side, I can see the instance and security group being created:
image
image

From Rancher, I can then destroy the cluster and resources are correctly cleaned up on the Exoscale side.

On the Rancher side, I created a cluster of 3 nodes and saw the 3 rancher-machine Pods creating my Exoscale resources (one was already gone afterwards, but all 3 completed 👍):

image

After inspecting the manifests, my machine image was set correctly and the secret was right, as expected. Thanks also for the fix.

Last point: after more than 30 minutes the cluster is only partially ready (1/3 nodes ready).

image

The Cluster API logs are still just waiting for nodes (no errors):
image

One node is successfully provisioned via RKE over SSH, but the other two are still pending.

I will investigate tomorrow and retry.

@jiaqiluo
Member

jiaqiluo commented Nov 4, 2025

@pierre-emmanuelJ, thank you for the updates. I’m glad to hear the driver is now working!

Just a quick note: the fix for the missing apiKey field when creating an Exoscale credential has been merged into the main branch of rancher/rancher (commit). If you’re still using the workaround mentioned in my previous comment, you can now rebuild your Rancher image using the latest main tag to remove the need for it.

For debugging RKE2 clusters, here are some documentation links for finding logs and CLI tools. When troubleshooting, I usually SSH into the node and check the logs for the rke2-server.service using:

journalctl -u rke2-server -f
# or
systemctl status rke2-server.service

If rke2-server.service works fine, then I usually check the logs of each kube-x component for errors.

RKE2 references:

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

pierre-emmanuelJ commented Nov 5, 2025

Everything is working like a charm now! 🎉

I've reworked the firewalling, which was causing issues for RKE2, since the previous rules were legacy and not adapted to it.

image

See my last commit: 62c8797. I set up the firewalling to be compatible with RKE2 and Calico, based on the rules recommended in the RKE2 docs.

image

Last thing: with the Rancher node driver it's possible to provision nodes with RKE2 or K3s; do I need to add the K3s firewall rules as well to make sure it's compatible?
https://ranchermanager.docs.rancher.com/getting-started/installation-and-upgrade/installation-requirements/port-requirements#ports-for-rancher-server-nodes-on-k3s

After that, I think we are ready to merge it :)

cc @jiaqiluo

@jiaqiluo
Member

jiaqiluo commented Nov 5, 2025

Hi @pierre-emmanuelJ,

The link you shared refers to the port requirements for the local cluster where Rancher itself is installed, so you don’t need to add those K3s-specific firewall rules.

However, the Exoscale node driver can be used for both RKE2 and K3s downstream clusters, which means the default firewall rules should cover the requirements of both distributions. In addition, Rancher is adding support for IPv6 downstream clusters, so the firewall rules will also need to be updated if the Exoscale node driver is intended to support creating IPv6-only or dual-stack clusters.

To make updating the firewall rules easier, you can refer to the built-in ingress and egress rules used by the AWS EC2 node driver. Specifically, check the ingressPermissions and egressPermissions functions in the drivers/amazonec2/amazonec2.go file for details.
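As a purely illustrative sketch of that approach (the rule type, helper name, and exact port list below are assumptions, not code from this PR or from the amazonec2 driver), a rule table for downstream RKE2/K3s nodes could look like this:

package exoscale

// portRule is a hypothetical helper describing one ingress rule; the real
// driver may model this differently.
type portRule struct {
	Protocol  string
	StartPort uint16
	EndPort   uint16
	Comment   string
}

// downstreamIngressRules lists commonly documented RKE2/K3s/Calico ports for
// downstream cluster nodes (illustrative, not exhaustive).
func downstreamIngressRules() []portRule {
	return []portRule{
		{"tcp", 22, 22, "SSH provisioning"},
		{"tcp", 6443, 6443, "Kubernetes API server"},
		{"tcp", 9345, 9345, "RKE2 supervisor API"},
		{"tcp", 2379, 2380, "etcd client/peer"},
		{"tcp", 10250, 10250, "kubelet"},
		{"udp", 8472, 8472, "Flannel/Canal VXLAN"},
		{"tcp", 179, 179, "Calico BGP"},
		{"udp", 4789, 4789, "Calico VXLAN"},
		{"tcp", 30000, 32767, "NodePort services"},
	}
}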

Could you please confirm whether you’ve covered these rules and made the necessary updates? Feel free to ping me if you have any questions.

@jiaqiluo
Member

jiaqiluo commented Nov 5, 2025

Hi @pierre-emmanuelJ,

Since I have you here, could you please confirm whether the apiKey is considered sensitive for Exoscale?

Currently, we treat the apiKey as a PublicCredentialField, which means the Rancher UI will display the key value when listing credentials. commit

If the apiKey should actually be treated as sensitive, we’ll update it to a PrivateCredentialField so that the UI does not display it.

Signed-off-by: Pierre-Emmanuel Jacquier <[email protected]>
@pierre-emmanuelJ

Thanks @jiaqiluo

I adapted the implementation based on the AWS EC2 one, supporting dual-stack of course: 5c3e0e5

I confirm everything is working as intended:
image

We don't block egress by default in the security group, so I'll keep it as before and not touch anything on that side.

For the apiKey, you can keep it as a PublicCredentialField. The sensitive one is the secret, which is already a PrivateCredentialField.

Thanks for your help and reviews. I'll let you review the last change; to me, it's ready to be merged.

Member

@jiaqiluo jiaqiluo left a comment


Great work! 👍 Everything looks good to me.
Thanks for your contribution!
I’ll go ahead and get the PR merged once my team signs off again.

@jiaqiluo jiaqiluo requested a review from snasovich November 6, 2025 19:06
Collaborator

@snasovich snasovich left a comment


Approving since my concerns from initial review got addressed and I trust @jiaqiluo's review for the rest. 🚀

@jiaqiluo jiaqiluo merged commit 89f3079 into rancher:master Nov 6, 2025
1 check passed