refactor exoscale driver to use egoscale v3 #353
drivers/exoscale/exoscale.go (outdated)
Suggested change:
- // Destroy the SSH key from CloudStack
+ // Destroy the SSH key
drivers/exoscale/exoscale.go (outdated)
Suggested change:
- // Destroy the virtual machine
+ // Destroy the Instance
	Name:        group,
	Description: "created by docker-machine",
})

func (d *Driver) createDefaultSecurityGroup(ctx context.Context, sgName string) (v3.UUID, error) {

Does this SG ever need to be cleaned up at some point?

580c27d — not urgent for now, since the driver has lived without it until now.
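For context, a cleanup counterpart could look roughly like the sketch below. This is only a sketch and not part of this PR: it assumes the driver keeps an egoscale v3 client in a d.client field (so it would live in exoscale.go next to createDefaultSecurityGroup, reusing its imports), and the DeleteSecurityGroup/Wait method names are assumptions based on egoscale v3's generated API, so they may differ from the actual code.

```go
// Sketch only - not part of this PR. Assumes d.client is an egoscale v3
// client; DeleteSecurityGroup and Wait are assumed method names from the
// v3 generated API and may differ.
func (d *Driver) deleteDefaultSecurityGroup(ctx context.Context, sgID v3.UUID) error {
	// Deletion is asynchronous in the Exoscale API; it returns an operation.
	op, err := d.client.DeleteSecurityGroup(ctx, sgID)
	if err != nil {
		return err
	}
	// Block until the operation reaches a successful terminal state.
	_, err = d.client.Wait(ctx, op, v3.OperationStateSuccess)
	return err
}
```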
@sauterp, thank you for submitting this PR. I'll have @rancher/rancher-team-2-hostbusters-dev review the change, but meanwhile could you please resolve the merge conflicts when you have a chance?
Added a couple of comments based on a quick review, but deferring to the team for a deeper review - though we will mostly review it from a "general sanity" and "core Rancher/machine concepts" standpoint rather than the specifics of interacting with the Exoscale APIs.
Please note that we (SUSE) don't officially support the Exoscale driver and won't do testing on it. I see there was testing done with machine directly, but ideally it would be good to have some testing done in Rancher with these changes - refer to #151 (comment) for tips on this.
Thanks for your reviews @snasovich and @jiaqiluo. For the URL discussion, to me we should be good; we can still discuss it in the GitHub comments if needed. I started performing some tests this way: #151 (comment). My Dockerfile for Rancher with my machine binary:

FROM rancher/rancher:latest
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine

In the logs I can see my binary being used, and in the UI I can see my changes if I update a flag usage, etc. But when I try to use the driver I'm having an issue: I create a cloud credential, and from what I understand my keys are not set... it sounds like the cloud credential is not up-to-date and has the wrong key names, etc. Also, weirdly, on the cloud credential I cannot add another field for the apiKey, only the secret, and at pool creation it detects only apiKey and not apiSecretKey.

If any of you are available, I would appreciate it if someone could join me for a small debug session via Google Meet.
Hi @pierre-emmanuelJ, after rebuilding the node driver with this fix of issue 2 and updating Rancher with the new version of rancher-machine …

Hi @sauterp, we need to fix issue 2 in this PR. See below for details. There are two issues: 1. Missing …
Thanks a lot @jiaqiluo for all these well-detailed infos! I fixed the driver: a45a2ac. I succeeded in getting the Cloud Credential working as you demonstrated, thanks. Unfortunately, I still get the error. Here is what I've done to try to debug and to make sure my compiled version is up-to-date and receives the right values:

1 - I customized the error from the driver with:

errors.New("TEST=missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key): " + strings.Join(os.Environ(), ", "))

The goal is to catch which env variables are sent to my binary.

2 - For visual confirmation, I usually modify the description of a field, like:

-Usage: "exoscale instance profile (Small, Medium, Large, ...)",
+Usage: "TESTTTTTTTT1111 exoscale instance profile (Small, Medium, Large, ...)",

I can see the modified description appear in the UI. Still, I don't know what I'm doing wrong; I would like to determine which env variables are passed to the program in order to debug.
Since we’re still encountering the error “missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)”, it’s important to determine which environment variables are set when Rancher runs rancher-machine.

We care about these environment variables because Rancher retrieves credential values from a secret and injects them as environment variables into the Pod that runs the rancher-machine command. In this case, we expect EXOSCALE_API_KEY and EXOSCALE_API_SECRET_KEY to be set.

Each time you create or add a new node to the cluster, Rancher launches a Pod in the fleet-default namespace of the local cluster to execute the rancher-machine command. Note: Rancher automatically deletes this Pod a few minutes after the execution completes (whether it succeeds or fails), so there’s only a short window of time to inspect it.

Can you give it a try and share the pod's YAML file as well as findings from the logs?
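For reference, the reason those two environment variables matter is the flag/env-var mapping that rancher-machine drivers declare. The sketch below is illustrative rather than this PR's exact code: the Driver fields and flag names are assumptions, but the mcnflag/DriverOptions pattern is the standard one for rancher-machine drivers.

```go
// Illustrative sketch of how a rancher-machine driver maps CLI flags to
// environment variables; the Exoscale field/flag names here are assumptions.
package exoscale

import (
	"errors"

	"github.com/rancher/machine/libmachine/drivers"
	"github.com/rancher/machine/libmachine/mcnflag"
)

type Driver struct {
	drivers.BaseDriver
	APIKey       string
	APISecretKey string
}

// GetCreateFlags declares the create-time flags; EnvVar is what Rancher's
// provisioning Pod is expected to set (EXOSCALE_API_KEY, ...).
func (d *Driver) GetCreateFlags() []mcnflag.Flag {
	return []mcnflag.Flag{
		mcnflag.StringFlag{
			Name:   "exoscale-api-key",
			Usage:  "Exoscale API key",
			EnvVar: "EXOSCALE_API_KEY",
		},
		mcnflag.StringFlag{
			Name:   "exoscale-api-secret-key",
			Usage:  "Exoscale API secret key",
			EnvVar: "EXOSCALE_API_SECRET_KEY",
		},
	}
}

// SetConfigFromFlags receives the resolved values (flag or env var) and is
// where the "missing an API key" error discussed here would originate.
func (d *Driver) SetConfigFromFlags(opts drivers.DriverOptions) error {
	d.APIKey = opts.String("exoscale-api-key")
	d.APISecretKey = opts.String("exoscale-api-secret-key")
	if d.APIKey == "" || d.APISecretKey == "" {
		return errors.New("missing an API key (--exoscale-api-key) or API secret key (--exoscale-api-secret-key)")
	}
	return nil
}
```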
@pierre-emmanuelJ, the CI failed at the unit test.
I finally ran it successfully! 🎉 😄

rancher-machine Dockerfile:

FROM golang:1.24 AS builder
ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
WORKDIR /go/src/github.com/rancher/machine
COPY . .
RUN go build -o rancher-machine ./cmd/rancher-machine
FROM rancher/machine:v0.15.0-rancher135
COPY --from=builder /go/src/github.com/rancher/machine/rancher-machine /usr/local/bin/rancher-machine

rancher Dockerfile:

FROM rancher/rancher:head
COPY rancher-machine /usr/bin/rancher-machine
RUN chmod +x /usr/bin/rancher-machine
ENV CATTLE_MACHINE_PROVISION_IMAGE=<my_personal_public_registry>/rancher-machine:latest

From the Exoscale point of view, I can see the instance and security group being created. From Rancher, I can then destroy the cluster, and the resources are correctly cleaned up on the Exoscale side. On the Rancher side, I created a cluster of 3 nodes; I have seen the 3 rancher-machine pods creating my Exoscale resources (one was gone afterwards, but all 3 had completed 👍).
After inspecting the manifest, my machine image was set correctly and the secret was right, as expected. Thanks also for the fix. Last point: after more than 30 minutes the cluster is only partially ready (1/3 nodes is ready).

The Cluster-API logs are just still waiting for nodes (no errors). One node is successfully provisioned via RKE over SSH, but the two others are still pending. I will investigate tomorrow and retry.
@pierre-emmanuelJ, thank you for the updates. I’m glad to hear the driver is now working! Just a quick note: the fix for the missing …

For debugging RKE2 clusters, here are some documentation links for finding logs and CLI tools. When troubleshooting, I usually SSH into the node and check the logs of the rke2-server service:

journalctl -u rke2-server -f
# or
systemctl status rke2-server.service

If rke2-server.service works fine, then I usually check the logs of each kube-x component for errors. RKE2 references:
All is working like a charm now! 🎉 I've reworked the firewalling that was causing issues for RKE2, since it was legacy and not adapted. See my last commit: 62c8797. I have set up the firewalling to be compatible with RKE2 and Calico, based on the recommended rules from the RKE2 docs.

Last thing: in the Rancher node driver it's possible to provision nodes with RKE2 or K3s. Do I need to also add the K3s firewall rules to make sure we stay compatible? After that, I think we are ready to merge it :) cc @jiaqiluo
The link you shared refers to the port requirements for the local cluster where Rancher itself is installed, so you don’t need to add those K3s-specific firewall rules. However, the Exoscale node driver can be used for both RKE2 and K3s downstream clusters, which means the default firewall rules should cover the requirements of both distributions.

In addition, Rancher is adding support for IPv6 downstream clusters, so the firewall rules will also need to be updated if the Exoscale node driver is intended to support creating IPv6-only or dual-stack clusters.

To make updating the firewall rules easier, you can refer to the built-in ingress and egress rules used by the AWS EC2 node driver. Specifically, check the … Could you please confirm whether you’ve covered these rules and made the necessary updates? Feel free to ping me if you have any questions.
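To make the comparison with the EC2 driver's built-in rules concrete, here is a rough, illustrative list of the inbound ports that the public RKE2, K3s and Calico docs call out. The inboundRule type is hypothetical; this is not the rule set actually shipped in this PR or in the EC2 driver.

```go
// Illustrative only: inbound ports from the public RKE2/K3s/Calico docs.
// The inboundRule type is hypothetical, not the driver's actual rule type.
package exoscale

type inboundRule struct {
	protocol  string
	startPort int
	endPort   int
	desc      string
}

var defaultInboundRules = []inboundRule{
	{"tcp", 22, 22, "SSH (provisioning)"},
	{"tcp", 6443, 6443, "Kubernetes API server"},
	{"tcp", 9345, 9345, "RKE2 supervisor API"},
	{"tcp", 10250, 10250, "kubelet metrics"},
	{"tcp", 2379, 2380, "etcd client/peer"},
	{"udp", 8472, 8472, "VXLAN overlay (Canal/Flannel/Calico)"},
	{"tcp", 179, 179, "Calico BGP"},
	{"tcp", 5473, 5473, "Calico Typha"},
	{"udp", 51820, 51821, "K3s Flannel WireGuard backend"},
	{"tcp", 30000, 32767, "NodePort services"},
	{"udp", 30000, 32767, "NodePort services"},
}
```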
Since I have you here, could you please confirm whether the … Currently, we treat the … If the …
Thanks @jiaqiluo. I adapted the implementation based on the AWS EC2 one, supporting dual-stack of course: 5c3e0e5. I confirm all is working as intended. We are not blocking egress by default in the security group, so I'll keep it as before and touch nothing on that side. For the apiKey, you can keep it as is. Thanks for your help and reviews. I let you review the last change; to me, it's ready to be merged.
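As a small illustration of the dual-stack approach referenced in 5c3e0e5 (again a sketch with hypothetical type and function names, not the PR's code), each rule can simply be emitted once for the IPv4 any-address range and once for the IPv6 one:

```go
// Sketch of the dual-stack idea: every inbound rule is duplicated for the
// IPv4 and IPv6 any-address ranges. Names and types are hypothetical.
package exoscale

type securityGroupRule struct {
	protocol  string
	startPort int
	endPort   int
	network   string // source CIDR
}

// dualStack returns one rule for 0.0.0.0/0 and one for ::/0.
func dualStack(protocol string, startPort, endPort int) []securityGroupRule {
	rules := make([]securityGroupRule, 0, 2)
	for _, cidr := range []string{"0.0.0.0/0", "::/0"} {
		rules = append(rules, securityGroupRule{protocol, startPort, endPort, cidr})
	}
	return rules
}
```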
jiaqiluo left a comment:
Great work! 👍 Everything looks good to me.
Thanks for your contribution!
I’ll go ahead and get the PR merged once my team signs off again.
The current exoscale driver is based on v1 of the egoscale package. We update the driver to use v3.
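For readers unfamiliar with the new library, the v3 client is constructed from static credentials roughly as in the sketch below. The constructor and package names follow the egoscale v3 documentation as recalled here and should be double-checked against the version pinned in go.mod; the zone-listing call is only a smoke test, not part of the driver.

```go
// Minimal sketch of creating an egoscale v3 client and making a call.
// Constructor and method names are taken from the egoscale v3 docs as
// recalled here and may differ slightly between releases.
package main

import (
	"context"
	"fmt"
	"log"

	v3 "github.com/exoscale/egoscale/v3"
	"github.com/exoscale/egoscale/v3/credentials"
)

func main() {
	creds := credentials.NewStaticCredentials("EXOSCALE_API_KEY", "EXOSCALE_API_SECRET")
	client, err := v3.NewClient(creds)
	if err != nil {
		log.Fatal(err)
	}

	// Simple smoke test: list the available zones.
	zones, err := client.ListZones(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	for _, zone := range zones.Zones {
		fmt.Println(zone.Name)
	}
}
```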
Testing