Certificate Troubleshooting
Certificate configuration troubleshooting
This page includes troubleshooting for common certificate configuration issues.
Confirm certificate exists
To confirm that the cluster has a certificate to install into the Traefik
reverse proxy, run the following command to view the certificate's details.
kubectl -n <namespace> get secret traefik-tls -o yaml
For example:
$ kubectl -n <namespace> get secret traefik-tls -o yaml
apiVersion: v1
data:
  tls.crt: ${BASE64_ENCODED_CERTIFICATE_CHAIN}
  tls.key: ${BASE64_ENCODED_PRIVATE_KEY}
kind: Secret
metadata:
  creationTimestamp: "2023-06-27T15:27:12Z"
  name: traefik-tls
  namespace: <namespace>
  resourceVersion: "32029846"
  uid: 33c06aa2-b81f-44a9-9123-c24e1af94bb5
type: kubernetes.io/tls
Non-empty tls.crt and tls.key values under data in the response indicate that a certificate is present.
Check certificate content
Extract the base64-encoded tls.crt chain or tls.key from the secret, then use base64
to decode it and examine the contents.
kubectl -n <namespace> get secret traefik-tls -o jsonpath='{.data}' | jq -r '."tls.crt"' | base64 -d
-----BEGIN CERTIFICATE-----
MIIDvDCCA0GgAwIBAgISBcaGB/nkEsSDJPFV7rvZAnAyMAoGCCqGSM49BAMDMDIx
CzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQDEwJF
[ ... omitted ... ]
Xb8Wh0r2bqbrXYti3ujCIOAH9OFKb3BFJvDj6NaKazNwz8fkzUFu7bf5t0AoAjEA
v/tRY9gta5rmxsQTvSlztnln1Xu4zDbAaIBBTwWxTl3CcphQCiLDNPTFzgWIaV9Y
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIEVzCCAj+gAwIBAgIRALBXPpFzlydw27SHyzpFKzgwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
[ ... omitted ... ]
EwOy59Hdm0PT/Er/84dDV0CSjdR/2XuZM3kpysSKLgD1cKiDA+IRguODCxfO9cyY
Ig46v9mFmBvyH04=
-----END CERTIFICATE-----
Save the decoded contents to a file to examine them with TLS tooling. For example:
openssl x509 -text -noout < ${FILE_CONTENTS}
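Because openssl x509 reads only the first certificate in its input, it can help to split the chain into one file per certificate before inspecting it. The following is a minimal sketch, not part of the product tooling: the function name split_pem_chain and the cert- output prefix are hypothetical, and it assumes the decoded chain was saved as chain.pem with Unix line endings.

```shell
# Split a PEM chain into one file per certificate:
# PREFIX-1.pem, PREFIX-2.pem, and so on, in chain order.
split_pem_chain() {
    # $1 is the chain file, $2 is the output prefix (hypothetical names).
    awk -v prefix="$2" '
        # A BEGIN marker starts a new output file.
        /^-----BEGIN CERTIFICATE-----$/ { n++; out = prefix "-" n ".pem" }
        # Copy lines only while inside a certificate block.
        out != "" { print > out }
        # An END marker closes the current output file.
        /^-----END CERTIFICATE-----$/ { close(out); out = "" }
    ' "$1"
}

# Example usage, assuming the chain was saved as chain.pem:
#   split_pem_chain chain.pem cert
#   for f in cert-*.pem; do
#       openssl x509 -noout -subject -issuer -enddate -in "$f"
#   done
```

Printing the subject, issuer, and expiry date of each file makes it easier to see which certificate is yours and which are intermediates.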
Check certificate chain order
Kubernetes requires full-chain certificates. The chain must begin with your issued certificate, continue with each intermediate certificate in order, and end with the root certificate:
-----BEGIN CERTIFICATE-----
{ Your issued Certificate }
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
{ Intermediate Certificate }
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
{ Root Certificate }
-----END CERTIFICATE-----
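A quick structural check can catch a damaged chain before it reaches the cluster, for example a delimiter that lost a dash during copy and paste. The sketch below is illustrative only: the function names are hypothetical, it assumes Unix line endings, and it checks that delimiters are well formed and balanced, not that the certificates are in the right order.

```shell
# Count exact PEM certificate delimiters in a file.
count_pem_markers() {
    # $1 is BEGIN or END, $2 is the path to the PEM file.
    # -x matches whole lines only, so a marker with a missing dash is not counted.
    grep -c -x -e "-----$1 CERTIFICATE-----" "$2" || true
}

# Succeed only when the file has at least one certificate block
# and the same number of BEGIN and END delimiters.
pem_chain_is_well_formed() {
    begins=$(count_pem_markers BEGIN "$1")
    ends=$(count_pem_markers END "$1")
    [ "$begins" -gt 0 ] && [ "$begins" -eq "$ends" ]
}

# Example usage, assuming the decoded chain was saved as chain.pem:
#   pem_chain_is_well_formed chain.pem || echo "chain.pem has malformed delimiters"
```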
Let's Encrypt is busy
Let's Encrypt or another ACME provider may be busy, or the account used to acquire certificates may have exceeded a rate limit.
Check the init-acme pod logs for lines reporting deferral or error reasons.
acme: error: 0 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rateLimited :: Service busy; retry later., url:
Rate limits occur in a variety of circumstances, but most commonly when a single account has issued a high number of recent requests. This tends to happen at cluster initialization time, and is at least part of the reason for limiting retry count to six consecutive failures.
Fix this by either waiting 30 minutes, or correcting the provisioning failure and switching to a different admin_email.
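To pick rate-limit errors out of a long log, a small filter on the ACME error identifier shown in the sample line above may help. This is a sketch under assumptions: the function name find_acme_rate_limits is hypothetical, and the usage comment assumes the logs can be read from a job named init-acme in your namespace.

```shell
# Print ACME rate-limit errors read from standard input.
# Exits non-zero when no rate-limit error is found.
find_acme_rate_limits() {
    # -F treats the ACME error URN as a fixed string, not a regular expression.
    grep -F 'urn:ietf:params:acme:error:rateLimited'
}

# Example usage, assuming the job is named init-acme:
#   kubectl -n <namespace> logs job/init-acme | find_acme_rate_limits
```

If the filter prints nothing, the failure likely has a different cause, and the full log is worth reading for other error reasons.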
AWS ACL limitations
For clusters deployed into AWS, the Virtual Private Cloud (VPC) security groups for load balancers allow a maximum of 60 rules. If your AWS load balancer hits the limit, Hydrolix can't add the required IP addresses for certificate renewal.
Fix this with a temporary repair:
- Temporarily set the allowlist to 0.0.0.0/0.
- Apply the configuration in the hydrolixcluster.yaml file.
kubectl apply -f hydrolixcluster.yaml --namespace <namespace>
- Wait for the init-acme job to complete.
kubectl -n <namespace> get job --field-selector status.successful=1 | grep acme
- Verify that the job completed successfully, and that a traefik-tls secret exists.
- Revert the changes made to hydrolixcluster.yaml and verify that the acme-renewal cronjob exists.
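The verification steps in this repair can be sketched as two small shell functions. This is an illustration, not an official script: the function names and the 600-second timeout are assumptions, and the resource names (init-acme, traefik-tls, acme-renewal) are taken from this page.

```shell
# Check that the certificate was issued: the init-acme job has
# completed and the traefik-tls secret exists.
check_certificate_issued() {
    ns="$1"
    # Block until the job reports the Complete condition, or time out.
    kubectl -n "$ns" wait --for=condition=complete job/init-acme --timeout=600s || return 1
    kubectl -n "$ns" get secret traefik-tls >/dev/null || return 1
}

# Check that renewal is scheduled. Run this after reverting
# the temporary allowlist change.
check_renewal_scheduled() {
    ns="$1"
    kubectl -n "$ns" get cronjob acme-renewal >/dev/null || return 1
}

# Example usage:
#   check_certificate_issued <namespace> && echo "certificate issued"
#   check_renewal_scheduled <namespace> && echo "renewal cronjob present"
```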