Certificate Troubleshooting

This page describes how to troubleshoot common certificate configuration issues.

Confirm certificate exists

To confirm that the cluster has a certificate to install into the Traefik reverse proxy, run the following command to view the certificate details.

kubectl -n <namespace> get secret traefik-tls -o yaml

For example:

$ kubectl -n <namespace> get secret traefik-tls -o yaml  
apiVersion: v1  
data:  
  tls.crt: ${BASE64_ENCODED_CERTIFICATE_CHAIN}
  tls.key: ${BASE64_ENCODED_PRIVATE_KEY}
kind: Secret  
metadata:  
  creationTimestamp: "2023-06-27T15:27:12Z"  
  name: traefik-tls  
  namespace: <namespace>
  resourceVersion: "32029846"  
  uid: 33c06aa2-b81f-44a9-9123-c24e1af94bb5  
type: kubernetes.io/tls

If the data field in the response contains tls.crt and tls.key values, a certificate is present.
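
The same check can be scripted. The following is a minimal sketch rather than an official tool: check_tls_secret is a hypothetical helper name, and it assumes kubectl is already configured for the cluster. It reads tls.crt and tls.key separately with jsonpath and fails if either value is empty.

```shell
# Hypothetical helper: report whether the traefik-tls secret in the given
# namespace holds non-empty tls.crt and tls.key values.
check_tls_secret() {
  ns="$1"
  # The dot inside each key name must be escaped in kubectl jsonpath.
  crt=$(kubectl -n "$ns" get secret traefik-tls -o 'jsonpath={.data.tls\.crt}') || return 1
  key=$(kubectl -n "$ns" get secret traefik-tls -o 'jsonpath={.data.tls\.key}') || return 1
  if [ -n "$crt" ] && [ -n "$key" ]; then
    echo "traefik-tls: certificate and key present"
  else
    echo "traefik-tls: tls.crt or tls.key is empty" >&2
    return 1
  fi
}
```

Run it as check_tls_secret <namespace>. A nonzero exit status means the secret is missing or incomplete.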

Check certificate content

Extract the base64-encoded "tls.crt" chain or "tls.key" from the secret, then decode it with base64 to examine the contents.

$ kubectl -n <namespace> get secret traefik-tls -o jsonpath='{.data}' | jq -r '."tls.crt"' | base64 -d
-----BEGIN CERTIFICATE-----
MIIDvDCCA0GgAwIBAgISBcaGB/nkEsSDJPFV7rvZAnAyMAoGCCqGSM49BAMDMDIx
CzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQDEwJF
  [ ... omitted ... ]
Xb8Wh0r2bqbrXYti3ujCIOAH9OFKb3BFJvDj6NaKazNwz8fkzUFu7bf5t0AoAjEA
v/tRY9gta5rmxsQTvSlztnln1Xu4zDbAaIBBTwWxTl3CcphQCiLDNPTFzgWIaV9Y
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIEVzCCAj+gAwIBAgIRALBXPpFzlydw27SHyzpFKzgwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
  [ ... omitted ... ]
EwOy59Hdm0PT/Er/84dDV0CSjdR/2XuZM3kpysSKLgD1cKiDA+IRguODCxfO9cyY
Ig46v9mFmBvyH04=
-----END CERTIFICATE-----

Save the decoded contents to a file for examination with TLS tooling. For example, where ${FILE_CONTENTS} is the path to that file:

openssl x509 -text -noout < ${FILE_CONTENTS}
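
Note that openssl x509 reads only the first certificate from its input, so the command above shows the leaf certificate and not the rest of the chain. One way to examine every certificate is to split the bundle first. The sketch below is illustrative only: split_chain and the output file prefix are hypothetical names, and the function relies on nothing beyond awk.

```shell
# Hypothetical helper: split a PEM bundle ($1) into one file per certificate,
# named PREFIX-1.pem, PREFIX-2.pem, ... where PREFIX is $2.
# Prints the number of certificates written.
split_chain() {
  awk -v prefix="$2" '
    /-----BEGIN CERTIFICATE-----/ { n++; out = prefix "-" n ".pem" }
    out != ""                     { print > out }
    /-----END CERTIFICATE-----/   { close(out); out = "" }
    END                           { print n + 0 }
  ' "$1"
}

# Example usage, assuming the decoded chain was saved as chain.pem:
#   split_chain chain.pem cert
#   for f in cert-*.pem; do
#     openssl x509 -noout -subject -issuer -enddate -in "$f"
#   done
```

Each resulting file holds exactly one certificate, so the usual openssl x509 options apply to it directly.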

Check certificate chain order

Kubernetes requires full chain certificates. The certificate chain should begin with your certificate, continue with each intermediate certificate in issuing order, and end with the root certificate.
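
The ordering rule can be checked mechanically: each certificate's issuer should match the subject of the certificate that follows it. The following is a rough sketch under stated assumptions, not a complete validator. chain_pairs and check_chain_order are hypothetical helper names, chain.pem stands for the decoded tls.crt saved to a file, and the comparison is a plain string match on the names openssl prints, so it does not verify signatures.

```shell
# Hypothetical helper: print "subject<TAB>issuer" for every certificate in
# the PEM bundle $1, in file order.
chain_pairs() {
  openssl crl2pkcs7 -nocrl -certfile "$1" |
    openssl pkcs7 -print_certs -noout |
    awk '
      /^subject=/ { s = substr($0, 9) }
      /^issuer=/  { print s "\t" substr($0, 8) }
    '
}

# Hypothetical helper: read "subject<TAB>issuer" lines on stdin and fail if
# any certificate's issuer differs from the next certificate's subject.
check_chain_order() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NR > 1 && prev_issuer != $1 {
      printf "certificate %d was not issued by certificate %d\n", NR - 1, NR
      bad = 1
    }
    { prev_issuer = $2 }
    END { exit bad }
  '
}

# Example usage:
#   chain_pairs chain.pem | check_chain_order && echo "chain order looks correct"
```

A self-signed root has identical subject and issuer, which this check accepts when the root is the last entry.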

Let's Encrypt is busy

Let's Encrypt or another ACME provider may be busy, or the account used to acquire certificates may have exceeded a rate limit.

Check the init-acme pod logs for lines that report a deferral or an error, such as the following:

 acme: error: 0 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rateLimited :: Service busy; retry later., url:

Rate limits occur in a variety of circumstances, most commonly when a single account has issued a high number of recent requests. This tends to happen at cluster initialization time, and is at least part of the reason the retry count is limited to six consecutive failures.

Fix this in one of two ways: wait 30 minutes, or correct the provisioning failure and switch to a different admin_email.
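
To find these lines quickly, the log check can be filtered for ACME error types. This is a small sketch with assumptions: acme_errors is a hypothetical helper name, and it assumes the job is named init-acme, as this page refers to it, so that kubectl can resolve its pod from job/init-acme.

```shell
# Hypothetical helper: print init-acme log lines that carry an ACME error
# type (for example urn:ietf:params:acme:error:rateLimited).
# Returns nonzero when no such line is found.
acme_errors() {
  kubectl -n "$1" logs job/init-acme 2>&1 | grep 'urn:ietf:params:acme:error:'
}
```

Run it as acme_errors <namespace>. A line containing rateLimited points to the rate limit case described above.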

AWS ACL limitations

For clusters deployed into AWS, the Virtual Private Cloud (VPC) security groups for load balancers allow a maximum of 60 rules. If your AWS load balancer hits the limit, Hydrolix can't add the required IP addresses for certificate renewal.

Fix this with a temporary workaround:

1. Temporarily set the allowlist to 0.0.0.0/0 in hydrolixcluster.yaml.
2. Apply the updated hydrolixcluster.yaml file:

   kubectl apply -f hydrolixcluster.yaml --namespace <namespace>

3. Wait for the init-acme job to complete:

   kubectl -n <namespace> get job --field-selector status.successful=1 | grep acme

4. Verify that the job completed successfully and that a traefik-tls secret exists.
5. Revert the changes made to hydrolixcluster.yaml and verify that the acme-renewal cronjob exists.
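
The verification parts of this procedure can be sketched as two small functions. Treat this as an illustration under assumptions rather than a supported script: wait_for_certificate and check_renewal_cronjob are hypothetical helper names, the resource names init-acme, traefik-tls, and acme-renewal are taken from this page, and the 600 second timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the init-acme job completes, then confirm
# that the traefik-tls secret exists. The timeout value is an assumption.
wait_for_certificate() {
  ns="$1"
  kubectl -n "$ns" wait --for=condition=complete job/init-acme --timeout=600s || return 1
  kubectl -n "$ns" get secret traefik-tls >/dev/null || return 1
  echo "init-acme completed and traefik-tls exists"
}

# Hypothetical helper: run after reverting hydrolixcluster.yaml to confirm
# that the renewal cronjob is present.
check_renewal_cronjob() {
  kubectl -n "$1" get cronjob acme-renewal >/dev/null || return 1
  echo "acme-renewal cronjob exists"
}
```

Call wait_for_certificate <namespace> after applying the temporary allowlist, and check_renewal_cronjob <namespace> after reverting it.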