Certificate Troubleshooting
This page covers troubleshooting for common certificate configuration issues.
Confirm certificate exists
To confirm that the cluster has a certificate to install into the Traefik reverse proxy, run the following command to view details of the certificate.
For example:
Presence of data in the response indicates that a certificate is present.
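A minimal sketch of the check described above, assuming the certificate lives in the traefik-tls secret referenced later on this page. The helper name describe_tls_secret is hypothetical, and the secret name can be overridden if your cluster uses a different one:

```shell
# Hypothetical helper: print the TLS secret so you can see whether it holds data.
# Assumption: the secret is named traefik-tls unless a second argument is given.
describe_tls_secret() {
  namespace="$1"
  secret="${2:-traefik-tls}"
  kubectl get secret "$secret" --namespace "$namespace" -o yaml
}
```

Call it as `describe_tls_secret <namespace>`. A response with populated tls.crt and tls.key fields under data suggests the certificate is present; a NotFound error suggests it is not.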
Check certificate content
Extract the base64-encoded "tls.crt" chain or "tls.key" from the secret, then decode it with base64 to examine the contents.
Store the decoded contents in a file for examination with TLS tooling. Example:
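One way this could look, assuming the secret is named traefik-tls (as referenced later on this page) and that openssl is the TLS tooling at hand. The helper name extract_tls_crt is hypothetical:

```shell
# Hypothetical helper: print the decoded tls.crt chain from the secret.
# Assumption: the secret is named traefik-tls unless a second argument is given.
extract_tls_crt() {
  namespace="$1"
  secret="${2:-traefik-tls}"
  # The dot in the key name must be escaped inside the jsonpath expression.
  kubectl get secret "$secret" --namespace "$namespace" \
    -o jsonpath='{.data.tls\.crt}' | base64 --decode
}
```

For example, `extract_tls_crt <namespace> > tls.crt` stores the chain in a file, and `openssl x509 -in tls.crt -noout -subject -issuer -dates` then prints the subject, issuer, and validity dates of the first certificate in that file.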
Check certificate chain order
The leaf certificate must be first in the file, followed by the intermediate certificate used to sign the leaf. If there are more intermediates, add them in order, each one followed by the certificate that signed it.
For certificates from public CAs, omit the root certificate. Clients use an intermediate certificate to find the corresponding root certificate in their trust store.
For certificates from private CAs or self-managed PKIs, you may wish to include the root certificate. Including the root is necessary, for example, for private Kubernetes communication with applications like kubectl and k9s.
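The ordering rules above can be checked by listing each certificate in the file in sequence. The following is a sketch that assumes the chain has already been saved to a PEM file and that openssl is installed; both helper names are hypothetical:

```shell
# Hypothetical helper: count the certificates in a PEM chain file.
count_chain_certs() {
  grep -c 'BEGIN CERTIFICATE' "$1"
}

# Hypothetical helper: print subject and issuer of every certificate,
# in the order they appear in the file.
print_chain_order() {
  openssl crl2pkcs7 -nocrl -certfile "$1" \
    | openssl pkcs7 -print_certs -noout
}
```

In a correctly ordered chain, the first subject printed is the leaf, and each certificate's issuer matches the subject of the certificate that follows it. For a public CA chain, the last issuer is normally a root that does not appear in the file.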
Let's Encrypt is busy
Let's Encrypt or another ACME provider may be busy, or the account in use for acquiring certificates may have exceeded a rate limit.
Check the init-acme pod logs for lines reporting deferral or error reasons.
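A rough sketch of that log check, assuming the pod belongs to a job named init-acme (as referenced later on this page). The exact log wording is not documented here, so the search pattern is a guess and the helper name acme_log_problems is hypothetical:

```shell
# Hypothetical helper: show init-acme log lines that look like deferrals or errors.
# Assumption: the log text contains words such as "defer", "error", or "rate limit".
acme_log_problems() {
  namespace="$1"
  kubectl logs job/init-acme --namespace "$namespace" \
    | grep -i -E 'defer|error|rate limit'
}
```

If the filter returns nothing, read the unfiltered output of `kubectl logs job/init-acme --namespace <namespace>`, since the real messages may use different wording.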
Rate limits occur in a variety of circumstances, but most commonly when a single account has issued a high number of recent requests. This tends to happen at cluster initialization time, and is at least part of the reason for limiting the retry count to six consecutive failures.
Fix this by either waiting 30 minutes or correcting the provisioning failure and switching to a different admin_email.
AWS ACL limitations
For clusters deployed into AWS, the Virtual Private Cloud (VPC) security groups for load balancers allow a maximum of 60 rules. If your AWS load balancer hits the limit, Hydrolix can't add the required IP addresses for certificate renewal.
Fix this with a temporary repair:
- Temporarily set the allowlist to 0.0.0.0/0.
- Apply the configuration to the hydrolixcluster.yaml file.
    kubectl apply -f hydrolixcluster.yaml --namespace <namespace>
- Wait for the init-acme job to complete.
    kubectl -n <namespace> get job --field-selector status.successful=1 | grep acme
- Verify that the job completed successfully, and that a traefik-tls secret exists.
- Revert the changes made to hydrolixcluster.yaml and verify that the acme-renewal cronjob exists.
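The verification steps in this list could be scripted roughly as follows. This is a sketch that uses the init-acme, traefik-tls, and acme-renewal names from the steps above; the helper name verify_acme_repair and the 600-second timeout are arbitrary assumptions:

```shell
# Hypothetical helper: wait for init-acme, then confirm the secret and cronjob exist.
# Stops at the first command that fails and returns its exit status.
verify_acme_repair() {
  namespace="$1"
  # Assumption: 600s is long enough for the job to finish; adjust as needed.
  kubectl wait --for=condition=complete job/init-acme \
    --namespace "$namespace" --timeout=600s &&
  kubectl get secret traefik-tls --namespace "$namespace" &&
  kubectl get cronjob acme-renewal --namespace "$namespace"
}
```

Run `verify_acme_repair <namespace>` after applying the temporary allowlist. A zero exit status suggests that the job completed and both resources exist, at which point the allowlist change can be reverted.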