Problem:
The pod entered a restart loop, restarting repeatedly over several hours. kubectl get pods showed the pod as 0/1 Running with a steadily increasing restart count.
Pod events:
Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  10h (x3 over 10h)    kubelet  Liveness probe failed: Get "http://***HIDDEN***:8080/healthcheck": dial tcp ***HIDDEN***:8080: connect: connection refused
  Warning  Unhealthy  10h (x162 over 10h)  kubelet  Readiness probe failed: Get "http://***HIDDEN***:8080/healthcheck": dial tcp ***HIDDEN***:8080: connect: connection refused
  Normal   Killing    10h (x9 over 10h)    kubelet  Container iax-app-capacity-daemon failed liveness probe, will be restarted
Kubernetes events reported repeated Liveness probe failed and Readiness probe failed messages, followed by Normal Killing when the kubelet restarted the container:
- “Liveness probe failed … connection refused”
- “Readiness probe failed … connection refused”
- “Container … failed liveness probe, will be restarted”
Container logs did not show a hard error; instead, they stopped at arbitrary startup lines, e.g.:
Starting metric indexer for config …
HikariPool-7 - Starting… followed by HikariPool-7 - Start completed.
The service had not yet made its health endpoint responsive when the liveness probe fired.
Possible cause:
The pod's startup time exceeded the liveness probe window configured in the deployment. While the service was still initializing, the liveness probe attempted to call /healthcheck and the connection was refused or timed out, so Kubernetes marked the container unhealthy and restarted it prematurely, creating a loop. The settings in the deployment were too aggressive for this environment:
initialDelaySeconds: 120
timeoutSeconds: 5
failureThreshold: 3
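In a standard Deployment manifest, these values would sit in the container's livenessProbe stanza, roughly as sketched below (path and port taken from the probe failures above; this is an illustration of where the settings live, not the exact original manifest):

```yaml
# Sketch of the original, too-aggressive probe configuration
livenessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
  initialDelaySeconds: 120   # shorter than the service's actual startup time
  timeoutSeconds: 5
  failureThreshold: 3        # only ~3 probe periods of grace before a restart
```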
Possible solution:
Extending the liveness probe window allowed the service to finish initialization and expose a healthy endpoint before the kubelet judged it. Specifically:
livenessProbe:
httpGet:
path: /healthcheck
port: 8080
initialDelaySeconds: 300 # increased from 120
timeoutSeconds: 10 # increased from 5
failureThreshold: 5 # increased from 3
periodSeconds: 10
After increasing initialDelaySeconds to 300 (with the optional timeout/failure tweaks), the pod stabilized: the pod reached a healthy state, the health probe began responding (HTTP health probe server listening … /healthcheck), and restarts ceased.
Steps:
- Identify the pod's parent deployment:
  kubectl get deployment -n <namespace>
- Edit the deployment and increase the livenessProbe settings:
  kubectl edit deployment <deployment_name> -n <namespace>
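As an alternative to editing the deployment interactively, the same change can be applied with a strategic-merge patch. The snippet below is a sketch: the container name iax-app-capacity-daemon is taken from the pod events above and must match the container name in your deployment.

```yaml
# probe-patch.yaml - strategic merge patch extending the liveness probe window
spec:
  template:
    spec:
      containers:
        - name: iax-app-capacity-daemon   # replace with your container name
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 8080
            initialDelaySeconds: 300
            timeoutSeconds: 10
            failureThreshold: 5
            periodSeconds: 10
```

Apply it with kubectl patch deployment <deployment_name> -n <namespace> --patch-file probe-patch.yaml, then watch the rollout with kubectl get pods -n <namespace> -w to confirm the restart count stops climbing.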
If you need further help:
- Please contact our support team via the chat service box on any of our websites or raise a support request.
- Make sure you provide us with:
- Background of the issue or request.
- Use cases, requirements, business impact, etc.
- Encountered error messages.
- Log files or diagnostic files.
- Screenshots.
- And other important information relevant to your inquiry.