Identify that it is an OOMKill
The Reason should be OOMKilled, and Finished shows when the container was killed.
kubectl describe pod testapp-v092-p8czf | less -i
…
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Fri, 11 Sep 2020 11:00:08 +0800
  Finished:     Mon, 14 Sep 2020 13:00:46 +0800
…
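The same check can be scripted. A minimal sketch, assuming the text being read is `kubectl describe pod` output; the function name `is_oomkilled` is hypothetical. Exit code 137 is 128 + 9, meaning the container process was terminated by SIGKILL, which is the signal the kernel OOM killer sends.

```shell
#!/bin/sh
# is_oomkilled: hypothetical helper. Reads `kubectl describe pod` text on stdin
# and succeeds (exit status 0) if any line reports "Reason: OOMKilled".
is_oomkilled() {
  grep -Eq '^[[:space:]]*Reason:[[:space:]]+OOMKilled[[:space:]]*$'
}

# Example usage against a live cluster (pod name taken from the text above):
#   kubectl describe pod testapp-v092-p8czf | is_oomkilled && echo "OOMKilled"
```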
OOM heap dump (when an OOMKill happens)
1. Add the Java startup parameters to the container entrypoint
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=
2. Mount an emptyDir volume in the pod. /var/log/dump then persists for the lifetime of the pod, so the dump survives container restarts.
3. Compress the dump file and download it
gzip heapdump2020-09-15-03-198523874477783269974.hprof
kubectl cp testapp-v1127-xnbhq:/var/log/dump/heapdump2020-09-15-03-198523874477783269974.hprof.gz /tmp/heapdump2020-09-15-03-198523874477783269974.hprof.gz
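The entrypoint flags and the emptyDir mount might be wired together along these lines. This is only a sketch: the `DUMP_DIR` and `JAVA_OPTS` variable names and the `build_java_opts` function are hypothetical, and /var/log/dump is assumed to be where the emptyDir is mounted. When HeapDumpPath points at a directory, the JVM chooses the dump file name itself.

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: build the JVM flags for heap dumps.
# DUMP_DIR is assumed to be the emptyDir mount path (default /var/log/dump).
build_java_opts() {
  dump_dir="${DUMP_DIR:-/var/log/dump}"
  printf '%s' "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${dump_dir}"
}

# Append the heap dump flags to whatever JAVA_OPTS was already set.
JAVA_OPTS="${JAVA_OPTS:-} $(build_java_opts)"

# A real entrypoint would end by replacing the shell with the JVM, for example:
#   exec java $JAVA_OPTS -jar /path/to/app.jar   # jar path is a placeholder
```

Note that the flag is triggered by a Java OutOfMemoryError. If the kernel kills the container at its memory limit first (exit code 137), the JVM receives SIGKILL and cannot write a dump.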
Checklist (pod has already restarted)
Check the Stackdriver application logs
Check the memory and CPU limits
$ kubectl get pods testapp-v203-trsfl -o yaml
…
resources:
  limits:
    cpu: 1500m
    memory: 1229Mi
  requests:
    cpu: 300m
    memory: 1Gi
…
Check the kubectl top output
$ kubectl top pod testapp-v203-trsfl --containers
POD                  NAME          CPU(cores)   MEMORY(bytes)
testapp-v203-trsfl   testapp       13m          1144Mi
testapp-v203-trsfl   istio-proxy   5m           47Mi
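Putting the two numbers side by side: the testapp container is using 1144Mi against a 1229Mi limit. A small sketch of that arithmetic, assuming the limit shown earlier applies to the testapp container; the function name `mem_pct` is hypothetical.

```shell
#!/bin/sh
# mem_pct: hypothetical helper. Prints usage as a percentage of the limit
# (both given in Mi), rounded to one decimal place.
mem_pct() {
  awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", used * 100 / limit }'
}

mem_pct 1144 1229   # testapp usage from kubectl top vs. its memory limit; prints 93.1
```

That leaves only 85Mi (1229 - 1144) of headroom before the container reaches its limit.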
New Relic pod memory:
Commands to investigate the Java stack and heap (inside the pod)
apk add --no-cache jattach --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/
jattach pid inspectheap
jattach pid jcmd VM.info
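In the commands above, pid stands for the process ID of the JVM. A minimal sketch for looking it up from ps output, assuming a single java process in the container; the function name `java_pid` is hypothetical.

```shell
#!/bin/sh
# java_pid: hypothetical helper. Reads `ps -o pid,comm` output on stdin and
# prints the PID of the first process whose command name is exactly "java".
java_pid() {
  awk '$2 == "java" { print $1; exit }'
}

# Example usage inside the pod:
#   pid="$(ps -o pid,comm | java_pid)"
#   jattach "$pid" inspectheap
#   jattach "$pid" jcmd VM.info
```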
Use ps to find the RSS of the process (inside the pod)
$ kubectl exec -it testapp-v203-trsfl -- /bin/bash
ps -o pid,user,vsz,rss,comm,args
PID    USER  VSZ   RSS   COMMAND  COMMAND
1      root  4332  720   tini     /tini -- /entrypoint.sh java
7      test  6.3g  1.1g  java     java -XX:+UseG1GC -Xms768m -Xmx768m -DREGION=gcp_hk -XX:+ExitOnOutOfMemoryError -XX:+UseStringDeduplication -XX:StringDeduplicationAgeThreshold=3 -agentlib:jdwp=transport=dt_socket,ser
18215  root  2620  2316  bash     /bin/bash
18267  root  1572  20    ps       ps -o pid,user,vsz,rss,comm,args
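The ps output shows the heap capped at 768m (-Xms768m -Xmx768m) while the java process RSS is about 1.1g, so a sizable part of the footprint sits outside the Java heap. A rough worked estimate, assuming the 1144Mi that kubectl top reported for the testapp container is a usable approximation of that RSS; the function name `non_heap_mi` is hypothetical.

```shell
#!/bin/sh
# non_heap_mi: hypothetical helper. Given the memory usage and the -Xmx value
# (both in Mi), prints usage minus max heap. Because the resident part of the
# heap can never exceed -Xmx, the result is a lower bound on non-heap memory.
non_heap_mi() {
  echo $(( $1 - $2 ))
}

non_heap_mi 1144 768   # prints 376
```

So at least roughly 376Mi is non-heap memory (for example metaspace, thread stacks, code cache and GC structures), which -Xmx alone does not bound.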
Issues: