Category
life

k8s pod OOMKill Exit Code: 137

Confirm that it is an OOMKill

The Reason should be OOMKilled, and Finished gives the time the container was killed.

kubectl describe pod testapp-v092-p8czf | less -i


Last State:   Terminated
  Reason:     OOMKilled
  Exit Code:  137
  Started:    Fri, 11 Sep 2020 11:00:08 +0800
  Finished:   Mon, 14 Sep 2020 13:00:46 +0800
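
Exit code 137 follows the usual convention of 128 plus the signal number, so it means the process was ended by signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of that decoding follows; describe_exit_code is a hypothetical helper name, not something kubectl provides.

```shell
#!/bin/sh
# describe_exit_code is a hypothetical helper (not a kubectl feature).
# Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
describe_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        echo "killed by signal $sig"
    else
        echo "exited with status $code"
    fi
}

# The reason itself can also be read directly, for example:
#   kubectl get pod testapp-v092-p8czf \
#     -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
describe_exit_code 137   # prints: killed by signal 9
```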

OOM heap dump (when the OOM happens)

1. Add Java startup parameters to the container entrypoint

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=\.hprof

2. Mount an emptyDir volume in the pod. /var/log/dump then persists for the lifetime of the pod, across container restarts.

3. Compress the dump file and download it

gzip heapdump2020-09-15-03-198523874477783269974.hprof
kubectl cp testapp-v1127-xnbhq:/var/log/dump/heapdump2020-09-15-03-198523874477783269974.hprof.gz /tmp/heapdump2020-09-15-03-198523874477783269974.hprof.gz

Checklist (pod has already restarted)

Check the Stackdriver application logs

Check the memory and CPU limits

$ kubectl get pods testapp-v203-trsfl -o yaml

resources:
  limits:
    cpu: 1500m
    memory: 1229Mi
  requests:
    cpu: 300m
    memory: 1Gi

Check kubectl top

$ kubectl top pod testapp-v203-trsfl --containers
POD                 NAME         CPU(cores)  MEMORY(bytes)
testapp-v203-trsfl  testapp      13m         1144Mi
testapp-v203-trsfl  istio-proxy  5m          47Mi
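
Comparing the kubectl top figure with the memory limit shows how close the container is to being killed: here 1144Mi used against a 1229Mi limit. A rough sketch of that arithmetic, assuming both values are given in Mi; mem_usage_pct is a hypothetical helper name.

```shell
#!/bin/sh
# mem_usage_pct is a hypothetical helper: integer percentage of the limit in use.
# Both arguments are plain numbers in Mi (strip the "Mi" suffix first).
mem_usage_pct() {
    used_mi=$1
    limit_mi=$2
    echo $(( used_mi * 100 / limit_mi ))
}

# Values from the notes: 1144Mi used by testapp, 1229Mi memory limit.
mem_usage_pct 1144 1229   # prints 93
```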

Check pod memory in New Relic

Commands to investigate the Java heap (inside the pod)

apk add --no-cache jattach --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/
jattach <pid> inspectheap
jattach <pid> jcmd VM.info

Use ps to find the RSS of the process (inside the pod)

$ kubectl exec -it testapp-v203-trsfl -- /bin/bash
ps -o pid,user,vsz,rss,comm,args
PID   USER VSZ  RSS  COMMAND COMMAND
1     root 4332 720  tini    /tini -- /entrypoint.sh java
7     test 6.3g 1.1g java    java -XX:+UseG1GC -Xms768m -Xmx768m -DREGION=gcp_hk -XX:+ExitOnOutOfMemoryError -XX:+UseStringDeduplication -XX:StringDeduplicationAgeThreshold=3 -agentlib:jdwp=transport=dt_socket,ser
18215 root 2620 2316 bash    /bin/bash
18267 root 1572 20   ps      ps -o pid,user,vsz,rss,comm,args
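
The ps output shows an RSS of about 1.1g for a JVM started with -Xmx768m. The heap cap does not cover metaspace, thread stacks, the code cache, or direct buffers, so resident memory can sit well above -Xmx and close to the container limit. If ps is not available in the image, the same figure can be read from /proc. A minimal sketch, assuming a Linux /proc filesystem; rss_kib is a hypothetical helper name.

```shell
#!/bin/sh
# rss_kib is a hypothetical helper: prints the resident set size of a PID in KiB,
# read from the VmRSS line of /proc/<pid>/status (Linux only).
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: RSS of the current shell. Inside the pod, pass the java PID instead
# (7 in the ps output from the notes).
rss_kib $$
```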

Issues:

Category
monitoring

New Relic and Opsgenie integration

New Relic

policy
channel => Opsgenie Teams foobar

Opsgenie
integration
teams foobar

Category
istio

Debug Istio multicluster

curl -X POST 'http://localhost:15000/logging?level=debug'

Check config

bin/istioctl proxy-config listener istio-ingressgateway-6589659c8c-f76f9 --port 15443 -o json -n istio-system

Category
gcp

gsutil

gsutil versioning get gs://xxx-infra

gs://xxx-infra: Suspended

If versioning is not enabled, turn it on. Note that enabling Object Versioning increases storage costs.

gsutil versioning set on gs://xxx-infra

gsutil versioning get gs://xxx-infra

gs://xxx-infra: Enabled

List all versions of all files

gsutil ls -a gs://xxx-infra

Restore a specific version of a file

gsutil cp gs://xxx-infra/subnet_list.json#1607987168023139 gs://xxx-infra/subnet_list.json
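
The number after # in the cp command is the object's generation, as printed by gsutil ls -a. A small sketch of splitting such a versioned URL into its two parts with plain shell parameter expansion; object_name and object_generation are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for URLs in the form printed by `gsutil ls -a`:
#   gs://bucket/path#generation
object_name() {
    echo "${1%#*}"        # drop the trailing "#generation" suffix
}
object_generation() {
    echo "${1##*#}"       # keep only what follows the last "#"
}

url='gs://xxx-infra/subnet_list.json#1607987168023139'
object_name "$url"         # prints gs://xxx-infra/subnet_list.json
object_generation "$url"   # prints 1607987168023139
```

Restoring is then a copy from the versioned URL onto the plain object name, as in the gsutil cp command in the notes.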

Category
postgresql

postgresql

apt install postgresql-client

test connection

pg_isready -d <dbname> -h <host> -p <port> -U <username>
pg_isready -h postgres10 -p 5432 -U 123
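
pg_isready reports its result through the exit status as well as the printed message, which makes it convenient in scripts. A minimal sketch that maps the documented exit statuses to text; pg_status_text is a hypothetical helper name.

```shell
#!/bin/sh
# pg_status_text is a hypothetical helper mapping pg_isready exit statuses
# (0 to 3, as documented for pg_isready) to a short description.
pg_status_text() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Typical use (needs a reachable server, so it is left commented out here):
#   pg_isready -h postgres10 -p 5432; pg_status_text $?
pg_status_text 2   # prints: no response
```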

find configuration file

psql -U postgres -c 'SHOW config_file'