Blog

  • k8s pod OOMKill Exit Code: 137

    Identify that it is an OOMKill

    Reason should be OOMKilled, and Finished is the time the container was killed

    kubectl get pods testapp-v092-p8czf -o yaml | less -i


    Last State: Terminated
    Reason: OOMKilled
    Exit Code: 137
    Started: Fri, 11 Sep 2020 11:00:08 +0800
    Finished: Mon, 14 Sep 2020 13:00:46 +0800
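
    Exit code 137 is 128 + 9, meaning the process was ended by signal 9 (SIGKILL), which is what the kernel OOM killer sends. As a rough sketch, the same fields can also be pulled out with a jsonpath query. The helper names last_terminated and exit_signal are made up for this note; the fields come from the pod status (status.containerStatuses[].lastState.terminated).

```shell
# Hypothetical helper: print name, reason, exit code and finish time of the
# last terminated state of every container in a pod, tab-separated.
last_terminated() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.finishedAt}{"\n"}{end}'
}

# Hypothetical helper: exit codes above 128 mean "killed by signal (code - 128)".
# Prints the signal number, or 0 when the code is not a signal exit.
exit_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

# Usage (pod name taken from the example above):
#   last_terminated testapp-v092-p8czf
#   exit_signal 137    # prints 9
```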

    OOM heap dump (when an OOMKill happens)

    1. Add these Java startup parameters to the container entrypoint

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=\.hprof

    2. Mount an emptyDir volume at /var/log/dump. An emptyDir lives as long as the pod, so the dump survives container restarts.

    3. Compress the dump file and download it

    gzip heapdump2020-09-15-03-198523874477783269974.hprof
    kubectl cp testapp-v1127-xnbhq:/var/log/dump/heapdump2020-09-15-03-198523874477783269974.hprof.gz /tmp/heapdump2020-09-15-03-198523874477783269974.hprof.gz
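
    Step 3 can be wrapped in a small function so the gzip runs inside the pod and the copy follows right after. This is only a sketch: fetch_heap_dump is a made-up name, it assumes gzip and tar exist in the container image (kubectl cp needs tar), and a pod with a sidecar may need an extra -c <container> on both commands.

```shell
# Hypothetical helper: gzip a heap dump inside the pod, then copy it to /tmp.
# $1 = pod name, $2 = absolute path of the .hprof file inside the container.
fetch_heap_dump() {
  pod=$1
  dump=$2
  # gzip replaces the file with <file>.gz inside the container
  kubectl exec "$pod" -- gzip "$dump" || return 1
  kubectl cp "$pod:$dump.gz" "/tmp/$(basename "$dump").gz"
}

# Usage:
#   fetch_heap_dump testapp-v1127-xnbhq /var/log/dump/heapdump2020-09-15-03-198523874477783269974.hprof
```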

    Checklist (pod has already restarted)

    check Stackdriver application logs

    check memory and cpu limits

    $ kubectl get pods testapp-v203-trsfl -o yaml

    resources:
    limits:
    cpu: 1500m
    memory: 1229Mi
    requests:
    cpu: 300m
    memory: 1Gi

    check kubectl top status

    $ kubectl top pod testapp-v203-trsfl --containers
    POD                  NAME          CPU(cores)   MEMORY(bytes)
    testapp-v203-trsfl   testapp       13m          1144Mi
    testapp-v203-trsfl   istio-proxy   5m           47Mi
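
    Putting the two outputs side by side: the testapp container uses 1144Mi against a 1229Mi limit. A rough sketch of that arithmetic follows; mem_pct is a made-up helper and only understands values given in Mi.

```shell
# Hypothetical helper: usage as a whole-number percentage of the limit.
# Both arguments must be in Mi, e.g. "1144Mi" and "1229Mi".
mem_pct() {
  used=${1%Mi}
  limit=${2%Mi}
  echo $(( used * 100 / limit ))
}

mem_pct 1144Mi 1229Mi   # prints 93
```

    That leaves 85Mi of headroom before the limit.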

    new relic pod memory:

    commands to investigate the Java heap (inside the pod)

    apk add --no-cache jattach --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/
    jattach pid inspectheap
    jattach pid jcmd VM.info
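
    The pid in the commands above is a placeholder for the JVM process ID. As a sketch, the lookup can be folded into a wrapper; jvm is a made-up name, and it assumes pidof is available in the container (BusyBox on Alpine provides it) and that the process is called java.

```shell
# Hypothetical helper: run a jattach command against the first java process.
jvm() {
  pid=$(pidof java | awk '{print $1}')
  [ -n "$pid" ] || { echo "no java process found" >&2; return 1; }
  jattach "$pid" "$@"
}

# Usage, same commands as above:
#   jvm inspectheap
#   jvm jcmd VM.info
```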

    use ps to find the RSS of the process (inside the pod)

    $ kubectl exec -it testapp-v203-trsfl -- /bin/bash
    ps -o pid,user,vsz,rss,comm,args
    PID   USER VSZ  RSS  COMMAND COMMAND
    1     root 4332 720  tini    /tini -- /entrypoint.sh java
    7     test 6.3g 1.1g java    java -XX:+UseG1GC -Xms768m -Xmx768m -DREGION=gcp_hk -XX:+ExitOnOutOfMemoryError -XX:+UseStringDeduplication -XX:StringDeduplicationAgeThreshold=3 -agentlib:jdwp=transport=dt_socket,ser
    18215 root 2620 2316 bash    /bin/bash
    18267 root 1572 20   ps      ps -o pid,user,vsz,rss,comm,args
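
    The ps output shows why a 768m heap can still run into a 1229Mi limit: RSS counts everything the JVM has resident, not only the heap. With -Xmx768m, whatever remains under the limit has to cover metaspace, thread stacks, the code cache, GC structures and direct buffers. A quick sketch of that budget (non_heap_budget is a made-up helper; the numbers are the limit and -Xmx shown above):

```shell
# Hypothetical helper: memory (in Mi) left for everything outside the Java heap.
# $1 = container memory limit in Mi, $2 = -Xmx value in MiB.
non_heap_budget() {
  echo $(( $1 - $2 ))
}

non_heap_budget 1229 768   # prints 461
```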

    Issues:

  • New Relic and Opsgenie integration

    New Relic

    policy
    channel => Opsgenie team foobar

    Opsgenie
    integration
    teams foobar

  • debug istio multicluster

    curl -X POST "http://localhost:15000/logging?level=debug"
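
    Debug logging on every Envoy logger is noisy, so it helps to be able to target one logger and to turn the level back down afterwards. A minimal sketch, assuming it runs where the Envoy admin port 15000 is reachable on localhost (for example inside the istio-proxy container); envoy_log_level is a made-up name.

```shell
# Hypothetical helper: set the Envoy admin log level.
# With one argument it sets every logger; with two, only the named logger.
envoy_log_level() {
  if [ $# -eq 2 ]; then
    curl -s -X POST "http://localhost:15000/logging?$1=$2"
  else
    curl -s -X POST "http://localhost:15000/logging?level=$1"
  fi
}

# Usage:
#   envoy_log_level debug        # all loggers to debug
#   envoy_log_level http debug   # only the http logger
#   envoy_log_level info         # back to a quieter level afterwards
```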
    

    Check config

    bin/istioctl proxy-config listener  istio-ingressgateway-6589659c8c-f76f9 --port 15443 -o json -n istio-system
    
  • gsutil

    gsutil versioning get gs://xxx-infra

    gs://xxx-infra: Suspended

    If it is not enabled, turn versioning on. Note that enabling Object Versioning increases storage costs.

    gsutil versioning set on gs://xxx-infra

    gsutil versioning get gs://xxx-infra

    gs://xxx-infra: Enabled

    List all versions of all files

    gsutil ls -a gs://xxx-infra

    Restore a specific version of a file

    gsutil cp gs://xxx-infra/subnet_list.json#1607987168023139 gs://xxx-infra/subnet_list.json
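
    The number after # is the object's generation, as printed by gsutil ls -a. Since the destination is just the same URL without that suffix, the restore can be sketched as a one-argument function (restore_version is a made-up name):

```shell
# Hypothetical helper: copy a noncurrent version back over the live object.
# $1 = versioned URL as printed by `gsutil ls -a`, i.e. gs://bucket/object#generation
restore_version() {
  versioned=$1
  live=${versioned%\#*}   # strip "#<generation>" to get the live object URL
  gsutil cp "$versioned" "$live"
}

# Usage:
#   restore_version 'gs://xxx-infra/subnet_list.json#1607987168023139'
```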

  • postgresql

    run a local PostgreSQL service

    https://hub.docker.com/_/postgres

    docker run --name postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
    docker exec -it postgres /bin/bash
    

    test connection

    pg_isready -d <dbname> -h <host> -p <port> -U <username>
    pg_isready -h 127.0.0.1 -p 5432 -U root
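
    Right after docker run the server may still be starting, so a single pg_isready call can fail. A minimal sketch of a retry loop, relying on pg_isready exiting 0 only when the server accepts connections; wait_for_pg and its defaults are made up for this note.

```shell
# Hypothetical helper: poll pg_isready until the server accepts connections.
# $1 = host, $2 = port, $3 = max attempts (default 30). Returns 0 when ready.
wait_for_pg() {
  attempts=${3:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -q suppresses the status message; only the exit code is used
    if pg_isready -q -h "$1" -p "$2"; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_pg 127.0.0.1 5432 && echo ready
```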
    

    apt install postgresql-client-13

    find configuration file

    psql -U postgres -c 'SHOW config_file'
    

    print time

    psql -U postgres -c "select (to_char(CURRENT_TIMESTAMP ,'yyyy-MM-dd HH24:MI')||':00')::timestamp"
    
  • git: count the commits between two refs based on a condition

    https://stackoverflow.com/a/11657647/3672812

    On the feature branch

    git rev-list --count HEAD ^master
    
    13
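
    HEAD ^master selects the commits reachable from HEAD but not from master; the range form master..HEAD is shorthand for the same set. A small sketch with the base branch as a parameter (commits_ahead is a made-up name):

```shell
# Hypothetical helper: number of commits on HEAD that are not on the base branch.
# "base..HEAD" is shorthand for "HEAD ^base".
commits_ahead() {
  git rev-list --count "$1..HEAD"
}

# Usage, on the feature branch:
#   commits_ahead master
```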