Category: Kubernetes

  • helm

    helm 2

    Version

    helm version
    

    Install or upgrade

    helm upgrade gitlab-runner gitlab/gitlab-runner --namespace tools -f deploy/gitlab-runner-values.yaml --install --wait

    Check release status

    helm status gitlab-runner
    

    Init

    helm init --client-only

    Dry-run an upgrade (on the real run the deployment's pods will rolling-update)

    helm upgrade --dry-run --debug -f values.yaml gitlab-runner .

    istio

    helm upgrade --dry-run --debug -f values.yaml istio .
    

    Clean up (delete a release)

    helm delete gitlab-runner --purge

    List and update repos

    helm repo list
    helm repo update
    

    List all versions of a chart in a repo

    helm2 search -l gitlab/gitlab-runner
    helm2 search -l stable/nginx-ingress
    

    Install a specific chart version

    helm upgrade --debug gitlab-runner gitlab/gitlab-runner --version "1.8.1" --namespace infra -f gitlab-runner-values.yaml --install --wait
    
    helm --kube-context prod list
    

    Switch helm versions (homebrew)

    brew unlink kubernetes-helm
    brew switch helm 3.4.0
    

    Show all resources of a release

    helm2 get service1
    helm3 get all service1 -n dev
    

    Show a release's values

    helm3.6.1 get values gitlab-runner -n infra
    
  • k8s: access the apiserver from inside a container

    kubectl exec -it cassandra-0 -n noah -- bash
    
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    
    curl https://kubernetes.default.svc.cluster.local/api/v1/namespaces/noah/endpoints/cassandra --header "Authorization: Bearer $TOKEN" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    
    
  • Problems encountered migrating to k8s

    Problems hit while moving traditional containers onto k8s:

    1. An SDK had to be upgraded (the old version would crash the istio sidecar container).
    The error was: Caused by: java.io.IOException: Cannot bind to URL [rmi:///jmxrmi]: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is

    2. The client's outbound HTTP requests were denied by an envoy rule with 400 (bad request). The cause was an empty key:value pair in the HTTP headers; after the client fixed it, the problem went away. Below is the packet capture; note the line between Content-Type and Accept.
    14:08:37.918970 IP 10.18.19.98.51604 > lb008-dev.http: Flags [P.], seq 1:489, ack 1, win 229, options [nop,nop,TS val 1596856343 ecr 1593089157], length 488: HTTP: POST /ws/rs/domain/domain/init HTTP/1.1
    _…^…POST /ws/rs/domain/domain/init HTTP/1.1
    Content-Type: application/json
    :
    Accept: application/json
    api-uuid: 02ac3ebe-f212-4ca8-998e-4a4ab576018c
    api-control-request-type: ANONYMOUS
    User-Agent: Apache CXF 3.1.4
    Cache-Control: no-cache
    Pragma: no-cache
    Host: uniauthserver-dev
    Connection: keep-alive
    Content-Length: 407
    Fix: remove the header line above where both the key and the value are empty (the bare ":").
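
A quick way to scan a capture for this class of problem (a sketch; the three sample header lines below are taken from the dump above):

```shell
# Count header lines whose key and value are both empty (a bare ":"),
# which is what the envoy rule rejected with 400.
headers='Content-Type: application/json
:
Accept: application/json'

bad=$(printf '%s\n' "$headers" | grep -c '^[[:space:]]*:[[:space:]]*$')
echo "found $bad empty key:value header line(s)"
```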

    3. To get distributed tracing with jaeger, see https://istio.io/zh/docs/tasks/telemetry/distributed-tracing/overview/

    4. kiali shows "unknown" nodes in the call graph: calls that do not go through the service mesh are displayed as unknown.

    5. k8s node kernel version problem
    A kernel that is too old makes docker log kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1,
    which drives system CPU usage up and hangs all docker containers.
    Observed kernel versions with this issue
    RHEL7 3.10.0-862
    4.15.0
    4.20.0
    Kernel versions claimed not triggering this issue
    RHEL7 3.10.0-957.10.1
    4.19.12
    4.17.0
    4.17.11
    Related kernel commits
    torvalds/linux@f186ce6 – since 4.12
    torvalds/linux@4ee806d – since 4.15
    torvalds/linux@ee60ad2 – since 5.1

    Another symptom: kubectl get pods --all-namespaces -o wide shows pods stuck in Terminating that cannot be deleted.

    Fix: yum update (upgrade the kernel and OS to the latest; here kernel 3.10.0-957.21.3.el7)

    6. A requested URL returns no healthy upstream (HTTP 503): check whether the release was actually deployed successfully.

    7. A requested URL returns 404 even though the deployment succeeded: check that the virtual service and ingress gateway inside k8s are configured correctly.

    8. A node app failed to start because k8s injects so many environment variables (its service-discovery mechanism) that node's process.env grew too long.

    So far frontend-main and market-solution-activity-web are affected. We have not yet found a fix that avoids code changes; the code-level fix is to read only the process.env entries the app actually needs: https://zhuanlan.zhihu.com/p/74056339

    [2019-07-30 16:54:13] PM2 error: Trace: { Error: spawn E2BIG
    at exports._errnoException (util.js:1024:11)
    at ChildProcess.spawn (internal/child_process.js:325:11)
    at exports.spawn (child_process.js:493:9)
    at exports.fork (child_process.js:99:10)
    at createWorkerProcess (internal/cluster/master.js:127:10)
    at EventEmitter.cluster.fork (internal/cluster/master.js:161:25)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:52:21)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10) code: 'E2BIG', errno: 'E2BIG', syscall: 'spawn' }
    at Object.God.logAndGenerateError (/opt/nodeapp/node_modules/pm2/lib/God/Methods.js:36:15)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:54:11)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10)
    at /opt/nodeapp/node_modules/pm2/lib/God.js:416:9
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1135:9
    at replenish (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1011:17)
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1016:9
    at _asyncMap (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1133:5)
    [2019-07-30 16:54:13] PM2 error: spawn E2BIG
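
Besides the app-side fix linked above, a container-level workaround is to drop the injected service-discovery variables before launching node (a sketch; the filter pattern is an assumption about which variables matter). Newer k8s (1.13+) can also stop injecting them entirely with enableServiceLinks: false in the pod spec.

```shell
# Unset the env vars k8s injects for every service (FOO_SERVICE_HOST,
# FOO_SERVICE_PORT, FOO_PORT_8080_TCP, ...) so process.env stays small.
for v in $(env | awk -F= '/_SERVICE_HOST=|_SERVICE_PORT=|_PORT_[0-9]+_TCP/ {print $1}'); do
  unset "$v"
done
```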

    9. With flannel + host-gw on Alibaba Cloud, user-defined routes are not supported (each route would have to be added by hand), so we switched the backend to vxlan.

    [[email protected] kubespray]# ansible all -i inventory/k8s_prod_aliyun-cn-shanghai-b_006/inventory.ini -m shell -a "ping -c 3 10.36.3.4"
    [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
    k8snode034-prod.aliyun-cn-shanghai-b | CHANGED | rc=0 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    64 bytes from 10.36.3.4: icmp_seq=1 ttl=64 time=0.066 ms
    64 bytes from 10.36.3.4: icmp_seq=2 ttl=64 time=0.068 ms
    64 bytes from 10.36.3.4: icmp_seq=3 ttl=64 time=0.067 ms
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.066/0.067/0.068/0.000 ms
    k8smaster016-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000ms
    non-zero return code
    k8smaster015-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 1999ms
    non-zero return code
    k8smaster014-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000ms
    non-zero return code
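
Since the cluster is deployed with kubespray (see the ansible output above), switching the flannel backend is a variable change. The variable names below come from kubespray's flannel role; treat the exact group_vars file path as an assumption, as it varies between kubespray versions:

```yaml
# e.g. inventory/<cluster>/group_vars/k8s-cluster/k8s-net-flannel.yml
kube_network_plugin: flannel
flannel_backend_type: "vxlan"   # was "host-gw"; Aliyun drops custom routes
```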

    10. Some apps dial their own VPN to reach other networks; they are stateful and cannot be migrated.

    11. Inside k8s, Java apps get 1 from Runtime.getRuntime().availableProcessors() (reflecting the container CPU limit), so a thread pool sized from it ends up with a single thread; under plain docker it used to return the host's core count.
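
A possible mitigation (an assumption, not verified in this setup): newer JDKs (10+, backported to 8u191) accept -XX:ActiveProcessorCount to pin the value availableProcessors() returns, e.g. via JAVA_OPTS:

```shell
# Pin the CPU count the JVM reports, so pools sized from
# availableProcessors() are not forced down to 1 by the cgroup limit.
# The count of 4 here is an example value.
JAVA_OPTS="$JAVA_OPTS -XX:ActiveProcessorCount=4"
```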

    12. Some services are not being migrated for now: they expose raw TCP (non-HTTP) ports, while the deployment system only generates HTTP istio config. To be handled later.

    13. pod STATUS CreateContainerConfigError: usually a referenced ConfigMap or Secret is missing; kubectl describe pod shows the exact cause under Events.

  • Problems running kubespray

    1. Problem:

    FAILED! => {"changed": false, "module_stderr": "sudo: sorry, you must have a tty to run sudo\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

    Fix: run visudo and comment out the Defaults requiretty line.

    2. Problem:

    FAILED! => {"changed": false, "msg": "Failed to reload sysctl: vm.max_map_count = 262144\nnet.ipv4.ip_forward = 1\nsysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory\nsysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
    Fix:

    modprobe br_netfilter
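
modprobe alone does not survive a reboot; to persist the module and the bridge sysctls, drop two small config files (paths assume RHEL/CentOS 7):

```
# /etc/modules-load.d/br_netfilter.conf
br_netfilter

# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
```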

  • kubectl / kubectx / kubens: quickly switch clusters and namespaces

    Cluster and namespace info is configured in

     ~/.kube/config
    

    Show the current context

    kubectl config current-context
    

    Switch namespace (alias)

    alias kcd='kubectl config set-context $(kubectl config current-context) --namespace' 
    

    Switch cluster

    kubectl config use-context dev-admin@dev
    

    Use kubectx/kubens for quick switching

    https://github.com/ahmetb/kubectx

    brew install kubectx
    

    View the kubeconfig

    kubectl config view
    
  • k8s: stop a deployment or statefulset

    Scale the replicas down to 0

    kubectl scale --replicas=0 deployment contract-service -n dev   # this only scales the pods down to 0
    

    statefulset

    kubectl scale --replicas=0 statefulset kafka -n dev