Category: Kubernetes

  • helm

    helm 2

    Version

    helm version
    

    Install or upgrade

    helm upgrade gitlab-runner gitlab/gitlab-runner --namespace tools -f deploy/gitlab-runner-values.yaml --install --wait

    Check release status

    helm status gitlab-runner
    

    Init

    helm init --client-only

    Dry-run an upgrade (on the real run the deployment's pods will rolling-update)

    helm upgrade --dry-run --debug -f values.yaml gitlab-runner .

    istio

    helm upgrade --dry-run --debug -f values.yaml istio .
    

    Clean up (delete a release)

    helm delete gitlab-runner --purge

    List and update repos

    helm repo list
    helm repo update
    

    List all versions of a chart in a repo

    helm2 search -l gitlab/gitlab-runner
    helm2 search -l stable/nginx-ingress
    

    Install a specific chart version

    helm upgrade --debug gitlab-runner gitlab/gitlab-runner --version "1.8.1" --namespace infra -f gitlab-runner-values.yaml --install --wait
    
    helm --kube-context prod list
    

    Switch helm versions (homebrew)

    brew unlink kubernetes-helm
    brew switch helm 3.4.0
    

    Show all resources of a release

    helm2 get service1
    helm3 get all service1 -n dev
    

    Show a release's values

    helm3.6.1 get values gitlab-runner -n infra
    
  • k8s: access the apiserver from inside a container

    kubectl exec -it cassandra-0 -n noah -- bash
    
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    
    curl https://kubernetes.default.svc.cluster.local/api/v1/namespaces/noah/endpoints/cassandra --header "Authorization: Bearer $TOKEN" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    
    
  • Problems encountered migrating to k8s

    Problems hit while moving traditional containers onto k8s:

    1. An SDK had to be upgraded (the old version would crash the istio sidecar container).
    The error was: Caused by: java.io.IOException: Cannot bind to URL [rmi:///jmxrmi]: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is

    2. The client's outbound HTTP requests were denied by an envoy rule with 400 (bad request). The cause was an empty key:value pair in the HTTP headers; after the client fixed it, the problem went away. Below is the packet capture; note the line between Content-Type and Accept.
    14:08:37.918970 IP 10.18.19.98.51604 > lb008-dev.http: Flags [P.], seq 1:489, ack 1, win 229, options [nop,nop,TS val 1596856343 ecr 1593089157], length 488: HTTP: POST /ws/rs/domain/domain/init HTTP/1.1
    _…^…POST /ws/rs/domain/domain/init HTTP/1.1
    Content-Type: application/json
    :
    Accept: application/json
    api-uuid: 02ac3ebe-f212-4ca8-998e-4a4ab576018c
    api-control-request-type: ANONYMOUS
    User-Agent: Apache CXF 3.1.4
    Cache-Control: no-cache
    Pragma: no-cache
    Host: uniauthserver-dev
    Connection: keep-alive
    Content-Length: 407
    Fix: remove the header line above where both the key and the value are empty (the bare ":").
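
A quick way to scan a capture for this class of problem (a sketch; the three sample header lines below are taken from the dump above):

```shell
# Count header lines whose key and value are both empty (a bare ":"),
# which is what the envoy rule rejected with 400.
headers='Content-Type: application/json
:
Accept: application/json'

bad=$(printf '%s\n' "$headers" | grep -c '^[[:space:]]*:[[:space:]]*$')
echo "found $bad empty key:value header line(s)"
```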

    3. To get distributed tracing with jaeger, see https://istio.io/zh/docs/tasks/telemetry/distributed-tracing/overview/

    4. kiali shows "unknown" nodes in the call graph: calls that do not go through the service mesh are displayed as unknown.

    5. k8s node kernel version problem
    A kernel that is too old makes docker log kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1,
    which drives system CPU usage up and hangs all docker containers.
    Observed kernel versions with this issue
    RHEL7 3.10.0-862
    4.15.0
    4.20.0
    Kernel versions claimed not triggering this issue
    RHEL7 3.10.0-957.10.1
    4.19.12
    4.17.0
    4.17.11
    Related kernel commits
    torvalds/linux@f186ce6 – since 4.12
    torvalds/linux@4ee806d – since 4.15
    torvalds/linux@ee60ad2 – since 5.1

    Another symptom: kubectl get pods --all-namespaces -o wide shows pods stuck in Terminating that cannot be deleted.

    Fix: yum update (upgrade the kernel and OS to the latest; here kernel 3.10.0-957.21.3.el7)

    6. A requested URL returns no healthy upstream (HTTP 503): check whether the release was actually deployed successfully.

    7. A requested URL returns 404 even though the deployment succeeded: check that the virtual service and ingress gateway inside k8s are configured correctly.

    8. A node app failed to start because k8s injects so many environment variables (its service-discovery mechanism) that node's process.env grew too long.

    So far frontend-main and market-solution-activity-web are affected. We have not yet found a fix that avoids code changes; the code-level fix is to read only the process.env entries the app actually needs: https://zhuanlan.zhihu.com/p/74056339

    [2019-07-30 16:54:13] PM2 error: Trace: { Error: spawn E2BIG
    at exports._errnoException (util.js:1024:11)
    at ChildProcess.spawn (internal/child_process.js:325:11)
    at exports.spawn (child_process.js:493:9)
    at exports.fork (child_process.js:99:10)
    at createWorkerProcess (internal/cluster/master.js:127:10)
    at EventEmitter.cluster.fork (internal/cluster/master.js:161:25)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:52:21)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10) code: 'E2BIG', errno: 'E2BIG', syscall: 'spawn' }
    at Object.God.logAndGenerateError (/opt/nodeapp/node_modules/pm2/lib/God/Methods.js:36:15)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:54:11)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10)
    at /opt/nodeapp/node_modules/pm2/lib/God.js:416:9
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1135:9
    at replenish (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1011:17)
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1016:9
    at _asyncMap (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1133:5)
    [2019-07-30 16:54:13] PM2 error: spawn E2BIG
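
Besides the app-side fix linked above, a container-level workaround is to drop the injected service-discovery variables before launching node (a sketch; the filter pattern is an assumption about which variables matter). Newer k8s (1.13+) can also stop injecting them entirely with enableServiceLinks: false in the pod spec.

```shell
# Unset the env vars k8s injects for every service (FOO_SERVICE_HOST,
# FOO_SERVICE_PORT, FOO_PORT_8080_TCP, ...) so process.env stays small.
for v in $(env | awk -F= '/_SERVICE_HOST=|_SERVICE_PORT=|_PORT_[0-9]+_TCP/ {print $1}'); do
  unset "$v"
done
```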

    9. With flannel + host-gw on Alibaba Cloud, user-defined routes are not supported (each route would have to be added by hand), so we switched the backend to vxlan.

    [[email protected] kubespray]# ansible all -i inventory/k8s_prod_aliyun-cn-shanghai-b_006/inventory.ini -m shell -a "ping -c 3 10.36.3.4"
    [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
    k8snode034-prod.aliyun-cn-shanghai-b | CHANGED | rc=0 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    64 bytes from 10.36.3.4: icmp_seq=1 ttl=64 time=0.066 ms
    64 bytes from 10.36.3.4: icmp_seq=2 ttl=64 time=0.068 ms
    64 bytes from 10.36.3.4: icmp_seq=3 ttl=64 time=0.067 ms
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.066/0.067/0.068/0.000 ms
    k8smaster016-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000ms
    non-zero return code
    k8smaster015-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 1999ms
    non-zero return code
    k8smaster014-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000ms
    non-zero return code
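
Since the cluster is deployed with kubespray (see the ansible output above), switching the flannel backend is a variable change. The variable names below come from kubespray's flannel role; treat the exact group_vars file path as an assumption, as it varies between kubespray versions:

```yaml
# e.g. inventory/<cluster>/group_vars/k8s-cluster/k8s-net-flannel.yml
kube_network_plugin: flannel
flannel_backend_type: "vxlan"   # was "host-gw"; Aliyun drops custom routes
```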

    10. Some apps dial their own VPN to reach other networks; they are stateful and cannot be migrated.

    11. Inside k8s, Java apps get 1 from Runtime.getRuntime().availableProcessors() (reflecting the container CPU limit), so a thread pool sized from it ends up with a single thread; under plain docker it used to return the host's core count.
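
A possible mitigation (an assumption, not verified in this setup): newer JDKs (10+, backported to 8u191) accept -XX:ActiveProcessorCount to pin the value availableProcessors() returns, e.g. via JAVA_OPTS:

```shell
# Pin the CPU count the JVM reports, so pools sized from
# availableProcessors() are not forced down to 1 by the cgroup limit.
# The count of 4 here is an example value.
JAVA_OPTS="$JAVA_OPTS -XX:ActiveProcessorCount=4"
```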

    12. Some services are not being migrated for now: they expose raw TCP (non-HTTP) ports, while the deployment system only generates HTTP istio config. To be handled later.

    13. pod STATUS CreateContainerConfigError: usually a referenced ConfigMap or Secret is missing; kubectl describe pod shows the exact cause under Events.

  • Problems running kubespray

    1. Problem:

    FAILED! => {"changed": false, "module_stderr": "sudo: sorry, you must have a tty to run sudo\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

    Fix: run visudo and comment out the Defaults requiretty line.

    2. Problem:

    FAILED! => {"changed": false, "msg": "Failed to reload sysctl: vm.max_map_count = 262144\nnet.ipv4.ip_forward = 1\nsysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory\nsysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
    Fix:

    modprobe br_netfilter
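
modprobe alone does not survive a reboot; to persist the module and the bridge sysctls, drop two small config files (paths assume RHEL/CentOS 7):

```
# /etc/modules-load.d/br_netfilter.conf
br_netfilter

# /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
```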

  • kubectl / kubectx / kubens: quickly switch clusters and namespaces

    Cluster and namespace info is configured in

     ~/.kube/config
    

    Show the current context

    kubectl config current-context
    

    Switch namespace (alias)

    alias kcd='kubectl config set-context $(kubectl config current-context) --namespace' 
    

    Switch cluster

    kubectl config use-context dev-admin@dev
    

    Use kubectx/kubens for quick switching

    https://github.com/ahmetb/kubectx

    brew install kubectx
    

    View the kubeconfig

    kubectl config view
    
  • k8s: stop a deployment or statefulset

    Scale the replicas down to 0

    kubectl scale --replicas=0 deployment contract-service -n dev   # this only scales the pods down to 0
    

    statefulset

    kubectl scale --replicas=0 statefulset kafka -n dev