Blog

  • Problems encountered when migrating to k8s

    Problems we ran into while moving from traditional containers to k8s:

    1. An SDK had to be upgraded (older versions crash the istio container)
    The error was: Caused by: java.io.IOException: Cannot bind to URL [rmi:///jmxrmi]: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is

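    2. Outbound HTTP requests from a client were rejected by envoy with 400 (bad request). The cause was a header line with an empty key and an empty value in the HTTP headers; once the client was fixed, the problem went away. The packet capture follows; note the line between Content-Type and Accept.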
    14:08:37.918970 IP 10.18.19.98.51604 > lb008-dev.http: Flags [P.], seq 1:489, ack 1, win 229, options [nop,nop,TS val 1596856343 ecr 1593089157], length 488: HTTP: POST /ws/rs/domain/domain/init HTTP/1.1
    E….J@.?…
    ..b
    ..7…P. .].*……+\…..
    _…^…POST /ws/rs/domain/domain/init HTTP/1.1
    Content-Type: application/json
    :
    Accept: application/json
    api-uuid: 02ac3ebe-f212-4ca8-998e-4a4ab576018c
    api-control-request-type: ANONYMOUS
    User-Agent: Apache CXF 3.1.4
    Cache-Control: no-cache
    Pragma: no-cache
    Host: uniauthserver-dev
    Connection: keep-alive
    Content-Length: 407
    Fix: remove the ":" line above, whose key and value are both empty
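
    A quick way to spot such a line in a dumped request is to scan the header section for entries whose name is empty. The sketch below assumes the raw request text arrives on stdin with the request line first (so without the tcpdump prefix bytes shown above); find_empty_header_names is a made-up helper name, not part of any tool.

    ```shell
    # Print "line-number: content" for every header line whose name is empty.
    # Input: a raw HTTP request on stdin, request line first.
    find_empty_header_names() {
        awk '
            { sub(/\r$/, "") }                # tolerate CRLF line endings
            NR == 1 { next }                  # skip the request line
            /^$/ { exit }                     # a blank line ends the header section
            /^[ \t]*:/ { print NR ": " $0 }   # header line with an empty name
        '
    }
    ```

    Usage would be something like find_empty_header_names < request.txt, where request.txt is a hypothetical file holding the captured request. The scan stops at the first blank line, so a ":" inside the body is not reported.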

    3. To use jaeger for distributed tracing, see https://istio.io/zh/docs/tasks/telemetry/distributed-tracing/overview/

    4. kiali shows unknown entries in the call graph: calls that do not go through the service mesh are displayed as unknown

    5. k8s node kernel version problem
    A kernel that is too old makes docker report: kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1
    This drives system CPU usage too high, and all docker containers hang.
    Observed kernel versions with this issue
    RHEL7 3.10.0-862
    4.15.0
    4.20.0
    Kernel versions claimed not triggering this issue
    RHEL7 3.10.0-957.10.1
    4.19.12
    4.17.0
    4.17.11
    Related kernel commits
    torvalds/linux@f186ce6 – since 4.12
    torvalds/linux@4ee806d – since 4.15
    torvalds/linux@ee60ad2 – since 5.1

    Another symptom: kubectl get pods --all-namespaces -o wide shows pods stuck in Terminating for a long time, and they cannot be deleted

    Fix: yum update (upgrades the kernel and the OS to the latest release, kernel 3.10.0-957.21.3.el7)
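
    To check a node against the RHEL7 version listed above as not triggering the issue, a version comparison along these lines can help. This is a minimal sketch: version_at_least is a hypothetical helper, it relies on the version sort (-V) of GNU sort, and the threshold only makes sense within the RHEL7 3.10.0 series, since the lists above show that newer mainline kernels are not automatically safe.

    ```shell
    # Return 0 if version $1 is greater than or equal to version $2.
    # Relies on the version sort (-V) provided by GNU sort.
    version_at_least() {
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }
    ```

    On a node this could be used as: version_at_least "$(uname -r)" 3.10.0-957.10.1 || echo "kernel may be affected"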

    6. A requested URL returns no healthy upstream (HTTP 503): check whether the deployment actually succeeded

    7. A requested URL returns 404 (although the application deployment succeeded): check that the virtual service and ingress gateway inside k8s are configured correctly

    8. Node.js programs fail to start because k8s injects too many environment variables (its service-discovery mechanism), which makes node's process.env too large.

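    So far frontend-main and market-solution-activity-web have hit this. We have not found a fix that avoids changing the program. The fix that does change the program is to read only the process.env entries it needs: https://zhuanlan.zhihu.com/p/74056339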

    [2019-07-30 16:54:13] PM2 error: Trace: { Error: spawn E2BIG
    at exports._errnoException (util.js:1024:11)
    at ChildProcess.spawn (internal/child_process.js:325:11)
    at exports.spawn (child_process.js:493:9)
    at exports.fork (child_process.js:99:10)
    at createWorkerProcess (internal/cluster/master.js:127:10)
    at EventEmitter.cluster.fork (internal/cluster/master.js:161:25)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:52:21)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10) code: 'E2BIG', errno: 'E2BIG', syscall: 'spawn' }
    at Object.God.logAndGenerateError (/opt/nodeapp/node_modules/pm2/lib/God/Methods.js:36:15)
    at Object.nodeApp (/opt/nodeapp/node_modules/pm2/lib/God/ClusterMode.js:54:11)
    at Object.executeApp (/opt/nodeapp/node_modules/pm2/lib/God.js:159:9)
    at inject (/opt/nodeapp/node_modules/pm2/lib/God.js:418:18)
    at Object.injectVariables (/opt/nodeapp/node_modules/pm2/lib/God.js:530:10)
    at /opt/nodeapp/node_modules/pm2/lib/God.js:416:9
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1135:9
    at replenish (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1011:17)
    at /opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1016:9
    at _asyncMap (/opt/nodeapp/node_modules/pm2/node_modules/async/dist/async.js:1133:5)
    [2019-07-30 16:54:13] PM2 error: spawn E2BIG
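
    One idea that might avoid touching the application, untested against the two apps above: start the process through a wrapper that drops the service-discovery variables first. The sketch below assumes the variables follow the Kubernetes naming scheme (NAME_SERVICE_HOST, NAME_SERVICE_PORT, NAME_PORT, NAME_PORT_80_TCP_ADDR and so on) and that the app reaches other services through DNS rather than through these variables. run_without_service_env is a hypothetical name, and whether this is enough for PM2 cluster mode is an open question.

    ```shell
    # Run a command with the Kubernetes service-discovery variables removed.
    # Only prefixes that have a matching <PREFIX>_SERVICE_HOST are touched,
    # so unrelated variables such as APP_PORT survive.
    # The function body is a subshell: the caller's environment is unchanged.
    run_without_service_env() (
        # Names of all exported variables (multi-line values may confuse this).
        names=$(env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p')
        for host_var in $(printf '%s\n' "$names" | grep '_SERVICE_HOST$'); do
            prefix=${host_var%_SERVICE_HOST}
            for name in $(printf '%s\n' "$names" |
                    grep -E "^${prefix}_(SERVICE_HOST|SERVICE_PORT(_.*)?|PORT(_[0-9].*)?)\$"); do
                unset "$name"
            done
        done
        exec "$@"
    )
    ```

    Running run_without_service_env env | wc -c inside a pod shows how many bytes of environment remain; the real start command would go in place of env.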

    9. flannel with host-gw: Alibaba Cloud does not support self-defined routes, so the routes would have to be added by hand; we switched to vxlan

    [[email protected] kubespray]# ansible all -i inventory/k8s_prod_aliyun-cn-shanghai-b_006/inventory.ini -m shell -a "ping -c 3 10.36.3.4"
    [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
    k8snode034-prod.aliyun-cn-shanghai-b | CHANGED | rc=0 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    64 bytes from 10.36.3.4: icmp_seq=1 ttl=64 time=0.066 ms
    64 bytes from 10.36.3.4: icmp_seq=2 ttl=64 time=0.068 ms
    64 bytes from 10.36.3.4: icmp_seq=3 ttl=64 time=0.067 ms
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.066/0.067/0.068/0.000 ms
    k8smaster016-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000msnon-zero return code
    k8smaster015-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 1999msnon-zero return code
    k8smaster014-prod.aliyun-cn-shanghai-b | FAILED | rc=1 >>
    PING 10.36.3.4 (10.36.3.4) 56(84) bytes of data.
    --- 10.36.3.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2000msnon-zero return code

    10. Some applications have to dial their own VPN to reach other networks; they are stateful and cannot be migrated

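    11. Inside k8s, a Java application gets 1 from Runtime.getRuntime().availableProcessors(), so a thread pool sized from that value ends up with size 1; under the earlier docker setup the call returned the host's core count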
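
    A likely explanation, which is an assumption and not something verified here: container-aware JVMs (8u191 and later, 10 and later) derive availableProcessors() from the container's cgroup CPU settings instead of the host, so a pod limited to one CPU or less reports 1. Those JVMs also accept -XX:ActiveProcessorCount=N to override the detected value. The sketch below only reproduces the rounding for a CFS quota; the JVM's real logic has more cases, and cpus_from_cfs_quota is a hypothetical helper.

    ```shell
    # Effective CPU count for a CFS quota and period (both in microseconds),
    # rounded up to a whole CPU. A quota of -1 (or 0) means no limit is set:
    # nothing is printed and the function returns 1.
    cpus_from_cfs_quota() {
        [ "$1" -gt 0 ] || return 1
        echo $(( ($1 + $2 - 1) / $2 ))
    }
    ```

    On a cgroup v1 node the inputs would come from /sys/fs/cgroup/cpu/cpu.cfs_quota_us and /sys/fs/cgroup/cpu/cpu.cfs_period_us inside the container; a quota of 50000 with a period of 100000 (half a CPU) rounds up to 1.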

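    12. Some services are not being migrated to k8s for now: they expose TCP (non-HTTP) ports, while the istio configuration generated by the release system is HTTP only. To be revisited later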

    13. pod STATUS CreateContainerConfigError

  • Things you once longed for come easily later

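    When you need something, it looms close, magnified until it seems to be the only thing in the world
    But as time passes and society moves on, with a bit of luck added, you get it easily. So be patient, and don't take things so seriously.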

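    SSL certificates (I used to spend a long time hunting for free ones; now a cloud provider issues one with a single click)
    Spending several years bidding for a Shanghai license plate (actually worth considering even without the new-energy option)
    Hukou (household registration)
    Mobile number portability
    All kinds of mobile data plans (years ago, 5 yuan bought 30 MB of data)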

  • Mounting Alibaba Cloud storage at a local path

    Grant the user access in OSS

    /usr/local/bin/ossfs data-backup /oss -ourl=http://oss-cn-shanghai.aliyuncs.com -o allow_other

  • zsh .zprofile

    Starting with macOS Catalina, zsh is the default shell

    https://support.apple.com/en-us/HT208050

    .zprofile is the counterpart of .bash_profile
    .zshrc is the counterpart of .bashrc


    fix error
    zsh compinit: insecure directories, run compaudit for list.
    Ignore insecure directories and continue [y] or abort compinit [n]? ccompinit: initialization aborted
    complete:13: command not found: compdef
    complete:13: command not found: compdef
    complete:13: command not found: compdef
    Check: $ compaudit
    There are insecure directories:
    /usr/local/share/zsh/site-functions
    /usr/local/share/zsh

    chown -R "$(whoami)" /usr/local/share/zsh/site-functions /usr/local/share/zsh
    chmod 755 /usr/local/share/zsh/site-functions /usr/local/share/zsh
    


    2023-03-16 fix error

    Last login: Thu Mar 16 14:35:14 on ttys010
    [9]    47765 illegal hardware instruction  sed --version 2>&1 | 
           47766 exit 1                        grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox} -q GNU
    

    fixed by updating file `oh-my-zsh.sh`

    omz update
    

    preferred zsh themes

    ZSH_THEME="random"                                                                         
    # Set list of themes to pick from when loading at random                        
    # Setting this variable when ZSH_THEME=random will cause zsh to load            
    # a theme from this variable instead of looking in ~/.oh-my-zsh/themes/         
    # If set to an empty array, this variable will have no effect.                  
    ZSH_THEME_RANDOM_CANDIDATES=(rkj-repos cloud)
    
  • 2020 reading list

    Goal: 30 books. Single-threaded mode: remember, no concurrency, don't read several books at once

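    Date | Technical books | Non-technical books
    2020-02-02 | 云原生服务网格 Istio (Cloud Native Service Mesh Istio) |
    2020-03-02 | | 寂静的春天 (Silent Spring)
    2020-05-11 | | 你当像鸟飞往你的山 (Educated)
    2020-05-25 | | 无缘社会 (literally: The Society Without Ties)
    2020-06-10 | | 呐喊 (Call to Arms)
    2020-06-20 | | 坏小孩 (The Bad Kids)
    2020-07-01 | | 无证之罪 (literally: Crime Without Evidence)
    2020-09-01 | | 当呼吸化为空气 (When Breath Becomes Air)
    2020-10-15 | | 褚时健传 (a biography of Chu Shijian)
    2020-12-20 | | 软技能 (Soft Skills)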

     

  • ansible: skip the host key check

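    During initial setup I need to log in to every host once with a password, in one batch, to run the initialization, and I don't want host keys checked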

    ANSIBLE_HOST_KEY_CHECKING=False ansible all -i my_inventory -m ping -k