Think before you speak, read before you think.

Troubleshooting high load on Linux


The machine's load average is high.

First, what the load average actually is. From the /proc/loadavg entry in the proc man page:

man proc
The first three fields in this file are load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes. They are the same as the load average numbers given by uptime(1) and other programs. The fourth field consists of two numbers separated by a slash (/). The first of these is the number of currently runnable kernel scheduling entities (processes, threads). The value after the slash is the number of kernel scheduling entities that currently exist on the system. The fifth field is the PID of the process that was most recently created on the system.

View the load average

cat /proc/loadavg 
3.33 4.76 5.55 2/505 8687
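
A raw load figure is easier to judge against the number of CPUs. The sketch below divides the 1-minute load average by the CPU count. It assumes a Linux /proc/loadavg and that `getconf _NPROCESSORS_ONLN` is available; `load_per_cpu` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# load_per_cpu [FILE] [NCPU]
# Hypothetical helper: print the 1-minute load average divided by the CPU count.
# FILE defaults to /proc/loadavg, NCPU to the number of online CPUs.
load_per_cpu() {
    file="${1:-/proc/loadavg}"
    ncpu="${2:-$(getconf _NPROCESSORS_ONLN)}"
    # Field 1 of the file is the 1-minute load average.
    awk -v n="$ncpu" '{ printf "%.2f\n", $1 / n }' "$file"
}

# Only run against the live system when the file exists (Linux).
if [ -r /proc/loadavg ]; then
    load_per_cpu
fi
```

With the sample line above on a 4-CPU machine this prints 0.83. Because Linux also counts tasks in state D, a value above 1 does not by itself prove the CPUs are saturated.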

Check the run queue

sar -q 1
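
Since the load average counts tasks in state R and state D, a quick snapshot of how many processes are in each state can show whether the load comes from CPU contention or from blocked I/O. This is a minimal sketch, assuming a `ps` that supports `-eo stat=`; `count_rd` is a hypothetical helper name.

```shell
#!/bin/sh
# count_rd: hypothetical helper that reads one process state per line on
# stdin (as printed by `ps -eo stat=`) and counts runnable (R) and
# uninterruptible-sleep (D) entries.
count_rd() {
    awk '{ s = substr($1, 1, 1) }
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "R=%d D=%d\n", r, d }'
}

# Snapshot of the live system, if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -eo stat= | count_rd
fi
```

A single snapshot is noisy, so it is worth running this a few times before drawing conclusions.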

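Next, look at the CPU time breakdown. If system time is not high, the kernel side is probably fine, so focus on user and nice time; if an I/O problem is suspected, look at iowait. If system time is high, that needs to be investigated and fixed.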
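
The breakdown described above can be approximated straight from /proc/stat. The sketch below prints user, nice, system and iowait as percentages of total CPU time. It assumes the usual Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal); `cpu_breakdown` is a hypothetical helper name. The counters are cumulative since boot, so this shows a long-run average rather than current activity.

```shell
#!/bin/sh
# cpu_breakdown [FILE]
# Hypothetical helper: summarize the aggregate "cpu" line of /proc/stat.
# Assumed columns: $2 user, $3 nice, $4 system, $5 idle, $6 iowait,
# $7 irq, $8 softirq, $9 steal (guest time is already included in user).
cpu_breakdown() {
    file="${1:-/proc/stat}"
    awk '$1 == "cpu" {
        total = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9
        printf "user=%.1f nice=%.1f system=%.1f iowait=%.1f\n",
               100 * $2 / total, 100 * $3 / total,
               100 * $4 / total, 100 * $6 / total
    }' "$file"
}

# Only run against the live system when the file exists (Linux).
if [ -r /proc/stat ]; then
    cpu_breakdown
fi
```

For per-interval figures, tools such as vmstat or mpstat are a better fit than this since-boot average.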

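strace is a tool that prints the system calls a process makes. With -c it collects statistics over a period of time; press Ctrl+C to stop it and print the summary.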

strace -c -p 6615

The printed summary lists the system calls by their share of CPU time.
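
When the summary is long, it can help to keep only the calls above some share of the time. The sketch below filters a saved `strace -c` summary. It assumes the usual column layout, with the percentage in the first column and the syscall name in the last; `top_syscalls` and the file name summary.txt are hypothetical.

```shell
#!/bin/sh
# top_syscalls [MIN_PERCENT]
# Hypothetical helper: read a `strace -c` summary on stdin and print
# "syscall percent" for every call at or above MIN_PERCENT (default 5).
# Header, separator and "total" lines are skipped.
top_syscalls() {
    min="${1:-5}"
    awk -v min="$min" '
        $1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" && $1 + 0 >= min + 0 {
            print $NF, $1
        }'
}

# Example usage (not run here, since it needs a live PID and ptrace rights):
#   strace -c -p 6615 -o summary.txt    # stop with Ctrl+C
#   top_syscalls 10 < summary.txt
```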

uptime
dmesg -T | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
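
Several of the commands above (mpstat, pidstat, iostat, sar) usually come from the sysstat package and may not be installed on a fresh machine. A small sketch for checking what is missing before an incident; `missing_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_tools TOOL...
# Hypothetical helper: print the name of every given command that is not
# found on PATH, one per line. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the commands from the checklist above.
missing_tools uptime dmesg vmstat mpstat pidstat iostat free sar top
```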

SREcon 2016 Performance Checklists for SREs from Brendan Gregg

http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html

http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

