[TOC]
roadmap
- pprof 发展过程:pprof --> http/pprof --> uber go-torch --> google/pprof 整合go-torch,并提供web界面分析
- pprof web ui 已经集成 top, svg, list, disassemble, regex search, go-torch。最方便使用
使用指南
步骤:
- 程序开启pprof,收集数据
- 拉取go程序profile数据(如果是http/pprof方式)
- 分析go程序性能
程序开启pprof的几种常见方式
- 最常用的使用方式是http/pprof,参考http/pprof/
,大概步骤如下:-
import _ "net/http/pprof"
,原理:用http内置的DefaultServeMux,注册了http://host:port/debug/pprof/xxx
路由 -
http.ListenAndServe("localhost:6060", nil)
上个步骤提供的路由,开启端口提供访问
-
- 结合go test 开启pprof,参考:runtime/pprof
。命令:go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
- 程序内代码实现指定开启profile并输出,如下
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
var memprofile = flag.String("memprofile", "", "write memory profile to `file`")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal("could not create CPU profile: ", err)
}
defer f.Close() // error handling omitted for example
if err := pprof.StartCPUProfile(f); err != nil {
log.Fatal("could not start CPU profile: ", err)
}
defer pprof.StopCPUProfile()
}
// ... rest of the program ...
if *memprofile != "" {
f, err := os.Create(*memprofile)
if err != nil {
log.Fatal("could not create memory profile: ", err)
}
defer f.Close() // error handling omitted for example
runtime.GC() // get up-to-date statistics
if err := pprof.WriteHeapProfile(f); err != nil {
log.Fatal("could not write memory profile: ", err)
}
}
}
拉取go程序profile数据
参考:https://golang.org/pkg/net/http/pprof/
如果是使用go test与程序内输出指定的profile文件,则忽略此步骤。
如果是通过http/pprof方式开启的profile,则需要通过http方式拉取profile数据。
拉取命令执行后会阻塞采集数据,采集结束后会生成profile文件。
拉取命令:
- 获取cpuprofile:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
- 获取memprofile:
go tool pprof http://localhost:6060/debug/pprof/heap
- 查看可获取的profile类型,浏览器打开:
http://localhost:6060/debug/pprof
分析go程序性能
几种方式:
- web分析(推荐方式):简单,直观,有🔥图
- pprof 命令行交互工具分析
web 分析
执行命令:pprof -http=:8080 pprof.xxx.samples.cpu.001.pb.gz
调用图:
火焰图:
命令行交互工具分析
example如下:
pprof [binary] pprof.out
(pprof) top10
Total: 2525 samples
flat flat% sum% cum cum%
298 11.8% 11.8% 345 13.7% runtime.mapaccess1_fast64
268 10.6% 22.4% 2124 84.1% main.FindLoops
251 9.9% 32.4% 451 17.9% scanblock
178 7.0% 39.4% 351 13.9% hash_insert
131 5.2% 44.6% 158 6.3% sweepspan
119 4.7% 49.3% 350 13.9% main.DFS
96 3.8% 53.1% 98 3.9% flushptrbuf
95 3.8% 56.9% 95 3.8% runtime.aeshash64
95 3.8% 60.6% 101 4.0% runtime.settype_flush
88 3.5% 64.1% 988 39.1% runtime.mallocgc
- 采样策略:停止所有goroutine,采集数据。默认只收集100个frames层级的调用栈信息
- 第一列:采集采样点时落在函数中的个数。
- 第二列:比例= 第一列 / total samples. eg 298/2525 = 11.8%
- 第三列:第二列的累积和,用于直观看出占比的百分比。eg: 11.8% + 10.6% = 22.4%
- 第四列:收集的采样点中该函数出现的个数(出现的意思就是,可能刚好落在这个函数的执行部分,或者落在该函数内调用的子函数里)
- The fourth and fifth columns show the number of samples in which the function appeared (either running or waiting for a called function to return)
- Each location contains two values: flat is the value of the location itself, while cum is the value of the location plus all its descendants.
- 第五列:第四列占总samples的纠比例,345/2525 = 13.7%
list & weblist
(pprof) list DFS
Total: 2525 samples
ROUTINE ====================== main.DFS in /home/rsc/g/benchgraffiti/havlak/havlak1.go
119 697 Total samples (flat / cumulative)
3 3 240: func DFS(currentNode *BasicBlock, nodes []*UnionFindNode, number map[*BasicBlock]int, last []int, current int) int {
1 1 241: nodes[current].Init(currentNode, current)
1 37 242: number[currentNode] = current
. . 243:
1 1 244: lastid := current
89 89 245: for _, target := range currentNode.OutEdges {
9 152 246: if number[target] == unvisited {
7 354 247: lastid = DFS(target, nodes, number, last, lastid+1)
. . 248: }
. . 249: }
7 59 250: last[number[currentNode]] = lastid
1 1 251: return lastid
- 第一列:采样点落在该行的个数
- 第二列:采样点落在该行的个数+ 采样点落在被该行代码调用的函数中的个数(值越大,开销越大,一般只关注这列)。以247行为例:
- 7:采样点落在该行9个
- 354:7 + 347(采样点落在DF函数内247个)
- 第三列:代码行数
相关命令
go tool pprof binary profile_file(cpu/memory/...)
topN
-
top5 -cum
: -cum (for cumulative) -
go tool pprof --inuse_objects
显示内存申请的次数-inuse_space Display in-use memory size
-inuse_objects Display in-use object counts
-alloc_space Display allocated memory size
-alloc_objects Display allocated object counts
- web [func]
- list func 命令行查看每行代码的详细分析
- weblist func web查看每行代码的详细分析
- web 命令,查看svg
-
go tool pprof --nodefraction=0.1 havlak4 havlak4.prof
, 过虑低的数据, 使用--nodefraction
, 查看超过10% - 生成pdf:
go tool pprof -pdf bench.test cpu.out > cpu.pdf
- 对比工具:
benchcmp bench0 bench1
. (install: go get golang.org/x/tools/cmd/benchcmp)
demo
与benchmark 的结合使用
- 写测试用例
- benchmark 测试,指定生成相关的profile。 eg:
-
go test -v -run="none" -bench=. -benchtime="30s" -cpuprofile=cpu.pprof -memprofile=mem.pprof
, 会生成cpu.pprof, mem.pprof 与xxx.test
- 分析pprof文件:
go tool pprof xxx.test(binary) mem.pprof(pprof source file)
http/pprof的使用
import _ "net/http/pprof"
- http pprof 默认使用的是
http.DefaultMux
, 设置监听端口
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
- 拉取pprof分析:
cpu: go tool pprof http://localhost:6060/debug/pprof/profile?second=60
heap: go tool pprof http://localhost:6060/debug/pprof/heap?second=60
...
- 使用相关命令分析pprof
pprof web ui
- 拉取profile:
go tool pprof http://localhost:6060/debug/pprof/profile?second=60
- 启动pprof:
pprof -http 0.0.0.0:7070 binaryfile xxx.pb.gz(profile_file)
- 访问web界面,查看相关指标:top, svg, graph, flame graph, list, disassemble
或者
- 拉取并启动服务:
pprof -seconds 60 -http me:6060 http://ts1:32144/debug/pprof/profile
参考
- https://golang.org/doc/diagnostics.html#profiling
- https://blog.golang.org/profiling-go-programs
- http/pprof: https://golang.org/pkg/net/http/pprof/
- pprof: https://github.com/google/pprof
- https://medium.com/@hackintoshrao/daily-code-optimization-using-benchmarks-and-profiling-in-golang-gophercon-india-2016-talk-874c8b4dc3c5
- https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
- https://rakyll.org/archive/