This article stands on the shoulders of giants: it is a hands-on record meant to help unfamiliar readers get started quickly. Both the idea and the implementation come from boomer; salute to its author!
Prerequisites
- Some familiarity with Prometheus (metric storage and scraping), prometheus_client (the agent that exposes metrics), Grafana (the visualization frontend), and Locust (the load-generation framework);
- A server running Prometheus and Grafana, and Python Locust installed in your test environment;
- The Prometheus data source added in Grafana;
Locust configuration
Copy the prometheus_exporter.py file from the root of the boomer repository and make the following changes (not strictly required; they reduce configuration effort later):
- In the locust_init function, change @environment.web_ui.app.route("/export/prometheus") to the default scrape path /metrics;
- In standalone mode, comment out the original class Dummy, then write your own test cases as usual;
The modified file, prometheus_locust.py:
```python
# coding: utf8
import six
from itertools import chain

from flask import request, Response
from locust import stats as locust_stats, runners as locust_runners
from locust import HttpUser, User, task, events
from prometheus_client import Metric, REGISTRY, exposition

# This locustfile adds an external web endpoint to the locust master, and makes it serve as a prometheus exporter.
# Run it as a normal locustfile, then point prometheus to it.
# locust -f prometheus_exporter.py --master
# Lots of code taken from mbolek's locust_exporter (https://github.com/mbolek/locust_exporter), thx mbolek!


class LocustCollector(object):
    registry = REGISTRY

    def __init__(self, environment, runner):
        self.environment = environment
        self.runner = runner

    def collect(self):
        # collect metrics only when locust runner is spawning or running.
        runner = self.runner

        if runner and runner.state in (locust_runners.STATE_SPAWNING, locust_runners.STATE_RUNNING):
            stats = []
            for s in chain(locust_stats.sort_stats(runner.stats.entries), [runner.stats.total]):
                stats.append({
                    "method": s.method,
                    "name": s.name,
                    "num_requests": s.num_requests,
                    "num_failures": s.num_failures,
                    "avg_response_time": s.avg_response_time,
                    "min_response_time": s.min_response_time or 0,
                    "max_response_time": s.max_response_time,
                    "current_rps": s.current_rps,
                    "median_response_time": s.median_response_time,
                    "ninetieth_response_time": s.get_response_time_percentile(0.9),
                    # only total stats can use current_response_time, so sad.
                    # "current_response_time_percentile_95": s.get_current_response_time_percentile(0.95),
                    "avg_content_length": s.avg_content_length,
                    "current_fail_per_sec": s.current_fail_per_sec
                })

            # perhaps StatsError.parse_error in e.to_dict only works in python slave, take notice!
            errors = [e.to_dict() for e in six.itervalues(runner.stats.errors)]

            metric = Metric('locust_user_count', 'Swarmed users', 'gauge')
            metric.add_sample('locust_user_count', value=runner.user_count, labels={})
            yield metric

            metric = Metric('locust_errors', 'Locust requests errors', 'gauge')
            for err in errors:
                metric.add_sample('locust_errors', value=err['occurrences'],
                                  labels={'path': err['name'], 'method': err['method'],
                                          'error': err['error']})
            yield metric

            is_distributed = isinstance(runner, locust_runners.MasterRunner)
            if is_distributed:
                metric = Metric('locust_slave_count', 'Locust number of slaves', 'gauge')
                metric.add_sample('locust_slave_count', value=len(runner.clients.values()), labels={})
                yield metric

            metric = Metric('locust_fail_ratio', 'Locust failure ratio', 'gauge')
            metric.add_sample('locust_fail_ratio', value=runner.stats.total.fail_ratio, labels={})
            yield metric

            metric = Metric('locust_state', 'State of the locust swarm', 'gauge')
            metric.add_sample('locust_state', value=1, labels={'state': runner.state})
            yield metric

            stats_metrics = ['avg_content_length', 'avg_response_time', 'current_rps', 'current_fail_per_sec',
                             'max_response_time', 'ninetieth_response_time', 'median_response_time',
                             'min_response_time', 'num_failures', 'num_requests']

            for mtr in stats_metrics:
                mtype = 'gauge'
                if mtr in ['num_requests', 'num_failures']:
                    mtype = 'counter'
                metric = Metric('locust_stats_' + mtr, 'Locust stats ' + mtr, mtype)
                for stat in stats:
                    # Aggregated stat's method label is None, so name it as Aggregated
                    # locust has changed name Total to Aggregated since 0.12.1
                    if 'Aggregated' != stat['name']:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': stat['method']})
                    else:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': 'Aggregated'})
                yield metric


@events.init.add_listener
def locust_init(environment, runner, **kwargs):
    print("locust init event received")
    if environment.web_ui and runner:
        # Changed the route to /metrics, the default scrape path of Prometheus.
        @environment.web_ui.app.route("/metrics")
        def prometheus_exporter():
            registry = REGISTRY
            encoder, content_type = exposition.choose_encoder(request.headers.get('Accept'))
            if 'name[]' in request.args:
                registry = REGISTRY.restricted_registry(request.args.get('name[]'))
            body = encoder(registry)
            return Response(body, content_type=content_type)
        REGISTRY.register(LocustCollector(environment, runner))


# class Dummy(User):
#     @task(20)
#     def hello(self):
#         pass

# Without boomer, in standalone mode you can write your own test cases as usual from here.
class HelloWorldUser(HttpUser):
    @task
    def hello_world(self):
        self.client.get("/greet")
```
Run:

```shell
python3 -m locust -f prometheus_locust.py
```

Then start a test from the web UI as usual. While the test runs, the metrics are already visible at the new path; with that, the Locust side is done:
```
$ curl localhost:8089/metrics
...
locust_stats_current_rps{method="GET",path="/greet"} 1371.3
locust_stats_current_rps{method="Aggregated",path="Aggregated"} 1371.3
# HELP locust_stats_current_fail_per_sec Locust stats current_fail_per_sec
# TYPE locust_stats_current_fail_per_sec gauge
locust_stats_current_fail_per_sec{method="GET",path="/greet"} 0.0
locust_stats_current_fail_per_sec{method="Aggregated",path="Aggregated"} 0.0
# HELP locust_stats_max_response_time Locust stats max_response_time
# TYPE locust_stats_max_response_time gauge
locust_stats_max_response_time{method="GET",path="/greet"} 30.07530327886343
locust_stats_max_response_time{method="Aggregated",path="Aggregated"} 30.07530327886343
# HELP locust_stats_ninetieth_response_time Locust stats ninetieth_response_time
# TYPE locust_stats_ninetieth_response_time gauge
locust_stats_ninetieth_response_time{method="GET",path="/greet"} 8.0
locust_stats_ninetieth_response_time{method="Aggregated",path="Aggregated"} 8.0
# HELP locust_stats_median_response_time Locust stats median_response_time
# TYPE locust_stats_median_response_time gauge
locust_stats_median_response_time{method="GET",path="/greet"} 6.0
locust_stats_median_response_time{method="Aggregated",path="Aggregated"} 6.0
# HELP locust_stats_min_response_time Locust stats min_response_time
# TYPE locust_stats_min_response_time gauge
locust_stats_min_response_time{method="GET",path="/greet"} 1.2684203684329987
locust_stats_min_response_time{method="Aggregated",path="Aggregated"} 1.2684203684329987
# HELP locust_stats_num_failures_total Locust stats num_failures
# TYPE locust_stats_num_failures_total counter
locust_stats_num_failures{method="GET",path="/greet"} 0.0
locust_stats_num_failures{method="Aggregated",path="Aggregated"} 0.0
# HELP locust_stats_num_requests_total Locust stats num_requests
# TYPE locust_stats_num_requests_total counter
locust_stats_num_requests{method="GET",path="/greet"} 49992.0
locust_stats_num_requests{method="Aggregated",path="Aggregated"} 49992.0
```
Note that if no Locust task is running, the Locust metrics may not be available.
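For quick scripting against this endpoint, the exposition format is plain text and easy to parse with the standard library alone. Below is a minimal sketch (not the official parser; real tooling should use prometheus_client's parsing utilities); the sample lines are copied from the curl output above, and in a live setup you would fetch them from the /metrics URL, e.g. with urllib:

```python
import re

# One sample per line: name{label="value",...} numeric_value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([\d.eE+-]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        samples.append((name, labels, float(value)))
    return samples

# Sample copied from the curl output above.
metrics_text = '''\
# HELP locust_stats_current_rps Locust stats current_rps
# TYPE locust_stats_current_rps gauge
locust_stats_current_rps{method="GET",path="/greet"} 1371.3
locust_stats_current_rps{method="Aggregated",path="Aggregated"} 1371.3
'''

for name, labels, value in parse_samples(metrics_text):
    print(name, labels, value)
```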
Prometheus configuration
Add a Locust job to the scrape_configs section of the prometheus.yml file, for example:
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "locust"
    static_configs:
      - targets: ["host:8089"]
```
Be sure to replace host:8089 with the actual address of the Locust web UI.
Then restart the Prometheus process and open the targets page of the Prometheus web UI, e.g. http://localhost:9090/targets, to check whether the new locust job is being scraped correctly.
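Once the target is up, a couple of quick queries in the Prometheus Graph tab also confirm data is flowing. These are just illustrative queries against the metric names exported above:

```promql
# Current aggregated RPS as reported by Locust
locust_stats_current_rps{job="locust", path="Aggregated"}

# Number of simulated users
locust_user_count
```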
Grafana configuration
After setting up Grafana, add the Prometheus data source described above, then simply import the community dashboard template; its ID is 12081.
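If you manage Grafana declaratively, the data source can also be provisioned from a file instead of the UI. A sketch using Grafana's data source provisioning format (the file path and names here are only examples):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```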
The final result:

(screenshot: Grafana dashboard rendering the Locust metrics)