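Web performance monitoring and data collection

You may have been asked: how well does your web application perform? What would you answer? Does it outperform the huge number of web applications on the market? This article walks through how to monitor web performance: the metrics to monitor, the categories of monitoring, the performance API, and how to do the monitoring.

That said, web performance monitoring is a large topic. This article focuses on only part of it, and some sections are not comprehensive.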

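Preface: why monitor?

Web performance affects user retention to some degree. Research from Google DoubleClick found that 53% of mobile site visits are abandoned when a page takes longer than 3 seconds to load. The BBC found that it loses an additional 10% of users for every additional second a page takes to load.

We want monitoring to tell us the current state and trend of the web application's performance and where its bottlenecks are. How did performance look after a given release? Did the release affect performance? How often does the business logic fail? How stable is the service?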

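What to monitor?

First, what should we monitor, and which specific metrics matter?

Google developers proposed the RAIL model for measuring application performance: Response, Animation, Idle, and Load, which represent four distinct aspects of a web application's life cycle. It sets these targets: respond to user input within 100 ms; produce each frame of an animation or scroll within 10 ms; maximize idle time; and load the page in no more than 5 seconds.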

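We can recast this as three aspects: response speed, page stability, and external service calls.

Response speed: initial page load speed + interaction response speed
Page stability: page error rate
External service calls: network request speed

1. Page load speed: white screen time, first screen time, time to interactive

Let's look at several performance metrics that Google developers proposed around user experience.

Each of these metrics is distilled from a particular aspect of user experience.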

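1) first paint (FP) and first contentful paint (FCP)

The first render, and the first render that contains content.

Browsers have standardized both metrics, and they can be read from performance through the Paint Timing API. The two times are usually the same, but they can differ in some cases.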

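2) First meaningful paint and hero element timing

The first meaningful render, and the page's key (hero) elements.

We assume that the moment a page's DOM structure changes most dramatically is the moment its main content appears; that point in time is the first meaningful paint. Browsers have not standardized this metric, since it is hard to agree on a single definition of a site's main content.

First meaningful paint as defined by Google Lighthouse: https://docs.google.com/document/d/1BR94tJdZLsin5poeet0XoTW60M0SjvOJQttKT-JK8HI/view

3) Time to interactive

The time at which the page becomes interactive.

4) Long tasks

JavaScript runs on the browser's main thread, so too many long tasks (tasks that occupy the main thread for 50 ms or more) inevitably delay responses to user input. A good application maximizes idle time so that it can respond to user input as quickly as possible.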

2. Page stability: page errors

Resource loading errors
JS execution errors
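
A minimal sketch of how both kinds of error might be captured in the browser. It assumes a capture-phase `error` listener on `window`; `classifyError` and `report` are hypothetical names for this illustration, not part of any library.

```javascript
// Classify an "error" event. Resource load failures fire on the element
// (img, script, link) and do not bubble, so they are only visible in the
// capture phase. Uncaught JS errors arrive with a message and a location.
function classifyError(event) {
  const target = event.target;
  const isElement = Boolean(target) && typeof target.tagName === 'string';
  if (isElement) {
    return {
      kind: 'resource',
      tag: target.tagName.toLowerCase(),
      url: target.src || target.href || '',
    };
  }
  return {
    kind: 'script',
    message: event.message || '',
    source: event.filename || '',
    line: event.lineno || 0,
  };
}

// Hypothetical reporter: replace with a real upload in production.
function report(record) {
  console.log('[monitor]', JSON.stringify(record));
}

// Register listeners only when running in a browser.
if (typeof window !== 'undefined') {
  // The third argument `true` selects the capture phase.
  window.addEventListener('error', (e) => report(classifyError(e)), true);
  // Unhandled promise rejections are delivered through a separate event.
  window.addEventListener('unhandledrejection', (e) =>
    report({ kind: 'promise', message: String(e.reason) }));
}
```

The error rate can then be estimated on the server side as the number of reported errors divided by the number of page views.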

3. External service calls

CGI (back-end API) request time
CGI success rate
CDN resource load time
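
As an illustration, request time and success rate could be derived from per-request records gathered on the client. This is only a sketch: the record shape `{ url, ok, duration }` and the helper names `timedRequest` and `summarizeRequests` are assumptions made for this example.

```javascript
// Wrap a fetch-like function so that each call yields one record.
// `fetchFn` is any function returning a promise of an object with `ok`.
async function timedRequest(fetchFn, url, now = () => performance.now()) {
  const start = now();
  try {
    const res = await fetchFn(url);
    return { url, ok: Boolean(res.ok), duration: now() - start };
  } catch {
    // Network failures count as unsuccessful requests.
    return { url, ok: false, duration: now() - start };
  }
}

// Aggregate records into a success rate and an average duration (ms).
function summarizeRequests(records) {
  const total = records.length;
  if (total === 0) {
    return { total: 0, successRate: null, avgDuration: null };
  }
  const succeeded = records.filter((r) => r.ok).length;
  const durationSum = records.reduce((sum, r) => sum + r.duration, 0);
  return {
    total,
    successRate: succeeded / total,
    avgDuration: durationSum / total,
  };
}
```

In a browser, `timedRequest(fetch, '/api/list')` would produce one record per call; the path is a placeholder.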

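Categories of monitoring

Web performance monitoring falls into two categories: synthetic monitoring (Synthetic Monitoring, SYN) and real user monitoring (Real User Monitoring, RUM).

Synthetic monitoring

Synthetic monitoring loads pages in a simulated web browser, collects performance metrics by simulating the actions an end user might take, and outputs a site performance report. Examples include Lighthouse, PageSpeed, WebPageTest, Pingdom, and PhantomJS.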

1. Lighthouse

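Lighthouse is an open-source automation tool from Google. It can be run in two ways: as a Chrome extension or as a command-line tool. The Chrome extension offers a friendlier interface for reading reports, while the command-line tool allows Lighthouse to be integrated into a continuous integration system.

It reports performance metrics such as white screen time, first screen time, and time to interactive, along with SEO, PWA, and other audits.

[Figure: Lighthouse speed test results for the Tencent Docs mobile home page]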

2. PageSpeed

https://developers.google.com/speed/pagespeed/insights/

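It shows the main performance metrics and also gives some performance optimization suggestions.

[Figure: PageSpeed speed test results and optimization suggestions for the Tencent Docs mobile home page]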

3. WebPageTest

It reports speed test results and a waterfall chart of resource loading.

4. Pingdom

https://www.pingdom.com/

Note: Pingdom offers real user monitoring as well as synthetic monitoring.

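Pros and cons of synthetic monitoring:

Pros:

Non-intrusive.
Simple and quick.

Cons:

It is simulated traffic, not real user visits.
Login state is hard to account for, so pages that require login are difficult to monitor.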

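Real user monitoring

Real user monitoring is a passive monitoring technique delivered as an application service. The monitored web application integrates the service through an SDK or similar means; performance data from real user visits and interactions is collected and reported, then cleaned and processed into performance analysis reports. Examples include FrontJs, oneapm, and Datadog.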
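
To make the collect-and-report step concrete, here is a rough sketch of what the reporting side of such an SDK might look like. The endpoint `REPORT_URL`, the payload fields, and the 10% sampling rate are all assumptions for illustration; real services define their own formats.

```javascript
// Hypothetical collection endpoint; replace with your own service.
const REPORT_URL = 'https://monitor.example.com/collect';

// Build a compact JSON payload from collected metrics.
function buildPayload(metrics, context) {
  return JSON.stringify({
    page: context.page,
    ua: context.userAgent,
    ts: context.timestamp,
    metrics,
  });
}

// Report only a fraction of page views to limit overhead.
function shouldSample(rate, random = Math.random) {
  return random() < rate;
}

// sendBeacon queues the request so that it survives page unload.
function send(payload) {
  if (typeof navigator !== 'undefined' && typeof navigator.sendBeacon === 'function') {
    return navigator.sendBeacon(REPORT_URL, payload);
  }
  return false; // no transport available, for example outside a browser
}

// Browser-only wiring: report once when the page is being hidden.
if (typeof window !== 'undefined' && shouldSample(0.1)) {
  window.addEventListener('pagehide', () => {
    const [nav] = performance.getEntriesByType('navigation');
    const metrics = nav
      ? { domReady: nav.domContentLoadedEventEnd, load: nav.loadEventEnd }
      : {};
    send(buildPayload(metrics, {
      page: location.pathname,
      userAgent: navigator.userAgent,
      timestamp: Date.now(),
    }));
  });
}
```

Sampling and a fire-and-forget transport are two common ways to keep the monitoring script's own cost low.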

1. oneapm

https://www.oneapm.com/bi/feature.html

Features include: dashboard overview, feature statistics, slow-load tracing, page views, script errors, AJAX, combined analysis, reports, and alerts.

2. Datadog

https://www.datadoghq.com/rum/

3. FrontJs

https://www.frontjs.com/

Features include: access performance, exception monitoring, reports, trends, and more.

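Pros and cons of this approach:

Pros:

It reflects real user visits.
Historical performance trends can be observed.
It offers extra features such as report delivery and monitoring alerts.

Cons:

It is intrusive and affects web performance to some extent.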

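performance analysis

Before looking at how to monitor, let's look at the performance API that browsers provide, which is the main source of performance monitoring data.

performance provides high-resolution timestamps, measured in milliseconds with sub-millisecond precision (browsers may coarsen them for security reasons), and they are not affected by changes to the system clock.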
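
A small example of timing a function with this clock. The helper name `timeIt` is made up for this sketch.

```javascript
// Time a synchronous function with the monotonic high-resolution clock.
// performance.now() returns milliseconds (with a fractional part) elapsed
// since the time origin, independent of system clock adjustments.
function timeIt(fn) {
  const start = performance.now();
  const result = fn();
  const duration = performance.now() - start;
  return { result, duration };
}

const { result, duration } = timeIt(() => {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
});
console.log(`sum=${result}, took ${duration.toFixed(3)} ms`);
```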

Current support: all mainstream browsers support it, so it can be used with confidence.

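Basic properties

performance.navigation: whether the page was loaded by navigation or by reload, and how many redirects occurred

performance.timing: timestamps for each stage of page load

performance.memory: basic memory usage, a non-standard extension added by Chrome

performance.timeOrigin: a high-resolution timestamp of the moment the performance measurement started

Note that performance.navigation and performance.timing are deprecated in favor of the Navigation Timing Level 2 entry returned by performance.getEntriesByType("navigation").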

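Basic methods

performance.getEntries()

This method returns all performance entry objects; getEntriesByName and getEntriesByType filter them and return only the entries with a given name or type.

Combining mark and measure lets you place timing marks and, for example, measure how long a function takes to run.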

performance.getEntriesByName()
performance.getEntriesByType()
performance.mark()
performance.clearMarks()
performance.measure()
performance.clearMeasures()
performance.now() ...
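
The following sketch combines several of these methods to time a block of work. The helper `measureWork` and the mark names are arbitrary labels chosen for this example.

```javascript
// Measure a block of work with mark() and measure().
function measureWork(label, work) {
  const startMark = `${label}:start`;
  const endMark = `${label}:end`;
  performance.mark(startMark);
  work();
  performance.mark(endMark);
  // Create a measure spanning the two marks.
  performance.measure(label, startMark, endMark);

  // Read back the most recent measure recorded under this name.
  const entries = performance.getEntriesByName(label, 'measure');
  const duration = entries[entries.length - 1].duration;

  // Clean up so that entries do not accumulate in the buffer.
  performance.clearMarks(startMark);
  performance.clearMarks(endMark);
  performance.clearMeasures(label);
  return duration;
}

const elapsed = measureWork('demo-loop', () => {
  for (let i = 0; i < 1e5; i++) { /* simulated work */ }
});
console.log(`demo-loop took ${elapsed.toFixed(3)} ms`);
```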

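APIs provided

performance also provides several APIs, and some of them overlap.

1. PerformanceObserver API

Used to observe performance events; this API follows the observer pattern. Typical uses include collecting resource information, monitoring TTI, and monitoring long tasks.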
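
A sketch of how PerformanceObserver might be used for resource entries and long tasks. The helper names `observe`, `toResourceRecord`, and `summarizeLongTasks` are invented for this example. TTI is left out because it is not exposed as a standard entry type, and `longtask` is not available in every browser, hence the feature check.

```javascript
// Start observing one entry type if the runtime supports it.
// Returns the observer, or null when the type is unavailable.
function observe(type, onEntries) {
  if (typeof PerformanceObserver === 'undefined') return null;
  const supported = PerformanceObserver.supportedEntryTypes || [];
  if (!supported.includes(type)) return null;
  const observer = new PerformanceObserver((list) => onEntries(list.getEntries()));
  // buffered: true also delivers entries recorded before observe() was called.
  observer.observe({ type, buffered: true });
  return observer;
}

// Reduce a resource entry to the fields worth reporting.
function toResourceRecord(entry) {
  return {
    name: entry.name,
    type: entry.initiatorType,
    duration: entry.responseEnd - entry.startTime,
  };
}

// Long tasks occupy the main thread for 50 ms or more.
function summarizeLongTasks(entries) {
  return {
    count: entries.length,
    total: entries.reduce((sum, e) => sum + e.duration, 0),
    longest: entries.reduce((max, e) => Math.max(max, e.duration), 0),
  };
}

// Browser-only wiring; nothing is registered in other environments.
if (typeof window !== 'undefined') {
  observe('resource', (entries) => console.log(entries.map(toResourceRecord)));
  observe('longtask', (entries) => console.log(summarizeLongTasks(entries)));
}
```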

2. Navigation Timing API

https://www.w3.org/TR/navigation-timing-2/

performance.getEntriesByType("navigation");

Are the stages contiguous? No.

Does every stage always occur? Not necessarily.

Number of redirects: performance.navigation.redirectCount
Redirect time: redirectEnd - redirectStart
DNS lookup time: domainLookupEnd - domainLookupStart
TCP connection time: connectEnd - connectStart
SSL secure connection time: connectEnd - secureConnectionStart
Network request time (TTFB): responseStart - requestStart
Data transfer time: responseEnd - responseStart
DOM parsing time: domInteractive - responseEnd
Resource loading time: loadEventStart - domContentLoadedEventEnd
First packet time: responseStart - domainLookupStart
White screen time (rough estimate): responseEnd - fetchStart
Time to first interactive: domInteractive - fetchStart
DOM Ready time: domContentLoadedEventEnd - fetchStart
Page fully loaded time: loadEventStart - fetchStart
HTTP header size: transferSize - encodedBodySize
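
The formulas above can be gathered into one helper. This is a sketch that takes any object shaped like a PerformanceNavigationTiming entry; the function name `computeNavigationMetrics` and the output keys are assumptions for this example.

```javascript
// Derive page-load metrics from a PerformanceNavigationTiming-like object.
function computeNavigationMetrics(t) {
  return {
    redirect: t.redirectEnd - t.redirectStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    // secureConnectionStart is 0 when no secure connection was made.
    ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
    ttfb: t.responseStart - t.requestStart,
    download: t.responseEnd - t.responseStart,
    domParse: t.domInteractive - t.responseEnd,
    resources: t.loadEventStart - t.domContentLoadedEventEnd,
    domReady: t.domContentLoadedEventEnd - t.fetchStart,
    load: t.loadEventStart - t.fetchStart,
    headerSize: t.transferSize - t.encodedBodySize,
  };
}

// In a browser, read the entry once the load event has fired.
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    // Defer one tick so that the load event timestamps are populated.
    setTimeout(() => {
      const [nav] = performance.getEntriesByType('navigation');
      if (nav) console.table(computeNavigationMetrics(nav));
    }, 0);
  });
}
```

Stages that did not occur (for example, no redirect, or a reused connection) report equal start and end timestamps, so the corresponding metric comes out as 0.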

3. Resource Timing API

https://w3c.github.io/resource-timing/

performance.getEntriesByType("resource");

// Load time for one type of resource; works for images, JS, CSS, and XHR
const resourceListEntries = performance.getEntriesByType("resource");
resourceListEntries.forEach(resource => {
  if (resource.initiatorType === 'img') {
    console.info(`Time taken to load ${resource.name}: `, resource.responseEnd - resource.startTime);
  }
});

This is the same data shown in the Network waterfall in Chrome DevTools.

4. Paint Timing API

https://w3c.github.io/paint-timing/

First paint time and first contentful paint time
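
A short sketch of reading both values. The entry names `first-paint` and `first-contentful-paint` come from the Paint Timing specification; the helper name `extractPaintTimes` is made up for this example.

```javascript
// Pick FP and FCP out of a list of paint entries.
function extractPaintTimes(entries) {
  const times = { firstPaint: null, firstContentfulPaint: null };
  for (const entry of entries) {
    if (entry.name === 'first-paint') times.firstPaint = entry.startTime;
    if (entry.name === 'first-contentful-paint') times.firstContentfulPaint = entry.startTime;
  }
  return times;
}

// Browser only: paint entries exist once the page has rendered.
if (typeof window !== 'undefined') {
  console.log(extractPaintTimes(performance.getEntriesByType('paint')));
}
```

A value of null means the browser has not recorded that paint yet, or does not report it.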

5. User Timing API

https://www.w3.org/TR/user-timing-2/#introduction

It mainly uses the mark and measure methods to place timing marks and compute how long a stage takes, for example the running time of a function.

<head>
<script>
    // A mark placed at the end of the head tag is commonly treated as the white screen time
    performance.mark("first paint time");
</script>
</head>
<body>
...
<script>
    // get the first paint time
    const fp = Math.ceil(performance.getEntriesByName('first paint time')[0].startTime);
</script>
</body>

6. High Resolution Time API

https://w3c.github.io/hr-time/#dom-performance-timeorigin

It mainly consists of the now() method and the timeOrigin property.
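
One use of the two together, shown as a sketch: adding timeOrigin to a relative timestamp gives an absolute time that can be compared with server logs. The helper name `toEpochMillis` is hypothetical.

```javascript
// Convert a time-origin-relative timestamp to Unix epoch milliseconds.
function toEpochMillis(relativeMs, timeOrigin = performance.timeOrigin) {
  return timeOrigin + relativeMs;
}

// The result should be close to Date.now(), but it is derived from the
// monotonic clock rather than the adjustable system clock.
console.log('from performance:', toEpochMillis(performance.now()));
console.log('from Date.now(): ', Date.now());
```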

7. Performance Timeline API

https://www.w3.org/TR/performance-timeline-2/#introduction

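Summary

Based on performance, we can measure the following entry types:

mark, measure, navigation, resource, paint, frame.

let p = window.performance.getEntries();

Number of redirects: performance.navigation.redirectCount

Number of JS resources: p.filter(ele => ele.initiatorType === "script").length

Number of CSS resources: p.filter(ele => ele.initiatorType === "css").length

Note that initiatorType "css" covers resources requested from within a stylesheet, such as fonts and background images; stylesheets loaded through a <link> tag report "link" instead.

Number of AJAX requests: p.filter(ele => ele.initiatorType === "xmlhttprequest").length

Number of IMG resources: p.filter(ele => ele.initiatorType === "img").length

Total number of resources: window.performance.getEntriesByType("resource").length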
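
The per-type counts can also be computed in a single pass. A minimal sketch, with `countByInitiatorType` as a made-up helper name:

```javascript
// Count performance entries grouped by their initiatorType.
function countByInitiatorType(entries) {
  const counts = {};
  for (const entry of entries) {
    // Marks, measures, and paint entries carry no initiatorType; skip them.
    if (!entry.initiatorType) continue;
    counts[entry.initiatorType] = (counts[entry.initiatorType] || 0) + 1;
  }
  return counts;
}

// Browser only: count everything recorded so far.
if (typeof window !== 'undefined') {
  console.table(countByInitiatorType(performance.getEntries()));
}
```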

Non-overlapping timing phases:

Redirect time: redirectEnd - redirectStart
DNS lookup time: domainLookupEnd - domainLookupStart
TCP connection time: connectEnd - connectStart
SSL secure connection time: connectEnd - secureConnectionStart
Network request time (TTFB): responseStart - requestStart
HTML download time: responseEnd - responseStart
DOM parsing time: domInteractive - responseEnd
Resource loading time: loadEventStart - domContentLoadedEventEnd

Other combined metrics:

White screen time (rough estimate): domLoading - fetchStart
Rough first screen time: loadEventEnd - fetchStart, or domInteractive - fetchStart
DOM Ready time: domContentLoadedEventEnd - fetchStart
Page fully loaded time: loadEventStart - fetchStart

Total JS load time:

const p = window.performance.getEntries();
let jsR = p.filter(ele => ele.initiatorType === "script");
Math.max(...jsR.map((ele) => ele.responseEnd)) - Math.min(...jsR.map((ele) => ele.startTime));

Total CSS load time:

const p = window.performance.getEntries();
let cssR = p.filter(ele => ele.initiatorType === "css");
Math.max(...cssR.map((ele) => ele.responseEnd)) - Math.min(...cssR.map((ele) => ele.startTime));
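
The two snippets differ only in the type string, so they can be folded into one function. This sketch (the name `loadSpan` is an assumption) also guards against the case where no entry matches, in which `Math.max()` of an empty list would return -Infinity.

```javascript
// Wall-clock span from the first request start to the last response end
// among entries of one initiatorType. Returns 0 when nothing matches.
function loadSpan(entries, initiatorType) {
  const matched = entries.filter((e) => e.initiatorType === initiatorType);
  if (matched.length === 0) return 0;
  const lastEnd = Math.max(...matched.map((e) => e.responseEnd));
  const firstStart = Math.min(...matched.map((e) => e.startTime));
  return lastEnd - firstStart;
}

// Browser only: report the span for scripts.
if (typeof window !== 'undefined') {
  const resources = performance.getEntriesByType('resource');
  console.log('script span (ms):', loadSpan(resources, 'script'));
}
```

Because resources load in parallel, this span is the elapsed window covering all of them, not the sum of their individual durations.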

npm install -g lighthouse

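When a page links to another page using target="_blank", the two pages run in the same process. If the new page executes expensive JavaScript, the performance of the current page may suffer.
In addition, target="_blank" has a security vulnerability: the new page can access the old window object through window.opener, and can even navigate the old page to a different URL with window.opener.location = newURL.

When rel="noopener" is set, Chrome opens the new page in a separate process and also blocks window.opener, so no cross-window access is possible.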

  <a target="_blank" rel="noopener" href="https://xxx.com">

Reposted from: How does the Tencent front-end team do web performance monitoring? https://mp.weixin.qq.com/s/y2jh7oT36XdmHM9fImLl_w