The first stage of the WebRTC video processing pipeline is acquiring video data. Video can come from several kinds of sources, and even for a single source such as the camera, each platform tends to provide its own APIs. This article takes camera capture on the Android platform as an example and walks through WebRTC's video capture and distribution flow.
Main classes for video capture
For video capture, the main interface WebRTC exposes is VideoCapturer. Its implementations are ScreenCapturerAndroid, FileVideoCapturer and CameraCapturer, representing three different video sources: the screen, a file, and the camera. Because Android has provided both the Camera1 and Camera2 APIs over the years, CameraCapturer is further split into two subclasses, Camera1Capturer and Camera2Capturer.
VideoCapturer defines the CapturerObserver interface shown below; implementing it is how a component receives the captured image data.
// Interface used for providing callbacks to an observer.
public interface CapturerObserver {
  // Notify if the camera have been started successfully or not.
  // Called on a Java thread owned by VideoCapturer.
  void onCapturerStarted(boolean success);
  void onCapturerStopped();

  // Delivers a captured frame. Called on a Java thread owned by VideoCapturer.
  void onByteBufferFrameCaptured(
      byte[] data, int width, int height, int rotation, long timeStamp);

  // Delivers a captured frame in a texture with id |oesTextureId|. Called on a Java thread
  // owned by VideoCapturer.
  void onTextureFrameCaptured(int width, int height, int oesTextureId, float[] transformMatrix,
      int rotation, long timestamp);

  // Delivers a captured frame. Called on a Java thread owned by VideoCapturer.
  void onFrameCaptured(VideoFrame frame);
}
WebRTC's AndroidVideoTrackSourceObserver class implements this interface and then dispatches the frames to the sink modules that need them, such as the encoder and the local preview.
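If you want to look at the frames at this point in application code, one option is to wrap the observer the SDK provides and delegate to it. A minimal sketch, assuming the version of the interface quoted above (the class name and log tag are made up for illustration):

import org.webrtc.VideoCapturer;
import org.webrtc.VideoFrame;

// Wraps the SDK-provided observer (e.g. AndroidVideoTrackSourceObserver) so the
// app can inspect frames before they enter the native pipeline.
public class LoggingCapturerObserver implements VideoCapturer.CapturerObserver {
  private final VideoCapturer.CapturerObserver delegate;

  public LoggingCapturerObserver(VideoCapturer.CapturerObserver delegate) {
    this.delegate = delegate;
  }

  @Override
  public void onCapturerStarted(boolean success) {
    android.util.Log.d("Capture", "capturer started: " + success);
    delegate.onCapturerStarted(success);
  }

  @Override
  public void onCapturerStopped() {
    delegate.onCapturerStopped();
  }

  @Override
  public void onByteBufferFrameCaptured(
      byte[] data, int width, int height, int rotation, long timeStamp) {
    delegate.onByteBufferFrameCaptured(data, width, height, rotation, timeStamp);
  }

  @Override
  public void onTextureFrameCaptured(int width, int height, int oesTextureId,
      float[] transformMatrix, int rotation, long timestamp) {
    delegate.onTextureFrameCaptured(
        width, height, oesTextureId, transformMatrix, rotation, timestamp);
  }

  @Override
  public void onFrameCaptured(VideoFrame frame) {
    android.util.Log.d("Capture",
        "frame " + frame.getBuffer().getWidth() + "x" + frame.getBuffer().getHeight());
    delegate.onFrameCaptured(frame);
  }
}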
CameraCapturer objects are created by a CameraEnumerator. Since a device can have both front and back cameras, the enumerator assigns each camera a specific name and creates the CameraCapturer for a given name; Camera1Enumerator creates Camera1Capturer objects and Camera2Enumerator creates Camera2Capturer objects.
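From application code the enumerators are typically used like this; the sketch below (the helper class name is made up, and the null events handler simply means camera events are ignored) picks a front-facing device and asks the enumerator to build the matching capturer:

import android.content.Context;
import org.webrtc.Camera1Enumerator;
import org.webrtc.Camera2Enumerator;
import org.webrtc.CameraEnumerator;
import org.webrtc.VideoCapturer;

final class CapturerFactory {
  // Returns a Camera1Capturer or Camera2Capturer depending on which enumerator is used.
  static VideoCapturer createFrontCameraCapturer(Context context) {
    CameraEnumerator enumerator = Camera2Enumerator.isSupported(context)
        ? new Camera2Enumerator(context)
        : new Camera1Enumerator(/* captureToTexture= */ true);
    for (String deviceName : enumerator.getDeviceNames()) {
      if (enumerator.isFrontFacing(deviceName)) {
        return enumerator.createCapturer(deviceName, /* eventsHandler= */ null);
      }
    }
    return null; // No front-facing camera found.
  }
}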
CameraCapturer mainly implements the public interface and maintains state; the actual interaction with the Android camera APIs is delegated to a CameraSession object that it creates. Correspondingly there are two implementations, Camera1Session and Camera2Session: Camera1Session wraps the Camera1 API (android.hardware.Camera) and Camera2Session wraps the Camera2 API (android.hardware.camera2). Both APIs can deliver image data through a SurfaceTexture, and WebRTC provides SurfaceTextureHelper to manage the SurfaceTexture; with Camera1 you can also register a PreviewCallback to receive YUV data directly. The details of using Camera1 and Camera2 deserve a separate article.
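Before frames can flow, the capturer has to be initialized with a SurfaceTextureHelper and a CapturerObserver and then started. Exactly who performs this wiring differs between SDK revisions (older ones hide it inside PeerConnectionFactory.createVideoSource(capturer)); the sketch below does it explicitly just to make the pieces visible:

import android.content.Context;
import org.webrtc.EglBase;
import org.webrtc.SurfaceTextureHelper;
import org.webrtc.VideoCapturer;

final class CaptureStarter {
  static void start(Context appContext, VideoCapturer capturer,
      VideoCapturer.CapturerObserver observer, EglBase.Context eglContext) {
    // The helper owns the SurfaceTexture (and the thread behind it) into which
    // the CameraSession renders camera frames.
    SurfaceTextureHelper helper = SurfaceTextureHelper.create("CaptureThread", eglContext);
    capturer.initialize(helper, appContext, observer);
    capturer.startCapture(/* width= */ 1280, /* height= */ 720, /* framerate= */ 30);
  }
}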
CameraSession distributes events and image data through the CreateSessionCallback and Events interfaces shown below; CameraCapturer implements both interfaces to receive and forward the events and frames.
public interface CreateSessionCallback {
  void onDone(CameraSession session);
  void onFailure(FailureType failureType, String error);
}

// Events are fired on the camera thread.
public interface Events {
  void onCameraOpening();
  void onCameraError(CameraSession session, String error);
  void onCameraDisconnected(CameraSession session);
  void onCameraClosed(CameraSession session);
  void onFrameCaptured(CameraSession session, VideoFrame frame);

  // The old way of passing frames. Will be removed eventually.
  void onByteBufferFrameCaptured(
      CameraSession session, byte[] data, int width, int height, int rotation, long timestamp);
  void onTextureFrameCaptured(CameraSession session, int width, int height, int oesTextureId,
      float[] transformMatrix, int rotation, long timestamp);
}
Video data representation
A video is a sequence of frames. Image formats fall into two families, RGB and YUV, each with several concrete layouts, but WebRTC represents them all uniformly as a VideoFrame. Whatever the format, a frame is essentially a buffer; only the layout of that buffer differs. Because frame data has to be passed back and forth between the Java layer and the native layer, the types are defined in both.
On the Java side, the central type is VideoFrame, whose pixel data is held in a nested Buffer. The Buffer interface is defined as follows:
public interface Buffer {
  /**
   * Resolution of the buffer in pixels.
   */
  @CalledByNative("Buffer") int getWidth();
  @CalledByNative("Buffer") int getHeight();

  /**
   * Returns a memory-backed frame in I420 format. If the pixel data is in another format, a
   * conversion will take place. All implementations must provide a fallback to I420 for
   * compatibility with e.g. the internal WebRTC software encoders.
   */
  @CalledByNative("Buffer") I420Buffer toI420();

  /**
   * Reference counting is needed since a video buffer can be shared between multiple VideoSinks,
   * and the buffer needs to be returned to the VideoSource as soon as all references are gone.
   */
  @CalledByNative("Buffer") void retain();
  @CalledByNative("Buffer") void release();

  /**
   * Crops a region defined by |cropx|, |cropY|, |cropWidth| and |cropHeight|. Scales it to size
   * |scaleWidth| x |scaleHeight|.
   */
  @CalledByNative("Buffer")
  Buffer cropAndScale(
      int cropX, int cropY, int cropWidth, int cropHeight, int scaleWidth, int scaleHeight);
}
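As a usage sketch of this interface: whatever the concrete buffer type, toI420() yields a memory-backed I420 buffer whose planes can be read directly. The helper below (illustrative only) copies the Y plane into a tightly packed array, taking the row stride into account:

import java.nio.ByteBuffer;
import org.webrtc.VideoFrame;

final class FrameReader {
  // Copies the luma (Y) plane of a frame into a width*height byte array.
  static byte[] readLuma(VideoFrame frame) {
    VideoFrame.I420Buffer i420 = frame.getBuffer().toI420();
    int width = i420.getWidth();
    int height = i420.getHeight();
    int strideY = i420.getStrideY();
    ByteBuffer dataY = i420.getDataY();
    byte[] luma = new byte[width * height];
    for (int row = 0; row < height; row++) {
      dataY.position(row * strideY);       // Skip any per-row padding (stride >= width).
      dataY.get(luma, row * width, width);
    }
    i420.release();                        // We own the buffer returned by toI420().
    return luma;
  }
}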
The main types on the native side mirror the Java ones. The VideoFrameBuffer interface is defined as follows:
class VideoFrameBuffer : public rtc::RefCountInterface {
 public:
  // New frame buffer types will be added conservatively when there is an
  // opportunity to optimize the path between some pair of video source and
  // video sink.
  enum class Type {
    kNative,
    kI420,
    kI420A,
    kI444,
  };

  // This function specifies in what pixel format the data is stored in.
  virtual Type type() const = 0;

  // The resolution of the frame in pixels. For formats where some planes are
  // subsampled, this is the highest-resolution plane.
  virtual int width() const = 0;
  virtual int height() const = 0;

  // Returns a memory-backed frame buffer in I420 format. If the pixel data is
  // in another format, a conversion will take place. All implementations must
  // provide a fallback to I420 for compatibility with e.g. the internal WebRTC
  // software encoders.
  virtual rtc::scoped_refptr<I420BufferInterface> ToI420() = 0;

  // These functions should only be called if type() is of the correct type.
  // Calling with a different type will result in a crash.
  // TODO(magjed): Return raw pointers for GetI420 once deprecated interface is
  // removed.
  rtc::scoped_refptr<I420BufferInterface> GetI420();
  rtc::scoped_refptr<const I420BufferInterface> GetI420() const;
  I420ABufferInterface* GetI420A();
  const I420ABufferInterface* GetI420A() const;
  I444BufferInterface* GetI444();
  const I444BufferInterface* GetI444() const;

 protected:
  ~VideoFrameBuffer() override {}
};
As the two definitions above show, the Java and native types are quite similar, and both expose a conversion to I420. Taking Java as an example, there are several YUV-backed buffers as well as a texture-backed buffer. A YUV buffer is easy to understand: its member is simply a ByteBuffer, and the ByteBuffer contents obviously differ from frame to frame. The texture buffer involves OpenGL, which I have not studied in depth; its members are mainly a texture id and a transform matrix, and those two stay the same across frames. My guess is that the id corresponds to an object on the native side whose underlying pixel buffer (presumably YUV or RGB) is updated every frame. That is only a guess; confirming it would require digging into the source, and I have not found any article online explaining the mechanism, so this understanding will have to do for now.
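For reference, the texture-backed buffer on the Java side has roughly the following shape (paraphrased rather than copied from the SDK source, so treat it as a sketch): a texture id and a transform matrix, plus the toI420()/retain()/release() methods inherited from Buffer.

// Rough shape of the Java texture buffer (nested inside VideoFrame in the real SDK).
public interface TextureBuffer extends VideoFrame.Buffer {
  enum Type { OES, RGB }          // OES textures come from a SurfaceTexture.
  Type getType();
  int getTextureId();             // GL texture name; the pixels behind it change per frame.
  android.graphics.Matrix getTransformMatrix();
}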
Video capture and distribution flow
Taking the Android Camera1 PreviewCallback delivery path as the example, the capture and distribution flow works as described below.
The key class in the distribution path is VideoBroadcaster. It has a sinks_ member of type std::vector<SinkPair> that stores the sinks to distribute to; sinks are added and removed through its AddOrUpdateSink and RemoveSink functions.
Putting it together: after a frame is captured from the camera, AndroidVideoTrackSourceObserver passes it to the native AndroidVideoTrackSource object, and VideoBroadcaster then fans it out to the registered sinks: via VideoStreamEncoder it reaches the encoder, and via VideoSinkWrapper it reaches a Java-layer VideoSink, such as the SurfaceViewRenderer used for local preview.
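From application code, all of this is triggered simply by adding a sink to the VideoTrack; on the native side the Java sink gets wrapped in a VideoSinkWrapper and registered with the broadcaster. A sketch (the helper class name and track id are arbitrary):

import org.webrtc.EglBase;
import org.webrtc.PeerConnectionFactory;
import org.webrtc.SurfaceViewRenderer;
import org.webrtc.VideoSource;
import org.webrtc.VideoTrack;

final class PreviewSetup {
  // Registers a local preview renderer as one of the track's sinks.
  static VideoTrack attachPreview(PeerConnectionFactory factory, VideoSource source,
      EglBase eglBase, SurfaceViewRenderer preview) {
    VideoTrack track = factory.createVideoTrack("ARDAMSv0", source);
    preview.init(eglBase.getEglBaseContext(), /* rendererEvents= */ null);
    // SurfaceViewRenderer implements VideoSink, so it receives every captured frame.
    track.addSink(preview);
    return track;
  }
}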
The Java-layer VideoSink is defined as follows; onFrame is the callback invoked from the native VideoSinkWrapper.
public interface VideoSink {
  /**
   * Implementations should call frame.retain() if they need to hold a reference to the frame after
   * this function returns. Each call to retain() should be followed by a call to frame.release()
   * when the reference is no longer needed.
   */
  @CalledByNative void onFrame(VideoFrame frame);
}
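A custom sink only needs to respect the retain()/release() contract spelled out in that comment. A minimal sketch that hands frames to a worker thread (the executor and class name are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.webrtc.VideoFrame;
import org.webrtc.VideoSink;

public class AsyncFrameSink implements VideoSink {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  @Override
  public void onFrame(VideoFrame frame) {
    frame.retain();                // Keep the buffer alive beyond this callback.
    executor.execute(() -> {
      try {
        // ... inspect or convert the frame here, e.g. frame.getBuffer().toI420() ...
      } finally {
        frame.release();           // Return the buffer to the source once done.
      }
    });
  }
}

Such a sink is registered exactly like the SurfaceViewRenderer above, via videoTrack.addSink(new AsyncFrameSink()).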
The path from the camera to AndroidVideoTrackSourceObserver is fairly simple, so let us look at what happens after AndroidVideoTrackSourceObserver:
The native implementation behind AndroidVideoTrackSourceObserver's nativeOnByteBufferFrameCaptured is as follows:
static void JNI_AndroidVideoTrackSourceObserver_OnByteBufferFrameCaptured(
    JNIEnv* jni,
    const JavaParamRef<jclass>&,
    jlong j_source,
    const JavaParamRef<jbyteArray>& j_frame,
    jint length,
    jint width,
    jint height,
    jint rotation,
    jlong timestamp) {
  AndroidVideoTrackSource* source =
      AndroidVideoTrackSourceFromJavaProxy(j_source);
  jbyte* bytes = jni->GetByteArrayElements(j_frame.obj(), nullptr);
  source->OnByteBufferFrameCaptured(bytes, length, width, height,
                                    jintToVideoRotation(rotation), timestamp);
  jni->ReleaseByteArrayElements(j_frame.obj(), bytes, JNI_ABORT);
}
Here source is an AndroidVideoTrackSource object; its OnByteBufferFrameCaptured ends up calling OnFrame of the parent class AdaptedVideoTrackSource, defined as follows:
void AdaptedVideoTrackSource::OnFrame(const webrtc::VideoFrame& frame) {
  rtc::scoped_refptr<webrtc::VideoFrameBuffer> buffer(
      frame.video_frame_buffer());
  /* Note that this is a "best effort" approach to
     wants.rotation_applied; apply_rotation_ can change from false to
     true between the check of apply_rotation() and the call to
     broadcaster_.OnFrame(), in which case we generate a frame with
     pending rotation despite some sink with wants.rotation_applied ==
     true was just added. The VideoBroadcaster enforces
     synchronization for us in this case, by not passing the frame on
     to sinks which don't want it. */
  if (apply_rotation() && frame.rotation() != webrtc::kVideoRotation_0 &&
      buffer->type() == webrtc::VideoFrameBuffer::Type::kI420) {
    /* Apply pending rotation. */
    broadcaster_.OnFrame(webrtc::VideoFrame(
        webrtc::I420Buffer::Rotate(*buffer->GetI420(), frame.rotation()),
        webrtc::kVideoRotation_0, frame.timestamp_us()));
  } else {
    broadcaster_.OnFrame(frame);
  }
}
Here broadcaster_ is a VideoBroadcaster object; its OnFrame function loops over the registered sinks and hands the frame to each one, as shown below:
void VideoBroadcaster::OnFrame(const webrtc::VideoFrame& frame) {
  rtc::CritScope cs(&sinks_and_wants_lock_);
  for (auto& sink_pair : sink_pairs()) {
    if (sink_pair.wants.rotation_applied &&
        frame.rotation() != webrtc::kVideoRotation_0) {
      // Calls to OnFrame are not synchronized with changes to the sink wants.
      // When rotation_applied is set to true, one or a few frames may get here
      // with rotation still pending. Protect sinks that don't expect any
      // pending rotation.
      RTC_LOG(LS_VERBOSE) << "Discarding frame with unexpected rotation.";
      continue;
    }
    if (sink_pair.wants.black_frames) {
      sink_pair.sink->OnFrame(webrtc::VideoFrame(
          GetBlackFrameBuffer(frame.width(), frame.height()), frame.rotation(),
          frame.timestamp_us()));
    } else {
      sink_pair.sink->OnFrame(frame);
    }
  }
}
If a sink is a VideoStreamEncoder object, the frame goes to the encoder; if it is a VideoSinkWrapper object, the frame is forwarded to a Java-layer VideoSink, such as the SurfaceViewRenderer used for local preview.
VideoSinkWrapper's OnFrame is defined as follows:
void VideoSinkWrapper::OnFrame(const VideoFrame& frame) {
  JNIEnv* jni = AttachCurrentThreadIfNeeded();
  Java_VideoSink_onFrame(jni, j_sink_, NativeToJavaFrame(jni, frame));
}
Java_VideoSink_onFrame performs the callback from the native layer into the Java VideoSink object. This is standard JNI: look up the jmethodID from the Java method's signature, then invoke that jmethodID through the JNI call interfaces.
However, if you search the source tree for Java_VideoSink_onFrame you will not find its definition, which is a strong hint that the function is generated automatically. In C/C++ development, code generation is usually done in one of two ways:
- Macros. The expansion is performed by the preprocessor at build time. This is fairly common in C/C++, but it is not very flexible and struggles once the generated code becomes at all complex.
- External tools. A separate tool generates the code before compilation from a written description (an IDL, for example). This is far more flexible: writing the description is much simpler than writing piles of code, and it handles repetitive, low-skill boilerplate well. AIDL, protobuf and gSOAP all work this way.
Viewed from a more abstract angle, the macro approach is really just a special case of the second: the preprocessor is a tool, and the #define directives are its configuration.
But that is drifting off topic, so back to the question at hand: which approach does WebRTC use? Intuition says the second one, since the macro approach would be awkward here. So where is the generated code? After a build, searching from the top of the source tree turns it up in ./out/Debug/gen/sdk/android/generated_video_jni/jni/VideoSink_jni.h.
The contents of that file are as follows:
// Copyright 2014 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// This file is autogenerated by
//     base/android/jni_generator/jni_generator.py
// For
//     org/webrtc/VideoSink

#ifndef org_webrtc_VideoSink_JNI
#define org_webrtc_VideoSink_JNI

#include <jni.h>

#include "../../../../../../../sdk/android/src/jni/jni_generator_helper.h"

// Step 1: forward declarations.
JNI_REGISTRATION_EXPORT extern const char kClassPath_org_webrtc_VideoSink[];
const char kClassPath_org_webrtc_VideoSink[] = "org/webrtc/VideoSink";

// Leaking this jclass as we cannot use LazyInstance from some threads.
JNI_REGISTRATION_EXPORT base::subtle::AtomicWord g_org_webrtc_VideoSink_clazz =
    0;
#ifndef org_webrtc_VideoSink_clazz_defined
#define org_webrtc_VideoSink_clazz_defined
inline jclass org_webrtc_VideoSink_clazz(JNIEnv* env) {
  return base::android::LazyGetClass(env, kClassPath_org_webrtc_VideoSink,
                                     &g_org_webrtc_VideoSink_clazz);
}
#endif

// Step 2: method stubs.
static base::subtle::AtomicWord g_org_webrtc_VideoSink_onFrame = 0;
static void Java_VideoSink_onFrame(JNIEnv* env, const
    base::android::JavaRef<jobject>& obj, const base::android::JavaRef<jobject>&
    frame) {
  CHECK_CLAZZ(env, obj.obj(),
      org_webrtc_VideoSink_clazz(env));
  jmethodID method_id =
      base::android::MethodID::LazyGet<
          base::android::MethodID::TYPE_INSTANCE>(
              env, org_webrtc_VideoSink_clazz(env),
              "onFrame",
              "("
              "Lorg/webrtc/VideoFrame;"
              ")"
              "V",
              &g_org_webrtc_VideoSink_onFrame);

  env->CallVoidMethod(obj.obj(),
      method_id, frame.obj());
  jni_generator::CheckException(env);
}

#endif  // org_webrtc_VideoSink_JNI
As the header itself states, it is generated by the jni_generator.py script, and the generated code does exactly what was described above: look up the jmethodID lazily, then invoke it. As for where the generator's input description lives, I will not chase that down here.
If you have worked with earlier WebRTC releases you will remember that this native-to-Java call glue used to be typed out by Google's engineers by hand; it was later switched over to a script, presumably because writing that kind of repetitive code gets tedious. It is also a small illustration of Google's habit of continually refining its engineering.
Summary
This article introduced WebRTC video capture and distribution from the angle of its main classes and flow, using the Android platform as the example. Other platforms differ only in the capture part; the distribution flow that follows is the same, and I will cover them in later articles. Next, I plan to look at initialization: when the objects of these main classes are created and how they get wired together.