Apple Metal 2, Part 8: LOD and Functions

Original: https://developer.apple.com/documentation/metal/advanced_techniques/lod_with_function_specialization

LOD with Function Specialization

Demonstrates how to use specialized functions to render a model with varying levels of detail.

Overview

A high-quality gaming experience has to balance great graphics against great performance. High-quality models look great, but their complexity requires a significant amount of processing power. By increasing or decreasing the level of detail (LOD) of a model, games can selectively manage both graphics and performance.

Instead of selecting a fixed LOD at build time, games can dynamically select between a range of LODs at runtime based on certain model-view conditions. For example, a focal-point foreground model could have a high LOD, whereas a fast-moving background model could have a low LOD.

This sample demonstrates dynamic LOD selection for a fire truck model, based on its distance to the scene's camera. When the model is closer to the camera, the renderer uses a higher LOD; when the model is farther from the camera, the renderer uses a lower LOD.

LOD and GPU Branches

At first glance, it might seem simple to select a LOD dynamically using a series of if and else statements, like the following pseudocode:

if(highLOD)
{
    // Render high-quality model
}
else if(mediumLOD)
{
    // Render medium-quality model
}
else if(lowLOD)
{
    // Render low-quality model
}

This code seems straightforward, but its implementation, execution, and performance implications are very different between the CPU and the GPU.

The massively parallel architecture of GPUs is not particularly well suited to functions that have many branches. The number of threads that a GPU can execute concurrently depends heavily on the total number of registers allocated for a particular function. A GPU compiler needs to allocate the maximum number of registers that a function could possibly use. This means that the compiler allocates registers for all branches, even for branch conditions that are never true and whose code is never executed. As a result, branch statements greatly increase a function's register allocation and decrease the number of GPU threads that can execute concurrently.

Nevertheless, branch statements are very useful programming constructs, particularly for functions that share a lot of common code. In these situations, branches make it a lot easier to write maintainable code with only a few differences between branches. One of the most common cases for graphics functions that share code is handling a branch condition that differs only between draw calls, not between individual threads executing within a single draw call.

Traditionally, branches that differ between draw calls are mitigated in one of the following ways:

  • Writing per-branch functions. Each branch is written as a complete and separate function, and the render loop decides which function to use at runtime. This greatly increases code duplication since all possible outcomes of each branch condition require their own standalone function. For example, a single if statement requires one function for the true outcome and another function for the false outcome.

  • Using preprocessor directives. Instead of using a regular if statement, functions can use the #if preprocessor directive, which evaluates a branch condition before compilation and compiles only the selected path. This avoids code duplication but reduces the performance benefits of precompiled Metal shading language code. Since the branch conditions can only be evaluated at runtime, the functions cannot be precompiled at build time.

Metal's function specialization feature is designed to reduce branch performance costs, avoid code duplication, and leverage build time compilation. Function specialization enables you to create multiple executable versions of a single source function. You can create specialized functions by declaring function constants in your Metal shading language code and setting their values at runtime. This allows the front-end compiler to precompile your source function at build time and the back-end compiler to compile the specialized function at runtime when the pipeline is created.

LOD Selection Criteria

This sample demonstrates function specialization by creating different render pipelines for each LOD. All of the pipelines share the same source function, but function constants determine LOD-specific paths and inputs for each pipeline. Thus, different function constant values create different specialized functions for each LOD render pipeline.

This sample demonstrates dynamic LOD selection for a fire truck, based on its distance to the scene's camera. When the fire truck is close to the camera, it occupies more pixels on the screen; therefore, the sample uses a high-quality render pipeline to render the fire truck. When the fire truck is far from the camera, it occupies fewer pixels on the screen; therefore, the sample uses a low-quality render pipeline to render the fire truck.

The fire truck model in this sample uses many types of textures, such as albedo, normal, metallic, roughness, ambient occlusion, and pre-baked irradiance material textures. It would be incredibly wasteful to sample from each of these textures when the model is far from the camera; the detail provided by the full combination of textures just wouldn't be seen. The sample uses various function constant values to create specialized functions that sample from more or fewer textures, depending on the selected LOD. Additionally, specialized functions that sample from fewer textures also perform less complex computations and result in a faster render pipeline.

The isTexturedProperty:atQualityLevel: method controls whether a material property is set by sampling from a texture or by reading a constant value.

+ (BOOL)isTexturedProperty:(AAPLFunctionConstantIndices)propertyIndex atQualityLevel:(AAPLQualityLevel)quality
{
    AAPLQualityLevel minLevelForProperty = kQualityLevelHigh;
    
    switch(propertyIndex)
    {
        case kFunctionConstantBaseColorMapIndex:
        case kFunctionConstantIrradianceMapIndex:
            minLevelForProperty = kQualityLevelMedium;
            break;
        default:
            break;
    }
    
    return quality <= minLevelForProperty;
}

Writing Specialized Functions

The sample uses six function constants to control the various inputs available to the fragment function.

constant bool has_base_color_map        [[ function_constant(kFunctionConstantBaseColorMapIndex) ]];
constant bool has_normal_map            [[ function_constant(kFunctionConstantNormalMapIndex) ]];
constant bool has_metallic_map          [[ function_constant(kFunctionConstantMetallicMapIndex) ]];
constant bool has_roughness_map         [[ function_constant(kFunctionConstantRoughnessMapIndex) ]];
constant bool has_ambient_occlusion_map [[ function_constant(kFunctionConstantAmbientOcclusionMapIndex) ]];
constant bool has_irradiance_map        [[ function_constant(kFunctionConstantIrradianceMapIndex) ]];

The sample also declares a derived function constant, has_any_map, that is used in the vertex function. This value determines whether or not the render pipeline requires the vertex function to output a texture coordinate to the ColorInOut.texCoord return value.

constant bool has_any_map = (has_base_color_map        ||
                             has_normal_map            ||
                             has_metallic_map          ||
                             has_roughness_map         ||
                             has_ambient_occlusion_map ||
                             has_irradiance_map);

When the value of has_any_map is false, the vertex function does not write a value to the texCoord member.

if (has_any_map)
{
    out.texCoord = in.texCoord;
}

The function constants control the source of each parameter to the lighting computation in the calculateParameters() function. Each texture parameter carries a [[function_constant(name)]] attribute, which makes the parameter optional: it is present only when the named function constant is true. The function samples from a texture only if the constant indicates that the texture parameter is present; otherwise, it reads a uniform value from the materialUniforms buffer.

LightingParameters calculateParameters(ColorInOut in,
                                       constant AAPLUniforms         & uniforms,
                                       constant AAPLMaterialUniforms & materialUniforms,
                                       texture2d<float>   baseColorMap        [[ function_constant(has_base_color_map) ]],
                                       texture2d<float>   normalMap           [[ function_constant(has_normal_map) ]],
                                       texture2d<float>   metallicMap         [[ function_constant(has_metallic_map) ]],
                                       texture2d<float>   roughnessMap        [[ function_constant(has_roughness_map) ]],
                                       texture2d<float>   ambientOcclusionMap [[ function_constant(has_ambient_occlusion_map) ]],
                                       texturecube<float> irradianceMap       [[ function_constant(has_irradiance_map) ]])

The corresponding inputs to the fragment function also use the same function constants.

fragment float4
fragmentLighting(ColorInOut in [[stage_in]],
                 constant AAPLUniforms         & uniforms         [[ buffer(kBufferIndexUniforms) ]],
                 constant AAPLMaterialUniforms & materialUniforms [[ buffer(kBufferIndexMaterialUniforms) ]],
                 texture2d<float>   baseColorMap         [[ texture(kTextureIndexBaseColor),        function_constant(has_base_color_map) ]],
                 texture2d<float>   normalMap            [[ texture(kTextureIndexNormal),           function_constant(has_normal_map) ]],
                 texture2d<float>   metallicMap          [[ texture(kTextureIndexMetallic),         function_constant(has_metallic_map) ]],
                 texture2d<float>   roughnessMap         [[ texture(kTextureIndexRoughness),        function_constant(has_roughness_map) ]],
                 texture2d<float>   ambientOcclusionMap  [[ texture(kTextureIndexAmbientOcclusion), function_constant(has_ambient_occlusion_map) ]],
                 texturecube<float> irradianceMap        [[ texture(kTextureIndexIrradianceMap),    function_constant(has_irradiance_map)]])

Creating Different Pipelines

This sample uses three different MTLRenderPipelineState objects, each representing a different LOD. Specializing functions and building pipelines is expensive, so you should always perform these tasks asynchronously before starting your render loop. When the AAPLRenderer object is initialized, each LOD pipeline is created asynchronously by using dispatch groups, completion handlers, and notification blocks.

The sample creates six specialized functions overall: a vertex and a fragment function for each of the three LODs. This task is monitored by the specializationGroup dispatch group and each function is specialized by calling the newFunctionWithName:constantValues:completionHandler: method.

for (uint qualityLevel = 0; qualityLevel < kQualityNumLevels; qualityLevel++)
{
    dispatch_group_enter(specializationGroup);

    MTLFunctionConstantValues* constantValues = [self functionConstantsForQualityLevel:qualityLevel];

    [defaultLibrary newFunctionWithName:@"fragmentLighting" constantValues:constantValues
                      completionHandler:^(id <MTLFunction> newFunction, NSError *error )
     {
         if (!newFunction)
         {
             NSLog(@"Failed to specialize function, error %@", error);
         }

         _fragmentFunctions[qualityLevel] = newFunction;
         dispatch_group_leave(specializationGroup);
     }];

    dispatch_group_enter(specializationGroup);

    [defaultLibrary newFunctionWithName:@"vertexTransform" constantValues:constantValues
                      completionHandler:^(id <MTLFunction> newFunction, NSError *error )
     {
         if (!newFunction)
         {
             NSLog(@"Failed to specialize function, error %@", error);
         }
         
         _vertexFunctions[qualityLevel] = newFunction;
         dispatch_group_leave(specializationGroup);
     }];
}

The notifyBlock block builds the three render pipelines. This task is monitored by the _pipelineCreationGroup dispatch group and each pipeline is built by calling the newRenderPipelineStateWithDescriptor:completionHandler: method.

dispatch_group_enter(_pipelineCreationGroup);

void (^notifyBlock)(void) = ^void(void)
{
    MTLRenderPipelineDescriptor *pipelineStateDescriptors[kQualityNumLevels];

    dispatch_group_wait(specializationGroup, DISPATCH_TIME_FOREVER);

    for (uint qualityLevel = 0; qualityLevel < kQualityNumLevels; qualityLevel++)
    {
        dispatch_group_enter(_pipelineCreationGroup);

        pipelineStateDescriptors[qualityLevel] = [pipelineStateDescriptor copy];
        pipelineStateDescriptors[qualityLevel].fragmentFunction = _fragmentFunctions[qualityLevel];
        pipelineStateDescriptors[qualityLevel].vertexFunction = _vertexFunctions[qualityLevel];

        [_device newRenderPipelineStateWithDescriptor:pipelineStateDescriptors[qualityLevel]
                                    completionHandler:^(id <MTLRenderPipelineState> newPipelineState, NSError *error )
         {
             if (!newPipelineState)
                 NSLog(@"Failed to create pipelineState, error %@", error);
             
             _pipelineStates[qualityLevel] = newPipelineState;
             dispatch_group_leave(_pipelineCreationGroup);
         }];
    }

    dispatch_group_leave(_pipelineCreationGroup);
};

dispatch_group_notify(specializationGroup, pipelineQueue, notifyBlock);

Rendering with a Specific LOD

At the beginning of the render loop, for each frame, the sample calls the calculateQualityAtDistance: method to update the _currentQualityLevel value. This defines the LOD for the frame based on the distance between the model and the camera. The calculateQualityAtDistance: method also sets a _globalMapWeight value that creates a smooth transition between LOD boundaries.

- (void)calculateQualityAtDistance:(float)distance
{
    static const float MediumQualityDepth     = 150.f;
    static const float LowQualityDepth        = 650.f;
    static const float TransitionDepthAmount  = 50.f;

    assert(distance >= 0.0f);
    if (distance < MediumQualityDepth)
    {
        static const float TransitionDepth = MediumQualityDepth - TransitionDepthAmount;
        if(distance > TransitionDepth)
        {
            _globalMapWeight = distance - TransitionDepth;
            _globalMapWeight /= TransitionDepthAmount;
            _globalMapWeight = 1.0 - _globalMapWeight;
        }
        else
        {
            _globalMapWeight = 1.0;
        }
        _currentQualityLevel = kQualityLevelHigh;
    }
    else if (distance < LowQualityDepth)
    {
        static const float TransitionDepth = LowQualityDepth - TransitionDepthAmount;
        if(distance > TransitionDepth)
        {
            _globalMapWeight = distance - (TransitionDepth);
            _globalMapWeight /= TransitionDepthAmount;
            _globalMapWeight = 1.0 - _globalMapWeight;
        }
        else
        {
            _globalMapWeight = 1.0;
        }
        _currentQualityLevel = kQualityLevelMedium;
    }
    else
    {
        _currentQualityLevel = kQualityLevelLow;
        _globalMapWeight = 0.0;
    }
}

The updated _currentQualityLevel value is used to set the corresponding MTLRenderPipelineState object for the frame.

[renderEncoder setRenderPipelineState:_pipelineStates[_currentQualityLevel]];

The updated _globalMapWeight value is used to interpolate between quality levels and prevent abrupt LOD transitions.

[submesh computeTextureWeightsForQualityLevel:_currentQualityLevel
                          withGlobalMapWeight:_globalMapWeight];

Finally, the render loop draws each submesh in the model with the specific LOD pipeline.

[renderEncoder drawIndexedPrimitives:metalKitSubmesh.primitiveType
                          indexCount:metalKitSubmesh.indexCount
                           indexType:metalKitSubmesh.indexType
                         indexBuffer:metalKitSubmesh.indexBuffer.buffer
                   indexBufferOffset:metalKitSubmesh.indexBuffer.offset];