Shader intrinsic functions stand as a partial solution for granting developers more control over existing computational resources and how they are leveraged. This capability (much touted by AMD as a performance-enhancing feature on their GCN-based products) essentially exposes features and capabilities that exist on the hardware developers are programming for, but that they wouldn’t generally be able to access. This can happen either because those features are being abstracted away by a high-level API (Application Programming Interface, like DX11), or because the API simply has no way of accessing them. To understand why high-level APIs such as DX11 don’t usually offer support for a piece of hardware’s full feature list, or full processing capabilities, we must first look at the basic architecture of a given computer system.
As you can see, there are usually multiple layers a given task must go through in order for it to be processed at a hardware level. You might be wondering why we even need so many layers in the first place, and why more direct access wasn’t enabled before. There are many technical reasons for this, but one of the strongest is simply the breadth of different hardware available for your buying and assembling pleasure. Unlike the console ecosystem, where hardware is fixed and, as a result, predictable in its performance metrics and command execution, the PC ecosystem is fractured into countless hardware combinations. You may have an AMD FX-8350 with CMT (Clustered Multi-Threading), an Intel i7 6700K with SMT (Simultaneous Multi-Threading), or anything in between, paired with a GCN RX 480 or a Pascal GTX 1070… And all that hardware has particularities in how it processes the same task, and in the type of commands you need to input in order to get a given result. So, DX11, DX12 and Vulkan serve as what we call an abstraction layer.
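To make the idea of an abstraction layer a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the command strings and the two "backends" are invented for illustration and do not correspond to any real driver or API. It only shows the shape of the idea, namely that the caller names one command and the layer picks the hardware-specific translation.

```python
from typing import Callable, Dict, List

# Hypothetical translations: the command strings below are made up and only
# stand in for "whatever this particular GPU family actually needs".
def translate_gcn(command: str) -> List[str]:
    return [f"gcn:setup({command})", f"gcn:execute({command})"]

def translate_pascal(command: str) -> List[str]:
    return [f"pascal:prepare({command})", f"pascal:launch({command})"]

# The abstraction layer's lookup table: one entry per supported GPU.
BACKENDS: Dict[str, Callable[[str], List[str]]] = {
    "RX 480": translate_gcn,
    "GTX 1070": translate_pascal,
}

def submit(command: str, gpu: str) -> List[str]:
    """API entry point: the caller names a command, never a hardware path."""
    if gpu not in BACKENDS:
        raise ValueError(f"no backend available for {gpu!r}")
    return BACKENDS[gpu](command)

if __name__ == "__main__":
    # Same request, two different hardware-level command sequences.
    print(submit("draw frame", "RX 480"))
    print(submit("draw frame", "GTX 1070"))
```

The programmer calling submit never sees which translation was used, which is precisely the convenience (and the limitation) the article goes on to discuss.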
Abstraction layers essentially simplify the programmer’s work: they “hide” and automate a given command’s underlying processes, particular implementation and hardware-specific code paths, so that programmers only have to worry about which commands they want to use, and voilà. The high-level API converts a given command (let’s imagine, for simplicity’s sake, “draw frame”) into its equivalent, non-abstracted, hardware code, and runs it with a good-enough optimization on most hardware to deliver those awesome (insert your favorite game here) frames. To elaborate a little: imagine you have a command called “Stack”. On a high-level API like DX11, this command will be interpreted and the values for its inner workings will be filled in automatically, based on general hardware compatibility: how many levels to stack, when to stack them, and when to stop the operation. But since these values aren’t optimized for your particular hardware, it will end up using somewhat of a brute-force approach. With a low-level API, developers can now set the exact values for the “Stack” command’s inner workings, optimized for your hardware, so it never goes out of budget, and none of those sexy stream processors are left idle.
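The “Stack” example can be sketched in a few lines of Python. This is a toy model under loud assumptions: “Stack” is the article’s imaginary command, and the parameter names, the default values and the 64-unit GPU are all invented here. It simply contrasts generic, one-size-fits-all defaults with values the developer sets for a known piece of hardware.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StackParams:
    levels: int      # how many levels to stack (hypothetical parameter)
    batch_size: int  # how many processing units get work per pass (hypothetical)

# Conservative values a high-level API might pick so that "Stack" runs
# acceptably on any hardware. The numbers are invented for illustration.
GENERIC_DEFAULTS = StackParams(levels=4, batch_size=16)

def stack_high_level() -> StackParams:
    """High-level path: the API fills in the inner workings for you."""
    return GENERIC_DEFAULTS

def stack_low_level(levels: int, batch_size: int) -> StackParams:
    """Low-level path: the developer supplies hardware-tuned values."""
    if levels <= 0 or batch_size <= 0:
        raise ValueError("levels and batch_size must be positive")
    return StackParams(levels=levels, batch_size=batch_size)

def idle_units(params: StackParams, total_units: int) -> int:
    """Processing units that receive no work in a pass, in this toy model."""
    return max(total_units - params.batch_size, 0)

if __name__ == "__main__":
    total_units = 64  # imaginary GPU with 64 processing units
    print(idle_units(stack_high_level(), total_units))       # prints 48
    print(idle_units(stack_low_level(4, 64), total_units))   # prints 0
```

The price of the second call is visible in its signature: the developer now has to know that this particular GPU has 64 units, and has to know it again for every other GPU they care about.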
The problem with the former, high-level approach, of course, is that generalizations and simplifications aren’t as efficient as running an optimized, hardware-specific code path, and the high-level API may sometimes even deny access to hardware features it simply doesn’t support. The thing with DX12 and Vulkan’s low-level capabilities is that with them, in specific scenarios, developers can mostly ignore abstraction layers (some compiler checks are still used to make sure the code is within expected parameters). This allows them to write code that takes advantage of hardware-specific features, sometimes accelerating workloads by up to 2x compared to the high-level approach. This is the basic principle of low-level APIs, and something that is enabled, at least partially, by shader intrinsic functions.
Going back to the different layers on a system, imagine, for argument’s sake, that it takes 5 ms for a task to compute and go through each of the layers until it is executed by the hardware. In the example image given above, that would mean 5 × 5 ms = 25 ms. And now imagine you can effectively avoid going through all those figurative hoops, going straight from the app’s hardware processing requirements to the hardware. With only those two steps left, at 5 ms each, you have now reduced your 25 ms computation to a mere 10 ms, which frees up computation time for other tasks. This is what shader intrinsic functions really are: pieces of code that, when recognized by the low-level API, are allowed to move directly towards the hardware, bypassing other, time-consuming layers.
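The arithmetic in this thought experiment can be written out as a short Python sketch. The flat 5 ms per layer comes from the example above; the layer names are placeholders assumed here purely for illustration, since only the count of layers matters to the result.

```python
from typing import Sequence

LAYER_COST_MS = 5.0  # flat, made-up cost per layer, as in the thought experiment

# Placeholder layer names (assumed for illustration; only the count matters).
FULL_PATH = ["application", "high-level API", "runtime", "driver", "hardware"]
DIRECT_PATH = ["application", "hardware"]

def path_latency_ms(layers: Sequence[str],
                    cost_per_layer_ms: float = LAYER_COST_MS) -> float:
    """Total time for a task to pass through every layer on its path."""
    return len(layers) * cost_per_layer_ms

if __name__ == "__main__":
    full = path_latency_ms(FULL_PATH)      # 5 layers x 5 ms
    direct = path_latency_ms(DIRECT_PATH)  # 2 layers x 5 ms
    print(full, direct, full - direct)     # prints 25.0 10.0 15.0
```

In this toy model the direct path saves 15 ms per task. Real layers do not cost a uniform amount, so the figures only illustrate the principle.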
The problem with this approach must seem obvious to you by now: while abstraction layers do add overhead to any given computing task, they do so while simplifying, sometimes by orders of magnitude, the coding process. The greatest strength of closer-to-the-metal programming is also its greatest flaw: the ability to directly leverage hardware resources requires specific, time-consuming programming for functions that were largely automatic before. This not only means more developer resources, but also a system that is more prone to errors, and debugging five lines of code is very different from debugging fifty. One must also keep in mind that closer-to-the-metal programming, by virtue of specifically targeting only a subset of existing hardware, ends up leaving behind users of older, unsupported hardware.
AMD’s specific application of shader intrinsic functions in low-level graphics APIs such as Vulkan and DX12 stems from AMD’s grasp on the console market (with AMD hardware powering all three current-generation game consoles), as well as their previous work on Mantle, which went on to form the basis of today’s Vulkan API, and arguably gave Microsoft the push it needed to include low-level access in DX12. This means that programmers are already leveraging optimized, feature-specific code paths in their console game implementations, which, in turn, leads to AMD wanting to give them access to those same features on the PC hardware that supports them, reaping the benefits of hardware-specific optimizations for their GCN architecture. That said, this doesn’t mean NVIDIA doesn’t have their own shader intrinsic functions that developers can take advantage of: through their GameWorks initiative, NVIDIA allows programmers to add extensions not natively supported by DX’s HLSL (High Level Shading Language), while also allowing shader intrinsic functions to be leveraged as part of their CUDA ecosystem. An important distinction between the two companies’ approaches is that while NVIDIA requires developers to use their specific GameWorks libraries (which are proprietary, and not accessible on AMD’s cards), AMD’s approach is more open, being accessible through open initiatives and standards such as GPUOpen and Vulkan.
Shader intrinsics are just a part of what a low-level API needs to be, and aren’t particularly game-changing in and of themselves. They will also never be at their best on PC hardware, simply because of how the ecosystem is fractured by the sheer number of possible systems, up to date or not. The best part of PC gaming is also, in this case and at this point in time, its greatest drawback towards obtaining perfect performance from any given system. But shader intrinsics are indeed a step forward towards giving developers more control over the features they implement and how they are run, and stand side by side with other technologies which will, in time, steer us towards ever more performant systems.