UE4 Performance Indicators and Solutions
This doc outlines the primary performance indicators in Unreal Engine, how to measure them, and suggestions for addressing performance problems in each area.
Reduce Draw Calls
Draw calls are typically one of the most expensive parts of rendering a complex scene. Reducing them should be among the first things you look into.
How to Measure
Use the console command stat RHI and check “DrawPrimitive calls” under the “Counters” category.
Use the console command stat scenerendering and check “Mesh Draw calls” under the “Counters” category. Unlike the “DrawPrimitive calls” counter, this counter will only show draw calls from meshes.
Hierarchical Instanced Static Meshes
Meshes rendered as part of an instanced static mesh component (ISMC) or hierarchical instanced static mesh component (HISMC) share draw calls. The difference between the two is that an HISMC can render different LODs for its instances at once, whereas all rendered instances of an ISMC share the same LOD. For this reason, HISMC should be preferred, but either one reduces draw calls.
Note: since UE 4.22, with some limitations, the engine can automatically merge identical static meshes with the same materials and material uniform parameters into shared draw calls, similar to instanced meshes. However, this automation is limited to the DX11 feature set, so other targets such as DX12, or feature sets earlier than DX11, don’t receive this benefit (read more under “Draw Call Merging” in the UE4 docs: Mesh Drawing Pipeline). For this reason, I still recommend using instanced static mesh components explicitly.
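As a rough illustration of why instancing helps, the sketch below compares draw call counts with and without instancing. It is a simplified model, not engine code: the MeshPlacement type and both functions are hypothetical, and it assumes one draw call per material slot while ignoring LODs, culling, and extra render passes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one mesh placed in a level.
struct MeshPlacement {
    std::string meshAsset;  // which mesh asset is used
    int materialSlots;      // material slots on that asset
};

// Without instancing: every placed mesh pays for each of its material slots.
int drawCallsWithoutInstancing(const std::vector<MeshPlacement>& placements) {
    int total = 0;
    for (const MeshPlacement& placement : placements) {
        total += placement.materialSlots;
    }
    return total;
}

// With instancing (simplified): each unique mesh asset pays for its slots once,
// no matter how many instances of it are placed.
int drawCallsWithInstancing(const std::vector<MeshPlacement>& placements) {
    std::map<std::string, int> slotsPerAsset;
    for (const MeshPlacement& placement : placements) {
        slotsPerAsset[placement.meshAsset] = placement.materialSlots;
    }
    int total = 0;
    for (const auto& entry : slotsPerAsset) {
        total += entry.second;
    }
    return total;
}
```

Under these assumptions, 500 one-slot rocks and 200 two-slot benches drop from 900 draw calls to 3. Real numbers will be higher because an HISMC can issue separate draws per LOD.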
Foliage Paint Assets Where Possible
If you place static meshes using the foliage painter, each type of painted mesh is rendered much as though it were part of a hierarchical instanced static mesh component (HISMC). This is simpler than managing separate Actors that add and remove themselves from a shared HISMC as needed. However, with this option each placed static mesh is not its own Actor, which reduces simulation/interaction options for gameplay purposes. Additionally, adding or removing instances through the foliage painter is only available at editor time, not at runtime. As a result, prefer this method when placing many common static meshes for set-dressing that do not need any C++/BP functionality built into them.
Reduce Material Slots for Meshes
Each material slot on a mesh increases its draw call count. For clarity, material slots are different from texture uniform references: material slots are where you assign material instances. The material slot count is determined on the artist’s side, when they design the mesh and decide which parts of it will use different materials. While you can’t change this in-engine, you can check your mesh assets in-engine for excessive material slots and work with your artists to reduce them. The most general rule of thumb is that the simpler the object, the fewer material slots it should have. For example, a rock should have one material slot, and a wood + metal bench could have two (one shared by the wood parts and one shared by the metal parts). An excessive setup would be one where each piece of wood or metal on the bench had its own material slot, resulting in a total of 8 material slots.
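The bench example can be expressed as a small check: the number of slots a mesh needs is the number of distinct materials its pieces use. This is only a sketch; MeshPiece and minimumMaterialSlots are hypothetical names for illustration, not part of any engine or DCC tool API.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical mesh piece, tagged with the material it is meant to use.
struct MeshPiece {
    std::string name;
    std::string material;
};

// Slots needed when every piece that uses the same material shares one slot.
int minimumMaterialSlots(const std::vector<MeshPiece>& pieces) {
    std::set<std::string> uniqueMaterials;
    for (const MeshPiece& piece : pieces) {
        uniqueMaterials.insert(piece.material);
    }
    return static_cast<int>(uniqueMaterials.size());
}
```

A bench made of 8 pieces that use only wood and metal needs 2 slots. If the asset reports 8, it is a candidate to send back to the artist.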
Actor Merging
Unreal is capable of merging multiple selected actors within a level into a single new actor. The main benefit of this feature is reducing draw calls. See the documentation here: UE4 docs: Actor Merging. Actor merging can be carried out in a couple of ways. The first way effectively creates a new mesh (with new UVs) and can even produce a single material that combines the materials of all the selected actors. This allows you to directly convert the multiple draw calls caused by multiple actors into a single draw call.
The second way merges multiple actors into one instanced mesh component. This is helpful for cases where you have many copies of the same static mesh actor.
Reduce Rendered Tri Count
How to Measure
Use the console command stat RHI and check “Triangles drawn” under the “Counters” category.
Creating Mesh LODs
Creating different mesh LODs (levels of detail) will cause the mesh to render with reduced geometry complexity depending on its size as a percentage of the screen. Unreal can automatically generate LODs for your meshes with some simple configuration within the mesh’s asset. This is one of the simplest ways to reduce the number of rendered triangles.
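To make the screen-size idea concrete, here is a minimal sketch of LOD selection by screen size. It is not the engine's implementation: selectLod and the threshold values in the usage note are illustrative assumptions, and the real selection is also affected by project and asset settings.

```cpp
#include <cstddef>
#include <vector>

// screenSizeThresholds[i] is the largest screen size (fraction of the screen,
// 0..1) at which LOD i is used. Values must be sorted from largest (LOD 0)
// to smallest (the lowest-detail LOD).
std::size_t selectLod(const std::vector<float>& screenSizeThresholds,
                      float screenSize) {
    std::size_t lod = 0;  // default to the most detailed LOD
    for (std::size_t i = 0; i < screenSizeThresholds.size(); ++i) {
        if (screenSize <= screenSizeThresholds[i]) {
            lod = i;  // small enough on screen to qualify for a coarser LOD
        }
    }
    return lod;
}
```

With hypothetical thresholds of 1.0, 0.5, 0.25, and 0.1, a mesh covering 70% of the screen uses LOD 0, one covering 30% uses LOD 1, and one covering 5% uses LOD 3.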
Cull Distance Volumes
Cull distance volumes allow fine-grained control over the visibility culling of any actors within the volume, based on their size and distance from the camera. The reduction in drawn triangles you get from this depends on the use case. They are especially good for wide open environments with a range of object sizes.
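The size-and-distance rule can be sketched as a lookup over size/distance pairs. This is a simplified model based on my understanding of how the volume behaves (the entry with the closest size wins, and a cull distance of 0 means never cull). The types and function names are hypothetical, and the numbers in the usage note are examples only.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical size/distance entry configured on a cull distance volume.
struct CullDistancePair {
    float size;          // reference object size (bounds diameter)
    float cullDistance;  // distance beyond which to cull; 0 means never cull
};

// Pick the cull distance of the entry whose size is closest to the object's.
float cullDistanceFor(const std::vector<CullDistancePair>& pairs,
                      float objectDiameter) {
    float bestDistance = 0.0f;  // with no entries, never cull
    float bestDelta = std::numeric_limits<float>::max();
    for (const CullDistancePair& pair : pairs) {
        const float delta = std::fabs(pair.size - objectDiameter);
        if (delta < bestDelta) {
            bestDelta = delta;
            bestDistance = pair.cullDistance;
        }
    }
    return bestDistance;
}

// An object is culled when it is farther away than its assigned cull distance.
bool isCulled(const std::vector<CullDistancePair>& pairs,
              float objectDiameter, float distanceToCamera) {
    const float cullDistance = cullDistanceFor(pairs, objectDiameter);
    return cullDistance > 0.0f && distanceToCamera > cullDistance;
}
```

With example pairs (50, 1000), (500, 5000), and (2000, 0), a small prop of diameter 60 is culled beyond 1000 units, while a building of diameter 1800 is never culled.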
Reduce Overdraw
Overdraw describes how many times each pixel of the framebuffer is written to within a single frame. When rendering a scene with opaque materials, each pixel is usually written to once. Pixels end up being written to many more times when translucent or additive materials overlap. For example, many overlapping fog particles cause the covered pixels to be written to many times, which is measured as a high amount of overdraw. Overdraw is most commonly a problem with foliage (which uses the foliage card technique, in which a foliage texture is applied to an alpha-translucent material) and particle systems.
Use Overdraw View Mode to Identify
Reducing Overdraw
The method to reduce overdraw is highly dependent on the specific cause of overdraw (for example, foliage, particle systems, or something else). This example looks at particle systems.
Shown above is a sample particle system producing a large amount of overdraw with its additive particles. A common cause of frame rate dips is spawning particle systems that produce a lot of quad overdraw. Quad overdraw is difficult to avoid for some kinds of particle systems, and some amount of it is acceptable if it’s temporary. To reduce it, make an effort to reduce the number of particles these systems need at once. Sometimes this can be achieved without much change to the visuals by reducing the particle count and then increasing the opacity of each particle to compensate.
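The trade of fewer particles for higher opacity can be estimated numerically. The sketch below assumes standard alpha blending of identical, fully overlapping layers, which is an idealization; for purely additive particles the compensation is closer to linear (brightness scales with count). Both function names are hypothetical.

```cpp
#include <cmath>

// Combined opacity of `layers` identical alpha-blended layers, each with
// per-layer opacity `alpha` in the range 0..1.
double stackedOpacity(int layers, double alpha) {
    return 1.0 - std::pow(1.0 - alpha, layers);
}

// Per-particle opacity that keeps the stacked opacity unchanged when the
// number of overlapping layers is reduced from oldLayers to newLayers.
double compensatedAlpha(int oldLayers, double oldAlpha, int newLayers) {
    const double exponent = static_cast<double>(oldLayers) / newLayers;
    return 1.0 - std::pow(1.0 - oldAlpha, exponent);
}
```

For example, halving 4 overlapping layers at 0.5 opacity to 2 layers calls for an opacity of 0.75 per layer to keep the same combined coverage, while halving the overdraw for those pixels.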
Reduce Lighting Complexity
Lighting complexity increases as dynamic light sources overlap. This is a significant source of performance cost from dynamic lights. Below is a sample scene with 3 point lights, each with a different attenuation radius. We’ll then look at it with the Light Complexity view mode.
Reducing Lighting Complexity - Attenuation and Avoiding Overlap
The only difference in settings between these lights is attenuation; their intensity is the same. The key takeaway is that a light’s attenuation setting determines the radius in which it imposes a performance cost. Light intensity does not have an effect on this. As a result, make sure each dynamic light’s attenuation is proportional to how intense the light is and how much area it should visibly affect. As dynamic light sources move around, a smaller attenuation radius reduces how much they overlap on average. You can see that the complexity is worst where the 3 lights overlap (the purple region in the center). In environments with dynamic lights, spacing them out to reduce overlap between their attenuation radii will reduce light complexity. Consider removing dynamic lights outright in cases with excessive light complexity.
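The relationship between attenuation radius and overlap can be sketched as a count of how many lights reach a given point. This is a simplified stand-in for what the Light Complexity view visualizes, not engine code; Vec3, PointLight, and countLightsAffecting are hypothetical names.

```cpp
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical dynamic point light; intensity is deliberately absent because
// it does not change the region in which the light costs performance.
struct PointLight {
    Vec3 position;
    float attenuationRadius;
};

// Number of lights whose attenuation sphere contains the given point.
int countLightsAffecting(const Vec3& point, const std::vector<PointLight>& lights) {
    int count = 0;
    for (const PointLight& light : lights) {
        const float dx = point.x - light.position.x;
        const float dy = point.y - light.position.y;
        const float dz = point.z - light.position.z;
        const float distanceSquared = dx * dx + dy * dy + dz * dz;
        const float radius = light.attenuationRadius;
        if (distanceSquared <= radius * radius) {
            ++count;
        }
    }
    return count;
}
```

Shrinking the radii of the same lights, without moving them or changing intensity, lowers the count at points between them.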
Reduce Skeletal Mesh Animation Costs
Skeletal meshes contribute a significant amount to game-thread costs because they must tick and update on the CPU, on top of having to be rendered like a typical mesh on the render-thread. The methods outlined here are about measuring and reducing their performance impact on the game-thread (CPU).
Measuring Skeletal Mesh Costs
To quickly see how much skeletal meshes are contributing to game-thread costs, use the console command stat anim. From here, you can see how long your skeletal meshes take to Tick each frame.
Budgeted Skeletons
The Animation Budget Allocator (ABA) allows you to define a collective game-thread cost limit for the skeletons that participate in it. Skeletal mesh components opt in to the ABA system by being replaced with the type “Skeletal Mesh Component Budgeted”. The ABA dynamically reduces animation quality for budgeted skeletons based on current animation performance, and the way it reduces quality is configurable: for example, it can reduce component tick rate and/or remove pose interpolation. It applies these optimizations based on the calculated “significance” of each skeletal mesh. For example, skeletal meshes that are further from the camera can have lower significance, and thus be subject to much more optimization than skeletons close to the camera. With the C++ API for the ABA, you can define a custom significance function to suit your use case’s requirements.
The ABA should be the first optimization effort for typical uses of skeletal meshes, especially given how flexibly you can tune how aggressively it optimizes.
Read more about animation budgeting in the UE4 docs: Animation Budget Allocator
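The idea of significance-driven budgeting can be illustrated with an engine-independent sketch. This is not the ABA API: BudgetedSkeleton, significance, and applyBudget are hypothetical, and the distance-based significance is just one possible choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical budgeted skeleton with the update rate assigned to it.
struct BudgetedSkeleton {
    float distanceToCamera;
    int tickEveryNFrames;  // 1 = full rate, N = update once every N frames
};

// Example significance function: closer to the camera means more significant.
float significance(const BudgetedSkeleton& skeleton) {
    return 1.0f / (1.0f + skeleton.distanceToCamera);
}

// The `fullRateCount` most significant skeletons update every frame; all
// others are throttled to once every `reducedRate` frames.
void applyBudget(std::vector<BudgetedSkeleton>& skeletons,
                 std::size_t fullRateCount, int reducedRate) {
    std::sort(skeletons.begin(), skeletons.end(),
              [](const BudgetedSkeleton& a, const BudgetedSkeleton& b) {
                  return significance(a) > significance(b);
              });
    for (std::size_t i = 0; i < skeletons.size(); ++i) {
        skeletons[i].tickEveryNFrames = (i < fullRateCount) ? 1 : reducedRate;
    }
}
```

A custom significance function could also weigh on-screen size or gameplay importance; the sorting and throttling steps would stay the same.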
Vertex Animation Textures
Vertex Animation Textures (VATs) are textures into which all necessary data from a skeletal mesh’s animations is baked. This can include skeletal mesh LODs as well. The VATs are then rendered through static meshes, resulting in a significant performance improvement over skeletal meshes. The downsides of this method are that animation events and any tech achieved through animation graphs (such as anim blending) are not available. The robustness of the feature set depends on how the VAT baking and the material functions that use it are implemented. There are a few plugins which implement them in a feature-rich way. This is one such plugin: Marketplace: Vertex Animation Toolset. VATs are ideal replacements for skeletal meshes on animated actors which play a small/decorative role in gameplay and therefore don’t need the full feature set of a skeletal mesh, such as environment creatures like birds or frogs. VATs are practically mandatory when you need to render an extremely large number of animated entities (5000+) of any complexity, such as in crowd simulation, in tandem with Hierarchical Instanced Static Mesh components.
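To show how animation data can be addressed in a texture, the sketch below assumes one common layout: one texel per vertex per frame, with vertices along the columns and frames along the rows. This layout and the names VatLayout, vatTexelUv, and frameAtTime are assumptions for illustration; actual VAT tools and plugins may pack data differently.

```cpp
// Assumed texture layout: width = vertexCount, height = frameCount.
struct VatLayout {
    int vertexCount;
    int frameCount;
};

struct TexelUv {
    double u;
    double v;
};

// UV of the texel center holding the data for a given vertex and frame.
TexelUv vatTexelUv(const VatLayout& layout, int vertexIndex, int frameIndex) {
    return {(vertexIndex + 0.5) / layout.vertexCount,
            (frameIndex + 0.5) / layout.frameCount};
}

// Frame to sample for a looping animation at the given playback time.
int frameAtTime(const VatLayout& layout, double timeSeconds, double framesPerSecond) {
    const long long frame = static_cast<long long>(timeSeconds * framesPerSecond);
    return static_cast<int>(frame % layout.frameCount);
}
```

In a material, the equivalent lookup would run per vertex on the GPU, which is why no skeleton needs to tick on the game thread.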
Usage of Tick()
Many gameplay features can be implemented as part of an Actor’s Tick(), but Tick is the most expensive option because it runs every single frame by default. A common cause of performance issues I’ve seen in several medium- to large-scope games is that most actors implement Tick(). While a single tick only takes a fraction of a millisecond, if nearly all actors in the game use Tick(), those fractions of a millisecond add up to a lot, sometimes most of the frame budget. As a result, overuse of ticking is a death by a thousand cuts.
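The "thousand cuts" arithmetic can be made explicit with a small estimate. The function name is hypothetical and the numbers in the usage note are illustrative, not measurements.

```cpp
// Fraction of the per-frame time budget consumed by actor ticks.
// A result of 1.0 means ticking alone uses the whole frame.
double tickShareOfFrame(int tickingActors, double msPerTick, double targetFps) {
    const double frameBudgetMs = 1000.0 / targetFps;
    return (tickingActors * msPerTick) / frameBudgetMs;
}
```

For instance, 2000 actors at 0.005 ms per tick total 10 ms, which is about 60% of the 16.7 ms budget of a 60 fps frame.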
Consider these alternatives instead of using Tick. Tick should be the last resort for implementing functionality.
- Event-driven systems (in C++, use Delegates; in Blueprints, use Event Dispatchers)
- Timers
- Actor Timelines
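As an illustration of the event-driven alternative, the sketch below uses standard C++ as a stand-in for an engine delegate or event dispatcher. HealthChangedEvent and Character are hypothetical classes; in a real project you would use the engine's own delegate types instead. The point is that listeners run only when something changes, instead of polling every frame in Tick().

```cpp
#include <functional>
#include <utility>
#include <vector>

// Minimal stand-in for a multicast delegate / event dispatcher.
class HealthChangedEvent {
public:
    using Listener = std::function<void(int)>;

    void subscribe(Listener listener) {
        listeners_.push_back(std::move(listener));
    }

    void broadcast(int newHealth) const {
        for (const Listener& listener : listeners_) {
            listener(newHealth);
        }
    }

private:
    std::vector<Listener> listeners_;
};

// Hypothetical gameplay class that notifies listeners only on change.
class Character {
public:
    HealthChangedEvent onHealthChanged;

    void applyDamage(int amount) {
        health_ -= amount;
        onHealthChanged.broadcast(health_);
    }

private:
    int health_ = 100;
};
```

A health bar subscribed to onHealthChanged updates once per damage event, and costs nothing on frames where health does not change.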
If Tick still seems like the best way to implement a feature, consider setting a Tick Interval on the Actor so that it ticks less often than once per frame. Ticking once per frame (60 times a second at 60 fps) is usually excessive; sometimes, for example, 5 times a second is enough. Ticking 5 times per second costs one-twelfth as much as ticking 60 times per second.
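The savings from a tick interval follow directly from how many ticks happen per second. The sketch below is engine-independent and the function name is hypothetical. In UE4 C++ the interval itself is typically set on the actor (for example with AActor::SetActorTickInterval), where a value of 0 means ticking every frame.

```cpp
// Ticks per second for a given frame rate and tick interval in seconds.
// An interval of 0 (or less) means the actor ticks every frame. An actor
// can never tick more often than once per frame.
double ticksPerSecond(double framesPerSecond, double tickIntervalSeconds) {
    if (tickIntervalSeconds <= 0.0) {
        return framesPerSecond;
    }
    const double intervalRate = 1.0 / tickIntervalSeconds;
    return intervalRate < framesPerSecond ? intervalRate : framesPerSecond;
}
```

At 60 fps, an interval of 0.2 seconds gives 5 ticks per second instead of 60, a 12x reduction in how often the tick body runs.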
Checking for Prevalence of Tick()
During runtime, use the console commands stat startfile and stat stopfile to begin and end a performance profiling session. This creates a UE stats file which you can load in the profiler GUI. One place to open this data is the “Session Frontend” window, accessed from Window -> Developer Tools in the toolbar. See this page for reference on using the profiler: UE4 doc: Profiler Tool Reference. Once the profiler data is loaded in the GUI, inspect it to see how much of your cost collectively comes from the Tick() implementations across your classes. For most games, ticking should be a minority of game-thread costs.
Reduce Shader Complexity
Shader complexity is defined as the number of instructions a shader/material needs to execute. The more instructions, the higher the complexity, and the more expensive it is to render. Most materials in a game typically have low complexity, but sometimes high-complexity materials crop up, or many low-complexity shaders overlap due to translucency, adding up to high shader complexity for the covered pixels.
Identifying Shader Complexity Issues
Switch to the shader complexity view mode (hotkey Alt + 8).
Above are examples of two cases where shader complexity is high: an opaque cube with a high-complexity material (467 instructions), and a particle system with many overlapping low-complexity translucent materials (60 instructions each). You can read more about the view mode here: UE4 docs: Shader complexity view mode
Reducing Shader Complexity
For the particle system, the heat map looks similar to the quad overdraw view mode. Its shader complexity is so high specifically because of the amount of quad overdraw (overlapping particle billboards, in this case). Optimizing the particle material is one angle for improving the shader complexity of the particle system; reducing the amount of quad overdraw is another (see the section on Overdraw in this doc).
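The way overlapping cheap materials can cost more than one expensive material can be sketched as a per-pixel sum. This is a simplified model (it treats instruction count as cost and ignores everything else); the function name is hypothetical, the instruction counts come from the example above, and the layer count in the usage note is an assumption.

```cpp
#include <vector>

// Total instructions executed for one pixel, given the instruction count of
// every material layer shaded at that pixel.
int pixelInstructionCost(const std::vector<int>& layerInstructionCounts) {
    int total = 0;
    for (int instructions : layerInstructionCounts) {
        total += instructions;
    }
    return total;
}
```

A pixel covered only by the 467-instruction opaque cube costs 467 instructions, while a pixel covered by 10 overlapping 60-instruction particles costs 600.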
Reducing shader complexity for opaque materials, as with the example cube above, requires modifying the material. How to do this depends on why the material is so expensive. You can experiment with removing or simplifying parts of the material, recompiling it, and checking the instruction count in the material’s output stats window to see how much it has changed.
Commonly problematic content to check for high shader complexity in a level includes grass foliage and situations where many particle systems layer on top of each other, such as during combat or when environmental VFX are in play, like in a rainy environment.