Saturday, October 12, 2013

Quality Test - GPU-accelerated H.264/AVC Game Recording with CUDA (with Screenshot Comparisons and Examples)

To perform game recording solely on the videocard, utilizing the high resources of today's little powerhouses sitting in our cases - resulting in high quality video and less lag while recording - sounds like manna from heaven. You might be saying, "Wait, the possibility of a lower performance hit and the possibility of a high-quality recording? If this is true, please stop teasing and just tell me now." Well, dear reader, let me tell you that these things are indeed true [for the most part].

Beginning with version 1.9.0, Bandicam (a game recording application) included support for utilizing NVIDIA's CUDA for recording your gameplay, touting high speed (less 'lag'), high compression ratio (smaller file sizes) and high quality. All you needed was a CUDA-capable NVIDIA videocard, the latest videocard drivers from NVIDIA, and Bandicam. In the most recent version of Bandicam (1.9.1) they have also included support for AMD's APP ACCELERATION and INTEL's QUICK SYNC. 

For this QualityTest, I took an NVIDIA GPU and put it through the paces of some game recording with the accelerated H.264/AVC codec (as opposed to the CPU-based x264 for example), using Bandicam [which at the time of this writing, to my knowledge, was the only game recording app that could utilize accelerated/gpu-based AVC recording] to test things out. Here are my findings, shared just for you.


An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Unigine Heaven Benchmark Test @ 1080p).


Overall, recording with the GPU gave good performance (low performance hit / lag), fairly small file size output, with decent quality [surprising to anyone that has compressed video with CUDA in the past, I know, more info below]. It was only with certain games that the performance suffered (although this may be more the fault of optimization with the game's engine and not NVIDIA's programming) and it was slightly disappointing that the performance wasn't "that much faster" than other codecs [more info at the end about that]. For the most part however, game recording "live" with the GPU and directly encoding to a file using a compatible videocard is indeed nice and fast and produces comfortably-small footprint file sizes.



What file sizes are we talking about here? 



I haven't looked at the specs for the codec and how it's utilized, but from what I can quickly see on the surface, it is using the H.264/AVC codec with my NVIDIA GPU, with Deblocking enabled and with a keyframe/I-frame inserted every 5 seconds. That doesn't leave 'a lot' of headroom for compression (x264 AVC usually defaults to 250-300 frames between keyframes which, at 30 frames per second, is more like 8 to 10 seconds to 'play with' for compression); but it still gave a nice small filesize when recording with VBR (Variable Bit Rate), where it compressed slower-motion areas and scenes harder when it could and increased the bitrate (to try to keep apparent quality) when needed in faster/high-motion sections.
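To put the keyframe numbers above side by side, here is a minimal sketch of the arithmetic. The 30 frames per second figure is the one used in the comparison above; applying it to the CUDA recording as well is my assumption, and the function names are made up for this illustration.

```python
def keyframe_interval_seconds(frames_between_keyframes: int, fps: float) -> float:
    """Seconds of video between two consecutive keyframes."""
    return frames_between_keyframes / fps


def keyframe_interval_frames(seconds: float, fps: float) -> int:
    """Number of frames between two keyframes placed `seconds` apart."""
    return round(seconds * fps)


FPS = 30  # assumed recording frame rate

# x264-style intervals expressed in seconds
for frames in (250, 300):
    print(frames, "frames =", round(keyframe_interval_seconds(frames, FPS), 1), "seconds")

# The 5-second interval observed in the CUDA recording, expressed in frames
print("5 seconds =", keyframe_interval_frames(5, FPS), "frames")
```

At 30 frames per second, a keyframe every 5 seconds works out to 150 frames, roughly half of the 250-300 frame window that x264 typically gets to work with.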


An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Battlefield 3 @ 1080p).


Just below, you will find some data samples of game recordings that were done at incremental quality settings, showing the average bitrate of each recording overall and the size of the file produced as output.
[(i) All game samples were recorded with accelerated/GPU-based H.264/AVC generated by NVIDIA's CUDA at 1080p unless otherwise indicated. (ii) Although test recordings varied in length between games, all 4 tests of the same game title were done with the same elapsed time.]


Recorded Game Title               Quality setting  Average bitrate  File size
Batman: Arkham City               Quality 100      (~75000 kbps)    753 MB
Batman: Arkham City               Quality 70       (~22000 kbps)    224 MB
Batman: Arkham City               Quality 50       (~10000 kbps)    104 MB
Batman: Arkham City               Quality 20       (~5000 kbps)     59 MB
Unigine Heaven Benchmark          Quality 100      (~120000 kbps)   1390 MB
Unigine Heaven Benchmark          Quality 70       (~48000 kbps)    551 MB
Unigine Heaven Benchmark          Quality 50       (~24000 kbps)    276 MB
Unigine Heaven Benchmark          Quality 20       (~11000 kbps)    140 MB
Diablo III                        Quality 100      (~22000 kbps)    44 MB
Diablo III                        Quality 70       (~9000 kbps)     20 MB
Diablo III                        Quality 50       (~5000 kbps)     12 MB
Diablo III                        Quality 20       (~4000 kbps)     9 MB
Alien Versus Predator (PC-2010)   Quality 100      (~110000 kbps)   380 MB
Alien Versus Predator (PC-2010)   Quality 70       (~37000 kbps)    126 MB
Alien Versus Predator (PC-2010)   Quality 50       (~18000 kbps)    66 MB
Alien Versus Predator (PC-2010)   Quality 20       (~8000 kbps)     28 MB
Minecraft (Standalone)            Quality 100      (~47000 kbps)    44 MB
Minecraft (Standalone)            Quality 70       (~27000 kbps)    25 MB
Minecraft (Standalone)            Quality 50       (~13000 kbps)    12 MB
Minecraft (Standalone)            Quality 20       (~7600 kbps)     7 MB
Lord Of Ultima (Browser Game)     Quality 100      (~24000 kbps)    59 MB
Lord Of Ultima (Browser Game)     Quality 70       -                -
Lord Of Ultima (Browser Game)     Quality 50       (~7000 kbps)     20 MB
Lord Of Ultima (Browser Game)     Quality 20       (~3000 kbps)     11 MB
Lottso Express (Browser, 384p)    Quality 100      (~870 kbps)      10 MB
Lottso Express (Browser, 384p)    Quality 70       (~320 kbps)      7 MB
Lottso Express (Browser, 384p)    Quality 50       (~225 kbps)      6 MB
Lottso Express (Browser, 384p)    Quality 20       (~164 kbps)      5 MB

As you can see in the table above, due to the nature of Variable Bit Rate recording, setting a quality figure of "50" does not produce 'exactly one-half' of the bitrate or filesize of a "100" quality setting. The codec is adjusting as needed and where required, allocating more bitrate to complex areas and changes between frames, to help keep "apparent quality" near the desired estimate. Configuring the Quality setting of "100" is essentially telling the encoder to keep as much detail as possible - but it will only do so within the limitations of the codec being used (in this case, AVC) and how it is configured internally (the calculation time allowed per frame, the buffer allowed, etc), depending on how the developers have programmed it to encode.
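For anyone who wants to check the 'not exactly one-half' point against the numbers, here is a small sketch that expresses each Quality setting as a fraction of the Quality 100 result. The figures are copied from two of the games in the table; the variable and function names are just made up for this illustration.

```python
# (quality setting, approx. average bitrate in kbps, file size in MB), from the table
samples = {
    "Batman: Arkham City": [
        (100, 75000, 753), (70, 22000, 224), (50, 10000, 104), (20, 5000, 59),
    ],
    "Unigine Heaven Benchmark": [
        (100, 120000, 1390), (70, 48000, 551), (50, 24000, 276), (20, 11000, 140),
    ],
}


def relative_to_full(rows):
    """Return {quality: (bitrate ratio, size ratio)} relative to the Quality 100 row."""
    full = next(row for row in rows if row[0] == 100)
    return {q: (kbps / full[1], mb / full[2]) for q, kbps, mb in rows}


for game, rows in samples.items():
    for quality, (br_ratio, size_ratio) in relative_to_full(rows).items():
        print(f"{game:26s} Q{quality:<3d} bitrate {br_ratio:5.0%}  size {size_ratio:5.0%}")
```

For these two games, Quality 50 lands at roughly 13% and 20% of the Quality 100 bitrate, nowhere near one-half, and the fraction differs from game to game.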


An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Minecraft @ 1080p).


Overall, it seems that with CUDA, the file sizes are kept comfortably small - especially compared to a FRAPS or DXTORY codec recording. The bitrate doesn't stray too far from a 100,000kbps maximum (about 12.5 MB per second of video data) no matter how much action is going on in the game at the time. At that bitrate, a half hour of straight recording would take up only about 22GB. Not too bad, especially if you are creating long recordings or can't afford that 4TB drive upgrade just yet. A Blu-Ray disc title typically uses a bitrate of 25,000-50,000kbps, so at its peak a GPU-based/accelerated recording is allowing for two to four times that bitrate to try to represent what is happening on the screen in viewable quality.
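The storage figures above come from simple unit conversion, sketched below. I am assuming decimal units (1 MB = 1,000,000 bytes, 1 GB = 1000 MB) and counting video data only, with no audio or container overhead; the function names are made up for this illustration.

```python
def megabytes_per_second(kbps: float) -> float:
    """Convert a bitrate in kilobits per second to megabytes per second."""
    bits_per_second = kbps * 1000
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / 1_000_000


def gigabytes_for(kbps: float, seconds: float) -> float:
    """Approximate size in GB of `seconds` of video at a steady `kbps`."""
    return megabytes_per_second(kbps) * seconds / 1000


print(megabytes_per_second(100_000))      # 12.5 MB per second
print(gigabytes_for(100_000, 30 * 60))    # 22.5 GB for half an hour
print(gigabytes_for(100_000, 60 * 60))    # 45.0 GB for a full hour
```

Since a VBR recording only touches its peak bitrate in busy scenes, real files should usually come in under these figures.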



So then, what is the quality like?



Normally, if you ask anyone about quality who has used GPU-acceleration together with words like "video editing" and "compression", they will tell you: "It encoded faster, but the output looked craptacular..". I myself have tried off and on in the past to compress videos for myself and others with acceleration and every time I tried CUDA or AMD's APP.ACCEL for the final AVC output I was disappointed at the blotchy, blurred, 'macroblock-y' mess that comes out, unless I allow for a much larger bitrate than intended (and receive a correspondingly larger file size). Somewhat surprisingly then, it was nice to find that I did not need to confine myself to "100% Quality" and always expect huge filesizes with GPU-based game recording to get decent viewable quality, suitable for sharing.

A side-by-side comparison of four CUDA quality settings (100%, 70%, 50%, and 20%): frames extracted from the CUDA-produced output files (Batman: Arkham City @ 1080p).






As can be seen in the screenshot comparison above, decent quality can be maintained at the 100% Quality setting, despite the speed of the encoding being done with H.264/AVC. I was pleasantly surprised, to be honest. Of course, videophiles will notice slight posterization and light MPEG-compression artifacts (such as macroblocking) occurring even at 100% Quality if they look closely; but for the majority of people, a quality setting of 100% (even down to 80%) should be found quite acceptable. I personally found that at 70%, the compression artifacts became more noticeable, especially the 'trails' that occur from the lossy motion compensation (the codec keeping track of where things are moving around between frames), which is why I chose 70% as the 'next notch down' on these tests. At a Quality setting of 60% and any setting below that, these 'trails' left behind are very apparent, and I personally do not recommend going below 60% for GPU-based game recording at this time, as the quality loss and compression artifacts become too obvious and may remain apparent even in a final render after editing.

An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Diablo III @ 1080p).
The screenshot above is a frame taken from a CUDA-produced Diablo III game recording. Even during a busy moment, with many things happening quickly on the screen, clarity seems to be maintained at a decent level, even the somewhat-hard-to-compress 'Red Text On A Dark Background' - and this video clip was recorded at the 80% Quality setting. Although some macroblocking is beginning to occur in the darker/flatter area of the upper left (as the codec attempts to keep detail in the high-motion areas by compressing the more static areas of the screen to a higher degree), the darker/flatter toolbar and stone floor overall do not seem to be suffering excessively.  [As always however, my personal opinions on quality are mere suggestions based on my own tests and I encourage you to do a bit of your own testing, to find out what settings you would like to settle on and use for your projects]

Since the desktop/GUI can also be recorded with Bandicam, I decided to try recording a web browser game with the GPU as well, for the people out there who like to record these games. I recorded two free online games, Lord of Ultima (a city/area building game) and Lottso Express (a bingo-style game), both of which are 2D (flat, board-like), which I thought would be a good example of more static detail (low-motion) compression.
An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Lord of Ultima, a web browser game @ 1080p).

Lord of Ultima takes up the entire desktop in this large example of low-motion/static area handling by the accelerated codec. The game, captured above, is largely non-moving, especially the toolbars and left portion of the screen. Recorded at GPU-encoded 100% Quality, the toolbars are clear, text is very readable, and even the darker, static area of the Town Hall pop-up is clean and macroblock-free (very few compression artifacts throughout the entire screen). Very nice.

A side-by-side comparison of four CUDA quality settings (100%, 70%, 50%, and 20%): frames extracted from the CUDA-produced output files (Lottso Express, a web browser game @ 384p).

Lottso Express runs in a small window, so I tested out recording at the exact resolution of this window (576x384) to see how gpu-acceleration handles lower resolutions. The differences between the four Quality settings are shown in the four frames extracted from the CUDA recordings above. Although the top section/frame is quite clear (at 100% Quality), it quickly degrades as the quality setting goes down, but the recording still remains watchable. In my opinion, if one is recording just for themselves for fun, or just to share with a friend quickly, even a recording level as low as 60% quality may be acceptable when recording a low-motion 2D game with gpu-accelerated encoding; but I still recommend not going below the 70% Quality setting when recording a 3D (high-motion) game, to maintain enjoyability and clarity when using gpu-accelerated recording (such as CUDA, being used here).
 [My tests presented here are only done with CUDA, as I do not own any other videocards capable of gpu-based recording at this time. Output quality may vary when recording with AMD's App Acceleration or Intel's Quick Sync. As always, I suggest doing a few short tests yourself to see what codec and settings you personally would prefer]


What are the configuration settings that can be changed? 



There was not much available to configure, as far as the Quality settings and configuration, within the game recording application (Bandicam) at the time of this test. Certainly not as much as the x264 codec, with its VFW interface, that has been developed over time by generous programmers/contributors. For CUDA, you can choose between Variable Bitrate and Constant Bitrate (CBR lets you estimate the file size more precisely), and you can utilize the CPU to assist with recording and compression as well, but that's about it. This may be a limitation in CUDA itself however, as I contacted Bandicam developers to see if there would be any deeper configuration options for the codec that CUDA is using (such as Deblocking settings and Partitions) and they got back to me stating that there were no other configuration options of that type available to include, at this time.
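As a rough illustration of why Constant Bitrate makes file sizes predictable, here is a sketch that works backwards from a storage budget to a bitrate. It assumes decimal gigabytes and video data only (no audio, no container overhead), and the function name is made up for this example; it is not a Bandicam setting.

```python
def cbr_kbps_for_budget(size_gb: float, minutes: float) -> float:
    """Constant bitrate (kbps) that fills `size_gb` in `minutes` of recording."""
    total_bits = size_gb * 1_000_000_000 * 8
    return total_bits / (minutes * 60) / 1000


# Example: how high can the bitrate go if 30 minutes must fit in 10 GB?
print(round(cbr_kbps_for_budget(10, 30)), "kbps")
```

With Variable Bitrate the same question has no exact answer, since the encoder spends more bits on busy scenes and fewer on quiet ones.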

The interface for CUDA recording found within Bandicam's Video Format settings. 

As of the most recent version of Bandicam (announced at the Bandicam website just days ago), they have added the option to change the Keyframe Interval (the frequency of intra-coded frames, or I-frames, within the Groups Of Pictures, which helps with things like 'trails' or corruption from compression) and the ability to change the FourCC identifier, both of which will make the recording more compatible with video editors such as the Sony Vegas and Adobe Premiere lines of products. I am proud to say that I did a lot of compatibility testing previous to this on my own and submitted my findings to the developers of Bandicam, just in case they were interested, which contributed directly towards this recent addition to their application. Information on my tests and findings, for those interested, can be found in these articles, here:
http://gametipsandmore.blogspot.com/2013/05/game-recording-with-mpeg-4-using.html
http://gametipsandmore.blogspot.com/2013/06/and-more-how-to-stop-trails-and.html
http://gametipsandmore.blogspot.com/2013/06/and-more-game-recording-for.html
http://gametipsandmore.blogspot.com/2013/06/and-more-how-to-record-with.html
http://www.bandicam.com/forum/viewtopic.php?f=14&t=1687&sid=06b9cdb4450af00f3d33fb4306963f6f
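For readers wondering what the FourCC identifier mentioned above actually is: it is simply four ASCII characters stored as four bytes in the video file, which players and editors read to decide which decoder to use. The sketch below shows the round trip between the text form and the 32-bit number that AVI-style containers store (little-endian). The codes "H264" and "X264" are common examples I picked for illustration; which codes Bandicam offers is not something I am stating here.

```python
import struct


def fourcc_to_int(code: str) -> int:
    """Pack a four-character code into its little-endian 32-bit integer form."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC is exactly four ASCII characters")
    return struct.unpack("<I", raw)[0]


def int_to_fourcc(value: int) -> str:
    """Unpack a little-endian 32-bit integer back into its four characters."""
    return struct.pack("<I", value).decode("ascii")


for code in ("H264", "X264"):  # illustrative codes only
    number = fourcc_to_int(code)
    print(code, "->", hex(number), "->", int_to_fourcc(number))
```

Changing the FourCC does not change the video data itself, only the label an editor or player sees, which is why it can help with compatibility.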




As always, remember that if you share your video on streaming sites (which limit bitrate) and uploading sites (such as YouTube), your video will be recompressed [converted again] at settings much lower than your production video and detail will be lost (blurred/smoothed, with visible macroblocking) due to the nature of recompression. Therefore, if you want to save time, there is no need to create and upload huge-bitrate, finely-detailed video, as can be seen in this comparison of an uploaded YouTube video and the output that people will be seeing afterward:
A side-by-side comparison of frames extracted from the Original Uploaded Video (Left), YouTube's 1080p Compressed Video (Middle), and YouTube's 720p Compressed Video (Right), showing loss of data/detail due to recompression.
(Diablo III @ 1080p, high-motion scene, moderately-detailed game engine)



How about performance?



I don't usually talk much about performance when I do QualityTests or TestRuns, except for maybe a short paragraph on my own experiences or a "Personal Short Version/Opinion" section at the end. The reason is, every system is different and everyone has their system configured in different ways. I could say that GPU-recording ran great for me, but then someone with a laptop running a non-dedicated videocard will tell me how wrong I am and that it doesn't run well at all. I could say it lagged me out all the time, but there would be many who could run the same thing with no problems. A quick look at any Technical Support area of any forum anywhere is evidence of the fact that even with the same hardware, there are a bunch of people who will have no problem at all and a bunch who will be having no joy. So, I mainly leave the issue aside but I sometimes mention how it seemed to run for me and [especially if anyone asks directly] I would be happy to provide more information to those who want to know more about it.

For me, performance while recording with the GPU was a mixed bag. For the most part it seemed to work fine, but there were a handful of games that just didn't like it as much. Some were choppy or laggy when I began recording. In theory, processing and encoding frames to a file with the many cores of a GPU should be more streamlined, but in practice it seems to take more resources from the videocard itself than I expected. Perhaps it is due to needed optimization of the games or drivers, or perhaps it is my 'older' videocards that are beginning to get on in years (I was running two GTX 560 Ti's in SLI mode during these tests). I did find that some games preferred non-SLI mode when recording with CUDA (such as Diablo III, if I remember correctly) and things were smoother when recording that way; but for the most part, accelerated recording ran with about the same responsiveness as [and in some games, slower than] my own optimized settings for the x264 iteration of AVC. Not long ago, I spent a bunch of time testing and tweaking x264 to record with the H.264/AVC codec, finding the best settings for speed while maintaining quality - and in light of the performance I was able to tweak out of x264 - GPU-based recording didn't impress me much [if talking strictly about performance hit here]... perhaps I'll end up just using my settings with x264, for now.

For more information, tests and tweaks that I did over time with H.264/AVC and the x264 codec, feel free to check out these articles here at the blog (some are the same as in the above section):



An example of an NVIDIA CUDA encoded game recording: a frame extracted from the CUDA-produced output file (Allods @ 1080p).



Overall, recording with gpu-acceleration was pleasantly surprising. Quality (at 100%) was better than I expected from it (it does taper off quickly as you lower the quality setting, however). The file sizes were comfortably small (about 25GB for an hour of recording at 70% Quality, about 45GB for an hour of recording at 100% Quality). The performance was a mixed bag for me, but for most games it was indeed fast and could get the job done. So, in essence, it managed to deliver on all promises. Not bad at all.


Have fun recording with your GPU and See You In The Games!