Max Cores/Threads?

Hiya!

I couldn't find any info on this, really. Does anyone know how many cores/threads HitFilm Pro 2017 can use, and how effectively it uses them?

I know multithreading isn't easy to program, and I know all my cores aren't likely to hit 100% (hell, 50% would be pretty nice!). But I have a lot of cores/threads (dual Xeon E5-2699 v4s; 44 cores, so 88 threads...I mostly do 3D modeling/animation)...just wondering if anyone has any tips/tricks for maximizing usage when rendering and encoding/decoding a project.

^_^

Paul L. Ming

Comments

  • Triem23 Moderator

    Um... Your massive amount of cores won't do too much for you in Hitfilm.

    Most of the heavy lifting in Hitfilm is done by the GPU. The CPU mostly does file read and decode, file write and encode, and physics calculations for the particle sim. There are a few other small things done by the CPU, but pretty much all rendering tasks are covered by the GPU.

    Even for things like file decode/encode, I BELIEVE, but don't quote me here, that Hitfilm is using a single core per video decode. This is helpful for people doing heavier video editing, as, in theory, one could have several media clips reading at once without much additional slowdown, but Hitfilm is optimized more for consumer-end processors and GPUs, not workstation GPUs and Xeons.

    On the other hand, your machine is probably a blazing beast in dedicated 3D software.

    I probably got something wrong here, and, if so, I'm sure @NormanPCN will set me straight.

  • Media decode/encode is multithreaded but there's a lot of things affecting how many threads or cores would be utilized.

    • During playback, utilization is going to be clamped by the frame rate of the project. OK, there's more to it than that, but in layman's terms: if HitFilm can play back at 60 fps on your system but your project is at 30 fps, then you're only going to see roughly 50% utilization (a rough sketch of this pacing idea follows at the end of this comment). This isn't unique to HitFilm. There's no reason to load up the CPU just for decode/playback.
    • The codec the video is using will make a difference too. Some handle multithreading better than others, but in general they all seem to be optimized for 4-6 cores, with efficiency tanking beyond that. This is for Adobe Premiere Pro CC 2015, but the results and graphs on multi-core decode/encode performance are informative: Premiere Pro CC 2015 Multi Core Performance
    • If you're doing pure CG you'll never see high CPU utilization, because in HitFilm the GPU is doing all the heavy lifting, like @Triem23 mentioned.
    • Encoding in theory could load up a lot of cores, but you probably won't see that because the encoder has to wait on HitFilm and the GPU to feed it frames.

    A big plus for a system like yours is being able to handle a larger number of video streams at once. Where a quad core can struggle to keep up with decoding two 4K streams, you can handle several of them, with other bottlenecks like drive throughput becoming the limiting factor instead of the CPU.
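
    To make the frame-rate clamping concrete, here's a rough hypothetical sketch (not anything from HitFilm's actual code) of a playback loop paced to the project frame rate. The decode work only fills part of each frame's time budget and the thread sleeps out the rest, which is exactly why utilization stays low during playback:

    ```cpp
    // Toy sketch (not HitFilm's code) of why decode utilization is clamped by
    // the project frame rate: decoding a frame might only need ~8 ms of CPU,
    // but a 30 fps project gives it a ~33 ms budget, so the decode thread idles
    // for the rest of each interval and overall utilization stays well below 100%.
    #include <chrono>
    #include <iostream>
    #include <thread>

    int main()
    {
        using namespace std::chrono;
        const double projectFps = 30.0;
        const auto frameBudget = duration<double>(1.0 / projectFps);

        auto nextDeadline = steady_clock::now();
        for (int frame = 0; frame < 90; ++frame) {           // ~3 seconds of "playback"
            std::this_thread::sleep_for(milliseconds(8));    // stand-in for CPU decode work
            // (GPU compositing/presentation would happen here)

            nextDeadline += duration_cast<steady_clock::duration>(frameBudget);
            std::this_thread::sleep_until(nextDeadline);     // idle until the frame is due
        }
        std::cout << "Only ~8 of every ~33 ms is work: roughly 25% of one core\n";
        return 0;
    }
    ```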

  • Triem23 Moderator

    @Aladdin4d correcting myself and correcting your correction, maybe... Didn't either CedricBonnier or DannyUK77 once say HitFilm used one CORE (not thread) per decode stream, or do I need to purge that from my brain?

    I probably got that wrong somewhere. 

  • @NormanPCN once mentioned he saw a thread pool being created for decode but couldn't determine exactly how the threads were being used. Sometimes some threads terminated with no execution time, and at other times all threads got execution time. Once the thread pool is created, the OS allocates cores to active threads, according to @Ady in this thread. @Hendo has a good comment in the same thread on HitFilm's multithreading.


  • Lots of CPU cores for Hitfilm really only come into play with lots of parallel media streams in decode. Heavy compositing and/or 4K work.

    Yes, AVC file encoding is multi-threaded but most encoders run out of steam (scaling) at 6-8 cores. Even the mighty x264 encoder. Encoding to AVC/HEVC is just not a very parallel operation without compromises. Not my words but those of the main x264 developer.

    Decode is definitely multi-threaded. Exactly how many threads are allocated to a decode stream I cannot say. Yes, the decode (native MainConcept at least) looks to have a thread pool. My machine has a 4C/8T CPU and I see 8 threads created and destroyed during decode operations.

    When Hitfilm is doing its decoder thrashing thing, one often sees most threads in the pool with zero execution time. With HD material, maybe a couple of threads with time and the others with none. The thrash not only zaps the LongGOP context but also causes the entire thread pool to be terminated and re-created, every handful of frames. Thus the reason for many threads with no execution time.

    If you take just a single media stream and avoid the decode-thrash situations, you will see all 8 threads stay live the whole time, and all will have significant execution time at the point you stop playback. HF is not using all threads at the same time; utilization graphs show this. This just indicates a likely underlying thread pool.

    To take a guess it appears that around 2 threads are allocated from the pool for HD work and around 4 threads for 4K.
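
    For anyone curious what that kind of decode thread pool might look like, here is a minimal generic sketch (purely illustrative; this is not MainConcept's or FxHome's code). Workers are created once and pull tasks as they arrive, so any worker that never gets handed a task shows zero execution time, which matches the observation above:

    ```cpp
    // Minimal thread-pool sketch: N workers created up front (often one per
    // logical core), tasks handed to whichever worker is free. Tearing the
    // pool down and re-creating it every handful of frames (the "thrash")
    // would show up as constant thread creation/destruction in Process Explorer.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
    public:
        explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
            for (unsigned i = 0; i < n; ++i)
                workers_.emplace_back([this] { workerLoop(); });
        }
        ~ThreadPool() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();   // pool torn down; threads destroyed
        }
        void submit(std::function<void()> task) {
            { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void workerLoop() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                    if (done_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();   // e.g. decode one slice/frame of a GOP
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };
    ```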

    I'm not sure if HF is asynchronous with decode. I have doubts, given the poor timeline playback performance of HF with high-overhead AVC media relative to Vegas, PowerDirector, FCP, Adobe, iMovie, Windows Movie Maker.

    HF asks for frame 100, receives it, and then goes off to do its thing with that frame, which is basically GPU work. Does HitFilm start/queue the frame 101 request right after receipt of frame 100? Or does HF wait to start a decode request for 101 until the point it actually needs frame 101 to keep working without a pipeline stall? (See the sketch after this comment.)

    My memory is good for the native decode types. Quicktime and Video for Windows could be different as these are external complete subsystems.
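
    To illustrate the frame 100/101 question above, here is a small hypothetical sketch of what an async prefetch could look like, using std::async; decodeFrame and processOnGpu are stand-ins, not HitFilm APIs:

    ```cpp
    // Prefetch sketch: kick off the decode of frame N+1 before doing the
    // (GPU-bound) work on frame N, so the decode overlaps the effect
    // processing instead of stalling the pipeline.
    #include <future>
    #include <vector>

    using Frame = std::vector<unsigned char>;   // placeholder for decoded pixels

    Frame decodeFrame(int /*index*/) { return Frame(1920 * 1080 * 4, 0); } // fake decode
    void  processOnGpu(const Frame&) { /* composite / effects would run here */ }

    void playRange(int first, int last)
    {
        std::future<Frame> pending = std::async(std::launch::async, decodeFrame, first);
        for (int i = first; i <= last; ++i) {
            Frame current = pending.get();               // frame i is (hopefully) ready
            if (i < last)                                // queue frame i+1 right away...
                pending = std::async(std::launch::async, decodeFrame, i + 1);
            processOnGpu(current);                       // ...and overlap it with GPU work
        }
    }
    ```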

  • Want a tip to make HitFilm use double the CPU it usually uses?

    Make a relatively simple composite. It doesn't have to be very long; 10 seconds is fine.

    Now on the Playback controls, set the loop icon to on. Open up Task Manager and look at the CPU usage. Should be idling about 1%.

    Now start to play the composite and note the CPU usage as it plays. When it gets to the end, it'll loop to the beginning, but...doubles the CPU usage and playback becomes jerkier as well.

    OK, I never said it was a useful tip... :)

    Pause and restart and the CPU goes back to the original lower value with smoother playback.

  • Hiya folks!

    Thanks for the info! I suspected the low thread usage, actually. No worries. Just means I can get HF working on a 4K vid, minimize it, and get back to work on other stuff. :)

    One of the main reasons I invested so much moolah into this new rig (I'm having Maingear build it for me) is that I can allocate a butt-load of threads to rendering a 3D image/animation but still have enough left over to keep working. It just got to the point where once I hit "render", I could pretty much go on a long-weekend getaway before it was finished. It'll be nice to be able to hit "render" and just keep working! :D

    Thanks again guys!

    (PS: I'm a newb at video editing...know some vfx stuff...but I need another source of income and indie-film video editing/vfx is actually a 'thing' up here where I live....unlike 3D, where it's all online remote freelancing).

    ^_^

    Paul L. Ming

  • This doesn't really make sense to me. I have tried a few different editors and they all utilise a lot of cores+threads, and my IT background makes me believe that encoding and video rendering are heavily CPU bound. So I did a simple test: I was in the middle of rendering a video anyway, so I opened up Task Manager first and saw that CPU usage was between 55%-65%, then I opened CPUID HWMonitor to check GPU usage and found it to be around 20% on average and only 60% max (this was probably due to watching some video while waiting on the render). I don't want to offend anyone here, but what they are saying doesn't really match up to what I found and what I have always thought to be true: rendering really loves CPU time. Please correct me if I am wrong, but it does seem to be that way, and if I am correct, why does HitFilm Express only use around 60% of my CPU time when it could easily use more?


    https://i.imgur.com/cTSJbve.png - screenshot of HWMonitor with the render behind it

  • edited September 8

    @LeeBomb20 "...why does Hitfilm express only use around 60% of my CPU time when it could easily use more?"

    Hitfilm does operate as described in previous posts. Media file decode is CPU, 99% of effects are GPU, and file encode is CPU.

    Why does Hitfilm show low utilization during export or ram preview? Only the Devs can say for sure. What we end users can say is that Hitfilm is slower than other apps and shows lower utilization.

    My speculation is that the GPU readback is a culprit in the low utilization. I can, and have, constructed a test case that will peg my GPU to 90-100% on playback, and playback is still real time. This test case is pure CG, so the CPU sits around one thread as expected. On my machine, 4C/8T, this is approx 12-15%. Then, doing a RAM preview, the frame rate is *much* slower than real time and the GPU utilization is a ton lower as well. The same thing happens with encoding, but now we are adding a new variable to the equation: the encoder CPU utilization. The fewer variables the better. Both RAM preview and exporting (file encoding) require a GPU readback.

    The lowering of utilization (play vs RAM preview), the slower throughput, and the fact of a readback would seem to indicate a DMA operation in the mix. Also, RAM preview is not timeline-framerate limited. Nor is export, for that matter. The slowest code in the world can easily crank CPU utilization; utilization does not directly say anything about real-world speed. DMA will not show up in utilization, but if the app is waiting for the DMA to complete, the utilization will drop.

    Is this readback slowness due to the OpenGL driver, or to Hitfilm not coding for fast readbacks? It is easy to screw up a PBO component order, and double or triple buffering might be an issue for the Hitfilm pipeline. There are just so many things one cannot say. For example, I have no problem believing that OpenGL readback can be a poor implementation in the driver and/or in the OpenGL pipeline dataflow necessary for speed. OpenGL is all about displaying 3D images; readback is not an important path.
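
    For reference, the generic double-buffered PBO readback technique being speculated about looks roughly like this. This is an illustrative sketch only, assuming a current OpenGL context and a loader like GLEW; it is not a claim about how HitFilm actually does readback. Frame N is read into one PBO while the previous frame's PBO, whose DMA has had a full frame's time to complete, is mapped and copied:

    ```cpp
    // Double-buffered PBO readback: glReadPixels into a bound PBO returns
    // immediately (the driver DMAs in the background); the map/copy of the
    // *previous* frame's PBO happens a frame later, so the CPU rarely blocks.
    #include <GL/glew.h>
    #include <cstring>

    class AsyncReadback {
    public:
        void init(int width, int height) {
            w_ = width; h_ = height;
            glGenBuffers(2, pbo_);
            for (int i = 0; i < 2; ++i) {
                glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_[i]);
                glBufferData(GL_PIXEL_PACK_BUFFER, w_ * h_ * 4, nullptr, GL_STREAM_READ);
            }
            glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        }

        // Call once per rendered frame; 'out' receives the *previous* frame's pixels.
        void readback(unsigned char* out) {
            const int writeIdx = frame_ % 2;        // PBO receiving this frame
            const int readIdx  = (frame_ + 1) % 2;  // PBO holding last frame's data

            glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_[writeIdx]);
            glReadPixels(0, 0, w_, h_, GL_BGRA, GL_UNSIGNED_BYTE, nullptr); // async start

            if (frame_ > 0) {
                glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_[readIdx]);
                if (void* src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
                    std::memcpy(out, src, static_cast<size_t>(w_) * h_ * 4);
                    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
                }
            }
            glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
            ++frame_;
        }

    private:
        GLuint pbo_[2] = {0, 0};
        int w_ = 0, h_ = 0;
        long frame_ = 0;
    };
    ```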

    Utilization: really, who cares what utilization is? What we care about is real-world speed. That said, Hitfilm is slow. The low utilization indicates there is a pipeline stall in there somewhere; something is waiting on something else, and that something has a lot of idle time in it. Hence my DMA speculation. That low utilization also indicates that if one can break the dam creating the stall, there is big-time speed to be gained, because the GPU and CPU typically have power to spare.

    File encoders, like AVC/H.264, can crank the CPU, but in Hitfilm this stall of which I speak is slowing the frame rate being fed to the encoder, thus clamping what the encoder can contribute to total utilization. For example, encoding in Handbrake is not clamped by the frame rate the encoder is being fed; it is reading a file, so it is easier for it to peg utilization.

    I know exporting/encoding to image sequences is not asynchronous to the Hitfilm engine, so an encoder can slow down the Hitfilm rendering engine. Encoding to AVC (the Hitfilm MP4 option) is harder to figure. The very nature of LongGOP encoding requires that the encoder have multiple frames in flight. So does the encoder, when receiving a frame from the source (aka Hitfilm), immediately release the app and go on doing its thing independent of the app? Or is there some synchronous operation here, even though the encoder will be creating and using its own threads for its own work?

    The point of this async talk is about keeping the dataflow pipeline moving. Hitfilm, and any GPU-type app, has very specific separation lines of who/what does what in the pipeline. Media file decode is CPU. After a frame is decoded, effects and such are done, and nearly all of that work is GPU in Hitfilm. The CPU really does not have any work to do while the GPU is doing its thing. There is the GPU dispatch logic, but let's keep the argument simpler; a simple thread priority deals with things like that. Back on track: now we get frame 100 from the media stream. Why not start up an async request for frame 101 when we go off and do our GPU-based effects and such? Then, hopefully, when we want frame 101 it will be sitting there waiting for us when we get there. If the effects and such were primarily CPU, then this async thing is not quite the same pipeline benefit. Less black and white. My post is long enough.

    Similar argument for the encoder end. It is CPU based. When it is given a frame, just soak it up, release the app (the video engine), and do the encode thing asynchronously to the rest of the app.
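
    A minimal sketch of that "soak it up and release the app" idea, assuming a simple bounded producer/consumer queue between the video engine and an encoder thread (the names are illustrative, not HitFilm internals):

    ```cpp
    // Bounded hand-off queue: the render engine (producer) pushes frames and
    // returns immediately; a CPU encoder thread (consumer) drains them. The
    // engine only blocks if the encoder falls more than `capacity` frames behind.
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <vector>

    using Frame = std::vector<unsigned char>;

    class EncodeQueue {
    public:
        explicit EncodeQueue(size_t capacity) : capacity_(capacity) {}

        // Called by the video engine: returns as soon as the frame is queued.
        void push(Frame f) {
            std::unique_lock<std::mutex> lk(m_);
            notFull_.wait(lk, [this] { return q_.size() < capacity_ || closed_; });
            if (closed_) return;
            q_.push_back(std::move(f));
            notEmpty_.notify_one();
        }

        // Called by the encoder thread: blocks until a frame (or shutdown) arrives.
        bool pop(Frame& out) {
            std::unique_lock<std::mutex> lk(m_);
            notEmpty_.wait(lk, [this] { return !q_.empty() || closed_; });
            if (q_.empty()) return false;          // closed and fully drained
            out = std::move(q_.front());
            q_.pop_front();
            notFull_.notify_one();
            return true;
        }

        void close() {
            std::lock_guard<std::mutex> lk(m_);
            closed_ = true;
            notEmpty_.notify_all();
            notFull_.notify_all();
        }

    private:
        std::deque<Frame> q_;
        size_t capacity_;
        bool closed_ = false;
        std::mutex m_;
        std::condition_variable notEmpty_, notFull_;
    };
    ```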

    In the above async ideas one can have our own CPU threads competing with each other. This is really okay. This is not a thread-per-client server with 500 clients beating on us, and thus 500 threads competing for time on an 8/16/32-core CPU causing excess context switching. This is why the thread pool for the front-end decode is a good idea: don't create excess threads.

    So why the async talk? I think that Hitfilm does a lot of things synchronously. I know the UI does.

    Take one example: a 4K GH4 MOV/MP4 high-overhead AVC file. This will not scrub smoothly in any app on initial scrub; frame-caching apps can gradually become smooth, depending. In Hitfilm the playhead will not move smoothly. It will jump, since the view updating cannot be done in real time with the scrub, even a slow scrub. The playhead movement is synchronous with the viewer update, and the update code has high latency due to the decode overhead. Now contrast with Vegas and Resolve: the playhead will move smoothly. The viewer will still lag badly, like in Hitfilm, but the UI is smooth and always responsive to your mouse. The update is disconnected (async) from the UI.

    Someone who has a synchronous mentality in certain instances is likely to have it in others. As I said, we as users cannot know these things with precision. So would a lack of async in a mixed CPU/GPU pipeline, if that exists, be an issue (or the issue) in Hitfilm? I am only comfortable in saying (proclaiming) it is not 0% and not 100%. Anywhere in between is fair game.

    Decode on the CPU while the GPU does nothing. Then the GPU kicks in and the CPU does nothing. For export, now the CPU kicks in while the GPU does nothing. I think you can see where utilization can drop with synchronous code.

    Doing or setting things up async can be more work, and thus time. It adds a degree of freedom and thus can affect stability. Hitfilm is very stable, and FxHome should be proud of that. Dealing with GPU drivers is almost quantum-mechanically probabilistic.

    One thing I am curious about is why native Cineform performs so well in 2017 relative to others. Cineform, by itself, has similar performance to other low overhead I-frame codecs and similar bitrate. Things that make you go hmm.

    The decoder thrashing that Hitfilm does in documented circumstances has got to go. I'm sorry, but that is just stupid.

    ...I need to shut up.

  • @NormanPCN that all makes very interesting reading. If they could also stop it fighting with itself, as in the example in my previous post (double the CPU = half the performance), that would be nice too.

  • DannyDev Staff
    edited September 8

    So, this has been discussed a few times but to reiterate, CPU utilization is a very poor measure of performance. An efficient computer program will use fewer resources, not more.

    HitFilm, like most modern applications running on PCs, does not exist in isolation. It uses 3rd party frameworks and tools for media encoding/decoding, UI (Qt) and GPU rendering (OpenGL via the Nvidia/AMD/Intel driver).

    We have almost no control over what these 3rd party components are doing so it's unfair to claim that HitFilm is 100% responsible for performance, perceived or otherwise. We've had to work around memory leaks, performance bottlenecks and even crashes in GPU drivers alone on many occasions.

    Comparing HitFilm's perceived performance with mature applications developed by teams many times our size and budget is also unfair.

    Working with QuickTime (.mov) media on Windows, for example, requires the use of Apple's QuickTime SDK, which they have abandoned. This library is 32-bit and cannot be used directly from HitFilm, which is 64-bit, so we had to go out of our way to 'wrap' the library in a 32-bit helper process, which has proven to be a big bottleneck.

    Implementing our own 64-bit capable QuickTime import library from scratch is possible but would be a massive amount of work.
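
    For readers unfamiliar with the pattern, a crude sketch of the general out-of-process decoder idea follows (illustrative only; the helper name and frame format are made up, and this is not FxHome's actual wrapper). Every frame has to cross a process boundary as a bulk copy over a pipe, which is one reason such a bridge tends to become a bottleneck:

    ```cpp
    // Generic out-of-process decode: a 64-bit host launches a 32-bit helper
    // executable and pulls raw frames over a pipe, paying an IPC copy per frame.
    #include <cstdio>
    #include <vector>

    std::vector<unsigned char> readFrameFromHelper(std::FILE* pipe, size_t frameBytes)
    {
        std::vector<unsigned char> frame(frameBytes);
        size_t got = std::fread(frame.data(), 1, frameBytes, pipe); // blocking IPC copy
        frame.resize(got);
        return frame;
    }

    int main()
    {
        const size_t frameBytes = 1920u * 1080u * 4u;      // assumed BGRA 1080p frames
    #ifdef _WIN32
        std::FILE* pipe = _popen("qt32helper.exe --decode clip.mov", "rb"); // made-up helper
    #else
        std::FILE* pipe = popen("./qt32helper --decode clip.mov", "r");
    #endif
        if (!pipe) return 1;

        while (true) {
            auto frame = readFrameFromHelper(pipe, frameBytes);
            if (frame.size() < frameBytes) break;          // helper finished or died
            // ...hand the frame to the 64-bit engine here...
        }
    #ifdef _WIN32
        _pclose(pipe);
    #else
        pclose(pipe);
    #endif
        return 0;
    }
    ```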

    We use a 3rd party library for 3D model import. It's a very good library but provides no way to multithread the import, so that's another bottleneck beyond our control.

    In fact there is very little functionality within the HitFilm code base that doesn't depend on some 3rd party component.

    There is certainly a great deal we can do to improve performance on our side: HitFilm was conceived before 'HD' resolution videos (1080+) were commonplace, for example, so performance edge cases that were acceptable back in 2010 are no longer adequate.

    But it's often difficult to isolate these bottlenecks and devise solutions, especially when the developer time could be used to implement new features that ultimately justify the upgrade price.

    In my personal experience, investigating, isolating and addressing a specific performance problem in the software, particularly in an area I was not previously responsible for, is a daunting and very, very time-consuming task; the risk of introducing a critical functional regression increases exponentially with the performance gains, and simply rewriting the code, which has itself been the recipient of numerous bug fixes and feature enhancements over the years, is rarely a practical option.

    And like those 3rd party components, HitFilm's internal components do not exist in isolation.

    In summary, we are aware of the performance issues and working on them within the constraints of our limited resources and time budget. 

  • edited September 13

    "CPU utilization is a very poor measure of performance. An efficient computer program will use fewer resources, not more."

    I have no problem with such a statement at some level but it is an odd definition.

    One thing that is true is that code must be executing to accomplish a task, and if code is executing it will register in utilization.

    App A does not make the same speed in throughput as app B. App B shows higher utilization than app A. App A is not being "efficient"; it is just slower. If app A made the same speed/throughput as B or better, at lower utilization, one could then make a claim about efficiency.

    How I would define efficiency: take, for example, the MainConcept (MC) AVC encoder and the x264 encoder. Both encoders can fully utilize a 4C/8T CPU if the source can feed them frames fast enough. The x264 encoder is somewhere between 2x and 4x faster than MC. It is just more efficient. They are both pegging the CPU to the wall, 90-100%, but x264 finishes its work sooner because it does more work per actual CPU clock executed (summing all threads used).

    If MC code was rewritten to be more efficient it would not use less CPU in utilization, it would use less CPU in that it would just finish the task sooner.
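
    To put toy numbers on that (purely hypothetical figures, not benchmarks): two encoders can both show ~100% utilization on a 4C/8T machine while one finishes far sooner, i.e. produces more frames per CPU-second:

    ```cpp
    // Illustrative arithmetic only: identical utilization, very different efficiency.
    #include <cstdio>

    int main()
    {
        const double frames = 9000.0;          // a 5-minute 30 fps export

        // Hypothetical wall-clock times, both encoders saturating all 8 threads.
        const double wallA = 600.0;            // encoder A: 10 minutes
        const double wallB = 200.0;            // encoder B: ~3x faster

        const double cpuSecondsA = wallA * 8;
        const double cpuSecondsB = wallB * 8;

        std::printf("A: %.2f frames per CPU-second\n", frames / cpuSecondsA);
        std::printf("B: %.2f frames per CPU-second\n", frames / cpuSecondsB);
        // Same utilization graph; B simply does ~3x the work per CPU clock,
        // which is the distinction being drawn above.
        return 0;
    }
    ```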

    It is never good to leave hardware performance on the table. That said, in this discussion I discount crazy machines with tons of CPU cores and such (e.g. 32 and maybe even 16). GPU centric apps, like Hitfilm, often don't really have that much use for that many CPUs.

    "But it's often difficult to isolate these bottlecks and devise solutions, expecially when the developer time could be used to implement new features that ultimate justify the upgrade price."

    Very much agreed. If FxHome came out with a Hitfilm that only had performance boosts across the board I would buy it in a heartbeat. My time is worth something. I am an impatient POS anyway. I would agree/submit that most users are willing to complain about performance but not pay for it.

    But hey, I get off on that sort of thing (performance). Having developed global optimizing compilers etc and SDKs and stuff for decades, performance is kinda the itch I love to scratch.

    This is one area where the likes of Adobe can beat a smaller entity. They have money out the wazoo. They can afford to have a couple of guys take the time to tweak/change something and not "contribute" for a couple of releases.

    "In fact there is very little functionality within the HitFilm code base that doesn't depend on some 3rd party component."

    This is also true for other apps like Adobe and Vegas, e.g. MainConcept for one. When you plop an AVC MP4 onto the timeline, it is a common report for Hitfilm to stutter (not make speed). Can't really blame MainConcept, since others are using it and they don't stutter. Same hardware. Too many forum reports on this. It is likely the only third party in play in this scenario.

    "Comparing HitFilm's perceived performance with mature applications developed by teams many times our size and budget is also unfair."

    Perceived performance? 

    IMO.

    Size does not matter. At least with respect to "doing it right". I'll take the Pepsi challenge on that one. Never had a problem beating Microsoft, Borland or Symantec. Admittedly a simpler scenario given an app like Hitfilm has sooo much more going on than my situation.

    As for features added in a given time then size can/does matter. Really because size implies money and money is what matters.

    Mature does not matter WRT doing it right, IMO. Of course we all know more today than we knew yesterday. The sad fact is that management almost never lets developers rework something that already "works" even though it can be vastly improved. With this situation, mature does not stand for much if anything. Mature only matters when internal re-works are continually allowed and funded. Atypical case.

    Budget is certainly everything. Unfair comparo? I suppose, considering all issues as a whole.

    Nobody is complaining about 3D model import speeds, or most other things. In the forum the performance reports are weighted on basic timeline perf and export perf. The two are related but export has the readback thing added on top. I think if the timeline were smooth, people could live with the export.

    We users are telling FxHome what's wrong in our eyes performance wise. Take that as you will. The issue will constantly come up until it is not an issue. This is to be expected, even if it is upsetting.

    I don't start threads on the issue(s) for giggles to beat on Hitfilm. I will jump into an existing thread as I believe I can contribute better information than most on that particular subject. I also have tried to contribute with information to help others work with Hitfilm, because Hitfilm is great and people should use it. Hitfilm does need some help. 

