Multi-threading support when encoding

I'm encoding H.264 video. My edit doesn't have any FX or filters applied, just plain cuts and edits. Encoding only uses 30% of my CPU. Is it a bug? It seems like HF is only single-threaded.

Comments

  • HitFilm is a very GPU-heavy program, so it might very well be that your GPU is the bottleneck, or something else, as you haven't given your system specs. Have you monitored your GPU usage while encoding? Also, as you suspect the lack of multithreading to be the issue, have you monitored the individual usage of your CPU cores, and can you confirm that one core is at max while the others are not?

  • edited February 2015

    Hard drives can also bottleneck renders. I've seen this happen, where a render wasn't using much CPU, and seemed to be clunking along quite a bit more slowly than I was used to, only to realize that the hard disk was pegged at 100%. Switching from a USB2 port to a USB3 (they look the same...) made quite a difference.

    That said, while HitFilm's renderer is quite fast, the absolute fastest H.264 encoder I've seen so far has been Handbrake. 

  • edited February 2015

    Encoding video is a CPU task. You can test it yourself with GPU-Z to see that the GPU is not used when you encode the video in H.264 in HF. The plugins in HF might be GPU accelerated, but video encoding is purely CPU. And in this case HF doesn't use all cores when encoding video. Vegas uses all CPU cores.
    The same video encoded in HF and Vegas with no editing, no plugins:
    HF = 44 secs, uses only 30% CPU
    Vegas = 24 secs, uses 90% CPU, MainConcept encoder

    Those are really bad numbers in HF. I know guys here don't do video edits in HF but it seems that the video editing aspect of HF is a bit neglected.

  • I know the pain when software takes a lot of CPU or GPU during encoding, but my Pavtube doesn't do that, since I can customize how many CPU cores it uses for encoding. So it doesn't affect other software on my computer during the task.

  • emma24xia

    Sorry I didn't understand what you mean. Are you talking about cpu affinity?

  • I think emma talks about programs clogging up the cpu so the pc won't react anymore.

  • So emma wants encoding to use fewer CPU cores, like HF does now? :D

    I haven't compared how slow HF is when exporting/encoding video against Fusion, Adobe, or even the low-end video editing apps, but I suspect those apps are multi-threaded, not single-threaded like HF.

  • chibi  - I'm interested by what you've been talking about here & have run some tests myself but I don't see what you do.

    What were you using to collect your data?
    Can you upload the footage you used so I can test it here?
    Did the export from Vegas & HitFilm match exactly? (i.e. no change in bit depth... etc)

    HitFilm will use as many cores as your system has available, but not necessarily all of them, or at 100%. The operating system allocates cores to the currently active threads (from all running processes) according to its own scheduler.
    There are some operations in HitFilm which will be very CPU intensive, but these operations will often be interleaved with things like disk I/O or rendering. E.g. HitFilm will use multiple cores when decoding a video, but the CPU will have to wait for the actual data to be read from disk.

    Ady

  • Nothing fancy. Just a video dragged to the editor cut multiple times with no filters.
    It's been mentioned by other users in other forums that video encoding in HF is really slow. Multi-threading is not good.

    Do a straight comparison of any clip and just render it with different video editors and see which one uses more cpu resources.  Vegas uses up to 90%.

  • Ady Staff
    edited February 2015

     @chibi

    Whilst I can see the image, it appears to be captured at the end of the export. Is HitFilm showing CPU usage like that all the way through, or only towards the end? You have 94 processes running; are any of those potentially stealing resources away? Are the same number of processes running when Vegas is used? There are a lot of variables here.

    I've run quite a few tests, but I'm sorry, I just don't see an issue here. When comparing against Vegas using the same footage, exporting with the same settings and on the same machine (i7 with a GTX 760), the CPU usage between HitFilm and Vegas is almost identical (give or take a few % either side).

    I'm happy to look into any issue, but I don't have much information to go on here. Maybe if you could provide some of the footage you've used I may see the issue. But currently with the formats I've used, they've all been fine.

    There are always ways in which we can improve, and if you feel this needs looking at then I'm happy to put it to the team, but I need to see the issue here.

  • When you were encoding video in HitFilm, how high was the CPU usage that you saw on average? For me it never goes up to 50%.

    In Vegas it can go as high as 90%. Same for AE if you enable multi-processing in the preferences.

    On 3D renderers it goes up to 100%.

    Multi-threading in hf is just no good.

  • I've done a quick test as I too suspected HF multithreading was lacking however I've been pleasantly surprised.

    I took an 18 minute 1080/50p file and grabbed a 1 minute section starting 1 minute into the clip. What a mission; HitFilm 3 seems to be even worse than HF2 at handling m2ts files. To do the above must have taken over 10 minutes! Anyway, the export is what's important for this test. I set the level to 5.1 and left the bitrate as variable 10 Mbps.

    CPU and GPU were measured using Sysinternals Process Explorer, and these figures are for the HitFilm process; the total CPU was usually about 1-2% more.

    CPU: 72-75%, GPU: 29-30%, Time: 2 minutes 37 seconds, created a 90.3MB file.

    Just for fun I used Avidemux to convert the container from AVCHD to mp4 and did the same as above.

    CPU: 75-80%, GPU: 17-19%, Time: 2 minutes 11 seconds, created an 87MB file.

    I'm not sure what's limiting the CPU to ~75% (maybe 3 threads), it's not the hard drive speed as converting the 18 minute m2ts to mp4 in Avidemux took about 10 seconds. And it's also interesting that the file sizes differ given that it should be the same 1 minute of footage, it's only the container that changed.

    I also tried the same file (m2ts version) in Magix Video Pro X4.

    CPU: 85-89%, GPU: 0%, Time: 1 minute 40 seconds, created a 117MB file.

  • edited February 2015

    "it's not the hard drive speed as converting the 18 minute m2ts to mp4 in Avidemux took about 10 seconds." - That's not necessarily true, because if you really just changed the container of the file, you're changing not much more than the file extension and some metadata - the video and audio data can remain just as (and where) they are, and therefore, depending on the implementation, don't have to be written to or even read from disk.

    That aside though, copying a 90MB file doesn't take more than a few seconds on a standard hard drive, so that is indeed most certainly not the limiting factor.

  • __simon__
    What CPU did you encode on? Remember, the GPU has nothing to do with video encoding, since HF is not CUDA or OpenCL accelerated. Video encoding in HF is all CPU. More cores means faster, but on my 3770 CPU it doesn't use all cores.

  • edited February 2015

    @chibi, I have an i7 2600K, I've disabled hyperthreading so I've got 4 cores. I was surprised to see the GPU usage since I wasn't adding any effects.

    @Robin, "Convert" was probably the wrong term to use, I meant made a copy of the video and audio streams in a different container. So I still had the original 3.3GB m2ts and ended up with another 3.1GB mp4 file.

  • @__simon__

    Why disable HT?
    Can you try the same test with HT on and see if there is any difference in render time? If there isn't, then HitFilm doesn't use hyperthreading, and that's why it's not using more CPU resources during encoding. Vegas, etc. use all threads.

  • HitFilm's H.264 encoder is multi-threaded and will make use of all cores and hyper-threading if available.

    The reason the CPU Usage doesn't reach (or get close to) 100% is because HitFilm does all of its timeline rendering on the GPU.  No matter how simple or complex your timeline is, it gets rendered on the GPU.

    The exporting works like this:  Render frame on GPU; encode it on CPU; render next frame on GPU; encode it on CPU.  And so on.  At periodic intervals a bunch of encoded frames will also be written into the file container on disk.  The encoding threads have to wait on the next frame to be rendered by the GPU before they can actually encode it.

    The speed of the GPU (not just in terms of processing power, but also how quickly the driver can upload textures to the GPU, and read textures from the GPU) will directly impact the utilization of the encoding threads.

    Below is a screenshot of exporting a simple timeline (single video clip) on my system.  The GPU averages about 30% load, while the CPU averaged between 50 and 60% load.  You can see in the graphs that all cores are running threads.
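
The alternating render/encode loop described above can be put into a back-of-the-envelope model. This is only a sketch, not HitFilm's actual code: the function name and the per-frame timings are invented for illustration, and it assumes the GPU and CPU steps for a frame never overlap.

```python
def encoder_cpu_utilization(gpu_render_ms: float, cpu_encode_ms: float) -> float:
    """Fraction of wall-clock time the encoder threads are busy when each
    frame is rendered on the GPU first and encoded on the CPU afterwards,
    with no overlap between the two steps (a simplifying assumption)."""
    frame_time_ms = gpu_render_ms + cpu_encode_ms
    return cpu_encode_ms / frame_time_ms


# Hypothetical timings, chosen only to illustrate the effect.
print(encoder_cpu_utilization(gpu_render_ms=10.0, cpu_encode_ms=15.0))  # 0.6
print(encoder_cpu_utilization(gpu_render_ms=5.0, cpu_encode_ms=15.0))   # 0.75
```

Under this model the encoder can never reach 100% while it has to wait on the GPU step, and a faster render/readback step pushes CPU utilization up, which matches the explanation given in the post.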

  • That's a shame. Because the way it works now, the GPU doesn't really make the overall encoding faster if other video editing apps are faster by just using the CPU. I wonder if all the timeline processing could be moved to the CPU instead.
    The process now is slower.

  • edited February 2015

    Never mind, reread it :D

  • In most 3d renderers.

    Render time example

    1 thread 217 sec
    8 HT threads (4 cores) 42.1 sec

    If they were 8 real cores, it would be 27.125 sec
    If they were 4 real cores, it would be 54.25 sec

    Better to have 42.1 sec than 54.25 sec. It's nearly 30% faster.
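
The arithmetic in the post above can be checked with a few lines. This sketch uses only the two measured times quoted in the post; the "real core" figures assume perfect linear scaling, which is the post's own assumption, and the variable names are just for illustration.

```python
single_thread_s = 217.0   # 1 thread, from the post above
ht_8_threads_s = 42.1     # 8 hyper-threads on 4 cores, from the post above


def ideal_time(single_thread_seconds: float, cores: int) -> float:
    """Render time under perfect linear scaling across `cores` real cores."""
    return single_thread_seconds / cores


four_core_s = ideal_time(single_thread_s, 4)    # 54.25
eight_core_s = ideal_time(single_thread_s, 8)   # 27.125

# Throughput gain of HT over the ideal 4-core time, and wall-clock time saved.
speedup = four_core_s / ht_8_threads_s
time_saved = 1 - ht_8_threads_s / four_core_s

print(f"{four_core_s} s, {eight_core_s} s")
print(f"{speedup:.2f}x faster, {time_saved:.0%} less time")
```

The "nearly 30% faster" figure corresponds to the throughput ratio (about 1.29x); measured as wall-clock time saved, the same numbers come out at roughly 22%.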

  • @chibi, enabling HT requires entering the BIOS and my PC is busy right now. I'll give it a go, if I don't forget, the next time I boot it. However, if we assume that enabling HT would make the OS believe that the CPU is twice as powerful, then in theory the 75% usage would now be reported as 37.5%, which is closer to what you reported.

    If what @Hendo writes is correct, then since my tests resulted in 20-30% GPU load, this indicates that the overhead of simply copying a frame to the GPU, doing nothing to it, then copying the frame back to be encoded by the CPU uses 20-30% of the GPU. Or is HitFilm actually doing something to the image even though I've applied no effects/scaling?

    As for why I disable HT: that goes back a long way, and maybe my reasoning is no longer valid. I use my PC for audio editing, and way back when HT first came out, one of the differences between a real core and an HT one was that the HT shared the floating point processor, so on my 4 core CPU I have 4 floating point processors. Since I use a real audio app that does floating point math (that's a dig at Pro Tools, who have only recently moved from fixed point), using HT is actually worse, since a thread can stall while waiting for another thread to finish with the floating point unit. This stalling actually results in worse performance than using fewer threads.

    On a modern CPU maybe there is a floating point unit per thread and there's something else that's shared that makes the difference between a real core and 2 HT, I don't know. Also maybe the OS can now be told, "hey don't run these threads on HT that share a floating point unit". I should really look into it but for now I'll just stick with HT is bad for audio.

    As for HitFilm one of the Devs would need to step in and say whether HT on/off is preferable.

  •  To clarify one small point - Hendo is a HitFilm developer, previously head of our software team!

  • I completely missed the STAFF icon next to Hendo's name!

    @Hendo, I apologise for doubting that you were correct, had I noticed that you were staff I wouldn't have done so. Thank you for taking the time to explain the process.

  • __simon__  It sounds like you don't understand how HyperThreading works. It's largely a scheduling thing, and by design there's actually no way for a running application to tell the difference between a logical CPU and a real one. Enabling HyperThreading just allows a processor to issue instructions from more than one thread at the same time to its available execution slots.

    The stalls you described aren't actually a result of simultaneous multithreading; they're a result of a design flaw in the parallelization approach in the software. The conflict is in shared data structures that have to be independent to parallelize well, but aren't, so they force one thread to wait while another thread is working on the same chunk of data.

    In reality, all of the threads running on every processor are regarded as equal from the process scheduler's view. 

    There's an article that covers in more depth how processors really work on Red Shark News. I know it well, I wrote it. :)

    I ran into a similar situation at work not long ago; we converted some large data manipulations into parallel code, and tripped over some shared data issues in some external code we were using. It took a bit of chicanery to sort it out, since we hadn't written that part. 

  • @__simon__

    "However if we assume that enabling HT would make the OS believe that the CPU is twice as powerful then in theory the 75% usage would now be reported as 37.5% which is closer to what you reported."

    That doesn't make sense.
    Btw, I think you should turn on HT, because that's more power you're not taking advantage of. :)

  • @WhiteCranePhoto, I do understand how HyperThreading works, although I'll admit that most of my knowledge is based on what I read about it when it first came out in the Pentium 4 days. Certainly back then the OS couldn't differentiate between logical and physical cores, I just thought things may have changed in the 10 or more years that Hyper Threading has been around.

    I think you've misunderstood why I believe it's bad, the explanation you've given above is a general multiprocessing issue and not specifically a Hyper Threading one. My understanding of a single core is that there is only one floating point unit which both the logical threads need to share thus in an application that makes heavy use of floating point math one of the threads can stall while the other is using the floating point hardware. It's not immediately obvious why this would be worse than only having one execution thread, I think it's due to the instruction pipeline.

    I really should go hang out in an audio forum and find out if disabling Hyper Threading is still the thing to do.

    Congratulations on the RedShark article, I think I read it. I'll have to go find it and read it again as Hyper Threading may have evolved and I may learn something new.

  • @chibi, why do you think that doesn't make sense?

    If HitFilm is only using 75% of my CPU, and the total CPU usage is only 1-2% more then this means that there is something else that's the limiting factor and approximately 25% of my CPU is idle. If I then enable Hyper Threading I'll now have 4 more cores sitting around idle. Thus HitFilm was using 75 out of every 100 cpu cycles, now it will be using 75 out of 200 cycles hence the 37.5%.

    Having written all of that, I don't really believe it 100% myself.
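
The 75% to 37.5% reasoning above can be written out as a small calculation. This is a sketch of that argument only: the function name is made up, and it assumes the amount of work stays fixed and the extra logical CPUs sit completely idle, which is exactly the part the poster is unsure about.

```python
def reported_usage(busy_logical_cpus: float, total_logical_cpus: int) -> float:
    """Overall CPU percentage as a task manager would show it: busy
    logical-CPU equivalents divided by the number of logical CPUs."""
    return 100.0 * busy_logical_cpus / total_logical_cpus


busy = 0.75 * 4  # 75% of 4 cores is 3 cores' worth of work

print(reported_usage(busy, 4))  # 75.0
print(reported_usage(busy, 8))  # 37.5
```

If the encoder actually schedules work onto the additional logical CPUs, the fixed-workload assumption no longer holds and the reported figure would be higher than 37.5%.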

  • There is a very simple test to rule out "bad support" of multithreading as the cause of limited processor usage, and that is looking at the graphs of the individual core usage in Task Manager. If the 75% usage is distributed more or less equally between the cores, then multithreading can't be the issue: the work obviously IS split across all available cores, and something else is limiting the render. As has come up, the RAM-to-GPU copying or something else might be the problem.
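
The per-core check described above can be expressed as a simple rule of thumb. This is a hedged sketch: the function name, the thresholds and the sample readings are all invented for illustration, and the per-core percentages would have to be read off Task Manager or a similar tool by hand.

```python
def looks_single_threaded(per_core_percent: list[float],
                          pegged: float = 90.0,
                          idle: float = 20.0) -> bool:
    """True when exactly one core is near max while all others are near idle.
    The thresholds are arbitrary illustration values."""
    pegged_cores = [p for p in per_core_percent if p >= pegged]
    idle_cores = [p for p in per_core_percent if p <= idle]
    return len(pegged_cores) == 1 and len(idle_cores) == len(per_core_percent) - 1


# Hypothetical readings for a 4-core machine.
print(looks_single_threaded([74.0, 76.0, 73.0, 77.0]))  # False
print(looks_single_threaded([98.0, 5.0, 8.0, 4.0]))     # True
```

Note that a single busy thread on a 4-core machine cannot push total usage above 25%, so a total of around 75% already implies that more than one thread is doing work.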

  • __simon__ I do understand what you're saying, what I'm telling you is that your reasoning is simply incorrect. It IS a parallel programming issue. The idea that there is a single FPU per processor indicates that you don't understand how superscalar processors work, and the idea that enabling simultaneous multithreading adds cores indicates that you don't understand how simultaneous multithreading works, also.

    Even in the P4, when Intel first released SMT, which they called HyperThreading, your description doesn't fit. The P4 was superscalar and superpipelined, and stalls weren't from sharing the FPU; they were from instructions either blocked waiting for I/O or, in the most common case in parallel computing, from threads running into shared data contention issues that prevented the processor's scheduler from issuing instructions in parallel.

    When programmers who are new to parallel computing start writing parallel code, especially when they're working with complex data structures and complex algorithms, they have a tendency to create a lot of inefficiency that hinders scaling, in the interest of maintaining data integrity. You can't have a thread working on one piece of data while another is modifying it and still get sane results, so the simplistic solution is generally just to put mutexes on shared data structures, which adds a lot of overhead for the mutex, and also blocks threads while accessing that shared code.

    THAT is why more threads can in some cases cause slowdowns, not because there's one FPU per core. Yes, it's a parallel programming issue, and nothing more. That's also not a knock on the programmers; making parallel code scale well takes skill and experience, and a lot of times an approach that looks like it will work for just one part of the task works fine, but then trips over a bottleneck that wasn't an issue when you were only using one thread.

    There's really no reason any longer that you should ever disable SMT on modern machines, because even if your main task isn't using more of the processors' resources, there are plenty of other OS tasks that can use those resources.
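
The mutex-on-shared-data pattern described above, and the alternative of giving each thread independent data, can be contrasted in a short sketch. Both function names are hypothetical, and this shows only the structure of the two approaches: in CPython the global interpreter lock prevents real parallel speedups for this kind of work, so it is not a benchmark.

```python
import threading


def sum_with_shared_lock(chunks):
    """Every thread updates one shared total, taking a mutex per item."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            with lock:          # all threads queue up here
                total += value

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_with_private_partials(chunks):
    """Each thread writes only to its own slot; results are merged once."""
    partials = [0] * len(chunks)

    def worker(index, chunk):
        partials[index] = sum(chunk)   # no shared mutable state, no lock

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


chunks = [range(i * 1000, (i + 1) * 1000) for i in range(4)]
print(sum_with_shared_lock(chunks) == sum_with_private_partials(chunks))  # True
```

The first version is correct but serializes every update behind the lock, which is the scaling bottleneck the post describes; the second keeps the threads independent until a single merge at the end.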

  • __simon__

    WhiteCranePhoto gave a very technical explanation that I could never explain better. :)

    Just turn it on; you've been missing out for more than half a decade. It's almost as good a jump as moving to an SSD.
