Tracking performance questions

This discussion was created from comments split from: Two viewer panels.

Comments

  • @File_Empire I'd actually like the option to turn off the playback in the viewer window in Pro, if using the second monitor view as well.

    I assume that it takes some time (and VRAM) to duplicate the rendered frame to the other window, and sometimes (all the time) you need all the performance you can get. :)

    Which is why I only use the 'float' method.

  • I assume that it takes some time (and VRAM) to duplicate the rendered frame to the other window, and sometimes (all the time) you need all the performance you can get. - @Palacono

    The full screen preview has been designed to be really fast and not take a toll on the performance. That is why if you, for example, disable motion blur in the viewer, you won't get motion blur in the full screen preview either. This preview is just repainting the image rendered by the viewer on a different monitor (I'm simplifying a bit but ultimately that's what it does).

    If you see any difference in performance with the full screen preview turned on and off, you should let us know as this is an issue.

  • edited September 11

    @CedricBonnier OK, well I was only extrapolating from what I've seen elsewhere, not done definitive tests. I've got enough problems with jumpy RAM Previews and looping playback skipping.

    Taking Tracking as an example. Tracking takes time. Selecting another program/windows/folder on a second monitor to take focus away from the viewer window and the viewer is no longer updated and the timeline indicator whizzes along noticeably faster until the last frame when everything reappears. Win-win if it's got a clean image to track.

    As the only difference is it's not drawing the frame, a tracking box (or two) and some Xs, one or all of those is sucking up a lot of something... Is it likely it's the Xs and the tracking boxes? I'd hope not for such small elements, so guessed it was the actual render of the image. It's been processed, optical flow scanned offscreen etc. because otherwise the tracking wouldn't work, which it does. So I took it that any amount of rendering of the image was taking a significant enough amount of time that doing it twice (even just a cut'n'paste of the finished render frame or copying from an offscreen buffer) was something to be avoided...on my hardware anyway.

    If you've got GPU and CPU to spare, maybe that's not an issue. :)

    Or Tracking could just be faster because it's currently doing something it needn't and I'm comparing Apples to Bananas?

  • "Tracking takes time. Selecting another program/windows/folder on a second monitor to take focus away from the viewer window and the viewer is no longer updated and the timeline indicator whizzes along noticeably faster until the last frame when everything reappears. Win-win if it's got a clean image to track. "

    Decoder context thrashing.

  • DannyDev Staff
    edited September 12

    @Palacono

    "Taking Tracking as an example. Tracking takes time. Selecting another program/windows/folder on a second monitor to take focus away from the viewer window and the viewer is no longer updated and the timeline indicator whizzes along noticeably faster until the last frame when everything reappears. Win-win if it's got a clean image to track. "

    It's faster because nothing is being rendered, and rendering is the bottleneck in this case.

    Unfortunately, this is actually an oversight going back to HF1. Tracking should stop when HitFilm loses focus, just like playback and RAM Preview.

    The reason is that the underlying media being used by the tracking engine could change while HitFilm is in the background, which could lead to problems when HitFilm regains focus.

    We will fix this in a future release.

    "As the only difference is it's not drawing the frame, a tracking box (or two) and some Xs, one or all of those is sucking up a lot of something... Is it likely it's the Xs and the tracking boxes? I'd hope not for such small elements, so guessed it was the actual render of the image. "

    Please do not assume a correlation between the size in pixels of an element on the screen and the computation required to actually render that element.

  • edited September 12

    @Danny77uk Hmmm...careful what you wish for, right? How about you don't fix something that makes Tracking run faster? :) Can't you lock files during tracking to prevent them getting changed by external forces? And has it ever even happened once as it stands now?

    So, if rendering is the bottleneck: I guess this is a case of defining "rendering"? I naively think of it as: transferred from an offscreen buffer, where it's already been decompressed and is ready to be moved to VRAM (scaled, windowed, blah blah). You've done all that, then you do your tracking magic, then off it goes. Or it doesn't, and is discarded instead if focus is lost.

    So, if that's a bottleneck, is rendering to two viewer windows not a double bottleneck? Or is RAM to VRAM relatively slow, but VRAM to VRAM blindingly fast by comparison, making it negligible?

    You don't have to go into great detail but if RAM to VRAM is t, but VRAM to VRAM is t/20, then no problem. But if it's t/4, then could an option to toggle it on/off be worth considering?

    Also, stopping when losing focus would be nice if it also didn't do it elsewhere as well, rather than in more places. It's really handy in Vegas, turning tracks on and off and flipping between music tracks and even toggling the FX on/off (or half of them on the split screen button it has) all while it's still playing through a video nice and smoothly.

    As there is no RAM preview in the Editor, the ability to play through with FX off on some heavy composites, then toggle them back on for lighter ones, all while keeping it playing continuously at a reasonable framerate would be brilliant.

    Or, add RAM Preview to the Editor at 1/2 and 1/4 Res so you could run fairly long sections nice and smoothly if you have the RAM?

  • DannyDev Staff
    edited September 12

    @Palacono

    "Can't you lock files during tracking to prevent them getting changed by external forces?"

    HitFilm relies on 3rd party frameworks to decode media. We (meaning the HF code base) cannot 'lock' nor assume exclusive access to the files. It very much depends on what kind of 'file' it is.

    "And has it ever even happened once as it stands now?"

    Irrelevant. It's undefined behavior. It probably won't do anything too horrible, but it's not something we've tested.

    "So, if rendering is the bottleneck: I guess this is a case of defining "rendering"?  I naively think of it as : transferred from an offscreen buffer, where it's already been decompressed and is ready to moved to VRAM (scaled, windowed, blah blah). You've done all that, then you do your tracking magic, then off it goes. Or, doesn't and is discarded instead if focus is lost. "

    As with all 3D applications, HitFilm uses a pipeline to transform 'entities' from their 'local' coordinates into 'world' space before projecting them to the screen. Entities (meaning points, lines and polygons) have to be 'clipped' if they are outside of the camera's view.

    This can be extremely complicated because HitFilm has to perform this pipeline for entities in nested composite shots, for multiple perspective and orthographic viewports.

    The transforms required to move points between spaces are not fixed: they must be built up dynamically from hundreds or thousands of properties, any and all of which may be animated. The values for animated properties may have to be evaluated from temporal curves (think of the value graph).

    These transformations may have to be built from chains of parent transformations (parent layers etc.). On top of that, we may have plugin effects, masks, 3D model meshes, particle simulators and more.
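    The parent-chain idea above can be sketched in a few lines. This is purely illustrative Python with hypothetical names (`Layer`, `world_matrix`, the animated-property lambdas) — nothing here comes from HitFilm's actual code base; it just shows how a layer's world transform is composed by walking its parents and re-evaluating animated properties every frame.

```python
# Illustrative sketch only: composing a layer's world transform from a chain
# of parent transforms, with animated properties evaluated per frame.
# All names are hypothetical, not HitFilm's API.

def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

class Layer:
    def __init__(self, parent=None, anim_x=None):
        self.parent = parent
        # Animated property: a function of frame number (think value graph).
        self.anim_x = anim_x or (lambda frame: 0.0)

    def local_matrix(self, frame):
        # Evaluate the animated property for this frame, then build the matrix.
        return translation(self.anim_x(frame), 0.0)

    def world_matrix(self, frame):
        # Walk up the parent chain, composing transforms as we go.
        m = self.local_matrix(frame)
        if self.parent is not None:
            m = mat_mul(self.parent.world_matrix(frame), m)
        return m

root = Layer(anim_x=lambda f: f * 2.0)             # moves 2 px per frame
child = Layer(parent=root, anim_x=lambda f: 10.0)  # fixed 10 px offset

# At frame 5 the child's world x-offset is 5*2 + 10 = 20.
assert child.world_matrix(5)[0][2] == 20.0
```

    Multiply this by thousands of properties, nested composites and multiple viewports per frame, and the computation cost Danny describes follows.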

    My point is that actually transferring or 'blitting' pixels from one place in the computer's memory to another is irrelevant. It's not a question of RAM-to-VRAM transfer speed but the sheer amount of computation that HitFilm must perform for every frame just to plot each pixel. This is why RAM preview and proxy were added.

    You're assuming that everything we're doing should be a cost free operation and when it isn't, there must be an unnecessary (and by implication, easily fixed) bottleneck somewhere. This simply isn't the case; HitFilm doesn't work like a game where every polygon and texture can be uploaded to the GPU once and run through the same chain of transformations and shaders in one massively parallelized operation for every single frame.

    And comparing HitFilm to Vegas is comparing apples to oranges. They're completely different code bases and very different applications.

    "Also, stopping when losing focus would be nice if it also didn't do it elsewhere as well, rather than in more places. It's really handy in Vegas, turning tracks on and off and flipping between music tracks and even toggling the FX on/off (or half of them on the split screen button it has) all while it's still playing through a video nice and smoothly."

    Yes it would, but we have to weigh performance gains against stability and that's hardly a fair contest. Until very recently, Premiere would stop playback when the user tried to edit a value for exactly the same reason. They've overcome that limitation now but it took them a long time.

    As I've said before, we are well aware of the performance issues in the product and are working on them on a priority basis.


  • @Palacono You can always transcode to an I-frame-only codec. Tracking does not get a performance penalty, AT ALL, with media of this type. It tracks at the same speed regardless of whether the viewer panel is being updated or not. Cineform and DNxHD tested. So all the pipeline rendering @Danny77uk talks about does not appear to be a, or the, performance issue, at least for the performance difference between the viewer being updated or not.

    I know I've mentioned the decoder thrashing multiple times in many threads. I've mentioned various cases where I believe it exists and how it affects performance. I've never documented what I have seen. Why not here and now for this case. My opinion is based on the following...

    The following are screenshots logging disk and thread access of Hitfilm.

    This one is a log of a track of an AVC MP4 file with a GOP length of 8. My fast decode AVC settings. Original media was from a Canon 7D.

    In this track the viewer panel is being updated. Notice the thread pool being terminated and restarted. At least for AVC MP4, Hitfilm appears to have a thread pool. You can see there are numerous backwards seeks and re-reads of previously decoded frames. Many frames are decoded from media again and again. Apparently in Hitfilm, or MainConcept, whenever you seek backwards, not only does this thrash the GOP reference frame(s) context within the decoder, but the whole thread pool gets terminated and restarted.

    This track is the same media, same tracker, where I pop the Windows task manager to the foreground during tracking. Notice the lack of decoder thrashing as indicated by the thread pool not being reset. Also, notice the lack of backwards seeks.

    Here the same camera media transcoded to Cineform (by GoPro studio). The viewer is being updated. There are backwards seeks but other gyrations of the decode stream seem to not be evident compared to the AVC media.

    Here is the Cineform media track with the Windows task manager popped over Hitfilm during tracking. The viewer does not update. Basically the same but now lacking the backwards seeks.

    The AVC LongGOP media with the viewer updated tracks in approx 33 seconds. All others (AVC no update, both Cineform) track in approx 23 seconds. Flat 4Ghz i7 4770k, media is in the system disk cache.

    I submit that the tracker has multiple frames in flight for analysis, i.e. it is decoding well ahead of the "current" frame. When it has determined the object movement, the viewer gets updated with the current frame marked and moved with this info. The layer viewer used by the tracker is a different beast than the normal viewer. It really just displays the media as is.

    Sadly, the viewer display is not provided with the image buffer that has already been decoded by the tracker code. The layer viewer update code goes directly to the media and decodes that specific frame again. Excess IMO. The tracker has very likely decoded frames well ahead of the current frame. The decoder must reset/forget what it currently has buffered and seek back far enough to decode the necessary reference frames to properly decode the "current" frame. Hitfilm is using the same decoder handle/context for both the tracker and viewer, hence the thrashing.

    With codecs known to be I-frame only, there is no reference frame context to thrash. We just have excessive extra frame decodes. At least with Cineform, that excess is not a penalty you can feel with the seat of your pants. I'm sure it can be measured.
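    The GOP effect Norman describes can be reduced to a toy cost model — purely illustrative, assuming a decoder that must restart from the nearest preceding I-frame after every backward seek (`decode_cost` is a hypothetical function, not anything from Hitfilm or MainConcept):

```python
# Toy model (illustrative only): how many frames a long-GOP decoder must
# process to deliver one target frame after a backward seek, assuming it
# restarts at the nearest preceding I-frame. I-frame-only media (GOP 1)
# always costs exactly one decode.

def decode_cost(target_frame, gop_length):
    keyframe = (target_frame // gop_length) * gop_length
    return target_frame - keyframe + 1  # keyframe..target, inclusive

# Re-decoding frame 100 after a backward seek:
assert decode_cost(100, 15) == 11  # GOP 15: I-frame at 90, decode 90..100
assert decode_cost(100, 8) == 5    # GOP 8: I-frame at 96, decode 96..100
assert decode_cost(100, 1) == 1    # I-frame only: no reference context
```

    Under this model a shorter GOP directly shrinks the penalty of every backward seek, which is consistent with the tracking times reported above.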

    On a side note, why does Hitfilm think it is soooo important to check the recent open projects list every couple of seconds. The logs show this. Sure it is extremely unlikely to affect performance. If I cannot see the list then why check/update it? Update it when the file menu is opened or the Home page visited. Seems excessive, but hey, that's me.

  • @Danny77uk Thanks for taking the time to respond, I'm absolutely not assuming anything is cost free. I'm actually asking if you know what the various relative costs for certain features are; and if they're significant: might it be worth giving the user a choice as to whether they want to toggle some of them off some of the time?

    Speed, Accuracy, Quality. Pick any two. Or something. :)

    Also, just to take a step back: when I'm tracking I have no effects on - because...time - and I can't do it on composite shots (although I would dearly like to) so much of what you're saying could be applied probably isn't at that specific time, other than clipping to a window and/or scaling. I don't think you can do much more when tracking.  UI overhead is a constant either way, although I also 'wishlisted' the option to toggle that off for general playback if it would help with speed.

    So, single MP4 file - worst type of file to decompress because...codecs - no effects, no other layers, no fancy transforms (maybe some scale and zoom), has a frame decompressed in RAM and then scanned, and takes time=t. The same overhead has been expended whether it now transfers that to VRAM or doesn't. So I don't understand why you say that transfer speed is irrelevant.

    What else are you using to measure performance? 

    Or, if the actual blitting is so fast as to be almost irrelevant, is the calculation overhead of getting from a frame buffer to an onscreen window that high for a single layer and nothing else?

    Surely that pipeline is pretty simple in that case because it's not doing an 'if then...else' for every possible effect per pixel, is it?

    I'm just trying to get some idea of where the 'costs' are spent. Users can only do so much at the decompressing end by using ProRes, DNxHD, or Handbrake etc. to help that part.

    Finally, and which was actually my original question: if you've got the frame in VRAM once, is it then "like a game" where you can now transfer that quad as two big (clipped and scaled) tris to the 2nd monitor or not?

    Yes, I know Vegas is different, but doesn't stop comparisons being made once you've used it, unfortunately.

  • @NormanPCN I didn't see your post before because I had my reply open for hours while I was doing some editing. ;)

    Very interesting, and if your speculation is even partially correct, it might help guide some of the devs to check if some areas of the pipeline can be given a second look.

    Biggest pain in the neck when optimising is you're always chasing your tail. Fix one bottleneck and the next one then takes centre stage. Fix that and it nullifies some of the improvements you made in the 1st one. On and on until head hits desk. :(

  • I forgot to add this to my post. The original camera file takes about 48 seconds to track. The fast decode AVC helps a bunch. In this case a lot is due to the much shorter GOP length of the fast decode AVC settings. Originally my AVC setting used 15 frames and then I gradually moved down to 8. The shorter GOP helps Hitfilm performance in these circumstances.

    I think there is something there WRT my conclusions. Given the time I am willing to put into the analysis, I can miss something. The smoke seems thick, and where there is smoke there usually is fire.

    In this case the simplest thing would be to not re-decode the current frame when it has already been decoded. Just reuse it. Some of the other Hitfilm thrashes are most likely not so simple. Probably a bit of work.

  • @Palacono

    " Same overhead has been expended whether it now transfers that to VRAM or doesn't. So don't understand why you say that transfer speed is irrelevant. "

    The overhead of decoding frames for media files has nothing, nothing, to do with VRAM transfer speeds. The rendering pipeline does not distinguish between a source pixel from a PNG image and a pixel from an H.264 frame.

    "What else are you using to measure performance? "

    The profilers in our C++ development tools and our own benchmarks.

    "Or, if the actual blitting is so fast as to be almost irrelevant, is the calculation overhead of getting from a frame buffer to an onscreen window that high for a single layer and nothing else? "

    'Frame buffers' are only relevant at the very end of a long rendering pipeline. And they exist on the GPU so cost nothing once uploaded.

    "I'm just trying to get some idea of where the 'costs' are spent. Users can only do so much at the decompressing end by using ProRes, DNxHD, or Handbrake etc. to help that part."

    The cost, as I have tried to explain, is the sum of the computations that HF must perform to get a pixel from a source image to 'screen space'. That includes all of the transformations, projections, clipping, effects and more.

    Some of this is performed on the CPU, some by shaders on the GPU.

    "Surely that pipeline is pretty simple in that case because it's not doing an 'if then...else' for every possible effect per pixel, is it? "

    I can categorically state that HitFilm does NOT work like that. That would be insane.

    "Finally, and which was actually my original question: if you've got the frame in VRAM once, is it then "like a game" where you can now transfer that quad as  two big (clipped and scaled) tris to the 2nd monitor or not? "

    The shader, near the end of the linear pipeline, transforms vertices (polygons etc) into fragments (pixels) onto a texture. This texture is then bound to a viewer context, which is basically a view on the GPU's VRAM. No readback required.

    The fullscreen preview is merely another context that binds the same texture, or put another way, displays the same portion of VRAM on the screen.

    Turning on fullscreen preview does not double render times. It's merely a view onto the same memory. It costs nothing.
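    The "another view onto the same memory" point can be mimicked in plain Python with `memoryview`. This is an analogy only — a GPU context obviously does not work like a Python object — but it shows the zero-copy idea: one buffer is written once, and two "views" both see the result with no second render and no copy.

```python
# Analogy only: two memoryviews over one buffer, like two screen contexts
# bound to the same GPU texture. The frame is rendered (written) once;
# both views observe it with zero copies.

framebuffer = bytearray(4)            # the rendered frame (one "texture")
viewer = memoryview(framebuffer)      # main viewer "context"
fullscreen = memoryview(framebuffer)  # fullscreen preview "context"

framebuffer[0] = 255                  # render once into the shared buffer

# Both views see the same pixel without a second render or a copy.
assert viewer[0] == 255 and fullscreen[0] == 255
```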

  • Surely that pipeline is pretty simple in that case because it's not doing an 'if then...else' for every possible effect per pixel, is it? - @Palacono

    Don't be a troll. Please.

    The bottom line is when using the full screen preview, you should not see any performance hit compared to when it's off. Sure it will take some CPU cycles but nothing you would be able to see without profiling the application and counting each ms. If you see different results then please let us know.

    Now regarding 2D tracking, this is a completely different issue. Thanks @NormanPCN for sharing your results. Most of the tracking code dates back to HF 1.1 and needs to be properly looked at again with performance in mind. In the meantime a solution would be to re-encode your videos to Cineform or small-GOP MP4 using Norman's settings.

  • @Palacono @NormanPCN I've split the messages to a new thread as this was getting lengthy and unrelated to the original question. Feel free to continue the discussion here.

    On a side note, why does Hitfilm think it is soooo important to check the recent open projects list every couple of seconds. The logs show this. Sure it is extremely unlikely to affect performance. If I cannot see the list then why check/update it? Update it when the file menu is opened or the Home page visited. Seems excessive, but hey, that's me. - @NormanPCN

    I'm not sure when that crept in but it was a bug. I've fixed it for the next version. As you said, it's unlikely to have any visible performance impact but thanks for mentioning it. :)

  • edited September 15

    @CedricBonnier "...but it was a bug. I've fixed it for the next version"

    Well, I guess maybe I should have mentioned it some time ago when I first noticed it.

    If you are interested in something else like that: Hitfilm pounds the registry every few frames, and the registry is disk. Due to the frequency, all of this will be cached. To me, you just don't read the same thing over and over, especially from disk, within a performance loop.

    In compiler speak, the expression is loop invariant. Yank it out of the loop, assign it to a temp and use the temp inside the loop. Of course, the cost of the expression relative to the loop should be considered. When cached, the expression cost gets pretty low.
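    In code terms, the hoist looks like this. `read_registry` is a hypothetical stand-in for the per-frame registry read being described, and the counter just makes the difference visible:

```python
# Sketch of loop-invariant hoisting. read_registry is a hypothetical
# stand-in for a per-frame registry/settings read; the counter shows
# how many times the backing store is actually hit.

registry_reads = 0

def read_registry(key):
    global registry_reads
    registry_reads += 1
    return True  # e.g. the value of "FullScreenPreviewEnabled"

def render_naive(frames):
    for _ in range(frames):
        if read_registry("FullScreenPreviewEnabled"):  # re-read every frame
            pass  # ...render the frame...

def render_hoisted(frames):
    enabled = read_registry("FullScreenPreviewEnabled")  # read once, hoisted
    for _ in range(frames):
        if enabled:
            pass  # ...render the frame...

render_naive(100)
naive_reads = registry_reads       # 100 reads for 100 frames

registry_reads = 0
render_hoisted(100)
assert naive_reads == 100 and registry_reads == 1
```

    The value cannot change mid-playback anyway, so hoisting it loses nothing.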

    I also left in an easter egg showing that the Cineform folks have their own lazy/inefficient garbage going on. They are a driver, but I would still not cut much slack for that. Again, it works because it will all be cached even if the machine has gone virtual, because they are pounding it so hard/frequently and the actual data amounts to only 1K or so.

  • @NormanPCN Thanks, we often assume that we know how it works and don't look in details unless it's something that needs investigating. It's good to have some fresh eyes on it with no assumptions.

    Is this during tracking or general scrubbing? It won't happen during playback (and tracking is not considered playback, as it cannot skip frames)

  • edited September 15

    @CedricBonnier That screenshot was during a tracking operation.

    Basic playback pounds various other items once or twice per frame.

    It seems to be systemic that Hitfilm uses the registry as a global variable of sorts. It's one way to do it, but it has far higher overhead than a simple global variable. Lots of user/kernel/user transitions. Especially as these are private items to Hitfilm. It's not like an external process could be changing these while Hitfilm is running. Is something like FullScreenPreviewEnabled going to change value from frame 100 to frame 101? I think not. Loading into a global when someone clicks play/whatever and using that is a way to do it.

    In my options stuff (registry on Windows), I always just read them at start and kept them in a global. If an option was changed, the global is of course updated and the value is flushed to registry/disk at the time of user change.
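    A sketch of that pattern: read the backing store once at startup, serve all reads from memory, and write through only when the user changes a value. `backing_store` here is a plain dict standing in for the registry — all names are illustrative:

```python
# Sketch: options read once at startup into memory; reads never touch the
# backing store (registry/disk); writes go through only on a user change.
# The dict here is a hypothetical stand-in for the Windows registry.

class Settings:
    def __init__(self, backing_store):
        self._store = backing_store
        self._cache = dict(backing_store)  # one bulk read at startup

    def get(self, key):
        return self._cache[key]            # memory only, no registry hit

    def set(self, key, value):
        self._cache[key] = value
        self._store[key] = value           # write-through on user change

store = {"FullScreenPreviewEnabled": False}
settings = Settings(store)

settings.set("FullScreenPreviewEnabled", True)   # user toggles the option
assert settings.get("FullScreenPreviewEnabled") is True
assert store["FullScreenPreviewEnabled"] is True  # persisted at change time
```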

    Accessing a global is at least 100K times faster than the registry call sequence shown accessing a single option item. This assumes the registry keys and data are system cached. But given that the registry call sequence is still extremely fast relative to the cost of decoding a frame and doing effects and such on said frame, the registry access cost does not amount to fly sh!t. Probably not even mosquito sh!t. There are mammals roaming around in Hitfilm that could stand to be leashed.
