Shotcut Tutorial 6: Motion-tracked tags for people or objects in AR
Learn how to use motion-tracked tags for people or objects in your AR video content, using the free, open-source VR video editor, Shotcut.
This is a guest post by VR creator XR Stereo Video; most videos on his DeoVR channel are rendered in Shotcut. Check out his other Shotcut tutorials:
- Shotcut Tutorial 1: Beginner’s Guide to Shotcut (free VR video editor)
- Shotcut Tutorial 2: Adding text to VR videos and more
- Shotcut Tutorial 3: Exporting a thumbnail or 2D video from VR
- Shotcut Tutorial 4: Using filters
- Shotcut Tutorial 5: How to make high-resolution foveated 3D VR180 video
- Shotcut Tutorial 6: Motion-tracked tags for people or objects in AR (currently reading)
Motion-tracked tags for people or objects in AR
A cool video effect that can be ported from regular video to VR video is a motion-tracked tag attached to a person or object. Done in 3D virtual reality video, it effectively turns the footage into augmented reality, making tags one possible application of AR.
Shotcut supports motion tracking, and this guide explains how to add a motion-tracked 2D text tag to a video; a motion-tracked 3D object could be added in the same way. The higher the resolution of the video, the higher the resolution of the rendered text.
Toward the end of this video, there is a use case for these motion-tracked tags:
DeoVR | Motion tracking experiments in Shotcut
First, add the video to the timeline.
Then split out the section of the video that we want the Motion Tracker to analyze.
Motion Tracking
- In the Filters tab/subwindow, click the “+” sign and add the “Motion Tracker” filter.
- With your mouse, draw the selection rectangle around the subject you want to be motion-tracked.
- Click the “Analyze” button in the Filters tab.
- You can tick the “Show preview” box to see the tracking rectangle, but make sure to untick it after the analysis so the box doesn’t get rendered into the exported final video.
- Create two new video tracks for the motion-tracked tag (one track for each eye).
- Select the track for one eye, go to the Toolbar, click “Open Other”, then “Text”. In the dialog that pops up, write your text and tick the “Simple” box.
- Then repeat steps 1-4 from this tutorial, adding all the filters listed there: “Text: Simple”, then “360: Rectilinear to Equirectangular”, then “Size, Position & Rotate”.
Making the text motion tracked:
Note: In the “Size, Position & Rotate” filter in the Filters tab, click on the “Load keyframes from Motion Tracker” button for both eyes. This will make the text move along with the tracked subject. We will also add a background to the text to make it a tag floating in space. The text display duration should match the tracked segment duration.
The Horizontal and Vertical Position
Now we want to move the text to the right point in space: to the horizontal and vertical coordinates where we want it, next to the tracked subject, and also at the correct depth relative to the tracked subject.
For this we will open a new subwindow, “Video Zoom”, which shows the horizontal (X) and vertical (Y) coordinates of a point of our choice on the tracked subject, one whose position we can identify accurately for both eyes.
Go to the Menu bar and select View > Scopes, then tick “Video Zoom”. This opens the new “Video Zoom” subwindow, which shows the X and Y coordinates of the pixel under your mouse pointer. Hold down the left mouse button to drag the image around in the subwindow, and use the scroll wheel to zoom in and back out (to 100%). Identify the X and Y coordinates of the chosen point for both eyes and write them down. The Y coordinate should be essentially the same for both eyes.
In the Video Zoom window, the 0,0 origin is in the upper-left corner, while in the “Size, Position & Rotate” filter in the Filters subwindow it is in the middle of the image, both vertically and horizontally.
Shotcut subwindow layout for this use case:
In this screenshot, you can see:
- On the left, the Filters tab, with the “Size, Position & Rotate” filter selected.
- On the upper middle of the screen, the Timeline and its tracks, grouped in pairs (left and right eye). The text segments are offset in time for each subject so that their movement is not synchronized with that of my actually tracked subject, even though I’ve loaded the Motion Tracker keyframes for every text segment on every track in the Timeline.
- On the lower middle of the screen, the standard Shotcut video preview window.
- On the far right, the new “Video Zoom” subwindow, used to determine a pixel’s coordinates.
Let’s name the coordinates of the chosen subject point. In the “Size, Position & Rotate” filter:
- FXL and FYL: the X and Y coordinates for the left eye
- FXR and FYR: the X and Y coordinates for the right eye
In the “Video Zoom” window:
- ZXL and ZYL: the X and Y coordinates for the left eye
- ZXR and ZYR: the X and Y coordinates for the right eye
We will also use the video width W and the video height H.
These filter coordinates will need to end up in the vicinity of our subject, adjusted depending on the text’s position and length.
For a 2:1 ratio video (as it should be for side-by-side 3D VR180), the “Size, Position & Rotate” X coordinate is obtained by subtracting half the video width from the Video Zoom X coordinate, and the Y coordinate by subtracting half the video height from the Video Zoom Y coordinate (see the sketch after these formulas), so:
- FXL = ZXL – W/2
- FYL = ZYL – H/2
- FXR = ZXR – W/2
- FYR = ZYR – H/2
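To make the arithmetic concrete, here is a minimal Python sketch of this conversion. The frame size and the Video Zoom readings are hypothetical example values, not taken from the video above:

```python
# Convert a point from Video Zoom coordinates (origin in the upper-left
# corner) to "Size, Position & Rotate" coordinates (origin in the center).
# All numbers below are hypothetical, chosen only to illustrate the math.

W, H = 5760, 2880                    # 2:1 side-by-side VR180 frame

ZXL, ZYL = 1500, 1300                # left-eye reading from Video Zoom
ZXR, ZYR = 4370, 1300                # right-eye reading from Video Zoom

FXL, FYL = ZXL - W / 2, ZYL - H / 2  # left-eye filter coordinates
FXR, FYR = ZXR - W / 2, ZYR - H / 2  # right-eye filter coordinates

print(f"left eye:  X={FXL:+.0f}  Y={FYL:+.0f}")   # X=-1380  Y=-140
print(f"right eye: X={FXR:+.0f}  Y={FYR:+.0f}")   # X=+1490  Y=-140
```

Negative values mean left of or above the image center, matching the filter’s centered origin.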
The Depth
Now let’s calculate the offset in pixels for each eye, corresponding to the depth of our subject’s chosen point, from the “Video Zoom” readings; with it we will alter the X coordinates in the “Size, Position & Rotate” filter:
Offset = [W/2 – (ZXR – ZXL)]/2
Once we have this offset number, e.g. 7, we add 7 to the left eye coordinate in the “Size, Position & Rotate” filter, so it becomes FXL+7; then we go to the right eye track and subtract 7, so it becomes FXR-7. This brings the text closer to us, to the distance of our subject. Depending on the angle, the precision of our measurement, and how close to or far from our subject we want the text to be, we can then make small adjustments to put the text exactly where we want it (see the sketch below).
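Continuing with the same hypothetical numbers as in the earlier sketch, the depth adjustment looks like this:

```python
# Hypothetical example values, matching the sketch above.
W = 5760                              # full side-by-side frame width
ZXL, ZXR = 1500, 4370                 # Video Zoom X readings, left/right eye
FXL, FXR = ZXL - W / 2, ZXR - W / 2   # filter X coordinates, as before

# Half of the subject's on-screen disparity becomes the per-eye offset.
offset = (W / 2 - (ZXR - ZXL)) / 2    # here: (2880 - 2870) / 2 = 5 px

# Shift the two texts in opposite directions to pull the tag forward
# to the subject's apparent depth.
FXL_new = FXL + offset                # left-eye text moves right: -1375
FXR_new = FXR - offset                # right-eye text moves left:  +1485
print(f"offset={offset:.0f}px  FXL={FXL_new:+.0f}  FXR={FXR_new:+.0f}")
```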
Apparently we can’t motion-track several subjects at once: while we could import the Motion Tracker keyframes for these other tags, Shotcut only creates an analysis file for one subject. We can still add more tracks to place tags next to other subjects. If we don’t care about the tracking precision for them and just want the text to move around a little, but not in unison with our motion-tracked subject, we can offset the text segment’s position in the track’s timeline so that its movement isn’t synchronized with that of our main subject.
Join the discussion at the DeoVR forum, Facebook and Reddit.