April 17, 2026

AI Mocap Research — Rokoko Vision

What it is

Rokoko Vision is Rokoko's AI-powered video-to-mocap solution. It can generate mocap data from a single video source, from dual video sources (for more accuracy), or from single-camera footage.

Rokoko Vision allows you to record up to 15 seconds of mocap footage for free, and there is no limit on the number of clips you can process. I found this to be a genuinely nice convenience compared to other solutions, especially combined with how quickly it processes video. On Windows, you can also download Rokoko Studio, which is free, to further refine the generated mocap data. The length limitation makes this solution ideal for gameplay actions, but it could theoretically be used for cinematics in short bursts. It can output FBX files and works as a fine replacement for Flow Studio or traditional mocap solutions, though it doesn't support multi-character captures.
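
To make the "short bursts" idea concrete, here is a minimal sketch of how a longer pre-recorded take could be planned as clips of 15 seconds or less. It assumes the cap applies per clip and that you are cutting an existing video file. The function names and file names are hypothetical, and the ffmpeg call is only built as an argument list, not run, so it assumes you have ffmpeg installed if you want to use it.

```python
def plan_segments(total_seconds, max_len=15.0):
    """Return consecutive (start, end) times, each at most max_len seconds long."""
    total = float(total_seconds)
    if total <= 0 or max_len <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0.0
    while start < total:
        # Clamp the last segment so it never runs past the end of the take.
        end = min(start + max_len, total)
        segments.append((start, end))
        start = end
    return segments


def ffmpeg_cut_command(src, start, end, out):
    """Build (but do not run) an ffmpeg argument list that copies one segment."""
    return ["ffmpeg", "-ss", str(start), "-i", src,
            "-t", str(end - start), "-c", "copy", out]


if __name__ == "__main__":
    # Hypothetical 40-second take split into clips of at most 15 seconds.
    for i, (start, end) in enumerate(plan_segments(40)):
        print(" ".join(ffmpeg_cut_command("take.mp4", start, end, f"take_{i:02d}.mp4")))
```

Because the sketch uses stream copy, cuts snap to the nearest keyframe. If you need frame-accurate boundaries, re-encoding instead of copying is the usual workaround.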

One caveat: read through Rokoko's privacy policy, or have it evaluated, before using this professionally, as I'm not 100% sure what is done with the uploaded video.

Video Log

Notes

I did all of this on a Linux machine, which is relatively uncommon for this kind of workflow. I'll use OS-agnostic language where possible, but note that some steps may be different on Windows.

Recording

  • Setting up for dual-camera recording was rather complicated. It asks for at least 2 m (over 6 ft) of space between you and the camera and requires your full body to fit within the frame. In my small Chicago apartment this was tough, but I just barely managed to make it work.
  • I was able to use a combination of a webcam and my phone as a webcam (something that works out of the box with OBS). The phone camera was far superior to the webcam. I recommend using 2 phones where possible, being sure to attach them using USB for as solid a connection as possible. A long USB cord helps so you can adjust distance and placement as much as possible. A cheap phone mount is ideal too.
  • The recording setup also required printing out a checkerboard and a calibration marker to define the capture space (as seen in the video log), which I think gives it better results than other solutions. This was all done through the browser, and everything aligned well. The instructions were straightforward and I definitely recommend following them closely.
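
If you're trying to work out whether your room is big enough before moving furniture, a rough pinhole-camera estimate can help. This is only a sketch under my own assumptions: the camera is level and roughly at mid-body height, and the field-of-view value is a made-up example rather than anything from Rokoko. Look up your own phone or webcam's actual vertical field of view, and treat Rokoko's 2 m guidance as the real requirement.

```python
import math


def min_camera_distance(subject_height_m, vertical_fov_deg, margin=1.1):
    """Estimate the distance (in metres) needed to fit a full body in frame.

    At distance d, a camera with vertical field of view theta sees a height of
    2 * d * tan(theta / 2). Solving for d gives the minimum distance.
    `margin` adds headroom so hands raised overhead still fit.
    """
    if not 0 < vertical_fov_deg < 180:
        raise ValueError("vertical_fov_deg must be between 0 and 180")
    half_fov = math.radians(vertical_fov_deg) / 2.0
    return (subject_height_m * margin) / (2.0 * math.tan(half_fov))


if __name__ == "__main__":
    # Hypothetical numbers: a 1.8 m person and a 60 degree vertical FOV.
    print(f"{min_camera_distance(1.8, 60.0):.2f} m")
```

With those example numbers the estimate comes out to roughly 1.7 m, which is in the same ballpark as the 2 m the setup asks for. A narrower field of view pushes the required distance up quickly.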

Output

  • It took 5 minutes to process both videos and output an FBX on a mannequin without issue. Check out the video log to see the result; it's pretty decent! That said, there are some foot placement issues, and it doesn't quite capture all the nuance of the movement, especially the rotation of the wrists and the pose of the hands.
  • Compared to Autodesk Flow Studio, this is far faster, possibly because Rokoko Vision has no retargeting step built into the process itself. On top of that, as of April 1, 2026, Flow Studio does not support FBX exports or give you a preview of your scene.
  • I wore a dark shirt and medium-tone pants. If you wear a dark shirt, try not to go straight black. In one of my test poses, the calculation couldn't figure out how my limbs were positioned and produced a less-than-ideal result. You can see this when I do a thumbs up towards the screen. Dual camera improves the output but doesn't fix every issue. You can get better results by keeping all motion in clear line of sight of at least one camera and not obscured by your body.