Bending The Four Elements With PoseNet

Renzo Benemerito · October 10, 2020

On February 21, 2005, one of the best animated series ever to grace the TV screen aired for the first time. Avatar: The Last Airbender will always be one of my favorite animated shows from childhood. The show centers on Aang, the Avatar who can bend all four elements, in his quest to stop the imperialist Fire Nation's conquest of the world. The plot seems simple enough, but all is not as it seems. It is quite rare for a kids' show to tackle themes geared toward more mature audiences, like genocide, marginalization, oppression, female empowerment, stoicism, and many more. No wonder ATLA has developed a cult following in recent years. In this blog post, I have utilized yet another pose estimation model, PoseNet, to bend the four elements using body movements.

Bending The Four Elements

Premise

Avatar: The Last Airbender tells the story of Aang and his gang and their battle against the imperialist Fire Nation. The series is set in an Asian-inspired world where people can manipulate the four elements: air, water, earth, and fire, through martial-arts-like movements. Only Aang, the Avatar, can bend all four elements; an ordinary bender can manipulate only one. ATLA blurs the lines between young and adult programming by incorporating mature themes into a show for kids, making it popular with all types of audiences. It also portrays one of the best character development arcs, in my opinion, in the character of Zuko, and it discusses some of the harder questions in life.

As mentioned earlier, bending is performed through martial-arts-like movements such as punching and kicking. Through the use of pose estimation, we can potentially identify punches and kicks by tracking the hands and feet of a person. Then, we can generate particle effects to simulate bending the elements.

Fire Bending Zuko

AI Solution

I previously wrote a pose estimation blog post applied to K-pop, so I will not go deep into the discussion here. However, I am now using a different model, PoseNet, which is somewhat similar to OpenPose, the model I used before, but differs in parts of its implementation. You can read more about PoseNet here.

With this, I have made a web application using TensorFlow.js that utilizes your webcam and applies pose estimation to identify punches and kicks. Since pose estimation tracks specific keypoints on a person, we can choose to track both the hands and the feet. Taking distance and time into account, we can compute the speed at which the hands and feet move. If the speed passes a specific threshold, we consider it a punch for the hands and a kick for the feet. Of course, this is a very simple way of approaching action recognition for punches, kicks, and other movements, but for the purposes of this blog post, it will suffice.
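To make the idea concrete, here is a minimal sketch of speed-threshold detection. This is not the actual code from the repo: the threshold value, the keypoint choice, and the function names are all illustrative.

// Sketch: flag a punch when the right wrist moves faster than a threshold.
// PUNCH_SPEED_THRESHOLD is a hypothetical value, not a tuned one.
const PUNCH_SPEED_THRESHOLD = 800; // pixels per second

let lastWrist = null;
let lastTime = null;

function detectPunch(pose) {
    const wrist = pose.keypoints.find(k => k.part === 'rightWrist');
    if (!wrist || wrist.score < 0.5) return false; // skip low-confidence frames

    const now = performance.now();
    let isPunch = false;
    if (lastWrist !== null) {
        const dx = wrist.position.x - lastWrist.x;
        const dy = wrist.position.y - lastWrist.y;
        const dt = (now - lastTime) / 1000; // seconds since last frame
        isPunch = Math.hypot(dx, dy) / dt > PUNCH_SPEED_THRESHOLD;
    }
    lastWrist = wrist.position;
    lastTime = now;
    return isPunch;
}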

Once we have identified punches and kicks from the person's movements, we can generate particle effects in JavaScript to simulate the elements. Think of this as augmented reality bending.
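As a rough illustration of the effect side (assuming a plain 2D canvas overlay; the repo may well use a different particle setup), a burst of particles spawned at a keypoint could look like this:

// Minimal canvas particle burst at a keypoint. The particle count, speeds,
// and lifetime are illustrative; a real effect would be far richer.
const particles = [];

function spawnBurst(x, y, color) {
    for (let i = 0; i < 30; i++) {
        const angle = Math.random() * 2 * Math.PI;
        const speed = 1 + Math.random() * 3;
        particles.push({
            x, y,
            vx: Math.cos(angle) * speed,
            vy: Math.sin(angle) * speed,
            life: 60, // frames remaining
            color,
        });
    }
}

function drawParticles(ctx) {
    for (let i = particles.length - 1; i >= 0; i--) {
        const p = particles[i];
        p.x += p.vx;
        p.y += p.vy;
        if (--p.life <= 0) { particles.splice(i, 1); continue; }
        ctx.fillStyle = p.color;
        ctx.fillRect(p.x, p.y, 3, 3);
    }
}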

I have used the following settings for my PoseNet configuration:

const poseNetState = {
    algorithm: 'single-pose',
    input: {
        architecture: 'MobileNetV1',
        outputStride: 16,
        inputResolution: { width: 640, height: 480 },
        multiplier: 0.75,
        quantBytes: 2
    },
    singlePoseDetection: {
        minPoseConfidence: 0.1,
        minPartConfidence: 0.5,
    },
    output: {
        showVideo: true,
        showPoints: true,
    },
};
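For reference, loading the model with that configuration and running single-pose estimation on the webcam feed looks roughly like this with the @tensorflow-models/posenet API (the surrounding loop is a sketch, not the repo's exact code):

import * as posenet from '@tensorflow-models/posenet';

async function setupPoseNet(video) {
    // Load the MobileNetV1 backbone using the settings above; a lower
    // outputStride or a higher multiplier is more accurate but slower.
    const net = await posenet.load(poseNetState.input);

    async function frameLoop() {
        // flipHorizontal mirrors the webcam so movement matches the screen.
        const pose = await net.estimateSinglePose(video, { flipHorizontal: true });
        // ...detect punches/kicks and spawn particles here...
        requestAnimationFrame(frameLoop);
    }
    frameLoop();
}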

Try It!

Download the repo here: Repo

Start a local web server from the repository directory with the following command:

python -m http.server

Then open http://localhost:8000 in your browser. You can bend each element by pressing the following keys on your keyboard:

  • f - Fire

Fire Bending

  • w - Water

Water Bending

  • a - Air

Air Bending

  • e - Earth

Earth Bending

The default element is fire. The particle effects will change upon pressing the different keys.
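The key handling itself can be as small as a single listener; here is a sketch (the names are illustrative, not taken from the repo):

// Map keys to elements; fire is the default, per the post.
const KEY_TO_ELEMENT = { f: 'fire', w: 'water', a: 'air', e: 'earth' };
let currentElement = 'fire';

document.addEventListener('keydown', (event) => {
    const element = KEY_TO_ELEMENT[event.key.toLowerCase()];
    if (element) currentElement = element;
});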

Additional Notes for Laptops with Nvidia Optimus GPUs

If you have a laptop with an Nvidia GPU with Optimus technology, make sure that your browser is actually using the GPU for WebGL, as this will speed up the TensorFlow.js model we are using. Refer here for how to do this. I generally use option two because I do not want my browser to use my GPU all the time, only when I am testing the model.

Check this link to see if your browser is using your GPU. It should display your GPU in the unmasked renderer field.


Further Improvements

The action recognition algorithm used for this project is bare-bones. You will find inaccuracies in punch and kick recognition from time to time because of the inherent flaws of the moving-average-over-time approach to action recognition: synchronization errors can occur, and this implementation does not really discriminate along the x-axis. Lastly, the algorithm runs slowly on laptops with low-tier GPUs, let alone CPU-only machines, so the effective window we use for the moving average varies with frame rate.
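One possible fix, sketched below under the assumption that per-frame keypoint displacements are being averaged (the window length and names are hypothetical): average over a fixed time window instead of a fixed number of frames, so the measurement no longer depends on how fast the machine runs.

const WINDOW_MS = 250; // hypothetical window length
const samples = []; // { t, dist } pairs, newest last

function averageSpeed(dist, now) {
    samples.push({ t: now, dist });
    // Drop samples that have fallen out of the time window.
    while (samples.length && now - samples[0].t > WINDOW_MS) samples.shift();
    const total = samples.reduce((sum, s) => sum + s.dist, 0);
    const span = (now - samples[0].t) / 1000 || 1 / 60; // avoid divide-by-zero
    return total / span; // pixels per second, independent of frame rate
}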

Conclusion

Avatar has always been one of my favorite TV series, and it was a pleasure to dedicate an AI project to the show. This is also my first project using TensorFlow.js, and I am hoping for more in the future. Truly, pose estimation has the potential to augment the existing tools in the motion graphics and augmented reality fields. Once 3D pose estimation matures to a state where it is accurate enough in all conditions and relatively feasible to run on existing machines, we could see shows and movies made with this technology. For now, feel free to fork the repository and improve upon the project, as I know the action recognition algorithm for punches and kicks is pretty basic. As always, till next post!
