2019-03-04 – pix2pix running in high definition in real time
I am grateful to amazing artists like Memo Akten, who create openFrameworks addons like ofxMSATensorFlow that give me the tools to train and run the pix2pix GAN in real time! Unfortunately, if I want to customize anything, like, say, the resolution of the output imagery, I’m out of luck. I’m not fluent enough in these systems to actually tinker with the code myself. I can write code to work with the addon, but I’m at the mercy of how the addon runs my trained neural network model. To a degree, anyway.
For an early installation of this work, which was simply a video simulating the system running in real time, I was able to use pix2pixHD to generate frames to turn into video. But I had no way to write an openFrameworks sketch that could use the trained models pix2pixHD spits out, so I was stuck with creating videos. Which was great! I loved seeing this model in motion, even if it wasn’t technically in “real time.”
Fortunately, after some Googling, I found this Medium article by Karol Majek, who so graciously shared how he was able to increase the resolution at which he trained pix2pix, all the way from 256×256 to 1024×1024, which provides an astronomical leap in quality.
I am currently incorporating Majek’s modifications into the pix2pix code I’m running, with the hope that I’ll then be able to take that trained model and feed it into openFrameworks for a real-time installation, just as I did with my 256×256 images.
Fingers crossed it works!
(I hope it works, even if the frame rate is atrocious… that’s an easier issue to deal with. I would happily settle for a system outputting 512×512 imagery if it meant getting a frame rate of at least 20fps, though 30+ is ideal.)
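To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It only counts pixels per frame and the time available per frame at a target frame rate. It is not a benchmark, and it assumes (loosely) that the cost of running the network grows with the number of output pixels, which may not hold exactly on a real GPU.

```python
# Rough comparison of the per-frame workload at the resolutions discussed above.
BASE = 256  # side length of the original 256x256 setup


def pixel_count(side):
    """Number of pixels in a square image with the given side length."""
    return side * side


def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


for side in (256, 512, 1024):
    ratio = pixel_count(side) / pixel_count(BASE)
    print(f"{side}x{side}: {pixel_count(side):,} pixels, {ratio:.0f}x the 256x256 pixel count")

for fps in (20, 30):
    print(f"{fps} fps leaves {frame_budget_ms(fps):.1f} ms per frame")
```

In other words, 512×512 means four times the pixels of 256×256 and 1024×1024 means sixteen times, while 20fps leaves 50 ms per frame and 30fps leaves about 33 ms.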
After failing to train at 1024×1024 (which is probably unrealistic for real time anyway), I finally managed to tweak the settings to get 512×512 images training! Woohoo! Not sure if it works yet, but this is a step in the right direction. I’m using a very small dataset (385 images from the OG pink+blue outfit dataset), and I’m only training it for, like, 20 minutes. Just enough to get me a model I can test in openFrameworks.
2019-03-01 – Venturing into the physical
I began work on this project at the beginning of September 2018. Up until this point, the bulk of the work has been an exploration of generating imagery using a neural network. Save for a small technical demo with ping pong balls, the entirety of this work has existed on a screen. But it was my intention from the beginning of this open-ended undertaking to incorporate some sort of physical, kinetic element that would make the work “interactive”, although without any human interaction. A kinetic, physical presence could provide input information for the system to parse and use to generate imagery on screen. This would free the system from relying on simply feeding in digital-native imagery.
The solution for this “kinetic element,” I have decided, will be balloons! Balloons arranged in an orientation like one of my body poses from my data set would be the default setting; an array of programmed fans would go on and off at different times and run at varying speeds to blow the balloons around, creating a ton of generative possibilities.
This log section will chronicle the development of the full installation, balloons and fans and all.
self-contained utilizes the Pix2PixHD neural network to generate speculative physiologies. I trained it on image pairs of motion capture dots captured using a Kinect V2, and video frames of me in various outfits, adopting various personae through movement. The system has been trained to associate multiple personae with these simple dot patterns. Once the system is trained, feeding my original movements back into the system forces it to make decisions about which of my selves it must present, and how it should connect my limbs.
2018-10-07: Initial results
Live-testing the trained model using the pix2pix example in the ofxMSATensorFlow addon
Generating results by testing single-dot deviations from a model I know generates something coherent.
Trying novel input from the ground up. Absolutely demonic!
Based on a training set of 3,636 images, the model isn’t all that great yet at creating new body forms with novel input (novel input being arrangements of white dots not used in the training data). Still, with a more extensive training set, the sparse white-dot input could do a pretty good job of generating coherent bodies.
Above, you’ll see I’m only manipulating one dot at a time, seeing how far a new dot can be from a previous dot before the imagery becomes complete spaghetti. Right now, the margin is pretty tight.
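For anyone who wants to try the same one-dot-at-a-time test offline, here is a minimal Python sketch of the idea. Everything specific in it is an assumption for illustration: the 256×256 input size, the dot radius, the dot coordinates, and the names `render_dots` and `nudge_dot` are all hypothetical, and the step where the frame is handed to the trained model is left out entirely.

```python
SIZE = 256    # assumed input resolution (hypothetical)
RADIUS = 3    # assumed dot radius in pixels (hypothetical)


def render_dots(dots, size=SIZE, radius=RADIUS):
    """Return a size x size grid (0 = black, 255 = white) with a filled circle at each (x, y)."""
    grid = [[0] * size for _ in range(size)]
    for cx, cy in dots:
        # Only visit the bounding box of the circle, clipped to the image.
        for y in range(max(0, cy - radius), min(size, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(size, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    grid[y][x] = 255
    return grid


def nudge_dot(dots, index, dx, dy):
    """Copy the pose and move a single dot by (dx, dy), leaving the others untouched."""
    moved = list(dots)
    x, y = moved[index]
    moved[index] = (x + dx, y + dy)
    return moved


# A made-up pose standing in for one that is known to produce a coherent body.
known_pose = [(128, 40), (128, 90), (100, 100), (156, 100), (115, 180), (141, 180)]

# Push one dot further and further from where it started.
for step in (0, 5, 10, 20, 40):
    candidate = nudge_dot(known_pose, index=2, dx=-step, dy=0)
    frame = render_dots(candidate)
    # `frame` is what would be fed to the trained model at this point.
    print(f"offset {step:>2} px -> dot 2 at {candidate[2]}")
```

Sweeping the offset like this makes it possible to note the distance at which the output falls apart, instead of eyeballing it while dragging dots by hand.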
The training data looks like this:
Video of the complete data set. 6x speed. 3,636 512×256 images.
Not exactly comprehensive of the range of my motion, but enough for the network to learn what arrangements of dots make up what human forms.
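The 512×256 size comes from how pix2pix-tensorflow expects its training data: each file is an input image and its target image joined side by side. The Python sketch below shows that layout by splitting one such pair back into two 256×256 halves. It uses a placeholder image made of plain lists instead of real frames, and the helper name `split_pair` is hypothetical. Which half holds the dots and which holds the video frame is an assumption left open here.

```python
def split_pair(image):
    """Split a row-major image (a list of rows) down the middle into (left, right) halves."""
    width = len(image[0])
    if width % 2 != 0:
        raise ValueError("paired image must have an even width")
    half = width // 2
    left = [row[:half] for row in image]
    right = [row[half:] for row in image]
    return left, right


# Placeholder 512x256 pair: left half all black (0), right half all white (255).
HEIGHT, WIDTH = 256, 512
pair = [[0] * (WIDTH // 2) + [255] * (WIDTH // 2) for _ in range(HEIGHT)]

left, right = split_pair(pair)
print(f"left half:  {len(left[0])}x{len(left)}")
print(f"right half: {len(right[0])}x{len(right)}")
```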
2018-10-05: Live testing novel input?
Not yet. The training was a success, in that I was able to feed in my data, and the model it spat out seems decently trained!
But in order to test it, I need to figure out how to feed my model into the openFrameworks pix2pix example included in ofxMSATensorFlow.
Here are some links I’ve gone through to try to get things working in openFrameworks.
This page is important for setting things up: https://github.com/memo/ofxMSATensorFlow/releases
This page is important for preparing my pre-trained models: https://github.com/memo/ofxMSATensorFlow/wiki/Preparing-models-for-ofxMSATensorFlow
This paragraph, which I somehow missed while thoroughly skimming (can one thoroughly skim?) this page, ended up being incredibly crucial in exporting the frozen graph model I need to feed into openFrameworks. Spent hours trying to figure this out. Turns out I was reading Christopher Hesse’s original pix2pix-tensorflow page, not Memo’s fork, which had this invaluable code snippet.
2018-10-01: Beginning training!
After a lengthy setup process on the computers, making sure nvidia-docker2 was correctly installed (thanks Kyle Werle), it’s time to try out the training.
I think this project depends on getting the right kind of results from my training data. Whatever that might look like, the first hurdle is, of course, getting the training up and running. I’m going to post a list of links here that I accessed/used at various points to get this going.
- The neural network I’m using, Pix2Pix-tensorflow (courtesy of Memo Akten, courtesy of [add original torch implementation author here]): https://github.com/memo/pix2pix-tensorflow
- Notes from Christopher Baker on getting this up and running on Linux (thanks Chris!): https://gist.github.com/bakercp/ba1db00e25296357e5e3fef11ee147a0
- (haven’t tried this yet) High-resolution (1024×1024) adaptation of standard Pix2Pix: https://medium.com/@karol_majek/high-resolution-face2face-with-pix2pix-1024×1024-37b90c1ca7e8, https://github.com/karolmajek/face2face-demo
2018-09-03: The first steps!
So, the first part of this project is getting pix2pix to output what I need. Well,
Could I ask an AI to interpret biological motion from a sparse set of dots the way we humans are able to?