We have been training with segmentation rather than instance segmentation. Meaning that all the “people ” appear in one layer rather than each person split into their own layer.
The upside it is near pixel accurate on a 1080p canvas.
The downside is that it takes a couple of minutes a frame to calculate , it can be GPU accelerated given enough GPU memory and we know the requirement is more than a 4Gb card can provide.
We hope to make this available to the public within a fortnight.
Above is a screenshot of the script in Foundry’s Nuke . The deep purple node is carrying the neural network’s load the other nodes are coloring the background green.
PS.
I hope you enjoy some holiday footage from 2012.
EDIT: 16th November 2018
Here are some more samples