Extending FRCNN with Nested Classes

The Faster R-CNN (FRCNN) Keras model is designed to perform both object localisation and classification given a raw image [1].
There are situations in which we would like not only multi-object identification and classification on an image, but also nested classification.
For example, having found a dog in an image, we may want to know whether the dog is facing left or right without introducing extra top-level labels such as "dog facing right" and "dog facing left". This reduces the number of labels: if we have dog and cat images and three directions the animals can face, then instead of six labels
  • dog left
  • dog right
  • dog up
  • cat left
  • cat right
  • cat up
we would have five labels:
  • dog
  • cat
  • left
  • right
  • up
As a rule of thumb, the fewer labels and classes the model has to distinguish, the better it performs.
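
To make the difference concrete, here is a minimal sketch (illustrative only, not code from the repository) of how targets could be encoded under the two schemes: the joint scheme needs one one-hot vector over the 2 x 3 = 6 combinations, while the nested scheme needs two smaller vectors covering 2 + 3 = 5 outputs.

import numpy as np

animals = ['dog', 'cat']
directions = ['left', 'right', 'up']

def joint_target(animal, direction):
    # One-hot vector over every animal-direction combination (6 entries).
    t = np.zeros(len(animals) * len(directions))
    t[animals.index(animal) * len(directions) + directions.index(direction)] = 1.0
    return t

def nested_target(animal, direction):
    # Two separate one-hot vectors, one per label level (2 + 3 entries).
    a = np.zeros(len(animals))
    a[animals.index(animal)] = 1.0
    d = np.zeros(len(directions))
    d[directions.index(direction)] = 1.0
    return a, d

print(joint_target('dog', 'right'))   # [0. 1. 0. 0. 0. 0.]
print(nested_target('dog', 'right'))  # (array([1., 0.]), array([0., 1., 0.]))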

The current implementation of FRCNN reuses the convolutional feature maps returned by the pre-trained base network (e.g. VGG): after bounding boxes have been proposed by the region proposal part of the network, the same feature maps are used to classify each region. We can utilise this by adding nested labels as additional outputs in the region-based classification part of the network, as sketched below.
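
As a rough illustration of that idea, the sketch below adds an extra softmax head next to the standard classifier outputs of the region-based part of the network. This is a minimal, hypothetical Keras example, not the repository's actual code: pooled_rois stands for the RoI-pooled feature maps, and the layer sizes, names, and class counts are placeholders.

from tensorflow.keras import layers, Input, Model

num_rois, pool_size, channels = 32, 7, 512
nb_object_classes = 3   # e.g. dog, cat, background (top-level labels)
nb_nested_classes = 3   # e.g. left, right, up (nested labels)

# RoI-pooled feature maps coming from the shared base network (e.g. VGG).
pooled_rois = Input(shape=(num_rois, pool_size, pool_size, channels))

# Shared fully connected layers applied to every RoI.
x = layers.TimeDistributed(layers.Flatten())(pooled_rois)
x = layers.TimeDistributed(layers.Dense(4096, activation='relu'))(x)
x = layers.TimeDistributed(layers.Dense(4096, activation='relu'))(x)

# Standard FRCNN detector outputs: class scores and bounding-box regression.
out_class = layers.TimeDistributed(
    layers.Dense(nb_object_classes, activation='softmax'))(x)
out_regr = layers.TimeDistributed(
    layers.Dense(4 * (nb_object_classes - 1), activation='linear'))(x)

# Extra nested head reusing the same RoI features, e.g. facing direction.
out_nested = layers.TimeDistributed(
    layers.Dense(nb_nested_classes, activation='softmax'))(x)

model = Model(pooled_rois, [out_class, out_regr, out_nested])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse',
                    'categorical_crossentropy'])
model.summary()

In such a setup, each RoI would need both a top-level label and a nested label during training, with the usual background handling for proposals that do not match a ground-truth box.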

How is this useful for automated sign language processing?
Such end-to-end networks can be used to localise hands in an image and then detect handshape, orientation, movement, and location classes for each hand in a single pass of the image through the network.

https://github.com/mocialov/FRCNN_multiclass