Although HamNoSys uses more general handshapes to describe the DEZ parameter, we were interested in recognising the handshapes of a particular sign language, Danish Sign Language, for which a public dataset is available. The dataset consists of isolated videos of people signing one sign per video. In addition, the dataset includes an XML file that provides, among other information, the handshapes in each video. The annotation is therefore weak: the XML file specifies a single overall handshape for the whole video when a sign has only one handshape. Unfortunately, the annotation does not specify when a specific handshape begins and ends within a video.
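As a first step, the video-level handshape labels are read from the annotation file. The sketch below is a minimal example; the tag and attribute names (`video`, `file`, `handshape`) are assumptions about the XML layout, not the dataset's actual schema.

```python
import xml.etree.ElementTree as ET

def load_annotations(xml_path):
    """Map each video file name to its annotated handshape label.

    Assumes a hypothetical layout with one <video> element per clip,
    carrying "file" and "handshape" attributes; adapt the tag and
    attribute names to the dataset's actual schema.
    """
    labels = {}
    for video in ET.parse(xml_path).getroot().iter("video"):
        labels[video.get("file")] = video.get("handshape")
    return labels
```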
We were interested in finding out which visual features of the dataset and which machine learning algorithm produce the best handshape recognition results. The figure below shows the four datasets that were generated: 1) raw images, cropped around the handshape, 2) human skeleton features returned by the OpenPose library, 3) distances between the human skeleton features, and 4) black-and-white images of the handshape skeleton. A sketch of how the distance features can be derived from the OpenPose output follows.
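The sketch below shows one plausible way to turn OpenPose hand keypoints into the distance features of dataset 3; the pairwise-distance formulation is an assumption about how "distances between the skeleton features" were computed.

```python
import itertools
import numpy as np

def keypoint_distances(hand_keypoints):
    """Pairwise Euclidean distances between OpenPose hand keypoints.

    hand_keypoints: array of shape (21, 3) with (x, y, confidence)
    triples, the format produced by OpenPose's hand detector.
    Returns a vector of 21 * 20 / 2 = 210 distances per frame.
    """
    xy = np.asarray(hand_keypoints)[:, :2]  # drop the confidence column
    pairs = itertools.combinations(range(len(xy)), 2)
    return np.array([np.linalg.norm(xy[i] - xy[j]) for i, j in pairs])
```

Pairwise distances are invariant to where the hand sits in the frame, which is one reason to prefer them over raw keypoint coordinates.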
Since the dataset is weakly labelled, we had to filter out irrelevant frames: for example, frames in which the hand was outside the camera view, and frames in which the hand was moving into position to execute the sign (known as movement epenthesis). Our approach to filtering out the epenthesis followed the method in [1]. As a result, the method produced segmentation points for every video, and only the frames between the segmentation points were taken to contain the annotated handshape for that video.
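A minimal sketch of this frame selection, under stated assumptions: the segmentation points come from the epenthesis-filtering step above, and the `min_conf` threshold for dropping frames where OpenPose could not find the hand is an assumed value, not one taken from [1].

```python
import numpy as np

def select_labelled_frames(frames, keypoints, start, end, min_conf=0.2):
    """Keep frames between the segmentation points whose hand keypoints
    were actually detected; when the hand is out of view, OpenPose
    returns near-zero confidences, so those frames are dropped.
    """
    kept = []
    for i in range(start, end):
        if np.asarray(keypoints[i])[:, 2].mean() >= min_conf:
            kept.append(frames[i])
    return kept
```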
The machine learning methods used to recognise the data in the generated datasets were: 1) InceptionV3 pre-trained on ImageNet, 2) Decision Tree, Random Forest, MLP, and kNN, 3) the same four classifiers (Decision Tree, Random Forest, MLP, and kNN), and 4) 2- and 3-layer CNNs, as well as the same architectures pre-trained on the MNIST dataset. A training sketch for the skeleton-feature classifiers follows.
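The sketch below shows how the classical classifiers of datasets 2 and 3 can be trained with scikit-learn, using the Random Forest as an example; the 80/20 split, `n_estimators=100`, and the flattened-keypoint input format are assumptions, not hyperparameters reported for this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_handshape_classifier(X, y):
    """X: one row of flattened OpenPose keypoints (or keypoint
    distances) per frame; y: the handshape label each frame inherits
    from its video-level annotation. Returns the fitted model and
    its accuracy on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, np.asarray(y), test_size=0.2, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    return clf, accuracy_score(y_test, clf.predict(X_test))
```

Swapping in `DecisionTreeClassifier`, `MLPClassifier`, or `KNeighborsClassifier` reproduces the rest of the classical-classifier comparison.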
The results in the image below show that the most effective algorithm for recognising the handshapes is the Random Forest on the raw OpenPose features, giving ~90% accuracy on the test set, which comprises 13 handshapes as described in [2].
https://github.com/mocialov/HandshapeRecogniser
[1] Mocialov, B., et al. "Towards Continuous Sign Language Recognition with Deep Learning."
[2] Koller, O., et al. "Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled."