Sign Language Modelling

Despite the availability of many alternatives for language modelling, such as count-based n-grams and their variations [1-5], hidden Markov models [6-7], decision trees and decision forests [8], and neural networks [9-10], research in sign language modelling predominantly employs simple n-gram models, as in [11-13].
The widespread use of n-grams in sign language modelling stems from the simplicity of the method. There is, however, an obvious disconnect: sign language is perceived visually, while n-grams are commonly applied to text sequence modelling. For this reason, the authors in [6], [13-16] model glosses, such as the ones shown in Figure 2, which are obtained from transcribed sign language.
Glosses capture the meaning of a sign in a written language, but not its execution. As a result, part of the true meaning of what was signed may be lost when working with the higher-level glosses. To overcome this issue and to incorporate valuable information into sign language modelling, additional features, such as non-manual features (e.g. facial expressions), are added [13-15], [17].
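To make the n-gram approach over glosses concrete, the sketch below builds a bigram model with add-one smoothing on gloss sequences. The gloss sentences and the choice of smoothing are illustrative assumptions for this sketch, not details taken from the works cited above.

```python
# Minimal sketch: a bigram language model over gloss sequences with
# add-one (Laplace) smoothing. The gloss sentences are hypothetical;
# a real setup would read transcribed glosses from a corpus.
from collections import Counter
import math

gloss_sentences = [
    ["IX-1", "LIKE", "COFFEE"],           # hypothetical gloss sequence
    ["IX-1", "GO", "SHOP", "YESTERDAY"],  # hypothetical gloss sequence
]

BOS, EOS = "<s>", "</s>"
unigrams, bigrams = Counter(), Counter()
for sent in gloss_sentences:
    tokens = [BOS] + sent + [EOS]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_log_prob(sent):
    """Log-probability of a gloss sentence under the bigram model."""
    tokens = [BOS] + sent + [EOS]
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(tokens, tokens[1:]))

print(sentence_log_prob(["IX-1", "LIKE", "COFFEE"]))
```

Non-manual features could in principle be incorporated into such a model by extending each gloss token with extra markers, though the works cited above may implement this differently.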

For our monolingual dataset, we extracted 810 sentences from the BSL corpus, with an average sentence length of 4.31 words and minimum and maximum lengths of 1 and 13 words, respectively.

We explore transfer learning methods, whereby a model developed for one language, such as one trained on the pre-processed Penn Treebank (PTB) dataset, is reused as the starting point for a model of a second, less-resourced language, such as British Sign Language (BSL). We examine two transfer learning techniques, fine-tuning and layer substitution, for language modelling of BSL.
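The sketch below illustrates one plausible reading of the layer-substitution idea in PyTorch: a stacked LSTM language model is pretrained on a large English corpus such as PTB, its vocabulary-dependent layers (input embedding and output projection) are then replaced to match the BSL vocabulary, and the model is fine-tuned on the small BSL dataset. The architecture, vocabulary sizes, and hyperparameters here are assumptions for illustration, not the exact configuration used in our experiments.

```python
# Sketch of transfer learning via layer substitution for a stacked LSTM LM.
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, hidden_dim=200, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.embedding(x)
        out, hidden = self.lstm(emb, hidden)
        return self.decoder(out), hidden

PTB_VOCAB, BSL_VOCAB = 10000, 1500  # illustrative vocabulary sizes

# 1. Pretrain on the large source-language corpus (weights assumed to
#    come from training on PTB, or from a saved checkpoint).
model = LSTMLanguageModel(PTB_VOCAB)

# 2. Layer substitution: swap the vocabulary-dependent layers for BSL,
#    keeping the pretrained LSTM stack as the transferred component.
model.embedding = nn.Embedding(BSL_VOCAB, 200)
model.decoder = nn.Linear(200, BSL_VOCAB)

# Optionally freeze the transferred LSTM layers so only the new
# BSL-specific layers are updated during fine-tuning.
for p in model.lstm.parameters():
    p.requires_grad = False

# 3. Fine-tune on the BSL data with a standard cross-entropy objective.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Fine-tuning without the substitution step would instead keep all pretrained layers and continue training every parameter on the BSL data, typically at a reduced learning rate.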

The results show an improvement in perplexity when transfer learning is applied to standard stacked LSTM models that were initially trained on a large standard-English corpus from the Penn Treebank.
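For reference, perplexity is the exponential of the mean per-token cross-entropy on held-out data; the sketch below computes it from a model's summed negative log-likelihood. The numbers in the example are illustrative, not results from our experiments.

```python
# Perplexity from a summed negative log-likelihood (natural log / nats).
import math

def perplexity(total_neg_log_likelihood, num_tokens):
    """exp of the mean per-token cross-entropy; lower is better."""
    return math.exp(total_neg_log_likelihood / num_tokens)

# e.g. a summed NLL of 460.5 nats over 100 tokens gives perplexity ~100.
print(perplexity(460.5, 100))
```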