Learning Domain-Specific Policy for Sign Language Recognition

Learned task-oriented policies are applied in navigation [1-2], dialogue [3], control [4-5], multi-agent learning [6], and many other applications covered by Deisenroth, Neumann, and Peters in their survey on policy search in robotics [7]. A policy for a specific task can be learned from demonstrations [8-10], from guidance, or from feedback (a reward-driven approach) on the overall completion of the task [11].
We were interested in navigation and manual-following tasks, touching on appealing recent general-purpose approaches that are currently used in dialogue management. These examples have concentrated mostly on policy learning for uni-modal interaction, with the main modality being speech, text, or vision.
We have focused on one-way training, where the human provides instructions to an agent and expects it to follow them so that the final outcome is as close as possible to what the human intended.
Vogel and Jurafsky [1] as well as Branavan et al. [12] used reinforcement learning to map instructions to actions, either in a navigational map task or by mapping manual text onto actions. Dipendra K. Misra used reinforcement learning with reward shaping to train a neural network that maps visual observations and textual descriptions onto actions in a simulated environment [13]. In all these cases, the constructed model did not include semantic or syntactic knowledge, although Vogel and Jurafsky seeded their method with a set of spatial terms for learning more complicated features, such as the position relative to a landmark. Nevertheless, even in their case, basic navigation was learned without semantic or syntactic knowledge.
from state: start go to: start
for utterance: okay

from state: start go to: caravan_park
for utterance: starting off we are above a caravan park
...

The listing above presents a small excerpt of the overall output of the policy, following which the agent moves from one landmark to another given one utterance at a time. For example, for the first utterance, "okay", the agent moves from the start landmark to the start landmark, which means the agent does not move. The next utterance mentions a "caravan park", so the agent moves from the start landmark to the caravan park landmark.
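
To make the mechanics concrete, below is a minimal sketch of how such an utterance-conditioned policy could be learned with tabular Q-learning. It is not the code from the linked repository: the landmark list, the toy dialogue, the crude utterance feature, and the per-step reward against a reference route are all simplified assumptions.

```python
import random
from collections import defaultdict

# Toy stand-in for a Map Task dialogue: each utterance is paired with the
# landmark the instruction giver intended the follower to reach (assumption;
# real HCRC dialogues are longer and noisier).
LANDMARKS = ["start", "caravan_park", "old_mill", "finish"]
DIALOGUE = [
    ("okay", "start"),
    ("starting off we are above a caravan park", "caravan_park"),
    ("go past the old mill", "old_mill"),
    ("and stop at the finish", "finish"),
]

def utterance_features(utterance):
    """Crude feature: the landmark whose words all appear in the utterance."""
    tokens = utterance.split()
    for landmark in LANDMARKS:
        if all(word in tokens for word in landmark.split("_")):
            return landmark
    return "<none>"

# Q-table over (current landmark, utterance feature) -> value of each move.
Q = defaultdict(lambda: defaultdict(float))
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

for _ in range(500):
    state = "start"
    for step, (utterance, gold) in enumerate(DIALOGUE):
        key = (state, utterance_features(utterance))
        # Epsilon-greedy choice of the next landmark to move to.
        if random.random() < EPSILON:
            action = random.choice(LANDMARKS)
        else:
            action = max(LANDMARKS, key=lambda a: Q[key][a])
        # Shaped per-step reward (assumption): match against the reference route.
        reward = 1.0 if action == gold else 0.0
        if step + 1 < len(DIALOGUE):
            next_key = (action, utterance_features(DIALOGUE[step + 1][0]))
            best_next = max(Q[next_key][a] for a in LANDMARKS)
        else:
            best_next = 0.0
        # One-step Q-learning update.
        Q[key][action] += ALPHA * (reward + GAMMA * best_next - Q[key][action])
        state = action

# Greedy read-out of the learned policy, printed like the listing above.
state = "start"
for utterance, _ in DIALOGUE:
    key = (state, utterance_features(utterance))
    action = max(LANDMARKS, key=lambda a: Q[key][a])
    print(f"from state: {state} go to: {action}")
    print(f"for utterance: {utterance}\n")
    state = action
```

In the real system, the state and utterance representations would be richer, and the reward could be given only on completion of the route, in line with the reward-driven approaches cited above.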

The figure above presents the original map with the intended trajectory on the left and the trajectory produced by the agent following the learned policy on the right. The right side shows the final outcome after all the utterances had been presented and all instructions followed.
How is this useful for computer-based understanding of sign languages?
There are instruction-based sign language datasets built around a concrete task, similar to the HCRC Map Task corpus [14]. With such datasets, a model-free policy could be learned that responds to navigation commands given in a sign language. Unfortunately, we were not able to acquire these datasets for further experiments.
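
If such a dataset were available, the same policy loop could, in principle, take recognised sign glosses rather than typed utterances as its context. The sketch below only illustrates the wiring; recognize_signs and glosses_to_feature are hypothetical placeholders and not part of the linked repository.

```python
def recognize_signs(video_clip):
    """Hypothetical: map a video clip of signing to a sequence of glosses,
    e.g. ['GO', 'CARAVAN-PARK']; any sign language recognition model could
    be plugged in here."""
    raise NotImplementedError

def glosses_to_feature(glosses, landmarks):
    """Map recognised glosses onto the same landmark feature used above."""
    joined = " ".join(g.lower().replace("-", "_") for g in glosses)
    return next((lm for lm in landmarks if lm in joined), "<none>")

# Usage with the Q-table from the previous sketch (assumption):
#   glosses = recognize_signs(clip)
#   key = (current_landmark, glosses_to_feature(glosses, LANDMARKS))
#   next_landmark = max(LANDMARKS, key=lambda a: Q[key][a])
```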

https://github.com/mocialov/RL_HCRC_MapTask_DirectionFollowing_Simplified