This paper proposes a unified system for both visual speech
recognition and speaker identification. The system can exploit
both image and depth data when they are available, and consists
of four consecutive steps, namely, 3D face pose tracking, mouth region
extraction, feature computation, and classification using the Support Vector Machine (SVM) method. The system is experimentally evaluated on three
public datasets, namely, MIRACL-VC1, OuluVS, and CUAVE. On the one
hand, the visual speech recognition module achieves up to 96% and
79.2% accuracy in the speaker-dependent and speaker-independent settings, respectively. On the other hand, speaker identification reaches a recognition rate
of up to 98.9%. Additionally, the obtained results demonstrate the
importance of depth data in resolving the subject dependency issue.
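The final classification step of the pipeline can be sketched as an SVM trained on per-utterance feature vectors extracted from the mouth region. The synthetic features, dimensions, and RBF kernel below are illustrative assumptions for a minimal example, not the paper's exact descriptors or configuration.

```python
# Minimal sketch of SVM classification over mouth-region feature vectors.
# The data here is synthetic; in the described system, X would hold
# image/depth descriptors computed from the extracted mouth region.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical sizes: 300 utterances, 64-dim features, 10 word classes.
n_samples, n_features, n_classes = 300, 64, 10
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)
# Shift each class so the synthetic problem is learnable.
X += y[:, None] * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RBF-kernel SVM; a speaker-identification classifier would be
# trained the same way with speaker labels instead of word labels.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In a speaker-dependent setting the train/test split would keep each subject's samples in both partitions, while a speaker-independent evaluation would hold out all samples of the test subjects.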