Welcome to the home of the TCD-TIMIT database. This dataset was released in 2015 to further work on audio-visual speech recognition. It has been widely used in many other fields too.
TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 phonetically rich sentences. Three of the speakers are professionally trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. Video footage was recorded from two angles: straight on and at 30 degrees. The paper outlines the recording of the footage and the post-processing required to yield video and audio clips for each sentence. Audio-only, visual-only, and joint audio-visual baseline experiments are reported. Separate experiments were run on the lipspeaker and non-lipspeaker data, and the results compared: visual and audio-visual baseline results on the non-lipspeakers were low overall, while results on the lipspeakers were significantly higher. It is hoped that, as a publicly available database, TCD-TIMIT will help advance the state of the art in audio-visual speech recognition research.
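Since each sentence has both an audio and a video clip (per camera angle), users typically need to pair the two modalities before running audio-visual experiments. As a rough illustration only (the corpus's actual directory layout and file naming are not described on this page, so the `root/<speaker>/<angle>/` structure and the `.wav`/`.mp4` extensions below are assumptions), pairing might be sketched like this:

```python
from pathlib import Path

def pair_clips(root, speaker, angle="straight"):
    """Collect (audio, video) clip pairs for one speaker.

    ASSUMPTION: a layout of root/<speaker>/<angle>/ containing one
    .wav and one .mp4 file per sentence, sharing a basename. This is
    an illustrative guess, not TCD-TIMIT's documented structure.
    """
    base = Path(root) / speaker / angle
    pairs = []
    for wav in sorted(base.glob("*.wav")):
        video = wav.with_suffix(".mp4")
        # Keep only sentences for which both modalities are present.
        if video.exists():
            pairs.append((wav, video))
    return pairs
```

Whatever the real layout turns out to be, the same idea applies: match clips by sentence identifier and drop sentences missing either modality.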
The database is available free of charge for research use. Please cite the following paper in any publication that uses the data:
N. Harte and E. Gillen, "TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech," IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603–615, May 2015. doi: 10.1109/TMM.2015.2407694
To access the data, you need to create an account. Your account will only be enabled if you supply a valid university email address and properly identify yourself.