LanguageARC | GlobalTIMIT

GlobalTIMIT

Join a 30-year effort, recently revitalized, to document how languages are spoken and thus improve language technologies.

TIMIT is the name of the dataset that has probably had the greatest impact on speech technologies; a Google Scholar search in December 2019 returned roughly 23,400 hits, that is, a very large number of scholarly papers mentioning the dataset. To create TIMIT, researchers asked 600 speakers from around the US to each read 10 sentences that were selected or constructed to represent the sounds of English. GlobalTIMIT is adjusting the method to make collection more efficient and expanding the scope so that we can create similar datasets for many more of the world's languages, starting with a broader representation of English.