The English language is a critical part of the UK's heritage, like its history, great houses, museums and culture, but even more so because it touches every citizen's life daily. The British National Corpus (BNC) documents how English is used across the UK through 100 million words of collected text, audio recordings of more than 7400 informal conversations made by members of the public around the turn of the millennium, and over 750 recordings made in specific social contexts (business, education, leisure, and public settings).
One of the first of its kind, the BNC has inspired dozens of national corpus collections in countries around the world. The BNC's recorded conversations have been carefully transcribed and 'time-aligned' using human language technologies that add time-stamps to the transcript, indicating where in the audio recordings each word and phrase is uttered. Unfortunately, these alignments are not perfect. This project combines computer algorithms and human expertise (yours, if you join) to identify, classify and correct the imperfections. The BNC conversations are already freely available to teachers and researchers, now and for posterity. Your corrections will improve this internationally valuable resource.