TIMIT dataset - The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

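Defense Advanced Research Projects Agency, DARPA: U.S. defense research agency that sponsored TIMIT
Advanced Research Projects Agency, ARPA: the earlier name of DARPA
acoustic /əˈkuːstɪk/: adj. relating to sound or hearing; n. an acoustic (non-amplified) instrument
phonetic /fəˈnetɪk/: adj. relating to speech sounds or phonetics; (of a spelling) matching its pronunciation; showing fine distinctions in pronunciation
continuous /kənˈtɪnjuəs/: adj. continuous, uninterrupted, ongoing
speech /spiːtʃ/: n. speech, spoken language; a talk or address
corpus /ˈkɔːpəs/: n. a corpus (a body of texts or recordings); a collection of writings; principal (capital)
Texas Instruments, TI: U.S. electronics company
National Institute of Standards and Technology, NIST: U.S. national standards agency
Massachusetts Institute of Technology, MIT: U.S. research university
SRI International: formerly the Stanford Research Institute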

1. The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

TIMIT.zip - 440.21MB

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) Training and Test Data

The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT has resulted from the joint efforts of several sites under sponsorship from the Defense Advanced Research Projects Agency - Information Science and Technology Office (DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has been maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). This file contains a brief description of the TIMIT Speech Corpus. Additional information including the referenced material and some relevant reprints of articles may be found in the printed documentation which is also available from NTIS (NTIS# PB91-100354).

sponsorship /ˈspɒnsəʃɪp/: n. sponsorship, backing; the role of a sponsor or godparent
transcribe /trænˈskraɪb/: vt. to transcribe, to write out

1.1 Corpus Speaker Distribution

TIMIT contains a total of 6300 sentences: 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States. Table 1 shows the number of speakers for the 8 dialect regions, broken down by sex, with percentages in parentheses. A speaker's dialect region is the geographical area of the U.S. where they lived during their childhood years. The geographical areas correspond with recognized dialect regions in the U.S. (Language Files, Ohio State University Linguistics Dept., 1982), with two exceptions: the Western region (dr7), in which dialect boundaries are not known with any confidence, and dialect region 8, whose speakers moved around a lot during their childhood.

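Ohio State University, OSU: U.S. public university
department /dɪˈpɑːtmənt/: n. department, division, section, bureau
linguistics /lɪŋˈɡwɪstɪks/: n. linguistics
dialect /ˈdaɪəlekt/: n. dialect, local speech, cognate language, jargon, idiolect; adj. dialectal
parenthesis /pəˈrenθəsɪs/: n. parenthetical remark; round bracket; interlude, interval
geographical /ˌdʒiːəˈɡræfɪkl/: adj. geographical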

Table 1: Dialect distribution of speakers

Dialect
Region (dr)   #Male       #Female     Total
-----------   ---------   ---------   ----------
1              31 (63%)    18 (37%)    49 (8%)
2              71 (70%)    31 (30%)   102 (16%)
3              79 (77%)    23 (23%)   102 (16%)
4              69 (69%)    31 (31%)   100 (16%)
5              62 (63%)    36 (37%)    98 (16%)
6              30 (65%)    16 (35%)    46 (7%)
7              74 (74%)    26 (26%)   100 (16%)
8              22 (67%)    11 (33%)    33 (5%)
-----------   ---------   ---------   ----------
Total (8)     438 (70%)   192 (30%)   630 (100%)

The dialect regions are:
dr1: New England
dr2: Northern
dr3: North Midland
dr4: South Midland
dr5: Southern
dr6: New York City
dr7: Western
dr8: Army Brat (moved around)
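Because Table 1 is just raw counts plus derived percentages, it can be re-derived mechanically. The following is a minimal Python sketch that stores the male and female counts per region and recomputes each row's percentages and the region's share of all speakers. The names TABLE1, pct and summarize are illustrative only, and rounding to the nearest whole percent is an assumption about how the table was produced.

```python
# Speaker counts per dialect region, copied from Table 1: (male, female).
TABLE1 = {
    "dr1": (31, 18),
    "dr2": (71, 31),
    "dr3": (79, 23),
    "dr4": (69, 31),
    "dr5": (62, 36),
    "dr6": (30, 16),
    "dr7": (74, 26),
    "dr8": (22, 11),
}


def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number (Python's round: ties go to even)."""
    return round(100 * part / whole)


def summarize(table: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int, int]]:
    """Return (male %, female %, share of all speakers %) for each region."""
    grand_total = sum(m + f for m, f in table.values())
    rows = {}
    for region, (male, female) in table.items():
        total = male + female
        rows[region] = (pct(male, total), pct(female, total), pct(total, grand_total))
    return rows


if __name__ == "__main__":
    total_speakers = sum(m + f for m, f in TABLE1.values())
    print(total_speakers, total_speakers * 10)  # speakers, sentences (10 each)
    for region, row in summarize(TABLE1).items():
        print(region, row)
```

Recomputing from the counts rather than copying the printed percentages makes transcription slips in a table like this easy to spot.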

1.2 Corpus Text Material

The text material in the TIMIT prompts (found in the file “prompts.doc”) consists of 2 dialect “shibboleth” sentences designed at SRI, 450 phonetically-compact sentences designed at MIT, and 1890 phonetically-diverse sentences selected at TI. The dialect sentences (the SA sentences) were meant to expose the dialectal variants of the speakers and were read by all 630 speakers. The phonetically-compact sentences were designed to provide good coverage of pairs of phones, with extra occurrences of phonetic contexts thought to be either difficult or of particular interest. Each speaker read 5 of these sentences (the SX sentences), and each text was spoken by 7 different speakers. The phonetically-diverse sentences (the SI sentences) were selected from existing text sources, the Brown Corpus (Kučera and Francis, 1967) and the Playwrights Dialog (Hultzen et al., 1964), so as to add diversity in sentence types and phonetic contexts. The selection criteria maximized the variety of allophonic contexts found in the texts. Each speaker read 3 of these sentences, with each sentence being read only by a single speaker. Table 2 summarizes the speech material in TIMIT.

2 dialect sentences (the SA sentences): the same 2 sentences are read by every one of the 630 speakers.
5 phonetically-compact sentences (the SX sentences): chosen to cover as many phone pairs as possible. Each SX text was spoken by 7 different speakers.
3 phonetically-diverse sentences (the SI sentences): chosen to add variety in sentence types and phonetic contexts, covering as many allophonic contexts as possible. Each SI sentence was read by only a single speaker.

Table 2: TIMIT speech material

Sentence Type   #Sentences   #Speakers   Total   #Sentences/Speaker
-------------   ----------   ---------   -----   ------------------
Dialect (SA)             2         630    1260                    2
Compact (SX)           450           7    3150                    5
Diverse (SI)          1890           1    1890                    3
-------------   ----------   ---------   -----   ------------------
Total                 2342                6300                   10
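The counts in Table 2 can be cross-checked two ways: distinct texts times readers per text, and sentences per speaker times 630 speakers. The sketch below does both and also classifies an utterance by its SA/SX/SI prefix. It assumes utterance IDs start with the two-letter type code followed by a number (e.g. the hypothetical "sx213"); SentenceType, TYPES and sentence_type are illustrative names, not part of any TIMIT tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceType:
    code: str              # prompt-ID prefix: SA, SX or SI
    texts: int             # distinct prompt texts of this type
    readers_per_text: int  # speakers who read each text
    per_speaker: int       # sentences of this type read by each speaker


# Values copied from Table 2.
TYPES = [
    SentenceType("SA", texts=2, readers_per_text=630, per_speaker=2),
    SentenceType("SX", texts=450, readers_per_text=7, per_speaker=5),
    SentenceType("SI", texts=1890, readers_per_text=1, per_speaker=3),
]
NUM_SPEAKERS = 630


def sentence_type(utterance_id: str) -> str:
    """Classify an utterance ID such as 'sx213' by its two-letter prefix."""
    prefix = utterance_id[:2].upper()
    if prefix not in {t.code for t in TYPES}:
        raise ValueError(f"unknown sentence type in {utterance_id!r}")
    return prefix


if __name__ == "__main__":
    for t in TYPES:
        # Counting recordings per text and per speaker must give the same total.
        assert t.texts * t.readers_per_text == t.per_speaker * NUM_SPEAKERS
        print(t.code, t.texts * t.readers_per_text)
    print("total", sum(t.texts * t.readers_per_text for t in TYPES))
    print(sentence_type("sx213"))
```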

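prompt /prɒmpt/: v. to prompt, encourage, promote, provoke, lead to; to cue (an actor); adj. quick, immediate, punctual; (of goods) for immediate delivery; n. a prompt or cue; (on a computer screen) a prompt; an encouragement or urging; a payment deadline; adv. punctually
occurrence /əˈkʌrəns/: n. occurrence, appearance, event, discovery
allophonic /ˌæləʊˈfɒnɪk/: adj. allophonic, relating to allophones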

1.3 Suggested Training/Test Subdivision

The speech material has been subdivided into portions for training and testing. The criteria for the subdivision are described in the file “testset.doc”. THIS SUBDIVISION HAS NO RELATION TO THE DATA DISTRIBUTED ON THE PROTOTYPE VERSION OF THE CD-ROM.

1.4 Core Test Set

The test data has a core portion containing 24 speakers, 2 male and 1 female from each dialect region. The core test speakers are shown in Table 3. Each speaker read a different set of SX sentences. Thus the core test material contains 192 sentences, 5 SX and 3 SI for each speaker, each having a distinct text prompt.

Table 3: The core test set of 24 speakers

Dialect   Male          Female
-------   ----------    ------
1         DAB0, WBT0    ELC0
2         TAS1, WEW0    PAS0
3         JMP0, LNT0    PKT0
4         LLL0, TLS0    JLM0
5         BPM0, KLT0    NLP0
6         CMJ0, JDH0    MGD0
7         GRT0, NJM0    DHC0
8         JLN0, PAM0    MLD0
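Table 3 gives each speaker as a four-character code. A small Python sketch of rebuilding the core test set follows. It assumes that full speaker IDs prepend the sex letter (M or F), as in MDAB0 or FELC0, and counts 5 SX plus 3 SI sentences per speaker, matching the 192-sentence figure in the text. CORE_TEST and core_speaker_ids are hypothetical names.

```python
# Core test speakers from Table 3: region -> (male codes, female code).
CORE_TEST = {
    1: (("DAB0", "WBT0"), "ELC0"),
    2: (("TAS1", "WEW0"), "PAS0"),
    3: (("JMP0", "LNT0"), "PKT0"),
    4: (("LLL0", "TLS0"), "JLM0"),
    5: (("BPM0", "KLT0"), "NLP0"),
    6: (("CMJ0", "JDH0"), "MGD0"),
    7: (("GRT0", "NJM0"), "DHC0"),
    8: (("JLN0", "PAM0"), "MLD0"),
}


def core_speaker_ids() -> set[str]:
    """Speaker IDs with an assumed sex prefix ('M' or 'F'), e.g. 'MDAB0'."""
    ids = set()
    for males, female in CORE_TEST.values():
        ids.update("M" + code for code in males)
        ids.add("F" + female)
    return ids


if __name__ == "__main__":
    speakers = core_speaker_ids()
    sx_per_speaker, si_per_speaker = 5, 3
    print(len(speakers))                                      # 24 speakers
    print(len(speakers) * (sx_per_speaker + si_per_speaker))  # 192 sentences
```

The SA sentences are left out of the count because every speaker reads them, so they carry no text that is unique to the test set.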

1.5 Complete Test Set

A more extensive test set was obtained by including the sentences from all speakers who read any of the SX texts included in the core test set. As a result, no SX or SI sentence text appears in both the training and test sets (the SA sentences, which every speaker reads, are the exception). This complete test set contains a total of 168 speakers and 1344 utterances (5 SX and 3 SI sentences per speaker), accounting for about 27% of the speech material excluding the SA sentences. The resulting dialect distribution of the 168-speaker test set is given in Table 4. The complete test material contains 624 distinct texts.
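The expansion rule can be sketched as a one-step set operation: collect the SX texts read by any core speaker, then add every speaker who read one of those texts. The toy reader lists below are made up for illustration. The second half redoes the arithmetic, assuming each of the 24 × 5 = 120 core SX texts has 7 readers and that every expanded speaker's 5 SX sentences are all core texts; under that assumption 120 × 7 / 5 = 168 speakers, 168 × 8 = 1344 utterances, and 120 + 168 × 3 = 624 distinct texts, which agrees with the figures quoted.

```python
def complete_test_set(sx_readers: dict[str, set[str]], core: set[str]) -> set[str]:
    """Expand a core test set to every speaker who read one of its SX texts.

    sx_readers maps an SX prompt ID to the set of speakers who read it.
    """
    core_texts = {text for text, readers in sx_readers.items() if readers & core}
    expanded = set(core)
    for text in core_texts:
        expanded |= sx_readers[text]
    return expanded


if __name__ == "__main__":
    # Toy data (hypothetical): 4 SX texts, 2 readers each.
    toy = {
        "sx1": {"A", "B"},
        "sx2": {"A", "C"},
        "sx3": {"D", "E"},
        "sx4": {"E", "F"},
    }
    print(sorted(complete_test_set(toy, {"A"})))  # ['A', 'B', 'C']

    # The same bookkeeping with TIMIT's numbers from the text.
    core_speakers, sx_each, si_each, readers_per_sx = 24, 5, 3, 7
    core_sx_texts = core_speakers * sx_each               # 120 texts
    speakers = core_sx_texts * readers_per_sx // sx_each  # 168 speakers
    print(speakers, speakers * (sx_each + si_each))       # 168 1344
    print(core_sx_texts + speakers * si_each)             # 624 distinct texts
    print(round(100 * speakers / 630))                    # 27
```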

subdivision /ˈsʌbdɪvɪʒ(ə)n; ˌsʌbdɪˈvɪʒ(ə)n/: n. subdivision, subsection; a tract of land divided into lots for sale

Table 4: Dialect distribution for complete test set

Dialect   #Male   #Female   Total
-------   -----   -------   -----
1             7         4      11
2            18         8      26
3            23         3      26
4            16        16      32
5            17        11      28
6             8         3      11
7            15         8      23
8             8         3      11
-------   -----   -------   -----
Total       112        56     168
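Subtracting Table 4 from Table 1 region by region gives the speakers left for training, assuming every speaker outside the complete test set belongs to the training portion. ALL, TEST and training_counts are illustrative names; the counts are copied from the two tables.

```python
# Speakers per region from Table 1 (all) and Table 4 (complete test set),
# each stored as (male, female).
ALL = {1: (31, 18), 2: (71, 31), 3: (79, 23), 4: (69, 31),
       5: (62, 36), 6: (30, 16), 7: (74, 26), 8: (22, 11)}
TEST = {1: (7, 4), 2: (18, 8), 3: (23, 3), 4: (16, 16),
        5: (17, 11), 6: (8, 3), 7: (15, 8), 8: (8, 3)}


def training_counts(all_: dict[int, tuple[int, int]],
                    test: dict[int, tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Speakers per region left for training, assuming train = all - test."""
    return {dr: (all_[dr][0] - test[dr][0], all_[dr][1] - test[dr][1]) for dr in all_}


if __name__ == "__main__":
    train = training_counts(ALL, TEST)
    for dr, (male, female) in train.items():
        print(f"dr{dr}: {male} male, {female} female, {male + female} total")
    print(sum(m + f for m, f in train.values()))  # 462
```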