BDPA

Download the Data of the BDPA

The phonetic alignment data is available in the form of text-based alignment formats for pairwise and multiple alignments. A more detailed description of the formats we use can be found here.

Pairwise Phonetic Alignment Benchmark

For pairwise alignments, we offer three different benchmarks:

Covington's (1996) original benchmark dataset,
our master benchmark, containing the 7126 most diverse sequence pairs automatically chosen from our multiple alignment benchmark, in two variants (global and local alignments), and
a benchmark containing the 1089 most diverse sequence pairs automatically drawn from the tone languages in our multiple alignment benchmark, again in both a global and a local variant.

All files can be downloaded from here.

Multiple Phonetic Alignments

For multiple alignments, we offer a large master dataset of 750 files. This master dataset can be subdivided into several smaller datasets according to different criteria, such as

the language family from which the data is taken, or
the diversity of the phonetic sequences that occur in the alignments.

All files can be downloaded from here.