Download the Data of the BDPA

The phonetic alignment data is available in the form of specific text-based alignment formats for pairwise and multiple alignments. A more detailed description of the formats we use can be found here.

Pairwise Phonetic Alignment Benchmark

For pairwise alignments, we offer three different benchmarks:

  • Covington's (1996) original benchmark dataset,
  • our master benchmark, containing the 7126 most diverse sequence pairs automatically chosen from our multiple alignment benchmark, in two variants (global and local alignments), and
  • a benchmark containing the 1089 most diverse sequence pairs automatically drawn from those languages in our multiple alignment benchmark that are tone languages, again in both a global and a local variant.

All files can be downloaded from here.

Multiple Phonetic Alignments

For multiple alignments, we offer a large master dataset of 750 files. This master dataset can be subdivided into several smaller datasets according to different criteria, such as

  • the language family from which the data is taken, or
  • the diversity of the phonetic sequences that occur in the alignments.

All files can be downloaded from here.