Download the Data of the BDPA
The phonetic alignment data is available in the form of specific text-based alignment formats for pairwise and multiple alignments. A more detailed description of the formats we use can be found here.
Pairwise Phonetic Alignment Benchmark
For pairwise alignments, we offer three different benchmarks:
- Covington's (1996) original benchmark dataset,
- our master benchmark containing the 7126 most diverse sequence pairs automatically chosen from our multiple alignment benchmark, in two variants, as global and as local alignments,
- a benchmark containing a selection of the most diverse 1089 sequence pairs automatically drawn from those languages in our multiple alignment benchmark that are tone languages, again, both in a global and in a local variant.
All files can be downloaded from here.
Multiple Phonetic Alignments
For multiple alignments, we offer a large master dataset of 750 files. This master dataset can be subdivided into several smaller datasets according to different criteria, such as
- the language family from which the data is taken, or
- the diversity of the phonetic sequences that occur in the alignments.
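As a rough illustration of such a subdivision, the following Python sketch groups a list of file records by an arbitrary criterion. It is only a sketch under stated assumptions: the record fields (`file`, `family`, `diversity`), the example values, the diversity score itself, and the threshold are all hypothetical and are not taken from the actual benchmark files.

```python
from collections import defaultdict


def subdivide(records, key):
    """Group records into subsets according to the value of key(record)."""
    subsets = defaultdict(list)
    for record in records:
        subsets[key(record)].append(record)
    return dict(subsets)


# Hypothetical metadata, one record per alignment file. The field names and
# values are placeholders, not the real contents of the benchmark.
records = [
    {"file": "file_001", "family": "Germanic", "diversity": 0.12},
    {"file": "file_002", "family": "Germanic", "diversity": 0.48},
    {"file": "file_003", "family": "Sino-Tibetan", "diversity": 0.71},
]

# Criterion 1: the language family from which the data is taken.
by_family = subdivide(records, key=lambda r: r["family"])


# Criterion 2: the diversity of the sequences, binned at an assumed threshold.
def diversity_bin(record, threshold=0.4):
    return "high" if record["diversity"] >= threshold else "low"


by_diversity = subdivide(records, key=diversity_bin)
```

Any other criterion can be plugged in by passing a different `key` function, so the same master list yields as many subsets as there are criteria of interest.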