Users of this database should note some basic points:
This database is intended as a supplement to the book "The Distribution of the Human DNA-PCR-Polymorphisms" by W Huckenbeck, K Kuntze and H-G Scheil (Verlag Dr. Köster, Berlin); ISBN 3-89574-300-3; 1997.
We offer data published after the book went to press (June 1997). The user will find the complete population data and references. In addition, the database includes updated pooled data (data from the book and new data, pooled by weighted arithmetic means) for the relevant populations.
These pooled values are shown in bold type. The data can be found at the following links:
Misprints in the book are given in corrected form at the same site. The corrected values are shown in red.
In contrast to the book, we decided that the sample size "n" should indicate the number of individuals rather than the number of alleles. This will be taken into account in the second edition of the book.
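The relation between the two conventions for "n" can be sketched as follows. This is only a minimal illustration, assuming autosomal loci in diploid individuals, so that each person contributes two alleles. The function names are hypothetical and are not part of the database.

```python
def individuals_from_alleles(n_alleles: int) -> int:
    """Convert a sample size given as an allele count into individuals.

    Assumption: autosomal locus, diploid individuals (two alleles each).
    """
    if n_alleles < 0 or n_alleles % 2 != 0:
        raise ValueError("allele count must be a non-negative even number")
    return n_alleles // 2


def alleles_from_individuals(n_individuals: int) -> int:
    """Inverse conversion: each individual contributes two alleles."""
    if n_individuals < 0:
        raise ValueError("number of individuals must be non-negative")
    return 2 * n_individuals
```

Under this assumption, a sample listed in the book with n = 200 alleles would correspond to n = 100 individuals in the database.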
The descriptions of the populations examined are often insufficient. As far as possible they were cautiously standardized. In borderline cases the designations used by the cited authors were adopted. Nevertheless, imprecise definitions could not always be avoided: a description such as ‘China’ is not sufficiently precise for so large an area. In addition, it was sometimes more sensible to group by ethnicity rather than by political unit. For example, the data of Basque populations were pooled as ‘Basques’ regardless of their political affiliation.
As briefly mentioned in the preface, a considerable loss of data was caused by the fact that a number of authors presented only bar charts instead of allele frequencies. To avoid distorting these authors’ data, we deliberately refrained from reading the bar charts by eye and extrapolating allele frequencies from them, although this would be possible to a limited extent. We hope that these data will one day be published in a proper form so that they can be taken into account in a second edition. As far as possible, frequencies given to three decimal places were converted to four decimal places, and obviously incorrect allele frequencies were recalculated from the published genotype frequencies.
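The recalculation of allele frequencies from genotype frequencies can be illustrated by simple gene counting: a homozygote contributes its full frequency to one allele, a heterozygote contributes half of its frequency to each of its two alleles. The sketch below is an assumption about how such a recalculation might be done, not a record of the procedure actually used. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def allele_frequencies(genotype_freqs):
    """Derive allele frequencies from genotype frequencies by gene counting.

    genotype_freqs maps a pair (allele_a, allele_b) to its relative
    frequency or observed count. Values are normalized by their total,
    so both kinds of input work.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("genotype frequencies must have a positive total")
    counts = defaultdict(float)
    for (allele_a, allele_b), value in genotype_freqs.items():
        # Each genotype carries two alleles; a homozygote adds both
        # halves to the same allele.
        counts[allele_a] += value / 2
        counts[allele_b] += value / 2
    # Report four decimal places, as in the tables.
    return {a: round(c / total, 4) for a, c in sorted(counts.items())}
```

For a made-up two-allele system with genotype counts A/A = 10, A/B = 20 and B/B = 70, this gives allele frequencies of 0.2 for A and 0.8 for B.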
Differences in nomenclature also created some difficulties. Although the International Society of Forensic Haemogenetics (ISFH) has recommended standardization (4, 5, 6), some authors have used their own or obsolete nomenclature. In each case we tried to bring these data into line with the usual nomenclature, but this was not always feasible. The accuracy of these adaptations can be verified by the authors themselves. If we have made mistakes here, we ask for their indulgence.
A further problem lies in the use of different DNA protocols. Differing techniques, such as the use of native or denaturing gels, resulted in a considerable loss of data, for example in the SE33 system.
The aim of this work was to obtain, by pooling, sample sizes as large as possible. Here a further problem occurred: some authors described closely spaced alleles (sub-alleles) separately, while others combined them. To avoid a major loss of data, we mostly used the combined data for our calculations. This concerns, for example, the alleles FES*10/10 a and 11/11 a, but also the subtyping in the HLA-DQa system (alleles *4.1 and *4.2/4.3). Because of the population-specific importance of the allele TH01*10, population samples for this system in which alleles *9.3 and *10 were typed separately were handled both in the combined version and separately.
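Combining separately typed sub-alleles amounts to adding their frequencies under a common label. The following sketch shows one possible way to do this. The function name, the grouping table and the frequency values are hypothetical and serve only as an illustration.

```python
def combine_alleles(freqs, groups):
    """Merge separately typed alleles into combined classes.

    freqs  maps an allele label to its frequency.
    groups maps a combined label to the list of labels it replaces.
    Alleles not named in any group are passed through unchanged.
    """
    member_to_label = {
        member: label
        for label, members in groups.items()
        for member in members
    }
    combined = {}
    for allele, freq in freqs.items():
        label = member_to_label.get(allele, allele)
        combined[label] = combined.get(label, 0.0) + freq
    return combined


# Illustrative values only, not taken from any published sample.
separate = {"9": 0.625, "9.3": 0.25, "10": 0.125}
merged = combine_alleles(separate, {"9.3/10": ["9.3", "10"]})
```

Keeping both `separate` and `merged` corresponds to handling a sample in the separate and in the combined version, as described above for TH01.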
In many cases the sample sizes are still inadequate. Apart from these exceptions, very small samples were not taken into account. Where possible, data for political (‘Germany’) and ethnic (‘Basques’) units were pooled using weighted arithmetic means. The aim was to create a more solid database. Besides this advantage, the technique also has disadvantages: it appears bold to the authors (even though it is done) to pool data from areas as large as China, Russia or India. Populations that differ from one another may have been combined without regard to their historical or genetic relationships. With this in mind, future studies should attach more importance to the exact definition of the population samples. Another disadvantage of pooling is that existing differences between populations may be concealed. In our opinion, with sample sizes as large as they are today, the errors caused by this effect will be small and can be neglected. In the future it will certainly be desirable to obtain solid databases for smaller geographical and political units as well. In any case, the user of the tables can fall back on the individual data sets, which are also cited.
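Pooling by weighted arithmetic means can be sketched as follows. The text does not state which weights were used; the sketch assumes that each sample is weighted by its sample size n, and that an allele not reported in a sample has frequency 0 there. Both points are assumptions, and the function name is hypothetical.

```python
def pool_frequencies(samples):
    """Pool allele frequencies from several samples of one population.

    samples is a list of (n, freqs) pairs, where n is the sample size
    and freqs maps an allele label to its frequency in that sample.
    Returns the total sample size and the n-weighted mean frequencies.
    """
    total_n = sum(n for n, _ in samples)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    alleles = sorted({allele for _, freqs in samples for allele in freqs})
    pooled = {
        allele: sum(n * freqs.get(allele, 0.0) for n, freqs in samples) / total_n
        for allele in alleles
    }
    return total_n, pooled
```

For two made-up samples with n = 100 (A: 0.5, B: 0.5) and n = 300 (A: 0.25, B: 0.75), the pooled values are n = 400, A = 0.3125 and B = 0.6875. The larger sample dominates the result, which is also why pooling can conceal real differences between the populations combined.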
In some cases the allele frequencies of the pooled data sum to a value that deviates clearly from 1. This is because the published data were not infrequently rather unclear, which inevitably affected the pooled data as well.
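A user who wants to identify such cases can check how far the frequencies of a data set are from summing to 1. The sketch below is a minimal example of such a check. The function names and the tolerance of 0.005 are arbitrary assumptions, not thresholds used by the authors.

```python
import math


def frequency_sum_deviation(freqs):
    """Return how far the allele frequencies are from summing to 1."""
    return math.fsum(freqs.values()) - 1.0


def sums_to_one(freqs, tolerance=0.005):
    """Check whether the frequencies sum to 1 within the given tolerance.

    The default tolerance is an arbitrary illustrative value.
    """
    return abs(frequency_sum_deviation(freqs)) <= tolerance
```

Data sets that fail the check can then be compared with the individually cited data to locate the source of the deviation.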