It was initially very difficult to separate the “noise” of false positive matches from the real matches. Many techniques were tried, including using gene boundaries to identify unmodified genes passed on from ancestors to descendants. However, the gene matching techniques were difficult to correlate with currently accepted practices of autosomal matching and cM-to-generation estimates, so alternative tables were developed for the gene matching method to infer generational estimates. Another problem with gene matching was that it was difficult to determine which genes were genealogically relevant and which had simply been passed down to large population groups from ancient times.
Version 2 of the testing reverted to using chromosome SNP segments without taking gene boundaries into consideration. However, unlike GEDmatch and others, no allowance was made for small genetic differences. Rather, a segment was counted only where every SNP in it matched under the accepted half-identical region technique, that is, where the two kits share at least one nucleotide at the same SNP. This, however, resulted in too much “noise” of false positive matches between unrelated people, or matches that appeared between the child of a related person and another relative but not between the parent and that relative.
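The half-identical matching described above can be illustrated with a minimal Python sketch. It is not the code used in this study. It assumes each kit is a list of two-letter genotype strings (such as "AG") already aligned SNP by SNP on one chromosome, and it does not handle no-calls. The function names `half_identical` and `matching_segments` are hypothetical.

```python
def half_identical(genotype_a: str, genotype_b: str) -> bool:
    """True when the two genotypes share at least one nucleotide."""
    return bool(set(genotype_a) & set(genotype_b))


def matching_segments(kit_a: list[str], kit_b: list[str]) -> list[tuple[int, int]]:
    """Return (first_index, last_index) for each unbroken run of matching SNPs.

    No mismatches are tolerated inside a run, mirroring the strict rule
    described in the text. Both lists are assumed to be aligned by SNP.
    """
    segments = []
    start = None
    compared = min(len(kit_a), len(kit_b))
    for i in range(compared):
        if half_identical(kit_a[i], kit_b[i]):
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            segments.append((start, i - 1))  # the run ended at the previous SNP
            start = None
    if start is not None:
        segments.append((start, compared - 1))  # run reaches the last SNP
    return segments


# Illustrative (made-up) genotypes for five consecutive SNPs.
kit_a = ["AA", "AG", "CC", "TT", "GG"]
kit_b = ["AG", "GG", "TT", "CT", "AG"]
print(matching_segments(kit_a, kit_b))  # [(0, 1), (3, 4)]
```

In this made-up example the third SNP ("CC" against "TT") shares no nucleotide, so it splits the comparison into two runs. A real analysis would then convert each run into a SNP count and a cM length.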
To address this problem, a simple weighting was applied: the number of SNPs in a segment multiplied by the number of cMs in that segment. One weighting threshold was set for each segment to meet, and a second threshold was set for the sum of the weightings of all segments that had passed the segment threshold. Applying both thresholds made the screening out of false positives significantly more effective. The philosophy behind the weighting is that an exact SNP cutoff or cM cutoff is impractical and error prone, so a small amount of variance in both measures is necessary. The weighting allows for these small variances while simultaneously screening out unlikely matches, such as a large cM value with a small SNP count and vice versa.
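The two-stage weighting can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's actual code: the names `Segment`, `segment_weight`, `passing_weight_total` and `is_likely_match` are hypothetical, the threshold values in the usage example are invented (the study's thresholds are not stated here), and "meeting" a threshold is assumed to mean greater than or equal to it.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    snps: int   # number of SNPs in the matching segment
    cm: float   # length of the segment in centimorgans


def segment_weight(segment: Segment) -> float:
    """Weighting as described in the text: SNP count multiplied by cM."""
    return segment.snps * segment.cm


def passing_weight_total(segments: list[Segment], segment_threshold: float) -> float:
    """Sum the weights of only those segments that meet the segment threshold."""
    return sum(
        segment_weight(s) for s in segments
        if segment_weight(s) >= segment_threshold
    )


def is_likely_match(segments: list[Segment],
                    segment_threshold: float,
                    total_threshold: float) -> bool:
    """Apply both thresholds: per segment first, then on the summed weights."""
    return passing_weight_total(segments, segment_threshold) >= total_threshold


# Invented example values; none of these numbers come from the study.
segments = [
    Segment(snps=700, cm=7.0),    # weight 4900, passes
    Segment(snps=100, cm=12.0),   # weight 1200, large cM but few SNPs, screened out
    Segment(snps=900, cm=8.0),    # weight 7200, passes
]
print(is_likely_match(segments, segment_threshold=3000, total_threshold=10000))  # True
```

Because the weight is a product, a segment that is strong on one measure but weak on the other (the second segment above) falls below the segment threshold, while a segment slightly short on either measure alone can still pass.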
The Ancestry Build 37 results are the most promising and show a linked relationship between some known relatives and some probable relatives. This is an encouraging result, and kit results from more known and probable relatives are needed to further this study.
The linked relationships that were found are:
Also related to the linked relationships found are known relationships that did not show up:
From these findings the following two potential genealogical relationships should be investigated to attempt to find a paper trail:
People who have tested on FTDNA but not Ancestry might consider also testing on Ancestry, as doing so reveals more connections with people who have only tested on Ancestry, and many different SNPs are tested as well.
People who have tested on Ancestry but not FTDNA might consider also testing on FTDNA for the same reasons.
People who have only provided me with Build 36 FTDNA files might consider providing me with the Build 37 FTDNA file as well so that their results can be included in the Build 37 FTDNA analysis.
Inferring relationships at a genealogical distance of 5th/6th cousin seems possible via relationship linkage, as has been found above. Finding direct DNA relationships at a distance of 5th/6th cousin is much less likely because of the limited amount of shared DNA at that distance and because the “noise” of false positives masks any shared DNA that might exist.
The results are encouraging and further analysis will be undertaken when more kit results are provided.