Haplogroup R and Subclades

Introduction

Haplogroup R is defined by the SNP rs2032658, also known as M207. The group is believed to have arisen about 19,000 to 34,000 years ago in Central Asia. Today it is common in Europe, South Asia, and Central Asia. With the increasing popularity of genetic genealogy, large amounts of data have been collected through direct-to-consumer testing. This site attempts to collect and interpret as much of this data as possible.

Methodology

The underlying tree structure is based on ISOGG's Y-DNA Haplogroup Tree 2015. Several important differences in methodology prompted this site's tree to branch off from it:

  1. ISOGG's tree has been updated manually, and as a result it does not conform to YCC's original principle that the tree be "drawn as asymmetrically as possible by sorting the descendants of each interior node so that the bottom most descendant had the greatest number of immediate descendants." YCC, 2002.
  2. ISOGG's submission requirements for Next Generation Sequencing (NGS) results are not applied in the same way. Rather than excluding variants because of low read counts, poor alignment quality, proximity to other variants, or location in more active regions of the Y chromosome, these markers are retained and annotated on the tree. The ultimate goal is an accurate picture of the evolutionary tree, with a willingness to restructure it when nodes are found to be unstable in newer data sets.
  3. All new branches must be evidence-based, traceable to the samples in which they were originally found. Information about the kit, surnames, origins, and testing platforms is presented with the terminal branch markers. To be more useful to genetic genealogists, terminal branches are not restricted by arbitrary diversity criteria. As long as two men share a mutation, it forms a potentially interesting branch.
  4. Because the original sequencing results are held in the database, ages can be estimated using a variation of the methodology presented by Adamov et al. in "Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data". The major difference is that the estimate for a branch is calculated recursively over all samples beneath it, rather than by averaging the estimates of its child branches.
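The YCC ordering principle quoted in item 1 can be applied mechanically instead of by hand. The Python sketch below assumes a hypothetical `Node` class holding a list of children. It recursively sorts each interior node's children in ascending order of their number of immediate descendants, so the bottom-most (last) child has the most. The toy tree uses a few familiar labels but is not a complete or authoritative tree, and this is an illustration of the principle rather than the code behind this site.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a named branch and its immediate descendants."""
    name: str
    children: list[Node] = field(default_factory=list)


def sort_asymmetric(node: Node) -> None:
    """Reorder children in place, recursively, so that at every interior node
    the bottom-most (last) child has the greatest number of immediate descendants."""
    for child in node.children:
        sort_asymmetric(child)
    # Ascending by immediate-descendant count; Python's sort is stable, so ties keep their order.
    node.children.sort(key=lambda c: len(c.children))


def render(node: Node, depth: int = 0) -> list[str]:
    """Return an indented, top-to-bottom listing of the tree."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    toy = Node("R", [
        Node("R1", [Node("R1a"), Node("R1b")]),
        Node("R2", [Node("R2a")]),
    ])
    sort_asymmetric(toy)
    print("\n".join(render(toy)))  # R2 (one child) is now listed above R1 (two children)
```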
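Item 2 describes annotating, rather than discarding, variants that a stricter NGS filter would drop. A minimal Python sketch of that idea follows. The record fields, the threshold values and the flag codes (`LOW_READS`, `LOW_MAPQ`, `CLUSTERED`, `ACTIVE_REGION`) are hypothetical choices made for illustration. "Proximity" is interpreted here as closeness to a neighbouring call, which is an assumption. None of these are the criteria or codes actually used on this site.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-sample variant call summary."""
    position: int           # coordinate on the Y chromosome
    reads: int              # number of reads supporting the call
    mapping_quality: float  # mean alignment (mapping) quality of those reads
    in_active_region: bool  # True if inside a more mutable / less reliable region


# Illustrative thresholds only; not the site's actual criteria.
MIN_READS = 3
MIN_MAPQ = 30.0
MIN_SPACING = 10  # bp; calls closer than this to a neighbouring call are flagged


def flag_variants(variants: list[Variant]) -> dict[int, list[str]]:
    """Return every variant, keyed by position, with a (possibly empty) list of flags.

    Nothing is removed: a strict pipeline would drop flagged calls, whereas
    here they stay available for the tree with their flags attached.
    """
    ordered = sorted(variants, key=lambda v: v.position)
    flags: dict[int, list[str]] = {}
    for i, v in enumerate(ordered):
        codes = []
        if v.reads < MIN_READS:
            codes.append("LOW_READS")
        if v.mapping_quality < MIN_MAPQ:
            codes.append("LOW_MAPQ")
        # Compare with the immediately preceding and following calls only.
        neighbours = ordered[max(i - 1, 0):i] + ordered[i + 1:i + 2]
        if any(abs(v.position - n.position) < MIN_SPACING for n in neighbours):
            codes.append("CLUSTERED")
        if v.in_active_region:
            codes.append("ACTIVE_REGION")
        flags[v.position] = codes
    return flags


if __name__ == "__main__":
    calls = [
        Variant(1000, 12, 60.0, False),
        Variant(1005, 2, 55.0, False),
        Variant(50000, 20, 20.0, True),
    ]
    print(flag_variants(calls))
    # {1000: ['CLUSTERED'], 1005: ['LOW_READS', 'CLUSTERED'], 50000: ['LOW_MAPQ', 'ACTIVE_REGION']}
```

Keeping the flags alongside each call, rather than filtering, leaves the decision about an unstable node to be revisited when newer data arrive.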
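Item 3's rule that any mutation shared by two men forms a potential branch can be expressed as a grouping step. The sketch below assumes each kit has already been reduced to a set of novel derived variants; the kit IDs and variant labels are made up. Any variant carried by at least two kits becomes a candidate branch, and variants carried by exactly the same set of kits are grouped as equivalent. No diversity criterion is applied. The sketch also does not decide how candidate branches nest, which a real tree builder must do.

```python
from __future__ import annotations

from collections import defaultdict


def candidate_branches(
    kits: dict[str, set[str]], min_carriers: int = 2
) -> dict[frozenset[str], list[str]]:
    """Map each set of kits to the shared variants that all (and only) those kits carry."""
    # Invert kit -> variants into variant -> carrying kits.
    carriers: dict[str, set[str]] = defaultdict(set)
    for kit, variants in kits.items():
        for v in variants:
            carriers[v].add(kit)

    # A variant seen in at least `min_carriers` kits is a potential branch marker.
    shared = [v for v in sorted(carriers) if len(carriers[v]) >= min_carriers]

    # Variants with identical carrier sets define the same candidate branch.
    branches: dict[frozenset[str], list[str]] = defaultdict(list)
    for v in shared:
        branches[frozenset(carriers[v])].append(v)
    return dict(branches)


if __name__ == "__main__":
    kits = {
        "kit1": {"A1", "A2", "P1"},
        "kit2": {"A1", "A2", "B1"},
        "kit3": {"B1", "C1"},
    }
    for members, markers in candidate_branches(kits).items():
        print(sorted(members), markers)
```

In this toy input, kit2 falls into two overlapping groups ({kit1, kit2} via A1 and A2, {kit2, kit3} via B1). That kind of conflict is what a tree builder would need to resolve or flag as unstable.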
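Item 4 contrasts pooling every sample under a branch with averaging the estimates of its child branches. The Python sketch below shows only the pooling idea, in simplified form. It is not Adamov et al.'s full method and not this site's code. Each sample records the SNPs it has accumulated and how many base pairs of its test were usable. A recursive walk collects, for every sample below the node, the SNPs found below that node per covered base pair. These values are then averaged and divided by a mutation rate. The `Sample` and `Branch` classes and the `rate` argument are assumptions. No particular published rate is implied, and confidence intervals and other corrections are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical summary of one tested man."""
    kit: str
    private_snps: int  # SNPs found only in this sample
    coverage_bp: int   # base pairs of the Y chromosome usable in this test


@dataclass
class Branch:
    """Hypothetical tree node."""
    name: str
    defining_snps: int = 0  # SNPs on the edge leading into this branch
    children: list[Branch] = field(default_factory=list)
    samples: list[Sample] = field(default_factory=list)  # samples ending at this branch


def per_bp_counts(branch: Branch, snps_so_far: int = 0) -> list[float]:
    """Recursively return, for every sample under `branch`, the SNPs accumulated
    below the branch being dated divided by that sample's coverage.

    `snps_so_far` counts SNPs between the branch being dated and this node
    (0 at the top-level call)."""
    counts = [(snps_so_far + s.private_snps) / s.coverage_bp for s in branch.samples]
    for child in branch.children:
        counts.extend(per_bp_counts(child, snps_so_far + child.defining_snps))
    return counts


def estimate_age(branch: Branch, rate: float) -> float:
    """Years since the common ancestor of `branch`, pooling all samples beneath it.

    `rate` is an assumed input in SNPs per base pair per year."""
    counts = per_bp_counts(branch)
    if not counts:
        raise ValueError(f"no samples under {branch.name}")
    return sum(counts) / len(counts) / rate


if __name__ == "__main__":
    a = Branch("A", defining_snps=2,
               samples=[Sample("k1", 3, 10_000_000), Sample("k2", 1, 10_000_000)])
    b = Branch("B", defining_snps=1, samples=[Sample("k3", 5, 10_000_000)])
    top = Branch("X", children=[a, b])
    print(round(estimate_age(top, rate=1e-9)))  # toy rate, not a published constant; prints 467
```

Because every sample contributes equally, a child branch with many tested descendants carries more weight here than it would if the child branches' own estimates were simply averaged.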

Data Policy

This policy was created to balance the rights and privacy of individuals against the benefit to the whole community of gathering information for research projects. We have tried to ensure the safety and privacy of any personal data, including data likely to have significant medical relevance or that could identify a specific person. At the same time, we have tried to retain enough information that test results remain useful for research, meaningful to close matches, and able to be cross-referenced against information on other sites.

  1. The following is the Policy agreed to on upload of data between the Submitters of that data (genetic testers or their designated proxies) and the Project. The Project is defined as those persons with administrative access to the data archive, or successors thereof.
  2. Submitters give the Project free license to analyse the genetic and ancestral data they submit, and to publicly release semi-anonymized, filtered analyses of that data, along with any associated meta-data found in the public domain. Released genetic data is to be limited to calls assigned to the Y chromosome.
  3. Raw DNA sequencing data (e.g. BAM or FASTQ datasets) will only be shared with a member's explicit written consent. However, reduced sets of Y-chromosome data (including calls in VCF/gVCF format, test coverage information in BED format, and submitted meta-data) may be shared with co-operating projects.
  4. Tests are publicly identified by the meta-data supplied on submission, i.e. kit numbers and most-distant known paternal ancestor information. Project members may request that public reports replace all or part of this information with an internal project identifier. Such requests should be made by e-mail before submission to prevent public release of the information.
  5. Submitters or legal data owners have the right to request that their raw data be removed from the data analysis at any time. However, since we release a reduced set of data into the public domain, we cannot guarantee that these data will be removed from external sites once the kit has been analyzed.
  6. The Project may contact Submitters about specific queries regarding their data, using the e-mail address supplied on submission. Sharing of e-mail addresses with any third parties will only be done with Submitters' consent.
  7. Minor updates to this agreement may be necessary, e.g. to modify or make explicit the names of people and parties; to include new data formats; to clarify specific points of ambiguity; or to ensure compliance with national and international law, existing privacy agreements with testing companies, or community guidelines. Such changes may be made by the Project without notification, provided they do not materially infringe the rights and/or privacy granted to Submitters, as described in the version of the Policy they initially accepted.
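Item 2 limits released genetic data to calls assigned to the Y chromosome. For plain-text VCF data, a filter of that kind can be sketched as below. The sketch assumes an uncompressed VCF whose CHROM column names the Y chromosome either `chrY` or `Y`; naming varies between reference builds, so this set is an assumption to adjust. Header lines are kept and every record on any other contig is dropped. This illustrates the restriction only and is not the Project's release pipeline.

```python
from typing import Iterable, Iterator

# Assumed names for the Y chromosome in the CHROM column; these differ
# between reference builds, so adjust to the build actually used.
Y_NAMES = {"chrY", "Y"}


def y_only(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines and only those records whose CHROM is the Y chromosome."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # "##" meta-information lines and the "#CHROM" column header
        elif line.split("\t", 1)[0] in Y_NAMES:
            yield line


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t1000\t.\tA\tG\t50\tPASS\t.\n",
        "chrY\t2000\t.\tC\tT\t50\tPASS\t.\n",
    ]
    print("".join(y_only(demo)), end="")  # the chr1 record is dropped
```

The same generator can be applied to an open file object, e.g. `dst.writelines(y_only(src))`, so large files are processed line by line.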
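Item 3 mentions sharing test coverage information in BED format. One common use of such a file is to count how many base pairs of the Y chromosome a test actually covered, for example when comparing SNP counts between tests of different sizes. BED intervals are zero-based and half-open, so each interval contributes `end - start` bases. The sketch below assumes a simple BED file with at least three columns. It skips blank, comment, `track` and `browser` lines, and merges overlapping intervals so that no base is counted twice. The contig names are again an assumption, and this is illustrative, not the Project's tooling.

```python
from typing import Iterable, Sequence


def covered_bases(bed_lines: Iterable[str], chrom_names: Sequence[str] = ("chrY", "Y")) -> int:
    """Count distinct bases covered by BED intervals on the given chromosome names."""
    intervals = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split()
        if fields[0] in chrom_names:
            intervals.append((int(fields[1]), int(fields[2])))  # 0-based, half-open

    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    bed = ["track name=coverage\n", "chrY\t100\t200\n", "chrY\t150\t250\n",
           "chrY\t300\t310\n", "chr1\t0\t1000\n"]
    print(covered_bases(bed))  # prints 160: [100, 250) plus [300, 310)
```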

As of 19 October 2017, the list of project administrators is: James Kane (www.haplogroup-r.org), Alex Williamson (www.ytree.net), Iain McDonald (www.jb.man.ac.uk/~mcdonald/genetics.html), Mike Walsh (for the R-P312 project groups) and Jef Treece (data analyst).