Haplogroup R and Subclades

Haplogroup R is defined by rs2032658, also known as M207. The group is believed to have arisen about 19,000 to 34,000 years ago in Central Asia. Today its descendants are common in Europe, South Asia, and Central Asia.

Supporting data from publicly available Haplogroup R related repositories is integrated as a service to the community. The Kits page contains a cross-reference list to track sample donors between labs. This allows testers to be placed on their closest branch in the Experimental Tree. For help converting the coordinates of variants placed in the tree, consult the Variant Index.

Contribute Y-DNA Data for Analysis

Haplogroup-R.org's emphasis is on collecting the original BAM raw data when possible to construct a phylogenetic tree using the GRCh38 human genome reference. To contribute, please use the BAM Submission Tool.

Direct-to-consumer sequencing data in the form of VCF, variantCompare, masterVar, and other formats can also be contributed to the Y-DNA Warehouse. The warehouse is a private FTP server available to citizen scientists interested in promoting knowledge of the human Y-DNA tree.

YSEQ customers are encouraged to join Group 223: haplogroup-r.org Public Results. This group is the primary location monitored to collect new sequencing results.

News

2017-10-19:

The Data Use Policy has been amended to facilitate the changes needed for the Shared Data repository. The repository is a collection of variant calls sourced from the testing vendor by the data owners. The Data Use Policy lists all individuals with access to retrieve the data at this time. Project administrators who would like to gain access to their members' data should apply via email.

Raw BAM & FASTQ format submissions are not included in this private repository to protect privacy. Called formats produced by haplogroup-r.org may be added in the future with the data owners' consent.

2017-09-21:

Individual kit information for all R-FGC22501 subclades has been removed at the request of the haplogroup project. Kit owners who wish to be made visible again must send a request to the contact address in the page footer.

2017-08-21:

The private variants report has been improved. A small percentage of SNPs were not being presented when the same source had Sanger Sequencing verification performed. The report now also allows INDELs to show. GRCh38 results are currently showing a larger ratio of INDELs than expected, so advanced users are advised not to attempt to verify these unshared INDELs via Sanger Sequencing. These changes also lay the foundation for redefining "private" in terms of local tree context. Future updates will leverage this capability.

Swapped the ancestral and derived alleles for several basal variants to match observations in men from haplogroups P, Q, and R. Upstream variants that remain negative in the Kit "Known SNPs" report appear to be mixed calls, possible sequencing errors, or full back mutations.

A small percentage of kits with sequences close to the reference or with low overall SNP testing remain assigned to the wrong group. These will self-correct as additional branches are approved.

2017-08-10:

Addressed an issue with group assignment reported by R-FGC5494 samples. The problem was caused by recurrent SNPs in branches related within 3,000 years and sparse data loads. The fix introduces a new control value for the best fit algorithm, which may require additional training. Please report any issues you notice.

Adjusted the algorithm for the "Private Variants" report. Diploid positions (calls with more than one possible allele value) that have low read depth have been removed.
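As a rough illustration of this kind of filter, the Python sketch below drops calls that have more than one allele value and a low read depth. It is not the site's actual code: the Call record, its field names, the function name, and the depth threshold of 10 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical record for one variant call in a kit."""
    position: int             # coordinate of the call
    alleles: Tuple[str, ...]  # allele values reported at this position
    read_depth: int           # number of reads covering the position


def filter_private_variants(calls: List[Call], min_depth: int = 10) -> List[Call]:
    """Drop calls that are ambiguous (more than one allele) AND low depth.

    min_depth is an assumed threshold, not a value taken from the site.
    """
    kept = []
    for call in calls:
        is_diploid = len(set(call.alleles)) > 1
        is_low_depth = call.read_depth < min_depth
        if is_diploid and is_low_depth:
            continue  # ambiguous and poorly covered: leave it out of the report
        kept.append(call)
    return kept


example_calls = [
    Call(position=100, alleles=("A",), read_depth=3),       # single allele: kept
    Call(position=200, alleles=("A", "G"), read_depth=3),   # diploid, low depth: dropped
    Call(position=300, alleles=("A", "G"), read_depth=30),  # diploid, good depth: kept
]
kept_positions = [c.position for c in filter_private_variants(example_calls)]
```

In this sketch a diploid call with good coverage is kept, since the news item only mentions removing the low-depth ones.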

The "Known SNPs" report has been enhanced to show immediate descendants of the kit's terminal branch. This is intended to further support the placement, but it can also serve as a guide to whether any shared downstream variants remain to be tested.
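The lookup behind such a listing might resemble the Python sketch below, which collects the defining variants of each branch directly beneath a kit's terminal branch. The tree layout, the branch and variant names, and the function name are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def downstream_candidates(
    children: Dict[str, List[str]],
    branch_variants: Dict[str, List[str]],
    terminal: str,
) -> Dict[str, List[str]]:
    """Map each immediate child of `terminal` to its defining variants.

    Only one level below the terminal branch is inspected.
    """
    return {
        child: branch_variants.get(child, [])
        for child in children.get(terminal, [])
    }


# Hypothetical tree: branch "B" has two child branches, "B1" and "B2".
example_children = {"A": ["B"], "B": ["B1", "B2"], "B1": ["B1a"]}
example_variants = {"B1": ["v1", "v2"], "B2": ["v3"], "B1a": ["v4"]}
candidates = downstream_candidates(example_children, example_variants, "B")
```

A kit placed on "B" would be shown the variants for "B1" and "B2" here, but not those of the deeper branch "B1a".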

Removed all terminal tree branch leaves with fewer than two supporting kits. Future updates will remove interior branches without supporting splits.
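A single pruning pass of this kind could be sketched in Python as follows. The dictionary-based tree, the branch names, and the function name are assumptions for illustration and do not reflect how the site stores its tree.

```python
from typing import Dict, List


def prune_weak_leaves(
    children: Dict[str, List[str]],
    kit_counts: Dict[str, int],
    min_kits: int = 2,
) -> Dict[str, List[str]]:
    """Return a copy of the tree without leaves backed by fewer than min_kits kits.

    Every branch must appear as a key in `children` (leaves map to an empty list).
    Only one pass is made: a branch that becomes a leaf as a result is kept,
    since interior branches are left for a later update.
    """
    weak = {
        branch
        for branch, kids in children.items()
        if not kids and kit_counts.get(branch, 0) < min_kits
    }
    return {
        branch: [kid for kid in kids if kid not in weak]
        for branch, kids in children.items()
        if branch not in weak
    }


# Hypothetical tree with the number of kits supporting each leaf.
example_tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": [], "A1": [], "A2": []}
example_kits = {"B": 1, "A1": 3, "A2": 1}
pruned = prune_weak_leaves(example_tree, example_kits)
```

The function returns a new dictionary rather than modifying its input, so the tree before and after pruning can be compared.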

2017-07-28:

Matrix report generation is on hold while the software is enhanced. Reports are expected to resume in late August or early September.