Welcome to the Y-DNA Data Warehouse

The purpose of this project is to collect Y-DNA related test results from a variety of sources and make that information available to citizen scientists. The information in this collection is hosted by the Haplogroup R project and access to it is governed by our data policy outlined below.

Your e-mail address is requested in case we need to contact you about issues that arise while processing your file.
Test Information
The most common reference build is GRCh37/hg19, but new Big Y results use GRCh38/hg38. The easiest way to determine which you have is to look at the size of the ZIP archive: a Big Y VCF/BED ZIP archive larger than 1MB is GRCh38/hg38. It is expected that other labs will begin issuing GRCh38/hg38 results as well.
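The size check described above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not part of the upload form: the function names are hypothetical, "1MB" is assumed to mean 1,000,000 bytes, and treating a smaller archive as GRCh37/hg19 is an inference from the text rather than a stated rule. It applies only to Big Y VCF/BED ZIP archives.

```python
import os

# Assumption: "1MB" in the form text is read as 1,000,000 bytes.
ONE_MB = 1_000_000


def guess_reference_build(zip_size_bytes: int) -> str:
    """Guess the reference build of a Big Y VCF/BED ZIP archive from its size."""
    if zip_size_bytes > ONE_MB:
        # Stated in the form text: larger than 1MB means GRCh38/hg38.
        return "GRCh38/hg38"
    # Inferred, not stated: smaller archives are taken to be the older build.
    return "GRCh37/hg19"


def guess_build_for_file(path: str) -> str:
    """Apply the same heuristic to a ZIP file on disk."""
    return guess_reference_build(os.path.getsize(path))
```

For a file on disk, `guess_build_for_file("my_bigy.zip")` (a hypothetical file name) applies the same check. An archive close to the 1MB threshold is worth confirming by other means.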
Most Distant Known Paternal Ancestor (Optional)
The modern name of the most specific location where your ancestor was born. If you are only certain of a New World country, please use it. Check that the map pin corresponds to the correct region before submitting the form.
Any additional information you would like to share publicly about the known ancestry of this tester (up to 2048 characters).

Raw data upload

Warning: The maximum upload size of the ZIP file is 60MB. Please contact us if your analysis files exceed this size.
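If you want to check a file against this limit before uploading, a minimal local check might look like the following. The names are hypothetical, and "60MB" is assumed to mean 60,000,000 bytes, the stricter reading; the server may count the limit differently.

```python
import os

# Assumption: "60MB" is read as 60,000,000 bytes (the stricter interpretation).
MAX_UPLOAD_BYTES = 60 * 1_000_000


def within_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a file of this size fits within the upload limit."""
    return size_bytes <= limit


def file_within_upload_limit(path: str) -> bool:
    """Check a ZIP file on disk against the upload limit."""
    return within_upload_limit(os.path.getsize(path))
```

A file that fails this check should not be uploaded as-is; as the warning says, contact the project instead.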

Data Policy

This policy was created to balance the rights and privacy of individuals with the benefit to the whole community of gathering information for their research projects. We have tried to ensure the safety and privacy of any personal data, including data likely to have significant medical relevance, or which can identify a specific person. At the same time, we have tried to retain enough information that test results are useful for research, meaningful for close matches, and can be cross-referenced against information on other sites.

  1. The following is the Policy agreed to on upload of data, between Submitters of that data (genetic testers or their designated proxies) and the Project. The Project is defined as those persons with administrative access to the data archive, or successors thereof.
  2. Submitters give the Project free license to analyse the genetic and ancestral data they submit, and publicly release semi-anonymized, filtered analyses of that data, and any associated meta-data found in the public domain. Released genetic data is to be limited to calls assigned to the Y-chromosome.
  3. Raw DNA sequencing data (e.g. BAM or FASTQ datasets) will only be shared with a member's explicit written consent. However, reduced sets of Y-chromosome data (including calls in VCF/gVCF format, test coverage information in BED format, and submitted meta-data) may be shared with co-operating projects.
  4. Tests are publicly identified by the meta-data supplied on submission, i.e. kit numbers and most-distant known paternal ancestor information. Project members may request that public reports anonymize all or part of this information to an internal project identifier instead. Such requests should be made by e-mail before submission to prevent public release of information.
  5. Submitters or legal data owners have the right to request that their raw data is removed from the data analysis at any time. However, since we release a reduced set of data into the public domain, we cannot guarantee these data are removed from external sites once the kit has been analyzed.
  6. The Project may contact Submitters about specific queries regarding their data, using the e-mail address supplied on submission. Sharing of e-mail addresses with any third parties will only be done with Submitters' consent.
  7. Minor updates to this agreement may be necessary, e.g. to modify or make explicit the names of people and parties; to include new data formats; to clarify specific points of ambiguity; or to ensure compliance with national and international law, existing privacy agreements with testing companies, or community guidelines. Such changes may be made by the Project without notification, provided they don't constitute material infringements of the rights and/or privacy granted to Submitters, as described in the version of the Policy they initially accept.

As of 19 October 2017, the list of project administrators is: James Kane (www.haplogroup-r.org), Alex Williamson (www.ytree.net), Iain McDonald (www.jb.man.ac.uk/~mcdonald/genetics.html), Mike Walsh (for the R-P312 project groups) and Jef Treece (data analyst).

As the legal owner of this data, or their proxy, I agree that this data can be used as per the Data Policy.