Third Party Sites

8 Key Points about Raw Data Files

If you are going to download a raw data file from an at-home DNA test like 23andMe or AncestryDNA, here are eight key points to be aware of:

•       The responsibility for the security and privacy of the data is in your hands once you download the file

•       Raw data have not been validated (thus files can, and often do, contain errors)

•       Raw data generated by one company will typically differ from the next

•       Raw data generally include markers from only a small fraction of the entire genome

•       There are consistent issues recognized for certain markers in raw data files (i.e. false positives, also called miscalls)

•       There are additional problematic markers currently unknown, unconfirmed, and/or unreported

•       A raw data file without a separate tool to analyze it is generally not useful

•       A finding in the raw data can be a “hint” in the right direction but is never the final answer
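For readers comfortable with a little code, here is a minimal sketch of what "a separate tool" has to do before it can say anything at all: read the file, skip the header comments, and count what is there. It assumes a 23andMe-style layout (tab-separated rsid, chromosome, position, and genotype columns, "#" comment lines, and "--" for a marker with no call); other companies lay out their files differently. The sample rows, the function name summarize_raw_data, and the rounded genome size are made up for illustration and do not come from any real file or tool.

```python
import io

# Assumption: 23andMe-style layout (tab-separated rsid, chromosome,
# position, genotype; '#' comment lines; '--' marks a no-call).
# These rows are invented for illustration only.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t100000\tAA
rs0000002\t1\t200000\tAG
rs0000003\t2\t300000\t--
i0000004\tX\t400000\tC
"""

# Assumption: rough size of one copy of the human genome, in base pairs.
GENOME_BASE_PAIRS = 3_100_000_000


def summarize_raw_data(handle):
    """Count markers and no-calls in a raw data file (hypothetical helper)."""
    markers = 0
    no_calls = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not match the assumed layout
        markers += 1
        if "-" in fields[3]:
            no_calls += 1  # the genotype could not be read for this marker
    return {
        "markers": markers,
        "no_calls": no_calls,
        "genome_fraction": markers / GENOME_BASE_PAIRS,
    }


summary = summarize_raw_data(io.StringIO(SAMPLE))
print(summary)
```

On a real file you would pass an opened file instead of the sample text. Even with hundreds of thousands of markers, the fraction works out to well under one percent of the genome, and a count like this says nothing about whether any individual marker was read correctly.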


One of my areas of specialty work through Watershed DNA is helping people with their raw data.

Raw data can be very useful for many purposes, but there are limitations. This doesn't mean using raw data is of no benefit. Rather, it's better to know there are both benefits and limitations, and be aware of them as you move forward.

I’ve taken my own raw data to many different third-party tools, some for genealogical purposes (investigating how my DNA matches other people's, for example) and some for health (figuring out if any well-established health risk markers could be found). My "insider’s" view has given me a better understanding of the raw data generated by consumer genetic testing companies and how to make the best use of it.

I have worked with a number of clients in the past interested in understanding how to “do more”. Some have wanted basic guidance in what direction to take with a raw data file, and some have wanted me to do more of the leg work. I’m happy to meet my clients where they are and help them on to the next steps.

Have a raw data file and interested in knowing what to do with it?

Reach out through the "Contact Me" button in the upper right corner. I’d be happy to work with you.


Raw Data: What is it?

You know that phrase "No moss grows on a rolling stone"? I think the world of consumer genomics is best considered as the rolling stone that will never find an end.

Much has happened in the consumer genomics world in the past 8 months since I published a video on YouTube to explain "raw data" and its uses, benefits, and limitations.

It could use some updating, but the basic messages are unchanged: 

1) You can get more than you bargained for when you hunt through your raw data.

2) You might go through a period of confusion before you have a sense of clarity again.

3) You can contribute your information to research and help future generations.

4) No two people will have the same experiences or emotional reactions to downloading, uploading, and uncovering information from a raw data file.  

5) I am here as a resource.

Before you take your raw data out of your ancestry testing account, please consider stopping and watching this video: "DNA Raw Data: What is it?"

Reach out for a one-time consultation with me, before you make the download or after you've used a tool to sort through your raw data and have gotten back a report. I don't mind chasing a rolling stone with you! It makes for an interesting and enlightening journey, for sure.

Differences between third-party DNA reports and clinical genetics laboratory reports

This post assumes the reader already knows the meaning of certain terms like "raw file" and "third-party tool" and "VCF". Apologies to DNA newbie readers. 

People are using third-party tools that give them information they interpret as health information about themselves.

What comes out of these third-party tools is not actually a genetic health report, but it kinda "looks" like one. The people who manage one such third-party tool, Promethease, have explained in places like the comments section on Judy Russell's blog that they try to help people understand what Promethease is and isn't.  But based on the questions I have coming to me about this and other third-party tools on a regular basis, many people still have questions.

I'll keep on trying to help others understand what third-party tools can and can't tell them about potential health risks and will continue to point out the limitations of third-party tools. No matter how well curated the SNPedia database becomes (and there are many people admirably working toward this goal), it does not change the inherent limitations of the tool itself.

I acknowledge that people are going to use third-party tools no matter what an expert in genetic testing says. If you are going to do this, I support you and will offer guidance as best as I can.

But I would be remiss not to point out what an actual genetic health report from a clinical laboratory includes. 

A genetic health report from a clinical laboratory includes information that genetics professionals can use to gauge the utility and validity of test results. In other words, the details on a clinical report help to answer questions like "Are the results accurate?" and "Are the results useful?" The type of info I'm talking about includes things like testing methodology. There are many different TYPES of genetic testing and different methods to test DNA, and none is anywhere near perfect yet.

I need details to put a genetic result in proper perspective, and a lone VCF file rarely gives me everything I need to assess the results. Clinical laboratory reports fit medical purposes better. These reports provide me with data such as whether confirmation studies were done on positively-identified genetic variants. Why does this matter? Because when you re-test a variant using the same or a different genetic testing method, sometimes a variant no longer shows up. If you repeat the entire test, something might show up that wasn't seen before.

That's right, genetic tests can be wrong! There, I said it, and it didn't feel good, but it's true. Genetic tests can be wrong because going from a biological tissue to a computer file is a complicated process, and failures can happen at many points along the way. The good news is that most of the time failures don't happen, and clinical laboratories have safeguards in place in case they do.

Clinical laboratories include info that allows genetics professionals to judge whether the lab is adhering to standards and guidelines meant to cut down on the inevitable errors and mistakes, for example by listing CLIA and CAP certification numbers on a report.

All of this data is important to me. I know that when I hold a clinical report in my hand (or, in these modern times, see it on a screen), that data has come from a place I can call up to ask questions of laboratory scientists, research scientists, and genetic counselors. Questions like, "Were you able to find a lot of studies or database entries on this rare genetic variant? If the studies had conflicting results, how did you decide which one to believe? What is your rate of false positives?"

The details on a clinical report help me assess how well the DNA results have been analyzed, how confident I can be in telling someone they do or do not have a risk or a predisposition or a disease-causing genetic variant. Many people view certifications and regulations as roadblocks, hurdles, something to bypass or overcome. I view these things as necessary, important, and valuable.

A final note on raw data files at this time...

Raw data from an ancestry company (and raw data from a direct-to-consumer exome sequencing company for that matter) have not gone through quality checks. The data hasn't gone through confirmatory analysis. It's also representing just a smidgen of your entire DNA and only from one type of cell in your body - cells that slough off in your mouth. It's far from perfect, but this is where we are right now.

We'll get to that future of reliable genetic health information at low cost, eventually. But please don't expect that right now because we still have a lot of work to do to get there. 

Watershed DNA founder and client featured in June 2016 issue of Dr Oz The Good Life magazine

The story of a Watershed DNA client who sought out genetic counseling services after a home DNA test was highlighted in a six-page article in "Dr. Oz The Good Life" magazine (the June 2016 "popcorn" issue). Find the periodical on newsstands now, and learn more about the pros and cons of testing and using third-party sites for analyzing your own DNA. Other topics covered include "SNPs" and what you should consider about insurance coverage before ordering a DNA test.

Flip to page 50 next time you're in line at the grocery checkout!
