Sequencing your DNA with a USB dongle and open source code (stackoverflow.blog)

398 points by johntortugo 4 years ago | 176 comments

m12k 4 years ago |

I'm really curious about what I could learn by getting my DNA sequenced, but I'm worried about my rights to not have it recorded and shared without my consent if I got someone else to do it for me - so any advance toward an affordable home test setup is very welcome.

adabaed 4 years ago | |

Imagine insurers refusing to give you a service due to your predisposition to certain diseases...

meltedcapacitor 4 years ago | | |

Protection from this comes from laws that ban DNA-based policies, not from being secretive about sequencing. If it is allowed, insurers will have no need to obtain DNA sequences in devious ways; they will just ask, and refuse cover or charge more when clients refuse to get sampled.

ajuc 4 years ago | | |

It's amazing how many problems you avoid by having a public health system.

foobarbecue 4 years ago | | |

If you haven't seen Gattaca, you should

tekproxy 4 years ago | | |

Imagine gene therapy to fix the problems. After a few generations, many diseases will be extinct.

There's a guy on YouTube doing DIY gene therapy to treat his lactose intolerance, so it's not exactly science fiction.

dekhn 4 years ago | |

Note that you are literally shedding identifiable DNA from your body at all times and a truly motivated adversary would have no problem obtaining enough sample material to do high quality sequencing.

nomercy400 4 years ago | | |

It's not the motivated adversary I am worried about, who actually has to show up where I have physically been. It is the company on the other side of the world in a country with lax legislation, profiling me based on the data I 'shed' online, like a cloud-based DNA sequencing service.

ClumsyPilot 4 years ago | | |

The data monopolies and abuse originate from people giving these companies data for free. If they had to buy it, or pay goons to collect it, they wouldn't be profitable.

russdill 4 years ago | | |

In the near future (or arguably now, depending on your purpose) you don't even need that. Assuming enough of your relatives' sequences are available, the probability of you having certain genes/mutations can be narrowed down so much that having your individual genome doesn't add much.
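
As a rough illustration of that kind of inference, here is a minimal sketch. It assumes a single site with two alleles, plain Mendelian inheritance, and that both parents' genotypes are known; the function name and the two-letter genotype encoding are made up for the example, and real kinship inference is far more involved.

```python
from fractions import Fraction
from itertools import product

def child_genotype_probs(parent1, parent2):
    """Return P(child genotype) for one site, given both parents' genotypes.

    Genotypes are two-character strings such as "Aa"; each parent passes on
    one of its two alleles with probability 1/2 (simple Mendelian model).
    """
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs

# Two carrier parents: the child's genotype is already constrained
# before the child is ever sequenced.
probs = child_genotype_probs("Aa", "Aa")
for genotype, p in sorted(probs.items()):
    print(genotype, p)
```

Each additional sequenced relative adds another constraint of this kind, which is the sense in which the individual genome adds less and less.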

tgsovlerkhgsel 4 years ago | | |

One of the key differences is that in the case of the DNA sequencing services, you're agreeing to ToS that allow them to abuse your data (and thus indirectly the data of any of your blood-relatives), and you directly tie the data to a name and address.

Teever 4 years ago | | |

I assume this line of reasoning is also why you don't lock your doors at night?

shukantpal 4 years ago | | |

At scale?

diplodocusaur 4 years ago | | |

I imagine one's DNA can't be too different from the cousin that agrees to share that kind of data?

Method-X 4 years ago | |

When I had 23andme sequence my DNA, I used a fake last name and pre-paid credit card.

albertgoeswoof 4 years ago | | |

But if enough of your relatives get sequenced, they’ll know who you are anyway.

authed 4 years ago | | |

If your family members got it done with their real name, they will be able to create a link.

amelius 4 years ago | | |

Uh yeah, but you logged in from an IP address which Google already tied to your real name.

lostlogin 4 years ago | |

It’s out of the bag now - you can be identified via your relatives' DNA.

https://www.latimes.com/california/story/2020-12-08/man-in-t...

dataflow 4 years ago | | |

I don't think that implies the increase in risk would be negligible, which was the parent's point.

Gatsky 4 years ago | |

It’s not exactly DIY but there are in theory ways to ‘encrypt’ your DNA before it gets sequenced. Something like amplifying/enzymatically modifying the DNA in a way that changes the sequence which you can undo computationally once you get the data back.

biophysboy 4 years ago | |

It's only valuable if somebody also interprets it for you, such as telling you whether you have a genetic predisposition for certain diseases.

m12k 4 years ago | | |

One of the other comment threads indicates that the data you need for that kind of sequence annotation is, to some extent, available for home use as well: https://news.ycombinator.com/item?id=29695449

I'm really hoping someone will work on an open source "23andme@home" solution that ties all this together in an accessible way.

DoctorOW 4 years ago | | |

Is that not something software can theoretically provide?

refurb 4 years ago | | |

That's Promethease, no? They have since been acquired, but before that you could upload data anonymously and then browse the analysis. It was very rough though, just linking sequences to risk, and a lot of it was inconclusive.

fragmede 4 years ago |

I don't know if this is the exact nanopore USB dongle used in the article, but this one is $1,000 for the base package, first released in 2014

https://store.nanoporetech.com/us/minion.html

https://www.extremetech.com/extreme/190409-minion-usb-stick-...

cge 4 years ago | |

Note that Oxford Nanopore seems to have very much a "sell the ink/razor/etc" business model with their devices: that $1,000 package comes with one flow cell, which is a consumable and costs $900. They're essentially giving the device away for free.

On some of their larger devices (eg, the PromethION), they've moved outright to a "we lend you the device for free, you buy the consumables" model.

up6w6 4 years ago | | |

There is some exciting work around these flow cells to create something more durable. It would be really interesting to be able to buy something like that and use it in schools/personal hacks without worrying about small mistakes in the sample.

https://en.wikipedia.org/wiki/Nanopore#Inorganic https://nanoporetech.com/how-it-works/types-of-nanopores

hobofan 4 years ago | | |

IIRC you even have to send back the used flow cells to buy new ones so they can keep prices down.

koeng 4 years ago | |

Yep, that’s the one. They update the flow cells over time. The bit they don’t tell you about is the stuff you need, like a Qubit, to properly run the thing.

joshuamcginnis 4 years ago | | |

A Qubit or other fluorometer isn't required. You can use a simple DNA ladder to estimate the relative quantity and quality of DNA, which is good enough for nanopore sequencing. I just did a full genome sequence of a novel fungus using this exact approach.

LinuxBender 4 years ago |

This is very cool. Are there by chance any associated projects that could evolve into something like 23andme but remain entirely within a private network meaning that the data is entirely in the hands of the individual?

glofish 4 years ago |

Alas, the information presented is an oversimplification of the process.

To actually sequence DNA with this USB thingy you need to prepare a so-called sequencing library - and for that you need a fairly well equipped lab, expensive reagents, and years of practice and skill ... a mid-level biology Ph.D. can prepare these ...

in addition the flowcell sold by Oxford Nanopore often malfunctions and the whole run is a bust ... (behaves like this since 2014 ... so no, the technology does not seem to improve a whole lot)

9tailedkitsune 4 years ago | |

Yep

inglor_cz 4 years ago |

DNA sequencing bugs me quite a bit.

On one hand, I would love to learn something new about my body.

On the other hand, what if the results tell me that I am predisposed to some horrible untreatable disease? Will I spend the rest of my days observing every little pain or discomfort and thinking "is this IT?"

Cyclical 4 years ago |

Nanopore sequencing is a really interesting technology. It utilizes fundamentally the same apparatus as a Coulter Counter [1], which is a general method of counting and sizing arbitrary particles that's frequently used in flow cytometry. Applying it to sequencing by drawing unwound DNA through the pore was a really excellent logical leap, and we're only now starting to see the benefits of it, even though it was first ideated over 30 years ago.
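
To make the Coulter principle concrete, here is a toy sketch: a particle passing through the pore shows up as a temporary drop in current, and counting those drops counts particles. The trace values, the 20% drop threshold, and the function name are all invented for illustration; real nanopore basecalling decodes the shape of the signal rather than merely counting dips.

```python
def count_pulses(current, baseline, drop_fraction=0.2):
    """Count resistive pulses: runs of samples well below the open-pore baseline."""
    threshold = baseline * (1 - drop_fraction)
    pulses, in_pulse = 0, False
    for sample in current:
        below = sample < threshold
        if below and not in_pulse:
            pulses += 1  # a new blockade event starts here
        in_pulse = below
    return pulses

# Made-up current samples (arbitrary units) containing two blockade events.
trace = [100, 99, 101, 60, 58, 61, 100, 102, 55, 57, 99, 100]
n_particles = count_pulses(trace, baseline=100)
print(n_particles)
```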

[1] https://en.wikipedia.org/wiki/Coulter_counter

unemphysbro 4 years ago |

Happy to see this here. I worked on solid-state nanopore development as a part of my PhD.

Now I'm a Data Engineer doing backend work in public sector. :)

Here are some press releases related to articles I published during my PhD:

https://physics.illinois.edu/news/article/34064

https://www.sciencedaily.com/releases/2014/10/141014095320.h...

a-dub 4 years ago |

the nanopore units are awesome! although if i recall, most of the device is a replaceable one time use consumable and the cost of that consumable is quite expensive (at least hundreds, if not thousands).

when i looked i was interested, but was turned off when i saw that the cost far outstripped commercial sequencing services.

walterbell 4 years ago |

There are some bio HackerSpace labs with memberships open to the public.

London, UK https://biohackspace.org/

Brooklyn, NY, https://www.genspace.org/

Baltimore, MD, https://bugssonline.org/

Australia, https://foundry.bio/

up6w6 4 years ago |

Reports of people trying to use it at home without any special lab:

https://abarry.org/dna-sequencing-in-our-extra-bedroom/

http://blog.booleanbiotech.com/sequencing-at-home-with-flong...

GekkePrutser 4 years ago |

I don't see any reference to the "USB dongle" mentioned in the title. I was thinking this would be some cool thing you could do at home.

dekhn 4 years ago | |

https://nanoporetech.com/products/minion

GekkePrutser 4 years ago | | |

Ah thanks! Not something 'just for fun', so. But good to see this tech is becoming more affordable!

kingcharles 4 years ago |

So, how long before I can take my DNA "ROM" file and boot it in an emulator that would allow it to grow?

Lev1a 4 years ago | |

An idea just popped into my head reading your comment:

What if you could take the (binary) data file of your DNA and use it as input in the (recently remastered) Monster Rancher games to generate a monster? Apparently those games use external user-provided data (like music CDs, game discs etc.) to generate the monsters the player would then train and use (something I only recently learned about through gaming livestreams).

I'd actually like to see the level of jank that would come out of something like that.

callesgg 4 years ago | |

Many years. We still have problems simulating a single protein folding correctly. If we don’t find some new algorithm for simulating cells, we would need computers that are billions of times faster than our current ones.

Also, your DNA is bootstrapped from your mother's cells. And the prenatal environment has quite a large effect on development, so your simulation might end up quite different from you if we only started with your DNA.

dekhn 4 years ago | |

it's unlikely we would ever be able to achieve this. Even simulating a single cell at high resolution is a serious challenge.

GistNoesis 4 years ago | | |

It's likely that you don't have to simulate even a single cell at high resolution to be able to simulate how an organism would grow. There are numerical shortcuts.

For example today we can already predict the color of the eyes and other phenotype from the DNA.

If you are able to observe enough samples of cell growth and their associated DNA, you can probably model and predict the statistics of a cell from its DNA. Because the cell is itself the result of a lot of chemical processes, the law of large numbers will help smooth those statistics.
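
A quick way to see the smoothing effect is a small simulation. This is only a sketch under a made-up assumption: each "cell" contributes an independent noisy value drawn uniformly from [0, 1], which is a stand-in and not a biological model.

```python
import random
import statistics

def mean_of_noisy_cells(n_cells, rng):
    """Average a noisy per-cell quantity (hypothetical, uniform on [0, 1])."""
    return sum(rng.random() for _ in range(n_cells)) / n_cells

rng = random.Random(0)  # fixed seed so the run is repeatable

# Repeat the experiment 200 times at each size and compare the spread of the averages.
spread_small = statistics.pstdev([mean_of_noisy_cells(10, rng) for _ in range(200)])
spread_large = statistics.pstdev([mean_of_noisy_cells(1000, rng) for _ in range(200)])

print(spread_small, spread_large)
```

The average over 1000 cells fluctuates far less than the average over 10, which is the sense in which aggregate behavior can be predictable even when individual cells are not.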

Given that we have a lot of cells, the collective behavior is probably entirely governed by these statistics.

323 4 years ago | | |

You seriously underestimate the continuous growth of computer power. And quantum computers after, which are perfect for simulating chemical reactions.

What was unthinkable 50 years ago, playing chess better than a human, is now trivial for a $100 device.

And simulating the growth of a human doesn't necessarily require simulating the entirety of chemical reactions in all 50 trillion cells and all that.

twotwotwo 4 years ago |

A researcher mentions using a compact index based on the Burrows-Wheeler Transform to fit things in less memory compared to using a huge hashtable.

I see open-source implementations of BWT-based indexes (FM-Index/FMtree) out there. Out of curiosity, does anyone know of anything using BWTs for compact indexes in more everyday uses (like full-text search), or alternately reasons it doesn't really work outside the genome-alignment use case? Likely it only 'pays for itself' if you really need the space savings (like, it's what makes an index fit in RAM) or else we'd see it in use more places. It'd still be kinda neat to actually see those tradeoffs.
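
For anyone who hasn't seen the transform itself, here is a minimal sketch of the BWT and its inverse. It uses the naive sorted-rotations construction, which is only sensible for short strings; real tools build the transform from a suffix array or similar, and the `$` sentinel is an assumption of this example.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (fine for short strings)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the sentinel

transformed = bwt("GATTACA")
restored = inverse_bwt(transformed)
print(transformed, restored)
```

The transform is just a permutation of the input characters, so the space savings come from the compressed structures built on top of it, not from the BWT alone.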

jltsiren 4 years ago | |

There was some interest in the information retrieval research community 10-15 years ago, but I don't think anyone ever found a good application for it. Some limitations of the BWT always got in the way.

The BWT sees strings as integer sequences. Either "ABC" and "abc" are two unrelated strings, or you normalize before building the index and lose the ability to distinguish between the two.

Search proceeds character-by-character backwards, jumping arbitrarily around the BWT using the same LF-mapping function as when inverting the BWT. You get cache misses for every character.
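
The backward search described above can be sketched as follows. This is a toy version under simplifying assumptions: the occurrence table is stored uncompressed for every position (real FM-indexes sample and compress it), the suffixes are sorted naively, and all names are made up for the example.

```python
def build_fm_index(text, sentinel="$"):
    """Build the pieces of a toy FM-index: C table and occurrence counts."""
    s = text + sentinel
    suffix_order = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in suffix_order)  # character preceding each suffix
    alphabet = sorted(set(s))
    # C[c] = number of characters in s strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(bwt)

def count_occurrences(pattern, C, occ, n):
    """Backward search: process the pattern right to left, narrowing [lo, hi)."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]  # LF-mapping applied to both ends of the range
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

C, occ, n = build_fm_index("GATTACAGATTACA")
hits = count_occurrences("TTA", C, occ, n)
print(hits)
```

Each step reads `occ` at two positions that depend on the previous step, so consecutive lookups land in unrelated parts of the table; that is where the per-character cache misses come from.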

BWT construction is expensive, because you want a single BWT for the entire string collection. There is a ridiculous number of papers on BWT construction, as well as on updating and merging existing BWTs, but the problem has still not been solved adequately. If your data is measured in gigabytes, you can just pay the price and build the index, but a few terabytes seems to be the practical upper limit for the current approaches.

You can of course partition the data and build multiple indexes, but then you have to search for each pattern in each index. There is no way to partition the data in a way that different indexes would be responsible for different queries.
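
A minimal sketch of that fan-out, with plain substring search standing in for each per-partition index (the shard contents and function names are hypothetical):

```python
def count_in_partition(partition_text, pattern):
    """Stand-in for one per-partition index lookup (counts overlapping matches)."""
    count, start = 0, partition_text.find(pattern)
    while start != -1:
        count += 1
        start = partition_text.find(pattern, start + 1)
    return count

def count_everywhere(partitions, pattern):
    """Every partition must be queried: any of them may contain the pattern."""
    return sum(count_in_partition(p, pattern) for p in partitions)

partitions = ["GATTACA", "TTAGGC", "CATTAG"]  # hypothetical document shards
total = count_everywhere(partitions, "TTA")
print(total)
```

Query cost grows with the number of partitions, because nothing about the pattern tells you which shard to skip.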

twotwotwo 4 years ago | | |

All interesting! Thank you.

dekhn 4 years ago |

Folks are free to analyze my genome, https://my.pgp-hms.org/profile/hu80855C

Last time it was analyzed the conclusion was that there was nothing actionable.

zmmmmm 4 years ago | |

Have you ever encountered any insurance implications from it? eg: questioned whether you have ever had a genomic test etc. and had to answer yes and then them wanting to see results?

I guess in your case where nothing actionable is found it's benign. It will be the cases where there are risk factors for late onset things - cancer, diabetes, heart disease etc. where it would get sticky.

dekhn 4 years ago | | |

No, my health insurance company doesn't care about my whole genome data. Health insurance companies are already quite skilled at (and profitable due to) their ability to model life expectancy and health issues without genomic data, and they are legally prohibited from using this data, in my country anyway. Life insurance is different (they are allowed to incorporate much more information) but I've never been asked for anything like that.

As for the case where nothing actionable is found- it's not benign. It's absence of information, not information of absence.

lend000 4 years ago |

How does it get the DNA to go through the hole?

Cyclical 4 years ago | |

Initially, the DNA is brought near the pore through diffusive (Brownian) motion plus any small attraction it has to the membrane. Close to the pore, a combination of the electrophoretic and electro-osmotic effects draws the DNA molecules through. Applying an external electric field causes the charged DNA molecules to migrate along the field (electrophoresis). This is independent of the fluid, and happens to any ions under voltage. The electro-osmotic flow (EOF), on the other hand, is a motion of the fluid itself, pulling the DNA molecules along with it. EOF is a really interesting phenomenon which is caused by the interaction between the surface chemistry (vis-a-vis charge distribution) and the concentration gradient of charge carriers in the fluid. I'd recommend Fundamentals and Applications of Microfluidics by Nguyen et al. if you're looking for a good primer on electrically induced flows in microfluidics.
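
For reference, the two contributions are commonly written as below. These are the standard textbook forms in the thin-double-layer limit, which may hold only loosely inside a nanopore, and they are not taken from any particular device. Here $\mathbf{E}$ is the applied electric field, $\mu_{\mathrm{ep}}$ the electrophoretic mobility of the DNA, $\varepsilon$ the permittivity of the fluid, $\zeta_{\mathrm{wall}}$ the zeta potential of the pore wall, and $\eta$ the viscosity.

```latex
\begin{align}
  \mathbf{v}_{\mathrm{ep}}  &= \mu_{\mathrm{ep}}\,\mathbf{E} \\
  \mathbf{v}_{\mathrm{eof}} &= -\frac{\varepsilon\,\zeta_{\mathrm{wall}}}{\eta}\,\mathbf{E} \\
  \mathbf{v}_{\mathrm{DNA}} &\approx \mathbf{v}_{\mathrm{ep}} + \mathbf{v}_{\mathrm{eof}}
\end{align}
```

Depending on the sign of the wall charge, the electro-osmotic term can either help or oppose the electrophoretic one.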

wombatmobile 4 years ago |

> Why not make the software into a proprietary product? ... There’s such a race there that it’s hard to commercialize the software for the long term.” Schatz continues, “Plus our work is largely funded through government sponsored grants, so this is one of the important ways for us to give back to society.”

In some people's thoughts, making a better society is the first and most obvious thing to do with technology like this, not an accidental consequence of inconvenience. Fortunately, enough of those people are active in the world to make Main Street different to Wall Street, at least sometimes.

klmr 4 years ago | |

It’s a weird quote anyway since there is commercial, proprietary software for DNA sequence analysis. Just a few examples of companies in this space are Sentieon, Edico (acquired by Illumina) and Parabricks (acquired by Nvidia). And Michael knows this (they’re sufficiently well known, and his own research laid some of the earliest foundations that Parabricks would ultimately build upon) so I’m assuming the quote was taken out of context or he was talking specifically about his own lab.

thadk 4 years ago |

Maybe at our local library we should be able to check out these nanopore sequencers, or even other simple & robust medical devices, like handheld ultrasound devices that plug into iPads?

luxpir 4 years ago |

There is a 3+ year old London-based project, partnered with an established genome sequencing company, doing something highly interesting.

They sell swab kits directly, or via NFT purchase, for ~$500 for a 30x near complete sequencing (that's 30 passes for over 99.9% vs 0.2% for 23andme et al). The results are stored in an encrypted AMD SEV-E vault to be accessed by big pharma or individuals, only for specific markers, in exchange for the $GENE token paid directly to the genome owner. Figures touted are $50-80 per request. This token is burned as kits are sold, can be staked, offers rewards like DAO membership, can be gifted to charities researching specific diseases in various populations. It can act as a form of UBI in unbanked populations and puts your DNA back in your control.

To me it's the best use of web3 tech I've come across, so disclaimer, I am invested and a DAO member, but it's early in the project still. They are not quite ready for mass marketing. They are moving over to Polygon for very low transaction fees in January, will be launching the first joint NFT/kit sale (the next season might include personal genetically generated art) to fill the vaults with 10k sequenced genomes. They are over half way already through work with charities, but that is the magic number before big pharma can start making queries. Right now though they are quietly building and preparing before marketing plans kick in later in Q1.

Take a look at https://genomes.io where everything is explained in more detail, the team are presented and the tokenomics set out.

TL;DR - for $500 right now you can get your entire genome sequenced and stored in a vault to earn you passive income, if you agree to each query. But wait for the NFT rather than buying directly; it will have more perks.

billiam 4 years ago |

TMI.