The first step in most data science and machine learning workflows is obtaining and preparing data for analysis. Imperfect data jeopardizes the end result and, in the case of cancer detection, can have numerous negative consequences. At GRAIL, we factor many data sources into our cancer predictions, including a patient’s clinical information, lab processing metrics, genomic data, and bioinformatics pipeline output...