Insights from Nathan Wan
We spoke to Nathan Wan from Freenome about how he uses machine learning to help detect cancer from blood samples.
Colonoscopies are a good way to detect cancer early, but no one wants to get one “just in case.” However, patients regularly have their blood taken as part of their annual check-up.
What if we could provide the benefits of a colonoscopy with the convenience of a blood test?
This is what Nathan and his team at Freenome are working on.
The role of colonoscopies and using a blood sample to detect colorectal cancer
Almost 1 in 20 people will be affected by colorectal cancer – which is commonly known as bowel cancer – in their lifetime.
With this in mind, Nathan’s company aims to detect early warning signs of colorectal cancer from a standard blood sample. If doctors can detect warning signs in a blood sample, they can recommend an interim diagnostic colonoscopy to check for colorectal cancer.
As such, a standard blood sample wouldn’t necessarily replace a colonoscopy completely, but it could indicate whether a diagnostic colonoscopy is needed. Patients would still be advised to get a colonoscopy every 5 or 10 years.
Besides colonoscopies and blood samples, there are other diagnostic options for colorectal cancer, but their problems are as much about social norms as about technology: these procedures may lack accuracy and are usually uncomfortable for the patient. As Nathan points out:
“Some competitors do it, but they use fecal matter instead of blood. There are concerns about both accuracy and compliance: patients are not always willing to defecate into a little bag, seal it up, and send it for analysis.”
The process of detecting cancer with blood and machine learning
Simply put, Nathan’s team is building a system that takes a tube of blood as input and predicts whether the patient is healthy or presents cancer risk indicators.
Nathan’s team uses a “multi-omic” approach to handle the data that they extract from blood. He explains:
“That means we're interested in measuring more than just one type of molecule or one measurement. So not only are we going to try to run one assay to measure DNA, we're also going to measure protein and potentially RNA or other ways to compute molecular measurements.”
“So there's some chemistry involved, and there's potentially some other hardware or lab machines to measure the molecules. We take multiple measurements of those molecules and combine them together into a classifier, and use that classifier to predict cancer, using both DNA and proteins.”
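To make the idea of combining several measurements into one classifier concrete, here is a minimal sketch. It is not Freenome's pipeline: the data is synthetic, the feature counts are arbitrary, and the names `combine_assays`, `make_patient`, and `train_logistic` are hypothetical. It assumes the simplest form of fusion (concatenating per-assay features into one vector) followed by a small logistic regression trained with gradient descent.

```python
import math
import random

def combine_assays(dna_features, protein_features):
    """Early fusion: concatenate per-assay features into one vector."""
    return list(dna_features) + list(protein_features)

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive class for one combined feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic(samples, labels, lr=0.1, epochs=200):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Synthetic stand-in data (an assumption for illustration only):
# 3 made-up "DNA" features and 2 made-up "protein" features per sample.
rng = random.Random(0)

def make_patient(has_signal):
    shift = 1.0 if has_signal else -1.0
    dna = [rng.gauss(shift, 0.5) for _ in range(3)]
    protein = [rng.gauss(shift, 0.5) for _ in range(2)]
    return combine_assays(dna, protein)

labels = [i % 2 for i in range(40)]
samples = [make_patient(bool(y)) for y in labels]

weights, bias = train_logistic(samples, labels)

# Accuracy on the training data: a sanity check only, not a performance estimate.
train_accuracy = sum(
    (predict_proba(weights, bias, x) >= 0.5) == bool(y)
    for x, y in zip(samples, labels)
) / len(labels)
print(f"training accuracy: {train_accuracy:.2f}")
```

Concatenation is only one way to combine assays. A real system might instead train one model per assay and merge their outputs, and it would be evaluated on held-out patients rather than on its own training data.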
The challenges related to Nathan’s work
At Google, Nathan was given the data and tasked only with processing it efficiently. He says the challenges in his current role are much broader.
Requiring more data to make accurate predictions
Realistically, Nathan and his team need more than blood work to make accurate predictions: they also need patient data that cannot be extracted from a blood sample. Important examples include whether the patient smokes, what medications they take, and whether they have existing lesions that could grow into cancer:
“There are many sources for our data: whether we are collecting samples from a blood bank or we're actively going out and collecting them from patients.”
Getting the data pipeline accurate and certified
Nathan and his team have to make sure that the data is collected properly, that the right models are trained, and that their estimated performance is accurate:
“The experiment design and the way data is generated has a much bigger impact on a model’s final usability than any algorithm you use.”
He says getting the right regulatory approval is challenging too:
“We have to build the airplane as we fly. Our pipeline is constantly changing and moving as we try to submit it to the FDA. As the model changes, the assay will also change, and that in turn affects the model.”
Once the data has proven itself in a research setting, it gets handed over to an engineering team to prepare for the FDA. Nathan emphasizes the difference between research and production settings:
“There's no way we would bring all the research code to the FDA and say, ‘Please approve our product because we had this crazy idea that didn't even work.’ We create a completely separate system for the final version. We actually make sure that there's no dependencies on the research system.”
“We work very closely with them to extract the relevant bits and keep it lean. It's like a research presentation. There's a lot of messiness that happens in the background, but we take the part that worked well – the most important bits – and just present those.”
Structuring teams and working with non-technical people
Nathan works in a diverse team of computational biologists, machine learning scientists, statisticians, and research engineers. People with different expertise work together fluidly:
“We may have statisticians who are not the best at Python, but when they partner with someone who's much better at Python, they're able to work cohesively to make faster progress. We come from academia, from industry, from tech, and from biology, so it's a really nice mix of different people.”
Although his team functions well and is dynamic, it can be challenging to work and communicate effectively with so many different specialists. What Nathan brings to the table is his experience with cross-functional interaction. He says:
“I'm definitely not an expert in any one of these fields, but it’s important to understand where people come from and what their work involves.”
“I think you have to develop an aspect of this in any technical field: whether it's communicating with biologists, chemists, or business executives.”
Detecting overfitting and bias
Machine learning is powerful, but it’s easy to fall into the trap of overestimating your own models. Nathan says:
“It's really easy to overfit an example or a model. How do we identify potential confounders or anything that might affect our data without being explicitly involved in data collection?”
There’s no easy solution to this:
“We spend a lot of time interrogating our own results. And we spend lots of time thinking about how we might have overfit.”
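One common way to interrogate a result for overfitting is to compare a model's score on its own training data with its score on data it has never seen. The sketch below illustrates that general check and is not the team's actual procedure. It assumes purely random synthetic features and labels, so there is nothing real to learn, and it uses a 1-nearest-neighbor classifier because that model memorizes its training set. The helper names are hypothetical.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the closest training point (1-nearest-neighbor)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of eval points whose predicted label matches the true label."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_y)

def k_fold_accuracy(xs, ys, k=5):
    """Average accuracy over k folds, each scored on points held out of training."""
    fold_scores = []
    for fold in range(k):
        held_out = [i for i in range(len(xs)) if i % k == fold]
        kept = [i for i in range(len(xs)) if i % k != fold]
        fold_scores.append(accuracy(
            [xs[i] for i in kept], [ys[i] for i in kept],
            [xs[i] for i in held_out], [ys[i] for i in held_out],
        ))
    return sum(fold_scores) / k

# Synthetic data with NO real signal: labels are independent of the features.
rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
ys = [rng.randint(0, 1) for _ in range(50)]

train_score = accuracy(xs, ys, xs, ys)     # scored on data the model memorized
held_out_score = k_fold_accuracy(xs, ys)   # scored on data the model never saw
gap = train_score - held_out_score

print(f"train: {train_score:.2f}  held-out: {held_out_score:.2f}  gap: {gap:.2f}")
```

A large gap between the two scores is a warning sign that the model has memorized its training data. This check does not catch confounders on its own: if samples from one collection site or batch land in both the training and held-out folds, the held-out score can still be optimistic, which is why splitting by site or batch is often considered as well.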
Working with human data is expensive
Even if you get everything right, dealing with biological data is expensive. As Nathan puts it:
“The process of collecting human samples, or live samples, is incredibly expensive and incredibly new.”
As such, Nathan and his team endeavour to raise money for their research. He explains:
“We raise a lot of money, and a lot of that money's actually going into collecting these samples.”
Clearly, Nathan and the Freenome team face several challenges in pursuing their vision, but Nathan welcomes the challenges and complexity of the work. Beyond the technical breakthroughs, his overarching goal remains the potential impact that machine learning could have on the way people think about health in general:
“If we're successful, we’ll really change the way people think about healthcare and about treating and diagnosing cancer. There is potential for a huge impact on the way we as a society think about healthcare.”
Nathan studied computer science and electrical engineering, along with computational biology. He went on to do a six-year stint at Google, joining a machine learning team.
Still, Nathan sought bigger challenges and more impactful work, so he moved on to his current position at Freenome. He says:
“I was looking for a company where the problem was a lot more challenging. I’m very excited by the potential for incredible impact as well.”
Are you working on medical diagnostics?
We’re a machine learning agency that specialises in metabolomic analysis. If you’re working on similar problems, we’d love to learn more. Contact us at any time to discuss.