The Power of Universal Latent Space In Medical Breakthroughs

The cost to develop new medicines has grown tremendously despite our computing and medical advances. Although we need massive breakthroughs in medical science more than ever, the industry itself remains a slow, highly regimented field. Mason Victors, CTO and CPO of Recursion Pharmaceuticals, believes that a universal latent space is vital to the drug discovery process and outlines ways it could revolutionize how we make breakthroughs in healing.
The Problem With Drug Development

Advancements in drug development haven't kept pace with technology. BioPharma still moves slowly, often focusing on a single drug for years in a sequential approach: a single disease, a single hypothesis, a single biochemical goal. It's slow. Even worse, it's expensive.
The product stages of the drug development pipeline are ripe for disruption. Technology is now ready to model that biology and can finally explore many diseases and hypotheses in parallel, or remove the hypothesis altogether, paving the way for true, continuous innovation. If other industries can accomplish this, Victors believes the BioPharma space can as well.

Imaging data is cheaper, far richer, and the key to scaling drug development. Because Recursion's approach is so detailed and highly targeted, a single image can hold so much relevant data that it drastically reduces the cost of training models. It's rich, and it's fast.
Representational Learning Versus Classification

Hand-selecting features and training with human labels is prohibitively difficult; the classification process could take years to get completely right. Deep learning, by contrast, can learn those abstractions computationally and describe the classes without that human labor. Learning spaces that aren't precisely human-defined opens the field to authentic deep learning.
It's different from traditional machine learning because the representation is learned during training, instead of teaching models and tasks independently. A simple example: predicting both how long a home will stay listed and its selling price. Traditional ML would learn those models separately, while transfer learning would share the overlapping information to learn both together.
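To make that concrete, here is a minimal sketch of a shared representation serving two prediction heads at once. It's written in PyTorch with made-up feature counts and layer sizes (nothing here comes from Victors' talk); the point is only that one trunk is trained against both targets.

```python
import torch
import torch.nn as nn

class SharedListingModel(nn.Module):
    """One shared trunk learns features useful to both tasks;
    two small heads predict listing length and selling price."""
    def __init__(self, n_features: int = 20):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.listing_days_head = nn.Linear(32, 1)
        self.sale_price_head = nn.Linear(32, 1)

    def forward(self, x):
        shared = self.trunk(x)
        return self.listing_days_head(shared), self.sale_price_head(shared)

model = SharedListingModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch: 8 listings with 20 features each, plus both targets.
x = torch.randn(8, 20)
days_target, price_target = torch.randn(8, 1), torch.randn(8, 1)

optimizer.zero_grad()
days_pred, price_pred = model(x)
loss = loss_fn(days_pred, days_target) + loss_fn(price_pred, price_target)
loss.backward()
optimizer.step()
```

Because the trunk is shared, whatever the network learns for one target is available to the other for free, which is the overlap the paragraph above describes.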
Triplet Networks for Representation Learning

Framing this as a simple classification problem opens your neural network up to cheating, and systems are remarkably good at it. In one famous example, a network appeared to classify tanks versus other armored vehicles, but only because the tank pictures were always taken during the day. Certainly clever, but not good. Representation learning instead works on the hidden layers themselves, without training your system to "cheat."
Triplet networks also handle the problem of overfitting. The learned layers don't have to change when new classes appear, and the datasets aren't tied to a specific class. The models don't have to change, and you reduce the waste associated with each new problem.
You do have to select triplets very intelligently, or the system runs into something Victors calls stalled learning. This approach isn't meant for naive sampling; instead, triplet selection should be built from a large batch of data (all things similar, all things contrasting), with the similar pairs falling on the same diagonal.
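As an illustration of the triplet idea (a sketch, not Recursion's actual pipeline), the PyTorch snippet below embeds anchors, positives, and negatives and applies a margin loss. The embedding architecture and input sizes are placeholders, and the comment marks where intelligent, batch-based triplet mining would replace the naive random draws shown here.

```python
import torch
import torch.nn as nn

# Embedding network: maps an input (e.g., a flattened image crop) to a
# point in the latent space. Sizes here are placeholders.
embed = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 64),
)

# Triplet margin loss: pull the anchor toward the positive (same class or
# perturbation) and push it away from the negative by at least `margin`.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy triplets; in practice they would be mined from a large batch
# rather than sampled naively, per the "stalled learning" caveat above.
anchor   = embed(torch.randn(32, 512))
positive = embed(torch.randn(32, 512))
negative = embed(torch.randn(32, 512))

loss = triplet_loss(anchor, positive, negative)
loss.backward()
```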
Applying These Principles

His methods provide a better implementation than simple naive selection, adding stability to the latent space; the system stays highly consistent even while learning two different systems. The approach at scale is complicated, and you need a lot of computing power. A framework like Horovod gets close to the theoretical training throughput, and Victors suggests looking into it if you're planning to build this type of latent space for other projects that require distributed deep learning.
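If you want to experiment with the distributed side, the basic Horovod usage pattern looks roughly like this. The model and data are toy stand-ins; the Horovod calls shown (init, distributed optimizer, parameter broadcast) are the standard ones from its PyTorch API.

```python
import torch
import torch.nn as nn
import horovod.torch as hvd  # pip install horovod

hvd.init()
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

model = nn.Linear(64, 8)  # stand-in for the real embedding network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers,
# and make sure every worker starts from identical weights.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for _ in range(10):
    x, y = torch.randn(16, 64), torch.randn(16, 8)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

# Launch across workers with, e.g.: horovodrun -np 4 python train.py
```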
Other Modalities

Common representations across tasks and modalities are also possible. Images and chemical structures are pushed into the same latent space, opening up the possibility of discovering what compound could mimic the effects of a certain disease.
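One common way to push two modalities into a shared latent space (a sketch of the general idea, not necessarily Recursion's method) is to train one encoder per modality and pull matched image/compound pairs together with a contrastive objective. The encoders, dimensions, and temperature below are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 128  # both modalities land in the same space

# Placeholder encoders: one for cell images, one for compound structures
# (e.g., a fingerprint vector). Real architectures would be far deeper.
image_encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(),
                              nn.Linear(512, LATENT_DIM))
structure_encoder = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                                  nn.Linear(512, LATENT_DIM))

# Toy batch of paired observations: images of cells treated with a
# compound, and that compound's structure representation.
images = torch.randn(16, 2048)
structures = torch.randn(16, 1024)

z_img = F.normalize(image_encoder(images), dim=1)
z_str = F.normalize(structure_encoder(structures), dim=1)

# Pull matched image/structure pairs together and push mismatches apart
# (a simple contrastive objective over cosine similarities).
logits = z_img @ z_str.t() / 0.07
targets = torch.arange(len(logits))
loss = F.cross_entropy(logits, targets)
loss.backward()
```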
Experiments take about a week to run in Victors' lab, and the lab uses domain adaptation to overcome batch effects (i.e., effects you can't control, like lab humidity). This method can be used to ignore noise and amplify the appropriate signal.
The lab also uses higher-level techniques, such as incorporating context to amplify the relevant signal and apply this learning to specific models. Domain adaptation (forgetting elements) can be used where you can't control a factor, while conditioning (incorporating context) can be used where you can.
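A widely used way to implement the "forgetting" half of that, for factors like experimental batch that you can't control, is a gradient reversal layer. The sketch below is illustrative rather than Recursion's exact approach; the model sizes and labels are invented.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the
    backward pass, so the encoder learns to *forget* the domain."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.Linear(512, 64)    # shared representation
task_head = nn.Linear(64, 10)   # signal we care about
batch_head = nn.Linear(64, 6)   # nuisance: which experimental batch

x = torch.randn(32, 512)
task_labels = torch.randint(0, 10, (32,))
batch_labels = torch.randint(0, 6, (32,))  # e.g., plate, day, humidity proxy

z = encoder(x)
task_loss = nn.functional.cross_entropy(task_head(z), task_labels)
# The batch classifier tries to recover the batch, but the reversed
# gradient pushes the encoder toward batch-invariant features.
batch_loss = nn.functional.cross_entropy(
    batch_head(GradReverse.apply(z)), batch_labels)
(task_loss + batch_loss).backward()
```

Conditioning works in the opposite direction: instead of forgetting a factor, the known context is fed into the model (for example, appended to the representation) so the network can use it.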
A New Wave of Medical Breakthroughs

Using this multitask style of neural network learning allows machines to reduce waste and shorten the iteration time for medical breakthroughs. Using existing technology could revolutionize the way we research new drugs and how we unlock the secrets of the diseases and conditions driving that research. The medical field needs this upheaval, and thankfully, we're building the capability to do so.
