Recent comments in /f/IAmA

PatentSavvy t1_j7mo0tm wrote

Are you guys engaged in protecting your methods of drug discovery via patent applications? Or do you guys plan on protecting any potential candidates once their existence becomes known through the methods? Or both?

As a patent attorney, I find your model interesting and I hope you protect your discoveries and inventions. I have been involved in patents relating to pharmaceutical design and drug development and have seen the various processes first hand. It is definitely an iterative and arduous process, but it can be totally worth it in the end if you have that one successful candidate that proves therapeutically effective and obtains FDA approval.

2

ShakeNBakeGibson OP t1_j7mncyd wrote

Neither. Time is the most limited resource. So much unmet need and so much science to explore. Having a searchable database of 3 trillion gene and compound relationships results in a superabundance of potential insights. We want to focus our efforts on the insights where we have the highest confidence in the compound<>gene relationship and where addressing that biology has a high likelihood of meeting patient needs. To do this, we integrate additional automated layers of information, such as transcriptomics and SAR tractability, to accelerate discovery and reveal which insights have the highest potential to advance our vision of a diverse pipeline of high-impact programs. We have to spend a lot of time onboarding folks to think this way, and that's why time is our most limited resource.
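
To make that triage concrete, here is a minimal sketch of what scoring insights from a relationship map might look like. The `Insight` fields, weights, and scoring function are all hypothetical illustrations, not our actual system:

```python
from dataclasses import dataclass

@dataclass
class Insight:
    compound: str
    gene: str
    map_confidence: float    # confidence in the compound<>gene relationship, 0-1
    transcriptomics: float   # support from an orthogonal readout, 0-1
    sar_tractability: float  # how amenable the chemistry is to optimization, 0-1
    unmet_need: float        # estimated patient impact, 0-1

def priority(ins: Insight) -> float:
    # Hypothetical weighting: require a confident relationship first,
    # then layer in orthogonal evidence and downstream practicality.
    evidence = 0.5 * ins.transcriptomics + 0.5 * ins.sar_tractability
    return ins.map_confidence * evidence * ins.unmet_need

insights = [
    Insight("CMPD-001", "GENE-A", 0.92, 0.80, 0.70, 0.9),
    Insight("CMPD-002", "GENE-B", 0.40, 0.95, 0.90, 0.8),
]
# Spend wet-lab time on the highest-priority hypotheses first.
for ins in sorted(insights, key=priority, reverse=True):
    print(f"{ins.compound} <> {ins.gene}: {priority(ins):.3f}")
```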

8

IHaque_Recursion t1_j7mn89n wrote

So, data sharing in industrial science is complicated. I’ve spent my career in biotech driving for greater openness and data release in the companies where I’ve been. The “natural” state of data is to be siloed. This isn’t just an industrial thing – I’ve read plenty of papers from academic groups with “data available on request” (lol nope, I tried) – and the driver is always the same: a fear that “we spent this money to make the data, how do we get value out of it?”

One of the reasons I joined Recursion in 2019 was that Chris and the team shared that commitment to sharing learnings back to the world. The balance we've struck, supporting open science while also using this data to drive internal research and develop therapeutics as a public company, is to share a huge dataset that is partially blinded. In RxRx3 we are revealing ~700 genes and 1,600 compounds. We've sometimes chosen different points on that balance; for example, our COVID datasets RxRx19a and RxRx19b were released completely openly (CC-BY) because we thought the public health crisis was more important than any commercial interest we might have in the data. Our current aim is to continue to unblind parts of the RxRx3 dataset, so please stay tuned for additional releases over time.
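
To give a feel for what "partially blinded" means in practice, here is a toy sketch: identities are revealed for the disclosed subset, and everything else gets a stable pseudo-ID so the data stays usable for analysis. The column names and masking scheme are illustrative assumptions, not the actual RxRx3 release pipeline:

```python
import pandas as pd

def partially_blind(df, revealed_genes):
    """Reveal identities for a disclosed subset; pseudonymize the rest.

    Hidden rows keep a stable pseudo-ID, so clustering and
    nearest-neighbor analyses still work on the full dataset.
    """
    out = df.copy()
    hidden = ~out["gene"].isin(revealed_genes)
    codes = {g: f"gene-{i:06d}"
             for i, g in enumerate(sorted(out.loc[hidden, "gene"].unique()))}
    out.loc[hidden, "gene"] = out.loc[hidden, "gene"].map(codes)
    return out

df = pd.DataFrame({
    "gene": ["TP53", "BRCA1", "KRAS"],
    "embedding_0": [0.12, -0.45, 0.88],
})
print(partially_blind(df, revealed_genes={"TP53"}))
```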

We have also contributed to open science by releasing not just datasets, but tools. Alongside our COVID datasets, we released a data explorer that lets folks explore the results from our COVID screens. Along with RxRx3, we released a tool (MolRec) where people outside of Recursion can explore some of the same insights that our scientists use to generate novel therapeutic hypotheses and advance new discovery programs, and get a look at how Recursion is turning drug discovery from a trial-and-error process into a search problem.
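
The "search problem" framing can be made concrete with a toy example: embed genes and compounds in a shared phenotypic space, then rank genes by similarity to a compound's embedding. This generic nearest-neighbor sketch only illustrates the idea; it is not MolRec's actual implementation:

```python
import numpy as np

def nearest_genes(compound_vec, gene_matrix, gene_names, k=5):
    """Rank genes by cosine similarity to a compound's phenotypic embedding."""
    g = gene_matrix / np.linalg.norm(gene_matrix, axis=1, keepdims=True)
    c = compound_vec / np.linalg.norm(compound_vec)
    sims = g @ c
    top = np.argsort(sims)[::-1][:k]
    return [(gene_names[i], float(sims[i])) for i in top]

rng = np.random.default_rng(1)
gene_names = [f"GENE-{i}" for i in range(100)]
genes = rng.normal(size=(100, 32))            # toy gene embeddings
query = genes[7] + 0.1 * rng.normal(size=32)  # compound resembling GENE-7's phenotype
print(nearest_genes(query, genes, gene_names))
```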

16

ShakeNBakeGibson OP t1_j7mmu8i wrote

Always great to hear from a fan… we’re blushing.

But your question is a good one - mRNA works really well in some important parts of biology - like tricking your body into thinking it has seen components of a virus so it mounts an immune response. But mRNA is probably not the right tool for other areas of biology (like inhibiting an overactive protein).

We think Moderna's work is awesome.

3

IHaque_Recursion t1_j7mm269 wrote

I’ve been super excited to see how our datasets have driven academic research out in the world. Recursion has been on the cutting edge of developing phenomics as a high-throughput biological modality, and the RxRx datasets are among the largest and best-organized public datasets out there for folks to work with. I’ve seen blog posts, conference posters, MS theses, and more written on our datasets. (We’ve also hired a number of folks to our team based on their work on these data!)

1

ShakeNBakeGibson OP t1_j7mka71 wrote

We actually think about this a lot and we believe that these processes need to learn from each other. We build feedback and feed-forward loops between dry lab and experimental work - essentially, we think iteration is most important. We run up to 2.2 million experiments in our wet lab each week to feed machine learning predictions, and those predictions feed back into the wet lab experiment design. We do all of this in service of decoding biology and delivering therapeutics to patients.
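
As a schematic of such a loop (with toy stand-ins for the model and the lab; the names and selection strategy here are invented for illustration, not our actual platform), the iteration looks a lot like classic active learning:

```python
import random

class ToyModel:
    """Stand-in predictor; scores candidates from a learned bias plus noise."""
    def __init__(self):
        self.bias = 0.0

    def predict(self, candidate):
        return self.bias + random.random()

    def fit(self, labeled):
        # "Retrain" on everything measured so far.
        self.bias = sum(y for _, y in labeled) / len(labeled)

class ToyWetLab:
    """Stand-in for the physical experiments (the slow, costly step)."""
    def run_experiments(self, batch):
        return [(c, random.gauss(0.5, 0.1)) for c in batch]

def run_discovery_loop(model, pool, lab, n_rounds=3, batch_size=5):
    labeled = []
    for _ in range(n_rounds):
        # Dry lab: score every untested candidate with the current model.
        batch = sorted(pool, key=model.predict, reverse=True)[:batch_size]
        # Wet lab: actually measure the chosen batch.
        labeled.extend(lab.run_experiments(batch))
        pool -= set(batch)
        # Feed results back so the next round's predictions improve.
        model.fit(labeled)
    return labeled

results = run_discovery_loop(ToyModel(), set(range(100)), ToyWetLab())
print(f"measured {len(results)} candidates across 3 rounds")
```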


EDIT: Removed a typo.

3

BioRevolution t1_j7mjzjz wrote

Last question from my side: what are your plans around closed-loop optimization?

You are experts in AI/ML and heavy users of lab automation. Do you have any ambitions to implement workflows for autonomous experiments (also called self-driving labs in some publications)?

Thanks a lot for taking the time to do this and answer all the questions, I appreciate it.

1

IHaque_Recursion t1_j7mjumw wrote

Batch effects are probably the most annoying part about doing machine learning in biology – if you’re not careful, ML methods will preferentially learn batch signal rather than the “real” biological signal you want.

We actually put out a dataset, RxRx1, back in 2019, to address this question. You can check it out here. Here is some of what we learned (ourselves, and via the crowdsourced answers we got on Kaggle).

Handling batch effects takes a combination of physical and computational processes. To answer at a high level:

  1. We’ve carefully engineered and automated our lab to minimize experimental variability (you’d be surprised how clearly the pipetting patterns of different scientists can come out in the data – which is why we automate).
  2. We’ve scaled our lab, so that we can afford to ($ and time!) collect multiple replicates of each data point. This can be at multiple levels of replication – exactly the same system, different batches of cells, different CRISPR guides targeting the same gene, etc. – which enables us to characterize different sources of variation. Our phenomics platform can do up to 2.2 million experiments per week!
  3. We’ve both applied known computational methods and built custom ML methods to control / exclude batch variability. Papers currently under review!
4

ShakeNBakeGibson OP t1_j7mj1gd wrote

I’m really hard to work for…

In all seriousness, almost all of the executives at Recursion today have been with the company for four or more years, and we are proud of that track record. That said, we have a really ambitious mission at the intersection of many diverse fields, and we fully support our current leadership while we make sure we get the right people into these roles.

1