So Julia shared this a while ago, it’s been stuck in my head, and I owe her a response.
I’m talking, of course, about Unlearn Intelligent Control Arms. The basic idea is to save time and cost in recruiting control groups for clinical trials. If you’re testing a new medication or therapy, you have to recruit a bunch of people, half of whom receive a placebo, so you’re essentially paying a lot of money for people to sit around and do nothing. Unlearn promises to create “Digital Twins” that will exactly reflect the results a control group would have had, meaning you can have a much smaller control group and therefore save a bunch of money without losing scientific accuracy.
So I don’t think this will really work, but I’m very confused about why it won’t, and I went through three stages in thinking about it: it’s a scam, it makes a ton of sense, it’s confusing.
It’s a scam
There are no details, just a bunch of references to machine learning and rigorous statistics. A blob of marketing buzzwords. Total scam. What would machine learning even do here? There’s no iteration, no repetition, so what would the algorithm even test on in a meaningful sense? What is this even doing?
It makes a ton of sense
The only thing it could really be doing is creating dummies based on real patients. And they do claim to have clinical data. But how’s that any better than just taking the average? Wait…why don’t we just take the average?
Hear me out. A lot of medications and therapies target well-known issues like diabetes or heart attacks, and we have tons of data on these things. For example, let’s say we’re testing a blood pressure medication and we want it to decrease the incidence of heart attacks by 15% for people 50-64. Well, we have a pretty darn good idea of the incidence of heart attacks for this subpopulation. There must be hundreds of thousands of medical records of people having heart attacks at 60. Why do we need a control group? Consider Kaiser Permanente. They have the medical records for like 25% of California. Any medical condition with a big enough market to justify research will have thousands of records in the Kaiser Permanente database, so they can just tell you what the odds are of, say, obese men between the ages of 57 and 62 developing diabetes within 2 years. Why do we need a control group? If the control group is successful and representative of the general population, shouldn’t it just reflect the healthcare data we already have?
To clarify, let’s go back to our theoretical heart attack medication. We want to decrease the odds of people 50-64 getting heart attacks. Instead of having a control group, we go find the average risk of heart attack for people in this age range, which is 37/10,000 (literally the first result on Google). We make some basic adjustments for the exact age/gender/weight/income/etc. in case our test group isn’t perfectly representative. We then test the medicine and get an incidence rate of 25/10,000 people. What’s wrong with this?
Especially if you steelman this, which I think is what the Unlearn guys are doing, and just replace part of the control group with “digital twins”, or basically just the average. Like, instead of replacing the whole control group, just replace half of it with digital twins and keep the other half as real people. If the control group is half the trial, that still saves about 25% of the clinical trial cost (or at least a decent chunk) and you should still get an accurate result.
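Here is a quick sketch of where that 25% comes from. It assumes cost scales directly with the number of people recruited, which is my simplification and surely not true of a real trial budget. The function name and parameters are made up for illustration.

```python
def enrollment_saved(control_share: float, replaced_fraction: float) -> float:
    """Fraction of total enrollment that no longer needs to be recruited.

    control_share: fraction of all participants who are in the control group.
    replaced_fraction: fraction of the control group swapped for digital twins.
    Assumption: every participant costs the same to recruit and follow.
    """
    if not (0.0 <= control_share <= 1.0 and 0.0 <= replaced_fraction <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    return control_share * replaced_fraction


# Half the trial is control, and half of that control group is replaced.
saved = enrollment_saved(control_share=0.5, replaced_fraction=0.5)
print(f"{saved:.0%} fewer participants to recruit")
```

Replacing the entire control group under the same assumption would be `enrollment_saved(0.5, 1.0)`, or half the enrollment.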
Basically, a control group makes sense if we have no pre-existing information but if we do, if we know incidence rates for disease and health risks with a fair degree of detail, then why do we need a control group?
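To make the heart attack example concrete, here is a rough sketch of the comparison I have in mind, in plain Python. It assumes the test group is exactly 10,000 people with 25 heart attacks (the post only gives a rate, so the group size is my assumption), it treats the 37/10,000 figure as if it were known exactly, and it skips the age/gender/weight adjustments entirely. It is just the naive version of the idea, not anything Unlearn describes.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


baseline_rate = 37 / 10_000  # historical rate from the example
trial_size = 10_000          # assumption: size of the treated group
trial_events = 25            # heart attacks seen in the treated group

observed_rate = trial_events / trial_size
relative_reduction = (baseline_rate - observed_rate) / baseline_rate

# Chance of seeing 25 or fewer events if the drug did nothing and the
# historical rate applied exactly to this group.
p_value = binom_cdf(trial_events, trial_size, baseline_rate)

print(f"relative reduction: {relative_reduction:.1%}")
print(f"one-sided p-value vs. historical rate: {p_value:.4f}")
```

The relative reduction is just (37 - 25) / 37. The p-value is only as trustworthy as the assumption that the historical rate really describes the people who enrolled.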
So, first, I’m just waiting for a stats teacher to yell at me. I don’t know why, my instincts are just screaming I’ve missed something obvious.
Second, why hasn’t this been done before? I mean, we can do “digital twins” with nothing more than averages or random samples. We literally did that in the example, no machine learning required. But there are lots of smart people and lots of money in clinical trials; why hasn’t someone done this by now?
I’m not posing this as answers, just as confusion. I don’t think this will work but I don’t know why.