I really love this article! Things like this are *super* important to remember, especially now when big labs are intentionally nerfing AI biomed capabilities (one of the few positive applications of AI that all of society agrees on) for "safety".
Thanks so much, Kenneth! I'm curious - which biomed capabilities do you mean? Claude downgraded me to Sonnet 4.6 almost 100% of the time when doing research for this series; it seems like any curiosity about bioweapons and virology is seen as suspect by the current model. Is this what you meant?
Yes, this is one example! But the more pernicious one is that they can literally untrain the model to forget how to do bio. An example approach is here: https://alignment.anthropic.com/2025/selective-gradient-masking/
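(For intuition, here's a toy sketch of the general gradient-masking idea - not the method in the linked post; the model, mask, and data below are made up purely for illustration.)

```python
import torch

# Toy illustration only: selectively mask gradient updates so that a chosen
# subset of parameters is never changed by training on certain data.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical binary masks: 1 = parameter may be updated, 0 = kept frozen.
masks = {name: (torch.rand_like(p) > 0.5).float()
         for name, p in model.named_parameters()}

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()

with torch.no_grad():
    for name, p in model.named_parameters():
        p.grad *= masks[name]  # zero out gradients for masked-off parameters

optimizer.step()
```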
Nice post! Any thoughts about the claims from this paper that challenges the importance of tacit knowledge in biological weapons development? https://www.rand.org/pubs/perspectives/PEA3853-1.html
Thanks for flagging, Ze Shen! The paper’s main evidence is a man who created a *chemical* weapon (much, much easier than a bioweapon) and his “efforts to document the steps” of constructing viral pathogens, which is definitely not a way to measure tacit knowledge. The study would have had to verify in the real world whether these documentation efforts actually increased success in order to assess tacit knowledge, similar to how ActiveSite did. :)
Can absolutely confirm on the pipetting thing. It's a bit like driving: once you can drive, it seems very natural. Very few people are able to drive confidently, accurately, and consistently after their first lesson/hour of experience.
Thanks Jacob! Good to know! I've not pipetted myself but now I really wish I had minored in biology - it's fascinating.
One of my more traumatic memories of undergrad was my dissertation supervisor getting unnecessarily angry (to a degree I could easily have reported him for) very, very quickly because I wasn't able to pipette like him within the first few tries. (FWIW my proprioception, hand-eye coordination, and hand steadiness are all decent; I've picked up everything from spillikins to target shooting to card flourishes reasonably well.) As with driving, I think people sometimes forget how hard it once was for them to do.
So it's not all good 😉
o.o That's rough - who picks up the skill quickly feels almost random! From the interviews, it seemed that some people were very good and others weren't, and sometimes that tracked years of experience, but sometimes it didn't!
The irony that Claude wrote most of the examples of 'tacit' knowledge :P (of course, being able to name a challenge in the abstract is far from being able to solve it).
This is a welcome and well-written article (my jibe notwithstanding). The importance of tacit and procedural knowledge - both in pushing the frontier of the known/doable and in proliferating access to it - is routinely underappreciated by AI doomers and optimists alike. But the in-principle (and plausibly soon - months to years) access to and accumulation of tacit knowledge in AI is important to recognise, as you acknowledged towards the end.
Thanks Oliver - I did use Claude, though not for choosing the categories of examples, e.g. the choice to focus on pipetting. It did help me get more info about why pipetting is hard - which I then checked with the bio folks. :)
One thing I'm super curious about is whether there's a way to track how much tacit knowledge is left to figure out. It seems hard to measure, but not impossible.
My sarcasm was intended in comradely fashion - I thought the pipetting challenges maybe had an AI voice and something like an AI vibe (no body: no pipetting experience!). I trust you to have done sufficient diligence for readers to be able to take these as reasonable high-level descriptions and gestures at tacit knowledge bottlenecks.
I'm also curious about that! Some indirect approaches might look at degree (and depth) of automation, maybe something like '(automated) capital intensity', use of AI by lab practitioners, 'productivity' (perhaps as measured by other outputs than crude GDP). Even more indirect, but perhaps telling, might be indicators based on the distribution of years of experience of lab practitioners, and maybe overall number of lab employees in various capacities (also likely tracking number of new labs over time).
More direct... you could do something like surveys (on whom? likely very noisy and with some systematic bias, but could reveal trends). Another approach to estimation might be a really fine-grained, ecologically valid breakdown of subtasks, though of course this would miss the possibility for automation to 'route around' particular 'hard steps' right now (i.e. playing to the comparative advantages of AI might unlock, or simply make prudent, alternative task trajectories that no human lab would pursue).
Do you have thoughts on this question? What about the more general case besides wetlab work?
Thanks, Oliver, these are great points. One thing we've been investigating, at least in self-driving labs, is the fine-grained breakdown of subtasks. This seems to be what any physical automation has to focus on. They're automating very narrow workflows with a lot of effort, e.g. custom-built robotics. Maybe this will end up like how automated vehicles slowly became more automated and gained more abilities: they started out with just lane control, then moved on to specific routes. I think the hardest thing here is that biology is weird and wacky. Every organism seems very unpredictable, which makes guessing or forecasting the transition hard. :)
Heh, yep, that makes a lot of sense for self driving. If you bracket out the 'routing around' (or more generally, changing priority distribution over multi-step trajectory paths) problem, this kind of breakdown is a great way to go. (There's also something like the additional task of composing subtasks.) If you start to see low but nonzero subtask success rates, this can also be a leading indicator of overall composed task success rates, naively multiplying success rates at subtasks for an estimate of the composed success rate.
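As a minimal sketch of that naive estimate (the subtask names and rates below are purely illustrative, and independence between subtasks is assumed, which real workflows won't satisfy):

```python
# Naive composed-success estimate: assume subtasks succeed independently
# and multiply their individual success rates. Rates here are illustrative.
subtask_success_rates = {
    "liquid_handling": 0.6,
    "culture_prep": 0.3,
    "sequencing_readout": 0.8,
}

composed_estimate = 1.0
for rate in subtask_success_rates.values():
    composed_estimate *= rate

print(f"Naive composed success estimate: {composed_estimate:.3f}")  # 0.144
```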
At AISI we did something like this for RepliBench (https://arxiv.org/abs/2504.18565), crudely. The 'science of evals' team at AISI has also done some investigation into that type of approach in general. I don't know where they're at on it - could put you in touch if you like.
Whoa, this is fascinating. I did not realize AISI had a science of evals team! I'll PM you!
Really resonates. AI can give you the steps, but not the instinct that comes from actually doing the work.
Thank you! I really appreciate it
Interesting!
I had some similar thoughts (albeit much less detailed) in comments to this 2023 article: https://forum.effectivealtruism.org/posts/ZuzK2s4JsJcexBJxy/will-releasing-the-weights-of-large-language-models-grant?commentId=wm7JrifbiDXDBWdgf
Fascinating discussion - I did not realize chlorine gas was so easy to make!