I really love this article! Things like this are *super* important to remember, especially now when big labs are intentionally nerfing AI biomed capabilities (one of the few positive applications of AI that all of society agrees on) for "safety".
Thanks so much, Kenneth! I'm curious - which biomed capabilities do you mean? Claude downgraded me to Sonnet 4.6 almost 100% of the time when doing research for this series; it seems like any curiosity about bioweapons and virology is seen as suspect by the current model. Is this what you meant?
Yes, this is one example! But the more pernicious one is that they can literally untrain the model to forget how to do bio. An example approach is here: https://alignment.anthropic.com/2025/selective-gradient-masking/
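(For intuition, here's a toy sketch of the general gradient-masking idea - not the method in the linked post; the model, mask, and data below are made up purely for illustration.)

```python
import torch

# Toy illustration only: selectively mask gradient updates so that a chosen
# subset of parameters is never changed by training on certain data.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical binary masks: 1 = parameter may be updated, 0 = kept frozen.
masks = {name: (torch.rand_like(p) > 0.5).float()
         for name, p in model.named_parameters()}

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()

with torch.no_grad():
    for name, p in model.named_parameters():
        p.grad *= masks[name]  # zero out gradients for masked-off parameters

optimizer.step()
```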
Nice post! Any thoughts about the claims from this paper that challenges the importance of tacit knowledge in biological weapons development? https://www.rand.org/pubs/perspectives/PEA3853-1.html
Thanks for flagging, Ze Shen! The paper’s main evidence is a man who created a *chemical* weapon (much, much easier than a bioweapon) and his “efforts to document the steps” of constructing viral pathogens, which is definitely not a way to measure tacit knowledge. The study would have had to verify in the real world whether these documentation efforts actually increased success in order to assess tacit knowledge, similar to how ActiveSite did. :)
Can absolutely confirm on the pipetting thing. It's a bit like driving: once you can drive, it seems very natural. Very few people are able to drive confidently, accurately, and consistently after their first lesson/hour of experience.
Thanks Jacob! Good to know! I've not pipetted myself but now I really wish I had minored in biology - it's fascinating.
One of my more traumatic memories of undergrad was my dissertation supervisor getting unnecessarily angry (to a degree I could easily have reported him for) very, very quickly because I wasn't able to pipette like him within the first few tries. (FWIW my proprioception, hand-eye coordination, and hand steadiness are all decent; I've picked up everything from spillikins to target shooting to card flourishes reasonably well.) As with driving, I think people sometimes forget how hard it once was for them to do.
So it's not all good 😉
o.o That's rough - who picks up the skill quickly feels almost random! From the interviews, it seemed that some people were very good and others weren't, and sometimes that tracked years of experience, but sometimes it didn't!
The irony that Claude wrote most of the examples of 'tacit' knowledge :P (of course, being able to name a challenge in the abstract is far from being able to solve it).
This is a welcome and well-written article (my jibe notwithstanding). The importance of tacit and procedural knowledge - both in pushing the frontier of the known/doable and in proliferating access to it - is routinely underappreciated by AI doomers and optimists alike. But the in-principle (and plausibly soon - months to years) access to and accumulation of tacit knowledge in AI is important to recognise, as you acknowledged towards the end.
Thanks Oliver - I did use Claude, though not for choosing the categories of examples, e.g. the choice to focus on pipetting. It did help me get more info about why pipetting is hard - which I then checked with the bio folks. :)
One thing I'm super curious about is whether there's a way to track how much tacit knowledge is left to figure out. It seems hard to measure, but not impossible.
My sarcasm was intended in comradely fashion - I thought the pipetting challenges maybe had an AI voice and something like an AI vibe (no body: no pipetting experience!). I trust you to have done sufficient diligence for readers to be able to take these as reasonable high-level descriptions and gestures at tacit knowledge bottlenecks.
I'm also curious about that! Some indirect approaches might look at degree (and depth) of automation, maybe something like '(automated) capital intensity', use of AI by lab practitioners, 'productivity' (perhaps as measured by other outputs than crude GDP). Even more indirect, but perhaps telling, might be indicators based on the distribution of years of experience of lab practitioners, and maybe overall number of lab employees in various capacities (also likely tracking number of new labs over time).
More direct... you could do something like surveys (on whom? likely very noisy and with some systematic bias, but could reveal trends). Another approach to estimation might be a really fine-grained, ecologically valid breakdown of subtasks, though of course this would miss the possibility for automation to 'route around' particular 'hard steps' right now (i.e. playing to the comparative advantages of AI might unlock, or simply make prudent, alternative task trajectories that no human lab would pursue).
Do you have thoughts on this question? What about the more general case besides wetlab work?
Thanks, Oliver, these are great points. One thing we've been investigating, at least in self-driving labs, is the fine-grained breakdown of subtasks. This seems to be what any physical automation has to focus on. They're automating very narrow workflows with a lot of effort, e.g. custom-built robotics. Maybe this will end up like how automated vehicles slowly became more automated and gained more abilities: they started out with just lane control, then moved on to specific routes. I think the hardest thing here is that biology is weird and wacky. Every organism seems very unpredictable, which makes guessing or forecasting the transition hard. :)
Heh, yep, that makes a lot of sense for self driving. If you bracket out the 'routing around' (or more generally, changing priority distribution over multi-step trajectory paths) problem, this kind of breakdown is a great way to go. (There's also something like the additional task of composing subtasks.) If you start to see low but nonzero subtask success rates, this can also be a leading indicator of overall composed task success rates, naively multiplying success rates at subtasks for an estimate of the composed success rate.
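As a minimal sketch of that naive estimate (the subtask names and rates below are purely illustrative, and independence between subtasks is assumed, which real workflows won't satisfy):

```python
# Naive composed-success estimate: assume subtasks succeed independently
# and multiply their individual success rates. Rates here are illustrative.
subtask_success_rates = {
    "liquid_handling": 0.6,
    "culture_prep": 0.3,
    "sequencing_readout": 0.8,
}

composed_estimate = 1.0
for rate in subtask_success_rates.values():
    composed_estimate *= rate

print(f"Naive composed success estimate: {composed_estimate:.3f}")  # 0.144
```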
At AISI we did something like this for RepliBench (https://arxiv.org/abs/2504.18565), crudely. The 'science of evals' team at AISI has also done some investigation into that type of approach in general. I don't know where they're at on it - could put you in touch if you like.
Whoa, this is fascinating. I did not realize AISI had a science of evals team! I'll PM you!
Really resonates. AI can give you the steps, but not the instinct that comes from actually doing the work.
Thank you! I really appreciate it
Interesting!
I had some similar thoughts (albeit much less detailed) in comments to this 2023 article: https://forum.effectivealtruism.org/posts/ZuzK2s4JsJcexBJxy/will-releasing-the-weights-of-large-language-models-grant?commentId=wm7JrifbiDXDBWdgf
Fascinating discussion - I did not realize chlorine gas was so easy to make!