Jan. 5, 2025

Improving Research Through Safer Learning from Data

Another one of Frank Harrell’s posts. Given my day job and R&D background, this one hits quite close to home. As a team leader on the industry side, one hopes to build a team culture that aligns scientific rigor with company goals. Any method, statistical or cultural (in this case both), that resolves this tension will get you the most bang for your buck.

Building a startup, doing research, or even just launching a moonshot project depends on emergent actions and decisions. Behind this madness there is a kind of Science of “Muddling Through”, and Bayesian methods are perhaps best equipped for it:

make all the assumptions you want, but allow for departures from those assumptions. If the model contains a parameter for everything we know we don’t know (e.g., a parameter for the ratio of variances in a two-sample t-test), the resulting posterior distribution for the parameter of interest will be flatter, credible intervals wider, and confidence intervals wider. This makes them more likely to lead to the correct interpretation, and makes the result more likely to be reproducible.
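To make the variance-ratio point concrete for myself, here is a minimal sketch of my own, not Harrell's code. It compares two posteriors for a difference in means: one that assumes equal variances (ratio fixed at 1) and one that leaves each group's variance free. The data are made up, the prior is the standard noninformative one, and the names `diff_draws` and `credible_interval` are hypothetical. A full treatment would put an explicit prior on the variance ratio rather than using these two extremes.

```python
import random
import statistics

# Made-up data: a tight, larger group and a noisy, smaller group.
GROUP_A = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.6, 10.0,
           10.1, 9.9, 10.2, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0]
GROUP_B = [4.0, 18.0, 9.0, 15.0, 7.0]


def diff_draws(a, b, equal_variances, n_draws=20000, seed=0):
    """Posterior draws of mean(b) - mean(a) under a noninformative prior."""
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df_pooled = n_a + n_b - 2
    var_pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df_pooled

    def draw_variance(sample_var, df):
        # Scaled inverse chi-square draw; chi-square(df) is Gamma(df/2, scale=2).
        return df * sample_var / rng.gammavariate(df / 2, 2)

    draws = []
    for _ in range(n_draws):
        if equal_variances:
            # Assumption baked in: both groups share one variance.
            sigma2_a = sigma2_b = draw_variance(var_pooled, df_pooled)
        else:
            # Departure allowed: each group gets its own variance.
            sigma2_a = draw_variance(var_a, n_a - 1)
            sigma2_b = draw_variance(var_b, n_b - 1)
        mu_a = rng.gauss(mean_a, (sigma2_a / n_a) ** 0.5)
        mu_b = rng.gauss(mean_b, (sigma2_b / n_b) ** 0.5)
        draws.append(mu_b - mu_a)
    return draws


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from sorted posterior draws."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lo = ordered[int(tail * len(ordered))]
    hi = ordered[int((1 - tail) * len(ordered)) - 1]
    return lo, hi


for label, flag in [("equal variances", True), ("free variances", False)]:
    lo, hi = credible_interval(diff_draws(GROUP_A, GROUP_B, flag))
    print(f"{label}: 95% interval width = {hi - lo:.2f}")
```

On this particular data set the free-variance interval comes out wider, which is the honest answer when the small group is the noisy one. The widening is a property of this example rather than a guarantee for every data set.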

In an environment of limited resources (time, money, investor patience), being able to quantify what your data tell you and obtain bankable evidence is critical.

only the Bayesian approach allows insertion of skepticism at precisely the right point in the logic flow, one can think of a full Bayesian solution (prior + model) as a way to “get the model right”, taking the design and context into account, to obtain reliable scientific evidence.

The post provides a much more in-depth view, including an 8-fold path to enhancing the scientific process.


Why I walk

Chris Arnade gives us a view into why he goes on insanely long (both in distance and time) trips on foot and what he discovers there. I found this passage hilarious: money seems to converge on one version of living, identical everywhere but for the paint, even as the rich presumably search for a unique way of living.

Every large global city has a few upscale neighborhoods that are effectively all the same. It is where the very rich have their apartments, the five and four star hotels are, the most famous museums, and a shopping district with the same stores you would find in Manhattan’s Upper East Side, or London’s Mayfair.

The only difference is the branding. So you get the Upper East Side with Turkish affectations, or a Peruvian themed Mayfair. The residents of these neighborhoods are also pretty comfortable in any global city. As long as it is the right neighborhood.

Having traveled a fair bit, I know the designation of tourist brings with it multiple horrors. Chris uses walking as a kind of mini-residency and side-steps much of it:

Walking also changes how the city sees you, and consequently, how you see the city. As a pedestrian you are fully immersed in what is around you, literally one of the crowd. It allows for an anonymity, that if used right, breaks down barriers and expectations. It forces you to deal with and interact with things and people as a resident does. You’re another person going about your day, rather than a tourist looking to buy or be sold whatever stuff and image a place wants to sell you.

This particular experience reminded me of the book Shantaram, a book about a foreigner being adsorbed to and absorbed by the local.


AI chatbots fail to diagnose patients by talking with them

While interesting, the research uses LLMs to play both patient and doctor. I like that there is research happening in this area and that we are at least moving in the right direction with the testing of AI, i.e. not just structured tests. I would perhaps not judge the “failure” too harshly, as the source of the data is also an LLM and would suffer from deficiencies that an actual patient suffering from symptoms would not.

This paper introduces the Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD) approach for evaluating clinical LLMs. Unlike traditional methods that rely on structured medical examinations, CRAFT-MD focuses on natural dialogues, using simulated artificial intelligence agents to interact with LLMs in a controlled environment.

While the paper has a negative result, we do come away with a good set of recommendations for future evaluations.

Link to paper, research by Shreya Johri et al. from the lab of Pranav Rajpurkar

