Category: Blog Posts

  • Dwarf Fortress, Emacs, & AI: The allure of generative complexity

    There is a shared soul shard between Dwarf Fortress, Emacs, and AI that lured me to them and has kept me engaged for over a decade. For a long time, I struggled to articulate the connection, managing only to describe Dwarf Fortress as the Emacs of games. But this analogy, while compelling, doesn’t fully capture the deeper resonance these systems share. They are not merely complicated; they are complex—tools for creativity that reward immersion and exploration.

    Zunzar Machi at Torna – Wikipedia

    Complicated, Complex, Dev.

    To understand the allure, let’s revisit the distinction between complicated and complex. Complicated systems, say a spinning-disk microscope, consist of interlocking parts (each with its own internal complications) that interact in predictable ways. They require technical expertise to master, but their behavior remains largely deterministic, and I tire of them quickly.

    Complex systems (see the Cynefin framework) exhibit emergent behavior. Their value, and their fun, lies in the generative possibilities they unlock rather than in the sum of their parts.

    Dwarf Fortress, Emacs, and AI live on the froth of this complexity. None of these systems exist as ends in themselves. You don’t play Dwarf Fortress to achieve a high score (there isn’t one, you eventually lose). You don’t use Emacs simply to edit text, and you don’t build AI to arrange perceptrons in aesthetically pleasing patterns. These are platforms, altars for creation. Dev environments.

    In Emergence We Trust

    Like language with the rules of poetry, these environments are generative places enabling exploration of emergent spaces. Emergence manifests both in the software and in you. There is always a point where you find yourself thinking, I didn’t expect I could do that. In Dwarf Fortress you first fight against tantrum spirals, and then, through mastery, against FPS death. Similarly, Emacs enables workflows that evolve over time, as users build custom functions and plugins to fit their unique needs. In AI, emergence arrives rather late, but it’s there. Putting together datasets, training models, optimizing, starting over: all of this is complicated but not complex per se. The complexity (and emergence) is in the capabilities of the trained network. Things infinitely tedious or difficult are a few matrix multiplications away.

    This desire for emergence is spelunking. It rewards curiosity and experimentation but demands patience and resilience. Mastery begins with small victories: making beer in Dwarf Fortress, accessing help in Emacs, or implementing a 3-layer neural network. Each success expands your imagination. The desire to do more, to push the boundaries of what’s possible, becomes an endless rabbit hole—one that is as exhilarating as it is daunting.

    Complexity as a Gateway to Creativity

    The high complexity of these systems—their vast degrees of freedom—opens the door to infinite creativity. This very openness, however, can be intimidating. Confronted with the sprawling interface of Emacs, the arcane scripts of Dwarf Fortress, or the mathematical abstractions of AI, it’s tempting to retreat to the familiar. Yet this initial opacity is precisely what makes these systems so rewarding. Engaging with something that might blow up in your face—whether it’s drunk cats, a Lisp error, or an exploding gradient—tempts you to give up.

    But just then you have an idea: what if you tried this…

    Awaken, H. ludens.

  • Domain Ontologies: Indispensable for Knowledge Graph Construction

    AI slop is all around us, and extracting useful information will only get harder as we feed more noise into an already noisy world of knowledge. We are in an era of unprecedented data abundance, yet this deluge of information often lacks the structure necessary to derive meaningful insights. Knowledge graphs (KGs), with their ability to represent entities and their relationships as interconnected nodes and edges, have emerged as a powerful tool for managing and leveraging complex data. However, the efficacy of a KG is critically dependent on the underlying structure provided by domain ontologies. These ontologies, which are formal, machine-readable conceptualizations of a specific field of knowledge, are not merely useful, but essential for the creation of robust and insightful KGs. Let’s explore the role that domain ontologies play in scaffolding KG construction, drawing on various fields such as AI, healthcare, and cultural heritage, to illuminate their importance.

    Vassily Kandinsky – Composition VII (1913)
    According to Kandinsky, this is the most complex piece he ever painted.

    At its core, an ontology is a formal representation of knowledge within a specific domain, providing a structured vocabulary and defining the semantic relationships between concepts. In the context of KGs, ontologies serve as the blueprint that defines the types of nodes (entities) and edges (relationships) that can exist within the graph. Without this foundational structure, a KG would be a mere collection of isolated data points with limited utility. The ontology ensures that the KG’s data is not only interconnected but also semantically interoperable. For example, in the biomedical domain, an ontology like the Chemical Entities of Biological Interest (ChEBI) provides a standardized way of representing molecules and their relationships, which is essential for building biomedical KGs. Similarly, in the cultural domain, an ontology provides a controlled vocabulary to define the entities, such as artworks, artists, and historical events, and their relationships, thus creating a consistent representation of cultural heritage information.
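
    A minimal sketch of the "ontology as blueprint" idea, assuming rdflib is available; the namespace, class, and property names below are illustrative placeholders rather than terms from ChEBI or any real ontology:

    ```python
    # Ontology layer first: declare which node and edge types the KG may contain.
    # Data layer second: instances only make sense against those declarations.
    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/onto/")  # hypothetical namespace
    g = Graph()

    # Blueprint: classes and a typed relationship with a domain and range.
    g.add((EX.Molecule, RDF.type, RDFS.Class))
    g.add((EX.Protein, RDF.type, RDFS.Class))
    g.add((EX.interactsWith, RDF.type, RDF.Property))
    g.add((EX.interactsWith, RDFS.domain, EX.Molecule))
    g.add((EX.interactsWith, RDFS.range, EX.Protein))

    # Instances: interconnected and semantically typed, not isolated data points.
    g.add((EX.caffeine, RDF.type, EX.Molecule))
    g.add((EX.adenosineReceptor, RDF.type, EX.Protein))
    g.add((EX.caffeine, EX.interactsWith, EX.adenosineReceptor))

    for s, _, o in g.triples((None, EX.interactsWith, None)):
        print(s, "interactsWith", o)
    ```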

    One of the primary reasons domain ontologies are crucial for KGs is their role in ensuring data consistency and interoperability. Ontologies provide unique identifiers and clear definitions for each concept, which helps in aligning data from different sources and avoiding ambiguities. Consider, for example, a healthcare KG that integrates data from various clinical trials, patient records, and research publications. Without a shared ontology, terms like “cancer” or “hypertension” may be interpreted differently across these data sets. The use of ontologies standardizes the representation of these concepts, thus allowing for effective integration and analysis. This not only enhances the accuracy of the KG but also makes the information more accessible and reusable. Furthermore, using ontologies that follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles facilitates data integration, unification, and information sharing, essential for building robust KGs.
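
    To make the standardization point concrete, here is a toy sketch of ontology-backed term normalization; the identifiers are invented placeholders standing in for real ontology codes, and the record fields are hypothetical:

    ```python
    # Map free-text diagnosis strings from different sources to one shared ID,
    # so "cancer" in an EHR and "malignant neoplasm" in a trial align in the KG.
    SYNONYM_TO_ID = {
        "cancer": "ONTO:0000001",              # placeholder identifier
        "malignant neoplasm": "ONTO:0000001",
        "hypertension": "ONTO:0000002",
        "high blood pressure": "ONTO:0000002",
    }

    def normalize(record: dict) -> dict:
        """Attach the shared ontology identifier to a source record."""
        term = record["diagnosis"].strip().lower()
        return {**record, "diagnosis_id": SYNONYM_TO_ID.get(term, "UNMAPPED")}

    trial_row = {"source": "clinical_trial_A", "diagnosis": "Malignant neoplasm"}
    ehr_row = {"source": "hospital_EHR", "diagnosis": "Cancer"}
    assert normalize(trial_row)["diagnosis_id"] == normalize(ehr_row)["diagnosis_id"]
    ```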

    Moreover, ontologies facilitate the application of advanced AI methods to unlock new knowledge. They support deductive reasoning to infer new knowledge and provide structured background knowledge for machine learning. In the context of drug discovery, for instance, a KG built on a biomedical ontology can help identify potential drug targets by connecting genes, proteins, and diseases through clearly defined relationships. This structured approach to data also enables the development of explainable AI models, which are critical in fields like medicine where the decision-making process must be transparent and interpretable. The ontology-grounded KGs can then be used to generate hypotheses that can be validated through manual review, in vitro experiments, or clinical studies, highlighting the utility of ontologies in translating complex data into actionable knowledge.
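
    As a toy illustration of how clearly defined relationships support this kind of inference, the sketch below walks a tiny invented gene-protein-disease-drug graph with networkx (assumed available) to surface candidate targets; the entities and edges are made up for the example:

    ```python
    import networkx as nx

    kg = nx.DiGraph()
    kg.add_edge("GeneA", "ProteinX", relation="encodes")
    kg.add_edge("ProteinX", "DiseaseY", relation="implicated_in")
    kg.add_edge("DrugZ", "ProteinX", relation="inhibits")

    def candidate_drugs(disease: str):
        """Drugs that inhibit a protein implicated in the given disease."""
        hits = []
        for protein, _, data in kg.in_edges(disease, data=True):
            if data["relation"] != "implicated_in":
                continue
            for drug, _, edge in kg.in_edges(protein, data=True):
                if edge["relation"] == "inhibits":
                    hits.append((drug, protein))
        return hits

    print(candidate_drugs("DiseaseY"))  # [('DrugZ', 'ProteinX')]
    ```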

    Despite their many advantages, domain ontologies are not without their challenges. One major hurdle is the lack of direct integration between data and ontologies, meaning that most ontologies are abstract knowledge models not designed to contain or integrate data. This necessitates the use of (semi-)automated approaches to integrate data with the ontological knowledge model, which can be complex and resource-intensive. Additionally, the existence of multiple ontologies within a domain can lead to semantic inconsistencies that impede the construction of holistic KGs. Integrating different ontologies with overlapping information may result in semantic irreconcilability, making it difficult to reuse the ontologies for the purpose of KG construction. Careful planning is therefore required when choosing or building an ontology.

    As we move forward, the development of integrated, holistic solutions will be crucial to unlocking the full potential of domain ontologies in KG construction. This means creating methods for integrating multiple ontologies, ensuring data quality and credibility, and focusing on semantic expansion techniques to leverage existing resources. Furthermore, there needs to be a greater emphasis on creating ontologies with the explicit purpose of instantiating them, and storing data directly in graph databases. The integration of expert knowledge into KG learning systems, by using ontological rules, is crucial to ensure that KGs not only capture data, but also the logical patterns, inferences, and analytic approaches of a specific domain.

    Domain ontologies will prove to be the key to building robust and useful KGs. They provide the necessary structure, consistency, and interpretability that enables AI systems to extract valuable insights from complex data. By understanding and addressing the challenges associated with ontology design and implementation, we can harness the power of KGs to solve complex problems across diverse domains, from healthcare and science to culture and beyond. The future of knowledge management lies not just in the accumulation of data but in the development of intelligent, ontologically-grounded systems that can bridge the gap between information and meaningful understanding.

    References

    1. Al-Moslmi, T., El Alaoui, I., Tsokos, C.P., & Janjua, N. (2021). Knowledge graph construction approaches: A survey of recent research works. arXiv preprint. https://arxiv.org/abs/2011.00235
    2. Chandak, P., Huang, K., & Zitnik, M. (2023). PrimeKG: A multimodal knowledge graph for precision medicine. Scientific Data. https://www.nature.com/articles/s41597-023-01960-3
    3. Gilbert, S., & others. (2024). Augmented non-hallucinating large language models using ontologies and knowledge graphs in biomedicine. npj Digital Medicine. https://www.nature.com/articles/s41746-024-01081-0
    4. Guzmán, A.L., et al. (2022). Applications of Ontologies and Knowledge Graphs in Cancer Research: A Systematic Review. Cancers, 14(8), 1906. https://www.mdpi.com/2072-6694/14/8/1906
    5. Hura, A., & Janjua, N. (2024). Constructing domain-specific knowledge graphs from text: A case study on subprime mortgage crisis. Semantic Web Journal. https://www.semantic-web-journal.net/content/constructing-domain-specific-knowledge-graphs-text-case-study-subprime-mortgage-crisis
    6. Kilicoglu, H., et al. (2024). Towards better understanding of biomedical knowledge graphs: A survey. arXiv preprint. https://arxiv.org/abs/2402.06098
    7. Noy, N.F., & McGuinness, D.L. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Semantic Scholar. https://www.semanticscholar.org/paper/Ontology-Development-101%3A-A-Guide-to-Creating-Your-Noy/c15cf32df98969af5eaf85ae3098df6d2180b637
    8. Taneja, S.B., et al. (2023). NP-KG: A knowledge graph for pharmacokinetic natural product-drug interaction discovery. Journal of Biomedical Informatics. https://www.sciencedirect.com/science/article/pii/S153204642300062X
    9. Zhao, X., & Han, Y. (2023). Architecture of Knowledge Graph Construction. Semantic Scholar. https://www.semanticscholar.org/paper/Architecture-of-Knowledge-Graph-Construction-Zhao-Han/dcd600619962d5c1f1cfa08a85d0be43a626b301
  • The Universal Library in the River of Noise


    Few ideas capture the collective human imagination more powerfully than the notion of a “universal library”—a singular repository of all recorded knowledge. From the grandeur of the Library of Alexandria to modern digital initiatives, this concept has persisted as both a philosophical ideal and a practical challenge. Miroslav Kruk’s 1999 paper, “The Internet and the Revival of the Myth of the Universal Library,” revitalizes this conversation by highlighting the historical roots of the universal library myth and cautioning against uncritical technological utopianism. Today, as Wikipedia and Large Language Models (LLMs) like ChatGPT emerge as potential heirs to this legacy, Kruk’s insights—and broader reflections on language, noise, and the very nature of truth—resonate more than ever.


    The myth of the universal library

    Humanity has longed for a comprehensive archive that gathers all available knowledge under one metaphorical roof. The Library of Alexandria, purportedly holding every important work of its era, remains our most enduring symbol of this ambition. Later projects—such as Conrad Gessner’s Bibliotheca Universalis (an early effort to compile all known books) and the Enlightenment’s encyclopedic endeavors—renewed the quest for total knowledge. Francis Bacon famously proposed an exhaustive reorganization of the sciences in his Instauratio Magna, once again reflecting the aspiration to pin down the full breadth of human understanding.

    Kruk’s Historical Lens  

    This aspiration is neither new nor purely technological. Kruk traces the “myth” of the universal library from antiquity through the Renaissance, revealing how each generation has grappled with fundamental dilemmas of scale, completeness, and translation. According to Kruk,

    inclusivity can lead to oceans of meaninglessness

    The library on the “rock of certainty”… or an ocean of doubt?

    Alongside the aspiration toward universality has come an ever-present tension around truth, language, and the fragility of human understanding. Scholars dreamed of building the library on a “rock of certainty,” systematically collecting and classifying knowledge to vanquish doubt itself. Instead, many found themselves mired in “despair” and questioning whether the notion of objective reality was even attainable. As Kruk’s paper points out,

    The aim was to build the library on the rock of certainty: We finished with doubting everything … indeed, the existence of objective reality itself.

    Libraries used to be zero-sum

    Historically,

    for some libraries to become universal, other libraries have to become ‘less universal.’

    Access to rare books or manuscripts was zero-sum; a collection in one part of the world meant fewer resources or duplicates available elsewhere. Digitization theoretically solves this by duplicating resources infinitely, but questions remain about archiving, licensing, and global inequalities in technological infrastructure.


    Interestingly, Google was founded just as Kruk’s 1999 paper was nearing publication. In many ways, Google’s search engine became a “library of the web,” indexing and ranking content to make it discoverable on a scale previously unimaginable. Yet it is also a reminder of how quickly technology can outpace our theoretical frameworks: perhaps Kruk couldn’t have known about Google without Google. Something something future is already here…

    Wikipedia: an oasis island

    Wikipedia stands as a leading illustration of a “universal library” reimagined for the digital age. Its open, collaborative platform allows virtually anyone to contribute or edit articles. Where ancient and early modern efforts concentrated on physical manuscripts or printed compilations, Wikipedia harnesses collective intelligence in real time. As a result, it is perpetually expanding, updating, and revising its content.

    Yet Kruk’s caution holds: while openness fosters a broad and inclusive knowledge base, it also carries the risk of “oceans of meaninglessness” if editorial controls and quality standards slip. Wikipedia does attempt to mitigate these dangers through guidelines, citation requirements, and editorial consensus. However, systemic biases, gaps in coverage, and editorial conflicts remain persistent challenges—aligning with Kruk’s observation that inclusivity and expertise are sometimes at odds.

    LLMs – AI slops towards the perfect library

    Where Wikipedia aspires to accumulate and organize encyclopedic articles, LLMs like ChatGPT offer a more dynamic, personalized form of “knowledge” generation. These models process massive datasets—including vast portions of the public web—to generate responses that synthesize information from multiple sources in seconds. In a way this almost solves one of the sister aims of the perfect library, the perfect language, with embeddings serving as a stand-in for perfect words.

    The perfect language, on the other hand, would mirror reality perfectly. There would be one exact word for an object or phenomenon. No contradictions, redundancy or ambivalence.


    The dream of a perfect language has largely been abandoned. As Umberto Eco suggested, however, the work on artificial intelligence may represent “its revival under a different name.” 

    The very nature of LLMs highlights another of Kruk’s cautions: technological utopianism can obscure real epistemological and ethical concerns. LLMs do not “understand” the facts they present; they infer patterns from text. As a result, they may produce plausible-sounding but factually incorrect or biased information. The quantity-versus-quality dilemma thus persists.

    Noise is good actually?

    Although the internet overflows with false information and uninformed opinions, this noise can be generative—spurring conversation, debate, and the unexpected discovery of new ideas. In effect, we might envision small islands of well-curated information in a sea of noise. Far from dismissing the chaos out of hand, there is merit in seeing how creative breakthroughs can emerge from chaos. Gold of Chemistry from leaden alchemy.

    Concerns persist: misinformation, bias, and AI slop invite us to exercise editorial diligence so that we can sift through the noise productively. It also echoes Kruk’s notion of the universal library as something that “by definition, would contain materials blatantly untrue, false or distorted,” thus forcing us to navigate “small islands of meaning surrounded by vast oceans of meaninglessness.”

    Designing better knowledge systems

    Looking forward, the goal is not simply to build bigger data repositories or more sophisticated AI models, but to integrate the best of human expertise, ethical oversight, and continuous quality checks. Possible directions include:

    1. Strengthening Editorial and Algorithmic Oversight:

    • Wikipedia can refine its editorial mechanisms, while AI developers can embed robust validation processes to catch misinformation and bias in LLM outputs.

    2. Contextual Curation:  

    • Knowledge graphs are likely to be great bridges between curated knowledge and generated text.

    3. Collaborative Ecosystems:  

    • Combining human editorial teams with AI-driven tools may offer a synergy that neither purely crowdsourced nor purely algorithmic models can achieve alone. Perhaps this process could be more efficient by adding a knowledge base driven simulation (see last week’s links) of the editors’ intents and purposes.

    A return to the “raw”, as opposed to the social-media-cooked, version of the internet might be the trick after all. Armed with new tools we can (and should) create meaning. In the process, Leibniz might get his universal digital object identifier after all.

    Compression progress as a fundamental force of knowledge

    Ultimately, Kruk’s reminder that the universal library is a myth—an ideal rather than a finished product—should guide our approach. Its pursuit is not a one-time project with a definitive endpoint; it is an ongoing dialogue across centuries, technologies, and cultures. As we grapple with the informational abundance of the digital era, we can draw on lessons from Alexandria, the Renaissance, and the nascent Internet of the 1990s to inform how we build, critique, and refine today’s knowledge systems.

    Refine so that tomorrow, maybe literally, we can run reclamation projects in the noisy sea.


    Image: Boekhandelaar in het Midden-Oosten (1950 – 2000) by anonymous. Original public domain image from The Rijksmuseum

  • Jan 4, 2025

    Bayesian Thinking Talk (youtube)

    Talk details from Frank Harrell’s blog – includes slides

    This beautiful talk about Bayesian Thinking by Frank Harrell should be essential material for scientists who are trained in frequentist methods. The talk covers the shortcomings of frequentist approaches, but more importantly the paths out of those quagmires are also shown.

    Frank discusses his journey to Bayesian stats in this blog post from 2017 which is also in the next section.

    Bayesian Clinical Trial Design Course is linked at the end of the talk and happens to also have many good resources.

    • There is probably sufficient material here for me to be able to include a dedicated section on Bayes for all my link posts.

    My Journey from Frequentist to Bayesian Statistics

    The most useful takeaway for me from this post is that even experienced statisticians had to fight both against norms and against their own education to steer towards Bayes.
    The post has many, many good references to whet your appetite if you are Bayes-curious. I particularly liked the following take (a toy numerical sketch follows the quoted contrast):

    slightly oversimplified equations to contrast frequentist and Bayesian inference.

    • Frequentist = subjectivity1 + subjectivity2 + objectivity + data + endless arguments about everything
    • Bayesian = subjectivity1 + subjectivity3 + objectivity + data + endless arguments about one thing (the prior)

    where

    subjectivity1 = choice of the data model

    subjectivity2 = sample space and how repetitions of the experiment are envisioned, choice of the stopping rule, 1-tailed vs. 2-tailed tests, multiplicity adjustments, …

    subjectivity3 = prior distribution
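
    To make the “one thing” concrete, here is a toy beta-binomial sketch (not from Frank’s post; the counts and priors are invented, and scipy is assumed available) showing where the prior enters and what there is to argue about:

    ```python
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + f) posterior.
    from scipy import stats

    successes, failures = 14, 6  # hypothetical trial outcomes
    priors = {
        "sceptical Beta(2, 8)": (2, 8),
        "flat Beta(1, 1)": (1, 1),
    }

    for name, (a, b) in priors.items():
        posterior = stats.beta(a + successes, b + failures)
        print(f"{name}: posterior mean = {posterior.mean():.2f}, "
              f"P(rate > 0.5) = {1 - posterior.cdf(0.5):.2f}")
    ```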


    I found Frank’s blog from this page; I went from not knowing who he was to adopting him as a teacher. Maybe you’ll find someone interesting too.

  • The Arrival of Composable Knowledge

    Traversing human history, even just the last two decades, we see a rapid increase in the accessibility of knowledge. The purpose of language, and of course of all communication, is to transfer a concept from one system to another. For humans this ability to transfer concepts has been driven by advancements in technology, communication, and social structures and norms.

    This evolution has made knowledge increasingly composable, where individual pieces of information can be combined and recombined to create new understanding and innovation. Ten years ago I would have said that being able to read a research paper and having the knowledge to repeat that experiment in my lab was strong evidence of this composability (reproducibility issues notwithstanding).

    Now, composability itself is getting an upgrade.

    In the next essay I’ll be exploring the implications of the arrival of composable knowledge. This post is a light stroll to remind ourselves of how we got here.

    Infinite knowledge, finite time, inspired by Hakenes & Irmen, 2005, pdf

    Songs, Stories, and Scrolls

    In ancient times, knowledge was primarily transmitted orally. Stories, traditions, and teachings were passed down through generations by word of mouth. This method, while rich in cultural context, was limited in scope and permanence. The invention of writing systems around 3400 BCE in Mesopotamia marked a significant leap. Written records allowed for the preservation and dissemination of knowledge across time and space, enabling more complex compositions of ideas (Renn, 2018).

    Shelves, Sheaves, and Smart Friends

    The establishment of libraries, such as the Library of Alexandria in the 3rd century BCE, and scholarly communities in ancient Greece and Rome, further advanced the composability of knowledge. These institutions gathered diverse texts and fostered intellectual exchanges, allowing scholars to build upon existing works and integrate multiple sources of information into cohesive theories and philosophies (Elliott & Jacobson, 2002).

    Scribes, Senpai, and Scholarship

    During the Middle Ages, knowledge preservation and composition were largely the domain of monastic scribes who meticulously copied and studied manuscripts. The development of universities in the 12th century, such as those in Bologna and Paris, created centers for higher learning where scholars could debate and synthesize knowledge from various disciplines. This was probably when humans shifted perspective and started to view themselves as apart from nature  (Grumbach & van der Leeuw, 2021).

    Systems, Scripts and the Scientific Method

    The invention of the printing press by Johannes Gutenberg in the 15th century revolutionized knowledge dissemination. Printed books became widely available, drastically reducing the cost and time required to share information. This democratization of knowledge fueled the Renaissance, a period marked by the synthesis of classical and contemporary ideas, and the Enlightenment, which emphasized empirical research and the scientific method as means to build, refine, and share knowledge systematically (Ganguly, 2013).

    Silicon, Servers, and Sharing

    The 20th and 21st centuries have seen an exponential increase in the composability of knowledge due to digital technologies. The internet, open access journals, and digital libraries have made vast amounts of information accessible to anyone with an internet connection. Tools like online databases, search engines, and collaborative platforms enable individuals and organizations to gather, analyze, and integrate knowledge from a multitude of sources rapidly and efficiently. There have even been studies which allow, weirdly, future knowledge prediction (Liu et al., 2019).

    Conclusion

    From oral traditions to digital repositories, the composability of knowledge has continually evolved, breaking down barriers to information and enabling more sophisticated and collaborative forms of understanding. Today, the ease with which we can access, combine, and build upon knowledge drives innovation and fosters a more informed and connected global society.

  • Dancing on the Shoulders of Giants

    In Newton’s era it was rare to say things like “if I have seen further, it is by standing on the shoulders of giants” and actually mean it. Now it’s trivial. With education, training, and experience, professionals always stand “on shoulders of giants” (OSOG). Experts readily solve complex problems, but the truly difficult ones aren’t solved through training. Instead, a combination of muddling through and the dancer style of curiosity is deployed; more on this later. We have industries like semiconductors, solar, and gene sequencing with such high learning rates that the whole field seems to ascend OSOG levels daily.

    These fast-moving industries follow Wright’s Law. Most industries don’t follow Wright’s Law due to friction against discovering and distributing efficiencies. In healthcare, regulatory barriers, high upfront research costs, and resistance to change keep learning rates low. Of course, individuals have expert-level proficiencies, many with private hacks to make life easier. Unfortunately, the broader field does not benefit from individual gains, and progress is made only when knowledge trickles down to the level of education, training, and regulation.

    This makes me rather unhappy, and I wonder if even highly recalcitrant fields like healthcare could be nudged into the Wright’s law regime.

    No surprise that I view AI as central, but it’s a specific cocktail of intelligence that has my attention. Even before silicon, scaling computation has advanced intelligence. However, we will soon run into the limits of scaling compute, and the next stage of intelligence will need to be mixed (or massed, as proposed by Venkatesh Rao). Expertise + AI Agents + Knowledge Graphs will be the composite material that will enable us not just to see further, but to bus entire domains across what I think of as the Giant’s Causeway of Intelligence.

    Let’s explore the properties of this composite material a little deeper, starting with expertise and its effects.

    An individual’s motivation and drive are touted as the reason behind high levels of expertise and achievement. At best, motivation is an emergent phenomenon, a layer that people add to understand their own behavior and subjective experience (ref, ref). Meanwhile, curiosity is a fundamental force. Building knowledge networks, compressing them, and then applying them in flexible ways is a core drive. Every day, all of us (not just the “motivated”) cluster similar concepts under an identity and then use that identity in highly composable ways (ref).

    There are a few architectural styles of curiosity that get deployed. ‘Architecture’ here means the network structure of concepts and connections uncovered during exploration. STEM fields have a “hunter” style of curiosity: tight clusters, goal directed. While great for answers, the hunter style has difficulty making novel connections. Echoing Feyerabend’s ‘anything goes’ philosophy, novel connections require what is formally termed high forward flow, an exploration mode where there is significant distance between previous thoughts and new thoughts (ref). Experts don’t make random wild connections at the edge of their field; they control risk by picking between options likely to succeed, what has been termed ‘muddling through’.

    Stepping back, if you consider that even experts are muddling at the edges, then the only difference between low and high expertise is the knowledge network. The book Accelerated Expertise, summarized here, explores methods of rapidly extracting and transmitting expertise in the context of the US military. Through the process of Cognitive Task Analysis, expertise can be extracted and used in simulations to induce the same knowledge networks in the minds of trainees. The takeaway from this exercise is that expertise can be accelerated by giving people with base training access to new networks of knowledge.

    Another way to build a great knowledge network is through process repetition, you know… experience. These experience/learning curves predict success in industries that follow Wright’s Law. Wright’s Law is the observation that every time output doubles, the cost of production falls by a certain percentage. This rate of cost reduction is termed the learning rate. As a reference point, solar energy drops in price by 20% every time the installed solar capacity doubles. While most industries benefit from things like economies of scale, they can’t compete with these steady efficiency gains. Wright’s Law isn’t flipped on through some single lever but emerges from the culture, right from the factory floor all the way up to strategy.
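
    Wright’s Law is simple arithmetic; here is a minimal sketch (the 20% learning rate for solar is the figure quoted above, while the reference cost and volumes are placeholders):

    ```python
    import math

    def wrights_law_cost(cumulative_output, c0, x0, learning_rate):
        """Cost per unit after cumulative_output units, given cost c0 at volume x0."""
        b = math.log2(1 - learning_rate)  # 20% learning rate -> b ~ -0.32
        return c0 * (cumulative_output / x0) ** b

    c0 = 100.0  # hypothetical cost per unit at the reference volume
    for doublings in range(4):
        cost = wrights_law_cost(2 ** doublings, c0, 1, 0.20)
        print(f"after {doublings} doublings: {cost:.1f}")  # 100.0, 80.0, 64.0, 51.2
    ```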

    There are shared cultural phenomena that underlie the experience curve effect:

    • Labor efficiency – where workers are more confident, learn shortcuts and design improvements.
    • Methods improvement, specialization, and standardization – through repeated use the tools and protocols of work improve.
    • Technology-driven learning – better ways of accessing information and automated production increases rates of production.
    • Better use of equipment – machinery is used at full capacity as experience grows.
    • Network effects – a shared culture of work allows people to work across companies with little training.
    • Shared experience effects – two or more products following a similar design philosophy means little retraining is needed.

    Each of these is essentially the creation, compression, and application of knowledge networks. In fields like healthcare, efficiency gains are difficult because skill and knowledge diffusion is slow.

    Maybe, there could be an app for that…

    Knowledge graphs (KGs) are databases, but instead of tables they store a network graph, capturing relationships between entities where both the entities and the relationships carry metadata. Much like the mental knowledge networks built during curious exploration, knowledge graphs don’t just capture information like Keanu → Matrix but rather Keanu -star of→ Matrix. And all three, Keanu, star of, and Matrix, have associated properties. In a way KGs are crystallized expertise and have congruent advantages. They don’t hallucinate and are easy to audit, fix, and update. Data in KGs can be linked to real-world evidence, enabling them to serve as a source of truth and even causality, a critical feature for medicine (ref).
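
    A minimal sketch of that Keanu example as a property graph, using networkx (assumed available); the metadata values are illustrative:

    ```python
    import networkx as nx

    kg = nx.MultiDiGraph()
    kg.add_node("Keanu Reeves", type="Person", born=1964)
    kg.add_node("The Matrix", type="Film", released=1999)
    kg.add_edge("Keanu Reeves", "The Matrix",
                key="star_of", role="Neo", source="film credits")

    # The entities and the relationship are all first-class, queryable objects.
    print(kg.nodes["The Matrix"]["released"])                   # 1999
    print(kg["Keanu Reeves"]["The Matrix"]["star_of"]["role"])  # Neo
    ```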

    Medicine pulls from a wide array of domains to manage diseases. It’s impossible for all the information to be present in one mind, but knowledge graphs can visualise relationships across domains and help uncover novel solutions. Recently projects like PrimeKG have combined several knowledge graphs to integrate multimodal clinical knowledge. KGs have already shown great promise in fields like drug discovery and leading hospitals, like Mayo Clinic, think that they are the path to the future. The one drawback is poor interactivity.

    LLMs, meanwhile, are easy to interact with and have wonderful expressivity. But due to their generative structure, LLMs have zero explainability and completely lack credibility. LLMs are powerful, but their shortcomings make them risky in applications like disease diagnosis. The right research paper and textbooks trump generativity. Further, the way that AI is built today can’t fix these problems. Methods like fine-tuning and retraining exist, but they require massive compute, which is difficult to access, and quality isn’t guaranteed. The current ways of building AI, throwing mountains of data into hot cauldrons of compute and stirring with the network of choice (mandatory xkcd), ignore the very accessible stores of expertise like KGs.

    LLMs (and really LxMs) are the perfect complement to KGs. LLMs can access and operate KGs in agentic ways, making it easy to understand network relationships through natural language. As a major benefit, retrieving an accurate answer from a KG is 50x cheaper than generating one. KGs make AI explainable “by structuring information, extracting features and relations, and performing reasoning” (ref). Because they are easy to update and audit, KGs can quickly disseminate know-how. When combined with a formal process like expertise extraction, KGs could serve as a powerful store of knowledge for institutions and even whole domains. We will no longer have to wait a generation to apply breakthroughs.
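
    A sketch of that division of labor, where the agent answers from the KG when it can and only falls back to generation when it cannot; the triple store, the example fact, and the llm_generate stub are placeholders rather than a real agent framework:

    ```python
    # Tiny triple store: (entity, relation) -> object.
    TRIPLES = {
        ("metformin", "first_line_treatment_for"): "type 2 diabetes",  # illustrative
    }

    def llm_generate(question: str) -> str:
        """Stand-in for a call to an LLM; anything it returns is unverified."""
        return f"[generated, unverified] answer to: {question}"

    def answer(entity: str, relation: str) -> str:
        fact = TRIPLES.get((entity, relation))
        if fact is not None:
            return f"{entity} {relation.replace('_', ' ')} {fact} (source: KG)"
        return llm_generate(f"{entity} {relation}?")

    print(answer("metformin", "first_line_treatment_for"))  # answered from the KG
    print(answer("metformin", "discovered_by"))             # falls back to the stub
    ```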

    Experts + LxMs + KGs are the composite material that can accelerate innovation and lower the cost of building the next generation of intelligence. We have seen how experts are always trying to build a more complete knowledge network, one with the compression and flexibility that allow better composability. The combination of knowledge graphs and LLMs provides the medium to stimulate dancer-like exploration of options. This framework will allow high-proficiency but not-yet-expert practitioners to cross the barrier of experience with ease. Instead of climbing up a giant, one simply walks the Giant’s Causeway. Using a combination of modern tools and updated practices for expertise extraction, we can accelerate proficiency even in domains resistant to Wright’s Law, unlocking rapid progress.

    ****

    Appendix

    Diving a little deeper into my area of expertise, healthcare, here are a few ways agents and KGs can help, along with the role intelligence plays and the expected outcomes for each:

    • Efficiency in Data Management – Role of intelligence: KGs organize data in a way that reflects how entities are interconnected, which can significantly enhance data accessibility and usability. Outcomes: faster and more accurate diagnoses, streamlined patient care processes, and more personalized treatment plans.
    • Predictive Analytics – Role of intelligence: AI can analyze vast amounts of healthcare data to predict disease outbreaks, patient admissions, and other important metrics. Outcomes: allows healthcare facilities to optimize their resource allocation and reduce wastage, potentially lowering the cost per unit of care provided.
    • Automation of Routine Tasks – Role of intelligence: AI agents can automate administrative tasks such as scheduling, billing, and compliance tracking using institution-specific KGs. Outcomes: with widespread use, the cumulative cost savings could be in a similar range as Wright’s law.
    • Improvement in Treatment Protocols – Role of intelligence: refine treatment protocols using the knowledge graph of patient cases. Outcomes: more effective treatments being identified faster, reducing the cost and duration of care.
    • Scalability of Telehealth Services – Role of intelligence: agentic platforms rooted in strong knowledge graphs can handle more patients simultaneously, offering services like initial consultations, follow-up appointments, and routine check-ups with minimal human intervention. Outcomes: drives down the cost of service delivery at high patient volumes.
    • Enhanced Research and Development – Role of intelligence: already in play, AI and KGs accelerate medical research by better utilizing existing data for new insights. Outcomes: decreases the time and cost of developing new treatments.
    • Customized Patient Care – Role of intelligence: AI can analyze multimodal KGs of individual patients, integrating history, tests, and symptoms for highly customized care plans. Outcomes: when aggregated across the patient population, healthcare systems can benefit from economies of scale and new efficiencies.
  • Reimagining AI in Healthcare: Beyond Basic RAG with FHIR, Knowledge Graphs, and AI Agents

    Introduction

    While exploring the application of AI agents in healthcare, we see that standard Retrieval-Augmented Generation (RAG) and fine-tuning methods often fall short in the interconnected realms of healthcare and research. These traditional methods struggle to leverage the structured knowledge available, such as knowledge graphs. Data approaches like Fast Healthcare Interoperability Resources (FHIR), used alongside advanced knowledge graphs, can significantly enhance AI agents, providing more effective and context-aware solutions.

    The Shortcomings of Standard RAG in Healthcare

    Traditional RAG models, designed to pull information from external databases or texts, often disappoint in healthcare—a domain marked by complex, interlinked data. These models typically fail to utilize the nuanced relationships and detailed data essential for accurate medical insights (GitHub) (ar5iv).

    Leveraging FHIR and Knowledge Graphs

    FHIR offers a robust framework for electronic health records (EHR), enhancing data accessibility and interoperability. Integrated with knowledge graphs, FHIR transforms healthcare data into a format ideal for AI applications, enriching the AI’s ability to predict complex medical conditions through a dynamic use of real-time and historical data (ar5iv) (Mayo Clinic Platform).

    Enhancing AI with Advanced RAG Techniques

    Advanced RAG techniques utilize detailed knowledge graphs covering diseases, treatments, and patient histories. These graphs underpin AI models, enabling more accurate and relevant information retrieval and generation. This capability allows healthcare providers to offer personalized care based on a comprehensive understanding of patient health (Ethical AI Authority) (Microsoft Cloud).

    Implementing AI Agents in Healthcare

    AI agents enhanced with RAG and knowledge graphs can revolutionize diagnosis accuracy, patient outcome predictions, and treatment optimizations. These agents offer actionable insights derived from a deep understanding of individual and aggregated medical data (SpringerOpen).

    A Novel Approach: RAG + FHIR Knowledge Graphs

    Integrating RAG with FHIR-based knowledge graphs can significantly enhance AI capabilities in healthcare. This method maps FHIR resources to a knowledge graph, augmenting the RAG model’s access to structured medical data, thus enriching AI responses with verified medical knowledge and patient-specific information. View the complete notebook in my AI Studio.
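
    The general mapping idea looks roughly like the sketch below (a simplified illustration, not the notebook’s code): a FHIR Condition resource is flattened into subject-predicate-object triples that a graph store can index. The field names follow FHIR, but the resource content and the code value are placeholders:

    ```python
    def condition_to_triples(condition: dict) -> list[tuple[str, str, str]]:
        """Turn one FHIR Condition resource into KG triples."""
        patient = condition["subject"]["reference"]  # e.g. "Patient/123"
        coding = condition["code"]["coding"][0]
        return [
            (patient, "has_condition", coding["display"]),
            (coding["display"], "coded_as", f"{coding['system']}|{coding['code']}"),
            (patient, "condition_recorded_on", condition["recordedDate"]),
        ]

    fhir_condition = {  # minimal illustrative resource
        "resourceType": "Condition",
        "subject": {"reference": "Patient/123"},
        "recordedDate": "2024-01-15",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": "PLACEHOLDER", "display": "Hypertension"}]},
    }

    for triple in condition_to_triples(fhir_condition):
        print(triple)
    ```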

    Challenges and Future Directions

    While promising, integrating FHIR, knowledge graphs, and advanced RAG with AI agents in healthcare faces challenges such as data privacy, computational demands, and knowledge graph maintenance. These issues must be addressed to ensure ethical implementation and stakeholder consideration (MDPI).

    Conclusion

    Integrating FHIR, knowledge graphs, and advanced RAG techniques into AI agents represents a significant advancement in healthcare AI applications. These technologies enable a precision and understanding previously unattainable, promising to dramatically improve care delivery and management as they evolve.

    If you’re in the field or exploring applying AI, do get in touch!

  • On Being Good vs. Knowing Good: Perspectives on AI

    In this video, Stephen Fry narrates Nick Cave’s letter, which argues that using ChatGPT as a shortcut to creativity is detrimental. Surprisingly, I found myself in agreement. Having been involved in the AI industry for over a decade, I’ve always viewed AI positively. As an entrepreneur who pitches AI to investors and customers, I liken AI to technologies like spreadsheets: they eliminate tedious tasks, but you still need to understand what you’re doing.

    I use various contemporary AI tools daily for tasks ranging from creating ISO template documents to drafting reference letters. However, when I’ve tried using AI as a thinking partner or advisor, it has fallen short, primarily due to its inability to discern or have taste.

    Expertise involves having taste – the ability to distinguish good from bad, one decision from another. We depend on experts and advisors not just for their knowledge, but for their ability to guide us optimally, sometimes even questioning our intended goals. They speak confidently amid uncertainty, drawing on their experience.

    ChatGPT/LLMs and their generative counterparts exhibit confidence but, by design, lack real-world experience.

    My agreement with the video’s sentiment stems not from an inherent issue with computational tools, but from the understanding that taste develops through experience, however imperfect. Just as we learn arithmetic before using calculators, and calculators before spreadsheets, we need to cultivate a new culture around these emerging tools.

    In mission-critical fields like healthcare, balancing exploration and regulation is crucial. Over-regulation can hinder society from benefiting from innovative uses and discoveries, while a lack of regulation places undue risk on vulnerable populations, as seen in historical clinical trials.

    I envision a path where doctors integrate AI into their workflows with enthusiasm yet maintain high standards. Unlike drug development and other biotech fields, AI and its software can be corrected relatively easily. Creating a selective environment will drive quality.

    Users who recognize what is good will elevate the collective ability to achieve excellence.

  • Links 20240119

    1. Shorthand to make your handwriting worse than it is:

    https://orthic.shorthand.fun/

    2. This one feels like a direct attack given my recent mini tongue-in-cheek rant:

    https://www.smbc-comics.com/comic/llm-2

    3. First self-amplifying mRNA vaccine was approved in Japan. Apparently a much lower dose (1/6th) has the same effect as a normal dose. I wonder if the side effects are better:

    https://www.nature.com/articles/s41587-023-02101-2

    4. Norah Jones will have a new album out on March 8. The woman is a machine! With the very active podcast I had thought she might be taking a break, but nope here she is with Visions:

    https://www.norahjones.com/news-1/visions

    5. And here is me usurping your free will:

  • There is sadness in the Long Earth

    I’m on the last book of the Long Earth Series, The Long Cosmos, and reading it makes me sad and wistful.

    Perhaps it’s knowing that Terry Pratchett didn’t live to see it published, perhaps it’s the little hints and hooks of Discworld.

    Whatever it is, there are parts of the book where I laugh out loud, like when Sancho calls himself “The Librarian” and then I’m plunged into soft sadness.

    I’m about halfway through the book; I’m sure it will end well.

    GNU Terry Pratchett