Author: Aneesh Sathe

  • Jan 14, 2025

    Photo: by me, Aneesh Sathe, Malaysia, 2011

    Font fight!

The fonts have assembled, their ligatures sharp, their curves shiny. They all line up with perfect kerning… or do they? Your eyes and mind are their battlefield.

    https://www.codingfont.com/

I got JetBrains Mono btw.


    Decentralization is just partial centralization

    Renée DiResta writes about the social media flux in Noema.

    Decentralization places a heavy burden on individual instance administrators, mostly volunteers, who may lack the tools, time or capacity to address complex problems effectively.

    Identity verification is another weak point, leading to impersonation risks that centralized platforms typically manage more effectively. Inconsistent security practices between servers can allow malicious actors to exploit weaker links.

While all this is fine, I have a completely different view of the goings-on around social media. I’d rather quit something entirely than go through the pain of sorting through and tuning the place just right.

The internet has infinite space. Make your own blog, follow people you like (RSS feeds still work!) and ignore those you don’t. Nostalgic about back when Twitter was good? Well, there was a time when the internet was good. It was good because the people with access created little gardens of their own (not just of the digital garden variety, but those too). Psst… it’s still good btw, social media just blinds you to it.

While I’m a staunch early adopter, I’m also an early abandoner. The only thing I’ve been unable to abandon is blogs. I’ve never felt it was better to shut myself inside a walled garden, but I would suffocate if I weren’t able to surf, what a wonderful word that is, the internet.


    Obsessing is happiness

My happiest times have been when I was completely consumed by some task or project for days on end. I’ve learned hydroponics and grown an ungodly amount of mint in the Singapore sun. I’ve made terrible mead, taught myself programming, then machine learning… countersteering a motorcycle? You should watch me lean.

All this to say that happiness is an entirely oblique pursuit. This was crystallized for me in this post about Bertrand Russell’s The Conquest of Happiness by the wholly awesome Maria Popova.


    The 1900s are here

    Every 24th frame of Stanley Kubrick’s masterpiece 2001: A Space Odyssey posted once an hour

That cool projects like these exist is a testament to the eternal cool of the Internet. As I write, the bot is on its 1,899th hour. The exciting 1900s are coming up.


If you are new to Bayesian Stats, start here


  • Jan 13, 2025 Life, platforms, vectors

    Crystal-Bison

    I don’t want to give away much of the poem but it captures the nature, ferocity, and purpose of life.


    Ghost Exits

Aligning with the idea that blogs will be the last of the good internet, there is a broader question about platforms and their methods. Long before Meta and X abandoned all pretense, the internet was already under attack while we believed this was fine.

    We need more (and better) institutions and fewer platforms, and the latter have flourished at the expense of the former, advancing a specific agenda under an apolitical guise…

    We depend on those platforms more because our institutions have weakened. The present arrangement was far from inevitable and need not be permanent.

    …[Platforms have an] inherent tendency to extract value from their users and seek growth, while presenting the whole arrangement as a utopia.


COVID 5 years later

Speaking of this being all fine, the WHO held a four-day conference about COVID. There is more research than people can read, but

    Despite the flood of insights into the behavior of the virus and how to prevent it from causing harm, many at the meeting worried the world has turned a blind eye to the lessons learned from the pandemic.

It’s one of those black holes in history; we seem unable to peer beyond the event horizon.

    Virologist Jesse Bloom of the Fred Hutchinson Cancer Center, who is not convinced the pandemic began at the market and has urged colleagues to remain open to the possibility of a lab leak […] “There’s still little actual information about the first human cases,” Bloom says. “There’s just not a lot of knowledge about what was really going on in Wuhan in late 2019.”

Against the backdrop of the world pretending everything is going back to normal, one group, virologists, remains under attack.

    the world is dropping its guard against novel pathogens. Infectious disease is “not a safe space to really be working in,” she told Science. “Labs have been threatened. People have been threatened. Governments don’t necessarily want to be the ones to say, ‘Hey, we found something new.’”


    To my tiny set of newsletter subscribers: HI! 👋

  • The Universal Library in the River of Noise


    Few ideas capture the collective human imagination more powerfully than the notion of a “universal library”—a singular repository of all recorded knowledge. From the grandeur of the Library of Alexandria to modern digital initiatives, this concept has persisted as both a philosophical ideal and a practical challenge. Miroslav Kruk’s 1999 paper, “The Internet and the Revival of the Myth of the Universal Library,” revitalizes this conversation by highlighting the historical roots of the universal library myth and cautioning against uncritical technological utopianism. Today, as Wikipedia and Large Language Models (LLMs) like ChatGPT emerge as potential heirs to this legacy, Kruk’s insights—and broader reflections on language, noise, and the very nature of truth—resonate more than ever.


    The myth of the universal library

    Humanity has longed for a comprehensive archive that gathers all available knowledge under one metaphorical roof. The Library of Alexandria, purportedly holding every important work of its era, remains our most enduring symbol of this ambition. Later projects—such as Conrad Gessner’s Bibliotheca Universalis (an early effort to compile all known books) and the Enlightenment’s encyclopedic endeavors—renewed the quest for total knowledge. Francis Bacon famously proposed an exhaustive reorganization of the sciences in his Instauratio Magna, once again reflecting the aspiration to pin down the full breadth of human understanding.

    Kruk’s Historical Lens  

    This aspiration is neither new nor purely technological. Kruk traces the “myth” of the universal library from antiquity through the Renaissance, revealing how each generation has grappled with fundamental dilemmas of scale, completeness, and translation. According to Kruk,

    inclusivity can lead to oceans of meaninglessness

The library on the “rock of certainty”… or an ocean of doubt?

    Alongside the aspiration toward universality has come an ever-present tension around truth, language, and the fragility of human understanding. Scholars dreamed of building the library on a “rock of certainty,” systematically collecting and classifying knowledge to vanquish doubt itself. Instead, many found themselves mired in “despair” and questioning whether the notion of objective reality was even attainable. As Kruk’s paper points out,

The aim was to build the library on the rock of certainty: We finished with doubting everything … indeed, the existence of objective reality itself.

    Libraries used to be zero-sum

    Historically,

    for some libraries to become universal, other libraries have to become ‘less universal.’

    Access to rare books or manuscripts was zero-sum; a collection in one part of the world meant fewer resources or duplicates available elsewhere. Digitization theoretically solves this by duplicating resources infinitely, but questions remain about archiving, licensing, and global inequalities in technological infrastructure.


Interestingly, Google was founded just as Kruk’s 1999 paper was nearing publication. In many ways, Google’s search engine became a “library of the web,” indexing and ranking content to make it discoverable on a scale previously unimaginable. Yet it is also a reminder of how quickly technology can outpace our theoretical frameworks: Perhaps Kruk couldn’t have known about Google without Google. Something something future is already here…

    Wikipedia: an oasis island

    Wikipedia stands as a leading illustration of a “universal library” reimagined for the digital age. Its open, collaborative platform allows virtually anyone to contribute or edit articles. Where ancient and early modern efforts concentrated on physical manuscripts or printed compilations, Wikipedia harnesses collective intelligence in real time. As a result, it is perpetually expanding, updating, and revising its content.

    Yet Kruk’s caution holds: while openness fosters a broad and inclusive knowledge base, it also carries the risk of “oceans of meaninglessness” if editorial controls and quality standards slip. Wikipedia does attempt to mitigate these dangers through guidelines, citation requirements, and editorial consensus. However, systemic biases, gaps in coverage, and editorial conflicts remain persistent challenges—aligning with Kruk’s observation that inclusivity and expertise are sometimes at odds.

    LLMs – AI slops towards the perfect library

Where Wikipedia aspires to accumulate and organize encyclopedic articles, LLMs like ChatGPT offer a more dynamic, personalized form of “knowledge” generation. These models process massive datasets—including vast portions of the public web—to generate responses that synthesize information from multiple sources in seconds. In a way, this almost solves one of the sister aims of the perfect library, the perfect language, with embeddings serving as a stand-in for perfect words.

    The perfect language, on the other hand, would mirror reality perfectly. There would be one exact word for an object or phenomenon. No contradictions, redundancy or ambivalence.


    The dream of a perfect language has largely been abandoned. As Umberto Eco suggested, however, the work on artificial intelligence may represent “its revival under a different name.” 

    The very nature of LLMs highlights another of Kruk’s cautions: technological utopianism can obscure real epistemological and ethical concerns. LLMs do not “understand” the facts they present; they infer patterns from text. As a result, they may produce plausible-sounding but factually incorrect or biased information. The quantity-versus-quality dilemma thus persists.
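To make the embedding-as-perfect-word idea concrete, here is a toy sketch. The vectors below are made up for illustration; real models learn hundreds of dimensions from text. The point is the same either way: meaning lives in the geometry, where closeness stands in for shared sense, rather than in one exact word.

```python
import numpy as np

# Toy 3-d "embeddings" -- invented values, purely illustrative.
vectors = {
    "king":  np.array([0.9, 0.80, 0.1]),
    "queen": np.array([0.9, 0.75, 0.8]),
    "apple": np.array([0.1, 0.20, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Near-synonyms end up close together; unrelated words don't.
assert cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["apple"])
```

Unlike the dreamed-of perfect language, nothing here is exact: every word is a smear in the space, which is both the power and the problem.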

    Noise is good actually?

    Although the internet overflows with false information and uninformed opinions, this noise can be generative—spurring conversation, debate, and the unexpected discovery of new ideas. In effect, we might envision small islands of well-curated information in a sea of noise. Far from dismissing the chaos out of hand, there is merit in seeing how creative breakthroughs can emerge from chaos. Gold of Chemistry from leaden alchemy.

Concerns persist: misinformation, bias, and AI slop invite us to exercise editorial diligence to sift through the noise productively. It also echoes Kruk’s notion of the universal library as something that “by definition, would contain materials blatantly untrue, false or distorted,” thus forcing us to navigate “small islands of meaning surrounded by vast oceans of meaninglessness.”

    Designing better knowledge systems

    Looking forward, the goal is not simply to build bigger data repositories or more sophisticated AI models, but to integrate the best of human expertise, ethical oversight, and continuous quality checks. Possible directions include:

    1. Strengthening Editorial and Algorithmic Oversight:

    • Wikipedia can refine its editorial mechanisms, while AI developers can embed robust validation processes to catch misinformation and bias in LLM outputs.

    2. Contextual Curation:  

    • Knowledge graphs are likely great bridges between curated knowledge and generated text

    3. Collaborative Ecosystems:  

    • Combining human editorial teams with AI-driven tools may offer a synergy that neither purely crowdsourced nor purely algorithmic models can achieve alone. Perhaps this process could be more efficient by adding a knowledge base driven simulation (see last week’s links) of the editors’ intents and purposes.
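A hypothetical sketch of the bridge in point 2: a miniature knowledge graph of (subject, relation, object) triples that generated text is checked against before it is trusted. The triples and the helper function are illustrative inventions, not any particular KG library.

```python
# Hypothetical miniature knowledge graph as a set of triples.
triples = {
    ("Library of Alexandria", "located_in", "Egypt"),
    ("Bibliotheca Universalis", "compiled_by", "Conrad Gessner"),
}

def grounded(subject: str, relation: str, obj: str) -> bool:
    """Only let a generated claim through if the graph contains it."""
    return (subject, relation, obj) in triples

# A generated claim gets checked instead of trusted:
assert grounded("Bibliotheca Universalis", "compiled_by", "Conrad Gessner")
assert not grounded("Bibliotheca Universalis", "compiled_by", "Francis Bacon")
```

Real systems do this with entity linking and graph queries rather than exact tuple matches, but the editorial move is the same: curated structure vetoes fluent noise.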

A return to the “raw” internet, as opposed to the social-media-cooked version, might be the trick. Armed with new tools we can (and should) create meaning. In the process, Leibniz might get his universal digital object identifier after all.

    Compression progress as a fundamental force of knowledge

    Ultimately, Kruk’s reminder that the universal library is a myth—an ideal rather than a finished product—should guide our approach. Its pursuit is not a one-time project with a definitive endpoint; it is an ongoing dialogue across centuries, technologies, and cultures. As we grapple with the informational abundance of the digital era, we can draw on lessons from Alexandria, the Renaissance, and the nascent Internet of the 1990s to inform how we build, critique, and refine today’s knowledge systems.

    Refine so that tomorrow, maybe literally, we can run reclamation projects in the noisy sea.


    Image: Boekhandelaar in het Midden-Oosten (1950 – 2000) by anonymous. Original public domain image from The Rijksmuseum

  • Jan 11, 2025 – Leading with Kindness


    Leading with Kindness

    PDF kindly made available by the author, Steve Swensen – via Helen Bevan on Bluesky.

Steve Swensen held leadership positions at Mayo Clinic, working not just to improve care but also to prevent burnout. This paper from May 2024 provides leaders with a framework to help colleagues do better, and “Kindness is helping people do better.”

A colleague or work environment that creates stress and anxiety has a very real impact on your health and long-term well-being. An organization can improve team health by creating space for “nurturing human conditions” to emerge:

• Agency is the capacity of individuals or teams to act independently.
• Collective effervescence is the sense of meaning, community spirit, energy, invigoration and harmony people feel when they come together in groups with a shared purpose.
• Camaraderie is a multidimensional combination of social connectedness, teamwork, respect, authenticity, appreciation, loyalty and recognition of each other’s mattering. It is about belonging.
• Positivity is choosing a disposition of optimism and positive affect with a mindset that sees opportunities for learning, abundance and possibility in the world.

Being primarily a systems paper, it provides 10 systems to lower stress and increase resilience:

    Steve deep dives into each of these. Below are some practices that I have experienced or conditions I’ve strived to create:

    • Promoting Agency by asking the team how to improve, prioritizing where to focus and empowering the team to execute on the opportunities.
    • Ikigai and lifecrafting: creating space and giving opportunities for people to work on personally meaningful work.
    • Having lunch or coffee as a team 🙂
    • Pushing the decisions of work-life balance down to the people that actually have to live with the choices. Colleagues are adults and providing them with agency in these matters creates psychological safety.

    Five kindness behaviours: Leader behaviours that reduce emotional exhaustion and engender satisfaction.

    • Seek to understand
      • Solicit input from colleagues with humility
    • Appreciate
      • Recognise associates with authentic gratitude
    • Mentor
      • Nurture and support coworker aspirations
    • Foster belonging
      • Welcome everyone with respect and acceptance
    • Be transparent
      • Communicate openly for collective decisions

    Other references to Steve’s work:

    [Video] The Mayo Clinic model of care.

Framework to Reduce Professional Burnout – [PDF] via LinkedIn.


    Image: The Harbinger of Autumn (1922) by Paul Klee.


  • Jan 10, 2025 – AI Agents, Machiavelli’s Study

    Agents Are Not Enough

Last year I was experimenting heavily with Knowledge Graphs, because it’s been clear that LLMs by themselves fall short for lack of knowledge. This paper by Chirag Shah and Ryen White (you can click the heading above) from Dec 2024 expands on those shortcomings by exploring not just knowledge but also value generation, personalization, and trust.

They open the paper by casting a very wide definition of an “agent”: everything from thermostats to LLM tools. While this seems facetious at first, their next point is interesting. Agents by definition “remove agency from a user in order to do things on the user’s behalf and save them time and effort.” I think this is an interesting way to inject an LLM-flavored principal-agent problem into the Agentic AI conversation.

    Their broad suggestion is to expand the ecosystem of agents by including “Sims”. Sims are simulations of the user which address

    • privacy and security
    • automated interactions and
    • representing the interests of the user by holding intimate knowledge about the user

    It’s a short easy read, if you have 10 min.


    Machiavelli and the Emergence of the Private Study

Infinite knowledge is available through the internet today. It is available trivially, and some, ahem, blogs make a performance of consuming it. Machiavelli used to

    put on the garments of court and palace. Fitted out appropriately, I step inside the venerable courts of the ancients, where, solicitously received by them, I nourish myself on that food that alone is mine and for which I was born, where I am unashamed to converse with them and to question them about the motives for their actions, and they, in their humanity, answer me. And for four hours at a time I feel no boredom, I forget all my troubles, I do not dread poverty, and I am not terrified by death. I absorb myself into them completely.

    Some folks have a private office, but an office is not a study. A study or, studiolo

    in Italian, a precursor to the modern-day study — came to offer readers access to a different kind of chamber, a personal hideaway in which to converse with the dead. Cocooned within four walls, the studiolo was an aperture through which one could cultivate the self. After all, to know the world, one must begin with knowing the self, as ancient philosophy instructs. In order to know the self, one ought to study other selves too, preferably their ideas as recorded in texts. And since interior spaces shape the inward soul, the studiolo became a sanctuary and a microcosm. The study thus mediates the world, the word, and the self.

    In the 1500s Michel de Montaigne writes:

    We should have wife, children, goods, and above all health, if we can; but we must not bind ourselves to them so strongly that our happiness [tout de heur] depends on them. We must reserve a back room [une arriereboutique] all our own, entirely free, in which to establish our real liberty and our principal retreat and solitude.

    A little later, Virginia Woolf points out what seems to be an eternal inequality by struggling to find “a room of one’s own”.

    The enclosure of the study, for those of us lucky to have one, offers us a paradoxical sort of freedom. Conceptually, the studiolo is a pharmakon, a cure or poison for the soul. In its highest aspirations, the studiolo, as developed by humanists from Petrarch to Machiavelli to Montaigne, is a sanctuary for self-cultivation. Bookishness was elevated into a saintly virtue

    The world today would perhaps be better off if more of us had our own studiolos.


    Image source

  • Jan 9, 2025 – The Age of Fire and Gravel

    Today I discovered the Public Domain Image Archive attached to the Public Domain Review which has some great essays and commentary on image collections, like the one below.


    Utagawa Hiroshige: Last Great Master of Ukiyo-e

    Just before Hiroshige died, possibly of cholera, he wrote the following poem:

    I leave my brush in the East
    And set forth on my journey.
    I shall see the famous places in the Western Land.

This was a foretelling of sorts, because Hiroshige

    was a hugely influential figure, not only in his homeland but also on Western painting. Towards the end of the nineteenth century, as a part of the trend in “Japonism”, European artists such as Monet, Whistler, and Cézanne, looked to Hiroshige’s work for inspiration, and a certain Vincent van Gogh was known to paint copies of his prints.


    Los Angeles Burns

Driven by global climate change, the Santa Ana “Devil” winds were intense this year, parching the land.

Part of the reason for the spread was the rerouting of funds away from the fire department, which reduced the ability to respond appropriately.

    Besides the immense personal damage, NASA’s JPL was shut down.

Though not all happy memories, LA was home once. It’s where I discovered my love of bio and computers, not to mention the tremendous library system and the Tar Pits.


    Rethinking Dose-Response Analysis with Bayesian Inference

Analyzing dose-response experiments is tricky, and the standard Marquardt-Levenberg algorithm “does not evaluate the certainty (or uncertainty) of the estimates nor does it allow for the statistical comparison of two datasets.” This can lead to biased conclusions, as a lot of subjectivity and wishful thinking has scope to creep in.
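For reference, the standard fit the quote criticizes looks something like this sketch: SciPy’s `curve_fit` (Levenberg-Marquardt under the hood when unbounded) on a four-parameter logistic, with hypothetical dose-response data. It returns one point estimate per parameter and says nothing about how probable other values would be.

```python
import numpy as np
from scipy.optimize import curve_fit

# Four-parameter logistic (4PL), the usual dose-response model
# (increasing form: response rises with dose).
def four_pl(x, bottom, top, ec50, hill):
    return bottom + (top - bottom) / (1 + (ec50 / x) ** hill)

# Hypothetical data: doses in µM, responses in % of max effect.
doses = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
resp  = np.array([3.0, 5.0, 14.0, 38.0, 61.0, 86.0, 95.0])

# Levenberg-Marquardt least squares: a single point estimate per
# parameter, with no distribution over other probable values.
params, cov = curve_fit(four_pl, doses, resp, p0=[0, 100, 0.5, 1])
bottom, top, ec50, hill = params
```

The covariance matrix `cov` gives local standard errors at best; it does not let you compare two datasets probabilistically, which is the gap the paper’s Bayesian approach fills.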

    A Bayesian Approach?

The authors propose a Bayesian inference methodology that addresses these limitations. This approach can “characterize the noise of dataset while inferring probable values distributions for the efficacy metrics.”

    • It also allows for the statistical comparison of two datasets and can “compute the probability that one value is greater than the other”.
    • Critically, it incorporates prior knowledge (and intuition) through prior distributions: “The model incorporates the notion of intuition through prior distributions and computes the most probable value distribution for each of the efficacy metrics”.
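The dataset comparison falls out of posterior draws almost for free. A minimal sketch, using simulated stand-ins for the draws an actual Bayesian fit would produce (the normal distributions and their parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for posterior draws of an efficacy metric (e.g. EC50)
# from two datasets; in practice these come from the Bayesian fit.
ec50_a = rng.normal(loc=0.50, scale=0.08, size=10_000)
ec50_b = rng.normal(loc=0.65, scale=0.10, size=10_000)

# P(EC50_B > EC50_A): just count across paired draws.
p_b_greater = np.mean(ec50_b > ec50_a)
```

No test statistic, no p-value machinery: the probability that one value is greater than the other is literally the fraction of draws where it is.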

    Beyond Single Point Estimates

    This method moves beyond single-point estimates, which can be misleading, and “explicitly quantifies the reliability of the efficacy metrics taking into account the noise over the data”. The goal is to help researchers “analyze and interpret dose–response experiments by including uncertainty in their reasoning and providing them a simple and visual approach to do so”.

    So What?

    By “considering distributions of probable values instead of single point estimates,” this Bayesian approach provides a more robust interpretation of your data. This is particularly important when dealing with noisy or unresponsive datasets, which can often occur in drug discovery.

    Though their demo is offline, the code is available.


    This was on repeat today


    Image source: Book Cover: Ignatius Donnelly. Ragnarok: The Age of Fire and Gravel. New York, D. Appleton and Company, 1883

  • Jan. 8, 2025: Count your DIGITS! Drunk Bayesian

    NVIDIA Project DIGITS

    Around 2015 I was putting together funds in academia. Convincing IT, senior professors, and finance that yes, it was worth giving me a LOT of cash to build a workstation with multiple GPUs.


    “No, it isn’t for gaming.”

    “Yes, it will change the world.”

“No, there are no university rules that hardware bought on multiple invoices across multiple departments can’t be used in the same box.”

    “Yes, I’m aware that all my individual quotes are just below the bureaucracy summoning purchase limits.”

“Yes, I tried random forest along with the other stats and ML methods; this really is better. How do I know? Well…”

    Deep Learning was taking off, but in the biotech world it was seen as a regular tech update.

I did get the money and built that loud, jet-engine-sounding monster. Among the first software I installed was DIGITS (freshly archived); this was just as Keras had its release, and all I wanted to do was build a neural network. That little step changed my life.

Today NVIDIA announced hardware also named DIGITS. With the claimed performance (or even near there), it will be as life-changing for the early explorers as the DIGITS software was for me.

The $3000 price tag is probably much better justified than it was for another little piece of hardware from last year. I hope to get my hands on a couple of these, not just for LLMs but for my first love, images.

    From the press release:

    GB10 Superchip Provides a Petaflop of Power-Efficient AI Performance
    The GB10 Superchip is a system-on-a-chip (SoC) based on the NVIDIA Grace Blackwell architecture and delivers up to 1 petaflop of AI performance at FP4 precision.
    GB10 features an NVIDIA Blackwell GPU with latest-generation CUDA® cores and fifth-generation Tensor Cores, connected via NVLink®-C2C chip-to-chip interconnect to a high-performance NVIDIA Grace™ CPU, which includes 20 power-efficient cores built with the Arm architecture. MediaTek, a market leader in Arm-based SoC designs, collaborated on the design of GB10, contributing to its best-in-class power efficiency, performance and connectivity.
    The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models.

    Technically it’s not a proper petaflop at FP4 precision, but I’m ok with that kind of impropriety.
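Some back-of-envelope arithmetic on why 128GB of unified memory lines up with the 200-billion-parameter claim at FP4:

```python
# FP4 stores each weight in 4 bits = 0.5 bytes.
params = 200e9
bytes_per_param_fp4 = 0.5
weights_gb = params * bytes_per_param_fp4 / 1e9  # 100.0 GB

# That leaves roughly 28 GB of the 128 GB for KV cache, activations
# and the OS -- tight but plausible, and linking two units via
# ConnectX doubles the budget, hence the 405B figure.
headroom_gb = 128 - weights_gb
```

This is a napkin estimate only: real deployments also spend memory on the KV cache growing with context length, so the practical model size is somewhat smaller.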


    Drunk Bayesians and the standard errors in their interactors

This video is from 2018, and one of the interesting things Gelman touches on is the ability of AI to do model fitting automatically. Gelman argues that an AI designed to perform statistical analysis would also need to deal with the implications of Cantor’s diagonal.

    Essentially, new models need to be built when new data don’t fit the old model. You go down the diagonal of more data vs increasing model complexity.

This means that an AI cannot have a full model ahead of time; it must have different modules and an executive function, and it must make mistakes. He suggests that AI needs to be more like a human, with analytical and visual modules and an executive function, rather than a single, monolithic program.

    Perhaps we aren’t quite there yet but the emerging agentic methods are looking promising in light of his thoughts.

Some cool quotes below that I hope encourage you to watch this longish but lively window into the mind of one of the best-known Bayesian statisticians.

    -> I think that there’s something about statistics in the philosophy of statistics which is inherently untidy

    -> …in Bayes you do inference based on the model and that’s codified and logical but then there’s this other thing we do which is we say our inferences don’t make sense we reject our model and we need to change it

    -> Our statistical models have to be connected to our understanding of the world

    On the reproducibility crisis in science in three parts:

Studies not replicating: This is the most obvious part of the crisis, where a study’s findings are not supported when the study is repeated with new data. This casts doubt on the original claims and the related literature, and can invalidate a whole sub-field.

    Problems with statistics: Gelman argues that some studies do not do what they claim to do, often due to errors in statistical analysis or research methods. He gives the example of a study about male upper body strength that actually measured the fat on men’s arms.

    Three fundamental problems of statistics:

    • generalizing from a sample to a population
    • generalizing from a treatment to a control group, and
    • generalizing from measurements to underlying constructs of interest

    This last point is particularly interesting in the biotech space. Which brings us to,

Problems with substantive theory: Many studies lack a strong connection between statistical models and the real world. A better understanding of mechanisms and interactions is necessary for more accurate inferences. Gelman also discusses the “freshman fallacy,” where researchers assume a random sample of the population is not needed for causal inference, when in fact it is crucial if the treatment effect varies among people (especially important if you are trying to discover drugs!). He further notes that the lack of theory and mechanisms leads to not being able to estimate interactions, which are crucial.

    There are many more topics he covers from p-values and economics to bayesians not being bayesian enough.


As thanks for providing the source of the snowclone title, here’s some Static


    Image: Vintage European style abacus engraving from New York’s Chinatown. A historical presentation of its people and places by Louis J. Beck (1898). Original from the British Library. Digitally enhanced by rawpixel.

  • Jan. 7, 2025: Building Dwelling Thinking

Today’s product builders and data scientists shape the way people see the world. The analysis, plots, and UI we create are places where others dwell. Not merely occupy, but live in and harness the mental space we give them access to. Martin Heidegger wrote Building Dwelling Thinking (archive.org) in his 1971 book, Poetry, Language, Thought.

    Man’s relation to locations, and through locations to spaces, inheres in his dwelling. The relationship between man and space is none other than dwelling, strictly thought and spoken

What Heidegger calls locations, I have thought of as places. Places are anything with order and purpose, as thought of by the user. Spaces are not so much the mathematical or physics concepts but more like domains, such as the AI-space or the biotech-space. These spaces come into being as a result of thought extended from the boundaries of places to the explorations enabled by the affordances of said spaces.

    A boundary is not that at which something stops but, as the Greeks recognized, the boundary is that from which something begins its presencing.
    […]
    The location admits the [space] and it installs the [space].

    Heidegger addresses thinking only very briefly and from a distance. In the life of today’s knowledge worker thinking is everything. For the knowledge worker to be able to “dwell” they must be able to bring together the act of thinking and building. This is why good visualization, analysis that reveals rather than hides, and products that expand rather than limit the user’s ability are important.

Building and thinking are, each in its own way, inescapable for dwelling. The two, however, are also insufficient for dwelling so long as each busies itself with its own affairs in separation instead of listening to one another. They are able to listen if both building and thinking belong to dwelling, if they remain within their limits and realize that the one as much as the other comes from the workshop of long experience and incessant practice.

To be able to free the user is critical. Everyone has their own expertise, and it is usually not in using your product. To make your place so convoluted that the user has to conform and constrict to be able to use it is not kind placemaking. At the beginning of the essay there is a definition of what it means “to free”:

    To free really means to spare. The sparing itself consists not only in the fact that we do not harm the one whom we spare. Real sparing is something positive and takes place when we leave something beforehand in its own nature, when we return it specifically to its being, when we “free” it in the real sense of the word into a preserve of peace. To dwell, to be set at peace, means to remain at peace within the free sphere that safeguards each thing in its nature. The fundamental character of dwelling is this sparing and preserving

    My takeaway is that whenever a product or place is built, its primary concern should be the freedom of the person expected to dwell there. The freedom you provide enables them to explore the spaces they care about.


    Image credit: Nagoya Castle (ca.1932) print in high resolution by Hiroaki Takahashi. Original from The Los Angeles County Museum of Art. Digitally enhanced by rawpixel.

  • Jan. 6, 2025


    VMC: A Grammar for Visualizing Statistical Model Checks

    Data scientists check how well a statistical model fits observed data with numerical and graphical checks. Graphical checks span a huge range beyond well-known ones like Q–Q plots. Scientists are of course limited by their training and experience, and arriving at effective model checks is not trivial. Both programmatic and visual plotting tools require significant effort to generate new plots, increasing the friction of doing proper checks.
    Work out of Jessica Hullman‘s lab has produced the VMC package (github), a tool to easily access these methods and quickly determine the quality of your model. VMC is

    a high-level declarative grammar for generating model check visualizations. VMC categorizes design choices in model check visualizations via four components: sampling specification, data transformation, visual representation(s), and comparative layout. VMC improves the state-of-the-art in graphical model check specification in two ways:
    (1) it allows users to explore a wide range of model checks through relatively small changes to a specification as opposed to more substantial code restructuring, and
    (2) it simplifies the specification of model checks by defining a small number of semantically-meaningful design components tailored to model checking.

    The work comes from a thoughtful place, aiming not just to help out statisticians but to properly address the design considerations of a good tool, extending the wonderful family of tools exemplified by ggplot2, built on the Grammar of Graphics.

    Visualizations play a critical role in validating and improving statistical models. However, the design space of model check visualizations is not well understood, making it difficult for authors to explore and specify effective graphical model checks. VMC defines a model check visualization using four components:
    1. samples of distributions of checkable quantities generated from the model, including predictive distributions for new data and distributions of model parameters;
    2. transformations on observed data to facilitate comparison;
    3. visual representations of distributions;
    4. layouts to facilitate comparing model samples and observed data.
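    VMC itself is an R package, and the sketch below is not its API. It is only a minimal numpy illustration of what a model check does, loosely following the four components above: draw replicated datasets from the model's predictive distribution, transform both replicates and observed data with a check statistic, and compare. All names and data here are made up for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in observed data: 50 points from a right-skewed distribution.
    observed = rng.gamma(shape=2.0, scale=1.5, size=50)

    # A deliberately misspecified fitted model: a normal with matched moments.
    mu, sigma = observed.mean(), observed.std(ddof=1)

    # Component 1: sampling -- replicated datasets from the model's
    # predictive distribution.
    n_draws = 4000
    replicates = rng.normal(mu, sigma, size=(n_draws, observed.size))

    # Component 2: transformation -- a check statistic; here skewness,
    # which the normal model cannot reproduce.
    def skewness(x):
        z = (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)
        return (z ** 3).mean(axis=-1)

    obs_stat = skewness(observed[None, :])[0]
    rep_stats = skewness(replicates)

    # Components 3 and 4 would be the visual representation and comparative
    # layout; numerically, a predictive p-value summarizes the comparison.
    p_value = (rep_stats >= obs_stat).mean()
    print(f"observed skewness {obs_stat:.2f}, predictive p-value {p_value:.3f}")
    ```

    A small predictive p-value here flags that the observed skewness sits in the tail of what the model can generate, which is exactly the kind of discrepancy a graphical check would make visible.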


    Pat Metheny: MoonDial

    The central vibe here is one of resonant contemplation. This guitar allows me to go deep. Deep to a place that I maybe have never quite gotten to before. This is a dusk-to-sunrise record, hard-core mellow.

    I have often found myself as a listener searching for music to fill those hours, and honestly, I find it challenging to find the kinds of things I like to hear. As much “mellow” music as there is out there, a lot of it just doesn’t do the thing for me.

    This record might offer something to the insomniacs and all-night folks looking for the same sounds, harmonies, spirits, and melodies that I was in pursuit of during the late nights and early mornings that this music was recorded.

    The above is from Pat’s website. I discovered Pat Metheny relatively recently and have grown to like his music. Last year he released MoonDial, which I picked up last week. It’s nice.

    Check it out:

    While I know nothing about musical instruments, the man is a proper geek:

    Some years back, I had asked Linda Manzer, one of the best luthiers on the planet and one of my major collaborators, to build me yet another acoustic Baritone guitar, but this time one with nylon strings as opposed to the steel string version that I had used on the records One Quiet Night and What’s It All About.

    My deep dive into the world of Baritone guitar began when I remembered that as a kid in Missouri, a neighbor had shown me a unique way of stringing where the middle two strings are tuned up an octave while the general tuning of the Baritone instrument remains down a 4th or a 5th. This opened up a dimension of harmony that had been previously unavailable to me on any conventional guitar.

    There were never really issues with Linda’s guitar itself, but finding nylon strings that could manage that tuning without a) breaking or b) sounding like a banjo – was difficult.

    Just before we hit the road, I ran across a company in Argentina (Magma) that specialized in making a new kind of nylon string with a tension that allowed precisely the sound I needed to make Linda’s Baritone guitar viable in my special tuning.


    Lake bacteria evolve like clockwork with the seasons

    This article covers a pair of studies on bacteria and viruses in a lake.

    researchers found that over the course of a year, most individual species of bacteria in Lake Mendota rapidly evolve, apparently in response to dramatically changing seasons.

    Gene variants would rise and fall over generations, yet hundreds of separate species would return, almost fully, to near copies of what they had been genetically prior to a thousand or so generations of evolutionary pressures.

    From the preprint of the virus paper:

    In the evolutionary arms race between viruses and their hosts, “kill-the-winner” and other forms of dynamics frequently occur, causing fluctuations in the abundance of various viral strains. Despite these fluctuations, certain viral species persist over extended periods and demonstrate high occurrence over time, indicating their evolutionary success in adapting to changing environmental conditions. These high occurrence viral species may represent a ‘royal family’ viral species in the model used to explain the “kill-the-winner” dynamics, where certain sub-populations with enhanced viral fitness have descendants that become dominant in subsequent “kill-the-winner” cycles. It is probable that these high occurrence viral species maintain a stable presence at the coarse diversity level while undergoing continuous genomic and physiological changes at the microdiversity level. The dynamics at the level of viral and host interactions play a pivotal role in driving viral evolution and maintaining the dominance of ‘royal family’ viral species.


    Image credit: Sitting cat, from behind (1812) drawing in high resolution by Jean Bernard. Original from the Rijksmuseum. Digitally enhanced by rawpixel.

  • Jan. 5, 2025


    Improving Research Through Safer Learning from Data

    Another one of Frank Harrell‘s posts. Given my day job and R&D background, this one hits quite close to home. As a team leader on the industry side, one hopes to build a culture with the team that aligns scientific rigor with company goals. Any method, statistical or cultural (in this case both), that resolves this tension will get you the most bang for your buck.

    Building a startup, doing research, or even just launching a moonshot project depends on emergent actions and decisions. Behind this madness there is a kind of Science of “Muddling Through”, and Bayesian methods are perhaps best equipped for it:

    make all the assumptions you want, but allow for departures from those assumptions. If the model contains a parameter for everything we know we don’t know (e.g., a parameter for the ratio of variances in a two-sample t-test), the resulting posterior distribution for the parameter of interest will be flatter, credible intervals wider, and confidence intervals wider. This makes them more likely to lead to the correct interpretation, and makes the result more likely to be reproducible.

    In an environment of limited resources (time, money, investor patience), being able to quantify your data and obtain bankable evidence is critical.

    only the Bayesian approach allows insertion of skepticism at precisely the right point in the logic flow, one can think of a full Bayesian solution (prior + model) as a way to “get the model right”, taking the design and context into account, to obtain reliable scientific evidence.

    The post provides a much more in-depth view, including an 8-fold path to enhancing the scientific process.
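    Harrell's variance-ratio example can be sketched numerically. With flat priors on each group, a model that allows the variances to differ gives the mean difference a Behrens–Fisher-style posterior (simulated here by Monte Carlo), whose interval is wider than the equal-variance model's when the spreads really do differ. The data and variable names below are made up for illustration.

    ```python
    import numpy as np
    from scipy import stats

    # Illustrative data: two groups with very different spreads.
    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    b = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
    na, nb = len(a), len(b)

    rng = np.random.default_rng(1)
    n_sim = 200_000

    # Model WITH a parameter for unequal variances: under flat priors each
    # group mean has a scaled-t posterior, so the difference is Behrens-Fisher.
    mu_a = a.mean() + a.std(ddof=1) / np.sqrt(na) * rng.standard_t(na - 1, n_sim)
    mu_b = b.mean() + b.std(ddof=1) / np.sqrt(nb) * rng.standard_t(nb - 1, n_sim)
    lo_u, hi_u = np.quantile(mu_b - mu_a, [0.025, 0.975])

    # Model WITHOUT that parameter: pooled variance, classic equal-variance t.
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(sp2 * (1 / na + 1 / nb))
    tcrit = stats.t.ppf(0.975, na + nb - 2)
    d = b.mean() - a.mean()
    lo_p, hi_p = d - tcrit * se, d + tcrit * se

    width_unequal = hi_u - lo_u
    width_pooled = hi_p - lo_p
    print(f"interval width: unequal-variance {width_unequal:.1f}, "
          f"pooled {width_pooled:.1f}")
    ```

    The wider interval is the honest one: it carries the extra uncertainty from not assuming the variance ratio is 1, which is exactly Harrell's point about flatter posteriors being more likely to reproduce.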


    Why I walk

    Chris Arnade gives us a view into why he goes on insanely long (in both distance and time) trips by foot and what he discovers there. I found this passage hilarious: money seems to converge on a version of living that is identical everywhere, just with different paint, even as the rich presumably search for a unique way of living.

    Every large global city has a few upscale neighborhoods that are effectively all the same. It is where the very rich have their apartments, the five and four star hotels are, the most famous museums, and a shopping district with the same stores you would find in Manhattan’s Upper East Side, or London’s Mayfair.

    The only difference is the branding. So you get the Upper East Side with Turkish affectations, or a Peruvian themed Mayfair. The residents of these neighborhoods are also pretty comfortable in any global city. As long as it is the right neighborhood.

    Having traveled a fair bit, I find the designation of tourist brings with it multiple horrors. Chris uses walking as a kind of mini-residency and side-steps much of it:

    Walking also changes how the city sees you, and consequently, how you see the city. As a pedestrian you are fully immersed in what is around you, literally one of the crowd. It allows for an anonymity, that if used right, breaks down barriers and expectations. It forces you to deal with and interact with things and people as a resident does. You’re another person going about your day, rather than a tourist looking to buy or be sold whatever stuff and image a place wants to sell you.

    This particular experience reminded me of Shantaram, a book about a foreigner being adsorbed to and absorbed by the local.


    AI chatbots fail to diagnose patients by talking with them

    While interesting, the research uses LLMs to play both patient and doctor. I like that research is happening in this area and that we are at least moving in the right direction with the testing of AI, i.e., beyond structured tests. I would perhaps not judge the “failure” too harshly, as the source of the data is also an LLM and would suffer from deficiencies that an actual patient experiencing symptoms would not.

    This paper introduces the Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD) approach for evaluating clinical LLMs. Unlike traditional methods that rely on structured medical examinations, CRAFT-MD focuses on natural dialogues, using simulated artificial intelligence agents to interact with LLMs in a controlled environment.

    While the paper reports a negative result, we do come away with a good set of recommendations for future evaluations.

    Link to paper; research by Shreya Johori et al. from the lab of Pranav Rajpurkar.
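    I haven't looked at CRAFT-MD's code; the toy below only sketches the general pattern the paper describes: a simulated patient agent and a doctor model exchanging natural-dialogue turns until the doctor commits to a diagnosis. Both "agents" here are scripted stand-ins, not real LLM calls, and every name and string is hypothetical.

    ```python
    # Hypothetical stand-ins for two LLM agents. In CRAFT-MD both roles are
    # played by models; here they are scripted so the loop is runnable.
    def patient_agent(question: str) -> str:
        script = {
            "opening": "I've had a rash on my elbows for two weeks.",
            "Is it itchy?": "Yes, very itchy, worse at night.",
            "Any scaling or silvery patches?": "Yes, silvery scaling on top.",
        }
        return script.get(question, "I'm not sure.")

    def doctor_agent(transcript: list[str]) -> str:
        # A real evaluation would query an LLM with the full transcript;
        # this stub asks fixed follow-ups, then commits to a diagnosis.
        questions = ["Is it itchy?", "Any scaling or silvery patches?"]
        asked = sum(1 for turn in transcript if turn.startswith("DOCTOR:"))
        if asked < len(questions):
            return questions[asked]
        return "DIAGNOSIS: psoriasis"

    # The conversational evaluation loop: alternate turns until a diagnosis,
    # which a grader would then score against the case's ground truth.
    transcript = [f"PATIENT: {patient_agent('opening')}"]
    while True:
        doctor_turn = doctor_agent(transcript)
        transcript.append(f"DOCTOR: {doctor_turn}")
        if doctor_turn.startswith("DIAGNOSIS:"):
            break
        transcript.append(f"PATIENT: {patient_agent(doctor_turn)}")

    print("\n".join(transcript))
    ```

    The point of the conversational setup is visible even in this stub: the doctor only gets information it thinks to ask for, which is a very different test than handing the model a complete structured vignette.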


    Fediverse Reactions