One Maintainer Away from Disaster — How Foundations Build the Safety Net

Open source infrastructure runs the world. The question is not whether it is fragile — it is. The question is whether we are building the institutions to catch it when it breaks. The answer, increasingly, is yes.

By Jurg van Vliet

In 2022, every Kubernetes cluster on earth depended on a database maintained by one person.

etcd — the distributed key-value store that holds every piece of cluster state, every secret, every deployment manifest — had a single active maintainer. Marek Siarkowicz at Google was it. The others had moved on. Gyuho Lee went to Amazon and stopped participating. Sam Batschelet at Red Hat went quiet. Piotr Tabor at Google and Sahdev P Zala at IBM contributed occasionally, but "occasionally" does not fix critical data inconsistency bugs in a system that underpins the entire cloud-native ecosystem.

etcd v3.5 had shipped with multiple data corruption bugs. These were not theoretical risks but actual data inconsistency, the kind that silently destroys cluster state. Fixes took months because one person could not review their own patches, could not reach the supermajority required for governance decisions, and, in a detail that should alarm everyone, could not even access the CNCF helpdesk.

But here is where the story diverges from the usual open-source tragedy. The situation escalated to the Kubernetes Steering Committee under the subject line "Worrying state of Etcd community." The CNCF Technical Oversight Committee got involved. The resolution was the creation of SIG-etcd in 2023, elevating the project to first-class Kubernetes citizen status with formal governance, dedicated contribution paths, and institutional accountability. New maintainers were recruited. Review capacity was restored. The data corruption bugs were fixed.

The foundation caught it. Not fast enough — but it caught it. And the mechanisms it used are worth understanding, because they represent a model for sustaining public technology infrastructure.

The bus factor is real

Software engineers have a grim metric called the "bus factor" — how many people need to disappear before a project dies. For etcd in 2022, it was one. For thousands of critical open-source projects right now, it is one.
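There is no single agreed formula for the bus factor. One common approximation, assumed here, is the smallest number of contributors who together account for more than half of recent commits. The sketch below uses that heuristic with made-up commit data; the function name and the 50% threshold are illustrative choices, not a standard.

```python
from collections import Counter

def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors whose combined commits exceed
    `threshold` of all commits (a heuristic, not a formal definition)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is crossed.
    for rank, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered / total > threshold:
            return rank
    return len(counts)

# Hypothetical commit logs: one author per commit.
lopsided = ["alice"] * 80 + ["bob"] * 15 + ["carol"] * 5
balanced = ["alice"] * 30 + ["bob"] * 30 + ["carol"] * 20 + ["dave"] * 20

print(bus_factor(lopsided))
print(bus_factor(balanced))
```

Commit counts are a crude proxy. They say nothing about who reviews, who holds release keys, or who understands the hard parts of the code, so a result of one is a prompt to look closer rather than a verdict.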

The CNCF landscape lists over a thousand projects, and the pattern is familiar. A project becomes critical infrastructure, adopted by thousands of companies and embedded in production systems that handle billions in transactions, and the people who built it get nothing except more issues filed against them.

This is the problem foundations exist to solve.

What happens without a foundation: the xz backdoor

The most dramatic illustration arrived in 2024 with the xz/liblzma backdoor. xz was not a CNCF project. It was not governed by any foundation. It was maintained by one person, Lasse Collin, essentially alone for years. He was visibly burned out.

A contributor named "Jia Tan" appeared, built trust over two years through legitimate contributions, and gradually inserted a sophisticated backdoor into the build system. It was designed to compromise SSH authentication on Linux systems whose sshd ended up loading liblzma, as it did on major distributions that patch OpenSSH to link against libsystemd.

The attack succeeded precisely because the bus factor was one. There was no second pair of eyes with sufficient context to catch the changes. The social engineering worked because a burned-out maintainer was desperate for help, and there was no institution — no foundation, no governance body, no formal contributor pipeline — to provide it.

The backdoor was discovered by accident. Andres Freund at Microsoft noticed SSH connections were 500 milliseconds slower than expected and investigated. Not a code review. Not a security audit. A performance anomaly.

Compare this to etcd. Same bus factor. Same risk. But etcd had the CNCF — an institution that could escalate, intervene, restructure, and recruit. xz had nothing. The difference is not the code. The difference is the institution around it.

Log4Shell: the response that changed everything

In December 2021, a critical remote code execution vulnerability was disclosed in Apache Log4j. CVE-2021-44228, known as Log4Shell, allowed unauthenticated remote code execution on any system that used a vulnerable Log4j 2 release to log attacker-controlled input, which turned out to be a large share of the internet.

Log4j was maintained by a handful of volunteers. When the vulnerability was disclosed, those volunteers worked around the clock to produce fixes while the entire technology industry — companies with trillion-dollar market capitalisations — waited for patches from people who were not being paid.

But Log4Shell did not just expose a vulnerability. It triggered an institutional response. In 2022 the Linux Foundation and its Open Source Security Foundation (OpenSSF) published the Open Source Software Security Mobilization Plan, which called for roughly $150 million in funding over two years and drew initial pledges from major technology companies. The Alpha-Omega project was launched to fund security work on critical open-source projects. The OpenSSF Scorecard project, which predates Log4Shell, gave organisations a systematic way to evaluate the security posture of their open-source dependencies.

The crisis was real. But so was the response. And the response was only possible because a foundation — the Linux Foundation — had the institutional weight to convene the industry, allocate resources, and create lasting structures rather than one-time fixes.

The Linux kernel: what foundation-scale investment looks like

The Linux kernel is the proof that this model works at scale. Thousands of paid contributors from hundreds of companies, with the Linux Foundation providing the institutional backbone — governance, infrastructure, legal protection, and coordination.

This did not happen by accident. It happened because the Linux Foundation made it possible for companies to invest in shared infrastructure without surrendering competitive advantage. The foundation provides neutral ground. Companies that compete fiercely in the market collaborate on the kernel because the foundation structure makes that collaboration safe and productive.

Even Linux is not immune to risk. The kernel.org compromise of 2011 went undetected for seventeen days. The "hypocrite commits" incident in 2021 — when University of Minnesota researchers deliberately submitted flawed patches — exposed how thin the review layer could be. But in both cases, the institutional response was swift: infrastructure rebuilt, policies updated, contributor vetting strengthened. The foundation absorbed the shock and came back stronger.

That resilience is not a property of the code. It is a property of the institution.

ReiserFS: the cost of having no safety net

ReiserFS offers the starkest illustration. Hans Reiser created the filesystem, maintained it as part of the Linux kernel, and was convicted of murder in 2008. The filesystem stagnated for sixteen years — a zombie in the codebase — because nobody else had the context or mandate to maintain it. It was finally removed from the kernel in 2024.

"Maintainer convicted of murder" is extreme. But it sits on a spectrum with "maintainer changes jobs," "maintainer has a child," "maintainer gets burned out." All normal human events. All capable of killing a project that millions depend on — unless an institution exists to ensure continuity.

What CNCF and Linux Foundation governance actually provides

Foundations are not charities. They are infrastructure for infrastructure. Here is what the CNCF provides, concretely:

Graduation criteria that enforce maintainer depth. A CNCF project cannot move from sandbox to incubating to graduated without demonstrating healthy contributor diversity and documented governance, and graduation additionally requires a completed security audit. These are not suggestions. They are gates. etcd's crisis happened partly because these mechanisms had not yet been applied retroactively to projects that predated them. That gap has been closed.

Technical Oversight Committee (TOC) intervention. The TOC is not decorative. When etcd was failing, the TOC acted — restructuring the project's governance, creating SIG-etcd, and ensuring institutional support. This is the immune system in action: detecting a failing project and mobilising resources before it collapses.

Security audits and vulnerability response. CNCF funds third-party security audits for graduated projects. The Linux Foundation's OpenSSF provides tooling, funding, and coordination for security across the broader ecosystem. These are not things individual maintainers can do alone.

Neutral governance for competing contributors. When Google, Red Hat, and VMware all contribute to Kubernetes, the CNCF provides the legal and organisational framework that makes that possible. Without it, every contribution becomes a negotiation between corporate legal departments.

Contributor pipelines. Programs like LFX Mentorship and Google Summer of Code, coordinated through foundation infrastructure, create pathways for new maintainers. This is how you solve the bus factor — not by hoping someone shows up, but by building the pipeline that ensures they do.

Contributing back: from TAG Environmental Sustainability to KEIT

Foundations do not just protect projects. They create the spaces where new ideas take root.

Our CTO, Flavia, served as a SIG Tech Lead for CNCF's TAG Environmental Sustainability — the group working on carbon-aware computing, sustainability metrics, and environmental impact measurement for cloud-native infrastructure. That work — the conversations, the specifications, the cross-company collaboration that only a foundation can convene — directly inspired KEIT, the Kubernetes Emissions Insights Tool we built and open-sourced.

KEIT estimates the carbon emissions of a Kubernetes cluster by combining Kepler for energy measurement, ElectricityMaps for grid carbon intensity, and Boavizta for hardware embodied emissions. It exists because a foundation created the space to think about the problem, connected the people working on it, and established the shared vocabulary that made a practical tool possible.
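The way those three inputs combine can be sketched in a few lines. This is not KEIT's actual code or data model. It assumes the simplest reasonable formula: operational emissions are energy multiplied by grid carbon intensity, and embodied emissions are amortised linearly over an assumed hardware lifetime. All names and numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NodeSample:
    energy_kwh: float                # energy used in the window (Kepler-style input)
    grid_intensity_g_per_kwh: float  # gCO2e per kWh (ElectricityMaps-style input)
    embodied_kg: float               # manufacturing emissions (Boavizta-style input)
    lifetime_hours: float            # assumed hardware lifetime for amortisation
    window_hours: float              # length of the measurement window

def estimate_emissions_g(sample: NodeSample) -> float:
    """Estimated gCO2e for one node over one window (illustrative formula)."""
    # Operational share: energy consumed times how dirty the grid was.
    operational = sample.energy_kwh * sample.grid_intensity_g_per_kwh
    # Embodied share: the slice of manufacturing emissions attributed to this window.
    embodied = sample.embodied_kg * 1000 * (sample.window_hours / sample.lifetime_hours)
    return operational + embodied

# Hypothetical node: 2 kWh in a day, 300 g/kWh grid, 876 kg embodied over 4 years.
sample = NodeSample(
    energy_kwh=2.0,
    grid_intensity_g_per_kwh=300.0,
    embodied_kg=876.0,
    lifetime_hours=4 * 8760,
    window_hours=24,
)
print(round(estimate_emissions_g(sample)))
```

With these invented inputs the operational and embodied shares come out about equal, roughly 600 g each. That is an artefact of the chosen numbers, but it shows why the embodied term is worth tracking alongside energy.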

This is the foundation model working as designed: participate, learn, contribute back. The investment flows in both directions.

Open source is public technology

Roads, bridges, water systems — we call these public infrastructure and fund them accordingly. Open-source software that runs the global economy deserves the same treatment.

The CNCF and Linux Foundation are, in effect, the governance bodies for public technology. They maintain the roads that every company drives on. When those roads are well-funded — when companies invest in foundation membership, employ maintainers, participate in governance, contribute upstream — the entire ecosystem benefits. When they are neglected, we get xz.

Three things every European organisation running open-source infrastructure should do:

Fund the foundations that govern your dependencies. Not as a marketing exercise. Not for the conference badge. Because your production infrastructure depends on software that needs institutional support to survive. CNCF membership, Linux Foundation membership, OpenSSF participation — these are investments in the reliability of your own systems.

Contribute people, not just money. Foundations need participants. SIGs and TAGs need leads. Working groups need reviewers. The most valuable contribution is sustained human engagement — engineers who show up, review code, mentor newcomers, and build the maintainer depth that prevents the next etcd crisis.

Map your dependencies and their governance. Know which of your critical dependencies are governed by a foundation and which are one maintainer away from disaster. For the governed ones, invest in them. For the ungoverned ones, help bring them into a foundation — or build your own contingency plan.
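The third step can start as a small inventory. The sketch below assumes you have already gathered, by hand or from your own tooling, which foundation (if any) governs each dependency and how many maintainers are currently active. The dependency names, action labels, and two-maintainer threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Dependency:
    name: str
    foundation: Optional[str]  # governing foundation, or None if ungoverned
    active_maintainers: int    # gathered manually; no API is assumed here

def triage(dep: Dependency, min_maintainers: int = 2) -> str:
    """Map a dependency to a suggested action (labels are illustrative)."""
    thin = dep.active_maintainers < min_maintainers
    if dep.foundation is None:
        # Ungoverned: plan for failure if thinly staffed, otherwise help it find a home.
        return "contingency-plan" if thin else "help-find-foundation"
    # Governed: escalate through the foundation if thinly staffed, otherwise invest.
    return "escalate-and-contribute" if thin else "invest"

# Hypothetical inventory.
inventory = [
    Dependency("example-orchestrator", "CNCF", 40),
    Dependency("example-kv-store", "CNCF", 1),
    Dependency("example-compression-lib", None, 1),
    Dependency("example-parser", None, 5),
]

for dep in inventory:
    print(f"{dep.name}: {triage(dep)}")
```

The value is less in the code than in being forced to fill in the two columns. For many critical dependencies, nobody in the organisation will know the answers until someone looks.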

The threat to open-source infrastructure is not technical. It is institutional. The good news is that the institutions exist, they work, and they are getting stronger. The question is whether we invest in them before the next crisis — or only after.

