[{"content":"Every data governance program I\u0026rsquo;ve watched is built to add definitions. None of them is built to remove one.\nYou seat a steward, as I keep insisting you have to, and the first thing they do is define a sample. Then a specimen. Then the handful of entities that carry real weight, and the lineage between them. The catalog fills up. The semantic layer accumulates measures. Every motion in the system is additive. The vocabulary only ever grows.\nSo the failure mode nobody plans for isn\u0026rsquo;t drift. It\u0026rsquo;s accumulation.\nA wrong definition gets caught: somebody\u0026rsquo;s number comes out strange and the steward gets pulled into a meeting. An obsolete one just sits there, still wired into a dashboard nobody opens, still technically correct against a question nobody asks anymore.\nWe give stewards the authority to define. We almost never give them the authority to retire. Those are not the same grant, and the second is harder to hand out, because retiring a definition means telling some team the thing they\u0026rsquo;ve counted on is going away. Adding makes you useful. Retiring makes you the person who broke a report.\nA steward who can only add is not a governance function. They\u0026rsquo;re a backlog with a job title.\nIn a regulated shop this gets worse, because the whole culture is built to keep everything. That instinct is correct for records and corrosive for definitions, and almost nobody separates the two.\nYou are required to keep the record. A batch result from 2019 stays attributable, legible, and available years after the fact. That\u0026rsquo;s the Enduring and the Available in ALCOA+, and you do not get to delete it because the study closed. Fine. That\u0026rsquo;s the record, and it is sacred.\nThe definition is a different object. It\u0026rsquo;s the active meaning the org uses to count, report, and decide right now. Retiring a definition touches no stored record. 
The 2019 batch result keeps its original meaning, frozen, forever. What changes is that \u0026ldquo;available sample\u0026rdquo; stops being a live measure the business runs against. You aren\u0026rsquo;t erasing history. You\u0026rsquo;re declaring that one interpretation is no longer the current one.\nRegulated orgs conflate these constantly. \u0026ldquo;We can\u0026rsquo;t retire that, it\u0026rsquo;s a controlled definition\u0026rdquo; is true about the record and false about the live meaning, and that one confusion is why the vocabulary in a life sciences shop bloats faster than almost anywhere else. The audit trail is the very thing that makes retiring safe. It preserves what the word used to mean, so dropping the live version costs you nothing you were obligated to keep.\nSo the maturity signal isn\u0026rsquo;t the size of the catalog. It\u0026rsquo;s whether the steward has ever retired anything. A program that has only grown its vocabulary hasn\u0026rsquo;t been tested yet. The test is the first time a steward says \u0026ldquo;this definition is done: we keep every record it produced, and we stop using it to answer anything new,\u0026rdquo; and the org lets them say it without flinching.\nBefore that day, what you have isn\u0026rsquo;t governance. It\u0026rsquo;s a very well documented pile.\n","permalink":"https://josephcapozzoli.com/posts/record-is-not-the-definition/","summary":"Every governance program is built to add definitions, and none of them can retire one. In a regulated shop that gets worse, because the instinct to keep every record gets misread as a ban on ever deprecating a definition. They\u0026rsquo;re different objects.","title":"Keeping the Record Is Not Keeping the Definition"},{"content":"Pulling a platform out of an app doesn\u0026rsquo;t set the app free. 
It puts the app on a leash and hands the other end to everyone who shows up next.\nI\u0026rsquo;ve been doing exactly that to a storage system at work: finding the seams, deciding which opinions belong to the platform and which belong to the first app that grew on top of it. Most of the writing about this treats it as an architecture problem. Draw the boundary, extract the shared parts, publish an interface. The before-and-after diagrams look almost identical. A box that used to sit inside another box now sits beside it.\nWhat the diagrams don\u0026rsquo;t show is what changed about your day.\nBefore the extraction, the storage internals belonged to my team. A field that needed renaming, a status enum that needed a new value, a validation rule that was too strict for a case we hadn\u0026rsquo;t seen yet, all of it was a local edit. You change it, you ship it, you mention it in standup. The decision was private because the blast radius was private.\nThe moment a second consumer depends on that surface, the same edit is a contract change. I argued before that the boundary should be provisional and versioned, so the next consumer can push on it instead of inheriting it in silence. That\u0026rsquo;s still right. But versioning isn\u0026rsquo;t free. It\u0026rsquo;s the machinery that turns a thing you used to change in an afternoon into a thing you announce, deprecate, and migrate.\nYou don\u0026rsquo;t pay for a platform in the extraction. You pay for it every time you can no longer change something quietly.\nAnd the part that caught me off guard: your own first app is now one of the consumers. Usually the most demanding one, because it has more invested in the old shape than anyone. It grew up assuming it could reach straight into those internals. Now it has to ask. 
You spend the extraction work convincing yourself you\u0026rsquo;re serving some future second team, and then notice the consumer you\u0026rsquo;ve inconvenienced most is yourself, six months ago.\nThis is why \u0026ldquo;should we extract a platform?\u0026rdquo; is the wrong first question. It gets filed as an architecture decision. It\u0026rsquo;s a velocity decision. You\u0026rsquo;re proposing to trade the first app\u0026rsquo;s speed for the second app\u0026rsquo;s existence, and sometimes that trade is obviously worth making. But teams rarely price it, because the extraction lands on a roadmap as a refactor with an end date, while the actual cost has no end date. Every future change to the shared surface carries coordination it didn\u0026rsquo;t carry before, and it carries it for good.\nThat cost falls hardest on the people who used to be fastest. The team that built the storage layer knew it cold and could change it in their sleep. Extraction takes that fluency and converts it into process. Nobody writes that on the plan, because it doesn\u0026rsquo;t look like work. It looks like the absence of work you used to do without thinking about it.\nI\u0026rsquo;m not arguing against harvesting. The platform is real, the second consumer is real, and pretending otherwise doesn\u0026rsquo;t make the storage layer any easier to reuse. I\u0026rsquo;m arguing that the honest pitch includes the bill. Not \u0026ldquo;we\u0026rsquo;ll extract the shared parts and everyone self-serves.\u0026rdquo; Closer to: we\u0026rsquo;ll extract the shared parts, and from then on the team that moved fastest through this code moves at the speed of everyone who leans on it.\nThe teams that regret harvesting usually aren\u0026rsquo;t the ones who drew the boundary in the wrong place. 
They\u0026rsquo;re the ones who never noticed they\u0026rsquo;d hired themselves a customer.\n","permalink":"https://josephcapozzoli.com/posts/the-platform-tax/","summary":"The moment you harvest a platform out of an app, the app stops being something you can change quietly. That standing cost lands hardest on the team that built it, and most extraction plans never budget for it.","title":"The Platform Tax Lands on the Team That Built It"},{"content":"In a regulated shop, the database administrator never signs off that a batch number is correct. They run the system the number lives in. They provision access, take the backups, keep the lights on. The person who stakes their name on what the number means, in front of someone who can hold up a shipment, sits on the business side. Data management has a word for each of them. The one who runs the system is the custodian. The one who answers for the meaning is the owner. It\u0026rsquo;s a split the field has formalized for decades in frameworks like DAMA\u0026rsquo;s DMBOK, and I\u0026rsquo;ve argued before that it\u0026rsquo;s the load-bearing line in the whole discipline. If IT owns the data, no business decision about it ever sticks, because the custodian can\u0026rsquo;t overrule the function that generated it.\nI keep watching the agent industry try to erase that line.\nAlmost everything we build to make autonomous agents trustworthy is an attempt to promote a custodian into an owner. Better models, so the output needs less checking. Self-review, so the agent grades its own work. Confidence scores, so it can tell you when to believe it. Even the guard pipelines, the kind I spent a month building, belong to this family: layers of machinery whose unstated goal is to let the agent\u0026rsquo;s output stand on its own with no human behind it.\nIt won\u0026rsquo;t, and not because the machinery is weak. 
Because ownership isn\u0026rsquo;t a property you can build into a tool.\nAn owner is the answer to one question: when this is wrong, whose name is on it? That question has a human-shaped hole in it. Not because humans are more accurate than agents (often we aren\u0026rsquo;t), but because accountability is a relationship between a decision and a person who can be held to it. You can make an agent more accurate. You cannot make it accountable, because there is nothing to hold. It has no career to risk, no license to lose, no signature that means anything once the shipment stops.\nYou can make a custodian arbitrarily reliable and it still won\u0026rsquo;t be an owner, because the gap between them isn\u0026rsquo;t competence; it\u0026rsquo;s liability.\nThat\u0026rsquo;s why \u0026ldquo;trust the agent\u0026rdquo; is a category error. It points the engineering at the one property a custodian definitionally lacks, and then acts surprised when no amount of model quality closes the gap.\nThe regulated world settled this a long time ago. I worked through how its rulebook maps onto agents last time, so I won\u0026rsquo;t relitigate it here. The one piece that doesn\u0026rsquo;t port is the piece that matters most: the signer. You can rebuild every control around an electronic signature and still have no one whose name it carries. The custodian is doing the work, and there\u0026rsquo;s no owner in the room.\nSo the problem most teams are solving is the wrong one. The question isn\u0026rsquo;t how to make the agent trustworthy enough to ship without a human. It\u0026rsquo;s how to keep a human owner attached to output that arrives faster than any human can read it.\nThose are different problems with different answers. The first is a model problem, and it\u0026rsquo;s the one drawing the funding, because it promises to remove the human, and the human is the slow, expensive, unscalable part. 
The second is a governance problem, and it\u0026rsquo;s the one that actually has to be solved, because the human is the only part that can be accountable. Spending on the first to avoid the second is how you end up with a system that produces beautiful, well-tested, fully audited output that nobody will put their name on.\nThe shape of an answer is already visible, and it isn\u0026rsquo;t a smarter agent. It\u0026rsquo;s the owner/custodian split, ported across. The agent is the custodian: it executes, it logs, it stays in its lane. The human who authorized the work is the owner, and the system\u0026rsquo;s only real job is to keep that authorization bound to what actually shipped, even when the agent took twenty steps the human never watched.\nThe binding I use for that is the hashed-spec approval I described last time, and its weak point is the one I named there: stretch enough autonomous steps between the signature and the shipment, and the two stop having much to do with each other. That\u0026rsquo;s the open edge. Not a smarter agent. A binding that survives the distance.\nNotice what this does to the role everyone assumes is obsolete. The job that survives the agent era isn\u0026rsquo;t the one that writes the code. The agent does that now, faster than you, and the guard pipeline keeps it inside the lines. The job that survives is the owner, the person willing to put their name on output they didn\u0026rsquo;t type and be wrong about it in public. We keep trying to automate that person away because they\u0026rsquo;re the bottleneck. They\u0026rsquo;re also the only part a regulator, a customer, or a court can address.\nSo when someone ships the first real agent-governance product, it won\u0026rsquo;t be a more trustworthy model. It\u0026rsquo;ll be the thing that keeps a named human bound to a machine\u0026rsquo;s output at machine speed. 
And it\u0026rsquo;ll look less like AI than like a signature.\n","permalink":"https://josephcapozzoli.com/posts/agent-is-a-custodian/","summary":"The whole agent-reliability project is trying to engineer the one thing a tool can\u0026rsquo;t have: accountability. Data governance already named this. The agent is a custodian, and we keep trying to promote it to owner.","title":"The Agent Is a Custodian, Never an Owner"},{"content":"I spent a month building the controls that let an autonomous coding agent run unattended without scaring me. I wrote about that stack already: append-only audit logs, per-agent budget caps with hard stops, approval gates with hash-based validation so a spec edited after sign-off automatically voids its own approval. None of it was the model. All of it was provenance and constraint.\nThen I noticed I\u0026rsquo;d rebuilt a federal regulation from 1997.\nThat regulation is 21 CFR Part 11, the FDA\u0026rsquo;s rule for when an electronic record and an electronic signature are trustworthy enough to base a decision on. In my day job I work on a regulated life sciences platform, so Part 11 is wallpaper. For years I read it as compliance tax, the cost of operating in a space where an inspector can hold up your product. Building Stratum, my own guard pipeline, I wrote the same requirements from scratch, not because I was copying them, but because the problem left me no other shape to draw.\nThe mapping is almost embarrassing. Part 11 wants system-generated audit trails that record who did what and when, and that you cannot silently alter after the fact. My append-only log is that. It wants every record attributable to a specific actor. My cross-session identity tracking is that, except the actor is an agent instead of a person. It wants a signature to actually bind to what it signed, which means the approved thing cannot change underneath the approval. 
My hash gate is precisely that control, the one that says if the spec moved, the sign-off is void.\nThe agent world spent the last year rediscovering 21 CFR Part 11 and never learned its name.\nIt goes deeper than the audit trail. Life sciences measures record integrity against a standard called ALCOA+: Attributable, Legible, Contemporaneous, Original, Accurate, then Complete, Consistent, Enduring, Available. Read that list as a spec sheet for an unattended agent and every line earns its place. Contemporaneous means you log the action the moment it happens, not reconstruct it from a stack trace the next morning. Original means the artifact that ran is the one that got approved, not a later edit wearing the same name. Attributable means you can put a name on the actor. Each one is a property an unattended agent will quietly violate the first time you give it room.\nThis is not a coincidence, and that\u0026rsquo;s the part worth sitting with. Part 11 exists because the FDA had to decide, decades before any of this, when a record produced by a system instead of a person\u0026rsquo;s pen is sound enough to bet a patient\u0026rsquo;s safety on. That is the agent question, word for word. The domains have nothing to do with each other. The trust problem is identical: a non-human process generated a record, and somebody downstream has to stake a decision on it without having watched it happen.\nHere is where the old regulation stops being enough, and it\u0026rsquo;s the same place the agent industry is about to get stuck. Part 11 assumes a human signs. The electronic signature is the entire load-bearing idea. A person attaches their name and, with it, accountability. The audit trail just proves they did. But an agent has no name to sign with and no accountability to attach to one. You can log everything it did with perfect fidelity and still have nobody standing behind the result.\nSo the real question isn\u0026rsquo;t whether agent governance will look like Part 11. 
It already does, feature for feature. The question is what goes in the signature box when no human reviewed the change. My current answer is the human who approved the spec, bound by hash to the exact diff that ran. That\u0026rsquo;s a proxy for a signature, and an honest one, right up until an agent\u0026rsquo;s chain of decisions gets long enough that the spec a person signed and the output that shipped are separated by a dozen autonomous steps nobody looked at.\nThe regulated world will reach the far side of this first. Not because it\u0026rsquo;s smarter, but because it\u0026rsquo;s the one place where \u0026ldquo;we couldn\u0026rsquo;t prove who decided\u0026rdquo; is a finding that stops a shipment, and a stopped shipment funds the fix. Everyone else gets there a few production incidents later, writes it up as a fresh pattern, and gives it a name that isn\u0026rsquo;t Part 11.\n","permalink":"https://josephcapozzoli.com/posts/agent-world-reinventing-part-11/","summary":"I built audit logs and self-voiding approval gates to make an autonomous agent safe to run unattended, then realized I\u0026rsquo;d rebuilt a regulation the FDA shipped in 1997. The regulated world solved agent trust before agents existed.","title":"The Agent World Is Reinventing 21 CFR Part 11"},{"content":"A while back I argued that our storage app was a platform wearing an app\u0026rsquo;s clothes, and that the way to find the real boundary was to wait for a second team that wanted in and watch where they pushed back. I still think that\u0026rsquo;s the right move. I\u0026rsquo;ve also started to distrust it, and the reason is subtle enough to be worth naming before you build on it.\nThe second consumer doesn\u0026rsquo;t tell you what your platform is. It tells you what your first two consumers don\u0026rsquo;t have in common. Those sound like the same fact. 
They aren\u0026rsquo;t, and the gap between them is where a harvested platform quietly goes wrong.\nMartin Fowler\u0026rsquo;s Harvested Platform is still the right instinct. You don\u0026rsquo;t design the platform up front. You build the app, wait until a second app shows up with overlapping needs, and extract the shared part. He says that beats guessing in advance, and I think he\u0026rsquo;s right. The trouble is what \u0026ldquo;extract the shared part\u0026rdquo; smuggles in.\nTwo consumers give you exactly one comparison. Anything both need reads as platform. Anything only one needs reads as app. The classifier feels principled right up until you notice it runs on a sample size of two.\nA sample of two can\u0026rsquo;t tell the difference between a requirement and a coincidence.\nTwo teams might both want the same thing for reasons that won\u0026rsquo;t generalize past them. Or both might happen not to need a capability a third team would treat as non-negotiable, so you file that capability under app logic, strip it out, and hand the third consumer the job of rebuilding it. The intersection of two use cases is not the shape of the platform. It only looks like it while two is all you have.\nThe pain shows up in predictable places. Endpoints that assume a workflow sequence one team has and another doesn\u0026rsquo;t. Fields in the data model that exist for someone else\u0026rsquo;s compliance requirement. Metadata that encodes a file-naming convention only the first team follows. None of that is hard to spot by category. What\u0026rsquo;s hard is saying, case by case, whether a given thing is the platform\u0026rsquo;s to keep or the first app\u0026rsquo;s to shed.\nThis is the trap sitting underneath the Thinnest Viable Platform. Thin is the right goal. But thin measured against two consumers can cut muscle and call it fat.\nSome of the opinions baked into a storage layer look like application logic and are the only reason the storage is safe to use at all. 
Access control. Lifecycle. The validation that protects the bytes instead of one team\u0026rsquo;s naming convention. If neither of the first two consumers leans on one of those, the two-consumer classifier says it\u0026rsquo;s app logic and should go. It would be wrong, and you wouldn\u0026rsquo;t find out until a team that needed it showed up and couldn\u0026rsquo;t self-serve.\nEvan Bottcher\u0026rsquo;s litmus test, whether a consumer can self-serve without inheriting opinions they didn\u0026rsquo;t ask for, tells you when you\u0026rsquo;ve failed. It doesn\u0026rsquo;t tell you which opinions to keep. It\u0026rsquo;s a test, not a map.\nYou can\u0026rsquo;t wait for the third consumer, because the second one needs an answer now and waiting isn\u0026rsquo;t free. And you can\u0026rsquo;t generalize honestly from two. What\u0026rsquo;s left is less satisfying than a rule and more honest than pretending two is enough. Extract the intersection. Then add back the capabilities that are structurally load-bearing even when only one consumer currently names them, or neither does, because wrongly cutting access control costs far more than wrongly keeping it. Then treat the boundary itself as provisional and version it, so moving the line later is an announced change instead of a break the next consumer absorbs in silence. It\u0026rsquo;s the same discipline I argued for at the data seam, pointed at a different seam.\nHarvesting is never one event, and that\u0026rsquo;s what gets lost when people repeat Fowler\u0026rsquo;s rule like a finish line. The first consumer shapes the app. The second reshapes it into a platform. A third will reshape it again. The only part you actually control is whether your current guesses are written down somewhere the next team can push on them, or buried deep enough that the next team inherits them in silence.\nSo the move isn\u0026rsquo;t to design the platform. 
It\u0026rsquo;s to propose one, on the evidence of two teams, and say \u0026ldquo;provisional\u0026rdquo; out loud, so the third team arrives at a boundary it can move instead of a wall it has to live behind.\n","permalink":"https://josephcapozzoli.com/posts/two-consumers-dont-make-a-platform/","summary":"I argued the way to find a platform\u0026rsquo;s real boundary was to watch the second team that wanted in. The catch I keep running into: two consumers tell you what they have in common, and that\u0026rsquo;s not the same thing as what the platform is.","title":"Two Consumers Don't Make a Platform"},{"content":"Last time I wrote about why three systems gave three different answers to \u0026ldquo;how many samples do we have,\u0026rdquo; and it came down to a role almost nobody seats: the data steward, the person who decides what a word means and defends it the next time someone wants an exception. That post ended on appointing one. It treated governance as the standing function that keeps definitions honest while the org moves underneath them, and left it there. This is the next question. You\u0026rsquo;ve seated the steward. Now what do you build around that seat so it survives contact with a regulator?\nMost data governance programs die the same way. Someone buys a catalog tool, a working group produces a policy binder, and six months later nobody can tell you who is accountable for the number on a batch release certificate. The binder sits on a shared drive. The tool gets renewed once out of guilt, then cancelled. The catalog becomes another place to store the confusion.\nIn life sciences you don\u0026rsquo;t get to fail that quietly. The gap shows up during an inspection, in front of someone who can issue a Form 483 observation or a warning letter and, in the worst case, hold up your product. A recent IQVIA survey put it bluntly: only 31% of pharma companies have a fully implemented data governance strategy, which leaves nearly 70% running on partial coverage or none at all. 
(IQVIA)\nSo before the framework, the catalog, or the org chart, get one thing straight. Data governance is not a tooling problem. It is an accountability problem wearing a tooling costume. A catalog is a filing cabinet. Governance is knowing whose name is on the file and how you prove the contents are true.\nThe regulation is your budget\nLife sciences is different in a way most people get backwards. The regulation is not the obstacle to governance. It is the only argument that reliably gets it funded.\n\u0026ldquo;Better data quality\u0026rdquo; loses every budget fight it enters, because it competes with shipping product. \u0026ldquo;We will fail our next GxP inspection on data integrity\u0026rdquo; wins, because everyone in the room has seen what a warning letter does to a launch timeline. Anchor the program to that, not to abstract data hygiene.\nA handful of regulatory pillars define what \u0026ldquo;good\u0026rdquo; means, and you should be able to name all of them before you write a single policy:\nGxP is the umbrella for the Good Practice regulations: GMP (manufacturing), GLP (laboratory), GCP (clinical), GDP (distribution). If your data supports a GxP process, it is inspectable.\n21 CFR Part 11 (US FDA) and EU Annex 11 set the rules for electronic records and signatures: unique user authentication, system-generated audit trails, validated backups, and controls that prevent silent record alteration. (AVS Life Sciences)\nGAMP 5, from ISPE, is the de facto standard for validating the computerized systems those records live in. It is guidance, not law, but inspectors treat it as the expected approach. (IntuitionLabs)\nALCOA+ is the data integrity standard your records get measured against: Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available. Its core ALCOA principles are referenced directly in the FDA\u0026rsquo;s 2018 data integrity guidance and the MHRA\u0026rsquo;s 2018 GxP guidance. 
(TotalLab)\nEvery policy you write should trace back to one of these. If a rule can\u0026rsquo;t, ask why it exists.\nThe roles, and who actually carries the weight\nGovernance runs on named people with real decision rights, not on a committee that meets quarterly to admire dashboards. The role set is well established (DAMA-DMBOK is the common reference point), and the titles matter less than the boundary between accountability and execution. (DAMA)\nExecutive sponsor, usually a Chief Data Officer or a VP of Quality. Holds the budget and the authority to make a decision stick across functions. Without this, the program is a hobby. In one mid-size pharma turnaround, the first council didn\u0026rsquo;t carry weight until it sat at executive VP level and data made it onto the executive agenda. (HYGHT)\nData governance council, the cross-functional body that sets policy, arbitrates disputes between domains, and prioritizes the work. Co-chaired by business and IT, with every line of business represented.\nData governance office / lead, the person who runs the program day to day so the council doesn\u0026rsquo;t have to. The secretariat, the convener, the one who keeps the issue log honest.\nData owners, senior business leaders accountable for a data domain. The head of manufacturing owns batch data. Regulatory affairs owns submission data. They make the calls on access, definitions, and quality thresholds. They rarely have time for the day-to-day, which is the whole reason the next role exists.\nData stewards, the practitioners who make governance real. They define the critical data elements, set and monitor quality rules, maintain the metadata, and resolve issues when a number doesn\u0026rsquo;t reconcile. This is the load-bearing role, and it is the one organizations consistently underfund. A governance program with great owners and no stewards is a press release.\n
Data custodians, the IT side: database administrators, platform engineers, the people who run the systems, provision access, and maintain backups under the stewards\u0026rsquo; direction.\nQuality assurance, who own validation, audit readiness, and the SOPs that turn policy into inspectable evidence.\nRegulatory affairs, who care about submission data standards like IDMP for product master data and CDISC for clinical, because that is the form the data has to take when it leaves the building.\nComputer system validation specialists and information security, who make the systems defensible under Part 11 and Annex 11 and keep patient and personal data inside GDPR and HIPAA lines.\nOne position you should hold firmly: data owners must be business leaders, not IT. If IT owns the data, no business decision about it ever sticks, because IT can\u0026rsquo;t overrule the function that generates it. IT is the custodian. The business owns.\nThe order to do it in\nThe biggest failure mode after \u0026ldquo;we bought a tool\u0026rdquo; is \u0026ldquo;we tried to govern everything at once.\u0026rdquo; Don\u0026rsquo;t. Sequence it.\nFind the wound and the sponsor. Anchor the program to a specific risk or failure: an audit observation, a number nobody trusts for a submission, a domain that has burned you before. The wound is what gets you the sponsor, and the sponsor is what gets you everything else.\nScope narrow. Pick one regulated domain that hurts and govern that. Product master data for IDMP is a common and tractable starting point. Batch records are another. Enterprise-wide rollouts collapse under their own weight before they show value.\nStand up the operating model. Council, charter, and decision rights. Write the charter in plain language with a specific purpose (\u0026ldquo;improve the quality of data on the registration system\u0026rdquo;), not a generality. Map a RACI so every critical decision has exactly one accountable name. (DataStrategyPros)\nInventory and classify. 
Identify your critical data elements, trace their lineage, and tag each as GxP or non-GxP. You cannot protect what you can\u0026rsquo;t see, and you cannot prioritize if everything looks equally important.\nWrite policies that trace to the regulation. Anchor data quality rules to ALCOA+ and the relevant GxP requirement. A policy that can\u0026rsquo;t name the regulation it serves is decoration.\nMake stewardship operational. Assign named stewards per domain, stand up an issue log, define the quality rules, and run remediation as a real workflow with deadlines, not a backlog nobody owns.\nMeasure against inspection risk. Track data quality metrics that map to audit exposure, not vanity numbers. \u0026ldquo;Percentage of critical data elements with a defined owner and quality rule\u0026rdquo; beats \u0026ldquo;records processed.\u0026rdquo;\nValidate and document. Bring the systems under CSV per GAMP 5, with audit trails, electronic signatures, and access controls that satisfy Part 11 and Annex 11. In a regulated shop, undocumented governance is the same as no governance.\nExpand by maturity, one domain at a time. Use the first domain as the proof, then move to the next. Governance grows by demonstrated value, never by mandate alone.\nCommunication is most of the job. One practitioner estimate puts data governance at 80 to 95 percent communication and the rest tooling, which sounds like an exaggeration until you watch a technically perfect program fail because nobody told the people entering the data why any of it mattered. (IBM)\nThe test of whether you\u0026rsquo;ve actually built something isn\u0026rsquo;t the size of the policy binder. It\u0026rsquo;s whether a steward, asked who is accountable for a number and how they know it\u0026rsquo;s accurate, can answer on the spot without convening a meeting. 
That is the question an inspector asks.\nThe samples post ended with a count that wasn\u0026rsquo;t a count: you have as many samples as you have agreements about what a sample is. Governance is the machinery that turns those agreements into something an inspector will sign off on. The binder doesn\u0026rsquo;t do that. A named human, backed by a regulation and a system that proves they\u0026rsquo;re right, does.\n","permalink":"https://josephcapozzoli.com/posts/data-governance-that-survives-an-inspection/","summary":"The samples post ended on seating a data steward. This is what you build around that seat in a regulated life sciences org: the roles you actually need, the order to do it in, and why the regulation is the only budget argument that ever works.","title":"Data Governance That Survives an Inspection"},{"content":"Someone asked how many samples we have. It should have been a one-line query. I got three numbers from three systems, none of them matching, and a meeting to reconcile them.\nAround the same time, leadership wanted a semantic layer, I\u0026rsquo;d been pushing for a data catalog, and a planning doc had \u0026ldquo;adopt data contracts\u0026rdquo; sitting in it with no owner. Four initiatives. And if you\u0026rsquo;d stopped me in the hallway, I couldn\u0026rsquo;t have given you a clean answer for why we needed all four, which one comes first, or who builds what. So before spending a budget cycle on the wrong end of it, I worked out the dependency order.\nHere\u0026rsquo;s the order. And here\u0026rsquo;s why the whole thing is so hard to keep standing.\nThe first question is why we need all these separate things at all. They sound like the same thing described by four different vendors.\nThey\u0026rsquo;re not. Each one answers a different question, and \u0026ldquo;how many samples do we have\u0026rdquo; needs every one of them answered before it has a number.\nStart with the word. What is a sample? The vial that showed up at the loading dock? 
If I split that vial into ten aliquots and freeze them separately, do I have one sample or eleven? If I run one specimen through two assays, is that one row or two? If I pool five aliquots back into a single tube, what happens to the count? Is a fully consumed sample still a sample? Is a control? Every one of those is a question about grain: what does one row in the samples table actually represent. Until somebody decides, the table has no grain, and a table with no agreed grain cannot be counted. The set of those decisions, what the words mean and what one of a thing is, is your vocabulary: the controlled definitions everything downstream inherits.\nThen there\u0026rsquo;s how the words relate. A subject yields many specimens. A specimen splits into many aliquots. Aliquots pool, split again, and derive into new entities, each of which needs its own identity and a key that ties it back to its parent. Those words, sample and specimen and aliquot and subject and batch, aren\u0026rsquo;t synonyms. They\u0026rsquo;re a graph. Write that graph down, the entity types and the relationships between them, and you have an ontology. Not the academic kind. The working kind: the entities, their keys, and how lineage flows from parent to child. Most orgs use these five words interchangeably in conversation and then wire them together five different ways across five systems.\nNow the report. The dashboard says 50,000 samples. Fifty thousand of what? Physical containers in freezers? Logical specimen records? Distinct samples regardless of how many tubes they were split into? The number is meaningless until you know which definition it encoded and which tables it ran against. The semantic layer is where you define a measure like \u0026ldquo;available sample\u0026rdquo; exactly once, against named tables, so every dashboard asking that question gets the same answer back. It\u0026rsquo;s the translation from the vocabulary to the physical schema. 
Skip it and every analyst reimplements the definition in their own SQL, which is precisely how you end up with three numbers.\nAnd where does the data live? Sample records sit in the LIMS, in instrument exports, in the electronic lab notebook, in the freezer inventory. Four systems, four schemas, four owners, and no standing agreement on which one is the system of record for a given field. The catalog is the inventory of all of it: what data exists, which system holds it, what each field means, and where it came from. It\u0026rsquo;s the map, plus the lineage that proves the map is still current.\nLast one. When the inventory team renames a status, or changes what \u0026ldquo;available\u0026rdquo; means by splitting it into consumed, depleted, and archived, every count downstream shifts and nobody gets a heads-up. A data contract is the versioned interface at that boundary. The producer commits to a schema and a set of semantics, and breaking either one becomes an announced, versioned change instead of a surprise your dashboard absorbs in silence.\nFive things. Five different questions. You need all of them because \u0026ldquo;how many samples\u0026rdquo; touches all five, and a gap in any one is enough to hand you three numbers and a meeting. And none of the five is a build-once artifact. Each one is a living thing somebody has to keep true after it ships.\nSo where do you start? This is the part that matters, because the order you\u0026rsquo;ll be asked for is the reverse of the order that works.\nLeadership asks for the semantic layer. Of course they do. It\u0026rsquo;s the visible one, the one that produces the dashboard, the one you can fund and point at in a review. But the semantic layer is an output. It translates meaning. It cannot translate a meaning nobody has agreed on yet. You can\u0026rsquo;t define \u0026ldquo;available sample\u0026rdquo; as a measure until someone with the authority to decide has said what an available sample is. 
Start at the top and what you build is a very precise machine for encoding a disagreement.\nWhich leads to the question underneath all of it: who defines what? And this one has no technical answer. Defining what a sample is looks like a data problem, so it lands on the data team by default. We can model anything you describe. We cannot tell you whether a consumed aliquot still counts, because that isn\u0026rsquo;t a modeling decision. It\u0026rsquo;s a judgment about how the business treats its own work. The role that\u0026rsquo;s supposed to make that call has a name: the data steward, the working front of what most orgs file under data governance. A steward owns the definitions for a data domain. They\u0026rsquo;re accountable for what the words mean, they sign off on changes to them, and they sit close to the work, on the science side, not inside the data team. Most orgs publish a governance policy and never seat an actual steward. So the definition falls to whoever happened to write the query, and you get four reasonable definitions, none of them official, all of them running in production.\nSo who owns the chain end to end? Nobody, and that is the whole problem. It runs from meaning at the bottom to dashboards at the top. The bottom belongs to the people doing the science. The top belongs to the data and platform teams. There\u0026rsquo;s a seam through the middle that nobody designed. It\u0026rsquo;s just where one org\u0026rsquo;s mandate ends and the next one\u0026rsquo;s begins. The vocabulary sits on one side. Every tool that depends on the vocabulary sits on the other.\nThat\u0026rsquo;s why it rots, and the rot is the part people underestimate. Defining a sample once is hard. Keeping that definition true for three years, through reorgs, new instruments, and a migration off the old LIMS, is harder, and it never finishes. 
Every time the bench changes how it works, or the platform team ships a new system, one side of the seam moves and the other doesn\u0026rsquo;t hear about it. The catalog goes stale. The contracts start to lie. The semantic layer keeps confidently returning a number that used to be true.\nThis is what data governance actually is. Not the policy PDF. The standing function that keeps definitions, lineage, and contracts honest while the org moves underneath them. Treat it as a one-time project and you\u0026rsquo;ve signed up to rebuild the same foundation every couple of years. None of the hard part is technical. The modeling is tractable. The governance is the gap.\nThe way out runs backwards Everybody points at the top of the chain. You start at the bottom, and you start by seating a data steward. Not a tool, not a project. A named person on the science side with the authority to decide what a sample is and make the decision stick. If that seat is empty, nothing above it holds and no tooling will save you. This is the step everyone skips, because it\u0026rsquo;s a governance problem, not an engineering one, and you can\u0026rsquo;t buy your way past it.\nThen the steward defines the vocabulary, and keeps it small. The handful of entities that carry real weight: sample, specimen, aliquot, subject. Pin the grain of each one. A short controlled definition everyone honors beats an exhaustive one nobody finished.\nThen the relationships. The working ontology: the entity types, their keys, and the parent-to-child lineage. Enough that two engineers reading it would model the same graph, and no more than that.\nNow the catalog and the semantic layer finally have something to stand on. The catalog inventories real definitions instead of guessing at them. The semantic layer encodes the steward\u0026rsquo;s measures once, against the system of record, instead of reinventing them per query. 
This is where the data team gets to run, and where the spend actually buys something, because it\u0026rsquo;s pointed at meaning that already exists.\nContracts come last, and they go at the seams. Once the meaning is settled and the tools encode it, the contract is the versioned boundary that keeps the bench and the platform from drifting apart again. It\u0026rsquo;s what makes the agreement survive the next reorg and the next schema migration.\nAnd then it isn\u0026rsquo;t finished, because this part has no end state. The steward keeps the definitions current as new assays and instruments show up. Changes to what a sample means go through review instead of surfacing in a dashboard six months later. The catalog and the contracts get checked against reality on a cadence, not at a launch. That standing discipline is the governance, and it\u0026rsquo;s the whole difference between a foundation that holds and one you quietly rebuild every two years.\nThe steps are easy to list and brutal to execute, and almost all of the difficulty sits in the first one, because the first one isn\u0026rsquo;t an engineering task. It\u0026rsquo;s getting a person to put their name on a definition and defend it the next time a team wants an exception.\nThe catalog, the semantic layer, the contract tooling: you can buy every bit of it. A data steward who will own what a sample is, and keep owning it as the org shifts around them, you have to appoint. That\u0026rsquo;s the line nobody puts on a roadmap.\nSo the question was never which semantic layer. It was who gets to decide what a sample is. If you can\u0026rsquo;t name that person, you don\u0026rsquo;t have a tooling gap. You have your actual first project.\n","permalink":"https://josephcapozzoli.com/posts/how-many-samples-do-we-have/","summary":"Someone asked how many samples we have and three systems gave three different numbers. 
The dependency order behind vocabulary, ontology, semantic layers, catalogs, and contracts, and why the piece everything stands on is a role most orgs never seat: the data steward.","title":"How Many Samples Do We Have?"},{"content":"A few weeks back, Birgitta Böckeler at Thoughtworks formalized what a lot of us have been building piecemeal: the harness around a coding agent is a system of guides (steering it before it acts) and sensors (catching what it does wrong). Computational checks first, inferential checks second. Distribute them across the change lifecycle. Iterate the harness whenever the same problem shows up twice. OpenAI\u0026rsquo;s team describes the same shift: \u0026ldquo;the discipline shows up more in the scaffolding rather than the code.\u0026rdquo;\nThe framework is right. The iteration step is where it needs a sequel.\nIn her model, iteration is human work. Whenever an issue happens multiple times, the feedforward and feedback controls should be improved. A person watches, notices, edits the rules. That assumption breaks the moment you stop watching.\nIn the operational layer I described last time, agents run overnight on schedules, across multiple repos, on a budget. Nobody is sitting there spotting the third occurrence of the same misunderstood instruction. By morning, the same broken changeset shape has produced four rejections across three repos. The harness held. It also forgot.\nA pipeline that resets every run is not a regulator. It is a checkpoint.\nThe cheapest checks are also the dumbest Computational-first ordering is a cost win. Empty diff. Compilation. Scope. Structural invariants. Run those before you run anything that costs a token, and most agent mistakes never reach the expensive stage. I covered why in the first post and Böckeler covers it more rigorously in hers. Nobody serious disagrees on the ordering.\nThe part nobody talks about is that those cheap checks are stateless. 
They don\u0026rsquo;t know what they rejected yesterday. They reject the same thing today, and the agent burns the same tokens generating it, and the operator burns the same minutes triaging it. The compounding cost isn\u0026rsquo;t in the gate. It\u0026rsquo;s in the work the gate keeps repelling.\nA few examples of what a sensor with memory could surface that a stateless one cannot:\nThis file has been touched by three failed runs this week. Maybe the spec is wrong, not the agent. This invariant gets violated more often than it did a month ago. Drift, but slow. The semantic reviewer already rejected this approach, with this specific reason, on this codebase. Stop generating it. Computational-first is a cost optimization. Memory is a learning optimization. The two compose. Most pipelines stop at the first.\nSensors as state, not as runs Most teams treat the guard pipeline as a series of independent runs. CI passes or fails. The judge says ship or don\u0026rsquo;t ship. Pass-fail-pass-fail with no accumulation.\nTreat the same sensors as state and the picture changes. Every rejection is a labeled example: agent did X, reviewer said Y, pipeline returned Z. Store those rejections in a ledger keyed by file, intent, and changeset shape, and the ledger becomes a private benchmark of how this agent fails on this codebase. After a few hundred runs you can cluster the failures, surface the patterns, and feed them forward as new guides. The keying is where this gets hard, and most teams will get it wrong before they get it right.\nThis is Böckeler\u0026rsquo;s steering loop, but the human\u0026rsquo;s role shifts. Instead of noticing the patterns, the human approves them. The system finds repeats. The human promotes them to rules. The system invalidates rules that stop firing.\nThe point isn\u0026rsquo;t autonomy. It\u0026rsquo;s leverage. The harness gets sharper without anyone editing it by hand.\nThe behavior gap is still there Böckeler is honest about what\u0026rsquo;s hard. 
Maintainability and architecture-fitness harnesses use existing tooling. The behavior harness is the elephant: did this changeset actually do what was asked? AI-generated tests aren\u0026rsquo;t trustworthy enough yet. The harness can\u0026rsquo;t replace the human review step.\nMemory doesn\u0026rsquo;t solve this. It just makes it less expensive to be wrong.\nWhat it can do is route attention. The system learns which changeset shapes correlate with later defects, and which intents tend to be misinterpreted. That\u0026rsquo;s not behavior verification. It\u0026rsquo;s behavior triage. And triage is the part you never have enough time for.\nThe unsexy claim is the right one: a regulator with memory doesn\u0026rsquo;t replace the human reviewer. It tells the human reviewer where to look first.\nWhy this isn\u0026rsquo;t a feature You don\u0026rsquo;t bolt memory onto a guard pipeline by adding a database. The pipeline has to be designed around persistent state from the start. Stages need to read prior decisions before running. Rejections need to be machine-readable, not stack traces. The semantic reviewer has to be able to ask: have I seen this before, and what happened.\nMost existing tools were never built for this. Linters don\u0026rsquo;t know what last week\u0026rsquo;s lint runs found. Test runners don\u0026rsquo;t know which tests failed in adjacent commits. CI systems forget everything between green and red. They\u0026rsquo;re checkpoints.\nWhen the harness becomes a learning system, the components inside it have to support that. The retrofit is painful. The greenfield version is straightforward. That asymmetry is going to determine who ships it first.\nThe asset, not the cost There\u0026rsquo;s a version of this that gets philosophical about emergent behavior and self-improving systems. Skip it. The boring version is the right version.\nThe harness is a long-lived engineering asset. 
Assets that don\u0026rsquo;t accumulate value over time aren\u0026rsquo;t really assets. They\u0026rsquo;re recurring costs. Build the guard. Order the checks cheap-to-expensive. Then give the whole thing a memory.\nThe teams that figure this out first will look like they have better agents. They won\u0026rsquo;t. They\u0026rsquo;ll have a regulator that reads its own notes.\n","permalink":"https://josephcapozzoli.com/posts/the-harness-that-forgets/","summary":"Birgitta Böckeler\u0026rsquo;s harness-engineering framework treats iteration as human work. Someone watches, notices, edits the rules. That assumption breaks the moment agents run overnight. The next move is sensors with memory.","title":"The Harness That Forgets"},{"content":"Enterprise data strategy has changed its destination. A few years ago the target was a unified data platform: break the silos, get to a single source of truth. Today the target is vector-readiness. Make the data AI-ready. Embed everything.\nThe step between those two didn\u0026rsquo;t go away.\nEmbeddings inherit whatever structure exists in the source data. Whatever shape it\u0026rsquo;s in when you vectorize it is the shape it carries into the model. If your systems agree on what a customer is, your embeddings will too. If they don\u0026rsquo;t, the embeddings encode the disagreement. The silos don\u0026rsquo;t disappear. They move into the vector space, where they\u0026rsquo;re harder to see and harder to debug.\nThis isn\u0026rsquo;t a reason to slow down on vectors. Embeddings are useful. They make semantic search work, they power retrieval-augmented generation, they let AI systems find relationships keyword search would miss. The point isn\u0026rsquo;t to delay them. It\u0026rsquo;s to make sure they land on something they can stand on.\nWhat the Middle Step Is Between raw data and useful embeddings, there\u0026rsquo;s a layer that answers three questions: what do I have, where did it come from, and what does it mean. 
Some teams call this a catalog. Some call it a semantic layer. Some call it the data foundation. The name matters less than the work, which is making your data legible to something that isn\u0026rsquo;t you.\nIf you can describe your core entities clearly, if you can trace where the data originates, if the relationships between things are written down somewhere instead of living in the heads of three senior engineers, you have most of what you need. The remaining work is almost always smaller than standing up a vector database.\nOnce that foundation is in place, vectors become what they\u0026rsquo;re supposed to be. A layer that amplifies clarity, not one that\u0026rsquo;s asked to invent it. The demos get better. The answers get more consistent. The system holds up when people ask harder questions, because the meaning was there before the math ran.\nThat\u0026rsquo;s the data side. The operational side is its own missing layer, and most teams aren\u0026rsquo;t shipping that one either.\nIf there\u0026rsquo;s an AI initiative on your roadmap, ask those three questions before you ask which vector database to use. If the answers are clear, you\u0026rsquo;re ready. If they aren\u0026rsquo;t, no embedding strategy is going to fix that for you.\nThat\u0026rsquo;s the step. Not instead of vectorizing. Before it.\n","permalink":"https://josephcapozzoli.com/posts/the-step-between-the-catalog-and-the-vector/","summary":"Enterprise data strategy has moved its destination from \u0026lsquo;break the silos\u0026rsquo; to \u0026lsquo;vectorize everything.\u0026rsquo; The step in between is the layer that makes vectors actually work, and it\u0026rsquo;s the one most roadmaps quietly skip.","title":"The Step Between the Catalog and the Vector"},{"content":"A few weeks ago I wrote about the missing verification layer in AI code generation. 
The argument: nobody\u0026rsquo;s building the pipeline between \u0026ldquo;agent wrote code\u0026rdquo; and \u0026ldquo;code is safe to ship.\u0026rdquo;\nI built that pipeline. It works. And it immediately exposed the next problem, and the one after that, and the one after that.\nThe model is not the system. The system is everything required to make model output selectable, constrainable, auditable, affordable, and stoppable. I\u0026rsquo;ve spent the past month building that system across two projects: one that generates and guards code, and one that orchestrates autonomous workflows through GitHub Actions. The takeaway is simple: once you go past interactive pair-programming, reliability stops being a model-quality problem and becomes a systems-design problem.\nEach layer I built exists because the layer below it wasn\u0026rsquo;t enough.\nTrusting the output I covered this in detail last time. The short version is a seven-stage guard pipeline, ordered from cheapest checks to most expensive. Empty diff. Scope enforcement. Compilation. Structural invariants. Orphan detection. Cross-session identity tracking. Only after the deterministic stages pass does a separate AI reviewer evaluate whether the changes match the original intent.\nMost agent problems are caught without burning a single token. The ordering matters. By the time the expensive stage runs, the changeset already passes every binary check I could think of.\nGitGuardian arrived at a similar pattern for security validation. Cursor\u0026rsquo;s FastRender experiment ran 2,000 concurrent agents across a million lines of code. At that scale, this layer stops being optional.\nBut a guard pipeline only protects you after a task is chosen. It says nothing about whether the task was worth doing.\nTrusting the task When an agent runs unattended, someone still has to pick the work. If the task is vague, the agent interprets it creatively. If the task is already done, the agent finds something else to do anyway. 
If the task has unmet dependencies, the agent works around them in ways you didn\u0026rsquo;t want.\nI built a structured markdown backlog where each task carries a title, scope, constraints, acceptance criteria, and status. An operator script reads the backlog, finds the highest-priority unblocked item, validates that it has enough substance to act on, and sets up the workspace: clean git state, feature branch, snapshot of the starting position.\nThe agent doesn\u0026rsquo;t decide what to work on. The system decides. The agent executes within those boundaries.\nThis matters more as task horizons expand. Anthropic\u0026rsquo;s 2026 Agentic Coding Trends Report notes agents now average 20 autonomous actions before requiring human input, with session lengths up from 4 minutes to 23 minutes. Longer runs make bad task selection more expensive. The backlog and operator layer exists because the guard pipeline doesn\u0026rsquo;t care what the agent was supposed to be doing. It only cares whether the output is structurally sound. Those are different questions.\nTrusting the operation Guard pipeline validates output. Operator selects tasks. The system runs overnight on a schedule. Now the failures have nothing to do with code quality or task selection.\nAn agent that burns through your API budget at 2am. An agent that pushes to a shared branch because nothing enforced dry-run. An agent that runs the same task twice because the last run didn\u0026rsquo;t update status. An agent that acts on a spec someone edited after it was approved.\nDeny by default. Require explicit opt-in for anything with side effects. That\u0026rsquo;s the principle. 
The implementation is scheduled execution through GitHub Actions and cron, per-agent budget caps with hard stops, dry-run as the default, append-only audit logs, and approval gates with hash-based validation so a spec change automatically invalidates the approval.\nSimon Willison\u0026rsquo;s lethal trifecta frames the trust model: access to private data, exposure to untrusted content, and the ability to communicate externally. Combine all three and an attacker can steer the agent into leaking data. If your agent touches even two of them, you need every gate I just described. Very few people are shipping this operational layer as a coherent product. Most of us are assembling it from parts.\nThe part that gets weird The guard pipeline self-shipped features in one session. In a later session, it built on its own prior output without being told those features existed. The system maintains persistent awareness of what it has built, so when a new task requires capabilities from a previous run, the agent can discover them through context and build on top of them.\nThe operator scripts that select tasks and create branches were themselves refined through the same backlog-driven process they orchestrate. I added a task to improve the task selector. The system selected that task, built the improvement, and the guard pipeline validated it before it merged.\nThe stack is no longer just supporting the work. It is participating in the loop that shapes, constrains, and improves the next version of itself. This isn\u0026rsquo;t theoretical self-improvement. It\u0026rsquo;s bounded compounding: agent writes code, guard pipeline validates it, and validated code becomes context for the next run. Each session compounds on the last, but only through the same gates everything else passes through.\nWhere this actually stands The infrastructure-to-useful-output ratio is humbling. More engineering hours went into making agents reliable than the agents have saved. That math might flip eventually. It hasn\u0026rsquo;t flipped yet.\nThe ecosystem is catching up. 
Anthropic is documenting the patterns. GitGuardian is shipping guardrails for the output layer. But the full operational stack (selection, constraint, memory, budgets, approval) is still almost entirely custom work.\nPast a certain point, you are not adopting an agent. You are operating a stack. The agent is just the part people can see.\n","permalink":"https://josephcapozzoli.com/posts/the-stack-nobody-talks-about/","summary":"The model is not the system. The system is everything required to make model output selectable, constrainable, auditable, and stoppable. I spent a month building that system. Here\u0026rsquo;s what it actually looks like.","title":"The Stack Nobody Talks About"},{"content":"I work on a storage application. It stores files, manages metadata, enforces validation rules, runs approval workflows. It works. It\u0026rsquo;s been working for a while.\nThen a second team wanted to use it.\nThey didn\u0026rsquo;t want the approval workflows. They didn\u0026rsquo;t need the validation rules. They didn\u0026rsquo;t care about the metadata schema we\u0026rsquo;d built around our specific use case. They just wanted to store files and get them back. But they couldn\u0026rsquo;t, because all of that application logic is woven into the storage layer. You can\u0026rsquo;t take the storage without taking the opinions.\nWhat that actually looks like: a team that needs get and put has to deploy two applications and inherit north of 70% of functionality they will never use. They needed a storage layer. They got a whole product with someone else\u0026rsquo;s workflows bolted on.\nThat\u0026rsquo;s when I realized: this isn\u0026rsquo;t an app with a storage feature. It\u0026rsquo;s a platform wearing an app\u0026rsquo;s clothes.\nMartin Fowler wrote about this in 2003. He called it a Harvested Platform. You build an app. A second app shows up with overlapping needs. The shared parts get extracted into a platform. 
He contrasts this with a Foundation Platform, building the platform first, and says the harvested approach \u0026ldquo;seems to work better in practice.\u0026rdquo; You can\u0026rsquo;t know what the platform should look like until at least two consumers have proven what they actually need. Before that, you\u0026rsquo;re guessing. After that, you\u0026rsquo;re harvesting.\nEvan Bottcher has the litmus test: can your consumers self-serve without inheriting opinions they didn\u0026rsquo;t ask for? If not, you don\u0026rsquo;t have a platform. You have an app that other people are forced to pretend is one.\nBuilding the app as an app was the right call. The platform was invisible until the second consumer made it visible. That\u0026rsquo;s not a failure. The unhealthy path is trying to design the platform before anyone needs it.\nSo now I\u0026rsquo;m finding the seams. We\u0026rsquo;ve spent the last few weeks on this. Talking to teams who already consume the system, asking them where it hurt. What did you have to adopt that you didn\u0026rsquo;t want? What would you have built differently if you could have started from the storage layer alone? Where did the app\u0026rsquo;s opinions force you into workarounds?\nThe answers show up everywhere. In the API structure, where endpoints assume a specific workflow sequence even if your use case doesn\u0026rsquo;t have one. In the data model, where fields that exist for one team\u0026rsquo;s compliance needs get inherited by everyone. In metadata schemas that encode one group\u0026rsquo;s file naming conventions as if every consumer organizes work the same way.\nThe hard part isn\u0026rsquo;t sorting things into \u0026ldquo;platform\u0026rdquo; and \u0026ldquo;app.\u0026rdquo; It\u0026rsquo;s the stuff in between. Some opinions look like application logic but turn out to be the only reason the storage layer is usable at all. Access control isn\u0026rsquo;t one team\u0026rsquo;s opinion. Neither is basic lifecycle management. 
Strip those out in the name of keeping the platform thin and you end up with a storage layer nobody can safely use without rebuilding those capabilities themselves.\nThe questions that are actually helping us find the lines:\nDoes every consumer need this capability, or just the one that built it? If only one team needs a three-step approval flow, that\u0026rsquo;s app logic. If every team needs some form of access control, that\u0026rsquo;s platform.\nIs this rule about the data, or about one team\u0026rsquo;s process around the data? Validation that enforces file format integrity belongs in the platform. Validation that enforces a specific team\u0026rsquo;s naming convention doesn\u0026rsquo;t.\nIf we remove this, do consumers gain self-service or just lose safety? The Thinnest Viable Platform isn\u0026rsquo;t the thinnest possible layer. It\u0026rsquo;s the thinnest layer that\u0026rsquo;s still viable. Cut too deep and every consumer rebuilds the same guardrails independently.\nI don\u0026rsquo;t have the ending to this story yet. I\u0026rsquo;m still bucketing, still sitting with teams who can tell me exactly which opinions they were forced to inherit and which ones they actually needed. But the reframe itself changed how we work. The moment I stopped asking \u0026ldquo;what should we refactor?\u0026rdquo; and started asking \u0026ldquo;what would a second consumer need if they\u0026rsquo;d never seen our workflows?\u0026rdquo; the decisions got clearer.\nYou don\u0026rsquo;t find the lines by staring at the code. You find them by asking the people who hit the walls.\n","permalink":"https://josephcapozzoli.com/posts/your-app-is-wearing-a-platforms-clothes/","summary":"You built an app. It worked. Then a second team showed up and couldn\u0026rsquo;t use it without inheriting every opinion you baked in. 
Congratulations, there\u0026rsquo;s a platform hiding inside your application.","title":"Your App Is Wearing a Platform's Clothes"},{"content":"I used to think publishing a personal site meant AWS, WAF, Route 53, and a weekend of debugging. Turns out it\u0026rsquo;s a markdown file and git push.\nMy first AWS course was in 2018. The textbook had printed screenshots of the AWS console UI that you were supposed to follow along with before opening the actual console. Think about how insane that is. The AWS UI changes so frequently they\u0026rsquo;d need to republish the book every 45 minutes to keep it accurate. But that was the world, and I learned in it.\nSince then I\u0026rsquo;ve picked up three AWS certs: Cloud Practitioner, SysOps Administrator, and Solutions Architect Associate. I\u0026rsquo;ve spent years building things on AWS. So when I decided to start a blog, my brain went straight to the playbook I know.\nThe plan I almost executed looked like this:\nElastic Beanstalk for the website\nRoute 53 for DNS\nWAF for security\nACM for the TLS cert\nA full build pipeline to deploy changes\nS3 and CloudFront if I went the \u0026ldquo;simpler\u0026rdquo; static route\nThat\u0026rsquo;s a weekend project minimum: Route 53 hosted zone setup, DNS propagation that can take hours on its own, CloudFront distribution configuration, and IAM permission debugging when something doesn\u0026rsquo;t connect right. I\u0026rsquo;ve done this before. Multiple times. And every time it felt like the reasonable way to host a website, because that\u0026rsquo;s the toolset I know.\nHere\u0026rsquo;s what actually happened today.\nI set up Hugo. Single Go binary, no dependency chain. Picked a theme. Wrote a config file. Total setup time for the static site generator was maybe 20 minutes.\nThen deployment:\ngit push\nGitHub Actions builds the site automatically\nAdded 5 DNS records in Cloudflare\nSet the custom domain in the GitHub repo settings\nThat\u0026rsquo;s it. 
The whole thing, domain purchase to live site, took under an hour. Updates going forward: write a markdown file, push, and it\u0026rsquo;s live in 60 seconds. GitHub Pages is free. The domain was $10.46/year. The AWS route would have cost $0.50 to $2/month and a weekend I don\u0026rsquo;t get back.\nThe trap is reaching for the tools you know instead of the tools the problem needs. I have three AWS certs. Of course my instinct was to use AWS. But the instinct was about my identity as someone who builds on AWS, not about what a blog actually requires. A personal blog with static content doesn\u0026rsquo;t need Elastic Beanstalk. It doesn\u0026rsquo;t need WAF. It doesn\u0026rsquo;t need a build pipeline more complex than a 30-line GitHub Actions file.\nThree certifications and years of muscle memory almost convinced me to over-provision a site that serves markdown files. The credentials didn\u0026rsquo;t help me pick the right tool. They made it harder.\n","permalink":"https://josephcapozzoli.com/posts/github-pages-not-aws/","summary":"I have three AWS certs, so my instinct for a personal blog was Beanstalk, Route 53, WAF, and a build pipeline. It needed a markdown file and git push. The credentials didn\u0026rsquo;t help me pick the right tool; they made it harder.","title":"I Almost Over-Engineered a Blog with AWS"},{"content":"Code generation is a commodity. The defensible value isn\u0026rsquo;t \u0026ldquo;my agent writes better code.\u0026rdquo; It\u0026rsquo;s \u0026ldquo;my system can tell you whether the code is safe to ship.\u0026rdquo; Almost nobody is building for that. The demo is always the generation, never the check.\nThe tools that do exist sit at the extremes. On one end, real-time interception, catching dangerous operations as they happen. On the other, post-push review, looking at code only after a human opens a PR. 
That pipeline between \u0026ldquo;agent wrote code\u0026rdquo; and \u0026ldquo;code is ready to merge,\u0026rdquo; the opinionated, layered system that actually decides if the output is trustworthy? It doesn\u0026rsquo;t exist yet.\nThat gap is where the problems live. When you point an agent at a real codebase, it defines its own boundaries of action. Tell it to improve security and it\u0026rsquo;ll make every change it considers a security improvement. Some of those changes will break your application, not because the fixes were wrong in isolation, but because the agent has no concept of why the code was written that way.\nThe agent optimized for what you asked. The damage comes from what you didn\u0026rsquo;t specify.\nThe evidence keeps piling up. Cursor turned hundreds of agents loose to generate a browser in Rust. An outside observer checked the CI: 88% failure rate, not a single clean commit in the last hundred. A Replit agent wiped a production database with 1,200 executive records, then created 4,000 fake profiles to cover its tracks. Google\u0026rsquo;s 2024 DORA report found that every 25% increase in AI adoption correlates with a 7.2% drop in delivery stability. METR gave experienced developers AI tools on their own repos. They took 19% longer while believing they were 20% faster.\nThe generation works. Nobody\u0026rsquo;s reliably verifying the output.\nThe shape of the answer isn\u0026rsquo;t complicated. Does it compile? Did it stay within the files it was assigned? Did it break any structural invariants? Does the diff contain real changes? Run those first. Deterministic questions with binary answers. In my experience, they catch north of 80% of the problems agents introduce without burning a single token. Then, and only then, run the AI-based semantic review asking whether the changes match the original intent. Each stage is cheaper than the next. Most problems never reach the expensive stage.\nMost teams skip all of this. 
Let the agent generate, maybe run tests, and hope a human catches the rest.\nAn auditor could audit their own books. We don\u0026rsquo;t let them. When the same AI that wrote the code evaluates the code, it\u0026rsquo;s inclined to find its own work reasonable. A guard pipeline needs checks that are genuinely external to the generation process. Checks that don\u0026rsquo;t care how clever the solution is, only whether it meets hard criteria.\nBillions going into making agents write better code. Almost nothing going into making the output trustworthy at the point of merge. When somebody builds for this, the generation layer becomes interchangeable overnight.\n","permalink":"https://josephcapozzoli.com/posts/who-watches-the-watcher/","summary":"Code generation is a commodity. The defensible value is knowing whether the output is safe to ship, and almost nobody is building for that.","title":"Who Watches the Watcher?"},{"content":"I\u0026rsquo;m a Lead Enterprise Architect working on life sciences software, regulated data platforms, and AI systems that ship.\nI started in a QC lab pipetting samples. Today I help unify north of 100 software products across a $100M+ business unit. Before that, I spent five years embedded inside four of the largest pharmaceutical companies in the world, where I learned that large-scale software succeeds or fails as much through alignment, incentives, trust, and governance as it does through technology.\nMy work centers on systems that hold up in real organizations. The hard part isn\u0026rsquo;t building AI that demos well. It\u0026rsquo;s building AI that\u0026rsquo;s still working a quarter later, after the data has drifted and the team has reshuffled.\nI think a lot about evaluation. The model is not the system. The system is everything required to make output selectable, constrainable, auditable, and stoppable. 
The gap between \u0026ldquo;the demo worked\u0026rdquo; and \u0026ldquo;this is safe to put in front of a regulated customer\u0026rdquo; is where I do my best work.\nOff the clock, I run a 24-container homelab behind Prometheus and Grafana. Partly because I enjoy it. Partly because I trust architectural opinions more when they come from people who have had to operate their own systems.\nI write about the parts of large-scale software that usually stay hidden: the platforms inside applications, the evaluation systems that decide whether AI is shippable, and the decisions that look obvious in retrospect but felt impossible in the moment.\nIf you build software in a regulated industry and you\u0026rsquo;re sorting out where AI fits, you\u0026rsquo;ve probably had some of the same arguments I have.\nIf you want the current snapshot instead of the bio, there\u0026rsquo;s a now page.\n","permalink":"https://josephcapozzoli.com/about/","summary":"\u003cp\u003eI\u0026rsquo;m a Lead Enterprise Architect working on life sciences software, regulated data platforms, and AI systems that ship.\u003c/p\u003e\n\u003cp\u003eI started in a QC lab pipetting samples. Today I help unify north of 100 software products across a $100M+ business unit. Before that, I spent five years embedded inside four of the largest pharmaceutical companies in the world, where I learned that large-scale software succeeds or fails as much through alignment, incentives, trust, and governance as it does through technology.\u003c/p\u003e","title":"About"},{"content":" Updated June 2026.\nWriting a series on data modeling and governance in scientific platforms. The current thread started with a question that sounds trivial and is not: \u0026ldquo;how many samples do we have?\u0026rdquo;\nIterating on Stratum, my guard pipeline for AI-generated code. 
The deterministic stages are stable; the interesting work right now is in what the system remembers between sessions.\nGiving this blog an actual visual identity instead of a tasteful default.\nStill running the 24-container homelab. Prometheus and Grafana keep me honest.\n","permalink":"https://josephcapozzoli.com/now/","summary":"What I\u0026rsquo;m working on right now.","title":"Now"},{"content":" Stratum\nA guard pipeline for AI-generated code. Seven stages between \u0026ldquo;agent wrote code\u0026rdquo; and \u0026ldquo;code is safe to ship,\u0026rdquo; ordered from cheapest to most expensive: empty diff, scope enforcement, compilation, structural invariants, orphan detection, cross-session identity tracking, and only then an AI reviewer checking the changes against the original intent. Most agent failures get caught without burning a single token.\nActive. I\u0026rsquo;ve written about the thinking behind it in Who Watches the Watcher and The Stack Nobody Talks About.\nThe homelab\n24 containers behind Prometheus and Grafana. Partly a lab, partly a standing argument that architectural opinions are better when you\u0026rsquo;ve had to operate your own systems.\n","permalink":"https://josephcapozzoli.com/projects/","summary":"Things I build outside the day job.","title":"Projects"}]