Augmentation is a data flow problem

A short argument that "augment, do not replace" is about data control.

2026-05-26

Figures in the AI industry have been saying things along these lines: “We want to build AI to augment humans -- without replacing humans!” I think the idea is great. But so far, we haven’t seen much evidence that expressing pro-augmentation views has led to concrete commitments to make this happen. Perhaps these statements are shaping internal priorities (more compute for interface experimentation, more product work on copilot tools, more attention to human-AI workflows, etc.), but from the outside we can’t be sure.

Allocating more resources toward interface-focused research could produce tools that are better at augmentation. However, I do not think that making AI systems better at augmenting workers will, by itself, prevent substitution or replacement. In fact, augmenting systems may accelerate replacement in domains that are currently data-scarce.

When a worker uses an AI system to perform some task requiring what we might now think of as “human stuff” (judgment, taste, domain knowledge, private context, etc.), that worker produces rich workflow traces and outcome data. If those traces and outcomes are captured by the upstream AI developer providing the model (or captured by the worker’s employer and then passed on to the AI developer), the worker has in effect produced training and/or evaluation data. The next model will be better at doing that task with less human input. The worker’s marginal contribution and bargaining power fall, even though the original system was “augmenting” at deployment time.
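To make that loop concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function name `simulate_rounds`, the proportional learning rule, and the parameter values are invented and not calibrated to any real system. It tracks the share of a task the worker still has to supply when some fraction of each round's traces reaches the model developer.

```python
def simulate_rounds(rounds, capture_rate, learn_rate=0.5):
    """Toy model: share of a task the worker still supplies after each round.

    capture_rate: fraction of workflow traces that reach the model developer
                  (hypothetical knob; 0.0 means no traces leave the worker).
    learn_rate:   how strongly captured traces shrink the remaining human
                  share (hypothetical; chosen only to make the trend visible).
    """
    human_share = 1.0  # round 0: the worker supplies all of the "human stuff"
    history = [human_share]
    for _ in range(rounds):
        # Assumption: traces are produced in proportion to what the worker
        # still contributes, and only the captured part feeds the next model.
        captured = capture_rate * human_share
        human_share = max(0.0, human_share - learn_rate * captured)
        history.append(human_share)
    return history


print(simulate_rounds(3, capture_rate=1.0))  # [1.0, 0.5, 0.25, 0.125]
print(simulate_rounds(3, capture_rate=0.0))  # [1.0, 1.0, 1.0, 1.0]
```

The specific numbers mean nothing. The point of the sketch is only that, under these assumptions, the decline is driven by the capture term and not by how “augmenting” the interface was in any given round.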

If we really believe in data scaling, we should expect scaling to apply to any capability domain that can be captured in data records. Many areas where models are currently bad are areas where data is harder to get. But if we had the data, and no countervailing forces, why wouldn't models be able to learn the necessary patterns to capture judgment, taste, and domain knowledge? (I do not mean to argue that every social or relational dimension of work will disappear; people may continue to value human presence, accountability, care, etc., but even these aspects of labor will not be immune to data scaling.)

One tempting response is to simply impose “augment-only” rules at the modelling level: build systems that assist workers but are somehow prevented from replacing them. I do not think this is technically coherent. Once a model has learned a capability, it is very hard to guarantee that the capability will only be used to complement human labor rather than substitute for it. Data that makes a system useful as a copilot will also make it useful as a replacement. “Augment, don’t replace” cannot be secured primarily through constraints or norms around model building. I think it has to be secured through constraints on data capture and use.

So I’d contend: an AI system can be stably augmentative if and only if the system is deployed in a way that preserves meaningful control over use-time information and rights over downstream training/evaluation data. Many efforts to build augmenting systems -- with the best of intentions -- will directly support replacement unless they somehow restrict the flow of data. The friction needed to restrict that flow could come in the form of stronger individual data rights and/or an approach emphasizing data intermediaries and collective bargaining.
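As a rough illustration of what “restricting the flow” might look like at the record level, here is a minimal Python sketch. The names `Trace`, `permitted_uses`, and `release_for`, along with the use labels `"inference"` and `"training"`, are hypothetical and not taken from any real system or standard. It assumes each trace carries the set of downstream uses its owner (or a data intermediary acting for them) has agreed to.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trace:
    worker_id: str
    payload: str
    # Downstream uses the worker, or an intermediary bargaining on their
    # behalf, has agreed to. Hypothetical labels for illustration only.
    permitted_uses: frozenset = field(default_factory=frozenset)


def release_for(use, traces):
    """Return only the traces whose owners permitted this downstream use."""
    return [t for t in traces if use in t.permitted_uses]


traces = [
    Trace("w1", "edit log A", frozenset({"inference"})),
    Trace("w2", "edit log B", frozenset({"inference", "training"})),
]

# Both traces can be used at deployment time, but only w2's may be trained on.
print([t.worker_id for t in release_for("inference", traces)])  # ['w1', 'w2']
print([t.worker_id for t in release_for("training", traces)])   # ['w2']
```

A filter like this is only as good as the rights behind it. The sketch shows where a gate would sit in the pipeline, not how the permissions would be negotiated or enforced.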

Source revision history

Selected Git commits that changed this source file.

4b26d764e6 2026-07-13 - Clarify why AI augmentation depends on data control
9fb4674b8a 2026-07-12 - Migrate blog into digital presence monorepo

Source and AT Protocol record

Source path
content/writing/short-posts/2026-05-26-augmentation-is-a-data-flow-problem.md

AT Protocol URI
at://did:plc:doxvahqvyhyqf32v7wz7p5xk/site.standard.document/3mni4cwedk57p
