Data Leverage Digital Garden
Long Posts
36 total
-
The AI "Evaluation Crisis" Is an Opportunity to Get Data Flow Right
Why the AI evaluation crisis could force a reckoning on dataset provenance, attribution, and consent.
-
Attestation across the AI Supply Chain
A proposal for interoperable attestation objects that connect training data, evaluation labor, and AI-generated outputs across the AI supply chain.
-
AI is driving the cost of polish down; some musings on fancy versus terse artifacts
AI progress means the "polish" of a figure or website no longer proxies for quality. Can we try to turn this into a good thing for curation, attention allocation, and even AI progress itself?
-
A Short Guide to Data Strikes and Conscious Data Contribution in the Context of 2026 Frontier AI
Back to the basics of data leverage.
-
The Paradox of Reuse in 2026: A Case of Quasi-Enclosure, or "Subsidized Club Goods that Sort of Look Like Public Goods"
How we can understand, and react to, the complicated impacts of AI systems on online communities and knowledge commons
-
The Coding Agent Data Deal
On user data control, coding agents as retrievers, and the value of your coding transcripts
-
Coding agents are (1) a big deal, (2) very relevant to data leverage, and (3) able to help build tools that support data leverage!
Sharing an early reaction to recent coding agent discourse and two relevant projects
-
Almost Everybody -- Including Both Data Creators and AI Companies -- Stands to Benefit from Clearer "Data Rules".
In fact, anyone who doesn't think they will be a "big winner" long term benefits from clear rules, even if it means training data costs more in the short term.
Focus Posts
3 total
-
N-gram search as posterior updating for data attribution
A short argument that model outputs and training-data priors can improve rough approximations of data attribution.
-
Augmentation is a data flow problem
A short argument that "augment, do not replace" is about data control.
-
AI progress as quasi-public good production
A short argument that AI systems pool diffuse human work into privately governed, public-good-like model weights.
Short Reactions
4 total
-
"People First" Policy Ideas that Complement Each Other (through better data flow)
Reacting to a wide-ranging set of policy ideas from OpenAI.
-
Two natural allies of a "Data Transparency" agenda: capabilities forecasters and social simulators
Making an "if you like X, you might want to support Y" argument for data-focused policy
-
[microblog] One book is worth "0.06%" benchmark points to AI; is "no different from noise". What gives?
Commenting on recent coverage of, and discussion about, Meta's arguments about training data value quantification.
-
Perplexity CEO's Interaction with Striking New York Times Workers Does Not Reflect Well on the AI Industry
The idea that data-dependent AI systems are ready and willing to crush any leverage from knowledge workers is unlikely to make the AI industry look good to the public.
Meta Notes
2 total
-
How I am publishing right now
A rough map of my current local Markdown to Leaflet, Substack, and social-post workflow.
-
April 2026 small points