5  Flywheel design space

Key insight: There is a broad spectrum of technical implementations of the flywheel, ranging from a traditional database-on-a-company-server to low-friction peer production (our preferred MVP) to radical approaches (e.g. truly federated data access).

5.1 Purpose of this section

This section gives more context about the many ways we might build flywheels, and lays out alternative governance paths and future work (in particular, a focus on futures that involve healthy data markets, data intermediaries, federated learning, etc.).

We also discuss why we think an approach that includes a minimal-retention frontend + opt-in flywheel platform can serve as a pragmatic bridge to more advanced approaches. For instance, we can use the patterns and concepts discussed here to move towards independently governed data co-ops, eventual federated learning, etc.

5.2 More on all the other approaches we could’ve taken

First, let’s lay out a toy model of data “creation” and “flow” (this will come up again in Part 2, when we walk through the flow for a real flywheel app).

In Chapter 2 we talked about the numerous combinations of sensors, forms, task settings, and social structures (from institutions, communities, etc.) that might exist, and in the Appendix we discuss a number of formats and types of data for LLMs in particular.

To summarize, an AI developer might collect or use some of the following kinds of data (a rough schema sketch follows this list):

  • Simple Signal: Binary feedback (👍/👎), star ratings, or flags
  • Annotated Conversation: Full chat with user corrections, ratings, or notes
  • Preference Pair: A/B comparisons between responses
  • Examples: User-created prompts and ideal responses
  • Structured Feedback: Form-based input (error type, severity, correction)
  • Multimodal Bundle: Text + images + voice + metadata
  • More advanced structured data …
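To make a few of these concrete, here is a minimal sketch of how the simpler contribution types might be represented. This is purely illustrative: the class and field names (e.g. conversation_id) are placeholders, not a proposed format.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

@dataclass
class SimpleSignal:
    """Binary feedback, star rating, or flag attached to a single response."""
    conversation_id: str                       # placeholder identifier scheme
    kind: Literal["thumbs", "stars", "flag"]
    value: int                                 # e.g. +1/-1 for thumbs, 1-5 for stars

@dataclass
class PreferencePair:
    """A/B comparison between two candidate responses to the same prompt."""
    prompt: str
    response_a: str
    response_b: str
    preferred: Literal["a", "b", "tie"]

@dataclass
class AnnotatedConversation:
    """Full chat transcript plus user corrections, ratings, or notes."""
    messages: list[dict]                       # e.g. [{"role": "user", "content": "..."}]
    annotations: list[str] = field(default_factory=list)
    rating: Optional[int] = None
```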

Further, the creation of data could be prompted at several points in time:

  • Proactive: User initiates contribution unprompted (e.g., “Share this chat” button)
  • Reactive: System prompts based on signals (e.g., after a trigger word or a pattern in usage behaviour, ask “What went wrong?”)
  • Passive: Automatic collection with prior consent (e.g., telemetry, browser extension)
  • Scheduled: Regular prompts (e.g., weekly “best conversations” review)
  • Task-Based: Specific requests for data types (e.g., “Help us improve math responses”)

This choice will likely impact the level of “friction” users experience, roughly:

  • Zero-Friction: Purely passive
  • Almost zero-friction: Purely passive with some regular re-consenting process (monthly or yearly “checkup” on sharing settings)
  • Low-Friction: One-click actions with no interruption
  • Medium-Friction: Multi-click actions or actions that redirect to separate interface
  • High-Friction: Multi-step process, account creation, or technical skills required

Data might also be processed at one or more points in time (in practice, there is likely to be some degree of “processing” at several of these steps, but it is important to clarify this to users); we sketch the pre-submission case after this list:

  • Pre-submission: Client-side processing before data leaves user’s device
  • On-submission: Real-time processing during the contribution flow
  • Post-submission: Batch processing after data is received
  • Pre-publication: Review and processing before making data public
  • On-demand: Processing happens when data is accessed/downloaded
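To illustrate the pre-submission case: a client could run a redaction pass before anything leaves the user’s device. The sketch below uses deliberately naive regexes for emails and phone numbers only; a real pipeline would need far more robust PII detection, so treat this as an illustration of where the processing happens, not of how well it works.

```python
import re

# Deliberately naive patterns; real PII detection needs much more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pre_submission(text: str) -> str:
    """Client-side pass that runs before the contribution is uploaded."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

contribution = scrub_pre_submission("Contact me at jane@example.com or +1 555 010 0199")
# -> "Contact me at [EMAIL] or [PHONE]"
```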

So: a person visits an AI interface (e.g. a chatbot product on a website). They sit down, enter a prompt, and then react to the Output (take the information and do something with it, follow up, leave positive or negative feedback, etc.). This is our canonical object of interest: a prompt (“Input”), a response (“Output”), and optional follow-up data (feedback, further queries and responses, etc.).

Typically, this data must live, for some time, on the user’s device. It must also be processed by an AI model (“inference”), which involves sending a payload to a hosted service or to some local endpoint (if, e.g., the user is running open weights on their own device). It may or may not be stored on the server/system (we’ll use these interchangeably for now to refer to all the devices controlled by the organization running each module) where the interface is hosted. It may or may not be stored by the server/system where the model is hosted. And finally, a flywheel may send that data to a third location.

This final data could live in a centralized database (e.g. a traditional relational database), a public repository (e.g. GitHub, HuggingFace), entirely on the user’s device, or even in some kind of distributed network (IPFS, BitTorrent).

Finally, the resulting flywheel-produced data might be accessed in a number of ways (see the gated-access sketch after this list):

  • Direct Download: Raw access to complete dataset (with rate limits)
  • API Access: Programmatic access with authentication and quotas
  • Static Site: Read-only web interface with anti-scraping measures
  • Gated Access: Application/approval process for researchers
  • Hybrid Access: Public samples + gated full access, or public metadata + restricted content
  • Streaming Access: Real-time feeds for continuous model training
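As one concrete example of gated access: if the flywheel’s releases were hosted as a gated dataset on the Hugging Face Hub (as the roadmap in 5.5.1 suggests), an approved researcher might load the data roughly as below. The repo id is hypothetical, and the `token` argument assumes a recent version of the `datasets` library.

```python
from datasets import load_dataset

# "public-ai/flywheel-conversations" is a hypothetical repo id. For a gated
# Hub dataset, the user must first accept the access conditions on the Hub,
# then authenticate with a personal access token.
ds = load_dataset(
    "public-ai/flywheel-conversations",
    split="train",
    token="hf_...",  # token of an approved account
)
print(ds[0])
```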

So we have five useful questions for classifying flywheel designs (captured as a small record type after this list):

  • Where data lives: …
  • When prompted: …
  • When processed: …
  • How accessed: …
  • Friction level: …
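These five axes can be written down as a small record, which is how we compare the architectures below. The enum values are just shorthand for the options listed earlier in this section, not a fixed vocabulary.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class FlywheelDesign:
    where_data_lives: Literal["private_db", "public_repo", "local", "distributed"]
    when_prompted: Literal["proactive", "reactive", "passive", "scheduled", "task_based"]
    when_processed: Literal["pre_submission", "on_submission", "post_submission",
                            "pre_publication", "on_demand"]
    how_accessed: Literal["download", "api", "static_site", "gated", "hybrid", "streaming"]
    friction: Literal["zero", "near_zero", "low", "medium", "high"]

# e.g. the "Web service + Git" option described in 5.3.3:
serverless_git = FlywheelDesign("public_repo", "reactive", "on_submission", "download", "low")
```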

5.3 Some Categories of Architectural Models

With all these design choices in mind, it will be useful to describe the general approaches we might take to build a data flywheel.

5.3.1 Standard “PrivateCo” Web App

An obvious option is to simply build a hosted “standard” “PrivateCo” / start-up style web app. If Netflix is successful because of its flywheel, why not just build a public AI data flywheel that looks like a private tech company’s product from a technical perspective? Indeed, in some contexts it may make sense to skip building an opt-in flywheel and simply use the data generated by users directly for training, eval, etc. In this case, there is no “third location” needed; just read data from the existing prod database. While one could argue that the Terms of Service for many existing tech products do make these products “opt in” in some sense, there are also serious downsides to the status quo. Many would argue that standard practice in tech (long, difficult-to-read Terms of Service and Privacy Policy documents; opacity about the exact details of data collection and usage; general challenges in conveying the complexity of modern data pipelines) makes it hard for the standard PrivateCo Web App model to offer truly informed consent for data contribution. (For more on general issues with ToS, see e.g. Fiesler, Lampe, and Bruckman 2016 #todo add more of the “classics” of this genre.)

While some users might even prefer this approach, we believe this would not be a good starting place for a public AI data flywheel. We also believe it’s important to communicate to users how the public AI interface differs from standard practices (for instance, how does a public AI model differ, in terms of data use, from e.g. using ChatGPT, Gemini, or AI overviews via search).

The defining characteristic of this approach is that data is held by a private entity at all times. Under this approach, we can collect all types of signals, mix proactive and reactive data collection, use telemetry freely, and process data whenever we want. It’s highly likely that under this approach, data from a flywheel would live in a centralized, privately governed database.

Answering each of the questions posed above:

  • Where data lives: private database
  • When prompted: flexible
  • When processed: flexible
  • How accessed: flexible; likely API
  • Friction level: flexible; likely low

It’s also likely we would want to follow corporate practices in locking down the final data, which makes this a bad choice for maximizing publicly visible output. Put simply: while an interesting idea in theory, we probably can’t run an AI product that has a prod database that is openly readable by the public.

Within the broad umbrella of taking a “Standard PrivateCo Web App” approach to data flywheels, some archetypes might include:

  • Telemetry-heavy approach (imagine an LLM chat app with no feedback buttons, but lots of data is collected regarding dwell time, conversation length, user responses, etc.; a sample event appears after this list)
  • Feedback heavy approach (imagine an LLM chat app where the UX is heavily focused on asking users to use thumbs up / thumbs down buttons, or presenting users with frequent A/B test responses)
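For the telemetry-heavy archetype, the “contribution” is just an event stream; a single event might look something like the sketch below (field names are illustrative only).

```python
telemetry_event = {
    "event": "assistant_response_viewed",   # illustrative event name
    "session_id": "anon-7f3a",              # pseudonymous session identifier
    "dwell_time_ms": 41_250,                # how long the response stayed on screen
    "conversation_turns": 6,
    "followed_up": True,                    # did the user send another message?
    "copied_output": False,
}
```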

5.3.2 Git/Wiki Platform

Another option to build a “very active flywheel” (one that arguably stretches the definition, because friction will be very high) is to use peer production or version control software (a “wiki” or “git” approach) and simply ask people to make their contributions using existing contribution avenues (for instance, “editing” a wiki page or making a “pull request” to a version-controlled git repository).

If we choose this approach, we do likely constrain our answers to the above questions:

  • Where data lives: Public repository
  • When prompted: Proactive (user initiates)
  • When processed: Pre-submission (user does it) + CI/CD validation
  • How accessed: Direct download via Git + web interface
  • Friction level: High (technical knowledge required)

This approach has maximum transparency, built-in versioning, and low cost. But, it is likely to exclude non-technical users and has very high friction even for technical users.

Example Stack: some combo of MediaWiki, GitHub, GitLab, HuggingFace + CI/CD validation
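To make the CI/CD validation step concrete: the repository could run a small check over each contributed file before a pull request is merged. The sketch below assumes contributions arrive as JSON files with prompt/response fields; the actual schema would be up to the flywheel operator.

```python
import json
import sys
from pathlib import Path

REQUIRED_FIELDS = {"prompt", "response"}            # assumed minimal schema
OPTIONAL_FIELDS = {"feedback", "rating", "license"}

def validate(path: Path) -> list[str]:
    """Return a list of problems found in one contributed JSON file."""
    try:
        record = json.loads(path.read_text())
    except json.JSONDecodeError as err:
        return [f"{path}: not valid JSON ({err})"]
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"{path}: missing fields {sorted(missing)}")
    unknown = record.keys() - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if unknown:
        problems.append(f"{path}: unexpected fields {sorted(unknown)}")
    return problems

if __name__ == "__main__":
    issues = [p for f in sys.argv[1:] for p in validate(Path(f))]
    print("\n".join(issues))
    sys.exit(1 if issues else 0)
```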

5.3.3 Web service + Git Platform

The option described in Part 2 is to use a Git/Wiki approach, but with some kind of “serverless” service (or a more traditional app; it doesn’t have to be serverless) exposing special endpoints that are triggered by users via low-friction in-app actions (clicking a special button, entering a special command, etc.) and that write to a Wiki / Git repo on the contributor’s behalf (a minimal sketch appears after the list below). We could also build a system so that users can effectively commit data to the source control / wiki system automatically (e.g., “Every day, run an anonymization script on my chat history and then write the output as a new file to a shared, version-controlled server”).

  • Where data lives: Public repository
  • When prompted: Proactive or reactive
  • When processed: On-submission via serverless function
  • How accessed: Git access + static site generation
  • Friction level: Low (automated complexity)
  • Pros: Transparency + usability, serverless scaling
  • Cons: Technical issues (cold starts, API rate limits, complex error handling)
  • Example Stack: Vercel/Netlify + GitHub API + Hugging Face Hub
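A minimal sketch of such an endpoint, assuming contributions are committed as new files via the GitHub contents API. The repo name and path are placeholders; authentication of the calling user, PII checks, and error handling are omitted, and a real flow might open a pull request instead of committing directly so that validation checks run before merge.

```python
import base64
import json
import os
import time

import requests

REPO = "public-ai/flywheel-data"            # placeholder repository
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]   # token held by the service, not the user

def handle_contribution(payload: dict) -> dict:
    """Triggered by the in-app 'share' action; writes the contribution as a new file."""
    # (PII scrubbing / schema validation would run here, before anything is committed.)
    path = f"contributions/{int(time.time())}.json"
    body = {
        "message": "Add user contribution",
        "content": base64.b64encode(json.dumps(payload).encode()).decode(),
    }
    resp = requests.put(
        f"https://api.github.com/repos/{REPO}/contents/{path}",
        headers={"Authorization": f"Bearer {GITHUB_TOKEN}",
                 "Accept": "application/vnd.github+json"},
        json=body,
        timeout=10,
    )
    resp.raise_for_status()
    return {"status": "committed", "path": path}
```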

5.3.4 Federated Learning Model

One radically different approach might involve using federated learning (a toy aggregation sketch follows the list below).

  • Where data lives: User devices (distributed)
  • When prompted: Passive with consent
  • Information object: Model gradients or aggregated statistics
  • When processed: Pre-submission (on-device)
  • How accessed: Only aggregated model updates available
  • Friction level: Zero after setup
  • Pros: Maximum privacy, no data transfer, infinite scale
  • Cons: Complex implementation, limited debugging, device requirements
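As a toy illustration of the core idea (not a production FL stack): each device computes an update locally, and only the aggregate ever leaves the edge. Real systems layer secure aggregation, differential privacy, and client attestation on top of this.

```python
import numpy as np

def local_update(global_weights: np.ndarray, local_data) -> np.ndarray:
    """Placeholder: each device fine-tunes on its own data and returns new weights.
    The raw data never leaves the device; only this weight vector does."""
    return global_weights + 0.01 * np.random.randn(*global_weights.shape)  # stand-in for real training

def federated_average(global_weights: np.ndarray, devices: list) -> np.ndarray:
    """One round of federated averaging over per-device updates."""
    updates = [local_update(global_weights, d) for d in devices]
    return np.mean(updates, axis=0)

weights = np.zeros(8)
for _round in range(3):
    weights = federated_average(weights, devices=["phone_a", "phone_b", "laptop_c"])
```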

5.3.5 Browser Extension

We could implement a flywheel that relies on users downloading a browser extension! This only reflects a data ingestion choice: it can be combined with any of the backend choices above.

  • Where data lives: Centralized or distributed
  • When prompted: Proactive or passive
  • Information object: DOM captures, interaction logs, selections
  • When processed: Depends on backend
  • How accessed: Depends on storage choice
  • Friction level: Very low after installation

5.3.6 Export-based approach

Another idea is to build a flywheel that leverages existing export features and export mechanisms. Instead of adding feedback buttons or telemetry, flywheel designers could simply create a static site that lets users manually upload exported data from various apps. This would require manual effort (and some friction could be reduced via careful attention to UX, adding features to help standardize data, etc.) but could be powerful in jurisdictions with portability/export rights.
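A sketch of the normalization step such a static site (or a small helper script) might perform, assuming a chat export shaped roughly like the JSON many assistants produce. Actual export formats vary by vendor and would each need their own adapter; the strict user/assistant alternation assumed here is also an assumption.

```python
import json

def normalize_export(raw: str) -> list[dict]:
    """Convert an exported chat file into canonical prompt/response records.
    The input shape here is an assumption; real exports differ per product."""
    export = json.loads(raw)
    records = []
    for conv in export.get("conversations", []):
        messages = conv.get("messages", [])
        # Assumes messages strictly alternate user -> assistant.
        for user_msg, assistant_msg in zip(messages[::2], messages[1::2]):
            records.append({
                "prompt": user_msg.get("content", ""),
                "response": assistant_msg.get("content", ""),
                "source": "user_export",
            })
    return records
```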

5.3.7 Other experimental approaches

Other, more experimental approaches to building a flywheel might involve radically decentralizing the actual data storage, for instance using peer-to-peer protocols, various crypto/web3 approaches to data sovereignty, etc.

5.4 Scenario Walkthroughs: A Practical Comparison

Here, we walk through three common scenarios and describe what happens (in one sentence) for each of the architectures described above.

#todo: these could be made crisper to highlight the key differences better (But also be honest about where there are similarities)

5.4.1 Scenario A: User marks a chat as “Good”, but the flywheel needs to do some checks for personally identifying information (PII) – when does processing happen?

  • Web App: Redirects to platform, PII scrubbed on submission, available via API after review
  • Git/Wiki: User removes PII manually, creates PR, instantly visible on merge
  • Telemetry: Signal sent, processed in real-time, only visible in aggregates
  • Hybrid: Signal sent immediately, full chat processed if shared
  • Serverless+Git: Modal appears, serverless function strips PII, PR created automatically
  • Federated: Local processing only, contributes to next model update
  • Extension: Captures state, removes PII client-side, sends to chosen backend
  • P2P: Processes locally, shares with peers who validate before propagating

5.4.2 Scenario B: User corrects a factual error

  • Web App: Editor interface, toxicity check on submission, published after human review
  • Git/Wiki: User edits markdown, CI/CD checks format, visible immediately on merge
  • Telemetry: Only captures “error” signal, no correction possible
  • Hybrid: Error signal triggers correction UI, correction queued for review
  • Serverless+Git: Inline correction, automated PII/toxicity checks, PR needs approval
  • Federated: Correction processed locally, differential privacy applied
  • Extension: Highlights error, pre-processes correction, sends to backend
  • P2P: Broadcasts correction, network consensus before acceptance

5.4.3 Scenario C: Accessing the contributed data

  • Web App: Researchers apply for API key, public sees samples on static site
  • Git/Wiki: Anyone can clone repo, but rate-limited through CDN
  • Telemetry: Only aggregated statistics available via public dashboard
  • Hybrid: Public can see signals dashboard, researchers apply for conversation access
  • Serverless+Git: Public (or gated) repo with all data, static site with search/filter
  • Federated: No direct data access, only model checkpoints released
  • Extension: Depends on backend choice, typically follows that model
  • P2P: Must run client to access network, can specify data sharing preferences

5.5 Frontier approaches: data cooperatives, federated learning, and more

In many cases, users may want to have data governed by community organizations (e.g., organized by domain/region/language) that hold rights and decide release cadence, licensing defaults, and benefit policies.

Practically, taking a collective/intermediary-focused approach has the potential to massively reduce user friction and attention costs. One vision for a low-friction data intermediary approach: users spend some time once a year choosing which intermediaries to join. Upon joining, they can choose to delegate key decision-making and participate in intermediary governance as suits their desires and needs. If the joining process is good and the governance is good, this can achieve good outcomes.

We note that if an implementation of the flywheel is built on top of open-source software, communities can easily choose to deploy their own instance and their own data flywheel and effectively operate entirely parallel, self-governed instances. If they also choose to share opt-in data via similar licensing and preference signal approaches, such datasets could be easily merged – but with fine-grained adjustments to precise details (e.g., slight modifications to retention, access, release cadence, content moderation, and so on). Of course, data co-ops may choose to use quite different technical stacks. This approach is just one among many.
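If parallel instances publish their opt-in data with machine-readable licensing and AI-preference metadata, a merge could be as simple as filtering on those fields. The field names below are illustrative, not a proposed standard.

```python
def merge_instances(
    datasets: list[list[dict]],
    allowed_licenses: frozenset = frozenset({"CC0-1.0", "CC-BY-4.0"}),
) -> list[dict]:
    """Combine records from several co-op instances, keeping only those whose
    license and AI-preference signals permit reuse. Field names are illustrative."""
    merged = []
    for records in datasets:
        for rec in records:
            if rec.get("license") in allowed_licenses and rec.get("ai_training_ok", False):
                merged.append(rec)
    return merged
```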

5.5.1 Transitioning from an opt-in flywheel to federated learning

It may also be possible to move from an opt-in data flywheel approach to a federated learning-first approach. Here, model training occurs across user or institutional nodes; only gradients/updates (with privacy tech) are centralized. The dataset remains partitioned or local, and the central custodian role is minimized. This approach would:

  • Reduce central data custody and breach surface
  • Align with data-residency and institutional constraints
  • Enable “learning from data that can’t leave”

But it has some major downsides and existing barriers:

  • Harder reproducibility and data auditability
  • Complex privacy stack (secure aggregation, DP, client attestation)
  • Benchmarking must be redesigned (federated eval)

This is a bigger leap, but we believe it’s important to begin to think about how the implementation of Public AI Data Flywheels might support communities wishing to transition towards an FL approach.

One rough sketch might look like:

  • Build the MVP defined in Chapter 2:
    • Ship license + AI-preference metadata (MVP).
    • Maintain gated HF releases and public leaderboards/full data access.
    • Publish provider-payload transparency and link to provider terms (no guarantees).
    • Process deletions via HF mechanisms when possible; keep our mirrors in sync.
  • Phase 1: Co-op pilots
    • Charter one or two community co-ops; define bylaws, scope, and release cadence.
    • Spin up many instances of interface + flywheel combos (can fork the software directly, or use similar approaches).
    • Establish a concrete sharing / merging plan.
  • And beyond!
    • Once several independent data communities are operating, it might be possible to move from lightweight sharing and merging to more serious federation with technical guarantees. Perhaps this might start with federated evaluation and then move to federated training. Much more to do here, out of scope for this document.