8  The OpenWebUI Action MVP

8.1 Overview

Our v1 MVP is implemented as an OpenWebUI Action that enables opt-in data contribution directly from the chat interface to a HuggingFace dataset repository. This approach leverages OpenWebUI’s existing infrastructure and user accounts, eliminating the need for a separate flywheel website.

The goal is to obtain the benefits of a source control backend (in this case, git on HuggingFace) while mitigating the additional friction and UX challenges that come with using git or other “high-effort” source control approaches directly.

8.2 Architecture

The flywheel consists of:

  • Frontend: OpenWebUI instance at https://chat.publicai.co/
  • Action Plugin: Python-based OpenWebUI action that handles contributions
  • User Settings: OpenWebUI’s “User Valves” for persistent preferences
  • Data Storage: A private HuggingFace dataset repository for staging and quarantine, plus a public HuggingFace dataset repository, gated by a user agreement, for approved data
  • Processing Pipeline: Asynchronous scripts that process the waiting room
  • Static site: A static site with anti-scraping measures for direct download

8.3 How Contribution Works

8.3.1 User Setup (One-time)

  1. User creates an OpenWebUI account
  2. User opens Controls / Valves / Functions
  3. User toggles Sharing Enabled to ON
  4. User selects:
    • License: CC0-1.0, CC-BY-4.0, or CC-BY-SA-4.0
    • AI Preference Signal: IETF/CC preference (e.g., train-genai=n;exceptions=cc-cr)
    • Pseudonym: Use username, anonymous, or custom name
    • Auto-feedback: Whether to skip feedback prompts
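
The settings above can be pictured as a small record with validation. The sketch below uses a plain dataclass as a stand-in; in OpenWebUI itself, User Valves are normally declared as a pydantic model inside the action class. The class name, field names, and default values are assumptions made for illustration, not the plugin's actual schema.

```python
from dataclasses import dataclass

# Option sets taken from the setup steps above.
ALLOWED_LICENSES = ("CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0")
ATTRIBUTION_MODES = ("username", "anonymous", "custom")


@dataclass
class SharingSettings:
    """Hypothetical stand-in for the action's per-user settings."""
    sharing_enabled: bool = False  # opt-in: off until the user enables it
    license: str = "CC0-1.0"  # default value is an assumption
    ai_preference: str = "train-genai=n;exceptions=cc-cr"  # example signal
    attribution_mode: str = "anonymous"  # default value is an assumption
    custom_pseudonym: str = ""
    auto_feedback: bool = False  # True would skip feedback prompts

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if valid)."""
        problems = []
        if self.license not in ALLOWED_LICENSES:
            problems.append(f"unsupported license: {self.license}")
        if self.attribution_mode not in ATTRIBUTION_MODES:
            problems.append(f"unknown attribution mode: {self.attribution_mode}")
        if self.attribution_mode == "custom" and not self.custom_pseudonym.strip():
            problems.append("custom attribution requires a pseudonym")
        return problems
```

Keeping sharing disabled by default mirrors the opt-in design: nothing is contributed until the user explicitly turns the toggle on.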

8.3.2 Contributing a Chat

  1. User has a conversation with any model
  2. User triggers the “Public AI Data Flywheel” action
  3. System shows current settings and asks for confirmation
  4. Optional: User provides more feedback or context
  5. Action creates a contribution JSON with:
    • Conversation messages
    • Metadata (model, tokens, timestamp)
    • User’s license and AI preference selections
    • Attribution (based on pseudonym setting)
    • Contributor hash (anonymized ID)
  6. Contribution uploads to _waiting_room/ in HuggingFace repo
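
A minimal sketch of how step 5 might assemble the contribution record follows. The JSON key names, the helper functions, and the file-naming scheme inside _waiting_room/ are all assumptions for illustration; only the list of included items comes from the steps above.

```python
import json
from datetime import datetime, timezone


def resolve_attribution(mode: str, username: str, custom_name: str = "") -> str:
    """Map the pseudonym setting to the name stored with the contribution."""
    if mode == "username":
        return username
    if mode == "custom" and custom_name.strip():
        return custom_name.strip()
    return "anonymous"  # covers "anonymous" and an empty custom name


def build_contribution(messages, model, token_count, license_id,
                       ai_preference, attribution, contributor_hash,
                       feedback="", now=None):
    """Collect the items listed in step 5 into one JSON-serializable dict."""
    now = now or datetime.now(timezone.utc)
    return {
        "messages": messages,
        "metadata": {
            "model": model,
            "tokens": token_count,
            "timestamp": now.isoformat(),
        },
        "license": license_id,
        "ai_preference": ai_preference,
        "attribution": attribution,
        "contributor_hash": contributor_hash,
        "feedback": feedback,  # optional context from step 4
    }


def waiting_room_path(contribution: dict) -> str:
    """Hypothetical naming scheme: contributor hash plus a UTC timestamp."""
    ts = datetime.fromisoformat(contribution["metadata"]["timestamp"])
    stamp = ts.strftime("%Y%m%dT%H%M%SZ")
    return f"_waiting_room/{contribution['contributor_hash']}_{stamp}.json"


def to_json(contribution: dict) -> str:
    """Serialize the record as UTF-8 friendly, human-readable JSON."""
    return json.dumps(contribution, ensure_ascii=False, indent=2)
```

Storing the license and AI preference signal inside each record, rather than only in the user's settings, keeps every file self-describing even if the user changes their defaults later.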

8.3.3 Data Processing Pipeline

  1. Waiting Room: Contributions land in _waiting_room/ directory
  2. Validation: Daily script processes pending files
  3. PII Redaction: Automated check for emails, SSNs, phone numbers, etc.
  4. Quarantine: Files with PII hits or errors go to _quarantined/
  5. Release: Clean contributions move to “ready” directory
  6. Distribution: Published via gated HF repo and public gallery
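
The routing decision in steps 3 to 5 can be sketched as a function that scans a contribution and picks a destination directory. The regular expressions below are deliberately simplified examples covering only three PII types, and the function names and record layout (a "messages" list with "content" fields) are assumptions; the actual pipeline's detectors are not specified here.

```python
import re

# Simplified illustrative patterns; a production detector would need
# broader coverage and far more careful matching.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the sorted names of all PII patterns that match the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))


def route(contribution: dict) -> tuple[str, list[str]]:
    """Choose a destination directory for one waiting-room file.

    Returns (directory, reasons). Malformed records and PII hits both
    go to _quarantined; everything else goes to the ready directory.
    """
    try:
        text = "\n".join(m["content"] for m in contribution["messages"])
    except (KeyError, TypeError):
        return "_quarantined", ["malformed"]
    hits = find_pii(text)
    if hits:
        return "_quarantined", hits
    return "ready", []
```

Returning the reasons alongside the directory gives a reviewer of the quarantine folder a starting point for deciding whether a file can be redacted and released.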

8.4 Key Features

8.4.1 Privacy & Attribution

  • Contributor Hash: SHA256 hash of salt + "openwebui:" + user_id (16 chars)
  • Pseudonymity: Users choose between username, anonymous, or custom pseudonym
  • Unintended PII: Automated redaction is applied before release to reduce unintended PII in the final dataset
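
The contributor hash described above can be sketched with Python's standard hashlib. This assumes that "16 chars" means the first 16 hexadecimal characters of the digest and that the three parts are concatenated without separators; the function name and the example salt are hypothetical.

```python
import hashlib


def contributor_hash(salt: str, user_id: str) -> str:
    """SHA256 of salt + "openwebui:" + user_id, truncated to 16 hex chars.

    Assumption: the 16-character ID is a prefix of the hex digest.
    """
    payload = f"{salt}openwebui:{user_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


# The same user always maps to the same ID under a given salt, so
# contributions can be grouped without storing the raw user_id.
example_id = contributor_hash("example-salt", "user-123")
```

The salt matters because user IDs alone may be guessable: without a secret salt, anyone could hash candidate IDs and match them against the published hashes.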

8.4.2 User Control

  • Opt-in only: Requires explicit enabling in settings
  • Per-contribution consent: Confirmation before each share
  • Persistent preferences: Settings saved in User Valves
  • License selection: Per-user default (not per-contribution in v1)

8.4.3 Safety Features

  • Mock mode: Test contributions without actual upload
  • PII detection: Email, IP, SSN, IBAN, crypto wallets, phone, credit cards
  • Quarantine system: Content can be held for review
  • Rate limiting: Handled by OpenWebUI’s existing limits
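
Mock mode can be pictured as a flag that short-circuits the upload while still exercising serialization. The sketch below assumes the upload goes through the huggingface_hub client's HfApi.upload_file; the function name, return shape, and the choice to make mock mode the default are illustrative assumptions, and any repository ID passed in is a placeholder.

```python
import json


def upload_contribution(contribution: dict, path_in_repo: str, repo_id: str,
                        token=None, mock: bool = True) -> dict:
    """Upload one contribution, or only simulate the upload in mock mode."""
    payload = json.dumps(contribution, ensure_ascii=False, indent=2).encode("utf-8")
    if mock:
        # Nothing leaves the machine; report what would have been sent.
        return {"uploaded": False, "path": path_in_repo, "bytes": len(payload)}

    # Imported lazily so mock mode works without the dependency installed.
    from huggingface_hub import HfApi

    HfApi(token=token).upload_file(
        path_or_fileobj=payload,
        path_in_repo=path_in_repo,
        repo_id=repo_id,  # hypothetical, e.g. a private staging dataset repo
        repo_type="dataset",
    )
    return {"uploaded": True, "path": path_in_repo, "bytes": len(payload)}
```

Defaulting to mock mode in a sketch like this means a misconfigured test run fails safe: real uploads happen only when the caller explicitly passes mock=False.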