8 The OpenWebUI Action MVP
8.1 Overview
Our v1 MVP is implemented as an OpenWebUI Action that enables opt-in data contribution directly from the chat interface to a HuggingFace dataset repository. This approach leverages OpenWebUI’s existing infrastructure and user accounts, eliminating the need for a separate flywheel website.
The goal is to obtain the benefits of a source-control backend (here, git on HuggingFace) while reducing the friction and UX challenges that come with git and other “high-effort” source-control approaches.
8.2 Architecture
The flywheel consists of:
- Frontend: OpenWebUI instance at https://chat.publicai.co/
- Action Plugin: Python-based OpenWebUI action that handles contributions
- User Settings: OpenWebUI’s “User Valves” for persistent preferences
- Data Storage: A private HuggingFace dataset repository for staging and quarantine, plus a public HuggingFace dataset repository, gated by a user agreement, for approved data
- Processing Pipeline: Asynchronous scripts that process the waiting room
- Static Site: Serves direct downloads, with anti-scraping measures
8.3 How Contribution Works
8.3.1 User Setup (One-time)
- User creates an OpenWebUI account
- User opens Controls / Valves / Functions
- User toggles Sharing Enabled to ON
- User selects:
  - License: CC0-1.0, CC-BY-4.0, or CC-BY-SA-4.0
  - AI Preference Signal: IETF/CC preference (e.g., `train-genai=n;exceptions=cc-cr`)
  - Pseudonym: Use username, anonymous, or custom name
  - Auto-feedback: Whether to skip feedback prompts
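The settings above can be pictured as a small validated record. The following is a minimal sketch using a standard-library dataclass; it is not the plugin's actual User Valves class, and the field names, the `SharingSettings` name, and all default values are assumptions made for illustration.

```python
from dataclasses import dataclass

# Allowed values taken from the setup steps above.
ALLOWED_LICENSES = ("CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0")
PSEUDONYM_MODES = ("username", "anonymous", "custom")


@dataclass
class SharingSettings:
    """Hypothetical stand-in for the per-user preferences (names are assumptions)."""

    sharing_enabled: bool = False  # opt-in only, so off by default
    license: str = "CC0-1.0"  # placeholder default, not confirmed by the source
    ai_preference: str = "train-genai=n;exceptions=cc-cr"  # example value from the text
    pseudonym_mode: str = "anonymous"  # placeholder default
    custom_pseudonym: str = ""
    auto_feedback: bool = False  # True would skip feedback prompts

    def __post_init__(self) -> None:
        # Reject values outside the choices listed in the setup steps.
        if self.license not in ALLOWED_LICENSES:
            raise ValueError(f"unsupported license: {self.license!r}")
        if self.pseudonym_mode not in PSEUDONYM_MODES:
            raise ValueError(f"unsupported pseudonym mode: {self.pseudonym_mode!r}")
        if self.pseudonym_mode == "custom" and not self.custom_pseudonym.strip():
            raise ValueError("custom pseudonym mode requires a non-empty name")
```

Validating at construction time means a contribution can never be built from a half-configured profile, which matters because the license is a per-user default rather than a per-contribution choice.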
8.3.2 Contributing a Chat
- User has a conversation with any model
- User triggers the “Public AI Data Flywheel” action
- System shows current settings and asks for confirmation
- Optional: User provides more feedback or context
- Action creates a contribution JSON with:
  - Conversation messages
  - Metadata (model, tokens, timestamp)
  - User’s license and AI preference selections
  - Attribution (based on pseudonym setting)
  - Contributor hash (anonymized ID)
- Contribution uploads to `_waiting_room/` in the HuggingFace repo
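As a rough illustration of the contribution record described above, the sketch below assembles the listed fields into a JSON document and derives a path under `_waiting_room/`. The key names, the function names, and the `<id>.json` file-naming scheme are assumptions; the source only specifies which kinds of information are included. The upload step itself is left out.

```python
import json
from datetime import datetime, timezone


def resolve_attribution(mode: str, username: str, custom: str = "") -> str:
    """Map the pseudonym setting to the attribution string (hypothetical helper)."""
    if mode == "username":
        return username
    if mode == "custom":
        return custom
    return "anonymous"


def build_contribution(messages, model, token_count, license_id,
                       ai_preference, attribution, contributor_hash) -> dict:
    """Assemble the fields listed above; the key names are assumptions."""
    return {
        "messages": messages,
        "metadata": {
            "model": model,
            "tokens": token_count,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        "license": license_id,
        "ai_preference": ai_preference,
        "attribution": attribution,
        "contributor_hash": contributor_hash,
    }


def to_json(contribution: dict) -> str:
    """Serialize for upload, keeping non-ASCII chat text readable."""
    return json.dumps(contribution, ensure_ascii=False, indent=2)


def waiting_room_path(contribution_id: str) -> str:
    """Target path in the repo; the file-naming scheme is an assumption."""
    return f"_waiting_room/{contribution_id}.json"
```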
8.3.3 Data Processing Pipeline
- Waiting Room: Contributions land in the `_waiting_room/` directory
- Validation: A daily script processes pending files
- PII Redaction: Automated check for emails, SSNs, phone numbers, etc.
- Quarantine: Files with PII hits or errors go to `_quarantined/`
- Release: Clean contributions move to the “ready” directory
- Distribution: Published via gated HF repo and public gallery
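The routing decision in the pipeline above can be sketched as a single function: scan the conversation text for PII, and send anything with a hit or a structural error to quarantine. This is a simplified illustration, not the project's script. The three regular expressions are deliberately crude examples (real detection would cover more categories and formats), and the `ready/` directory name and the `messages`/`content` keys are assumptions.

```python
import re

# Illustrative patterns only; US-centric and far from exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def find_pii(text: str) -> list:
    """Return the sorted names of all PII categories found in the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


def route(contribution: dict) -> str:
    """Pick the destination directory for one waiting-room file."""
    try:
        text = "\n".join(m["content"] for m in contribution["messages"])
    except (KeyError, TypeError):
        # Malformed contributions count as errors and are quarantined.
        return "_quarantined/"
    return "_quarantined/" if find_pii(text) else "ready/"
```

Failing closed (quarantining on any error) keeps malformed or suspicious files out of the released dataset until someone reviews them.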
8.4 Key Features
8.4.1 Privacy & Attribution
- Contributor Hash: SHA256 hash of `salt + "openwebui:" + user_id` (16 chars)
- Pseudonymity: Users choose between username, anonymous, or custom pseudonym
- PII Redaction: Automated redaction runs before release to reduce unintended PII in the final dataset
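The contributor hash can be sketched with the standard library as follows. Two details are assumptions: that the three parts are concatenated directly with no separator beyond the literal `"openwebui:"` prefix, and that "16 chars" means the first 16 hexadecimal characters of the digest.

```python
import hashlib


def contributor_hash(user_id: str, salt: str) -> str:
    """SHA256 over salt + "openwebui:" + user_id, cut to 16 hex characters.

    The truncation to the leading 16 hex characters is an assumption.
    """
    payload = f"{salt}openwebui:{user_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

The same user always maps to the same ID as long as the salt is unchanged, so contributions can be grouped per contributor without storing the OpenWebUI user ID. Keeping the salt secret is what prevents someone from recomputing the mapping from known user IDs.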
8.4.2 User Control
- Opt-in only: Requires explicit enabling in settings
- Per-contribution consent: Confirmation before each share
- Persistent preferences: Settings saved in User Valves
- License selection: Per-user default (not per-contribution in v1)
8.4.3 Safety Features
- Mock mode: Test contributions without actual upload
- PII detection: Email, IP, SSN, IBAN, crypto wallets, phone, credit cards
- Quarantine system: Content can be held for review
- Rate limiting: Handled by OpenWebUI’s existing limits