The Unified Desktop & Mobile Ecosystem for Voice, AI, Connectivity, and Automation
01
Whim is a comprehensive desktop and mobile ecosystem built in Python (Tkinter) that unifies voice recording, AI-powered conversation, messaging, IoT automation, screen sharing, and document editing into a single dark-themed application. It runs on Linux (CARRARA Mint) and connects to Samsung Android devices via a reverse SSH tunnel through a public VPS, LAN, or USB/ADB.
02
Whim uses a multi-layered connectivity architecture to bridge the desktop application with mobile devices, messaging platforms, and IoT infrastructure.
System Architecture & Network Topology (PlantUML)
| Channel | Protocol | Default Port | Purpose |
|---|---|---|---|
| OpenClaw Gateway | WebSocket | 18789 | Core command bus for chat, approvals, sessions, presence |
| Journal Ingest / Whim.m | HTTP (multipart) | 8088 | Voice recording upload, Whim.m PWA, AI chat proxy |
| Screen Share | HTTP (MJPEG stream) | 8091 | Desktop-to-phone screen share + phone camera feed |
| SSH Tunnel (autossh) | Reverse SSH via VPS | 8089 | Primary: secure cross-network connectivity via VPS at 104.207.140.242 |
| Tailscale (fallback) | WireGuard mesh VPN | 8089 | Fallback: direct connection via Tailscale IP 100.69.17.20, handles WiFi↔cellular handoffs |
| Ollama | HTTP REST | 11434 | Local LLM inference (streaming chat completions) |
| Signal CLI | HTTP | 8080 | Signal messenger send/receive via signal-cli |
| ADB | USB / TCP | 5555 | Android device management, APK install, screenshots |
| Singleton Lock | TCP | 48891 | Prevents duplicate Whim instances, sends SHOW signal |
The application header provides real-time connectivity controls:
ws://127.0.0.1:18789)All 17 module tabs are split across two rows for better visibility and click targets. Row 1 contains the primary workflow tabs (Chat through TRV Cipher). Row 2 contains utility and configuration tabs (Library through Settings). The active tab is highlighted in green with a border accent. Tabs are button-based with hover effects.
03
Whim organizes its functionality into 14 dedicated tabs, each serving a specific domain within the ecosystem.
| Tab | Internal Key | Description |
|---|---|---|
| CHAT | chat | Direct command-line chat with the OpenClaw gateway. Send messages, abort tasks, view real-time WebSocket traffic. |
| WHIM.AI | whimai | Full AI console with local Ollama LLM, presets, observability, context metering, tool trace, output templates, and capture tools. |
| SMARTTHINGS | smartthings | Samsung SmartThings device browser with scan, filter, favorites, device detail, and recently-controlled history. |
| AVR LAB | xtts | XTTS v2 voice synthesis lab with speaker references, spectrogram visualization, and Table Reads output. |
| VOICE ENGINE | voice_engine | Wake word tuning: live spectrogram (Whim-Scope), gain/HPF/AGC/parametric EQ, sensitivity, VAD, spectral subtraction, confidence ghost bar, intelligibility band. |
| PERSONA | persona | Voice personality manager: coined response playlists per voice clone, confidence-gated, context-aware, XTTS pre-render pipeline, behavioral categories. |
| TRV CIPHER | hearmeout | Audio transcription workstation with spectrogram, playback transport, Whisper transcription, ODT export, and scrub tools. |
| GEOF | geof | Geofence tracker with canvas map, collar status table, LoRa bridge integration, 20-minute heartbeat monitor, and fence pin management for livestock tracking. |
| NODEFLOW | nodeflow | Visual node-based flow editor showing active droids, LLM reasoning, OpenClaw telemetry, and data flow connections with drag-and-drop canvas, auto-poll, and node inspector. |
| ARCHIVE | archive | Rich text editor with formatting toolbar, font selection, alignment, bullet lists, find/replace, word count, and file browser. |
| SS | ss | Screen share server with QR code, phone camera feed, desktop preview, FPS/quality settings, and MJPEG streaming. |
| SETTINGS | settings | API keys & endpoints (Ollama, OpenAI, SmartThings, Notion), model management (pull/delete from Ollama), app preferences, theme, paths. |
04
Whim.AI is the central intelligence hub. It connects to a local Ollama instance for streaming chat completions with full observability.
| Preset | Model | Context | Temperature | Tools | System Prompt |
|---|---|---|---|---|---|
| Default | llama3.1:8b-16k | 16384 | 0.7 | all | (none) |
| Creative | llama3.1:8b-16k | 16384 | 1.2 | all | Creative writing assistant |
| Code | llama3.1:8b-16k | 16384 | 0.2 | code | Concise code assistant |
| Analyst | llama3.1:8b-16k | 8192 | 0.3 | search,calc | Data analyst |
| Minimal | llama3.1:8b-16k | 4096 | 0.5 | none | As few words as possible |
Real-time performance telemetry including:
One-click templates for structured content: Weekly Recap, Meeting Summary, Script Draft, Debug Report.
The right panel lists all available commands organized into categories:
05
Whim.m v3.4 is a full-featured mobile companion app served as a native APK or Progressive Web App (PWA) from either the desktop Whim app or a standalone Python server. It provides voice recording, AI chat (via Ollama proxy), wake word voice commands, a cross-device file library, and inter-device messaging — all organized into five dedicated bottom navigation tabs: REC, LIBRARY, CHAT, WAKE, and DEVICES. A persistent "Listening for Hey Whim" banner runs across the top of every tab, giving always-on wake word access regardless of which tab is active. The app uses a hybrid connection strategy: VPS tunnel by default with Tailscale as an opt-in fallback, switchable from a dropdown in the mobile UI or from the desktop Control Panel.
open_maps — Open Organic Maps for navigationopen_app — Launch any app directly on the device via Android intentplay_music — Open YouTube Music with optional search queryWhen a voice command is recognized, Whim.m embeds a whim-cmd JSON block in the AI response to trigger device actions:
Output will show LAN IP and VPS tunnel URL for connecting from your phone.
| File | Size | Description |
|---|---|---|
| whim_m_v3.4_phone.apk | ~61 KB | WebView wrapper for phones (Samsung Galaxy S22, Galaxy S9) |
| whim_m_v3.4_tablet.apk | ~61 KB | WebView wrapper for tablet (Lenovo TB311FU) |
06
Whim generates QR codes in two locations to enable instant phone-to-desktop connectivity without typing URLs:
Clicking "Upload from Phone" in the TRV Cipher tab opens a modal dialog with:
http://192.168.1.100:8088)
The QR code is generated using the qrcode Python library with error correction level M, rendered as a pixel-perfect canvas grid.
The SS tab has a dedicated QR CODE card in the left column that displays a QR code for the screen share server URL. When the server starts, the QR is generated via qrcode.make() and displayed on a Tkinter canvas, scaled to fit the available space.
Use a COLON before the port number, not a period.
Correct: http://192.168.1.100:8088
Wrong: http://192.168.1.100.8088
07
Whim uses a hybrid two-mode connection strategy: a reverse SSH tunnel through a public VPS as the always-on primary, with Tailscale retained as an opt-in fallback for situations where rock-solid stability is needed (e.g., WiFi↔cellular handoffs). The mode is switchable from the mobile app or the desktop Control Panel.
| Mode | Default | How It Works |
|---|---|---|
| VPS Tunnel | YES | Phone → VPS:8089 → SSH tunnel → PC:8089. Works everywhere, no VPN client needed on phone. |
| Tailscale | OPT-IN | Phone → 100.69.17.20:8089 direct via WireGuard mesh. Handles network transitions natively. Requires Tailscale on phone. |
| Auto-detect | OPT-IN | On connection, checks if Tailscale IP is reachable; uses it if available, otherwise falls back to VPS tunnel. |
/connection_mode APIconfig/connection_mode.jsonWhen any connection drops, the mobile client automatically retries:
fetchWithRetry (up to 3 attempts per request)| Device | Tailscale IP | LAN IP |
|---|---|---|
| PC (carraramint) | 100.69.17.20 | 192.168.1.231 |
| Samsung Galaxy S9 | 100.97.96.1 | 192.168.1.198 |
| Samsung Galaxy S22 | 100.77.59.2 | — |
| Lenovo TB311FU (Tablet) | 100.64.255.124 | 192.168.1.112 |
VPS Tunnel (default): The CARRARA desktop opens an outbound SSH connection to a public VPS. The VPS accepts inbound connections from mobile devices and forwards them through the tunnel back to the desktop. No ports need to be opened on the home router.
Tailscale (fallback): When enabled, phones connect directly to the PC's Tailscale IP (100.69.17.20) via WireGuard mesh. This is more stable during WiFi↔cellular transitions because Tailscale handles NAT traversal and connection migration natively.
| Item | Value |
|---|---|
| VPS | 104.207.140.242 (Vultr) |
| Tunnel port | 8089 |
| Service | whim-tunnel.service (systemd, starts on boot) |
| Tool | autossh (auto-reconnects on failure) |
| Auth | SSH key only (~/.ssh/id_ed25519), passwords disabled |
| Firewall | ufw: ports 22 (SSH) + 8089 (tunnel) |
| sshd config | GatewayPorts yes |
The tunnel runs as a persistent systemd service on CARRARA:
The Whim Terminal header bar displays two auto-updating status dots that poll every 10 seconds:
| Indicator | Green | Red/Grey |
|---|---|---|
| Tunnel | whim-tunnel.service active AND VPS:8089 reachable | Service down or VPS unreachable |
| Whim | Whim.m server responding on localhost:8089 | Server not running |
| Tailscale | Tailscale daemon running (BackendState: Running) | Tailscale stopped or not installed |
| Ollama | Ollama responding on localhost:11434 | Ollama not running |
Whim also displays a system tray icon with three states:
| State | Icon Color | Tray Tooltip |
|---|---|---|
| Tunnel down | Grey | Tunnel: Down | Whim: Offline |
| Tunnel up, Whim unreachable | Yellow | Tunnel: Connected | Whim: Offline |
| Both connected | Green | Tunnel: Connected | Whim: Online |
The Whim.m mobile app health bar shows five indicators: tunnel, server, mic, ollama, and TS (Tailscale). The tunnel dot turns green when the phone can reach the Whim server through the VPS, confirming end-to-end tunnel connectivity. The TS dot turns green when Tailscale is running on the PC.
A connection mode dropdown in the top-right corner allows switching between VPS Tunnel, Tailscale, and Auto-detect modes. When disconnected, a pulsing red banner appears: "Connection lost — reconnecting..." with automatic exponential backoff retries.
When the tunnel is active, the Whim.m standalone server prints the VPS URL alongside the LAN IP:
ssh -v -R 8089:localhost:8089 root@104.207.140.242 -Nsudo lsof -i :8089sudo systemctl status whim-tunnel.servicesudo systemctl restart whim-tunnel.service
08
Whim uses a fully local AI inference stack powered by Ollama. All models run on the CARRARA machine's GPU with no external API calls.
| Model | Role | Context | Notes |
|---|---|---|---|
| DeepSeek R1:32B | Primary agent model | Varies | Default model for OpenClaw gateway agents. Reasoning-optimized with chain-of-thought. |
| Llama 3.1:8B-16K | Fallback / Whim.AI default | 16384 | Used for the Whim.AI console and mobile Whim.m chat. Fast inference, 16K context window. |
http://localhost:11434/api/chat with streaming enabled/api/chat POST requests through the Ingest server to Ollama, enabling AI on the phone without direct Ollama access
Both desktop and mobile clients poll /health to check Ollama availability. The health endpoint returns {"status": "ok", "ollama": true/false} by probing http://localhost:11434/api/tags.
09
The AVR Lab tab provides text-to-speech synthesis using Coqui XTTS v2 running in a dedicated conda environment (xtts).
tts_models/multilingual/multi-dataset/xtts_v2~/voices (speaker reference WAV files)~/xtts_out.wav (default) or ~/TableReads/~/miniconda3/envs/xtts/bin/pythonThe TRV Cipher tab is a complete audio transcription workstation:
10
The VOICE ENGINE tab is a dedicated audio diagnostics and wake word calibration environment built for use in noisy environments such as vehicles, outdoor settings, or anywhere ambient noise interferes with "Hey Whim" detection. It provides a real-time spectrogram, signal processing controls, and wake word sensitivity tuning — all in a three-column layout.
The top half of the tab displays a real-time frequency heatmap covering the 300 Hz – 8 kHz range, driven by a 512-point Hanning FFT at 16 kHz mono. Key visual features:
| Control | Range | Description |
|---|---|---|
| Dynamic Gain | 0.1x – 5.0x | Adjusts input volume before processing. Drop if mic is near a vent to avoid clipping. |
| Noise Floor Gate | -80 to 0 dB | Silence threshold. Anything below is ignored, preventing wake word hallucinations from static. |
| High-Pass Filter | Toggle (150 Hz cutoff) | Cuts engine vibration and road hum. Critical for vehicles. Hotkey: H |
| Spectral Subtraction | Toggle + Capture | "Capture Noise Profile" learns ambient/keyboard sound and subtracts that frequency profile from mic input. |
| Automatic Gain Control | Toggle | Auto-levels gain based on ambient noise. Raises gain at highway speed, lowers at idle. Smooth tracking with -20 dB target. |
| Parametric EQ (400 Hz) | Toggle + depth (-24 to 0 dB) | Narrow notch dip at ~400 Hz to reduce cabin reverb "boxiness" that masks the "W" sound in "Whim". |
| Control | Range | Description |
|---|---|---|
| Sensitivity Threshold | 0.0 – 1.0 | Lower = fewer false starts but must shout. Higher = hears whispers but sneezes may trigger. Hotkey: S |
| Phonetic Trigger Delay | 200 – 1500 ms | How long the engine waits after "Hey" to hear "Whim." Bump up to ~800 ms for slow speech. |
| Voice Activity Detection | Toggle | Only runs the expensive AI wake-word check when human-like speech patterns are detected. Saves CPU. |
| Wake Word Engine | Selector | Choose: placeholder (energy-based), openWakeWord, or Porcupine. The latter two support custom "Hey Whim" phrase. |
| Intelligibility Band | Toggle | Highlights 1–3 kHz on the Whim-Scope to visualize the critical voice frequency range. |
| Stat | Value |
|---|---|
| Sample Rate | 16,000 Hz (16 kHz) — optimal for voice; higher wastes CPU, lower loses "s" and "sh" sounds |
| Bit Depth | 16-bit PCM Mono |
| FFT Window | 512-point Hanning |
| Freq Range | 300 Hz – 8,000 Hz |
| Buffer Size | Adjustable 256 – 4096 frames (80–100 ms standard) |
Live readouts include inference latency (ms), buffer frame count, CPU usage, and active audio device name. All settings persist across sessions to ~/.openclaw/voice_engine.json.
| Key | Action |
|---|---|
G | Cycle Gain (0.5 → 1.0 → 2.0 → 5.0) |
S | Cycle Sensitivity (0.3 → 0.5 → 0.7 → 0.9) |
H | Toggle High-Pass Filter on/off |
Uses sounddevice (PortAudio) at 16 kHz mono with float32 samples. The audio callback pipeline processes in order: gain → HPF → parametric EQ → AGC → spectral subtraction → FFT → spectrogram → wake word detection. The wake word function is a placeholder (_ve_detect_wake_word) returning energy-based confidence, ready to swap in openWakeWord or Porcupine for actual custom "Hey Whim" inference.
If mechanical keyboard clacks trigger the wake word, use "Capture Noise Profile" to learn the keyboard's frequency signature, then enable Spectral Subtraction to remove it from the mic input.
11
The PERSONA tab is a voice personality manager that treats coined responses like playlists. Each voice clone (MillyAI, Revy, future voices) gets its own persona profile with a curated set of responses organized by behavioral situation. When Whim needs to respond to a trigger — wake word, command acknowledgment, error, idle chatter — it pulls from that persona's playlist instead of generating a generic response.
~/voices/. Create, duplicate, delete personas. The active persona (starred) is what Whim uses for all responses.| Category | Color | When It Fires |
|---|---|---|
| Wake Word | Green | Immediately after "Hey Whim" is detected (e.g., "Yeah?") |
| Acknowledgment | Cyan | After a command is successfully parsed (e.g., "On it.") |
| Misheard | Orange | When confidence is below threshold (e.g., "The road's loud. One more time?") |
| Error | Red | When a command fails (e.g., "Can't reach the PC. Tunnel might be down.") |
| Narrative | Purple | During table read sessions in AVR Lab (e.g., "Rolling.") |
| Ambient | Grey | System events: boot, reconnect, idle timeout (e.g., "Tunnel's back up.") |
| Custom | Blue | User-defined triggers for future expansion |
Each response has a confidence range (e.g., 40–60%). The Voice Engine's wake word confidence score determines which response fires. At 90%+ confidence, wake responses fire. At 40–60%, partial-match misheard responses fire. Below 20%, the strongest "speak up" responses fire. This maps directly to the Confidence Ghost Bar in the Voice Engine tab.
The context field enables situational awareness. A response tagged "driving" only fires when connected via VPS tunnel (implying mobile/vehicle use). "Morning" fires between 5–10am. "table_read" only fires when AVR Lab is active. Multiple responses matching the same trigger + context are selected randomly to prevent repetition.
Responses are pre-rendered as cached WAV files via the XTTS conda environment (same GPU-accelerated pipeline as AVR Lab). Render All batch-processes every unrendered entry, skipping existing cache. Cached clips play in <100ms instead of waiting 2–5 seconds for live XTTS generation. Cache is stored at ~/voices/personas/[name]/cache/.
Ships with 42 coined responses across all 7 categories: 6 wake word, 8 acknowledgment, 8 misheard, 7 error, 6 narrative, 7 ambient. Ready to render with the MillyAI voice clone.
Coined responses are deterministic — they fire the same way every time. LLMs drift, get verbose, add qualifiers. The LLM handles open conversation; the persona handles mechanical reflexes. "Hey Whim" → "Yeah?" is not a conversation, it's a reflex. Reflexes should be fast, consistent, and characteristic.
10
Whim integrates with Signal via signal-cli running as a local HTTP service:
http://127.0.0.1:8080/opt/Signal/signal-desktopThe Discord tab manages the OpenClaw bot (Enoch persona) with full action control:
/usr/share/discord/Discordopenclaw.json
11
The SmartThings tab provides a complete dashboard for Samsung SmartThings device management via the OpenClaw gateway.
12
The GEOF tab is a geofencing and livestock tracking system designed for hilly terrain (Ozarks). It combines a canvas-based map with real-time collar monitoring via LoRa radio, GPS point-in-polygon fence checking, and ESP32-S3 collar firmware with deep sleep power management.
GeoF works just as well for the four-legged family members who think the backyard fence is more of a suggestion than a rule. If your dog has mastered the art of the great escape — or simply can't resist chasing squirrels into the neighbor's yard — a lightweight GPS collar with GeoF gives you peace of mind without the drama. You'll get a gentle heads-up the moment your adventurous pup wanders past the boundary, so you can call them back before they make it three blocks down the street. Same LoRa collar, same map, same alerts — just swap "Cow-1" for "Biscuit" and you're set.
| Panel | Content |
|---|---|
| Toolbar | Sync Pins, Load/Save/Clear Fence, Start/Stop Bridge, Start/Stop Heartbeat |
| Left (60%) | Canvas map with pan, zoom, grid lines, fence polygon, pin markers, and collar positions (color-coded by status) |
| Right (40%) | Collar status treeview, detail panel, and LoRa log |
| Status | Color | Condition |
|---|---|---|
| OK | Green | Heartbeat received within 20 minutes and inside fence |
| STALE | Yellow | No heartbeat for 20–40 minutes |
| OFFLINE | Red | No heartbeat for >40 minutes |
| ALERT | Bright Red | Collar reported position outside the geofence boundary |
[{lat, lon}] or {pins: [...]} format). Auto-builds fence polygon from 3+ pins.~/.openclaw/fence_config.json
The LoRa bridge (services/lora_bridge.py) runs as a subprocess managed from the GeoF tab. It supports three modes:
| Mode | Flag | Description |
|---|---|---|
| Serial | --port /dev/ttyUSB0 | Reads from a hardware LoRa gateway via serial (default 115200 baud) |
| TCP | --tcp 0.0.0.0:9600 | Accepts collar packets over TCP sockets |
| Simulated | --simulate | Generates synthetic collar data for testing without hardware |
The bridge performs ray-casting point-in-polygon geofence checks on every packet. If a collar reports a position outside the fence boundary, the packet is tagged with OUTSIDE_FENCE alert.
| Parameter | Default | Note |
|---|---|---|
| Frequency | 915 MHz | US ISM band |
| Spreading Factor | SF12 | Maximum range for hilly Ozarks terrain. Slower data rate but signals “bend” over ridges. |
| TX Power | 20 dBm | Maximum allowed for LoRa in US |
| CRC | Enabled | Error detection on all packets |
Each livestock collar runs on an ESP32-S3 with GPS, LoRa radio (SX1276), and IMU accelerometer. The firmware (Collar/firmware/main.cpp) uses a deep sleep cycle:
Collars transmit CSV over LoRa: COLLAR_ID,LAT,LON,BATTERY,NAME[,OUTSIDE_FENCE]
| Path | Purpose |
|---|---|
services/lora_bridge.py | LoRa bridge service (serial/TCP/simulated) |
Collar/firmware/main.cpp | ESP32-S3 Arduino firmware |
Collar/config/fence.json | Default fence config (flash to ESP32 SPIFFS) |
~/.openclaw/fence_config.json | Active fence config (desktop) |
~/.openclaw/geof_pins.json | Cached pin data from mobile sync |
The heartbeat monitor runs as a background timer in the Whim Terminal. Every 20 minutes it scans all registered collars and flags any that have gone silent as STALE or OFFLINE. Alerts appear in the LoRa Log panel and collar table rows change color accordingly.
SF12 (Spreading Factor 12) is critical for hilly terrain. It trades data rate for range, significantly increasing the chance of a signal clearing ridgelines between the collar and your antenna mast. Expect 2–5 km line-of-sight range, or 500m–1.5 km over hills with SF12 + 20 dBm.
13
The NodeFlow tab is a visual node-based flow editor that maps the real-time data pipeline inside Whim. It renders each active component — User Input, Whim Brain (LLM), Opus Droid, OpenClaw Telemetry, and Wisp/GPS — as draggable nodes on an infinite canvas, with dashed edges showing how data flows between them.
| Node | Type | Description |
|---|---|---|
| User Input | input | Prompt and command entry point for the pipeline |
| Whim Brain (LLM) | brain | Local Ollama model handling reasoning, tool calls, and token streaming |
| Opus Droid | droid | Code execution, syntax analysis, and active path highlighting |
| OpenClaw Telemetry | openclaw | Hardware telemetry: RSSI, battery level, heartbeat status |
| Wisp / GPS | wisp | GPS coordinates, geofence status, and LoRa packet data |
| Panel | Content |
|---|---|
| Header | Title, Refresh / Auto-Poll / Reset View buttons, idle/active status indicator |
| Canvas (left, 75%) | Infinite dark canvas with grid lines, color-coded draggable nodes, dashed edge connections, zoom (scroll wheel), and pan (right-click drag) |
| Node Inspector (right top) | Detail card showing the selected node’s label, type, metadata, and connection list |
| Flow Log (right bottom) | Timestamped event log with color-coded severity (info, ok, warn, err) |
| Type | Border Color | Purpose |
|---|---|---|
| brain | Purple | LLM reasoning engine |
| droid | Green | Code execution agents |
| openclaw | Orange | Hardware telemetry sources |
| wisp | Blue | GPS and geofence endpoints |
| input | Tan | User entry points |
14
The Archive tab is a full-featured document editor that saves files to ~/ARCHIVE. All documents created in Whim are stored in this directory.
~/.openclaw/WhimUI/fontsThe right column shows all files in ~/ARCHIVE with refresh, open, and double-click-to-load. A changelog panel at the bottom tracks all document actions with timestamps.
16
The Whim ADB Portal is a standalone GUI (whim_adb_portal.py) for managing APK installs and Android emulators, matching the Whim dark theme.
com.whim.m package| Profile | Resolution | DPI | RAM | API Level |
|---|---|---|---|---|
| Samsung Galaxy S9 | 1440 x 2960 | 570 | 4096 MB | 30 (Android 11) |
| Samsung Galaxy S22 | 1080 x 2340 | 425 | 8192 MB | 33 (Android 13) |
The portal can download and set up the full Android SDK command-line tools (~2 GB), accept licenses, install platform-tools, emulator, and system images, create AVDs with custom device profiles, and launch emulators with GPU acceleration.
17
The OpenClaw Gateway is the central command bus that connects the Whim desktop client to the AI agent infrastructure via WebSocket.
auth.mode: "token")tkuioperator.read, operator.write, operator.approvals (optional)ws://127.0.0.1:18789)The Sessions tab manages active OpenClaw sessions with auto-refresh, presets, crash recovery, and a Notion integration for session notes. The Presence tab shows real-time online status with heartbeat pings to each connected component.
The Events/Debug tab provides a structured, filterable event log with:
18
The SETTINGS tab provides a three-column configuration panel for managing API keys, LLM models, and application preferences. All settings persist to ~/.openclaw/whim_settings.json.
| Field | Description |
|---|---|
| Ollama URL | Base URL for the local Ollama LLM server (default: http://localhost:11434) |
| OpenAI API Key | API key for optional OpenAI integration (masked input, stored locally) |
| SmartThings | Personal access token for Samsung SmartThings API |
| Notion Token | Integration token for Notion session tracking |
Manages Ollama models directly from the Whim Terminal:
mistral:7b)A global model selector in the header bar lets you switch between local LLMs at any time without opening Settings. It shows all models available in Ollama (fetched on startup and via the refresh button). Selecting a model immediately updates Whim.AI's active model for the next prompt. Currently available:
llama3.1:8b-16k — 4.9 GB, 16K context (default, fast)llama3.1:8b — 4.9 GB, standard contextdeepseek-r1:32b — 19.9 GB, reasoning model (slower, smarter)
19
A floating always-on-top tool window for capturing system audio output as lightweight audio files — no video, just audio. Designed for the use case of turning YouTube videos, podcasts, or any playing audio into portable files you can listen to in the car.
Clicking the 🎧 Capture button in the header bar opens a compact floating window that stays on top of all other windows. It captures audio from PipeWire/PulseAudio monitor sources — virtual loopback devices that tap into whatever audio is playing through your speakers or HDMI output. No screen recording, no video — just the audio stream, encoded to a small file.
| Control | Description |
|---|---|
| Source | Dropdown listing all PipeWire monitor sources. Auto-selects HDMI if available. Options include USB speakers, headphones, S/PDIF, and HDMI. |
| Format | Output codec: MP3 (default, car-compatible), Opus, OGG Vorbis, M4A (AAC), WAV (lossless) |
| Bitrate | 64k – 320k. Default 128k gives ~1 MB/min for MP3 (good for podcasts/speech). |
| Record / Stop | Start/stop capture. Header button flashes red while recording. |
| VU Meter | Live level indicator (green/yellow/red). |
| Timer | Running elapsed time (HH:MM:SS) and live file size. |
| Name / Rename | Inline rename of the output file after stopping. |
Files save to ~/Journal/audio_captures/ with timestamps (e.g. capture_20260317_143022.mp3). The folder link in the tool opens the directory in the file manager. At 128k MP3, a 1-hour podcast capture is roughly 60 MB.
Uses ffmpeg -f pulse to read from PipeWire/PulseAudio monitor sources. The monitor sources are virtual loopback devices created automatically by PipeWire for every output sink. No additional driver or loopback configuration is needed.
20
| Component | Technology |
|---|---|
| OS | Linux Mint (CARRARA machine) |
| Python | 3.12+ (system) + conda env: xtts (3.10+) |
| GUI Framework | Tkinter with ttk (Azure dark theme) |
| AI Runtime | Ollama (local GPU inference) |
| Voice Synthesis | Coqui XTTS v2 (conda env: xtts) |
| Transcription | OpenAI Whisper |
| Networking | Reverse SSH tunnel via VPS (autossh + systemd) |
| Messaging | signal-cli (Signal) + discord.py/nextcord (Discord) |
| IoT | Samsung SmartThings via OpenClaw gateway |
| Android | ADB + Android SDK command-line tools |
| Screen Capture | mss (Python) |
| Image Processing | Pillow (PIL) |
| QR Codes | qrcode (Python library) |
| System Tray | pystray |
| Document Export | odf (OpenDocument Format) + LibreOffice Writer |
| Audio Processing | FFmpeg, NumPy, wave |
| Path | Purpose |
|---|---|
~/vaults/WHIM/ | Main Whim project vault |
~/vaults/WHIM/app/ | Desktop application source code |
~/vaults/WHIM/mobile/ | Mobile app, APKs, build artifacts |
~/vaults/WHIM/assets/ | Fonts, icons, logos |
~/.openclaw/ | OpenClaw config, Whim icon, sessions store |
~/.openclaw/WhimUI/ | Custom fonts and icon packs (Papirus, Mint-Y) |
~/Journal/ | Voice recordings and notes uploaded from phone |
~/ARCHIVE/ | Documents created in the Archive Tab Editor |
~/TRANSCRIPT/ | Exported ODT transcripts |
~/TableReads/ | XTTS voice synthesis output |
~/voices/ | Speaker reference files for XTTS |
~/Incoming/fire.png | Flame logo used in the header and taskbar |
The main configuration lives at ~/.openclaw/openclaw.json and controls:
Whim enforces a single instance by binding TCP port 48891. If a second instance is launched, it sends a SHOW signal to the existing instance, which restores and focuses its window.
21
The CARRARA desktop runs Linux Mint with Cinnamon. The following customizations have been applied to the desktop environment for a cleaner workflow and ergonomic comfort.
All non-pinned application entries have been removed from the start menu. Only taskbar-pinned favorites remain accessible via the start menu:
| Application | .desktop ID | Status |
|---|---|---|
| Firefox | firefox.desktop | Pinned |
| Software Manager | mintinstall.desktop | Pinned |
| System Settings | cinnamon-settings.desktop | Pinned |
| Terminal | org.gnome.Terminal.desktop | Pinned |
| Files (Nemo) | nemo.desktop | Pinned |
| Google Chrome | google-chrome.desktop | Pinned |
Removed .desktop overrides are backed up at ~/.local/share/applications/_backup_removed/. Custom app entries removed include: OpenClaw, Whim ADB Portal, Control Panel, Droid, Revy Acousto, and OnlineChat webapp. System app overrides (Discord, Signal, Audacity, LibreOffice, etc.) were also removed, reverting them to default system entries.
Additionally, all Preferences and Administration category entries (65 items) have been hidden from the start menu via NoDisplay=true overrides. This includes all Cinnamon settings sub-panels (Backgrounds, Themes, Keyboard, Display, etc.), system tools (Firewall, Timeshift, Driver Manager, Update Manager, etc.), and utility launchers. The main System Settings app remains accessible from the pinned taskbar for when settings changes are needed.
All Cinnamon keyboard shortcuts that use the ALT key have been disabled for ergonomic reasons (wrist rest positioning). This includes:
| Action | Previous Shortcut |
|---|---|
| Switch windows | Alt+Tab |
| Switch windows backward | Shift+Alt+Tab |
| Close window | Alt+F4 |
| Toggle maximized | Alt+F10 |
| Unmaximize | Alt+F5 |
| Window menu | Alt+Space |
| Move window | Alt+F7 |
| Resize window | Alt+F8 |
| Run dialog | Alt+F2 |
| Switch group | Alt+Above_Tab |
| Action | Previous Shortcut |
|---|---|
| Switch workspace up/down/left/right | Ctrl+Alt+Arrow |
| Move window to workspace | Ctrl+Shift+Alt+Arrow |
| Switch panels | Ctrl+Alt+Tab |
| Action | Previous Shortcut | Retained Non-ALT Binding |
|---|---|---|
| Logout | Ctrl+Alt+Delete | — |
| Terminal | Ctrl+Alt+T | — |
| Lock screen | Ctrl+Alt+L | XF86ScreenSaver |
| Shutdown | Ctrl+Alt+End | XF86PowerOff |
| Restart Cinnamon | Ctrl+Alt+Escape | — |
| Toggle recording | Ctrl+Shift+Alt+R | — |
| Window screenshot | Alt+Print | — |
| Magnifier zoom | Alt+Super+=/−/0 | — |
To restore all Cinnamon ALT shortcuts to defaults, run:
gsettings reset-recursively org.cinnamon.desktop.keybindings
22
Whim Terminal runs natively on Windows 11 via a platform compatibility layer that abstracts OS-specific calls (paths, services, audio). The same core codebase powers both the Linux and Windows builds.
| Software | Required | Install From |
|---|---|---|
| Python 3.10+ | Required | python.org (check "Add to PATH") |
| Ollama for Windows | Required | ollama.com |
| Tailscale | Optional | tailscale.com |
| ffmpeg | Optional | ffmpeg.org (add to PATH) |
| Signal Desktop | Optional | signal.org |
git clone https://github.com/scarter84/Whim.git
cd Whim
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
.\scripts\setup_windows.ps1This creates a virtual environment, installs dependencies, sets up data directories, and creates a desktop shortcut.
git clone https://github.com/scarter84/Whim.git
cd Whim
scripts\setup_windows.batscripts\launch_whim.batOr use the desktop shortcut created by the PowerShell setup.
Whim stores data in Windows-native locations:
| Linux Path | Windows Path |
|---|---|
~/.openclaw/ | %APPDATA%\OpenClaw\ |
~/Journal/ | Documents\Whim\Journal\ |
~/ARCHIVE/ | Documents\Whim\ARCHIVE\ |
~/TRANSCRIPT/ | Documents\Whim\TRANSCRIPT\ |
~/TableReads/ | Documents\Whim\TableReads\ |
~/voices/ | Documents\Whim\voices\ |
~/Incoming/ | Documents\Whim\Incoming\ |
| Feature | Linux | Windows 11 |
|---|---|---|
| File opener | xdg-open | os.startfile() |
| Service check | systemctl is-active | sc query |
| Audio sources | pactl (PulseAudio/PipeWire) | sounddevice (Windows Audio) |
| SSH Tunnel | systemd whim-tunnel.service | Manual SSH or Tailscale direct |
| DPI scaling | System native | Per-monitor DPI aware (auto-set) |
| Control Panel | Custom Cinnamon panel | Use Windows Settings directly |
| TTS Engine | XTTS via conda env | XTTS via pip or system Python |
app/
openclaw_tkui.py ← Main terminal (cross-platform)
whim_windows.py ← Windows 11 entry point
platform_compat.py ← OS abstraction layer
requirements_windows.txt
scripts/
setup_windows.bat ← Batch setup
setup_windows.ps1 ← PowerShell setup
launch_whim.bat ← Quick launcher
The platform_compat.py module detects the OS at import time and provides
correct path defaults, service checkers, audio source enumeration, and file-open commands.
The whim_windows.py launcher sets DPI awareness, verifies Ollama, patches
path constants, then loads the main app.
On Windows, the preferred connection method to mobile devices is Tailscale (direct mesh VPN). The Linux systemd SSH tunnel is not available natively on Windows, but Tailscale provides the same end-to-end encrypted connectivity with zero configuration.
Alternatively, use Windows OpenSSH to create a manual tunnel:
ssh -N -R 8089:localhost:8089 user@YOUR_VPS_IPsounddevice instead of PipeWire monitor sourcescontrol_panel.py) is Cinnamon-specific and not included
22.5
Whim.m is accessible on iOS devices via Safari or Chrome as a Progressive Web App (PWA). The iOS variant, codenamed Tahoe, connects to the same Whim server backend and provides the same five-tab experience (REC, LIBRARY, CHAT, WAKE, DEVICES) with platform-specific adaptations for Apple hardware.
| Software | Required | Notes |
|---|---|---|
| iOS 16+ | Required | PWA support requires iOS 16 or later |
| Safari / Chrome | Required | Safari recommended for best PWA integration (Add to Home Screen) |
| Tailscale for iOS | Optional | Required for direct Tailscale mesh connection mode |
| Setting | Value |
|---|---|
| Connection | VPS Tunnel (default) or Tailscale (requires Tailscale iOS app) |
| URL | http://104.207.140.242:8089 (VPS) or http://100.69.17.20:8089 (Tailscale) |
| PWA Install | Safari → Share → Add to Home Screen |
| Audio Recording | WebRTC MediaRecorder API (Safari 14.5+) |
| Wake Word | Requires microphone permission grant; iOS may suspend background audio |
| Notifications | Web Push supported on iOS 16.4+ (requires PWA mode) |
| Feature | Android (APK) | iOS (PWA / Tahoe) |
|---|---|---|
| App Delivery | Native APK via ADB sideload | PWA via Safari "Add to Home Screen" |
| WebView Engine | Chromium (Android WebView) | WebKit (Safari) |
| Audio Format | WebM/Opus (native) | MP4/AAC (Safari MediaRecorder default) |
| Background Audio | Supported (WebView keeps running) | Limited — iOS may suspend after ~30s in background |
| Wake Word | Always-on via WebView | Active only while app is in foreground |
| File Upload | Full filesystem access via intent | Photo Library + Files app picker |
| Camera Access | Direct WebRTC + Screen Share | WebRTC supported; no Screen Share capture |
| Notification | Firebase / local | Web Push (iOS 16.4+ in PWA mode only) |
| Install Size | ~61 KB APK | ~0 KB (bookmark/PWA shell) |
| Tailscale | Tailscale Android app | Tailscale iOS app (App Store) |
getDisplayMedia() in PWAs, so the desktop-to-phone Screen Share viewer works but phone-to-desktop camera capture may be limited.
23
The SYNC tab enables state synchronization across multiple Whim Terminal instances running on different machines (Linux + Windows). Seven sync approaches are available, managed through a unified engine.
| # | Approach | Transport | Real-time | Offline |
|---|---|---|---|---|
| 1 | WebSocket Daemon | Tailscale mesh | Yes | No |
| 2 | VPS rsync | SSH to VPS | No | Yes |
| 3 | CRDT Collaboration | WebSocket | Yes | No |
| 4 | Git Sync | Git remote | No | Yes |
| 5 | Hybrid (1+2) | Tailscale + VPS | Yes | Yes |
| 6 | Session Mirror | WebSocket | Yes | No |
| 7 | Phone Bridge | HTTP (Whim.m) | Buffered | Yes |
| Data | File | Sync Default |
|---|---|---|
| Session History | whim_sessions.json | On |
| Settings | whim_settings.json | On |
| Voice Engine Config | voice_engine.json | On |
| Device Locations | device_locations.json | On |
| Personas | personas.json | On |
| Journal Manifest | ~/Journal/*.json | On |
| Archive Text | ~/ARCHIVE/*.txt | On |
| API Keys / Tokens | — | Never |
Combines WebSocket + VPS for maximum reliability:
Real-time peer-to-peer sync via Tailscale. Both machines must be online. Heartbeat every 10s, full reconciliation every 5 min.
Async push/pull via rsync over SSH. Works even when the other machine is off. Manual or auto-triggered.
Auto-commit every 60s, push/pull from a private Git repo. Full version history and easy rollback.
The sync engine uses vector clocks for last-writer-wins conflict resolution. Each node maintains a logical clock that increments on every local change. When merging, the node with the higher clock value wins. For simultaneous edits at equal clocks, the node with the lexicographically higher ID wins (deterministic tie-breaking).
The CRDT layer (Approach 3) provides conflict-free merging for structured data like session lists and chat histories, ensuring eventual consistency without data loss.
Cast a live Whim Terminal session to another machine for read-only viewing. Enter the host's Tailscale IP in the SYNC tab and click WATCH. The mirror updates in real-time. Optional control handoff allows the viewer to operate the remote session.
Uses connected Whim.m phones as store-and-forward relays. When desktop A pushes changes, the phone stores them. When desktop B comes online, it pulls buffered changes from the phone. Leverages the existing Whim.m HTTP server on port 8089.
Sync config is stored at:
| Platform | Path |
|---|---|
| Linux | ~/.openclaw/whim_sync.json |
| Windows | %APPDATA%\OpenClaw\whim_sync.json |