whisper-local/docs/superpowers/plans/2026-05-14-microphone-monitor.md
2026-05-14 17:29:57 +02:00

# Microphone Monitor Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Detect microphone device changes, switch automatically to the default microphone when the configured one is missing, and notify the user via a toast and the tray tooltip.

**Architecture:** A new `whisper_local/microphone/` package with a `MicrophoneMonitor` protocol and a `create_monitor()` factory (analogous to `whisper_local/media/`). Windows uses `IMMNotificationClient` via `comtypes` and falls back to polling; all other platforms poll (`asyncio.sleep(2.5)`). Notifications go through `notify-py` (cross-platform) and `PystrayApp.set_warning()`.

**Tech Stack:** Python 3.13+, `sounddevice` (device listing), `comtypes` (Windows COM), `notify-py` (toast), `pystray` (tray tooltip), `pytest-asyncio` (tests)
---
## File overview
| Action | File | Purpose |
|--------|------|---------|
| Create | `whisper_local/microphone/__init__.py` | Protocol + factory |
| Create | `whisper_local/microphone/_poll.py` | Polling implementation |
| Create | `whisper_local/microphone/_win32.py` | Windows IMMNotificationClient |
| Create | `whisper_local/tray/_notification.py` | notify-py wrapper |
| Modify | `whisper_local/tray/_tray.py` | Add `set_warning()` to `PystrayApp` + `NoOpTray` |
| Modify | `whisper_local/__main__.py` | Monitor integration in `App` |
| Modify | `pyproject.toml` | `notify-py` + `comtypes` as dependencies |
| Create | `tests/test_microphone_monitor.py` | Tests for `PollMonitor` |
---
## Task 1: Dependencies + `_notification.py`
**Files:**
- Modify: `pyproject.toml`
- Create: `whisper_local/tray/_notification.py`
- [ ] **Step 1: Add the dependencies to `pyproject.toml`**
In the `dependencies` list, add the following lines after `"darkdetect>=0.8.0",`:
```toml
"notify-py>=0.3.43",
"comtypes>=1.4.0; sys_platform == 'win32'",
```
- [ ] **Step 2: Update the lock file**
```
uv lock
```
Expected output: `Resolved N packages` with no errors.
- [ ] **Step 3: Create `_notification.py`**
```python
# whisper_local/tray/_notification.py
"""Desktop notifications via notify-py."""
import logging

logger = logging.getLogger(__name__)

_APP_NAME = "whisper-local"


def notify(title: str, message: str) -> None:
    """Show a desktop notification. Failures are only logged."""
    try:
        # Imported lazily so a missing or broken backend cannot break startup.
        from notifypy import Notify

        n = Notify()
        n.application_name = _APP_NAME
        n.title = title
        n.message = message
        n.send()
    except Exception:
        logger.warning("Benachrichtigung fehlgeschlagen: %s %s", title, message)
```
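The "log, never raise" behaviour of `notify()` can be checked without a real notification backend. The following is a minimal sketch under assumptions: `safe_notify` and `broken_backend` are hypothetical names used only for illustration, and the sender is injected instead of importing `notifypy`.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("notification-sketch")


def safe_notify(send: Callable[[str, str], None], title: str, message: str) -> bool:
    """Call `send` and report success; any exception is logged, never raised."""
    try:
        send(title, message)
        return True
    except Exception:
        logger.warning("notification failed: %s %s", title, message)
        return False


def broken_backend(title: str, message: str) -> None:
    # Hypothetical stand-in for a notification backend that is unavailable.
    raise RuntimeError("no notification daemon")


delivered: list[tuple[str, str]] = []
ok = safe_notify(lambda t, m: delivered.append((t, m)), "Title", "Body")
failed = safe_notify(broken_backend, "Title", "Body")
```

Returning a flag is a choice made for this sketch so the outcome is observable; the wrapper in the plan returns `None`.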
- [ ] **Step 4: Import test**
```
uv run python -c "from whisper_local.tray._notification import notify; print('OK')"
```
Expected output: `OK`
- [ ] **Step 5: Commit**
```
git add pyproject.toml uv.lock whisper_local/tray/_notification.py
git commit -m "feat(notify): notify-py + _notification.py wrapper"
```
---
## Task 2: `MicrophoneMonitor` Protocol + Factory Skeleton
**Files:**
- Create: `whisper_local/microphone/__init__.py`
- [ ] **Step 1: Create the package**
```python
# whisper_local/microphone/__init__.py
"""Microphone device monitoring with platform-specific backends."""
import sys
from collections.abc import Awaitable, Callable
from typing import Protocol


class MicrophoneMonitor(Protocol):
    on_device_added: Callable[[str], Awaitable[None]] | None
    on_device_removed: Callable[[str], Awaitable[None]] | None
    on_configured_missing: Callable[[], Awaitable[None]] | None

    async def start(self) -> None: ...

    def stop(self) -> None: ...


def create_monitor(configured_device: str | None) -> MicrophoneMonitor:
    """Create the platform-specific microphone monitor."""
    if sys.platform == "win32":
        from whisper_local.microphone._win32 import Win32Monitor

        return Win32Monitor(configured_device)
    from whisper_local.microphone._poll import PollMonitor

    return PollMonitor(configured_device)
```
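Because `MicrophoneMonitor` is a `typing.Protocol`, backends only need matching members and do not have to inherit from it. A reduced sketch of that idea, assuming a hypothetical methods-only protocol `MonitorLike` (the attribute callbacks are left out so the runtime check stays simple):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MonitorLike(Protocol):
    """Hypothetical, methods-only reduction of the monitor protocol."""

    async def start(self) -> None: ...

    def stop(self) -> None: ...


class DummyMonitor:
    """Does not inherit from MonitorLike but has matching methods."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class NotAMonitor:
    """Lacks start(), so it does not satisfy the protocol."""

    def stop(self) -> None:
        pass


def create_dummy_monitor() -> MonitorLike:
    # A type checker accepts this return because DummyMonitor matches structurally.
    return DummyMonitor()
```

`runtime_checkable` only verifies that the methods exist, not their signatures, so it is a sanity check rather than a replacement for static type checking.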
- [ ] **Step 2: Import test**
```
uv run python -c "from whisper_local.microphone import create_monitor; print('OK')"
```
Expected output: `OK`. This holds on every platform even though `_poll.py` and `_win32.py` do not exist yet: `create_monitor()` imports its backend only when it is called, so the missing modules (added in Tasks 3 and 5) do not affect the import itself.
- [ ] **Step 3: Commit**
```
git add whisper_local/microphone/__init__.py
git commit -m "feat(microphone): Protocol + create_monitor() factory skeleton"
```
---
## Task 3: `PollMonitor` device detection (TDD)
**Files:**
- Create: `whisper_local/microphone/_poll.py`
- Create: `tests/test_microphone_monitor.py`
- [ ] **Step 1: Create the test file (fails at first)**
```python
# tests/test_microphone_monitor.py
import asyncio
from unittest.mock import AsyncMock, patch

import pytest

from whisper_local.microphone._poll import PollMonitor


def _fake_devices(names: list[str]) -> list[dict]:
    return [{"name": n, "max_input_channels": 1} for n in names]


@pytest.mark.asyncio
async def test_on_device_added_fires_when_device_appears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    added: list[str] = []

    async def on_added(name: str) -> None:
        added.append(name)
        event.set()

    monitor.on_device_added = on_added
    call_count = 0

    def fake_query():
        nonlocal call_count
        call_count += 1
        # The first call is the snapshot taken in start(); later polls see "Mic B".
        if call_count == 1:
            return _fake_devices(["Mic A"])
        return _fake_devices(["Mic A", "Mic B"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
        monitor.stop()

    assert added == ["Mic B"]


@pytest.mark.asyncio
async def test_on_device_removed_fires_when_device_disappears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    removed: list[str] = []

    async def on_removed(name: str) -> None:
        removed.append(name)
        event.set()

    monitor.on_device_removed = on_removed
    call_count = 0

    def fake_query():
        nonlocal call_count
        call_count += 1
        # The first call is the snapshot taken in start(); later polls lack "Mic B".
        if call_count == 1:
            return _fake_devices(["Mic A", "Mic B"])
        return _fake_devices(["Mic A"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
        monitor.stop()

    assert removed == ["Mic B"]
```
- [ ] **Step 2: Run the tests (they must FAIL)**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output: `ModuleNotFoundError: No module named 'whisper_local.microphone._poll'`
- [ ] **Step 3: Implement `_poll.py`**
```python
# whisper_local/microphone/_poll.py
"""Polling-based microphone monitor (cross-platform)."""
import asyncio
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)


class PollMonitor:
    def __init__(self, configured_device: str | None, interval: float = 2.5):
        self.configured_device = configured_device
        self.interval = interval
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._task: asyncio.Task | None = None
        self._known_devices: set[str] = set()

    def _current_devices(self) -> set[str]:
        """Return the names of all devices that have at least one input channel."""
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            logger.exception("Fehler beim Abfragen der Audiogeräte")
            # Keep the last known state so a failed query reports no changes.
            return self._known_devices.copy()

    async def start(self) -> None:
        self._known_devices = self._current_devices()
        self._task = asyncio.create_task(self._loop())

    def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self.interval)
            current = self._current_devices()
            added = current - self._known_devices
            removed = self._known_devices - current
            self._known_devices = current
            for name in added:
                if self.on_device_added:
                    await self.on_device_added(name)
            for name in removed:
                if self.on_device_removed:
                    await self.on_device_removed(name)
```
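The change detection in `_loop()` is plain set arithmetic on two snapshots. A small worked sketch, assuming a hypothetical helper `diff_devices` that is not part of the plan and that sorts its results only to make the example deterministic:

```python
def diff_devices(known: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) device names between two snapshots, sorted."""
    added = sorted(current - known)
    removed = sorted(known - current)
    return added, removed


# Hypothetical snapshots from two consecutive polls.
before = {"Mic A", "Mic B"}
after = {"Mic A", "Headset USB"}
added, removed = diff_devices(before, after)
unchanged = diff_devices(before, before)
```

Here `added` contains only `"Headset USB"` and `removed` only `"Mic B"`; two identical snapshots yield two empty lists, so no callback would fire.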
- [ ] **Step 4: Run the tests (they must PASS)**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output:
```
PASSED tests/test_microphone_monitor.py::test_on_device_added_fires_when_device_appears
PASSED tests/test_microphone_monitor.py::test_on_device_removed_fires_when_device_disappears
```
- [ ] **Step 5: Commit**
```
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor with device detection (TDD)"
```
---
## Task 4: `PollMonitor` immediate startup check (TDD)
**Files:**
- Modify: `whisper_local/microphone/_poll.py`
- Modify: `tests/test_microphone_monitor.py`
- [ ] **Step 1: Add two new tests to the test file** (insert after the existing tests)
```python
@pytest.mark.asyncio
async def test_on_configured_missing_fires_immediately_at_start():
    # The long interval ensures the callback comes from start(), not from a poll.
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_called = asyncio.Event()

    async def on_missing() -> None:
        missing_called.set()

    monitor.on_configured_missing = on_missing

    with patch("sounddevice.query_devices", return_value=_fake_devices(["Mic A"])):
        await monitor.start()

    assert missing_called.is_set()
    monitor.stop()


@pytest.mark.asyncio
async def test_on_configured_missing_does_not_fire_when_device_present():
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_mock = AsyncMock()
    monitor.on_configured_missing = missing_mock

    with patch("sounddevice.query_devices", return_value=_fake_devices(["Headset USB", "Mic A"])):
        await monitor.start()

    missing_mock.assert_not_called()
    monitor.stop()
```
- [ ] **Step 2: Run the tests (the two new ones must FAIL)**
```
uv run pytest tests/test_microphone_monitor.py::test_on_configured_missing_fires_immediately_at_start tests/test_microphone_monitor.py::test_on_configured_missing_does_not_fire_when_device_present -v
```
Expected output: `FAILED` for the first new test, which requires the startup check. The second new test already passes at this point, because the callback is never invoked before the check exists; it guards against false positives once the check is added.
- [ ] **Step 3: Extend `PollMonitor.start()` with the immediate check**
In `whisper_local/microphone/_poll.py`, replace the `start()` method:
```python
    async def start(self) -> None:
        self._known_devices = self._current_devices()
        # Report a missing configured device right away instead of waiting for a poll.
        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()
        self._task = asyncio.create_task(self._loop())
```
- [ ] **Step 4: Run all tests (they must PASS)**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output: 4× `PASSED`
- [ ] **Step 5: Commit**
```
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor reports a missing device immediately at start"
```
---
## Task 5: `Win32Monitor` with IMMNotificationClient
**Files:**
- Create: `whisper_local/microphone/_win32.py`
- [ ] **Step 1: Create `_win32.py`**
```python
# whisper_local/microphone/_win32.py
"""Windows microphone monitor via IMMNotificationClient (Core Audio API)."""
import asyncio
import ctypes
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)

_CLSID_MMDeviceEnumerator = "{BCDE0395-E52F-467C-8E3D-C4579291692E}"
_IID_IMMDeviceEnumerator = "{A95664D2-9614-4F35-A746-DE8DB63617E6}"
_IID_IMMNotificationClient = "{7991EEC9-7E89-4D85-8390-6C703CEC60C0}"


def _build_com_interfaces():
    """Define IMMDeviceEnumerator and IMMNotificationClient via comtypes."""
    import comtypes
    from comtypes import COMMETHOD, GUID, HRESULT, IUnknown, POINTER

    # The method order must match the COM vtable layout of each interface.
    class _IMMNotificationClient(IUnknown):
        _iid_ = GUID(_IID_IMMNotificationClient)
        _methods_ = [
            COMMETHOD([], HRESULT, "OnDeviceStateChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_uint, "dwNewState")),
            COMMETHOD([], HRESULT, "OnDeviceAdded",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDeviceRemoved",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDefaultDeviceChanged",
                      (["in"], ctypes.c_int, "flow"),
                      (["in"], ctypes.c_int, "role"),
                      (["in"], ctypes.c_wchar_p, "pwstrDefaultDeviceId")),
            COMMETHOD([], HRESULT, "OnPropertyValueChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_void_p, "key")),
        ]

    class _IMMDeviceEnumerator(IUnknown):
        _iid_ = GUID(_IID_IMMDeviceEnumerator)
        _methods_ = [
            COMMETHOD([], HRESULT, "EnumAudioEndpoints",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_uint, "dwStateMask"),
                      (["out"], POINTER(IUnknown), "ppDevices")),
            COMMETHOD([], HRESULT, "GetDefaultAudioEndpoint",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_int, "role"),
                      (["out"], POINTER(IUnknown), "ppEndpoint")),
            COMMETHOD([], HRESULT, "GetDevice",
                      (["in"], ctypes.c_wchar_p, "pwstrId"),
                      (["out"], POINTER(IUnknown), "ppDevice")),
            COMMETHOD([], HRESULT, "RegisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
            COMMETHOD([], HRESULT, "UnregisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
        ]

    return _IMMNotificationClient, _IMMDeviceEnumerator


def _build_client_class(IMMNotificationClient, callback):
    """Create a comtypes.COMObject implementation of IMMNotificationClient."""
    import comtypes

    class _NotificationClientImpl(comtypes.COMObject):
        _com_interfaces_ = [IMMNotificationClient]

        # Each handler returns 0 (S_OK). Only state, add, and remove events
        # trigger a device re-scan; the other two are ignored.
        def OnDeviceStateChanged(self, pwstrDeviceId, dwNewState):
            callback()
            return 0

        def OnDeviceAdded(self, pwstrDeviceId):
            callback()
            return 0

        def OnDeviceRemoved(self, pwstrDeviceId):
            callback()
            return 0

        def OnDefaultDeviceChanged(self, flow, role, pwstrDefaultDeviceId):
            return 0

        def OnPropertyValueChanged(self, pwstrDeviceId, key):
            return 0

    return _NotificationClientImpl()


class Win32Monitor:
    def __init__(self, configured_device: str | None):
        self.configured_device = configured_device
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._loop: asyncio.AbstractEventLoop | None = None
        self._known_devices: set[str] = set()
        self._enumerator = None
        self._client = None
        self._fallback = None

    def _current_devices(self) -> set[str]:
        """Return the names of all devices that have at least one input channel."""
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            logger.exception("Fehler beim Abfragen der Audiogeräte")
            return self._known_devices.copy()

    async def start(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._known_devices = self._current_devices()
        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()
        try:
            self._start_com()
        except Exception:
            logger.warning(
                "IMMNotificationClient nicht verfügbar, Fallback auf Polling",
                exc_info=True,
            )
            from whisper_local.microphone._poll import PollMonitor

            fallback = PollMonitor(self.configured_device)
            fallback.on_device_added = self.on_device_added
            fallback.on_device_removed = self.on_device_removed
            # Reuse the snapshot taken above and start only the polling loop,
            # so the startup check does not run a second time.
            fallback._known_devices = self._known_devices
            self._fallback = fallback
            self._fallback._task = asyncio.create_task(self._fallback._loop())

    def _start_com(self) -> None:
        import comtypes
        import comtypes.client
        from comtypes import GUID

        comtypes.CoInitialize()
        IMMNotificationClient, IMMDeviceEnumerator = _build_com_interfaces()
        self._enumerator = comtypes.client.CreateObject(
            GUID(_CLSID_MMDeviceEnumerator),
            interface=IMMDeviceEnumerator,
        )
        self._client = _build_client_class(IMMNotificationClient, self._on_com_event)
        self._enumerator.RegisterEndpointNotificationCallback(self._client)

    def _on_com_event(self) -> None:
        # Called on a COM thread: hand the work over to the asyncio loop thread.
        if self._loop is not None:
            self._loop.call_soon_threadsafe(
                lambda: asyncio.ensure_future(self._handle_change())
            )

    async def _handle_change(self) -> None:
        current = self._current_devices()
        added = current - self._known_devices
        removed = self._known_devices - current
        self._known_devices = current
        for name in added:
            if self.on_device_added:
                await self.on_device_added(name)
        for name in removed:
            if self.on_device_removed:
                await self.on_device_removed(name)

    def stop(self) -> None:
        if self._fallback is not None:
            self._fallback.stop()
            return
        if self._enumerator is not None and self._client is not None:
            try:
                self._enumerator.UnregisterEndpointNotificationCallback(self._client)
            except Exception:
                logger.warning("Fehler beim Deregistrieren des Notification-Clients")
            # Balance the CoInitialize() call made in _start_com().
            try:
                import comtypes

                comtypes.CoUninitialize()
            except Exception:
                pass
```
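`_on_com_event()` runs on a thread owned by COM, so it may not touch the event loop directly; `call_soon_threadsafe` is the documented way to hand work over. The sketch below reproduces that handoff with a plain `threading.Thread` standing in for the COM thread. All names in it are hypothetical and it does not use `comtypes`.

```python
import asyncio
import threading


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    received: list[str] = []
    done = asyncio.Event()
    tasks: set[asyncio.Task] = set()

    async def handle_change(source: str) -> None:
        # Runs on the event loop thread, so it may touch loop-owned state.
        received.append(source)
        done.set()

    def schedule(source: str) -> None:
        # Executed by the loop; creating the task here is safe.
        task = loop.create_task(handle_change(source))
        tasks.add(task)  # keep a reference until the task finishes
        task.add_done_callback(tasks.discard)

    def foreign_thread_callback() -> None:
        # Stand-in for a COM callback: the only asyncio call made from this
        # thread is call_soon_threadsafe.
        loop.call_soon_threadsafe(schedule, "foreign-thread")

    worker = threading.Thread(target=foreign_thread_callback)
    worker.start()
    await asyncio.wait_for(done.wait(), timeout=1.0)
    worker.join()
    return received


events = asyncio.run(main())
```

A real notification client can fire several events for a single plug or unplug action; since `_handle_change()` diffs against the last snapshot, repeated calls that see the same device list report nothing new.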
- [ ] **Step 2: Import test on Windows**
```
uv run python -c "from whisper_local.microphone._win32 import Win32Monitor; print('OK')"
```
Expected output: `OK`
- [ ] **Step 3: Run all tests**
```
uv run pytest tests/ -v
```
Expected output: all existing tests `PASSED`
- [ ] **Step 4: Commit**
```
git add whisper_local/microphone/_win32.py
git commit -m "feat(microphone): Win32Monitor via IMMNotificationClient with polling fallback"
```
---
## Task 6: `PystrayApp.set_warning()` + `NoOpTray.set_warning()`
**Files:**
- Modify: `whisper_local/tray/_tray.py`
- [ ] **Step 1: Add `set_warning()` to `PystrayApp`**
In `whisper_local/tray/_tray.py`, insert the new method after the `set_state` method:
```python
    def set_warning(self, msg: str | None) -> None:
        """Set the tray title to a warning, or reset it to normal (thread-safe)."""
        if self._icon is not None:
            self._icon.title = "whisper-local" if msg is None else f"whisper-local ⚠ {msg}"
```
- [ ] **Step 2: Add `set_warning()` to `NoOpTray`**
In `NoOpTray`, insert after `set_state`:
```python
    def set_warning(self, msg: str | None) -> None:
        pass
```
- [ ] **Step 3: Import test**
```
uv run python -c "from whisper_local.tray._tray import PystrayApp, NoOpTray; print('OK')"
```
Expected output: `OK`
- [ ] **Step 4: Commit**
```
git add whisper_local/tray/_tray.py
git commit -m "feat(tray): set_warning() for tray tooltip warning"
```
---
## Task 7: App integration
**Files:**
- Modify: `whisper_local/__main__.py`
- [ ] **Step 1: Add the import and create the monitor in `App.__init__`**
In `whisper_local/__main__.py`, add to the import block at the top of the file:
```python
from whisper_local.microphone import create_monitor
```
In `App.__init__`, insert after `self.hotkey = create_listener(key_name=config.hotkey)`:
```python
        self.monitor = create_monitor(config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
```
- [ ] **Step 2: Implement the callbacks** (in `App`, after `_open_settings`)
```python
    async def _on_configured_microphone_missing(self) -> None:
        """Configured microphone not found; switch to the default device."""
        from whisper_local.tray._notification import notify

        device_name = self._config.microphone or "Mikrofon"
        logger.warning("Konfiguriertes Mikrofon '%s' nicht gefunden, nutze Standard", device_name)
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=None,
        )
        notify(
            "Mikrofon nicht gefunden",
            f"'{device_name}' ist nicht verfügbar. Standard-Mikrofon wird verwendet.",
        )
        self.tray.set_warning("Mikrofon nicht gefunden")

    async def _on_microphone_added(self, device_name: str) -> None:
        """New microphone detected; restore the configured device if it is the one."""
        if device_name != self._config.microphone:
            return
        from whisper_local.tray._notification import notify

        logger.info("Konfiguriertes Mikrofon '%s' wieder verfügbar", device_name)
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=self._config.microphone or None,
        )
        notify("Mikrofon verbunden", f"'{device_name}' ist wieder verfügbar.")
        self.tray.set_warning(None)

    async def _on_microphone_removed(self, device_name: str) -> None:
        """Microphone removed; trigger the fallback if it was the configured device."""
        logger.info("Mikrofon entfernt: %s", device_name)
        if device_name == self._config.microphone:
            await self._on_configured_microphone_missing()
```
- [ ] **Step 3: Start the monitor in `App.run()`**
In `App.run()`, insert after `self._hotkey_task = asyncio.create_task(self.hotkey.listen())`:
```python
        # Keep a reference so the task cannot be garbage-collected before it finishes.
        self._monitor_task = asyncio.create_task(self.monitor.start())
```
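The event loop holds only weak references to tasks, so a task created with `asyncio.create_task()` and then dropped can disappear before it finishes, and an exception raised inside it may go unnoticed. One common pattern is to keep tasks in a set and inspect them in a done callback. This is a sketch; `spawn`, `background_tasks`, and `errors` are hypothetical names.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()
errors: list[str] = []


def _finished(task: asyncio.Task) -> None:
    # Drop the reference and record any exception the task raised.
    background_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        errors.append(repr(task.exception()))


def spawn(coro) -> asyncio.Task:
    """Create a task, keep a strong reference, and attach the done callback."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(_finished)
    return task


async def failing_start() -> None:
    # Stand-in for a monitor whose start() fails.
    raise RuntimeError("device query failed")


async def main() -> None:
    task = spawn(failing_start())
    await asyncio.wait([task])
    await asyncio.sleep(0)  # give pending done callbacks a chance to run


asyncio.run(main())
```

After the run, `background_tasks` is empty again and `errors` holds one entry describing the `RuntimeError`.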
- [ ] **Step 4: Restart the monitor in `_on_config_reload`**
In `_on_config_reload`, insert after the block containing `self.recorder = Recorder(...)`:
```python
        self.monitor.stop()
        self.monitor = create_monitor(new_config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
        # Clear any stale warning before the new monitor starts, so a warning
        # raised by its startup check is not overwritten afterwards.
        self.tray.set_warning(None)
        if self._loop is not None:
            asyncio.run_coroutine_threadsafe(self.monitor.start(), self._loop)
```
- [ ] **Step 5: Run all tests**
```
uv run pytest tests/ -v
```
Expected output: all tests `PASSED`
- [ ] **Step 6: Test the app manually**
```
uv run whisper-local
```
Check:
- The app starts without errors
- Plugging in a USB/Bluetooth microphone does not crash the app
- Unplugging the configured microphone (if one is set) shows a toast, and the tray tooltip shows the warning
- Plugging the microphone back in shows the "Mikrofon verbunden" toast, and the warning disappears
- [ ] **Step 7: Commit**
```
git add whisper_local/__main__.py
git commit -m "feat(app): integrate microphone monitor into App"
```