# Microphone Monitor Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Detect microphone device changes, switch automatically to the default microphone when the configured one is missing, and notify the user via a toast and the tray tooltip.

**Architecture:** A new `whisper_local/microphone/` package with a `MicrophoneMonitor` protocol and a `create_monitor()` factory (analogous to `whisper_local/media/`). Windows uses `IMMNotificationClient` via `comtypes` with a fallback to polling; all other platforms use polling (`asyncio.sleep(2.5)`). Notifications go through `notify-py` (cross-platform) and `PystrayApp.set_warning()`.

**Tech Stack:** Python 3.13+, `sounddevice` (device listing), `comtypes` (Windows COM), `notify-py` (toast), `pystray` (tray tooltip), `pytest-asyncio` (tests)

---
## File Overview
| Action | File | Purpose |
|--------|------|---------|
| Create | `whisper_local/microphone/__init__.py` | Protocol + factory |
| Create | `whisper_local/microphone/_poll.py` | Polling implementation |
| Create | `whisper_local/microphone/_win32.py` | Windows `IMMNotificationClient` backend |
| Create | `whisper_local/tray/_notification.py` | `notify-py` wrapper |
| Modify | `whisper_local/tray/_tray.py` | Add `set_warning()` to `PystrayApp` and `NoOpTray` |
| Modify | `whisper_local/__main__.py` | Monitor integration in `App` |
| Modify | `pyproject.toml` | Add `notify-py` and `comtypes` as dependencies |
| Create | `tests/test_microphone_monitor.py` | Tests for `PollMonitor` |
---
## Task 1: Dependencies + `_notification.py`
**Files:**
- Modify: `pyproject.toml`
- Create: `whisper_local/tray/_notification.py`
- [ ] **Step 1: Add the dependencies to `pyproject.toml`**
In the `dependencies` list, add the following lines after `"darkdetect>=0.8.0",`:
```toml
"notify-py>=0.3.43",
"comtypes>=1.4.0; sys_platform == 'win32'",
```
- [ ] **Step 2: Update the lock file**
```
uv lock
```
Expected output: `Resolved N packages` with no errors.
- [ ] **Step 3: Create `_notification.py`**
```python
# whisper_local/tray/_notification.py
"""Desktop notifications via notify-py."""
import logging

logger = logging.getLogger(__name__)

_APP_NAME = "whisper-local"


def notify(title: str, message: str) -> None:
    """Show a desktop notification. Failures are only logged."""
    try:
        # Imported lazily so a broken notification backend cannot break module import.
        from notifypy import Notify

        n = Notify()
        n.application_name = _APP_NAME
        n.title = title
        n.message = message
        n.send()
    except Exception:
        logger.warning("Benachrichtigung fehlgeschlagen: %s %s", title, message)
```
- [ ] **Step 4: Import test**
```
uv run python -c "from whisper_local.tray._notification import notify; print('OK')"
```
Expected output: `OK`
- [ ] **Step 5: Commit**
```
git add pyproject.toml uv.lock whisper_local/tray/_notification.py
git commit -m "feat(notify): notify-py + _notification.py Wrapper"
```
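The wrapper's "only log on failure" behaviour can be checked without a real notification backend. The following is a minimal sketch, not part of the plan: it repeats `notify()` so that it runs standalone, and `_FailingNotify` and `notify_survives_backend_failure()` are hypothetical names introduced for illustration. It swaps a stub `notifypy` module into `sys.modules` and confirms that a failing `send()` does not propagate.

```python
import logging
import sys
import types

logger = logging.getLogger("notification_sketch")


def notify(title: str, message: str) -> None:
    """Copy of the wrapper from Step 3, repeated so the sketch is self-contained."""
    try:
        from notifypy import Notify

        n = Notify()
        n.application_name = "whisper-local"
        n.title = title
        n.message = message
        n.send()
    except Exception:
        logger.warning("Benachrichtigung fehlgeschlagen: %s %s", title, message)


class _FailingNotify:
    """Hypothetical stand-in for notifypy.Notify whose send() always fails."""

    def send(self) -> None:
        raise RuntimeError("no notification backend available")


def notify_survives_backend_failure() -> bool:
    """Return True if notify() swallows an exception raised by the backend."""
    stub = types.ModuleType("notifypy")
    stub.Notify = _FailingNotify
    original = sys.modules.get("notifypy")
    sys.modules["notifypy"] = stub  # the lazy import inside notify() now finds the stub
    try:
        notify("title", "message")
        return True
    except Exception:
        return False
    finally:
        # Restore whatever was registered before the sketch ran.
        if original is None:
            del sys.modules["notifypy"]
        else:
            sys.modules["notifypy"] = original
```

Because the import happens inside `notify()`, the stub takes effect without reloading any module; the same trick would work in a pytest test with `monkeypatch.setitem(sys.modules, ...)`.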
---
## Task 2: `MicrophoneMonitor` Protocol + Factory Skeleton
**Files:**
- Create: `whisper_local/microphone/__init__.py`
- [ ] **Step 1: Create the package**
```python
# whisper_local/microphone/__init__.py
"""Microphone device monitoring: platform-specific backends."""
import sys
from collections.abc import Awaitable, Callable
from typing import Protocol


class MicrophoneMonitor(Protocol):
    on_device_added: Callable[[str], Awaitable[None]] | None
    on_device_removed: Callable[[str], Awaitable[None]] | None
    on_configured_missing: Callable[[], Awaitable[None]] | None

    async def start(self) -> None: ...

    def stop(self) -> None: ...


def create_monitor(configured_device: str | None) -> MicrophoneMonitor:
    """Create the platform-specific microphone monitor."""
    if sys.platform == "win32":
        from whisper_local.microphone._win32 import Win32Monitor

        return Win32Monitor(configured_device)
    from whisper_local.microphone._poll import PollMonitor

    return PollMonitor(configured_device)
```
- [ ] **Step 2: Import test**
```
uv run python -c "from whisper_local.microphone import create_monitor; print('OK')"
```
Expected output: `OK`. The backend imports inside `create_monitor()` are deferred until the factory is called, so this import succeeds on every platform even though `_poll.py` and `_win32.py` do not exist yet. Calling `create_monitor()` will still raise `ModuleNotFoundError` until Task 3 (polling) or Task 5 (Windows) is done, which is expected at this point.
- [ ] **Step 3: Commit**
```
git add whisper_local/microphone/__init__.py
git commit -m "feat(microphone): Protocol + create_monitor() Factory-Skeleton"
```
---
## Task 3: `PollMonitor` device detection (TDD)
**Files:**
- Create: `whisper_local/microphone/_poll.py`
- Create: `tests/test_microphone_monitor.py`
- [ ] **Step 1: Create the test file (it fails at first)**
```python
# tests/test_microphone_monitor.py
import asyncio
from unittest.mock import AsyncMock, patch

import pytest

from whisper_local.microphone._poll import PollMonitor


def _fake_devices(names: list[str]) -> list[dict]:
    return [{"name": n, "max_input_channels": 1} for n in names]


@pytest.mark.asyncio
async def test_on_device_added_fires_when_device_appears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    added: list[str] = []

    async def on_added(name: str) -> None:
        added.append(name)
        event.set()

    monitor.on_device_added = on_added
    call_count = 0

    def fake_query():
        # First call is the baseline taken in start(); later calls see the new device.
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            return _fake_devices(["Mic A"])
        return _fake_devices(["Mic A", "Mic B"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
    monitor.stop()
    assert added == ["Mic B"]


@pytest.mark.asyncio
async def test_on_device_removed_fires_when_device_disappears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    removed: list[str] = []

    async def on_removed(name: str) -> None:
        removed.append(name)
        event.set()

    monitor.on_device_removed = on_removed
    call_count = 0

    def fake_query():
        # First call is the baseline taken in start(); later calls miss "Mic B".
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            return _fake_devices(["Mic A", "Mic B"])
        return _fake_devices(["Mic A"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
    monitor.stop()
    assert removed == ["Mic B"]
```
- [ ] **Step 2: Run the tests; they must FAIL**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output: `ModuleNotFoundError: No module named 'whisper_local.microphone._poll'`
- [ ] **Step 3: Implement `_poll.py`**
```python
# whisper_local/microphone/_poll.py
"""Polling-based microphone monitor (cross-platform)."""
import asyncio
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)


class PollMonitor:
    def __init__(self, configured_device: str | None, interval: float = 2.5):
        self.configured_device = configured_device
        self.interval = interval
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._task: asyncio.Task | None = None
        self._known_devices: set[str] = set()

    def _current_devices(self) -> set[str]:
        """Return the names of all devices with at least one input channel."""
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            logger.exception("Fehler beim Abfragen der Audiogeräte")
            # Keep the last known state so a failed query does not look like a removal.
            return self._known_devices.copy()

    async def start(self) -> None:
        self._known_devices = self._current_devices()
        self._task = asyncio.create_task(self._loop())

    def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self.interval)
            current = self._current_devices()
            added = current - self._known_devices
            removed = self._known_devices - current
            self._known_devices = current
            for name in added:
                if self.on_device_added:
                    await self.on_device_added(name)
            for name in removed:
                if self.on_device_removed:
                    await self.on_device_removed(name)
```
- [ ] **Step 4: Run the tests; they must PASS**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output (abridged):
```
tests/test_microphone_monitor.py::test_on_device_added_fires_when_device_appears PASSED
tests/test_microphone_monitor.py::test_on_device_removed_fires_when_device_disappears PASSED
```
- [ ] **Step 5: Commit**
```
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor mit Geräteerkennung (TDD)"
```
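The detection in `_loop()` boils down to two set differences over device names. As a rough illustration, the sketch below isolates that logic in two hypothetical helper functions (`input_device_names` and `diff_devices` are not part of the plan) and feeds them hand-written device dictionaries in the same shape the tests use.

```python
def input_device_names(devices: list[dict]) -> set[str]:
    """Keep only the names of devices that expose at least one input channel."""
    return {dev["name"] for dev in devices if dev["max_input_channels"] > 0}


def diff_devices(known: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) device names, sorted for stable output."""
    return sorted(current - known), sorted(known - current)


# Snapshot before: one microphone plus an output-only device.
before = input_device_names([
    {"name": "Mic A", "max_input_channels": 1},
    {"name": "Speakers", "max_input_channels": 0},
])
# Snapshot after: "Mic A" was unplugged and "Mic B" was plugged in.
after = input_device_names([
    {"name": "Mic B", "max_input_channels": 2},
    {"name": "Speakers", "max_input_channels": 0},
])

added, removed = diff_devices(before, after)
print(added, removed)  # ['Mic B'] ['Mic A']
```

One consequence of keying on names is that two devices reporting the same name collapse into a single set entry, so removing only one of them would go unnoticed.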
---
## Task 4: `PollMonitor` immediate start-up check (TDD)
**Files:**
- Modify: `whisper_local/microphone/_poll.py`
- Modify: `tests/test_microphone_monitor.py`
- [ ] **Step 1: Add two new tests to the test file** (insert after the existing tests)
```python
@pytest.mark.asyncio
async def test_on_configured_missing_fires_immediately_at_start():
    # The long interval ensures the callback comes from start(), not from the polling loop.
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_called = asyncio.Event()

    async def on_missing() -> None:
        missing_called.set()

    monitor.on_configured_missing = on_missing
    with patch("sounddevice.query_devices", return_value=_fake_devices(["Mic A"])):
        await monitor.start()
    assert missing_called.is_set()
    monitor.stop()


@pytest.mark.asyncio
async def test_on_configured_missing_does_not_fire_when_device_present():
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_mock = AsyncMock()
    monitor.on_configured_missing = missing_mock
    with patch("sounddevice.query_devices", return_value=_fake_devices(["Headset USB", "Mic A"])):
        await monitor.start()
    missing_mock.assert_not_called()
    monitor.stop()
```
- [ ] **Step 2: Run the tests; the two new ones must FAIL**
```
uv run pytest tests/test_microphone_monitor.py::test_on_configured_missing_fires_immediately_at_start tests/test_microphone_monitor.py::test_on_configured_missing_does_not_fire_when_device_present -v
```
Expected output: `FAILED` for the first new test. The second may already pass at this stage, because without the check the callback is never invoked at all.
- [ ] **Step 3: Extend `PollMonitor.start()` with the immediate check**
In `whisper_local/microphone/_poll.py`, replace the `start()` method:
```python
    async def start(self) -> None:
        self._known_devices = self._current_devices()
        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()
        self._task = asyncio.create_task(self._loop())
```
- [ ] **Step 4: Run all tests; they must PASS**
```
uv run pytest tests/test_microphone_monitor.py -v
```
Expected output: 4× `PASSED`
- [ ] **Step 5: Commit**
```
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor meldet fehlendes Gerät sofort beim Start"
```
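The start-up check in `start()` combines three conditions: a device is configured, it is not among the known devices, and a callback is set. The sketch below pulls that condition into a hypothetical free function (`fire_if_missing`, not part of the plan) and uses `AsyncMock.await_count` to show which inputs trigger the callback; it is meant as an illustration, not as a replacement for the tests above.

```python
import asyncio
from unittest.mock import AsyncMock


async def fire_if_missing(configured, known, callback) -> None:
    """Same three-part condition as in PollMonitor.start()."""
    if configured and configured not in known and callback:
        await callback()


def await_counts() -> list[int]:
    """Run the check for several inputs and report how often the callback was awaited."""
    cases = [
        ("Headset USB", {"Mic A"}),                 # configured device is missing
        ("Headset USB", {"Headset USB", "Mic A"}),  # configured device is present
        (None, {"Mic A"}),                          # nothing configured
        ("", {"Mic A"}),                            # empty string counts as "not configured"
    ]
    counts = []
    for configured, known in cases:
        callback = AsyncMock()
        asyncio.run(fire_if_missing(configured, known, callback))
        counts.append(callback.await_count)
    return counts


print(await_counts())  # [1, 0, 0, 0]
```

The comparison is an exact string match, so a device whose reported name differs even slightly from the configured value would be treated as missing.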
---
## Task 5: `Win32Monitor` with IMMNotificationClient
**Files:**
- Create: `whisper_local/microphone/_win32.py`
- [ ] **Step 1: Create `_win32.py`**
```python
# whisper_local/microphone/_win32.py
"""Windows microphone monitor via IMMNotificationClient (Core Audio API)."""
import asyncio
import ctypes
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)

_CLSID_MMDeviceEnumerator = "{BCDE0395-E52F-467C-8E3D-C4579291692E}"
_IID_IMMDeviceEnumerator = "{A95664D2-9614-4F35-A746-DE8DB63617E6}"
_IID_IMMNotificationClient = "{7991EEC9-7E89-4D85-8390-6C703CEC60C0}"


def _build_com_interfaces():
    """Define IMMDeviceEnumerator and IMMNotificationClient via comtypes."""
    import comtypes
    from comtypes import COMMETHOD, GUID, HRESULT, IUnknown, POINTER

    # The method order must match the COM vtable layout of each interface.
    class _IMMNotificationClient(IUnknown):
        _iid_ = GUID(_IID_IMMNotificationClient)
        _methods_ = [
            COMMETHOD([], HRESULT, "OnDeviceStateChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_uint, "dwNewState")),
            COMMETHOD([], HRESULT, "OnDeviceAdded",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDeviceRemoved",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDefaultDeviceChanged",
                      (["in"], ctypes.c_int, "flow"),
                      (["in"], ctypes.c_int, "role"),
                      (["in"], ctypes.c_wchar_p, "pwstrDefaultDeviceId")),
            COMMETHOD([], HRESULT, "OnPropertyValueChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_void_p, "key")),
        ]

    class _IMMDeviceEnumerator(IUnknown):
        _iid_ = GUID(_IID_IMMDeviceEnumerator)
        _methods_ = [
            COMMETHOD([], HRESULT, "EnumAudioEndpoints",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_uint, "dwStateMask"),
                      (["out"], POINTER(IUnknown), "ppDevices")),
            COMMETHOD([], HRESULT, "GetDefaultAudioEndpoint",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_int, "role"),
                      (["out"], POINTER(IUnknown), "ppEndpoint")),
            COMMETHOD([], HRESULT, "GetDevice",
                      (["in"], ctypes.c_wchar_p, "pwstrId"),
                      (["out"], POINTER(IUnknown), "ppDevice")),
            COMMETHOD([], HRESULT, "RegisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
            COMMETHOD([], HRESULT, "UnregisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
        ]

    return _IMMNotificationClient, _IMMDeviceEnumerator


def _build_client_class(IMMNotificationClient, callback):
    """Create a comtypes.COMObject implementation of IMMNotificationClient."""
    import comtypes

    class _NotificationClientImpl(comtypes.COMObject):
        _com_interfaces_ = [IMMNotificationClient]

        def OnDeviceStateChanged(self, pwstrDeviceId, dwNewState):
            callback()
            return 0

        def OnDeviceAdded(self, pwstrDeviceId):
            callback()
            return 0

        def OnDeviceRemoved(self, pwstrDeviceId):
            callback()
            return 0

        def OnDefaultDeviceChanged(self, flow, role, pwstrDefaultDeviceId):
            return 0

        def OnPropertyValueChanged(self, pwstrDeviceId, key):
            return 0

    return _NotificationClientImpl()


class Win32Monitor:
    def __init__(self, configured_device: str | None):
        self.configured_device = configured_device
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._loop: asyncio.AbstractEventLoop | None = None
        self._known_devices: set[str] = set()
        self._enumerator = None
        self._client = None
        self._fallback = None

    def _current_devices(self) -> set[str]:
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            logger.exception("Fehler beim Abfragen der Audiogeräte")
            return self._known_devices.copy()

    async def start(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._known_devices = self._current_devices()
        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()
        try:
            self._start_com()
        except Exception:
            logger.warning(
                "IMMNotificationClient nicht verfügbar, Fallback auf Polling",
                exc_info=True,
            )
            from whisper_local.microphone._poll import PollMonitor

            # Start only the polling loop: the start-up check above has already run.
            fallback = PollMonitor(self.configured_device)
            fallback.on_device_added = self.on_device_added
            fallback.on_device_removed = self.on_device_removed
            fallback._known_devices = self._known_devices
            self._fallback = fallback
            self._fallback._task = asyncio.create_task(self._fallback._loop())

    def _start_com(self) -> None:
        import comtypes
        import comtypes.client
        from comtypes import GUID

        comtypes.CoInitialize()
        IMMNotificationClient, IMMDeviceEnumerator = _build_com_interfaces()
        self._enumerator = comtypes.client.CreateObject(
            GUID(_CLSID_MMDeviceEnumerator),
            interface=IMMDeviceEnumerator,
        )
        self._client = _build_client_class(IMMNotificationClient, self._on_com_event)
        self._enumerator.RegisterEndpointNotificationCallback(self._client)

    def _on_com_event(self) -> None:
        # Called on a COM thread: hand the work over to the asyncio loop thread.
        if self._loop is not None:
            self._loop.call_soon_threadsafe(
                lambda: asyncio.ensure_future(self._handle_change())
            )

    async def _handle_change(self) -> None:
        current = self._current_devices()
        added = current - self._known_devices
        removed = self._known_devices - current
        self._known_devices = current
        for name in added:
            if self.on_device_added:
                await self.on_device_added(name)
        for name in removed:
            if self.on_device_removed:
                await self.on_device_removed(name)

    def stop(self) -> None:
        if self._fallback is not None:
            self._fallback.stop()
            return
        if self._enumerator is not None and self._client is not None:
            try:
                self._enumerator.UnregisterEndpointNotificationCallback(self._client)
            except Exception:
                logger.warning("Fehler beim Deregistrieren des Notification-Clients")
            try:
                import comtypes

                comtypes.CoUninitialize()
            except Exception:
                pass
```
- [ ] **Step 2: Import test on Windows**
```
uv run python -c "from whisper_local.microphone._win32 import Win32Monitor; print('OK')"
```
Expected output: `OK`
- [ ] **Step 3: Run all tests**
```
uv run pytest tests/ -v
```
Expected output: all existing tests `PASSED`
- [ ] **Step 4: Commit**
```
git add whisper_local/microphone/_win32.py
git commit -m "feat(microphone): Win32Monitor via IMMNotificationClient mit Polling-Fallback"
```
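`_on_com_event()` is invoked on a thread that does not belong to asyncio, which is why it goes through `loop.call_soon_threadsafe()` instead of scheduling the coroutine directly. The following is a minimal sketch of that hand-off using a plain `threading.Thread` as a stand-in for the COM callback thread; `com_style_callback` and `handle_change` are illustrative names, not part of the plan.

```python
import asyncio
import threading


async def main() -> tuple[bool, bool]:
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    seen: dict[str, int] = {}
    done = asyncio.Event()

    async def handle_change() -> None:
        # Runs as a normal task on the event loop thread.
        seen["handler"] = threading.get_ident()
        done.set()

    def com_style_callback() -> None:
        # Runs on the foreign thread; only the thread-safe scheduling call is made here.
        seen["callback"] = threading.get_ident()
        loop.call_soon_threadsafe(lambda: asyncio.ensure_future(handle_change()))

    worker = threading.Thread(target=com_style_callback)
    worker.start()
    await asyncio.wait_for(done.wait(), timeout=1.0)
    worker.join()
    # (handler ran on the loop thread, callback ran on a different thread)
    return seen["handler"] == loop_thread, seen["callback"] != loop_thread


if __name__ == "__main__":
    print(asyncio.run(main()))  # (True, True)
```

Because the coroutine itself only ever runs on the loop thread, `_handle_change()` can mutate `_known_devices` without extra locking.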
---
## Task 6: `PystrayApp.set_warning()` + `NoOpTray.set_warning()`
**Files:**
- Modify: `whisper_local/tray/_tray.py`
- [ ] **Step 1: Add `set_warning()` to `PystrayApp`**
In `whisper_local/tray/_tray.py`, insert the new method after `set_state`:
```python
    def set_warning(self, msg: str | None) -> None:
        """Set the tray title to a warning, or reset it to normal (thread-safe)."""
        if self._icon is not None:
            self._icon.title = "whisper-local" if msg is None else f"whisper-local ⚠ {msg}"
```
- [ ] **Step 2: Add `set_warning()` to `NoOpTray`**
In `NoOpTray`, insert after `set_state`:
```python
    def set_warning(self, msg: str | None) -> None:
        pass
```
- [ ] **Step 3: Import test**
```
uv run python -c "from whisper_local.tray._tray import PystrayApp, NoOpTray; print('OK')"
```
Expected output: `OK`
- [ ] **Step 4: Commit**
```
git add whisper_local/tray/_tray.py
git commit -m "feat(tray): set_warning() für Tray-Tooltip-Warnung"
```
---
## Task 7: App Integration
**Files:**
- Modify: `whisper_local/__main__.py`
- [ ] **Step 1: Add the import and create the monitor in `App.__init__`**
In `whisper_local/__main__.py`, extend the import block at the top of the file:
```python
from whisper_local.microphone import create_monitor
```
In `App.__init__`, insert after `self.hotkey = create_listener(key_name=config.hotkey)`:
```python
        self.monitor = create_monitor(config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
```
- [ ] **Step 2: Implement the callbacks** (in `App`, after `_open_settings`)
```python
    async def _on_configured_microphone_missing(self) -> None:
        """Configured microphone not found: switch to the default device."""
        from whisper_local.tray._notification import notify

        device_name = self._config.microphone or "Mikrofon"
        logger.warning("Konfiguriertes Mikrofon '%s' nicht gefunden, nutze Standard", device_name)
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=None,
        )
        notify(
            "Mikrofon nicht gefunden",
            f'"{device_name}" ist nicht verfügbar. Standard-Mikrofon wird verwendet.',
        )
        self.tray.set_warning("Mikrofon nicht gefunden")

    async def _on_microphone_added(self, device_name: str) -> None:
        """New microphone detected: restore the configured device if it is the one that appeared."""
        if device_name != self._config.microphone:
            return
        from whisper_local.tray._notification import notify

        logger.info("Konfiguriertes Mikrofon '%s' wieder verfügbar", device_name)
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=self._config.microphone or None,
        )
        notify("Mikrofon verbunden", f'"{device_name}" ist wieder verfügbar.')
        self.tray.set_warning(None)

    async def _on_microphone_removed(self, device_name: str) -> None:
        """Microphone removed: trigger the fallback if it was the configured device."""
        logger.info("Mikrofon entfernt: %s", device_name)
        if device_name == self._config.microphone:
            await self._on_configured_microphone_missing()
```
- [ ] **Step 3: Start the monitor in `App.run()`**
In `App.run()`, insert after `self._hotkey_task = asyncio.create_task(self.hotkey.listen())`:
```python
        # Keep a reference: the event loop only holds weak references to tasks.
        self._monitor_task = asyncio.create_task(self.monitor.start())
```
- [ ] **Step 4: Restart the monitor in `_on_config_reload`**
In `_on_config_reload`, insert after the block containing `self.recorder = Recorder(...)`:
```python
        self.monitor.stop()
        self.monitor = create_monitor(new_config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
        if self._loop is not None:
            asyncio.run_coroutine_threadsafe(self.monitor.start(), self._loop)
        self.tray.set_warning(None)
```
- [ ] **Step 5: Run all tests**
```
uv run pytest tests/ -v
```
Expected output: all tests `PASSED`
- [ ] **Step 6: Test the app manually**
```
uv run whisper-local
```
Check:
- The app starts without errors
- Plug in a USB/Bluetooth microphone → no crash
- Unplug the configured microphone (if one is set) → a toast appears and the tray tooltip shows the warning
- Plug the microphone back in → the "Mikrofon verbunden" toast appears and the warning disappears
- [ ] **Step 7: Commit**
```
git add whisper_local/__main__.py
git commit -m "feat(app): Mikrofon-Monitor in App integriert"
```