whisper-local/docs/superpowers/plans/2026-05-14-microphone-monitor.md
2026-05-14 17:29:57 +02:00


Microphone Monitor Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Detect microphone device changes, switch automatically to the default microphone when the configured one is missing, and notify the user via a toast and the tray tooltip.

Architecture: A new whisper_local/microphone/ package with a MicrophoneMonitor protocol and a create_monitor() factory (analogous to whisper_local/media/). Windows uses IMMNotificationClient via comtypes and falls back to polling; all other platforms poll every 2.5 seconds (asyncio.sleep(2.5)). Notifications go through notify-py (cross-platform) and PystrayApp.set_warning().

Tech Stack: Python 3.13+, sounddevice (device listing), comtypes (Windows COM), notify-py (Toast), pystray (Tray-Tooltip), pytest-asyncio (Tests)


File overview

| Action | File | Purpose |
|---|---|---|
| Create | whisper_local/microphone/__init__.py | Protocol + factory |
| Create | whisper_local/microphone/_poll.py | Polling implementation |
| Create | whisper_local/microphone/_win32.py | Windows IMMNotificationClient |
| Create | whisper_local/tray/_notification.py | notify-py wrapper |
| Modify | whisper_local/tray/_tray.py | Add set_warning() to PystrayApp + NoOpTray |
| Modify | whisper_local/__main__.py | Integrate the monitor into App |
| Modify | pyproject.toml | notify-py + comtypes as dependencies |
| Create | tests/test_microphone_monitor.py | Tests for PollMonitor |

Task 1: Dependencies + _notification.py

Files:

  • Modify: pyproject.toml

  • Create: whisper_local/tray/_notification.py

  • Step 1: Add the dependencies to pyproject.toml

In the dependencies list, add the following lines after the "darkdetect>=0.8.0", entry:

    "notify-py>=0.3.43",
    "comtypes>=1.4.0; sys_platform == 'win32'",
  • Step 2: Update the lock file
uv lock

Expected output: Resolved N packages, with no errors.

  • Step 3: Create _notification.py
# whisper_local/tray/_notification.py
"""Desktop notifications via notify-py."""
import logging

logger = logging.getLogger(__name__)

_APP_NAME = "whisper-local"


def notify(title: str, message: str) -> None:
    """Show a desktop notification. Failures are logged, never raised."""
    try:
        # Imported lazily so a missing or broken backend cannot break app startup.
        from notifypy import Notify
        n = Notify()
        n.application_name = _APP_NAME
        n.title = title
        n.message = message
        n.send()
    except Exception:
        # exc_info=True keeps the traceback so the cause is visible in the log.
        logger.warning("Notification failed: %s: %s", title, message, exc_info=True)
  • Step 4: Import test
uv run python -c "from whisper_local.tray._notification import notify; print('OK')"

Expected output: OK

  • Step 5: Commit
git add pyproject.toml uv.lock whisper_local/tray/_notification.py
git commit -m "feat(notify): notify-py + _notification.py Wrapper"
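
The "log, never raise" behavior of notify() can be checked without a real notification backend. The sketch below reproduces only the pattern; safe_notify and broken_backend are hypothetical names, and the sender is passed in as a parameter so that no notify-py call is assumed.

```python
import logging

logger = logging.getLogger("notify-sketch")


def safe_notify(send, title: str, message: str) -> bool:
    """Call send(title, message); log and return False instead of raising."""
    try:
        send(title, message)
        return True
    except Exception:
        logger.warning("Notification failed: %s: %s", title, message, exc_info=True)
        return False


def broken_backend(title: str, message: str) -> None:
    """Simulates a platform without a working notification service."""
    raise RuntimeError("no notification daemon")


sent: list[tuple[str, str]] = []
ok = safe_notify(lambda t, m: sent.append((t, m)), "Title", "Body")
failed = safe_notify(broken_backend, "Title", "Body")
print(ok, failed, sent)
```

The point of the pattern is that a failed toast degrades to a log line, so device handling continues even on systems where notifications are unavailable.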

Task 2: MicrophoneMonitor Protocol + Factory-Skeleton

Files:

  • Create: whisper_local/microphone/__init__.py

  • Step 1: Create the package

# whisper_local/microphone/__init__.py
"""Microphone device monitoring with platform-specific backends."""
import sys
from collections.abc import Awaitable, Callable
from typing import Protocol


class MicrophoneMonitor(Protocol):
    on_device_added: Callable[[str], Awaitable[None]] | None
    on_device_removed: Callable[[str], Awaitable[None]] | None
    on_configured_missing: Callable[[], Awaitable[None]] | None

    async def start(self) -> None: ...
    def stop(self) -> None: ...


def create_monitor(configured_device: str | None) -> MicrophoneMonitor:
    """Create the platform-specific microphone monitor."""
    if sys.platform == "win32":
        from whisper_local.microphone._win32 import Win32Monitor
        return Win32Monitor(configured_device)
    from whisper_local.microphone._poll import PollMonitor
    return PollMonitor(configured_device)
  • Step 2: Import test
uv run python -c "from whisper_local.microphone import create_monitor; print('OK')"

Expected output: OK. The platform backends are imported inside create_monitor(), so this import succeeds on every platform even though _poll.py and _win32.py do not exist yet; only calling create_monitor() would fail until Tasks 3 and 5 are done.

  • Step 3: Commit
git add whisper_local/microphone/__init__.py
git commit -m "feat(microphone): Protocol + create_monitor() Factory-Skeleton"
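
The backend imports in create_monitor() sit inside the function body, so they run only when the factory is called, not when the package is imported. A minimal sketch of that behavior, using a hypothetical module name that is assumed not to exist:

```python
def create_backend():
    """Factory with a deferred import, mirroring the shape of create_monitor()."""
    # This import runs only when the factory is called, not at module load time.
    from hypothetical_missing_backend import Backend  # hypothetical module name
    return Backend()


# Defining (or importing) the factory succeeds although the backend is absent.
factory_defined = callable(create_backend)

try:
    create_backend()
    call_failed = False
except ModuleNotFoundError:
    call_failed = True

print(factory_defined, call_failed)
```

The same deferral keeps comtypes out of the import path on non-Windows platforms.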

Task 3: PollMonitor device detection (TDD)

Files:

  • Create: whisper_local/microphone/_poll.py

  • Create: tests/test_microphone_monitor.py

  • Step 1: Create the test file (it fails at first)

# tests/test_microphone_monitor.py
import asyncio
from unittest.mock import AsyncMock, patch

import pytest

from whisper_local.microphone._poll import PollMonitor


def _fake_devices(names: list[str]) -> list[dict]:
    return [{"name": n, "max_input_channels": 1} for n in names]


@pytest.mark.asyncio
async def test_on_device_added_fires_when_device_appears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    added: list[str] = []

    async def on_added(name: str) -> None:
        added.append(name)
        event.set()

    monitor.on_device_added = on_added
    call_count = 0

    def fake_query():
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            return _fake_devices(["Mic A"])
        return _fake_devices(["Mic A", "Mic B"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
        monitor.stop()

    assert added == ["Mic B"]


@pytest.mark.asyncio
async def test_on_device_removed_fires_when_device_disappears():
    monitor = PollMonitor(configured_device=None, interval=0.05)
    event = asyncio.Event()
    removed: list[str] = []

    async def on_removed(name: str) -> None:
        removed.append(name)
        event.set()

    monitor.on_device_removed = on_removed
    call_count = 0

    def fake_query():
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            return _fake_devices(["Mic A", "Mic B"])
        return _fake_devices(["Mic A"])

    with patch("sounddevice.query_devices", side_effect=fake_query):
        await monitor.start()
        await asyncio.wait_for(event.wait(), timeout=1.0)
        monitor.stop()

    assert removed == ["Mic B"]
  • Step 2: Run the tests; they must FAIL
uv run pytest tests/test_microphone_monitor.py -v

Expected output: ModuleNotFoundError: No module named 'whisper_local.microphone._poll'

  • Step 3: Implement _poll.py
# whisper_local/microphone/_poll.py
"""Polling-based microphone monitor (cross-platform)."""
import asyncio
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)


class PollMonitor:
    def __init__(self, configured_device: str | None, interval: float = 2.5) -> None:
        self.configured_device = configured_device
        self.interval = interval
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._task: asyncio.Task | None = None
        self._known_devices: set[str] = set()

    def _current_devices(self) -> set[str]:
        """Return the names of all devices with at least one input channel."""
        # Note: PortAudio may cache the device list it built at initialization;
        # confirm on each target platform that hot-plugged devices show up here.
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            # Keep the last known state so a transient error does not look like an unplug.
            logger.exception("Failed to query audio devices")
            return self._known_devices.copy()

    async def start(self) -> None:
        self._known_devices = self._current_devices()
        self._task = asyncio.create_task(self._loop())

    def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self.interval)
            current = self._current_devices()
            # Set differences between the two snapshots yield the changes.
            added = current - self._known_devices
            removed = self._known_devices - current
            self._known_devices = current

            for name in added:
                if self.on_device_added:
                    await self.on_device_added(name)

            for name in removed:
                if self.on_device_removed:
                    await self.on_device_removed(name)
  • Step 4: Run the tests; they must PASS
uv run pytest tests/test_microphone_monitor.py -v

Expected output:

tests/test_microphone_monitor.py::test_on_device_added_fires_when_device_appears PASSED
tests/test_microphone_monitor.py::test_on_device_removed_fires_when_device_disappears PASSED
  • Step 5: Commit
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor with device detection (TDD)"
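
The polling loop in _poll.py detects changes by comparing two snapshots of device names with set differences. The following stand-alone sketch isolates that step; diff_devices is a hypothetical helper, and the sorting is an addition here for a stable order, since sets iterate in arbitrary order.

```python
def diff_devices(known: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) device names, each sorted for a stable order."""
    added = sorted(current - known)
    removed = sorted(known - current)
    return added, removed


known = {"Mic A", "Mic B"}
current = {"Mic A", "Headset USB"}
added, removed = diff_devices(known, current)
print(added)    # ['Headset USB']
print(removed)  # ['Mic B']
```

Note that devices are identified by name only, so two devices that report the same name are indistinguishable to this comparison.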

Task 4: PollMonitor immediate startup check (TDD)

Files:

  • Modify: whisper_local/microphone/_poll.py

  • Modify: tests/test_microphone_monitor.py

  • Step 1: Add two new tests to the test file (append them after the existing tests)

@pytest.mark.asyncio
async def test_on_configured_missing_fires_immediately_at_start():
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_called = asyncio.Event()

    async def on_missing() -> None:
        missing_called.set()

    monitor.on_configured_missing = on_missing

    with patch("sounddevice.query_devices", return_value=_fake_devices(["Mic A"])):
        await monitor.start()

    assert missing_called.is_set()
    monitor.stop()


@pytest.mark.asyncio
async def test_on_configured_missing_does_not_fire_when_device_present():
    monitor = PollMonitor(configured_device="Headset USB", interval=99.0)
    missing_mock = AsyncMock()
    monitor.on_configured_missing = missing_mock

    with patch("sounddevice.query_devices", return_value=_fake_devices(["Headset USB", "Mic A"])):
        await monitor.start()

    missing_mock.assert_not_called()
    monitor.stop()
  • Step 2: Run the new tests; the first one must FAIL
uv run pytest tests/test_microphone_monitor.py::test_on_configured_missing_fires_immediately_at_start tests/test_microphone_monitor.py::test_on_configured_missing_does_not_fire_when_device_present -v

Expected output: test_on_configured_missing_fires_immediately_at_start FAILED. test_on_configured_missing_does_not_fire_when_device_present already passes, because the Task 3 implementation never calls on_configured_missing; it serves as a regression guard for the next step.

  • Step 3: Extend PollMonitor.start() with an immediate check

In whisper_local/microphone/_poll.py, replace the start() method:

    async def start(self) -> None:
        self._known_devices = self._current_devices()
        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()
        self._task = asyncio.create_task(self._loop())
  • Step 4: Run all tests; they must PASS
uv run pytest tests/test_microphone_monitor.py -v

Expected output: 4× PASSED

  • Step 5: Commit
git add whisper_local/microphone/_poll.py tests/test_microphone_monitor.py
git commit -m "feat(microphone): PollMonitor reports a missing device immediately on start"

Task 5: Win32Monitor with IMMNotificationClient

Files:

  • Create: whisper_local/microphone/_win32.py

  • Step 1: Create _win32.py

# whisper_local/microphone/_win32.py
"""Windows microphone monitor via IMMNotificationClient (Core Audio API)."""
import asyncio
import ctypes
import logging
from collections.abc import Awaitable, Callable

import sounddevice as sd

logger = logging.getLogger(__name__)

_CLSID_MMDeviceEnumerator = "{BCDE0395-E52F-467C-8E3D-C4579291692E}"
_IID_IMMDeviceEnumerator = "{A95664D2-9614-4F35-A746-DE8DB63617E6}"
_IID_IMMNotificationClient = "{7991EEC9-7E89-4D85-8390-6C703CEC60C0}"


def _build_com_interfaces():
    """Define IMMDeviceEnumerator and IMMNotificationClient via comtypes."""
    # comtypes is imported here, not at module level, so import errors surface lazily.
    from comtypes import COMMETHOD, GUID, HRESULT, IUnknown, POINTER

    class _IMMNotificationClient(IUnknown):
        _iid_ = GUID(_IID_IMMNotificationClient)
        _methods_ = [
            COMMETHOD([], HRESULT, "OnDeviceStateChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_uint, "dwNewState")),
            COMMETHOD([], HRESULT, "OnDeviceAdded",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDeviceRemoved",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId")),
            COMMETHOD([], HRESULT, "OnDefaultDeviceChanged",
                      (["in"], ctypes.c_int, "flow"),
                      (["in"], ctypes.c_int, "role"),
                      (["in"], ctypes.c_wchar_p, "pwstrDefaultDeviceId")),
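            # Note: the Windows SDK declares `key` as a PROPERTYKEY struct passed by value;
            # c_void_p is a simplification that appears to rely on a 64-bit process.
            COMMETHOD([], HRESULT, "OnPropertyValueChanged",
                      (["in"], ctypes.c_wchar_p, "pwstrDeviceId"),
                      (["in"], ctypes.c_void_p, "key")),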
        ]

    class _IMMDeviceEnumerator(IUnknown):
        _iid_ = GUID(_IID_IMMDeviceEnumerator)
        _methods_ = [
            COMMETHOD([], HRESULT, "EnumAudioEndpoints",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_uint, "dwStateMask"),
                      (["out"], POINTER(IUnknown), "ppDevices")),
            COMMETHOD([], HRESULT, "GetDefaultAudioEndpoint",
                      (["in"], ctypes.c_int, "dataFlow"),
                      (["in"], ctypes.c_int, "role"),
                      (["out"], POINTER(IUnknown), "ppEndpoint")),
            COMMETHOD([], HRESULT, "GetDevice",
                      (["in"], ctypes.c_wchar_p, "pwstrId"),
                      (["out"], POINTER(IUnknown), "ppDevice")),
            COMMETHOD([], HRESULT, "RegisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
            COMMETHOD([], HRESULT, "UnregisterEndpointNotificationCallback",
                      (["in"], POINTER(_IMMNotificationClient), "pClient")),
        ]

    return _IMMNotificationClient, _IMMDeviceEnumerator


def _build_client_class(IMMNotificationClient, callback):
    """Create and return an instance of a comtypes.COMObject implementing IMMNotificationClient."""
    import comtypes

    class _NotificationClientImpl(comtypes.COMObject):
        _com_interfaces_ = [IMMNotificationClient]

        def OnDeviceStateChanged(self, pwstrDeviceId, dwNewState):
            callback()
            return 0

        def OnDeviceAdded(self, pwstrDeviceId):
            callback()
            return 0

        def OnDeviceRemoved(self, pwstrDeviceId):
            callback()
            return 0

        def OnDefaultDeviceChanged(self, flow, role, pwstrDefaultDeviceId):
            return 0

        def OnPropertyValueChanged(self, pwstrDeviceId, key):
            return 0

    return _NotificationClientImpl()


class Win32Monitor:
    def __init__(self, configured_device: str | None):
        self.configured_device = configured_device
        self.on_device_added: Callable[[str], Awaitable[None]] | None = None
        self.on_device_removed: Callable[[str], Awaitable[None]] | None = None
        self.on_configured_missing: Callable[[], Awaitable[None]] | None = None
        self._loop: asyncio.AbstractEventLoop | None = None
        self._known_devices: set[str] = set()
        self._enumerator = None
        self._client = None
        self._fallback = None

    def _current_devices(self) -> set[str]:
        try:
            return {
                dev["name"]
                for dev in sd.query_devices()
                if dev["max_input_channels"] > 0
            }
        except Exception:
            logger.exception("Failed to query audio devices")
            return self._known_devices.copy()

    async def start(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._known_devices = self._current_devices()

        if (
            self.configured_device
            and self.configured_device not in self._known_devices
            and self.on_configured_missing
        ):
            await self.on_configured_missing()

        try:
            self._start_com()
        except Exception:
            logger.warning(
                "IMMNotificationClient unavailable, falling back to polling",
                exc_info=True,
            )
            from whisper_local.microphone._poll import PollMonitor
            fallback = PollMonitor(self.configured_device)
            fallback.on_device_added = self.on_device_added
            fallback.on_device_removed = self.on_device_removed
            # Reuse the snapshot taken above and start only the polling loop:
            # calling fallback.start() would run the configured-device check twice.
            fallback._known_devices = self._known_devices
            self._fallback = fallback
            self._fallback._task = asyncio.create_task(self._fallback._loop())

    def _start_com(self) -> None:
        import comtypes
        import comtypes.client
        from comtypes import GUID

        comtypes.CoInitialize()
        IMMNotificationClient, IMMDeviceEnumerator = _build_com_interfaces()
        self._enumerator = comtypes.client.CreateObject(
            GUID(_CLSID_MMDeviceEnumerator),
            interface=IMMDeviceEnumerator,
        )
        self._client = _build_client_class(IMMNotificationClient, self._on_com_event)
        self._enumerator.RegisterEndpointNotificationCallback(self._client)

    def _on_com_event(self) -> None:
        if self._loop is not None:
            self._loop.call_soon_threadsafe(
                lambda: asyncio.ensure_future(self._handle_change())
            )

    async def _handle_change(self) -> None:
        current = self._current_devices()
        added = current - self._known_devices
        removed = self._known_devices - current
        self._known_devices = current

        for name in added:
            if self.on_device_added:
                await self.on_device_added(name)

        for name in removed:
            if self.on_device_removed:
                await self.on_device_removed(name)

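    def stop(self) -> None:
        if self._fallback is not None:
            self._fallback.stop()
            self._fallback = None
            return
        if self._enumerator is not None and self._client is not None:
            try:
                self._enumerator.UnregisterEndpointNotificationCallback(self._client)
            except Exception:
                logger.warning("Failed to unregister the notification client", exc_info=True)
            # Drop the COM references so that a second stop() is a no-op.
            self._enumerator = None
            self._client = None
            try:
                import comtypes
                # Balances the CoInitialize() call in _start_com().
                comtypes.CoUninitialize()
            except Exception:
                logger.debug("CoUninitialize failed", exc_info=True)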
  • Step 2: Import test on Windows
uv run python -c "from whisper_local.microphone._win32 import Win32Monitor; print('OK')"

Expected output: OK

  • Step 3: Run all tests
uv run pytest tests/ -v

Expected output: all existing tests PASSED

  • Step 4: Commit
git add whisper_local/microphone/_win32.py
git commit -m "feat(microphone): Win32Monitor via IMMNotificationClient with polling fallback"
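
Win32Monitor hands events from a COM callback thread over to the asyncio loop with call_soon_threadsafe. The sketch below shows the same handoff with a plain thread standing in for the COM thread; the thread name and function names are hypothetical, and no COM is involved.

```python
import asyncio
import threading


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    handled: list[str] = []
    done = asyncio.Event()

    async def handle_change() -> None:
        # Runs on the event loop thread, so touching loop-owned state is safe.
        handled.append(threading.current_thread().name)
        done.set()

    def on_foreign_thread_event() -> None:
        # Stand-in for a COM callback: it only schedules work on the loop.
        loop.call_soon_threadsafe(lambda: loop.create_task(handle_change()))

    worker = threading.Thread(target=on_foreign_thread_event, name="fake-com-thread")
    worker.start()
    await asyncio.wait_for(done.wait(), timeout=1.0)
    worker.join()
    return handled


handled = asyncio.run(main())
print(handled)
```

The recorded thread name is that of the loop thread, not "fake-com-thread", which is the property the monitor depends on: callbacks such as on_device_added never run on the foreign thread.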

Task 6: PystrayApp.set_warning() + NoOpTray.set_warning()

Files:

  • Modify: whisper_local/tray/_tray.py

  • Step 1: Add set_warning() to PystrayApp

In whisper_local/tray/_tray.py, insert the new method after the set_state method:

    def set_warning(self, msg: str | None) -> None:
        """Set the tray title to a warning, or reset it when msg is None (thread-safe)."""
        if self._icon is not None:
            self._icon.title = "whisper-local" if msg is None else f"whisper-local ⚠ {msg}"
  • Step 2: Add set_warning() to NoOpTray

In NoOpTray, insert after set_state:

    def set_warning(self, msg: str | None) -> None:
        pass
  • Step 3: Import test
uv run python -c "from whisper_local.tray._tray import PystrayApp, NoOpTray; print('OK')"

Expected output: OK

  • Step 4: Commit
git add whisper_local/tray/_tray.py
git commit -m "feat(tray): set_warning() for the tray tooltip warning"

Task 7: App integration

Files:

  • Modify: whisper_local/__main__.py

  • Step 1: Add the import and create the monitor in App.__init__

In whisper_local/__main__.py, add to the import block at the top of the file:

from whisper_local.microphone import create_monitor

In App.__init__, insert after self.hotkey = create_listener(key_name=config.hotkey):

        self.monitor = create_monitor(config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
  • Step 2: Implement the callbacks (in App, after _open_settings)
    async def _on_configured_microphone_missing(self) -> None:
        """Configured microphone not found: switch to the default device."""
        from whisper_local.tray._notification import notify
        device_name = self._config.microphone or "microphone"
        logger.warning("Configured microphone '%s' not found, using default", device_name)
        # device=None makes the recorder use the system default input.
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=None,
        )
        notify(
            "Microphone not found",
            f"'{device_name}' is not available. Using the default microphone.",
        )
        self.tray.set_warning("Microphone not found")

    async def _on_microphone_added(self, device_name: str) -> None:
        """New microphone detected: restore the configured device if it is the one."""
        if device_name != self._config.microphone:
            return
        from whisper_local.tray._notification import notify
        logger.info("Configured microphone '%s' is available again", device_name)
        self.recorder = Recorder(
            sample_rate=self._config.sample_rate,
            channels=self._config.channels,
            min_duration=self._config.min_duration,
            device=self._config.microphone or None,
        )
        notify("Microphone connected", f"'{device_name}' is available again.")
        self.tray.set_warning(None)

    async def _on_microphone_removed(self, device_name: str) -> None:
        """Microphone removed: trigger the fallback if it was the configured device."""
        logger.info("Microphone removed: %s", device_name)
        if device_name == self._config.microphone:
            await self._on_configured_microphone_missing()
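  • Step 3: Start the monitor in App.run()

In App.run(), insert after self._hotkey_task = asyncio.create_task(self.hotkey.listen()):

        # Keep a reference (attribute name is illustrative): the event loop
        # holds only weak references to tasks.
        self._monitor_task = asyncio.create_task(self.monitor.start())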
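  • Step 4: Restart the monitor in _on_config_reload

In _on_config_reload, insert after the block with self.recorder = Recorder(...):

        old_monitor = self.monitor
        self.monitor = create_monitor(new_config.microphone or None)
        self.monitor.on_device_added = self._on_microphone_added
        self.monitor.on_device_removed = self._on_microphone_removed
        self.monitor.on_configured_missing = self._on_configured_microphone_missing
        # Clear the warning first so the new monitor's start-up check can set it again.
        self.tray.set_warning(None)
        if self._loop is not None:
            # Assumption: this handler runs off the loop thread. Task.cancel() is not
            # thread-safe, so the old monitor is stopped on the loop thread as well.
            self._loop.call_soon_threadsafe(old_monitor.stop)
            asyncio.run_coroutine_threadsafe(self.monitor.start(), self._loop)
        else:
            old_monitor.stop()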
  • Step 5: Run all tests
uv run pytest tests/ -v

Expected output: all tests PASSED

  • Step 6: Test the app manually
uv run whisper-local

Check:

  • The app starts without errors

  • Plug in a USB/Bluetooth microphone → no crash

  • Unplug the configured microphone (if one is set) → a toast appears and the tray tooltip shows a warning

  • Plug the microphone back in → a toast reports that it is connected again and the warning disappears

  • Step 7: Commit

git add whisper_local/__main__.py
git commit -m "feat(app): integrate the microphone monitor into App"