Knowledgeable privacy aficionados of Lemmy, perhaps you can help.

I’m searching for a U.S. English speech-to-text program I can use for note taking, dictation, and internet searching that runs locally on Windows and doesn’t collect information or send it to either the software company or third parties. I’m looking for an easy out-of-the-box option first; if needed, I can explore writing scripts and using an LLM to craft a UI, but I’m not looking for something that would require a significant amount of extra building or coding. Ideally it’d be FLOSS and light on compute, but I’m not averse to paying for a solid product that meets the privacy requirement, and somewhat heavier compute is okay as long as it isn’t ludicrous.

Vosk seems like a good option, though in my brief exploration I haven’t found a UI or scripts that make it easy to use.

OpenAI’s Whisper, while very accurate, doesn’t natively support real-time speech-to-text, though there are some mods that try to address that.

Anything I’m completely missing?

    • eyes_uncl0uded@lemmy.world (OP) · 2 days ago

      I hear you. VST/VST3 support is inconsistent through Wine, and while there are alternatives to some plugins, I’m not committed enough to give up the OGs I love just yet.

      To my knowledge, Recall is currently rolling out only to Copilot+ PCs, so my debloated Win11 is safe for now. I’ll remain vigilant, though 🫡

  • eyes_uncl0uded@lemmy.world (OP) · 2 days ago

    Working with an LLM, I’ve been able to use Vosk and Python to get something that works well, aside from a small delay. It’s built on pyautogui, sounddevice, and keyboard, and I also implemented a keyboard shortcut that toggles transcription on and off. Not as nice as having a GUI and more options, but it does the job.

    I’d still be interested if anyone knows of a simpler “just works” solution, but this does seem like it’ll suit my needs. Hopefully others can find this helpful.
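
For anyone who wants a starting point, here is a rough sketch of that kind of setup. It is not the exact script described above, just one way the pieces could fit together. The hotkey, the 16 kHz sample rate, the block size, and the names `extract_text` and `run_dictation` are all assumptions made for illustration, and you need to download a Vosk model separately and pass its folder path.

```python
import json
import queue
import sys

SAMPLE_RATE = 16000      # assumed rate; 16 kHz mono is common for Vosk models
HOTKEY = "ctrl+alt+d"    # hypothetical shortcut; pick any free combination


def extract_text(result_json: str) -> str:
    """Pull the recognized phrase out of the JSON string Vosk returns."""
    return json.loads(result_json).get("text", "").strip()


def run_dictation(model_path: str) -> None:
    # Third-party imports live here so the helper above works without them.
    import keyboard
    import pyautogui
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio = queue.Queue()
    state = {"active": False}

    def on_audio(indata, frames, time, status):
        # Runs on sounddevice's audio thread; keep audio only while toggled on.
        if state["active"]:
            audio.put(bytes(indata))

    def toggle():
        state["active"] = not state["active"]
        print("dictation", "on" if state["active"] else "off")

    keyboard.add_hotkey(HOTKEY, toggle)
    recognizer = KaldiRecognizer(Model(model_path), SAMPLE_RATE)

    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            try:
                chunk = audio.get(timeout=0.5)
            except queue.Empty:
                continue
            # AcceptWaveform returns True once Vosk decides an utterance ended.
            if recognizer.AcceptWaveform(chunk):
                text = extract_text(recognizer.Result())
                if text:
                    # Type the phrase into whichever window has focus.
                    pyautogui.write(text + " ")


if __name__ == "__main__" and len(sys.argv) > 1:
    run_dictation(sys.argv[1])
```

Run it with the path to an unpacked Vosk model as the only argument, press the hotkey to start or stop dictation, and the recognized text is typed wherever the cursor is. Everything stays on the local machine, since Vosk does its recognition offline. Text only appears after Vosk finalizes each phrase, which would explain the small delay.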