Which privacy-respecting, offline automatic speech recognition (speech-to-text) tool do you recommend?
#ReconnaissanceAutomatiqueDeLaParole #speechRecognition
Another failed attempt at rolling out #AI in the real world: https://www.wsj.com/articles/taco-bell-rethinks-future-of-voice-ai-at-the-drive-through-72990b5a
One might think that ordering at a drive-through is a constrained, well-defined problem that would lend itself easily to #SpeechRecognition.
But even the fast food world turns out to be more open and complex than current-gen AI can handle.
(Linux news in previous posts of thread)
FOSS NEWS
VirtualBox 7.2 released with initial support for Linux kernel 6.16 and 6.17, improved Linux Guest Additions support for Oracle Linux 10 and Red Hat Enterprise Linux 10 guests, improved handling of the vboxvideo kernel module in the init script for Linux guests, video decoding acceleration is enabled for Linux hosts when the 3D option is active in settings, GUI improvements, bug fixes:
https://9to5linux.com/virtualbox-7-2-officially-released-with-initial-support-for-linux-kernel-6-17
Organic Maps now displays popular hiking and cycling routes, agricultural and forestry roads are excluded from routing, bookmark names are displayed directly on the map for faster identification, Android app gets track elevation graph and track selection on the map:
https://alternativeto.net/news/2025/8/organic-maps-now-displays-popular-hiking-and-cycling-routes-from-all-over-the-world/
CoMaps v2025.08.13-8 released with UI improvements, support for Irish postcodes, various bug fixes:
https://alternativeto.net/news/2025/8/comaps-v2025-08-13-8-improves-map-design-android-visuals-and-ios-language-options/
Immich 1.137 released with beta timeline fixes, option for custom URLs when generating shared links, new utility to quickly locate large files, fine-grained permissions extended to more API endpoints, etc.:
https://alternativeto.net/news/2025/8/immich-1-137-adds-beta-timeline-improvements-shared-link-custom-url-and-large-file-finder/
Immich 1.138 released with ability to reset PIN code by entering current password, option to reset OAuth IDs, swipe-to-delete functionality for albums for beta timeline users, improved upload and sync capabilities, etc.:
https://alternativeto.net/news/2025/8/immich-update-lets-users-reset-pin-improves-oauth-migration-and-album-sync/
Ghostty terminal GTK build is rewritten to fix various issues on Linux and BSD, including memory issues:
https://www.omgubuntu.co.uk/2025/08/ghostty-terminal-gtk-rewrite-linux
FFmpeg 8.0 will include an OpenAI Whisper filter for automatic speech recognition and transcription when built with the --enable-whisper flag:
https://www.phoronix.com/news/FFmpeg-Lands-Whisper
(more FOSS news in comment)
PDFs became my real-world AI benchmark. I can’t fill forms by hand, so ChatGPT in Agent mode now handles flat scans, anchors, proofs, and signatures on my phone—showing both the limits and the leverage of assistive AI.
#Accessibility #AssistiveTech #A11y #AI #AgenticAI #PDFForms #AcroForms #Inclusion #MobileFirst #SpeechRecognition #Automation #Productivity #AGI #DocumentAI
Journal of Open Source Software: voice: A Comprehensive R Package for Audio Analysis
{voice}
"...a free, open-source toolkit designed to streamline audio analysis by integrating music theory and advanced computational techniques. It enables researchers to extract, summarize, and analyze voice data efficiently, supporting applications such as speech recognition, speaker identification, and mood inference..."
Voxtral-Mini-3B-2507 – Open source speech understanding model
"#KarenHao only really gets her teeth into this point in the book’s epilogue, “How the Empire Falls.” She takes inspiration from #TeHiku, a #Māori AI #speechrecognition project. Te Hiku seeks to revitalize the #te_reo language through putting archived audio tapes of te reo speakers into an AI model, teaching new generations of Māori.
The tech has been developed on consent and active participation from the Māori community, and it is only licensed to organizations that respect Māori values"
I don't know why they call it vibe coding
@thelinuxEXP I really like Speech Note! It's a fantastic tool for quick and local voice transcription in multiple languages, created by @mkiol
It's incredibly handy for capturing thoughts on the go, conducting interviews, or making voice memos without worrying about language barriers. The app runs its models strictly locally, and its ease of use makes it a standout choice for anyone needing offline transcription.
I primarily use #WhisperAI for transcription and Piper for voice, but many other models are available as well.
It is available as a Flatpak and on GitHub: https://github.com/mkiol/dsnote
#TTS #transcription #TextToSpeech #translator #translation #offline #machinetranslation #sailfishos #SpeechSynthesis #SpeechRecognition #speechtotext #nmt #linux-desktop #stt #asr #flatpak-applications #SpeechNote
DeepSpeech Is Discontinued
Slow amplitude fluctuations in sounds, critical for #SpeechRecognition, seem poorly represented in the #brainstem. This study shows that overlooked intricacies of #SpikeTiming represent these fluctuations, reconciling low-level neural processing with #perception @plosbiology.org https://plos.io/3FJ4adI
Excited to share Thorsten-Voice's YouTube channel!
Thorsten presents innovative TTS solutions and a variety of voice technologies, making it an excellent starting point for anyone interested in open-source text-to-speech. Whether you're a developer, accessibility advocate, or tech enthusiast, his channel offers valuable insights and resources. Don't miss out on this fantastic content!
Follow him here: @thorstenvoice
or on YouTube: https://www.youtube.com/@ThorstenMueller
Goode @thorstenvoice, just found your channel and I'm impressed! Your work on TTS is fantastic and so important for accessibility in the FLOSS community. Keep it up! #AccessibilityMatters #FLOSS #TTS #OpenSource #Inclusivity #FOSS #Coqui #AI #CoquiAI #VoiceAssistant #Sprachassistent #VoiceTechnology #KünstlicheStimme #MachineLearning #Python #Rhasspy #TextToSpeech #VoiceTech #STT #SpeechSynthesis #SpeechRecognition #Sprachsynthese #ArtificialVoice #VoiceCloning #Spracherkennung #CoquiTTS #voice #a11y #ScreenReader
Christmas Comes Early With AI Santa Demo - With only two hundred odd days ’til Christmas, you just know we’re already feeling... - https://hackaday.com/2025/05/18/christmas-comes-early-with-ai-santa-demo/ #artificialintelligence #speechrecognition #speechsynthesis #santaclaus #libpeer #openai #llm #ai
I'm exploring ways to improve audio preprocessing for speech recognition for my [midi2hamlib](https://github.com/DO9RE/midi2hamlib) project. Do any of my followers have expertise with **SoX** or **speech recognition**? Specifically, I’m seeking advice on:
- Best practices for audio preparation for speech recognition.
- SoX command-line parameters that can optimize audio during recording or playback.
https://github.com/DO9RE/midi2hamlib/blob/main/tests/speech_menu.sh #SoX #SpeechRecognition #OpenSource #AudioProcessing #ShellScripting #Sphinx #PocketSphinx #Audio Retoot appreciated.
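One possible starting point for the SoX question above, offered as a sketch and not as what midi2hamlib actually does: many recognizers, including PocketSphinx with its default acoustic models, expect 16 kHz, mono, 16-bit audio, so a common first step is to convert to that format, filter out low-frequency rumble, and normalize the level. The helper name `build_sox_command`, the file names, and the default values (100 Hz high-pass, -3 dB normalization) are assumptions for illustration; check them against the acoustic model you use.

```python
import shlex
from typing import List


def build_sox_command(src: str, dst: str, rate_hz: int = 16000,
                      highpass_hz: int = 100, norm_db: float = -3.0) -> List[str]:
    """Return a SoX argument list that prepares audio for a recognizer.

    Hypothetical helper: the defaults are illustrative, not tuned values.
    """
    return [
        "sox", src,
        "-r", str(rate_hz),            # output sample rate in Hz
        "-c", "1",                     # downmix to mono
        "-b", "16",                    # 16-bit samples
        dst,
        "highpass", str(highpass_hz),  # attenuate rumble and mains hum
        "norm", str(norm_db),          # normalize the peak level (dB)
    ]


if __name__ == "__main__":
    cmd = build_sox_command("recording.wav", "prepared.wav")
    # Print the command for inspection; nothing is executed here.
    print(shlex.join(cmd))
    # To run it (requires SoX on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log the exact command and compare recognition results across different filter settings.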
Be Careful What You Ask For: Voice Control https://hackaday.com/2025/02/19/be-careful-what-you-ask-for-voice-control/ #speechrecognition #computerspeech #voicecommand #Featured #Rants #rants
Vibe is an #OpenSource desktop client (macOS, Windows, Linux) for running Whisper locally to more accurately transcribe or caption videos and audio: https://thewh1teagle.github.io/vibe/ Source code: https://github.com/thewh1teagle/vibe/ I find it easier to use than what I was using before (WhisperDesktop). The default settings use the medium Whisper model, which has been good enough in my experience.
#Accessibility #A11y #AI #SpeechRecognition #EdTech
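For anyone curious how a transcript becomes captions, here is a minimal sketch of turning timed segments into SubRip (SRT) text. It is not Vibe's implementation. The segment shape (dicts with `start`, `end` in seconds, and `text`) is an assumption modeled on what the openai-whisper Python package returns in `result["segments"]`; the function names are hypothetical.

```python
from typing import Dict, Iterable, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render timed segments as numbered SRT caption blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Caption blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " This runs locally."},
    ]
    print(segments_to_srt(demo))
```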
Speech recognition systems struggle with accents and dialects, risking problems in critical fields like healthcare and emergency services. Imagine calling 911 and the AI used to screen out non-emergency calls can’t understand you.
A Spanish language professor explains: https://theconversation.com/sorry-i-didnt-get-that-ai-misunderstands-some-peoples-words-more-than-others-239281 #AI #speechrecognition