
**adrienandrem** @adrienandrem@pouet.chapril.org · 1d

Which privacy-respecting, offline automatic speech recognition (speech-to-text) tool do you recommend?
#ReconnaissanceAutomatiqueDeLaParole #speechRecognition

**Joseph** @j0seph@mastodon.social · 5d

Another failed attempt at rolling out #AI in the real world: https://www.wsj.com/articles/taco-bell-rethinks-future-of-voice-ai-at-the-drive-through-72990b5a

One might think that ordering at a drive-through is a constrained, well-defined problem that would lend itself easily to #SpeechRecognition.

But even the fast food world turns out to be more open and complex than current-gen AI can handle.

Continued thread

**Fossery Tech** @fosserytech@social.linux.pizza · Aug 17

(Linux news in previous posts of thread)

FOSS NEWS

VirtualBox 7.2 released with initial support for Linux kernel 6.16 and 6.17, improved Linux Guest Additions support for Oracle Linux 10 and Red Hat Enterprise Linux 10 guests, improved handling of the vboxvideo kernel module in the init script for Linux guests, video decoding acceleration is enabled for Linux hosts when the 3D option is active in settings, GUI improvements, bug fixes:
https://9to5linux.com/virtualbox-7-2-officially-released-with-initial-support-for-linux-kernel-6-17

Organic Maps now displays popular hiking and cycling routes, agricultural and forestry roads are excluded from routing, bookmark names are displayed directly on the map for faster identification, Android app gets track elevation graph and track selection on the map:
https://alternativeto.net/news/2025/8/organic-maps-now-displays-popular-hiking-and-cycling-routes-from-all-over-the-world/

CoMaps v2025.08.13-8 released with UI improvements, support for Irish postcodes, various bug fixes:
https://alternativeto.net/news/2025/8/comaps-v2025-08-13-8-improves-map-design-android-visuals-and-ios-language-options/

Immich 1.137 released with beta timeline fixes, option for custom URLs when generating shared links, new utility to quickly locate large files, fine-grained permissions extended to more API endpoints, etc.:
https://alternativeto.net/news/2025/8/immich-1-137-adds-beta-timeline-improvements-shared-link-custom-url-and-large-file-finder/

Immich 1.138 released with ability to reset PIN code by entering current password, option to reset OAuth IDs, swipe-to-delete functionality for albums for beta timeline users, improved upload and sync capabilities, etc.:
https://alternativeto.net/news/2025/8/immich-update-lets-users-reset-pin-improves-oauth-migration-and-album-sync/

Ghostty terminal GTK build is rewritten to fix various issues on Linux and BSD, including memory issues:
https://www.omgubuntu.co.uk/2025/08/ghostty-terminal-gtk-rewrite-linux

FFmpeg 8.0 will include an OpenAI Whisper filter for automatic speech recognition and transcription when built with the --enable-whisper flag:
https://www.phoronix.com/news/FFmpeg-Lands-Whisper

(more FOSS news in comment)

#WeeklyNews #OpenSource #FOSSNews

**Sir thalon** @thalon@embassy.social · Aug 10

PDFs became my real-world AI benchmark. I can’t fill forms by hand, so ChatGPT in Agent mode now handles flat scans, anchors, proofs, and signatures on my phone—showing both the limits and the leverage of assistive AI.

#Accessibility #AssistiveTech #A11y #AI #AgenticAI #PDFForms #AcroForms #Inclusion #MobileFirst #SpeechRecognition #Automation #Productivity #AGI #DocumentAI

https://www.linkedin.com/posts/christian-bayerlein-ba578a171_accessibility-assistivetech-a11y-activity-7360218369846833152-xG9X


**Data Quine** @scottish@datasci.social · Jul 30

Journal of Open Source Software: voice: A Comprehensive R Package for Audio Analysis
{voice}
"...a free, open-source toolkit designed to streamline audio analysis by integrating music theory and advanced computational techniques. It enables researchers to extract, summarize, and analyze voice data efficiently, supporting applications such as speech recognition, speaker identification, and mood inference..."

https://joss.theoj.org/papers/10.21105/joss.08420

Zabala et al. (2025). voice: A Comprehensive R Package for Audio Analysis. Journal of Open Source Software, 10(111), 8420. https://doi.org/10.21105/joss.08420

#RStats #Audio #SpeechRecognition

**Hacker News** @h4ckernews@mastodon.social · Jul 15

Voxtral-Mini-3B-2507 – Open source speech understanding model

https://huggingface.co/mistralai/Voxtral-Mini-3B-2507


#HackerNews #OpenSource #SpeechRecognition

Replied in thread

**Ecologia Digital** @josemurilo@mato.social · Jul 8

"#KarenHao only really gets her teeth into this point in the book’s epilogue, “How the Empire Falls.” She takes inspiration from #TeHiku, a #Māori AI #speechrecognition project. Te Hiku seeks to revitalize the #te_reo language through putting archived audio tapes of te reo speakers into an AI model, teaching new generations of Māori.
The tech has been developed on consent and active participation from the Māori community, and it is only licensed to organizations that respect Māori values"

**Jeremy Kahn** @trochee@dair-community.social · Jul 4

I don't know why they call it vibe coding

Replied in thread

**Debby** @debby@hear-me.social · Jul 3 *

@thelinuxEXP I really like Speech Note! It's a fantastic tool for quick and local voice transcription in multiple languages, created by @mkiol

It's incredibly handy for capturing thoughts on the go, conducting interviews, or making voice memos without worrying about language barriers. The app uses strictly locally running LLMs, and its ease of use makes it a standout choice for anyone needing offline transcription services.

I primarily use #WhisperAI for transcription and Piper for voice, but many other models are available as well.

It is available as a Flatpak and on GitHub: https://github.com/mkiol/dsnote

#TTS #transcription #TextToSpeech #translator translation #offline #machinetranslation #sailfishos #SpeechSynthesis #SpeechRecognition #speechtotext #nmt #linux-desktop #stt #asr #flatpak-applications #SpeechNote

**Hacker News** @h4ckernews@mastodon.social · Jun 25

DeepSpeech Is Discontinued

https://github.com/mozilla/DeepSpeech

#HackerNews #DeepSpeech #Discontinued

**PLOS Biology** @PLOSBiology@fediscience.org · Jun 17

Slow amplitude fluctuations in sounds, critical for #SpeechRecognition, seem poorly represented in the #brainstem. This study shows that overlooked intricacies of #SpikeTiming represent these fluctuations, reconciling low-level neural processing with #perception @plosbiology.org https://plos.io/3FJ4adI

**Debby** @debby@hear-me.social · May 23 *

Excited to share Thorsten-Voice's YouTube channel!

Thorsten presents innovative TTS solutions and a variety of voice technologies, making it an excellent starting point for anyone interested in open-source text-to-speech. Whether you're a developer, accessibility advocate, or tech enthusiast, his channel offers valuable insights and resources. Don't miss out on this fantastic content!

Follow him here: @thorstenvoice
or on YouTube: https://www.youtube.com/@ThorstenMueller


#Accessibility #FLOSS #TTS

Replied in thread

**Debby** @debby@hear-me.social · May 23 *

Hello @thorstenvoice, just found your channel and I'm impressed! Your work on TTS is fantastic and so important for accessibility in the FLOSS community. Keep it up! #AccessibilityMatters #FLOSS #TTS #OpenSource #Inclusivity #FOSS #Coqui #AI #CoquiAI #VoiceAssistant #Sprachassistent #VoiceTechnology #KünstlicheStimme #MachineLearning #Python #Rhasspy #TextToSpeech #VoiceTech #STT #SpeechSynthesis #SpeechRecognition #Sprachsynthese #ArtificialVoice #VoiceCloning #Spracherkennung #CoquiTTS #voice #a11y #ScreenReader

**IT News** @itnewsbot@schleuss.online · May 18

Christmas Comes Early With AI Santa Demo - With only two hundred odd days ’til Christmas, you just know we’re already feeling... - https://hackaday.com/2025/05/18/christmas-comes-early-with-ai-santa-demo/ #artificialintelligence #speechrecognition #speechsynthesis #santaclaus #libpeer #openai #llm #ai

Hackaday · May 18 · Christmas Comes Early With AI Santa Demo: With only two hundred odd days ’til Christmas, you just know we’re already feeling the season’s magic. Well, maybe not, but [Sean Dubois] has decided to give us a head start with …

**Hacker News** @h4ckernews@mastodon.social · May 7

Jargonic Sets New SOTA for Japanese ASR

https://aiola.ai/blog/jargonic-japanese-asr/

aiOla · May 6 · Jargonic Sets New Standards for Japanese ASR: Jargonic V2 sets a new benchmark for Japanese ASR, delivering industry-leading accuracy and jargon recall in real-world enterprise settings.

#HackerNews #Jargonic #SOTA

**Richard Emling (DO9RE)** @tschapajew@metalhead.club · May 1

I'm exploring ways to improve audio preprocessing for speech recognition for my [midi2hamlib](https://github.com/DO9RE/midi2hamlib) project. Do any of my followers have expertise with **SoX** or **speech recognition**? Specifically, I’m seeking advice on:
- Best practices for audio preparation for speech recognition.
- SoX command-line parameters that can optimize audio during recording or playback.
https://github.com/DO9RE/midi2hamlib/blob/main/tests/speech_menu.sh #SoX #SpeechRecognition #OpenSource #AudioProcessing #ShellScripting #Sphinx #PocketSphinx #Audio Retoot appreciated.


**Hacker News** @h4ckernews@mastodon.social · Apr 1

Jargonic: Industry-Tunable ASR Model

https://aiola.ai/blog/introducing-jargonic-asr/

aiOla · Apr 1 · Introducing Jargonic: The World’s Most Accurate Industry-Tuned ASR Model: Automatic Speech Recognition (ASR) has made significant strides over the last decade, but most ASR models on the market offer general-purpose transcription. They perform well in clean, controlled environments but break down when handling: Technical jargon & acronyms: Standard ASR models fail to recognize niche terminology used in most industries (i.e., medical terms, […]

#HackerNews #Jargonic #ASR

**Pyrzout** @jos1264@social.skynetcloud.site · Feb 19

Be Careful What You Ask For: Voice Control https://hackaday.com/2025/02/19/be-careful-what-you-ask-for-voice-control/ #speechrecognition #computerspeech #voicecommand #Featured #Rants #rants

Hackaday · Feb 19 · Be Careful What You Ask For: Voice Control: We get it. We also watched Star Trek and thought how cool it would be to talk to our computer. From Kirk setting a self-destruct sequence, to Scotty talking into a mouse, or Picard ordering Earl Gr…

**Doug Holton** @dougholton@mastodon.social · Feb 10 *

Vibe is an #OpenSource desktop client (macOS, Windows, Linux) for running Whisper locally to more accurately transcribe or caption videos & audio: https://thewh1teagle.github.io/vibe/ Source code: https://github.com/thewh1teagle/vibe/ Easier to use than what I was using before (WhisperDesktop). Default settings use the medium Whisper model, which has been good enough in my experience.
#Accessibility #A11y #AI #SpeechRecognition #EdTech

**The Conversation U.S.** @TheConversationUS@newsie.social · Feb 5

Speech recognition systems struggle with accents and dialects, risking problems in critical fields like healthcare and emergency services. Imagine calling 911 and the AI used to screen out non-emergency calls can’t understand you.

A Spanish language professor explains: https://theconversation.com/sorry-i-didnt-get-that-ai-misunderstands-some-peoples-words-more-than-others-239281 #AI #speechrecognition

The Conversation · ‘Sorry, I didn’t get that’: AI misunderstands some people’s words more than others: Speaking with an AI bot can be amusing and even helpful – if it understands you. How well AIs do that is a matter of whose speech they’ve been trained on.

#speechrecognition