
NVIDIA’s AI Transcription Tool Produces 60 Minutes of Text i…



eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners.

NVIDIA has released a new version of its Parakeet transcription tool, which it says has the lowest error rate of any competing model. In addition, the company made the code public on GitHub.

Parakeet TDT 0.6B is a 600-million-parameter automatic speech recognition model. It can transcribe 60 minutes of audio in one second, Hugging Face data scientist Vaibhav Srivastav said on X on May 5.
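Throughput claims like this are easiest to compare as an inverse real-time factor (RTFx), meaning seconds of audio processed per second of compute. RTFx is the speed metric Hugging Face's Open ASR Leaderboard reports. The sketch below only converts the quoted figure. The `rtfx` helper is hypothetical, and the resulting 3,600 is arithmetic on the claim, not a measured benchmark.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The quoted claim: 60 minutes of audio transcribed in 1 second of compute.
claimed = rtfx(audio_seconds=60 * 60, processing_seconds=1.0)
print(f"RTFx = {claimed:.0f}")  # RTFx = 3600

# Hypothetical workload: how long a 1,000-hour archive would take at that rate.
archive_audio_seconds = 1_000 * 3600
print(f"{archive_audio_seconds / claimed:.0f} s of compute")  # 1000 s of compute
```

At that rate, a 1,000-hour archive would need roughly 1,000 seconds of compute. Real throughput depends on hardware and configuration.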

The model is useful for, but not limited to, "conversational AI, voice assistants, transcription services, subtitle generation, and voice analytics platforms." Parakeet TDT 0.6B transcribes English only.

How to access the new Parakeet tool and what it can do

NVIDIA released Parakeet TDT 0.6B under a commercially permissive Creative Commons license, which means developers can incorporate its transcription into their own products for business use or individual sale. NVIDIA said it provides accurate transcriptions, including of song lyrics, with automatic punctuation and capitalization; special attention is paid to transcribing spoken numbers accurately.

Hugging Face's Open ASR Leaderboard confirms that accuracy. In fact, version 2 of Parakeet TDT 0.6B sits at the top of the leaderboard, above models from Microsoft and OpenAI. Parakeet TDT 0.6B V2 also outperforms many of NVIDIA's other transcription models. Actual performance may vary depending on hardware.
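The leaderboard's accuracy ranking is based on word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. Lower is better. Below is a minimal sketch of that calculation. The `word_error_rate` function is illustrative. It assumes simple lowercasing and whitespace tokenization, whereas the leaderboard applies its own text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("the" -> "a") out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.3f}")  # 0.167
```

Because insertions count as errors, WER can exceed 1.0 when a model hallucinates many extra words.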

Parakeet TDT 0.6B can be downloaded from Hugging Face or obtained through NVIDIA's NeMo toolkit.

It is based on the Fast Conformer encoder architecture, an encoder available in NVIDIA NeMo. It was trained on the Granary dataset, a corpus of about 120,000 hours of English speech data that includes human-transcribed speech and auto-labeled speech from sources such as the YouTube-Commons dataset.

Parakeet's place in NVIDIA's portfolio and among competitors

Releasing Parakeet TDT 0.6B as open source fits NVIDIA's overall position in the generative AI industry. NVIDIA provides the infrastructure and tools behind today's proliferation of AI, especially the GPUs that serve as its primary hardware. Parakeet TDT 0.6B is just one of the many AI-based tools and services the company offers.

The next-highest-scoring model on the leaderboard is Microsoft's Phi-4-multimodal-instruct, which can transcribe speech in 23 languages.

Read eWeek's review of the Notta AI transcription tool, and see our list of the best AI meeting note takers.


