Learn OpenAI Whisper

By: Josué R. Batista
4.9 (13)

Overview of this book

As the field of generative AI evolves, so does the demand for intelligent systems that can understand human speech. Navigating the complexities of automatic speech recognition (ASR) technology is a significant challenge for many professionals. This book offers a comprehensive solution that guides you through OpenAI's advanced ASR system. You’ll begin your journey with Whisper's foundational concepts, gradually progressing to its sophisticated functionalities. Next, you’ll explore the transformer model, understand its multilingual capabilities, and grasp training techniques using weak supervision. The book helps you customize Whisper for different contexts and optimize its performance for specific needs. You’ll also focus on the vast potential of Whisper in real-world scenarios, including its transcription services, voice-based search, and the ability to enhance customer engagement. Advanced chapters delve into voice synthesis and diarization while addressing ethical considerations. By the end of this book, you'll have an understanding of ASR technology and have the skills to implement Whisper. Moreover, Python coding examples will equip you to apply ASR technologies in your projects as well as prepare you to tackle challenges and seize opportunities in the rapidly evolving world of voice recognition and processing.
Table of Contents (16 chapters)
Part 1: Introducing OpenAI’s Whisper
Part 2: Underlying Architecture
Part 3: Real-world Applications and Use Cases

To get the most out of this book

For most of the book, you only need a Google account and internet access to run the Whisper code in Google Colaboratory (Colab). The free version of Colab, including its GPU runtime, requires no paid subscription. If you are familiar with Python, you can run the code examples in your local environment instead of using Colab.

Software/hardware covered in the book:

  • Google Colaboratory (Colab)
  • Google Drive
  • YouTube
  • RSS
  • GitHub
  • Python
  • Hugging Face
  • Gradio
  • Foundational models: Google’s gTTS, StableLM Zephyr 3B – GGUF, LLaVA
  • Intel’s OpenVINO
  • NVIDIA’s NeMo
  • Microphone and speakers

Operating system requirements:

  • Web browser on Windows, macOS, or Linux

Fine-tuning Whisper’s small model requires at least 12 GB of GPU memory, so it is worth securing a decent GPU for your Colab session. Unfortunately, getting a good GPU (for example, a Tesla T4 with 16 GB) on the free version of Google Colab is becoming much harder. With Google Colab Pro, however, you should have no trouble being allocated a V100 or P100 GPU.
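
Before starting a long run, it can help to confirm how much memory the allocated GPU actually has. The following is a minimal sketch, not code from the book: it assumes an NVIDIA GPU with the `nvidia-smi` command-line tool on the path (as is typical in a Colab GPU runtime), and it treats the 12 GB figure mentioned above as 12 GiB. The helper names `parse_memory_mib`, `gpu_memory_mib`, and `has_enough_memory` are hypothetical, chosen for illustration.

```python
import shutil
import subprocess

# Assumption: the 12 GB requirement quoted in the text, interpreted as GiB.
REQUIRED_GIB = 12


def parse_memory_mib(smi_output: str) -> list[int]:
    """Parse nvidia-smi output that has one integer (MiB) per line.

    Lines that are not plain integers (for example "[N/A]") are skipped.
    """
    values = []
    for line in smi_output.splitlines():
        token = line.strip()
        if token.isdigit():
            values.append(int(token))
    return values


def gpu_memory_mib() -> list[int]:
    """Return the total memory in MiB of each visible NVIDIA GPU, or []."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools available (e.g. a CPU-only runtime)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


def has_enough_memory(total_mib: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Check a GPU's total memory (MiB) against the requirement (GiB)."""
    return total_mib >= required_gib * 1024


if __name__ == "__main__":
    gpus = gpu_memory_mib()
    if not gpus:
        print("No NVIDIA GPU detected; switch the Colab runtime type to GPU.")
    for index, total in enumerate(gpus):
        verdict = "OK" if has_enough_memory(total) else "below the requirement"
        print(f"GPU {index}: {total} MiB total, {verdict}")
```

In Colab, this can be pasted into a cell and run after selecting a GPU runtime; if the reported memory is below the threshold, reconnecting to a different runtime or using a paid tier are the usual options.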

If you are using the digital version of this book, we advise you to type the code yourself or access it from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to copying and pasting code.

Fine-tuning Whisper in Chapter 4 takes at least one hour, so check on your running Colab notebook regularly. Some notebooks implement a Gradio app with voice recording and audio playback; a microphone and speakers connected to your computer will let you try these interactive voice features. Alternatively, open the URL that Gradio provides at runtime on your mobile phone, where you may be able to record your voice with the phone’s microphone.

By meeting these technical requirements, you will be prepared to explore Whisper in different contexts while enjoying the streamlined experience of Google Colab and the comprehensive resources available on GitHub.
