Processing Audio with Pydub and the Google Speech Recognition API

2026-02-02 16:17:41

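Audio files are a widely used medium for conveying information. Let's see how to break an audio file (a .wav file) into smaller chunks, recognize the speech they contain, and store the result in a text file. To learn more about audio files and their formats, see Audio_formats.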

Why break up an audio file?

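Any processing of a long audio file takes a lot of time. "Processing" can mean almost anything here. For example, we might want to raise or lower the frequency of the audio or, as in this article, recognize the speech in it. If we split the file into small audio files called chunks, each processing step only has to handle a few seconds of audio, which keeps it fast.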

Required installations:

```shell
pip3 install pydub
pip3 install audioread
pip3 install SpeechRecognition
```

The program has two main steps.

Step #1:

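This step slices the audio file into small chunks at a constant interval. Slicing can be done with or without overlap. With overlap, each new chunk starts a fixed amount of time before the previous chunk ended. Any audio or word that gets cut at a chunk boundary is then still captured by the next chunk. For example, take an audio file that is 22 seconds long, with an interval of 5 seconds and an overlap of 1.5 seconds. The chunks are: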

chunk1 : 0 - 5 seconds
chunk2 : 3.5 - 8.5 seconds
chunk3 : 7 - 12 seconds
chunk4 : 10.5 - 15.5 seconds
chunk5 : 14 - 19 seconds
chunk6 : 17.5 - 22 seconds

To slice without overlap, set the overlap to 0.
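The slicing rule can be written as a small pure function, so you can check the chunk boundaries before touching any audio. This is a minimal sketch. `chunk_bounds` is a hypothetical helper name. It assumes that times are in milliseconds, as pydub uses, and that the overlap is smaller than the interval. It follows the same start/end update as the implementation below: the first chunk is `[0, interval]`, each later chunk starts `overlap_ms` before the previous one ended, and the last chunk is cut at the end of the file.

```python
def chunk_bounds(length_ms: int, interval_ms: int,
                 overlap_ms: int = 0) -> list[tuple[int, int]]:
    """Return the (start, end) times in ms of every chunk, in order."""
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("require 0 <= overlap_ms < interval_ms")

    bounds = []
    start, end = 0, min(interval_ms, length_ms)  # first chunk
    while True:
        bounds.append((start, end))
        if end >= length_ms:           # this chunk reaches the end of the file
            return bounds
        start = end - overlap_ms       # step back by the overlap
        end = min(start + interval_ms, length_ms)


if __name__ == "__main__":
    # The 22-second example: 5 s interval, 1.5 s overlap
    for k, (s, e) in enumerate(chunk_bounds(22_000, 5_000, 1_500), start=1):
        print(f"chunk{k} : {s / 1000:g} - {e / 1000:g} seconds")
```

Running it prints the six chunks of the 22-second example. `chunk_bounds(22_000, 5_000)` gives the no-overlap split into five chunks (0-5, 5-10, 10-15, 15-20, 20-22 seconds).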

Step #2:

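This step uses the sliced audio chunks to perform whatever operation the user needs. For demonstration, each chunk here is passed through the Google speech recognition module, and the recognized text is written to a separate text file. To learn how to use the Google speech recognition module to recognize audio from a microphone, see this.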

In this article, the sliced chunks are used to recognize the speech they contain.
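The implementation below exports every chunk to its own file (`chunk1.wav`, `chunk2.wav`, and so on) before recognizing it. If you don't need those files, you can hand a chunk to SpeechRecognition from memory instead. This is a minimal sketch, not the article's code:

- `recognize_segment` is a hypothetical helper name.
- `segment` is assumed to be a pydub `AudioSegment` slice.
- It relies on pydub's `export` writing WAV data into a file-like object such as `io.BytesIO`.
- It uses `Recognizer.record` so that the whole chunk is read before it is sent to Google.

```python
import io


def recognize_segment(segment, recognizer=None, language="en-US"):
    """Recognize the speech in one pydub AudioSegment without writing a chunk file.

    Returns the recognized text, or "" if Google could not understand it.
    speech_recognition.RequestError (e.g. no network) is left to the caller.
    """
    # Only this helper needs the SpeechRecognition package, so import it here.
    import speech_recognition as sr

    if recognizer is None:
        recognizer = sr.Recognizer()

    # Write the chunk as WAV into an in-memory buffer instead of chunkN.wav
    buf = io.BytesIO()
    segment.export(buf, format="wav")
    buf.seek(0)  # rewind so AudioFile reads the WAV header first

    with sr.AudioFile(buf) as source:
        audio_data = recognizer.record(source)  # read the whole chunk

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # Silence, noise or unintelligible speech in this chunk
        return ""
```

Usage: `text = recognize_segment(audio[start:end])`. The helper does not catch `sr.RequestError` (raised, for example, when there is no internet connection), so the caller can decide whether to retry or stop.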

Step #2 runs inside the loop of Step #1. As soon as a chunk has been sliced, it is recognized, and this repeats until the end of the audio file is reached.
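Because Step #2 runs inside the Step #1 loop, you can package the two steps as one function. It cuts a chunk and immediately passes it to a processing callback. This is a minimal sketch. `process_in_chunks` and its `process` callback are hypothetical names. `audio` is assumed to be anything whose `len()` and `[start:end]` slicing work in milliseconds, which is how pydub's `AudioSegment` behaves.

```python
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def process_in_chunks(audio: Any, process: Callable[[Any], T],
                      interval_ms: int = 5_000,
                      overlap_ms: int = 1_500) -> List[T]:
    """Slice `audio` chunk by chunk and process each chunk as soon as it is cut.

    `audio` must support len() and audio[start:end] in milliseconds
    (a pydub AudioSegment does). Returns one result per chunk, in order.
    """
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("require 0 <= overlap_ms < interval_ms")

    n = len(audio)
    results: List[T] = []
    start, end = 0, min(interval_ms, n)
    while True:
        chunk = audio[start:end]        # Step #1: cut one chunk
        results.append(process(chunk))  # Step #2: use it right away
        if end >= n:                    # this chunk reached the end of the audio
            return results
        start = end - overlap_ms        # next chunk begins overlap_ms earlier
        end = min(start + interval_ms, n)
```

With pydub, `audio` would be `AudioSegment.from_wav("1.wav")` and `process` would be a function that recognizes one chunk. Joining the returned strings with spaces and writing them to `recognized.txt` gives a text file like the one in the example. Keeping the callback separate also lets you test the loop without any audio. For example, a plain list of 22,000 items can stand in for 22 seconds.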
Example:

**Input:** [**Geek.wav**](https://media.geeksforgeeks.org/wp-content/uploads/1.wav)

**Output:**
[Screenshot: command prompt running the code]

Text file: [recognized](https://media.geeksforgeeks.org/wp-content/uploads/recognized.txt)

Below is the implementation:

```python
# Import necessary libraries
from pydub import AudioSegment
import speech_recognition as sr

# Input audio file to be sliced
audio = AudioSegment.from_wav("1.wav")

'''
Step #1 - Slicing the audio file into smaller chunks.
'''
# Length of the audio file in milliseconds
n = len(audio)

# Variable to count the number of sliced chunks
counter = 1

# Text file to write the recognized audio
fh = open("recognized.txt", "w+")

# Interval length (in milliseconds) at which to slice the audio file.
# If length is 22 seconds, and interval is 5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 5 - 10 seconds
# chunk3 : 10 - 15 seconds
# chunk4 : 15 - 20 seconds
# chunk5 : 20 - 22 seconds
interval = 5 * 1000

# Length of audio to overlap (in milliseconds).
# If length is 22 seconds, and interval is 5 seconds,
# With overlap as 1.5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 3.5 - 8.5 seconds
# chunk3 : 7 - 12 seconds
# chunk4 : 10.5 - 15.5 seconds
# chunk5 : 14 - 19 seconds
# chunk6 : 17.5 - 22 seconds
overlap = int(1.5 * 1000)

# Initialize start and end to 0
start = 0
end = 0

# Flag to keep track of end of file.
# When audio reaches its end, flag is set to 1 and we break
flag = 0

# Recognizer used for every chunk in Step #2
r = sr.Recognizer()

# range(0, 2 * n, interval) only bounds the number of iterations;
# the loop actually ends through the flag check at the bottom.
for i in range(0, 2 * n, interval):

    # During first iteration,
    # start is 0, end is the interval
    if i == 0:
        start = 0
        end = interval

    # All other iterations,
    # start is the previous end - overlap
    # end becomes start + interval
    else:
        start = end - overlap
        end = start + interval

    # When end becomes greater than the file length,
    # end is set to the file length
    # flag is set to 1 to indicate break.
    if end >= n:
        end = n
        flag = 1

    # Storing audio file from the defined start to end
    chunk = audio[start:end]

    # Filename / Path to store the sliced audio
    filename = 'chunk' + str(counter) + '.wav'

    # Store the sliced audio file to the defined path
    chunk.export(filename, format="wav")
    # Print information about the current chunk
    print("Processing chunk " + str(counter) + ". Start = "
          + str(start) + " end = " + str(end))

    # Increment counter for the next chunk
    counter = counter + 1

    # Slicing of this chunk is done.
    # Replace Step #2 below if the chunks are needed for something else,
    # but keep the flag check at the end of the loop.

    # Step #2 - Recognizing the chunk with Google speech recognition.
    # Read the whole chunk file into an AudioData object
    with sr.AudioFile(filename) as source:
        audio_listened = r.record(source)

    # Send the chunk to Google and write the recognized text to the file
    try:
        rec = r.recognize_google(audio_listened)
        fh.write(rec + " ")
    # Google could not understand the audio in this chunk
    except sr.UnknownValueError:
        print("Could not understand audio")
    # Results could not be requested, e.g. no internet connection
    except sr.RequestError:
        print("Could not request results.")

    # If flag is 1, the end of the whole audio has been reached:
    # close the text file and stop.
    if flag == 1:
        fh.close()
        break
```

豆丁博客
