Audio Processing Using Pydub and the Google Speech Recognition API

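Audio files are a widespread means of transferring information. So let's see how to break an audio file (.wav file) into smaller chunks, recognize the content in them, and store it in a text file. To learn more about audio files and their formats, refer to Audio_formats.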

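Why break down an audio file?

Processing a long audio file in one pass takes a lot of time. Here, processing can mean anything: for example, we may want to raise or lower the frequency of the audio or, as done in this article, recognize the content of the audio file. Breaking the file into small audio files called chunks keeps each processing step fast.

Required installations: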

pip3 install pydub
pip3 install audioread
pip3 install SpeechRecognition

The program has two main steps.

Step #1:

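This step slices the audio file into small chunks of a constant interval. Slicing can be done with or without overlap. With overlap, each new chunk starts a fixed amount of time before the end of the previous chunk, so that any audio or word cut off at a chunk boundary is covered by the overlap. For example, if the audio file is 22 seconds long, the interval is 5 seconds, and the overlap is 1.5 seconds, the chunks will be:

chunk1 : 0 - 5 seconds
chunk2 : 3.5 - 8.5 seconds
chunk3 : 7 - 12 seconds
chunk4 : 10.5 - 15.5 seconds
chunk5 : 14 - 19 seconds
chunk6 : 17.5 - 22 seconds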

Setting the overlap to 0 disables it.
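The boundary arithmetic for Step #1 can be sketched as a small helper that works on plain millisecond values, with no audio library involved. The name `chunk_bounds` and its parameters are hypothetical and not part of Pydub; the sketch assumes the overlap is smaller than the interval.

```python
def chunk_bounds(length_ms, interval_ms, overlap_ms=0):
    """Return (start, end) pairs in milliseconds that cover length_ms."""
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("overlap must be >= 0 and smaller than the interval")
    bounds = []
    start = 0
    while True:
        # The last chunk is clipped to the end of the file
        end = min(start + interval_ms, length_ms)
        bounds.append((start, end))
        if end >= length_ms:
            break
        # The next chunk starts overlap_ms before this one ended
        start = end - overlap_ms
    return bounds


# Boundaries for a 22-second file, 5-second interval, 1.5-second overlap
for number, (start, end) in enumerate(chunk_bounds(22_000, 5_000, 1_500), 1):
    print(f"chunk{number} : {start / 1000:g} - {end / 1000:g} seconds")
```

Calling `chunk_bounds(22_000, 5_000)` with the default overlap of 0 yields back-to-back chunks instead.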

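Step #2:

This step uses the sliced audio files to do whatever the user requires. Here, for demonstration purposes, the chunks are passed through the Google speech recognition module and the recognized text is written to a separate file. To learn how to use the Google speech recognition module to recognize audio from a microphone, refer to this.

In this article, we use the sliced audio files to recognize their content.

Step #2 is done inside the loop of Step #1. As soon as a chunk is sliced from the audio file, it is recognized. This process continues until the end of the audio file.

Example: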

**Input :**  [**Geek.wav**](https://media.geeksforgeeks.org/wp-content/uploads/1.wav)

**Output :**
Screenshot of cmd running the code:
Text File: [recognized](https://media.geeksforgeeks.org/wp-content/uploads/recognized.txt)

Below is the implementation:

# Import necessary libraries
from pydub import AudioSegment
import speech_recognition as sr

# Input audio file to be sliced
audio = AudioSegment.from_wav("1.wav")

'''
Step #1 - Slicing the audio file into smaller chunks.
'''
# Length of the audiofile in milliseconds
n = len(audio)

# Variable to count the number of sliced chunks
counter = 1

# Text file to write the recognized audio
fh = open("recognized.txt", "w+")

# Interval length at which to slice the audio file.
# If length is 22 seconds, and interval is 5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 5 - 10 seconds
# chunk3 : 10 - 15 seconds
# chunk4 : 15 - 20 seconds
# chunk5 : 20 - 22 seconds
interval = 5 * 1000

# Length of audio to overlap.
# If length is 22 seconds, and interval is 5 seconds,
# With overlap as 1.5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 3.5 - 8.5 seconds
# chunk3 : 7 - 12 seconds
# chunk4 : 10.5 - 15.5 seconds
# chunk5 : 14 - 19 seconds
# chunk6 : 17.5 - 22 seconds
overlap = int(1.5 * 1000)

# Initialize start and end (in milliseconds) to 0
start = 0
end = 0

# Flag to keep track of end of file.
# When audio reaches its end, flag is set to 1 and we break
flag = 0

# Iterate over the file in steps of one interval.
# The range runs to 2 * n because overlapping chunks
# advance by less than a full interval per iteration.
for i in range(0, 2 * n, interval):
    
    # During first iteration,
    # start is 0, end is the interval
    if i == 0:
        start = 0
        end = interval

    # All other iterations,
    # start is the previous end - overlap
    # end becomes the new start + interval
    else:
        start = end - overlap
        end = start + interval

    # When end becomes greater than the file length,
    # end is set to the file length
    # flag is set to 1 to indicate break.
    if end >= n:
        end = n
        flag = 1

    # Storing audio file from the defined start to end
    chunk = audio[start:end]

    # Filename / Path to store the sliced audio
    filename = 'chunk' + str(counter) + '.wav'

    # Store the sliced audio file to the defined path
    chunk.export(filename, format="wav")

    # Print information about the current chunk
    print("Processing chunk " + str(counter) + ". Start = "
          + str(start) + " end = " + str(end))

    # Increment counter for the next chunk
    counter = counter + 1
    
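    # Slicing of this chunk is done. The recognition step
    # for the chunk goes here; skip it if there is some
    # other usage for the chunk.

    # Break once the chunk that reaches the end of the file is done
    if flag == 1:
        fh.close()
        break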
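The recognition step that follows the slicing could look like the sketch below. It is a minimal illustration, not the article's own listing: `recognize_chunk` and `transcribe_chunks` are hypothetical helper names, while `Recognizer`, `AudioFile`, `record`, `recognize_google`, `UnknownValueError` and `RequestError` come from the SpeechRecognition package. It assumes the chunk files already exist on disk and that an internet connection is available for the Google service.

```python
def recognize_chunk(filename):
    """Return the text recognized in one WAV chunk, or "" on failure."""
    # Imported here so the rest of the sketch can be used (for example
    # with a different recognizer) without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(filename) as source:
        # record() reads the whole chunk into an AudioData object
        audio_data = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        print("Could not understand audio in", filename)
    except sr.RequestError as err:
        print("Could not request results:", err)
    return ""


def transcribe_chunks(filenames, out_path, recognize=recognize_chunk):
    """Run `recognize` on each chunk file and write the text to out_path."""
    with open(out_path, "w") as fh:
        for name in filenames:
            text = recognize(name)
            if text:
                fh.write(text + " ")
```

A call such as `transcribe_chunks(["chunk1.wav", "chunk2.wav"], "recognized.txt")` would then write the recognized text of both chunks to one file. Passing the recognizer in as a parameter is a design choice of this sketch, so that a different processing step can be swapped in for each chunk.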