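Audio files are a widely used means of conveying information. This article shows how to break an audio file (a .wav file) into smaller chunks, recognize the content of each chunk, and store it in a text file. To learn more about audio files and their formats, refer to Audio_formats.
Need to break down an audio file?
Any processing applied to a long audio file takes a lot of time. Processing here can mean anything: for example, raising or lowering the frequency of the audio or, as done in this article, recognizing the content of the audio file. By breaking the file into small audio files called chunks, we can make sure that each processing step runs quickly.
Required installations: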
pip3 install pydub
pip3 install audioread
pip3 install SpeechRecognition
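The program has two main steps.
Step #1:
This step slices the audio file into small chunks of a constant interval. The slicing can be done with or without overlap. With overlap, each new chunk starts a fixed amount of time before the end of the previous chunk, so that any audio or word cut off at a chunk boundary is covered by the overlap. For example, if the audio file is 22 seconds long, the interval is 5 seconds, and the overlap is 1.5 seconds, the chunks are:
chunk1 : 0 - 5 seconds
chunk2 : 3.5 - 8.5 seconds
chunk3 : 7 - 12 seconds
chunk4 : 10.5 - 15.5 seconds
chunk5 : 14 - 19 seconds
chunk6 : 17.5 - 22 seconds
To slice without overlap, set the overlap to 0.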
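The chunk boundaries listed above can be computed without loading any audio. The following is a minimal sketch, assuming all times are in milliseconds; `chunk_bounds` is a hypothetical helper name used only for illustration and is not part of pydub.

```python
def chunk_bounds(length_ms, interval_ms, overlap_ms=0):
    """Return (start, end) pairs in milliseconds covering the whole length.

    Every chunk after the first starts `overlap_ms` before the previous end.
    `chunk_bounds` is a hypothetical helper, not a pydub function.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("overlap_ms must be in [0, interval_ms)")

    bounds = []
    start = 0
    while start < length_ms:
        # Clip the last chunk to the end of the file.
        end = min(start + interval_ms, length_ms)
        bounds.append((start, end))
        if end >= length_ms:
            break
        # Step back by the overlap so a word cut at the boundary is repeated.
        start = end - overlap_ms
    return bounds


# 22-second file, 5-second interval, 1.5-second overlap
for start, end in chunk_bounds(22_000, 5_000, 1_500):
    print(f"{start / 1000:g} - {end / 1000:g} seconds")
```

Passing `overlap_ms=0` yields back-to-back chunks (0 - 5, 5 - 10, and so on), with only the last chunk shortened to fit the file length.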
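Step #2:
This step uses the sliced audio chunks to perform whatever operation the user requires. Here, for demonstration purposes, the chunks are passed through the Google speech recognition module and the recognized text is written to a separate file. To learn how to use the Google speech recognition module to recognize audio from a microphone, refer to this.
In this article, we use the sliced audio chunks to recognize their content.
Step #2 is performed inside the loop of Step #1: as soon as a chunk is sliced from the audio file, that chunk is recognized. This continues until the end of the audio file is reached.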
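The way Step #2 is nested inside the loop of Step #1 can be sketched independently of any audio library. This is only an illustration under stated assumptions: `process_in_chunks` and `fake_recognize` are hypothetical names, a plain Python list stands in for the audio, and the stand-in recognizer simply joins the items of a chunk. In the article itself, that role is played by the Google speech recognition call.

```python
import io


def process_in_chunks(audio, interval, overlap, recognize, out):
    """Slice `audio` and hand each chunk to `recognize` right away.

    `audio` is any sliceable sequence (a stand-in for real audio),
    `recognize` is a callable that returns text for one chunk, and `out`
    is a writable text file object. All names here are hypothetical.
    """
    if interval <= 0 or not 0 <= overlap < interval:
        raise ValueError("need interval > 0 and 0 <= overlap < interval")

    n = len(audio)
    start = 0
    counter = 0
    while start < n:
        end = min(start + interval, n)
        chunk = audio[start:end]      # Step #1: slice one chunk
        text = recognize(chunk)       # Step #2: process it immediately
        out.write(text + "\n")
        counter += 1
        if end == n:                  # reached the end of the audio
            break
        start = end - overlap
    return counter


def fake_recognize(chunk):
    # Stand-in for a real recognizer: joins the "words" in the chunk.
    return " ".join(chunk)


words = ["a", "b", "c", "d", "e", "f", "g"]
buffer = io.StringIO()
count = process_in_chunks(words, interval=3, overlap=1,
                          recognize=fake_recognize, out=buffer)
print(count)
print(buffer.getvalue())
```

Passing the recognizer in as a callable keeps the slicing loop unchanged if the chunks are later used for some other kind of processing.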
Example:
**Input :** [**Geek.wav**](https://media.geeksforgeeks.org/wp-content/uploads/1.wav)
**Output :**
Screenshot of cmd running the code:
Text File: [recognized](https://media.geeksforgeeks.org/wp-content/uploads/recognized.txt)
Below is the implementation:
# Import necessary libraries
from pydub import AudioSegment
import speech_recognition as sr
# Input audio file to be sliced
audio = AudioSegment.from_wav("1.wav")
'''
Step #1 - Slicing the audio file into smaller chunks.
'''
# Length of the audiofile in milliseconds
n = len(audio)
# Variable to count the number of sliced chunks
counter = 1
# Text file to write the recognized audio
fh = open("recognized.txt", "w+")
# Interval length at which to slice the audio file.
# If length is 22 seconds, and interval is 5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 5 - 10 seconds
# chunk3 : 10 - 15 seconds
# chunk4 : 15 - 20 seconds
# chunk5 : 20 - 22 seconds
interval = 5 * 1000
# Length of audio to overlap, in milliseconds.
# If length is 22 seconds, and interval is 5 seconds,
# With overlap as 1.5 seconds,
# The chunks created will be:
# chunk1 : 0 - 5 seconds
# chunk2 : 3.5 - 8.5 seconds
# chunk3 : 7 - 12 seconds
# chunk4 : 10.5 - 15.5 seconds
# chunk5 : 14 - 19 seconds
# chunk6 : 17.5 - 22 seconds
overlap = int(1.5 * 1000)
# Initialize start and end (in milliseconds) to 0
start = 0
end = 0
# Flag to keep track of end of file.
# When audio reaches its end, flag is set to 1 and we break
flag = 0
# Iterate over the file, one chunk per iteration.
# The upper bound is 2 * n so that enough iterations are
# available when chunks overlap; the flag marks the end of the file.
for i in range(0, 2 * n, interval):

    # During first iteration,
    # start is 0, end is the interval
    if i == 0:
        start = 0
        end = interval

    # All other iterations,
    # start is the previous end - overlap
    # end becomes start + interval
    else:
        start = end - overlap
        end = start + interval

    # When end reaches or exceeds the file length,
    # end is set to the file length
    # flag is set to 1 to indicate break.
    if end >= n:
        end = n
        flag = 1

    # Storing audio file from the defined start to end
    chunk = audio[start:end]

    # Filename / Path to store the sliced audio
    filename = 'chunk' + str(counter) + '.wav'

    # Store the sliced audio file to the defined path
    chunk.export(filename, format="wav")

    # Print information about the current chunk
    print("Processing chunk " + str(counter) + ". Start = "
          + str(start) + " end = " + str(end))

    # Increment counter for the next chunk
    counter = counter + 1

    # Slicing of the audio file is done.
    # Skip the below steps if there is some other usage