
Chinese tutorial: https://blog.csdn.net/weixin_43441742/category_11945327.html?spm=1001.2014.3001.5482

Chinese API quick-reference handbook: https://blog.csdn.net/weixin_43441742/article/details/126093437

pyAudioKits

Powerful Python audio workflow support based on librosa and other libraries

pip install pyAudioKits

This gives you access to the vast majority of pyAudioKits' features. If you want to use pyAudioKits' recording API, please refer to the section Recording.

Basic Usage

import pyAudioKits.audio as ak

Create or load an audio

From NumPy array

audio = ak.Audio(samples, sr)

Initialize an Audio object with a numpy array and a specific sample rate.

  • samples: The audio samples. A numpy array object.

  • sr: The sample rate. An int object.

Return: An Audio object.
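
For instance, a one-second 440 Hz sine tone built with NumPy can be wrapped as an Audio object (a minimal sketch; the variable names are illustrative):

import numpy as np
import pyAudioKits.audio as ak

sr = 16000                                   # sample rate in Hz
t = np.arange(sr) / sr                       # one second of sample time stamps
samples = 0.5 * np.sin(2 * np.pi * 440 * t)  # 440 Hz sine wave
audio = ak.Audio(samples, sr)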

From File

audio = ak.read_Audio(direction = "audio.wav")
(audio1,audio2) = ak.read_Audio(direction = "audio_dualchannel.wav")

Load an audio file from the specified path. Dual-channel audio returns a tuple.

  • direction: The path.

Return:

  • An Audio object when reading single-channel audio.
  • A tuple of two Audio objects when reading dual-channel audio.

Recording

The pyAudioKits recording feature requires the PyAudio library, which currently cannot be installed correctly via pip alone.

To install PyAudio, follow this procedure:

  1. Check your Python version.

  2. Download the corresponding .whl file at this link.

  3. Put the downloaded .whl file into the Scripts folder under the Python installation path (if using Python directly) or the Scripts folder under the Anaconda installation path (if using Anaconda).

  4. Open the command line and enter the command:

    pip install <path to the .whl file>
    # e.g. pip install D:\anacondaLocation\Scripts\PyAudio-0.2.11-cp37-cp37m-win_amd64.whl
  5. Continue by entering the command:

    pip install pyaudio
import pyAudioKits.record
audio = pyAudioKits.record.record(sr,recordSeconds)
(audio1, audio2) = pyAudioKits.record.record(sr,recordSeconds,channels=2)

Record audio with duration of recordSeconds from the microphone at the sample rate of sr.

  • sr: An int object for the sample rate.
  • recordSeconds: An int object (seconds).
  • channels: Number of channels for recording.

Return:

  • An Audio object when recording single-channel audio.
  • A tuple of two Audio objects when recording dual-channel audio.
pyAudioKits.record.start_record(sr)
audio = pyAudioKits.record.end_record(sr)

Start and end a recording thread. Useful in interactive software where recording needs to be turned on or off at any time.

  • sr: An int object for the sample rate.

Return: An Audio object.
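
A minimal usage sketch of the thread-based API (the rec alias and the 3-second pause are illustrative):

import time
import pyAudioKits.record as rec

rec.start_record(16000)          # start the recording thread at 16 kHz
time.sleep(3)                    # let it run for about 3 seconds
audio = rec.end_record(16000)    # stop the thread and collect the Audio object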

Simulating

audio = ak.create_Single_Freq_Audio(amp,freq,sr,time,phase)

Generate a sine wave signal.

  • amp: A float object for the amplitude.
  • freq: A float object for the frequency (Hz).
  • sr: An int object for the sample rate.
  • time: An int object for the duration (seconds).
  • phase: The phase of the sine wave. Specifying None generates an independently and uniformly distributed random phase in [-π,π] at each moment.

Return: An Audio object.

The Audio object

Play 

audio.sound()

Play the audio. It displays the audio's power and automatically limits the amplitude when it might damage human hearing.

Plot

audio.plot(start=0, end=None, ylim=None, ax=None, imgPath=None, xlabel="t/s")

To draw the audio waveform on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • start:

    • If xlabel is "t/s" or "t/ms", then it will be the starting time stamp. Default = 0.
    • If xlabel is "n", then it will be the starting sample count. Default = 0.
  • end:

    • If xlabel is "t/s" or "t/ms", then it will be the ending time stamp. Default = The last time stamp of the audio.
    • If xlabel is "n", then it will be the ending sample count. Default = The total count of samples.
  • ylim: A tuple (y_start,y_end) for display range on y-axis. The default (None) is adaptive.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "t/s", "t/ms" or "n".

To NumPy Array

array = audio.samples

Get all samples in the audio.

Return: NumPy array.

Get Properties 获取属性

duration = audio.getDuration()

Get the duration of the audio (seconds).

Return: A float object.

samples_count = len(audio)

Get the samples count of the audio.

Return: An int object.

sr = audio.sr

Get the sample rate of the audio.

Return: An int object.

Save 

audio.save(direction = "audio.wav")

Save the audio to the specified path.

  • direction: The saving path.
ak.save_Audio_DoubleTracks(audio1 = track1, audio2 = track2, direction = "audio.wav")

Combine two audio tracks with the same length and sample rate into one dual-channel audio and save it to the specified path.

  • audio1: An Audio object for the first channel.
  • audio2: An Audio object for the second channel.
  • direction: The saving path.

Indexing and Slicing

The Audio object supports one-dimensional indexing and slicing. Any value used in the index represents a number of samples if it is an integer, or a time in seconds if it is a floating point number.

Return:

  • If indexing is performed, a sample value is returned.
  • If slicing is performed, an Audio object consisting of the sliced samples is returned; the sample rate is unchanged.

Since 1-D indexing is overloaded, it can also be used to modify a specific part of the audio, with either a NumPy array or another Audio object as the source. However, the source and target audio must have the same sample rate, as in the sketch below.
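
A minimal sketch, assuming audio is a 16 kHz Audio object (all names illustrative):

value = audio[100]                # integer index: the value of sample 100
clip = audio[1.0:2.5]             # float slice: the span from 1.0 s to 2.5 s
clip2 = audio[16000:40000]        # the same span expressed in sample counts
audio[0.0:1.0] = clip[0.0:1.0]    # overwrite the first second (same sample rate required)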

audio_slice = audio.timeSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the audio.

  • start:

    • If index_type is "t/s" or "t/ms", then it will be the starting time stamp of slicing.
    • If index_type is "n", then it will be the starting sample count of slicing.
    • Default = 0.
  • end:

    • If index_type is "t/s" or "t/ms", then it will be the ending time stamp of slicing. Default = The last time stamp of the audio.
    • If index_type is "n", then it will be the ending sample count of slicing. Default = The total count of samples.
  • step:

    • If index_type is "t/s" or "t/ms", then it will be the time step of slicing.
    • If index_type is "n", then it will be the samples step of slicing.
    • Default = 1 sample.
  • index_type: "t/s", "t/ms" or "n".

Return: An Audio object of part of the audio.

Concatenate

audio = ak.concatenate(lis = [audio1,audio2,...])

Concatenate the Audio objects in a tuple or a list.

  • lis: A tuple or a list of Audio objects.

Return: An Audio object.

Synthesis

audio = ak.synthesis(lis = [audio1,audio2,...])

Synthesize the Audio objects in a tuple or a list.

  • lis: A tuple or a list of Audio objects.

Return: An Audio object.

audio = ak.mixWithSNR(signal, noise, snr, maintain= "signal")

Mix signal and noise at a specific SNR (dB). The signal and noise should have the same sample rate.

  • signal: An Audio object. The signal without noise.

  • noise: An Audio object. The noise to mix with the signal.

  • snr: A float object. The SNR (dB) between signal and noise.

  • maintain: maintain="signal" to maintain the intensity of the signal; maintain="noise" to maintain the intensity of the noise.

Return: An Audio object.
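
For example, mixing a clean recording with a noise recording at 10 dB SNR (a sketch; clean and hum are assumed to be Audio objects with the same sample rate):

import pyAudioKits.analyse as aly

noisy = ak.mixWithSNR(signal = clean, noise = hum, snr = 10.0, maintain = "signal")
print(aly.snr(clean, noisy))    # should come out close to 10 dB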

Arithmetic operations

The Audio object overloads the arithmetic operations.

audio1=audio1+audio2	#Audio overlay. The two audios should have the same length and sample rate.
audio1=audio1+arr	#Audio can also overlay a numpy array whose shape is the same as audio1.samples.shape
audio1=audio1-audio2	#Audio subtraction. The two audios should have the same length and sample rate.
audio1=audio1-arr	#Audio can also subtract a numpy array whose shape is the same as audio1.samples.shape
audio1=audio1*audio2	#Audio modulation. The two audios should have the same length and sample rate.
audio1=audio1*arr	#Audio can also be modulated by a numpy array whose shape is the same as audio1.samples.shape
audio1=audio1/audio2	#Audio demodulation. The two audios should have the same length and sample rate.
audio1=audio1/arr	#Audio can also be demodulated by a numpy array whose shape is the same as audio1.samples.shape

audio1=audio1*value	#Use multiplication to amplify an audio; value is a float object.

audio1=audio1/value	#Use division to attenuate an audio; value is a float object.

Amplify

audio = audio.amplify(dB)

Amplify the audio with a gain of dB.

  • dB: A float object for the gain in dB.

Return: An Audio object.

Pitch shift

audio = audio.pitch_shift(halfSteps) 

Shift the pitch of the audio. A positive value of halfSteps increases the frequency of the audio.

  • halfSteps: An int object for how many half steps to shift.

Return: An Audio object.

Resample

audio = audio.resample(newRate)

Resample the audio with a new sample rate.

  • newRate: An int object for the new sample rate.

Return: An Audio object.

Add Gaussian White Noise

audio = audio.addWgn(dB)

Add Gaussian white noise of a specific intensity.

  • dB: A float object for the white noise's intensity. The signal and the added white noise will have a signal-to-noise ratio of dB (in dB).

Return: An Audio object.

Padding

audio_padded = audio.padding(audioDuration, start = 0)

Zero-pad the audio to the given length.

  • audioDuration: If float, the duration of the padded audio (in seconds); if int, the number of samples of the padded audio.

  • start: The start position of the original audio within the padded audio. If float, the start time (in seconds); if int, the start position in samples.

Return: An Audio object.

Framing and Windowing

audioFrames=audio.framing(frameDuration=0.03,overlapRate=0.5,window=None)

Framing and windowing the audio.

  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

  • window:

    • If string, it's the name of the window function (e.g., "hann")
    • If tuple, it's the name of the window function and any parameters (e.g., ("kaiser", 4.0))
    • If numeric, it is treated as the beta parameter of the "kaiser" window, as in scipy.signal.get_window.
    • If callable, it's a function that accepts one integer argument (the window length)
    • If list-like, it's a pre-computed window of the correct length Nx

Return: An AudioFrames object.
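
For example, splitting an audio into 30 ms Hann-windowed frames with 50% overlap (a minimal sketch):

frames = audio.framing(frameDuration = 0.03, overlapRate = 0.5, window = "hann")
print(frames.shape)    # (frame count, samples per frame)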

The AudioFrames object

To NumPy Array

array = audioFrames.samples

Get all samples in the audio as a K × M NumPy array, where K is the count of frames and M is the length of a frame in samples.

Return: NumPy array.

Get Properties

duration = audioFrames.getDuration()

Get the duration of the original audio (seconds).

Return: A float object.

samples_count = len(audioFrames)

Get the samples count of the original audio.

Return: An int object.

sr = audioFrames.sr

Get the sample rate of the original audio.

Return: An int object.

frame_count = audioFrames.getFrameCount()

Get the count of frames.

Return: An int object.

frame_length = audioFrames.getFrameLength()

Get the count of samples in each frame.

Return: An int object.

frame_count, frame_length = audioFrames.shape

Get the count of frames and the count of samples in each frame at the same time.

Return: A tuple of two int objects.

frame_duration = audioFrames.getFrameDuration()

Get the duration of each frame in seconds.

Return: A float object.

step_length = audioFrames.getStepLength()

Get the inter-frame step length in samples.

Return: An int object.

step_duration = audioFrames.getStepDuration()

Get the inter-frame step length in seconds.

Return: A float object.

Indexing and Slicing

The AudioFrames object supports one- or two-dimensional indexing and slicing.

The first dimension is the frame dimension. Any value represents the number of samples in the original Audio object if it is an integer, or the time in the original Audio object in seconds if it is a floating point number.

The second dimension is the time dimension. Any value represents the number of samples in each frame if it is an integer, or the time in each frame in seconds if it is a floating point number.

Return:

  • If any indexing or slicing is done on the time dimension, a 2-D NumPy array will be returned.
  • If slicing on the frame dimension with step greater than the inter-frame step length, a 2-D NumPy array will be returned.
  • Otherwise, an AudioFrames object will be returned; in particular, if the frame dimension of the AudioFrames object is 1, it will be downscaled to an Audio object.

Retrieve

audio = audioFrames.retrieve(method = "first_half")

Retrieve the AudioFrames object back to an Audio object.

  • method:
    • "last_half": When overlapping, preserve the last half of each frame.
    • "first_half": When overlapping, preserve the first half of each frame.

Return: An Audio object.
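
A round-trip sketch: frame an audio, then rebuild it (no per-frame processing shown, so the rebuilt audio differs from the original only where frames overlap):

frames = audio.framing(frameDuration = 0.03, overlapRate = 0.5)
rebuilt = frames.retrieve(method = "first_half")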

Time domain analyse

For more details of this section, please refer to: 2. Musical tone and Noise.ipynb and 4. Short-time Analysis Method for Audio Signals.ipynb

import pyAudioKits.analyse as aly

Methods

Power and Energy

power = aly.power(input, dB = False)

Calculate the power of the audio.

  • input: An Audio object or an AudioFrames object.

  • dB: Whether to express the result in the form of gain (dB).

Return:

  • If input is an Audio object: Power of the whole audio.

  • If input is an AudioFrames object: A frame_axis_ds object containing power of each frame.

energy = aly.energy(input, dB = False)

Calculate the energy of the audio.

  • input: An Audio object or an AudioFrames object.
  • dB: Whether to express the result in the form of gain (dB).

Return:

  • If input is an Audio object: Energy of the whole audio.
  • If input is an AudioFrames object: A frame_axis_ds object containing energy of each frame.
snr = aly.snr(signal, signalAndNoise)

Calculate the SNR(dB).

  • signal: An Audio object. The ground truth of the signal.
  • signalAndNoise: An Audio object. Signal mixed with noise.

Return: A float object of the SNR(dB).
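
A usage sketch combining these functions (audio and frames as created in the earlier sections):

p = aly.power(audio, dB = True)    # overall power, expressed in dB
e = aly.energy(frames)             # per-frame energy as a frame_axis_ds object
e.plot()                           # visualize the short-time energy curve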

Zero Crossing Rate

zerocrossingrate = aly.zerocrossing(input)

Calculate the zero crossing rate of the audio.

  • input: An AudioFrames object.

Return: A frame_axis_ds object containing zero crossing rate of each frame.

Auto-correlation

autocorrelation = aly.autocorr(input)

Calculate the auto correlation function of the audio.

  • input: An Audio object or an AudioFrames object.

Return:

  • If input is an Audio object: A time_delta_axis_ds object containing the auto correlation function.

  • If input is an AudioFrames object: A time_delta_frame_axis_ds object containing the short-time auto correlation result for each frame.
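
For instance, periodicity can be inspected from the short-time autocorrelation (a sketch reusing the frames object from above):

ac = aly.autocorr(frames)    # time_delta_frame_axis_ds: one autocorrelation per frame
ac.plot()                    # heat map over the frame and time-offset axes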

Statistic result data structures

frame_axis_ds object

result = frame_axis_ds.samples

Get statistics.

Return: A one-dimensional NumPy array whose length is equal to the number of frames of the analyzed AudioFrames object.

frame_axis_ds object supports one-dimensional indexes and slices. Any value used in the index represents the number of samples if it is an integer, or the time in seconds if it is a floating point number.

Return: A 1-D NumPy array for part of the result.

result = frame_axis_ds.frameSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the result.

  • start:

    • If index_type is "t/s" or "t/ms", then it will be the starting time stamp of slicing. Default = 0.
    • If index_type is "n", then it will be the starting sample count of slicing. Default = 0.
    • If index_type is "frame", then it will be the starting frame count. Default = 0.
  • end:

    • If index_type is "t/s" or "t/ms", then it will be the ending time stamp of slicing. Default = The last time stamp of the audio.
    • If index_type is "n", then it will be the ending sample count of slicing. Default = The total count of samples.
    • If index_type is "frame", then it will be the ending frame count of slicing. Default = The total count of frames.
  • step:

    • If index_type is "t/s" or "t/ms", then it will be the time step of slicing.
    • If index_type is "n", then it will be the samples step of slicing.
    • If index_type is "frame", then it will be the frames count step of slicing.
    • Default = 1 frame.
  • index_type: "t/s", "t/ms", "n" or "frame".

Return: A 1-D NumPy array for part of the result.

frame_axis_ds.plot(start=0, end=None, ylim=None, ax=None, imgPath=None, xlabel="t/s")

To draw the per-frame statistic result on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • start:

    • If xlabel is "t/s" or "t/ms", then it will be the starting time stamp. Default = 0.
    • If xlabel is "n", then it will be the starting sample count. Default = 0.
    • If xlabel is "frame", then it will be the starting frame count. Default = 0.
  • end:

    • If xlabel is "t/s" or "t/ms", then it will be the ending time stamp. Default = The last time stamp of the audio.
    • If xlabel is "n", then it will be the ending sample count. Default = The total count of samples.
    • If xlabel is "frame", then it will be the ending frame count. Default = The total count of frames.
  • ylim: A tuple (y_start,y_end) for display range on y-axis. The default (None) is adaptive.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "t/s", "t/ms", "n" or "frame".

time_delta_axis_ds object

result = time_delta_axis_ds.samples

Get statistics.

Return: A one-dimensional NumPy array whose length is equal to 2 × L − 1, where L is the number of samples of the analyzed Audio object.

time_delta_axis_ds object supports one-dimensional indexes and slices. Any value used in the index represents the sample offset if it is an integer, or the time offset in seconds if it is a floating point number.

Return: A 1-D NumPy array for part of the result.

result = time_delta_axis_ds.timeSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the result.

  • start:
    • If index_type is "t/s" or "t/ms", then it will be the starting time offset of slicing. Default = 0.
    • If index_type is "k", then it will be the starting sample offset of slicing. Default = 0.
  • end:
    • If index_type is "t/s" or "t/ms", then it will be the ending time offset of slicing. Default = The duration of the audio.
    • If index_type is "k", then it will be the ending sample offset of slicing. Default = The max samples count of the audio.
  • step:
    • If index_type is "t/s" or "t/ms", then it will be the time offset step of slicing.
    • If index_type is "k", then it will be the sample offset step of slicing.
    • default = 1 sample offset.
  • index_type: "t/s", "t/ms" or "k".

Return: A 1-D NumPy array for part of the result.

result = time_delta_axis_ds.plot(start=0, end=None, ylim=None, ax=None, imgPath=None, xlabel="t/s")

To draw the result on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • start:

    • If xlabel is "t/s" or "t/ms", then it will be the starting time offset. Default = 0.
    • If xlabel is "k", then it will be the starting sample offset. Default = 0.
  • end:

    • If xlabel is "t/s" or "t/ms", then it will be the ending time offset. Default = The duration of the audio.
    • If xlabel is "k", then it will be the ending sample offset. Default = The max samples count of the audio.
  • ylim: A tuple (y_start,y_end) for display range on y-axis. The default (None) is adaptive.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "t/s", "t/ms" or "k".

time_delta_frame_axis_ds object

result = time_delta_frame_axis_ds.samples

Get statistics.

Return: A two-dimensional NumPy array whose shape is (K, 2 × L − 1), where L is the number of samples in each frame and K is the number of frames of the analyzed AudioFrames object.

time_delta_frame_axis_ds object supports one or two-dimensional indexes and slices.

The first dimension is the frame dimension. Any value represents the number of samples in the original Audio object if it is an integer, or the time in the original Audio object in seconds if it is a floating point number.

The second dimension is the time offset dimension. Any value represents the samples offset in each frame if it is an integer, or the time offset in each frame in seconds if it is a floating point number.

Return: A 2-D NumPy array for part of the result.

result = time_delta_frame_axis_ds.frameSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the result on the frame dimension.

  • start:

    • If index_type is "t/s" or "t/ms", then it will be the starting time stamp of slicing. Default = 0.
    • If index_type is "n", then it will be the starting sample count of slicing. Default = 0.
    • If index_type is "frame", then it will be the starting frame count. Default = 0.
  • end:

    • If index_type is "t/s" or "t/ms", then it will be the ending time stamp of slicing. Default = The last time stamp of the audio.
    • If index_type is "n", then it will be the ending sample count of slicing. Default = The total count of samples.
    • If index_type is "frame", then it will be the ending frame count of slicing. Default = The total count of frames.
  • step:

    • If index_type is "t/s" or "t/ms", then it will be the time step of slicing.
    • If index_type is "n", then it will be the samples step of slicing.
    • If index_type is "frame", then it will be the frames count step of slicing.
    • Default = 1 frame.
  • index_type: "t/s", "t/ms", "n" or "frame".

Return: A 2-D NumPy array for part of the result.

result = time_delta_frame_axis_ds.timeSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the result on the time offset dimension.

  • start:

    • If index_type is "t/s" or "t/ms", then it will be the starting time offset of slicing. Default = 0.
    • If index_type is "k", then it will be the starting sample differenc of slicing. Default = 0.
  • end:

    • If index_type is "t/s" or "t/ms", then it will be the ending time offset of slicing. Default = The duration of the audio in each frame.
    • If index_type is "k", then it will be the ending sample differenc of slicing. Default = The max samples count of the audio in each frame.
  • step:

    • If index_type is "t/s" or "t/ms", then it will be the time offset step of slicing.
    • If index_type is "k", then it will be the sample offset step of slicing.
    • default = 1 sample offset.
  • index_type: "t/s", "t/ms" or "k".

Return: A 2-D NumPy array for part of the result.

time_delta_frame_axis_ds.plot(xstart=0, xend=None, ystart=0, yend=None, ax=None, imgPath=None, xlabel="t/s", ylabel="t/s", cbar=True)

To draw the per-frame statistic result on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • xstart:

    • If xlabel is "t/s" or "t/ms", then it will be the starting time stamp. Default = 0.
    • If xlabel is "n", then it will be the starting sample count. Default = 0.
    • If xlabel is "frame", then it will be the starting frame count. Default = 0.
  • xend:

    • If xlabel is "t/s" or "t/ms", then it will be the ending time stamp. Default = The last time stamp of the audio.

    • If xlabel is "n", then it will be the ending sample count. Default = The total count of samples.

    • If xlabel is "frame", then it will be the ending frame count. Default = The total count of frames.

  • ystart:

    • If ylabel is "t/s" or "t/ms", then it will be the starting time offset. Default = 0.

    • If ylabel is "k", then it will be the starting sample offset. Default = 0.

  • yend:

    • If ylabel is "t/s" or "t/ms", then it will be the ending time offset. Default = The duration of the audio.

    • If ylabel is "k", then it will be the ending sample offset. Default = The max samples count of the audio.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "t/s", "t/ms", "n" or "frame".

  • ylabel: "t/s", "t/ms" or "k"

  • cbar: True to show the color bar.

Frequency domain analyse

For more details of this section, please refer to: 3. Fourier Transform: from Time domain to Frequency domain.ipynb and 4. Short-time Analysis Method for Audio Signals.ipynb

import pyAudioKits.analyse as aly

Methods

FFT

spec = aly.FFT(input, N=None)

Calculate the FFT of the audio.

  • input: An Audio object or an AudioFrames object.
  • N: Number of Fourier transform points. None to use all samples.

Return:

  • If input is an Audio object: A freq_axis_ds object containing spectrum.

  • If input is an AudioFrames object: A freq_frame_axis_ds object containing short-time spectrum.
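
For example, computing and plotting a magnitude spectrum in dB (a sketch; audio is any Audio object):

spec = aly.FFT(audio)          # freq_axis_ds holding the complex spectrum
spec.plot(plot_type = "dB")    # magnitude spectrum in dB on a linear frequency axis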

Power Spectral Density

psd = aly.PSD(input, N=None, dB=False)

Calculate the power spectral density of the audio.

  • input: An Audio object or an AudioFrames object.
  • N: Number of Fourier transform points. None to use all samples.
  • dB: Whether to express the output in gain (dB).

Return:

  • If input is an Audio object: A freq_axis_ds_real object containing power spectral density of the whole audio.

  • If input is an AudioFrames object: A freq_frame_axis_ds_real object containing power spectral density of each frame.

Spectral Entropy

specEnt = aly.specEntropy(input, N=None)

Calculate the spectral entropy of the audio.

  • input: An Audio object or an AudioFrames object.

  • N: Number of Fourier transform points. None to use all samples.

Return:

  • If input is an Audio object: A float object for the spectral entropy of the whole audio.

  • If input is an AudioFrames object: A frame_axis_ds object containing spectral entropy of each frame.

Spectrum Peak

specPeak, peakAmp = aly.getMaxFrequency(input,N=None,dB=False)

Get the frequency and the amplitude of spectrum peak.

  • input: An Audio object or an AudioFrames object.
  • N: Number of Fourier transform points. None to use all samples.
  • dB: Whether to express the output amplitude in gain (dB).

Return:

  • If input is an Audio object: The frequency and the amplitude of the spectrum peak of the whole audio.

  • If input is an AudioFrames object: A frame_axis_ds object containing the frequency of the spectrum peak and a frame_axis_ds object containing the amplitude of the spectrum peak.
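
A worked sketch: the spectrum peak of a pure 440 Hz tone should land close to 440 Hz (ak as imported in Basic Usage):

tone = ak.create_Single_Freq_Audio(amp = 1.0, freq = 440.0, sr = 16000, time = 1, phase = 0)
peak_freq, peak_amp = aly.getMaxFrequency(tone)
print(peak_freq)    # expect a value near 440.0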

Statistic result data structures

freq_axis_ds object

result = freq_axis_ds.samples
result = freq_axis_ds_real.samples

Get statistics spectrum.

Return: A one-dimensional NumPy array whose length is equal to N/2, where N is the number of points in the FFT.

freq_axis_ds object supports one-dimensional indexes and slices. Any value used in the index represents the frequency point if it is an integer, or the frequency in Hz if it is a floating point number.

Return: A 1-D NumPy array for part of the result spectrum.

result = freq_axis_ds.freqSelect(start = 0, end = None, step = None, index_type = "frequency/Hz")
result = freq_axis_ds_real.freqSelect(start = 0, end = None, step = None, index_type = "frequency/Hz")

Selecting part of the spectrum.

  • start: The starting of slicing frequency. Meaning depends on index_type. Default = 0.

  • end: The ending of slicing frequency. Meaning depends on index_type. Default = Half of the sample rate.

  • step: The slicing step. Meaning depends on index_type. Default = 1 freq point.

  • index_type: "frequency/Hz", "frequency/(rad/s)", "normalized frequency/Hz", "normalized frequency/(rad/s)" or "freq point".

Return: A 1-D NumPy array for part of the spectrum.

freq_axis_ds.plot(start=0, end=None, ax=None, ylim=None, imgPath=None, xlabel="frequency/Hz", plot_type="amplitude",freq_scale="linear")
freq_axis_ds_real.plot(start=0, end=None, ylim=None, ax=None, imgPath=None, xlabel="frequency/Hz",freq_scale="linear")

To draw the statistic result spectrogram on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • start: The starting frequency. Meaning depends on xlabel. Default = 0.

  • end: The ending frequency. Meaning depends on xlabel. Default = Half of the sample rate.

  • ylim: A tuple (y_start,y_end) for display range on y-axis. The default (None) is adaptive.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "frequency/Hz", "frequency/(rad/s)", "normalized frequency/Hz", "normalized frequency/(rad/s)" or "freq point".

  • plot_type: "amplitude", "dB" or "phase". (plot function of freq_axis_ds_real doesn't have this parameter)

  • freq_scale: "linear", "log" or "mel".

freq_frame_axis_ds object

result = freq_frame_axis_ds.samples
result = freq_frame_axis_ds_real.samples

Get statistics short-time spectrum.

Return: A two-dimensional NumPy array whose shape is (K, N/2), where N is the number of points in the FFT and K is the number of frames of the analyzed AudioFrames object.

freq_frame_axis_ds object supports one or two-dimensional indexes and slices.

The first dimension is the frame dimension. Any value represents the number of samples in the original Audio object if it is an integer, or the time in the original Audio object in seconds if it is a floating point number.

The second dimension is the frequency dimension. Any value represents the frequency point in each frame if it is an integer, or the frequency in each frame in Hz if it is a floating point number.

Return: A 2-D NumPy array for part of the short-time spectrum.

result = freq_frame_axis_ds.freqSelect(start = 0, end = None, step = None, index_type = "frequency/Hz")
result = freq_frame_axis_ds_real.freqSelect(start = 0, end = None, step = None, index_type = "frequency/Hz")

Selecting part of the short-time spectrum on the frequency dimension.

  • start: The starting of slicing frequency. Meaning depends on index_type. Default = 0.

  • end: The ending of slicing frequency. Meaning depends on index_type. Default = Half of the sample rate.

  • step: The slicing step. Meaning depends on index_type. Default = 1 freq point.

  • index_type: "frequency/Hz", "frequency/(rad/s)", "normalized frequency/Hz", "normalized frequency/(rad/s)" or "freq point".

Return: A 2-D NumPy array for part of the short-time spectrum.

result = freq_frame_axis_ds.frameSelect(start = 0, end = None, step = None, index_type = "t/s")
result = freq_frame_axis_ds_real.frameSelect(start = 0, end = None, step = None, index_type = "t/s")

Selecting part of the short-time spectrum on the frame dimension.

  • start:

    • If index_type is "t/s" or "t/ms", then it will be the starting time stamp of slicing. Default = 0.
    • If index_type is "n", then it will be the starting sample count of slicing. Default = 0.
    • If index_type is "frame", then it will be the starting frame count. Default = 0.
  • end:

    • If index_type is "t/s" or "t/ms", then it will be the ending time stamp of slicing. Default = The last time stamp in each frame.
    • If index_type is "n", then it will be the ending sample count of slicing. Default = The total count of samples in each frame.
    • If index_type is "frame", then it will be the ending frame count of slicing. Default = The total count of frames.
  • step:

    • If index_type is "t/s" or "t/ms", then it will be the time step of slicing.
    • If index_type is "n", then it will be the samples step of slicing.
    • If index_type is "frame", then it will be the frames count step of slicing.
    • Default = 1 frame.
  • index_type: "t/s", "t/ms", "n" or "frame".

Return: A 2-D NumPy array for part of the short-time spectrum.

freq_frame_axis_ds.plot(xstart=0, xend=None, ystart=0, yend=None, ax=None, imgPath=None, xlabel="t/s", ylabel="frequency/Hz", plot_type="amplitude", cbar=True, freq_scale="linear")
freq_frame_axis_ds_real.plot(xstart=0, xend=None, ystart=0, yend=None, ax=None, imgPath=None, xlabel="t/s", ylabel="frequency/Hz", cbar=True, freq_scale="linear")

To draw the short-time spectrogram on a subplot.

If no subplot is passed in, the figure is displayed directly.

If imgPath is passed in, the figure is saved.

  • xstart:

    • If xlabel is "t/s" or "t/ms", then it will be the starting time stamp. Default = 0.
    • If xlabel is "n", then it will be the starting sample count. Default = 0.
    • If xlabel is "frame", then it will be the starting frame count. Default = 0.
  • xend:

    • If xlabel is "t/s" or "t/ms", then it will be the ending time stamp. Default = The last time stamp in each frame.
    • If xlabel is "n", then it will be the ending sample count. Default = The total count of samples in each frame.
    • If xlabel is "frame", then it will be the ending frame count. Default = The total count of frames.
  • ystart: The starting frequency. Meaning depends on ylabel. Default = 0.

  • yend: The ending frequency. Meaning depends on ylabel. Default = Half of the sample rate.

  • ax: A matplotlib.pyplot subplot to draw on.

  • imgPath: The path to save the graph.

  • xlabel: "t/s", "t/ms", "n" or "frame".

  • ylabel: "frequency/Hz", "frequency/(rad/s)", "normalized frequency/Hz", "normalized frequency/(rad/s)" or "freq point".

  • plot_type: "amplitude", "dB" or "phase". (plot function of freq_frame_axis_ds_real doesn't have this parameter)

  • cbar: True to show the color bar.

  • freq_scale: "linear", "log" or "mel".

Model based analyse

For more details of this section, please refer to: 6. Endpoint Detection and Speech Recognition.ipynb

import pyAudioKits.analyse as aly

MFCC

mfcc_feats = aly.MFCC(input,p=13,diff1=True,diff2=True,energy=True,frameDuration = 0.03, overlapRate = 0.5)

Calculate the MFCC features of the audio.

  • input: An Audio object.

  • p: MFCC order.

  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

  • diff1: Use first-order differential features.

  • diff2: Use second-order differential features.

  • energy: Use energy features.

Return: A 2-D NumPy array of MFCC features. Each row will be MFCC features of one frame.
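
A usage sketch (the feature dimension of the result depends on the p, diff1, diff2 and energy settings):

mfcc_feats = aly.MFCC(audio, p = 13, diff1 = True, diff2 = True, energy = True)
print(mfcc_feats.shape)    # (number of frames, feature dimension)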

FBank

fbank_feats = aly.fBank(input, filters = 26, frameDuration = 0.03, overlapRate = 0.5)

Calculate the Fbank features of the Audio.

  • input: An Audio object.
  • filters: Number of mel filters applied.
  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).
  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

Return: A 2-D NumPy array of FBank features. Each row will be FBank features of one frame.

Mel Spectrogram

melspec_feats = aly.melSpec(input, spec_h=128, spec_w=128)

Calculate the Mel spectrogram features of the Audio.

  • input: An Audio object.
  • spec_h: The height of the mel spectrogram features, which determines the frequency resolution.
  • spec_w: The width of the mel spectrogram features, which determines the temporal resolution.

Return: A 2-D NumPy array of Mel spectrogram features.

LPC

es, ws = aly.LPC(input, p=10)

MATLAB-style LPC for each frame.

  • input: An AudioFrames object.
  • p: LPC order.

Return: A list of the LPC error for each frame and a list of the LPC coefficients for each frame.

Filter

For more details of this section, please refer to: 5. LTI Filter.ipynb

import pyAudioKits.filters as flt

General Design

output = flt.ltiFilter(input,numerators,denominators,zero_phase=False)

LTI filter design by specifying the denominator and numerator coefficients of the system function.

  • input: An Audio object or an AudioFrames object.
  • numerators: A NumPy array of the numerator coefficients of the system function.
  • denominators: A NumPy array of the denominator coefficients of the system function.
  • zero_phase: Use bi-directional filtering to maintain a phase response of 0.

Return:

  • An Audio object if the input is an Audio object.
  • An AudioFrames object if the input is an AudioFrames object.
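
For instance, a 5-point moving-average FIR filter has numerator coefficients [1/5, ..., 1/5] and denominator [1] (a minimal sketch):

import numpy as np

smoothed = flt.ltiFilter(audio, numerators = np.ones(5) / 5, denominators = np.array([1.0]))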

Butterworth Filter Design

output=flt.lowPassFilterN(input,n,f,freq_type = "frequency/Hz", zero_phase = True)	#Nth-order low-pass Butterworth filter.
output=flt.highPassFilterN(input,n,f,freq_type = "frequency/Hz", zero_phase = True)	#Nth-order high-pass Butterworth filter.
output=flt.bandPassFilterN(input,n,fLow,fHigh,freq_type = "frequency/Hz", zero_phase = True)	#Nth-order band-pass Butterworth filter.
output=flt.bandStopFilterN(input,n,fLow,fHigh,freq_type = "frequency/Hz", zero_phase = True)	#Nth-order band-stop Butterworth filter.
output=flt.lowPassFilter(input,fPass,fStop,ripplePass,rippleStop,freq_type = "frequency/Hz", zero_phase = True)	#Low-pass Butterworth filter with specified ripple.
output=flt.highPassFilter(input,fPass,fStop,ripplePass,rippleStop,freq_type = "frequency/Hz", zero_phase = True)	#High-pass Butterworth filter with specified ripple.
output=flt.bandPassFilter(input,fLowPass,fLowStop,fHighPass,fHighStop,ripplePass,rippleStop,freq_type = "frequency/Hz", zero_phase = True)	#Band-pass Butterworth filter with specified ripple.
output=flt.bandStopFilter(input,fLowPass,fLowStop,fHighPass,fHighStop,ripplePass,rippleStop,freq_type = "frequency/Hz", zero_phase = True)	#Band-stop Butterworth filter with specified ripple.
  • input: An Audio object or an AudioFrames object.
  • n: The order.
  • f, fLow, fHigh: The cut-off frequency.
  • fPass: The passband frequency.
  • fStop: The stopband frequency.
  • ripplePass: The passband ripple. The signal will lose no more than ripplePass dB in the passband.
  • rippleStop: The stopband ripple. The signal will have at least rippleStop dB attenuation in the stopband.
  • freq_type: "frequency/Hz" [0, sr/2), "frequency/(rad/s)" [0, sr·π), "normalized frequency/Hz" [0, 1) or "normalized frequency/(rad/s)" [0, π)
  • zero_phase: Use bi-directional filtering to maintain a phase response of 0.

Return:

  • An Audio object if the input is an Audio object.
  • An AudioFrames object if the input is an AudioFrames object.
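
For example, removing content above 1 kHz with a 6th-order zero-phase low-pass filter (the order and cut-off values are illustrative):

low = flt.lowPassFilterN(audio, n = 6, f = 1000.0, freq_type = "frequency/Hz", zero_phase = True)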

Algorithm

import pyAudioKits.algorithm as alg

Speech Endpoint Detection

For more details of this section, please refer to: 6. Endpoint Detection and Speech Recognition.ipynb

vad_result = alg.VAD(input,energyThresLow,energyThresHigh,zerocrossingThres,frameDuration = 0.03,overlapRate=0.5)

Speech endpoint detection based on the double-threshold method.

  • input: An Audio object.

  • energyThresLow: A lower energy threshold for distinguishing silence from voice.

  • energyThresHigh: A higher energy threshold for distinguishing unvoiced from voiced speech.

  • zerocrossingThres: Zero crossing rate threshold.

  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

Return: A VAD object.

vad_result.plot(imgPath=None)

Visualize the result of VAD. Save the figure if imgPath is given, otherwise display it directly. The blue parts are silence, the magenta parts are unvoiced and the orange parts are voiced.

  • imgPath: The path to save the figure.
vad_result.slices()

Return the voiced slices of the audio.

Return: A list of Audio objects.
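
A usage sketch (the threshold values below are purely illustrative and must be tuned to the recording):

vad_result = alg.VAD(audio, energyThresLow = 1.0, energyThresHigh = 10.0, zerocrossingThres = 0.3)
vad_result.plot()
for voice in vad_result.slices():
    voice.sound()    # play each detected voice segment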

Speech Enhancement

For more details of this section, please refer to: 7. Speech enhancement: spectral subtraction, Wiener and Kalman.ipynb

Spectrum subtraction

output = alg.specSubstract(input, noise, beta=0.002, frameDuration = 0.03, overlapRate = 0.5, window = None)

Using spectral subtraction to reduce noise.

  • input: An Audio object of signal + noise.

  • noise: An Audio object of the estimated noise.

  • beta: The beta parameter.

  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

  • window:

    • If string, it's the name of the window function (e.g., "hann")
    • If tuple, it's the name of the window function and any parameters (e.g., ("kaiser", 4.0))
    • If numeric, it is treated as the beta parameter of the "kaiser" window, as in scipy.signal.get_window.
    • If callable, it's a function that accepts one integer argument (the window length)
    • If list-like, it's a pre-computed window of the correct length Nx

Return: An Audio object of the filtered signal.
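
A sketch, assuming the first half-second of the noisy recording contains noise only (that assumption is illustrative):

noise_est = noisy[0.0:0.5]    # noise-only segment used as the noise estimate
denoised = alg.specSubstract(noisy, noise_est, beta = 0.002, window = "hann")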

Wiener Filtering

output = alg.wienerFilter(observed_signal,desired_signal,h_length=200, frameDuration = 0.03, overlapRate = 0.5, window = None)

Using Wiener filtering to reduce noise.

  • observed_signal: An Audio object of signal + noise.

  • desired_signal: An Audio object of the estimated (desired) signal.

  • h_length: The order (length) of the filter.

  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

  • overlapRate: A float object in [0,1) for the overlapping rate of the frames.

  • window:

    • If string, it's the name of the window function (e.g., "hann")

    • If tuple, it's the name of the window function and any parameters (e.g., ("kaiser", 4.0))

    • If numeric, it is treated as the beta parameter of the "kaiser" window, as in scipy.signal.get_window.

    • If callable, it's a function that accepts one integer argument (the window length)

    • If list-like, it's a pre-computed window of the correct length Nx

Return: An Audio object of the filtered signal.

Kalman Filtering

output = alg.kalmanFilter(input,noise,numIter=7,p=20, frameDuration = 0.05)

Using Kalman filtering to reduce noise.

  • input: An Audio object of signal + noise.
  • noise: An Audio object of the estimated noise.
  • numIter: Number of iterations.
  • p: The order.
  • frameDuration: A float object for the duration of each frame (seconds) or an int object for the length of each frame (sample points).

Return: An Audio object of the filtered signal.

Speech Recognition

For more details of this section, please refer to: 6. Endpoint Detection and Speech Recognition.ipynb

DTW

distance = alg.dtw(M1, M2)

Use DTW to calculate the similarity distance between two MFCC features.

  • M1: The first MFCC feature.

  • M2: The second MFCC feature.

Return: A float object of the similarity distance between the two MFCC features.
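
A sketch comparing two utterances by their MFCC features (audio1 and audio2 are assumed to be recordings with the same sample rate; aly as imported in the analysis sections):

m1 = aly.MFCC(audio1)
m2 = aly.MFCC(audio2)
distance = alg.dtw(m1, m2)    # a smaller distance means more similar utterances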

GMM+HMM

gmmhmm_model = alg.GMMHMM(features, labels, n_iter = 10)

Construct and train a GMM+HMM model.

  • features: A list of MFCC features.
  • labels: The label corresponding to each MFCC feature in the features list.
  • n_iter: Number of training iterations.

Return: A GMMHMM object.

predicted_labels = gmmhmm_model.predict(features)

Use the trained GMM+HMM model to predict the labels on the test set.

  • features: A list of MFCC features.

Return: A list of predicted labels.
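
An end-to-end sketch (train_audios, train_labels and test_audios are hypothetical datasets):

train_feats = [aly.MFCC(a) for a in train_audios]
model = alg.GMMHMM(train_feats, train_labels, n_iter = 10)
predicted_labels = model.predict([aly.MFCC(a) for a in test_audios])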
