Mel spectrogram - MATLAB melSpectrogram - MathWorks 中国 (2024)


Mel spectrogram


Syntax

S = melSpectrogram(audioIn,fs)

S = melSpectrogram(audioIn,fs,Name=Value)

[S,F,T] = melSpectrogram(___)

melSpectrogram(___)

Description


S = melSpectrogram(audioIn,fs) returns the mel spectrogram of the audio input at sample rate fs. The function treats columns of the input as individual channels.


S = melSpectrogram(audioIn,fs,Name=Value) specifies options using one or more name-value arguments.


[S,F,T] = melSpectrogram(___) returns the center frequencies of the bands in Hz and the location of each window of data in seconds. The location corresponds to the center of each window. You can use this output syntax with any of the previous input syntaxes.


melSpectrogram(___) plots the mel spectrogram on a surface in the current figure.

Examples


Calculate Mel Spectrogram


Use the default settings to calculate the mel spectrogram for an entire audio file. Print the number of bandpass filters in the filter bank and the number of frames in the mel spectrogram.

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
S = melSpectrogram(audioIn,fs);
[numBands,numFrames] = size(S);
fprintf("Number of bandpass filters in filterbank: %d\n",numBands)

Number of bandpass filters in filterbank: 32

fprintf("Number of frames in spectrogram: %d\n",numFrames)

Number of frames in spectrogram: 1551

Plot the mel spectrogram.

melSpectrogram(audioIn,fs)


Calculate Mel Spectrums of 2048-Point Windows


Calculate the mel spectrums of 2048-point periodic Hann windows with 1024-point overlap. Convert to the frequency domain using a 4096-point FFT. Pass the frequency-domain representation through 64 half-overlapped triangular bandpass filters that span the range 62.5 Hz to 8 kHz.

[audioIn,fs] = audioread('FunkyDrums-44p1-stereo-25secs.mp3');
S = melSpectrogram(audioIn,fs, ...
    'Window',hann(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3]);

Call melSpectrogram again, this time with no output arguments so that you can visualize the mel spectrogram. The input audio is a multichannel signal. If you call melSpectrogram with a multichannel input and with no output arguments, only the first channel is plotted.


Get Filter Bank Center Frequencies and Analysis Window Time Instants


melSpectrogram applies a frequency-domain filter bank to audio signals that are windowed in time. You can get the center frequencies of the filters and the time instants corresponding to the analysis windows as the second and third output arguments from melSpectrogram.

Get the mel spectrogram, filter bank center frequencies, and analysis window time instants of a multichannel audio signal. Use the center frequencies and time instants to plot the mel spectrogram for each channel.

[audioIn,fs] = audioread('AudioArray-16-16-4channels-20secs.wav');
[S,cF,t] = melSpectrogram(audioIn,fs);
S = 10*log10(S+eps); % Convert to dB for plotting
for i = 1:size(S,3)
    figure(i)
    surf(t,cF,S(:,:,i),'EdgeColor','none');
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    view([0,90])
    title(sprintf('Channel %d',i))
    axis([t(1) t(end) cF(1) cF(end)])
end


Input Arguments


audioIn — Audio input
column vector | matrix

Audio input, specified as a column vector or matrix. If specified as a matrix, the function treats columns as independent audio channels.

Data Types: single | double

fs — Input sample rate (Hz)
positive scalar

Input sample rate in Hz, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: FFTLength=1024
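As a brief illustration of the two calling conventions, the sketch below passes the same options both ways. The synthetic test tone and the chosen option values are assumptions made for this example, not part of the original page.

```matlab
% Synthetic 1-second, 440 Hz tone (illustrative input, assumed for this sketch)
fs = 16000;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*440*t);

% R2021a and later: Name=Value syntax
S1 = melSpectrogram(audioIn,fs,FFTLength=1024,NumBands=40);

% Before R2021a: quoted names, with commas separating each name and value
S2 = melSpectrogram(audioIn,fs,'FFTLength',1024,'NumBands',40);

% Both calls request the same computation
sameResult = isequal(S1,S2);
```

The quoted form also works in newer releases, so it is the safer choice for code that must run on older installations.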

Window — Window applied in time domain
hamming(round(fs*0.03),'periodic') (default) | vector

Window applied in time domain, specified as a real vector. The number of elements in the vector must be in the range [1,size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.

Data Types: single | double

OverlapLength — Analysis window overlap length (samples)
round(0.02*fs) (default) | integer in the range [0, (numel(Window) - 1)]

Analysis window overlap length in samples, specified as an integer in the range [0, (numel(Window) - 1)].

Data Types: single | double

FFTLength — Number of DFT points
numel(Window) (default) | positive integer

Number of points used to calculate the DFT, specified as a positive integer greater than or equal to the length of Window. If unspecified, FFTLength defaults to the length of Window.

Data Types: single | double
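The following sketch compares the default FFTLength with a longer one. It assumes a placeholder noise signal and the usual DFT behavior, in which a frame shorter than FFTLength is zero-padded. A longer FFT gives finer bin spacing, but the output size is unchanged because it depends on the number of bands and frames, not on FFTLength.

```matlab
fs = 16000;
audioIn = randn(fs,1);                           % placeholder noise signal (assumed)
winLength = round(0.03*fs);                      % default window length: 480 samples

S1 = melSpectrogram(audioIn,fs);                 % FFTLength defaults to numel(Window)
S2 = melSpectrogram(audioIn,fs,FFTLength=1024);  % longer DFT for each frame

binSpacing = fs./[winLength 1024];               % DFT bin spacing in Hz for each case
sameSize = isequal(size(S1),size(S2));           % band and frame counts do not change
```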

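NumBands — Number of mel bandpass filters
32 (default) | positive integer

Number of mel bandpass filters, specified as a positive integer.

Data Types: single | double

FrequencyRange — Frequency range over which to compute mel spectrogram (Hz)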
[0 fs/2] (default) | two-element row vector

Frequency range over which to compute the mel spectrogram in Hz, specified as a two-element row vector of monotonically increasing values in the range [0, fs/2].

Data Types: single | double
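A minimal sketch of restricting the analysis band, assuming a placeholder noise signal and an arbitrary range of 100 Hz to 4 kHz. Because FrequencyRange sets the band edges of the first and last filters, the returned center frequencies are expected to fall inside the requested range.

```matlab
fs = 16000;
audioIn = randn(2*fs,1);                     % 2 s of placeholder noise (assumed)

% Limit the filter bank to 100 Hz - 4 kHz with 20 bands
[S,F] = melSpectrogram(audioIn,fs, ...
    FrequencyRange=[100 4000], ...
    NumBands=20);

% Center frequencies should lie between the requested band edges
withinRange = all(F > 100 & F < 4000);
```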

SpectrumType — Type of mel spectrogram
"power" (default) | "magnitude"

Type of mel spectrogram, specified as "power" or "magnitude".

Data Types: char | string

WindowNormalization — Apply window normalization
true (default) | false

Apply window normalization, specified as true or false. When WindowNormalization is set to true, the power (or magnitude) in the mel spectrogram is normalized to remove the power (or magnitude) of the time-domain window.

Data Types: logical

FilterBankNormalization — Type of filter bank normalization
"bandwidth" (default) | "area" | "none"

Type of filter bank normalization, specified as "bandwidth", "area", or "none".

Data Types: char | string

MelStyle — Mel style
"oshaughnessy" (default) | "slaney"

Mel style, specified as "oshaughnessy" or "slaney".

Data Types: char | string

ApplyLog — Apply logarithm
false (default) | true

Apply base 10 logarithm to the returned mel spectrogram, specified as true or false.

Data Types: logical
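The sketch below contrasts ApplyLog with taking the logarithm manually, which is the only option in releases before R2024a. The input is placeholder noise. This page does not state whether ApplyLog adds an offset before the logarithm, so the two results are not claimed to be numerically identical.

```matlab
fs = 16000;
audioIn = randn(fs,1);                            % placeholder noise signal (assumed)

% R2024a and later: let the function apply the base 10 logarithm
Slog = melSpectrogram(audioIn,fs,ApplyLog=true);

% Earlier releases: apply the logarithm to the returned spectrogram
Smanual = log10(melSpectrogram(audioIn,fs));

% Either way, the band and frame layout is the same
sameLayout = isequal(size(Slog),size(Smanual));
```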

Output Arguments


S — Mel spectrogram
column vector | matrix | 3-D array

Mel spectrogram, returned as a column vector, matrix, or 3-D array. The dimensions of S are L-by-M-by-N, where:

  • L is the number of frequency bins in each mel spectrum. NumBands and fs determine L.

  • M is the number of frames the audio signal is partitioned into. size(audioIn,1), the length of Window, and OverlapLength determine M.

  • N is the number of channels such that N = size(audioIn,2).

Trailing singleton dimensions are removed from the output S.

Data Types: single | double
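To make the L-by-M-by-N layout concrete, the sketch below predicts the output size for a two-channel input with default settings. The frame-count expression assumes that only complete windows are analyzed (no padding of the final partial frame); the page does not spell this out, so treat it as an assumption.

```matlab
fs = 16000;
audioIn = randn(fs,2);                         % 1 s, two channels (placeholder data)

winLength = round(0.03*fs);                    % default window: 480 samples
overlapLength = round(0.02*fs);                % default overlap: 320 samples
hopLength = winLength - overlapLength;         % 160 samples between frame starts

% Assumed frame count: complete windows only
numFrames = floor((size(audioIn,1) - overlapLength)/hopLength);

% L = default NumBands (32), M = numFrames, N = number of channels
expectedSize = [32 numFrames size(audioIn,2)];

S = melSpectrogram(audioIn,fs);
actualSize = size(S);                          % compare against expectedSize
```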

F — Center frequencies of mel bandpass filters (Hz)
row vector

Center frequencies of mel bandpass filters in Hz, returned as a row vector with length size(S,1).

Data Types: single | double

T — Location of each window of audio (s)
row vector

Location of each analysis window of audio in seconds, returned as a row vector of length size(S,2). The location corresponds to the center of each window.

Data Types: single | double

Algorithms


The melSpectrogram function follows the general algorithm to compute a mel spectrogram as described in [1].


In this algorithm, the audio input is first buffered into frames of numel(Window) number of samples. The frames are overlapped by OverlapLength number of samples. The specified Window is applied to each frame, and then the frame is converted to frequency-domain representation with FFTLength number of points. The frequency-domain representation can be either magnitude or power, specified by SpectrumType. If WindowNormalization is set to true, the spectrum is normalized by the window. Each frame of the frequency-domain representation passes through a mel filter bank. The spectral values output from the mel filter bank are summed, and then the channels are concatenated so that each frame is transformed to a NumBands-element column vector.
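The first stages described above (buffering, windowing, and the DFT) can be sketched in base MATLAB as follows. This is an illustration under stated assumptions rather than the function's actual implementation: the input is placeholder noise, the periodic Hamming window is written out explicitly, only complete frames are kept, and window normalization and the mel filter bank are omitted.

```matlab
fs = 16000;                                    % assumed sample rate
x = randn(fs,1);                               % placeholder mono signal

winLength = round(0.03*fs);                    % 480 samples
overlapLength = round(0.02*fs);                % 320 samples
hopLength = winLength - overlapLength;         % 160 samples
fftLength = winLength;                         % default: numel(Window)

% Periodic Hamming window written out explicitly (no toolbox call)
n = (0:winLength-1)';
win = 0.54 - 0.46*cos(2*pi*n/winLength);

% Buffer the signal into overlapping frames, one frame per column
numFrames = floor((numel(x) - overlapLength)/hopLength);
idx = (1:winLength)' + (0:numFrames-1)*hopLength;
frames = x(idx);

% Window each frame, take the DFT, keep the one-sided power spectrum
X = fft(frames.*win,fftLength);
P = abs(X(1:floor(fftLength/2)+1,:)).^2;       % bins-by-frames
```

Each column of P is what the mel filter bank would then reduce to a NumBands-element vector.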

Filter Bank Design

The mel filter bank is designed as half-overlapped triangular filters equally spaced on the mel scale. NumBands controls the number of mel bandpass filters. FrequencyRange controls the band edges of the first and last filters in the mel filter bank. FilterBankNormalization specifies the type of normalization applied to the individual bands.


The mel scale can be in the O'Shaughnessy style, which follows [2], or the Slaney style, which follows [3].
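A rough sketch of such a filter bank in base MATLAB, using the widely cited O'Shaughnessy mapping mel = 2595*log10(1 + f/700). The variable names, the 512-point FFT, and the placeholder spectrum are assumptions for illustration. No normalization is applied, and the exact edge placement used by melSpectrogram may differ.

```matlab
fs = 16000;                                    % assumed sample rate
fftLength = 512;                               % assumed DFT length
numBands = 32;
freqRange = [0 fs/2];

% O'Shaughnessy-style mel mapping and its inverse
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);

% numBands+2 edges equally spaced in mel; interior points are the centers
edges = mel2hz(linspace(hz2mel(freqRange(1)),hz2mel(freqRange(2)),numBands+2));
centers = edges(2:end-1);

% Half-overlapped triangles sampled at the one-sided DFT bin frequencies
binFreqs = (0:fftLength/2)*fs/fftLength;
filterBank = zeros(numBands,numel(binFreqs));
for k = 1:numBands
    rising  = (binFreqs - edges(k))/(edges(k+1) - edges(k));
    falling = (edges(k+2) - binFreqs)/(edges(k+2) - edges(k+1));
    filterBank(k,:) = max(0,min(rising,falling));
end

% Apply the bank to a one-sided power spectrum (bins-by-frames)
P = rand(numel(binFreqs),10);                  % placeholder spectrum
S = filterBank*P;                              % numBands-by-frames
```

Each filter rises from the center of the previous band to its own center and falls to the center of the next band, which is what makes neighboring triangles half-overlapped.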

References

[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.

[2] O'Shaughnessy, Douglas. Speech Communication: Human and Machine. Reading, MA: Addison-Wesley Publishing Company, 1987.

[3] Slaney, Malcolm. "Auditory Toolbox: A MATLAB Toolbox for Auditory Modeling Work." Technical Report, Version 2, Interval Research Corporation, 1998.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

The melSpectrogram function supports optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see Generate SIMD Code from MATLAB Functions for Intel Platforms (MATLAB Coder).

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Version History

Introduced in R2019a

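R2024a: Apply logarithm to mel spectrogram

Set the ApplyLog name-value argument to true to apply a base 10 logarithm to the spectrogram.

R2023b: Support for Slaney-style mel scale

Set the MelStyle name-value argument to "slaney" to use the Slaney-style mel scale.

R2023a: Generate optimized C/C++ code for computing mel spectrogram

melSpectrogram supports optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.

R2020b: WindowLength will be removed in a future release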

See Also

spectrogram | mfcc | gtcc | mdct | audioFeatureExtractor

Topics

  • Train Speech Command Recognition Model Using Deep Learning

