feat_util
Author: Heli Qi
Affiliation: NAIST
Date: 2022.12
convert_wav_to_logmel(wav, n_mels, hop_length, win_length, sr=16000, n_fft=None, preemphasis=None, pre_stft_norm=None, window='hann', center=True, mag_spec=False, fmin=0.0, fmax=None, clamp=1e-10, logging=True, log_base=10.0, htk=False, norm='slaney', delta_order=0, delta_N=2)
For details on the arguments and return values, refer to ${SPEECHAIN_ROOT}/speechain/module/frontend/speech2mel.py.
Source code in speechain/utilbox/feat_util.py
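The defaults clamp=1e-10, logging=True and log_base=10.0 suggest a floor-then-log compression step. The sketch below is a minimal illustration of that idea in plain Python, not the code in speech2mel.py: the helper name log_compress is hypothetical, and the exact order of operations is an assumption. Consult speech2mel.py for the authoritative behaviour.

```python
import math


def log_compress(energies, clamp=1e-10, log_base=10.0):
    """Hypothetical helper: floor each value at `clamp`, then take its logarithm.

    Assumption: a `log_base` of None means the natural logarithm is kept.
    """
    compressed = []
    for energy in energies:
        # The floor avoids log(0) on silent frames.
        floored = max(energy, clamp)
        value = math.log(floored)
        if log_base is not None:
            # Change of base: log_b(x) = ln(x) / ln(b).
            value /= math.log(log_base)
        compressed.append(value)
    return compressed


if __name__ == "__main__":
    print(log_compress([1.0, 100.0, 0.0]))
```

Under these assumptions, an energy of exactly zero is mapped to roughly -10 (the base-10 logarithm of the clamp value) instead of negative infinity.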
convert_wav_to_mfcc(wav, hop_length, win_length, num_ceps=None, n_mfcc=20, sr=16000, n_fft=None, n_mels=80, preemphasis=None, pre_stft_norm=None, window='hann', center=True, fmin=0.0, fmax=None, clamp=1e-10, logging=True, log_base=10.0, htk=False, norm='slaney', delta_order=0, delta_N=2)
For details on the arguments and return values, refer to convert_wav_to_logmel() in ${SPEECHAIN_ROOT}/speechain/utilbox/feat_util.py and to librosa.feature.mfcc.
Source code in speechain/utilbox/feat_util.py
convert_wav_to_pitch(wav, hop_length=256, sr=22050, f0min=80, f0max=400, continuous_f0=True, return_tensor=False)
Converts a waveform to a pitch (F0) contour using the dio and stonemask functions of pyworld.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wav | ndarray or Tensor | The waveform to be processed, of shape (n_sample, 1) or (n_sample,). | required |
hop_length | int or float | The value of 'hop_length' passed to pyworld.dio(). | 256 |
sr | int | The value of 'fs' passed to pyworld.dio(). | 22050 |
f0min | int | The value of 'f0min' passed to pyworld.dio(). | 80 |
f0max | int | The value of 'f0max' passed to pyworld.dio(). | 400 |
continuous_f0 | bool | Whether to make the calculated pitch values continuous over time. | True |
return_tensor | bool | Whether to return the pitch as a torch.Tensor. If False, an np.ndarray is returned. | False |
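pyworld.dio() expresses its frame shift as a frame_period in milliseconds, so a hop length given in samples has to be converted using the sampling rate. The sketch below shows that arithmetic for the default values; the helper name hop_to_frame_period_ms is hypothetical, and whether feat_util.py performs exactly this conversion is an assumption.

```python
def hop_to_frame_period_ms(hop_length, sr):
    """Hypothetical helper: convert a hop length in samples to milliseconds.

    Assumption: `hop_length` counts samples at sampling rate `sr` (in Hz).
    """
    return 1000.0 * hop_length / sr


if __name__ == "__main__":
    # Defaults of convert_wav_to_pitch: hop_length=256, sr=22050.
    print(round(hop_to_frame_period_ms(256, 22050), 2))
    # The same hop length at 16 kHz covers a longer time span.
    print(hop_to_frame_period_ms(256, 16000))
```

With the defaults, one hop spans about 11.61 ms; at sr=16000 the same 256 samples span 16 ms, so the hop length and the sampling rate should be changed together.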
Returns:
Name | Type | Description |
---|---|---|
f0 | ndarray or Tensor | (n_frame,) |
Source code in speechain/utilbox/feat_util.py
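Pitch trackers typically report 0 for unvoiced frames, and a continuous contour fills those gaps. The sketch below illustrates one common way to do this, linear interpolation between neighbouring voiced frames, in plain Python. It is an assumption about what continuous_f0=True means here, not a copy of the code in feat_util.py, and the helper name make_f0_continuous is hypothetical.

```python
def make_f0_continuous(f0):
    """Hypothetical helper: fill zero-valued (unvoiced) frames of an F0 contour.

    Assumptions: unvoiced frames are marked by 0, interior gaps are filled by
    linear interpolation, and leading or trailing gaps copy the nearest voiced value.
    """
    voiced = [i for i, value in enumerate(f0) if value > 0]
    if not voiced:
        # Fully unvoiced input: there is nothing to interpolate from.
        return list(f0)

    out = list(f0)
    first, last = voiced[0], voiced[-1]

    # Extend the first and last voiced values outwards.
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]

    # Interpolate linearly inside each gap between consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            out[i] = (1.0 - weight) * f0[left] + weight * f0[right]
    return out


if __name__ == "__main__":
    print(make_f0_continuous([0.0, 100.0, 0.0, 0.0, 130.0, 0.0]))
```

In this example the two unvoiced frames between 100 Hz and 130 Hz become approximately 110 Hz and 120 Hz, and the unvoiced frames at either end take the value of the nearest voiced frame.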
convert_wav_to_stft(wav, hop_length, win_length, sr=16000, n_fft=None, preemphasis=None, pre_stft_norm=None, window='hann', center=True, mag_spec=False, clamp=1e-10, logging=False, log_base=None)
For details on the arguments and return values, refer to ${SPEECHAIN_ROOT}/speechain/module/frontend/speech2linear.py.