var_pred
Conv1dVarPredictor
Bases: Module
The Conv1d variance predictor for FastSpeech2. This module is made up of:
1. (mandatory) The Conv1d part contains two or more Conv1d blocks which are composed of the components below
    1. (mandatory) a Conv1d layer
    2. (mandatory) a ReLU function
    3. (mandatory) a LayerNorm layer
    4. (mandatory) a Dropout layer
2. (mandatory) The Linear part contains one Linear block which is composed of the component below
    1. (mandatory) a Linear layer
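The layer ordering described above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the speechain implementation: the class names `ConvBlockSketch` and `VarPredictorSketch` are hypothetical, and the use of symmetric padding to keep the time axis unchanged is an assumption.

```python
import torch
from torch import nn


class ConvBlockSketch(nn.Module):
    """One Conv1d -> ReLU -> LayerNorm -> Dropout block (hypothetical name)."""

    def __init__(self, in_dim: int, out_dim: int, kernel: int = 3, dropout: float = 0.5):
        super().__init__()
        # Assumption: odd kernel with symmetric padding, so the length is kept at stride 1.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=kernel, padding=kernel // 2)
        self.norm = nn.LayerNorm(out_dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is (batch, time, in_dim); Conv1d expects (batch, channels, time).
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = torch.relu(x)
        x = self.norm(x)  # normalizes over the last (channel) axis
        return self.drop(x)


class VarPredictorSketch(nn.Module):
    """Stack of conv blocks followed by one Linear layer that emits a scalar per frame."""

    def __init__(self, feat_dim: int, conv_dims=(256, 256)):
        super().__init__()
        dims = [feat_dim, *conv_dims]
        self.blocks = nn.Sequential(
            *[ConvBlockSketch(i, o) for i, o in zip(dims[:-1], dims[1:])]
        )
        self.linear = nn.Linear(dims[-1], 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # (batch, time, feat_dim) -> (batch, time)
        return self.linear(self.blocks(feat)).squeeze(-1)


predictor = VarPredictorSketch(feat_dim=80).eval()
out = predictor(torch.randn(2, 17, 80))
```

Here `out` holds one predicted scalar per input frame; the real module may differ in padding, masking, and output shape.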
Reference
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech https://arxiv.org/pdf/2006.04558
Source code in speechain/module/prenet/var_pred.py
emb_pred_scalar(pred_scalar)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pred_scalar | Tensor | (batch, feat_maxlen, 1) or (batch, feat_maxlen). The predicted scalar vectors calculated in forward(). | required |
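One way to turn a per-frame scalar back into an embedding vector is a Conv1d layer with a single input channel. The sketch below assumes that approach; `ScalarEmbedderSketch` and `emb_dim` are hypothetical names, and the handling of the two accepted input shapes is an assumption based only on the shapes listed above.

```python
import torch
from torch import nn


class ScalarEmbedderSketch(nn.Module):
    """Embed per-frame scalars into vectors with a 1-channel Conv1d (illustrative only)."""

    def __init__(self, emb_dim: int, kernel: int = 1, dropout: float = 0.0):
        super().__init__()
        # Assumption: odd kernel with symmetric padding keeps the time axis unchanged.
        self.conv = nn.Conv1d(1, emb_dim, kernel_size=kernel, padding=kernel // 2)
        self.drop = nn.Dropout(dropout)

    def forward(self, pred_scalar: torch.Tensor) -> torch.Tensor:
        # Accept both (batch, time) and (batch, time, 1).
        if pred_scalar.dim() == 2:
            pred_scalar = pred_scalar.unsqueeze(-1)
        x = self.conv(pred_scalar.transpose(1, 2))  # (batch, emb_dim, time)
        return self.drop(x.transpose(1, 2))  # (batch, time, emb_dim)


embedder = ScalarEmbedderSketch(emb_dim=8).eval()
emb_2d = embedder(torch.randn(2, 5))
emb_3d = embedder(torch.randn(2, 5, 1))
```

Both calls produce a tensor of shape (batch, time, emb_dim), so downstream code does not need to care which input layout was used.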
forward(feat, feat_len)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feat | Tensor | (batch, feat_maxlen, feat_dim). The input feature tensors. | required |
feat_len | Tensor | (batch,). The length of each feature tensor. | required |
Returns: feat, feat_len: The embedded feature vectors with their lengths.
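Because `feat` is padded to `feat_maxlen` while `feat_len` records the true lengths, callers commonly build a boolean mask to ignore padded frames. The helper below is a hypothetical sketch of that common technique; the source does not state how the module itself uses `feat_len`.

```python
import torch


def valid_frame_mask(feat_len: torch.Tensor, max_len: int) -> torch.Tensor:
    """Return a (batch, max_len) bool mask that is True on real frames (hypothetical helper)."""
    positions = torch.arange(max_len, device=feat_len.device)
    # Broadcasting compares every position index against each sequence length.
    return positions.unsqueeze(0) < feat_len.unsqueeze(1)


feat = torch.ones(2, 4, 3)  # (batch, feat_maxlen, feat_dim)
feat_len = torch.tensor([4, 2])  # the second sequence has two padded frames
mask = valid_frame_mask(feat_len, feat.size(1))
masked = feat * mask.unsqueeze(-1)  # zero out the padded frames
```

The same mask can be applied to per-frame predictions before computing a loss, so that padding does not contribute.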
module_init(feat_dim=None, conv_dims=[256, 256], conv_kernel=3, conv_stride=1, use_gate=False, conv_dropout=0.5, use_conv_emb=True, conv_emb_kernel=1, conv_emb_dropout=0.0)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feat_dim | int | The dimension of input acoustic feature tensors. Used for calculating the in_features of the first Linear layer. | None |
conv_dims | int or List[int] | The values of out_channels of each Conv1d layer. If a list of integers is given, multiple Conv1d layers will be initialized. If an integer is given, there will be only one Conv1d layer. | [256, 256] |
conv_kernel | int | The value of kernel_size of all Conv1d layers. | 3 |
conv_stride | int | The value of stride of all Conv1d layers. | 1 |
conv_dropout | float or List[float] | The dropout rate p of the Dropout layer after each Conv1d layer. | 0.5 |
use_conv_emb | bool | Whether to embed the predicted scalar back to an embedding vector. This argument needs to be False for the duration predictor. | True |
conv_emb_kernel | int | The value of kernel_size for the conv1d embedding layer. Only effective when use_conv_emb is True. | 1 |
conv_emb_dropout | float | The dropout rate p of the Dropout layer after the conv1d embedding layer. Only effective when use_conv_emb is True. | 0.0 |
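Since `conv_dims` and `conv_dropout` each accept either a single value or a list, a caller or implementation has to expand them to one value per Conv1d layer. The following is a minimal sketch of one way to do that; `normalize_conv_args` is a hypothetical helper, and raising `ValueError` on a length mismatch is an assumption, since the source does not say how mismatches are handled.

```python
from typing import List, Tuple, Union


def normalize_conv_args(
    conv_dims: Union[int, List[int]] = (256, 256),
    conv_dropout: Union[float, List[float]] = 0.5,
) -> Tuple[List[int], List[float]]:
    """Expand int-or-list arguments to one entry per Conv1d layer (hypothetical helper)."""
    # A single integer means a single Conv1d layer.
    dims = list(conv_dims) if isinstance(conv_dims, (list, tuple)) else [conv_dims]

    if isinstance(conv_dropout, (list, tuple)):
        # Assumption: a per-layer list must match the number of layers exactly.
        if len(conv_dropout) != len(dims):
            raise ValueError(
                f"conv_dropout has {len(conv_dropout)} entries but there are {len(dims)} layers"
            )
        drops = list(conv_dropout)
    else:
        # A single rate is shared by every layer.
        drops = [conv_dropout] * len(dims)
    return dims, drops


single = normalize_conv_args(256, 0.5)
shared = normalize_conv_args([256, 256], 0.5)
per_layer = normalize_conv_args([256, 128], [0.5, 0.1])
```

Normalizing once up front keeps the layer-construction loop free of type checks.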
LayerNorm
Bases: LayerNorm
Layer normalization module. Borrowed from https://github.com/espnet/espnet/blob/master/espnet/nets/pytorch_backend/transformer/layer_norm.py
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nout | int | Output dim size. | required |
dim | int | Dimension to be normalized. | -1 |
|
Source code in speechain/module/prenet/var_pred.py
__init__(nout, dim=-1)
forward(x)
Apply layer normalization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor. | required |
Returns: torch.Tensor: Normalized tensor.
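A `dim` argument on top of `torch.nn.LayerNorm` can be supported by moving the target axis to the last position, normalizing, and moving it back. The sketch below follows that idea; `LayerNormSketch` is a hypothetical name and the `eps` value is an assumption, so treat it as an illustration of the technique, not the exact code in this module.

```python
import torch
from torch import nn


class LayerNormSketch(nn.LayerNorm):
    """LayerNorm over an arbitrary axis of size `nout` (illustrative only)."""

    def __init__(self, nout: int, dim: int = -1):
        super().__init__(nout, eps=1e-12)  # eps is an assumed value
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.dim == -1:
            return super().forward(x)
        # Swap the target axis to the end, normalize, then swap it back.
        return super().forward(x.transpose(self.dim, -1)).transpose(self.dim, -1)


x = torch.randn(2, 4, 6)  # (batch, channels, time)
y = LayerNormSketch(nout=4, dim=1)(x)  # normalize over the channel axis
```

With `dim=1`, each (batch, time) position is normalized across its 4 channels, which is convenient for tensors kept in the (batch, channels, time) layout that Conv1d uses.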