# conv1d

Author: Heli Qi
Affiliation: NAIST
Date: 2022.09

## Conv1dPostnet

Bases: `Module`
The Conv1d postnet for TTS, usually placed after the Transformer TTS decoder. This postnet is made up of Conv1d blocks, each of which contains:

1. a Conv1d layer
2. a BatchNorm1d layer
3. an activation function
4. a Dropout layer
Reference:
[Neural Speech Synthesis with Transformer Network](https://ojs.aaai.org/index.php/AAAI/article/view/4642/4520)
Source code in speechain/module/postnet/conv1d.py
### forward(feat, feat_len)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feat` | `Tensor` | (batch, feat_maxlen, feat_dim) The input feature tensors. | *required* |
| `feat_len` | `Tensor` | (batch,) The length of each feature tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `feat, feat_len` | The embedded feature vectors with their lengths. |
Source code in speechain/module/postnet/conv1d.py
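To clarify the tensor plumbing documented above: `forward` receives features as (batch, feat_maxlen, feat_dim), but PyTorch's `Conv1d` expects (batch, channels, length), so the module has to transpose on the way in and out. The sketch below is a hypothetical minimal re-implementation of that behavior (`MiniConv1dPostnet` is an illustrative name, not the actual speechain code), assuming `'same'` padding so feature lengths are unchanged.

```python
import torch
from torch import nn


class MiniConv1dPostnet(nn.Module):
    """Hypothetical sketch of the documented Conv1d-block stack."""

    def __init__(self, feat_dim=80, conv_dims=(256, 0), kernel=5, dropout=0.5):
        super().__init__()
        blocks, in_ch = [], feat_dim
        for d in conv_dims:
            # documented rule: -1 -> previous layer's dim, 0 -> input feat_dim
            out_ch = in_ch if d == -1 else (feat_dim if d == 0 else d)
            blocks += [
                nn.Conv1d(in_ch, out_ch, kernel, stride=1, padding="same"),
                nn.BatchNorm1d(out_ch),
                nn.Tanh(),
                nn.Dropout(dropout),
            ]
            in_ch = out_ch
        self.net = nn.Sequential(*blocks)

    def forward(self, feat, feat_len):
        # Conv1d works on (batch, channels, length), so transpose in and out
        feat = self.net(feat.transpose(1, 2)).transpose(1, 2)
        # 'same' padding with stride 1 keeps the time axis intact,
        # so the lengths pass through unchanged
        return feat, feat_len


postnet = MiniConv1dPostnet(feat_dim=80, conv_dims=(256, 0))
x = torch.randn(4, 100, 80)
x_len = torch.tensor([100, 90, 80, 70])
y, y_len = postnet(x, x_len)
print(y.shape)  # torch.Size([4, 100, 80]): final dim 0 resolves to feat_dim
```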
### module_init(feat_dim=None, conv_dims=[512, 512, 512, 512, 0], conv_kernel=5, conv_stride=1, conv_padding_mode='same', conv_batchnorm=True, conv_activation='Tanh', conv_dropout=None, zero_centered=False)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feat_dim` | `int` | The dimension of the input acoustic feature tensors. Used to calculate the `in_channels` of the first Conv1d layer. | `None` |
| `conv_dims` | `int` or `List[int]` | The `out_channels` of each Conv1d layer. If a list of integers is given, multiple Conv1d layers are initialized; if a single integer is given, there is only one Conv1d layer. Special values: `-1` means the same size as the previous Conv1d layer's `out_channels`; `0` means the same size as the input `feat_dim`. | `[512, 512, 512, 512, 0]` |
| `conv_kernel` | `int` | The `kernel_size` of all Conv1d layers. | `5` |
| `conv_stride` | `int` | The `stride` of all Conv1d layers. | `1` |
| `conv_padding_mode` | `str` | The padding mode of the Conv1d layers. Must be one of `['valid', 'full', 'same', 'causal']`. | `'same'` |
| `conv_batchnorm` | `bool` | Whether a BatchNorm1d layer is added right after each Conv1d layer. | `True` |
| `conv_activation` | `str` | The type of activation function after each Conv1d layer. `None` means no activation function. | `'Tanh'` |
| `conv_dropout` | `float` or `List[float]` | The dropout rate `p` of the Dropout layer after each Conv1d layer. | `None` |
| `zero_centered` | `bool` | Whether the output of this module is centered at 0. If the specified activation function shifts the centroid of the output distribution (e.g. ReLU and LeakyReLU), the activation function is not attached to the final Conv1d layer when `zero_centered` is set to `True`. | `False` |
Source code in speechain/module/postnet/conv1d.py
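The `-1`/`0` placeholder rule for `conv_dims` is worth making concrete. The helper below is a hypothetical sketch (not part of the speechain API) of how the placeholders in the default `[512, 512, 512, 512, 0]` would resolve into actual `out_channels` values:

```python
def resolve_conv_dims(feat_dim, conv_dims):
    """Resolve the documented placeholders in conv_dims (illustrative sketch):
      -1 -> same size as the previous Conv1d layer's out_channels
       0 -> same size as the input feat_dim
    """
    resolved, prev = [], feat_dim
    for d in conv_dims:
        out = prev if d == -1 else (feat_dim if d == 0 else d)
        resolved.append(out)
        prev = out
    return resolved


# With an 80-dim mel spectrogram input, the default conv_dims map to:
print(resolve_conv_dims(80, [512, 512, 512, 512, 0]))
# [512, 512, 512, 512, 80]
```

The trailing `0` in the default is what makes the postnet a residual-style refiner: its last Conv1d layer projects back to the input feature dimension, so its output can be added to the decoder's coarse spectrogram prediction.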