attention
Origin: Sashi Novitasari
Modification: Heli Qi
Affiliation: NAIST
Date: 2022.07
MultiHeadedAttention
Bases: Module
A Multi-Head Attention layer has:

- Query linear layer
- Key linear layer
- Value linear layer
- Softmax layer
- Attention Dropout layer
- Output linear layer
Implementation modified from OpenNMT-py. https://github.com/OpenNMT/OpenNMT-py
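The speechain source is not reproduced on this page, so the following is only a minimal, self-contained PyTorch sketch of a layer built from the same six components (query/key/value projections, softmax, attention dropout, output projection). The class and attribute names (`SketchMultiHeadedAttention`, `head_size`, etc.) are illustrative and not taken from the speechain implementation; the sketch scales scores by the conventional square root of the per-head dimension and does not model the `scale_dp_by_head` option.

```python
import torch
import torch.nn as nn


class SketchMultiHeadedAttention(nn.Module):
    """Illustrative multi-head attention; not the speechain implementation."""

    def __init__(self, num_heads: int, d_model: int, dropout: float = 0.1):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.head_size = d_model // num_heads
        self.num_heads = num_heads

        # query / key / value linear layers
        self.k_layer = nn.Linear(d_model, d_model)
        self.v_layer = nn.Linear(d_model, d_model)
        self.q_layer = nn.Linear(d_model, d_model)

        # softmax, attention dropout, and output linear layer
        self.softmax = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(dropout)
        self.output_layer = nn.Linear(d_model, d_model)

    def forward(self, k, v, q, mask=None):
        batch_size = k.size(0)

        # project and split into heads: [B, M, D] -> [B, H, M, D/H]
        k = self.k_layer(k).view(batch_size, -1, self.num_heads, self.head_size).transpose(1, 2)
        v = self.v_layer(v).view(batch_size, -1, self.num_heads, self.head_size).transpose(1, 2)
        q = self.q_layer(q).view(batch_size, -1, self.num_heads, self.head_size).transpose(1, 2)

        # scaled dot-product attention scores: [B, H, M_q, M_k]
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.head_size ** 0.5
        if mask is not None:
            # mask is assumed boolean, [B, 1, M_k], True for positions to keep;
            # unsqueeze(1) lets it broadcast over the head dimension
            scores = scores.masked_fill(~mask.unsqueeze(1), float("-inf"))

        attention = self.dropout(self.softmax(scores))

        # weighted sum of values, then merge heads back to [B, M, D]
        context = torch.matmul(attention, v).transpose(1, 2).contiguous()
        context = context.view(batch_size, -1, self.num_heads * self.head_size)
        return self.output_layer(context)
```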
Source code in speechain/module/transformer/attention.py
forward(k, v, q, mask=None)
Computes multi-headed attention.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `k` | `Tensor` | keys [B, M, D] with M being the sentence length | *required* |
| `v` | `Tensor` | values [B, M, D] | *required* |
| `q` | `Tensor` | query [B, M, D] | *required* |
| `mask` | `Tensor` | optional mask [B, 1, M] | `None` |
Returns:
Source code in speechain/module/transformer/attention.py
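As a hedged illustration of the tensor shapes in the table above (B = batch size, M = sentence length, D = model size), the snippet below builds inputs and a padding mask; whether speechain marks padded positions with `True` or `False` is an assumption here, and the commented call refers to the sketch class above, not to the speechain class itself.

```python
import torch

# Shapes taken from the parameter table: B = batch, M = sentence length, D = d_model.
B, M, D = 2, 7, 512
k = torch.randn(B, M, D)
v = torch.randn(B, M, D)
q = torch.randn(B, M, D)

# Padding mask of shape [B, 1, M]: True for real positions, False for padding
# (the exact convention used by speechain is an assumption).
mask = torch.ones(B, 1, M, dtype=torch.bool)
mask[1, 0, 5:] = False  # pretend the second sequence has only 5 real positions

# With the sketch layer defined earlier, a call would look like:
# out = attn(k, v, q, mask=mask)   # out: [B, M, D]
```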
module_init(num_heads, d_model, dropout=0.1, scale_dp_by_head=False)
Create a multi-headed attention layer.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `num_heads` | `int` | The number of heads | *required* |
| `d_model` | `int` | Model size (must be divisible by num_heads) | *required* |
| `dropout` | `float` | The dropout rate of the Dropout layer after the softmax operation | `0.1` |
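For orientation, the divisibility constraint and the resulting per-head size can be checked as below; the variable names are illustrative, not attributes of the speechain class.

```python
num_heads, d_model, dropout = 8, 512, 0.1

# d_model must be divisible by num_heads so every head gets an equal slice.
assert d_model % num_heads == 0
head_size = d_model // num_heads  # 64 dimensions per head in this example
```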