scaled_dot_product_attention#

ivy.scaled_dot_product_attention(query, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#

Apply scaled dot-product attention to the query, key and value inputs, optionally using a mask.

Parameters:
  • query (Union[Array, NativeArray]) – The queries input array, of shape [batch_shape,num_queries,feat_dim]. The queries should have the same batch shape and feature dimension as the keys and values.

  • key (Union[Array, NativeArray]) – The keys input array, of shape [batch_shape,num_keys,feat_dim]. The keys should have the same batch shape and feature dimension as the queries and values.

  • value (Union[Array, NativeArray]) – The values input array, of shape [batch_shape,num_keys,feat_dim]. The values should have the same batch shape and feature dimension as the queries and keys.

  • scale (Optional[float], default: None) – A float used to scale the query-key similarity scores before the softmax.

  • mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores, of shape [batch_shape,num_queries,num_keys]. Default is None.

  • dropout_p (Optional[float], default: 0.0) – The dropout probability. If greater than 0.0, dropout is applied.

  • is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.

  • training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.

  • out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

  • ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and values. The shape of the output array is [batch_shape,num_queries,feat_dim].

Both the description and the type hints above assume an array input for simplicity, but this function is nestable, and therefore also accepts ivy.Container instances in place of any of the arguments.
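
Conceptually, the result is softmax(query @ key^T * scale) @ value, with any mask or causal structure applied to the scores before the softmax. The sketch below reproduces the unmasked computation with generic ivy ops; it only illustrates the math and is not necessarily how each backend implements the function:

>>> import ivy
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> scale = 1.0
>>> scores = ivy.matmul(q, ivy.matrix_transpose(k)) * scale  # [batch_shape,num_queries,num_keys]
>>> weights = ivy.softmax(scores, axis=-1)  # one attention distribution per query
>>> out = ivy.matmul(weights, v)  # weighted sum of the values, [batch_shape,num_queries,feat_dim]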

Examples

With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q,
...                                           k,
...                                           v,
...                                           scale=1,
...                                           dropout_p=0.1,
...                                           is_causal=True,
...                                           training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
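
Here is_causal=True lets each query attend only to keys at the same or earlier positions. As a sketch, and assuming the mask convention used in these examples (nonzero keeps a query-key pair, zero masks it out), the same causal structure can be expressed with an explicit lower-triangular mask:

>>> causal_mask = ivy.tril(ivy.ones((1, 3, 3)))  # 1 where key index <= query index
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=causal_mask)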

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q,
...                                  k,
...                                  v,
...                                  scale=1,
...                                  dropout_p=0.1,
...                                  is_causal=True,
...                                  training=True,
...                                  out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])

>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q,
...                                  k,
...                                  v,
...                                  scale=1,
...                                  dropout_p=0.1,
...                                  is_causal=True,
...                                  training=True,
...                                  out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q,
...                                           k,
...                                           v,
...                                           scale=1,
...                                           dropout_p=0.1,
...                                           is_causal=True,
...                                           training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}
>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(
...     a=ivy.array([[[1.0, 1.0, 1.0],[1.0, 1.0, 1.0],[1.0, 1.0, 1.0]]]),
...     b=ivy.array([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
... )
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}

With a mix of ivy.Array and ivy.NativeArray inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q,
...                                            k,
...                                            v,
...                                            scale=1,
...                                            dropout_p=0.1,
...                                            is_causal=True,
...                                            training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, out=out)
>>> print(out)
ivy.array([[[4.03946018, 5.0280633 ],
            [4.29981947, 5.29981089],
            [4.30000019, 5.30000019]]])

With a mix of ivy.Array and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, is_causal=True)
>>> print(result)
{
    a: ivy.array([[[0.40000001, 1.29999995],
                   [2.06345534, 2.9634552],
                   [4.30000019, 5.30000019]]]),
    b: ivy.array([[[0.40000001, 1.29999995],
                   [2.19336844, 3.09336829],
                   [4.30000019, 5.30000019]]])
}

With a mix of ivy.Array and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3],[4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6],[4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q,
...                                           k,
...                                           v,
...                                           scale=1,
...                                           mask=mask,
...                                           dropout_p=0.1,
...                                           training=True)
>>> print(result)
{
    a: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]]),
    b: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]])
}

Array.scaled_dot_product_attention(self, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#

ivy.Array instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.
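
Because the method wraps the function, calling it on the queries array is interchangeable with passing that array as the first argument of ivy.scaled_dot_product_attention. A minimal sketch:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> a = q.scaled_dot_product_attention(k, v, scale=1)
>>> b = ivy.scaled_dot_product_attention(q, k, v, scale=1)  # same result as a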

Parameters:
  • self (Array) – The queries input array, of shape [batch_shape,num_queries,feat_dim]. The queries should have the same batch shape and feature dimension as the keys and values.

  • key (Union[Array, NativeArray]) – The keys input array, of shape [batch_shape,num_keys,feat_dim]. The keys should have the same batch shape and feature dimension as the queries and values.

  • value (Union[Array, NativeArray]) – The values input array, of shape [batch_shape,num_keys,feat_dim]. The values should have the same batch shape and feature dimension as the queries and keys.

  • scale (Optional[float], default: None) – A float used to scale the query-key similarity scores before the softmax.

  • mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores, of shape [batch_shape,num_queries,num_keys]. Default is None.

  • dropout_p (Optional[float], default: 0.0) – The dropout probability. If greater than 0.0, dropout is applied.

  • is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.

  • training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.

  • out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and values. The shape of the output array is [batch_shape,num_queries,feat_dim].

Examples

With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = q.scaled_dot_product_attention(k, v, scale=1, dropout_p=0.1,
...                                         is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> q.scaled_dot_product_attention(k, v, scale=1, dropout_p=0.1,
...                                is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
Container.scaled_dot_product_attention(self, key, value, /, *, scale, mask=None, dropout_p=0.0, is_causal=False, training=False, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)[source]#

ivy.Container instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.

Parameters:
  • self (Container) – The queries input container. The shape of each leaf should be [batch_shape,num_queries,feat_dim], and each leaf should have the same batch shape and feature dimension as the corresponding keys and values.

  • key (Union[Array, NativeArray, Container]) – The keys input array or container. The shape of each leaf should be [batch_shape,num_keys,feat_dim], with the same batch shape and feature dimension as the queries and values.

  • value (Union[Array, NativeArray, Container]) – The values input array or container. The shape of each leaf should be [batch_shape,num_keys,feat_dim], with the same batch shape and feature dimension as the queries and keys.

  • scale (Union[float, Container]) – A float used to scale the query-key similarity scores before the softmax.

  • mask (Optional[Union[Array, NativeArray, Container]], default: None) – The mask to apply to the query-key scores. The shape of each leaf should be [batch_shape,num_queries,num_keys]. Default is None.

  • dropout_p (Optional[float], default: 0.0) – The dropout probability. If greater than 0.0, dropout is applied.

  • is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.

  • training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.

  • key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None. A short usage sketch is given just before the Examples section below.

  • to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.

  • prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.

  • map_sequences (Union[bool, Container], default: False) – Whether to also map the method to sequences (lists, tuples). Default is False.

  • out (Optional[Container], default: None) – Optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The output container following application of scaled dot-product attention. Each leaf is the weighted sum produced by the attention scores and values, with shape [batch_shape,num_queries,feat_dim].
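
The key_chains, to_apply, prune_unapplied and map_sequences arguments follow the usual ivy.Container mapping behaviour. As a hedged sketch of that behaviour (not output reproduced from a run), restricting the method to chain 'a' and pruning unapplied chains should leave a container holding only the 'a' result:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = q.scaled_dot_product_attention(k, v, scale=1,
...                                         key_chains=['a'],
...                                         prune_unapplied=True)
>>> # result should now contain only the 'a' leaf; 'b' is pruned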

Examples

With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = q.scaled_dot_product_attention(k, v, scale=1, dropout_p=0.1,
...                                         is_causal=True, training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}
>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(
...     a=ivy.array([[[1.0, 1.0, 1.0],
...                   [1.0, 1.0, 1.0],
...                   [1.0, 1.0, 1.0]]]),
...     b=ivy.array([[[1.0, 1.0, 1.0],
...                   [1.0, 1.0, 1.0],
...                   [1.0, 1.0, 1.0]]])
... )
>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}