scaled_dot_product_attention

ivy.scaled_dot_product_attention(q, k, v, scale, /, *, mask=None, out=None)

Apply scaled dot-product attention to the input queries, keys and values, with an optional mask.

Parameters:
  • q (Union[Array, NativeArray]) – The queries input array. The shape of the queries should be [batch_shape,num_queries,feat_dim], and the batch and feature dimensions should match those of the keys and values.

  • k (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and values.

  • v (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and keys.

  • scale (float) – The scale value, used to scale the query-key similarity scores before the softmax; commonly set to 1 / sqrt(feat_dim).

  • mask (Optional[Union[Array, NativeArray]]) – The mask input array, applied to the query-key similarity scores. The shape of the mask should be [batch_shape,num_queries,num_keys]. Default is None.

  • out (Optional[Array]) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to. Default is None.

Return type:

Array

Returns:

  • ret – The output following application of scaled dot-product attention: the weighted sum of the values, weighted by the attention scores. The shape of the output array is [batch_shape,num_queries,feat_dim].

Both the description and the type hints above assume an array input for simplicity, but this function is nestable, and therefore also accepts ivy.Container instances in place of any of the arguments.
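
For reference, the computation itself can be sketched in a few lines. The snippet below is a minimal illustration rather than Ivy's actual implementation: the helper name sdpa_sketch is hypothetical, and replacing masked-out (zero) positions with a large negative score is an assumption, consistent with the zero-mask examples below, where fully masked scores yield uniform attention weights.

>>> import ivy
>>> def sdpa_sketch(q, k, v, scale, mask=None):
...     # scaled query-key similarity scores; scale is commonly 1 / sqrt(feat_dim)
...     sim = ivy.einsum('...qd,...kd->...qk', q, k) * scale
...     if mask is not None:
...         # positions where the mask is zero receive a large negative score,
...         # so they get (near-)zero weight after the softmax
...         sim = ivy.where(mask != 0, sim, -ivy.ones_like(sim) * 1e9)
...     attn = ivy.softmax(sim, axis=-1)  # attention weights over the keys
...     return ivy.einsum('...qk,...kd->...qd', attn, v)  # weighted sum of the values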

Examples

With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1)
>>> print(result)
ivy.array([[[4.04,5.03],[4.3,5.3],[4.3,5.3]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1, mask=mask)
>>> print(result)
ivy.array([[[2.3, 3.23],[2.3, 3.23],[2.3, 3.23]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, 1, out=out)
>>> print(out)
ivy.array([[[4.04, 5.03],[4.3 , 5.3 ],[4.3 , 5.3 ]]])

With ivy.NativeArray input:

>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1, mask=mask)
>>> print(result)
ivy.array([[[2.3, 3.23],[2.3, 3.23],[2.3, 3.23]]])
>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, 1, out=out)
>>> print(out)
ivy.array([[[4.04, 5.03],[4.3 , 5.3 ],[4.3 , 5.3 ]]])

With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1)
>>> print(result)
{
    a: ivy.array([[[4.27, 5.4],[4.4, 5.6],[4.4, 5.6]]]),
    b: ivy.array([[[4.35, 5.54],[4.4, 5.6],[4.4, 5.6]]])
}
>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(a=ivy.array([[[1.0, 1.0, 1.0],[1.0, 1.0, 1.0],[1.0, 1.0, 1.0]]]),
...                      b=ivy.array([[[1.0, 1.0, 1.0],[1.0, 1.0, 1.0],[1.0, 1.0, 1.0]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.27, 5.4],
                   [4.4, 5.6],
                   [4.4, 5.6]]]),
    b: ivy.array([[[4.35, 5.54],
                   [4.4, 5.6],
                   [4.4, 5.6]]])
}

With a mix of ivy.Array and ivy.NativeArray inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1)
>>> print(result)
ivy.array([[[4.04, 5.03],[4.3 , 5.3 ],[4.3 , 5.3 ]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, 1, out=out)
>>> print(out)
ivy.array([[[4.04, 5.03],[4.3 , 5.3 ],[4.3 , 5.3 ]]])

With a mix of ivy.Array and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1)
>>> print(result)
{
    a: ivy.array([[[4.14, 5.13],
                   [4.3, 5.3],
                   [4.3, 5.3]]]),
    b: ivy.array([[[4.09, 5.08],
                   [4.3, 5.3],
                   [4.3, 5.3]]])
}

With a mix of ivy.Array, ivy.NativeArray and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3],[4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6],[4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, 1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[2.3, 3.23],
                   [2.3, 3.23],
                   [2.3, 3.23]]]),
    b: ivy.array([[[2.3, 3.23],
                   [2.3, 3.23],
                   [2.3, 3.23]]])
}

Array.scaled_dot_product_attention(self, k, v, scale, /, *, mask=None, out=None)

ivy.Array instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.

Parameters:
  • self (Array) – The queries input array. The shape of the queries should be [batch_shape,num_queries,feat_dim], and the batch and feature dimensions should match those of the keys and values.

  • k (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and values.

  • v (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and keys.

  • scale (float) – The scale value, used to scale the query-key similarity scores before the softmax; commonly set to 1 / sqrt(feat_dim).

  • mask (Optional[Union[Array, NativeArray]]) – The mask input array, applied to the query-key similarity scores. The shape of the mask should be [batch_shape,num_queries,num_keys]. Default is None.

  • out (Optional[Array]) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to. Default is None.

Return type:

Array

Returns:

ret – The output following application of scaled dot-product attention: the weighted sum of the values, weighted by the attention scores. The shape of the output array is [batch_shape,num_queries,feat_dim].

Examples

With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = q.scaled_dot_product_attention(k, v, 1, mask=mask)
>>> print(result)
ivy.array([[[2.3, 3.23],[2.3, 3.23],[2.3, 3.23]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> q.scaled_dot_product_attention(k, v, 1, mask=mask, out=out)
>>> print(out)
ivy.array([[[2.3, 3.23],[2.3, 3.23],[2.3, 3.23]]])
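
Since this method simply wraps ivy.scaled_dot_product_attention, the instance-method and functional forms are interchangeable. A minimal sketch (outputs omitted; both calls return the same array):

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> res_method = q.scaled_dot_product_attention(k, v, 1)
>>> res_fn = ivy.scaled_dot_product_attention(q, k, v, 1)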
Container.scaled_dot_product_attention(self, k, v, scale, /, *, mask=None, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)

ivy.Container instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.

Parameters:
  • self (Container) – The queries input container. The shape of each queries leaf should be [batch_shape,num_queries,feat_dim], and the batch and feature dimensions should match those of the keys and values.

  • k (Union[Array, NativeArray, Container]) – The keys input array or container. The shape of each keys leaf should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and values.

  • v (Union[Array, NativeArray, Container]) – The values input array or container. The shape of each values leaf should be [batch_shape,num_keys,feat_dim], and the batch and feature dimensions should match those of the queries and keys.

  • scale (float) – The scale value, used to scale the query-key similarity scores before the softmax; commonly set to 1 / sqrt(feat_dim).

  • mask (Optional[Union[Array, NativeArray, Container]]) – The mask input array or container, applied to the query-key similarity scores. The shape of each mask leaf should be [batch_shape,num_queries,num_keys]. Default is None.

  • key_chains (Optional[Union[List[str], Dict[str, str]]]) – The key-chains to apply or not apply the method to. Default is None.

  • to_apply (bool) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.

  • prune_unapplied (bool) – Whether to prune key_chains for which the function was not applied. Default is False.

  • map_sequences (bool) – Whether to also map the method to sequences (lists, tuples). Default is False.

  • out (Optional[Container]) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to. Default is None.

Return type:

Container

Returns:

ret – The output container following application of scaled dot-product attention: the weighted sum of the values, weighted by the attention scores. The shape of each output array is [batch_shape,num_queries,feat_dim].

Examples

With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3],[4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.],[4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]]))
>>> mask = ivy.Container(a=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]),
...                      b=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]))
>>> result = q.scaled_dot_product_attention(k, v, 1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.27, 5.4],
                   [4.4, 5.6],
                   [4.4, 5.6]]]),
    b: ivy.array([[[4.35, 5.54],
                   [4.4, 5.6],
                   [4.4, 5.6]]])
}
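
The key_chains, to_apply, prune_unapplied and map_sequences arguments behave here as on other Container methods. As an illustrative sketch (reusing q, k, v and mask from the example above; output omitted), key_chains can restrict the computation to the a leaves, and prune_unapplied drops the unapplied b leaves from the result:

>>> result = q.scaled_dot_product_attention(k, v, 1, mask=mask,
...                                         key_chains=['a'],
...                                         prune_unapplied=True)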