Layers

class ivy.data_classes.array.layers._ArrayWithLayers

Bases: ABC

_abc_impl = <_abc._abc_data object>

conv1d(filters, strides, padding, /, *, data_format='NWC', filter_format='channel_last', x_dilations=1, dilations=1, bias=None, out=None)

ivy.Array instance method variant of ivy.conv1d. This method simply wraps the function, and so the docstring for ivy.conv1d also applies to this method with minimal changes.

Parameters:

self (Array) – Input image [batch_size,w,d_in] or [batch_size,d_in,w].
filters (Union[Array, NativeArray]) – Convolution filters [fw,d_in,d_out].
strides (Union[int, Tuple[int]]) – The stride of the sliding window for each dimension of input.
padding (str) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
data_format (str, default: 'NWC') – “NWC” or “NCW”. Defaults to “NWC”.
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. Defaults to “channel_last”.
x_dilations (Union[int, Tuple[int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
dilations (Union[int, Tuple[int]], default: 1) – The dilation factor for each dimension of the filter. (Default value = 1)
bias (Optional[Array], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the convolution operation.

Examples

>>> x = ivy.array([[[1., 2.], [3., 4.], [6., 7.], [9., 11.]]])  # NWC
>>> filters = ivy.array([[[0., 1.], [1., 1.]]])  # WIO (I == C)
>>> result = x.conv1d(filters, (1,), 'VALID')
>>> print(result)
ivy.array([[[ 2.,  3.],
            [ 4.,  7.],
            [ 7., 13.],
            [11., 20.]]])
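
The result above can be reproduced without Ivy. The following is a minimal pure-Python sketch, assuming NWC input, WIO filters, 'VALID' padding and no dilation. The name conv1d_valid is a hypothetical helper for illustration, not part of Ivy, and the loop is not how Ivy computes the result.

```python
def conv1d_valid(x, filters, stride=1):
    """Naive 1D convolution sketch (hypothetical helper, not the Ivy implementation).

    x:       nested list [batch][w][d_in]   (NWC)
    filters: nested list [fw][d_in][d_out]  (WIO)
    Assumes 'VALID' padding and no dilation.
    """
    fw, d_in, d_out = len(filters), len(filters[0]), len(filters[0][0])
    result = []
    for sample in x:
        w = len(sample)
        out_w = (w - fw) // stride + 1  # window positions that fit without padding
        rows = []
        for pos in range(out_w):
            start = pos * stride
            row = [
                sum(
                    sample[start + k][i] * filters[k][i][o]
                    for k in range(fw)
                    for i in range(d_in)
                )
                for o in range(d_out)
            ]
            rows.append(row)
        result.append(rows)
    return result


x = [[[1., 2.], [3., 4.], [6., 7.], [9., 11.]]]
filters = [[[0., 1.], [1., 1.]]]
print(conv1d_valid(x, filters))
# [[[2.0, 3.0], [4.0, 7.0], [7.0, 13.0], [11.0, 20.0]]]
```

Because the filter width is 1, each output position mixes only the two channels of the matching input position.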

conv1d_transpose(filters, strides, padding, /, *, output_shape=None, filter_format='channel_last', data_format='NWC', dilations=1, bias=None, out=None)

ivy.Array instance method variant of ivy.conv1d_transpose. This method simply wraps the function, and so the docstring for ivy.conv1d_transpose also applies to this method with minimal changes.

Parameters:

self (Array) – Input image [batch_size,w,d_in] or [batch_size,d_in,w].
filters (Union[Array, NativeArray]) – Convolution filters [fw,d_out,d_in].
strides (Union[int, Tuple[int]]) – The stride of the sliding window for each dimension of input.
padding (str) – Either the string ‘SAME’ (padding with zeros evenly), the string ‘VALID’ (no padding), or a sequence of n (low, high) integer pairs that give the padding to apply before and after each spatial dimension.
output_shape (Optional[Union[Shape, NativeShape]], default: None) – Shape of the output (Default value = None)
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. “channel_first” corresponds to the “IOW” filter layout, while “channel_last” corresponds to “WOI”.
data_format (str, default: 'NWC') – The ordering of the dimensions in the input, one of “NWC” or “NCW”. “NWC” corresponds to input with shape (batch_size, width, channels), while “NCW” corresponds to input with shape (batch_size, channels, width).
dilations (Union[int, Tuple[int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
bias (Optional[Array], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the transpose convolution operation.

Examples

>>> x = ivy.array([[[1., 2.], [3., 4.], [6., 7.], [9., 11.]]])  # NWC
>>> filters = ivy.array([[[0., 1.], [1., 1.]]])  # WOI (I == C)
>>> result = x.conv1d_transpose(filters, (1,), 'VALID')
>>> print(result)
ivy.array([[[ 2.,  3.],
            [ 4.,  7.],
            [ 7., 13.],
            [11., 20.]]])

conv2d(filters, strides, padding, /, *, data_format='NHWC', filter_format='channel_last', x_dilations=1, dilations=1, bias=None, out=None)

ivy.Array instance method variant of ivy.conv2d. This method simply wraps the function, and so the docstring for ivy.conv2d also applies to this method with minimal changes.

Parameters:

self (Array) – Input image [batch_size,h,w,d_in] or [batch_size,d_in,h,w].
filters (Union[Array, NativeArray]) – Convolution filters [fh,fw,d_in,d_out].
strides (Union[int, Tuple[int, int]]) – The stride of the sliding window for each dimension of input.
padding (str) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
data_format (str, default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.
dilations (Union[int, Tuple[int, int]], default: 1) – The dilation factor for each dimension of the filter. (Default value = 1)
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. Defaults to “channel_last”.
x_dilations (Union[int, Tuple[int, int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
bias (Optional[Container], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the convolution operation.

Examples

>>> x = ivy.array([[[[1.], [2.], [3.]],
...                 [[1.], [2.], [3.]],
...                 [[1.], [2.], [3.]]]])  # NHWC
>>> filters = ivy.array([[[[0.]], [[1.]], [[0.]]],
...                      [[[0.]], [[1.]], [[0.]]],
...                      [[[0.]], [[1.]], [[0.]]]])  # HWIO
>>> result = x.conv2d(filters, 1, 'SAME', data_format='NHWC',
...                   dilations=1)
>>> print(result)
ivy.array([[
          [[2.],[4.],[6.]],
          [[3.],[6.],[9.]],
          [[2.],[4.],[6.]]
          ]])
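
To see where the smaller values on the top and bottom rows come from, the following is a minimal pure-Python sketch of the same computation. It assumes one input and one output channel and the TensorFlow-style 'SAME' rule (output size is ceil(size / stride), with zero padding split as evenly as possible). The helper names same_padding and conv2d_same_single_channel are hypothetical, not part of Ivy.

```python
import math


def same_padding(size, kernel, stride=1):
    """Return (low, high) zero padding for one spatial dimension.

    ASSUMPTION: TensorFlow-style 'SAME', i.e. output size is ceil(size / stride).
    """
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    return total // 2, total - total // 2


def conv2d_same_single_channel(image, kernel, stride=1):
    """image: [h][w], kernel: [fh][fw]; one input and one output channel."""
    h, w = len(image), len(image[0])
    fh, fw = len(kernel), len(kernel[0])
    top, _ = same_padding(h, fh, stride)
    left, _ = same_padding(w, fw, stride)
    out_h, out_w = math.ceil(h / stride), math.ceil(w / stride)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(fh):
                for dj in range(fw):
                    r = i * stride + di - top
                    c = j * stride + dj - left
                    if 0 <= r < h and 0 <= c < w:  # positions outside the image are zero
                        acc += image[r][c] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


image = [[1., 2., 3.]] * 3
kernel = [[0., 1., 0.]] * 3
print(conv2d_same_single_channel(image, kernel))
# [[2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [2.0, 4.0, 6.0]]
```

The first and last output rows sum only two image rows because the third row of each window falls in the zero padding.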

conv2d_transpose(filters, strides, padding, /, *, output_shape=None, filter_format='channel_last', data_format='NHWC', dilations=1, out=None, bias=None)

ivy.Array instance method variant of ivy.conv2d_transpose. This method simply wraps the function, and so the docstring for ivy.conv2d_transpose also applies to this method with minimal changes.

Parameters:

self (Array) – Input image [batch_size,h,w,d_in] or [batch_size,d_in,h,w].
filters (Union[Array, NativeArray]) – Convolution filters [fh,fw,d_out,d_in].
strides (Union[int, Tuple[int, int]]) – The stride of the sliding window for each dimension of input.
padding (str) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
output_shape (Optional[Union[Shape, NativeShape]], default: None) – Shape of the output (Default value = None)
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. “channel_first” corresponds to the “IOHW” filter layout, while “channel_last” corresponds to “HWOI”.
data_format (str, default: 'NHWC') – The ordering of the dimensions in the input, one of “NHWC” or “NCHW”. “NHWC” corresponds to inputs with shape (batch_size, height, width, channels), while “NCHW” corresponds to input with shape (batch_size, channels, height, width). Default is "NHWC".
dilations (Union[int, Tuple[int, int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
bias (Optional[Array], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the transpose convolution operation.

Examples

>>> x = ivy.random_normal(mean=0, std=1, shape=[1, 28, 28, 3])
>>> filters = ivy.random_normal(mean=0, std=1, shape=[3, 3, 6, 3])
>>> y = x.conv2d_transpose(filters, 2, 'SAME')
>>> print(y.shape)
(1, 56, 56, 6)
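
The output shape in this example follows from the input and filter shapes. The following is a minimal sketch of that bookkeeping, assuming 'SAME' padding, no explicit output_shape, NHWC input and “channel_last” (HWOI) filters. It also assumes that 'SAME' scales each spatial size by its stride, which matches the 28 to 56 growth shown above. The helper conv2d_transpose_same_shape is hypothetical, not part of Ivy.

```python
def conv2d_transpose_same_shape(x_shape, filter_shape, strides):
    """Hypothetical helper: infer the NHWC output shape of a 2D transpose
    convolution under 'SAME' padding with HWOI filters [fh, fw, d_out, d_in]."""
    batch, h, w, d_in = x_shape
    fh, fw, d_out, filter_d_in = filter_shape
    if d_in != filter_d_in:
        raise ValueError("input channels must match the last filter dimension")
    sh, sw = (strides, strides) if isinstance(strides, int) else strides
    # ASSUMPTION: under 'SAME', each spatial size is multiplied by its stride.
    return (batch, h * sh, w * sw, d_out)


print(conv2d_transpose_same_shape((1, 28, 28, 3), (3, 3, 6, 3), 2))
# (1, 56, 56, 6)
```

Note that the output channel count comes from the third filter dimension, not the last one, because the transpose filter layout is [fh,fw,d_out,d_in].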

conv3d(filters, strides, padding, /, *, data_format='NDHWC', filter_format='channel_last', x_dilations=1, dilations=1, bias=None, out=None)

ivy.Array instance method variant of ivy.conv3d. This method simply wraps the function, and so the docstring for ivy.conv3d also applies to this method with minimal changes.

Parameters:

self (Array) – Input volume [batch_size,d,h,w,d_in] or [batch_size,d_in,d,h,w].
filters (Union[Array, NativeArray]) – Convolution filters [fd,fh,fw,d_in,d_out].
strides (Union[int, Tuple[int, int, int]]) – The stride of the sliding window for each dimension of input.
padding (str) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
data_format (str, default: 'NDHWC') – “NDHWC” or “NCDHW”. Defaults to “NDHWC”.
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. Defaults to “channel_last”.
x_dilations (Union[int, Tuple[int, int, int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
dilations (Union[int, Tuple[int, int, int]], default: 1) – The dilation factor for each dimension of the filter. (Default value = 1)
bias (Optional[Array], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the convolution operation.

Examples

>>> x = ivy.ones((1, 3, 3, 3, 1)).astype(ivy.float32)
>>> filters = ivy.ones((1, 3, 3, 1, 1)).astype(ivy.float32)
>>> result = x.conv3d(filters, 2, 'SAME')
>>> print(result)
ivy.array([[[[[4.],[4.]],[[4.],[4.]]],[[[4.],[4.]],[[4.],[4.]]]]])
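
The 2x2x2 spatial extent of this result can be checked with a short calculation. The sketch below assumes the TensorFlow-style meaning of 'SAME' and 'VALID'. The helper conv_output_size is hypothetical, not part of Ivy.

```python
import math


def conv_output_size(size, kernel, stride=1, dilation=1, padding="SAME"):
    """Hypothetical helper: output length along one spatial dimension.

    ASSUMPTION: 'SAME' gives ceil(size / stride); 'VALID' counts only the
    window positions that fit entirely inside the input.
    """
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        effective = dilation * (kernel - 1) + 1  # kernel extent once dilated
        return max((size - effective) // stride + 1, 0)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# Spatial dims (d, h, w) of the example: input 3x3x3, filter 1x3x3, stride 2.
spatial = [conv_output_size(3, k, stride=2, padding="SAME") for k in (1, 3, 3)]
print(spatial)  # [2, 2, 2]
print(conv_output_size(4, 1, padding="VALID"))  # 4
```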

conv3d_transpose(filters, strides, padding, /, *, output_shape=None, filter_format='channel_last', data_format='NDHWC', dilations=1, bias=None, out=None)

ivy.Array instance method variant of ivy.conv3d_transpose. This method simply wraps the function, and so the docstring for ivy.conv3d_transpose also applies to this method with minimal changes.

Parameters:

self (Array) – Input volume [batch_size,d,h,w,d_in] or [batch_size,d_in,d,h,w].
filters (Union[Array, NativeArray]) – Convolution filters [fd,fh,fw,d_out,d_in].
strides (Union[int, Tuple[int], Tuple[int, int], Tuple[int, int, int]]) – The stride of the sliding window for each dimension of input.
padding (Union[str, List[int]]) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
output_shape (Optional[Union[Shape, NativeShape]], default: None) – Shape of the output (Default value = None)
filter_format (str, default: 'channel_last') – Either “channel_first” or “channel_last”. “channel_first” corresponds to the “IODHW” filter layout, while “channel_last” corresponds to “DHWOI”.
data_format (str, default: 'NDHWC') – The ordering of the dimensions in the input, one of “NDHWC” or “NCDHW”. “NDHWC” corresponds to inputs with shape (batch_size, depth, height, width, channels), while “NCDHW” corresponds to input with shape (batch_size, channels, depth, height, width).
dilations (Union[int, Tuple[int], Tuple[int, int], Tuple[int, int, int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
bias (Optional[Array], default: None) – Bias array of shape [d_out].
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the transpose convolution operation.

Examples

>>> x = ivy.random_normal(mean=0, std=1, shape=[1, 3, 28, 28, 3])
>>> filters = ivy.random_normal(mean=0, std=1, shape=[3, 3, 3, 6, 3])
>>> y = x.conv3d_transpose(filters, 2, 'SAME')
>>> print(y.shape)
(1, 6, 56, 56, 6)

depthwise_conv2d(filters, strides, padding, /, *, data_format='NHWC', dilations=1, out=None)

ivy.Array instance method variant of ivy.depthwise_conv2d. This method simply wraps the function, and so the docstring for ivy.depthwise_conv2d also applies to this method with minimal changes.

Parameters:

self (Array) – Input image [batch_size,h,w,d].
filters (Union[Array, NativeArray]) – Convolution filters [fh,fw,d_in]. (d_in must be the same as d from self)
strides (Union[int, Tuple[int], Tuple[int, int]]) – The stride of the sliding window for each dimension of input.
padding (Union[str, List[int]]) – “SAME” or “VALID” indicating the algorithm, or list indicating the per-dimension paddings.
data_format (str, default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.
dilations (Union[int, Tuple[int], Tuple[int, int]], default: 1) – The dilation factor for each dimension of input. (Default value = 1)
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The result of the convolution operation.

Examples

>>> x = ivy.randint(0, 255, shape=(1, 128, 128, 3)).astype(ivy.float32) / 255.0
>>> filters = ivy.random_normal(mean=0, std=1, shape=[3, 3, 3])
>>> y = x.depthwise_conv2d(filters, 2, 'SAME')
>>> print(y.shape)
(1, 64, 64, 3)
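
A depthwise convolution filters every channel independently, which is why the filters have shape [fh,fw,d_in] and the channel count is unchanged in the output. The following is a minimal pure-Python sketch for a single image, assuming stride 1 and 'VALID' padding. The helper depthwise_conv2d_valid is hypothetical, not part of Ivy.

```python
def depthwise_conv2d_valid(image, filters):
    """image: [h][w][d], filters: [fh][fw][d]; stride 1, 'VALID' padding.

    Channel c of the output depends only on channel c of the image and on
    the filter slice filters[.][.][c].
    """
    h, w, d = len(image), len(image[0]), len(image[0][0])
    fh, fw = len(filters), len(filters[0])
    return [
        [
            [
                sum(
                    image[i + di][j + dj][c] * filters[di][dj][c]
                    for di in range(fh)
                    for dj in range(fw)
                )
                for c in range(d)
            ]
            for j in range(w - fw + 1)
        ]
        for i in range(h - fh + 1)
    ]


# 2x2 image with 2 channels. The channel-0 filter sums the window,
# the channel-1 filter picks the top-left element only.
image = [[[1., 10.], [2., 20.]],
         [[3., 30.], [4., 40.]]]
filters = [[[1., 1.], [1., 0.]],
           [[1., 0.], [1., 0.]]]
print(depthwise_conv2d_valid(image, filters))
# [[[10.0, 10.0]]]
```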

dropout(prob, /, *, scale=True, dtype=None, training=True, seed=None, noise_shape=None, out=None)

ivy.Array instance method variant of ivy.dropout. This method simply wraps the function, and so the docstring for ivy.dropout also applies to this method with minimal changes.

Parameters:

self (Array) – The input array x to perform dropout on.
prob (float) – The probability of zeroing out each array element, float between 0 and 1.
scale (bool, default: True) – Whether to scale the output by 1/(1-prob), default is True.
dtype (Optional[Union[Dtype, NativeDtype]], default: None) – Output array data type. If dtype is None, the output array data type is inferred from x. Default: None.
training (bool, default: True) – Turn on dropout if training, turn off otherwise. Default is True.
seed (Optional[int], default: None) – Seed for random number generation (for reproducibility). Default is None.
noise_shape (Optional[Sequence[int]], default: None) – A sequence representing the shape of the binary dropout mask that will be multiplied with the input.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – Result array of the output after dropout is performed.

Examples

With ivy.Array instances:

>>> x = ivy.array([[1., 2., 3.],
...                [4., 5., 6.],
...                [7., 8., 9.],
...                [10., 11., 12.]])
>>> y = x.dropout(0.3)
>>> print(y)
ivy.array([[ 1.42857146,  2.85714293,  4.28571415],
           [ 5.71428585,  7.14285755,  8.5714283 ],
           [ 0.        , 11.4285717 , 12.8571434 ],
           [14.2857151 ,  0.        ,  0.        ]])

>>> x = ivy.array([[1., 2., 3.],
...                [4., 5., 6.],
...                [7., 8., 9.],
...                [10., 11., 12.]])
>>> y = x.dropout(0.3, scale=False)
>>> print(y)
ivy.array([[ 1.,  2., 3.],
           [ 4.,  5., 0.],
           [ 7.,  0., 9.],
           [10., 11., 0.]])
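
The values in the first example are the inputs multiplied by 1/(1-0.3), or zero where an element was dropped. Which elements are dropped is random, so the exact output varies from run to run. The following is a minimal pure-Python sketch of that rule. It assumes prob is strictly less than 1 and ignores dtype, noise_shape and out. The helper dropout_sketch is hypothetical, and its random stream is unrelated to Ivy's.

```python
import random


def dropout_sketch(x, prob, scale=True, training=True, seed=None):
    """Hypothetical element-wise dropout on a nested list of rows."""
    if not training:
        return [list(row) for row in x]  # dropout is switched off outside training
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - prob) if scale else 1.0
    return [
        [0.0 if rng.random() < prob else value * keep_scale for value in row]
        for row in x
    ]


x = [[1., 2., 3.], [4., 5., 6.]]
y = dropout_sketch(x, 0.3, seed=0)
# Every element of y is either 0.0 or the matching input scaled by 1 / (1 - 0.3).
```

Scaling the surviving elements keeps the expected value of each element equal to its input, so no rescaling is needed when dropout is turned off at inference time.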

dropout1d(prob, /, *, training=True, data_format='NWC', out=None)

ivy.Array instance method variant of ivy.dropout1d. This method simply wraps the function, and so the docstring for ivy.dropout1d also applies to this method with minimal changes.

Parameters:

self (Array) – The input array x to perform dropout on.
prob (float) – The probability of zeroing out each array element, float between 0 and 1.
training (bool, default: True) – Turn on dropout if training, turn off otherwise. Default is True.
data_format (str, default: 'NWC') – “NWC” or “NCW”. Default is "NWC".
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – Result array of the output after dropout is performed.

Examples

>>> x = ivy.array([1, 1, 1]).reshape([1, 1, 3])
>>> y = x.dropout1d(0.5)
>>> print(y)
ivy.array([[[2., 0., 2.]]])
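
The following sketch illustrates channel-wise dropout on an NWC input. It rests on an assumption taken from the description of ivy.dropout1d rather than from this page: that whole channels are dropped together, with the surviving channels scaled by 1/(1-prob). The helper dropout1d_sketch is hypothetical, and prob is assumed to be strictly less than 1.

```python
import random


def dropout1d_sketch(x, prob, seed=None):
    """x: [batch][w][c] (NWC).

    ASSUMPTION: one keep/drop decision per (batch, channel) pair, shared by
    every position along the width axis.
    """
    rng = random.Random(seed)
    out = []
    for sample in x:
        channels = len(sample[0])
        # One mask value per channel: 0.0 drops it, 1/(1-prob) keeps and rescales it.
        mask = [
            0.0 if rng.random() < prob else 1.0 / (1.0 - prob)
            for _ in range(channels)
        ]
        out.append([[v * m for v, m in zip(step, mask)] for step in sample])
    return out


x = [[[1., 1., 1.], [1., 1., 1.]]]  # batch=1, w=2, c=3
y = dropout1d_sketch(x, 0.5, seed=0)
# Each channel column of y is either all 0.0 or all 2.0.
```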

dropout2d(prob, /, *, training=True, data_format='NHWC', out=None)

ivy.Array instance method variant of ivy.dropout2d. This method simply wraps the function, and so the docstring for ivy.dropout2d also applies to this method with minimal changes.

Parameters:

self (Array) – The input array x to perform dropout on.
prob (float) – The probability of zeroing out each array element, float between 0 and 1.
training (bool, default: True) – Turn on dropout if training, turn off otherwise. Default is True.
data_format (str, default: 'NHWC') – “NHWC” or “NCHW”. Default is "NHWC".
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – Result array of the output after dropout is performed.

Examples

>>> x = ivy.array([[1, 1, 1], [2, 2, 2]])
>>> y = x.dropout2d(0.5)
>>> print(y)
ivy.array([[0., 0., 2.],
           [4., 4., 4.]])

dropout3d(prob, /, *, training=True, data_format='NDHWC', out=None)

ivy.Array instance method variant of ivy.dropout3d. This method simply wraps the function, and so the docstring for ivy.dropout3d also applies to this method with minimal changes.

Parameters:

self (Array) – The input array x to perform dropout on.
prob (float) – The probability of zeroing out each array element, float between 0 and 1.
training (bool, default: True) – Turn on dropout if training, turn off otherwise. Default is True.
data_format (str, default: 'NDHWC') – “NDHWC” or “NCDHW”. Default is "NDHWC".
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – Result array of the output after dropout is performed.

linear(weight, /, *, bias=None, out=None)

ivy.Array instance method variant of ivy.linear. This method simply wraps the function, and so the docstring for ivy.linear also applies to this method with minimal changes.

Parameters:

self (Array) – The input array to compute linear transformation on. [outer_batch_shape,inner_batch_shape,in_features]
weight (Union[Array, NativeArray]) – The weight matrix. [outer_batch_shape,out_features,in_features]
bias (Optional[Union[Array, NativeArray]], default: None) – The bias vector, default is None. [outer_batch_shape,out_features]
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – Result array of the linear transformation. [outer_batch_shape,inner_batch_shape,out_features]

Examples

>>> x = ivy.array([[1.1, 2.2, 3.3],
...                [4.4, 5.5, 6.6],
...                [7.7, 8.8, 9.9]])
>>> w = ivy.array([[1., 2., 3.],
...                [4., 5., 6.],
...                [7., 8., 9.]])
>>> b = ivy.array([1., 0., -1.])
>>> y = x.linear(w, bias=b)
>>> print(y)
ivy.array([[ 16.4,  35.2,  54. ],
           [ 36.2,  84.7, 133.2],
           [ 56. , 134.2, 212.4]])
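
Given the shapes listed above, the weight matrix is stored as [out_features,in_features], so the transformation multiplies by its transpose. The following is a minimal pure-Python sketch without batch dimensions. The helper linear_sketch is hypothetical, not part of Ivy.

```python
def linear_sketch(x, weight, bias=None):
    """x: [n][in_features], weight: [out_features][in_features],
    bias: [out_features]. Computes x @ weight^T + bias."""
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(a * b for a, b in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [v + b for v, b in zip(y, bias)]
        out.append(y)
    return out


x = [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]]
w = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]
b = [1., 0., -1.]
y = linear_sketch(x, w, b)
print([round(v, 1) for v in y[0]])  # [16.4, 35.2, 54.0]
```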

lstm_update(init_h, init_c, kernel, recurrent_kernel, /, *, bias=None, recurrent_bias=None)

ivy.Array instance method variant of ivy.lstm_update. This method simply wraps the function, and so the docstring for ivy.lstm_update also applies to this method with minimal changes.

Parameters:

self (Array) – input tensor of LSTM layer [batch_shape, t, in].
init_h (Union[Array, NativeArray]) – initial state tensor for the cell output [batch_shape, out].
init_c (Union[Array, NativeArray]) – initial state tensor for the cell hidden state [batch_shape, out].
kernel (Union[Array, NativeArray]) – weights for cell kernel [in, 4 x out].
recurrent_kernel (Union[Array, NativeArray]) – weights for cell recurrent kernel [out, 4 x out].
bias (Optional[Union[Array, NativeArray]], default: None) – bias for cell kernel [4 x out]. (Default value = None)
recurrent_bias (Optional[Union[Array, NativeArray]], default: None) – bias for cell recurrent kernel [4 x out]. (Default value = None)

Return type:

Tuple[Array, Array]

Returns:

ret – hidden state for all timesteps [batch_shape,t,out] and cell state for last timestep [batch_shape,out]

Examples

>>> x = ivy.randint(0, 20, shape=(6, 20, 3))
>>> h_i = ivy.random_normal(shape=(6, 5))
>>> c_i = ivy.random_normal(shape=(6, 5))
>>> kernel = ivy.random_normal(shape=(3, 4 * 5))
>>> rc = ivy.random_normal(shape=(5, 4 * 5))
>>> result = x.lstm_update(h_i, c_i, kernel, rc)

>>> result[0].shape
(6, 20, 5)
>>> result[1].shape
(6, 5)
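
The shapes above come from unrolling one LSTM cell over the t timesteps. The following is a minimal pure-Python sketch for a single batch element, without biases. The gate order along the 4 x out axis (input, forget, candidate, output) is an assumption and may differ from Ivy's implementation. The helpers sigmoid, matvec and lstm_update_sketch are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(vec, mat):
    """vec: [n], mat: [n][m] -> [m] (row vector times matrix)."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]


def lstm_update_sketch(x, init_h, init_c, kernel, recurrent_kernel):
    """x: [t][in], init_h and init_c: [out], kernel: [in][4*out],
    recurrent_kernel: [out][4*out].

    Returns (hidden states for all timesteps [t][out], last cell state [out]).
    """
    out = len(init_h)
    h, c = list(init_h), list(init_c)
    hidden_states = []
    for x_t in x:
        z = [a + b for a, b in zip(matvec(x_t, kernel), matvec(h, recurrent_kernel))]
        # ASSUMPTION: gates are laid out as [input, forget, candidate, output].
        i = [sigmoid(v) for v in z[0:out]]
        f = [sigmoid(v) for v in z[out:2 * out]]
        g = [math.tanh(v) for v in z[2 * out:3 * out]]
        o = [sigmoid(v) for v in z[3 * out:4 * out]]
        c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
        h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
        hidden_states.append(h)
    return hidden_states, c


x = [[1., 2.], [3., 4.]]       # t=2, in=2
kernel = [[0.] * 4, [0.] * 4]  # in=2, 4*out=4
recurrent_kernel = [[0.] * 4]  # out=1
hs, c_last = lstm_update_sketch(x, [0.], [1.], kernel, recurrent_kernel)
print(len(hs), len(hs[0]), len(c_last))  # 2 1 1
```

With all-zero weights every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state halves at each step regardless of the assumed gate order.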

multi_head_attention(*, key=None, value=None, num_heads=8, scale=None, attention_mask=None, in_proj_weights=None, q_proj_weights=None, k_proj_weights=None, v_proj_weights=None, out_proj_weights=None, in_proj_bias=None, out_proj_bias=None, is_causal=False, key_padding_mask=None, bias_k=None, bias_v=None, static_k=None, static_v=None, add_zero_attn=False, return_attention_weights=False, average_attention_weights=True, dropout=0.0, training=False, out=None)

Return type:

Array

scaled_dot_product_attention(key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)

ivy.Array instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.

Parameters:

self (Array) – The queries input array. The shape of queries input array should be in [batch_shape,num_queries,feat_dim]. The queries input array should have the same size as keys and values.
key (Union[Array, NativeArray]) – The keys input array. The shape of keys input array should be in [batch_shape,num_keys,feat_dim]. The keys input array should have the same size as queries and values.
value (Union[Array, NativeArray]) – The values input array. The shape of values input should be in [batch_shape,num_keys,feat_dim]. The values input array should have the same size as queries and keys.
scale (Optional[float], default: None) – The scale float value. The scale float value is used to scale the query-key pairs before softmax.
mask (Optional[Union[Array, NativeArray]], default: None) – The mask input array. The mask to apply to the query-key values. Default is None. The shape of mask input should be in [batch_shape,num_queries,num_keys].
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability. If greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If true, assumes causal attention masking and errors if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used, otherwise dropout is not activated.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention score and value. The shape of output array is [batch_shape,num_queries,feat_dim].

Examples

With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> result = q.scaled_dot_product_attention(k, v, scale=1, dropout_p=0.1,
...                                         is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.],[4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3],[4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1],[4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]])
>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> q.scaled_dot_product_attention(k, v, scale=1, dropout_p=0.1,
...                                is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
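
The core computation behind these examples is softmax(scale * q k^T) v, with future keys masked out when is_causal is set. The following is a minimal pure-Python sketch without a batch dimension, explicit mask or dropout, so it is not expected to match the printed values digit for digit. The default scale of 1/sqrt(feat_dim) is an assumption (the conventional choice). The helper sdpa_sketch is hypothetical, not part of Ivy.

```python
import math


def sdpa_sketch(q, k, v, scale=None, is_causal=False):
    """q: [num_queries][feat_dim], k and v: [num_keys][feat_dim]."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))  # ASSUMPTION: conventional default
    out = []
    for qi, q_row in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if is_causal:
            # Query i may only attend to keys 0..i.
            scores = [s if ki <= qi else -math.inf for ki, s in enumerate(scores)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append(
            [sum(w * v_row[d] for w, v_row in zip(weights, v)) for d in range(len(v[0]))]
        )
    return out


q = [[0.2, 1.], [2.2, 3.], [4.4, 5.6]]
k = [[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]
v = [[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]
result = sdpa_sketch(q, k, v, scale=1, is_causal=True)
# With causal masking the first query sees only the first key, so result[0] equals v[0].
```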