ivy.adam_step(dcdw, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

Compute the Adam step delta, given the derivatives of some cost c with respect to the weights ws, using the Adam update.

Parameters:
• dcdw (`Union`[`Array`, `NativeArray`]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

• mw (`Union`[`Array`, `NativeArray`]) – running average of the gradients

• vw (`Union`[`Array`, `NativeArray`]) – running average of second moments of the gradients

• step (`Union`[`int`, `float`]) – training step

• beta1 (`float`, default: `0.9`) – gradient forgetting factor (Default value = 0.9)

• beta2 (`float`, default: `0.999`) – second moment of gradient forgetting factor (Default value = 0.999)

• epsilon (`float`, default: `1e-07`) – small constant added to the divisor during the Adam update to prevent division by zero (Default value = 1e-7)

• out (`Optional`[`Array`], default: `None`) – optional output array, for writing the effective grad of adam_step to. It must have a shape that the inputs broadcast to.

Return type:

`Tuple`[`Array`, `Array`, `Array`]

Returns:

ret – The Adam step delta, followed by the updated running averages mw and vw.

Examples

With `ivy.Array` inputs:

```
>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105 , 0.22187898, 0.24144873]),
ivy.array([0.99999998, 1.09999998, 1.19999998]),
ivy.array([1.00000001, 1.00300001, 1.00800001]))
```
```
>>> dcdw = ivy.array([[1., 4., -3.], [2., 3., 0.5]])
>>> mw = ivy.zeros((2,3))
>>> vw = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.86
>>> beta2 = 0.95
>>> epsilon = 1e-6
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
(ivy.array([[ 1.,  1., -1.],
[ 1.,  1.,  1.]]),
ivy.array([[ 0.14,  0.56, -0.42],
[ 0.28,  0.42,  0.07]]),
ivy.array([[0.05  , 0.8   , 0.45  ],
[0.2   , 0.45  , 0.0125]]))
```
```
>>> dcdw = ivy.array([0.1, -0.7, 2])
>>> mw = ivy.ones(1)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3.6)
>>> out = ivy.zeros_like(dcdw)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, out=out)
>>> print(out)
ivy.array([0.17294501, 0.15770318, 0.20863818])
```

With one `ivy.Container` input:

```
>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
b: ivy.array([2.02, 4.82, 8.17])
}, {
a: ivy.array([0.87, 3.61, 8.09]),
b: ivy.array([1.26, 4., 8.48])
}, {
a: ivy.array([0., 0.024, 0.096]),
b: ivy.array([0.216, 0.384, 0.6])
})
```

With multiple `ivy.Container` inputs:

```
>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
a: ivy.array([0., 0.626, 0.626]),
b: ivy.array([0.626, 0.626, 0.626])
}, {
a: ivy.array([0., 0.13, 0.26]),
b: ivy.array([0.39, 0.52, 0.65])
}, {
a: ivy.array([0., 0.024, 0.096]),
b: ivy.array([0.216, 0.384, 0.6])
})
```
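
The example outputs above are consistent with the usual Adam recurrences: the running averages are updated first, and the delta is the bias-corrected ratio of the two. The following is a minimal pure-Python sketch of that computation, not Ivy's source. The name `adam_step_sketch`, the restriction to equal-length lists of floats (no broadcasting), and the exact placement of `epsilon` are assumptions of the sketch.

```python
import math


def adam_step_sketch(dcdw, mw, vw, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """Hypothetical helper: one Adam step on equal-length lists of floats.

    Returns (delta, updated mw, updated vw), mirroring the order of the
    tuples shown in the documented examples.
    """
    step = float(step)
    # Exponential moving averages of the gradient and of the squared gradient.
    mw_new = [beta1 * m + (1 - beta1) * g for m, g in zip(mw, dcdw)]
    vw_new = [beta2 * v + (1 - beta2) * g * g for v, g in zip(vw, dcdw)]
    # Bias-correction factor for the current step.
    # Assumption: epsilon also guards this denominator (relevant for step = 0).
    alpha = math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step + epsilon)
    # epsilon keeps the divisor away from zero when vw is zero.
    delta = [alpha * m / (math.sqrt(v) + epsilon) for m, v in zip(mw_new, vw_new)]
    return delta, mw_new, vw_new


# Inputs of the first ivy.Array example, with mw and vw written out per element.
delta, mw_new, vw_new = adam_step_sketch(
    [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 3
)
print(delta)
print(mw_new)
print(vw_new)
```

Called with the inputs of the first `ivy.Array` example, the sketch agrees with the documented delta, `mw`, and `vw` values to within floating-point rounding.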
Array.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

ivy.Array instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.

Parameters:
• self (`Array`) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

• mw (`Union`[`Array`, `NativeArray`]) – running average of the gradients.

• vw (`Union`[`Array`, `NativeArray`]) – running average of second moments of the gradients.

• step (`Union`[`int`, `float`]) – training step.

• beta1 (`float`, default: `0.9`) – gradient forgetting factor (Default value = 0.9).

• beta2 (`float`, default: `0.999`) – second moment of gradient forgetting factor (Default value = 0.999).

• epsilon (`float`, default: `1e-07`) – small constant added to the divisor during the Adam update to prevent division by zero (Default value = 1e-7).

• out (`Optional`[`Array`], default: `None`) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

`Array`

Returns:

ret – The adam step delta.

Examples

With `ivy.Array` inputs:

```
>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = dcdw.adam_step(mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105,0.22187898,0.24144873]),
ivy.array([1.,1.10000002,1.20000005]),
ivy.array([1.,1.00300002,1.00800002]))
```
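
The delta is not itself the new weight value. In a typical Adam optimizer it is scaled by a learning rate and subtracted from the weights, while the returned `mw` and `vw` are carried into the next step. The loop below sketches this for a single scalar weight and the toy cost c(w) = w**2. The helper `scalar_adam_step`, the learning rate `lr`, and the cost are hypothetical and chosen only for illustration; the helper follows the standard Adam recurrences rather than Ivy's source.

```python
import math


def scalar_adam_step(g, m, v, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    # Hypothetical scalar helper; returns (delta, updated m, updated v).
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    alpha = math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step + epsilon)
    return alpha * m / (math.sqrt(v) + epsilon), m, v


w = 5.0          # initial weight
m, v = 0.0, 0.0  # running averages start at zero
lr = 0.1         # assumed learning rate, not part of adam_step itself

for step in range(1, 201):
    g = 2.0 * w                                    # dc/dw for c(w) = w**2
    delta, m, v = scalar_adam_step(g, m, v, step)  # carry m and v forward
    w -= lr * delta                                # apply the scaled delta

print(w)
```

Keeping the learning rate outside the step function means the same delta can be reused with different schedules or per-parameter rates.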
Container.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

ivy.Container instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.

Parameters:
• self (`Container`) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

• mw (`Union`[`Array`, `NativeArray`, `Container`]) – running average of the gradients.

• vw (`Union`[`Array`, `NativeArray`, `Container`]) – running average of second moments of the gradients.

• step (`Union`[`int`, `float`, `Container`]) – training step.

• beta1 (`Union`[`float`, `Container`], default: `0.9`) – gradient forgetting factor (Default value = 0.9).

• beta2 (`Union`[`float`, `Container`], default: `0.999`) – second moment of gradient forgetting factor (Default value = 0.999).

• epsilon (`Union`[`float`, `Container`], default: `1e-07`) – small constant added to the divisor during the Adam update to prevent division by zero (Default value = 1e-7).

• out (`Optional`[`Container`], default: `None`) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

`Container`

Returns:

ret – The adam step delta.

Examples

With one `ivy.Container` input:

```
>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
b: ivy.array([2.02, 4.82, 8.17])
}, {
a: ivy.array([0.87, 3.61, 8.09]),
b: ivy.array([1.26, 4., 8.48])
}, {
a: ivy.array([0., 0.024, 0.096]),
b: ivy.array([0.216, 0.384, 0.6])
})
```

With multiple `ivy.Container` inputs:

```
>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)