lamb_update#

ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#

Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
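For orientation, the following is a minimal NumPy sketch of the standard LAMB step (You et al., 2019): an Adam-style moment update with bias correction, followed by a layer-wise trust-ratio rescaling. It matches the worked examples below, but it is an illustrative reference, not necessarily Ivy's exact backend implementation:

>>> import numpy as np
>>> def lamb_step(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=0.9,
...               beta2=0.999, epsilon=1e-7, max_trust_ratio=10,
...               decay_lambda=0):
...     # Adam-style first- and second-moment updates, with bias correction.
...     mw = beta1 * mw_tm1 + (1 - beta1) * dcdw
...     vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2
...     mw_hat = mw / (1 - beta1 ** step)
...     vw_hat = vw / (1 - beta2 ** step)
...     update = mw_hat / (np.sqrt(vw_hat) + epsilon) + decay_lambda * w
...     # Layer-wise trust ratio: scale the step by ||w|| / ||update||,
...     # clipped at max_trust_ratio (1 if either norm is zero).
...     w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
...     trust = min(w_norm / u_norm, max_trust_ratio) if w_norm and u_norm else 1.0
...     return w - lr * trust * update, mw, vw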

Parameters:
  • w (Union[Array, NativeArray]) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.

  • vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients from the previous time-step.

  • step (int) – Training step.

  • beta1 (float) – Gradient forgetting factor. (default: 0.9)

  • beta2 (float) – Second-moment gradient forgetting factor. (default: 0.999)

  • epsilon (float) – Divisor during the Adam update, preventing division by zero. (default: 1e-07)

  • max_trust_ratio (Union[int, float]) – The maximum value for the trust ratio. (default: 10)

  • decay_lambda (float) – The factor used for weight decay. (default: 0)

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. (default: True)

  • out (Optional[Array]) – Optional output array, for writing the new function weights ws_new to. It must have a shape that the inputs broadcast to. (default: None)

Return type:

Tuple[Array, Array, Array]

Returns:

ret – The new function weights ws_new, along with the updated running averages of the gradients and of their second moments, following the LAMB update.
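The updated moment estimates are returned rather than stored internally, so a training loop threads them back into the next call. A minimal sketch, assuming w, lr, and num_steps are already defined; compute_cost_gradients stands in for your own gradient computation and is not an ivy API:

>>> mw = ivy.zeros_like(w)
>>> vw = ivy.zeros_like(w)
>>> for step in range(1, num_steps + 1):
...     dcdw = compute_cost_gradients(w)  # hypothetical helper
...     w, mw, vw = ivy.lamb_update(w, dcdw, lr, mw, vw, step)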

Examples

With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
ivy.array([0.05, 0.02, 0.01]),
ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))

>>> w = ivy.array([[1., 2, 3],[4, 6, 1],[1, 0, 7]])
>>> dcdw = ivy.array([[0.5, 0.2, 0.1],[0.3, 0.6, 0.4],[0.4, 0.7, 0.2]])
>>> lr = ivy.array(0.1)
>>> mw_tm1 = ivy.zeros((3,3))
>>> vw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> out = ivy.zeros_like(w)
>>> stop_gradients = True
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda, out=out,
...                               stop_gradients=stop_gradients)
>>> print(out)
ivy.array([[ 0.639,  1.64 ,  2.64 ],
           [ 3.64 ,  5.64 ,  0.639],
           [ 0.639, -0.361,  6.64 ]])

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> lr = ivy.array([0., 0., 0.])
>>> mw_tm1 = ivy.array([0.])
>>> vw_tm1 = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda,
...                               stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})
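In the Container examples above, each leaf receives its own trust ratio, computed from the norms of that leaf's weights and of its proposed update; this is why leaves a and b move by different per-element amounts. A hedged sketch of just that scaling, under the same standard formulation assumed earlier (on this first step, the bias-corrected Adam update works out to roughly 0.612 per element for both leaves):

>>> import numpy as np
>>> def trust_ratio(w, update, max_trust_ratio=10):
...     # min(||w|| / ||update||, max_trust_ratio); 1 if either norm is zero.
...     w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
...     return min(w_norm / u_norm, max_trust_ratio) if w_norm and u_norm else 1.0
>>> update = np.full(3, 0.612)
>>> ratio_a = trust_ratio(np.array([1., 3., 5.]), update)  # ~5.58
>>> ratio_b = trust_ratio(np.array([3., 4., 2.]), update)  # ~5.08

Multiplying by the learning rate recovers the per-element steps above: 0.5 * 5.58 * 0.612 ≈ 1.71 for a and 0.5 * 5.08 * 0.612 ≈ 1.55 for b.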
Array.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)#

ivy.Array instance method variant of ivy.lamb_update. This method simply wraps the function, and so the docstring for ivy.lamb_update also applies to this method with minimal changes.

Parameters:
  • self (Array) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.

  • vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients from the previous time-step.

  • step (int) – Training step.

  • beta1 (float) – Gradient forgetting factor. (default: 0.9)

  • beta2 (float) – Second-moment gradient forgetting factor. (default: 0.999)

  • epsilon (float) – Divisor during the Adam update, preventing division by zero. (default: 1e-07)

  • max_trust_ratio (Union[int, float]) – The maximum value for the trust ratio. (default: 10)

  • decay_lambda (float) – The factor used for weight decay. (default: 0)

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. (default: True)

  • out (Optional[Array]) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to. (default: None)

Return type:

Tuple[Array, Array, Array]

Returns:

ret – The new function weights ws_new, along with the updated running averages of the gradients and of their second moments, following the LAMB update.

Examples

With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
ivy.array([0.05, 0.02, 0.01]),
ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))
Container.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)#

Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.

Parameters:
  • self (Container) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • mw_tm1 (Union[Array, NativeArray, Container]) – Running average of the gradients from the previous time-step.

  • vw_tm1 (Union[Array, NativeArray, Container]) – Running average of second moments of the gradients from the previous time-step.

  • step (int) – Training step.

  • beta1 (float) – Gradient forgetting factor. (default: 0.9)

  • beta2 (float) – Second-moment gradient forgetting factor. (default: 0.999)

  • epsilon (float) – Divisor during the Adam update, preventing division by zero. (default: 1e-07)

  • max_trust_ratio (Union[int, float]) – The maximum value for the trust ratio. (default: 10)

  • decay_lambda (float) – The factor used for weight decay. (default: 0)

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. (default: True)

  • out (Optional[Container]) – Optional output container, for writing the result to. It must have a shape that the inputs broadcast to. (default: None)

Return type:

Container

Returns:

ret – The new function weights ws_new, along with the updated running averages of the gradients and of their second moments, following the LAMB update.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> lr = ivy.array([0., 0., 0.])
>>> mw_tm1 = ivy.array([0.])
>>> vw_tm1 = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                beta2=beta2, epsilon=epsilon,
...                                max_trust_ratio=max_trust_ratio,
...                                decay_lambda=decay_lambda,
...                                stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})