Optimizers#

Collection of Ivy optimizers.

class ivy.stateful.optimizers.Adam(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Bases: Optimizer

__init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Construct an Adam optimizer.

Parameters:
  • lr (float, default: 0.0001) – Learning rate, default is 1e-4.

  • beta1 (float, default: 0.9) – Forgetting factor for the first-moment (mean) estimate of the gradient. Default is 0.9.

  • beta2 (float, default: 0.999) – Forgetting factor for the second-moment estimate of the gradient. Default is 0.999.

  • epsilon (float, default: 1e-07) – Small constant added to the divisor during the Adam update to prevent division by zero. Default is 1e-07.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

  • device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
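
Below is a minimal usage sketch. The NumPy backend, container keys and gradient values are illustrative stand-ins; in practice the gradients would come from Ivy's gradient API rather than being written by hand:

    import ivy
    from ivy.stateful.optimizers import Adam

    ivy.set_backend("numpy")  # any installed backend works

    # nested variables and hand-written, illustrative gradients
    v = ivy.Container(w=ivy.array([1.0, 2.0, 3.0]), b=ivy.array([0.0]))
    grads = ivy.Container(w=ivy.array([0.1, -0.2, 0.3]), b=ivy.array([0.05]))

    optimizer = Adam(lr=1e-3)
    v = optimizer.step(v, grads)  # one Adam update; returns the updated variables
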
class ivy.stateful.optimizers.AdamW(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, weight_decay=0.0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Bases: Adam

__init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, weight_decay=0.0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Construct an AdamW optimizer.

Parameters:
  • lr (float, default: 0.0001) – Learning rate, default is 1e-4.

  • beta1 (float, default: 0.9) – Forgetting factor for the first-moment (mean) estimate of the gradient. Default is 0.9.

  • beta2 (float, default: 0.999) – Forgetting factor for the second-moment estimate of the gradient. Default is 0.999.

  • epsilon (float, default: 1e-07) – Small constant added to the divisor during the AdamW update to prevent division by zero. Default is 1e-07.

  • weight_decay (float, default: 0.0) – Weight decay coefficient. Default is 0.0.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

  • device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.
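
For reference, the standard AdamW rule (Loshchilov & Hutter) applies weight decay decoupled from the adaptive update. The sketch below is the textbook formula and is assumed, not verified against the Ivy source, to match this implementation:

    w_t = w_{t-1} - \mathrm{lr} \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda \, w_{t-1} \right), \qquad \lambda = \texttt{weight\_decay}

With weight_decay=0.0 (the default) this reduces to the plain Adam update above.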

class ivy.stateful.optimizers.LAMB(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Bases: Optimizer

__init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#

Construct a LAMB optimizer.

Parameters:
  • lr (float, default: 0.0001) – Learning rate, default is 1e-4.

  • beta1 (float, default: 0.9) – Forgetting factor for the first-moment (mean) estimate of the gradient. Default is 0.9.

  • beta2 (float, default: 0.999) – Forgetting factor for the second-moment estimate of the gradient. Default is 0.999.

  • epsilon (float, default: 1e-07) – Small constant added to the divisor during the Adam update to prevent division by zero. Default is 1e-07.

  • max_trust_ratio (float, default: 10) – The maximum value of the trust ratio, i.e. the ratio between the norm of the layer weights and the norm of the gradient update. Default is 10.

  • decay_lambda (float, default: 0) – The factor used for weight decay. Default is 0.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

  • device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
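
For reference, the general LAMB scheme (You et al.) rescales the Adam-style update layer-wise by a trust ratio, which max_trust_ratio caps. This is a sketch of the standard rule, assumed rather than verified to reflect this implementation, with lambda = decay_lambda and r_max = max_trust_ratio:

    u = \frac{\hat{m}}{\sqrt{\hat{v}} + \epsilon} + \lambda \, w, \qquad
    r = \min\!\left( \frac{\lVert w \rVert}{\lVert u \rVert},\ r_{\max} \right), \qquad
    w \leftarrow w - \mathrm{lr} \cdot r \cdot u
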
class ivy.stateful.optimizers.LARS(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#

Bases: Optimizer

__init__(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#

Construct a Layer-wise Adaptive Rate Scaling (LARS) optimizer.

Parameters:
  • lr (float, default: 0.0001) – Learning rate, default is 1e-4.

  • decay_lambda (float, default: 0) – The factor used for weight decay. Default is 0.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
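
For reference, LARS (You et al.) applies the same layer-wise trust-ratio idea directly to the raw gradient, without Adam's moment estimates. This is a sketch of the standard momentum-free rule, assumed rather than verified to reflect this implementation, with lambda = decay_lambda:

    r = \frac{\lVert w \rVert}{\lVert \nabla w \rVert + \lambda \lVert w \rVert}, \qquad
    w \leftarrow w - \mathrm{lr} \cdot r \cdot \left( \nabla w + \lambda \, w \right)
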
class ivy.stateful.optimizers.Optimizer(lr, inplace=True, stop_gradients=True, init_on_first_step=False, trace_on_next_step=False, fallback_to_non_traced=False, device=None)[source]#

Bases: ABC

__init__(lr, inplace=True, stop_gradients=True, init_on_first_step=False, trace_on_next_step=False, fallback_to_non_traced=False, device=None)[source]#

Construct a general Optimizer. This is an abstract class and must be derived from (a subclassing sketch follows the step() documentation below).

Parameters:
  • lr (Union[float, Callable]) – Learning rate.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • init_on_first_step (bool, default: False) – Whether the optimizer is initialized on the first step. Default is False.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

  • fallback_to_non_traced (bool, default: False) – Whether to fall back to a non-traced forward call if an error is raised during the traced forward pass. Default is False.

  • device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.

abstract set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

step(v, grads, ignore_missing=False)[source]#

Update the nested variables container v using the overridden private self._step method.

Parameters:
  • v (Container) – Nested variables to update.

  • grads (Container) – Nested gradients to update.

  • ignore_missing (bool, default: False) – Whether to ignore keys missing from the gradients which exist in the variables. Default is False.

Returns:

ret – The updated variables, following the update step.
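
Because Optimizer is abstract, a concrete optimizer overrides the private _step hook referenced above, together with set_state (and, in the built-in optimizers, a state property). The sketch below derives a plain gradient-descent optimizer; the _step name and signature are inferred from the step() description and may differ between Ivy versions:

    import ivy
    from ivy.stateful.optimizers import Optimizer

    class PlainGD(Optimizer):
        def __init__(self, lr=1e-3):
            super().__init__(lr)
            self._plain_lr = lr  # keep our own copy of the learning rate

        def _step(self, v, grads):
            # subtract lr * grad from every leaf of the nested container
            return ivy.gradient_descent_update(v, grads, self._plain_lr)

        def set_state(self, state):
            # plain gradient descent keeps no state to restore
            pass

        @property
        def state(self):
            return ivy.Container()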

class ivy.stateful.optimizers.SGD(lr=0.0001, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#

Bases: Optimizer

__init__(lr=0.0001, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#

Construct a Stochastic-Gradient-Descent (SGD) optimizer.

Parameters:
  • lr (float, default: 0.0001) – Learning rate, default is 1e-4.

  • inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
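
SGD inherits step() from Optimizer, so the ignore_missing flag documented above applies here too. In the illustrative sketch below (arbitrary backend, keys and values), "b" has no gradient entry, so with ignore_missing=True it is simply left unchanged:

    import ivy
    from ivy.stateful.optimizers import SGD

    ivy.set_backend("numpy")  # any installed backend works

    v = ivy.Container(w=ivy.array([1.0, 2.0]), b=ivy.array([0.5]))
    grads = ivy.Container(w=ivy.array([0.1, 0.1]))  # no gradient for "b"

    optimizer = SGD(lr=1e-2)
    v = optimizer.step(v, grads, ignore_missing=True)  # "b" is left unchanged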

This should hopefully have given you an overview of the optimizers submodule. If you have any questions, please feel free to reach out on our Discord in the optimizers channel!