Optimizers#

Collection of Ivy optimizers.

class ivy.stateful.optimizers.Adam(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, compile_on_next_step=False, device=None)[source]#

Bases: Optimizer

__init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, compile_on_next_step=False, device=None)[source]#

Construct an Adam optimizer.

Parameters:
  • lr (float) – Learning rate. Default is 1e-4.

  • beta1 (float) – Gradient forgetting factor. Default is 0.9.

  • beta2 (float) – Second moment of gradient forgetting factor. Default is 0.999.

  • epsilon (float) – Divisor used during the Adam update to prevent division by zero. Default is 1e-07.

  • inplace (bool) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • compile_on_next_step (bool) – Whether to compile the optimizer on the next step. Default is False.

  • device (Optional[Union[Device, NativeDevice]]) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.
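
Below is a minimal usage sketch, not taken from the official examples: it assumes the torch backend is available, uses illustrative toy Containers for the variables and gradients, and passes inplace=False so that step simply returns new variable handles.

    import ivy
    from ivy.stateful.optimizers import Adam

    ivy.set_backend("torch")  # assumed backend; any supported backend should work similarly

    # toy variables and matching gradients, arranged as nested Containers
    v = ivy.Container(w=ivy.array([1.0, 2.0, 3.0]))
    grads = ivy.Container(w=ivy.array([0.5, 0.5, 0.5]))

    # inplace=False returns new variable handles rather than mutating v
    optimizer = Adam(lr=1e-3, inplace=False)

    # one Adam update step over the nested variables
    v = optimizer.step(v, grads)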

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
class ivy.stateful.optimizers.LAMB(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, compile_on_next_step=False, device=None)[source]#

Bases: Optimizer

__init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, compile_on_next_step=False, device=None)[source]#

Construct a LAMB optimizer.

Parameters:
  • lr (float) – Learning rate. Default is 1e-4.

  • beta1 (float) – Gradient forgetting factor. Default is 0.9.

  • beta2 (float) – Second moment of gradient forgetting factor. Default is 0.999.

  • epsilon (float) – Divisor used during the Adam update to prevent division by zero. Default is 1e-07.

  • max_trust_ratio (float) – The maximum value of the trust ratio, i.e. the ratio between the norm of the layer weights and the norm of the gradient update. Default is 10.

  • decay_lambda (float) – The factor used for weight decay. Default is 0.

  • inplace (bool) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • compile_on_next_step (bool) – Whether to compile the optimizer on the next step. Default is False.

  • device (Optional[Union[Device, NativeDevice]]) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.
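
A hedged sketch follows, mirroring the Adam example above; the backend choice and the values passed for max_trust_ratio and decay_lambda are purely illustrative.

    import ivy
    from ivy.stateful.optimizers import LAMB

    ivy.set_backend("torch")  # assumed backend

    v = ivy.Container(w=ivy.array([[1.0, 2.0], [3.0, 4.0]]))
    grads = ivy.Container(w=ivy.array([[0.1, 0.1], [0.1, 0.1]]))

    # the trust ratio (weight norm over update norm) is capped at max_trust_ratio,
    # and decay_lambda adds weight decay to the update
    optimizer = LAMB(lr=1e-3, max_trust_ratio=10, decay_lambda=0.01, inplace=False)

    v = optimizer.step(v, grads)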

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
class ivy.stateful.optimizers.LARS(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, compile_on_next_step=False)[source]#

Bases: Optimizer

__init__(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, compile_on_next_step=False)[source]#

Construct a Layer-wise Adaptive Rate Scaling (LARS) optimizer.

Parameters:
  • lr (float) – Learning rate. Default is 1e-4.

  • decay_lambda (float) – The factor used for weight decay. Default is 0.

  • inplace (bool) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • compile_on_next_step (bool) – Whether to compile the optimizer on the next step. Default is False.
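
A minimal sketch, under the same assumptions as the examples above (torch backend, illustrative toy Containers):

    import ivy
    from ivy.stateful.optimizers import LARS

    ivy.set_backend("torch")  # assumed backend

    v = ivy.Container(w=ivy.array([1.0, 2.0, 3.0]))
    grads = ivy.Container(w=ivy.array([0.5, 0.5, 0.5]))

    # LARS scales the step per layer by the ratio of weight norm to gradient norm
    optimizer = LARS(lr=1e-2, decay_lambda=0.01, inplace=False)

    v = optimizer.step(v, grads)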

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#
class ivy.stateful.optimizers.Optimizer(lr, inplace=True, stop_gradients=True, init_on_first_step=False, compile_on_next_step=False, fallback_to_non_compiled=False, device=None)[source]#

Bases: ABC

__init__(lr, inplace=True, stop_gradients=True, init_on_first_step=False, compile_on_next_step=False, fallback_to_non_compiled=False, device=None)[source]#

Construct a general Optimizer. This is an abstract class that must be derived from.

Parameters:
  • lr (Union[float, Callable]) – Learning rate.

  • inplace (bool) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • init_on_first_step (bool) – Whether the optimizer is initialized on the first step. Default is False.

  • compile_on_next_step (bool) – Whether to compile the optimizer on the next step. Default is False.

  • fallback_to_non_compiled (bool) – Whether to fall back to a non-compiled forward call if an error is raised during the compiled forward pass. Default is False.

  • device (Optional[Union[Device, NativeDevice]]) – Device on which to create the optimizer’s variables, e.g. ‘cuda:0’, ‘cuda:1’ or ‘cpu’. Default is None.

abstract set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

step(v, grads, ignore_missing=False)[source]#

Update the nested variables container v by calling the overridden private self._step method.

Parameters:
  • v (Container) – Nested variables to update.

  • grads (Container) – Nested gradients to update.

  • ignore_missing (bool) – Whether to ignore keys missing from the gradients that exist in the variables. Default is False.

Returns:

ret – The updated variables, following the update step.
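
To illustrate the subclassing pattern, here is a rough, hypothetical sketch of a derived optimizer. It assumes the private update hook has the signature _step(self, v, grads) (as suggested by the step description above) and that the base class stores the learning rate as self._lr; check the Ivy source for the exact hook and attribute names before relying on this.

    import ivy
    from ivy.stateful.optimizers import Optimizer


    class PlainGradientDescent(Optimizer):
        """Hypothetical stateless optimizer, shown only to illustrate the pattern."""

        def __init__(self, lr=1e-4):
            super().__init__(lr)

        def _step(self, v, grads):
            # assumed hook: return the updated variables, v - lr * grads,
            # applied element-wise across the nested Container
            return v - grads * self._lr

        def set_state(self, state):
            # this toy optimizer keeps no state
            pass

        @property
        def state(self):
            return ivy.Container({})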

class ivy.stateful.optimizers.SGD(lr=0.0001, inplace=True, stop_gradients=True, compile_on_next_step=False)[source]#

Bases: Optimizer

__init__(lr=0.0001, inplace=True, stop_gradients=True, compile_on_next_step=False)[source]#

Construct a Stochastic-Gradient-Descent (SGD) optimizer.

Parameters:
  • lr (float) – Learning rate. Default is 1e-4.

  • inplace (bool) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.

  • stop_gradients (bool) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • compile_on_next_step (bool) – Whether to compile the optimizer on the next step. Default is False.
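
And a minimal SGD sketch, under the same assumptions as the earlier examples (torch backend, illustrative Containers):

    import ivy
    from ivy.stateful.optimizers import SGD

    ivy.set_backend("torch")  # assumed backend

    v = ivy.Container(w=ivy.array([1.0, 2.0, 3.0]))
    grads = ivy.Container(w=ivy.array([0.5, 0.5, 0.5]))

    # plain gradient-descent step: w <- w - lr * grad
    optimizer = SGD(lr=1e-2, inplace=False)

    v = optimizer.step(v, grads)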

set_state(state)[source]#

Set state of the optimizer.

Parameters:

state (Container) – Nested state to update.

property state#

This should have given you an overview of the optimizers submodule. If you have any questions, please feel free to reach out on our Discord in the optimizers channel or in the optimizers forum!