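Suppose I want to write a custom optimizer class that conforms to the tf.keras API (using TensorFlow version >= 2.0). I am confused about the documented way to do this versus what is actually done in implementations.

The documentation for tf.keras.optimizers.Optimizer states: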

  ### Write a customized optimizer.
  If you intend to create your own optimization algorithm, simply inherit from
  this class and override the following methods:

    - resource_apply_dense (update variable given gradient tensor is dense)
    - resource_apply_sparse (update variable given gradient tensor is sparse)
    - create_slots (if your optimizer algorithm requires additional variables)

However, the current tf.keras.optimizers.Optimizer implementation does not define a resource_apply_dense method, but it does define a private-looking _resource_apply_dense method stub. Similarly, there are no resource_apply_sparse or create_slots methods, but there are a _resource_apply_sparse method stub and a _create_slots method call.

The official tf.keras.optimizers.Optimizer subclasses (using tf.keras.optimizers.Adam as an example) have _resource_apply_dense, _resource_apply_sparse, and _create_slots methods, and no such methods without the leading underscore.

Slightly less official tf.keras.optimizers.Optimizer subclasses have similar leading-underscore methods (e.g., tfa.optimizers.MovingAverage from TensorFlow Addons: _resource_apply_dense, _resource_apply_sparse, _create_slots).

Another confusing point for me is that some of the TensorFlow Addons optimizers also override the apply_gradients method (e.g., tfa.optimizers.MovingAverage), whereas the tf.keras.optimizers optimizers do not.

Moreover, I noticed that the apply_gradients method of tf.keras.optimizers.Optimizer calls _create_slots, but the base tf.keras.optimizers.Optimizer class does not have a _create_slots method. So it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients.

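Questions

What is the correct way to subclass tf.keras.optimizers.Optimizer? Specifically:

1. Does the tf.keras.optimizers.Optimizer documentation quoted at the top simply mean that one should override the leading-underscore versions of the methods it mentions (e.g., _resource_apply_dense instead of resource_apply_dense)? If so, are there any API guarantees that these private-looking methods will not change their behavior in future versions of TensorFlow? What are the signatures of these methods?
2. When would one override apply_gradients in addition to the _resource_apply_[dense|sparse] methods?

Edit. Opened an issue on GitHub: #36449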

Answer

I have implemented Keras AdamW in all major TF and Keras versions; I invite you to look at optimizers_v2.py. A few points:

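You should inherit from OptimizerV2, which is actually what you linked; it is the latest and current base class for tf.keras optimizers.
You are correct in (1): this is a documentation mistake. The methods are private because they are not meant to be called by the user directly.
apply_gradients (or any other method) is only overridden if the default does not accomplish what a given optimizer needs. In your linked example, it is just a one-liner add-on to the original.
"So it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients": the two are unrelated; it is a coincidence.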
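
The relationship between the public apply_gradients and the underscore hooks can be sketched with a toy stand-in in plain Python. ToyOptimizer, ToySGD and ClippedSGD are hypothetical names, not TensorFlow classes; variables are plain lists of floats, and the no-op _create_slots default is an assumption of the sketch. It only mirrors the control flow described in the points above.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer base class; not TensorFlow code."""

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        # Public entry point: prepare state, then dispatch to the private hooks.
        self._create_slots([var for _, var in grads_and_vars])
        for grad, var in grads_and_vars:
            self._resource_apply_dense(grad, var)

    def _create_slots(self, var_list):
        # Assumed default for this sketch: no extra state is needed.
        pass

    def _resource_apply_dense(self, grad, var):
        # Subclasses must say how a dense gradient updates a variable.
        raise NotImplementedError()


class ToySGD(ToyOptimizer):
    """Overrides only the update hook; apply_gradients is inherited."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def _resource_apply_dense(self, grad, var):
        # Update the variable in place, element by element.
        for i, g in enumerate(grad):
            var[i] -= self.learning_rate * g


class ClippedSGD(ToySGD):
    """Overrides apply_gradients only to add one step before the default."""

    def apply_gradients(self, grads_and_vars):
        # Clip every gradient element to [-1, 1], then reuse the default logic.
        clipped = [([max(-1.0, min(1.0, g)) for g in grad], var)
                   for grad, var in grads_and_vars]
        super().apply_gradients(clipped)


w = [1.0, 2.0]
ToySGD(learning_rate=0.5).apply_gradients([([2.0, -4.0], w)])

v = [1.0, 2.0]
ClippedSGD(learning_rate=0.5).apply_gradients([([2.0, -4.0], v)])
```

ToySGD never touches apply_gradients, while ClippedSGD overrides it solely because the default does not do the extra clipping step it needs.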

What is the difference between _resource_apply_dense and _resource_apply_sparse?

The latter handles gradients from sparse layers, e.g. Embedding; the former handles dense gradients.
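
As a rough illustration of the distinction, here is a plain-Python sketch of a dense update next to a sparse one. The functions toy_apply_dense and toy_apply_sparse are hypothetical and use nested lists instead of tensors; the sparse variant takes gradient rows plus de-duplicated row indices, which is the shape of information an embedding lookup would produce.

```python
def toy_apply_dense(var, grad, lr):
    """Dense update: `grad` has one row for every row of `var`."""
    for row, grad_row in zip(var, grad):
        for j, g in enumerate(grad_row):
            row[j] -= lr * g


def toy_apply_sparse(var, grad_rows, indices, lr):
    """Sparse update: only the rows listed in `indices` are touched."""
    for grad_row, idx in zip(grad_rows, indices):
        for j, g in enumerate(grad_row):
            var[idx][j] -= lr * g


# A dense kernel: every entry receives a gradient.
kernel = [[1.0, 1.0]]
toy_apply_dense(kernel, grad=[[2.0, 4.0]], lr=0.5)

# An embedding table: only rows 0 and 2 were looked up in this step,
# so only those rows have gradients and row 1 is left alone.
embedding = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
toy_apply_sparse(embedding, grad_rows=[[1.0, 1.0], [2.0, 2.0]],
                 indices=[0, 2], lr=0.5)
```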

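When should I use _create_slots()?

When defining tf.Variables that the optimizer keeps and updates during training, for example the first- and second-order moments of the weights (as in Adam). It uses add_slot().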
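
A minimal sketch of the slot idea, assuming a momentum optimizer over plain Python lists. ToyMomentum is hypothetical; its add_slot and get_slot only imitate the names of the real methods and keep state in a dictionary keyed by variable identity, which is an assumption of this sketch rather than how TensorFlow stores slots.

```python
class ToyMomentum:
    """Hypothetical momentum optimizer with one slot per variable."""

    def __init__(self, learning_rate=0.1, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self._slots = {}  # id(var) -> {slot_name: list of floats}

    def add_slot(self, var, slot_name):
        # Create zero-initialised state shaped like `var`, once per variable.
        per_var = self._slots.setdefault(id(var), {})
        if slot_name not in per_var:
            per_var[slot_name] = [0.0] * len(var)
        return per_var[slot_name]

    def get_slot(self, var, slot_name):
        return self._slots[id(var)][slot_name]

    def _create_slots(self, var_list):
        # Declare which extra state each variable needs.
        for var in var_list:
            self.add_slot(var, "momentum")

    def _resource_apply_dense(self, grad, var):
        m = self.get_slot(var, "momentum")
        for i, g in enumerate(grad):
            m[i] = self.momentum * m[i] + g      # accumulate velocity
            var[i] -= self.learning_rate * m[i]  # step along the velocity

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        self._create_slots([var for _, var in grads_and_vars])
        for grad, var in grads_and_vars:
            self._resource_apply_dense(grad, var)


w = [1.0]
opt = ToyMomentum(learning_rate=0.5, momentum=0.5)
opt.apply_gradients([([1.0], w)])  # slot becomes 1.0, w becomes 0.5
opt.apply_gradients([([1.0], w)])  # slot becomes 1.5, w becomes -0.25
```

The slot survives between calls, which is what distinguishes it from a local value computed inside the update method.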

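When should I use _set_hyper()?

Pretty much whenever you are not using _create_slots(). It is like setting class attributes, but with extra preprocessing steps to ensure correct usage. It accepts Python int, float, tf.Tensor, tf.Variable, and others. (I should have used it more in Keras AdamW.)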
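
The "class attribute plus preprocessing" idea can be sketched in plain Python as follows. ToyHyperStore is hypothetical and much simpler than the real mechanism: it accepts only numbers or callables, and the step argument of its _get_hyper is an assumption made for this sketch, not part of the TensorFlow signature.

```python
class ToyHyperStore:
    """Hypothetical hyperparameter store with validation and normalisation."""

    def __init__(self):
        self._hyper = {}

    def _set_hyper(self, name, value):
        # Validate once at set time, so bad values fail early.
        if not (callable(value) or isinstance(value, (int, float))):
            raise TypeError(f"unsupported hyperparameter type for {name!r}")
        self._hyper[name] = value

    def _get_hyper(self, name, step=0):
        value = self._hyper[name]
        # A callable is treated as a schedule and resolved at read time.
        if callable(value):
            value = value(step)
        # Normalise to float so update code sees one consistent type.
        return float(value)


store = ToyHyperStore()
store._set_hyper("learning_rate", 1)                          # a Python int
store._set_hyper("decayed", lambda step: 1.0 / (1 + step))    # a schedule

lr = store._get_hyper("learning_rate")
decayed = store._get_hyper("decayed", step=1)
```

Compared with a bare attribute, the caller can pass several kinds of value and the update code still reads one normalised type.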

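Note: while my linked optimizers work correctly and are about as fast as the originals, and the code follows TensorFlow best practices, it could still be faster; I do not recommend it as the "ideal reference." For example, some Python objects (e.g. int) should be tensors; eta_t is defined as a tf.Variable, but is immediately overridden as a tf.Tensor in the _apply methods. Not necessarily a big deal, I just have not had time to refactor it.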

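Answer

Yes, this looks like a documentation bug. The names with the leading underscore are the correct methods to override. Related is the non-Keras Optimizer, which defines all of these in its base class, mostly as unimplemented stubs: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/optimizer.py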

  def _create_slots(self, var_list):
    """Create all slots needed by the variables.
    Args:
      var_list: A list of `Variable` objects.
    """
    # No slots needed by default
    pass

  def _resource_apply_dense(self, grad, handle):
    """Add ops to apply dense gradients to the variable `handle`.
    Args:
      grad: a `Tensor` representing the gradient.
      handle: a `Tensor` of dtype `resource` which points to the variable
       to be updated.
    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()

  def _resource_apply_sparse(self, grad, handle, indices):
    """Add ops to apply sparse gradients to the variable `handle`.
    Similar to `_apply_sparse`, the `indices` argument to this method has been
    de-duplicated. Optimizers which deal correctly with non-unique indices may
    instead override `_resource_apply_sparse_duplicate_indices` to avoid this
    overhead.
    Args:
      grad: a `Tensor` representing the gradient for the affected indices.
      handle: a `Tensor` of dtype `resource` which points to the variable
       to be updated.
      indices: a `Tensor` of integral type representing the indices for
       which the gradient is nonzero. Indices are unique.
    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()

I don't know about apply_dense. For one thing, if you do override it, the code mentions that a per-replica DistributionStrategy could be "dangerous":

    # TODO(isaprykin): When using a DistributionStrategy, and when an
    # optimizer is created in each replica, it might be dangerous to
    # rely on some Optimizer methods.  When such methods are called on a
    # per-replica optimizer, an exception needs to be thrown.  We do
    # allow creation per-replica optimizers however, because the
    # compute_gradients()->apply_gradients() sequence is safe.