[python] How to do Xavier initialization in TensorFlow

Question 1

I'm porting my Caffe network to TensorFlow, but it doesn't seem to have Xavier initialization. I'm using truncated_normal, but this seems to make training a lot harder.

Question 2

In TensorFlow 2.0 and later, both tf.contrib.* and tf.get_variable() are deprecated. To do Xavier initialization, you now have to switch to:

import tensorflow as tf

shape = (784, 256)  # example shape, replace with your own
init = tf.initializers.GlorotUniform()
var = tf.Variable(init(shape=shape))
# or as a one-liner (note the two pairs of parentheses)
var = tf.Variable(tf.initializers.GlorotUniform()(shape=shape))

Glorot uniform and Xavier uniform are two different names for the same initialization type. If you want to know more about how to use initializers in TF 2.0, with or without Keras, refer to the documentation.
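As a rough illustration of what a Glorot/Xavier uniform initializer computes, the sketch below reproduces the sampling rule in plain Python, without TensorFlow. The function name `glorot_uniform_matrix` and the layer sizes are assumptions made for this illustration; the rule itself (uniform on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))) is the one from the Glorot and Bengio paper.

```python
import math
import random


def glorot_uniform_matrix(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out list of lists sampled from U(-limit, limit).

    Hypothetical helper for illustration only, not a TensorFlow API.
    """
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Example sizes (assumed): a 784 -> 256 dense layer.
weights = glorot_uniform_matrix(784, 256, seed=0)
limit = math.sqrt(6.0 / (784 + 256))
flat = [w for row in weights for w in row]
print("limit:", limit)
print("min/max sampled weight:", min(flat), max(flat))
```

Every sampled weight should fall inside [-limit, limit], which is the property a framework initializer with the same rule would also satisfy.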

Question 3

Since version 0.8 there is a Xavier initializer. See the documentation for details.

You can use it like this:

W = tf.get_variable("W", shape=[784, 256],
           initializer=tf.contrib.layers.xavier_initializer())

Question 4

Just to add another example of how to define a tf.Variable initialized using Xavier and Yoshua's method:

graph = tf.Graph()
with graph.as_default():
    ...
    initializer = tf.contrib.layers.xavier_initializer()
    w1 = tf.Variable(initializer(w1_shape))
    b1 = tf.Variable(initializer(b1_shape))
    ...

This prevented me from getting nan values in my loss function, which were caused by numerical instabilities when using multiple layers with ReLUs.

Question 5

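@Aleph7, Xavier/Glorot initialization depends on the number of incoming connections (fan_in), the number of outgoing connections (fan_out), and the kind of activation function (sigmoid or tanh) of the neuron. See: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

And now, to your question. This is how I would do it in TensorFlow: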

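import numpy as np
import tensorflow as tf

def xavier_variable(shape):  # wrapper name added here; the original snippet omitted the def line
    (fan_in, fan_out) = ...  # elided in the original answer
    low = -4*np.sqrt(6.0/(fan_in + fan_out))  # use 4 for sigmoid, 1 for tanh activation
    high = 4*np.sqrt(6.0/(fan_in + fan_out))
    return tf.Variable(tf.random_uniform(shape, minval=low, maxval=high, dtype=tf.float32))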

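Note that you should be sampling from a uniform distribution, not the normal distribution suggested in the other answer.

Incidentally, I wrote a post yesterday about something different using TensorFlow that happens to also use Xavier initialization. If you're interested, there's also a Python notebook with an end-to-end example: https://github.com/delip/blog-stuff/blob/master/tensorflow_ufp.ipynb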
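The snippet above leaves the computation of fan_in and fan_out open. One possible way to derive them from a weight shape is sketched below in plain Python. The helper names `compute_fans` and `xavier_uniform_bounds` are hypothetical, and the shape layout (inputs, outputs) for dense weights or (kernel dims..., in_channels, out_channels) for convolution kernels is an assumption of this sketch, not something the answer specifies.

```python
import math


def compute_fans(shape):
    """Return (fan_in, fan_out) for a weight shape.

    Assumed layout: (inputs, outputs) for a dense layer, or
    (*kernel_dims, in_channels, out_channels) for a convolution kernel.
    """
    if len(shape) < 2:
        raise ValueError("need at least 2 dimensions to define fan_in/fan_out")
    receptive_field = 1
    for dim in shape[:-2]:
        receptive_field *= dim
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return fan_in, fan_out


def xavier_uniform_bounds(shape, scale=1.0):
    """Return (low, high); scale=4 for sigmoid, 1 for tanh, as in the answer."""
    fan_in, fan_out = compute_fans(shape)
    bound = scale * math.sqrt(6.0 / (fan_in + fan_out))
    return -bound, bound


print(compute_fans((784, 256)))
print(compute_fans((3, 3, 64, 128)))
print(xavier_uniform_bounds((784, 256), scale=4.0))
```

The returned bounds could then be passed as minval and maxval to a uniform sampler such as the tf.random_uniform call shown above.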

Question 6

A nice wrapper around TensorFlow called prettytensor provides an implementation in its source code (copied directly from there):

import math
import tensorflow as tf

def xavier_init(n_inputs, n_outputs, uniform=True):
  """Set the parameter initialization using the method described.
  This method is designed to keep the scale of the gradients roughly the same
  in all layers.
  Xavier Glorot and Yoshua Bengio (2010):
           Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
           statistics.
  Args:
    n_inputs: The number of input nodes into each output.
    n_outputs: The number of output nodes for each input.
    uniform: If true use a uniform distribution, otherwise use a normal.
  Returns:
    An initializer.
  """
  if uniform:
    # 6 was used in the paper.
    init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
    return tf.random_uniform_initializer(-init_range, init_range)
  else:
    # 3 gives us approximately the same limits as above since this repicks
    # values greater than 2 standard deviations from the mean.
    stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
    return tf.truncated_normal_initializer(stddev=stddev)
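For readers wondering where the constant 6 in the uniform branch comes from, a short derivation may help. It assumes the target weight variance from the Glorot and Bengio paper, 2 / (n_in + n_out), and uses the standard variance of a uniform distribution; the symbol a for the half-width of the interval is introduced here only for illustration.

```latex
\[
X \sim \mathcal{U}(-a, a)
\;\Rightarrow\;
\operatorname{Var}(X) = \frac{(2a)^2}{12} = \frac{a^2}{3}
\]
\[
a = \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}
\;\Rightarrow\;
\operatorname{Var}(X) = \frac{2}{n_{\text{in}} + n_{\text{out}}}
\]
```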

Question 7

tf.contrib has xavier_initializer. Here is an example of how to use it:

import tensorflow as tf
a = tf.get_variable("a", shape=[4, 4], initializer=tf.contrib.layers.xavier_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))

In addition to this, TensorFlow has other initializers.

Question 8

I looked and couldn't find anything built in. However, according to this:

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

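Xavier initialization is just sampling a (usually Gaussian) distribution where the variance is a function of the number of neurons. tf.random_normal can do that for you; you just need to compute the stddev (i.e. from the number of neurons represented by the weight matrix you're trying to initialize).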
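A minimal sketch of the Gaussian variant follows, in plain Python rather than TensorFlow. It assumes the variance 2 / (fan_in + fan_out) from the Glorot and Bengio paper (the linked post discusses a simpler variant based on the input count only), and the function names `xavier_normal_stddev` and `xavier_normal_matrix` are hypothetical.

```python
import math
import random


def xavier_normal_stddev(fan_in, fan_out):
    """Standard deviation for a zero-mean Gaussian Xavier/Glorot init.

    Assumes the paper's target variance 2 / (fan_in + fan_out).
    """
    return math.sqrt(2.0 / (fan_in + fan_out))


def xavier_normal_matrix(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out list of lists sampled from N(0, stddev^2)."""
    rng = random.Random(seed)
    stddev = xavier_normal_stddev(fan_in, fan_out)
    return [[rng.gauss(0.0, stddev) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Example sizes (assumed): a 784 -> 256 dense layer.
stddev = xavier_normal_stddev(784, 256)
weights = xavier_normal_matrix(784, 256, seed=0)
print("stddev:", stddev)
```

In TensorFlow 1.x, the computed value is what one would pass as the stddev argument of tf.random_normal.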