Optimization

<aside> 💡 Continuing from last time, let's look at some fancier ✨ optimization methods.

</aside>

Stochastic Gradient Descent (SGD)

while True:
	dx = compute_gradient(x)  # gradient of the loss at the current x
	x -= learning_rate * dx   # step downhill, so subtract the gradient
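To ground the pseudocode, here is a minimal runnable sketch on a toy 1-D problem; the loss f(x) = x**2, the starting point, and the step count are illustrative assumptions, not from the notes:

# Toy example: minimize f(x) = x**2, whose gradient is 2*x.
def compute_gradient(x):
	return 2 * x

x = 5.0
learning_rate = 0.1
for _ in range(50):  # finite loop in place of `while True`
	x -= learning_rate * compute_gradient(x)
print(x)  # ~0.0, the minimizer of f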

SGD + Momentum

๊ธฐ์šธ๊ธฐ๊ฐ€ 0์ด์–ด๋„ update๋ฅผ ๊ณ„์†ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•

vx = 0  # the velocity always starts at 0
while True:
	dx = compute_gradient(x)
	vx = rho * vx + dx       # rho is a friction coefficient (typically ~0.9) that decays the velocity so it cannot change too quickly
	x -= learning_rate * vx  # step along the velocity instead of the raw gradient
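A quick runnable check of the claim above (the numbers are illustrative assumptions): at a point where the gradient is exactly 0, the velocity built up from earlier steps still produces a nonzero update, whereas plain SGD would stop.

# Momentum keeps moving where plain SGD would stall.
rho, learning_rate = 0.9, 0.1
vx = 2.0                 # velocity built up from previous steps
dx = 0.0                 # the gradient vanishes here (e.g. a saddle point)
vx = rho * vx + dx       # vx = 1.8: the velocity decays but survives
x = 5.0
x -= learning_rate * vx  # x moves to 4.82; plain SGD would leave x at 5.0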

Nesterov momentum

momentum ๋ฐฉ์‹์„ ์•ฝ๊ฐ„ ๋” ์—…๊ทธ๋ ˆ์ด๋“œ ํ•œ ๋ฒ„์ „

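The notes do not include code here, so the following is a sketch of the standard Nesterov formulation (not copied from the notes): instead of evaluating the gradient at the current position x, evaluate it at the look-ahead point x + rho * vx, where the velocity is about to carry you, so the velocity gets corrected one step earlier. Note that in this common form the velocity absorbs the learning rate, unlike the momentum block above.

vx = 0
while True:
	dx_ahead = compute_gradient(x + rho * vx)  # gradient at the look-ahead point, not at x
	vx = rho * vx - learning_rate * dx_ahead   # velocity update (learning rate folded in)
	x += vx                                    # step by the updated velocity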