http://en.wikipedia.org/wiki/Subgradient_method
Classical subgradient rules

Let f:\mathbb{R}^n \to \mathbb{R} be a convex function with domain \mathbb{R}^n. A classical subgradient method iterates

x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)},

where g^{(k)} denotes a subgradient of f at x^{(k)}. If f is differentiable, then its only subgradient is the gradient vector \nabla f itself. It may happen that -g^{(k)} is not a descent direction for f at x^{(k)}. We therefore maintain a value f_{\mathrm{best}} that keeps track of the lowest objective function value found so far, i.e.

f_{\mathrm{best}}^{(k)} = \min\{ f_{\mathrm{best}}^{(k-1)}, f(x^{(k)}) \}.
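A minimal sketch of these iterations in Python (the diminishing step size \alpha_k = 1/(k+1), the subgradient oracle `subgrad`, and the test function |x - 3| are illustrative assumptions, not prescribed above):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, num_iters=1000):
    """Classical subgradient method with a diminishing step size
    alpha_k = 1 / (k + 1) (an assumed choice; the rule above leaves alpha_k open)."""
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(num_iters):
        g = subgrad(x)                   # any subgradient g^{(k)} of f at x^{(k)}
        x = x - (1.0 / (k + 1)) * g      # x^{(k+1)} = x^{(k)} - alpha_k g^{(k)}
        fx = f(x)
        if fx < f_best:                  # f_best^{(k)} = min{f_best^{(k-1)}, f(x^{(k)})}
            f_best, x_best = fx, x.copy()
    return x_best, f_best

# Example: minimize f(x) = |x - 3|, which is nondifferentiable at its minimizer.
f = lambda x: abs(x[0] - 3.0)
subgrad = lambda x: np.array([np.sign(x[0] - 3.0)])  # 0 at the kink is a valid subgradient
print(subgradient_method(f, subgrad, x0=[0.0]))
```

Note that the iterate x^{(k)} itself need not improve at every step (since -g^{(k)} may not be a descent direction), which is exactly why the best value seen so far is tracked separately.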
(Figure from: http://www.stanford.edu/class/ee364b/notes/subgradients_notes.pdf)
Example 2: the SVM cost function is the hinge loss, whose derivative does not exist at the kink (1, 0); the subgradient can be any value between the two one-sided derivatives (for max(0, 1 - t) at t = 1, any value in [-1, 0]), but how exactly should it be chosen? Mingming Gong said that either this pdf or
http://www.stanford.edu/class/ee364b/lectures/subgrad_method_slides.pdf covers this. Mingming Gong asked Tianyi which is better, the subgradient method or a smooth approximation. The conclusion: it depends. The subgradient method solves the original problem, while the smoothed version does not; one approximates the gradient, the other approximates the function, so it is hard to say which is better. A sketch of the subgradient choice is given below.
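A small sketch of choosing a subgradient at the hinge-loss kink (returning 0 at t = 1 is just one assumed convention; any value in [-1, 0] is equally valid there):

```python
def hinge_loss(t):
    """Hinge loss max(0, 1 - t), nondifferentiable at t = 1 (the point (1, 0))."""
    return max(0.0, 1.0 - t)

def hinge_subgradient(t):
    """A subgradient of max(0, 1 - t).
    For t < 1 the derivative is -1, for t > 1 it is 0;
    at t = 1 the subdifferential is the whole interval [-1, 0]."""
    if t < 1.0:
        return -1.0
    if t > 1.0:
        return 0.0
    return 0.0   # assumed convention at the kink; any value in [-1, 0] works

print(hinge_loss(1.0), hinge_subgradient(1.0))
```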