Huber loss (repost)
Published: 2019-06-20


Original article: https://en.wikipedia.org/wiki/Huber_loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

Definition

 
Figure: Huber loss (green, δ = 1) and squared error loss (blue) as a function of y − f(x).

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by

$$L_{\delta}(a) = \begin{cases} \frac{1}{2}a^{2} & \text{for } |a| \leq \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where |a| = δ. The variable a often refers to the residuals, that is, to the difference between the observed and predicted values, a = y − f(x), so the former can be expanded to

$$L_{\delta}(y, f(x)) = \begin{cases} \frac{1}{2}\left(y - f(x)\right)^{2} & \text{for } |y - f(x)| \leq \delta, \\ \delta\,|y - f(x)| - \frac{1}{2}\delta^{2} & \text{otherwise.} \end{cases}$$
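The piecewise definition above can be translated directly into code. Here is a minimal NumPy sketch (the function name `huber_loss` and the vectorized form are illustrative, not part of the original article):

```python
import numpy as np

def huber_loss(y, f_x, delta=1.0):
    """Huber loss: quadratic where |residual| <= delta, linear otherwise."""
    a = y - f_x                                   # residual a = y - f(x)
    quadratic = 0.5 * a**2                        # branch for |a| <= delta
    linear = delta * (np.abs(a) - 0.5 * delta)    # branch for |a| > delta
    return np.where(np.abs(a) <= delta, quadratic, linear)

# A small residual (0.5) is penalized quadratically (0.5 * 0.25 = 0.125),
# a large one (3.0) only linearly (1.0 * (3.0 - 0.5) = 2.5):
losses = huber_loss(np.array([0.5, 3.0]), np.array([0.0, 0.0]), delta=1.0)
print(losses)
```

Note how the linear branch grows only proportionally to |a|, which is exactly what limits the influence of outliers.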

Motivation

Two very commonly used loss functions are the squared loss, L(a) = a², and the absolute loss, L(a) = |a|. The squared loss function results in a mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric-median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of a's (as in ∑ᵢ L(aᵢ)), the sample mean is influenced too much by a few particularly large a-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum a = 0; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points a = −δ and a = δ. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute-value function).
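The matching values and slopes at the junction points can be checked numerically. The following is a small sketch using one-sided finite differences (the function and variable names are illustrative):

```python
def huber(a, delta=1.5):
    """Scalar Huber loss with threshold delta."""
    if abs(a) <= delta:
        return 0.5 * a**2
    return delta * (abs(a) - 0.5 * delta)

delta, h = 1.5, 1e-6
# The two branches meet continuously at a = delta ...
assert abs(huber(delta - h) - huber(delta + h)) < 1e-5
# ... and the one-sided slopes both equal delta there:
left = (huber(delta) - huber(delta - h)) / h
right = (huber(delta + h) - huber(delta)) / h
print(left, right)  # both close to delta = 1.5
```

The same check at a = −δ succeeds by symmetry, since the loss depends only on |a|.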

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function, and ensures that derivatives are continuous for all degrees. It is defined as

$$L_{\delta}(a) = \delta^{2}\left(\sqrt{1 + (a/\delta)^{2}} - 1\right).$$

As such, this function approximates a²/2 for small values of a, and approximates a straight line with slope δ for large values of a.
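Both limiting behaviours are easy to confirm numerically. A short sketch (the function name `pseudo_huber` is illustrative):

```python
import math

def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss."""
    return delta**2 * (math.sqrt(1.0 + (a / delta)**2) - 1.0)

# Near zero it tracks a**2 / 2:
print(pseudo_huber(0.01), 0.01**2 / 2)
# Far from zero it is nearly linear with slope delta (= 1 here):
print(pseudo_huber(101.0) - pseudo_huber(100.0))  # close to 1.0
```

Unlike the piecewise Huber loss, this expression has no branch, which also makes it convenient for automatic differentiation.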

While the above is the most common form, other smooth approximations of the Huber loss function also exist.

Variant for classification

For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction f(x) (a real-valued classifier score) and a true binary class label y ∈ {+1, −1}, the modified Huber loss is defined as

$$L(y, f(x)) = \begin{cases} \max(0, 1 - y\,f(x))^{2} & \text{for } y\,f(x) \geq -1, \\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$

The term max(0, 1 − y f(x)) is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of L.
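A direct scalar implementation of this classification variant might look as follows (the function name `modified_huber` is illustrative):

```python
def modified_huber(y, f_x):
    """Modified Huber loss for a label y in {+1, -1} and score f(x)."""
    margin = y * f_x                         # the margin y * f(x)
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2   # squared-hinge region
    return -4.0 * margin                     # linear region for large mistakes

print(modified_huber(+1, 2.0))   # confident correct prediction: 0.0
print(modified_huber(+1, -3.0))  # confident wrong prediction: 12.0
```

As with the regression loss, badly misclassified points (margin below −1) are penalized only linearly, so individual outliers dominate the total loss less than under a purely squared hinge.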

Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.


References

  1. Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". Annals of Mathematical Statistics. 35 (1): 73–101.
  2. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. p. 349. Compared to Hastie et al., the loss is scaled by a factor of ½, to be consistent with Huber's original definition given earlier.
  3. Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; Barlaud, M. (1997). "Deterministic edge-preserving regularization in computed imaging". IEEE Trans. Image Processing. 6 (2): 298–311.
  4. Hartley, R.; Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. p. 619.
  5. Lange, K. (1990). "Convergence of Image Reconstruction Algorithms with Gibbs Smoothing". IEEE Trans. Medical Imaging. 9 (4): 439–446.
  6. Zhang, Tong (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.
  7. Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". Annals of Statistics. 29 (5): 1189–1232.

 
