A Note About Gradients in Classification Problems

For the gradient boosting packages (e.g. XGBoost, LightGBM), a custom objective must supply the gradient and Hessian of the loss function with respect to the margin, i.e. the raw model output before the sigmoid is applied.

In this case, with the binary cross-entropy loss $L = -\left[\,y \log\hat{y} + (1 - y)\log(1 - \hat{y})\,\right]$, the gradient with respect to the margin $z$ is

$$\frac{\partial L}{\partial z} = \hat{y} - y$$

The Hessian is calculated similarly:

$$\frac{\partial^2 L}{\partial z^2} = \hat{y}\,(1 - \hat{y})$$

where $\hat{y}$ is, unless stated otherwise, the sigmoid of the margin:

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$

We will make use of the following property of the sigmoid in the calculations of the gradients and Hessians:

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

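As a quick sanity check (a sketch, not part of the derivation above), we can verify numerically that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and that the gradient of the log loss with respect to the margin is $\hat{y} - y$, using a central finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss(z, y):
    """Binary cross-entropy as a function of the margin z."""
    yhat = sigmoid(z)
    return -(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

z, y, eps = 0.7, 1.0, 1e-6

# Central finite-difference check of sigma'(z) = sigma(z) * (1 - sigma(z))
fd_sigma = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
assert abs(fd_sigma - sigmoid(z) * (1 - sigmoid(z))) < 1e-8

# Central finite-difference check of dL/dz = yhat - y
fd_grad = (logloss(z + eps, y) - logloss(z - eps, y)) / (2 * eps)
assert abs(fd_grad - (sigmoid(z) - y)) < 1e-8
```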
Note that, to avoid divide-by-zero errors, we clip the output of the sigmoid so that it is bounded by $10^{-15}$ from below and $1 - 10^{-15}$ from above.
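Putting the pieces together, here is a minimal sketch of a custom objective in the `(grad, hess)` form that packages such as XGBoost and LightGBM accept; the function name `logloss_objective` and the vectorized signature are illustrative assumptions, but the clipping bound of $10^{-15}$ matches the note above:

```python
import numpy as np

EPS = 1e-15  # clipping bound described in the note above

def clipped_sigmoid(z):
    """Sigmoid clipped to [1e-15, 1 - 1e-15] to avoid log(0) / divide-by-zero."""
    return np.clip(1.0 / (1.0 + np.exp(-z)), EPS, 1.0 - EPS)

def logloss_objective(margins, labels):
    """Gradient and Hessian of binary cross-entropy w.r.t. the margins."""
    yhat = clipped_sigmoid(margins)
    grad = yhat - labels           # dL/dz
    hess = yhat * (1.0 - yhat)     # d^2L/dz^2
    return grad, hess
```

The per-example `grad` and `hess` vectors are exactly what a custom objective hands back to the booster at each iteration; at a margin of $0$ with label $1$, for instance, the gradient is $-0.5$ and the Hessian is $0.25$.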