Writing custom loss functions in Keras

The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.

Available losses

Note that all losses are available both via a class handle and via a function handle. The class handles enable you to pass configuration arguments to the constructor (e.g. loss_fn = CategoricalCrossentropy(from_logits=True)), and they perform reduction by default when used in a standalone way (see details below).

Probabilistic losses

  • BinaryCrossentropy class
  • BinaryFocalCrossentropy class
  • CategoricalCrossentropy class
  • CategoricalFocalCrossentropy class
  • SparseCategoricalCrossentropy class
  • Poisson class
  • binary_crossentropy function
  • categorical_crossentropy function
  • sparse_categorical_crossentropy function
  • poisson function
  • KLDivergence class
  • kl_divergence function

Regression losses

  • MeanSquaredError class
  • MeanAbsoluteError class
  • MeanAbsolutePercentageError class
  • MeanSquaredLogarithmicError class
  • CosineSimilarity class
  • mean_squared_error function
  • mean_absolute_error function
  • mean_absolute_percentage_error function
  • mean_squared_logarithmic_error function
  • cosine_similarity function
  • Huber class
  • huber function
  • LogCosh class
  • log_cosh function

Hinge losses for "maximum-margin" classification

  • Hinge class
  • SquaredHinge class
  • CategoricalHinge class
  • hinge function
  • squared_hinge function
  • categorical_hinge function

Usage of losses with compile() & fit()

A loss function is one of the two arguments required for compiling a Keras model (the other is an optimizer):
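As a minimal sketch of this step, assuming Keras 3 (imported as `keras`) and a hypothetical toy classifier with 10 input features and 64 classes, a loss class instance can be passed to compile() together with an optimizer:

```python
import keras
from keras import layers

# Hypothetical toy model: 10 input features, 64 output classes.
model = keras.Sequential(
    [
        keras.Input(shape=(10,)),
        layers.Dense(64),
        layers.Activation("softmax"),
    ]
)

# The loss is given as a class instance, the optimizer by its string name.
loss_fn = keras.losses.SparseCategoricalCrossentropy()
model.compile(loss=loss_fn, optimizer="adam")
```

The layer sizes are illustrative only; any model can be compiled the same way.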

All built-in loss functions may also be passed via their string identifier:
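A short sketch of the string form, again assuming Keras 3 and a hypothetical toy model:

```python
import keras
from keras import layers

# Hypothetical toy model: 10 input features, 64 output classes.
model = keras.Sequential(
    [keras.Input(shape=(10,)), layers.Dense(64, activation="softmax")]
)

# The built-in loss is looked up by its string identifier.
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

The string form uses the default configuration of the loss, so it suits cases where no constructor arguments are needed.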

Loss functions are typically created by instantiating a loss class (e.g. keras.losses.SparseCategoricalCrossentropy). All losses are also provided as function handles (e.g. keras.losses.sparse_categorical_crossentropy).

Using classes enables you to pass configuration arguments at instantiation time, e.g.:

Standalone usage of losses

A loss is a callable with arguments loss_fn(y_true, y_pred, sample_weight=None):

  • y_true: Ground truth values, of shape (batch_size, d0, ... dN). For sparse loss functions, such as sparse categorical crossentropy, the shape should be (batch_size, d0, ... dN-1).
  • y_pred: The predicted values, of shape (batch_size, d0, ... dN).
  • sample_weight: Optional. Acts as a reduction weighting coefficient for the per-sample losses. If a scalar is provided, the loss is simply scaled by that value. If sample_weight is a tensor of shape (batch_size,), the total loss for each sample in the batch is rescaled by the corresponding element of sample_weight. If the shape of sample_weight is (batch_size, d0, ... dN-1) (or can be broadcast to this shape), each loss element of y_pred is scaled by the corresponding value of sample_weight. (Note on dN-1: all loss functions reduce by 1 dimension, usually axis=-1.)

By default, loss functions return one scalar loss value per input sample, e.g.:

However, loss class instances feature a reduction constructor argument, which defaults to "sum_over_batch_size" (i.e. average). Allowable values are "sum_over_batch_size", "sum", and "none":

  • "sum_over_batch_size" means the loss instance will return the average of the per-sample losses in the batch.
  • "sum" means the loss instance will return the sum of the per-sample losses in the batch.
  • "none" means the loss instance will return the full array of per-sample losses.

Note that this is an important difference between loss functions like keras.losses.mean_squared_error and default loss class instances like keras.losses.MeanSquaredError: the function version does not perform reduction, but by default the class instance does.

When using fit(), this difference is irrelevant since reduction is handled by the framework.

Here's how you would use a loss class instance as part of a simple training loop:
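The loop below is a minimal sketch, not a definitive recipe. It assumes the TensorFlow backend (gradients are computed with tf.GradientTape), and it uses a hypothetical random dataset and a one-layer model that outputs logits.

```python
import os

# Assumption: TensorFlow backend. Must be set before Keras is imported.
os.environ["KERAS_BACKEND"] = "tensorflow"

import numpy as np
import tensorflow as tf
import keras
from keras import layers

# Hypothetical toy data: 32 samples, 10 features, 3 one-hot encoded classes.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(32, 10)).astype("float32")
y_train = np.eye(3, dtype="float32")[rng.integers(0, 3, size=32)]
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(8)

model = keras.Sequential([keras.Input(shape=(10,)), layers.Dense(3)])  # outputs logits
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam()

# Iterate over the batches of the dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        # The class instance reduces to a single scalar for this batch.
        loss_value = loss_fn(y, logits)

    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
```

Because the class instance already averages over the batch, loss_value is a scalar and can be differentiated directly.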

Creating custom losses

Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.

Here's a simple example:

The add_loss() API

Loss functions applied to the output of a model aren't the only way to create losses.

When writing the call method of a custom layer or a subclassed model, you may want to compute scalar quantities that you want to minimize during training (e.g. regularization losses). You can use the add_loss() layer method to keep track of such loss terms.

Here's an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:

Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):

These losses are cleared by the top-level layer at the start of each forward pass, so they don't accumulate: layer.losses always contains only the losses created during the last forward pass. When writing a training loop, you would typically sum these losses before computing your gradients.

When using model.fit(), such loss terms are handled automatically.

When writing a custom training loop, you should retrieve these terms by hand from model.losses, like this:

See the add_loss() documentation for more details.
