operators

The operators module in this framework provides a collection of tensor operations for building computational graphs in deep learning. Each class in this module represents one type of operation that can be performed on tensors, such as element-wise addition, scalar multiplication, division, or exponentiation.

Note about the `out_grad` parameter

During backpropagation in a neural network, we compute gradients starting from the output layer and propagate them back towards the input layer. The key idea here is that each layer receives the gradient of the loss with respect to its output (let’s call this out_grad), and it needs to compute and pass back the gradient of the loss with respect to its input (let’s call this in_grad). This is needed so that the parameters of each layer can be updated correctly during gradient descent.

The out_grad parameter refers to the gradient of the loss function with respect to the output of the node. Multiplying this with the local gradient gives the gradient of the loss with respect to the input to the node, according to the chain rule of calculus, which is the basis for backpropagation in neural networks.
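As a minimal sketch of this rule, consider a node that multiplies its input by a fixed scalar. The class name `MulScalarSketch` and its `compute`/`gradient` methods are hypothetical stand-ins chosen for illustration; they are not taken from this module and may differ from the framework's actual interface.

```python
# Hypothetical stand-in for a scalar-multiplication node (illustration only,
# not the framework's actual class or method signatures).
class MulScalarSketch:
    def __init__(self, scalar):
        self.scalar = scalar

    def compute(self, x):
        # Forward pass: y = scalar * x
        return self.scalar * x

    def gradient(self, out_grad, x):
        # The local gradient dy/dx equals the scalar, so the chain rule gives
        # dL/dx = dL/dy * dy/dx = out_grad * scalar.
        return out_grad * self.scalar


op = MulScalarSketch(3.0)
y = op.compute(2.0)               # forward value
in_grad = op.gradient(0.5, 2.0)   # out_grad = dL/dy = 0.5
print(y, in_grad)
```

Here the node never needs to know what the loss is: `out_grad` carries everything it needs from the layers downstream.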

The chain rule is a fundamental concept in calculus that provides a method to compute the derivative of composite functions. In simple terms, the chain rule states that the derivative of a composite function is the derivative of the outer function, evaluated at the value of the inner function, multiplied by the derivative of the inner function.

Given a composite function that is the composition of two functions, say, \(f(g(x))\), the chain rule can be stated as follows:

\[\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}\]

Where:

\(\frac{df}{dx}\) is the derivative of the composite function \(f(g(x))\) with respect to \(x\),
\(\frac{df}{dg}\) is the derivative of the outer function \(f\) with respect to its argument \(g(x)\), and
\(\frac{dg}{dx}\) is the derivative of the inner function \(g(x)\) with respect to \(x\).

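The chain rule extends to compositions of more than two functions: each additional function contributes one more factor. For example, for \(f(g(h(x)))\):

\[\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dh} \cdot \frac{dh}{dx}\]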
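The chain rule for \(f(g(x))\) can be checked numerically. The sketch below picks \(g(x) = x^2\) and \(f(u) = \sin(u)\) purely as an illustrative assumption, computes the derivative with the chain rule, and compares it with a central finite difference.

```python
import math


def g(x):
    # Inner function (chosen for illustration): g(x) = x^2
    return x * x


def f(u):
    # Outer function (chosen for illustration): f(u) = sin(u)
    return math.sin(u)


def chain_rule_derivative(x):
    df_dg = math.cos(g(x))  # derivative of the outer function, evaluated at g(x)
    dg_dx = 2.0 * x         # derivative of the inner function
    return df_dg * dg_dx


def numerical_derivative(x, eps=1e-6):
    # Central finite difference of the composite function f(g(x))
    return (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)


x = 1.5
analytic = chain_rule_derivative(x)
numeric = numerical_derivative(x)
print(analytic, numeric)
```

The two values should agree to several decimal places; the small remaining difference comes from the finite-difference approximation.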

Element Wise Addition

Let’s walk through the step-by-step derivative calculation for the EWiseAdd operation:

We have the function f(a, b) = a + b, where a and b are tensors. Our goal is to compute the partial derivatives with respect to a and b.

Step 1: Compute the derivative of f with respect to a.

\[\frac{\partial f}{\partial a} = \frac{\partial}{\partial a} (a + b)\]

Since a is the variable we are differentiating with respect to, the derivative of a with respect to itself is 1, while b is held constant and contributes 0:

\[\frac{\partial f}{\partial a} = 1 + 0 = 1\]

Step 2: Compute the derivative of f with respect to b.

\[\frac{\partial f}{\partial b} = \frac{\partial}{\partial b} (a + b)\]

Again, since b is the variable we are differentiating with respect to, the derivative of b with respect to itself is 1, while a is held constant and contributes 0:

\[\frac{\partial f}{\partial b} = 0 + 1 = 1\]

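Hence, the partial derivatives of f(a, b) = a + b with respect to a and b are both equal to 1 for every element. By the chain rule, multiplying out_grad by these local gradients leaves it unchanged, so each input of the addition receives out_grad as its gradient.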
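The result of this derivation can be sketched in plain Python, with lists standing in for tensors. The function names `ewise_add` and `ewise_add_gradient` and the weighted-sum loss are assumptions made for illustration; they are not the framework's EWiseAdd implementation.

```python
def ewise_add(a, b):
    # Forward pass: element-wise sum of two equally sized lists
    return [x + y for x, y in zip(a, b)]


def ewise_add_gradient(out_grad):
    # Both local gradients are 1, so each input receives out_grad unchanged
    return list(out_grad), list(out_grad)


def loss(a, b, weights):
    # Illustrative loss: weighted sum of the outputs, so dL/df equals weights
    return sum(w * s for w, s in zip(weights, ewise_add(a, b)))


a = [1.0, 2.0, 3.0]
b = [0.5, -1.0, 4.0]
weights = [0.1, 0.2, 0.3]

# out_grad is the gradient of the loss with respect to the output of the node
grad_a, grad_b = ewise_add_gradient(weights)

# Finite-difference check on the second element of a
eps = 1e-6
a_shifted = [a[0], a[1] + eps, a[2]]
numeric = (loss(a_shifted, b, weights) - loss(a, b, weights)) / eps
print(grad_a[1], numeric)
```

The analytic gradient for the perturbed element and the finite-difference estimate should agree up to floating-point error.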

Element Wise Addition

add

EWiseAdd

Scalar Addition

add_scalar

AddScalar

Element Wise Multiplication

multiply

EWiseMul

Scalar Multiplication

mul_scalar

MulScalar

Element Wise Divide

divide

EWiseDiv

Scalar Division

divide_scalar

DivScalar

Negation

negate

Negate

Exp

exp

Exp

ReLU

relu

ReLU

Power Scalar

power_scalar

PowerScalar

Log

Transpose

transpose

Transpose

Reshape

reshape

Reshape

Matrix Multiplication

matmul

MatMul

Summation

summation

Summation

Broadcast

broadcast_to

BroadcastTo

LogSumExp

logsumexp

LogSumExp
