

Vector and Matrix Derivatives: Essential Tools for Optimization

In many areas of machine learning and optimization, especially when dealing with gradient-based methods, it is crucial to know the derivatives of vector and matrix functions. In this article, we summarize some common derivative formulas, specify the dimensions involved, and discuss how these results are applied in practice.


Dimensions and Notation

We use the following dimensions throughout:

$$\mathbf{a},\ \mathbf{b},\ \mathbf{Y} \in \mathbb{R}^{n \times 1}, \qquad A \in \mathbb{R}^{n \times n}, \qquad B \in \mathbb{R}^{p \times n}, \qquad X \in \mathbb{R}^{n \times p}.$$


Vector Derivatives

Writing gradients as column vectors:

$$\frac{\partial \mathbf{a}^\top \mathbf{b}}{\partial \mathbf{a}} = \mathbf{b}, \qquad \frac{\partial \mathbf{b}^\top \mathbf{a}}{\partial \mathbf{a}} = \mathbf{b}, \qquad \frac{\partial \mathbf{a}^\top A \mathbf{a}}{\partial \mathbf{a}} = (A + A^\top)\,\mathbf{a}, \qquad \frac{\partial (A\,\mathbf{a})}{\partial \mathbf{a}} = A.$$

The last expression is the Jacobian of the linear map $\mathbf{a} \mapsto A\,\mathbf{a}$.
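These identities are easy to sanity-check numerically. The sketch below (a NumPy illustration, with arbitrary variable names not from the article) compares the analytic gradient of the quadratic form $\mathbf{a}^\top A \mathbf{a}$ against a central finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
a = rng.standard_normal(n)

# Analytic gradient of f(a) = a^T A a with respect to a.
grad = (A + A.T) @ a

# Central finite-difference approximation of the same gradient.
eps = 1e-6
fd = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    fd[i] = ((a + e) @ A @ (a + e) - (a - e) @ A @ (a - e)) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-4))  # True
```

Because $f$ is quadratic, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.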

Matrix Derivatives

$$\frac{\partial \mathbf{a}^\top A \mathbf{b}}{\partial A} = \mathbf{a}\,\mathbf{b}^\top, \qquad \frac{\partial \mathbf{a}^\top A^\top \mathbf{b}}{\partial A} = \mathbf{b}\,\mathbf{a}^\top, \qquad \frac{\partial \mathbf{a}^\top A^\top A\,\mathbf{a}}{\partial A} = 2\,A\,\mathbf{a}\,\mathbf{a}^\top,$$

$$\frac{\partial \operatorname{Tr}(A)}{\partial A} = I, \qquad \frac{\partial \operatorname{Tr}(A^\top X)}{\partial X} = \frac{\partial \operatorname{Tr}(X^\top A)}{\partial X} = A,$$

$$\frac{\partial \operatorname{Tr}(A X B)}{\partial X} = A^\top B^\top, \qquad \frac{\partial \operatorname{Tr}(B^\top X^\top A)}{\partial X} = A\,B^\top.$$
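The trace derivatives can be verified the same way. A minimal NumPy sketch, with dimensions chosen to match the notation above ($A \in \mathbb{R}^{n \times n}$, $X \in \mathbb{R}^{n \times p}$, $B \in \mathbb{R}^{p \times n}$), checks $\partial \operatorname{Tr}(AXB)/\partial X = A^\top B^\top$ entry by entry:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((p, n))
X = rng.standard_normal((n, p))

# Analytic gradient of f(X) = Tr(A X B) with respect to X.
grad = A.T @ B.T

# Finite-difference check: perturb one entry of X at a time.
eps = 1e-6
fd = np.zeros_like(X)
for i in range(n):
    for j in range(p):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd[i, j] = (np.trace(A @ (X + E) @ B)
                    - np.trace(A @ (X - E) @ B)) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-4))  # True
```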

$L^2$ Norm Derivatives

$$\frac{\partial \|\mathbf{Y} - W X\|_2^2}{\partial W} = -2\,(\mathbf{Y} - W X)\,X^\top, \qquad \frac{\partial \|\mathbf{Y} - X W\|_2^2}{\partial W} = -2\,X^\top\,(\mathbf{Y} - X W),$$

where $W$ is a weight matrix whose dimensions are chosen so that each product is defined.
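The second of these gradients is the workhorse of least-squares regression: setting it to zero yields the normal equations $X^\top X\,W = X^\top \mathbf{Y}$. The NumPy sketch below (illustrative variable names) verifies the gradient by finite differences and confirms that the gradient vanishes at the normal-equations solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)
W = rng.standard_normal(p)

# Analytic gradient of f(W) = ||Y - X W||_2^2 with respect to W.
grad = -2 * X.T @ (Y - X @ W)

# Central finite-difference approximation.
eps = 1e-6
fd = np.zeros(p)
for j in range(p):
    e = np.zeros(p)
    e[j] = eps
    fd[j] = (np.sum((Y - X @ (W + e)) ** 2)
             - np.sum((Y - X @ (W - e)) ** 2)) / (2 * eps)

# Setting the gradient to zero gives the normal equations X^T X W = X^T Y.
W_star = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(grad, fd, atol=1e-4))                    # True
print(np.allclose(X.T @ (Y - X @ W_star), 0, atol=1e-8))   # True
```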

Other Common Derivatives

$$\frac{\partial \ln(\mathbf{x}^\top \mathbf{a})}{\partial \mathbf{x}} = \frac{\mathbf{a}}{\mathbf{x}^\top \mathbf{a}}, \qquad \frac{\partial \ln(\det(A))}{\partial A} = (A^{-1})^\top, \qquad \frac{\partial \|A\|_F^2}{\partial A} = 2A,$$

where $\|A\|_F$ denotes the Frobenius norm of $A$.
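The log-determinant identity, which appears in Gaussian log-likelihoods, can also be checked numerically. The sketch below uses a symmetric positive-definite matrix so that $\det(A) > 0$, and evaluates $\ln \det$ stably via `np.linalg.slogdet`:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite, so det(A) > 0

# Analytic gradient of f(A) = ln det(A) with respect to A.
grad = np.linalg.inv(A).T

# Finite-difference check, perturbing one entry of A at a time.
eps = 1e-6
fd = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        fd[i, j] = (np.linalg.slogdet(A + E)[1]
                    - np.linalg.slogdet(A - E)[1]) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-4))  # True
```

Note that the perturbation here treats all $n^2$ entries of $A$ as independent; for a derivative constrained to symmetric matrices the formula takes a different form.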


Useful Resources

For further reading on matrix calculus, a highly recommended resource is the Matrix Cookbook. Other valuable references include: