I have recently been reading this book and am taking notes on some of the English concepts in it.
Link: https://web.stanford.edu/~boyd/books.html
Chinese version
A Detailed Guide to Vector-Valued Linear and Affine Functions
Vector-valued linear and affine functions are essential mathematical tools in machine learning, data science, and engineering. This article covers their definitions, properties, matrix representations, and applications in detail, with concrete examples to build intuition.
1. Vector-Valued Functions
1.1 Definition
A vector-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$ maps an $n$-dimensional vector to an $m$-dimensional vector. The value of $f$ at $\mathbf{x}$ is

$$f(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix},$$

where each $f_i(\mathbf{x})$ is a scalar-valued function that takes $\mathbf{x}$ as input and returns a scalar. We can also expand $\mathbf{x}$ into its scalar components, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, in which case $f_i(\mathbf{x}) = f_i(x_1, x_2, \ldots, x_n)$.
2. Linear Functions
2.1 Definition
A vector-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$ is called linear if it satisfies the superposition property:

$$f(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y}),$$

for all $n$-dimensional vectors $\mathbf{x}, \mathbf{y}$ and all scalars $\alpha, \beta$.
Every linear function $f(\mathbf{x})$ can be expressed as a matrix-vector product:

$$f(\mathbf{x}) = A\mathbf{x},$$

where $A$ is an $m \times n$ matrix.
2.2 Deriving the Matrix Representation
Suppose $f: \mathbb{R}^n \to \mathbb{R}^m$ is linear. By superposition,

$$f(\mathbf{x}) = x_1 f(\mathbf{e}_1) + x_2 f(\mathbf{e}_2) + \cdots + x_n f(\mathbf{e}_n),$$

where $\mathbf{e}_k$ is the $k$-th unit vector of $\mathbb{R}^n$ (its $k$-th entry is 1 and all other entries are 0). Stacking the vectors $f(\mathbf{e}_k)$ as the columns of a matrix $A$,

$$A = \begin{bmatrix} f(\mathbf{e}_1) & f(\mathbf{e}_2) & \cdots & f(\mathbf{e}_n) \end{bmatrix},$$

we can write

$$f(\mathbf{x}) = A\mathbf{x}.$$
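This construction translates directly into code. Below is a minimal NumPy sketch (the helper name `matrix_from_linear` and the sample function are my own choices, not from the book) that builds $A$ column by column from $f(\mathbf{e}_k)$ and checks that $A\mathbf{x}$ reproduces $f(\mathbf{x})$:

```python
import numpy as np

def matrix_from_linear(f, n):
    """Build the matrix A of a linear function f: R^n -> R^m by
    evaluating f on the unit vectors e_1, ..., e_n (columns of I)."""
    identity = np.eye(n)
    return np.column_stack([f(identity[:, k]) for k in range(n)])

# Sample linear function: reverse the entries of the input vector.
reverse = lambda x: x[::-1]

A = matrix_from_linear(reverse, 3)
x = np.array([1.0, 2.0, 3.0])
print(A @ x)   # matches reverse(x): [3. 2. 1.]
```

The same helper works for any linear $f$, since the derivation above relies only on superposition.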
2.3 Examples
(1) Negation
The function $f(\mathbf{x}) = -\mathbf{x}$ negates every entry of the input vector. Its matrix form is

$$A = -I,$$

where $I$ is the $n \times n$ identity matrix.
(2) Reversal
The function $f(\mathbf{x})$ reverses the order of the entries, mapping $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ to $(x_n, x_{n-1}, \ldots, x_1)$. The corresponding matrix is

$$A = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix},$$

which is the identity matrix $I$ with its columns in reverse order.
3. Affine Functions
3.1 Definition
An affine function generalizes a linear function and has the form

$$f(\mathbf{x}) = A\mathbf{x} + \mathbf{b},$$

where $A$ is an $m \times n$ matrix and $\mathbf{b}$ is an $m$-dimensional vector.
An affine function can be viewed as a linear transformation $A\mathbf{x}$ followed by a translation by the vector $\mathbf{b}$.
3.2 Properties
The key difference between affine and linear functions lies in superposition:
- For a linear function, superposition holds for all scalars: $f(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y})$.
- For an affine function, it holds only when $\alpha + \beta = 1$.
3.3 Uniqueness of the Parameters
The matrix $A$ and the vector $\mathbf{b}$ of an affine function are unique, and they can be determined as follows:
- compute the offset vector as the value at zero: $\mathbf{b} = f(\mathbf{0})$;
- compute the columns of $A$: $A = \begin{bmatrix} f(\mathbf{e}_1) - \mathbf{b} & f(\mathbf{e}_2) - \mathbf{b} & \cdots & f(\mathbf{e}_n) - \mathbf{b} \end{bmatrix}$.
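This recovery procedure is easy to carry out numerically. The sketch below (the function name `affine_params` and the sample offset are my own choices) recovers $A$ and $\mathbf{b}$ from black-box evaluations of $f$:

```python
import numpy as np

def affine_params(f, n):
    """Recover A and b of an affine function f(x) = A x + b:
    b = f(0), and column k of A is f(e_k) - b."""
    b = f(np.zeros(n))
    A = np.column_stack([f(np.eye(n)[:, k]) - b for k in range(n)])
    return A, b

# Sample affine function f(x) = 2x + b_true (offset chosen arbitrarily).
b_true = np.array([1.0, -1.0, 0.5])
f = lambda x: 2.0 * x + b_true

A, b = affine_params(f, 3)
print(A)   # 2 * identity matrix
print(b)   # the offset b_true
```

Note that the subtraction of $\mathbf{b}$ is what distinguishes this from the purely linear case.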
3.4 Examples
(1) Translation
The affine function $f(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ translates the vector $\mathbf{x}$ by $\mathbf{b}$. In matrix form, $A = I$ with offset vector $\mathbf{b}$.
(2) Scaling with Offset
The affine function $f(\mathbf{x}) = 2\mathbf{x} + \mathbf{b}$ scales $\mathbf{x}$ by a factor of 2 and then translates it by $\mathbf{b}$. In matrix form, $A = 2I$ with offset vector $\mathbf{b}$.
4. Applications of Linear and Affine Functions
4.1 Computer Vision
- Image transformations: the matrix $A$ implements rotation, scaling, or flipping, while $\mathbf{b}$ implements translation.
- Feature extraction: linear functions are commonly used to extract features from images, for example in convolution.
4.2 Machine Learning
- Linear models: linear regression and support vector machines are direct applications of linear functions.
- Neural networks: the affine transformation $f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$ in a fully connected layer is a fundamental operation.
5. Summary
Vector-valued linear and affine functions are important tools in mathematics and engineering. A linear function describes a pure linear transformation, while an affine function adds a translation on top of it. Their matrix representations are simple and intuitive, making them easy to analyze and implement. A solid grasp of their definitions, properties, and applications is essential for a deeper understanding of machine learning and data processing.
English version
Introduction to Vector-Valued Linear and Affine Functions
In this blog, we’ll explore vector-valued linear and affine functions, providing detailed explanations, examples, and clarifications. These concepts are essential in understanding transformations and mappings in linear algebra and are widely used in various fields, including machine learning, optimization, and physics.
1. Linear Functions
A vector-valued linear function maps vectors from one space $\mathbb{R}^n$ to another space $\mathbb{R}^m$, and is expressed as:

$$f(x) = Ax$$

where:
- $x \in \mathbb{R}^n$ is the input vector.
- $A \in \mathbb{R}^{m \times n}$ is a matrix.
- $f(x) \in \mathbb{R}^m$ is the output vector.
Key Properties of Linear Functions
Linear functions satisfy superposition:
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$
This property means that scaling and summing inputs before applying the function is equivalent to scaling and summing their corresponding outputs.
Derivation of Linear Representation
If $f(x)$ is a linear function, the matrix $A$ can be constructed by evaluating $f(x)$ on the unit vectors $e_1, e_2, \ldots, e_n$:

$$A = \begin{bmatrix} f(e_1) & f(e_2) & \cdots & f(e_n) \end{bmatrix}$$

This ensures that $f(x) = Ax$ for all $x$.
Examples of Linear Functions
Example 1: Negation
Function: $f(x) = -x$
Matrix: $A = -I$ (where $I$ is the identity matrix).

$$f(x) = -Ix = -x$$
Example 2: Reversal
Function: $f(x)$ reverses the order of the elements in $x$.
Matrix $A$:

$$A = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix}$$

For example, if $x = [1, 2, 3]^T$, then $f(x) = [3, 2, 1]^T$.
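In NumPy, this reversal matrix is simply the identity with its rows flipped. A short sketch verifying the example above:

```python
import numpy as np

n = 3
A = np.eye(n)[::-1]          # identity matrix with rows in reverse order
x = np.array([1.0, 2.0, 3.0])

print(A @ x)   # [3. 2. 1.]
```

Because the reversal matrix is a permutation matrix, multiplying by it just reorders entries; no arithmetic beyond selection occurs.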
2. Affine Functions
A vector-valued affine function adds a constant offset to a linear function:
$$f(x) = Ax + b$$

where:
- $A \in \mathbb{R}^{m \times n}$ is the linear transformation matrix.
- $b \in \mathbb{R}^m$ is a constant vector (bias term).
Key Properties of Affine Functions
Affine functions satisfy affine combination superposition:
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y), \quad \text{if } \alpha + \beta = 1$$

Unlike linear functions, affine functions do not satisfy full superposition for arbitrary $\alpha$ and $\beta$ unless $b = 0$.
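This restricted superposition can be checked numerically. The sketch below (the random $A$, $b$, $x$, and $y$ are arbitrary choices for illustration) verifies that an affine $f$ preserves combinations with $\alpha + \beta = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
b = rng.normal(size=2)
f = lambda x: A @ x + b      # an arbitrary affine function

x = rng.normal(size=3)
y = rng.normal(size=3)
alpha, beta = 0.3, 0.7       # affine combination: alpha + beta = 1

lhs = f(alpha * x + beta * y)
rhs = alpha * f(x) + beta * f(y)
print(np.allclose(lhs, rhs))   # True
```

The identity holds because the right-hand side contains $(\alpha + \beta)\,b = b$, matching the single $b$ on the left; for $\alpha + \beta \neq 1$ the two sides differ by $(\alpha + \beta - 1)\,b$.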
Derivation of the Parameters $A$ and $b$
To find $A$ and $b$, evaluate $f(x)$ at specific points:
- $f(0) = b$ (the bias vector).
- For each unit vector $e_k$:

$$A = \begin{bmatrix} f(e_1) - b & f(e_2) - b & \cdots & f(e_n) - b \end{bmatrix}$$
Examples of Affine Functions
Example 1: Translation
Function: $f(x) = x + b$, where $b$ shifts all elements of $x$.
Matrix: $A = I$, the identity matrix.
For $x = [1, 2]^T$ and $b = [3, 4]^T$:

$$f(x) = Ax + b = [1, 2]^T + [3, 4]^T = [4, 6]^T$$
Example 2: Scaling with Offset
Function: $f(x) = 2x + [1, -1]^T$.
Matrix: $A = 2I$, and bias: $b = [1, -1]^T$.
For $x = [1, 2]^T$:

$$f(x) = 2[1, 2]^T + [1, -1]^T = [2, 4]^T + [1, -1]^T = [3, 3]^T$$
3. Linear vs. Affine Functions
| Property | Linear function ($Ax$) | Affine function ($Ax + b$) |
| --- | --- | --- |
| Superposition | Holds for all linear combinations. | Holds only for affine combinations, where $\alpha + \beta = 1$. |
| Bias term $b$ | No ($b = 0$). | Yes. |
| Example | $f(x) = -x$ | $f(x) = x + [1, 2]^T$ |
4. Applications of Linear and Affine Functions
- Machine Learning:
  - Linear transformations are used in neural network layers ($Wx + b$).
  - Affine functions are fundamental in regression models.
- Graphics:
  - Affine functions are applied in transformations like scaling, rotation, and translation of images or 3D objects.
- Physics:
  - Linear functions describe physical systems under linear dynamics, while affine functions model systems with constant external forces.
5. Conclusion
Linear and affine functions form the backbone of many mathematical models, providing elegant and concise ways to represent transformations. Understanding their properties and representations is critical for applications in machine learning, computer vision, and beyond.
Postscript
Completed in Shanghai at 14:27 on December 19, 2024, with the assistance of the GPT-4o model.