[Machine Learning] Nonlinear Independent Component Analysis - Aapo Hyvärinen


Linear independent component analysis (ICA)

$$x_i(k) = \sum_{j=1}^{n} a_{ij}\, s_j(k) \quad \text{for all } i = 1, \ldots, n,\; k = 1, \ldots, K$$

  • $x_i(k)$ is the $i$-th observed signal at sample point $k$ (possibly time)
  • $a_{ij}$ are constant parameters describing the "mixing"
  • The latent "sources" $s_j$ are assumed independent and non-Gaussian
  • ICA is identifiable, i.e., well-defined: observing only $x_i$, we can recover both $a_{ij}$ and $s_j$.
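To make the identifiability claim concrete, here is a minimal sketch using scikit-learn's FastICA. The Laplacian/uniform sources and the mixing matrix are illustrative choices, not values from the lecture.

```python
# Minimal sketch: linear ICA recovers non-Gaussian sources from mixtures.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
K = 5000                                    # number of sample points

# Independent, non-Gaussian sources s_j(k): one Laplacian, one uniform
s = np.column_stack([rng.laplace(size=K), rng.uniform(-1, 1, size=K)])

# Observations x_i(k) = sum_j a_ij s_j(k) with an arbitrary mixing matrix
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
x = s @ A.T

# FastICA estimates both the sources and the mixing matrix
ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)                # recovered sources
A_hat = ica.mixing_                         # recovered mixing matrix

# Recovery is only up to permutation and scaling of the components:
# each true source should correlate strongly with exactly one estimate
print(np.corrcoef(s.T, s_hat.T).round(2))
```

The indeterminacies of linear ICA are exactly permutation, sign, and scale of the components; up to those, the correlation matrix printed above has one entry near ±1 per row.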

Fundamental difference between ICA and PCA

  • PCA doesn’t find the original coordinates; ICA does.


  • PCA and Gaussian factor analysis are not identifiable:
    • Any orthogonal rotation is equivalent: $s' = Us$ has the same distribution.

Nonlinear ICA is an unsolved problem

  • Extend ICA to nonlinear case to get general disentanglement?

  • Unfortunately, “basic” nonlinear ICA is not identifiable:

  • If we define the nonlinear ICA model for random variables $x_i$ as

    $$x_i = f_i(s_1, \ldots, s_n), \quad i = 1, \ldots, n$$

    we cannot recover the original sources (Darmois, 1952; Hyvärinen & Pajunen, 1999)

Darmois construction

  • Darmois (1952) showed the impossibility of nonlinear ICA:

  • For any $x_1, x_2$, one can always construct $y = g(x_1, x_2)$ independent of $x_1$ as

    $$g(\xi_1, \xi_2) = P(x_2 < \xi_2 \mid x_1 = \xi_1)$$

    (a simulation sketch of this construction follows the list below)

  • Independence alone is too weak for identifiability:

    • We could even take $x_1$ itself as an "independent component", which is absurd
  • Using non-Gaussianity as the criterion is equally absurd:

    • A scalar transform $h(x_1)$ can give $x_1$ any distribution
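The following simulation sketch illustrates the Darmois construction. A jointly Gaussian pair is assumed here only so that the conditional CDF has a closed form; the construction itself works for any distribution.

```python
# Sketch of the Darmois construction g(xi1, xi2) = P(x2 < xi2 | x1 = xi1)
# for a jointly Gaussian pair (an assumption made for a closed-form CDF).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho, K = 0.8, 100_000

# Strongly correlated pair: x2 = rho * x1 + sqrt(1 - rho^2) * noise
x1 = rng.standard_normal(K)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(K)

# Conditionally on x1 = xi1, x2 ~ N(rho * xi1, 1 - rho^2), so g is the
# Gaussian conditional CDF evaluated at the observed pair
y = norm.cdf((x2 - rho * x1) / np.sqrt(1 - rho**2))

# y is uniform on [0, 1] and independent of x1, even though (x1, x2)
# was never generated from independent components
print("corr(x1, y):", np.corrcoef(x1, y)[0, 1].round(4))   # ~ 0
```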

Time-contrastive learning

  • Observe an $n$-dimensional time series $x(t)$
  • Divide $x(t)$ into $T$ segments (e.g., bins of equal size)
  • Train an MLP to tell which segment a single data point comes from
    • Number of classes is $T$
    • Labels given by the index of the segment
    • Multinomial logistic regression
  • In the hidden layer $h$, the NN should learn to represent the nonstationarity (= differences between segments)
  • Could this really do nonlinear ICA?
  • Assume the data follows a nonlinear ICA model $x(t) = f(s(t))$ with
    • smooth, invertible nonlinear mixing $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$
    • components $s_i(t)$ that are nonstationary, e.g., in their variances
  • Assume we apply time-contrastive learning on $x(t)$
    • using an MLP with hidden layer $h(x(t))$, with $\dim(h) = \dim(x)$
  • Then, TCL will find $s(t)^2 = A h(x(t))$ for some linear mixing matrix $A$ (squaring is element-wise)
  • I.e., TCL demixes the nonlinear ICA model up to a linear mixing (which can itself be estimated by linear ICA) and up to squaring
  • This is a constructive proof of identifiability
  • Imposing independence within every segment adds constraints, forcing a unique solution; these extra restrictions are what guarantee identifiability

In short: an MLP is trained by self-supervised classification (which time segment does a given data point come from?), so that the MLP learns to represent the differences between segments. The squared sources $s^2$ can then be expressed as a linear combination of the MLP's hidden-layer outputs on the observations $x$; a minimal sketch follows.
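Below is a minimal TCL sketch in PyTorch. The toy nonstationary sources, the specific invertible mixing, and the network sizes are illustrative assumptions, not the setup of the original paper.

```python
# Toy time-contrastive learning: classify which segment a point came from,
# then read the hidden layer h off as the learned representation.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, T, L = 2, 20, 500                        # dims, segments, points/segment

# Nonstationary sources: variance differs from segment to segment
scales = 0.5 + 2.0 * torch.rand(T, n)
s = (torch.randn(T, L, n) * scales[:, None, :]).reshape(-1, n)
labels = torch.arange(T).repeat_interleave(L)   # segment index = class label

# Smooth invertible mixing f: rotation followed by an increasing scalar map
A = torch.tensor([[1.0, 0.6],
                  [-0.4, 1.2]])
z = s @ A.T
x = torch.tanh(z) + 0.2 * z

# MLP with hidden layer h, dim(h) = dim(x), plus a multinomial logistic head
feature = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, n))
head = nn.Linear(n, T)
opt = torch.optim.Adam([*feature.parameters(), *head.parameters()], lr=1e-2)

for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(feature(x)), labels)
    loss.backward()
    opt.step()

# Per the theory, h(x) approximates s**2 up to a linear mixing; a final
# linear ICA step on h (not shown) would undo that remaining mixing.
h = feature(x).detach()
```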

Deep Latent Variable Models

  • General framework with observed data vector $x$ and latent $s$:
    $$p_\theta(x, s) = p_\theta(x \mid s)\, p(s), \qquad p_\theta(x) = \int p_\theta(x, s)\, ds$$
    where $\theta$ is a vector of parameters, e.g., the weights of a neural network

  • In variational autoencoders (VAE):

    • Define the prior so that $s$ is white Gaussian (thus all $s_i$ independent)
    • Define the observation model (the likelihood) so that $x = f(s) + n$
  • Looks like nonlinear ICA, but is not identifiable:

    • By Gaussianity, any orthogonal rotation is equivalent:
      $$s' = Ms \text{ has exactly the same distribution if } M^T M = I$$
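This indeterminacy is easy to verify numerically; in the sketch below the random orthogonal matrix and sample size are arbitrary choices.

```python
# Rotation indeterminacy: for white Gaussian s, any orthogonal M leaves
# the distribution unchanged, so the data cannot distinguish s from Ms.
import numpy as np

rng = np.random.default_rng(0)
n, K = 3, 200_000

s = rng.standard_normal((K, n))                    # white Gaussian latents
M, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
s_rot = s @ M.T                                    # s' = M s

# Both empirical covariances are (approximately) the identity
print(np.cov(s.T).round(2))
print(np.cov(s_rot.T).round(2))
```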

Conditioning DLVMs by another variable

The non-identifiability can be resolved by introducing an additional observed variable $u$. For example, when relating video and audio, time $t$ can serve as the auxiliary variable. The key assumption is conditional independence: the components $s_i$ are independent given $u$.
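As a toy sketch of the idea (not the reference iVAE implementation; `ConditionalPrior` is a hypothetical helper), a prior that is factorial given $u$ can modulate the variance of each component by $u$:

```python
# Conditionally factorial prior: p(s | u) = prod_i N(s_i; 0, sigma_i(u)^2),
# with the log-variances produced by a linear map of the auxiliary u.
import math
import torch
import torch.nn as nn

class ConditionalPrior(nn.Module):
    def __init__(self, dim_u: int, dim_s: int):
        super().__init__()
        self.log_var = nn.Linear(dim_u, dim_s)   # log sigma_i(u)^2 as a function of u

    def log_prob(self, s: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        log_var = self.log_var(u)
        # Sum of independent Gaussian log-densities, conditionally on u
        return -0.5 * (s**2 / log_var.exp() + log_var + math.log(2 * math.pi)).sum(-1)
```

Swapping a VAE's unconditional white-Gaussian prior for such a conditional prior breaks the rotation symmetry shown above; this conditioning is what makes the iVAE identifiable.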

Conclusion

  • Typical deep learning needs class labels, or some targets

  • If no class labels: unsupervised learning

  • Independent component analysis is a principled approach

    • can be made nonlinear
  • Identifiable: can recover the components that actually created the data (unlike PCA, VAE, etc.)

  • Special assumptions needed for identifiability, one of:

    • Nonstationarity (“time-contrastive learning”)
    • Temporal dependencies (“permutation-contrastive learning”)
    • Existence of auxiliary (conditioning) variable (e.g., “iVAE”)
  • Self-supervised methods are easy to implement

  • Connection to DLVMs can be made → iVAE

  • Principled framework for “disentanglement”

In summary, linear ICA is identifiable, whereas nonlinear ICA requires additional assumptions before it becomes identifiable (i.e., before the original sources can be separated). The ideas behind nonlinear ICA can also be carried over to other deep learning models.

Reference

  1. Aapo Hyvärinen, Nonlinear Independent Component Analysis (lecture video): https://www.youtube.com/watch?v=_cBLSNRWt8c

