Lecture 22 Ethics

news/2024/11/22 19:22:46/

Contents

      • Arguments Against Ethical Checks in NLP
      • Core NLP Ethics Concepts
      • Group Discussion

How we ought to live — Socrates

  • What Is Ethics?
    • What is the right thing to do?
    • Why?
  • Why Should We Care?
    • AI technology is increasingly being deployed in real-world applications
    • It has a real and tangible impact on people
    • Whose responsibility is it when things go bad?
  • Why Is Ethics Hard?
    • Often no objective truth, unlike sciences
    • A new philosophy student may ask whether fundamental ethical theories such as utilitarianism are right
    • But a new physics student is unlikely to question the laws of thermodynamics
    • In examining a problem, we need to think from different perspectives to justify our reasons
  • Learning Outcomes
    • Think more about the application you build
      • Not just its performance
      • Its social context
      • Its impact on other people
      • Unintended harms
    • Be a socially-responsible scientist or engineer

Arguments Against Ethical Checks in NLP

  • Should We Censor Science?
    • A common argument when ethical checks or processes are introduced:
      • Should there be limits to scientific research? Is it right to censor research?
    • Ethical procedures are common in other fields: medicine, biology, psychology, anthropology, etc.
    • In the past, this wasn’t common in computer science
    • But that doesn’t mean it shouldn’t change
    • Technology is increasingly being integrated into society; the research we do nowadays is more likely to be deployed than research done 20 years ago
  • H5N1
    • Ron Fouchier, a Dutch virologist, discovered how to make bird flu potentially more harmful in 2011
    • Dutch government objected to publishing the research
    • Raised a lot of discussions and concerns
    • National policies enacted
  • Isn’t Transparency Always Better?
    • Is it always better to publish sensitive research publicly?
    • Argument: it is worse if such research is done underground
    • If the goal is to raise awareness, scientific publication isn’t the only way
      • Could work with media to raise awareness
      • Doesn’t require exposing the technique
  • AI vs. Cybersecurity
    • Exposing vulnerabilities publicly is desirable in cybersecurity applications
      • Makes it easy for developers to fix the problem
    • But the same logic doesn’t always apply to AI
      • Not easy to fix, once the technology is out

Core NLP Ethics Concepts

  • Bias
    • Two definitions:
      • Value-neutral meaning in ML
      • Normative meaning in socio-cultural studies
    • Ethics research in NLP: harmful prejudices in models
    • A biased model is one that performs unfavourably against certain groups of users
      • typically based on demographic features such as gender or ethnicity
    • Bias isn’t necessarily bad
      • Guides the model to make informed decisions in the absence of more information
      • Truly unbiased system = system that makes random decisions
      • Bad when it overwhelms evidence, or perpetuates harmful stereotypes
    • Bias can arise from data, annotations, representations, models, or research design
  • Bias in Word Embeddings
    • Word Analogy (lecture 10):
      • v(man) - v(woman) ≈ v(king) - v(queen)
    • But!
      • v(man) - v(woman) ≈ v(programmer) - v(homemaker)
      • v(father) - v(mother) ≈ v(doctor) - v(nurse)
    • Word embeddings reflect and amplify gender stereotypes in society
    • Lots of work done to reduce bias in word embeddings
  • Dual Use
    • Every technology has a primary use, and unintended secondary consequences
      • Nuclear power, knives, electricity
      • Could be abused for things they were not originally designed to do
    • Since we do not know how people will use it, we need to be aware of this duality
  • OpenAI GPT-2
    • OpenAI developed GPT-2, a large language model trained on massive web data
    • Kickstarted the pretrained model paradigm in NLP
      • Fine-tune pretrained models on downstream tasks (BERT lecture 11)
    • GPT-2 also has amazing generation capability
      • Can be easily fine-tuned to generate fake news, create propaganda
    • Pretrained GPT-2 models released in stages over 9 months, starting with smaller models
    • Collaborated with various organisations to study social implications of very large language models over this time
    • OpenAI’s effort is commendable, but this is voluntary
    • Further raises questions about self-regulation
  • Privacy
    • Often conflated with anonymity
    • Privacy means nobody knows what I am doing
    • Anonymity means everyone knows what I am doing, but not that it is me
  • GDPR
    • Regulation on data privacy in EU
    • Also addresses transfer of personal data
    • Aims to give individuals control over their personal data
    • Organisations that process EU citizens’ personal data are subject to it
    • Organisations need to anonymise data so that people cannot be identified
    • But we have technology that can de-anonymise data, e.g. by inferring author attributes
  • AOL Search Data Leak
    • In 2006, AOL released anonymised search logs of users
    • Logs contained sufficient information to re-identify individuals
      • By cross-referencing with phonebook listings, an individual was identified
    • Lawsuit filed against AOL
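
The word-analogy bullets under Bias in Word Embeddings can be made concrete with a small sketch. The vectors below are hand-made 3-dimensional toys (an assumption for illustration, not learned embeddings), and the names `VECTORS`, `analogy` and `gender_score` are hypothetical. The sketch completes "a is to b as c is to ?" by finding the word closest to v(b) - v(a) + v(c), and scores a word against the v(man) - v(woman) direction.

```python
import math

# Hypothetical toy vectors; real embeddings are learned from large corpora.
# The three dimensions loosely stand for (gender, royalty, occupation).
VECTORS = {
    "man": (1.0, 0.0, 0.0),
    "woman": (-1.0, 0.0, 0.0),
    "king": (1.0, 1.0, 0.0),
    "queen": (-1.0, 1.0, 0.0),
    "programmer": (1.0, 0.0, 1.0),
    "homemaker": (-1.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # Exclude the query words themselves, as is usual for analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


def gender_score(word, vectors=VECTORS):
    """Cosine between a word and the v(man) - v(woman) direction."""
    direction = tuple(m - w for m, w in zip(vectors["man"], vectors["woman"]))
    return cosine(vectors[word], direction)


print(analogy("man", "woman", "king"))        # queen
print(analogy("man", "woman", "programmer"))  # homemaker
print(gender_score("programmer") > 0)         # True
```

In these toy vectors the stereotype is built in by hand. In embeddings trained on real text, the same arithmetic surfaces associations picked up from the training data, which is why an occupation word that should be gender-neutral can still score far from zero on the gender direction.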

Group Discussion

  • Prompts
    • Primary use: does it promote harm or social good?
    • Bias?
    • Dual use concerns?
    • Privacy concerns? What sorts of data does it use?
    • Other questions to consider:
      • Can it be weaponised against populations (e.g. facial recognition, location tracking)?
      • Does it fit people into simple categories (e.g. gender and sexual orientation)?
      • Does it create alternate sets of reality (e.g. fake news)?
  • Automatic Prison Term Prediction
    • A model that predicts the prison sentence of an individual based on court documents
      • bias: Black people are often given harsher sentences
      • not explainable if deep learning is used
      • is it reasonable to use AI to judge one’s freedom in the first place?
  • Automatic CV Processing
    • A model that processes CV/resumes for a job to automatically filter candidates for interview
      • gender bias (stereotypes: does the model amplify them?)
      • the system can be cheated
      • how is the data sourced? (privacy)
      • what is the reason for a rejection? deep learning is a black box
  • Language Community Classification
    • A text classification tool that distinguishes LGBTQ from heterosexual language
    • Motivation: to understand how language used in the LGBTQ community differs from that used in the heterosexual community
      • dual use: could be used to identify LGBTQ people and discriminate against them
  • Take Away
    • Think about the applications you build
    • Be open-minded: ask questions, discuss with others
    • NLP tasks aren’t always just technical problems
    • Remember that the application we build could change someone else’s life
    • We should strive to be socially responsible engineers and scientists
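
The "Bias?" prompt, for a system such as automatic CV processing, can be turned into a simple numeric check. The following is a minimal sketch under stated assumptions: the group labels and outcomes are made up, and the helper names `selection_rates` and `max_rate_gap` are hypothetical. It compares the fraction of candidates selected in each group; the gap between the highest and lowest rate is sometimes called the demographic parity difference, and it is only one of several possible fairness measures.

```python
from collections import defaultdict


def selection_rates(records):
    """Map each group to the fraction of its members that were selected.

    `records` is an iterable of (group, was_selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def max_rate_gap(rates):
    """Difference between the highest and lowest per-group selection rate."""
    return max(rates.values()) - min(rates.values())


# Made-up screening outcomes for two hypothetical groups, A and B.
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)
print(rates)                # {'A': 0.75, 'B': 0.25}
print(max_rate_gap(rates))  # 0.5
```

A large gap does not by itself prove that the model is unfair, since the groups may differ in other ways, but it is a signal that the system needs closer inspection before deployment.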

