Constitutional AI

Prompt: list the knowledge points of this transcript as a tree structure (in Chinese):
Although you can use a reward model to eliminate the need for human evaluation during RLHF fine-tuning, the human effort required to produce the trained reward model in the first place is huge. The labeled dataset used to train the reward model typically requires large teams of labelers, sometimes many thousands of people, to evaluate many prompts each. This work requires a lot of time and other resources, which can be important limiting factors. As the number of models and use cases increases, human effort becomes a limited resource. Methods to scale human feedback are an active area of research.

One idea to overcome these limitations is to scale through model self-supervision. Constitutional AI is one approach to scaled supervision. First proposed in 2022 by researchers at Anthropic, Constitutional AI is a method for training models using a set of rules and principles that govern the model's behavior. Together with a set of sample prompts, these form the constitution. You then train the model to self-critique and revise its responses to comply with those principles.

Constitutional AI is useful not only for scaling feedback; it can also help address some unintended consequences of RLHF. For example, depending on how the prompt is structured, an aligned model may end up revealing harmful information as it tries to provide the most helpful response it can. As an example, imagine you ask the model to give you instructions on how to hack your neighbor's WiFi. Because this model has been aligned to prioritize helpfulness, it actually tells you about an app that lets you do this, even though this activity is illegal. Providing the model with a set of constitutional principles can help the model balance these competing interests and minimize the harm.

Here are some example rules from the research paper that Constitutional AI asks LLMs to follow. For example, you can tell the model to choose the response that is the most helpful, honest, and harmless. But you can place some bounds on this, asking the model to prioritize harmlessness by assessing whether its response encourages illegal, unethical, or immoral activity. Note that you don't have to use the rules from the paper; you can define your own set of rules that is best suited for your domain and use case.

When implementing the Constitutional AI method, you train your model in two distinct phases. In the first stage, you carry out supervised learning. To start, you prompt the model in ways that try to get it to generate harmful responses; this process is called red teaming. You then ask the model to critique its own harmful responses according to the constitutional principles and revise them to comply with those rules. Once done, you'll fine-tune the model using the pairs of red-team prompts and the revised constitutional responses.

Let's look at an example of how one of these prompt-completion pairs is generated, returning to the WiFi hacking problem. As you saw earlier, this model gives you a harmful response as it tries to maximize its helpfulness. To mitigate this, you augment the prompt using the harmful completion and a set of predefined instructions that ask the model to critique its response. Using the rules outlined in the constitution, the model detects the problems in its response. In this case, it correctly acknowledges that hacking into someone's WiFi is illegal. Lastly, you put all the parts together and ask the model to write a new response that removes all of the harmful or illegal content.
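The critique-and-revision loop just described can be sketched in a few lines of Python. This is only an illustration, not the code from the paper or the course: `generate` is a placeholder for whatever LLM completion call you have available, and the two principles are paraphrased from the examples mentioned in the transcript.

```python
# Minimal sketch of the supervised (critique-and-revision) phase of Constitutional AI.
# Assumptions: `generate` stands in for any LLM completion call; the constitution
# below paraphrases the example rules from the transcript.

CONSTITUTION = [
    "Choose the response that is the most helpful, honest, and harmless.",
    "Prioritize harmlessness: do not encourage illegal, unethical, or immoral activity.",
]


def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call (local model or API client)."""
    raise NotImplementedError


def critique_and_revise(red_team_prompt: str) -> tuple[str, str]:
    # 1. Red teaming: a helpfulness-aligned model may answer a harmful prompt.
    harmful_response = generate(red_team_prompt)

    # 2. Ask the model to critique its own answer against the constitution.
    principles = "\n".join(f"- {p}" for p in CONSTITUTION)
    critique_prompt = (
        f"Prompt: {red_team_prompt}\n"
        f"Response: {harmful_response}\n"
        f"Critique the response using these principles:\n{principles}"
    )
    critique = generate(critique_prompt)

    # 3. Ask the model to rewrite its answer so it complies with the principles.
    revision_prompt = (
        f"{critique_prompt}\n"
        f"Critique: {critique}\n"
        "Rewrite the response so it follows every principle and removes any "
        "harmful or illegal content."
    )
    revised_response = generate(revision_prompt)

    # The (red-team prompt, revised response) pair becomes one supervised
    # fine-tuning example for the first phase of Constitutional AI.
    return red_team_prompt, revised_response
```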
The model generates a new answer that puts the constitutional principles into practice and does not include the reference to the illegal app. The original red-team prompt and this final constitutional response can then be used as training data. You'll build up a dataset of many examples like this to create a fine-tuned LLM that has learned how to generate constitutional responses.

The second part of the process performs reinforcement learning. This stage is similar to RLHF, except that instead of human feedback, we now use feedback generated by a model. This is sometimes referred to as reinforcement learning from AI feedback, or RLAIF. Here you use the fine-tuned model from the previous step to generate a set of responses to your prompt. You then ask the model which of the responses is preferred according to the constitutional principles. The result is a model-generated preference dataset that you can use to train a reward model. With this reward model, you can now fine-tune your model further using a reinforcement learning algorithm like PPO, as discussed earlier.

Aligning models is a very important topic and an active area of research. The foundations of RLHF that you've explored in this lesson will allow you to follow along as the field evolves. I'm really excited to see what new discoveries researchers make in this area. I encourage you to keep an eye out for any new methods and best practices that emerge in the coming months and years.
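To make the RLAIF preference-labelling step above concrete, here is a minimal sketch in the same style. It reuses the placeholder `generate` call and `CONSTITUTION` list from the previous sketch; the judging prompt wording and the `chosen`/`rejected` field names are illustrative assumptions, not a prescribed format.

```python
# Minimal sketch of the RLAIF preference-labelling step.
# Assumptions: `generate` and `CONSTITUTION` are the placeholders defined in the
# previous sketch; prompt wording and output fields are illustrative only.

def build_preference_example(prompt: str) -> dict:
    # 1. Sample two candidate responses from the fine-tuned model.
    candidate_a = generate(prompt)
    candidate_b = generate(prompt)

    # 2. Ask the model itself which candidate better follows the constitution.
    principles = "\n".join(f"- {p}" for p in CONSTITUTION)
    judge_prompt = (
        f"Prompt: {prompt}\n"
        f"Response A: {candidate_a}\n"
        f"Response B: {candidate_b}\n"
        f"According to these principles:\n{principles}\n"
        "Answer with exactly 'A' or 'B' for the preferred response."
    )
    verdict = generate(judge_prompt).strip().upper()

    # 3. Record the preference as a (prompt, chosen, rejected) triple; many such
    #    rows form the model-generated preference dataset used to train the
    #    reward model, after which the policy is fine-tuned with an algorithm
    #    like PPO.
    if verdict.startswith("A"):
        chosen, rejected = candidate_a, candidate_b
    else:
        chosen, rejected = candidate_b, candidate_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```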

  • RLHF
    • Use of reward model to eliminate need for human evaluation
      • Large human effort required to produce trained reward model
        • Large teams of labelers needed for labeled data set used to train reward model
      • Human effort becomes limited resource as number of models and use cases increases
      • Methods to scale human feedback an active area of research
    • Constitutional AI as approach to scale through model self supervision
      • Method for training models using set of rules and principles that govern model's behavior and form constitution
      • Train model to self critique and revise responses to comply with principles
      • Can help address unintended consequences of RLHF, such as revealing harmful information
      • Example constitutional principles/rules:
        • Choose most helpful, honest, and harmless response
        • Prioritize harmlessness by assessing whether response encourages illegal, unethical, or immoral activity
        • Can define own set of rules suited for domain/use case
      • Train model using two distinct phases:
        • Supervised learning: red-team prompts elicit harmful responses, which the model critiques and revises according to constitutional principles
        • Reinforcement learning using feedback generated by model to train reward model
  • Fine-tuned LLM
  • Reinforcement learning algorithms (PPO)

  • Deep Reinforcement Learning
  • Reward Model
  • Human Evaluation
  • Labeled Dataset for Training the Reward Model
  • Large Teams of Labelers
  • Self-Supervision
  • Constitutional AI
  • Rules and Principles in the Constitution
  • Unintended Consequences of RLHF
  • Example Rules in the Constitution
  • Supervised Learning
  • Red Teaming
  • Training of the Reward Model
  • Reinforcement Learning
  • Reinforcement Learning from AI Feedback (RLAIF)

  • Using a reward model eliminates the need for human evaluation during RLHF fine-tuning
  • Training the reward model requires a large amount of human effort
  • Human feedback can be scaled through model self-supervision
  • Constitutional AI is an approach to scaling feedback that governs the model's behavior with a set of rules and principles
  • Constitutional AI can help avoid some of the unintended consequences of RLHF
  • The rules used by Constitutional AI can be defined and adjusted to suit the domain and use case
  • Training with the Constitutional AI method proceeds in two phases: supervised learning first, then reinforcement learning
  • In the reinforcement learning phase, model-generated feedback is used to train the reward model; this is known as RLAIF
  • Keep an eye out for new methods and best practices in this field
