Data to text language generation with neural networks

In my last post, I elaborated upon my modelling approach for the use case of RDF-to-text generation. The task is part of a larger open source project to which I am contributing for the Google Summer of Code program (GSoC 2020).

Links to previous articles and resources:

  • Part 1: Google Summer of Code — The lift off

  • Part 2: Data to text generation — Let the modelling begin!

  • GitHub repository for the project

As a reminder, this project consists of transforming a knowledge base, represented by a set of RDF triples, into a natural language text, using various neural architectures trained in an end-to-end fashion. This post serves as a quick update on the current state of the project, while highlighting key ideas, inspirations, and obstacles that were faced during the course of my experimentation.

Attention is indeed all you need!

Out of the successfully implemented models proposed in the previous blog post, the best performing neural architecture proved to be the Transformer. The winning model was composed of an embedding layer of size 128, followed by 8 layers (4 for encoding and 4 for decoding), each equipped with multi-head attention of 128 units spread across 8 attention heads. Dropout regularization is applied during training, and layer normalization is applied to the output of each encoder and decoder layer. The models were trained for 10 epochs, after which their generated outputs were evaluated using BLEU and METEOR scores (standard in machine translation and, by extension, in text generation). Upon visual inspection of the output, the transformer produced, on average, very decent verbalizations, such as this one, with a triple set size of 3:

[Figure: Transformer's generated output, with 3 input triples]
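
For reference, a minimal sketch of such a generator configuration in PyTorch might look like the following. It only illustrates the hyperparameters described above; it is not the project's actual implementation (which lives in the GitHub repository linked earlier), and the class and variable names are mine.

```python
# A minimal PyTorch sketch of the generator configuration described above.
# Positional encodings and padding masks are omitted for brevity; names are
# illustrative and not taken from the project repository.
import torch
import torch.nn as nn

class RDFToTextTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_heads=8,
                 n_enc_layers=4, n_dec_layers=4, dropout=0.1):
        super().__init__()
        # Shared 128-dimensional embedding for the linearized triples and the text.
        self.embed = nn.Embedding(vocab_size, d_model)
        # 4 encoder + 4 decoder layers, 8 attention heads; dropout and layer
        # normalization are applied inside each layer.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_enc_layers, num_decoder_layers=n_dec_layers,
            dim_feedforward=4 * d_model, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)   # (batch, src_len, d_model)
        tgt = self.embed(tgt_ids)   # (batch, tgt_len, d_model)
        # Causal mask so the decoder cannot attend to future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)     # per-token vocabulary logits
```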

In general, the transformer does pretty well up to a triple set size of 4. At a triple set size of 6, however, we see a very different story unfold. The transformer starts to lose its focus, producing irrelevant verbalizations like:

[Figure: Transformer's generated output, with 6 input triples]

However, since the other models perform even more poorly (e.g. the GAT starts to ramble on nonsensically with inputs of triple set size 5, whereas the LSTM starts hallucinating words as early as triple set size 4), the transformer will have to do for our generator model. Due to the parallelization advantages of the transformer architecture, it also trains much faster than its recurrent counterparts, which will be crucial for evaluating the benefit of adversarial training and reinforcement learning at later stages of the project.

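One straightforward way to quantify this degradation is to bucket the test set by triple set size and compute a corpus-level BLEU score per bucket. Below is a small sketch of that idea, assuming the triple sets, reference texts, and generated texts have already been collected elsewhere; the helper is illustrative and not part of the project code.

```python
# Illustrative helper: corpus BLEU per triple set size, to see where
# generation quality starts to drop off. Assumes `examples` is a list of
# (triple_set, reference_text, generated_text) tuples built elsewhere.
from collections import defaultdict
from nltk.translate.bleu_score import corpus_bleu

def bleu_by_triple_set_size(examples):
    buckets = defaultdict(lambda: ([], []))   # size -> (references, hypotheses)
    for triples, reference, generated in examples:
        refs, hyps = buckets[len(triples)]
        refs.append([reference.split()])      # one tokenized reference per example
        hyps.append(generated.split())
    return {size: corpus_bleu(refs, hyps)
            for size, (refs, hyps) in sorted(buckets.items())}
```

The same bucketing works for METEOR, and it makes it easy to compare the input size at which each model starts to break down.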

And what about the discriminator?

Now, armed with an appropriate generator model (which has essentially been pre-trained for 10 epochs), I decided to construct a simple transformer model that I can use as the discriminator network during adversarial training. The idea here is to enable the discriminator to tell real texts apart from generated ones. However, there is a slight issue here. If the discriminator only receives the generated output from the generator, and compares it to real target instances, it will get confused! What it needs is some context. What do I mean by that? Consider this: our pre-trained generator will produce very realistic-looking text in many instances, which might just happen not to match the input triple set in terms of the information conveyed. Yet, how will our discriminator know that? We cannot simply show the discriminator various strings of text, perfectly correct in syntax and semantics, and ask it to tell which one is real or fake, without giving our model any context regarding the corresponding input triples.

Thus, my approach was to simply concatenate the input triples with their corresponding generated or real output, and feed the result to the discriminator. This way, the discriminator actually receives some context (from the first part of the sequence) along with the potential corresponding text, thereby essentially performing a sort of sequence classification: deciding whether a given RDF triple-text sequence is real or fake. I tested out the concept by pre-training the discriminator on real triple-text sequences (labelled 1) and fake sequences I constructed by randomly concatenating input triple sets with target text instances (labelled 0). I trained my model (2 layers, 2 heads, 32 neurons, with an embedding dimension of 32) on this artificially constructed dataset for 10 epochs, at which point it was able to achieve a validation accuracy of 95%. Satisfied with these initial results, I merged my generator and discriminator networks into one glorious model, and set up the adversarial training loop.

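As an illustration of how such discriminator pre-training pairs could be assembled, here is a small sketch following the description above. The `linearize` helper, the `[SEP]` separator, and the example triple in the comment are assumptions on my part; the project's actual preprocessing may differ.

```python
# Illustrative sketch: build (triple-text sequence, label) pairs for
# pre-training the discriminator. Real pairs keep the matching reference
# text (label 1); fake pairs combine the triples with a random target (label 0).
import random

def linearize(triple_set):
    # e.g. [("Alan_Bean", "birthPlace", "Wheeler_Texas")] ->
    # "Alan_Bean birthPlace Wheeler_Texas"  (assumed linearization scheme)
    return " ".join(" ".join(triple) for triple in triple_set)

def build_discriminator_examples(triple_sets, target_texts):
    examples = []
    for triples, text in zip(triple_sets, target_texts):
        examples.append((linearize(triples) + " [SEP] " + text, 1))
        mismatched = random.choice(target_texts)   # random, likely mismatched text
        examples.append((linearize(triples) + " [SEP] " + mismatched, 0))
    random.shuffle(examples)
    return examples
```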

Up next

In the next post, I will reveal the final results obtained through the adversarial training approach, and evaluate the utility of using reinforcement learning for the given use case. For now, I hope you enjoyed this update on the progress of my GSoC 2020 project. Stay tuned for more!

Written by: Niloy Purkait

Original article: https://medium.com/@niloypurkait/transformers-to-the-rescue-52d714ced3d8
