Data to text language generation with neural networks

In my last post, I elaborated upon my modelling approach for the use case of RDF-to-text generation. The task is part of a larger open source project to which I am contributing for the Google Summer of Code program (GSoC 2020).

Links to previous articles and resources:

  • Part 1: Google Summer of Code — The lift off

  • Part 2: Data to text generation — Let the modelling begin!

  • GitHub repository for the project

As a reminder, this project consists of transforming a knowledge base, represented by a set of RDF triples, into a natural language text, using various neural architectures trained in an end-to-end fashion. This post serves as a quick update on the current state of the project, while highlighting key ideas, inspirations, and obstacles that were faced during the course of my experimentation.

Attention is indeed all you need!

Out of the successfully implemented models proposed in the previous blog post, the best performing neural architecture proved to be the Transformer. The winning model was composed of an embedding layer of size 128, followed by 8 layers (4 for encoding and 4 for decoding), each equipped with multi-head attention of 128 units spread across 8 attention heads. Dropout regularization is applied during training, and layer normalization is applied to the output of each encoder and decoder layer. The models were trained for 10 epochs, after which their generated outputs were evaluated using BLEU and METEOR scores (standard in machine translation and, by extension, in text generation). Upon visual inspection of the output, the transformer produced, on average, very decent verbalizations, such as this one, with a triple set size of 3:

[Figure: Transformer's generated output, with 3 input triples]
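
For reference, a minimal sketch of such a generator configuration in PyTorch might look like the following. It only illustrates the hyperparameters described above; it is not the project's actual implementation (which lives in the GitHub repository linked earlier), and the class and variable names are mine.

```python
# A minimal PyTorch sketch of the generator configuration described above.
# Positional encodings and padding masks are omitted for brevity; names are
# illustrative and not taken from the project repository.
import torch
import torch.nn as nn

class RDFToTextTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_heads=8,
                 n_enc_layers=4, n_dec_layers=4, dropout=0.1):
        super().__init__()
        # Shared 128-dimensional embedding for the linearized triples and the text.
        self.embed = nn.Embedding(vocab_size, d_model)
        # 4 encoder + 4 decoder layers, 8 attention heads; dropout and layer
        # normalization are applied inside each layer.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_enc_layers, num_decoder_layers=n_dec_layers,
            dim_feedforward=4 * d_model, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)   # (batch, src_len, d_model)
        tgt = self.embed(tgt_ids)   # (batch, tgt_len, d_model)
        # Causal mask so the decoder cannot attend to future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)     # per-token vocabulary logits
```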

In general, the transformer does pretty well up to a triple set size of 4. At a triple set size of 6, however, we see a very different story unfold. The transformer starts to lose its focus, producing irrelevant verbalizations like:

[Figure: Transformer's generated output, with 6 input triples]

However, since the other models perform even more poorly (e.g. the GAT starts to ramble on nonsensically with inputs of triple set size 5, whereas the LSTM starts hallucinating words as early as triple set size 4), the transformer will have to do for our generator model. Due to the parallelization advantages of the transformer architecture, it also trains much faster than its recurrent counterparts, which will be crucial for evaluating the benefit of adversarial training and reinforcement learning at later stages of the project.

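One straightforward way to quantify this degradation is to bucket the test set by triple set size and compute a corpus-level BLEU score per bucket. Below is a small sketch of that idea, assuming the triple sets, reference texts, and generated texts have already been collected elsewhere; the helper is illustrative and not part of the project code.

```python
# Illustrative helper: corpus BLEU per triple set size, to see where
# generation quality starts to drop off. Assumes `examples` is a list of
# (triple_set, reference_text, generated_text) tuples built elsewhere.
from collections import defaultdict
from nltk.translate.bleu_score import corpus_bleu

def bleu_by_triple_set_size(examples):
    buckets = defaultdict(lambda: ([], []))   # size -> (references, hypotheses)
    for triples, reference, generated in examples:
        refs, hyps = buckets[len(triples)]
        refs.append([reference.split()])      # one tokenized reference per example
        hyps.append(generated.split())
    return {size: corpus_bleu(refs, hyps)
            for size, (refs, hyps) in sorted(buckets.items())}
```

The same bucketing works for METEOR, and it makes it easy to compare the input size at which each model starts to break down.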

And what about the discriminator?

Now, armed with an appropriate generator model (which has essentially been pre-trained for 10 epochs), I decided to construct a simple transformer model that I can use as the discriminator network during adversarial training. The idea here is to enable the discriminator to tell real texts apart from generated ones. However, there is a slight issue here. If the discriminator only receives the generated output from the generator, and compares it to real target instances, it will get confused! What it needs is some context. What do I mean by that? Consider this: our pre-trained generator will produce very realistic-looking text in many instances, which might just happen not to match the input triple set in terms of the information conveyed. Yet, how will our discriminator know that? We cannot simply show the discriminator various strings of text, perfectly correct in syntax and semantics, and ask it to tell which one is real or fake, without giving our model any context regarding the corresponding input triples.

Thus, my approach was to simply concatenate the input triples with their corresponding generated or real output, and feed the result to the discriminator. This way, the discriminator actually receives some context (from the first part of the sequence) along with the potential corresponding text, thereby essentially performing a sort of sequence classification: deciding whether a given RDF triple-text sequence is real or fake. I tested out the concept by pre-training the discriminator on real triple-text sequences (labelled 1) and fake sequences I constructed by randomly concatenating input triple sets with target text instances (labelled 0). I trained my model (2 layers, 2 heads, 32 neurons, with an embedding dimension of 32) on this artificially constructed dataset for 10 epochs, at which point it was able to achieve a validation accuracy of 95%. Satisfied with these initial results, I merged my generator and discriminator networks into one glorious model, and set up the adversarial training loop.

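As an illustration of how such discriminator pre-training pairs could be assembled, here is a small sketch following the description above. The `linearize` helper, the `[SEP]` separator, and the example triple in the comment are assumptions on my part; the project's actual preprocessing may differ.

```python
# Illustrative sketch: build (triple-text sequence, label) pairs for
# pre-training the discriminator. Real pairs keep the matching reference
# text (label 1); fake pairs combine the triples with a random target (label 0).
import random

def linearize(triple_set):
    # e.g. [("Alan_Bean", "birthPlace", "Wheeler_Texas")] ->
    # "Alan_Bean birthPlace Wheeler_Texas"  (assumed linearization scheme)
    return " ".join(" ".join(triple) for triple in triple_set)

def build_discriminator_examples(triple_sets, target_texts):
    examples = []
    for triples, text in zip(triple_sets, target_texts):
        examples.append((linearize(triples) + " [SEP] " + text, 1))
        mismatched = random.choice(target_texts)   # random, likely mismatched text
        examples.append((linearize(triples) + " [SEP] " + mismatched, 0))
    random.shuffle(examples)
    return examples
```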

Up next

In the next post, I will reveal the final results obtained through the adversarial training approach, and evaluate the utility of using reinforcement learning for the given use case. For now, I hope you enjoyed this update on the progress of my GSoC 2020 project. Stay tuned for more!

Written by: Niloy Purkait

Original article: https://medium.com/@niloypurkait/transformers-to-the-rescue-52d714ced3d8
