【读论文】【泛读】三篇生成式自动驾驶场景生成: Bevstreet, DisCoScene, BerfScene

server/2024/10/18 20:21:12/

文章目录

  • 1. Street-View Image Generation from a Bird’s-Eye View Layout
    • 1.1 Problem introduction
    • 1.2 Why
    • 1.3 How
    • 1.4 My takeaway
  • 2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
    • 2.1 What
    • 2.2 Why
    • 2.3 How
    • 2.4 My takeaway
  • 3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)
    • 3.1 What
    • 3.2 Why
    • 3.3 How
    • 3.4 My takeaway

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

From the title of this paper, we know it bound a relation from Bev(Bird’s-Eye View) to Street view image.

在这里插入图片描述

Concretely, the input (Bev) is a two-dimensional representation of a three-dimensional environment from a top perspective. In the BEV diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, lane lines, etc. And green square means an ego vehicle that has three cameras in front.

The task is to generate three street-view images aligned to the Bev according to the relative position among these square objects.

As for the concept of “layout”, it should consider the effects of these factors:

  • Cameras with an overlapping field-of-view (FoV) must ensure overlapping content is correctly shown
  • The visual styling of the scene also needs to be consistent such that all virtual views appear to be created in the same geographical area (e.g., urban vs. rural), at the same time of day, with the same weather conditions, and so on.
  • In addition to this consistency, the images must correspond to the HD
    map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

It is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

  1. Methods
    在这里插入图片描述
    As shown in this pipeline, the Bev layout and source images were encoded as an input of the autoregressive transformer collaborating with direction and camera information to help the understanding of space. New mv-images were output.

  2. Experiments
    Three metrics are used.
    在这里插入图片描述
    FID represents the diversity and quality of generated images. Road mIoU and Vehicle mIoU can be used to represent the overlapping to verify the relative position in the Bev inputs.
    Scene edit was achieved by the change of Bev layout:
    在这里插入图片描述

1.4 My takeaway

  1. How to utilize the ability of an autoregressive transformer!!! Why do we use it other than others?
  2. I have known about what is Bev.

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model using object bounding boxes without semantic annotation as layout prior, allowing for high-quality scene synthesis and flexible user control of both the camera and scene objects.

2.2 Why

  • Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.

  • Some works like GSN can only generate scenes, without object-level editing. That is because of the lack of explicit object definition in NeRF.

  • GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet, it works poorly on mixed scenes due to the absence of proper spatial priors.

  • Interesting refer:

    17: Layout-transformer: Layout generation and completion with self-attention.

    26: Layout-gan: Generating graphic layouts with wireframe discriminators.

    58: Blockplanner: City block generation with vectorized graph representation.

2.3 How

在这里插入图片描述

Bounding boxes as layout priors to generate the objects, combined with the generated background were used in neural rendering. Meanwhile, an extra object discriminator for local discrimination is added, leading to better object-level supervision.

2.4 My takeaway

  1. Is it possible to cancel the manually marked bbox and automatically identify and regenerate the corresponding area in Gaussian?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)

3.1 What

Incorporating an equivariant radiance field with the guidance of a BEV map, this method allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.

Understood as the superposition of patches in a bev:
在ddd

3.2 Why

  1. Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis techniques since 3D scenes usually hold complex spatial configurations and consist of many objects at varying scales.
  2. Previous approaches often relied on scene graphs, facing limitations in processing due to unstructured topology.
  3. DiscoScene introduces complexity in interpreting the entire scene and
    faces scalability challenges when using Bbox.
  4. BEV maps could specify the composition and scales of objects clearly but lack insights into the detailed visual appearance of the objects. Recent attempts like InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficiency.

3.3 How

在这里插入图片描述

To integrate the prior information provided by the BEV map into the radiation field, the researchers introduced a generator U U U, which can generate a 2D feature map based on BEV map conditions. Builder U U U adopts a network structure that combines U-Net architecture and StyleGAN blocks.

3.4 My takeaway

  1. Confused about how to use this U-Net, need some other time to supplement background knowledge. 🤡

http://www.ppmy.cn/server/11849.html

相关文章

sqli-labs靶场学习(一)

一.知识点 1.数据库 数据库是一个用于存储和管理数据的仓库。数据按照特定的格式存储,可以对数据库中的数据进行增加、修改、删除和查询操作。数据库的本质是一个文件系统,按照一定的逻辑结构组织数据,以方便高效地访问和维护。 2.数据库管…

SQL--DDL数据定义语言(Oracle)

文章目录 数据定义语言创建表删除表清空表修改表修改表名,列名修改字段属性添加字段删除字段 数据定义语言 是针对数据库对象操作的语言 数据库对象:表,约束,视图,索引,序列… CREATE ---创建数据库对…

Cesium简单案例

一、Cesium组件 1、HTML <template><div id"cesiumContainer"><!-- 地图工具栏 --><ul class"mapTools"><li v-for"item in toolsData" :key"item.id" click"toolsClick(item)"><!-- 显…

C++中string的用法总结+底层剖析

前言&#xff1a;在C语言中&#xff0c;我们经常使用字符串进行一系列操作&#xff0c;经常使用的函数如下&#xff1a;增删改查 &#xff08;自己造轮子&#xff09;&#xff0c;C中设计出string容器&#xff0c;STL库中为我们提供了以上函数&#xff0c;所以我们使用string容…

简化图卷积 笔记

1 Title Simplifying Graph Convolutional Networks&#xff08;Felix Wu、Tianyi Zhang、Amauri Holanda de、 Souza Jr、Christopher Fifty、Tao Yu、Kilian Q. Weinberger&#xff09;【ICML 2019】 2 Conclusion This paper proposes a simplified graph convolutional m…

力扣经典150题解析之三十四:有效的数独

目录 解题思路与实现 - 有效的数独问题描述示例解题思路算法实现复杂度分析测试与验证总结 感谢阅读&#xff01; 解题思路与实现 - 有效的数独 问题描述 判断一个 9 x 9 的数独是否有效&#xff0c;根据以下规则验证已经填入的数字是否有效&#xff1a; 数字 1-9 在每一行只…

信奥赛课件 - 第一章 - 第2节 - 计算机系统的基础结构 - 习题

计算机基础知识习题 单选题 题目1 【NOIP2008】微型计算机中&#xff0c;控制器的基本功能是( )。 A. 控制机器各个部件协调工作 B. 实现算术运算和逻辑运算 C. 获取外部信息 D. 存放程序和数据 解析 选项 A. 控制机器各个部件协调工作 是正确的。 A 正确。控制器的主要功…

《AI聊天类工具之五——Copilot》

一.简介 官网:Microsoft Copilot: 你的日常 AI 助手 Copilot是微软在Windows 11操作系统中引入的一款先进的AI助手。这款工具集成在操作系统的侧边栏中,旨在帮助用户完成各种任务。它依托于底层大语言模型(LLM),用户只需通过简单的语言指示,Copilot就能够创建类似人类撰…