MiDaS, ZoeDepth, Depth-Anything: AI Depth Map Estimation


1. MiDaS

References:
https://github.com/isl-org/MiDaS
https://pytorch.org/hub/intelisl_midas_v2/
https://colab.research.google.com/github/pytorch/pytorch.github.io/blob/master/assets/hub/intelisl_midas_v2.ipynb#scrollTo=5A32CL3tocrZ


Code:

import cv2
import torch
import urllib.request
import matplotlib.pyplot as plt

url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

model_type = "DPT_Large"     # MiDaS v3 - Large     (highest accuracy, slowest inference speed)
#model_type = "DPT_Hybrid"   # MiDaS v3 - Hybrid    (medium accuracy, medium inference speed)
#model_type = "MiDaS_small"  # MiDaS v2.1 - Small   (lowest accuracy, highest inference speed)

midas = torch.hub.load("intel-isl/MiDaS", model_type)

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
midas.to(device)
midas.eval()

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
if model_type == "DPT_Large" or model_type == "DPT_Hybrid":
    transform = midas_transforms.dpt_transform
else:
    transform = midas_transforms.small_transform

img = cv2.imread(filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
input_batch = transform(img).to(device)

with torch.no_grad():
    prediction = midas(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

output = prediction.cpu().numpy()
plt.imshow(output)
# plt.show()
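
Note that MiDaS predicts relative inverse depth, not metric distance. To keep the result rather than only display it, here is a minimal sketch (assuming the output array from the block above; the file names are arbitrary):

import numpy as np
import matplotlib.pyplot as plt

# Save the raw relative-depth values for later processing
np.save("dog_depth.npy", output)

# Normalize to [0, 1] and save a color-mapped preview;
# MiDaS outputs relative inverse depth, so there is no metric scale here
preview = (output - output.min()) / (output.max() - output.min())
plt.imsave("dog_depth_preview.png", preview, cmap="jet")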


Video:

import cv2
import torch
import numpy as np

# Assumes you already have a loaded model 'midas', plus the matching
# 'transform' and 'device' from the image example above
# midas = torch.load('path_to_your_model')
# midas.eval()

# Open the input video
video_path = '20240807_024802.mp4'
cap = cv2.VideoCapture(video_path)

# Get the video's width, height, and frame rate
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

# Create the VideoWriter object
output_path = 'output_video2.mp4'
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height), isColor=True)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Convert the frame to the model's input format
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_batch = transform(img).to(device)
    with torch.no_grad():
        prediction = midas(input_batch)
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=frame.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    output = prediction.cpu().numpy()
    # Visualize: normalize to [0, 1], then apply a color map
    output = (output - output.min()) / (output.max() - output.min())
    output = cv2.applyColorMap((output * 255).astype(np.uint8), cv2.COLORMAP_JET)
    # Write the colored depth frame to the output video
    out.write(output)

cap.release()
out.release()
cv2.destroyAllWindows()
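
DPT_Large is the slowest of the three variants, which matters when every frame goes through the model. A hedged variant of the setup above switches to the small model for faster, lower-accuracy video inference:

# Faster, lower-accuracy setup for video (MiDaS v2.1 Small);
# reuses the same torch.hub entry points as the image example
model_type = "MiDaS_small"
midas = torch.hub.load("intel-isl/MiDaS", model_type)
midas.to(device)
midas.eval()

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform  # the small model needs the small transform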


2. ZoeDepth

https://huggingface.co/spaces/shariqfarooq/ZoeDepth

https://github.com/isl-org/ZoeDepth
https://colab.research.google.com/github/isl-org/ZoeDepth/blob/main/notebooks/ZoeDepth_quickstart.ipynb#scrollTo=OJ9bY7rrVuAq

Error: ZoeDepth: Unexpected key(s) in state_dict: "core.core.pretrained.model.blocks.0.attn.relative_position_index"
Fix: pip install timm==0.6.7
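
The error appears to come from a timm version mismatch: the checkpoint contains backbone buffers that other timm versions do not expect. A small sanity check (a sketch, assuming timm is importable) can catch the wrong version before the model load fails:

import timm

# The ZoeD_N checkpoint was saved against timm 0.6.7; other versions can
# add or drop state_dict keys and trigger the "Unexpected key(s)" error
assert timm.__version__ == "0.6.7", f"expected timm 0.6.7, got {timm.__version__}"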

Code:
git clone https://github.com/isl-org/ZoeDepth.git
cd ZoeDepth

import torch
from zoedepth.utils.misc import get_image_from_url, colorize
from PIL import Image
import matplotlib.pyplot as plt

zoe = torch.hub.load(".", "ZoeD_N", source="local", pretrained=True, version='ZoeD_N-Nov13-2023')
zoe = zoe.to('cuda')

# Predict depth from a URL image
img_url = "http://static1.squarespace.com/static/6213c340453c3f502425776e/62f2452bc121595f4d87c713/62f3c63c5eec2b12a333f851/1661442296756/Screenshot+2022-08-10+at+15.55.27.png?format=1500w"
img = get_image_from_url(img_url)
depth = zoe.infer_pil(img)
colored_depth = colorize(depth)

fig, axs = plt.subplots(1, 2, figsize=(15, 7))
for ax, im, title in zip(axs, [img, colored_depth], ['Input', 'Predicted Depth']):
    ax.imshow(im)
    ax.axis('off')
    ax.set_title(title)
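
As documented in the ZoeDepth README, infer_pil can also return the prediction in other formats via the output_type argument:

# Alternative output formats (documented in the ZoeDepth README)
depth_numpy = zoe.infer_pil(img)                         # numpy array (default)
depth_pil = zoe.infer_pil(img, output_type="pil")        # 16-bit PIL Image
depth_tensor = zoe.infer_pil(img, output_type="tensor")  # torch tensor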

Local image:

img = Image.open("/content/loong.png").convert("RGB")  # load
depth = zoe.infer_pil(img)
# depth_numpy = zoe.infer_pil(img)  # as numpy
colored_depth = colorize(depth)

fig, axs = plt.subplots(1, 2, figsize=(15, 7))
for ax, im, title in zip(axs, [img, colored_depth], ['Input', 'Predicted Depth']):
    ax.imshow(im)
    ax.axis('off')
    ax.set_title(title)

The original image and the depth map appear to have the same dimensions.
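
A quick check (a sketch reusing img and depth from the block above) confirms this; note that PIL reports (width, height) while the numpy depth map is (height, width):

# PIL size is (W, H); the depth array shape is (H, W)
print(img.size)     # e.g. (W, H)
print(depth.shape)  # (H, W), matching the input resolution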

Full code for running locally:

import torch
from zoedepth.utils.misc import get_image_from_url, colorize
from PIL import Image
import matplotlib.pyplot as plt

# Zoe_N
model_zoe_n = torch.hub.load(r".", "ZoeD_N", source="local", pretrained=True)

img = Image.open(r"C:\Users\loong\Downloads\loong.png").convert("RGB")  # load
depth = model_zoe_n.infer_pil(img)
print(depth.shape, depth)
colored_depth = colorize(depth)

fig, axs = plt.subplots(1, 2, figsize=(15, 7))
for ax, im, title in zip(axs, [img, colored_depth], ['Input', 'Predicted Depth']):
    ax.imshow(im)
    ax.axis('off')
    ax.set_title(title)

# Save the figure
output_filename = "output_image.png"
plt.savefig(output_filename)

# Show the figure
plt.show()
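
Unlike MiDaS, ZoeDepth predicts metric depth (in meters), so the raw values are worth keeping. The repo ships a helper for writing them as a 16-bit PNG; a minimal sketch (the output path is arbitrary):

from zoedepth.utils.misc import save_raw_16bit

# Save the metric depth prediction as a 16-bit PNG (helper from the ZoeDepth repo)
save_raw_16bit(depth, "loong_depth_raw.png")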

Colored depth map:

import numpy as np

img = Image.open("/content/loong.png").convert("RGB")  # load
depth = zoe.infer_pil(img)

# Normalize the depth map
depth_normalized = (depth - depth.min()) / (depth.max() - depth.min())
# Invert the color mapping
depth_normalized = 1 - depth_normalized
# Apply the color map
colored_depth = plt.get_cmap('jet')(depth_normalized)
# Convert RGBA to RGB
colored_depth = (colored_depth[:, :, :3] * 255).astype(np.uint8)

fig, axs = plt.subplots(1, 2, figsize=(15, 7))
for ax, im, title in zip(axs, [img, colored_depth], ['Input', 'Predicted Depth']):
    ax.imshow(im)
    ax.axis('off')
    ax.set_title(title)
plt.show()


Loading with transformers:
https://huggingface.co/docs/transformers/main/en/model_doc/zoedepth

from transformers import pipeline
from PIL import Image
import requests

# url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# image = Image.open(requests.get(url, stream=True).raw)
image = Image.open(r"C:\Users\loong\Downloads\right1.png")

pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
result = pipe(image)
depth = result["depth"]
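
Besides the PIL preview in result["depth"], the pipeline output also carries the raw prediction as a tensor under result["predicted_depth"]. A minimal sketch for saving and inspecting both (the file name is arbitrary):

# result["depth"] is a PIL Image preview; result["predicted_depth"] is the raw tensor
depth.save("zoedepth_preview.png")
predicted_depth = result["predicted_depth"]
print(predicted_depth.shape)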


3. Depth-Anything

https://github.com/DepthAnything/Depth-Anything-V2

from transformers import pipeline
from PIL import Image

pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
image = Image.open(r"C:\Users\loong\Downloads\right1.png")
result = pipe(image)
depth = result["depth"]
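
The V2 family publishes several sizes on the Hugging Face Hub under the same naming pattern; swapping the checkpoint id is the only change (a sketch, assuming the Base/Large checkpoints follow the Small one used above):

# Larger Depth-Anything V2 checkpoints; the pipeline call stays the same
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Base-hf")
# pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Large-hf")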
