Tauri Cross-Platform Notes in Practice (4): Implementing System-Level Screenshots

Introduction

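The Tauri cross-platform notes series is based on the open-source NoteGen project. It takes an in-depth look at building a cross-platform AI note-taking app with the Tauri framework, covering core technology choices, architecture design, development of typical scenarios, and solutions to common problems, with code-level walkthroughs of the full process of integrating AI capabilities.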

You can follow this series to build a cross-platform note-taking app step by step, or study the NoteGen source code on your own.

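This article is part of the "Tauri Open Source Diary" column, which records development experience and pitfalls encountered when using the Tauri framework.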

[Figure: NoteGen application screenshot]

How It Works

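This installment explains how to implement a system-level screenshot feature, which is the prerequisite for screenshot-based records in the note-taking app. The overall flow is: trigger a capture with a keyboard shortcut or a button, drag to select a region, run OCR on the selection to extract text, and finally save both the text and the screenshot file.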

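Capturing the screen cannot be done by the frontend alone; it needs Rust. The approach is as follows: Rust captures the screen and stores it as a temporary image, then the frontend is told to read that image. The frontend opens a new window in which the user drags out a selection and sends the selected rectangle back to Rust, which crops the image and saves the result.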

Capturing the Screen

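For capturing in Rust I use the XCap library. It supports Linux (X11, Wayland), macOS, and Windows, so it cannot be used on mobile. XCap supports screenshots and video recording, and screenshots can target either a whole screen or a single window. Here we implement screen capture first; window capture can be added later for a more polished screenshot experience.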

Create a screenshot.rs file under src-tauri/src/ to hold the screenshot code:

```rust
use tauri::{path::BaseDirectory, AppHandle, Manager};
use xcap::{image, Monitor};

#[allow(dead_code)]
#[tauri::command]
pub fn screenshot(app: AppHandle) -> String {
    let monitors = Monitor::all().unwrap();
    let mut path = String::new();
    for monitor in monitors {
        // Find the monitor that the main window is currently on
        let current_monitor = app
            .get_webview_window("main")
            .unwrap()
            .current_monitor()
            .unwrap()
            .unwrap();
        let current_monitor_name = current_monitor.name().unwrap().to_string();
        if monitor.name() == current_monitor_name {
            let image = monitor.capture_image().unwrap();
            // Resolve the temporary file inside the app data directory
            let file_path = app
                .path()
                .resolve("temp_screenshot.png", BaseDirectory::AppData)
                .unwrap();
            image.save(&file_path).unwrap();
            path = file_path.to_str().unwrap().to_string();
        }
    }
    path
}

#[allow(dead_code)]
#[tauri::command]
pub fn screenshot_save(app: AppHandle, x: u32, y: u32, width: u32, height: u32) -> String {
    let file_path = app
        .path()
        .resolve("temp_screenshot.png", BaseDirectory::AppData)
        .unwrap();
    // Crop the temporary full-screen capture to the selected rectangle
    let image = image::open(&file_path).unwrap();
    let image = image.crop_imm(x, y, width, height);
    let timestamp = chrono::Local::now().format("%Y%m%d%H%M%S").to_string();
    let save_path = app
        .path()
        .resolve(
            format!("screenshot/{}.png", &timestamp),
            BaseDirectory::AppData,
        )
        .unwrap();
    image.save(&save_path).unwrap();
    // The temporary capture is no longer needed once the crop is saved
    std::fs::remove_file(&file_path).unwrap();
    let file_name = format!("{}.png", timestamp);
    file_name
}
```

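Call screenshot when a capture is triggered. It captures the screen and saves it as temp_screenshot.png in the BaseDirectory::AppData directory. Note that the code iterates over monitors, which means multi-monitor capture is possible and can be an optimization for later. For now we only capture one screen: in the code above, the monitor that the main window is currently on.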
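The name-matching step can be looked at in isolation from XCap and Tauri. The following is only a minimal sketch: MonitorInfo and find_monitor are hypothetical names introduced for illustration, not XCap types. It makes one thing explicit that the command above leaves implicit: if no monitor name matches, there is nothing to capture, and the original command would return an empty path.

```rust
/// Hypothetical, simplified stand-in for a monitor handle (not an XCap type).
#[derive(Debug, Clone, PartialEq)]
pub struct MonitorInfo {
    pub name: String,
    pub width: u32,
    pub height: u32,
}

/// Return the first monitor whose name equals `target`, or `None` when no
/// monitor matches (for example, if the display was just unplugged).
pub fn find_monitor<'a>(monitors: &'a [MonitorInfo], target: &str) -> Option<&'a MonitorInfo> {
    monitors.iter().find(|m| m.name == target)
}
```

Returning an Option lets the caller decide what to do when the lookup fails, instead of silently passing an empty string to the frontend.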

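screenshot_save is called once the frontend selection is finished. It crops temp_screenshot.png using x, y, width, and height. I name the saved image with a timestamp; because these are temporary files that are never synced, there is no need to worry about name collisions.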
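Before cropping, it can be useful to sanity-check the rectangle that arrives from the frontend. This is a minimal sketch under two assumptions of mine: that a selection reaching past the image edge should be clamped, and that a zero-area selection (a click without dragging) should be rejected rather than saved. clamp_crop is a hypothetical helper, not part of the image crate or of NoteGen.

```rust
/// A crop rectangle in physical pixels: (x, y, width, height).
pub type Rect = (u32, u32, u32, u32);

/// Clamp a requested crop rectangle to an image of `img_w` x `img_h` pixels.
/// Returns `None` when the clamped rectangle has zero area.
pub fn clamp_crop(img_w: u32, img_h: u32, x: u32, y: u32, w: u32, h: u32) -> Option<Rect> {
    // Keep the origin inside the image (or exactly on its far edge)
    let x = x.min(img_w);
    let y = y.min(img_h);
    // Shrink the size so the rectangle does not extend past the image
    let w = w.min(img_w - x);
    let h = h.min(img_h - y);
    if w == 0 || h == 0 {
        None
    } else {
        Some((x, y, w, h))
    }
}
```

A command could call this with the dimensions of the opened image and return early when the result is None.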

Then register both commands in main.rs so the frontend can call them:

```rust
// Declare src-tauri/src/screenshot.rs as a module
mod screenshot;

use screenshot::{screenshot, screenshot_save};

fn main() {
    tauri::Builder::default()
        // ...
        .invoke_handler(tauri::generate_handler![screenshot, screenshot_save])
        // ...
}
```

That completes the Rust side. Even if you have no Rust experience, this code is easy to follow.
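The commands above call unwrap() throughout, so any I/O failure panics. Tauri commands may also return a Result, in which case an Err rejects the invoke promise on the frontend. As a hedged sketch of that style using only the standard library, the hypothetical helper screenshot_path below makes sure the screenshot subdirectory exists and reports failures as a String. Whether NoteGen creates this directory elsewhere is not shown in the article, so treat the directory creation as an assumption.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: make sure `<base>/screenshot` exists and return the
/// full path for `file_name` inside it. Errors are converted to `String` so
/// the caller can propagate them with `?` instead of panicking.
pub fn screenshot_path(base: &Path, file_name: &str) -> Result<PathBuf, String> {
    let dir = base.join("screenshot");
    fs::create_dir_all(&dir)
        .map_err(|e| format!("cannot create {}: {}", dir.display(), e))?;
    Ok(dir.join(file_name))
}
```

A command declared as returning Result<String, String> could call this helper with the resolved app data directory and use ? on the result.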

Selecting a Region

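Once we have the screen capture, the user still needs to select a region, and this is the frontend's job. The idea is to create a new WebviewWindow, maximize it, and use the captured image as its background. Region selection can be done with an existing library; I use react-image-crop, and the selection result is then passed to Rust.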

The first step is to create a new window:

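```ts
import { invoke } from '@tauri-apps/api/core'
import { getCurrentWebviewWindow, WebviewWindow } from '@tauri-apps/api/webviewWindow'
import { currentMonitor } from '@tauri-apps/api/window'

// The statements below run inside an async function.

// Hide the app window first so it does not appear in the capture
const currentWindow = getCurrentWebviewWindow()
await currentWindow.hide()

await invoke('screenshot')

const monitor = await currentMonitor();
if (!monitor) return;

// Borderless window that covers the current monitor
const webview = new WebviewWindow('screenshot', {
  url: '/screenshot',
  decorations: false,
});

webview.setPosition(monitor?.position)
webview.setSize(monitor?.size)

// Bring the app window back when the screenshot window closes
webview.onCloseRequested(async () => {
  if (!await currentWindow.isVisible()) {
    await currentWindow.show()
  } else {
    await currentWindow.setFocus()
  }
  // `unlisten` comes from an event listener registered elsewhere (not shown here)
  unlisten()
})
```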

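Several things happen here. The current application window is hidden so that it does not block the capture, the screenshot is taken, the current monitor's size and position are read, and a new window is created with them. Setting decorations to false creates a borderless window. When the screenshot window closes, the application window is shown again.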

Next, create a new page screenshot/page.tsx under src/app/:

```tsx
'use client'
import { LocalImage } from "@/components/local-image"
import { Button } from "@/components/ui/button"
import { invoke } from "@tauri-apps/api/core"
import { getCurrentWebviewWindow } from '@tauri-apps/api/webviewWindow'
import { Check } from "lucide-react"
import React, { useEffect, useState } from "react"
import ReactCrop, { type Crop } from 'react-image-crop'
import { register, isRegistered, unregister } from '@tauri-apps/plugin-global-shortcut';
import 'react-image-crop/dist/ReactCrop.css'

export default function Page() {
  const [crop, setCrop] = useState<Crop>()
  const [y, setY] = useState(0)
  const [scale, setScale] = useState(0)

  // Read the window offset and scale factor once the image has loaded
  async function setScreen() {
    const innerPosition = await getCurrentWebviewWindow().innerPosition()
    const scaleFactor = await getCurrentWebviewWindow().scaleFactor()
    setY(innerPosition.y / scaleFactor)
    setScale(scaleFactor)
  }

  // Confirm the selection: convert to physical pixels and let Rust crop.
  // Values are rounded because the Rust command expects u32 arguments.
  async function success() {
    await unregister('Esc');
    const path = await invoke('screenshot_save', {
      x: Math.round((crop?.x || 0) * scale),
      y: Math.round(((crop?.y || 0) + y) * scale),
      width: Math.round((crop?.width || 0) * scale),
      height: Math.round((crop?.height || 0) * scale)
    })
    await getCurrentWebviewWindow().emit('save-success', path)
    await getCurrentWebviewWindow().close()
  }

  // Register Esc as a global shortcut that cancels the screenshot
  async function initRegister() {
    const isEscRegistered = await isRegistered('Esc');
    if (isEscRegistered) {
      await unregister('Esc');
    }
    await register('Esc', async (e) => {
      if (e.state === 'Released') {
        await unregister('Esc');
        const window = getCurrentWebviewWindow()
        await window.close()
      }
    });
  }

  useEffect(() => {
    initRegister()
  }, [])

  function Toolbar() {
    return (
      <>
        <Button className="absolute bottom-2 right-2" onClick={success} size="icon">
          <Check />
        </Button>
      </>
    )
  }

  return (
    <div className="flex h-screen w-screen overflow-hidden">
      <ReactCrop
        crop={crop}
        onChange={c => setCrop(c)}
        ruleOfThirds={true}
        renderSelectionAddon={Toolbar}
      >
        <LocalImage
          onLoad={setScreen}
          className="w-screen"
          style={{ transform: `translateY(-${y}px)` }}
          src="/temp_screenshot.png"
          alt=""
        />
      </ReactCrop>
    </div>
  )
}
```

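Note that a screen's physical pixels and logical pixels are not the same. A 4K screen, for example, has a high physical resolution but a much lower logical one. If the calculation is wrong, the selection ends up in the wrong place, so the coordinates must be converted using getCurrentWebviewWindow().scaleFactor().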
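The conversion itself is plain arithmetic. The sketch below restates it in Rust, only to make the calculation explicit; in the article it runs in the page component before screenshot_save is invoked. to_physical is a hypothetical function, and rounding to the nearest pixel is my assumption about how fractional values should be handled.

```rust
/// Convert a selection from logical (CSS) pixels to physical pixels.
/// `offset_y` is the logical vertical offset of the webview on the screen,
/// and `scale` is the window's scale factor (for example 2.0 on a HiDPI display).
pub fn to_physical(
    x: f64,
    y: f64,
    width: f64,
    height: f64,
    offset_y: f64,
    scale: f64,
) -> (u32, u32, u32, u32) {
    // Scale, round to the nearest pixel, and never go below zero
    let px = |v: f64| (v * scale).round().max(0.0) as u32;
    (px(x), px(y + offset_y), px(width), px(height))
}
```

For example, a 100 x 50 logical selection at (10, 20) with a 30-pixel vertical offset and a scale factor of 2.0 maps to the physical rectangle (20, 100, 200, 100).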

When the selection is done, invoke('screenshot_save', ...) passes the result to Rust for processing, and the window is closed.

With that, we have implemented a system-level screenshot feature. In a later article I will show how to run OCR on the screenshot to extract text.