Real-time speech recognition in Unity WebGL with sherpa-ncnn

server/2024/11/17 5:22:49/

k2-fsa/sherpa-ncnn: real-time speech recognition with next-generation Kaldi and ncnn, with no Internet connection required. Supports iOS, Android, Raspberry Pi, VisionFive2, LicheePi4A, and more. (github.com)

For PC builds you can use ssssssilver's project directly: https://github.com/ssssssilver/sherpa-ncnn-unity.git

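I needed a WebGL build, so I reworked it: the WebGL client streams microphone samples over a WebSocket to a small server process that runs sherpa-ncnn and sends the recognized text back.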

1. WebSocket. The client uses psygames/UnityWebSocket ("The Best Unity WebSocket Plugin for All Platforms", github.com).

using System;
using System.Collections.Generic;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityWebSocket;

public class uSherpaWebGL : MonoBehaviour
{
    IWebSocket ws;
    public Text text;

    // Recognized text received from the server, waiting to be displayed.
    Queue<string> msgs = new Queue<string>();

    // Start is called before the first frame update
    void Start()
    {
        ws = new WebSocket("ws://127.0.0.1:9999");
        ws.OnOpen += OnOpen;
        ws.OnMessage += OnMessage;
        ws.OnError += OnError;
        ws.OnClose += OnClose;
        ws.ConnectAsync();
    }

    // Update is called once per frame
    void Update()
    {
        // Append at most one pending message per frame to the UI text.
        if (msgs.Count > 0)
        {
            string msg = msgs.Dequeue();
            text.text += msg;
        }
    }

    byte[] desArray;

    // Handler for the microphone plugin's data event (raw float samples).
    public void OnData(float[] input)
    {
        Debug.Log("input.Length:" + input.Length);
        SendData(input);
    }

    void SendData(float[] input)
    {
        // Copy the samples into a byte buffer (4 bytes per float).
        // Buffer.BlockCopy replaces Marshal.UnsafeAddrOfPinnedArrayElement,
        // which requires the array to be pinned before it is called.
        var desArraySize = Buffer.ByteLength(input);
        desArray = new byte[desArraySize];
        Buffer.BlockCopy(input, 0, desArray, 0, desArraySize);
        if (ws != null && ws.ReadyState == WebSocketState.Open)
        {
            ws.SendAsync(desArray);
        }
    }

    void OnOpen(object sender, OpenEventArgs e)
    {
        Debug.Log("WS connected!");
    }

    void OnMessage(object sender, MessageEventArgs e)
    {
        if (e.IsBinary)
        {
            // The server sends recognized text as UTF-8 bytes in binary frames.
            string str = Encoding.UTF8.GetString(e.RawData);
            Debug.Log("WS received message: " + str);
            msgs.Enqueue(str);
        }
        else if (e.IsText)
        {
            // Text frames are not used.
        }
    }

    void OnError(object sender, ErrorEventArgs e)
    {
        Debug.Log("WS error: " + e.Message);
    }

    void OnClose(object sender, CloseEventArgs e)
    {
        Debug.Log(string.Format("Closed: StatusCode: {0}, Reason: {1}", e.StatusCode, e.Reason));
    }

    private void OnApplicationQuit()
    {
        if (ws != null && ws.ReadyState != WebSocketState.Closed)
        {
            ws.CloseAsync();
        }
    }
}
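The client sends each float[] buffer as raw bytes, and the receiving side is expected to turn those bytes back into floats. Below is a minimal sketch of that wire format, assuming both ends use the same byte order (the copy is a plain memory copy, so no conversion happens). PcmFraming and its method names are hypothetical helpers for illustration and are not part of either project.

```csharp
using System;

// Hypothetical helper illustrating the float <-> byte framing used on the socket.
public static class PcmFraming
{
    // Pack float samples into raw bytes, 4 bytes per sample, native byte order.
    public static byte[] ToBytes(float[] samples)
    {
        var bytes = new byte[Buffer.ByteLength(samples)];
        Buffer.BlockCopy(samples, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Unpack raw bytes into float samples.
    // Trailing bytes that do not form a whole float are ignored.
    public static float[] ToFloats(byte[] bytes)
    {
        var samples = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, samples, 0, samples.Length * sizeof(float));
        return samples;
    }
}
```

Calling PcmFraming.ToFloats(PcmFraming.ToBytes(samples)) should give back the same values, which is a quick way to check the framing without a socket in between.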

The server side uses Fleck.

// See https://aka.ms/new-console-template for more information
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;
using Fleck;

namespace uSherpaServer
{
    internal class Program
    {
        // Recognizer, stream, and model configuration
        static SherpaNcnn.OnlineRecognizer recognizer;
        static SherpaNcnn.OnlineStream onlineStream;
        static string tokensPath = "tokens.txt";
        static string encoderParamPath = "encoder_jit_trace-pnnx.ncnn.param";
        static string encoderBinPath = "encoder_jit_trace-pnnx.ncnn.bin";
        static string decoderParamPath = "decoder_jit_trace-pnnx.ncnn.param";
        static string decoderBinPath = "decoder_jit_trace-pnnx.ncnn.bin";
        static string joinerParamPath = "joiner_jit_trace-pnnx.ncnn.param";
        static string joinerBinPath = "joiner_jit_trace-pnnx.ncnn.bin";
        static int numThreads = 1;
        static string decodingMethod = "greedy_search";
        static string modelPath;
        static float sampleRate = 16000;
        static IWebSocketConnection client;

        static void Main(string[] args)
        {
            // This model folder must be copied next to the exe
            modelPath = Environment.CurrentDirectory + "/sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16";

            // Build the recognizer configuration
            SherpaNcnn.OnlineRecognizerConfig config = new SherpaNcnn.OnlineRecognizerConfig
            {
                FeatConfig = { SampleRate = sampleRate, FeatureDim = 80 },
                ModelConfig =
                {
                    Tokens = Path.Combine(modelPath, tokensPath),
                    EncoderParam = Path.Combine(modelPath, encoderParamPath),
                    EncoderBin = Path.Combine(modelPath, encoderBinPath),
                    DecoderParam = Path.Combine(modelPath, decoderParamPath),
                    DecoderBin = Path.Combine(modelPath, decoderBinPath),
                    JoinerParam = Path.Combine(modelPath, joinerParamPath),
                    JoinerBin = Path.Combine(modelPath, joinerBinPath),
                    UseVulkanCompute = 0,
                    NumThreads = numThreads
                },
                DecoderConfig =
                {
                    DecodingMethod = decodingMethod,
                    NumActivePaths = 4
                },
                EnableEndpoint = 1,
                Rule1MinTrailingSilence = 2.4F,
                Rule2MinTrailingSilence = 1.2F,
                Rule3MinUtteranceLength = 20.0F
            };

            // Create the recognizer and the online stream
            recognizer = new SherpaNcnn.OnlineRecognizer(config);
            onlineStream = recognizer.CreateStream();

            StartWebServer();
            Update();
            Console.ReadLine();
        }

        static void StartWebServer()
        {
            // Pool of connected clients
            var connectSocketPool = new List<IWebSocketConnection>();
            // Create the WebSocket server and listen on local port 9999
            var server = new WebSocketServer("ws://127.0.0.1:9999");
            // Start listening
            server.Start(socket =>
            {
                // Client connected
                socket.OnOpen = () =>
                {
                    client = socket;
                    Console.WriteLine("Open");
                    // Add this connection to the pool
                    connectSocketPool.Add(socket);
                };
                // Client disconnected
                socket.OnClose = () =>
                {
                    client = null;
                    Console.WriteLine("Close");
                    // Remove this connection from the pool
                    connectSocketPool.Remove(socket);
                };
                // Client sent binary data (raw float samples)
                socket.OnBinary = message =>
                {
                    float[] floatArray = new float[message.Length / 4];
                    // Copy only whole floats so an odd-sized message cannot overrun the array
                    Buffer.BlockCopy(message, 0, floatArray, 0, floatArray.Length * sizeof(float));
                    // Feed the captured audio to the recognizer
                    onlineStream.AcceptWaveform(sampleRate, floatArray);
                };
            });
        }

        static string lastText = "";

        static void Update()
        {
            while (true)
            {
                // Advance the recognizer on each iteration
                if (recognizer.IsReady(onlineStream))
                {
                    recognizer.Decode(onlineStream);
                }
                var text = recognizer.GetResult(onlineStream).Text;
                bool isEndpoint = recognizer.IsEndpoint(onlineStream);
                if (!string.IsNullOrWhiteSpace(text) && lastText != text)
                {
                    // The first result of an utterance is sent whole;
                    // afterwards only the newly recognized part is sent.
                    string delta = string.IsNullOrWhiteSpace(lastText)
                        ? text
                        : text.Replace(lastText, "");
                    client?.Send(Encoding.UTF8.GetBytes(delta));
                    lastText = text;
                    //Console.WriteLine("text1:" + text);
                }
                if (isEndpoint)
                {
                    if (!string.IsNullOrWhiteSpace(text))
                    {
                        // Close the utterance with a full stop
                        client?.Send(Encoding.UTF8.GetBytes("。"));
                        //Console.WriteLine("text2:" + text);
                    }
                    recognizer.Reset(onlineStream);
                    // Start the next utterance with a clean comparison baseline
                    lastText = "";
                    //Console.WriteLine("Reset");
                }
                Thread.Sleep(200); // ms
            }
        }
    }
}
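The server sends only the newly recognized part of the running transcript, computed with text.Replace(lastText, ""). Since string.Replace removes every occurrence of the old text, repeated words can be dropped: with lastText "ab" and text "abab" the result is an empty string. A prefix-based variant avoids that case. The sketch below is one possible approach, not the author's code; TranscriptDelta and Next are hypothetical names. When the recognizer revises earlier text, it falls back to returning the whole transcript, which is what the Replace call also ends up doing in that situation.

```csharp
using System;

// Hypothetical helper: computes what to send for an append-only client.
public static class TranscriptDelta
{
    // Returns the part of `current` that extends `previous`.
    // If `current` does not start with `previous`, the whole `current` is returned.
    public static string Next(string previous, string current)
    {
        if (string.IsNullOrEmpty(previous))
        {
            return current;
        }
        if (current.StartsWith(previous, StringComparison.Ordinal))
        {
            return current.Substring(previous.Length);
        }
        return current;
    }
}
```

A client that only appends incoming text cannot undo a revision, so the fallback branch can still produce duplicated text on screen; handling that would need a protocol that can replace the current utterance rather than append to it.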

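2. For microphone capture in Unity I used the uMicrophoneWebGL plugin. Bind its DataEvent to a handler (OnData in the client script above) to receive the microphone data as a float array in real time.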

Finally, the project links:

Client: uSherpa, forked from https://github.com/ssssssilver/sherpa-ncnn-unity.git and adapted for Unity WebGL

Server: GitHub - xue-fei/uSherpaServer: a WebSocket service that provides streaming speech recognition for Unity


