[Bug] RuntimeError: Engine loop has died

news/2024/10/26 12:44:04/

Contents

  • Preconditions
  • Error output
  • Solution

Preconditions

The error occurred after starting the qwen2.5-32b-instruct model with vLLM.
GPU: GeForce RTX 4090 Laptop GPU
Host OS: Windows 11
Runtime environment: WSL2 with Ubuntu 22.04

Error output

INFO 10-22 22:29:31 engine.py:290] Added request chat-993cbe95e73d4a1db5d1e89e433f727a.
ERROR 10-22 22:29:32 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-22 22:29:32 client.py:250] Traceback (most recent call last):
ERROR 10-22 22:29:32 client.py:250]   File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-22 22:29:32 client.py:250]     await self._check_success(
ERROR 10-22 22:29:32 client.py:250]   File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-22 22:29:32 client.py:250]     raise response
ERROR 10-22 22:29:32 client.py:250] RuntimeError: Engine loop has died
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 259, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
    await func()
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 232, in listen_for_disconnect
    message = await receive()
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/home/ai/miniconda3/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f385017b9d0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    await response(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 252, in __call__
    async with anyio.create_task_group() as task_group:
  File "/home/ai/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

Solution

The likely cause is insufficient memory:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       6.9Gi       8.2Gi        80Mi       435Mi       8.2Gi
Swap:          4.0Gi       4.0Gi       0.0Ki

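The output shows 15 GB of total memory, of which about 6.9 GB is in use and about 8.2 GB is available.
The 4 GB of swap is completely used, with nothing left.
Once swap is exhausted, system performance degrades severely, and memory-hungry processes such as the vLLM engine may be killed.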
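
The same check can be scripted instead of read off the `free` output by eye. The following is a minimal sketch, assuming a Linux-style `/proc/meminfo` with `SwapTotal` and `SwapFree` fields; the function name `swap_used_percent` and its optional file argument are made up for this example.

```shell
# Print the percentage of swap currently in use.
# swap_used_percent is a hypothetical helper name; the optional argument
# lets you point it at any meminfo-style file (default: /proc/meminfo).
swap_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^SwapTotal:/ { total  = $2 }   # value is in kB
        /^SwapFree:/  { swfree = $2 }   # value is in kB
        END {
            if (total == 0) { print "no swap configured"; exit }
            printf "%d\n", (total - swfree) * 100 / total
        }
    ' "$meminfo"
}
```

Calling `swap_used_percent` with no argument reads the live values. A result close to 100, as in the `free` output above, suggests that swap is the bottleneck.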
To add swap space equal in size to the physical memory (15 GB), follow these steps:

  1. Create a new swap file

    sudo fallocate -l 15G /swapfile
    
  2. Set the correct permissions

    sudo chmod 600 /swapfile
    
  3. Format the file as swap space

    sudo mkswap /swapfile
    
  4. Enable the swap file

    sudo swapon /swapfile
    
  5. Confirm that the swap space is active

    free -h
    
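  6. To make the change persistent, open /etc/fstab and append the swap entry:

    sudo vim /etc/fstab
    # append the following line to the file, then save and quit with :wq
    /swapfile swap swap defaults 0 0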
    

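This adds a 15 GB swap file, which should relieve the memory pressure behind the error.
If the /etc/fstab entry does not take effect, a workaround is to put the commands from the first five steps in ~/.bashrc (once the swap file exists, only the swapon command from step 4 needs to run again).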
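
If you would rather not edit /etc/fstab by hand in step 6, the entry can be appended from the shell in a way that is safe to repeat. This is only a sketch: `add_swap_entry` is a hypothetical helper name, and writing to the real /etc/fstab requires a shell with root privileges.

```shell
# Append the swap entry to an fstab-style file only if it is not there yet,
# so running it twice does not create a duplicate line.
# add_swap_entry is a hypothetical name; the argument defaults to /etc/fstab.
add_swap_entry() {
    local fstab="${1:-/etc/fstab}"
    local entry="/swapfile swap swap defaults 0 0"
    # -x: match the whole line, -F: treat the entry as a fixed string
    if grep -qxF "$entry" "$fstab"; then
        echo "entry already present"
    else
        echo "$entry" >> "$fstab"
        echo "entry added"
    fi
}
```

Trying it on a copy first (for example `cp /etc/fstab /tmp/fstab.test` and then `add_swap_entry /tmp/fstab.test`) lets you inspect the result before touching the real file.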


http://www.ppmy.cn/news/1542118.html
