How to stop PySpark from printing info at runtime


I have just started using PySpark. The version is 3.5.4, installed via pip.

Here is my code:

from pyspark.sql import SparkSession
pyspark = SparkSession.builder.master("local[8]").appName("test").getOrCreate()
df = pyspark.read.csv("test.csv", header=True)
print(df.show())

Every time I run the program with:

python test_01.py

it prints a lot of information about pyspark (in yellow in my terminal).

How can I disable this so that it is not printed?

Solution:

  1. Different lines come from different sources:
    • Windows ("SUCCESS: ..."),
    • Spark's launcher shell/batch scripts (":: loading settings :: ..."),
    • core Spark code logging via log4j2 (for this part, see the setLogLevel sketch right after this list),
    • core Spark code printing via System.out.println().
  2. Different lines are written to different fds (stdout, stderr, log4j log files).
  3. Spark ships different "scripts" for different purposes (pyspark, spark-submit, spark-shell, ...). You may be using the wrong one here.
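For the log4j2-driven portion specifically, the simplest lever is Spark's own log level, which you can set from Python once the session exists. A minimal sketch (the choice of "ERROR" is mine; any log4j level works). Note that it only takes effect after the JVM is already up, so the chatter printed during startup itself is unaffected:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[8]").appName("test").getOrCreate()
# Suppress everything below ERROR that Spark routes through log4j2.
# This runs after the JVM has started, so messages printed during
# startup still appear.
spark.sparkContext.setLogLevel("ERROR")

df = spark.read.csv("test.csv", header=True)
df.show()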

Based on what you want to achieve, the simplest option is to use spark-submit, which is meant for headless execution:

CMD> cat test.py
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .config('spark.jars.packages', 'io.delta:delta-core_2.12:2.4.0') \
    .getOrCreate()  # the delta config is there just to produce some logs
spark.createDataFrame(data=[(i,) for i in range(5)], schema='id: int').show()

CMD> spark-submit test.py
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
CMD>

Figuring out who writes what to which fd is a tedious process, and it may even differ across platforms (Linux/Windows/Mac). I don't recommend going down this path, but if you really want to, here are some hints:

  1. From your original code:

print(df.show())

    • df.show() prints the df to stdout and returns None.
    • print(df.show()) therefore prints an extra None to stdout (see the sketch below).
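A quick way to see this for yourself (a sketch, reusing the df from your snippet):

result = df.show()   # renders the table on stdout, returns None
print(result)        # prints the stray "None" line you may have noticed
df.show()            # so just call it directly, without print()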
  2. Run it with python instead of spark-submit:
CMD> python test.py
:: loading settings :: url = jar:file:/C:/My/.venv/Lib/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: C:\Users\e679994\.ivy2\cache
The jars for the packages stored in: C:\Users\e679994\.ivy2\jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-499a6ac1-b961-44da-af58-de97e4357cbf;1.0
        confs: [default]
        found io.delta#delta-core_2.12;2.4.0 in central
        found io.delta#delta-storage;2.4.0 in central
        found org.antlr#antlr4-runtime;4.9.3 in central
:: resolution report :: resolve 171ms :: artifacts dl 8ms
        :: modules in use:
        io.delta#delta-core_2.12;2.4.0 from central in [default]
        io.delta#delta-storage;2.4.0 from central in [default]
        org.antlr#antlr4-runtime;4.9.3 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-499a6ac1-b961-44da-af58-de97e4357cbf
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/7ms)
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
CMD> SUCCESS: The process with PID 38136 (child process of PID 38196) has been terminated.
SUCCESS: The process with PID 38196 (child process of PID 35316) has been terminated.
SUCCESS: The process with PID 35316 (child process of PID 22336) has been terminated.
CMD>
  3. Redirect stdout (fd=1) to a file:
CMD> python test.py > out.txt 2> err.txt
CMD> 
CMD> cat out.txt
:: loading settings :: url = jar:file:/C:/My/.venv/Lib/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
SUCCESS: The process with PID 25080 (child process of PID 38032) has been terminated.
SUCCESS: The process with PID 38032 (child process of PID 21176) has been terminated.
SUCCESS: The process with PID 21176 (child process of PID 38148) has been terminated.
SUCCESS: The process with PID 38148 (child process of PID 32456) has been terminated.
SUCCESS: The process with PID 32456 (child process of PID 31656) has been terminated.
CMD>
  4. Redirect stderr (fd=2) to a file:
CMD> cat err.txt
Ivy Default Cache set to: C:\Users\kash\.ivy2\cache
The jars for the packages stored in: C:\Users\kash\.ivy2\jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-597f3c82-718d-498b-b00e-7928264c307a;1.0
        confs: [default]
        found io.delta#delta-core_2.12;2.4.0 in central
        found io.delta#delta-storage;2.4.0 in central
        found org.antlr#antlr4-runtime;4.9.3 in central
:: resolution report :: resolve 111ms :: artifacts dl 5ms
        :: modules in use:
        io.delta#delta-core_2.12;2.4.0 from central in [default]
        io.delta#delta-storage;2.4.0 from central in [default]
        org.antlr#antlr4-runtime;4.9.3 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-597f3c82-718d-498b-b00e-7928264c307a
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/5ms)
CMD>
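If you would rather do this separation from Python than from the shell, the same effect can be had with subprocess (a sketch; test.py is the script from above):

import subprocess

# Run the Spark script, capturing stdout and stderr separately
# instead of letting them interleave on the console.
proc = subprocess.run(
    ["python", "test.py"],
    capture_output=True,
    text=True,
)
print("fd 1 (stdout):", proc.stdout)  # the table, plus anything else on stdout
print("fd 2 (stderr):", proc.stderr)  # ivy/log4j chatter, typically on stderr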
  5. The "SUCCESS: The process with PID ..." lines:
    • Note that these appear AFTER the CMD> prompt returns, i.e. they are printed by Windows once python has finished executing.
    • You won't see them on Linux. E.g. from my Linux box:
kash@ub$ python test.py
19:15:50.037 [main] WARN  org.apache.spark.util.Utils - Your hostname, ub resolves to a loopback address: 127.0.1.1; using 192.168.177.129 instead (on interface ens33)
19:15:50.049 [main] WARN  org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/home/kash/workspaces/spark-log-test/.venv/lib/python3.9/site-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/kash/.ivy2/cache
The jars for the packages stored in: /home/kash/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-7d38e7a2-a0e5-47fa-bfda-2cb5b8b443e0;1.0
        confs: [default]
        found io.delta#delta-core_2.12;2.4.0 in spark-list
        found io.delta#delta-storage;2.4.0 in spark-list
        found org.antlr#antlr4-runtime;4.9.3 in spark-list
:: resolution report :: resolve 390ms :: artifacts dl 10ms
        :: modules in use:
        io.delta#delta-core_2.12;2.4.0 from spark-list in [default]
        io.delta#delta-storage;2.4.0 from spark-list in [default]
        org.antlr#antlr4-runtime;4.9.3 from spark-list in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-7d38e7a2-a0e5-47fa-bfda-2cb5b8b443e0
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/19ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
+---+                                                                           
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
kash@ub$
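One last, blunt instrument, since the JVM that Spark launches inherits your script's file descriptors: you can point fd 2 at the null device before the session is created. This is my own suggestion, not part of the answer above, and it silences real errors on stderr too, so use it with care. A sketch (os.devnull resolves to /dev/null on Linux and nul on Windows):

import os

# Redirect this process's fd 2 to the null device *before* the JVM starts;
# the Spark gateway process inherits the descriptor, so most of its
# console logging vanishes. WARNING: real errors on stderr vanish as well.
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, 2)

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[8]").appName("test").getOrCreate()
spark.read.csv("test.csv", header=True).show()  # the table still goes to stdout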
