人生苦短我用Python pandas文件格式转换

embedded/2024/10/20 12:16:06/

人生苦短我用Python pandas文件格式转换

  • 前言
  • 示例1 excel与csv互转
  • 常用格式的方法
    • Flat file
    • Excel
    • JSON
    • XML
  • 示例2 常用格式转换
    • 简要需求
    • 依赖
    • export方法
    • main方法
  • 附其它格式的方法
    • HTML
    • Pickling
    • Clipboard
    • Latex
    • HDFStore: PyTables (HDF5)
    • Feather
    • Parquet
    • ORC
    • SAS
    • SPSS
    • SQL
    • Google BigQuery
    • STATA

前言

pandas支持多种文件格式,通过pandasIO方法,可以实现不同格式之间的互相转换。本文通过excelcsv互转的示例和pandas的支持的文件格式,实现一个简单的文件格式转换的功能。

示例1 excel与csv互转

在前文实现了excel转csv,即通过pandasexcelcsv,反过来也可以将csv转为excel

下面是excelcsv互转的示例代码:

  • excel转csv
python">def export_csv(input_file, output_path):# 创建ExcelFile对象with pd.ExcelFile(input_file) as xls:# 获取工作表名称列表for i, sheet_name in enumerate(xls.sheet_names):# 读取工作表并转换为DataFramedf = pd.read_excel(xls, sheet_name=sheet_name)output_file = os.path.join(output_path, f'{i + 1}-{sheet_name}.csv')# 将DataFrame中的数据写入CSV文件。df.to_csv(output_file, index=False)
  • csv转为excel
python">def export_excel(input_file, output_file):if not output_file:input_path = pathlib.Path(input_file)output_path = input_path.parent / (input_path.stem + '.xlsx')output_file = str(output_path)df = pd.read_csv(input_file)df.to_excel(output_file, index=False)

常用格式的方法

以下来自pandas官网 Input/Outout部分

Flat file

方法说明
read_table(filepath_or_buffer, *[, sep, …])Read general delimited file into DataFrame.
read_csv(filepath_or_buffer, *[, sep, …])Read a comma-separated values (csv) file into DataFrame.
DataFrame.to_csv([path_or_buf, sep, na_rep, …])Write object to a comma-separated values (csv) file.
read_fwf(filepath_or_buffer, *[, colspecs, …])Read a table of fixed-width formatted lines into DataFrame.

Excel

方法说明
read_excel(io[, sheet_name, header, names, …])Read an Excel file into a pandas DataFrame.
DataFrame.to_excel(excel_writer, *[, …])Write object to an Excel sheet.
ExcelFile(path_or_buffer[, engine, …])Class for parsing tabular Excel sheets into DataFrame objects.
ExcelFile.book
ExcelFile.sheet_names
ExcelFile.parse([sheet_name, header, names, …])Parse specified sheet(s) into a DataFrame.
方法说明
Styler.to_excel(excel_writer[, sheet_name, …])Write Styler to an Excel sheet.
方法说明
ExcelWriter(path[, engine, date_format, …])Class for writing DataFrame objects into excel sheets.

JSON

方法说明
read_json(path_or_buf, *[, orient, typ, …])Convert a JSON string to pandas object.
json_normalize(data[, record_path, meta, …])Normalize semi-structured JSON data into a flat table.
DataFrame.to_json([path_or_buf, orient, …])Convert the object to a JSON string.
方法说明
build_table_schema(data[, index, …])Create a Table schema from data.

XML

方法说明
read_xml(path_or_buffer, *[, xpath, …])Read XML document into a DataFrame object.
DataFrame.to_xml([path_or_buffer, index, …])Render a DataFrame to an XML document.

示例2 常用格式转换

根据常用格式的IO方法,完成一个常用格式的格式转换功能。

第一步从指定格式的文件中读取数据,并将其转换为 DataFrame 对象。

第二部将 DataFrame 中的数据写入指定格式的文件中。

简要需求

  • 根据输入输出的文件后缀名,自动进行格式转换,若格式不支持输出提示。
  • 支持的格式csvxlsxjsonxml

依赖

pip install pandas
pip install openpyxl
pip install lxml

export方法

python">def export(input_file, output_file):if not os.path.isfile(input_file):print('Input file does not exist')returnif input_file.endswith('.csv'):df = pd.read_csv(input_file, encoding='utf-8')elif input_file.endswith('.json'):df = pd.read_json(input_file, encoding='utf-8')elif input_file.endswith('.xlsx'):df = pd.read_excel(input_file)elif input_file.endswith('.xml', encoding='utf-8'):df = pd.read_xml(input_file)else:print('Input file type not supported')returnif output_file.endswith('.csv'):df.to_csv(output_file, index=False)elif output_file.endswith('.json'):df.to_json(output_file, orient='records', force_ascii=False)elif output_file.endswith('.xlsx'):df.to_excel(output_file, index=False)elif output_file.endswith('.xml'):df.to_xml(output_file, index=False)elif output_file.endswith('.html'):df.to_html(output_file, index=False, encoding='utf-8')else:print('Output file type not supported')return

main方法

python">def main(argv):input_path = Noneoutput_path = Nonetry:shortopts = "hi:o:"longopts = ["ipath=", "opath="]opts, args = getopt.getopt(argv, shortopts, longopts)except getopt.GetoptError:print('usage: export.py -i <inputpath> -o <outputpath>')sys.exit(2)for opt, arg in opts:if opt in ("-h", "--help"):print('usage: export.py -i <inputpath> -o <outputpath>')sys.exit()elif opt in ("-i", "--ipath"):input_path = argelif opt in ("-o", "--opath"):output_path = argprint(f'输入路径为:{input_path}')print(f'输出路径为:{output_path}')export(input_path, output_path)

附其它格式的方法

以下来自pandas官网 Input/Outout部分

HTML

方法说明
read_html(io, *[, match, flavor, header, …])Read HTML tables into a list of DataFrame objects.
DataFrame.to_html([buf, columns, col_space, …])Render a DataFrame as an HTML table.
方法说明
Styler.to_html([buf, table_uuid, …])Write Styler to a file, buffer or string in HTML-CSS format.

Pickling

方法说明
read_pickle(filepath_or_buffer[, …])Load pickled pandas object (or any object) from file.
DataFrame.to_pickle(path, *[, compression, …])Pickle (serialize) object to file.

Clipboard

方法说明
read_clipboard([sep, dtype_backend])Read text from clipboard and pass to read_csv().
DataFrame.to_clipboard(*[, excel, sep])Copy object to the system clipboard.

Latex

方法说明
DataFrame.to_latex([buf, columns, header, …])Render object to a LaTeX tabular, longtable, or nested table.
方法说明
Styler.to_latex([buf, column_format, …])Write Styler to a file, buffer or string in LaTeX format.

HDFStore: PyTables (HDF5)

方法说明
read_hdf(path_or_buf[, key, mode, errors, …])Read from the store, close it if we opened it.
HDFStore.put(key, value[, format, index, …])Store object in HDFStore.
HDFStore.append(key, value[, format, axes, …])Append to Table in file.
HDFStore.get(key)Retrieve pandas object stored in file.
HDFStore.select(key[, where, start, stop, …])Retrieve pandas object stored in file, optionally based on where criteria.
HDFStore.info()Print detailed information on the store.
HDFStore.keys([include])Return a list of keys corresponding to objects stored in HDFStore.
HDFStore.groups()Return a list of all the top-level nodes.
HDFStore.walk([where])Walk the pytables group hierarchy for pandas objects.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

Feather

方法说明
read_feather(path[, columns, use_threads, …])Load a feather-format object from the file path.
DataFrame.to_feather(path, **kwargs)Write a DataFrame to the binary Feather format.

Parquet

方法说明
read_parquet(path[, engine, columns, …])Load a parquet object from the file path, returning a DataFrame.
DataFrame.to_parquet([path, engine, …])Write a DataFrame to the binary parquet format.

ORC

方法说明
read_orc(path[, columns, dtype_backend, …])Load an ORC object from the file path, returning a DataFrame.
DataFrame.to_orc([path, engine, index, …])Write a DataFrame to the ORC format.

SAS

方法说明
read_sas(filepath_or_buffer, *[, format, …])Read SAS files stored as either XPORT or SAS7BDAT format files.

SPSS

方法说明
read_spss(path[, usecols, …])Load an SPSS file from the file path, returning a DataFrame.

SQL

方法说明
read_sql_table(table_name, con[, schema, …])Read SQL database table into a DataFrame.
read_sql_query(sql, con[, index_col, …])Read SQL query into a DataFrame.
read_sql(sql, con[, index_col, …])Read SQL query or database table into a DataFrame.
DataFrame.to_sql(name, con, *[, schema, …])Write records stored in a DataFrame to a SQL database.

Google BigQuery

方法说明
read_gbq(query[, project_id, index_col, …])(DEPRECATED) Load data from Google BigQuery.

STATA

方法说明
read_stata(filepath_or_buffer, *[, …])Read Stata file into DataFrame.
DataFrame.to_stata(path, *[, convert_dates, …])Export DataFrame object to Stata dta format.
方法说明
StataReader.data_labelReturn data label of Stata file.
StataReader.value_labels()Return a nested dict associating each variable name to its value and label.
StataReader.variable_labels()Return a dict associating each variable name with corresponding label.
StataWriter.write_file()Export DataFrame object to Stata dta format.

http://www.ppmy.cn/embedded/108859.html

相关文章

Apache DolphinScheduler在Cisco Webex的应用与优化实践

引言 我叫李庆旺&#xff0c;是Cisco Webex的一名软件工程师&#xff0c;同时也是Apache DolphinScheduler&#xff08;以下简称DS&#xff09;的Committer。 在过去的两年里&#xff0c;公司基于Apache DolphinScheduler进行了多项持续改进和创新&#xff0c;以更好地适应我们…

Android Fragment 学习备忘

1.fragment的动态添加与管理&#xff0c;fragment生命周期在后面小节&#xff1a;https://www.bilibili.com/video/BV1Ng411K7YP/?p37&share_sourcecopy_web&vd_source982a7a7c05972157e8972c41b546f9e4https://www.bilibili.com/video/BV1Ng411K7YP/?p37&share_…

Fabric.js全面介绍:强大的交互式图形编辑框架

在前端开发中&#xff0c;图形界面的创建与编辑一直是开发者们关注的重点。随着Web技术的不断发展&#xff0c;HTML5 Canvas 提供了一个强大的画布功能&#xff0c;然而其API相对低级&#xff0c;操作复杂图形和高级交互时显得力不从心。这时&#xff0c;Fabric.js 应运而生&am…

springboot+vue+mybatis智慧篮球馆预约+PPT+论文+讲解+售后

近些年来&#xff0c;随着科技的飞速发展&#xff0c;互联网的普及逐渐延伸到各行各业中&#xff0c;给人们生活带来了十分的便利&#xff0c;智慧篮球馆预约利用计算机网络实现信息化管理&#xff0c;使整个智慧篮球馆预约的发展和服务水平有显著提升。 本文拟采用Eclipse开发…

Docker核心原理解读:深度剖析Docker Daemon,掌控容器背后的引擎

容器技术已经成为现代应用程序开发和部署中的核心工具&#xff0c;而在Docker生态系统中&#xff0c;Docker Daemon 扮演着至关重要的角色。它不仅是Docker架构的核心&#xff0c;还负责容器的管理、镜像的操作、资源的分配等复杂任务。本文将深入解读Docker Daemon的工作原理&…

网络学习-eNSP配置VRRP

虚拟路由冗余协议(Virtual Router Redundancy Protocol&#xff0c;简称VRRP) VRRP广泛应用在边缘网络中&#xff0c;是一种路由冗余协议&#xff0c;它的设计目标是支持特定情况下IP数据流量失败转移不会引起混乱&#xff0c;允许主机使用单路由器&#xff0c;以及即使在实际…

一键云迁移:利用VMware PowerCLI将OVA虚拟机顺利迁移到AWS

哈喽大家好&#xff0c;欢迎来到虚拟化时代君&#xff08;XNHCYL&#xff09;。 “ 大家好&#xff0c;我是虚拟化时代君&#xff0c;一位潜心于互联网的技术宅男。这里每天为你分享各种你感兴趣的技术、教程、软件、资源、福利…&#xff08;每天更新不间断&#xff0c;福利…

YOLOv5的数据增强详解

文章目录 前言数据增强1.Mosaic&#xff08;马赛克增强&#xff09;2.MixUp&#xff08;混合增强&#xff09;3.Letterbox&#xff08;自适应缩放技术&#xff09;4.Random perspective&#xff08;随机透视变换&#xff09;5.Albumentations模块5.HSV变换6.Flip&#xff08;翻…