DataX installation, deployment, and usage on Windows


Reference: Installing DataX on Windows 10 — JMzz's blog (CSDN)

DataX/userGuid.md at master · alibaba/DataX · GitHub

Environment prerequisites:

1. JDK (1.8 or later; 1.8 recommended)

2. Python (Python 2.7.x recommended). If you use Python 3.x, you can download the replacement package below.

With Python 3, three files under the installation's bin directory need to be replaced.

Replacement file download: Baidu Netdisk link, extraction code: re42

3. Apache Maven 3.x (to compile DataX); not required if you install a pre-built package.

Installing the Python environment itself is not covered here; please download and install it yourself.
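Before moving on, it may help to confirm from a cmd window that both the JDK and Python are on the PATH (a quick sanity check, not part of the original steps):

java -version

python --version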

1. Download and extract

Note: I used Python 3.x and did not replace the Python-related files.

Installation directory:

E:\DATAX\datax

Run all scripts from E:\DATAX\datax\bin:

cmd

e:

cd E:\DATAX\datax\bin

2. Self-check script

python datax.py ../job/job.json

3. Practice example: read data with streamreader and print it to the console with streamwriter

Step 1: create the job configuration file (JSON format)

You can view a configuration template with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

For example:

python datax.py -r streamreader -w streamwriter

This returns the following:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
    https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

Please refer to the streamwriter document:
    https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

Please save the following configuration as a json file and  use
    python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

Based on the template, configure the JSON as follows:

stream2stream.json

{"job": {"content": [{"reader": {"name": "streamreader","parameter": {"sliceRecordCount": 10,"column": [{"type": "long","value": "10"},{"type": "string","value": "hello,你好,世界-DataX"}]}},"writer": {"name": "streamwriter","parameter": {"encoding": "UTF-8","print": true}}}],"setting": {"speed": {"channel": 5}}}
}

Step 2: start DataX

$ cd {YOUR_DATAX_DIR_BIN}

$ python datax.py ./stream2stream.json

Here the file was saved under E:\DATAX\datax\job, so the actual command run from bin is:

python datax.py ../job/stream2stream.json

When the sync finishes, the log looks like this:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-03-16 13:52:50.773 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-03-16 13:52:50.776 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-03-16 13:52:50.786 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-03-16 13:52:50.791 [main] INFO  Engine - the machine info  =>

	osInfo:	Oracle Corporation 1.8 25.172-b11
	jvmInfo:	Windows 10 amd64 10.0
	cpu num:	8

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size
	PS Eden Space                  | 256.00MB                       | 256.00MB
	Code Cache                     | 240.00MB                       | 2.44MB
	Compressed Class Space         | 1,024.00MB                     | 0.00MB
	PS Survivor Space              | 42.50MB                        | 42.50MB
	PS Old Gen                     | 683.00MB                       | 683.00MB
	Metaspace                      | -0.00MB                        | 0.00MB

2023-03-16 13:52:50.815 [main] INFO  Engine -
{
	"content":[
		{
			"reader":{
				"name":"streamreader",
				"parameter":{
					"column":[
						{
							"type":"long",
							"value":"10"
						},
						{
							"type":"string",
							"value":"hello,你好,世界-DataX"
						}
					],
					"sliceRecordCount":10
				}
			},
			"writer":{
				"name":"streamwriter",
				"parameter":{
					"encoding":"UTF-8",
					"print":true
				}
			}
		}
	],
	"setting":{
		"speed":{
			"channel":5
		}
	}
}

2023-03-16 13:52:50.833 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-03-16 13:52:50.835 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-03-16 13:52:50.835 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-03-16 13:52:50.837 [main] INFO  JobContainer - Set jobId = 0
2023-03-16 13:52:50.855 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-03-16 13:52:50.856 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-03-16 13:52:50.857 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-03-16 13:52:50.857 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-03-16 13:52:50.858 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2023-03-16 13:52:50.859 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [5] tasks.
2023-03-16 13:52:50.859 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [5] tasks.
2023-03-16 13:52:50.880 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-03-16 13:52:50.889 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-03-16 13:52:50.892 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-03-16 13:52:50.900 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [5] channels for [5] tasks.
2023-03-16 13:52:50.905 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-03-16 13:52:50.906 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-03-16 13:52:50.916 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2023-03-16 13:52:50.919 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-03-16 13:52:50.923 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
2023-03-16 13:52:50.929 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
2023-03-16 13:52:50.933 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
10      hello,你好,世界-DataX
2023-03-16 13:52:51.049 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[130]ms
2023-03-16 13:52:51.049 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[120]ms
2023-03-16 13:52:51.050 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[135]ms
2023-03-16 13:52:51.052 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[129]ms
2023-03-16 13:52:51.052 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[119]ms
2023-03-16 13:52:51.053 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-03-16 13:53:00.918 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.008s | Percentage 100.00%
2023-03-16 13:53:00.919 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-03-16 13:53:00.923 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-03-16 13:53:00.923 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-03-16 13:53:00.923 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-03-16 13:53:00.924 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: E:\DATAX\datax\hook
2023-03-16 13:53:00.925 [job-0] INFO  JobContainer -
	 [total cpu info] =>
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu
		-1.00%                         | -1.00%                         | -1.00%

	 [total gc info] =>
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
		 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2023-03-16 13:53:00.925 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-03-16 13:53:00.926 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.008s | Percentage 100.00%
2023-03-16 13:53:00.927 [job-0] INFO  JobContainer -
任务启动时刻                    : 2023-03-16 13:52:50
任务结束时刻                    : 2023-03-16 13:53:00
任务总计耗时                    :                 10s
任务平均流量                    :               95B/s
记录写入速度                    :              5rec/s
读出记录总数                    :                  50
读写失败总数                    :                   0

4. Real-world configuration

Only the MySQL-related configuration has been tested here; the other plugins still need further study.

Configuration format details for each plugin are available at https://github.com/alibaba/DataX/

Since I am studying this alongside ClickHouse, note that writing to ClickHouse is also supported there.
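As a rough sketch of what such a writer block could look like, following the common layout of DataX's RDBMS-style writers (the host, database, table, and column names below are made up; check the clickhousewriter doc in the repository for the authoritative parameters):

"writer": {
    "name": "clickhousewriter",
    "parameter": {
        "username": "default",
        "password": "",
        "column": ["name"],
        "connection": [
            {
                "jdbcUrl": "jdbc:clickhouse://127.0.0.1:8123/demo",
                "table": ["test"]
            }
        ]
    }
}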

1. mysqlreader

DataX/mysqlreader/doc/mysqlreader.md at master · alibaba/DataX · GitHub

The table/column mode and the querySql mode are mutually exclusive; only one of them can be used.

Key points:

Multiple jdbcUrl values can be configured; DataX checks them for validity in order.

Multiple table values can be configured, but the tables must share the same schema structure, and table must sit inside a connection block (see the sketch below).
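A minimal sketch of such a connection unit (the table names and hosts here are hypothetical, only to illustrate the multiple-value form):

"connection": [
    {
        "table": ["order_2023", "order_2024"],
        "jdbcUrl": [
            "jdbc:mysql://db-primary:3306/demo",
            "jdbc:mysql://db-standby:3306/demo"
        ]
    }
]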

1. Configure a job that extracts data from a MySQL database and prints it locally:

Using the table/column mode

mysql2stream1.json

{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "sa","column": ["ryxm","rysfz"],"splitPk": "id","connection": [{"table": ["sys_czry"],"jdbcUrl": ["jdbc:mysql://172.16.0.101:3306/qyjx_v3.1?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8&&nullCatalogMeansCurrent=true&allowMultiQueries=true&rewriteBatchedStatements=true"]}]}},"writer": {"name": "streamwriter","parameter": {"print":true}}}]}
}

python datax.py ../job/mysql2stream1.json

2. Configure a job that uses a custom SQL query to sync database data and print it locally:

Using the querySql mode

mysql2stream2.json

{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "sa","connection": [{"querySql": ["SELECT ryxm,rysfz,rygh from  sys_czry;"],"jdbcUrl": ["jdbc:mysql://172.16.0.101:3306/qyjx_v3.1?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8&&nullCatalogMeansCurrent=true&allowMultiQueries=true&rewriteBatchedStatements=true"]}]}},"writer": {"name": "streamwriter","parameter": {"print": true,"encoding": "UTF-8"}}}]}
}

python datax.py ../job/mysql2stream2.json

2. mysqlwriter

DataX/mysqlwriter/doc/mysqlwriter.md at master · alibaba/DataX · GitHub

1. This example generates data in memory with streamreader and imports it into MySQL (saved as stream2mysql1.json):

{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "streamreader","parameter": {"column" : [{"value": "DataX","type": "string"}],"sliceRecordCount": 1000}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "root","column": ["name"],"session": ["set session sql_mode='ANSI'"],"preSql": ["delete from test"],"connection": [{"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=gbk","table": ["test"]}]}}}]}
}

python datax.py ../job/stream2mysql1.json

Result: (screenshot of the run output, not reproduced here)

2. This example reads data from MySQL (server 1) and imports it into MySQL (local), saved as mysql2mysql.json:

This simulates a cross-server, cross-database environment. The columns returned by the reader's querySql are mapped positionally onto the writer's column list, so the target table test must contain matching columns.

{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "sa","connection": [{"querySql": ["SELECT ryxm,rysfz from  sys_czry;"],"jdbcUrl": ["jdbc:mysql://172.16.0.101:3306/qyjx_v3.1?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8&&nullCatalogMeansCurrent=true&allowMultiQueries=true&rewriteBatchedStatements=true"]}]}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "root","column": ["name","rysfz"],"session": ["set session sql_mode='ANSI'"],"preSql": ["delete from test"],"connection": [{"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=gbk","table": ["test"]}]}}}]}
}

python datax.py ../job/mysql2mysql.json

Result: (screenshot of the run output, not reproduced here)

Troubleshooting:

1. If the console output appears garbled

Run CHCP 65001 first to switch the console code page to UTF-8, then run the job again.
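For example, reusing the stream2stream job from earlier:

chcp 65001

python datax.py ../job/stream2stream.json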

