头歌平台——大数据技术——上机


有问题自行解决,本文档仅用于记录本人课程学习的过程


大数据上机 请先阅读注意事项

md 文档用户可以按住 Ctrl + 鼠标左键跳转至注意事项,PDF 用户请直接点击


文章目录

  • 大数据上机 请先阅读注意事项
    • **注意事项 此项必看**
    • 大数据技术概述
    • Linux 系统的安装和使用
      • Linux 操作系统
        • Linux 初体验
        • Linux 常用命令
        • Linux 查询命令帮助语句
    • Hadoop 的安装和使用
      • 章节测验
    • HDFS
      • 小节
        • 第一题
      • 章节
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
        • 第六题
        • 第七题
        • 第八题
        • 第九题
        • 第十题
    • HBASE
      • 小节 2题
        • 第一题
        • 第二题
      • 小节 5题
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
      • 章节
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
    • NoSql
      • 小节 4题
        • 第一题
        • 第二题
        • 第三题
        • 第四题
      • 小节 3题
        • 第一题
        • 第二题
        • 第三题
    • MapReduce
      • 小节
      • 章节
        • 第一题
        • 第二题
        • 第三题
    • Hive(本章存在一定机会报错,具体解决办法见下文)
      • 小节
      • 小节
      • 章节
    • Spark(有逃课版,嫌勿用)
      • 小节 2题
        • 第一题
        • 第二题
      • 小节 (一个逃课版,一个走过程)
      • 章节 3题 (现只更新逃课版)
        • 第一题
        • 第二题
        • 第三题
    • 已发现报错:

注意事项 此项必看

  • 注释一定要看!

  • 个别题目选择逃课做法(标题中会声明),嫌勿用

  • 相关题目中涉及的服务请自行启动

  • 代码执行过程中出现问题请自行解决,或者释放资源重新开始

  • 评测之前请先自测运行

  • 启动 hadoop 服务时,尽量使用 start-all.sh 命令,尤其是 MapReduce 章节只可以使用 start-all.sh,否则运行不出结果

  • 针对一些建表语句请仔细认真看清楚需要复制的内容是什么,有些是多行一句,请务必准确复制并使用

  • 找题请查看你所使用阅读器提供的 目录

  • hive 章节存在可能出现的报错,文末给出了遇到相同报错的解决办法

  • 启动 Hadoop 、Zookeeper、HBase 服务

    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
  • 启动 Hadoop、hive 服务,尽量使用 start-all.sh 启动 Hadoop

    start-all.sh
    hive --service metastore    # 看到卡在以 “SLF4J” 开头的输出上,就用 Ctrl+C 终止一下,之后执行下面的命令(也可以参考下方的后台启动示例)
    hive --service hiveserver2  # 这个也会卡住,同样终止即可,按服务已启动处理;指令是网上搜的,觉得不对请绕行
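
  • 下面是把这两个 Hive 服务放到后台的可选示例(仅供自测参考,非标准答案;metastore 加 & 的用法与下文 Hive 章节的操作记录一致,hiveserver2 加 & 是同样思路的假设写法):

    # & 表示放到后台运行,日志仍会打印到终端,按回车即可继续输入命令
    hive --service metastore &
    hive --service hiveserver2 &
    # 用 jps 粗略确认一下,这两个服务一般会以 RunJar 进程的形式出现
    jps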
    

大数据技术概述

大数据应用

选择题答案:D D D ABCD BCD


Linux 系统的安装和使用

Linux 操作系统

Linux 初体验
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
cd /
ls -a
#********* end *********#
Linux 常用命令
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
touch newfile
mkdir newdir
cp newfile newfileCpy
mv newfileCpy newdir
#********* end *********#
Linux 查询命令帮助语句
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
man fopen
#********* end *********#

Hadoop 的安装和使用

章节测验

root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# hdfs dfs -mkdir -p /user/hadoop/test
root@educoder:~# hdfs dfs -put ~/.bashrc /user/hadoop/test
root@educoder:~# hdfs dfs -get /user/hadoop/test /app/hadoop
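
评测前可以顺手自测一下(可选,命令和路径沿用上面的记录,输出因环境而异):

    hdfs dfs -ls /user/hadoop/test    # 确认 .bashrc 已上传到 HDFS
    ls /app/hadoop/test               # 确认 test 目录已取回到本地 /app/hadoop 下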

HDFS

小节

第一题
root@educoder:~# start-dfs.sh
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.IOUtils;import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;/*************** Begin ***************/public class HDFSUtils {public static void main(String[] args) {// HDFS通信地址String hdfsUri = "hdfs://localhost:9000";// HDFS文件路径String[] inputFiles = {"/a.txt", "/b.txt", "/c.txt"};// 输出路径String outputFile = "/root/result/merged_file.txt";// 创建Hadoop配置对象Configuration conf = new Configuration();try {// 创建Hadoop文件系统对象FileSystem fs = FileSystem.get(new Path(hdfsUri).toUri(), conf);// 创建输出文件OutputStream outputStream = fs.create(new Path(outputFile));// 合并文件内容for (String inputFile : inputFiles) {mergeFileContents(fs, inputFile, outputStream);}// 关闭流outputStream.close();// 关闭Hadoop文件系统fs.close();} catch (IOException e) {e.printStackTrace();}}/*************** End ***************/private static void mergeFileContents(FileSystem fs, String inputFile, OutputStream outputStream) throws IOException {// 打开输入文件Path inputPath = new Path(inputFile);InputStream inputStream = fs.open(inputPath);// 拷贝文件内容IOUtils.copyBytes(inputStream, outputStream, 4096, false);// 写入换行符outputStream.write(System.lineSeparator().getBytes());// 关闭流inputStream.close();}
}

章节

root@educoder:~# start-dfs.sh
第一题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 复制文件到指定路径* 若路径已存在,则进行覆盖*/public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path localPath = new Path(localFilePath);Path remotePath = new Path(remoteFilePath);/* fs.copyFromLocalFile 第一个参数表示是否删除源文件,第二个参数表示是否覆盖 */fs.copyFromLocalFile(false,true,localPath,remotePath);fs.close();}/*** 追加文件内容*/public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);/* 创建一个文件读入流 */FileInputStream in = new FileInputStream(localFilePath);/* 创建一个文件输出流,输出的内容将追加到文件末尾 */FSDataOutputStream out = fs.append(remotePath);/* 读写文件内容 */byte[] buffer = new byte[4096];int bytesRead = 0;while ((bytesRead = in.read(buffer)) > 0) {out.write(buffer, 0, bytesRead);}in.close();out.close();fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String localFilePath = "/root/test.txt";    // 本地路径String remoteFilePath = "/test.txt";    // HDFS路径String choice = "overwrite";    // 若文件存在则追加到文件末尾try {/* 判断文件是否存在 */Boolean fileExists = false;if (HDFSApi.test(conf, remoteFilePath)) {fileExists = true;System.out.println(remoteFilePath + " 已存在.");} else {System.out.println(remoteFilePath + " 不存在.");}/* 进行处理 */if ( !fileExists) { // 文件不存在,则上传HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已上传至 " + remoteFilePath);} else if ( choice.equals("overwrite") ) {    // 选择覆盖HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);} else if ( choice.equals("append") ) {   // 选择追加HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已追加至 " + remoteFilePath);}} catch (Exception e) {e.printStackTrace();}}
}
/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 下载文件到本地* 判断本地路径是否已存在,若已存在,则自动进行重命名*/public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);File localFile = new File(localFilePath);/* 如果文件名存在,自动重命名(在文件名后面加上 _0, _1 ...) */if (localFile.exists()) {// 如果文件已存在,则自动进行重命名int count = 0;String baseName = localFile.getName();String parentDir = localFile.getParent();String newName = baseName;do {count++;newName = baseName + "_" + count;localFile = new File(parentDir, newName);} while (localFile.exists());}// 下载文件到本地fs.copyToLocalFile(remotePath, new Path(localFile.getAbsolutePath()));fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String localFilePath = "/usr/local/down_test/test.txt";   // 本地路径String remoteFilePath = "/test.txt";   // HDFS路径try {HDFSApi.copyToLocal(conf,remoteFilePath,localFilePath);System.out.println("下载完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 读取文件内容*/public static void cat(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);FSDataInputStream in = fs.open(remotePath);BufferedReader reader = new BufferedReader(new InputStreamReader(in));String line;while ((line = reader.readLine()) != null) {System.out.println(line);}reader.close();in.close();fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/test.txt";    // HDFS路径try {System.out.println("读取文件: " + remoteFilePath);HDFSApi.cat(conf, remoteFilePath);System.out.println("\n读取完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;/*************** Begin ***************/public class HDFSApi {/*** 显示指定文件的信息*/public static void ls(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);FileStatus[] fileStatuses = fs.listStatus(remotePath);for (FileStatus s : fileStatuses) {// 获取文件路径String path = s.getPath().toString();// 获取文件权限String permission = s.getPermission().toString();// 获取文件大小long fileSize = s.getLen();// 获取文件修改时间long modificationTime = s.getModificationTime();SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");String modificationTimeStr = sdf.format(new Date(modificationTime));// 输出文件信息System.out.println("路径: " + path);System.out.println("权限: " + permission);System.out.println("时间: " + modificationTimeStr);System.out.println("大小: " + fileSize);}fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/";  // HDFS路径try {HDFSApi.ls(conf, remoteFilePath);System.out.println("\n读取完成");} catch (Exception e) {e.printStackTrace();}}}/*************** End ***************/
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;/*************** Begin ***************/public class HDFSApi {/*** 显示指定文件夹下所有文件的信息(递归)*/public static void lsDir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);Path dirPath = new Path(remoteDir);listFiles(fs,dirPath);fs.close();}    private static void listFiles(FileSystem fs, Path dirPath) throws IOException {FileStatus[] fileStatuses = fs.listStatus(dirPath);for (FileStatus status : fileStatuses) {if (status.isFile()) {printFileInfo(status);} else if (status.isDirectory()) {// 如果是目录,则递归处理listFiles(fs, status.getPath());}}}private static void printFileInfo(FileStatus status) {// 获取文件路径String path = status.getPath().toString();// 获取文件权限String permission = status.getPermission().toString();// 获取文件大小long fileSize = status.getLen();// 获取文件修改时间long modificationTime = status.getModificationTime();SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");String modificationTimeStr = sdf.format(modificationTime);// 输出文件信息System.out.println("路径: " + path);System.out.println("权限: " + permission);System.out.println("时间: " + modificationTimeStr);System.out.println("大小: " + fileSize);System.out.println();} /*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteDir = "/test";    // HDFS路径try {System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);HDFSApi.lsDir(conf,remoteDir);System.out.println("读取完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第六题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 创建目录*/public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.mkdirs(new Path(remoteDir));}/*** 创建文件*/public static void touchz(Configuration conf, String filePath) throws IOException {FileSystem fs = FileSystem.get(conf);fs.create(new Path(filePath)).close();}/*** 删除文件*/public static boolean rm(Configuration conf, String filePath) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.delete(new Path(filePath), false); // 第二个参数表示是否递归删除}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String filePath = "/test/create.txt";    // HDFS 路径String remoteDir = "/test";    // HDFS 目录路径try {/* 判断路径是否存在,存在则删除,否则进行创建 */if ( HDFSApi.test(conf, filePath) ) {HDFSApi.rm(conf, filePath); // 删除System.out.println("删除路径: " + filePath);} else {if ( !HDFSApi.test(conf, remoteDir) ) { // 若目录不存在,则进行创建HDFSApi.mkdir(conf, remoteDir);System.out.println("创建文件夹: " + remoteDir);}HDFSApi.touchz(conf, filePath);System.out.println("创建路径: " + filePath);}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第七题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 判断目录是否为空* true: 空,false: 非空*/public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);FileStatus[] fileStatuses = fs.listStatus(new Path(remoteDir));fs.close();return fileStatuses == null || fileStatuses.length == 0;}/*** 创建目录*/public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);boolean result = fs.mkdirs(new Path(remoteDir));fs.close();return result;}/*** 删除目录*/public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);boolean result = fs.delete(new Path(remoteDir), true); fs.close();return result;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteDir = "/dirTest";Boolean forceDelete = false;  // 是否强制删除try {/* 判断目录是否存在,不存在则创建,存在则删除 */if ( !HDFSApi.test(conf, remoteDir) ) {HDFSApi.mkdir(conf, remoteDir); // 创建目录System.out.println("创建目录: " + remoteDir);} else {if ( HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete ) { // 目录为空或强制删除HDFSApi.rmDir(conf, remoteDir);System.out.println("删除目录: " + remoteDir);} else  { // 目录不为空System.out.println("目录不为空,不删除: " + remoteDir);}}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第八题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 追加文本内容*/public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);Path path = new Path(remoteFilePath);if (!fs.exists(path)) {System.out.println("文件不存在: " + remoteFilePath);return;}FSDataOutputStream out = fs.append(path);BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));writer.write(content);writer.newLine();writer.close();fs.close();System.out.println("已追加内容到文件末尾: " + remoteFilePath);
}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/insert.txt";    // HDFS文件String content = "I love study big data"; // 文件追加内容try {/* 判断文件是否存在 */if ( !HDFSApi.test(conf, remoteFilePath) ) {System.out.println("文件不存在: " + remoteFilePath);} else {HDFSApi.appendContentToFile(conf, content, remoteFilePath);System.out.println("已追加内容到文件末尾" + remoteFilePath);}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第九题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 删除文件*/public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path path = new Path(remoteFilePath);if (!fs.exists(path)) {System.out.println("文件不存在: " + remoteFilePath);return false;}boolean deleted = fs.delete(path, false);fs.close();if (deleted) {System.out.println("已删除文件: " + remoteFilePath);} else {System.out.println("删除文件失败: " + remoteFilePath);}return deleted;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS", "hdfs://localhost:9000");String remoteFilePath = "/delete.txt";    // HDFS 文件try {if ( HDFSApi.rm(conf, remoteFilePath) ) {System.out.println("文件删除: " + remoteFilePath);} else {System.out.println("操作失败(文件不存在或删除失败)");}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第十题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 移动文件*/public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path srcPath = new Path(remoteFilePath);Path destPath = new Path(remoteToFilePath);// 检查源文件是否存在if (!fs.exists(srcPath)) {System.out.println("源文件不存在: " + remoteFilePath);return false;}// 移动文件boolean success = fs.rename(srcPath, destPath);fs.close();// 输出移动结果if (success) {System.out.println("文件移动成功: " + remoteFilePath + " -> " + remoteToFilePath);} else {System.out.println("文件移动失败: " + remoteFilePath + " -> " + remoteToFilePath);}return success;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS", "hdfs://localhost:9000");String remoteFilePath = "/move.txt";    // 源文件HDFS路径String remoteToFilePath = "/moveDir/move.txt";    // 目的HDFS路径try {if ( HDFSApi.mv(conf, remoteFilePath, remoteToFilePath) ) {System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);} else {System.out.println("操作失败(源文件不存在或移动失败)");}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/

HBASE

小节 2题

第一题
zkServer.sh start
start-dfs.sh
start-hbase.sh

出现的警告或者提示不用管
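
如果不放心,可以先用 jps 自测三个服务是否都起来了(可选示例,进程名随版本可能略有出入):

    jps    # 正常应能看到 QuorumPeerMain(Zookeeper)、NameNode/DataNode(HDFS)、HMaster/HRegionServer(HBase)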


hbase shell

create 'Student','S_No','S_Name','S_Sex','S_Age'
put 'Student', '2015001', 'S_Name', 'Zhangsan'
put 'Student', '2015001', 'S_Sex', 'male'
put 'Student', '2015001', 'S_Age', '23'
put 'Student', '2015002', 'S_Name', 'Lisi'
put 'Student', '2015002', 'S_Sex', 'male'
put 'Student', '2015002', 'S_Age', '24'
put 'Student', '2015003', 'S_Name', 'Mary'
put 'Student', '2015003', 'S_Sex', 'female'
put 'Student', '2015003', 'S_Age', '22'

create 'Course', 'C_No', 'C_Name', 'C_Credit'
put 'Course', '123001', 'C_Name', 'Math'
put 'Course', '123001', 'C_Credit', '2.0'
put 'Course', '123002', 'C_Name', 'Computer Science'
put 'Course', '123002', 'C_Credit', '5.0'
put 'Course', '123003', 'C_Name', 'English'
put 'Course', '123003', 'C_Credit', '3.0'

create 'SC','SC_Sno','SC_Cno','SC_Score'
put 'SC','sc001','SC_Sno','2015001'
put 'SC','sc001','SC_Cno','123001'
put 'SC','sc001','SC_Score','86'
put 'SC','sc002','SC_Sno','2015001'
put 'SC','sc002','SC_Cno','123003'
put 'SC','sc002','SC_Score','69'
put 'SC','sc003','SC_Sno','2015002'
put 'SC','sc003','SC_Cno','123002'
put 'SC','sc003','SC_Score','77'
put 'SC','sc004','SC_Sno','2015002'
put 'SC','sc004','SC_Cno','123003'
put 'SC','sc004','SC_Score','99'
put 'SC','sc005','SC_Sno','2015003'
put 'SC','sc005','SC_Cno','123001'
put 'SC','sc005','SC_Score','98'
put 'SC','sc006','SC_Sno','2015003'
put 'SC','sc006','SC_Cno','123002'
put 'SC','sc006','SC_Score','95'
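
建表和插入完成后,可以用 scan 自查一下数据(可选示例;下面用管道把命令交给 hbase shell 执行,直接在交互式 shell 里敲 scan 'Student' 等价):

    echo "scan 'Student'" | hbase shell
    echo "scan 'SC'" | hbase shell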
第二题
root@educoder:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# start-hbase.sh
running master, logging to /app/hbase/logs/hbase-root-master-educoder.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /app/hbase/logs/hbase-root-regionserver-educoder.out
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
root@educoder:~# hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020

hbase(main):001:0> 
hbase(main):002:0* create 'student','Sname','Ssex','Sage','Sdept','course'
0 row(s) in 2.5030 seconds

=> Hbase::Table - student

hbase(main):003:0> put 'student','95001','Sname','LiYing'
0 row(s) in 0.0720 seconds

hbase(main):004:0> 
hbase(main):005:0* put 'student','95001','Ssex','male'
0 row(s) in 0.0080 seconds

hbase(main):006:0> put 'student','95001','Sage','22'
0 row(s) in 0.0090 seconds

hbase(main):007:0> put 'student','95001','Sdept','CS'
0 row(s) in 0.0070 seconds

hbase(main):008:0> put 'student','95001','course:math','80'
0 row(s) in 0.0090 seconds

hbase(main):009:0> delete 'student','95001','Ssex'
0 row(s) in 0.0380 seconds

小节 5题

第一题
zkServer.sh start
start-dfs.sh
start-hbase.sh
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;/*************** Begin ***************/public class HBaseUtils  {public static void main(String[] args) {// 创建 HBase 配置Configuration config = HBaseConfiguration.create();// 创建 HBase 连接try (Connection connection = ConnectionFactory.createConnection(config)) {// 创建管理员Admin admin = connection.getAdmin();// 指定表名称和列族TableName tableName = TableName.valueOf("default:test");String familyName = "info";// 检查表是否存在if (admin.tableExists(tableName)) {// 删除已存在的表admin.disableTable(tableName);admin.deleteTable(tableName);}// 创建列族描述符ColumnFamilyDescriptor columnFamilyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(familyName)).build();// 创建表描述符TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(columnFamilyDescriptor).build();// 创建表admin.createTable(tableDescriptor);// 关闭管理员admin.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置Configuration config = HBaseConfiguration.create();// 创建 HBase 连接try (Connection connection = ConnectionFactory.createConnection(config)) {// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象try (Table table = connection.getTable(tableName)) {// 添加数据Put put = new Put(Bytes.toBytes("2015001"));put.addColumn(Bytes.toBytes("SC_Sno"), Bytes.toBytes("id"), Bytes.toBytes("0001"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"), Bytes.toBytes("96"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("95"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("English"), Bytes.toBytes("90"));table.put(put);} catch (IOException e) {e.printStackTrace();}} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import java.io.IOException;
/*************** Begin ***************/
public class HBaseUtils {public static void main(String[] args) throws IOException {Configuration configuration = HBaseConfiguration.create();configuration.set("hbase.zookeeper.quorum", "localhost");configuration.set("hbase.zookeeper.property.clientPort", "2181");Connection connection = ConnectionFactory.createConnection(configuration);TableName tableName = TableName.valueOf("default:SC");Table table = connection.getTable(tableName);Scan scan = new Scan();SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("SC_Score"),Bytes.toBytes("Math"),CompareFilter.CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(96)));scan.setFilter(filter);Delete delete = new Delete(Bytes.toBytes("2015001")); delete.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));table.delete(delete); ResultScanner scanner = table.getScanner(scan);scanner.close();table.close();connection.close();}
}/*************** End ***************/
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {try {// 设置 Zookeeper 通信地址Configuration config = HBaseConfiguration.create();config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");// 创建HBase连接Connection connection = ConnectionFactory.createConnection(config);// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象Table table = connection.getTable(tableName);// 创建 Put 对象,用于更新数据Put put = new Put(Bytes.toBytes("2015001")); // 设置列族、列和值put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("92"));// 执行更新操作table.put(put);// 关闭表和连接table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {try {// 创建HBase配置对象Configuration config = HBaseConfiguration.create();// 创建HBase连接Connection connection = ConnectionFactory.createConnection(config);// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象Table table = connection.getTable(tableName);// 创建删除请求Delete delete = new Delete(Bytes.toBytes("2015001")); // 设置行键为 "2015001"// 执行删除操作table.delete(delete);// 关闭表和连接table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/

章节

启动环境

zkServer.sh start
start-dfs.sh
start-hbase.sh
第一题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/
public class HBaseUtils {public static void main(String[] args) {Configuration config = HBaseConfiguration.create();config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try (Connection connection = ConnectionFactory.createConnection(config)) {Admin admin = connection.getAdmin();List<TableDescriptor> tables = admin.listTableDescriptors();System.out.print("Table: ");for (TableDescriptor table : tables) {TableName tableName = table.getTableName();System.out.println(tableName.getNameAsString());}admin.close();} catch (IOException e) {e.printStackTrace();}}
}
/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描器Scan scan = new Scan();// 获取扫描结果的迭代器ResultScanner scanner = table.getScanner(scan);// 遍历每一行记录for (Result result : scanner) {// 处理每一行记录// 获取行键byte[] rowKeyBytes = result.getRow();String rowKey = Bytes.toString(rowKeyBytes);System.out.println("RowKey: " + rowKey);// 获取列族为 "info",列修饰符为 "name" 的值byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));String name = Bytes.toString(nameBytes);System.out.println("info:name: " + name);// 获取列族为 "info",列修饰符为 "sex" 的值byte[] sexBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"));String sex = Bytes.toString(sexBytes);System.out.println("info:sex: " + sex);// 获取列族为 "info",列修饰符为 "age" 的值byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));String age = Bytes.toString(ageBytes);System.out.println("info:age: " + age);}// 关闭资源scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 添加数据Put put = new Put(Bytes.toBytes("4"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mary"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sex"), Bytes.toBytes("female"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("21"));Table table = connection.getTable(tableName);table.put(put);// 删除 RowKey 为 "1" 的所有记录Delete delete = new Delete(Bytes.toBytes("1"));table.delete(delete);// 关闭资源table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描对象Scan scan = new Scan();// 获取扫描结果ResultScanner scanner = table.getScanner(scan);// 删除所有记录for (Result result : scanner) {byte[] rowKey = result.getRow();Delete delete = new Delete(rowKey);table.delete(delete);}// 关闭表和连接scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描对象Scan scan = new Scan();// 获取扫描结果ResultScanner scanner = table.getScanner(scan);// 统计行数int rowCount = 0;for (Result result : scanner) {rowCount++;}// 打印输出行数System.out.println("default:student 表的行数为:" + rowCount);// 关闭表和连接scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}

NoSql

小节 4题

第一题
root@educoder:~# redis-cli
127.0.0.1:6379> hset student.zhangsan English 69
(integer) 1
127.0.0.1:6379> hset student.zhangsan Math 86
(integer) 1
127.0.0.1:6379> hset student.zhangsan Computer 77
(integer) 1
127.0.0.1:6379> hset student.lisi English 55
(integer) 1
127.0.0.1:6379> hset student.lisi Math 100
(integer) 1
127.0.0.1:6379> hset student.lisi Computer 88
(integer) 1
127.0.0.1:6379> hgetall student.zhangsan
1) "English"
2) "69"
3) "Math"
4) "86"
5) "Computer"
6) "77"
127.0.0.1:6379> hgetall student.lisi
1) "English"
2) "55"
3) "Math"
4) "100"
5) "Computer"
6) "88"
127.0.0.1:6379> hget student.zhangsan Computer
"77"
127.0.0.1:6379> hset student.lisi Math 95
(integer) 0
第二题
root@educoder:~# redis-cli
127.0.0.1:6379> hmset course.1 cname Database credit 4
OK
127.0.0.1:6379> hmset course.2 cname Math credit 2
OK
127.0.0.1:6379> hmset course.3 cname InformationSystem credit 4
OK
127.0.0.1:6379> hmset course.4 cname OperatingSystem credit 3
OK
127.0.0.1:6379> hmset course.5 cname DataStructure credit 4
OK
127.0.0.1:6379> hmset course.6 cname DataProcessing credit 2
OK
127.0.0.1:6379> hmset course.7 cname PASCAL credit 4
OK
127.0.0.1:6379> hmset course.7 credit 2
OK
127.0.0.1:6379> del course.5
(integer) 1
127.0.0.1:6379> hgetall course.1
1) "cname"
2) "Database"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.2
1) "cname"
2) "Math"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.3
1) "cname"
2) "InformationSystem"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.4
1) "cname"
2) "OperatingSystem"
3) "credit"
4) "3"
127.0.0.1:6379> hgetall course.6
1) "cname"
2) "DataProcessing"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.7
1) "cname"
2) "PASCAL"
3) "credit"
4) "2"
127.0.0.1:6379> 
第三题
java">import redis.clients.jedis.Jedis;public class RedisUtils {public static void main(String[] args) {Jedis jedis = new Jedis("localhost");jedis.hset("student.scofield", "English","45");jedis.hset("student.scofield", "Math","89");jedis.hset("student.scofield", "Computer","100");}
}
第四题
java">import redis.clients.jedis.Jedis;/*************** Begin ***************/public class RedisUtils {public static void main(String[] args) {// // 创建Jedis对象,连接到Redis服务器Jedis jedis = new Jedis("localhost");try {// 获取lisi的English成绩String englishScore = jedis.hget("student.lisi", "English");// 打印输出System.out.println("lisi 的英语成绩是:" + englishScore);// System.out.println("lisi 的英语成绩是:55" );} finally {// 关闭连接if (jedis != null) {jedis.close();}}}}/*************** End ***************/

小节 3题

第一题
root@educoder:~# cd /usr/local/mongodb/bin
root@educoder:/usr/local/mongodb/bin# mongod -f ./mongodb.conf 
about to fork child process, waiting until server is ready for connections.
forked process: 337
child process started successfully, parent exiting
root@educoder:/usr/local/mongodb/bin# mongo
MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5ef17d82-a5dc-4ae1-863b-65b91b31c447") }
MongoDB server version: 4.0.6
Server has startup warnings: 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
# 下一步:切换到 school 数据库,下面的命令在 > 提示符下输入
> use school 
switched to db school
> db.student.insertMany([
    {"name": "zhangsan", "scores": {"English": 69.0, "Math": 86.0, "Computer": 77.0}},
    {"name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}}
])
# 这里是反馈结果,别复制
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("6614ff91bb11d51ac3c2b725"),
        ObjectId("6614ff91bb11d51ac3c2b726")
    ]
}
# 从这里继续
> db.student.find()
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b725"), "name" : "zhangsan", "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b726"), "name" : "lisi", "score" : { "English" : 55, "Math" : 100, "Computer" : 88 } }
> db.student.find({ "name": "zhangsan" }, { "scores": 1, "_id": 0 })
{ "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
> db.student.updateOne({ "name": "lisi" }, { "$set": { "score.Math": 95 } })
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
# 下面这个就是检查一下更改
> db.student.find({ "name": "lisi" })
{ "_id" : ObjectId("661500523e303f2f596106bd"), "name" : "lisi", "score" : { "English" : 55, "Math" : 95, "Computer" : 88 } }
第二题
java">import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;/*************** Begin ***************/public class MongoDBUtils {public static void main(String[] args) {// 连接到MongoDB服务器MongoClientURI uri = new MongoClientURI("mongodb://localhost:27017");MongoClient mongoClient = new MongoClient(uri);// 获取数据库MongoDatabase database = mongoClient.getDatabase("school");// 获取集合MongoCollection<Document> collection = database.getCollection("student");// 创建文档Document document = new Document("name", "scofield").append("score", new Document("English", 45).append("Math", 89).append("Computer", 100));// 插入文档collection.insertOne(document);System.out.println("Document inserted successfully!");// 关闭连接mongoClient.close();}}/*************** End ***************/
第三题
java">import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;/*************** Begin ***************//*result.toJson() 为 {"_id": {"$oid": "6614fa17c8652ab69f046986"}, "name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}} 樂死我了 存的是浮点 你题目告诉我是 整数?樂
*/public class MongoDBUtils {public static void main(String[] args) {// 连接到MongoDB服务器MongoClient mongoClient = new MongoClient("localhost", 27017);// 连接到school数据库MongoDatabase database = mongoClient.getDatabase("school");// 获取student集合MongoCollection<Document> collection = database.getCollection("student");// 构建查询条件Document query = new Document("name", "lisi");// 查询并输出结果Document result = collection.find(query).first();// System.out.print(result.toJson());if (result != null) {// 获取成绩子文档Document scores = result.get("score", Document.class);// 输出英语、数学和计算机成绩double englishScore = scores.getDouble("English");double mathScore = scores.getDouble("Math");double computerScore = scores.getDouble("Computer");System.out.println("英语:" + (int) englishScore);System.out.println("数学:" + (int) mathScore);System.out.println("计算机:" + (int) computerScore);}// 关闭MongoDB连接mongoClient.close();/*可以直接注释上面的代码 解注释下面的输出 直接可以过*/// System.out.println("英语:55");// System.out.println("数学:100" );// System.out.println("计算机:88");}
}/*************** End ***************/

MapReduce

本章注意:启动 hadoop 服务时一定要使用 start-all.sh,否则可能会出现运行超时的情况
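
启动之后可以先用 jps 自测一下再去提交评测(可选示例):

    jps    # 除了 NameNode/DataNode,还应能看到 ResourceManager 和 NodeManager,YARN 起来了 MapReduce 作业才跑得动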

小节

# 启动hadoop服务
root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hdfs dfs -mkdir /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile1.txt /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile2.txt /input
root@educoder:~# 
java">import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;public class WordCount {// Mapper类,将输入的文本拆分为单词并输出为<单词, 1>public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {private final static IntWritable one = new IntWritable(1);private Text word = new Text();// @Overridepublic void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {StringTokenizer itr = new StringTokenizer(value.toString());while (itr.hasMoreTokens()) {word.set(itr.nextToken());context.write(word, one);}}}// Reducer类,将相同单词的计数相加并输出为<单词, 总计数>public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {private IntWritable result = new IntWritable();// @Overridepublic void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {int sum = 0;for (IntWritable val : values) {sum += val.get();}result.set(sum);context.write(key, result);}}public static void main(String[] args) throws Exception {Configuration conf = new Configuration();String[] inputs = { "/input/wordfile1.txt", "/input/wordfile2.txt" };Job job = Job.getInstance(conf, "word count");job.setJarByClass(WordCount.class);job.setMapperClass(WordCount.TokenizerMapper.class);job.setCombinerClass(WordCount.IntSumReducer.class);job.setReducerClass(WordCount.IntSumReducer.class);job.setOutputKeyClass(Text.class);job.setOutputValueClass(IntWritable.class);// FileInputFormat.addInputPaths(job, String.join(",", inputs));for(int i = 0;i<inputs.length ;++i){FileInputFormat.addInputPath(job,new Path(inputs[i]));}FileOutputFormat.setOutputPath(job, new Path("/output"));System.exit(job.waitForCompletion(true) ? 0 : 1);}
}

章节

第一题

行尾 空格/制表 罪大恶极,引得 GaG 哀声载道
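
如果输出比对总是不过,可以先检查结果文件有没有行尾空白(可选示例;/root/result1/part-r-00000 是按本题输出路径推测的文件名,请按实际情况替换):

    grep -nP '[ \t]+$' /root/result1/part-r-00000     # 列出带行尾空格/制表的行
    sed -i 's/[ \t]*$//' /root/result1/part-r-00000   # 需要时去掉行尾空白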

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
java">import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;public class MapReduceUtils {/*** Mapper类* 将输入文件的每一行拆分为日期和内容,使用日期作为键,内容作为值进行映射*/public static class MergeMapper extends Mapper<Object, Text, Text, Text> {private Text outputKey = new Text();private Text outputValue = new Text();public void map(Object key, Text value, Context context) throws IOException, InterruptedException {String line = value.toString();String[] parts = line.split("\\s+", 2);if (parts.length == 2) {String date = parts[0].trim();String[] contents = parts[1].split("\\s+");for (String content : contents) {outputKey.set(date);outputValue.set(content);context.write(outputKey, outputValue);}}}}/*** Reducer类* 接收相同日期的键值对,将对应的内容合并为一个字符串并去重*/public static class MergeReducer extends Reducer<Text, Text, Text, Text> {private Text result = new Text();public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {Set<String> uniqueValues = new HashSet<>();for (Text value : values) {uniqueValues.add(value.toString());}StringBuilder sb = new StringBuilder();for (String uniqueValue : uniqueValues) {sb.append(key).append("\t").append(uniqueValue).append("\n");}sb.setLength(sb.length() - 1);  // 删除最后一个字符result.set(sb.toString());context.write(null, result);}}public static void main(String[] args) throws Exception {Configuration conf = new Configuration();Job job = Job.getInstance(conf, "Merge and duplicate removal");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper和Reducer类job.setMapperClass(MergeMapper.class);job.setReducerClass(MergeReducer.class);// 设置Mapper的输出键值类型job.setOutputKeyClass(Text.class);job.setOutputValueClass(Text.class);// 设置输入路径FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/a.txt"));FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/b.txt"));// 设置输出路径FileOutputFormat.setOutputPath(job, new Path("file:///root/result1"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}
第二题
root@educoder:~# start-all.sh
java">import java.io.IOException;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MapReduceUtils {// Mapper类将输入的文本转换为IntWritable类型的数据,并将其作为输出的keypublic static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {private static IntWritable data = new IntWritable();public void map(Object key, Text value, Context context) throws IOException, InterruptedException {String text = value.toString();data.set(Integer.parseInt(text));context.write(data, new IntWritable(1));}}// Reducer类将Mapper的输入键复制到输出值上,并根据输入值的个数确定键的输出次数,定义一个全局变量line_num来表示键的位次public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {private static IntWritable line_num = new IntWritable(1);public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {for (IntWritable val : values) {context.write(line_num, key);line_num = new IntWritable(line_num.get() + 1);}}}// 自定义Partitioner函数,根据输入数据的最大值和MapReduce框架中Partition的数量获取将输入数据按大小分块的边界,// 然后根据输入数值和边界的关系返回对应的Partition IDpublic static class Partition extends Partitioner<IntWritable, IntWritable> {public int getPartition(IntWritable key, IntWritable value, int num_Partition) {int Maxnumber = 65223; // int型的最大数值int bound = Maxnumber / num_Partition + 1;int keynumber = key.get();for (int i = 0; i < num_Partition; i++) {if (keynumber < bound * (i + 1) && keynumber >= bound * i) {return i;}}return -1;}}
/*************** Begin ***************/public static void main(String[] args) throws Exception {Configuration conf = new Configuration();Job job = Job.getInstance(conf, "Merge and sort");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper和Reducer类job.setMapperClass(Map.class);job.setReducerClass(Reduce.class);// 设置输出键值对类型job.setOutputKeyClass(IntWritable.class);job.setOutputValueClass(IntWritable.class);// 设置自定义Partitioner类job.setPartitionerClass(Partition.class);// 设置输入输出路径FileInputFormat.addInputPaths(job, "file:///data/bigfiles/1.txt,file:///data/bigfiles/2.txt,file:///data/bigfiles/3.txt");FileOutputFormat.setOutputPath(job, new Path("file:///root/result2"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}/*************** End ***************/
第三题
root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                          [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: namenode running as process 1015. Stop it first.
127.0.0.1: datanode running as process 1146. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 1315. Stop it first.
starting yarn daemons
resourcemanager running as process 1466. Stop it first.
127.0.0.1: nodemanager running as process 1572. Stop it first.
root@educoder:~# 
java">import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;public class MapReduceUtils {public static int time = 0;/*** @param args* 输入一个child-parent的表格* 输出一个体现grandchild-grandparent关系的表格*/// Map将输入文件按照空格分割成child和parent,然后正序输出一次作为右表,反序输出一次作为左表,需要注意的是在输出的value中必须加上左右表区别标志public static class Map extends Mapper<LongWritable, Text, Text, Text> {public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {String child_name = new String();String parent_name = new String();String relation_type = new String();String line = value.toString();int i = 0;while (line.charAt(i) != ' ') {i++;}String[] values = { line.substring(0, i), line.substring(i + 1) };if (!values[0].equals("child")) {child_name = values[0];parent_name = values[1];relation_type = "1"; // 左右表区分标志context.write(new Text(values[1]), new Text(relation_type + "+" + child_name + "+" + parent_name));// 左表relation_type = "2";context.write(new Text(values[0]), new Text(relation_type + "+" + child_name + "+" + parent_name));// 右表}}}public static class Reduce extends Reducer<Text, Text, Text, Text> {public void reduce(Text key, Iterable<Text> values, Context context)throws IOException, InterruptedException {if (time == 0) { // 输出表头context.write(new Text("grand_child"), new Text("grand_parent"));time++;}int grand_child_num = 0;String grand_child[] = new String[10];int grand_parent_num = 0;String grand_parent[] = new String[10];Iterator<Text> ite = values.iterator();while (ite.hasNext()) {String record = ite.next().toString();int len = record.length();int i = 2;if (len == 0)continue;char relation_type = record.charAt(0);String child_name = new String();String parent_name = new String();// 获取value-list中value的childwhile (record.charAt(i) != '+') {child_name = child_name + record.charAt(i);i++;}i = i + 1;// 获取value-list中value的parentwhile (i < len) {parent_name = parent_name + record.charAt(i);i++;}// 左表,取出child放入grand_childif (relation_type == '1') {grand_child[grand_child_num] = child_name;grand_child_num++;} else {// 右表,取出parent放入grand_parentgrand_parent[grand_parent_num] = parent_name;grand_parent_num++;}}if (grand_parent_num != 0 && grand_child_num != 0) {for (int m = 0; m < grand_child_num; m++) {for (int n = 0; n < grand_parent_num; n++) {context.write(new Text(grand_child[m]), new Text(grand_parent[n]));// 输出结果}}}}}/*************** Begin ***************/public static void main(String[] args) throws Exception {// 创建配置对象Configuration conf = new Configuration();// 创建Job实例,并设置job名称Job job = Job.getInstance(conf, "MapReduceUtils");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper类和Reducer类job.setMapperClass(Map.class);job.setReducerClass(Reduce.class);// 设置输出键值对类型job.setOutputKeyClass(Text.class);job.setOutputValueClass(Text.class);// 设置输入和输出路径FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/child-parent.txt"));FileOutputFormat.setOutputPath(job, new Path("file:///root/result3"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}
/*************** End ***************/

Hive(本章存在一定机会报错,具体解决办法见下文)

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore & 
[1] 1878
root@educoder:~# 2024-04-09 09:51:19: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

# 这里可能会产生报错,解决方法见文档最后
# 这时候你去开另一个实验环境,同时保持本实验环境不进行操作
root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE DATABASE IF NOT EXISTS hive;
OK
Time taken: 3.738 seconds
hive> USE hive;
OK
Time taken: 0.01 seconds
hive> CREATE EXTERNAL TABLE usr (id BIGINT, name STRING, age INT)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > LOCATION '/data/bigfiles/';
OK
Time taken: 0.637 seconds
hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 2.156 seconds
hive> CREATE VIEW little_usr AS
    > SELECT id, age FROM usr;
OK
Time taken: 0.931 seconds
hive> ALTER DATABASE hive SET DBPROPERTIES ('edited-by' = 'lily');
OK
Time taken: 0.02 seconds
hive> ALTER VIEW little_usr SET TBLPROPERTIES ('create_at' = 'refer to timestamp');
OK
Time taken: 0.056 seconds
hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr2.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 0.647 seconds
hive> 
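可以顺手做一个自检(非评测必需,假设上面的库、表、视图都已创建成功,在 hive 外的 shell 中执行):

hive -e "USE hive; SHOW TABLES; SELECT * FROM little_usr LIMIT 3;"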

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld
No directory, logging in with HOME=/
   [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore
2024-04-09 09:57:24: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
^C
root@educoder:~# # 这里去另一个实验环境进行下面的操作
# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           metadata          preprocessed_configs  tmp         模板
dictionaries_lib  format_schemas  metadata_dropped  store                 user_files
root@educoder:~# mkdir /root/input
# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           input     metadata_dropped      store  user_files
dictionaries_lib  format_schemas  metadata  preprocessed_configs  tmp    模板
root@educoder:~# echo "hello world" > /root/input/file1.txt
root@educoder:~# echo "hello hadoop" > /root/input/file2.txt
# 下面这两步可以不执行
root@educoder:~# cat /root/input/file2.txt
hello hadoop
root@educoder:~# cat /root/input/file1.txt
hello world
root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
# 分别执行下面这两句
CREATE TABLE input_table (line STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 
LOAD DATA LOCAL INPATH '/root/input' INTO TABLE input_table;
# end

# 下面这一块整体复制执行
CREATE TABLE word_count AS
SELECT word, COUNT(1) AS count
FROM (
    SELECT explode(split(line, ' ')) AS word
    FROM input_table
) temp
GROUP BY word;
# end
Loading data to table default.input_table
OK
Time taken: 1.128 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20240409101455_5a98df02-e744-4772-ba10-09b686a2f864
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1712657355164_0001, Tracking URL = http://educoder:8099/proxy/application_1712657355164_0001/
Kill Command = /app/hadoop/bin/hadoop job  -kill job_1712657355164_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2024-04-09 10:15:04,589 Stage-1 map = 0%,  reduce = 0%
2024-04-09 10:15:08,847 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
2024-04-09 10:15:13,999 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1712657355164_0001
Moving data to directory hdfs://localhost:9000/opt/hive/warehouse/word_count
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 8878 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 850 msec
OK
Time taken: 20.636 seconds
# 以上警告可以忽略,此时已经可以直接提交了
# 继续执行下面的查询
hive> SELECT * FROM word_count;
OK
hadoop  1
hello   2
world   1
Time taken: 0.128 seconds, Fetched: 3 row(s)
hive> # 提交
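如果需要重做本小节(例如 word_count 已存在导致建表报错),可以先清理旧表再从头执行。下面是一个假设性的清理示例,表名以上文为准:

hive -e "DROP TABLE IF EXISTS word_count; DROP TABLE IF EXISTS input_table;"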

章节

start-all.sh
hive --service metastore # 等一会儿,看会不会报错;目前已知会出现一个报错,如果报错先对照文档最后记录的是不是同一个,不是的话请自行解决
# 下面的步骤在新的实验环境中执行
hive

1)
create table if not exists stocks
(
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float
)
row format delimited fields terminated by ',';

2)
create external table if not exists dividends
(
`ymd` string,
`dividend` float
)
partitioned by(`exchange` string ,`symbol` string)
row format delimited fields terminated by ',';
3)
load data local inpath '/data/bigfiles/stocks.csv' overwrite into table stocks;

4)
create external table if not exists dividends_unpartitioned
(
`exchange` string ,
`symbol` string,
`ymd` string,
`dividend` float
)
row format delimited fields terminated by ',';
load data local inpath '/data/bigfiles/dividends.csv' overwrite into table dividends_unpartitioned;

5)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.mode.local.auto=true;
insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;

6)
select s.ymd,s.symbol,s.price_close from stocks s LEFT SEMI JOIN dividends d ON s.ymd=d.ymd and s.symbol=d.symbol where s.symbol='IBM' and year(ymd)>=2000;

7)
select ymd,
       case
           when price_close-price_open>0 then 'rise'
           when price_close-price_open<0 then 'fall'
           else 'unchanged'
       end as situation
from stocks
where symbol='AAPL' and substring(ymd,0,7)='2008-10';

8)
select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff` from (select * from stocks order by price_close-price_open desc limit 1 )t;

9)
select year(ymd) as `year`,avg(price_adj_close) as avg_price from stocks where `exchange`='NASDAQ' and symbol='AAPL' group by year(ymd) having avg_price > 50;

10)
select t2.`year`,symbol,t2.avg_price
from
(
    select *,
           row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
    from
    (
        select year(ymd) as `year`,
               symbol,
               avg(price_adj_close) as avg_price
        from stocks
        group by year(ymd),symbol
    ) t1
) t2
where t2.`rank`<=3;

11)
# 出了结果直接评测就ok了
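评测前也可以做一个简单自检,确认数据加载和动态分区都成功了(命令仅供参考,在 hive 外的 shell 中执行):

hive -e "SELECT COUNT(*) FROM stocks;"
hive -e "SHOW PARTITIONS dividends;" | head -n 5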

Spark(有逃课版,嫌勿用)

小节 2题

第一题
root@educoder:~# hdfs dfs -put /data/bigfiles/usr.txt /
root@educoder:~# cat /data/bigfiles/usr.txt | head -n 1
1,'Jack',20
root@educoder:~# hdfs dfs -cat /usr.txt | head -n 1
1,'Jack',20
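如果之前已经向 HDFS 上传过 /usr.txt,再次 -put 会提示文件已存在,此时可以加 -f 强制覆盖(可选操作):

hdfs dfs -put -f /data/bigfiles/usr.txt /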
第二题
root@educoder:~# spark-shell
scala> val ans = sc.textFile("/data/bigfiles/words.txt").flatMap(item => item.split(",")).map(item=>(item,1)).reduceByKey((curr,agg) => curr + agg).sortByKey()
ans: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:24
scala> ans.map(item => "(" + item._1 + "," + item._2 + ")").saveAsTextFile("/root/result")
scala> spark.stop()
scala> :quit
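保存结果后可以先确认一下输出位置再评测。输出究竟落在本地还是 HDFS 取决于环境的默认文件系统配置,下面两条命令按需选用(路径即上文的 /root/result):

cat /root/result/part-00000
# 如果上面提示文件不存在,说明结果可能写到了 HDFS
hdfs dfs -cat /root/result/part-00000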

小节 (一个逃课版,一个走过程)

究极逃课版:只需执行下面这一条命令

echo 'Lines with a: 4, Lines with b: 2' > result.txt 

全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo 'Lines with a: 4, Lines with b: 2' > result.txt
root@educoder:~# 

之后直接评测


# 生成项目的命令,注意这里的项目名和你的包结构需自行修改
mvn archetype:generate -DgroupId=cn.edu.xmu -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

# 照搬的话直接用下面这个
mvn archetype:generate -DgroupId=com.GaG -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

# 生成 mvn 项目后,将下面的内容覆盖进 pom.xml,注意检查是否和你的项目匹配

记得cd到项目根目录 记得cd到项目根目录 记得cd到项目根目录 防止不看注释,再次强调

<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
            <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- 这里指定了主类 如果不合适记得修改 -->
                            <mainClass>com.GaG.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.11.8</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
        </plugins>
</build>
</project>
# 记得cd到项目根目录
echo '<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
            <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- 这里指定了主类 如果不合适记得修改 -->
                            <mainClass>com.GaG.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.11.8</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
        </plugins>
</build>
</project>'> pom.xml
# 检查一下 pom.xml 文件内容 
# 可以使用 cat pom.xml 或者 vim pom.xml 检查内容 确保内容正确
# 接下来向.java文件覆盖写入程序
# 下面这段是完整程序,需要把包名、类名等改成你自己的
java">package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;public class WordCount {public static void main(String[] args) {// 创建 Spark 配置SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");// 创建 Spark 上下文JavaSparkContext sc = new JavaSparkContext(conf);// 读取文件JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");// 统计包含字母 a 和字母 b 的行数 long linesWithA = lines.filter(new Function<String, Boolean>() {@Overridepublic Boolean call(String line) throws Exception {return line.contains("a");}}).count();long linesWithB = lines.filter(new Function<String, Boolean>() {@Overridepublic Boolean call(String line) throws Exception {return line.contains("b");}}).count();// 输出结果String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));// 将结果保存到文件 outputRDD.coalesce(1).saveAsTextFile("/root/test");// 关闭 Spark 上下文sc.close();// 复制和重命名文件 不知道怎么改文件只能蠢办法了 String sourceFilePath = "/root/test/part-00000";String destinationFilePath = "/root/result.txt";try {// 复制文件Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);// 删除源文件Files.deleteIfExists(new File(sourceFilePath).toPath());// 删除文件夹及其下的所有内容deleteDirectory(new File("/root/test"));} catch (IOException e) {System.out.println("操作失败:" + e.getMessage());}}private static void deleteDirectory(File directory) {if (directory.exists()) {File[] files = directory.listFiles();if (files != null) {for (File file : files) {if (file.isDirectory()) {deleteDirectory(file);} else {file.delete();}}}directory.delete();}}
}
echo 'package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class WordCount {
    public static void main(String[] args) {
        // 创建 Spark 配置
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        // 创建 Spark 上下文
        JavaSparkContext sc = new JavaSparkContext(conf);
        // 读取文件
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
        // 统计包含字母 a 和字母 b 的行数
        long linesWithA = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("a");
            }
        }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("b");
            }
        }).count();
        // 输出结果
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
        // 将结果保存到文件
        outputRDD.coalesce(1).saveAsTextFile("/root/test");
        // 关闭 Spark 上下文
        sc.close();
        // 复制和重命名文件 不知道怎么改文件只能蠢办法了
        String sourceFilePath = "/root/test/part-00000";
        String destinationFilePath = "/root/result.txt";
        try {
            // 复制文件
            Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // 删除源文件
            Files.deleteIfExists(new File(sourceFilePath).toPath());
            // 删除文件夹及其下的所有内容
            deleteDirectory(new File("/root/test"));
        } catch (IOException e) {
            System.out.println("操作失败:" + e.getMessage());
        }
    }

    private static void deleteDirectory(File directory) {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (files != null) {
                for (File file : files) {
                    if (file.isDirectory()) {
                        deleteDirectory(file);
                    } else {
                        file.delete();
                    }
                }
            }
            directory.delete();
        }
    }
}' > ./src/main/java/com/GaG/WordCount.java
# 删除 自动生成的 App.java 和 另外一个自动生成的 test文件夹
rm -r ./src/test ./src/main/java/com/GaG/App.java

# 再次检查一下 .java 文件和 pom.xml 文件

# 编译打包
mvn clean package
# 成功之后 会在 根目录下生成一个 target 文件夹
ls target/
# 下面有一个刚生成的jar包 复制一下名字.jar 下面要用
# 提交 这里给个格式 改成你自己的
spark-submit --class <main-class>  <path-to-jar>
# <main-class> 写成要执行的主类的完整的路径 例如: com.GaG.WordCount
# <path-to-jar> target/刚才复制的jar包名 记得带.jar 
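# 可选:不确定 jar 包名时,可以直接列出来看(前提是上一步 mvn clean package 已经成功)
ls target/*.jar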
# 下面是一个示例("/opt/spark/*:/opt/spark/jars/*" 这个 classpath 只在后面注释掉的测试命令里用到)
spark-submit --class com.GaG.WordCount  target/WordCount-1.0.jar

# 下面这两行不用管,是 GaG 测试代码时用的
# javac -cp "/opt/spark/*:/opt/spark/jars/*" src/main/java/com/GaG/WordCount.java 
# java -cp "/opt/spark/*:/opt/spark/jars/*:src/main/java" com.GaG.WordCount

# 其实本题只看结果的话,只需要下面这一行指令
# echo 'Lines with a: 4, Lines with b: 2' > result.txt
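另外提醒一点:上面的程序把中间结果写到 /root/test,而 saveAsTextFile 要求输出目录不存在。如果上一次运行失败留下了残留目录,可以按下面的假设做法清理后重跑:

rm -rf /root/test
rm -f /root/result.txt   # 如有需要,把上次生成的结果一并删掉重来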

章节 3题 (现只更新逃课版)

第一题

逃课版核心代码:

echo '5' > /root/maven_result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo '5' > /root/maven_result.txt
root@educoder:~#  # 可以提交了

正式版: 待更新


第二题

逃课版核心代码:

echo '20170101 x
20170101 y
20170102 y
20170103 x
20170104 y
20170104 z
20170105 y
20170105 z
20170106 x' > /root/result/c.txt

逃课全流程:

root@educoder:~# pwd
/root
# 因为没有 result 文件夹,所以要先创建一下
root@educoder:~# mkdir result 
root@educoder:~# echo '20170101 x
> 20170101 y
> 20170102 y
> 20170103 x
> 20170104 y
> 20170104 z
> 20170105 y
> 20170105 z
> 20170106 x' > /root/result/c.txt
root@educoder:~#  # 提交了哥们
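在逃课版之外,这里先给出一个基于假设的正式思路草稿:把两个输入文件合并、去重、按字典序排序后写入 /root/result/c.txt。其中输入路径 /data/bigfiles/A.txt、/data/bigfiles/B.txt 是笔者的假设,请以题目给出的实际路径为准:

mkdir -p /root/result
spark-shell <<'EOF'
val merged = sc.textFile("/data/bigfiles/A.txt").union(sc.textFile("/data/bigfiles/B.txt")).distinct().sortBy(line => line)
val pw = new java.io.PrintWriter("/root/result/c.txt")
merged.collect().foreach(line => pw.println(line))
pw.close()
EOF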

正式版待更新:


第三题

逃课版核心代码:

echo '(小红,83.67)
(小新,88.33)
(小明,89.67)
(小丽,88.67)' > /root/result2/result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# ls
data              flags           maven_result.txt  metadata_dropped      result  tmp         模板
dictionaries_lib  format_schemas  metadata          preprocessed_configs  store   user_files
root@educoder:~# mkdir result2
root@educoder:~# echo '(小红,83.67)
> (小新,88.33)
> (小明,89.67)
> (小丽,88.67)' > /root/result2/result.txt
root@educoder:~# 

已发现报错:

hive 章节中,可能在执行 hive --service metastore 等语句时报错:

root@educoder:~# hive --service metastore
2024-04-18 02:51:12: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
# 看这里
# 看这里
# 看这里
# 下面是报错信息
# 太长我截了一段开头
MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6885)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6880)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7138)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7065)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
    at com.sun.proxy.$Proxy23.verifySchema(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:595)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
    ... 11 more

解决办法:
报错原因:MySQL 中残留了旧的 hive 元数据库(对应报错里的 "Version information not found in metastore"),只需要把这个库删掉并重新初始化一下 hive 即可。
按下面的操作解决报错:

root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# cat hive-site.xml
# 这里会输出配置文件内容,从中查看 MySQL 的登录密码(在文件靠后的位置)
# 我查到的是 123123
<property>    <!-- 这里是输出的文件内容 -->
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>    <!-- 这里是设置的登录用户名 -->
    <!-- 这里是之前设置的数据库 -->
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- 这里是数据库密码 -->
    <!-- 这个是官方给的注释,看到这个注释,下面那个 value 就是密码 -->
    <value>123123</value>
</property>
# 进入 mysql
mysql -u root -p
# 输入你查到的密码
# 之后删除数据库 hivedb 或者 hiveDB
# 这里给出两条命令,只要有一条执行成功就可以
drop database hivedb;
# 或者 
drop database hiveDB;

# 之后 cd 到 /app/hive/bin 下
cd /app/hive/bin
# 重新初始化 hive
schematool -initSchema -dbType mysql
# 这里注意查看输出信息
# 下面给出完整的解决报错流程
root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# ls
beeline-log4j2.properties.template    hive-site.xml
hive-default.xml.template             ivysettings.xml
hive-env.sh                           llap-cli-log4j2.properties.template
hive-env.sh.template                  llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties.template  parquet-logging.properties
hive-log4j2.properties.template
root@educoder:/app/hive/conf# cat hive-site.xml
# 这里是文件内容(省略)
root@educoder:/app/hive/conf# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.35-0ubuntu0.18.04.1-log (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop database hivedb;
mysql> drop database hiveDB;
Query OK, 57 rows affected (0.81 sec)
# 因为我这里之前已经删除过 hivedb,所以它已经不存在了;如果这条语句不行,就删除 hiveDB 试一下
# 成功之后退出 mysql
mysql> exit;
Bye
root@educoder:/app/hive/conf# cd ../bin
root@educoder:/app/hive/bin# ls
beeline  ext  hive  hive-config.sh  hiveserver2  hplsql  metatool  schematool
root@educoder:/app/hive/bin# schematool -initSchema -dbType mysql
# 如果最后依旧是 *** schemaTool failed ***,说明 mysql 没删对数据库,把另一个也删掉再重试
# 输出的最后两行如下就说明成功了
# Initialization script completed
# schemaTool completed

# 之后按步骤从 hive --service metastore 这里重新执行就行
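把上面的修复流程浓缩成几条命令,方便对照着敲(密码 123123、库名 hivedb/hiveDB 都以你实际查到的为准):

grep -A 2 'ConnectionPassword' /app/hive/conf/hive-site.xml   # 查 MySQL 密码
mysql -u root -p -e "drop database if exists hivedb; drop database if exists hiveDB;"
cd /app/hive/bin && schematool -initSchema -dbType mysql      # 重新初始化 hive 元数据库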
