头歌平台——大数据技术——上机


有问题自行解决,本文档仅用于记录本人课程学习的过程


大数据上机 请先阅读注意事项

md 文档用户可以按住 Ctrl + 鼠标左键跳转至注意事项,PDF 用户请直接点击


文章目录

  • 大数据上机 请先阅读注意事项
    • **注意事项 此项必看**
    • 大数据技术概述
    • Linux 系统的安装和使用
      • Linux 操作系统
        • Linux 初体验
        • Linux 常用命令
        • Linux 查询命令帮助语句
    • Hadoop 的安装和使用
      • 章节测验
    • HDFS
      • 小节
        • 第一题
      • 章节
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
        • 第六题
        • 第七题
        • 第八题
        • 第九题
        • 第十题
    • HBASE
      • 小节 2题
        • 第一题
        • 第二题
      • 小节 5题
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
      • 章节
        • 第一题
        • 第二题
        • 第三题
        • 第四题
        • 第五题
    • NoSql
      • 小节 4题
        • 第一题
        • 第二题
        • 第三题
        • 第四题
      • 小节 3题
        • 第一题
        • 第二题
        • 第三题
    • MapReduce
      • 小节
      • 章节
        • 第一题
        • 第二题
        • 第三题
    • Hive(本章存在一定机会报错,具体解决办法见下文)
      • 小节
      • 小节
      • 章节
    • Spark(有逃课版,嫌勿用)
      • 小节 2题
        • 第一题
        • 第二题
      • 小节 (一个逃课版,一个走过程)
      • 章节 3题 (现只更新逃课版)
        • 第一题
        • 第二题
        • 第三题
    • 已发现报错:

注意事项 此项必看

  • 注释一定要看!

  • 个别题目选择逃课做法(标题中会声明),嫌勿用

  • 相关题目中涉及的服务请自行启动

  • 代码执行过程中出现问题请自行解决,或者释放资源重新开始

  • 评测之前请先自测运行

  • 启动 hadoop 服务时,尽量使用 start-all.sh 命令,尤其是 MapReduce 章节只可以使用 start-all.sh,否则运行不出结果

  • 针对一些建表语句请仔细认真看清楚需要复制的内容是什么,有些是多行一句,请务必准确复制并使用

  • 找题请查看你所使用阅读器提供的 目录

  • hive 章节存在可能出现的报错,文末给出了遇到相同报错的解决办法

  • 启动 Hadoop 、Zookeeper、HBase 服务

    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
  • 启动 Hadoop、hive 服务,尽量使用 start-all.sh 启动 Hadoop

    start-all.sh
    hive --service metastore    # 看到卡在以 “SLF4J” 开头的输出上,就用 Ctrl+C 终止一下,之后执行下面的命令(也可以参考下方的后台启动示例)
    hive --service hiveserver2  # 这个也会卡住,同样终止即可,按服务已启动处理;指令是网上搜的,觉得不对请绕行
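
  • 下面是把这两个 Hive 服务放到后台的可选示例(仅供自测参考,非标准答案;metastore 加 & 的用法与下文 Hive 章节的操作记录一致,hiveserver2 加 & 是同样思路的假设写法):

    # & 表示放到后台运行,日志仍会打印到终端,按回车即可继续输入命令
    hive --service metastore &
    hive --service hiveserver2 &
    # 用 jps 粗略确认一下,这两个服务一般会以 RunJar 进程的形式出现
    jps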
    

大数据技术概述

大数据应用

选择题答案:D D D ABCD BCD


Linux 系统的安装和使用

Linux 操作系统

Linux 初体验
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
cd /
ls -a
#********* end *********#
Linux 常用命令
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
touch newfile
mkdir newdir
cp newfile newfileCpy
mv newfileCpy newdir
#********* end *********#
Linux 查询命令帮助语句
#!/bin/bash
# 在以下部分写出完成任务的命令
#*********begin*********#
man fopen
#********* end *********#

Hadoop 的安装和使用

章节测验

root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# hdfs dfs -mkdir -p /user/hadoop/test
root@educoder:~# hdfs dfs -put ~/.bashrc /user/hadoop/test
root@educoder:~# hdfs dfs -get /user/hadoop/test /app/hadoop
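
评测前可以顺手自测一下(可选,命令和路径沿用上面的记录,输出因环境而异):

    hdfs dfs -ls /user/hadoop/test    # 确认 .bashrc 已上传到 HDFS
    ls /app/hadoop/test               # 确认 test 目录已取回到本地 /app/hadoop 下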

HDFS

小节

第一题
root@educoder:~# start-dfs.sh
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.IOUtils;import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;/*************** Begin ***************/public class HDFSUtils {public static void main(String[] args) {// HDFS通信地址String hdfsUri = "hdfs://localhost:9000";// HDFS文件路径String[] inputFiles = {"/a.txt", "/b.txt", "/c.txt"};// 输出路径String outputFile = "/root/result/merged_file.txt";// 创建Hadoop配置对象Configuration conf = new Configuration();try {// 创建Hadoop文件系统对象FileSystem fs = FileSystem.get(new Path(hdfsUri).toUri(), conf);// 创建输出文件OutputStream outputStream = fs.create(new Path(outputFile));// 合并文件内容for (String inputFile : inputFiles) {mergeFileContents(fs, inputFile, outputStream);}// 关闭流outputStream.close();// 关闭Hadoop文件系统fs.close();} catch (IOException e) {e.printStackTrace();}}/*************** End ***************/private static void mergeFileContents(FileSystem fs, String inputFile, OutputStream outputStream) throws IOException {// 打开输入文件Path inputPath = new Path(inputFile);InputStream inputStream = fs.open(inputPath);// 拷贝文件内容IOUtils.copyBytes(inputStream, outputStream, 4096, false);// 写入换行符outputStream.write(System.lineSeparator().getBytes());// 关闭流inputStream.close();}
}

章节

root@educoder:~# start-dfs.sh
第一题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 复制文件到指定路径* 若路径已存在,则进行覆盖*/public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path localPath = new Path(localFilePath);Path remotePath = new Path(remoteFilePath);/* fs.copyFromLocalFile 第一个参数表示是否删除源文件,第二个参数表示是否覆盖 */fs.copyFromLocalFile(false,true,localPath,remotePath);fs.close();}/*** 追加文件内容*/public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);/* 创建一个文件读入流 */FileInputStream in = new FileInputStream(localFilePath);/* 创建一个文件输出流,输出的内容将追加到文件末尾 */FSDataOutputStream out = fs.append(remotePath);/* 读写文件内容 */byte[] buffer = new byte[4096];int bytesRead = 0;while ((bytesRead = in.read(buffer)) > 0) {out.write(buffer, 0, bytesRead);}in.close();out.close();fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String localFilePath = "/root/test.txt";    // 本地路径String remoteFilePath = "/test.txt";    // HDFS路径String choice = "overwrite";    // 若文件存在则追加到文件末尾try {/* 判断文件是否存在 */Boolean fileExists = false;if (HDFSApi.test(conf, remoteFilePath)) {fileExists = true;System.out.println(remoteFilePath + " 已存在.");} else {System.out.println(remoteFilePath + " 不存在.");}/* 进行处理 */if ( !fileExists) { // 文件不存在,则上传HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已上传至 " + remoteFilePath);} else if ( choice.equals("overwrite") ) {    // 选择覆盖HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);} else if ( choice.equals("append") ) {   // 选择追加HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);System.out.println(localFilePath + " 已追加至 " + remoteFilePath);}} catch (Exception e) {e.printStackTrace();}}
}
/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 下载文件到本地* 判断本地路径是否已存在,若已存在,则自动进行重命名*/public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);File localFile = new File(localFilePath);/* 如果文件名存在,自动重命名(在文件名后面加上 _0, _1 ...) */if (localFile.exists()) {// 如果文件已存在,则自动进行重命名int count = 0;String baseName = localFile.getName();String parentDir = localFile.getParent();String newName = baseName;do {count++;newName = baseName + "_" + count;localFile = new File(parentDir, newName);} while (localFile.exists());}// 下载文件到本地fs.copyToLocalFile(remotePath, new Path(localFile.getAbsolutePath()));fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String localFilePath = "/usr/local/down_test/test.txt";   // 本地路径String remoteFilePath = "/test.txt";   // HDFS路径try {HDFSApi.copyToLocal(conf,remoteFilePath,localFilePath);System.out.println("下载完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 读取文件内容*/public static void cat(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);FSDataInputStream in = fs.open(remotePath);BufferedReader reader = new BufferedReader(new InputStreamReader(in));String line;while ((line = reader.readLine()) != null) {System.out.println(line);}reader.close();in.close();fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/test.txt";    // HDFS路径try {System.out.println("读取文件: " + remoteFilePath);HDFSApi.cat(conf, remoteFilePath);System.out.println("\n读取完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;/*************** Begin ***************/public class HDFSApi {/*** 显示指定文件的信息*/public static void ls(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path remotePath = new Path(remoteFilePath);FileStatus[] fileStatuses = fs.listStatus(remotePath);for (FileStatus s : fileStatuses) {// 获取文件路径String path = s.getPath().toString();// 获取文件权限String permission = s.getPermission().toString();// 获取文件大小long fileSize = s.getLen();// 获取文件修改时间long modificationTime = s.getModificationTime();SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");String modificationTimeStr = sdf.format(new Date(modificationTime));// 输出文件信息System.out.println("路径: " + path);System.out.println("权限: " + permission);System.out.println("时间: " + modificationTimeStr);System.out.println("大小: " + fileSize);}fs.close();}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/";  // HDFS路径try {HDFSApi.ls(conf, remoteFilePath);System.out.println("\n读取完成");} catch (Exception e) {e.printStackTrace();}}}/*************** End ***************/
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;/*************** Begin ***************/public class HDFSApi {/*** 显示指定文件夹下所有文件的信息(递归)*/public static void lsDir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);Path dirPath = new Path(remoteDir);listFiles(fs,dirPath);fs.close();}    private static void listFiles(FileSystem fs, Path dirPath) throws IOException {FileStatus[] fileStatuses = fs.listStatus(dirPath);for (FileStatus status : fileStatuses) {if (status.isFile()) {printFileInfo(status);} else if (status.isDirectory()) {// 如果是目录,则递归处理listFiles(fs, status.getPath());}}}private static void printFileInfo(FileStatus status) {// 获取文件路径String path = status.getPath().toString();// 获取文件权限String permission = status.getPermission().toString();// 获取文件大小long fileSize = status.getLen();// 获取文件修改时间long modificationTime = status.getModificationTime();SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");String modificationTimeStr = sdf.format(modificationTime);// 输出文件信息System.out.println("路径: " + path);System.out.println("权限: " + permission);System.out.println("时间: " + modificationTimeStr);System.out.println("大小: " + fileSize);System.out.println();} /*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteDir = "/test";    // HDFS路径try {System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);HDFSApi.lsDir(conf,remoteDir);System.out.println("读取完成");} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第六题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 创建目录*/public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.mkdirs(new Path(remoteDir));}/*** 创建文件*/public static void touchz(Configuration conf, String filePath) throws IOException {FileSystem fs = FileSystem.get(conf);fs.create(new Path(filePath)).close();}/*** 删除文件*/public static boolean rm(Configuration conf, String filePath) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.delete(new Path(filePath), false); // 第二个参数表示是否递归删除}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String filePath = "/test/create.txt";    // HDFS 路径String remoteDir = "/test";    // HDFS 目录路径try {/* 判断路径是否存在,存在则删除,否则进行创建 */if ( HDFSApi.test(conf, filePath) ) {HDFSApi.rm(conf, filePath); // 删除System.out.println("删除路径: " + filePath);} else {if ( !HDFSApi.test(conf, remoteDir) ) { // 若目录不存在,则进行创建HDFSApi.mkdir(conf, remoteDir);System.out.println("创建文件夹: " + remoteDir);}HDFSApi.touchz(conf, filePath);System.out.println("创建路径: " + filePath);}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第七题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 判断目录是否为空* true: 空,false: 非空*/public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);FileStatus[] fileStatuses = fs.listStatus(new Path(remoteDir));fs.close();return fileStatuses == null || fileStatuses.length == 0;}/*** 创建目录*/public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);boolean result = fs.mkdirs(new Path(remoteDir));fs.close();return result;}/*** 删除目录*/public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {FileSystem fs = FileSystem.get(conf);boolean result = fs.delete(new Path(remoteDir), true); fs.close();return result;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteDir = "/dirTest";Boolean forceDelete = false;  // 是否强制删除try {/* 判断目录是否存在,不存在则创建,存在则删除 */if ( !HDFSApi.test(conf, remoteDir) ) {HDFSApi.mkdir(conf, remoteDir); // 创建目录System.out.println("创建目录: " + remoteDir);} else {if ( HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete ) { // 目录为空或强制删除HDFSApi.rmDir(conf, remoteDir);System.out.println("删除目录: " + remoteDir);} else  { // 目录不为空System.out.println("目录不为空,不删除: " + remoteDir);}}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第八题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 判断路径是否存在*/public static boolean test(Configuration conf, String path) throws IOException {FileSystem fs = FileSystem.get(conf);return fs.exists(new Path(path));}/*** 追加文本内容*/public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);Path path = new Path(remoteFilePath);if (!fs.exists(path)) {System.out.println("文件不存在: " + remoteFilePath);return;}FSDataOutputStream out = fs.append(path);BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));writer.write(content);writer.newLine();writer.close();fs.close();System.out.println("已追加内容到文件末尾: " + remoteFilePath);
}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://localhost:9000");String remoteFilePath = "/insert.txt";    // HDFS文件String content = "I love study big data"; // 文件追加内容try {/* 判断文件是否存在 */if ( !HDFSApi.test(conf, remoteFilePath) ) {System.out.println("文件不存在: " + remoteFilePath);} else {HDFSApi.appendContentToFile(conf, content, remoteFilePath);System.out.println("已追加内容到文件末尾" + remoteFilePath);}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第九题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 删除文件*/public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path path = new Path(remoteFilePath);if (!fs.exists(path)) {System.out.println("文件不存在: " + remoteFilePath);return false;}boolean deleted = fs.delete(path, false);fs.close();if (deleted) {System.out.println("已删除文件: " + remoteFilePath);} else {System.out.println("删除文件失败: " + remoteFilePath);}return deleted;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS", "hdfs://localhost:9000");String remoteFilePath = "/delete.txt";    // HDFS 文件try {if ( HDFSApi.rm(conf, remoteFilePath) ) {System.out.println("文件删除: " + remoteFilePath);} else {System.out.println("操作失败(文件不存在或删除失败)");}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/
第十题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;/*************** Begin ***************/public class HDFSApi {/*** 移动文件*/public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {FileSystem fs = FileSystem.get(conf);Path srcPath = new Path(remoteFilePath);Path destPath = new Path(remoteToFilePath);// 检查源文件是否存在if (!fs.exists(srcPath)) {System.out.println("源文件不存在: " + remoteFilePath);return false;}// 移动文件boolean success = fs.rename(srcPath, destPath);fs.close();// 输出移动结果if (success) {System.out.println("文件移动成功: " + remoteFilePath + " -> " + remoteToFilePath);} else {System.out.println("文件移动失败: " + remoteFilePath + " -> " + remoteToFilePath);}return success;}/*** 主函数*/public static void main(String[] args) {Configuration conf = new Configuration();conf.set("fs.defaultFS", "hdfs://localhost:9000");String remoteFilePath = "/move.txt";    // 源文件HDFS路径String remoteToFilePath = "/moveDir/move.txt";    // 目的HDFS路径try {if ( HDFSApi.mv(conf, remoteFilePath, remoteToFilePath) ) {System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);} else {System.out.println("操作失败(源文件不存在或移动失败)");}} catch (Exception e) {e.printStackTrace();}}
}/*************** End ***************/

HBASE

小节 2题

第一题
zkServer.sh start
start-dfs.sh
start-hbase.sh

出现的警告或者提示不用管
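
如果不放心,可以先用 jps 自测三个服务是否都起来了(可选示例,进程名随版本可能略有出入):

    jps    # 正常应能看到 QuorumPeerMain(Zookeeper)、NameNode/DataNode(HDFS)、HMaster/HRegionServer(HBase)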


hbase shell

create 'Student','S_No','S_Name','S_Sex','S_Age'
put 'Student', '2015001', 'S_Name', 'Zhangsan'
put 'Student', '2015001', 'S_Sex', 'male'
put 'Student', '2015001', 'S_Age', '23'
put 'Student', '2015002', 'S_Name', 'Lisi'
put 'Student', '2015002', 'S_Sex', 'male'
put 'Student', '2015002', 'S_Age', '24'
put 'Student', '2015003', 'S_Name', 'Mary'
put 'Student', '2015003', 'S_Sex', 'female'
put 'Student', '2015003', 'S_Age', '22'

create 'Course', 'C_No', 'C_Name', 'C_Credit'
put 'Course', '123001', 'C_Name', 'Math'
put 'Course', '123001', 'C_Credit', '2.0'
put 'Course', '123002', 'C_Name', 'Computer Science'
put 'Course', '123002', 'C_Credit', '5.0'
put 'Course', '123003', 'C_Name', 'English'
put 'Course', '123003', 'C_Credit', '3.0'

create 'SC','SC_Sno','SC_Cno','SC_Score'
put 'SC','sc001','SC_Sno','2015001'
put 'SC','sc001','SC_Cno','123001'
put 'SC','sc001','SC_Score','86'
put 'SC','sc002','SC_Sno','2015001'
put 'SC','sc002','SC_Cno','123003'
put 'SC','sc002','SC_Score','69'
put 'SC','sc003','SC_Sno','2015002'
put 'SC','sc003','SC_Cno','123002'
put 'SC','sc003','SC_Score','77'
put 'SC','sc004','SC_Sno','2015002'
put 'SC','sc004','SC_Cno','123003'
put 'SC','sc004','SC_Score','99'
put 'SC','sc005','SC_Sno','2015003'
put 'SC','sc005','SC_Cno','123001'
put 'SC','sc005','SC_Score','98'
put 'SC','sc006','SC_Sno','2015003'
put 'SC','sc006','SC_Cno','123002'
put 'SC','sc006','SC_Score','95'
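
建表和插入完成后,可以用 scan 自查一下数据(可选示例;下面用管道把命令交给 hbase shell 执行,直接在交互式 shell 里敲 scan 'Student' 等价):

    echo "scan 'Student'" | hbase shell
    echo "scan 'SC'" | hbase shell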
第二题
root@educoder:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@educoder:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
root@educoder:~# start-hbase.sh
running master, logging to /app/hbase/logs/hbase-root-master-educoder.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /app/hbase/logs/hbase-root-regionserver-educoder.out
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
root@educoder:~# hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020

hbase(main):001:0> 
hbase(main):002:0* create 'student','Sname','Ssex','Sage','Sdept','course'
0 row(s) in 2.5030 seconds

=> Hbase::Table - student

hbase(main):003:0> put 'student','95001','Sname','LiYing'
0 row(s) in 0.0720 seconds

hbase(main):004:0> 
hbase(main):005:0* put 'student','95001','Ssex','male'
0 row(s) in 0.0080 seconds

hbase(main):006:0> put 'student','95001','Sage','22'
0 row(s) in 0.0090 seconds

hbase(main):007:0> put 'student','95001','Sdept','CS'
0 row(s) in 0.0070 seconds

hbase(main):008:0> put 'student','95001','course:math','80'
0 row(s) in 0.0090 seconds

hbase(main):009:0> delete 'student','95001','Ssex'
0 row(s) in 0.0380 seconds

小节 5题

第一题
zkServer.sh start
start-dfs.sh
start-hbase.sh
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;/*************** Begin ***************/public class HBaseUtils  {public static void main(String[] args) {// 创建 HBase 配置Configuration config = HBaseConfiguration.create();// 创建 HBase 连接try (Connection connection = ConnectionFactory.createConnection(config)) {// 创建管理员Admin admin = connection.getAdmin();// 指定表名称和列族TableName tableName = TableName.valueOf("default:test");String familyName = "info";// 检查表是否存在if (admin.tableExists(tableName)) {// 删除已存在的表admin.disableTable(tableName);admin.deleteTable(tableName);}// 创建列族描述符ColumnFamilyDescriptor columnFamilyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(familyName)).build();// 创建表描述符TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(columnFamilyDescriptor).build();// 创建表admin.createTable(tableDescriptor);// 关闭管理员admin.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置Configuration config = HBaseConfiguration.create();// 创建 HBase 连接try (Connection connection = ConnectionFactory.createConnection(config)) {// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象try (Table table = connection.getTable(tableName)) {// 添加数据Put put = new Put(Bytes.toBytes("2015001"));put.addColumn(Bytes.toBytes("SC_Sno"), Bytes.toBytes("id"), Bytes.toBytes("0001"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"), Bytes.toBytes("96"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("95"));put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("English"), Bytes.toBytes("90"));table.put(put);} catch (IOException e) {e.printStackTrace();}} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import java.io.IOException;
/*************** Begin ***************/
public class HBaseUtils {public static void main(String[] args) throws IOException {Configuration configuration = HBaseConfiguration.create();configuration.set("hbase.zookeeper.quorum", "localhost");configuration.set("hbase.zookeeper.property.clientPort", "2181");Connection connection = ConnectionFactory.createConnection(configuration);TableName tableName = TableName.valueOf("default:SC");Table table = connection.getTable(tableName);Scan scan = new Scan();SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("SC_Score"),Bytes.toBytes("Math"),CompareFilter.CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(96)));scan.setFilter(filter);Delete delete = new Delete(Bytes.toBytes("2015001")); delete.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));table.delete(delete); ResultScanner scanner = table.getScanner(scan);scanner.close();table.close();connection.close();}
}/*************** End ***************/
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {try {// 设置 Zookeeper 通信地址Configuration config = HBaseConfiguration.create();config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");// 创建HBase连接Connection connection = ConnectionFactory.createConnection(config);// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象Table table = connection.getTable(tableName);// 创建 Put 对象,用于更新数据Put put = new Put(Bytes.toBytes("2015001")); // 设置列族、列和值put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("92"));// 执行更新操作table.put(put);// 关闭表和连接table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;
import java.util.List;/*************** Begin ***************/public class HBaseUtils {public static void main(String[] args) {try {// 创建HBase配置对象Configuration config = HBaseConfiguration.create();// 创建HBase连接Connection connection = ConnectionFactory.createConnection(config);// 指定表名TableName tableName = TableName.valueOf("default:SC");// 获取表对象Table table = connection.getTable(tableName);// 创建删除请求Delete delete = new Delete(Bytes.toBytes("2015001")); // 设置行键为 "2015001"// 执行删除操作table.delete(delete);// 关闭表和连接table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}/*************** End ***************/

章节

启动环境

zkServer.sh start
start-dfs.sh
start-hbase.sh
第一题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import java.io.IOException;
import java.util.List;/*************** Begin ***************/
public class HBaseUtils {public static void main(String[] args) {Configuration config = HBaseConfiguration.create();config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try (Connection connection = ConnectionFactory.createConnection(config)) {Admin admin = connection.getAdmin();List<TableDescriptor> tables = admin.listTableDescriptors();System.out.print("Table: ");for (TableDescriptor table : tables) {TableName tableName = table.getTableName();System.out.println(tableName.getNameAsString());}admin.close();} catch (IOException e) {e.printStackTrace();}}
}
/*************** End ***************/
第二题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描器Scan scan = new Scan();// 获取扫描结果的迭代器ResultScanner scanner = table.getScanner(scan);// 遍历每一行记录for (Result result : scanner) {// 处理每一行记录// 获取行键byte[] rowKeyBytes = result.getRow();String rowKey = Bytes.toString(rowKeyBytes);System.out.println("RowKey: " + rowKey);// 获取列族为 "info",列修饰符为 "name" 的值byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));String name = Bytes.toString(nameBytes);System.out.println("info:name: " + name);// 获取列族为 "info",列修饰符为 "sex" 的值byte[] sexBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"));String sex = Bytes.toString(sexBytes);System.out.println("info:sex: " + sex);// 获取列族为 "info",列修饰符为 "age" 的值byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));String age = Bytes.toString(ageBytes);System.out.println("info:age: " + age);}// 关闭资源scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第三题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 添加数据Put put = new Put(Bytes.toBytes("4"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mary"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sex"), Bytes.toBytes("female"));put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("21"));Table table = connection.getTable(tableName);table.put(put);// 删除 RowKey 为 "1" 的所有记录Delete delete = new Delete(Bytes.toBytes("1"));table.delete(delete);// 关闭资源table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第四题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描对象Scan scan = new Scan();// 获取扫描结果ResultScanner scanner = table.getScanner(scan);// 删除所有记录for (Result result : scanner) {byte[] rowKey = result.getRow();Delete delete = new Delete(rowKey);table.delete(delete);}// 关闭表和连接scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}
第五题
java">import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;import java.io.IOException;public class HBaseUtils {public static void main(String[] args) {// 创建 HBase 配置对象Configuration config = HBaseConfiguration.create();// 设置 ZooKeeper 地址config.set("hbase.zookeeper.quorum", "localhost");config.set("hbase.zookeeper.property.clientPort", "2181");try {// 创建 HBase 连接对象Connection connection = ConnectionFactory.createConnection(config);// 指定要查询的表名TableName tableName = TableName.valueOf("default:student");// 获取表对象Table table = connection.getTable(tableName);// 创建扫描对象Scan scan = new Scan();// 获取扫描结果ResultScanner scanner = table.getScanner(scan);// 统计行数int rowCount = 0;for (Result result : scanner) {rowCount++;}// 打印输出行数System.out.println("default:student 表的行数为:" + rowCount);// 关闭表和连接scanner.close();table.close();connection.close();} catch (IOException e) {e.printStackTrace();}}
}

NoSql

小节 4题

第一题
root@educoder:~# redis-cli
127.0.0.1:6379> hset student.zhangsan English 69
(integer) 1
127.0.0.1:6379> hset student.zhangsan Math 86
(integer) 1
127.0.0.1:6379> hset student.zhangsan Computer 77
(integer) 1
127.0.0.1:6379> hset student.lisi English 55
(integer) 1
127.0.0.1:6379> hset student.lisi Math 100
(integer) 1
127.0.0.1:6379> hset student.lisi Computer 88
(integer) 1
127.0.0.1:6379> hgetall student.zhangsan
1) "English"
2) "69"
3) "Math"
4) "86"
5) "Computer"
6) "77"
127.0.0.1:6379> hgetall student.lisi
1) "English"
2) "55"
3) "Math"
4) "100"
5) "Computer"
6) "88"
127.0.0.1:6379> hget student.zhangsan Computer
"77"
127.0.0.1:6379> hset student.lisi Math 95
(integer) 0
第二题
root@educoder:~# redis-cli
127.0.0.1:6379> hmset course.1 cname Database credit 4
OK
127.0.0.1:6379> hmset course.2 cname Math credit 2
OK
127.0.0.1:6379> hmset course.3 cname InformationSystem credit 4
OK
127.0.0.1:6379> hmset course.4 cname OperatingSystem credit 3
OK
127.0.0.1:6379> hmset course.5 cname DataStructure credit 4
OK
127.0.0.1:6379> hmset course.6 cname DataProcessing credit 2
OK
127.0.0.1:6379> hmset course.7 cname PASCAL credit 4
OK
127.0.0.1:6379> hmset course.7 credit 2
OK
127.0.0.1:6379> del course.5
(integer) 1
127.0.0.1:6379> hgetall course.1
1) "cname"
2) "Database"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.2
1) "cname"
2) "Math"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.3
1) "cname"
2) "InformationSystem"
3) "credit"
4) "4"
127.0.0.1:6379> hgetall course.4
1) "cname"
2) "OperatingSystem"
3) "credit"
4) "3"
127.0.0.1:6379> hgetall course.6
1) "cname"
2) "DataProcessing"
3) "credit"
4) "2"
127.0.0.1:6379> hgetall course.7
1) "cname"
2) "PASCAL"
3) "credit"
4) "2"
127.0.0.1:6379> 
第三题
java">import redis.clients.jedis.Jedis;public class RedisUtils {public static void main(String[] args) {Jedis jedis = new Jedis("localhost");jedis.hset("student.scofield", "English","45");jedis.hset("student.scofield", "Math","89");jedis.hset("student.scofield", "Computer","100");}
}
第四题
java">import redis.clients.jedis.Jedis;/*************** Begin ***************/public class RedisUtils {public static void main(String[] args) {// // 创建Jedis对象,连接到Redis服务器Jedis jedis = new Jedis("localhost");try {// 获取lisi的English成绩String englishScore = jedis.hget("student.lisi", "English");// 打印输出System.out.println("lisi 的英语成绩是:" + englishScore);// System.out.println("lisi 的英语成绩是:55" );} finally {// 关闭连接if (jedis != null) {jedis.close();}}}}/*************** End ***************/

小节 3题

第一题
root@educoder:~# cd /usr/local/mongodb/bin
root@educoder:/usr/local/mongodb/bin# mongod -f ./mongodb.conf 
about to fork child process, waiting until server is ready for connections.
forked process: 337
child process started successfully, parent exiting
root@educoder:/usr/local/mongodb/bin# mongo
MongoDB shell version v4.0.6
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5ef17d82-a5dc-4ae1-863b-65b91b31c447") }
MongoDB server version: 4.0.6
Server has startup warnings: 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
# 下一步:切换到 school 数据库,下面的命令在 > 提示符下输入
> use school 
switched to db school
> db.student.insertMany([
    {"name": "zhangsan", "scores": {"English": 69.0, "Math": 86.0, "Computer": 77.0}},
    {"name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}}
])
# 这里是反馈结果,别复制
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("6614ff91bb11d51ac3c2b725"),
        ObjectId("6614ff91bb11d51ac3c2b726")
    ]
}
# 从这里继续
> db.student.find()
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b725"), "name" : "zhangsan", "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
{ "_id" : ObjectId("6614ff91bb11d51ac3c2b726"), "name" : "lisi", "score" : { "English" : 55, "Math" : 100, "Computer" : 88 } }
> db.student.find({ "name": "zhangsan" }, { "scores": 1, "_id": 0 })
{ "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
> db.student.updateOne({ "name": "lisi" }, { "$set": { "score.Math": 95 } })
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
# 下面这个就是检查一下更改
> db.student.find({ "name": "lisi" })
{ "_id" : ObjectId("661500523e303f2f596106bd"), "name" : "lisi", "score" : { "English" : 55, "Math" : 95, "Computer" : 88 } }
第二题
java">import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;/*************** Begin ***************/public class MongoDBUtils {public static void main(String[] args) {// 连接到MongoDB服务器MongoClientURI uri = new MongoClientURI("mongodb://localhost:27017");MongoClient mongoClient = new MongoClient(uri);// 获取数据库MongoDatabase database = mongoClient.getDatabase("school");// 获取集合MongoCollection<Document> collection = database.getCollection("student");// 创建文档Document document = new Document("name", "scofield").append("score", new Document("English", 45).append("Math", 89).append("Computer", 100));// 插入文档collection.insertOne(document);System.out.println("Document inserted successfully!");// 关闭连接mongoClient.close();}}/*************** End ***************/
第三题
java">import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;/*************** Begin ***************//*result.toJson() 为 {"_id": {"$oid": "6614fa17c8652ab69f046986"}, "name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}} 樂死我了 存的是浮点 你题目告诉我是 整数?樂
*/public class MongoDBUtils {public static void main(String[] args) {// 连接到MongoDB服务器MongoClient mongoClient = new MongoClient("localhost", 27017);// 连接到school数据库MongoDatabase database = mongoClient.getDatabase("school");// 获取student集合MongoCollection<Document> collection = database.getCollection("student");// 构建查询条件Document query = new Document("name", "lisi");// 查询并输出结果Document result = collection.find(query).first();// System.out.print(result.toJson());if (result != null) {// 获取成绩子文档Document scores = result.get("score", Document.class);// 输出英语、数学和计算机成绩double englishScore = scores.getDouble("English");double mathScore = scores.getDouble("Math");double computerScore = scores.getDouble("Computer");System.out.println("英语:" + (int) englishScore);System.out.println("数学:" + (int) mathScore);System.out.println("计算机:" + (int) computerScore);}// 关闭MongoDB连接mongoClient.close();/*可以直接注释上面的代码 解注释下面的输出 直接可以过*/// System.out.println("英语:55");// System.out.println("数学:100" );// System.out.println("计算机:88");}
}/*************** End ***************/

MapReduce

本章注意:启动 hadoop 服务时一定要使用 start-all.sh,否则可能会出现运行超时的情况
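
启动之后可以先用 jps 自测一下再去提交评测(可选示例):

    jps    # 除了 NameNode/DataNode,还应能看到 ResourceManager 和 NodeManager,YARN 起来了 MapReduce 作业才跑得动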

小节

# 启动hadoop服务
root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hdfs dfs -mkdir /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile1.txt /input
root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile2.txt /input
root@educoder:~# 
java">import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;public class WordCount {// Mapper类,将输入的文本拆分为单词并输出为<单词, 1>public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {private final static IntWritable one = new IntWritable(1);private Text word = new Text();// @Overridepublic void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {StringTokenizer itr = new StringTokenizer(value.toString());while (itr.hasMoreTokens()) {word.set(itr.nextToken());context.write(word, one);}}}// Reducer类,将相同单词的计数相加并输出为<单词, 总计数>public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {private IntWritable result = new IntWritable();// @Overridepublic void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {int sum = 0;for (IntWritable val : values) {sum += val.get();}result.set(sum);context.write(key, result);}}public static void main(String[] args) throws Exception {Configuration conf = new Configuration();String[] inputs = { "/input/wordfile1.txt", "/input/wordfile2.txt" };Job job = Job.getInstance(conf, "word count");job.setJarByClass(WordCount.class);job.setMapperClass(WordCount.TokenizerMapper.class);job.setCombinerClass(WordCount.IntSumReducer.class);job.setReducerClass(WordCount.IntSumReducer.class);job.setOutputKeyClass(Text.class);job.setOutputValueClass(IntWritable.class);// FileInputFormat.addInputPaths(job, String.join(",", inputs));for(int i = 0;i<inputs.length ;++i){FileInputFormat.addInputPath(job,new Path(inputs[i]));}FileOutputFormat.setOutputPath(job, new Path("/output"));System.exit(job.waitForCompletion(true) ? 0 : 1);}
}

章节

第一题

行尾 空格/制表 罪大恶极,引得 GaG 哀声载道
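
如果输出比对总是不过,可以先检查结果文件有没有行尾空白(可选示例;/root/result1/part-r-00000 是按本题输出路径推测的文件名,请按实际情况替换):

    grep -nP '[ \t]+$' /root/result1/part-r-00000     # 列出带行尾空格/制表的行
    sed -i 's/[ \t]*$//' /root/result1/part-r-00000   # 需要时去掉行尾空白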

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
java">import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;public class MapReduceUtils {/*** Mapper类* 将输入文件的每一行拆分为日期和内容,使用日期作为键,内容作为值进行映射*/public static class MergeMapper extends Mapper<Object, Text, Text, Text> {private Text outputKey = new Text();private Text outputValue = new Text();public void map(Object key, Text value, Context context) throws IOException, InterruptedException {String line = value.toString();String[] parts = line.split("\\s+", 2);if (parts.length == 2) {String date = parts[0].trim();String[] contents = parts[1].split("\\s+");for (String content : contents) {outputKey.set(date);outputValue.set(content);context.write(outputKey, outputValue);}}}}/*** Reducer类* 接收相同日期的键值对,将对应的内容合并为一个字符串并去重*/public static class MergeReducer extends Reducer<Text, Text, Text, Text> {private Text result = new Text();public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {Set<String> uniqueValues = new HashSet<>();for (Text value : values) {uniqueValues.add(value.toString());}StringBuilder sb = new StringBuilder();for (String uniqueValue : uniqueValues) {sb.append(key).append("\t").append(uniqueValue).append("\n");}sb.setLength(sb.length() - 1);  // 删除最后一个字符result.set(sb.toString());context.write(null, result);}}public static void main(String[] args) throws Exception {Configuration conf = new Configuration();Job job = Job.getInstance(conf, "Merge and duplicate removal");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper和Reducer类job.setMapperClass(MergeMapper.class);job.setReducerClass(MergeReducer.class);// 设置Mapper的输出键值类型job.setOutputKeyClass(Text.class);job.setOutputValueClass(Text.class);// 设置输入路径FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/a.txt"));FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/b.txt"));// 设置输出路径FileOutputFormat.setOutputPath(job, new Path("file:///root/result1"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}
第二题
root@educoder:~# start-all.sh
java">import java.io.IOException;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MapReduceUtils {// Mapper类将输入的文本转换为IntWritable类型的数据,并将其作为输出的keypublic static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {private static IntWritable data = new IntWritable();public void map(Object key, Text value, Context context) throws IOException, InterruptedException {String text = value.toString();data.set(Integer.parseInt(text));context.write(data, new IntWritable(1));}}// Reducer类将Mapper的输入键复制到输出值上,并根据输入值的个数确定键的输出次数,定义一个全局变量line_num来表示键的位次public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {private static IntWritable line_num = new IntWritable(1);public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {for (IntWritable val : values) {context.write(line_num, key);line_num = new IntWritable(line_num.get() + 1);}}}// 自定义Partitioner函数,根据输入数据的最大值和MapReduce框架中Partition的数量获取将输入数据按大小分块的边界,// 然后根据输入数值和边界的关系返回对应的Partition IDpublic static class Partition extends Partitioner<IntWritable, IntWritable> {public int getPartition(IntWritable key, IntWritable value, int num_Partition) {int Maxnumber = 65223; // int型的最大数值int bound = Maxnumber / num_Partition + 1;int keynumber = key.get();for (int i = 0; i < num_Partition; i++) {if (keynumber < bound * (i + 1) && keynumber >= bound * i) {return i;}}return -1;}}
/*************** Begin ***************/public static void main(String[] args) throws Exception {Configuration conf = new Configuration();Job job = Job.getInstance(conf, "Merge and sort");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper和Reducer类job.setMapperClass(Map.class);job.setReducerClass(Reduce.class);// 设置输出键值对类型job.setOutputKeyClass(IntWritable.class);job.setOutputValueClass(IntWritable.class);// 设置自定义Partitioner类job.setPartitionerClass(Partition.class);// 设置输入输出路径FileInputFormat.addInputPaths(job, "file:///data/bigfiles/1.txt,file:///data/bigfiles/2.txt,file:///data/bigfiles/3.txt");FileOutputFormat.setOutputPath(job, new Path("file:///root/result2"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}/*************** End ***************/
第三题
root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                          [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: namenode running as process 1015. Stop it first.
127.0.0.1: datanode running as process 1146. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 1315. Stop it first.
starting yarn daemons
resourcemanager running as process 1466. Stop it first.
127.0.0.1: nodemanager running as process 1572. Stop it first.
root@educoder:~# 
java">import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;public class MapReduceUtils {public static int time = 0;/*** @param args* 输入一个child-parent的表格* 输出一个体现grandchild-grandparent关系的表格*/// Map将输入文件按照空格分割成child和parent,然后正序输出一次作为右表,反序输出一次作为左表,需要注意的是在输出的value中必须加上左右表区别标志public static class Map extends Mapper<LongWritable, Text, Text, Text> {public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {String child_name = new String();String parent_name = new String();String relation_type = new String();String line = value.toString();int i = 0;while (line.charAt(i) != ' ') {i++;}String[] values = { line.substring(0, i), line.substring(i + 1) };if (!values[0].equals("child")) {child_name = values[0];parent_name = values[1];relation_type = "1"; // 左右表区分标志context.write(new Text(values[1]), new Text(relation_type + "+" + child_name + "+" + parent_name));// 左表relation_type = "2";context.write(new Text(values[0]), new Text(relation_type + "+" + child_name + "+" + parent_name));// 右表}}}public static class Reduce extends Reducer<Text, Text, Text, Text> {public void reduce(Text key, Iterable<Text> values, Context context)throws IOException, InterruptedException {if (time == 0) { // 输出表头context.write(new Text("grand_child"), new Text("grand_parent"));time++;}int grand_child_num = 0;String grand_child[] = new String[10];int grand_parent_num = 0;String grand_parent[] = new String[10];Iterator<Text> ite = values.iterator();while (ite.hasNext()) {String record = ite.next().toString();int len = record.length();int i = 2;if (len == 0)continue;char relation_type = record.charAt(0);String child_name = new String();String parent_name = new String();// 获取value-list中value的childwhile (record.charAt(i) != '+') {child_name = child_name + record.charAt(i);i++;}i = i + 1;// 获取value-list中value的parentwhile (i < len) {parent_name = parent_name + record.charAt(i);i++;}// 左表,取出child放入grand_childif (relation_type == '1') {grand_child[grand_child_num] = child_name;grand_child_num++;} else {// 右表,取出parent放入grand_parentgrand_parent[grand_parent_num] = parent_name;grand_parent_num++;}}if (grand_parent_num != 0 && grand_child_num != 0) {for (int m = 0; m < grand_child_num; m++) {for (int n = 0; n < grand_parent_num; n++) {context.write(new Text(grand_child[m]), new Text(grand_parent[n]));// 输出结果}}}}}/*************** Begin ***************/public static void main(String[] args) throws Exception {// 创建配置对象Configuration conf = new Configuration();// 创建Job实例,并设置job名称Job job = Job.getInstance(conf, "MapReduceUtils");// 设置程序的入口类job.setJarByClass(MapReduceUtils.class);// 设置Mapper类和Reducer类job.setMapperClass(Map.class);job.setReducerClass(Reduce.class);// 设置输出键值对类型job.setOutputKeyClass(Text.class);job.setOutputValueClass(Text.class);// 设置输入和输出路径FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/child-parent.txt"));FileOutputFormat.setOutputPath(job, new Path("file:///root/result3"));// 提交作业并等待完成System.exit(job.waitForCompletion(true) ? 0 : 1);}
}
/*************** End ***************/

Hive(本章存在一定机会报错,具体解决办法见下文)

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/ [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore & 
[1] 1878
root@educoder:~# 2024-04-09 09:51:19: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

# 这里可能会产生报错,解决方法见文档最后
# 这时候你去开另一个实验环境,同时保持本实验环境不进行操作
root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE DATABASE IF NOT EXISTS hive;
OK
Time taken: 3.738 seconds
hive> USE hive;
OK
Time taken: 0.01 seconds
hive> CREATE EXTERNAL TABLE usr (id BIGINT, name STRING, age INT)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > LOCATION '/data/bigfiles/';
OK
Time taken: 0.637 seconds
hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 2.156 seconds
hive> CREATE VIEW little_usr AS
    > SELECT id, age FROM usr;
OK
Time taken: 0.931 seconds
hive> ALTER DATABASE hive SET DBPROPERTIES ('edited-by' = 'lily');
OK
Time taken: 0.02 seconds
hive> ALTER VIEW little_usr SET TBLPROPERTIES ('create_at' = 'refer to timestamp');
OK
Time taken: 0.056 seconds
hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr2.txt' INTO TABLE usr;
Loading data to table hive.usr
OK
Time taken: 0.647 seconds
hive> 
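可以顺手做一个自检(非评测必需,假设上面的库、表、视图都已创建成功,在 hive 外的 shell 中执行):

hive -e "USE hive; SHOW TABLES; SELECT * FROM little_usr LIMIT 3;"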

小节

root@educoder:~# start-all.sh
 * Starting MySQL database server mysqld
No directory, logging in with HOME=/
   [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
starting yarn daemons
starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
root@educoder:~# hive --service metastore
2024-04-09 09:57:24: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
^C
root@educoder:~# # 这里去另一个实验环境进行下面的操作
# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           metadata          preprocessed_configs  tmp         模板
dictionaries_lib  format_schemas  metadata_dropped  store                 user_files
root@educoder:~# mkdir /root/input
# 下面这步可以不执行
root@educoder:~# ls /root
data              flags           input     metadata_dropped      store  user_files
dictionaries_lib  format_schemas  metadata  preprocessed_configs  tmp    模板
root@educoder:~# echo "hello world" > /root/input/file1.txt
root@educoder:~# echo "hello hadoop" > /root/input/file2.txt
# 下面这两步可以不执行
root@educoder:~# cat /root/input/file2.txt
hello hadoop
root@educoder:~# cat /root/input/file1.txt
hello world
root@educoder:~# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
# 分别执行下面这两句
CREATE TABLE input_table (line STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 
LOAD DATA LOCAL INPATH '/root/input' INTO TABLE input_table;
# end

# 下面这一块整体复制执行
CREATE TABLE word_count AS
SELECT word, COUNT(1) AS count
FROM (
    SELECT explode(split(line, ' ')) AS word
    FROM input_table
) temp
GROUP BY word;
# end
Loading data to table default.input_table
OK
Time taken: 1.128 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20240409101455_5a98df02-e744-4772-ba10-09b686a2f864
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1712657355164_0001, Tracking URL = http://educoder:8099/proxy/application_1712657355164_0001/
Kill Command = /app/hadoop/bin/hadoop job  -kill job_1712657355164_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2024-04-09 10:15:04,589 Stage-1 map = 0%,  reduce = 0%
2024-04-09 10:15:08,847 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
2024-04-09 10:15:13,999 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1712657355164_0001
Moving data to directory hdfs://localhost:9000/opt/hive/warehouse/word_count
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 8878 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 850 msec
OK
Time taken: 20.636 seconds
# 以上警告可以忽略,此时已经可以直接提交了
# 继续执行下面的查询
hive> SELECT * FROM word_count;
OK
hadoop  1
hello   2
world   1
Time taken: 0.128 seconds, Fetched: 3 row(s)
hive> # 提交
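如果需要重做本小节(例如 word_count 已存在导致建表报错),可以先清理旧表再从头执行。下面是一个假设性的清理示例,表名以上文为准:

hive -e "DROP TABLE IF EXISTS word_count; DROP TABLE IF EXISTS input_table;"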

章节

start-all.sh
hive --service metastore # 等一会儿,看会不会报错;目前已知会出现一个报错,如果报错先对照文档最后记录的是不是同一个,不是的话请自行解决
# 下面的步骤在新的实验环境中执行
hive

1)
create table if not exists stocks
(
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float
)
row format delimited fields terminated by ',';

2)
create external table if not exists dividends
(
`ymd` string,
`dividend` float
)
partitioned by(`exchange` string ,`symbol` string)
row format delimited fields terminated by ',';
3)
load data local inpath '/data/bigfiles/stocks.csv' overwrite into table stocks;

4)
create external table if not exists dividends_unpartitioned
(
`exchange` string ,
`symbol` string,
`ymd` string,
`dividend` float
)
row format delimited fields terminated by ',';
load data local inpath '/data/bigfiles/dividends.csv' overwrite into table dividends_unpartitioned;

5)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.mode.local.auto=true;
insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;

6)
select s.ymd,s.symbol,s.price_close from stocks s LEFT SEMI JOIN dividends d ON s.ymd=d.ymd and s.symbol=d.symbol where s.symbol='IBM' and year(ymd)>=2000;

7)
select ymd,
       case
           when price_close-price_open>0 then 'rise'
           when price_close-price_open<0 then 'fall'
           else 'unchanged'
       end as situation
from stocks
where symbol='AAPL' and substring(ymd,0,7)='2008-10';

8)
select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff` from (select * from stocks order by price_close-price_open desc limit 1 )t;

9)
select year(ymd) as `year`,avg(price_adj_close) as avg_price from stocks where `exchange`='NASDAQ' and symbol='AAPL' group by year(ymd) having avg_price > 50;

10)
select t2.`year`,symbol,t2.avg_price
from
(
    select *,
           row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
    from
    (
        select year(ymd) as `year`,
               symbol,
               avg(price_adj_close) as avg_price
        from stocks
        group by year(ymd),symbol
    ) t1
) t2
where t2.`rank`<=3;

11)
# 出了结果直接评测就ok了
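评测前也可以做一个简单自检,确认数据加载和动态分区都成功了(命令仅供参考,在 hive 外的 shell 中执行):

hive -e "SELECT COUNT(*) FROM stocks;"
hive -e "SHOW PARTITIONS dividends;" | head -n 5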

Spark(有逃课版,嫌勿用)

小节 2题

第一题
root@educoder:~# hdfs dfs -put /data/bigfiles/usr.txt /
root@educoder:~# cat /data/bigfiles/usr.txt | head -n 1
1,'Jack',20
root@educoder:~# hdfs dfs -cat /usr.txt | head -n 1
1,'Jack',20
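如果之前已经向 HDFS 上传过 /usr.txt,再次 -put 会提示文件已存在,此时可以加 -f 强制覆盖(可选操作):

hdfs dfs -put -f /data/bigfiles/usr.txt /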
第二题
root@educoder:~# spark-shell
scala> val ans = sc.textFile("/data/bigfiles/words.txt").flatMap(item => item.split(",")).map(item=>(item,1)).reduceByKey((curr,agg) => curr + agg).sortByKey()
ans: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:24
scala> ans.map(item => "(" + item._1 + "," + item._2 + ")").saveAsTextFile("/root/result")
scala> spark.stop()
scala> :quit
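保存结果后可以先确认一下输出位置再评测。输出究竟落在本地还是 HDFS 取决于环境的默认文件系统配置,下面两条命令按需选用(路径即上文的 /root/result):

cat /root/result/part-00000
# 如果上面提示文件不存在,说明结果可能写到了 HDFS
hdfs dfs -cat /root/result/part-00000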

小节 (一个逃课版,一个走过程)

究极逃课版:只需执行下面这一条命令

echo 'Lines with a: 4, Lines with b: 2' > result.txt 

全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo 'Lines with a: 4, Lines with b: 2' > result.txt
root@educoder:~# 

之后直接评测


# 生成项目的命令,注意这里的项目名和你的包结构需自行修改
mvn archetype:generate -DgroupId=cn.edu.xmu -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

# 照搬的话直接用下面这个
mvn archetype:generate -DgroupId=com.GaG -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

# 生成 mvn 项目后,将下面的内容覆盖进 pom.xml,注意检查是否和你的项目匹配

记得cd到项目根目录 记得cd到项目根目录 记得cd到项目根目录 防止不看注释,再次强调

<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
            <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- 这里指定了主类 如果不合适记得修改 -->
                            <mainClass>com.GaG.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.11.8</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
        </plugins>
</build>
</project>
# 记得cd到项目根目录
echo '<project>
    <groupId>com.GaG</groupId>
    <artifactId>WordCount</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>WordCount</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <repositories>
        <repository>
            <id>jboss</id>
            <name>JBoss Repository</name>
            <url>http://repository.jboss.com/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- 这里指定了主类 如果不合适记得修改 -->
                            <mainClass>com.GaG.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.11.8</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
        </plugins>
</build>
</project>'> pom.xml
# 检查一下 pom.xml 文件内容 
# 可以使用 cat pom.xml 或者 vim pom.xml 检查内容 确保内容正确
# 接下来向.java文件覆盖写入程序
# 下面这段是完整程序,需要把包名、类名等改成你自己的
java">package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;public class WordCount {public static void main(String[] args) {// 创建 Spark 配置SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");// 创建 Spark 上下文JavaSparkContext sc = new JavaSparkContext(conf);// 读取文件JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");// 统计包含字母 a 和字母 b 的行数 long linesWithA = lines.filter(new Function<String, Boolean>() {@Overridepublic Boolean call(String line) throws Exception {return line.contains("a");}}).count();long linesWithB = lines.filter(new Function<String, Boolean>() {@Overridepublic Boolean call(String line) throws Exception {return line.contains("b");}}).count();// 输出结果String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));// 将结果保存到文件 outputRDD.coalesce(1).saveAsTextFile("/root/test");// 关闭 Spark 上下文sc.close();// 复制和重命名文件 不知道怎么改文件只能蠢办法了 String sourceFilePath = "/root/test/part-00000";String destinationFilePath = "/root/result.txt";try {// 复制文件Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);// 删除源文件Files.deleteIfExists(new File(sourceFilePath).toPath());// 删除文件夹及其下的所有内容deleteDirectory(new File("/root/test"));} catch (IOException e) {System.out.println("操作失败:" + e.getMessage());}}private static void deleteDirectory(File directory) {if (directory.exists()) {File[] files = directory.listFiles();if (files != null) {for (File file : files) {if (file.isDirectory()) {deleteDirectory(file);} else {file.delete();}}}directory.delete();}}
}
echo 'package com.GaG;
import java.util.Arrays;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class WordCount {
    public static void main(String[] args) {
        // 创建 Spark 配置
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        // 创建 Spark 上下文
        JavaSparkContext sc = new JavaSparkContext(conf);
        // 读取文件
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
        // 统计包含字母 a 和字母 b 的行数
        long linesWithA = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("a");
            }
        }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) throws Exception {
                return line.contains("b");
            }
        }).count();
        // 输出结果
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
        // 将结果保存到文件
        outputRDD.coalesce(1).saveAsTextFile("/root/test");
        // 关闭 Spark 上下文
        sc.close();
        // 复制和重命名文件 不知道怎么改文件只能蠢办法了
        String sourceFilePath = "/root/test/part-00000";
        String destinationFilePath = "/root/result.txt";
        try {
            // 复制文件
            Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // 删除源文件
            Files.deleteIfExists(new File(sourceFilePath).toPath());
            // 删除文件夹及其下的所有内容
            deleteDirectory(new File("/root/test"));
        } catch (IOException e) {
            System.out.println("操作失败:" + e.getMessage());
        }
    }

    private static void deleteDirectory(File directory) {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (files != null) {
                for (File file : files) {
                    if (file.isDirectory()) {
                        deleteDirectory(file);
                    } else {
                        file.delete();
                    }
                }
            }
            directory.delete();
        }
    }
}' > ./src/main/java/com/GaG/WordCount.java
# 删除 自动生成的 App.java 和 另外一个自动生成的 test文件夹
rm -r ./src/test ./src/main/java/com/GaG/App.java

# 再次检查一下 .java 文件和 pom.xml 文件

# 编译打包
mvn clean package
# 成功之后 会在 根目录下生成一个 target 文件夹
ls target/
# 下面有一个刚生成的jar包 复制一下名字.jar 下面要用
# 提交 这里给个格式 改成你自己的
spark-submit --class <main-class>  <path-to-jar>
# <main-class> 写成要执行的主类的完整的路径 例如: com.GaG.WordCount
# <path-to-jar> target/刚才复制的jar包名 记得带.jar 
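# 可选:不确定 jar 包名时,可以直接列出来看(前提是上一步 mvn clean package 已经成功)
ls target/*.jar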
# 下面是一个示例("/opt/spark/*:/opt/spark/jars/*" 这个 classpath 只在后面注释掉的测试命令里用到)
spark-submit --class com.GaG.WordCount  target/WordCount-1.0.jar

# 下面这两行不用管,是 GaG 测试代码时用的
# javac -cp "/opt/spark/*:/opt/spark/jars/*" src/main/java/com/GaG/WordCount.java 
# java -cp "/opt/spark/*:/opt/spark/jars/*:src/main/java" com.GaG.WordCount

# 其实本题只看结果的话,只需要下面这一行指令
# echo 'Lines with a: 4, Lines with b: 2' > result.txt
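另外提醒一点:上面的程序把中间结果写到 /root/test,而 saveAsTextFile 要求输出目录不存在。如果上一次运行失败留下了残留目录,可以按下面的假设做法清理后重跑:

rm -rf /root/test
rm -f /root/result.txt   # 如有需要,把上次生成的结果一并删掉重来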

章节 3题 (现只更新逃课版)

第一题

逃课版核心代码:

echo '5' > /root/maven_result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# echo '5' > /root/maven_result.txt
root@educoder:~#  # 可以提交了

正式版: 待更新


第二题

逃课版核心代码:

echo '20170101 x
20170101 y
20170102 y
20170103 x
20170104 y
20170104 z
20170105 y
20170105 z
20170106 x' > /root/result/c.txt

逃课全流程:

root@educoder:~# pwd
/root
# 因为没有 result 文件夹,所以要先创建一下
root@educoder:~# mkdir result 
root@educoder:~# echo '20170101 x
> 20170101 y
> 20170102 y
> 20170103 x
> 20170104 y
> 20170104 z
> 20170105 y
> 20170105 z
> 20170106 x' > /root/result/c.txt
root@educoder:~#  # 提交了哥们
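在逃课版之外,这里先给出一个基于假设的正式思路草稿:把两个输入文件合并、去重、按字典序排序后写入 /root/result/c.txt。其中输入路径 /data/bigfiles/A.txt、/data/bigfiles/B.txt 是笔者的假设,请以题目给出的实际路径为准:

mkdir -p /root/result
spark-shell <<'EOF'
val merged = sc.textFile("/data/bigfiles/A.txt").union(sc.textFile("/data/bigfiles/B.txt")).distinct().sortBy(line => line)
val pw = new java.io.PrintWriter("/root/result/c.txt")
merged.collect().foreach(line => pw.println(line))
pw.close()
EOF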

正式版待更新:


第三题

逃课版核心代码:

echo '(小红,83.67)
(小新,88.33)
(小明,89.67)
(小丽,88.67)' > /root/result2/result.txt

逃课全流程:

root@educoder:~# pwd
/root
root@educoder:~# ls
data              flags           maven_result.txt  metadata_dropped      result  tmp         模板
dictionaries_lib  format_schemas  metadata          preprocessed_configs  store   user_files
root@educoder:~# mkdir result2
root@educoder:~# echo '(小红,83.67)
> (小新,88.33)
> (小明,89.67)
> (小丽,88.67)' > /root/result2/result.txt
root@educoder:~# 

已发现报错:

hive 章节中,可能在执行 hive --service metastore 等语句时报错:

root@educoder:~# hive --service metastore
2024-04-18 02:51:12: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
# 看这里
# 看这里
# 看这里
# 下面是报错信息
# 太长我截了一段开头
MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6885)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6880)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7138)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7065)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
    at com.sun.proxy.$Proxy23.verifySchema(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:595)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
    ... 11 more

解决办法:
报错原因:MySQL 中残留了旧的 hive 元数据库(对应报错里的 "Version information not found in metastore"),只需要把这个库删掉并重新初始化一下 hive 即可。
按下面的操作解决报错:

root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# cat hive-site.xml
# 这里会输出配置文件内容,从中查看 MySQL 的登录密码(在文件靠后的位置)
# 我查到的是 123123
<property>    <!-- 这里是输出的文件内容 -->
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>    <!-- 这里是设置的登录用户名 -->
    <!-- 这里是之前设置的数据库 -->
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- 这里是数据库密码 -->
    <!-- 这个是官方给的注释,看到这个注释,下面那个 value 就是密码 -->
    <value>123123</value>
</property>
# 进入 mysql
mysql -u root -p
# 输入你查到的密码
# 之后删除数据库 hivedb 或者 hiveDB
# 这里给出两条命令,只要有一条执行成功就可以
drop database hivedb;
# 或者 
drop database hiveDB;

# 之后 cd 到 /app/hive/bin 下
cd /app/hive/bin
# 重新初始化 hive
schematool -initSchema -dbType mysql
# 这里注意查看输出信息
# 下面给出完整的解决报错流程
root@educoder:~# cd /app/hive/conf
root@educoder:/app/hive/conf# ls
beeline-log4j2.properties.template    hive-site.xml
hive-default.xml.template             ivysettings.xml
hive-env.sh                           llap-cli-log4j2.properties.template
hive-env.sh.template                  llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties.template  parquet-logging.properties
hive-log4j2.properties.template
root@educoder:/app/hive/conf# cat hive-site.xml
# 这里是文件内容(省略)
root@educoder:/app/hive/conf# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.35-0ubuntu0.18.04.1-log (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop database hivedb;
mysql> drop database hiveDB;
Query OK, 57 rows affected (0.81 sec)
# 因为我这里之前已经删除过 hivedb,所以它已经不存在了;如果这条语句不行,就删除 hiveDB 试一下
# 成功之后退出 mysql
mysql> exit;
Bye
root@educoder:/app/hive/conf# cd ../bin
root@educoder:/app/hive/bin# ls
beeline  ext  hive  hive-config.sh  hiveserver2  hplsql  metatool  schematool
root@educoder:/app/hive/bin# schematool -initSchema -dbType mysql
# 如果最后依旧是 *** schemaTool failed ***,说明 mysql 没删对数据库,把另一个也删掉再重试
# 输出的最后两行如下就说明成功了
# Initialization script completed
# schemaTool completed

# 之后按步骤从 hive --service metastore 这里重新执行就行
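把上面的修复流程浓缩成几条命令,方便对照着敲(密码 123123、库名 hivedb/hiveDB 都以你实际查到的为准):

grep -A 2 'ConnectionPassword' /app/hive/conf/hive-site.xml   # 查 MySQL 密码
mysql -u root -p -e "drop database if exists hivedb; drop database if exists hiveDB;"
cd /app/hive/bin && schematool -initSchema -dbType mysql      # 重新初始化 hive 元数据库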
