[Deep Learning] Paper Reproduction: Some Processing of the Paper's Dataset

news/2024/12/21 2:04:02/

How to write pseudocode:
ref: https://www.bilibili.com/video/BV12D4y1j7Zf/?vd_source=3f7ae4b9d3a2d84bf24ff25f3294d107

The figures produced with i=14 look fairly reasonable.

import json
import os.path
from matplotlib.ticker import FuncFormatter
import pandas as pd
import matplotlib.pyplot as plt

# csv_path= r"/home/justin/Desktop/code/python_project/mypaper/data_process/CAM-01-SRV-lvm0.csv"
# df = pd.read_csv(csv_path, header=0, sep=",")
# df.head(5)
# df = df[["Timestamp", "Hostname", "DiskNumber", "Type", "LBA", "Size", "ResponseTime"]][df["Type"] == "Read"].reset_index(drop=True)
# base_dir = os.path.dirname(os.path.abspath(__file__))
# for i in range(1, 30):
#     # Plot the distribution of the data requests
#     start_row = i * 100
#     end_row = (i + 1) * 100
#     print(start_row, end_row)
#     df1 = df[['LBA']][start_row:end_row]
#     from matplotlib.ticker import ScalarFormatter
#     plt.plot(df1.index, df1.LBA)
#     plt.title('Irregularity of I/O access locality')
#     plt.xlabel('Access Order')
#     plt.ylabel('Logical Block Address (unit:B)')
#     def format_ticks(x, _):
#         return f'{int(x):,}'
#     # set_major_formatter accepts a single formatter; FuncFormatter already avoids scientific notation
#     plt.gca().yaxis.set_major_formatter(FuncFormatter(format_ticks))
#     plt.subplots_adjust(left=0.25)
#     # plt.show()
#     save_img_path = os.path.join(base_dir, 'weak_locality','irregularity_io_access_locality_{}.png'.format(i))
#     print(save_img_path)
#     plt.savefig(save_img_path, format='png')
#     plt.clf()

import os
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Load the CSV file
csv_path = r"/home/justin/Desktop/code/python_project/mypaper/data_process/CAM-01-SRV-lvm0.csv"
df = pd.read_csv(csv_path, header=0, sep=",")
df.head(5)
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.expand_frame_repr', False)  # Prevent line wrapping for large DataFrames
# Optionally, if you want to set width and precision for better formatting
pd.set_option('display.width', None)  # Auto-detect the width of the display
pd.set_option('display.precision', 3)

# Filter the DataFrame for 'Read' types and specific columns
df = df[["Timestamp", "Hostname", "DiskNumber", "Type", "LBA", "Size", "ResponseTime"]][df["Type"] == "Read"].reset_index(drop=True)
# Calculate the differences
df['LBA_diff'] = df['LBA'].diff()
# Drop NaN values resulting from the difference computation
df = df.dropna(subset=['LBA_diff'])
LBA_diff_list = df['LBA_diff'].tolist()

# def find_repeated_sequences(lst, length):
#     """
#     Find and return the first and last indices of the first repeating sequence of the given length.
#
#     :param lst: List of integers to search for repeating sequences.
#     :param length: Length of the sequence to look for.
#     :return: Tuple containing the first and last indices of the first repeating sequence, or None if not found.
#     """
#     sequence_indices = {}
#     first_index = None
#     last_index = None
#
#     # Iterate through the list to find sequences of the specified length
#     for i in range(len(lst) - length + 1):
#         # Get the current sequence as a tuple (to allow it to be a dictionary key)
#         current_sequence = tuple(lst[i:i + length])
#
#         if current_sequence in sequence_indices:
#             # If the sequence has been seen before, update indices
#             first_index = sequence_indices[current_sequence][0]  # First occurrence
#             last_index = i  # Update last occurrence
#             break  # Only need the first repeating sequence
#         else:
#             # Store the index of the first occurrence of this sequence
#             sequence_indices[current_sequence] = (i,)
#
#     return (first_index, last_index)
#
# # Example usage
# lst = LBA_diff_list[20000:]
# length = 5
#
# result = find_repeated_sequences(lst, length)
# print(f"First occurrence: {result[0]}, Last occurrence: {result[1]}")
# print(df[20117:20123],df[20119:20125])
# # Exit the script if needed
# exit("==========")
# # Get the base directory path
# base_dir = os.path.dirname(os.path.abspath(__file__))

# Get the base directory path
base_dir = os.path.dirname(os.path.abspath(__file__))

for i in range(1, 30):
    if i != 14:
        continue

    # Define the start and end row indices
    start_row = i * 100
    end_row = (i + 1) * 100
    print(start_row, end_row)

    # Slice the necessary part of the DataFrame
    df1 = df[['LBA']][start_row:end_row].reset_index(drop=True)

    # Plot the data
    plt.plot(df1.index, df1.LBA)
    plt.title('Irregularity of I/O Access Locality')
    plt.xlabel('Access Order (unit:times)')
    plt.ylabel('Logical Block Address (unit:B)')

    # Function to format y-ticks with commas
    def format_ticks(x, _):
        return f'{int(x):,}'

    # Set the y-axis major formatter
    plt.gca().yaxis.set_major_formatter(FuncFormatter(format_ticks))

    # Set x-axis major and minor ticks
    plt.gca().xaxis.set_major_locator(MultipleLocator(10))  # Major ticks every 10 units
    plt.gca().xaxis.set_minor_locator(MultipleLocator(5))   # Minor ticks every 5 units

    ax = plt.gca()  # Get the current axes
    ax.spines['top'].set_visible(False)    # Hide the top spine
    ax.spines['right'].set_visible(False)  # Hide the right spine
    ax.spines['left'].set_visible(True)    # Show the left spine
    ax.spines['bottom'].set_visible(True)

    # Adjust the margins if necessary
    plt.subplots_adjust(left=0.25)

    # Constructing the save image path
    save_img_path = os.path.join(base_dir, 'weak_locality', 'irregularity_io_access_locality_{}.png'.format(i))
    print(save_img_path)

    # Save the plot as a PNG file
    plt.savefig(save_img_path, format='png')

    # Clear the figure after saving
    plt.clf()

    # Plot the 'Size' column
    df2 = df[['Size']][start_row:end_row].reset_index(drop=True)
    plt.plot(df2['Size'], marker='o', markersize=2, linestyle='-', linewidth=0.5)  # Plot with markers
    plt.title('Variability of I/O Access Size')
    plt.xlabel('Access Order (unit:times)')
    plt.ylabel('Size (unit:B)')
    plt.gca().xaxis.set_major_locator(MultipleLocator(10))  # Major ticks every 10 units
    plt.gca().xaxis.set_minor_locator(MultipleLocator(5))   # Minor ticks every 5 units

    ax = plt.gca()  # Get the current axes
    ax.spines['top'].set_visible(False)    # Hide the top spine
    ax.spines['right'].set_visible(False)  # Hide the right spine
    ax.spines['left'].set_visible(True)    # Show the left spine
    ax.spines['bottom'].set_visible(True)
    # plt.grid()  # Add grid for better readability
    plt.tight_layout()  # Adjust layout to avoid clipping
    save_img_path = os.path.join(base_dir, 'weak_locality', 'io_access_locality_size_{}.png'.format(i))
    plt.savefig(save_img_path, format='png')
    print(save_img_path)
    plt.clf()
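
The commented-out `find_repeated_sequences` helper in the script above searches the list of LBA deltas for the first window of a given length that occurs twice. A minimal standalone sketch of the same idea follows. The name `find_first_repeated_window` and the choice to return `None` when nothing repeats are assumptions made for illustration, not the author's code.

```python
from typing import Optional, Sequence, Tuple


def find_first_repeated_window(
    values: Sequence[float], length: int
) -> Optional[Tuple[int, int]]:
    """Return (first_start, repeat_start) for the first window of `length`
    consecutive values that has already been seen earlier in `values`.

    Returns None if no window repeats (an assumption; the commented-out
    draft returns a tuple of Nones instead).
    """
    if length <= 0:
        raise ValueError("length must be positive")

    first_seen = {}  # window (as a tuple) -> index where it first started
    for start in range(len(values) - length + 1):
        window = tuple(values[start:start + length])
        if window in first_seen:
            return first_seen[window], start
        first_seen[window] = start
    return None


if __name__ == "__main__":
    # Toy delta sequence: the window (1, 2, 3) starts at index 0 and again at index 4.
    deltas = [1, 2, 3, 9, 1, 2, 3]
    print(find_first_repeated_window(deltas, 3))
```

Windows are allowed to overlap here, so a constant run such as `[1, 1, 1]` counts as a repeat for `length=2`. Whether that is desirable for the LBA analysis is a judgment call.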

Total count: 246990497, Only once count: 52128003, ratio: 21.11%
Mean: 35004.78260010141
Median: 32768.0
Mode: 65536.0
Minimum: 512.0
Maximum: 6410240.0
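
The post does not show the code that printed these figures, nor which column they describe. As a rough sketch only, a summary of this shape could be computed for any numeric column as below. The function name `summarize` is hypothetical, and reading "only once" as "distinct values that occur exactly once, divided by the total number of rows" is an assumption.

```python
from collections import Counter
from statistics import mean, median, mode
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Summarize a non-empty numeric sequence.

    Assumption: 'only_once' counts distinct values that appear exactly one
    time, and 'ratio' divides that count by the total number of rows.
    """
    if not values:
        raise ValueError("values must be non-empty")

    counts = Counter(values)
    total = len(values)
    only_once = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "only_once": only_once,
        "ratio": only_once / total,
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Toy data, not taken from the trace.
    stats = summarize([512, 4096, 4096, 65536])
    print(f"Total count: {stats['total']}, Only once count: {stats['only_once']}, "
          f"ratio: {stats['ratio']:.2%}")
    for key in ("mean", "median", "mode", "min", "max"):
        print(f"{key}: {stats[key]}")
```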

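\documentclass{article}
\usepackage[ruled,longend,linesnumbered]{algorithm2e}
\usepackage{xeCJK}

\begin{document}
\begin{algorithm}
\KwIn{I came across this video on Bilibili}
\KwOut{I learned it, so I like, coin, and favorite the video}
\Begin{
  I came across this video on Bilibili\;
  The title looks useful, so I click in to take a look\;
  \While{the video is playing}{
    keep watching\;
    \tcc{Ignore the infinite loop of rewatching a part that is never understood}
    \eIf{understood}{
      watch the next part\;
      the next part becomes the current part\;
    }{
      rewatch this part\;
    }
  }
  I learned it, so I like, coin, and favorite the video!
}
\caption{How to produce good-looking pseudocode}
\end{algorithm}
\end{document}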
\documentclass{article}
\usepackage[ruled, longend, linesnumbered]{algorithm2e}
\usepackage{xeCJK}

\begin{document}
\begin{algorithm}
\KwIn{$T$: LBA sequence; $L$: window size; $k$: step by which the window advances}
\KwOut{$X$, $y$}
\tcc{$X$ is a list; each item contains two-element (Delta-LBA, SIZE) data}
\tcc{$y$ is a list; each item contains two-element (Delta-LBA, SIZE) data}
\tcc{$L$ is the sliding-window size}
\Begin{
  $i \gets 0$\;
  $j \gets 0$\;
  \While{$i + L < T.length()$}{
    $X[j] \gets T[i:i+L-1]$\;
    $y[j] \gets T[i+L]$\;
    $i \gets i+k$\;
    $j \gets j+1$\;
  }
  \KwRet{$X$, $y$}
}
\caption{LBA Feature Preprocessor}
\end{algorithm}
\end{document}
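
The LBA Feature Preprocessor pseudocode above can be sketched in Python as follows. The function name `make_windows` is hypothetical, and treating the pseudocode slice `T[i:i+L-1]` as inclusive at both ends (so each window holds `L` items) is an assumption drawn from the target being `T[i+L]`.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (Delta-LBA, SIZE)


def make_windows(
    T: Sequence[Record], L: int, k: int = 1
) -> Tuple[List[List[Record]], List[Record]]:
    """Build sliding-window samples from a sequence of (Delta-LBA, SIZE) records.

    X[j] holds L consecutive records starting at position i; y[j] is the
    record that immediately follows that window. The window advances by k.
    """
    if L <= 0 or k <= 0:
        raise ValueError("L and k must be positive")

    X: List[List[Record]] = []
    y: List[Record] = []
    i = 0
    while i + L < len(T):
        # Python slices exclude the end index, so T[i:i+L] has L items,
        # matching the inclusive T[i:i+L-1] of the pseudocode.
        X.append(list(T[i:i + L]))
        y.append(T[i + L])
        i += k
    return X, y


if __name__ == "__main__":
    # Toy records, not taken from the trace.
    trace = [(0, 512), (8, 4096), (-8, 512), (16, 1024), (0, 512)]
    X, y = make_windows(trace, L=2, k=1)
    for window, target in zip(X, y):
        print(window, "->", target)
```

With `k=1` consecutive windows overlap by `L-1` records. A larger `k` yields fewer, less correlated samples.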

http://www.ppmy.cn/news/1556788.html
