A Quick Introduction to the NSL-KDD Dataset

news/2025/3/21 13:46:19/

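The NSL-KDD dataset is a revised version of the well-known KDD'99 dataset. It consists of four sub-datasets: KDDTest+, KDDTest-21, KDDTrain+, and KDDTrain+_20Percent, where KDDTrain+_20Percent is a subset of KDDTrain+ and KDDTest-21 is a subset of KDDTest+. Each record has 43 fields: the first 41 describe the traffic input itself, and the last two are the label (normal or attack) and a score (listed as Difficulty Level in the feature table below).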
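As a rough illustration of the 43-field layout, the sketch below parses records into dictionaries using only the standard library. It assumes the data files are header-less and comma-separated; the snake_case column names are my own labels derived from the feature table in this article, not something stored in the files, and the sample record is made up.

```python
import csv
import io

# Assumed column names (the files carry no header row): 41 features,
# then the label and the difficulty score, 43 fields in total.
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_hot_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label", "difficulty",
]


def read_nsl_kdd(lines):
    """Parse header-less, comma-separated records into dicts keyed by COLUMNS."""
    return list(csv.DictReader(lines, fieldnames=COLUMNS))


# Made-up record for illustration only (not taken from the dataset):
# 4 leading fields + 37 zeros + label + difficulty = 43 fields.
fields = ["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal", "20"]
records = read_nsl_kdd(io.StringIO(",".join(fields) + "\n"))

feature_names = COLUMNS[:41]  # the 41 traffic features
print(len(records), len(feature_names), records[0]["label"])
```

For a real file, the same function could be called on an open file handle, for example `read_nsl_kdd(open(path, newline=""))`, where `path` points at a downloaded copy of one of the four files.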

The dataset contains four types of attack: Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L). A brief description of each follows:

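  • DoS is an attack that tries to shut down the traffic flowing to and from the target system. The IDS is flooded with an abnormal amount of traffic that the system cannot handle, and shuts down to protect itself. This prevents normal traffic from reaching the network. An example would be an online retailer that is flooded with orders on a big sale day: because the network cannot handle all the requests, it shuts down and paying customers cannot buy anything. This is the most common attack in the dataset.
  • Probe, or surveillance, is an attack that tries to obtain information from a network. The goal is to act like a thief and steal important information, whether personal information about clients or banking information.
  • U2R is an attack that starts from a normal user account and tries to gain access to the system or network as the superuser (root). The attacker tries to exploit vulnerabilities in the system to gain root privileges or access.
  • R2L is an attack that tries to gain local access to a remote machine. The attacker has no local access to the system or network and tries to "hack" their way in.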

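The subclasses of each attack type are broken down in the following table:
[Image not available: table of attack subclasses]
The distribution of records across attack types is as follows:
[Image not available: distribution of records by attack type]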
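To make the grouping of subclasses into the four attack types concrete, here is a minimal sketch that maps record labels to a category and counts the result. The mapping is partial and is an assumption based on commonly published NSL-KDD label lists rather than on this article's table; the label list at the bottom is a toy example, not real counts.

```python
from collections import Counter

# Partial subclass-to-category mapping (an assumption, not the article's table).
ATTACK_CATEGORY = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "teardrop": "DoS", "pod": "DoS", "land": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R", "perl": "U2R",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "multihop": "R2L", "warezmaster": "R2L", "warezclient": "R2L", "spy": "R2L",
}


def to_category(label):
    """Map a record label to Normal, one of the four attack types, or Unknown."""
    if label == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(label, "Unknown")


def category_distribution(labels):
    """Count how many labels fall into each category."""
    return Counter(to_category(label) for label in labels)


# Toy labels for illustration only.
toy_labels = ["normal", "neptune", "neptune", "nmap", "rootkit", "guess_passwd"]
print(category_distribution(toy_labels))
```

Labels missing from the mapping fall into "Unknown" rather than raising an error, which makes it easy to spot subclasses (for example ones that only appear in the test files) that still need to be added.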
The features in the dataset fall into four categories: intrinsic, content, time-based, and host-based. Each category is described below:

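  • Intrinsic features can be derived from the packet header without looking into the payload itself, and hold the basic information about the packet. This category covers features 1–9.
  • Content features hold information about the original packets, since they are sent in multiple pieces rather than as one. With this information, the system can access the payload. This category covers features 10–22.
  • Time-based features analyze the traffic input over a two-second window and hold information such as how many connections were attempted to the same host. These features are mostly counts and rates rather than information about the content of the traffic input. This category covers features 23–31.
  • Host-based features are similar to time-based features, except that instead of analyzing a two-second window they analyze a series of connections (how many requests were made to the same host over x connections). These features are designed to assess attacks that span longer than a two-second window. This category covers features 32–41.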
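The index ranges above can be written down as a small lookup, which is handy when selecting one group of columns at a time. This is only a sketch; the group names and the `feature_group` helper are illustrative, and indices are 1-based as in the feature table.

```python
# 1-based feature index ranges for each category, as listed above.
FEATURE_GROUPS = [
    ("intrinsic", range(1, 10)),    # features 1-9
    ("content", range(10, 23)),     # features 10-22
    ("time-based", range(23, 32)),  # features 23-31
    ("host-based", range(32, 42)),  # features 32-41
]


def feature_group(index):
    """Return the category name for a 1-based feature index (1..41)."""
    for name, indices in FEATURE_GROUPS:
        if index in indices:
            return name
    raise ValueError(f"feature index {index} is outside 1..41")


for i in (1, 10, 23, 32):
    print(i, feature_group(i))
```

Indices 42 and 43 (the label and the score) are deliberately rejected, since they are not traffic features.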

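The possible values of the categorical features are broken down in the table below. There are 3 possible protocol type values, 70 possible service values, and 11 possible flag values.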
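[Image not available: table of categorical feature values]
Each Flag value represents the status of a connection. The values are explained below:
[Image not available: table of flag values and their meanings]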
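Because protocol type, service, and flag are strings, they usually have to be turned into numbers before a model can use them. One common option is one-hot encoding, sketched below with the standard library only. The record keys (`protocol_type`, `service`, `flag`) are assumed names, and the three toy records are made up, so the vocabularies here are much smaller than in the real dataset.

```python
CATEGORICAL = ("protocol_type", "service", "flag")  # assumed column names


def build_vocabs(records):
    """Collect the sorted set of distinct values for each categorical column."""
    return {col: sorted({r[col] for r in records}) for col in CATEGORICAL}


def one_hot(value, vocab):
    """Return a 0/1 list with a single 1 at the position of value in vocab.

    A value that is not in vocab yields all zeros.
    """
    return [1 if value == v else 0 for v in vocab]


def encode(record, vocabs):
    """Concatenate the one-hot vectors of all categorical columns."""
    vector = []
    for col in CATEGORICAL:
        vector.extend(one_hot(record[col], vocabs[col]))
    return vector


# Toy records for illustration only.
toy = [
    {"protocol_type": "tcp", "service": "http", "flag": "SF"},
    {"protocol_type": "udp", "service": "domain_u", "flag": "SF"},
    {"protocol_type": "icmp", "service": "ecr_i", "flag": "S0"},
]
vocabs = build_vocabs(toy)
print(vocabs["protocol_type"])
print(encode(toy[0], vocabs))
```

The width of the encoded vector is the sum of the vocabulary sizes, so building the vocabularies from the training file and reusing them for the test file keeps both encodings aligned.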
The following table describes each feature and its values in the dataset:

| # | Feature Name | Description | Type | Value Type | Range (across train and test) |
|---|---|---|---|---|---|
| 1 | Duration | Length of time duration of the connection | Continuous | Integers | 0 - 54451 |
| 2 | Protocol Type | Protocol used in the connection | Categorical | Strings | |
| 3 | Service | Destination network service used | Categorical | Strings | |
| 4 | Flag | Status of the connection (Normal or Error) | Categorical | Strings | |
| 5 | Src Bytes | Number of data bytes transferred from source to destination in a single connection | Continuous | Integers | 0 - 1379963888 |
| 6 | Dst Bytes | Number of data bytes transferred from destination to source in a single connection | Continuous | Integers | 0 - 309937401 |
| 7 | Land | 1 if source and destination IP addresses and port numbers are equal; 0 otherwise | Binary | Integers | {0, 1} |
| 8 | Wrong Fragment | Total number of wrong fragments in this connection | Discrete | Integers | {0, 1, 3} |
| 9 | Urgent | Number of urgent packets in this connection. Urgent packets are packets with the urgent bit activated | Discrete | Integers | 0 - 3 |
| 10 | Hot | Number of "hot" indicators in the content, such as entering a system directory, creating programs, and executing programs | Continuous | Integers | 0 - 101 |
| 11 | Num Failed Logins | Count of failed login attempts | Continuous | Integers | 0 - 4 |
| 12 | Logged In | Login status: 1 if successfully logged in; 0 otherwise | Binary | Integers | {0, 1} |
| 13 | Num Compromised | Number of "compromised" conditions | Continuous | Integers | 0 - 7479 |
| 14 | Root Shell | 1 if root shell is obtained; 0 otherwise | Binary | Integers | {0, 1} |
| 15 | Su Attempted | 1 if "su root" command attempted or used; 0 otherwise | Discrete (dataset contains the value 2) | Integers | 0 - 2 |
| 16 | Num Root | Number of "root" accesses or number of operations performed as root in the connection | Continuous | Integers | 0 - 7468 |
| 17 | Num File Creations | Number of file creation operations in the connection | Continuous | Integers | 0 - 100 |
| 18 | Num Shells | Number of shell prompts | Continuous | Integers | 0 - 2 |
| 19 | Num Access Files | Number of operations on access control files | Continuous | Integers | 0 - 9 |
| 20 | Num Outbound Cmds | Number of outbound commands in an ftp session | Continuous | Integers | {0} |
| 21 | Is Hot Logins | 1 if the login belongs to the "hot" list, i.e., root or admin; 0 otherwise | Binary | Integers | {0, 1} |
| 22 | Is Guest Login | 1 if the login is a "guest" login; 0 otherwise | Binary | Integers | {0, 1} |
| 23 | Count | Number of connections to the same destination host as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 24 | Srv Count | Number of connections to the same service (port number) as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 25 | Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 26 | Srv Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 27 | Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 28 | Srv Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 29 | Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 30 | Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 31 | Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 32 | Dst Host Count | Number of connections having the same destination host IP address | Discrete | Integers | 0 - 255 |
| 33 | Dst Host Srv Count | Number of connections having the same port number | Discrete | Integers | 0 - 255 |
| 34 | Dst Host Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 35 | Dst Host Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 36 | Dst Host Same Src Port Rate | The percentage of connections that were to the same source port, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 37 | Dst Host Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 38 | Dst Host Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 39 | Dst Host Srv Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 40 | Dst Host Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 41 | Dst Host Srv Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 42 | Class | Classification of the traffic input | Categorical | Strings | |
| 43 | Difficulty Level | Difficulty level | Discrete | Integers | 0 - 21 |
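The time-based features in the table, such as Count (23) and Serror Rate (25), are already computed in the dataset, but a small sketch may help show what they measure. The code below derives both values for one connection from a toy connection log. The `Conn` record with its `time` and `dst_host` fields is hypothetical (NSL-KDD itself contains neither timestamps nor addresses), and counting the current connection as part of its own window is an assumption of this sketch.

```python
from dataclasses import dataclass

SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


@dataclass
class Conn:
    time: float    # seconds; hypothetical field, not present in NSL-KDD
    dst_host: str  # hypothetical field, not present in NSL-KDD
    flag: str      # connection status, e.g. "SF", "S0", "REJ"


def window_features(conns, i, window=2.0):
    """Return (count, serror_rate) for conns[i].

    conns must be sorted by time. The window covers the current connection
    and earlier ones to the same destination host within `window` seconds.
    """
    cur = conns[i]
    same_host = [
        c for c in conns[: i + 1]
        if cur.time - c.time <= window and c.dst_host == cur.dst_host
    ]
    count = len(same_host)  # at least 1, because conns[i] is included
    serror = sum(c.flag in SYN_ERROR_FLAGS for c in same_host)
    return count, round(serror / count, 2)  # rate in hundredths


# Toy log for illustration only.
log = [
    Conn(0.0, "A", "S0"),
    Conn(0.5, "A", "SF"),
    Conn(1.0, "B", "S0"),
    Conn(1.5, "A", "S0"),
    Conn(4.0, "A", "SF"),
]
print(window_features(log, 3))
print(window_features(log, 4))
```

The host-based features (32-41) follow the same idea, with the two-second window replaced by a window over a fixed number of recent connections.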

Dataset download link: https://www.unb.ca/cic/datasets/nsl.html
For a more detailed introduction to the dataset, see: https://towardsdatascience.com/a-deeper-dive-into-the-nsl-kdd-data-set-15c753364657

