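AttacKG: Constructing Technique Knowledge Graph from Cyber Threat Intelligence Reports
Abstract
Cyber attacks are becoming more sophisticated and diverse, which makes attack detection increasingly challenging. To combat these attacks, security practitioners actively summarize their knowledge about attacks and exchange it across organizations in the form of cyber threat intelligence (CTI) reports. However, because CTI reports are written in natural language and are not structured for automatic analysis, using them requires tedious manual recovery of the threat intelligence. In addition, a single report usually covers only a limited aspect of an attack pattern (for example, its techniques), so it cannot provide a comprehensive view of attacks that have multiple variants.
In this paper, the authors propose AttacKG, which automatically extracts structured attack behavior graphs from CTI reports and identifies the associated attack techniques. It then aggregates threat intelligence across reports to collect different aspects of each technique and enhances the attack behavior graphs into technique knowledge graphs (TKGs).
In the evaluation on real-world CTI reports from diverse intelligence sources, AttacKG identifies 28,262 attack techniques and 8,393 unique indicators of compromise (IoCs). To further verify how accurately AttacKG extracts threat intelligence, the authors run it on 16 manually labeled CTI reports. The results show that AttacKG identifies attack-relevant entities, dependencies, and techniques with F1-scores of 0.887, 0.896, and 0.789 respectively, outperforming state-of-the-art approaches. Moreover, the TKGs directly benefit downstream security practices built on top of attack techniques, such as advanced persistent threat (APT) detection and cyber attack reconstruction.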
Related resources
- Paper: AttacKG: Constructing Technique Knowledge Graph from Cyber Threat Intelligence Reports. (学术范)
- Source code: https://github.com/li-zhenyuan/Knowledge-enhanced-Attack-Graph
- Dataset and models: https://drive.google.com/drive/folders/1zVGPpN-i-BLlpFqQERscFGb45PkhfkUm
- spaCy website: English · spaCy Models Documentation
Environment setup
Get the source code with git clone (or download the zip archive)
git clone https://github.com/li-zhenyuan/Knowledge-enhanced-Attack-Graph.git
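Virtual environment configuration
I used Miniconda to build the virtual environment (Anaconda works the same way)
conda create -n AttacKG python=3.8
Once it has been created, activate the environment
conda activate AttacKG
If you directly run
pip install -r requirements.txt
you may hit two problems: module dependency conflicts, and unreachable network resources. For example: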
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))': /msg-systems/coreferee/raw/master/models/coreferee_model_en.zip
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /msg-systems/coreferee/raw/master/models/coreferee_model_en.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-gpu 2.10.0 requires numpy>=1.20, but you have numpy 1.19.2 which is incompatible.
dglgo 0.0.2 requires pydantic>=1.9.0, but you have pydantic 1.8.2 which is incompatible.
pandas 1.5.0 requires numpy>=1.20.3; python_version < "3.10", but you have numpy 1.19.2 which is incompatible.
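Installing the Python modules
First install a few base modules
pip install scipy joblib scikit-learn Pygments pandas psutil
Resolving the dependency conflicts
The modules pinned in the repository's requirements.txt conflict with the existing environment, so I re-pinned the version of each module as follows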
absl-py==1.0.0
astunparse==1.6.3
beautifulsoup4==4.10.0
blis==0.7.8
bs4==0.0.1
cachetools==5.0.0
catalogue==2.0.6
certifi==2021.10.8
cffi==1.15.0
chardet==4.0.0
charset-normalizer==2.0.12
click==8.0.4
coreferee==1.3.1
#coreferee-model-en @ https://github.com/msg-systems/coreferee/raw/master/models/coreferee_model_en.zip
cryptography==36.0.1
cycler==0.11.0
cymem==2.0.6
flatbuffers==2.0
fonttools==4.30.0
gast==0.4.0
google-auth==2.6.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.44.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.11.2
jarowinkler==1.0.1
Jinja2==3.0.3
joblib==1.1.0
keras==2.10.0
Keras-Preprocessing==1.1.2
kiwisolver==1.3.2
langcodes==3.3.0
Levenshtein==0.18.1
libclang==13.0.0
lxml==4.8.0
Markdown==3.3.6
MarkupSafe==2.1.0
matplotlib==3.5.1
murmurhash==1.0.6
networkx==2.7.1
nltk==3.7
numpy==1.20.3
oauthlib==3.2.0
opt-einsum==3.3.0
packaging==21.3
pathy==0.6.1
pdfminer.six==20211012
pdfplumber==0.6.0
Pillow==9.0.1
preshed==3.0.6
protobuf==3.19.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.9.0
pyparsing==3.0.7
python-dateutil==2.8.2
rapidfuzz==2.0.6
regex==2022.3.2
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
simplejson==3.17.6
six==1.16.0
smart-open==5.2.1
soupsieve==2.3.1
spacy==3.4.0
spacy-legacy==3.0.10
spacy-loggers==1.0.3
srsly==2.4.5
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.10.0
tensorflow-gpu==2.10.0
tensorflow-io-gcs-filesystem==0.27.0
termcolor==1.1.0
tf-estimator-nightly==2.8.0.dev2021122109
thinc==8.1.0
tqdm==4.63.0
typer==0.4.2
typing-extensions==4.1.1
urllib3==1.26.8
Wand==0.6.7
wasabi==0.9.1
Werkzeug==2.0.3
wrapt==1.14.0
XlsxWriter==3.0.3
zipp==3.7.0
Save the list above as a new file named requirement.txt
Select the conda environment in PyCharm
pip install -r requirement.txt  # the modified requirement.txt
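After the install finishes, it can be useful to confirm that the environment really matches the pinned list. The following is a minimal sketch, not part of the AttacKG repository: `parse_pins` and `find_mismatches` are hypothetical helper names, and the script assumes the modified list was saved as requirement.txt in the current directory. It compares exact version strings only, and how package names are matched is left to `importlib.metadata`.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Return {name: version} for each 'name==version' line; skip blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed)}; installed is None when the package is absent."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    req = Path("requirement.txt")  # assumed file name, see the step above
    if req.exists():
        report = find_mismatches(parse_pins(req.read_text()))
        for name, (pinned, found) in sorted(report.items()):
            print(f"{name}: pinned {pinned}, installed {found}")
```

An empty report suggests the pins were applied. Any line it prints is a package worth reinstalling before moving on.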
Resolving the network access problem
Download coreferee_model_en.zip directly
https://github.com/msg-systems/coreferee/raw/master/models/coreferee_model_en.zip
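If the connection to GitHub is flaky, as in the SSL errors shown earlier, a plain browser download may also fail partway. One possible workaround is to retry the download a few times from a script. This is only a sketch under that assumption: `download_with_retries` is a hypothetical helper, the retry count and delay are arbitrary, and it does not help when the host is blocked entirely.

```python
import time
import urllib.request

MODEL_URL = "https://github.com/msg-systems/coreferee/raw/master/models/coreferee_model_en.zip"


def download_with_retries(url, dest, attempts=5, delay=2.0,
                          fetch=urllib.request.urlretrieve):
    """Call fetch(url, dest) until it succeeds; return the number of tries used."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return attempt
        except OSError as exc:  # urllib's URLError and ssl.SSLError are OSError subclasses
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error


# Example call (not executed here):
# download_with_retries(MODEL_URL, "coreferee_model_en.zip")
```

The `fetch` parameter exists only so the retry logic can be exercised without touching the network. In normal use the default is enough.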
After unzipping it, install it locally inside the conda environment
cd coreferee_model_en
pip install -e .
Manually install the en_core_web_sm-3.4.1 language model (the latest version at the time of writing)
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.1/en_core_web_sm-3.4.1-py3-none-any.whl
pip install en_core_web_sm-3.4.1-py3-none-any.whl
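Before running AttacKG itself, a quick check that the NLP pieces import and load can save some debugging time. The sketch below assumes that the coreferee model installs under the import name `coreferee_model_en` and that the pipeline is built by adding the `coreferee` component to `en_core_web_sm`. Both are assumptions about this setup, not something taken from the AttacKG code.

```python
import importlib.util

# Import names assumed for the packages installed in the steps above.
REQUIRED = ("spacy", "coreferee", "en_core_web_sm", "coreferee_model_en")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_pipeline():
    """Load the small English model and attach the coreferee component."""
    import spacy

    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("coreferee")
    return nlp


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("pipeline components:", load_pipeline().pipe_names)
```

If `missing_modules()` reports anything, repeat the corresponding install step inside the activated AttacKG environment.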
Installing the model
Download the model archive new_cti.model.zip and unzip it into the paper's source code directory
https://drive.google.com/drive/folders/1zVGPpN-i-BLlpFqQERscFGb45PkhfkUm
Installing the templates
Manually download templates.zip
https://drive.google.com/drive/folders/1zVGPpN-i-BLlpFqQERscFGb45PkhfkUm
Unzip it into the paper's source code folder, and that is all.
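With both archives unpacked, the source folder should contain the model and the templates next to main.py. A small check like the following can confirm the layout. The entries in `EXPECTED` are assumptions: `new_cti.model` is guessed from the archive name, and the other paths come from the sample commands in README.md, so adjust them if your extracted names differ.

```python
from pathlib import Path

# Paths relative to the repository root; the model name is an assumption.
EXPECTED = (
    "main.py",
    "templates",
    "new_cti.model",
    "Dataset/Evaluation/Frankenstein Campaign.txt",
)


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print("missing:", rel)
```

Run it from the repository root. No output means everything on the list was found.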
Running it successfully
You can now run the samples from README.md
# Generating attack graph for CTI report
python main.py -M attackGraphGeneration -R "./Dataset/Evaluation/Frankenstein Campaign.txt" -O ./output.pdf
# Identifying techniques in CTI report
python main.py -M techniqueIdentification -T ./templates -R "./Dataset/Evaluation/Frankenstein Campaign.txt" -O ./output.pdf
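To process every report in a folder rather than one file, the same command line can be wrapped in a short script. This is a sketch built only on the `-M`, `-R`, `-O`, and `-T` flags shown in the samples above. `build_command` and `run_all` are hypothetical helpers, and writing one PDF per report into a separate output folder is my own choice, not something the README prescribes.

```python
import subprocess
import sys
from pathlib import Path


def build_command(mode, report, output, templates=None):
    """Build the argument list for one main.py run, mirroring the README samples."""
    cmd = [sys.executable, "main.py", "-M", mode, "-R", str(report), "-O", str(output)]
    if templates is not None:
        cmd += ["-T", str(templates)]
    return cmd


def run_all(report_dir, out_dir, mode="attackGraphGeneration", templates=None):
    """Run main.py once per .txt report, stopping at the first failure."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for report in sorted(Path(report_dir).glob("*.txt")):
        target = out / (report.stem + ".pdf")
        subprocess.run(build_command(mode, report, target, templates), check=True)


# Example call from the repository root (not executed here):
# run_all("./Dataset/Evaluation", "./outputs",
#         mode="techniqueIdentification", templates="./templates")
```

Passing the arguments as a list avoids shell quoting problems with report names that contain spaces, such as "Frankenstein Campaign.txt".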