Top 20 keywords
The dataset is as follows:
Address:
https://github.com/Algernon98/github-store/tree/main/data./text./genshin.
from gensim import corpora, models
import config
import jieba
import jieba.analyse
import train
from codecs import open

stopwords_path = config.stopwords_path
segmented_path = config.segmented_path
test_path = config.test_path
raw_path = config.raw_path
result_path = config.result_path
topic_num = 30


def get_stopwords_set(file_name):
    with open(file_name, 'r', encoding='utf-8') as f:
        return set([line.strip() for line in f])


def get_words_list(file_name, stop_word_file):
    stop_words_set = get_stopwords_set(stop_word_file)
    word_list = []
    with open(file_name, 'r', encoding='utf-8') as f:
        for line in f:
            tmp_list = list(jieba.cut(line.strip(), cut_all=False))
            word_list.append([i for i in tmp_list if i not in stop_words_set])
    return word_list


def extract_theme(raw_file, stop_word_file, num_topics=10):
    result = []
    # A list of lists: each element is the segmented word list of one line
    word_list = get_words_list(raw_file, stop_word_file)
    # Build the dictionary, mapping each word to an integer id
    word_dict = corpora.Dictionary(word_list)
    # Count term frequencies and convert each line to a bag-of-words vector
    corpus_list = [word_dict.doc2bow(text) for text in word_list]
    lda = models.ldamodel.LdaModel(corpus=corpus_list, id2word=word_dict,
                                   num_topics=num_topics, alpha='auto')
    # Keep only the single highest-weighted word of each topic
    for pattern in lda.show_topics(num_topics=num_topics, num_words=1, formatted=False):
        result.append(pattern[1][0][0])
    return result


def main():
    files = train.get_files(raw_path)
    file_name = result_path + "/theme_result.txt"
    f_word_result = open(file_name, "w+", encoding='utf-8')
    f_word_result.write("主题词提取" + "\n")
    for f in files:
        # File name without directory and ".txt" extension
        f_word_result.write('\n' + f.split("\\")[-1][:-4] + ":\n")
        topics = extract_theme(f, stopwords_path, 100)
        topic_list = []
        for t in topics:
            # Skip repeated words and stop once topic_num words are collected
            if t not in topic_list and len(topic_list) < topic_num:
                topic_list.append(t)
                f_word_result.write(t + '\n')
        print(f + ' save to: ' + file_name + " ok.")
    f_word_result.close()


if __name__ == '__main__':
    main()
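The deduplication loop in main() is why the texts end up with different numbers of keywords: extract_theme is asked for 100 topics, each topic contributes only its top word, and repeated words are skipped until topic_num distinct words have been collected. A minimal standalone sketch of that step follows; first_unique is a hypothetical helper name and the sample words are made up for illustration.

```python
def first_unique(words, limit):
    """Return the first `limit` distinct words, preserving their order."""
    seen = set()
    unique = []
    for word in words:
        if word in seen:
            continue  # repeated top word from another topic
        if len(unique) >= limit:
            break  # enough distinct words collected
        seen.add(word)
        unique.append(word)
    return unique


# Made-up top words for six topics; two of them repeat.
top_words = ["少年", "捕鸟", "少年", "密林", "捕鸟", "战争"]
print(first_unique(top_words, 3))
print(first_unique(top_words, 30))
```

When the topics share many top words, fewer than `limit` words come back, which would match the shorter lists in the results.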
Result:
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/text-feature-master/text-feature-master/theme.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 1.033 seconds.
Prefix dict has been built successfully.
data/raw\捕鸟者的故事.txt save to: data/result/theme_result.txt ok.
data/raw\日月前事.txt save to: data/result/theme_result.txt ok.
data/raw\浮槃歌卷.txt save to: data/result/theme_result.txt ok.
data/raw\镜子与魔法师的故事.txt save to: data/result/theme_result.txt ok.
data/raw\阿赫玛尔的故事.txt save to: data/result/theme_result.txt ok.
Process finished with exit code 0

主题词提取

捕鸟者的故事:
少年
捕鸟
筹备
密林
长久
战争
悲伤
少女
说
之鸟
鸟
十分

日月前事:
写
天上
法涅斯
诱惑
种子
原初
发光
太阳
囚禁
说
国王
禁令
园丁
时刻
火
影子
先祖
第一次
到来
嗣
记录
这种
秘密
制造
全部
日月
大王
奇迹
归途
立约

浮槃歌卷:
作者
彻知
迈出
毫无保留
逐诈
深渊
第一句
缺漏
遗迹
拥有者
遗体
簇拥
语言
无法
多么
律法
真的
槃
相比
节
浮
无人
当时
精灵
念诵
永
情人
备注
宛若
馈

镜子与魔法师的故事:
一夜
总是
镜子
魔法师
女人
漫漫长夜
昔日
故事
宫殿
这座
话语
般的
已

阿赫玛尔的故事:
不可
阿赫玛
芬芳
沙丘
神王
一夜
回
灾祸
蒙昧
手杖
七重
加
书记
想起
沙漠
众
迷宫
智慧
过去
旅团
怎能
一手
生命
住民
震动
惩罚
陛下
统帅
理性
Sentiment analysis
https://github.com/shibing624/pysenti
from pysenti import ModelClassifier

texts = ["我爱阿卓", "我喜欢阿卓", "阿卓最好了", "超喜欢阿卓捏", "没有阿卓我会难过", "阿卓,坏!"]

m = ModelClassifier()
for i in texts:
    r = m.classify(i)
    print(i, r)
我爱阿卓 {'positive_prob': 0.8466820791295006, 'negative_prob': 0.15331792087049945}
我喜欢阿卓 {'positive_prob': 0.6747967727946668, 'negative_prob': 0.32520322720533323}
阿卓最好了 {'positive_prob': 0.5, 'negative_prob': 0.5}
超喜欢阿卓捏 {'positive_prob': 0.7779255451248865, 'negative_prob': 0.22207445487511346}
没有阿卓我会难过 {'positive_prob': 0.8653531007266904, 'negative_prob': 0.13464689927330964}
阿卓,坏! {'positive_prob': 0.17124756072277025, 'negative_prob': 0.8287524392772297}
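The classifier returns a pair of probabilities rather than a label. Below is a small sketch of turning that dictionary into a label, assuming the positive_prob / negative_prob keys shown in the output above. Both to_label and the margin threshold are hypothetical and not part of pysenti.

```python
def to_label(result, margin=0.1):
    """Map a probability dict to 'positive', 'negative' or 'neutral'.

    `margin` is an illustrative threshold: when the two probabilities
    are closer together than this, the text is treated as undecided.
    """
    diff = result["positive_prob"] - result["negative_prob"]
    if abs(diff) < margin:
        return "neutral"
    return "positive" if diff > 0 else "negative"


# Values rounded from the output above.
print(to_label({"positive_prob": 0.8467, "negative_prob": 0.1533}))
print(to_label({"positive_prob": 0.5, "negative_prob": 0.5}))
print(to_label({"positive_prob": 0.1712, "negative_prob": 0.8288}))
```

Under this rule 阿卓最好了 (0.5 / 0.5) is reported as undecided rather than forced into either class.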
Text extraction
Template:
https://github.com/letiantian/TextRank4ZH
Code
# -*- encoding:utf-8 -*-
from __future__ import print_function

import codecs
import importlib
import sys

# Python 2 leftover: on Python 3 sys.setdefaultencoding does not exist,
# so the error is swallowed and this block has no effect.
try:
    importlib.reload(sys)
    sys.setdefaultencoding('utf-8')
except Exception:
    pass

from textrank4zh import TextRank4Keyword, TextRank4Sentence

text = codecs.open('../test/doc/01.txt', 'r', 'utf-8').read()
tr4w = TextRank4Keyword()

# In Python 2, text must be a UTF-8 encoded str or a unicode object;
# in Python 3, it must be UTF-8 encoded bytes or a str.
tr4w.analyze(text=text, lower=True, window=2)

print('关键词:')
for item in tr4w.get_keywords(20, word_min_len=1):
    print(item.word, item.weight)

print()
print('关键短语:')
for phrase in tr4w.get_keyphrases(keywords_num=20, min_occur_num=2):
    print(phrase)

tr4s = TextRank4Sentence()
tr4s.analyze(text=text, lower=True, source='all_filters')

print()
print('摘要:')
for item in tr4s.get_key_sentences(num=3):
    print(item.index, item.weight, item.sentence)
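For intuition about what the keyword weights mean, here is a simplified sketch of the TextRank idea: link words that occur within a small window of each other, then run PageRank over that graph. This is not TextRank4ZH's actual implementation (it skips part-of-speech and stopword filtering, and the textrank function and its parameters are assumptions made for illustration).

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    words = list(neighbors)
    if not words:
        return {}
    scores = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_scores = {}
        for w in words:
            # Each neighbour passes on an equal share of its own score.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) / len(words) + damping * incoming
        scores = new_scores
    return scores


# Made-up, already segmented sample: "智慧" co-occurs with every other word.
tokens = ["智慧", "君王", "智慧", "知识", "智慧", "沙丘"]
scores = textrank(tokens)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, scores[word])
```

A word ranks highly when it is connected to many other words, which is why frequent, widely co-occurring words tend to lead the keyword lists.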
Dataset
浮槃歌卷
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/TextRank4ZH/example/example01.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.489 seconds.
Prefix dict has been built successfully.
关键词:
备注 0.009742484094232517
谜题 0.00895192051928737
无法 0.008928982024858924
女主人 0.0084675400252497
人们 0.008453594560064712
王女 0.008415178312232818
君王 0.008343775020150554
作 0.006996706115774437
梨 0.006637384687180664
塔 0.006571901190830194
知识 0.006521435305209939
全部 0.006017572251023542
缺词 0.006017572251023542
回答 0.0057618736938632705
说 0.005471815209872001
无人 0.005469906201271802
语言 0.005448693305806349
智慧 0.005328745641089529
抵御 0.005170270890198952
摧垮 0.005170270890198952

关键短语:
女主人说

摘要:
17 0.0172286704332726 【室罗婆耽院诃般荼,塔法佐莉的备注:本节首句中,暂时无法确定含义的词语,也可译作「农田」或「墓园」
52 0.016848163757853078 彻知的君王啊,若是你的智慧真的与人们传说的不差毫厘,
72 0.016848163757853078 彻知的君王啊,若是你的智慧真的与人们传说的不差毫厘,

Process finished with exit code 0
阿赫玛尔的故事
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/TextRank4ZH/example/example01.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.574 seconds.
Prefix dict has been built successfully.
关键词:
智慧 0.014352705637079653
王 0.010845562797082157
知识 0.009491235284100272
沙丘 0.009273868947831002
镇灵 0.008765206011653781
生命 0.008700254211786216
贤者 0.008512954726829796
尔 0.008334562709858306
陛下 0.008235193740032542
成为 0.007806746637740814
说 0.007798790729346429
回 0.007711778318750074
君王 0.007378189784886893
灾祸 0.007095572914054973
玫瑰 0.007023821594126371
阿赫玛 0.006940470647070925
镀金 0.006864355408463337
无底 0.006713895728302868
往往 0.006630375398411197
过去 0.006439889693916835

关键短语:
阿赫玛尔

摘要:
4 0.033449378499342035 因此即便贵为大地四方之王,深受三大部族无数子民信仰,又被难以捉摸的镇灵崇拜,每当仰视天穹时,回想起天上的九重又九重的乐园、回想起千百年前无情的惩治,阿赫玛尔仍不免垂下高贵的头颅,发出无解的叹息
43 0.03244297229717682 他们说,阿赫玛尔的肉体在王座上渐渐腐朽,为巨虫所噬,而他的魂灵则同王都千百万尖叫的魂灵融成一体,永远在呼啸的末日中徘徊迷途,沿着蛇行的黑暗盘廊,向无底的深渊横冲直撞而去
42 0.02884013537738098 他们说,阿赫玛尔最终将他自己的智慧抽离了骨血,投入了无穷无尽、永远向着深处曲折蛇行的回廊、阶梯、门洞与雕梁

Process finished with exit code 0
白夜国馆藏
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/TextRank4ZH/example/example01.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.487 seconds.
Prefix dict has been built successfully.
关键词:
会 0.00945293107094898
龙蜥 0.008985161542912045
海 0.007266934837240164
太阳 0.006848229165971737
白夜 0.0063326576390741705
没有 0.006122751687146217
人 0.005816887054897028
人们 0.005680592314158335
光 0.005250242836753312
世界 0.0046159719828632205
渊下 0.004480827111156577
水 0.00441805817105551
先祖 0.004354413330551786
国 0.004345795461616112
贤人 0.0043423433175067225
实验 0.004267520239767349
出现 0.004236880141916847
元素 0.004210609549875028
见 0.0042022717684937145
风 0.003925479652771031
时 0.003623991002497307
龙 0.003577253592461393
名字 0.003512549384231825
影子 0.003489354931921611
生物 0.0034446278574639675
火 0.0033825500180776616
研究 0.003260560214370052
全部 0.003234997767633707
去 0.003177858674726537
日月 0.0031004664413358993

关键短语:
白夜国

摘要:
201 0.007251016661601182 著者可以忽略,或写研究所内之号,严禁暴露研究人员之古白夜国/渊下宫名或现代海祇/鸣神/稻妻式名字
229 0.006867967296295812 因为容不下海祇之血,龙蜥的身体会出现各种不良反应
191 0.006850799238337301 根据预言,它应该就是贤人所展示的太阳,用来照亮没有见过光的洞穴

Process finished with exit code 0
Word cloud generation
Code
# -*- coding: utf-8 -*-
from __future__ import print_function

import jieba.analyse
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# File paths
# bg_image_path = "bg_image.jpg"
text_path = './data/xiaozhu.txt'
font_path = 'msyh.ttf'
stopwords_path = 'stopword.txt'


def clean_using_stopword(text):
    """
    Remove stopwords, using a common stopword list plus a custom word list.
    :param text: input text
    :return: the text after segmentation and stopword removal
    """
    mywordlist = []
    seg_list = jieba.cut(text, cut_all=False)
    liststr = "/".join(seg_list)
    with open(stopwords_path, encoding='utf-8') as f_stop:
        f_stop_text = f_stop.read()
    f_stop_seg_list = f_stop_text.split('\n')
    # Drop stopwords and single-character tokens to build the new document
    for myword in liststr.split('/'):
        if not (myword.strip() in f_stop_seg_list) and len(myword.strip()) > 1:
            mywordlist.append(myword)
    return ''.join(mywordlist)


def preprocessing():
    """
    Text preprocessing.
    :return: the cleaned text
    """
    with open(text_path, encoding='utf-8') as f:
        content = f.read()
    return clean_using_stopword(content)


def extract_keywords():
    """
    Segment Chinese text with jieba.
    analyse.extract_tags extracts keywords with the TF-IDF algorithm.
    :return: dict mapping keyword to weight
    """
    # Extract 300 weighted keywords; the weights drive the word cloud
    allow_pos = ('nr',)  # part-of-speech filter (defined but not passed to extract_tags)
    tags = jieba.analyse.extract_tags(preprocessing(), 300, withWeight=True)
    keywords = dict()
    for i in tags:
        # print("%s---%f" % (i[0], i[1]))
        keywords[i[0]] = i[1]
    print(keywords.keys())
    return keywords


def draw_wordcloud():
    """
    Generate the word cloud:
    1. configure WordCloud
    2. display it with plt
    """
    # back_coloring = plt.imread('bg_pic.jpg')  # background image
    # Word cloud settings
    wc = WordCloud(
        font_path=font_path,       # font
        background_color="white",  # background color
        max_words=2000,            # maximum number of words shown
        width=1200,                # width
        height=800,                # height
        # mask=back_coloring,      # background image mask
    )
    wc.generate_from_frequencies(extract_keywords())
    # Show the figure
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    plt.show()
    # Save locally
    wc.to_file("word_cloud.jpg")


if __name__ == '__main__':
    draw_wordcloud()
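clean_using_stopword checks every token against a list of stopwords, which is a linear scan per token. A hedged alternative sketch keeps the stopwords in a set and hands WordCloud plain counts. Note that this swaps jieba's TF-IDF weights for raw counts, so it would not reproduce the keyword lists in this post. filter_tokens and count_frequencies are hypothetical names, and the sample tokens are made up.

```python
from collections import Counter


def filter_tokens(tokens, stopwords):
    """Drop stopwords and tokens shorter than two characters."""
    stopword_set = set(stopwords)  # set membership is O(1) per lookup
    kept = []
    for token in tokens:
        token = token.strip()
        if len(token) > 1 and token not in stopword_set:
            kept.append(token)
    return kept


def count_frequencies(tokens, top_n=300):
    """Return a {word: count} dict of the `top_n` most common tokens."""
    return dict(Counter(tokens).most_common(top_n))


# Already segmented sample, standing in for the output of jieba.cut.
sample = ["白夜", "的", "太阳", " ", "白夜", "国", "龙蜥", "太阳", "白夜"]
frequencies = count_frequencies(filter_tokens(sample, ["的", "了"]))
print(frequencies)
```

A dict of this shape can be passed to WordCloud.generate_from_frequencies, the same method the script calls.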
Dataset
白夜国馆藏
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.485 seconds.
Prefix dict has been built successfully.
dict_keys(['白夜', '原初', '渊下', '太阳', '贤人', '先祖', '大御神', '谜面', '阿倍', '龙嗣', '深海', '影子', '大神', '日月', '园丁', '实验', '常世', '谜底', '祭司', '良久', '天上', '名字', '虚空', '黑暗', '大日', '常夜', '灵木', '狭间', '谜题', '之子', '记录', '到来', '布拉克', '三界', '赫利', '树精', '预言', '纪年', '孩子', '理解', '三隅', '法涅斯', '海渊', '元素', '七位', '王座', '水文', '生活', '恐怖', '进化', '世界', '卷宗', '未曾', '大王', '归途', '子代', '生物', '乃是', '最早', '种子', '落成', '地理', '答案', '囚禁', '化身', '龙王', '寿终', '国王', '比喻', '蜃楼', '小说', '接纳', '诱惑', '常世国', '宫民', '衔枝', '箱舟', '过光', '伊洛斯', '俄斯', '海渊之土', '奥罗', '巴洛斯', '会以', '零柒大御', '腿数', '阳炎幻', '六十个', '唯一', '无始无终', '幻想', '崇拜', '海面', '版本', '本书', '腌臜', '龙蛇', '大蛇', '谨记', '大地', '特令', '去往', '研究', '崩落', '现象', '四十个', '只能', '环境', '事物', '著者', '欢欣', '禁止', '时刻', '谜语', '小童', '蛋壳', '神智', '外孙', '十二岁', '序号', '砍伐', '之下', '元年', '孙女', '寻找', '人形', '眷属', '白日', '十二个', '之神', '六十', '方圆', '长成', '筛选', '第一次', '洞窟', '畜牧', '移植', '意义', '亲近', '祈祷', '永恒', '此书', '宫里', '莲花', '拥有', '树枝', '三角', '智力', '落入', '土地', '课题', '国土', '称呼', '子孙', '这片', '珊瑚', '光明', '请问', '打败', '漫长', '大战', '后代', '宇宙', '开门', '描绘', '收获', '暴露', '尝试', '展示', '执政', '见过', '姓名', '诞生', '改称', '只不过', '百千', '寻回', '勘校', '无翼鸟', '波西', '修建', '忘记', '之地', '交流', '特性', '一群', '取材自', '肆意妄为', '安贞', '得以', '变动', '遭遇', '迷狂', '艾普', '对照组', '摔断了腿', '微风吹拂', '阿斯克', '秘密', '势力', '食物', '习惯', '雕梁', '类图书', '爱多', '俯首称臣', '御园', '御使', '掘出来', '四百余年', '塔斯', '违逆', '依序', '住民', '异常', '神尚', '有鸣', '神岛', '鸣神', '五圣隐', '初代', '宣他', '被常世', '之子们', '原如鸡', '宫才', '那龙嗣', '如草', '相抗', '选立', '赌约', '庞巨', '之感身', '之众', '此界', '虽非', '我民', '终人子', '有何', '之民一蛇', '侵攻', '真王', '它生', '衔枝后', '天地创造', '从者', '自此时起', '贵金', '高天', '火之年', '再开', '海渊之民', '千灯', '逐入', '为食', '无时不刻', '千风', '我仅', '目盲', '俄斯目', '新修', '折下灵', '一念则', '千劫', '新树', '记写', '我于', '所过', '天中', '因海', '之民', '水之事', '千风时', '之千风', '蜥界', '阳炎', '兴动', '为标', '国将', '秘法', '蛇心', '古早', '无鳞蛇', '吕羽氏', '伽玛', '西隆', '古海语', '注经', '应写', '内之号', '之古', '宫名', '神稻', '妻式', '水卷', '高体', '多眠', '而龙', '军械库'])
浮槃歌卷
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.476 seconds.
Prefix dict has been built successfully.
dict_keys(['王女', '女主人', '谜题', '君王', '备注', '彻知', '缺词', '智慧', '回答', '因论', '自砂海', '耽院诃般', '法佐莉', '教令', '永恒', '学者', '残卷', '挖掘出', '万国', '不差毫厘', '无人', '姣美', '目睹', '大宫', '无从', '确是', '世间', '室罗婆', '本卷', '护末', '那院', '伐护末', '院诃般', '亚莎', '有翼者', '高天', '第一个', '作者', '摧垮', '无穷尽', '缺漏', '遗迹', '特尔', '原初', '第二个', '屈服于', '永世', '身份', '赞颂', '译作', '容颜', '其下', '星辰', '仁慈', '蔷薇', '判断', '深渊', '地上', '古代', '依然', '丝绸', '香料', '女王', '宝石', '统领', '推测', '抵御', '翻译', '纡尊降贵', '无数', '开口', '晨露', '知识', '思恋', '答出', '红冠', '语言', '指代', '倾尽全力', '怨怒', '暂时', '首句', '死而复生', '波梨', '袈国', '吏兵', '辉烁', '细麻', '昼星', '装缀', '节末句', '与璃', '神临', '晨霜室', '罗婆', '塔法佐莉', '造于', '颂唱', '你作', '地向', '易逝', '满是', '芯髓', '其上', '穹灵', '逐诈', '智差', '之术', '仿拓', '劫灭', '全知', '敏黠', '之物', '荧华彻', '不毁', '原卷', '残文', '一人名', '答过', '缔下', '考研', '瞩望', '晚春', '第三句', '第四句', '身边', '主宰者', '死物', '负责', '虚像', '冷意', '谢礼', '黄金', '雅丽', '墓园', '醇香', '必将会', '明晨', '花茎', '第二句', '迷醉', '填入', '树荫下', '和煦', '沙尘', '拥有者', '芳菲', '倾慕', '五体投地', '心中', '毫无保留', '难住', '子民', '宛若', '邪魔', '萦绕', '三个', '非常感谢', '搞不清楚', '千万年', '欢欣', '重生', '碎银', '正法', '问出', '念诵', '谜语', '古往今来', '尊奉', '暖风', '正理', '降下', '如故', '箭矢', '赠与', '仆从', '篱笆', '枯萎', '难解', '立定', '万象', '盟约', '秀美', '妥善处理', '从天而降', '竟能', '破除', '悉心', '散佚', '自始至终', '升天', '俯伏', '创造者', '荆棘', '聆听', '从未见过', '胆敢', '铸成', '微光', '相仿', '诸神', '第一句', '赞许', '万千', '割断', '消逝', '稍作', '听过', '油灯', '教诲', '溪流', '词语', '精灵', '可否', '美貌', '何曾', '称得上', '盔甲', '胜过', '混淆', '眷属', '自上而下', '卷起', '迷茫', '译本', '话语', '陪伴', '俯身', '同心', '摘下', '统辖', '自下而上', '奥秘', '须臾', '甜蜜', '不动声色', '一缕', '困惑', '情人', '织物', '侍女', '捍卫', '簇拥', '芳香', '遗体', '禁忌', '传授', '远方', '身着', '迈出', '冰冷', '闪烁', '花朵', '真诚', '佩服', '绕过', '不惜', '解开', '毁灭', '臣民', '幸运', '疑虑', '坚实', '使者', '自此', '提及', '神圣', '第三个', '行礼', '月光', '掌管', '身旁', '农田', '城邦', '不由', '撰写', '痕迹', '书籍', '鲜血', '大师', '特产', '引领', '月亮', '本书', '意图', '遗憾', '宫殿', '七月', '二位', '含义', '死去', '眼中', '脚下', '世上', '我会', '乃是', '花园', '暗暗', '所致', '活着'])
阿赫玛尔的故事
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.474 seconds.
Prefix dict has been built successfully.
dict_keys([‘阿赫玛’, ‘智慧’, ‘镇灵’, ‘贤者’, ‘住民’, ‘之王’, ‘沙丘’, ‘蒙昧’, ‘七重’, ‘吾王’, ‘沙漠’, ‘陛下’, ‘大地’, ‘镀金’, ‘君王’, ‘知识’, ‘永远’, ‘部族’, ‘怎能’, ‘尔以’, ‘四方’, ‘蛇行’, ‘佞臣’, ‘子民’, ‘天空’, ‘九重’, ‘千百万’, ‘千百年’, ‘魂灵’, ‘智者’, ‘进言’, ‘迷宫’, ‘生命’, ‘灾祸’, ‘呼啸’, ‘禁忌’, ‘玫瑰’, ‘绿洲’, ‘一千’, ‘预料’, ‘百年’, ‘终究’, ‘惩罚’, ‘统帅’, ‘震动’, ‘一手’, ‘深处’, ‘沉默’, ‘一夜’, ‘建立’, ‘历史’, ‘无数’, ‘园艺师’, ‘恶徒’, ‘镀成’, ‘谈情’, ‘诸元素’, ‘御者’, ‘魔神’, ‘雕梁’, ‘书记’, ‘权力’, ‘力能’, ‘狂想’, ‘智识’, ‘引路人’, ‘神王’, ‘制御’, ‘夜梦’, ‘三大’, ‘鸣泣’, ‘尽享’, ‘勇壮’, ‘同猛狮’, ‘如众贤’, ‘至贤’, ‘搏龙’, ‘食尸’, ‘全失’, ‘镇灵能’, ‘长歌’, ‘起于’, ‘咒诅’, ‘王众’, ‘乞谅’, ‘应知’, ‘溺于’, ‘旧梦’, ‘醉于’, ‘哀想’, ‘足使’, ‘以应’, ‘羊之王’, ‘天罚’, ‘尽藏’, ‘才行’, ‘顿地’, ‘之主生者’, ‘迎回’, ‘深黑’, ‘凡躯’, ‘狂沙’, ‘抽离’, ‘所噬’, ‘同王’, ‘盘廊’, ‘巨墙’, ‘其上’, ‘狮鹫’, ‘于己’, ‘愚行’, ‘旅团’, ‘众砾’, ‘如铁’, ‘将振帆’, ‘漫卷’, ‘九重天’, ‘融成’, ‘愚者’, ‘歌者’, ‘王土’, ‘夜莺’, ‘妙药’, ‘孤零’, ‘呓语’, ‘弃绝’, ‘沙海’, ‘比斯’, ‘垮塌’, ‘亡者’, ‘傲视’, ‘折腰’, ‘念旧’, ‘劝诫’, ‘王座’, ‘遗落’, ‘赫尔曼’, ‘骨殖’, ‘唤回’, ‘去往’, ‘天穹’, ‘专断’, ‘贵为’, ‘长久之计’, ‘迷途’, ‘高踞’, ‘查考’, ‘巨虫’, ‘难以捉摸’, ‘子嗣’, ‘骨血’, ‘懦夫’, ‘谄媚’, ‘蹙眉’, ‘倒伏’, ‘飘忽’, ‘困于’, ‘祸患’, ‘辈出’, ‘横冲直撞’, ‘某位’, ‘预知’, ‘手杖’, ‘公牛’, ‘谗言’, ‘亵渎’, ‘无边无际’, ‘哀悼’, ‘国度’, ‘沙暴’, ‘虚空’, ‘宫阙’, ‘横渡’, ‘圆柱’, ‘仰视’, ‘逝去’, ‘无底’, ‘芬芳’, ‘无穷无尽’, ‘悔恨’, ‘回首’, ‘永生’, ‘终极’, ‘迷失’, ‘未来’, ‘相聚’, ‘霸占’, ‘年岁’, ‘化作’, ‘藏身’, ‘安乐’, ‘今人’, ‘终是’, ‘献上’, ‘报应’, ‘门洞’, ‘妄图’, ‘回廊’, ‘幸存’, ‘妄想’, ‘唤醒’, ‘宫城’, ‘凡人’, ‘听信’, ‘征服者’, ‘责怪’, ‘无忧’, ‘荣耀’, ‘阶梯’, ‘掩埋’, ‘惩治’, ‘遗忘’, ‘受惠’, ‘远古’, ‘忧伤’, ‘复活’, ‘话语’, ‘末日’, ‘散落’, ‘一夜之间’, ‘勇士’, ‘流浪’, ‘世界’, ‘席卷’, ‘堕落’, ‘真实性’, ‘计策’, ‘肉体’, ‘尖叫’, ‘高贵’, ‘逃离’, ‘哑巴’, ‘夺回’, ‘典籍’, ‘深渊’, ‘头颅’, ‘每到’, ‘狂风’, ‘乐园’, ‘良机’, ‘忧郁’, ‘盲人’, ‘腐朽’, ‘之主’, ‘埋葬’, ‘不死’, ‘发抖’, ‘荒唐’, ‘以求’, ‘悲伤’, ‘神灵’, ‘怀抱’, ‘倾听’, ‘罪恶’, ‘直言’, ‘武士’, ‘不论是’, ‘无情’, ‘掩饰’, ‘空虚’, ‘宝石’, ‘讲述’, ‘情愿’, ‘时光’, ‘毁灭’, ‘永久’, ‘预言’, ‘使者’, ‘千年’, ‘之子’, ‘多有’, ‘帝王’, ‘月光’, ‘少女’, ‘梦想’, ‘深受’, ‘徘徊’, ‘崇拜’, ‘弥补’, ‘只顾’, ‘同一个’, ‘叹息’, ‘念头’, ‘来临’, ‘权威’, ‘一体’, ‘后代’, ‘恐惧’, ‘曲折’, ‘信仰’, ‘天上’, ‘少年’, ‘寻求’, ‘超越’, ‘之人’, ‘知名’, ‘首领’, ‘王国’, ‘理性’, ‘黑暗’])
Zhihu answers: What do you think is the most striking sentence in The Three-Body Problem (《三体》)?
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.479 seconds.
Prefix dict has been built successfully.
dict_keys([‘魔戒’, ‘万有引力’, ‘墓地’, ‘罗辑’, ‘人类’, ‘三体’, ‘丁仪’, ‘白艾思’, ‘宇宙’, ‘地球’, ‘智子’, ‘黑暗’, ‘五十四年’, ‘恩斯’, ‘水洼’, ‘太空’, ‘森林’, ‘面壁’, ‘太阳’, ‘落下去’, ‘钢印’, ‘女孩’, ‘远处’, ‘飞船’, ‘建造’, ‘回答’, ‘孩子’, ‘核弹’, ‘相聚’, ‘思想’, ‘耐德’, ‘忏悔’, ‘物理’, ‘一束’, ‘惠子’, ‘世界’, ‘害怕’, ‘呼叫’, ‘一切都是’, ‘降临’, ‘十几代’, ‘四维空间’, ‘规律’, ‘北海’, ‘里教’, ‘史强’, ‘叶文洁’, ‘海干’, ‘低维’, ‘程心’, ‘飞机’, ‘高维’, ‘外星’, ‘这片’, ‘所有人’, ‘舰队’, ‘或者说’, ‘上帝’, ‘肯定’, ‘漫长’, ‘眼中’, ‘警告’, ‘物理学’, ‘不到’, ‘东方人’, ‘沙漠’, ‘经济学’, ‘沙漠化’, ‘两个’, ‘安全带’, ‘城府’, ‘飞越’, ‘晚霞’, ‘再也’, ‘十分钟’, ‘青铜时代’, ‘人类文明’, ‘生命’, ‘邪恶’, ‘加速器’, ‘徐徐’, ‘战舰’, ‘假象’, ‘样儿’, ‘夕阳’, ‘深渊’, ‘感觉’, ‘引擎’, ‘离开’, ‘西洋’, ‘三条’, ‘声音’, ‘空间’, ‘惩罚’, ‘审判’, ‘沙发’, ‘子孙’, ‘统帅’, ‘以色列’, ‘蓝色’, ‘震动’, ‘感谢’, ‘愿望’, ‘陆地’, ‘恐惧’, ‘军人’, ‘笑容’, ‘安慰’, ‘不该’, ‘聚集’, ‘未来’, ‘世纪’, ‘引水员’, ‘拉菲尔’, ‘预测出’, ‘供应线’, ‘干什么’, ‘想象’, ‘这场’, ‘文明’, ‘男孩儿’, ‘冤冤相报’, ‘兴奋’, ‘一遍’, ‘露出’, ‘办公室’, ‘正确’, ‘这话’, ‘明天’, ‘攻击’, ‘目光’, ‘失败主义’, ‘消失’, ‘两次’, ‘承认’, ‘同志’, ‘现实’, ‘关键’, ‘时间’, ‘苦短’, ‘投进来’, ‘和平主义者’, ‘展开’, ‘richtext’, ‘星河’, ‘铅色’, ‘中隐入’, ‘之穹’, ‘阴着’, ‘脸装’, ‘俗里’, ‘没个’, ‘你碍’, ‘味来’, ‘总沾着’, ‘向大史’, ‘光中’, ‘大史’, ‘这枚’, ‘汪淼大史’, ‘过妈’, ‘随船’, ‘八十二’, ‘常伟思’, ‘常伟思太多’, ‘外太空’, ‘汪淼’, ‘抬眼’, ‘两色’, ‘海是’, ‘海弄’, ‘海弄海’, ‘干前’, ‘同维’, ‘有鱼’, ‘系好’, ‘有窗’, ‘面墙’, ‘俩系’, ‘当罗辑’, ‘吴岳’, ‘19’, ‘张援’, ‘朝候’, ‘产室’, ‘30’, ‘10000’, ‘汇在’, ‘滔天罪行’, ‘请主来’, ‘执剑’, ‘守护人’, ‘交接仪式’, ‘血雾’, ‘地喊出’, ‘647’, ‘白艾’, ‘没人动’, ‘杨冬’, ‘空是’, ‘用空’, ‘升三体’, ‘以爱’, ‘没人能’, ‘事上’, ‘对山杉’, ‘我用’, ‘还会升’, ‘类似’, ‘地面’, ‘血与火’, ‘国际法庭’, ‘挖墓’, ‘之外’, ‘极端分子’, ‘自虐’, ‘光灿灿’, ‘指甲油’, ‘几十亿年’, ‘刚刚’, ‘主席’, ‘古筝’, ‘信息内容’, ‘大学’, ‘第一次’, ‘悸动’, ‘和平相处’, ‘怒视’, ‘很快’, ‘维德’, ‘计算’, ‘状态’, ‘一阵’, ‘媚眼’, ‘舷窗’, ‘执行’, ‘力量’, ‘血光’, ‘唯利是图’, ‘悠着点’, ‘猛跳’, ‘巴勒斯坦人’, ‘面前’, ‘一千八百’, ‘梦呓’, ‘越陷越深’, ‘女性化’, ‘咧嘴一笑’, ‘第三十二章’, ‘看着’, ‘连山’, ‘感到遗憾’, ‘照进’, ‘长路’, ‘瞎猜’, ‘永恒不变’, ‘不可理喻’, ‘恺撒’, ‘怪怪的’, ‘扫把’, ‘可有可无’, ‘七八米’, ‘无神论者’, ‘于海洋’, ‘有生之年’, ‘夕照’, ‘捐给’, ‘计算出来’, ‘俗气’, ‘电筒’, ‘维度’, ‘初始条件’, ‘老谋深算’, ‘拘押’, ‘返航’, ‘精神力量’, ‘婆婆妈妈’, ‘决定论’, ‘玩世不恭’, ‘眼睛’, ‘神采飞扬’, ‘画儿’, ‘开时’, ‘炸得’, ‘陷进’, ‘最终目标’, ‘弧度’, ‘文明史’, ‘超重’, ‘犯有’, ‘指战员’, ‘付出代价’, ‘这一’, ‘血肉模糊’, ‘半世纪’, ‘大大咧咧’, ‘起爆’, ‘计划’, ‘有件事’, ‘待发’])
bilibili financial report
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.534 seconds.
Prefix dict has been built successfully.
dict_keys([‘嗶哩’, ‘開支’, ‘百萬’, ‘百萬元’, ‘準則’, ‘2021’, ‘總額’, ‘我們’, ‘虧損’, ‘用戶’, ‘認會計’, ‘指標’, ‘非公’, ‘美元’, ‘經營’, ‘回購’, ‘資產’, ‘財務’, ‘美國存’, ‘轉換’, ‘同期’, ‘每股’, ‘2022’, ‘負債’, ‘無形’, ‘相關’, ‘電話’, ‘公司’, ‘增加’, ‘股權’, ‘通過’, ‘則為’, ‘淨額’, ‘收購’, ‘獲得’, ‘資料’, ‘股數’, ‘355351263391248558393538141’, ‘股份’, ‘未經’, ‘優先票’, ‘服務’, ‘激勵’, ‘投資’, ‘會議’, ‘匯率’, ‘有限公司’, ‘中國’, ‘202231’, ‘付費’, ‘廣告’, ‘股東’, ‘營業額’, ‘銷售’, ‘營業’, ‘費用’, ‘經調’, ‘業務’, ‘經調整’, ‘現金’, ‘有關’, ‘業績’, ‘數據’, ‘成本’, ‘第一季度’, ‘前瞻性’, ‘未來’, ‘價值’, ‘遊戲’, ‘電商’, ‘營銷’, ‘及攤’, ‘計劃’, ‘會計’, ‘美國’, ‘流動’, ‘狀況’, ‘金額’, ‘所得’, ‘所致’, ‘收益’, ‘日期’, ‘審計’, ‘淨營業額’, ‘新冠’, ‘市場’, ‘影響’, ‘億元’, ‘移動’, ‘研發’, ‘損為’, ‘變動’, ‘美國公’, ‘單位’, ‘淨虧損’, ‘2.545’, ‘345.80’, ‘損攤’, ‘加權’, ‘股加權’, ‘攤薄’, ‘非流動’, ‘權益’, ‘2.504’, ‘224.20’, ‘平均’, ‘增值’, ‘公告’, ‘香港’, ‘和服’, ‘疫情’, ‘展望’, ‘行政’, ‘公允’, ‘千元’, ‘財務業績’, ‘標誌’, ‘社區’, ‘30%’, ‘商業化’, ‘發展’, ‘產品’, ‘同時’, ‘淨虧’, ‘整淨’, ‘價物’, ‘建議’, ‘預計’, ‘進行’, ‘估計’, ‘消費’, ‘調整’, ‘所載’, ‘公認’, ‘審計調’, ‘節表’, ‘活動’, ‘編號’, ‘網上’, ‘投資者’, ‘作為’, ‘視頻’, ‘評估’, ‘證券’, ‘電郵’, ‘止三個’, ‘20212021202231123131’, ‘股東淨’, ‘214395307470277862’, ‘應付’, ‘普通股’, ‘收入’, ‘品牌’, ‘能力’, ‘包括’, ‘定期存款’, ‘提供’, ‘管理’, ‘付款’, ‘毛利’, ‘同比增加’, ‘重播’, ‘平台’, ‘領先視頻’, ‘日止’, ‘054.1’, ‘797.3’, ‘33%’, ‘日活’, ‘創下’, ‘經濟’, ‘減少’, ‘16%’, ‘費用戶’, ‘數量’, ‘萬元’, ‘支為’, ‘攤銷開’, ‘支通’, ‘過業務’, ‘薄虧’, ‘薄淨虧’, ‘現金及’, ‘一項’, ‘萬股’, ‘聯交所’, ‘申請’, ‘現時’, ‘確定’, ‘整淨虧’, ‘損經’, ‘參閱’, ‘結尾’, ‘八時’, ‘電話會’, ‘httpirbilibilicom’, ‘中國年’, ‘輕人’, ‘決策’, ‘認為’, ‘識別’, ‘經營業績’, ‘整體’, ‘考慮’, ‘根據’, ‘委員會’, ‘報告’, ‘書面’, ‘娛樂’, ‘行業’, ‘bilibilitpgircom’, ‘簡明合’, ‘904859095767284132’, ‘少數’, ‘歸屬’, ‘903555088014281982’, ‘股淨’, ‘股淨虧’, ‘賬款’, ‘攤銷’, ‘261453322756997’, ‘514514981’, ‘22518796771641114’, ‘338779’, ‘生效’, ‘短期’, ‘提交’, ‘流量’, ‘上市公司’, ‘首席’, ‘因素’, ‘港交所’, ‘有助’, ‘有所提高’, ‘直播’, ‘列表’, ‘提升’, ‘日均’, ‘文化’, ‘月均’, ‘一代’, ‘董事’, ‘私人’, ‘特定’, ‘收到’, ‘比增’, ‘初步’, ‘新高’, ‘月票’, ‘一步’, ‘納斯達克代’, ‘BILI’, ‘代號’, ‘9626’, ‘活躍’, ‘293.6’, ‘百萬移’, ‘動端’, ‘戶達’, ‘276.4’, ‘分別’, ‘31%’, ‘活躍用’, ‘79.4’, ‘32%’, ‘27.2’, ‘面對’, ‘發和靜態’, ‘全國範圍’, ‘面臨’, ‘總部’, ‘期間’, ‘超過’, ‘000’, ‘名員工’, ‘辦公’, ‘任務確’, ‘保員’, ‘工維持’, ‘業務運營’, ‘執行面’, ‘這些’, ‘挑戰’, ‘活用戶’, ‘穩健’, ‘增長’, 
‘活躍度’, ‘單個’, ‘時長’, ‘顯著’, ‘95’, ‘分鐘’, ‘此同’, ‘堅定’, ‘份額’, ‘縱觀’, ‘及收’, ‘緊費用’, ‘宏觀’, ‘壓力’])
Meituan hotpot listings
D:\coder\randomnumbers\venv\Scripts\python.exe D:/coder/randomnumbers/Keywords_cloud/wordCloud.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\83854\AppData\Local\Temp\jieba.cache
Loading model cost 0.508 seconds.
Prefix dict has been built successfully.
dict_keys([‘火锅’, ‘评论’, ‘重庆火锅’, ‘四川火锅’, ‘潮汕’, ‘牛肉’, ‘串串’, ‘大斌家’, ‘自助’, ‘牛羊肉’, ‘羊肉’, ‘长沙’, ‘广场’, ‘步行街’, ‘万家’, ‘涮羊肉’, ‘五一广场’, ‘河西’, ‘地摊’, ‘宁乡’, ‘步步高’, ‘串串香’, ‘城店’, ‘胡记’, ‘北京’, ‘大学城’, ‘溪湖’, ‘毛肚’, ‘黄兴路’, ‘七七’, ‘星沙’, ‘重庆’, ‘含浦’, ‘现切’, ‘涉外经济’, ‘小火锅’, ‘技术开发区’, ‘海底’, ‘雨花’, ‘特色’, ‘德思勤’, ‘银盆岭’, ‘沙岭’, ‘望城’, ‘学院’, ‘德庄’, ‘梅溪’, ‘湘江’, ‘国际’, ‘市井’, ‘新天地’, ‘天虹’, ‘川味’, ‘丽店’, ‘茶子山’, ‘洋湖’, ‘小龙’, ‘月亮’, ‘四方’, ‘喜盈门’, ‘南路’, ‘羊蝎子’, ‘一桥’, ‘星城’, ‘达美’, ‘新城’, ‘西站’, ‘烧烤’, ‘正荣’, ‘汇店’, ‘火锅店’, ‘北路’, ‘天街’, ‘桐梓’, ‘黄兴’, ‘德思勤店’, ‘吾悦店’, ‘泉塘’, ‘星沙店’, ‘新开铺’, ‘猪肚’, ‘中心店’, ‘芙蓉’, ‘解放’, ‘草本’, ‘世纪’, ‘西路’, ‘王府井’, ‘云塘’, ‘中茂城店’, ‘九记’, ‘路店’, ‘边炉’, ‘汤锅’, ‘牛腩’, ‘运达’, ‘涉外’, ‘新村’, ‘老九’, ‘沿线’, ‘环宇’, ‘友阿奥’, ‘锅圈’, ‘食汇’, ‘食材’, ‘总店’, ‘生鲜’, ‘高铁’, ‘浏阳’, ‘特莱斯’, ‘科技园’, ‘经济’, ‘杨家山’, ‘红星’, ‘汽车’, ‘城区’, ‘城市’, ‘荟聚店’, ‘犊门’, ‘袁家岭’, ‘大斌府’, ‘阿杜打’, ‘咸嘉’, ‘榔梨镇’, ‘五江’, ‘坡子’, ‘阳光’, ‘月湖’, ‘万象’, ‘医学院’, ‘海鲜’, ‘会展中心’, ‘凯德’, ‘喜乐’, ‘商学院’, ‘公园’, ‘柴火’, ‘理工’, ‘体育’, ‘铜锅’, ‘韩国菜’, ‘北辰’, ‘烤肉’, ‘东塘’, ‘旗舰店’, ‘亭店’, ‘桂花路’, ‘王记’, ‘鲜羊里’, ‘岳麓区’, ‘粤旺’, ‘栋烂’, ‘含浦店’, ‘首店’, ‘咸嘉湖’, ‘知味居’, ‘砂之船’, ‘大众传媒’, ‘侯家塘’, ‘罗家’, ‘超市’, ‘环保’, ‘科大’, ‘阿布杜’, ‘小区’, ‘五一’, ‘树木’, ‘大牛’, ‘财富’, ‘大侠’, ‘直营店’, ‘火车站’, ‘德政’, ‘武广’, ‘中心’, ‘二十九’, ‘清泉’, ‘一环’, ‘黄土岭’, ‘大道’, ‘奥克斯’, ‘大排档’, ‘华润’, ‘洋湖店’, ‘季季’, ‘老颜头’, ‘暮云’, ‘湖店’, ‘辣度’, ‘街店’, ‘淮川’, ‘美来’, ‘荟店’, ‘顶福胜’, ‘开福区’, ‘董记’, ‘三汁’, ‘开福寺’, ‘美蛙’, ‘复地’, ‘谭鸭血’, ‘爱涮’, ‘原切’, ‘庖丁’, ‘山店’, ‘阿华’, ‘泉塘店’, ‘MeetSun’, ‘韩式’, ‘懒汉’, ‘湾子’, ‘大学’, ‘南站’, ‘螺蛳’, ‘花城’, ‘店铺’, ‘中路’, ‘新疆’, ‘小羊’, ‘大虾’, ‘潭州’, ‘农业大学’, ‘鸡窝’, ‘龙华’, ‘砂锅’, ‘老四’, ‘老友’, ‘分店’, ‘大王’, ‘大福’, ‘天马’, ‘山镇’, ‘贺龙’, ‘万国’, ‘黄花’, ‘富兴’, ‘龙湖’, ‘小牛’, ‘铁道’, ‘中南大学’, ‘浪琴’, ‘金霞’, ‘港式’, ‘永安’, ‘中南’, ‘星光’, ‘莱茵’, ‘湘雅’, ‘朝阳路’, ‘一号’, ‘珠江’, ‘门口’, ‘中海’, ‘保利’, ‘黑山羊’, ‘书院’, ‘三角洲’, ‘鲜牛’, ‘正荣店’, ‘黄花镇’, ‘老佰老’, ‘潮正’, ‘理涛’, ‘龙炎阁’, ‘袁记’, ‘川锅’, ‘马栏’, ‘岛店’, ‘附一’, ‘丽发’, ‘美店’, ‘黄小椒’, ‘奥莱店’, ‘达美店’, ‘京贵楼’, ‘西店’, ‘伍家岭’, ‘傣妹’, ‘善弟’, ‘合牛记’, ‘福盛小鲜’, ‘园店’, ‘山语’, ‘鲜切’, ‘钰樽楼’, ‘太平街’, ‘洪西店’, ‘肆匠’, ‘开福’, ‘驴庄’, ‘涂家’, ‘冲店’, ‘浦沅’, ‘悦方’, ‘潮上’, ‘顺老’, ‘赵火火’, ‘鲜焖’, ‘卜蜂’, ‘烤串’])