Olympic Games Dataset Analysis (Partial)


Data Science Application Case Study: Practice Report

Group members: XXX

Main methods: pandas for data processing, Pyecharts for plotting

Abstract: This report analyzes the 2020 Summer Olympic Games using Python libraries such as pandas and pyecharts. It covers data cleaning, data mining, and data visualization: the daily gold medal counts and related Olympic data are organized, and predictions are made from the data. The predictions and the changes over time are presented as charts, which makes them easier to understand.

Keywords: Olympic Games, Python, pandas, pyecharts…

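I. Background:

After the 2020 Olympic Games ended, we analyzed the Olympic data. Visualizing the data shows the gold medal standings and how they changed over the course of the Games, which gives us a fuller understanding of the Olympics.

II. Data analysis workflow

1. Importing modules

If any libraries are missing, run pip install -r requirements.txt to install them.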

!pip install --upgrade pyecharts
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pyecharts in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (1.9.1)
Requirement already satisfied: jinja2 in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (3.0.3)
Requirement already satisfied: prettytable in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (2.4.0)
Requirement already satisfied: simplejson in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from pyecharts) (3.17.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from jinja2->pyecharts) (2.0.1)
Requirement already satisfied: wcwidth in c:\users\merlin_wong\appdata\roaming\python\python39\site-packages (from prettytable->pyecharts) (0.2.5)
import pandas as pd
from pyecharts.charts import Timeline, Line, Tree
from pyecharts import options as opts
from pyecharts.commons.utils import JsCode

2. Data processing with pandas

2.1 Reading the data

df = pd.read_csv('../others/2020东京奥运会奖牌数据.csv', index_col=0, encoding = 'gb18030')
df.head(20)
日期        国家      国家编码  金牌  银牌  铜牌  总计  国旗
2021-07-24  中国      CHN       3     0     1     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-24  意大利    ITA       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/IT...
2021-07-24  日本      JPN       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/JP...
2021-07-24  韩国      KOR       1     0     2     3     https://www.sinaimg.cn/ty/2020/Olympic/flag/KO...
2021-07-24  厄瓜多尔  ECU       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/EC...
2021-07-24  匈牙利    HUN       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/HU...
2021-07-24  伊朗      IRI       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/IR...
2021-07-24  科索沃    KOS       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/KO...
2021-07-24  泰国      THA       1     0     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TH...
2021-07-24  ROC       ROC       0     1     1     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/RO...
2021-07-24  塞尔维亚  SRB       0     1     1     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/SR...
2021-07-24  比利时    BEL       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/BE...
2021-07-24  西班牙    ESP       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/ES...
2021-07-24  印度      IND       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/IN...
2021-07-24  荷兰      NED       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/NE...
2021-07-24  罗马尼亚  ROU       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/RO...
2021-07-24  中国台北  TPE       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TP...
2021-07-24  突尼斯    TUN       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/TU...
2021-07-24  爱沙尼亚  EST       0     0     1     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/ES...
2021-07-24  法国      FRA       0     0     1     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/FR...

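This loads the data from the CSV file into our project. The column names stay in Chinese because the code refers to them: 日期 (date), 国家 (country), 国家编码 (country code), 金牌 (gold), 银牌 (silver), 铜牌 (bronze), 总计 (total), 国旗 (flag image URL).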
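The read step can be tried without the real file. The following is a minimal sketch, assuming a tiny in-memory CSV whose three rows are copied from the output above (the flag column is left out for brevity); `sample_csv` and `sample` are illustrative names. Passing `parse_dates=True` is an optional extra that the report itself does not use; it turns the 日期 index into real timestamps.

```python
import io

import pandas as pd

# Illustrative stand-in for the real CSV file; rows copied from the output above.
sample_csv = io.StringIO(
    "日期,国家,国家编码,金牌,银牌,铜牌,总计\n"
    "2021-07-24,中国,CHN,3,0,1,4\n"
    "2021-07-24,意大利,ITA,1,1,0,2\n"
    "2021-07-24,日本,JPN,1,1,0,2\n"
)

# index_col=0 makes the first column (日期) the index, as in the report.
# parse_dates=True (an optional addition) parses that index as datetimes.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=True)

print(sample.index.dtype)
print(sample["金牌"].tolist())
```

With the real file, the `encoding='gb18030'` argument from the report would still be needed, since that is how the file is encoded.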

2.2 Checking for missing values

df.isnull().any()
国家      False
国家编码    False
金牌      False
银牌      False
铜牌      False
总计      False
国旗      False
dtype: bool

No column contains missing values.
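When a column does contain gaps, it helps to know how many. A small sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the dataset) shows `isnull().any()` next to `isnull().sum()`, which counts the missing cells per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing gold-medal value.
toy = pd.DataFrame({"金牌": [3, np.nan, 1], "银牌": [0, 1, 1]})

has_missing = toy.isnull().any()     # True for columns with at least one NaN
missing_counts = toy.isnull().sum()  # number of NaN cells per column

print(has_missing.to_dict())
print(missing_counts.to_dict())
```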

2.3 Viewing China's daily data

df1 = df[df['国家']=='中国']
df1
日期        国家      国家编码  金牌  银牌  铜牌  总计  国旗
2021-07-24  中国      CHN       3     0     1     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-25  中国      CHN       3     1     3     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-26  中国      CHN       0     4     3     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-27  中国      CHN       3     0     0     3     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-28  中国      CHN       3     1     2     6     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-29  中国      CHN       3     1     0     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-30  中国      CHN       4     3     2     9     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-07-31  中国      CHN       2     3     0     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-01  中国      CHN       3     1     1     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-02  中国      CHN       5     3     3     11    https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-03  中国      CHN       3     4     0     7     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-04  中国      CHN       0     1     0     1     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-05  中国      CHN       2     2     0     4     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-06  中国      CHN       2     2     1     5     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
2021-08-07  中国      CHN       1     1     0     2     https://www.sinaimg.cn/ty/2020/Olympic/flag/CH...
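The rows above are per-day counts, while the charts later in the report plot running totals. As a minimal sketch, the daily gold medal column can be accumulated with `Series.cumsum()`; the values below are copied from the table above, and `daily_gold` and `running_total` are illustrative names.

```python
import pandas as pd

# Daily gold medals for China, copied from the table above
# (2021-07-24 through 2021-08-07).
daily_gold = pd.Series([3, 3, 0, 3, 3, 3, 4, 2, 3, 5, 3, 0, 2, 2, 1])

# cumsum() replaces each entry with the sum of all entries up to and including it.
running_total = daily_gold.cumsum()

print(running_total.tolist())
# → [3, 6, 6, 9, 12, 15, 19, 21, 24, 29, 32, 32, 34, 36, 37]
```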

2.4 Aggregating data for four countries: China, the United States, Japan, and Australia

all_country_data = []
# flg maps each country to its flag image URL
flg = {}
# cols holds the column names of the wide table: 国家 first, then one column per date
cols = ['国家']
countrys = ['中国', '美国', '日本', '澳大利亚']
for country in countrys:
    df1 = df[df['国家'] == country]
    df_t = df1.copy()
    df2 = df.loc[df['国家'] == country, ['金牌', '银牌', '铜牌', '总计']]
    # Append the date labels to cols; in this data only the first country (中国) passes the check
    if len(df2.index.tolist()) >= len(cols):
        cols += df2.index.tolist()
    flg[country] = df1.iloc[:1, -1].values[0]
    one_country_data = [country]
    # Running totals: entry i holds the column sums over rows 0..i
    datasss = []
    for i in range(df2.shape[0]):
        datasss.append(df2[:i+1].apply(lambda x: x.sum()).values.tolist())
    d1 = pd.DataFrame(data=datasss, columns=['金牌', '银牌', '铜牌', '总计'])
    for col in d1.columns:
        df_t[col] = d1[col].values
    df_t1 = df_t.loc[:, ['金牌']]
    # Keep only the cumulative gold medal counts for this country
    one_country_data += df_t['金牌'].values.tolist()
    all_country_data.append(one_country_data)
all_country_data
[['中国', 3, 6, 6, 9, 12, 15, 19, 21, 24, 29, 32, 32, 34, 36, 37],
 ['美国', 4, 7, 9, 11, 14, 14, 16, 20, 22, 24, 25, 29, 31, 31],
 ['日本', 1, 5, 8, 10, 13, 15, 17, 18, 18, 18, 19, 21, 22, 24],
 ['澳大利亚', 1, 2, 3, 6, 8, 9, 10, 14, 14, 15, 17, 17]]

Updating the DataFrame

d2 = pd.DataFrame(data=all_country_data, columns=cols)
# Forward-fill along each row so that shorter rows repeat their last value
d2 = d2.ffill(axis=1)
d2
   国家      2021-07-24  2021-07-25  2021-07-26  2021-07-27  2021-07-28  2021-07-29  2021-07-30  2021-07-31  2021-08-01  2021-08-02  2021-08-03  2021-08-04  2021-08-05  2021-08-06  2021-08-07
0  中国      3           6           6           9           12          15          19          21          24          29          32          32          34.0        36.0        37.0
1  美国      4           7           9           11          14          14          16          20          22          24          25          29          31.0        31.0        31.0
2  日本      1           5           8           10          13          15          17          18          18          18          19          21          22.0        24.0        24.0
3  澳大利亚  1           2           3           6           8           9           10          14          14          15          17          17          17          17          17

To get data for other countries, just change the countrys list.
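The loop in this section places each country's values into the date columns by position, so a country with fewer rows is padded at the end. If a country's missing day falls anywhere else, its values would land under the wrong dates. The following is a sketch of a date-aligned alternative, not the report's method: it uses made-up values (countries `A` and `B` are hypothetical) and assumes that a missing row means no gold medals that day.

```python
import pandas as pd

# Hypothetical daily data: country B has no row for 2021-07-24.
daily = pd.DataFrame(
    {
        "日期": ["2021-07-24", "2021-07-25", "2021-07-26", "2021-07-25", "2021-07-26"],
        "国家": ["A", "A", "A", "B", "B"],
        "金牌": [3, 3, 0, 4, 3],
    }
)

wide = (
    daily.pivot(index="国家", columns="日期", values="金牌")
    .fillna(0)       # assumption: no row for a day means no gold that day
    .cumsum(axis=1)  # running total across the date columns
    .astype(int)
)

print(wide)
```

Because `pivot` keys every value by its date, country B starts at 0 on 2021-07-24 instead of having its first value shifted into that column.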

3. Plotting with Pyecharts

3.1 Drawing a basic line chart

# Collect China's cumulative gold medal count for each date
CHN = []
x_data = cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家'] == '中国'].values.tolist()[0])

l1 = (
    Line()
    .add_xaxis(x_data)
    # Line for China
    .add_yaxis('中国', CHN, label_opts=opts.LabelOpts(is_show=True))
    .set_global_opts(
        # Title: "China's gold medals"; y-axis name: "gold medals (count)"
        title_opts=opts.TitleOpts(title='中国金牌', pos_left='center'),
        yaxis_opts=opts.AxisOpts(name='金牌/枚', is_scale=True, max_=40),
        legend_opts=opts.LegendOpts(is_show=False),
    )
)
l1.render_notebook()

3.2 Applying styles

# Background color: a vertical gradient from gray to yellow, evaluated by ECharts as JavaScript
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)
# Line style
linestyle_dic = {
    'normal': {
        'width': 4,
        'shadowColor': '#696969',
        'shadowBlur': 10,
        'shadowOffsetY': 10,
        'shadowOffsetX': 10,
    }
}
timeline = Timeline(
    init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px')
)
timeline.add_schema(is_auto_play=True, is_loop_play=True, is_timeline_show=True, play_interval=500)

CHN = []
x_data = cols[1:]
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家'] == '中国'].values.tolist()[0])

line = (
    Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px'))
    .add_xaxis(x_data)
    # Line for China, with the flag image marking the maximum value
    .add_yaxis(
        '中国',
        CHN,
        symbol_size=10,
        is_smooth=True,
        label_opts=opts.LabelOpts(is_show=True),
        markpoint_opts=opts.MarkPointOpts(
            data=[
                opts.MarkPointItem(
                    name="",
                    type_='max',
                    value_index=0,
                    symbol='image://' + flg['中国'],
                    symbol_size=[40, 25],
                )
            ],
            label_opts=opts.LabelOpts(is_show=False),
        ),
    )
    .set_series_opts(
        linestyle_opts=linestyle_dic,
        label_opts=opts.LabelOpts(font_size=12, color='red'),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title='中国金牌',
            pos_left='center',
            pos_top='2%',
            title_textstyle_opts=opts.TextStyleOpts(color='#DC143C', font_size=20),
        ),
        xaxis_opts=opts.AxisOpts(
            axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
            axisline_opts=opts.AxisLineOpts(
                is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
            ),
        ),
        yaxis_opts=opts.AxisOpts(
            name='金牌/枚',
            is_scale=True,
            max_=40,
            name_textstyle_opts=opts.TextStyleOpts(font_size=16, font_weight='bold', color='#FFD700'),
            axislabel_opts=opts.LabelOpts(font_size=13, color='red'),
            splitline_opts=opts.SplitLineOpts(
                is_show=True, linestyle_opts=opts.LineStyleOpts(type_='dashed')
            ),
            axisline_opts=opts.AxisLineOpts(
                is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
            ),
        ),
        legend_opts=opts.LegendOpts(
            is_show=False, pos_right='1.5%', pos_top='2%', legend_icon='roundRect', orient='horizontal'
        ),
    )
)
line.render_notebook()

3.3 Animating China's daily gold medal data

# Background color: a vertical gradient from gray to yellow, evaluated by ECharts as JavaScript
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)
# Line style
linestyle_dic = {
    'normal': {
        'width': 4,
        'shadowColor': '#696969',
        'shadowBlur': 10,
        'shadowOffsetY': 10,
        'shadowOffsetX': 10,
    }
}
timeline = Timeline(
    init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px')
)
timeline.add_schema(is_auto_play=True, is_loop_play=True, is_timeline_show=True, play_interval=500)

CHN = []
x_data = cols[1:]
# Rebuild the chart for every date so that each timeline frame shows the totals up to that day
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家'] == '中国'].values.tolist()[0])
    line = (
        Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px'))
        .add_xaxis(x_data)
        # Line for China, with the flag image marking the maximum value
        .add_yaxis(
            '中国',
            CHN,
            symbol_size=10,
            is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                data=[
                    opts.MarkPointItem(
                        name="",
                        type_='max',
                        value_index=0,
                        symbol='image://' + flg['中国'],
                        symbol_size=[40, 25],
                    )
                ],
                label_opts=opts.LabelOpts(is_show=False),
            ),
        )
        .set_series_opts(
            linestyle_opts=linestyle_dic,
            label_opts=opts.LabelOpts(font_size=12, color='red'),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='中国金牌',
                pos_left='center',
                pos_top='2%',
                title_textstyle_opts=opts.TextStyleOpts(color='#DC143C', font_size=20),
            ),
            xaxis_opts=opts.AxisOpts(
                axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
                axisline_opts=opts.AxisLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
                ),
            ),
            yaxis_opts=opts.AxisOpts(
                name='金牌/枚',
                is_scale=True,
                max_=40,
                name_textstyle_opts=opts.TextStyleOpts(font_size=16, font_weight='bold', color='#FFD700'),
                axislabel_opts=opts.LabelOpts(font_size=13, color='red', rotate=15),
                splitline_opts=opts.SplitLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(type_='dashed')
                ),
                axisline_opts=opts.AxisLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
                ),
            ),
            legend_opts=opts.LegendOpts(
                is_show=True, pos_right='1%', pos_top='2%', legend_icon='roundRect', orient='vertical'
            ),
        )
    )
    # One timeline frame per date
    timeline.add(line, '{}'.format(d_time))
timeline.render_notebook()
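The animation relies on one idea: every timeline frame is a chart built from the values accumulated so far, so the line grows by one point per frame. A library-free sketch of that idea follows; it makes no pyecharts calls, `frames` and `shown` are illustrative names, and the four values are the first cumulative totals for China from section 2.4.

```python
dates = ["2021-07-24", "2021-07-25", "2021-07-26", "2021-07-27"]
totals = [3, 6, 6, 9]  # first cumulative gold totals for China (section 2.4)

frames = []  # one (label, values) pair per timeline frame
shown = []   # values visible so far
for date, value in zip(dates, totals):
    shown.append(value)
    # Store a copy, so later appends do not alter earlier frames.
    frames.append((date, list(shown)))

for label, values in frames:
    print(label, values)
```

In the chart code, the x-axis always carries all dates while the y data holds only the values so far, which is why the line appears to extend to the right as the timeline plays.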

3.4 Adding daily gold medal data for other countries

# Background color: a vertical gradient from gray to yellow, evaluated by ECharts as JavaScript
background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0, 1, "
    "[{offset: 0, color: '#d9d9d9'}, {offset: 1, color: '#ffd966'}], false)"
)
# Line style
linestyle_dic = {
    'normal': {
        'width': 4,
        'shadowColor': '#696969',
        'shadowBlur': 10,
        'shadowOffsetY': 10,
        'shadowOffsetX': 10,
    }
}
timeline = Timeline(
    init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px')
)
timeline.add_schema(is_auto_play=True, is_loop_play=True, is_timeline_show=True, play_interval=500)

CHN, USA, JPN, AUS = [], [], [], []
x_data = cols[1:]
# Rebuild the chart for every date so that each timeline frame shows the totals up to that day
for d_time in cols[1:]:
    CHN.append(d2[d_time][d2['国家'] == '中国'].values.tolist()[0])
    USA.append(d2[d_time][d2['国家'] == '美国'].values.tolist()[0])
    JPN.append(d2[d_time][d2['国家'] == '日本'].values.tolist()[0])
    AUS.append(d2[d_time][d2['国家'] == '澳大利亚'].values.tolist()[0])
    line = (
        Line(init_opts=opts.InitOpts(bg_color=JsCode(background_color_js), width='980px', height='600px'))
        .add_xaxis(x_data)
        # Line for China
        .add_yaxis(
            '中国', CHN, symbol_size=10, is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                data=[opts.MarkPointItem(name="", type_='max', value_index=0,
                                         symbol='image://' + flg['中国'], symbol_size=[40, 25])],
                label_opts=opts.LabelOpts(is_show=False),
            ),
        )
        # Line for the United States
        .add_yaxis(
            '美国', USA, symbol_size=5, is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                data=[opts.MarkPointItem(name="", type_='max', value_index=0,
                                         symbol='image://' + flg['美国'], symbol_size=[40, 25])],
                label_opts=opts.LabelOpts(is_show=False),
            ),
        )
        # Line for Japan
        .add_yaxis(
            '日本', JPN, symbol_size=5, is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                data=[opts.MarkPointItem(name="", type_='max', value_index=0,
                                         symbol='image://' + flg['日本'], symbol_size=[40, 25])],
                label_opts=opts.LabelOpts(is_show=False),
            ),
        )
        # Line for Australia
        .add_yaxis(
            '澳大利亚', AUS, symbol_size=5, is_smooth=True,
            label_opts=opts.LabelOpts(is_show=True),
            markpoint_opts=opts.MarkPointOpts(
                data=[opts.MarkPointItem(name="", type_='max', value_index=0,
                                         symbol='image://' + flg['澳大利亚'], symbol_size=[40, 25])],
                label_opts=opts.LabelOpts(is_show=False),
            ),
        )
        .set_series_opts(linestyle_opts=linestyle_dic)
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='中国 VS 美国 VS 日本 VS 澳大利亚',
                pos_left='center',
                pos_top='2%',
                title_textstyle_opts=opts.TextStyleOpts(color='#DC143C', font_size=20),
            ),
            xaxis_opts=opts.AxisOpts(
                axislabel_opts=opts.LabelOpts(font_size=14, color='red'),
                axisline_opts=opts.AxisLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
                ),
            ),
            yaxis_opts=opts.AxisOpts(
                name='金牌/枚',
                is_scale=True,
                max_=40,
                name_textstyle_opts=opts.TextStyleOpts(font_size=16, font_weight='bold', color='#FFD700'),
                axislabel_opts=opts.LabelOpts(font_size=13, color='red', rotate=15),
                splitline_opts=opts.SplitLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(type_='dashed')
                ),
                axisline_opts=opts.AxisLineOpts(
                    is_show=True, linestyle_opts=opts.LineStyleOpts(width=2, color='#DB7093')
                ),
            ),
            legend_opts=opts.LegendOpts(
                is_show=True, pos_right='1%', pos_top='2%', legend_icon='roundRect', orient='vertical'
            ),
        )
    )
    # One timeline frame per date
    timeline.add(line, '{}'.format(d_time))
timeline.render_notebook()
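The four per-country lists in this section are filled with one lookup per country per date. As a possible simplification, sketched here on a small stand-in for d2 (`d2_sample` and `series_by_country` are illustrative names; the values are the first three dates of two rows from section 2.4), the whole table can be turned into a dictionary of lists in one step:

```python
import pandas as pd

# Small stand-in for d2: first three dates of two rows from section 2.4.
d2_sample = pd.DataFrame(
    [["中国", 3, 6, 6], ["美国", 4, 7, 9]],
    columns=["国家", "2021-07-24", "2021-07-25", "2021-07-26"],
)

# Index by country, transpose so each country becomes a column,
# then convert every column to a plain list.
series_by_country = d2_sample.set_index("国家").T.to_dict("list")

print(series_by_country)
# → {'中国': [3, 6, 6], '美国': [4, 7, 9]}
```

Each timeline frame could then slice these lists (for example `values[:i + 1]` for frame `i`) instead of appending to four separate variables, which also makes it easier to change the country list.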
