MYSQL PARTITIONING分区操作和性能测试

devtools/2025/2/19 16:56:42/

PARTITION OR NOT PARTITION IN MYSQl

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to RICK JAMES: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
let’s find out what’s going on by building a test case

TWO TABLES READY

How many partitions? views from Rick James: Have 20-50 partitions; no more.
In this page, we do 10 partitions
Remember: Always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time))
PARTITION BY RANGE( YEAR(created_time) ) (PARTITION p16 VALUES less than (2016),PARTITION p17 VALUES less than (2017),PARTITION p18 VALUES less than (2018),PARTITION p19 VALUES less than (2019),PARTITION p20 VALUES less than (2020),PARTITION p21 VALUES less than (2021),PARTITION p22 VALUES less than (2022),PARTITION p23 VALUES less than (2023),PARTITION p24 VALUES less than (2024),PARTITION p25 VALUES less than (2025)
) ;
  1. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time));

Create millions of rows

For test case, each table holds 10 millions of rows
If using mysql to insert, example 2 is better than example 1

-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),("data1", "data2"),("data1", "data2");

Add large data with tools

from faker import Faker
import randomdef insert_large_data(nums=10):fake = Faker()data = [(random.random(), random.random(), random.random(),str(fake.date_time_between(start_date='-10y', end_date='now'))) for i in range(nums)]cursor = connection.cursor()sql = f"INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"# execute sql with your idea tool

DB-status

partition table take extra files to preserve data, also, extra disk space
请添加图片描述
partition table
请添加图片描述

TEST RESULTS WITHOUT EXTRA INDEX(created_time)

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where

FROM:mysqlslap

# partition_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.156 secondsMinimum number of seconds to run all queries: 0.156 secondsMaximum number of seconds to run all queries: 0.156 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.172 secondsMinimum number of seconds to run all queries: 0.172 secondsMaximum number of seconds to run all queries: 0.172 secondsNumber of clients running queries: 10Average number of queries per client: 10

In general, it is expected that fewer touched rows would result in less time for query execution.
since this query only required limit rows under condition without order, mysql optimizer is doing a good job here.
the worse case for the full table is that do a full table scan, but to get just 100 target rows from random data, much less time is needed.

however, if we put a order by in where clause, things will be a huge different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where; Using filesort
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where; Using filesort

FROM:mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 4.931 secondsMinimum number of seconds to run all queries: 4.931 secondsMaximum number of seconds to run all queries: 4.931 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 54.652 secondsMinimum number of seconds to run all queries: 54.652 secondsMaximum number of seconds to run all queries: 54.652 secondsNumber of clients running queries: 10Average number of queries per client: 10

A huge time gap between two queries.
what’ going on?
under condition of “order by”
a full table needs a full table-field sort, that’s cost a lot,
a partition table only need to sort a partition after located target partition.
we always say: test your real case, by this way, you find your circumstance to do a partition table.

WHY:In most circumstances, you’re better off using indexes instead of partitioning

the test are not done yet
From mysql explain, the extra field print a message: “Using filesort”
normally, you should considering a index here to improve performance: MYSQL: explain-extra-information

let’s add a index

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition; Using MRR

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.168 secondsMinimum number of seconds to run all queries: 0.168 secondsMaximum number of seconds to run all queries: 0.168 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.368 secondsMinimum number of seconds to run all queries: 0.368 secondsMaximum number of seconds to run all queries: 0.368 secondsNumber of clients running queries: 10Average number of queries per client: 10

again: In general, it is expected that fewer touched rows would result in less time for query execution.
new queries cost a little more time than without extra index.
what happens? explain shows “condition index” are being used here.
stop here, it’s not how indexes are introduced.
sometimes, index is not help if the goal was retrieve 100 target rows. the worst case, yes, but not all.

let’s put a “order by” to see the magic

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.162 secondsMinimum number of seconds to run all queries: 0.162 secondsMaximum number of seconds to run all queries: 0.162 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.185 secondsMinimum number of seconds to run all queries: 0.185 secondsMaximum number of seconds to run all queries: 0.185 secondsNumber of clients running queries: 10Average number of queries per client: 10

same touched rows as no “order by”.
but the time cost of queries are getting really closed.
makes sense “In this circumstance, you’re better off using indexes instead of partitioning”.
after all, there are different types of queries were influenced and Maintenance of PARTITION is also a big thing.
For example: select count() is much slower for partition tables. unless doing a partition count()

more tests?
let’s stop here

table vs (better view)

key/typepartitionnormalpartition+ordernormal+orderpartition+indexnormal+indexpartition+order+indexnormal+order+index
diskspace~590m~540m~590m~540m~750m~700m~750m~700m
mysqlslap-benchmark0.156s0.172s4.931s54.652s0.168s0.368s0.162s0.185s
mysql-explain-touched-rows9116259747207911625974720745581226417844558122641784
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

mysqlslap__mysql_workbench_271">POINTS BASED ON TEST(mysqlslap & mysql workbench)

  1. Index works good without partitioning, most of cases even better
  2. Under condition of range query by partition field, partitioning tables works good indeed
  3. drop partitions is much more efficient when doing a big delete
  4. if queries use specific partition, performance will better

Other Points Related & documents & Links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning without million rows, Only BY RANGE provides any performance…
  3. index order(DESC or ASC) is also important
  4. mysqlslap–benchmark tool
  5. questions about partition

http://www.ppmy.cn/devtools/140670.html

相关文章

3D 生成重建015-Feature 3DGS理解3DGS场景内的一切

33D 生成重建015-Feature 3DGS理解3DGS场景内的一切 文章目录 0 论文工作1 方法介绍2 实验效果 0 论文工作 论文提出了一种名为“Feature 3DGS”的方法,该方法通过整合二维基础模型(如SAM和CLIP-LSeg)的特征场蒸馏,显著增强了三维…

CSS 属性

CSS 属性 和 字体(font) 相关的属性 font-size : 设置 字体的大小 ,值支持 像素 、 em (相对自身px单位) 、 rem (相对根元素) 、 pt (打印) 、 支持 百分比 10px80% font-weight : 设置 字体的 粗细 、支持 数字 和 单词 100 ,200, 300, 4…

LabVIEW-简单串口助手

LabVIEW-简单串口助手 串口函数VISA配置串口VISA写入函数VISA读取函数VISA资源名称按名称解除捆绑 函数存放位置思维导图主体界面为以下 串口函数 VISA配置串口 VISA写入函数 VISA读取函数 VISA资源名称 按名称解除捆绑 函数存放位置 思维导图 主体界面为以下 从创建好的“枚举…

【热门主题】000075 探索嵌入式硬件设计的奥秘

前言:哈喽,大家好,今天给大家分享一篇文章!并提供具体代码帮助大家深入理解,彻底掌握!创作不易,如果能帮助到大家或者给大家一些灵感和启发,欢迎收藏关注哦 💕 目录 【热…

[Flutter]Json序列化json_serializable使用属性全介绍

结合上一篇介绍[Flutter]Json和序列化数据,看浏览关注的人还是很多。这里出一篇详细介绍json_serializable的属性参数的解析说明。此文章根据目前最新版本json_serializable: ^6.9.0介绍 ,下面开始: 一、一般使用介绍 // json_serializable…

小红薯x-s算法最新补环境教程12-06更新(下)

在上一篇文章中已经讲了如何去定位x-s生成的位置,本篇文章就直接开始撸代码吧 如果没看过的话可以看:小红薯最新x-s算法分析12-06(x-s 56)(上)-CSDN博客 1、获取加密块代码 首先来到参数生成的位置&…

华为云域名网站修改DNS服务器教程

修改单个域名的DNS服务器 登录域名注册控制台。 进入“域名列表”页面。 在域名列表中,单击“域名”列的待修改DNS服务器的域名。 进入域名信息页面。 图1 域名信息 在域名信息页面,单击“DNS服务器”后的“修改”,进入“修改DNS服务器”页面。 图2 修改DNS服务器 在…

Day7 苍穹外卖项目 缓存菜品、SpringCache框架、缓存套餐、添加购物车、查看购物车、清空购物车

目录 1.缓存菜品 1.1 问题说明 1.2 实现思路 1.3 代码开发 1.3.1 加入缓存 1.3.2 清除缓存 1.3.2.1 新增菜品优化 1.3.2.2 菜品批量删除优化 1.3.2.3 修改菜品优化 1.3.2.4 菜品起售停售优化 1.4 功能测试 1.4.1 加入缓存 1.4.2 菜品修改 1.5 代码提交 2.缓存套餐 2.1 Spring C…