StarRocks之物化视图实践
同步物化视图
- 基础数据准备
以下示例基于表 sales_records,其中包含每笔交易的交易 ID record_id、销售员 seller_id、售卖门店 store_id、销售时间 sale_date 以及销售额 sale_amt。 - 建表
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16CREATE TABLE sales_records(
record_id INT,
seller_id INT,
store_id INT,
sale_date DATE,
sale_amt BIGINT
) DISTRIBUTED BY HASH(record_id);
INSERT INTO sales_records
VALUES
(001,01,1,"2022-03-13",8573),
(002,02,2,"2022-03-14",6948),
(003,01,1,"2022-03-14",4319),
(004,03,3,"2022-03-15",8734),
(005,03,3,"2022-03-16",4212),
(006,02,2,"2022-03-17",9515); - 业务需求
该示例业务场景需要频繁分析不同门店的销售额,则查询需要大量调用 sum() 函数,耗费大量系统资源。可使用 EXPLAIN 命令查看此查询的 Query Profile。其 Query Profile 中的 rollup 项显示为 sales_records(即基表),说明该查询未使用物化视图加速。1
2
3SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53MySQL > EXPLAIN SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: sales_records <------ |
| tabletRatio=10/10 |
| tabletList=12049,12053,12057,12061,12065,12069,12073,12077,12081,12085 |
| cardinality=1 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec) - 创建同步物化视图
1
2
3
4
5CREATE MATERIALIZED VIEW store_amt
AS
SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id; - 验证查询是否命中同步物化视图可以看到,此时 Query Profile 中的 rollup 项显示为 store_amt(即同步物化视图),说明该查询已命中同步物化视图。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51MySQL > EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: store_amt <------ |
| tabletRatio=10/10 |
| tabletList=12092,12096,12100,12104,12108,12112,12116,12120,12124,12128 |
| cardinality=6 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)
异步物化视图
- 基础数据准备
以下示例基于 Default Catalog 中的两张基表:- 表 goods 包含产品 ID item_id1、产品名称 item_name 和产品价格 price。
- 表 order_list 包含订单 ID order_id、客户 ID client_id 和产品 ID item_id2。
- 其中 item_id1 与 item_id2 等价。
- 建表
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27CREATE TABLE goods(
item_id1 INT,
item_name STRING,
price FLOAT
) DISTRIBUTED BY HASH(item_id1);
INSERT INTO goods
VALUES
(1001,"apple",6.5),
(1002,"pear",8.0),
(1003,"potato",2.2);
CREATE TABLE order_list(
order_id INT,
client_id INT,
item_id2 INT,
order_date DATE
) DISTRIBUTED BY HASH(order_id);
INSERT INTO order_list
VALUES
(10001,101,1001,"2022-03-13"),
(10001,101,1002,"2022-03-13"),
(10002,103,1002,"2022-03-13"),
(10002,103,1003,"2022-03-14"),
(10003,102,1003,"2022-03-14"),
(10003,102,1001,"2022-03-14"); - 业务需求
该示例业务场景需要频繁分析订单总额,则查询需要将两张表关联并调用 sum() 函数,根据订单 ID 和总额生成一张新表。除此之外,该业务场景需要每天刷新订单总额。1
2
3
4
5SELECT
order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id; - 创建异步物化视图
以下示例根据上述查询语句,基于表 goods 和表 order_list 创建一个“以订单 ID 为分组,对订单中所有商品价格求和”的异步物化视图,并设定其刷新方式为 ASYNC,每天自动刷新。1
2
3
4
5
6
7
8CREATE MATERIALIZED VIEW order_mv
DISTRIBUTED BY HASH(`order_id`)
REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
AS SELECT
order_list.order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id; - 管理异步物化视图
- 查询异步物化视图
1
2
3
4
5
6
7
8
9MySQL > SELECT * FROM order_mv;
+----------+--------------------+
| order_id | total |
+----------+--------------------+
| 10001 | 14.5 |
| 10002 | 10.200000047683716 |
| 10003 | 8.700000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec) - 修改异步物化视图
1
2
3
4
5
6// 启用被禁用的异步物化视图(将物化视图的状态设置为 Active)。
ALTER MATERIALIZED VIEW order_mv ACTIVE;
// 修改异步物化视图名称为 order_total
ALTER MATERIALIZED VIEW order_mv RENAME order_total;
// 修改异步物化视图的最大刷新间隔为 2 天。
ALTER MATERIALIZED VIEW order_mv REFRESH ASYNC EVERY(INTERVAL 2 DAY); - 查看异步物化视图
1
2
3SHOW MATERIALIZED VIEWS;
SHOW MATERIALIZED VIEWS WHERE NAME = "order_mv";
SHOW MATERIALIZED VIEWS WHERE NAME LIKE "order%"; - 查看异步物化视图创建语句
1
SHOW CREATE MATERIALIZED VIEW order_mv;
- 查看异步物化视图的执行状态
1
select * from information_schema.tasks order by CREATE_TIME desc limit 1\G;
- 删除异步物化视图
1
DROP MATERIALIZED VIEW order_mv;
- 查询异步物化视图