StarRocks之物化视图实践

同步物化视图

  • 基础数据准备
    以下示例基于表 sales_records,其中包含每笔交易的交易 ID record_id、销售员 seller_id、售卖门店 store_id、销售时间 sale_date 以及销售额 sale_amt。
  • 建表
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    CREATE TABLE sales_records(
    record_id INT,
    seller_id INT,
    store_id INT,
    sale_date DATE,
    sale_amt BIGINT
    ) DISTRIBUTED BY HASH(record_id);

    INSERT INTO sales_records
    VALUES
    (001,01,1,"2022-03-13",8573),
    (002,02,2,"2022-03-14",6948),
    (003,01,1,"2022-03-14",4319),
    (004,03,3,"2022-03-15",8734),
    (005,03,3,"2022-03-16",4212),
    (006,02,2,"2022-03-17",9515);
  • 业务需求
    该示例业务场景需要频繁分析不同门店的销售额,则查询需要大量调用 sum() 函数,耗费大量系统资源。可使用 EXPLAIN 命令查看此查询的 Query Profile。
    1
    2
    3
    SELECT store_id, SUM(sale_amt)
    FROM sales_records
    GROUP BY store_id;
    其 Query Profile 中的 rollup 项显示为 sales_records(即基表),说明该查询未使用物化视图加速。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    MySQL > EXPLAIN SELECT store_id, SUM(sale_amt)
    FROM sales_records
    GROUP BY store_id;
    +-----------------------------------------------------------------------------+
    | Explain String |
    +-----------------------------------------------------------------------------+
    | PLAN FRAGMENT 0 |
    | OUTPUT EXPRS:3: store_id | 6: sum |
    | PARTITION: UNPARTITIONED |
    | |
    | RESULT SINK |
    | |
    | 4:EXCHANGE |
    | |
    | PLAN FRAGMENT 1 |
    | OUTPUT EXPRS: |
    | PARTITION: HASH_PARTITIONED: 3: store_id |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 04 |
    | UNPARTITIONED |
    | |
    | 3:AGGREGATE (merge finalize) |
    | | output: sum(6: sum) |
    | | group by: 3: store_id |
    | | |
    | 2:EXCHANGE |
    | |
    | PLAN FRAGMENT 2 |
    | OUTPUT EXPRS: |
    | PARTITION: RANDOM |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 02 |
    | HASH_PARTITIONED: 3: store_id |
    | |
    | 1:AGGREGATE (update serialize) |
    | | STREAMING |
    | | output: sum(5: sale_amt) |
    | | group by: 3: store_id |
    | | |
    | 0:OlapScanNode |
    | TABLE: sales_records |
    | PREAGGREGATION: ON |
    | partitions=1/1 |
    | rollup: sales_records <------ |
    | tabletRatio=10/10 |
    | tabletList=12049,12053,12057,12061,12065,12069,12073,12077,12081,12085 |
    | cardinality=1 |
    | avgRowSize=2.0 |
    | numNodes=0 |
    +-----------------------------------------------------------------------------+
    45 rows in set (0.00 sec)
  • 创建同步物化视图
    1
    2
    3
    4
    5
    CREATE MATERIALIZED VIEW store_amt 
    AS
    SELECT store_id, SUM(sale_amt)
    FROM sales_records
    GROUP BY store_id;
  • 验证查询是否命中同步物化视图
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    MySQL > EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
    +-----------------------------------------------------------------------------+
    | Explain String |
    +-----------------------------------------------------------------------------+
    | PLAN FRAGMENT 0 |
    | OUTPUT EXPRS:3: store_id | 6: sum |
    | PARTITION: UNPARTITIONED |
    | |
    | RESULT SINK |
    | |
    | 4:EXCHANGE |
    | |
    | PLAN FRAGMENT 1 |
    | OUTPUT EXPRS: |
    | PARTITION: HASH_PARTITIONED: 3: store_id |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 04 |
    | UNPARTITIONED |
    | |
    | 3:AGGREGATE (merge finalize) |
    | | output: sum(6: sum) |
    | | group by: 3: store_id |
    | | |
    | 2:EXCHANGE |
    | |
    | PLAN FRAGMENT 2 |
    | OUTPUT EXPRS: |
    | PARTITION: RANDOM |
    | |
    | STREAM DATA SINK |
    | EXCHANGE ID: 02 |
    | HASH_PARTITIONED: 3: store_id |
    | |
    | 1:AGGREGATE (update serialize) |
    | | STREAMING |
    | | output: sum(5: sale_amt) |
    | | group by: 3: store_id |
    | | |
    | 0:OlapScanNode |
    | TABLE: sales_records |
    | PREAGGREGATION: ON |
    | partitions=1/1 |
    | rollup: store_amt <------ |
    | tabletRatio=10/10 |
    | tabletList=12092,12096,12100,12104,12108,12112,12116,12120,12124,12128 |
    | cardinality=6 |
    | avgRowSize=2.0 |
    | numNodes=0 |
    +-----------------------------------------------------------------------------+
    45 rows in set (0.00 sec)
    可以看到,此时 Query Profile 中的 rollup 项显示为 store_amt(即同步物化视图),说明该查询已命中同步物化视图。

异步物化视图

  • 基础数据准备
    以下示例基于 Default Catalog 中的两张基表:
    • 表 goods 包含产品 ID item_id1、产品名称 item_name 和产品价格 price。
    • 表 order_list 包含订单 ID order_id、客户 ID client_id 和产品 ID item_id2。
    • 其中 item_id1 与 item_id2 等价。
  • 建表
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    CREATE TABLE goods(
    item_id1 INT,
    item_name STRING,
    price FLOAT
    ) DISTRIBUTED BY HASH(item_id1);

    INSERT INTO goods
    VALUES
    (1001,"apple",6.5),
    (1002,"pear",8.0),
    (1003,"potato",2.2);

    CREATE TABLE order_list(
    order_id INT,
    client_id INT,
    item_id2 INT,
    order_date DATE
    ) DISTRIBUTED BY HASH(order_id);

    INSERT INTO order_list
    VALUES
    (10001,101,1001,"2022-03-13"),
    (10001,101,1002,"2022-03-13"),
    (10002,103,1002,"2022-03-13"),
    (10002,103,1003,"2022-03-14"),
    (10003,102,1003,"2022-03-14"),
    (10003,102,1001,"2022-03-14");
  • 业务需求
    该示例业务场景需要频繁分析订单总额,则查询需要将两张表关联并调用 sum() 函数,根据订单 ID 和总额生成一张新表。除此之外,该业务场景需要每天刷新订单总额。
    1
    2
    3
    4
    5
    SELECT
    order_id,
    sum(goods.price) as total
    FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
    GROUP BY order_id;
  • 创建异步物化视图
    以下示例根据上述查询语句,基于表 goods 和表 order_list 创建一个“以订单 ID 为分组,对订单中所有商品价格求和”的异步物化视图,并设定其刷新方式为 ASYNC,每天自动刷新。
    1
    2
    3
    4
    5
    6
    7
    8
    CREATE MATERIALIZED VIEW order_mv
    DISTRIBUTED BY HASH(`order_id`)
    REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
    AS SELECT
    order_list.order_id,
    sum(goods.price) as total
    FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
    GROUP BY order_id;
  • 管理异步物化视图
    • 查询异步物化视图
      1
      2
      3
      4
      5
      6
      7
      8
      9
      MySQL > SELECT * FROM order_mv;
      +----------+--------------------+
      | order_id | total |
      +----------+--------------------+
      | 10001 | 14.5 |
      | 10002 | 10.200000047683716 |
      | 10003 | 8.700000047683716 |
      +----------+--------------------+
      3 rows in set (0.01 sec)
    • 修改异步物化视图
      1
      2
      3
      4
      5
      6
      // 启用被禁用的异步物化视图(将物化视图的状态设置为 Active)。
      ALTER MATERIALIZED VIEW order_mv ACTIVE;
      // 修改异步物化视图名称为 order_total
      ALTER MATERIALIZED VIEW order_mv RENAME order_total;
      // 修改异步物化视图的最大刷新间隔为 2 天。
      ALTER MATERIALIZED VIEW order_mv REFRESH ASYNC EVERY(INTERVAL 2 DAY);
    • 查看异步物化视图
      1
      2
      3
      SHOW MATERIALIZED VIEWS;
      SHOW MATERIALIZED VIEWS WHERE NAME = "order_mv";
      SHOW MATERIALIZED VIEWS WHERE NAME LIKE "order%";
    • 查看异步物化视图创建语句
      1
      SHOW CREATE MATERIALIZED VIEW order_mv;
    • 查看异步物化视图的执行状态
      1
      select * from information_schema.tasks  order by CREATE_TIME desc limit 1\G;
    • 删除异步物化视图
      1
      DROP MATERIALIZED VIEW order_mv;