背景

验证一轮查询外部数据湖

Catalog

JDBC ✅已验证

CREATE EXTERNAL CATALOG jdbc1
PROPERTIES
(
    "type"="jdbc",
    "user"="root",
    "password"="XXX",
    "jdbc_uri"="jdbc:mysql://172.16.0.9:3306",
    "driver_url"="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
    "driver_class"="com.mysql.cj.jdbc.Driver"
);

存算一体+自建es ✅已验证

基于腾讯云cos存算分离 + 腾讯云es产品 ✅已验证

存在问题，当使用腾讯云默认索引开了动态模板，starrocks解析mapping有bug，会导致show tables出不来

CREATE EXTERNAL CATALOG es_test
COMMENT 'es'
PROPERTIES
(
    "type" = "es",
    "es.type" = "_doc",
    "hosts" = "http://172.16.0.8:9200",
    "es.net.ssl" = "false",
    "user" = "elastic",
    "password" = "xxx",
    "es.nodes.wan.only" = "false"
);

EXTERNAL TABLE

COS文件外部表 ❌验证失败

CREATE EXTERNAL TABLE user_behavior_fet
(
    UserID int, 
    ItemID int,
    CategoryID int,
    BehaviorType string
) 
ENGINE=file
PROPERTIES 
(
    "path" = "s3://race-1301288126/input/user_behavior_ten_million_rows.parquet", 
    "format" = "parquet",
    "aws.s3.enable_ssl" = "false",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.endpoint" = "cos.ap-guangzhou.myqcloud.com",
    "aws.s3.access_key" = "AKIDAFICxeJwnror5OarCRYXQMF1f5X7lJtO",
    "aws.s3.secret_key" = "XXX"
);

StarRocks之分区分桶副本数

发表于 2024-01-20 更新于 2024-01-28 阅读次数：

在现代数据库中，很多数据库都支持分区（Partition）或分桶（Tablet，分桶有时候又叫分片），它的主要目的是提高查询性能。
StarRocks 使用先分区后分桶的方式，可灵活地支持两种分布方式:

Hash分布: 不采用分区方式，整个table作为一个分区，只需指定分桶的数量。
Range-Hash组合数据分布: 设置分区，指定每个分区的分桶数量。
StarRocks 同时支持分区和分桶，若干个Tablet组成一个Partition 。

分区

分区的主要作⽤是将⼀张表按照分区键拆分成不同的管理单元，针对每⼀个管理单元选择相应的存储策略，⽐如副本数、冷热策略和存储介质等等。对于访问频率高的分区，可以使用SSD存储；对于访问频率低的分区，可以使用STAT存储。选择合理的分区键可以有效的裁剪扫描的数据量，一般选择日期或者区域作为分区键

PARTITION BY RANGE()
在实际应用中，用户一般选取时间列作为分区键，具体划分的粒度视数据量而定，单个分区原始数据量建议维持在100GB以内。

分桶

分桶的目的就是将数据打散为一个个逻辑分片（Tablet），以Tablet作为数据均衡的最小单位，使数据尽量均匀的分布在集群的各个BE节点上，以便在查询时充分发挥集群多机多核的优势。

DISTRIBUTED BY HASH()
对每个分区的数据，StarRocks还会再进行Hash分桶。我们在建表时通过DISTRIBUTED BY HASH()语句来设置分桶
在StarRocks的存储引擎中，用户数据被水平划分为若干个数据分片（Tablet，也称作数据分桶）。每个 Tablet包含若干数据行，各个Tablet之间的数据没有交集，并且在物理上是独立存储的。
多个Tablet在逻辑上归属于不同的分区（Partition）。一个Tablet只属于一个Partition，而一个Partition包含若干个 Tablet。因为Tablet在物理上是独立存储的，所以可以视为Partition在物理上也是独立。Tablet是数据移动、复制等操作的最小物理存储单元。
若干个Partition组成一个Table。Partition可以视为是逻辑上最小的管理单元，数据的导入与删除，都可以或仅能针对一个Partition进行。

副本数

StarRocks中的副本数就是同一个Tablet保存的份数，在建表时通过replication_num参数指定，也可以后面修改。默认不指定时，StarRocks使用三副本建表，也即每个Tablet会在不同节点存储三份（StarRocks的副本策略会将某个tablet的副本存储在与其不同IP的节点）。

为方便理解，我们假设当前有一个3BE节点的集群，有表Table A和Table B，表A和表B建表时都未设置分区（视为一个大分区），分桶数为3，副本数replication_num为2，则表A和表B在集群中数据分布的一种可能如下图：
yy05cc

案例讲解

如下：StarRocks 的数据划分以及 Tablet 多副本机制

dSQ1Ju

表按照日期划分为4个分区，第一个分区切分成4个Tablet。每个Tablet使用3副本进行备份，分布在3个不同的BE节点上。
由于一张表被切分成了多个Tablet，StarRocks在执行SQL语句时，可以对所有Tablet实现并发处理，从而充分的利用多机、多核提供的计算能力。
用户也可以利用StarRocks数据的切分方式，将高并发请求压力分摊到多个物理节点，从而可以通过增加物理节点的方式来扩展系统支持高并发的能力。

总结一下

在StarRocks中，Partition是数据导入和备份恢复的最小逻辑单位，Tablet是数据复制和均衡的最小物理单位。表（Table）、分区（Partition）、逻辑分片（Tablet）的关系如下图：
bPDqhg

分区是针对表的，是对表的数据取段。分桶是针对每个分区的，会将分区后的每段数据打散为逻辑分片Tablet。副本数是针对Tablet的，是指Tablet保存的份数。那么我们不难发现，对某一个数据表，若每个分区的分桶数一致，其总Tablet数：总Tablet数=分区数分桶数副本数

以table01为例，我们为其设置了3个分区，为每个分区设置了20个分桶，又对分桶后的tablet设置了1副本，则table01的总tablet数= 3 * 20 * 1 = 60 个。查看table01的tablet信息，发现确实共有60个tablet：

1 2	show tablet from table01; 60 rows in set (0.01 sec)

背景及数据准备

本次仅记录一些常见的导入

纽约市交通事故数据

1	curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv

天气数据

1	curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv

用户行为数据集

1	curl -O https://starrocks-datasets.s3.amazonaws.com/user_behavior_ten_million_rows.parquet

file1.csv

1,Lily,21
2,Rose,22
3,Alice,23
4,Julia,24

file2.csv

5,Tony,25
6,Adam,26
7,Allen,27
8,Jacky,28

file3.json
1
{"name": "北京", "code": 2}

从本地文件系统导入(Stream Load)

TIP:http://<fe_host>:<fe_http_port>默认端口号为8030; 也可以是http://<be_host>:<be_http_port>默认端口号为 8040。

table1表或table2表

创建table1表

CREATE TABLE `table1`
   (
       `id` int(11) NOT NULL COMMENT "用户 ID",
       `name` varchar(65533) NULL DEFAULT "" COMMENT "用户姓名",
       `score` int(11) NOT NULL DEFAULT "0" COMMENT "用户得分"
   )
       ENGINE=OLAP
       PRIMARY KEY(`id`)
       DISTRIBUTED BY HASH(`id`);

导入file1.json

curl --location-trusted -u <username>:<password> -H "label:123" \
    -H "Expect:100-continue" \
    -H "column_separator:," \
    -H "columns: id, name, score" \
    -T file1.csv -XPUT \
    http://<fe_host>:<fe_http_port>/api/import_example/table1/_stream_load

table_json表

创建 table_json表

CREATE TABLE `table_json`
(
    `id` int(11) NOT NULL COMMENT "城市 ID",
    `city` varchar(65533) NULL COMMENT "城市名称"
)
ENGINE=OLAP
PRIMARY KEY(`id`)
DISTRIBUTED BY HASH(`id`);

导入file3.json

curl -v --location-trusted -u <username>:<password> -H "strict_mode: true" \
    -H "Expect:100-continue" \
    -H "format: json" -H "jsonpaths: [\"$.name\", \"$.code\"]" \
    -H "columns: city,tmp_id, id = tmp_id * 100" \
    -T file3.json -XPUT \
    http://<fe_host>:<fe_http_port>/api/import_example/table_json/_stream_load

crashdata 表

创建 crashdata 表，用于存储交通事故数据集中的数据。该表的字段经过裁剪，仅包含与该教程相关字段。

CREATE DATABASE IF NOT EXISTS import_example;
USE import_example;

CREATE TABLE IF NOT EXISTS crashdata (
    CRASH_DATE DATETIME,
    BOROUGH STRING,
    ZIP_CODE STRING,
    LATITUDE INT,
    LONGITUDE INT,
    LOCATION STRING,
    ON_STREET_NAME STRING,
    CROSS_STREET_NAME STRING,
    OFF_STREET_NAME STRING,
    CONTRIBUTING_FACTOR_VEHICLE_1 STRING,
    CONTRIBUTING_FACTOR_VEHICLE_2 STRING,
    COLLISION_ID INT,
    VEHICLE_TYPE_CODE_1 STRING,
    VEHICLE_TYPE_CODE_2 STRING
);

导入纽约市交通事故数据

curl --location-trusted -u root             \
    -T ./NYPD_Crash_Data.csv                \
    -H "label:crashdata-0"                  \
    -H "column_separator:,"                 \
    -H "skip_header:1"                      \
    -H "enclose:\""                         \
    -H "max_filter_ratio:1"                 \
    -H "columns:tmp_CRASH_DATE, tmp_CRASH_TIME, CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5" \
    -XPUT http://<fe_host>:<fe_http_port>/api/import_example/crashdata/_stream_load

weatherdata 表

创建 weatherdata 表，用于存储天气数据集中的数据。该表的字段同样经过裁剪，仅包含与该教程相关字段。

CREATE DATABASE IF NOT EXISTS import_example;
USE import_example;
CREATE TABLE IF NOT EXISTS weatherdata (
    DATE DATETIME,
    NAME STRING,
    HourlyDewPointTemperature STRING,
    HourlyDryBulbTemperature STRING,
    HourlyPrecipitation STRING,
    HourlyPresentWeatherType STRING,
    HourlyPressureChange STRING,
    HourlyPressureTendency STRING,
    HourlyRelativeHumidity STRING,
    HourlySkyConditions STRING,
    HourlyVisibility STRING,
    HourlyWetBulbTemperature STRING,
    HourlyWindDirection STRING,
    HourlyWindGustSpeed STRING,
    HourlyWindSpeed STRING
);

导入天气数据

curl --location-trusted -u root             \
    -T ./72505394728.csv                    \
    -H "label:weather-0"                    \
    -H "column_separator:,"                 \
    -H "skip_header:1"                      \
    -H "enclose:\""                         \
    -H "max_filter_ratio:1"                 \
    -H "columns: STATION, DATE, LATITUDE, LONGITUDE, ELEVATION, NAME, REPORT_TYPE, SOURCE, HourlyAltimeterSetting, HourlyDewPointTemperature, HourlyDryBulbTemperature, HourlyPrecipitation, HourlyPresentWeatherType, HourlyPressureChange, HourlyPressureTendency, HourlyRelativeHumidity, HourlySkyConditions, HourlySeaLevelPressure, HourlyStationPressure, HourlyVisibility, HourlyWetBulbTemperature, HourlyWindDirection, HourlyWindGustSpeed, HourlyWindSpeed, Sunrise, Sunset, DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature, DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure, DailyAverageStationPressure, DailyAverageWetBulbTemperature, DailyAverageWindSpeed, DailyCoolingDegreeDays, DailyDepartureFromNormalAverageTemperature, DailyHeatingDegreeDays, DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature, DailyPeakWindDirection, DailyPeakWindSpeed, DailyPrecipitation, DailySnowDepth, DailySnowfall, DailySustainedWindDirection, DailySustainedWindSpeed, DailyWeather, MonthlyAverageRH, MonthlyDaysWithGT001Precip, MonthlyDaysWithGT010Precip, MonthlyDaysWithGT32Temp, MonthlyDaysWithGT90Temp, MonthlyDaysWithLT0Temp, MonthlyDaysWithLT32Temp, MonthlyDepartureFromNormalAverageTemperature, MonthlyDepartureFromNormalCoolingDegreeDays, MonthlyDepartureFromNormalHeatingDegreeDays, MonthlyDepartureFromNormalMaximumTemperature, MonthlyDepartureFromNormalMinimumTemperature, MonthlyDepartureFromNormalPrecipitation, MonthlyDewpointTemperature, MonthlyGreatestPrecip, MonthlyGreatestPrecipDate, MonthlyGreatestSnowDepth, MonthlyGreatestSnowDepthDate, MonthlyGreatestSnowfall, MonthlyGreatestSnowfallDate, MonthlyMaxSeaLevelPressureValue, MonthlyMaxSeaLevelPressureValueDate, MonthlyMaxSeaLevelPressureValueTime, MonthlyMaximumTemperature, MonthlyMeanTemperature, MonthlyMinSeaLevelPressureValue, MonthlyMinSeaLevelPressureValueDate, MonthlyMinSeaLevelPressureValueTime, MonthlyMinimumTemperature, MonthlySeaLevelPressure, MonthlyStationPressure, MonthlyTotalLiquidPrecipitation, MonthlyTotalSnowfall, MonthlyWetBulb, AWND, CDSD, CLDD, DSNW, HDSD, HTDD, NormalsCoolingDegreeDay, NormalsHeatingDegreeDay, ShortDurationEndDate005, ShortDurationEndDate010, ShortDurationEndDate015, ShortDurationEndDate020, ShortDurationEndDate030, ShortDurationEndDate045, ShortDurationEndDate060, ShortDurationEndDate080, ShortDurationEndDate100, ShortDurationEndDate120, ShortDurationEndDate150, ShortDurationEndDate180, ShortDurationPrecipitationValue005, ShortDurationPrecipitationValue010, ShortDurationPrecipitationValue015, ShortDurationPrecipitationValue020, ShortDurationPrecipitationValue030, ShortDurationPrecipitationValue045, ShortDurationPrecipitationValue060, ShortDurationPrecipitationValue080, ShortDurationPrecipitationValue100, ShortDurationPrecipitationValue120, ShortDurationPrecipitationValue150, ShortDurationPrecipitationValue180, REM, BackupDirection, BackupDistance, BackupDistanceUnit, BackupElements, BackupElevation, BackupEquipment, BackupLatitude, BackupLongitude, BackupName, WindEquipmentChangeDate" \
    -XPUT http://<fe_host>:<fe_http_port>/api/import_example/weatherdata/_stream_load

从云储存导入 ✅

将文件放到cos的云储存,导入到StarRocks内
mdDEHG

通过如下语句，把COS存储空间race-1301288126里input文件夹内数据文件user_behavior_ten_million_rows.parquet的数据导入到目标表 user_behavior_replica

建表

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

CREATE TABLE user_behavior_replica
(
    UserID int(11),
    ItemID int(11),
    CategoryID int(11),
    BehaviorType varchar(65533),
    Timestamp varbinary
)
ENGINE = OLAP 
DUPLICATE KEY(UserID)
DISTRIBUTED BY HASH(UserID);

导入任务

LOAD LABEL mydatabase.label_brokerloadtest_502
(
    DATA INFILE("cosn://race-1301288126/input/user_behavior_ten_million_rows.parquet")
    INTO TABLE user_behavior_replica
    FORMAT AS "parquet"
)
WITH BROKER
(
    "fs.cosn.userinfo.secretId" = "xxx",
    "fs.cosn.userinfo.secretKey" = "xxx",
    "fs.cosn.bucket.endpoint_suffix" = "cos.ap-guangzhou.myqcloud.com"
)
PROPERTIES
(
    "timeout" = "3600"
);

通过如下语句，把COS存储空间race-1301288126里input文件夹内数据文件file1.csv、file2.csv的数据导入到目标表table1、table2：

建表

CREATE TABLE `table1`
   (
       `id` int(11) NOT NULL COMMENT "用户 ID",
       `name` varchar(65533) NULL DEFAULT "" COMMENT "用户姓名",
       `score` int(11) NOT NULL DEFAULT "0" COMMENT "用户得分"
   )
       ENGINE=OLAP
       PRIMARY KEY(`id`)
       DISTRIBUTED BY HASH(`id`);
          
CREATE TABLE `table2`
   (
       `id` int(11) NOT NULL COMMENT "用户 ID",
       `name` varchar(65533) NULL DEFAULT "" COMMENT "用户姓名",
       `score` int(11) NOT NULL DEFAULT "0" COMMENT "用户得分"
   )
       ENGINE=OLAP
       PRIMARY KEY(`id`)
       DISTRIBUTED BY HASH(`id`);

导入单个文件到单表

LOAD LABEL mydatabase.label_brokerloadtest_501
(
    DATA INFILE("cosn://race-1301288126/input/file1.csv")
    INTO TABLE table1
    COLUMNS TERMINATED BY ","
    (id, name, score)
)
WITH BROKER
(
    "fs.cosn.userinfo.secretId" = "xxx",
    "fs.cosn.userinfo.secretKey" = "xxx",
    "fs.cosn.bucket.endpoint_suffix" = "cos.ap-guangzhou.myqcloud.com"
)
PROPERTIES
(
    "timeout" = "3600"
);

导入多个文件到单表

LOAD LABEL mydatabase.label_brokerloadtest_402
(
    DATA INFILE("cosn://race-1301288126/input/*")
    INTO TABLE table1
    COLUMNS TERMINATED BY ","
    (id, name, score)
)
WITH BROKER
(
    "fs.cosn.userinfo.secretId" = "xxx",
    "fs.cosn.userinfo.secretKey" = "xxx",
    "fs.cosn.bucket.endpoint_suffix" = "cos.ap-guangzhou.myqcloud.com"
)
PROPERTIES
(
    "timeout" = "3600"
);

导入多个数据文件到多表

LOAD LABEL mydatabase.label_brokerloadtest_403
(
    DATA INFILE("cosn://race-1301288126/input/file1.csv")
    INTO TABLE table1
    COLUMNS TERMINATED BY ","
    (id, name, score)
    ,
    DATA INFILE("cosn://race-1301288126/input/file2.csv")
    INTO TABLE table2
    COLUMNS TERMINATED BY ","
    (id, name, score)
)
WITH BROKER
(
    "fs.cosn.userinfo.secretId" = "xxx",
    "fs.cosn.userinfo.secretKey" = "xxx",
    "fs.cosn.bucket.endpoint_suffix" = "cos.ap-guangzhou.myqcloud.com"
);
PROPERTIES
(
    "timeout" = "3600"
);

其他数据集

1.3 亿条亚马逊产品的用户评论信息,总大小约为 37GB

每行包含用户 ID（customer_id）、评论 ID（review_id）、已购买产品 ID（product_id）、产品分类（product_category）、评分（star_rating）、评论标题（review_headline）、评论内容（review_body）等 15 列信息。
amazon_reviews/amazon_reviews_2010
amazon_reviews/amazon_reviews_2011
amazon_reviews/amazon_reviews_2012
amazon_reviews/amazon_reviews_2013
amazon_reviews/amazon_reviews_2014
amazon_reviews/amazon_reviews_2015

建表

CREATE TABLE `amazon_reviews` (  
  `review_date` int(11) NULL,  
  `marketplace` varchar(20) NULL,  
  `customer_id` bigint(20) NULL,  
  `review_id` varchar(40) NULL,
  `product_id` varchar(10) NULL,
  `product_parent` bigint(20) NULL,
  `product_title` varchar(500) NULL,
  `product_category` varchar(50) NULL,
  `star_rating` smallint(6) NULL,
  `helpful_votes` int(11) NULL,
  `total_votes` int(11) NULL,
  `vine` boolean NULL,
  `verified_purchase` boolean NULL,
  `review_headline` varchar(500) NULL,
  `review_body` string NULL
) ENGINE=OLAP
DUPLICATE KEY(`review_date`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`review_date`);

同步物化视图

基础数据准备
以下示例基于表 sales_records，其中包含每笔交易的交易 ID record_id、销售员 seller_id、售卖门店 store_id、销售时间 sale_date 以及销售额 sale_amt。

建表

CREATE TABLE sales_records(
    record_id INT,
    seller_id INT,
    store_id INT,
    sale_date DATE,
    sale_amt BIGINT
) DISTRIBUTED BY HASH(record_id);

INSERT INTO sales_records
VALUES
    (001,01,1,"2022-03-13",8573),
    (002,02,2,"2022-03-14",6948),
    (003,01,1,"2022-03-14",4319),
    (004,03,3,"2022-03-15",8734),
    (005,03,3,"2022-03-16",4212),
    (006,02,2,"2022-03-17",9515);

业务需求
该示例业务场景需要频繁分析不同门店的销售额，则查询需要大量调用 sum() 函数，耗费大量系统资源。可使用 EXPLAIN 命令查看此查询的 Query Profile。

1
2
3

SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;

其 Query Profile 中的 rollup 项显示为 sales_records（即基表），说明该查询未使用物化视图加速。

MySQL > EXPLAIN SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:3: store_id | 6: sum                                          |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   4:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:                                                              |
|   PARTITION: HASH_PARTITIONED: 3: store_id                                  |
|                                                                             |
|   STREAM DATA SINK                                                          |
|     EXCHANGE ID: 04                                                         |
|     UNPARTITIONED                                                           |
|                                                                             |
|   3:AGGREGATE (merge finalize)                                              |
|   |  output: sum(6: sum)                                                    |
|   |  group by: 3: store_id                                                  |
|   |                                                                         |
|   2:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 2                                                             |
|  OUTPUT EXPRS:                                                              |
|   PARTITION: RANDOM                                                         |
|                                                                             |
|   STREAM DATA SINK                                                          |
|     EXCHANGE ID: 02                                                         |
|     HASH_PARTITIONED: 3: store_id                                           |
|                                                                             |
|   1:AGGREGATE (update serialize)                                            |
|   |  STREAMING                                                              |
|   |  output: sum(5: sale_amt)                                               |
|   |  group by: 3: store_id                                                  |
|   |                                                                         |
|   0:OlapScanNode                                                            |
|      TABLE: sales_records                                                   |
|      PREAGGREGATION: ON                                                     |
|      partitions=1/1                                                         |
|      rollup: sales_records     <------                                      |
|      tabletRatio=10/10                                                      |
|      tabletList=12049,12053,12057,12061,12065,12069,12073,12077,12081,12085 |
|      cardinality=1                                                          |
|      avgRowSize=2.0                                                         |
|      numNodes=0                                                             |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)

创建同步物化视图

CREATE MATERIALIZED VIEW store_amt 
AS
SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;

验证查询是否命中同步物化视图

MySQL > EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:3: store_id | 6: sum                                          |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   4:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:                                                              |
|   PARTITION: HASH_PARTITIONED: 3: store_id                                  |
|                                                                             |
|   STREAM DATA SINK                                                          |
|     EXCHANGE ID: 04                                                         |
|     UNPARTITIONED                                                           |
|                                                                             |
|   3:AGGREGATE (merge finalize)                                              |
|   |  output: sum(6: sum)                                                    |
|   |  group by: 3: store_id                                                  |
|   |                                                                         |
|   2:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 2                                                             |
|  OUTPUT EXPRS:                                                              |
|   PARTITION: RANDOM                                                         |
|                                                                             |
|   STREAM DATA SINK                                                          |
|     EXCHANGE ID: 02                                                         |
|     HASH_PARTITIONED: 3: store_id                                           |
|                                                                             |
|   1:AGGREGATE (update serialize)                                            |
|   |  STREAMING                                                              |
|   |  output: sum(5: sale_amt)                                               |
|   |  group by: 3: store_id                                                  |
|   |                                                                         |
|   0:OlapScanNode                                                            |
|      TABLE: sales_records                                                   |
|      PREAGGREGATION: ON                                                     |
|      partitions=1/1                                                         |
|      rollup: store_amt    <------                                           |
|      tabletRatio=10/10                                                      |
|      tabletList=12092,12096,12100,12104,12108,12112,12116,12120,12124,12128 |
|      cardinality=6                                                          |
|      avgRowSize=2.0                                                         |
|      numNodes=0                                                             |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)

可以看到，此时 Query Profile 中的 rollup 项显示为 store_amt（即同步物化视图），说明该查询已命中同步物化视图。

异步物化视图

基础数据准备
以下示例基于 Default Catalog 中的两张基表：
- 表 goods 包含产品 ID item_id1、产品名称 item_name 和产品价格 price。
- 表 order_list 包含订单 ID order_id、客户 ID client_id 和产品 ID item_id2。
- 其中 item_id1 与 item_id2 等价。

建表

CREATE TABLE goods(
    item_id1          INT,
    item_name         STRING,
    price             FLOAT
) DISTRIBUTED BY HASH(item_id1);

INSERT INTO goods
VALUES
    (1001,"apple",6.5),
    (1002,"pear",8.0),
    (1003,"potato",2.2);

CREATE TABLE order_list(
    order_id          INT,
    client_id         INT,
    item_id2          INT,
    order_date        DATE
) DISTRIBUTED BY HASH(order_id);

INSERT INTO order_list
VALUES
    (10001,101,1001,"2022-03-13"),
    (10001,101,1002,"2022-03-13"),
    (10002,103,1002,"2022-03-13"),
    (10002,103,1003,"2022-03-14"),
    (10003,102,1003,"2022-03-14"),
    (10003,102,1001,"2022-03-14");

业务需求
该示例业务场景需要频繁分析订单总额，则查询需要将两张表关联并调用 sum() 函数，根据订单 ID 和总额生成一张新表。除此之外，该业务场景需要每天刷新订单总额。
1
2
3
4
5
SELECT
order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;

创建异步物化视图
以下示例根据上述查询语句，基于表 goods 和表 order_list 创建一个“以订单 ID 为分组，对订单中所有商品价格求和”的异步物化视图，并设定其刷新方式为 ASYNC，每天自动刷新。

CREATE MATERIALIZED VIEW order_mv
DISTRIBUTED BY HASH(`order_id`)
REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
AS SELECT
    order_list.order_id,
    sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;

管理异步物化视图

查询异步物化视图

MySQL > SELECT * FROM order_mv;
+----------+--------------------+
| order_id | total              |
+----------+--------------------+
|    10001 |               14.5 |
|    10002 | 10.200000047683716 |
|    10003 |  8.700000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)

修改异步物化视图

// 启用被禁用的异步物化视图（将物化视图的状态设置为 Active）。
ALTER MATERIALIZED VIEW order_mv ACTIVE;
// 修改异步物化视图名称为 order_total
ALTER MATERIALIZED VIEW order_mv RENAME order_total;
// 修改异步物化视图的最大刷新间隔为 2 天。
ALTER MATERIALIZED VIEW order_mv REFRESH ASYNC EVERY(INTERVAL 2 DAY);

查看异步物化视图

1
2
3

SHOW MATERIALIZED VIEWS;
SHOW MATERIALIZED VIEWS WHERE NAME = "order_mv";
SHOW MATERIALIZED VIEWS WHERE NAME LIKE "order%";

查看异步物化视图创建语句

1	SHOW CREATE MATERIALIZED VIEW order_mv;

查看异步物化视图的执行状态

1	select * from information_schema.tasks order by CREATE_TIME desc limit 1\G;

删除异步物化视图
1
DROP MATERIALIZED VIEW order_mv;

StarRocks之数据模型实践

发表于 2024-01-14 阅读次数：

目前StarRocks根据摄入数据和实际存储数据之间的映射关系，分为明细模型（Duplicate key）、聚合模型（Aggregate key）、更新模型（Unique key）和主键模型（Primary key）。

四种模型分别对应不同业务场景

明细模型

StarRocks建表默认采用明细模型，排序列使用稀疏索引，可以快速过滤数据。明细模型用于保存所有历史数据，并且用户可以考虑将过滤条件中频繁使用的维度列作为排序键，比如用户经常需要查看某一时间，可以将事件时间和事件类型作为排序键。

建表，在建表时指定模型和排序键

CREATE TABLE IF NOT EXISTS detail (
  event_time DATETIME NOT NULL COMMENT "datetime of event",
  event_type INT NOT NULL COMMENT "type of event",
  user_id INT COMMENT "id of user",
  device_code INT COMMENT "device of ",
  channel INT COMMENT "" 
) 
DUPLICATE KEY ( event_time, event_type ) 
DISTRIBUTED BY HASH ( user_id );

插入测试数据

INSERT INTO detail VALUES('2021-11-18 12:00:00.00',1,1001,1,1);
INSERT INTO detail VALUES('2021-11-17 12:00:00.00',2,1001,1,1);
INSERT INTO detail VALUES('2021-11-16 12:00:00.00',3,1001,1,1);
INSERT INTO detail VALUES('2021-11-15 12:00:00.00',1,1001,1,1);
INSERT INTO detail VALUES('2021-11-14 12:00:00.00',2,1001,1,1);

查询

SELECT *FROM detail;

+---------------------+------------+---------+-------------+---------+
| event_time          | event_type | user_id | device_code | channel |
+---------------------+------------+---------+-------------+---------+
| 2021-11-18 12:00:00 |          1 |    1001 |           1 |       1 |
| 2021-11-17 12:00:00 |          2 |    1001 |           1 |       1 |
| 2021-11-16 12:00:00 |          3 |    1001 |           1 |       1 |
| 2021-11-15 12:00:00 |          1 |    1001 |           1 |       1 |
| 2021-11-14 12:00:00 |          2 |    1001 |           1 |       1 |
+---------------------+------------+---------+-------------+---------+

聚合模型

在数据分析中，很多场景需要基于明细数据进行统计和汇总，这个时候就可以使用聚合模型了。比如：统计app访问流量、用户访问时长、用户访问次数、展示总量、消费统计等等场景。
适合聚合模型来分析的业务场景有以下特点：

业务方进行查询为汇总类查询，比如sum、count、max
不需要查看原始明细数据
老数据不会被频繁修改，只会追加和新增

建表，指定聚合模型

CREATE TABLE IF NOT EXISTS aggregate_tbl (
  site_id LARGEINT NOT NULL COMMENT "id of site",
  date DATE NOT NULL COMMENT "time of event",
  city_code VARCHAR ( 20 ) COMMENT "city_code of user",
  pv BIGINT SUM DEFAULT "0" COMMENT "total page views",
  mt BIGINT MAX
) 
AGGREGATE KEY(site_id, date, city_code)
DISTRIBUTED BY HASH (site_id);

插入测试数据

INSERT INTO aggregate_tbl VALUES(1001,'2021-11-18 12:00:00.00',100,1,5);
INSERT INTO aggregate_tbl VALUES(1001,'2021-11-18 12:00:00.00',100,1,10);
INSERT INTO aggregate_tbl VALUES(1001,'2021-11-18 12:00:00.00',100,1,15);
INSERT INTO aggregate_tbl VALUES(1001,'2021-11-18 12:00:00.00',100,1,100);
INSERT INTO aggregate_tbl VALUES(1001,'2021-11-18 12:00:00.00',100,1,20);
INSERT INTO aggregate_tbl VALUES(1002,'2021-11-18 12:00:00.00',100,1,5);
INSERT INTO aggregate_tbl VALUES(1002,'2021-11-18 12:00:00.00',100,3,25);
INSERT INTO aggregate_tbl VALUES(1002,'2021-11-18 12:00:00.00',100,1,15);

查询测试数据，可以看到pv是sum累计的值，mt是明细中最大的值。如果只需要查看聚合后的指标，那么使用此种模型将会大大减少存储的数据量。

select *from aggregate_tbl;

+---------+------------+-----------+------+------+
| site_id | date       | city_code | pv   | mt   |
+---------+------------+-----------+------+------+
| 1001    | 2021-11-18 | 100       |    5 |  100 |
| 1002    | 2021-11-18 | 100       |    5 |   25 |
+---------+------------+-----------+------+------+

更新模型

有些分析场景之下，数据需要进行更新比如拉链表，StarRocks则采用更新模型来满足这种需求,比如电商场景中，订单的状态经常会发生变化，每天的订单更新量可突破上亿。这种业务场景下，如果只靠明细模型下通过delete+insert的方式，是无法满足频繁更新需求的，因此，用户需要使用更新模型（唯一键来判断唯一性）来满足分析需求。但是如果用户需要更加实时/频繁的更新操作，建议使用主键模型。
使用更新模型的场景特点：

已经写入的数据有大量的更新需求
需要进行实时数据分析

建表，指定更新模型

CREATE TABLE IF NOT EXISTS update_detail (
	create_time DATE NOT NULL COMMENT "create time of an order",
	order_id BIGINT NOT NULL COMMENT "id of an order",
	order_state INT COMMENT "state of an order",
	total_price BIGINT COMMENT "price of an order" 
) 
UNIQUE KEY ( create_time, order_id ) 
DISTRIBUTED BY HASH ( order_id );

插入测试数据，注意：现在是指定create_time和order_id为唯一键，那么相同日期相同订单的数据会进行覆盖操作

INSERT INTO update_detail VALUES('2011-11-18',1001,1,1000);
INSERT INTO update_detail VALUES('2011-11-18',1001,2,2000);
INSERT INTO update_detail VALUES('2011-11-17',1001,2,500);
INSERT INTO update_detail VALUES('2011-11-18',1002,3,3000);
INSERT INTO update_detail VALUES('2011-11-18',1002,4,4500);

查询结果，可以看到如果日期和订单相同则会进行覆盖操作。

select *from update_detail;

+-------------+----------+-------------+-------------+
| create_time | order_id | order_state | total_price |
+-------------+----------+-------------+-------------+
| 2011-11-17  |     1001 |           2 |         500 |
| 2011-11-18  |     1001 |           2 |        2000 |
| 2011-11-18  |     1002 |           4 |        4500 |
+-------------+----------+-------------+-------------+

主键模型

相比较更新模型，主键模型可以更好地支持实时/频繁更新的功能。虽然更新模型也可以实现实时对数据的更新，但是更新模型采用Merge on Read读时合并策略会大大限制查询功能，在主键模型更好地解决了行级的更新操作。配合Flink-connector-starrocks可以完成Mysql CDC实时同步的方案。
需要注意的是：由于存储引擎会为主键建立索引，导入数据时会把索引加载到内存中，所以主键模型对内存的要求更高，所以不适合主键模型的场景还是比较多的。
目前比较适合使用主键模型的场景有这两种：

数据冷热特征，比如最近几天的数据才需要修改，老的冷数据很少需要修改，比如订单数据，老的订单完成后就不在更新，并且分区是按天进行分区的，那么在导入数据时历史分区的数据的主键就不会被加载，也就不会占用内存了，内存中仅会加载近几天的索引。
大宽表（数百列数千列），主键只占整个数据的很小一部分，内存开销比较低。比如用户状态/画像表，虽然列非常多，但总的用户数量不大（千万-亿级别），主键索引内存占用相对可控。

原理：由于更新模型采用Merge策略，使得谓词无法下推和索引无法使用，严重影响查询性能。所以主键模型通过主键约束，保证同一个主键仅存一条数据的记录，这样就规避了Merge操作。StarRocks收到对某记录的更新操作时，会通过主键索引找到该条数据的位置，并对其标记为删除，再插入一条数据，相当于把update改写为delete+insert

建表，指定主键模型

CREATE TABLE IF NOT EXISTS users (
	user_id BIGINT NOT NULL,
	name STRING NOT NULL,
	email STRING NULL,
	address STRING NULL,
	age TINYINT NULL,
	sex TINYINT NULL 
)
PRIMARY KEY ( user_id ) 
DISTRIBUTED BY HASH (user_id);

插入测试数据，和更新模型类似，当user_id相同发送冲突时会进行覆盖

INSERT INTO users VALUES(1001,'张三','zhang@qq.com','address1',17,'0');
INSERT INTO users VALUES(1001,'李四','li@qq.com','address2',18,'1');
INSERT INTO users VALUES(1002,'alice','alice@qq.com','address3',18,'0');
INSERT INTO users VALUES(1002,'peter','peter@qq.com','address4',18,'1');

查询数据

select *from users;
+---------+--------+--------------+----------+------+------+
| user_id | name   | email        | address  | age  | sex  |
+---------+--------+--------------+----------+------+------+
|    1001 | 李四   | li@qq.com    | address2 |   18 |    1 |
|    1002 | peter  | peter@qq.com | address4 |   18 |    1 |
+---------+--------+--------------+----------+------+------+

其他

StarRocks入门教程

StarRocks之CVM基于cos存算分离架构实践

发表于 2024-01-13 更新于 2024-02-10 阅读次数：

存算分离是真香

StarRocks 存算分离集群采用了存储计算分离架构，特别为云存储设计。在存算分离的模式下，StarRocks 将数据存储在对象存储（例如 AWS S3、GCS、OSS、Azure Blob 以及 MinIO）或 HDFS 中，而本地盘作为热数据缓存，用以加速查询。通过存储计算分离架构，您可以降低存储成本并且优化资源隔离。除此之外，集群的弹性扩展能力也得以加强。在查询命中缓存的情况下，存算分离集群的查询性能与存算一体集群性能一致。

在 v3.1 版本及更高版本中，StarRocks 存算分离集群由 FE 和 CN 组成。CN 取代了存算一体集群中的 BE。

相对存算一体架构，StarRocks 的存储计算分离架构提供以下优势：

廉价且可无缝扩展的存储。
弹性可扩展的计算能力。由于数据不存储在 CN 节点中，因此集群无需进行跨节点数据迁移或 Shuffle 即可完成扩缩容。
热数据的本地磁盘缓存，用以提高查询性能。
可选异步导入数据至对象存储，提高导入效率。

背景

本例背景以腾讯云COS作为存储卷，starrocks-3.2.2，1fe1cn进行部署

腾讯云服务器:172.16.0.4,4C8G,确保有jdk11环境

启动FE节点

1、创建元数据存储路径

1	mkdir -p /opt/downloads/StarRocks-3.2.2/fe/meta

在配置项 meta_dir 中指定元数据路径

1	meta_dir = /opt/downloads/StarRocks-3.2.2/fe/meta

2、增加存算分离配置

fe.conf 增加 run_mode，将原来的默认：shared_nothing 变更为 shared_data
其他的不用变更，会有默认值如：
cloud_native_meta_port 默认 6090,
enable_load_volume_from_conf 默认 true
cloud_native_storage_type 默认 S3
我们按照官方示例:

run_mode = shared_data
cloud_native_meta_port = <meta_port>
cloud_native_storage_type = S3

# 如 testbucket/subpath
aws_s3_path = <s3_path>
# 例如：ap-beijing
aws_s3_region = <region>
# 例如：https://cos.ap-beijing.myqcloud.com
aws_s3_endpoint = <endpoint_url>
aws_s3_access_key = <access_key>
aws_s3_secret_key = <secret_key>

结合我们实际最终在fe.conf添加如下配置

meta_dir = /opt/downloads/StarRocks-3.2.2/fe/meta

run_mode = shared_data
aws_s3_path = dev-files-1253767413/starrocks
aws_s3_region = ap-guangzhou
aws_s3_endpoint = https://cos.ap-guangzhou.myqcloud.com
aws_s3_access_key = <access_key>
aws_s3_secret_key = <secret_key>

3、启动 FE 节点
1
./fe/bin/start_fe.sh --daemon
4、查看 FE 日志，检查 FE 节点是否启动成功。
1
cat fe/log/fe.log | grep thrift
如果日志打印以下内容，则说明该 FE 节点启动成功：
“2024-01-13 16:12:29,911 INFO (UNKNOWN x.x.x.x_9010_1660119137253(-1)|1) [FeServer.start():52] thrift server started with port 9020.”

启动 CN 服务

Compute Node（CN）是一种无状态的计算服务，本身不存储数据。您可以通过添加 CN 节点为查询提供额外的计算资源。您可以使用 BE 部署文件部署 CN 节点。

因为使用默认端口配置，所以我们无需修改任何CN配置，如果需要变更，可到be/conf/cn.conf进行变更端口

1、启动 CN 节点
1
./be/bin/start_cn.sh --daemon
查看 CN 日志，检查 CN 节点是否启动成功
1
cat be/log/cn.INFO | grep heartbeat
如果日志打印以下内容，则说明该 CN 节点启动成功：
“I0313 15:03:45.820030 412450 thrift_server.cpp:375] heartbeat has started listening port on 9050”

搭建集群

我们下面通过 MySQL 客户端来连接 Starrocks FE，下载免安装的 MySQL 客户端

1、通过 MySQL 客户端连接到 StarRocks。您需要使用初始用户 root 登录，密码默认为空。
1
2
# mysql -h <fe_address> -P<query_port> -uroot
mysql -uroot -P9030 -h127.0.0.1

2、执行以下 SQL 查看 Leader FE 节点状态。

mysql> SHOW PROC '/frontends'\G
*************************** 1. row ***************************
             Name: 172.16.0.4_9010_1705155779443
               IP: 172.16.0.4
      EditLogPort: 9010
         HttpPort: 8030
        QueryPort: 9030
          RpcPort: 9020
             Role: LEADER
        ClusterId: 66234781
             Join: true
            Alive: true
ReplayedJournalId: 1895
    LastHeartbeat: 2024-01-14 00:07:14
         IsHelper: true
           ErrMsg: 
        StartTime: 2024-01-13 22:23:08
          Version: 3.2.2-269e832
1 row in set (0.04 sec)

如果字段 Alive 为 true，说明该 FE 节点正常启动并加入集群。
如果字段 Role 为 FOLLOWER，说明该 FE 节点有资格被选为 Leader FE 节点。
如果字段 Role 为 LEADER，说明该 FE 节点为 Leader FE 节点。

3、添加 CN 节点至集群。

1	ALTER SYSTEM ADD COMPUTE NODE "172.16.0.4:9050"

4、执行以下 SQL 查看 CN 节点状态。

mysql> SHOW PROC '/compute_nodes'\G
*************************** 1. row ***************************
        ComputeNodeId: 11083
                   IP: 172.16.0.4
        HeartbeatPort: 9050
               BePort: 9060
             HttpPort: 8040
             BrpcPort: 8060
        LastStartTime: 2024-01-13 23:24:02
        LastHeartbeat: 2024-01-14 00:09:59
                Alive: true
 SystemDecommissioned: false
ClusterDecommissioned: false
               ErrMsg: 
              Version: 3.2.2-269e832
             CpuCores: 4
    NumRunningQueries: 0
           MemUsedPct: 1.61 %
           CpuUsedPct: 0.2 %
       HasStoragePath: true
          StarletPort: 9070
             WorkerId: 1
1 row in set (0.00 sec)

如果字段 Alive 为 true，说明该 CN 节点正常启动并加入集群。

其他

停止 FE 节点。
1
./fe/bin/stop_fe.sh --daemon
停止 BE 节点。
1
./be/bin/stop_be.sh --daemon
停止 CN 节点。
1
./be/bin/stop_cn.sh --daemon

重置root密码

1	SET PASSWORD for root = PASSWORD('xxxxxx');

k8s使用nfs作为动态storageClass存储

发表于 2023-08-18 更新于 2024-01-13 阅读次数：

概述

相信使用过pv和pvc的肯定会想到很多问题，比如每次申请 pvc 都需要手动添加pv，这岂不是太不方便了。那我们如何实现类似于公有云或者私有云的共享存储模式呢？kubernetes 提供了 storageclass 的概念，接下来我们来一探究竟。

先上一张图大家就比较清楚了：
v354K4

环境

k8s集群环境

Node(宿主机上)都要安装nfs

1 2	[root@node-1 ~]# yum -y install nfs-utils [root@node-2 ~]# yum -y install nfs-utils

nfs 环境

搭建nfs服务端

yum -y install rpcbind nfs-utils

systemctl start rpcbind
systemctl start nfs
systemctl enable rpcbind 
systemctl enable nfs

mkdir  /home/nfsfile
chmod  -R  777  /home/nfsfile
cd  /home/nfsfile
echo  "This is a test file" > /nfsfile/test.txt

vi /etc/exports

1	/home/nfsfile *(rw,sync,root_squash,insecure)

这行代码的意思是把共享目录 /home/nfsfile 共享给 * 这个客户端ip，后面括号里的内容是权限参数，其中：

rw 表示设置目录可读写。
sync 表示数据会同步写入到内存和硬盘中，相反 rsync 表示数据会先暂存于内存中，而非直接写入到硬盘中。
no_root_squash NFS客户端连接服务端时如果使用的是root的话，那么对服务端分享的目录来说，也拥有root权限。
no_all_squash 不论NFS客户端连接服务端时使用什么用户，对服务端分享的目录来说都不会拥有匿名用户权限。
showmount -e localhost
1
2
Export list for localhost:
/home/nfsfile *

客户端验证nfs
我们在客户端执行以下命令：
showmount -e 10.8.111.153

1 2	Exports list on 10.8.111.153: /home/nfsfile *

客户端开始挂载共享目录：

1 2	mkdir nfsfile　　# 客户端新建挂载点 mount -t nfs 10.8.111.153:/home/nfsfile /root/nfsfile　　# 挂载服务端共享目录到新创建的挂载点

客户端验证是否挂载成功：

1
2

cd /root/nfsfile　　# 进入该目录后，将会看到之前在服务端创建的 test.txt 文件
cat test.txt　　# 打开后，发现文件内容与服务端文件内容的一致。说明本次 nfs 共享文件系统搭建成功！

最后，如果需要永久挂载该共享目录（即实现开机自动挂载），则可以通过如下方式实现：

1	echo "mount -t nfs 10.8.111.153:/home/nfsfile /root/nfsfile" >> /etc/rc.d/rc.local　　# 将挂载命令写入 rc.local

直接pod挂载nfs

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
          - containerPort: 80
          volumeMounts:
            - name: wwwroot
              mountPath: /usr/share/nginx/html
      volumes:
        - name: wwwroot
          nfs:
            server: 10.8.111.153
            path: "/home/nfsfile/www"

使用storageClass、pv、pvc

rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storages
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storages
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storages
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storages
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storages
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

nfs-subdir-external-provisioner.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          # image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          image: k8s.dockerproxy.com/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              # value: <YOUR NFS SERVER HOSTNAME>
              value: 10.8.111.153
            - name: NFS_PATH
              # value: /var/nfs
              value: /home/nfsfile
      volumes:
        - name: nfs-client-root
          nfs:
            # server: <YOUR NFS SERVER HOSTNAME>
            server: 10.8.111.153
            path: /home/nfsfile

nfs-storage-class.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}" # 此处也可以使用 "${.PVC.namespace}/${.PVC.name}" 来使用pvc的名称作为nfs中真实目录名称
  onDelete: delete

nfs-test-pvc.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
  annotations:
    nfs.io/storage-path: "test-path" # not required, depending on whether this annotation was shown in the storage class description
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

nfs-test-nginx-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: test-nginx-pod
spec:
  containers:
    - name: nginx
      image: nginx:latest
      volumeMounts:
        - name: nginx-data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: nginx-data
      persistentVolumeClaim:
        claimName: test-pvc

其他

https://blog.51cto.com/u_16175526/6718397
https://blog.51cto.com/u_16213459/7344688
https://blog.csdn.net/qq_30051761/article/details/131055705

Go 指针

发表于 2023-07-23 更新于 2024-01-13 阅读次数：

用GO搭建物联网平台一段时间了，还是依然在指针上习惯性误用，汇总一下

指针地址和指针类型

一个指针变量可以指向任何一个值的内存地址，它所指向的值的内存地址在 32 和 64 位机器上分别占用 4 或 8 个字节，占用字节的大小与所指向的值的大小无关。当一个指针被定义后没有分配到任何变量时，它的默认值为 nil。指针变量通常缩写为 ptr。
每个变量在运行时都拥有一个地址，这个地址代表变量在内存中的位置。Go语言中使用在变量名前面添加&操作符（前缀）来获取变量的内存地址（取地址操作），格式如下：

1	ptr := &v // v 的类型为 T

其中 v 代表被取地址的变量，变量 v 的地址使用变量 ptr 进行接收，ptr 的类型为T，称做 T 的指针类型，代表指针。
指针实际用法，可以通过下面的例子了解：

package main
import (
    "fmt"
)
func main() {
    var cat int = 1
    var str string = "banana"
    fmt.Printf("%p %p", &cat, &str)
}

运行结果：

1	0xc042052088 0xc0420461b0

代码说明如下：

第 8 行，声明整型变量 cat。
第 9 行，声明字符串变量 str。
第 10 行，使用 fmt.Printf 的动词%p打印 cat 和 str 变量的内存地址，指针的值是带有0x十六进制前缀的一组数据。

提示：变量、指针和地址三者的关系是，每个变量都拥有地址，指针的值就是地址。

从指针获取指针指向的值

当使用&操作符对普通变量进行取地址操作并得到变量的指针后，可以对指针使用*操作符，也就是指针取值，代码如下

package main

import (
    "fmt"
)

func main() {

    // 准备一个字符串类型
    var house = "Malibu Point 10880, 90265"

    // 对字符串取地址, ptr类型为*string
    ptr := &house

    // 打印ptr的类型
    fmt.Printf("ptr type: %T\n", ptr)

    // 打印ptr的指针地址
    fmt.Printf("address: %p\n", ptr)

    // 对指针进行取值操作
    value := *ptr

    // 取值后的类型
    fmt.Printf("value type: %T\n", value)

    // 指针取值后就是指向变量的值
    fmt.Printf("value: %s\n", value)

}

运行结果：

ptr type: *string
address: 0xc0420401b0
value type: string
value: Malibu Point 10880, 90265

代码说明如下：

第 10 行，准备一个字符串并赋值。
第 13 行，对字符串取地址，将指针保存到变量 ptr 中。
第 16 行，打印变量 ptr 的类型，其类型为 *string。
第 19 行，打印 ptr 的指针地址，地址每次运行都会发生变化。
第 22 行，对 ptr 指针变量进行取值操作，变量 value 的类型为 string。
第 25 行，打印取值后 value 的类型。
第 28 行，打印 value 的值。

取地址操作符&和取值操作符是一对互补操作符，&取出地址，根据地址取出地址指向的值。

变量、指针地址、指针变量、取地址、取值的相互关系和特性如下：

对变量进行取地址操作使用&操作符，可以获得这个变量的指针变量。
指针变量的值是指针地址。
对指针变量进行取值操作使用*操作符，可以获得指针变量指向的原变量的值。

使用指针修改值

通过指针不仅可以取值，也可以修改值。

前面已经演示了使用多重赋值的方法进行数值交换，使用指针同样可以进行数值交换，代码如下：

package main

import "fmt"

// 交换函数
func swap(a, b *int) {

    // 取a指针的值, 赋给临时变量t
    t := *a

    // 取b指针的值, 赋给a指针指向的变量
    *a = *b

    // 将a指针的值赋给b指针指向的变量
    *b = t
}

func main() {

// 准备两个变量, 赋值1和2
    x, y := 1, 2

    // 交换变量值
    swap(&x, &y)

    // 输出变量值
    fmt.Println(x, y)
}

运行结果：

2 1

代码说明如下：

第 6 行，定义一个交换函数，参数为 a、b，类型都为 *int 指针类型。
第 9 行，取指针 a 的值，并把值赋给变量 t，t 此时是 int 类型。
第 12 行，取 b 的指针值，赋给指针 a 指向的变量。注意，此时*a的意思不是取 a 指针的值，而是“a 指向的变量”。
第 15 行，将 t 的值赋给指针 b 指向的变量。
第 21 行，准备 x、y 两个变量，分别赋值为 1 和 2，类型为 int。
第 24 行，取出 x 和 y 的地址作为参数传给 swap() 函数进行调用。
第 27 行，交换完毕时，输出 x 和 y 的值。

操作符作为右值时，意义是取指针的值，作为左值时，也就是放在赋值操作符的左边时，表示 a 指针指向的变量。其实归纳起来，操作符的根本意义就是操作指针指向的变量。当操作在右值时，就是取指向变量的值，当操作在左值时，就是将值设置给指向的变量。

如果在 swap() 函数中交换操作的是指针值，会发生什么情况？可以参考下面代码：

package main

import "fmt"

func swap(a, b *int) {
    b, a = a, b
}

func main() {
    x, y := 1, 2
    swap(&x, &y)
    fmt.Println(x, y)
}

运行结果：

1 2

结果表明，交换是不成功的。上面代码中的 swap() 函数交换的是 a 和 b 的地址，在交换完毕后，a 和 b 的变量值确实被交换。但和 a、b 关联的两个变量并没有实际关联。这就像写有两座房子的卡片放在桌上一字摊开，交换两座房子的卡片后并不会对两座房子有任何影响。

Vmware虚拟机根盘扩容

发表于 2023-06-30 更新于 2024-01-13 阅读次数：

磁盘情况

查看扩容前的磁盘容量

[root@k8s-node2 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 2.9G     0  2.9G   0% /dev
tmpfs                    2.9G     0  2.9G   0% /dev/shm
tmpfs                    2.9G  279M  2.7G  10% /run
tmpfs                    2.9G     0  2.9G   0% /sys/fs/cgroup
/dev/mapper/centos-root   17G   13G  4.5G  75% /   <-- 17G
/dev/sda1               1014M  187M  828M  19% /boot

查看磁盘分区情况

[root@k8s-node2 ~]# fdisk -l

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors    <-- 107.4 GB
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000a6a43

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    41943039    19921920   8e  Linux LVM

Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

对扩容的磁盘分区操作

磁盘分区命令
1
fdisk /dev/sda
分区设置分区格式，在Fdisk命令处输入：t,分区号用默认 3（或回车），Hex代码输入：8e （代表适用Linux LVM分区类型）,最后写入分区表，在Fdisk命令位置输入：w
fdisk -l 查看我们新创建的dev/sda3分区了，分区格式为Linux LVM类型。
不重启的情况下重读分区,马上生效,格式化新增磁盘并分区
1
2
partprobe /dev/sda
mkfs.ext3 /dev/sda3

进入lvm中合并磁盘

#进入lvm
lvm
#初始化/dev/sda3
pvcreate /dev/sda3
#将新分区添加进系统默认的Volume group，centOS的默认Volume group为centos
vgextend centos /dev/sda3

#查看一下当前的Volume卷详情
vgdisplay -v
#将系统盘/dev/mapper/centos-root与sda3的5119空余容量合并，输入如下命令：
lvextend -l +20479 /dev/mapper/centos-root
quit

最后查看扩容及磁盘状态

1 2	#文件系统进行扩容，以让系统识别，输入如下命令（只适用于CentOS7） xfs_growfs /dev/mapper/centos-root

fdisk -l

查看系统容量
1
df -h

swiftUI笔记之WCDB实践

发表于 2023-06-18 更新于 2024-01-13 阅读次数：

idurrT

目标

为chatGpt聊天窗口添加支持本地持久化，操作方式分为：

1、新增聊天项，即在chat数据库增加数据（聊天项列表）

CREATE TABLE "main"."chat" (
  "sessionId" INTEGER,
  "chatId" TEXT,
  "content" TEXT,
  "isChatgpt" INTEGER,
  "createTime" REAL
);

2、新增会话，即在session数据库中增加数据（会话列表）

CREATE TABLE "main"."session" (
  "sessionId" INTEGER,
  "title" TEXT,
  "createTime" REAL
);

可以类比使用微信的过程，数据进行的本地化存储持久化。

实践

封装文件目录操作
我们先封装一下对文件目录的操作
QFileManage.swift

import Foundation
struct QFileManage {
    /// 创建文件夹
    static func createDirectory(at path: String) {
        let isExisted = FileManager.default.fileExists(atPath: path)
        guard !isExisted else { return }
        let url = URL(fileURLWithPath: path)
        do {
            try FileManager.default.createDirectory(at: url, withIntermediateDirectories: true, attributes: nil)
        } catch let error {
            debugPrint("创建文件夹失败！Path: \(path), Error: \(error.localizedDescription)")
        }
    }
}

extension QFileManage {
    /// 库目录
    static func libraryDirectory() -> String {
        return NSSearchPathForDirectoriesInDomains(.libraryDirectory, .userDomainMask, true).last!
    }
    
    /// 数据库目录
    static func databaseDirectory() -> String {
        let path = libraryDirectory() + "/database"
        createDirectory(at: path)
        return path
    }
    
    /// 文档目录
    static func documentsDirectory() -> String {
        return NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true).last!
    }
    
    /// 图片目录
    static func imagesDirectory() -> String {
        let path = documentsDirectory() + "/images"
        createDirectory(at: path)
        return path
    }
}

模型绑定
聊天模型：ChatDbModel.swift

import Foundation
import WCDBSwift

final class ChatDbModel: TableCodable {
    static var tableName: String { "chat" }
    
    var sessionId: Int64 = 0
    var chatId: String = ""
    var content: String = ""
    var isChatgpt: Bool = false
    var createTime: Date? = nil
    
    enum CodingKeys: String, CodingTableKey {
        typealias Root = ChatDbModel
        static let objectRelationalMapping = TableBinding(CodingKeys.self)
        
        case sessionId
        case chatId
        case content
        case isChatgpt
        case createTime
    }
    
    init(sessionId: Int64, chatId: String, content: String, isChatgpt: Bool, createTime: Date) {
        self.sessionId = sessionId
        self.chatId = chatId
        self.content = content
        self.isChatgpt = isChatgpt
        self.createTime = createTime
    }
}

会话模型：SessionDbModel.swift

import Foundation
import WCDBSwift

final class SessionDbModel: TableCodable {
    static var tableName: String { "session" }
    
    var sessionId: Int64 = 0
    var title: String? = nil
    var createTime: Date? = nil
    
    enum CodingKeys: String, CodingTableKey {
        typealias Root = SessionDbModel
        static let objectRelationalMapping = TableBinding(CodingKeys.self)
        case sessionId
        case title
        case createTime
    }
    
    init(sessionId: Int64, title: String? = nil, createTime: Date? = nil) {
        self.sessionId = sessionId
        self.title = title
        self.createTime = createTime
    }
}
extension SessionDbModel {
    static func insert(objects: [SessionDbModel]) {
        do {
            try db?.insert(objects, intoTable: SessionDbModel.tableName)
        } catch let error {
            debugPrint("插入session失败 ->\n\(error.localizedDescription)")
        }
    }
}

启动时，创建数据库

import SwiftUI
import WCDBSwift

var db: Database?

@main
struct tableViewApp: App {
    @AppStorage("appearance") var appearance: String = "system"
    var body: some Scene {
        WindowGroup {
            ContentView().preferredColorScheme(appearance == "system" ? nil : (appearance == "dark" ? .dark : .light))
        }
    }
    init() {
        // 创建数据库
        let path = QFileManage.databaseDirectory() + "/chatgpt.db"
        debugPrint("数据库路径：\(path)")
        db = Database(at:path)
        do {
            // 建表
            try db?.run(transaction: {_ in
                try db?.create(table: SessionDbModel.tableName, of: SessionDbModel.self)
                try db?.create(table: ChatDbModel.tableName, of: ChatDbModel.self)
            })
        } catch let error {
            debugPrint("创建数据库失败！Error: \(error.localizedDescription)")
        }
    }
}

操作数据库 TODO: viewModel方式操作

新增对话

1	SessionDbModel.insert(objects: [SessionDbModel(sessionId: 2, title: "111", createTime: Date())])