Ceph RBD

2024-07-04

RBD Block Storage

Block storage is a type of data storage used in storage area networks. Data is stored as blocks inside volumes, and a volume is attached to a node. It gives applications large storage capacity together with high reliability and performance.

RBD is the Ceph Block Device protocol. Besides reliability and performance, RBD supports full and incremental snapshots, thin provisioning, copy-on-write clones, and in-memory caching.

Ceph RBD currently supports images up to 16 EB. An image can be mapped directly as a disk to a bare-metal server, a virtual machine, or another host; KVM and Xen fully support RBD, and platforms such as VMware support it as well.
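
As a quick illustration of the snapshot and copy-on-write clone features mentioned above, the commands below are a minimal sketch (they use the pool and image names created later in this article; clones require a format 2 image with the layering feature):

# create a snapshot of the image and protect it so it can be cloned
rbd snap create sunday/sunday-rbd.img@snap1
rbd snap protect sunday/sunday-rbd.img@snap1

# create a copy-on-write clone from the protected snapshot
rbd clone sunday/sunday-rbd.img@snap1 sunday/sunday-clone.img

# list snapshots and the children of the snapshot
rbd snap ls sunday/sunday-rbd.img
rbd children sunday/sunday-rbd.img@snap1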

Creating a pool

ceph osd pool create sunday 64 64

# sunday is the pool name
# pg_num is 64 (pg_num and pgp_num should be kept equal)
# pgp_num is 64
# no replica count is specified, so the default of 3 replicas is used
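
To double-check the PG settings of the new pool (a quick verification; the expected output is shown as comments):

ceph osd pool get sunday pg_num
# pg_num: 64
ceph osd pool get sunday pgp_num
# pgp_num: 64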

List pools

[root@ceph01 ceph]# ceph osd lspools
1 sunday

Check the replica count of the sunday pool.
By default a pool keeps 3 replicas to ensure high availability.

[root@ceph01 ceph]# ceph osd pool get sunday size
size: 3

The replica count can also be changed as needed:

[root@ceph01 ceph]# ceph osd pool set sunday size 2
set pool 1 size to 2
[root@ceph01 ceph]# ceph osd pool get sunday size
size: 2
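
Closely related to size is min_size, the minimum number of replicas that must be available for client I/O to continue. A quick way to review both (standard commands; the exact output depends on your cluster):

# per-pool query
ceph osd pool get sunday min_size

# size / min_size for every pool at a glance
ceph osd pool ls detail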

Creating and mapping an RBD image

Before creating an image we first need to adjust the image features.

On CentOS 7 the stock 3.10 kernel does not support most RBD features; it only supports layering, so the other features have to be disabled.

  • layering: layered images, the basis for cloning
  • striping: striping v2
  • exclusive-lock: exclusive locking
  • object-map: object map (depends on exclusive-lock)
  • fast-diff: fast diff calculation (depends on object-map)
  • deep-flatten: flattening of snapshots
  • journaling: journaling of I/O operations (depends on exclusive-lock)

There are two ways to turn the unsupported features off: either with the rbd feature disable command, or by adding rbd_default_features = 1 to ceph.conf to change the default feature set (1 is simply the integer value of the bit that corresponds to layering).
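
For reference, a sketch of how the feature bits add up to the rbd_default_features value (bit values as defined by librbd; double-check them against your Ceph release):

# layering=1  striping=2  exclusive-lock=4  object-map=8
# fast-diff=16  deep-flatten=32  journaling=64
# rbd_default_features is the sum of the bits you want:
echo $((1))               # layering only -> 1
echo $((1 + 4 + 8 + 16))  # layering + exclusive-lock + object-map + fast-diff -> 29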

cd /etc/ceph/
echo "rbd_default_features = 1" >>ceph.conf
ceph-deploy --overwrite-conf config push ceph01 ceph02 ceph03
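
After the push it is worth confirming that each node actually received the setting (a simple ssh loop over the hostnames used above):

for i in ceph01 ceph02 ceph03; do
    ssh $i "grep rbd_default_features /etc/ceph/ceph.conf"
done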

# The features can also be disabled manually after an image has been created. This only affects
# that image: deleting it or creating a new image falls back to the default feature set.
rbd feature disable sunday/sunday-rbd.img deep-flatten
rbd feature disable sunday/sunday-rbd.img fast-diff
rbd feature disable sunday/sunday-rbd.img object-map
rbd feature disable sunday/sunday-rbd.img exclusive-lock
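
Either way, the features that remain on an existing image can be verified with rbd info, for example:

rbd info sunday/sunday-rbd.img | grep features
# features: layering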

RBD creation example

rbd create -p sunday --image sunday-rbd.img --size 15G
#rbd create sunday/sunday-rbd.img --size 15G  # short form

List RBD images

[root@ceph01 ceph]# rbd -p sunday ls
sunday-rbd.img

Delete an RBD image

[root@ceph01 ceph]# rbd rm sunday/sunday-rbd.img

View RBD image details

[root@ceph01 ceph]# rbd info -p sunday --image sunday-rbd.img
[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
    size 15 GiB in 3840 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 126f1be88a69
    block_name_prefix: rbd_data.126f1be88a69
    format: 2
    features: layering
    op_features: 
    flags: 
    create_timestamp: Tue Apr 16 12:34:10 2024
    access_timestamp: Tue Apr 16 12:34:10 2024
    modify_timestamp: Tue Apr 16 12:34:10 2024

Mounting the block device

Next we map and mount the RBD image. (Partitioning the device is not recommended: it makes later expansion cumbersome and risks losing data. If one device is not enough, create additional RBD images instead.)

[root@ceph01 ceph]# rbd map sunday/sunday-rbd.img
/dev/rbd0

[root@ceph01 ceph]# rbd device list
id pool   namespace image          snap device    
0  sunday           sunday-rbd.img -    /dev/rbd0

fdisk shows /dev/rbd0 at about 16 GB (15 GiB):

[root@ceph01 ceph]# fdisk  -l

Disk /dev/sda: 36.5 GB, 36507222016 bytes, 71303168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000eb3ad

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1050623      524288   83  Linux
/dev/sda2         1050624    71303167    35126272   83  Linux

Disk /dev/sdb: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/ceph--207f99ae--521e--421d--89ae--9dc47dede9a3-osd--block--89a69449--9021--428f--a70c--a8840a61d401: 107.4 GB, 107369988096 bytes, 209707008 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/rbd0: 16.1 GB, 16106127360 bytes, 31457280 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes

Again, do not partition the device.

Format it directly and mount it:

[root@ceph01 ceph]# mkfs.ext4 /dev/rbd0
[root@ceph01 ceph]# mkdir -p /mnt/rbd0
[root@ceph01 ceph]# mount /dev/rbd0 /mnt/rbd0
[root@ceph01 ceph]# echo "123" >  /mnt/rbd0/sunday.txt
[root@ceph01 ceph]# ls -l /mnt/rbd0
total 20
drwx------ 2 root root 16384 Apr 16 12:41 lost+found
-rw-r--r-- 1 root root     4 Apr 16 12:42 sunday.txt
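
To detach the device later, unmount the filesystem first and then unmap the image (same paths as above):

umount /mnt/rbd0
rbd unmap /dev/rbd0
# or by image name:
# rbd unmap sunday/sunday-rbd.img

# the device should no longer appear here
rbd device list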

Expanding an RBD image

Here we grow sunday-rbd.img from 15G to 20G.

[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
    size 15 GiB in 3840 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 126f1be88a69
    block_name_prefix: rbd_data.126f1be88a69
    format: 2
    features: layering
    op_features: 
    flags: 
    create_timestamp: Tue Apr 16 12:34:10 2024
    access_timestamp: Tue Apr 16 12:34:10 2024
    modify_timestamp: Tue Apr 16 12:34:10 2024
[root@ceph01 ceph]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.9G     0  1.9G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G   12M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda2        34G  2.0G   32G   6% /
/dev/sda1       488M  113M  340M  25% /boot
tmpfs           1.9G   52K  1.9G   1% /var/lib/ceph/osd/ceph-0
tmpfs           378M     0  378M   0% /run/user/0
/dev/rbd0        15G   41M   14G   1% /mnt

The rbd resize command:

[root@ceph01 ceph]# rbd resize sunday/sunday-rbd.img  --size 20G
Resizing image: 100% complete...done.

The RBD image has been resized, but the filesystem does not see the new size yet:

[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
    size 20 GiB in 5120 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 126f1be88a69
    block_name_prefix: rbd_data.126f1be88a69
    format: 2
    features: layering
    op_features: 
    flags: 
    create_timestamp: Tue Apr 16 12:34:10 2024
    access_timestamp: Tue Apr 16 12:34:10 2024
    modify_timestamp: Tue Apr 16 12:34:10 2024

Use resize2fs so that the filesystem picks up the new size:

[root@ceph01 ceph]# df -h | grep "/dev/rbd0"
/dev/rbd0        15G   41M   14G   1% /mnt

[root@ceph01 ceph]# resize2fs /dev/rbd0
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/rbd0 is mounted on /mnt; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 3
The filesystem on /dev/rbd0 is now 5242880 blocks long.

[root@ceph01 ceph]# df -h | grep "/dev/rbd0"
/dev/rbd0        20G   44M   19G   1% /mnt
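
If the image had been formatted with XFS rather than ext4, the online grow step would use xfs_growfs on the mount point instead of resize2fs (not used in this walkthrough, shown only as an alternative):

# grow an XFS filesystem in place; takes the mount point
xfs_growfs /mnt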

Expansion can involve up to three layers:
1. the underlying storage (rbd resize)
2. the disk partition (e.g. an MBR partition table)
3. the Linux filesystem
This is exactly why partitioning an RBD block device is discouraged: without a partition, only steps 1 and 3 are needed.

Handling Ceph warnings

Once data has been written to the OSDs, checking the cluster again shows a warning that needs to be dealt with:

[root@ceph01 ceph]# ceph -s
  cluster:
    id:     ed040fb0-fa20-456a-a9f0-c9a96cdf089e
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 25h)
    mgr: ceph01(active, since 24h), standbys: ceph02, ceph03
    osd: 3 osds: 3 up (since 24h), 3 in (since 24h)

  data:
    pools:   1 pools, 64 pgs
    objects: 83 objects, 221 MiB
    usage:   3.7 GiB used, 296 GiB / 300 GiB avail
    pgs:     64 active+clean

After creating a pool you must tell Ceph which application will use it (Ceph Block Device, Ceph Object Gateway, or Ceph File System).
If no application is set, the cluster health shows HEALTH_WARN.

View the detailed health information:

[root@ceph01 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'sunday'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

Next we tag the pool, marking the sunday pool with the rbd application type:

[root@ceph01 ceph]# ceph osd pool application enable sunday rbd
enabled application 'rbd' on pool 'sunday'

[root@ceph01 ceph]# ceph osd pool application get sunday
{
    "rbd": {}
}

[root@ceph01 ceph]# ceph health detail
HEALTH_OK

Once a pool has been initialized for RBD, it is marked as the rbd type and the health warning disappears.
The ceph osd pool application get output above confirms that the pool is now of type rbd.

With the application warning cleared, we check the cluster health again:

[root@ceph01 ceph]# ceph -s
  cluster:
    id:     ed040fb0-fa20-456a-a9f0-c9a96cdf089e
    health: HEALTH_WARN
            2 daemons have recently crashed

[root@ceph01 ceph]# ceph health detail
HEALTH_WARN 2 daemons have recently crashed; too many PGs per OSD (672 > max 250)
RECENT_CRASH 2 daemons have recently crashed
    mon.ceph03 crashed on host ceph03 at 2024-04-24 05:52:10.099517Z
    mon.ceph02 crashed on host ceph02 at 2024-04-24 05:52:14.566149Z

Here the monitors report a warning: the mon daemons on ceph02 and ceph03 have recently crashed.

The official explanation:

One or more Ceph daemons has crashed recently, and the crash has not yet been archived (acknowledged) by the administrator. This may indicate a software bug, a hardware problem (e.g., a failing disk), or some other problem.

This warning on its own is not harmful; the crashed daemons can be listed with the command below. (As long as all other components are healthy it is most likely transient; in production, investigate the crash before archiving it rather than blindly clearing the warning.)

[root@ceph01 ceph]# ceph crash ls-new
ID                                                               ENTITY     NEW 
2024-04-24_05:52:10.099517Z_b19c3b1a-0f27-4424-90dd-d78a3a460d36 mon.ceph03  *  
2024-04-24_05:52:14.566149Z_6eac1e20-e4e2-4bc8-b342-5e1c891a32c2 mon.ceph02  * 

The details of a crash can be inspected with ceph crash info <ID>.
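
For example, using one of the IDs listed above:

ceph crash info 2024-04-24_05:52:10.099517Z_b19c3b1a-0f27-4424-90dd-d78a3a460d36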

How do we clear the warning?
Method 1 (archive a single crash record):

[root@ceph01 ~]# ceph crash  archive <ID>

Method 2 (archive all crash records at once):

[root@ceph01 ~]# ceph crash archive-all
[root@ceph01 ~]# ceph crash ls-new

Checking the status again shows the cluster has recovered:

[root@ceph01 ~]#  ceph -s

Troubleshooting

[root@ceph01 ceph]# ceph -s
  cluster:
    id:     ed040fb0-fa20-456a-a9f0-c9a96cdf089e
    health: HEALTH_WARN
            too many PGs per OSD (672 > max 250)
            1/3 mons down, quorum ceph01,ceph03

[root@ceph01 ceph]# vim /etc/ceph/ceph.conf 
[global]
...
# add the following line (setting it to 0 effectively disables the per-OSD PG limit check)
mon_max_pg_per_osd = 0

[root@ceph01 ceph]# ceph-deploy --overwrite-conf config push ceph01 ceph02 ceph03

[root@ceph01 ceph]# export HOSTS="ceph01 ceph02 ceph03"
[root@ceph01 ceph]# for i in $HOSTS; do ssh $i "systemctl restart ceph-mgr.target";done

Since Ceph Luminous v12.2.x the option mon_pg_warn_max_per_osd has been replaced by mon_max_pg_per_osd and its default dropped from 300 to 200; after changing it, restart ceph-mgr.target rather than ceph-mon.target as before.
https://www.jianshu.com/p/f2b20a175702
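
To confirm the value the running daemons are actually using, the admin socket can be queried on each node (daemon names follow the hostnames used above; adjust them to your deployment):

# show the running value on the local mgr and mon daemons
ceph daemon mgr.ceph01 config show | grep mon_max_pg_per_osd
ceph daemon mon.ceph01 config show | grep mon_max_pg_per_osd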
