Ceph Cluster on Raspbian (English Version)


A small howto explaining how to install Ceph on Raspbian Stretch. Many tutorials exist, but I didn't find one that works well. So here is a small post describing my installation and the difficulties I encountered.

What you need

  • 3 Raspberry Pi 3
  • 3 SD cards (8 GB)
  • 5 USB keys (I was missing the sixth 😉)
  • USB charger
  • One switch
  • Network cables

Here is all the good stuff:

[Photo: the equipment]

Deployed architecture

A Ceph cluster needs some components:

  • monitor: supervises the cluster's health
  • osd: where the files are stored
  • mds: useful only for CephFS

We will not use CephFS, so we will not deploy mds components. You will find more information in the official documentation: Doc

A best practice is not to install mon and osd on the same machine, but we only have 3 machines, so we will not follow this advice. We will therefore have this architecture:

| Hostname | Function |
| --- | --- |
| ceph01 | Admin/Monitor/OSD |
| ceph02 | Monitor/OSD |
| ceph03 | Monitor/OSD |

This setup is not the best, but it will be enough for our tests. Please note that ceph01 has an admin function: we will deploy the cluster from this machine using ceph-deploy.

Set up your machines

For the Raspbian installation, I refer you to my article (Installation Raspbian).

There are two very important points:

  • your machines must be synchronized with NTP (mandatory for the Ceph cluster to establish a quorum)
  • all your machines' names must be resolvable, so either install and configure a DNS server or fill in the /etc/hosts file on all your machines (see the example below)
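
A minimal sketch of what this can look like, assuming the IP addresses used later in this post (adapt the names and addresses to your own network):

    # /etc/hosts, identical on the three machines
    192.168.1.37    ceph01
    192.168.1.38    ceph02
    192.168.1.39    ceph03

    # check time synchronization on each node
    # (use "ntpq -p" instead if you run ntpd rather than systemd-timesyncd)
    $ timedatectl status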

Install Ceph

In order to deploy our cluster, we will use ceph-deploy.

The Stretch packages are too old (version 0.94, i.e. Hammer: Ceph Release).

The Ceph project doesn't provide packages for the armhf architecture (not yet?), so we will grab them from testing.

  • Create /etc/apt/sources.list.d/testing.list:

    # echo 'deb http://mirrordirector.raspbian.org/raspbian/ testing main' > /etc/apt/sources.list.d/testing.list
    
  • Pin testing's packages to avoid a full upgrade:

    # cat << EOF > /etc/apt/preferences.d/ceph
    Package: *
    Pin: release a=stable
    Pin-Priority: 900

    Package: *
    Pin: release a=testing
    Pin-Priority: 300
    EOF
    
  • Ceph's packages in testing are in the Jewel version. ceph-deploy isn't packaged by Raspbian, so we will grab the package provided by Ceph:

    # echo 'deb http://download.ceph.com/debian-jewel/ stretch main' > /etc/apt/sources.list.d/ceph.list
    
  • Get the repository key:

    # wget -q -O - http://download.ceph.com/keys/release.asc | apt-key add -
    
  • Install all the packages. Be careful, there are traps, so install them in this order (or else your cluster will not work):

    # apt-get install libleveldb1v5 ceph-deploy btrfs-progs
    
    # apt-get install -t testing ceph rbd-nbd
    

Here are the traps:

  • Install the right version of libleveldb1v5, or else the Ceph tools will not work.
  • Install ceph-deploy before ceph to avoid dependency problems with the Python packages.
  • Install btrfs-progs if you want to format the OSDs with btrfs. It's not mandatory; XFS is the preferred choice.
  • Install rbd-nbd because the rbd kernel module doesn't exist on Raspbian.

A quick sanity check of the result is sketched below.
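
Before going further, you can check that the pinning and the versions are what you expect. The exact output will differ on your system:

    # apt-cache policy ceph     # the testing candidate should be pinned at priority 300
    # ceph --version            # should report a Jewel (10.2.x) release
    # ceph-deploy --version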

Deploy Ceph

Now that Ceph is installed on all your machines, we will deploy the cluster using ceph-deploy. For this, we need to meet these prerequisites:

  • Use a dedicated account on each machine. We will use the ceph user created by the Ceph packages.

  • This account must have full sudo access

  • The admin machine (ceph01) must be able to connect to each of the others over SSH without a password, using the ceph account.

  • Create a file to give sudo access to ceph:

    # echo 'ceph ALL = (root) NOPASSWD:ALL' > /etc/sudoers.d/ceph
    
  • On ceph01 (our admin machine), log in as ceph and generate an SSH key:

    # su -s /bin/bash - ceph 
    $ ssh-keygen
    
  • Now change the shell of the ceph user on all the machines:

    # chsh -s /bin/bash ceph
    

This will allow the admin machine to connect to each machine (including itself) over SSH.

  • As the ceph user, copy the SSH key to all the machines (including the admin machine itself):

    # su -s /bin/bash - ceph
    $ for h in ceph01 ceph02 ceph03 ; do ssh-copy-id ${h} ; done
    
  • As the ceph user, create a work dir for ceph-deploy and bootstrap the cluster (a couple of quick checks follow the commands):

    $ mkdir ceph-deploy && cd ceph-deploy
    $ ceph-deploy new --public-network 192.168.1.0/25 ceph01 ceph02 ceph03 # Of course, adapt the names and the network
    $ ceph-deploy mon create-initial # Deploy mon on all the machines
    $ ceph-deploy admin ceph01 ceph02 ceph03 # Copy the conf and the admin keyring to all machines
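
At this point, a couple of quick checks (still as the ceph user) confirm that passwordless SSH and the monitor quorum are in place; your output will differ:

    $ for h in ceph01 ceph02 ceph03 ; do ssh ${h} hostname ; done   # must not ask for a password
    $ sudo ceph mon stat                                            # should show the 3 mons in quorum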
    

From there, you should have a functional cluster, but without any OSD (so the cluster's health is HEALTH_ERR):

    $ ceph -s

Now we need to add OSDs to our cluster. For this we will use our USB keys, like this:

  • ceph01: 2 keys (/dev/sda and /dev/sdb)
  • ceph02: 2 keys (/dev/sda and /dev/sdb)
  • ceph03: 1 key (/dev/sda)

We will initialize our keys (still as the ceph user):

    $ ceph-deploy disk zap ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda

Once initialized, we will format them. I chose BTRFS, but it's not mandatory; by default it will be XFS:

    $ ceph-deploy osd prepare --fs-type btrfs ceph01:sda ceph01:sdb ceph02:sda ceph02:sdb ceph03:sda

This command creates two partitions on each key: one for the data and one for the journal.
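
If you are curious, you can check the result on one of the nodes; a quick look with lsblk (partition sizes will depend on your keys):

    # lsblk /dev/sda     # sda1 = data, sda2 = journal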

  • Then we activate them:

    $ ceph-deploy osd activate ceph01:/dev/sda1:/dev/sda2 ceph01:/dev/sdb1:/dev/sdb2 ceph02:/dev/sda1:/dev/sda2 ceph02:/dev/sdb1:/dev/sdb2 ceph03:/dev/sda1:/dev/sda2
    
  • Now our cluster should be up and in good shape:

    $ ceph -s
    cluster 2a6de943-36d5-40bb-8c16-fb39b71846c0
     health HEALTH_OK
     monmap e2: 3 mons at {ceph01=192.168.1.37:6789/0,ceph02=192.168.1.38:6789/0,ceph03=192.168.1.39:6789/0}
            election epoch 68, quorum 0,1,2 ceph01,ceph02,ceph03
     osdmap e90: 5 osds: 5 up, 5 in
            flags sortbitwise,require_jewel_osds
      pgmap v16247: 64 pgs, 1 pools, 0 bytes data, 1 objects
            1169 MB used, 45878 MB / 48307 MB avail
                  64 active+clean
    client io 13141 B/s rd, 1812 B/s wr, 15 op/s rd, 43 op/s wr
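
  • You can also check how the OSDs are spread across the hosts (the output will obviously differ):

    $ ceph osd tree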
    
  • To finish:

    • In the work dir of ceph-deploy (usually /var/lib/ceph/ceph-deploy), you will find a ceph.conf file. We need to add these two lines to avoid socket conflicts:

      [client]
      admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

      Then, as the ceph user, redeploy the conf to all the machines:

      $ ceph-deploy --overwrite-conf admin ceph01 ceph02 ceph03

    • By default (??), the mon service is not enabled in systemd, so if you reboot your machines the cluster will stop working. We must enable it on each machine (a loop to do this from the admin machine is sketched below):

      # systemctl enable ceph-mon.target
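
    • If you prefer to do this from the admin machine, here is a sketch that reuses the ceph user's SSH access (run it before changing the shell back in the next step; the hostnames are the ones used throughout this post):

      $ for h in ceph01 ceph02 ceph03 ; do ssh ${h} sudo systemctl enable ceph-mon.target ; done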
    
  • Change back the shell for the ceph user (on each machine):

    # chsh -s /bin/false ceph
    

And now?

We will verify that everything works well. As said before, the rbd kernel module doesn't exist on Raspbian, so we will use the rbd-nbd package.

  • View the pools:

    # ceph osd lspools
    0 rbd,

    By default, rbd uses the rbd pool.

  • Create a new pool:

    # ceph osd pool create containers 256
    
  • Create an "object":

    # rbd create -p containers --size 3G test
    
  • Check if everything is OK:

    # rbd -p containers ls
    test
    # rbd -p containers info test
    rbd image 'test':
    size 3072 MB in 768 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.31d6a2ae8944a
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:
    
  • Map it:

    # rbd-nbd map containers/test
    /dev/nbd0
    
  • Checks:

    Use fdisk :

    # fdisk -l /dev/nbd0
    Disk /dev/nbd0: 3 GiB, 3221225472 bytes, 6291456 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    

    Format it:

    # mkfs.btrfs /dev/nbd0
    btrfs-progs v4.7.3
    

    See http://btrfs.wiki.kernel.org for more information.

    Detected a SSD, turning off metadata duplication.
    Mkfs with -m dup if you want to force metadata duplication.
    Performing full device TRIM (1.00GiB) ...
    Label:              (null)
    UUID:
    Node size:          16384
    Sector size:        4096
    Filesystem size:    3.00GiB
    Block group profiles:
      Data:             single            8.00MiB
      Metadata:         single            8.00MiB
      System:           single            4.00MiB
    SSD detected:       yes
    Incompat features:  extref, skinny-metadata
    Number of devices:  1
    Devices:
       ID        SIZE  PATH
        1     3.00GiB  /dev/nbd0

    Mount it and write:

    # mount /dev/nbd0 /mnt && cd /mnt && echo test > test
  • Show the mapped devices:

    # rbd-nbd list-mapped
    /dev/nbd0
    
  • Unmap the device and delete the image:

    # rbd-nbd unmap /dev/nbd0 && rbd rm containers/test
    rbd-nbd: the device is not used
    Removing image: 100% complete...done.
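
  • Double-check that nothing is left behind, reusing the commands seen above:

    # rbd -p containers ls
    # rbd-nbd list-mapped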
    
