Overview

NodeFabric Host Image is a modular system that contains the NodeFabric Core Layer and “hosted” services - such as the MariaDB-Galera database and Ceph storage. Supported modules (ie included services) are delivered as Docker containers and NodeFabric is essentially a Docker Host providing an integration and coordination layer for them.

There are currently two types of NodeFabric Host Images being released:

  • downloadable CentOS 7 based image build (in QCow2, VMDK, VHDX, VirtualBox OVA/VDI and Parallels PVM image output formats)
  • RedHat Enterprise Linux based AMI available from Amazon EC2 cloud Marketplace

Docker containers that are included in the NodeFabric Host Image build:

  • nf-consul, nf-registrator, nf-haproxy – which are part of NodeFabric Core Layer services
  • nf-galera implementing MariaDB-Galera service
  • nf-ceph-mon (Ceph cluster monitor) and nf-ceph-mds (CephFS metadata server) for Ceph storage services

The following diagram provides a high-level architecture overview of the modular NodeFabric Host system:

NodeFabric architecture overview

NodeFabric Core Layer

This is the highly available integration and coordination layer – based on Consul, Registrator and HAProxy. It implements a distributed cluster state database and manages internal service endpoints - driven by service discovery and built-in health checks. Inter-service communication can happen over these fault-tolerant and load-balanced localhost-like service endpoints.
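
For example, once the MariaDB-Galera module has been enabled and bootstrapped (see the later chapters), an application on any node could reach the cluster-wide database through such a local endpoint instead of a specific node IP. A minimal sketch, assuming the MariaDB endpoint is exposed on port 3306 of the local node (as listed in the firewall ports tables) and that a user and database have already been created:

# Connect to the cluster-wide MariaDB service via the local load-balanced endpoint
# (myuser/mydatabase are placeholders; the exact bind address may depend on your setup)
mysql -h 127.0.0.1 -P 3306 -u myuser -p mydatabase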

More about the Core Layer modules/containers and their roles:

About MariaDB-Galera service

MariaDB-Galera Cluster is a synchronous multi-master database cluster - an enhanced, drop-in replacement for MySQL available under GPL v2 license. It’s developed by the MariaDB community with the MariaDB Foundation as its main steward.

MariaDB is a community-developed fork of the MySQL relational database management system. It is kept up to date with the latest MySQL release from the same branch, and in most respects MariaDB works exactly like MySQL. Being a fork of a leading open source software system, it is notable for being led by the original developers of MySQL, who forked it due to concerns over its acquisition by Oracle. All commands, interfaces, libraries and APIs that exist in MySQL also exist in MariaDB, and there is no need to convert databases to switch to MariaDB.

More info about MariaDB-Galera can be found here: https://mariadb.com/kb/en/mariadb/what-is-mariadb-galera-cluster/

About Ceph storage services

Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability. Ceph aims primarily to be completely distributed without a single point of failure, scalable to the exabyte level. Ceph replicates data and makes it fault-tolerant, using commodity hardware and requiring no specific hardware support. As a result of its design, the system is both self-healing and self-managing, aiming to minimize administration time and other costs.

More info about Ceph can be found here:

Currently Ceph's block-level (RBD) and file-level (CephFS) interfaces are supported and available in NodeFabric. Although Ceph monitors and metadata daemons run within Docker containers, OSDs (Object Storage Daemons) do not. These run directly in the host OS context - one OSD per underlying Ceph data disk device. You need to provide and attach dedicated block devices to the NodeFabric VM/host nodes; these will be initialized as Ceph data disks at a later stage. You can decide on the exact block device distribution (which disks go to which hosts) and you can have multiple disks (and OSDs) on each and every NodeFabric node.

About Docker, CentOS and RHEL

NodeFabric Host Images utilize Docker Linux container technology to achieve a modular and expandable architecture. User-defined or third-party services can be loaded as additional Docker containers and integrated with the NodeFabric Core Layer.

CentOS is a stable Docker Host platform derived from the sources of Red Hat Enterprise Linux (RHEL). The NodeFabric Host Image itself is a slightly customized CentOS Docker Host build – adding NodeFabric Docker containers and Core Layer rpm packages – while the NodeFabric AMI is based on the original Red Hat Enterprise Linux distribution.

References:

Deploy

NodeFabric is distributed as a prebuilt VM (or bare-metal) host image – which is used to deploy NodeFabric cluster nodes. As NodeFabric uses a quorum-based clustering approach, a total of 3 or 5 nodes must be deployed for successful operation. The exact cluster node count depends on the desired fault-tolerance factor - which is 1 or 2 respectively.

There are two different NodeFabric Host Image builds released: the downloadable CentOS 7 based image and the RHEL based Amazon EC2 AMI (see the Overview chapter).

Current deployment targets supported are: Amazon EC2, Openstack, VMWare, KVM, Parallels Desktop, VirtualBox, Hyper-V and bare-metal.

In order to bootstrap NodeFabric cluster there are two options to choose from:

  • zero-configuration “Boot-and-Go” mode (which requires cloud user-data)
  • manual bootstrap procedure (ie supplying cluster hostmap and minimal config options)

Requirements and recommendations

General requirements:

  • 3 or 5 cluster nodes - either VMs or bare-metal hosts
  • at least 1GB of RAM per node
  • at least 10GB dedicated disk device per node for OS root
  • at least 64GB dedicated disk device per node for Ceph OSD data
  • at least 1x1Gbit network interface

Recommended cluster setup:

  • 3 cluster nodes (for single node fault tolerance)
  • 4GB or more RAM per node
  • 32GB OS root disk
  • 1x146GB or more Ceph data disks per node (more and larger disks are always better, SSDs highly recommended for improved performance)
  • 10Gbit or Infiniband network fabric recommended for better performance (especially beneficial for Ceph)
  • external load-balancer for services that need to be published for remote consumers

Note

Depending on your deployment target, you could use the external load-balancers available in AWS, Openstack or VMWare vSphere.

Note

If you need a fault tolerance factor higher than 1, you need to deploy a 5-node cluster (FT=2, at the cost of some MariaDB-Galera write speed).

Note

5-node clusters are EXPERIMENTAL at the moment!

User-data

Note

cloud-init is only valid for the AWS AMI and nf-centos7-cloud.qcow2 images! Other (ie hypervisor) images include a default user account: “centos:changeme”.

NodeFabric Host Images targeted for cloud deployments can take advantage of config metadata (ie user-data) – in the cloud environments where it is available and supplied at boot time. It uses the standard cloud-init package (for setting the login ssh key / password, etc) together with a custom nodefabric-cloudinit script (for NF-specific options). User-data is used mainly for two things:

  • activating instances ssh login credentials
  • enabling “Boot-and-Go” mode for zero-configuration Core Layer bootstrap

Here is the full list of supported user-data (key=value based) options understood by the nodefabric-cloudinit script (an example payload follows the table):

Parameter         Description
ATLAS_TOKEN       Atlas token string (required for Boot-and-Go mode)
ATLAS_ENVNAME     Environment name (required for Boot-and-Go mode)
NODENAME          Supply your predefined hostname (optional)
SHARED_SECRET     Consul Serf shared key (optional)
BOOTSTRAP_EXPECT  Override initial cluster size - which is 3 by default (optional)
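
For example, a minimal Boot-and-Go user-data payload supplied at instance launch might look like the following (the token, environment name, shared secret and hostname below are placeholders that must be replaced with your own values):

ATLAS_TOKEN=<your_atlas_token>
ATLAS_ENVNAME=jdunlop/my-cluster
SHARED_SECRET=<output of: openssl rand -base64 16>
NODENAME=node1.example.com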

Note

ATLAS_TOKEN can be obtained from: https://atlas.hashicorp.com/

Note

ATLAS_ENVNAME must be in the following format: <your_atlas_username>/<desired_deployment_name> (ie jdunlop/my-cluster). Environment itself will be auto-created in ATLAS when first node auto-registers with the service during boot-up.

Note

SHARED_SECRET can be generated as: ‘openssl rand -base64 16’

Note

Set BOOTSTRAP_EXPECT=5 when bootstrapping 5-node clusters

Note

Current version of nodefabric-cloudinit script parses supported options from: http://169.254.169.254/latest/user-data
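
To verify what user-data a node actually received, the same endpoint can be queried directly from the node shell:

# Inspect the user-data as seen by the nodefabric-cloudinit script (cloud deployments only)
curl -s http://169.254.169.254/latest/user-data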

Obtaining ATLAS_TOKEN

For creating an ATLAS token please do the following:

ATLAS token generator

Pre-flight check

  • You have suitable NodeFabric Host Image to boot from (either downloaded VM/host image or AMI ID for desired Amazon EC2 region)
  • ATLAS_TOKEN (optional) - required for Core Layer remote auto-bootstrap service
  • ATLAS_ENVNAME (optional) - required for Core Layer remote auto-bootstrap service
  • SHARED_SECRET (optional) - required for Core Layer inter-communication encryption
  • your ssh keypair (required for cloud deployments) - for activating ssh login

Amazon EC2

The Red Hat Enterprise Linux based NodeFabric AMI is available from the Amazon EC2 Marketplace (AWSMP). It’s an EBS-backed HVM AMI. You can deploy node instances by using the AWS EC2 console (method #1, recommended) OR directly from the AWSMP NodeFabric product page (method #2).

The EC2 console method is the recommended option for NF AWS deployments - as its launch wizard supports instance user-data input, additional storage configuration and launching multiple instances in one go. The benefit of the alternative AWSMP 1-Click deployment method is that it supplies you with an auto-generated security group.

Here is the example deployment diagram for AWS EC2 (spanning over multiple Availability Zones):

Example Amazon EC2 deployment within multiple Availability Zones

Method #2: 1-Click Launch from Marketplace

AWS Marketplace NodeFabric product page can be found here: https://aws.amazon.com/marketplace/pp/B015WKQZOM

AWS MP NodeFabric product page

Hint: Click “Continue” button on product page :-)

AWS MP 1-Click Launch landing page

Note

First go to “VPC Settings” and create/select a VPC instead of EC2 Classic - before picking an instance flavor!

The reason behind this is that EC2 Classic instances don't preserve their internal subnet IPs after an instance has been shut down. NodeFabric is a clustered solution and depends on internal IPs staying static once it has been bootstrapped. NodeFabric will still work in EC2 Classic - but if you shut down one of the cluster nodes and its internal IP changes, it will re-join the cluster as a brand new node. So choosing VPC over EC2 Classic is highly recommended!

AWS MP 1-Click VPC Settings

Note

Once you select VPC instead of EC2 Classic you get a whole different list of available instance flavors as well!

AWS MP 1-Click VPC instance flavors

Note

Select the AWSMP auto-generated Security Group, which already comes with a suitable ruleset

AWS MP 1-Click Security Group

Now “Launch with 1-Click” and you are done! Well ... not really. You have to repeat this process two more times in order to deploy a total of 3 NodeFabric instances (in 3 separate Availability Zones, perhaps). Also you will need to add volumes to the deployed instances for Ceph data disks at a later stage - for example as shown below.
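
For example, additional EBS volumes for Ceph data disks could later be created and attached with the AWS CLI - the Availability Zone, volume ID and instance ID below are placeholders that must match your deployment:

# Create a 64GB EBS volume in the same Availability Zone as the target instance
aws ec2 create-volume --size 64 --availability-zone us-east-1a --volume-type gp2

# Attach the created volume to a NodeFabric instance as an additional disk
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/xvdb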

Openstack

TODO

# Set NodeFabric image version to download
NF_VERSION="0.4.3"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-cloud-${NF_VERSION}.qcow2


# Loading image to Glance catalog
glance image-create --name="NodeFabric-${NF_VERSION}" --is-public=true \
    --min-disk 10 --min-ram 1024 --progress \
    --container-format=bare --disk-format=qcow2 \
    --file nf-centos7-cloud-${NF_VERSION}.qcow2
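
After the image has been loaded into Glance, the cluster instances could be launched for example with the nova CLI, supplying per-node user-data - the flavor, key and instance names below are illustrative placeholders:

# Prepare per-node user-data (adjust NODENAME for node2 and node3)
cat > node1-userdata.txt <<EOF
NODENAME=node1.example.com
ATLAS_TOKEN=<your_atlas_token>
ATLAS_ENVNAME=jdunlop/my-cluster
EOF

# Launch the first node (repeat for node2 and node3)
nova boot --image "NodeFabric-${NF_VERSION}" --flavor m1.small \
    --key-name mykey --user-data node1-userdata.txt node1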

VMWare

TODO

# Set NodeFabric image version to download
NF_VERSION="0.4.3"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-${NF_VERSION}.vmdk.gz

# Unpack image
gunzip nf-centos7-${NF_VERSION}.vmdk.gz

Libvirt KVM

# Set NodeFabric image version to download
NF_VERSION="0.4.3"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-bare-${NF_VERSION}.qcow2

# Clone under libvirt disk images location for ALL cluster nodes
for i in $(seq 1 3); do rsync -av --progress nf-centos7-bare-${NF_VERSION}.qcow2 /var/lib/libvirt/images/nf-node${i}.qcow2; done

# Launch node1
virt-install \
--name=nf-node1 --memory=1024 --vcpus=1 \
    --disk=/var/lib/libvirt/images/nf-node1.qcow2,device=disk,bus=virtio \
    --noautoconsole --vnc --accelerate --os-type=linux --os-variant=rhel7 --import

# Launch node2
virt-install \
--name=nf-node2 --memory=1024 --vcpus=1 \
    --disk=/var/lib/libvirt/images/nf-node2.qcow2,device=disk,bus=virtio \
    --noautoconsole --vnc --accelerate --os-type=linux --os-variant=rhel7 --import

# Launch node3
virt-install \
--name=nf-node3 --memory=1024 --vcpus=1 \
    --disk=/var/lib/libvirt/images/nf-node3.qcow2,device=disk,bus=virtio \
    --noautoconsole --vnc --accelerate --os-type=linux --os-variant=rhel7 --import
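
Dedicated Ceph data disks could also be created and attached to each VM - for example as follows (the disk size and target device name are illustrative):

# Create and attach a 64GB Ceph data disk for node1 (repeat per node)
qemu-img create -f qcow2 /var/lib/libvirt/images/nf-node1-ceph.qcow2 64G
virsh attach-disk nf-node1 /var/lib/libvirt/images/nf-node1-ceph.qcow2 vdb \
    --driver qemu --subdriver qcow2 --persistent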

Parallels Desktop

TODO

# Set NodeFabric image version to download
NF_VERSION="0.4.3"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-${NF_VERSION}.pvm.tgz

# Unpack image
tar -xzf nf-centos7-${NF_VERSION}.pvm.tgz
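
The unpacked .pvm bundle could then be registered with Parallels Desktop - for example via the prlctl command-line tool (alternatively just open the bundle from the Parallels Desktop UI):

# Register the unpacked VM bundle with Parallels Desktop
prlctl register ./nf-centos7-${NF_VERSION}.pvm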

VirtualBox

TODO

# Set NodeFabric image version to download
NF_VERSION="0.4.3"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-${NF_VERSION}.ova
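
The downloaded OVA could then be imported either through the VirtualBox UI or with VBoxManage - one import per cluster node (VM names below are illustrative):

# Import the appliance as the first cluster node (repeat for nf-node2 and nf-node3)
VBoxManage import nf-centos7-${NF_VERSION}.ova --vsys 0 --vmname nf-node1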

Bare metal

TODO

# Set NodeFabric image version to download
NF_VERSION="0.4.3"

# Set target disk device
BLKDEV="/dev/sdb"
# Download image
curl -L -O http://downloads.sourceforge.net/project/opennode/NodeFabric/nf-centos7-bare-${NF_VERSION}.qcow2

# Write image to physical disk device
qemu-img convert nf-centos7-bare-${NF_VERSION}.qcow2 -O raw $BLKDEV

Access

NodeFabric nodes/instances should be accessed over SSH connection for management, configuration and manual bootstrapping purposes. There are also local and remote web-based status dashboards available - more details about these are presented in the “Management” chapter.

SSH login

Note

Hypervisor images have built-in “centos:changeme” account

Note

Cloud images utilize cloud-init (ie user-data) mechanism for enabling ssh login keys under centos (or ec2-user for AMI) username

Node/instance default SSH login is “centos:changeme” – but for cloud images (ie for AWS and Openstack) ssh login keys are activated through cloud-init method.

Exact details how you need to supply your SSH public key differ between target cloud environments:

  • in case of AWS EC2 you have to create your ssh keypair in EC2 console
  • in case of Openstack you have to setup your ssh keypair through Horizon UI or nova cli

The following shell commands might be helpful in order to connect to deployed NodeFabric instances:

# Set node IP to connect to
NODE_IP="10.211.55.100" # replace this example IP with yours

# Set login username
NODE_USER="centos" # OR ec2-user for AWS

# Set to your login private key path
KEY_PATH="~/.ssh/id_rsa"

# Connect with your key
ssh -i ${KEY_PATH} ${NODE_USER}@${NODE_IP}

Note

You can set the root user password and switch to the root user privileged environment by running the following commands:

# setting root password
sudo passwd

# switching to root user environment
su - root

Firewall ports

NodeFabric open network ports can be divided into 3 separate access zones: localhost only, LAN only and WAN/remote access. Enabling ICMP (ie ping) within the LAN zone is highly recommended for diagnostic purposes. Management and internal dashboard access should happen over an SSH connection (using port forwarding where necessary). An outgoing public internet connection is required for the optional ATLAS cluster auto-join and remote dashboard services. An illustrative host firewall example is given after the port tables below.

Zone: localhost

Service               Port(s)   Proto     Comments
Consul CLI RPC        8400      tcp
Consul HTTP API & UI  8500      tcp       Access UI through ssh port forwarding
Consul DNS            8600      tcp/udp
HAProxy UI            48080     tcp       Access through ssh port forwarding

Zone: LAN

Service           Port(s)     Proto     Comments
Consul RPC        8300        tcp
Consul SERF       8301        tcp/udp
MariaDB SQL       3306        tcp
Galera SST        4444        tcp
Galera WSREP      4567        tcp/udp
Galera IST        4568        tcp
Ceph MON          6789        tcp
Ceph OSDs & MDS   6800:7300   tcp

Zone: WAN/remote access

Service             Port(s)   Proto     Comments
SSH                 22        tcp       Could be limited to LAN only
Consul WAN gossip   8302      tcp/udp   If remote DCs are enabled
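
If your deployment target requires configuring the host firewall manually (no auto-generated security group, as in most non-AWS setups), the LAN-zone ports above could be opened for example with firewalld on CentOS 7 - this is an illustrative sketch only and should be adjusted to your environment:

# Open Consul, MariaDB-Galera and Ceph LAN ports (illustrative firewalld example)
firewall-cmd --permanent \
    --add-port=8300/tcp --add-port=8301/tcp --add-port=8301/udp \
    --add-port=3306/tcp --add-port=4444/tcp \
    --add-port=4567/tcp --add-port=4567/udp --add-port=4568/tcp \
    --add-port=6789/tcp --add-port=6800-7300/tcp
firewall-cmd --reload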

Bootstrap

Each service that NodeFabric provides (ie Core Layer, MariaDB-Galera and Ceph currently) has to be initialized first - which is a one-time operation. However, all services are later capable of repeatable (and non-destructive) automated bootstrapping – even if they lose quorum (ie in case of a full cluster/service node shutdown/reboot).

Service          Auto-init (one-time)   Auto-bootstrap (repeatable)
NF Core Layer    yes (with ATLAS)       yes
MariaDB-Galera   no                     yes
Ceph MON         no                     yes
Ceph MDS         no                     yes

NF Core Layer

Note

Each node must have its unique FQDN hostname set - otherwise nf-consul service container will refuse to start!

When NodeFabric nodes boot up for the first time, they need to join and form the Consul cluster. To join the cluster, each node must have its own FQDN hostname set and it needs to know about the other participating nodes - ie how to connect to them (ie having a cluster hostmap). There are two supported methods for initializing the cluster hostmap:

  • by using remote auto-join mode with the Hashicorp ATLAS public service (strictly optional but very convenient - hence recommended)
  • by editing /etc/nodefabric/nodefabric.hostmap config file manually (on ALL nodes)

Setup node FQDN hostname (IMPORTANT)

Depending on target environment there are three different cases:

  • in case of AWS node hostnames will be set by default already (using VPC LAN ip as a hostname) - optionally it is possible to supply custom hostname through user-data (ie NODENAME=node1.example.com)
  • in case of Openstack please set VM hostname by supplying NODENAME=node1.example.com as part of user-data during VM launch
  • in case of non-cloud deployments please login to node shell and set hostname manually - by following this recipe:
# NB! You must update also HOSTNAME environment variable - as it is used in scripts!
export HOSTNAME=node1.nf.int
hostnamectl set-hostname $HOSTNAME

# verify
echo $HOSTNAME && hostnamectl

Activating remote auto-join mode

The Hashicorp ATLAS service can be used for NF Core Layer remote auto-join. The main benefit here is that you don’t need to know the node internal IPs for constructing the initial nodefabric hostmap - as this data will be collected and spread automagically by the remote ATLAS service. ATLAS also adds a remote web-based status dashboard as a bonus. Hashicorp offers free-tier ATLAS service plans to get started.

For activating this remote auto-join mode within NodeFabric Host Image you have 2 possible options:

  • either by supplying ATLAS_TOKEN and ATLAS_ENVNAME key-value pairs through cloud user-data at boot time (for each node)
  • or by manually editing /etc/nodefabric/conf.d/nf-consul.conf file directly (after node has booted up) – and providing ATLAS_TOKEN together with desired ATLAS_ENVNAME there (on ALL nodes)

Example manually edited /etc/nodefabric/conf.d/nf-consul.conf file should look like this (replace CONSUL_ATLAS_TOKEN and CONSUL_ATLAS_ENVNAME values with yours):

### CONSUL CONFIG ###
CONSUL_INSTANCE="nf-consul"
CONSUL_IMAGE="opennode/nf-consul"
CONSUL_DATADIR="/var/lib/consul"
CONSUL_CONFDIR="/etc/nodefabric/files.d/consul/config"
CONSUL_EXECDIR="/etc/nodefabric/files.d/consul/scripts"
CONSUL_BOOTSTRAP_EXPECT=3
CONSUL_NODENAME="$( hostname )"
CONSUL_BIND_IP="$HOST_PUBLIC_IP"
CONSUL_BOOTSTRAP_HOSTS="$( cat /etc/nodefabric/nodefabric.hostmap 2>/dev/null | awk '{ print $1 }' )"
CONSUL_BOOTSTRAP_HOSTS_CSV=$( echo $CONSUL_BOOTSTRAP_HOSTS | tr ' ' , )
CONSUL_ATLAS_ENVNAME="jdunlop/my-cluster" # NB! Parameter format is: "atlas-user/atlas-env" as "jdunlop/testcluster"
CONSUL_ATLAS_TOKEN="7ks0pfuyZI6Jgg.atlasv1.fMYK8ySzyEbozyel3T1vi2qR2MZ3lHyAtCrOy7sYDnuYdnohmDarvlVKj01bxPa8syb"
CONSUL_SHARED_SECRET="" # Generate as: openssl rand -base64 16

Note

You need to execute ‘systemctl restart nf-consul’ after manually editing nf-consul.conf for ATLAS token and environment name!

Manual bootstrap procedure

If you don’t want to use the remote auto-join mode, you can simply supply the initial cluster hostmap manually - by editing the /etc/nodefabric/nodefabric.hostmap config file and providing the LAN IP address and hostname for each node in standard hosts file format (ie ipaddr fqdn shortname on each line).

Example nodefabric.hostmap file would look like this:

192.168.40.101 node01.nf.int node01
192.168.40.102 node02.nf.int node02
192.168.40.103 node03.nf.int node03

Note

You need to execute ‘systemctl restart nf-consul’ after manually editing /etc/nodefabric/nodefabric.hostmap config file!

After all nodes have been bootstrapped you can observe the NodeFabric Core Layer status by running the nodefabric-dashboard (or nodefabric-status) utility:

[centos@ip-172-30-0-100 ~]$ sudo nodefabric-dashboard

# or one-off version of it would be
[centos@ip-172-30-0-100 ~]$ sudo nodefabric-status
NodeFabric Core Layer status

Debug

Consul eventlog can be observed on each cluster node by running nodefabric-monitor:

[centos@ip-172-30-0-100 ~]$ sudo nodefabric-monitor

Enabling MariaDB-Galera service

The MariaDB-Galera database cluster is packaged and delivered as nf-galera docker containers - which are already included in the NodeFabric Host Image. Its service management commands are provided by the nf-galera-ctl utility:

[root@nf-dev1 ~]# nf-galera-ctl help

Enable DB nodes

For MariaDB-Galera database service initialization you need to enable and start nf-galera containers across all cluster nodes. Do this by executing ‘nf-galera-ctl enable’ on a single cluster node:

Note

‘nf-galera-ctl enable’ command is broadcasted across ALL cluster nodes (ie run it on single node only)

[centos@ip-172-30-0-100 ~]$ sudo nf-galera-ctl enable

Please observe the MySQL service node statuses in nodefabric-dashboard. All nodes should gradually turn red - which indicates that the particular service container is up but not yet passing all the health checks (yellow status means the container has not yet started). The global MySQL DB service should stay in “FAILED” status for now - as it is not yet bootstrapped:

MariaDB-Galera nodes enabled

Bootstrap DB cluster

Once all DB service nodes reach “red/up/failed” status – you can execute ‘nf-galera-ctl bootstrap’ command for dataset initialization and cluster bootstrap:

Note

‘nf-galera-ctl bootstrap’ command is broadcasted across ALL cluster nodes (so run it on single node only)

[centos@ip-172-30-0-100 ~]$ sudo nf-galera-ctl bootstrap

It normally takes up to a couple of minutes for the DB node statuses to turn green in nodefabric-dashboard and for the global DB service status to reach the “RUNNING” state:

MariaDB-Galera nodes bootstrapped

Note

After a successful bootstrap the database “root” user password is left empty and the account's connectivity is limited to localhost

Debug

For debugging purposes nf-galera-monitor command can be used:

[centos@ip-172-30-0-100 ~]$ sudo nf-galera-monitor

Enabling Ceph storage services

There are 3 separate Ceph storage services that are currently included within NodeFabric Host Image:

  • Ceph cluster (MON) service
  • Ceph Remote Block Devices service (RBD)
  • Ceph distributed filesystem service (CephFS)

The Ceph cluster monitor (MON) service is delivered as nf-ceph-mon docker containers - and it needs to be successfully initialized first - before any OSDs can join and before the CephFS layer can be bootstrapped.

The Object Storage Daemon software is included and run directly in the NodeFabric host OS context, and each Ceph data disk device should have its own OSD daemon instance attached and running. You need to provide these dedicated block devices (min. 64GB per disk) to the NodeFabric host for Ceph storage - in addition to the default OS root disk. Multiple disks spread evenly across multiple NodeFabric hosts are recommended.

The CephFS Metadata Service (ie MDS) is included as the nf-ceph-mds docker container. It can be enabled and initialized after the Ceph monitor cluster is running and the initial number of OSDs (3) have joined and are operational for storage pools. CephFS operates on top of its own dedicated Ceph storage pools - which are created during the bootstrap procedure.

nf-ceph-ctl, nf-ceph-disk and nf-ceph-fs utilities are used for various Ceph cluster related management tasks:

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-ctl help
[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-disk help
[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs help

Enable and bootstrap MON cluster

For enabling and starting nf-ceph-mon containers across all cluster nodes please execute ‘nf-ceph-ctl enable’:

Note

‘nf-ceph-ctl enable’ command is broadcasted to ALL cluster nodes - so execute on single node only

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-ctl enable

Ceph MON service node statuses should gradually turn red in nodefabric-dashboard:

Ceph MON nodes enabled

Once ALL Ceph MON nodes have reached UP status, you can issue ‘nf-ceph-ctl bootstrap’ to initialize the Ceph cluster (one-time). This bootstrap process generates and distributes the initial Ceph cluster configuration and keys across all nodes.

Note

Run ‘nf-ceph-ctl bootstrap’ on single node only - as it is broadcasted command

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-ctl bootstrap

Ceph MON service node statuses should gradually reach the OK state (bootstrap normally takes less than a minute). The global Ceph MON service should reach the “RUNNING” state - as seen in the dashboard:

Ceph MON nodes bootstrapped

Provide and initialize Ceph disks

Note

Ceph disks have to be initialized on EACH node separately – meaning that nf-ceph-disk commands DO NOT broadcast across cluster!

Please log in to each NodeFabric host and list the available block devices (that you have previously attached to this VM/host):

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-disk list
INFO: Listing block devices ...
/dev/xvda :
 /dev/xvda1 other, xfs, mounted on /
/dev/xvdb other, unknown

Block devices with ‘unknown’ status are good candidates for Ceph disks :) In order to initialize a particular block device as a Ceph disk, run the ‘nf-ceph-disk init’ command with the full path to that block device.

Note

‘nf-ceph-disk init’ WILL DESTROY ALL DATA ON SPECIFIED TARGET DISK!

Note

The following command will produce some partx related error/warning messages in the output - which can be ignored

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-disk init /dev/xvdb
INFO: Initializing /dev/xvdb ...
WARN: THIS WILL DESTROY ALL DATA ON /dev/xvdb!
Are you sure you wish to continue (yes/no): yes
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
partx: specified range <1:0> does not make sense
The operation has completed successfully.
partx: /dev/xvdb: error adding partition 2
The operation has completed successfully.
partx: /dev/xvdb: error adding partitions 1-2
meta-data=/dev/xvdb1             isize=2048   agcount=4, agsize=720831 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2883323, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.
partx: /dev/xvdb: error adding partitions 1-2
INFO: /dev/xvdb initialized!

You can verify local OSD service status by issuing ‘nf-ceph-disk status’:

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-disk status
Ceph OSD status

Note

Now repeat this process and initialize ALL Ceph disks on ALL cluster nodes!

Once you have finished initializing the Ceph disks on all nodes, you should see the following fragment in the nodefabric-dashboard Ceph Status section (look for the osdmap status line):

Ceph OSD MAP

Enable and bootstrap CephFS

For enabling the CephFS layer - a POSIX-compliant distributed filesystem - you need to start the Ceph Metadata Daemon containers first (the command is broadcasted across cluster nodes):

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs enable

Observe the global Ceph MDS service reaching the “RUNNING” state in nodefabric-dashboard before proceeding with the CephFS bootstrap:

CephFS enabled

Once the Ceph MDS service is running you can issue the CephFS bootstrap command (execute on a single node):

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs bootstrap

After that you should see an mdsmap line in the Ceph status section of nodefabric-dashboard:

Note

Currently the Ceph MDS service is run in active-passive mode - as suggested by the Ceph authors for the sake of stability

Ceph MDS Map

Now you can proceed and mount CephFS on each cluster node - if you desire to do so:

Note

This command is not broadcasted and enables only the local /srv/cephfs mountpoint

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs mount

For checking the global Ceph Metadata service status and the local mountpoint on the current node, please run:

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs status
CephFS service status
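
As a quick sanity check - assuming /srv/cephfs has been mounted on at least two nodes - a file written on one node should become visible on the others:

# On node1: write a test file into the shared filesystem
echo "hello from node1" | sudo tee /srv/cephfs/cephfs-test.txt

# On node2: the same file should be readable
sudo cat /srv/cephfs/cephfs-test.txt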

Manage

Dashboards

There are several status dashboards available with NodeFabric:

nodefabric-dashboard

This default console-based dashboard provides a cluster-wide status overview across the different NodeFabric service layers together with more detailed Consul membership and Ceph status boards.

# Run from arbitrary node console
nodefabric-dashboard

Global services statuses are presented as RUNNING, DEGRADED or FAILED:

  • RUNNING means that all nodes participating in the service are OK
  • DEGRADED means that the service has quorum and is operational - yet one or more participating nodes are failing
  • FAILED means that the service has lost quorum and is not available

Service statuses on each node participating in a global service offering are colored as follows:

  • GREEN means service is OK (ie passing health checks)
  • YELLOW means that service module is not started
  • RED means that service module is started but not passing health checks (ie failing)

Note

Underlined node represents current Consul master

nodefabric-dashboard

Consul web UI

This local web UI provides a cluster-wide status overview of internal services (as they get registered in Consul) and their built-in health checks. Editing support for the Consul highly available key-value store is also included.

Note

Consul UI is only available from localhost (use ssh port forwarding for remote access)

# Setup local port forwarding over SSH connection to Consul UI port
NODE_IP="10.211.55.100"
NODE_USER="centos"
KEY_PATH="~/.ssh/id_rsa"
ssh -L 8500:localhost:8500 -i ${KEY_PATH} ${NODE_USER}@${NODE_IP}

# Load Consul UI in your web browser
http://localhost:8500/ui/
Consul UI

ATLAS dashboard

A remote counterpart to the local Consul UI is provided by the ATLAS service. Go to https://atlas.hashicorp.com/environments and log in with your ATLAS user account for the remote Consul dashboard:

ATLAS remote dashboard

HAProxy web UI

HAProxy dashboard provides status info about internal load-balanced service endpoints.

Note

HAProxy web UI is only available from localhost (use ssh port forwarding for remote access)

# Setup local port forwarding over SSH connection to HAProxy UI port
NODE_IP="10.211.55.100"
NODE_USER="centos"
KEY_PATH="~/.ssh/id_rsa"
ssh -L 48080:localhost:48080 -i ${KEY_PATH} ${NODE_USER}@${NODE_IP}

# Load HAProxy UI in your web browser
http://localhost:48080/
HAProxy UI

NF Core Layer

TODO

MariaDB-Galera service

nf-galera-ctl management utility provides several helpful commands:

[root@nf-dev1 ~]# nf-galera-ctl help

Usage:

  nf-galera cluster service management:

    nf-galera-ctl enable
    nf-galera-ctl disable
    nf-galera-ctl bootstrap
    nf-galera-ctl dbadmin-add <username> <database> [password]
    nf-galera-ctl passwd <username> [password]
    nf-galera-ctl user-list
    nf-galera-ctl user-remove <username>
    nf-galera-ctl database-list
    nf-galera-ctl database-create <database>
    nf-galera-ctl database-destroy <database>

Help:

    nf-galera-ctl help

For controlling cluster-wide MariaDB-Galera service status you can use the following commands:

# Enabling and starting nf-galera docker containers across cluster nodes
nf-galera-ctl enable

# Stopping and disabling nf-galera docker containers across cluster nodes
nf-galera-ctl disable

# Issuing manual bootstrap (for example if MariaDB-Galera auto-bootstrap failed, this command is re-run safe)
nf-galera-ctl bootstrap
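
The user and database management subcommands from the usage listing above could be combined, for example, to provision a database together with an administrative account (names below are illustrative):

# Create a new database across the cluster
nf-galera-ctl database-create myappdb

# Add an admin user for that database (a password can be supplied as the third argument)
nf-galera-ctl dbadmin-add myappuser myappdb

# Verify
nf-galera-ctl database-list
nf-galera-ctl user-list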

Ceph storage services

TODO

[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-ctl help

Usage:

  nf-ceph-mon cluster service management:

    nf-ceph-ctl enable
    nf-ceph-ctl disable
    nf-ceph-ctl bootstrap


  Help:

    nf-ceph-ctl help
[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-disk help

Usage:

  OSD / Disk management:

    nf-ceph-disk list
    nf-ceph-disk status
    nf-ceph-disk init <blkdev>
    nf-ceph-disk activate <blkdev>


  Help:

    nf-ceph-disk help
[centos@ip-172-30-0-100 ~]$ sudo nf-ceph-fs help

Usage:

  CephFS management:

    nf-ceph-fs enable
    nf-ceph-fs disable
    nf-ceph-fs status
    nf-ceph-fs bootstrap
    nf-ceph-fs mount
    nf-ceph-fs umount


  Help:

    nf-ceph-fs help

System update

The included nodefabric-update utility will update the OS root and NodeFabric service containers:

[centos@ip-172-30-0-100 ~]$ sudo nodefabric-update

Troubleshoot

Database cluster not auto-bootstrapping after full shutdown

In case of database cluster bootstrap problems you can re-run ‘nf-galera-ctl bootstrap’ - it is designed to be re-run safe. It does not re-initialize the dataset once it already exists – it only recovers the last GTID and promotes the node with the latest dataset to primary.

sudo nf-galera-ctl bootstrap
sudo nf-galera-monitor

Ceph OSD does not activate after node reboot

Symptoms:

# Problem symptom #1: OSD mount is shown but OSD systemd service entry is missing
[root@nf-dev2 ~]# sudo nf-ceph-disk status

INFO: Listing OSD services ...


INFO: Listing OSD mounts ...

var-lib-ceph-osd-ceph\x2d2.mount - /var/lib/ceph/osd/ceph-2
  Loaded: loaded (/proc/self/mountinfo)
  Active: active (mounted) since Wed 2015-09-30 12:34:16 GST; 6min ago
   Where: /var/lib/ceph/osd/ceph-2
    What: /dev/sdb1

# Problem symptom #2: Ceph disk listing will complain over filesystem corruption
[root@nf-dev2 ~]# sudo nf-ceph-disk list
INFO: Listing block devices ...
mount: mount /dev/sdb1 on /var/lib/ceph/tmp/mnt.RuWU_R failed: Structure needs cleaning
WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
/dev/sda :
 /dev/sda1 other, xfs, mounted on /boot
 /dev/sda2 other, LVM2_member
/dev/sdb :
mount: mount /dev/sdb1 on /var/lib/ceph/tmp/mnt.SGq2oW failed: Structure needs cleaning
 /dev/sdb1 ceph data, unprepared
 /dev/sdb2 ceph journal
/dev/sr0 other, unknown

Fixes:

# Repairing filesystem
[root@nf-dev2 ~]# sudo xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
* ERROR: mismatched uuid in log
*            SB : 1cb2ae7d-5765-46c8-a217-03c1b4a6cfde
*            log: 9df2630e-5e8f-4455-9c72-c0b27764bace
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

# Re-activate OSD (note that you need to re-activate partition - not disk device!)
[root@nf-dev2 ~]# sudo nf-ceph-disk activate /dev/sdb1
INFO: Activating /dev/sdb1 ...
=== osd.1 ===
create-or-move updated item name 'osd.1' weight 0.06 at location {host=nf-dev2,root=default} to crush map
Starting Ceph osd.1 on nf-dev2...
Running as unit run-6098.service.
INFO: /dev/sdb1 activated!