3. Introduction

OpenPOWER Cluster Genesis (OPCG) enables greatly simplified configuration of clusters of bare-metal OpenPOWER servers running Linux. It leverages widely used open source tools such as Cobbler, Ansible, and Python. Because it relies solely on industry-standard protocols such as IPMI and PXE boot, hybrid clusters of OpenPOWER and x86 nodes can readily be supported. Currently OPCG supports Ethernet networking with separate data and management networks. OPCG can configure simple flat networks for typical HPC environments or more advanced networks with VLANs and bridges for OpenStack environments. OPCG also configures the switches in the cluster. Currently the Mellanox SX1410 is supported for the data network and the Lenovo G8052 for the management network.

3.1. Overview

OPCG is designed to be easy to use. If you are implementing one of the supported architectures with supported hardware, OPCG eliminates the need for custom scripts or programming. It does this via a configuration file (config.yml) which drives the cluster configuration. The configuration file is a YAML text file which the user edits; example YAML files are included, and a small fragment is shown after the list below. The configuration process is driven from a “deployer” node which does not need to remain in the cluster when finished. The process is as follows:

  1. Rack and cable the hardware.
  2. Initialize hardware.
    • initialize switches with a static IP address, userid, and password.
    • ensure that all cluster compute nodes are set to obtain a DHCP address on their BMC ports.
  3. Install the OpenPOWER Cluster Genesis software on the deployer node.
  4. Edit an existing config.yml file.
  5. Run the OPCG software.
  6. Power on the cluster compute nodes.
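
For reference, the fragment below sketches a few of the config.yml keys referenced later in this guide. The values shown are placeholders only, and the exact layout (including any nesting of these keys) should be taken from the example YAML files included with Genesis:

  # config.yml fragment -- illustrative values only
  ipaddr-mgmt-switch: 192.168.16.20     # management switch address (see section 4.1)
  password-mgmt-switch: admin           # must match the password set on the switch
  ipaddr-data-switch: 192.168.16.25     # data switch address, same subnet as above
  password-data-switch: admin           # must match the password set on the switch
  vlan-mgmt-network: 16                 # vlan used to reach the switch management ports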

When finished, OPCG generates a YAML formatted inventory file which can be read by operational management software and used to seed configuration files needed for installing a solution software stack.

3.1.1. Hardware and Architecture Overview

The OpenPOWER Cluster Genesis software supports clusters of servers interconnected with Ethernet. The servers must support IPMI and PXE boot. Currently single racks with single or redundant data switches (with MLAG) are supported. Multiple racks can be interconnected with traditional two-tier access-aggregation networking. In the future we plan to support two-tier leaf-spine networks with an L3 interconnect capable of supporting VXLAN.

3.1.2. Networking

The data network is implemented using the Mellanox SX1410 10 Gb switch. OPCG is designed to support any number of data interfaces on the compute nodes; currently one or two Ethernet interfaces are supported. These interfaces can be bonded, with support for LAG or MLAG.

Templates are used to define multiple network configurations in the config.yml file. These can be physical ports, bonded ports, Linux bridges, or VLANs. Physical ports can be renamed to ease installation of additional software stack elements.
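
To illustrate what a bonded-port network template ultimately produces on a compute node, the sketch below shows an Ubuntu-style /etc/network/interfaces stanza for an LACP bond. This is not generated Genesis output; the interface names, addresses, and bonding options are placeholders only:

  # illustrative only -- names, addresses, and options are placeholders
  auto bond0
  iface bond0 inet static
      address 10.0.1.10
      netmask 255.255.255.0
      bond-slaves eth10 eth11
      bond-mode 802.3ad
      bond-miimon 100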

3.1.3. Compute Nodes

OPCG supports clusters of heterogeneous compute nodes. Users can define any number of node types by creating templates in the config.yml file. Node templates can include any network templates defined in the network templates section. The combination of node templates and network templates allows great flexibility in building heterogeneous clusters with nodes dedicated to specific purposes.

3.1.4. Supported Hardware

OpenPOWER Compute Nodes:

  • S812LC
  • S822LC
  • Tyan servers derived from the above two nodes are generally supported.
  • SuperMicro OpenPOWER servers

x86 Compute Nodes:

  • Lenovo x3550
  • Lenovo x3650

Data Switches:

  • Mellanox SX1410
  • Mellanox SX1710

Support for the Lenovo G8264 is planned.

Management Switches:

  • Lenovo G8052

4. Prerequisite Hardware Setup

4.1. Hardware Initialization

  • Ensure the cluster is cabled according to the build instructions and that a list of all switch port to compute node connections is available and verified. Note that every node to be deployed must have a BMC and PXE connection to a management switch. (See the example cluster in Appendix-D.)

  • Cable the deployer node to the cluster management network. It is strongly recommended that the deployer node be connected directly to the management switch. For large cluster deployments, a 10 Gb connection is recommended. The deployer node must also have access to the public internet (or site network) for accessing software and operating system image files. If the cluster management network does not have external access, an alternate connection with external access must be provided, such as the cluster data network or a wireless connection.

  • Ensure that the BMC ports of all cluster nodes are configured to obtain an IP address via DHCP.
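
    If an operating system is already running on a node, one way to do this is in-band with ipmitool, as sketched below (the LAN channel number, 1 here, varies by platform); the BMC web interface or firmware menus can also be used:

      $ sudo ipmitool lan set 1 ipsrc dhcp
      $ sudo ipmitool lan print 1     # verify that the IP Address Source is now DHCP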

  • If this is a first-time OS install, ensure that all PXE ports are also configured to obtain an IP address via DHCP. On OpenPOWER servers, this is typically done using the Petitboot menus.

  • Acquire any needed public and/or site network addresses.

  • Ensure you have a config.yml file to drive the cluster configuration. If necessary, edit or create the config.yml file (see the Creating the config.yml File section).

  • Configure data switch(es). For out-of-box installation, it is usually easiest to configure the switch using a serial connection; see the switch installation guide. Using the Mellanox configuration wizard:

    • assign hostname

    • set DHCP to no for management interfaces

    • set zeroconf on the mgmt0 interface to no

    • do not enable ipv6 on management interfaces

    • assign a static IP address. This must match the address specified in the config.yml file (keyname: ipaddr-data-switch:) and be in a different subnet than your cluster management subnet used for BMC and PXE communication.

    • assign netmask. This must match the netmask of the subnet the deployer will use to access the management port of the switch.

    • default gateway

    • Primary DNS server

    • Domain name

    • Set Enable ipv6 to no

    • admin password. This must match the password specified in the config.yml file (keyname: password-data-switch:). Note that all data switches in the cluster must have the same userid and password.

    • disable spanning tree (typical industry-standard commands: enable, configure terminal, no spanning-tree; for Lenovo switches, spanning-tree mode disable)

    • enable SSH login (ssh server enable)

    • If this switch has been used previously, delete any existing VLANs which match those specified in the network template section of the config.yml file. This ensures that only those nodes specified in the config file have access to the cluster. (For a brand-new switch this step can be skipped.)

      • log in to the switch:

        enable
        configure terminal
        show vlan
        

        note which VLANs include the ports of the nodes to be added to the new cluster, then remove those VLANs or remove those ports from the existing VLANs:

        no vlan n
        
    • Save the configuration. In switch config mode:

      configuration write
      

      Note that the management ports for the data and management switches in your cluster must all be in the same subnet. It is recommended that the subnet used for switch management be a private subnet which exists on the cluster management switches. If an external network is used to access the management interfaces of your cluster switches, ensure that you have a route from the deployment container to the switch management interfaces. Generally this is handled automatically when Linux creates the deployer container.
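
      As a quick check, reachability of a switch management address from the deployer (or from the deployment container, once it exists) can be verified with standard tools, for example using the example management switch address configured later in this section:

        $ ip route get 192.168.16.20
        $ ping -c 2 192.168.16.20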

    • If using redundant data switches with MLAG, configure link aggregation (LAG) on the inter-switch peer link (IPL) ports. (It is important to do this before cabling multiple links between the switches, which would otherwise result in loops):

      switch> en
      switch# conf t
      switch(config)# interface port-channel 6    (example port-channel number; we advise using the number of the lowest port in the group)
      switch(config interface port-channel 6) # exit
      switch(config)# lacp
      switch(config)# interface ethernet 1/6-1/7      (example port numbers, e.g. ports 6 and 7)
      switch(config interface ethernet 1/6-1/7)# channel-group 6 mode active
      switch(config interface ethernet 1/6-1/7)# exit
      
  • Configure management switch(es). For out-of-box installation, it is usually necessary to configure the switch using a serial connection; see the switch installation guide. For additional information on Lenovo G8052-specific commands, see Appendix G and the Lenovo RackSwitch G8052 Installation Guide.

    • Enter config mode and create a VLAN for use in accessing the management interfaces of your switches. This must match the VLAN specified by the “vlan-mgmt-network:” key in your cluster configuration (config.yml) file:

      RS G8052> enable
      RS G8052# configure terminal
      RS G8052(config)# vlan 16
      RS G8052(config-vlan)# enable
      RS G8052(config-vlan)# exit
      
    • Enable IP interface mode for the management interface:

      RS G8052(config)# interface ip 1
      
    • assign a static IP address, netmask, and gateway address to the management interface. This must match the address specified in the config.yml file (keyname: ipaddr-mgmt-switch:) and be in a different subnet than your cluster management subnet. Place this interface in the VLAN created above. (Note: if the following configuration is executed on the interface you are using to communicate with the switch, you will lose connectivity when the VLAN is applied. To avoid this, use the serial connection or an alternate management interface):

      RS G8052(config-ip-if)# ip address 192.168.16.20 (example IP address)
      RS G8052(config-ip-if)# ip netmask 255.255.255.0
      RS G8052(config-ip-if)# vlan 16
      RS G8052(config-ip-if)# enable
      RS G8052(config-ip-if)# exit
      
    • Configure the default gateway and enable the gateway:

      RS G8052(config)# ip gateway 1 address 192.168.16.1  (example IP address)
      RS G8052(config)# ip gateway 1 enable
      
    • Put the port used to connect to the deployer node (the node running Cluster Genesis) into trunk mode and add the VLAN created above to that trunk:

      RS G8052(config)# interface port 46  (example port #)
      RS G8052(config-if)# switchport mode trunk
      RS G8052(config-if)# switchport trunk allowed vlan 1,16
      RS G8052(config-if)# exit
      
    • Verify the management interface setup:

      RS G8052(config)# show interface ip
      

      A typical good setup would look like:

      Interface information:
      1:      IP4 192.168.16.20    255.255.255.0   192.168.16.255,  vlan 16, up
      
    • Verify the vlan setup:

      RS G8052(config)# show vlan
      

      A typical good result would look something like:

      VLAN                Name                Status            Ports
      ----  --------------------------------  ------  -------------------------
      1     Default VLAN                      ena     1-3 5 7 9 11 13-23 25 27 29 31
                                                      33-46 48-XGE4
      16    VLAN 16                           ena     46
      
    • set the admin password. This must match the password specified in the config.yml file (keyname: password-mgmt-switch:). Note that all management switches in the cluster must have the same userid and password. The following command is interactive:

      access user administrator-password
      
    • disable spanning tree (for Lenovo switches: enable, configure terminal, spanning-tree mode disable):

      spanning-tree mode disable
      
    • enable secure HTTPS and SSH login:

      ssh enable
      ssh generate-host-key
      access https enable
      
    • Save the config (for Lenovo switches, enter config mode; for additional information, consult vendor documentation):

      copy running-config startup-config
      

4.2. Setting up the Deployer Node

Requirements: It is recommended that the deployer node have at least one available core of a Xeon-class processor, 16 GB of free memory, and 64 GB of available disk space. For larger cluster deployments, additional cores, memory, and disk space are recommended. A 4-core Xeon-class processor with 32 GB of memory and 320 GB of disk space is generally adequate for installations up to several racks.

The deployer node requires internet access. The interface associated with the default route is used by the deployer for configuring the cluster. This requires that the default route be through the management switch. This restriction will be removed in a future release of Cluster Genesis.
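
To confirm which interface currently holds the default route on the deployer, a command such as the following can be used (the gateway address and interface name in the sample output are examples only; the interface shown must be the one facing the management switch):

  $ ip route show default
  default via 192.168.16.1 dev eth0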

Set up the Deployer Node (to be automated in the future)

  • Deployer OS Requirements:
    • Ubuntu
      • Release 14.04 LTS or 16.04 LTS
      • SSH login enabled
      • sudo privileges
    • RHEL
  • Optionally, assign a static, public IP address to the BMC port to allow external control of the deployer node.

  • Log in to the deployer and install the vim, vlan, and bridge-utils packages:
    • Ubuntu:

      $ sudo apt-get update
      $ sudo apt-get install vim vlan bridge-utils
      
    • RHEL:

      $ sudo yum install vim vlan bridge-utils
      

Note: Genesis uses the port associated with the default route to access the management switch (e.g. eth0). This must be defined in /etc/network/interfaces (Ubuntu) or the ifcfg-eth0 file (RedHat).

For example, on Ubuntu:

auto eth0
iface eth0 inet manual
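
A roughly equivalent minimal definition on RedHat-family systems (in /etc/sysconfig/network-scripts/ifcfg-eth0) might look like the following sketch; adjust the device name to match your system:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none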