Support firmware update for baremetal servers

Registered by Sun Jing on 2013-09-23

It is a common requirement that users or system administrators need to update the firmware of baremetal servers, but different vendors may have different processes for doing so. This blueprint proposes a common framework in Ironic (which may also involve the diskimage-builder project) to implement firmware updates.

As discussed at the Icehouse design summit in Hong Kong, this will be part of the utility ramdisk feature. Here are some highlights of the utility ramdisk:
1. A chain of actions is managed by a Chain Manager, which sits at a higher level than Ironic (Ironic only knows the current action),
    for example:
    "chain": "runimage=http://<httpserver>/firmware-update.tgz, runcmd=/usr/bin/cmd1"
2. When the utility ramdisk is loaded on the client side, it does not know in advance what to do; it fetches jobs through the REST API, handles the current job, then fetches the next one.
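The chain string above is only an example format. As a minimal sketch, assuming a chain is a comma-separated list of `action=argument` items whose arguments never contain commas, a hypothetical helper `parse_chain` (not an Ironic API) could split it into one job per line, which is the granularity the ramdisk would receive from the REST API:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ironic): split a chain string such as
#   "runimage=http://<httpserver>/firmware-update.tgz, runcmd=/usr/bin/cmd1"
# into one "action argument" pair per line.
# Assumption: items are comma-separated and arguments contain no commas.
parse_chain() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r item; do
        # Strip the whitespace that may surround each item.
        item=$(printf '%s' "$item" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -z "$item" ] && continue
        # Text before the first '=' is the action and the rest is its argument,
        # so URLs containing '=' (query strings) stay intact.
        printf '%s %s\n' "${item%%=*}" "${item#*=}"
    done
}
```

For example, `parse_chain 'runimage=http://10.0.0.1/firmware-update.tgz, runcmd=/usr/bin/cmd1'` prints `runimage http://10.0.0.1/firmware-update.tgz` followed by `runcmd /usr/bin/cmd1`. Keeping the whole chain in the Chain Manager means the ramdisk only ever needs to understand a single line like these.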

------------------------------------------------------
Here are the implementation highlights for firmware update in the utility ramdisk (also called "runimage"):

1. The basic idea:
Use a kernel+ramdisk pair to netboot the target node diskless. In the ramdisk, wget firmware-update.tgz (the URL is passed in the kernel parameters), extract it, and run runme.sh (which the admin can customize). After the firmware update finishes, the target node drops into a shell, so the admin can check whether the firmware update had the desired result.
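The ramdisk side of this flow could look roughly like the following POSIX shell sketch. The blueprint only says that the URL is passed in the kernel parameters; the parameter name `fwupdate_url` and both function names are hypothetical, and the sketch assumes `wget`, gzip-capable `tar`, and `mktemp` are present in the ramdisk.

```shell
#!/bin/sh
# Sketch only: the kernel parameter name "fwupdate_url" is an assumption,
# not something defined by Ironic or diskimage-builder.

# Print VALUE for the first NAME=VALUE word in a kernel command line string.
# The body runs in a subshell so "set -f" (no globbing) does not leak out.
get_cmdline_param() (
    set -f
    name=$1
    for word in $2; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1
)

# Fetch and unpack the tarball, run runme.sh, then leave the admin a shell.
firmware_update_main() {
    if ! url=$(get_cmdline_param fwupdate_url "$(cat /proc/cmdline)"); then
        echo "fwupdate_url= not found on the kernel command line" >&2
        exec /bin/sh
    fi
    # runme.sh calls vendor tools via relative paths, so work in one directory.
    workdir=$(mktemp -d) && cd "$workdir" || exec /bin/sh
    if wget -O firmware-update.tgz "$url" && tar -zxf firmware-update.tgz; then
        sh ./runme.sh
        echo "runme.sh exited with status $?" >&2
    else
        echo "failed to fetch or unpack $url" >&2
    fi
    # Drop to a shell so the admin can check the new firmware levels.
    exec /bin/sh
}
```

The ramdisk's init would call `firmware_update_main` once networking is up. Ending in `exec /bin/sh` on every path, including failures, matches the idea that the admin always gets a shell to inspect the result.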

Here is an example of firmware-update.tgz (or any name meaningful to the admin), based on an IBM product, that updates IMM2 and UEFI; the same process should apply to other vendors:

----------------------
# cd /install/firmware
# ls
ibm_fw_imm2_1aoo27b-1.10_anyos_noarch.uxz
ibm_fw_imm2_1aoo27b-1.10_anyos_noarch.xml
ibm_fw_uefi_tde111a-1.00_anyos_32-64.uxz
ibm_fw_uefi_tde111a-1.00_anyos_32-64.xml
runme.sh
ibm_utl_uxspi_9.21_rhel6_32-64.bin
# cat runme.sh
./ibm_utl_uxspi_9.21_rhel6_32-64.bin up -L -u
# chmod +x runme.sh ibm_utl_uxspi_9.21_rhel6_32-64.bin
# tar -zcvf ../firmware-update.tgz .
--------------------

Different vendors may put different contents in the tarball, but it should contain: (1) the firmware update payload (specific to each firmware) and (2) runme.sh.
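Since every tarball must carry a `runme.sh` next to its payload, an admin-side helper could refuse to package a directory that lacks either. This is a hypothetical sketch (`build_fw_tarball` is not an existing tool); it expects the output path to be outside the source directory so the archive never tries to include itself, and it only makes `runme.sh` executable, leaving the vendor tools' permissions to the admin.

```shell
#!/bin/sh
# Hypothetical helper: build_fw_tarball SRC_DIR OUTPUT.tgz
# OUTPUT.tgz must live outside SRC_DIR. runme.sh ends up at the archive's top level.
build_fw_tarball() {
    src=$1
    out=$2
    if [ ! -f "$src/runme.sh" ]; then
        echo "$src/runme.sh is missing" >&2
        return 1
    fi
    # Require at least one payload entry besides runme.sh.
    if ! ls -A "$src" | grep -qvx 'runme.sh'; then
        echo "$src contains no firmware payload" >&2
        return 1
    fi
    chmod +x "$src/runme.sh" || return 1
    # -C makes member names relative to SRC_DIR (./runme.sh, ./payload...).
    tar -zcf "$out" -C "$src" . || return 1
    # Double-check that the archive really has runme.sh at the top level.
    tar -tzf "$out" | grep -qxF -e 'runme.sh' -e './runme.sh'
}
```

For the IBM example, `build_fw_tarball /install/firmware /install/firmware-update.tgz` would produce the same kind of archive as the manual steps above.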

This change will touch both diskimage-builder and Ironic.

2. Other considerations:
    * If the node already hosts an instance, the instance must be migrated off first; we cannot disrupt the tenant.
    * A maintenance mode flag is needed. Set by the admin, it takes a node out of scheduler awareness during a firmware update; basically it just disallows associating tenants with the node, and it can also be set on nodes that are already associated.
    * Tenants can update firmware themselves, so untrusted tenants should not be placed on baremetal; TPM/UEFI can be used to ensure that a subsequent tenant does not boot into an untrusted environment.
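The first two points amount to a precondition check before any firmware job starts. A minimal sketch, assuming the caller has already looked up the node's instance UUID and maintenance flag; `fw_update_precheck` is a hypothetical name, not an Ironic API.

```shell
#!/bin/sh
# Hypothetical precondition check: fw_update_precheck INSTANCE_UUID MAINTENANCE
# INSTANCE_UUID: empty or "None" when no tenant instance is on the node.
# MAINTENANCE:   "True" when the admin has set the maintenance flag.
fw_update_precheck() {
    instance_uuid=$1
    maintenance=$2
    case $instance_uuid in
        ""|None) ;;
        *)
            echo "node hosts instance $instance_uuid; migrate it off first" >&2
            return 1 ;;
    esac
    if [ "$maintenance" != "True" ]; then
        echo "set the maintenance flag before updating firmware" >&2
        return 1
    fi
    return 0
}
```

With the later unified client, the inputs could be fetched with `openstack baremetal node show "$NODE" -f value -c instance_uuid` (and `-c maintenance`), and the flag set with `openstack baremetal node maintenance set "$NODE" --reason "firmware update"`; those commands postdate this blueprint and are shown only as an illustration.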

3. Open discussion points from the Hong Kong design summit:
https://etherpad.openstack.org/p/IcehouseIronicFirmwareUpdate

updated by Sun Jing, 2013-11-18

Blueprint information

Status: Not started
Approver: aeva black
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: Ling Gao
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

I only mentioned the in-band firmware update method because I think it can serve as a common framework for different vendors, whereas out-of-band methods differ from vendor to vendor. I will bring a summary of IBM's OOB methods to the summit and hope to have a chance to discuss them with experts from other companies.

----------------------

Sun,

Please update the description based on our summit discussion and notes here:
  https://etherpad.openstack.org/p/IcehouseIronicFirmwareUpdate

Thanks! Deva, 2013-11-12
