Utility ramdisk

Registered by Roman Prykhodchenko on 2013-11-18

Ironic would use an agent living inside a utility ramdisk to perform maintenance on bare metal nodes, including hardware configuration and management, provisioning, and decommissioning of servers. This agent would expose an API that Ironic could call in order to perform various tasks.

The features it provides include:

– Hardware configuration
   – Turning on or off virtualization features: NEEDS BLUEPRINT
   – Configuring iSCSI devices: NEEDS BLUEPRINT (is this in scope?)
   – Verifying/updating firmware: NEEDS BLUEPRINT
– Provisioning
   – Image deployment: DONE
   – Image caching: DONE
   – Disk partitioning: IN PROGRESS (kozhukalov)
   – Configdrive partition creation: DONE
– Decommissioning
   – Secure disk erasing: NEEDS BLUEPRINT

Blueprint information

Status:
Complete
Approver:
devananda
Priority:
High
Drafter:
Jay Faulkner
Direction:
Approved
Assignee:
Jim Rollenhagen
Definition:
Superseded
Series goal:
Accepted for juno
Implementation:
Needs Code Review
Milestone target:
None
Started by
devananda on 2014-03-05
Completed by
devananda on 2014-07-31

Whiteboard

I would like to see a more detailed plan for implementing these functions -- what order they'll be done in, what the interface between Ironic and ramdisk will be, and how this will abstract hardware differences. Include some examples of what the ramdisk GET request and result might look like, perhaps a specification for the return format, and so on.

I also had a discussion with lifeless regarding whether the interaction should be programmatic (as we discussed at the summit) or declarative (something we did not discuss at the summit). I am ambivalent here; both approaches have pros and cons in my mind. This is why I would like to see more detail in the blueprint before development starts.

Thanks! -Devananda, 2013-11-19

-----------------------------------------------
I think the discovery-ramdisk blueprint (https://blueprints.launchpad.net/ironic/+spec/discovery-ramdisk), which we could also call "hardware enrollment", should be considered part of utility-ramdisk too. The utility ramdisk needs a mechanism to coordinate all of its sub-elements, such as hardware enrollment, firmware update, hardware configuration, etc.

Also, as we discussed at the HK design summit, a "maintenance_mode" flag should be set on bare metal nodes to separate them from the "available" ones. But I'm thinking that we may want a sub-status for each sub-element in utility-ramdisk too. For example, the "hardware enrollment" element would report a status of "hardware_enrolling" via the Ironic API to update the status of the bare metal node, so the admin/user can tell what is going on while the node is in maintenance mode. Any comments?
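
To make the sub-status idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Node` class, the `report` method, and every status value other than "hardware_enrolling" are hypothetical and are not part of any Ironic API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MaintenanceStep(Enum):
    """Hypothetical sub-statuses; only "hardware_enrolling" is named above."""
    HARDWARE_ENROLLING = "hardware_enrolling"
    FIRMWARE_UPDATING = "firmware_updating"
    HARDWARE_CONFIGURING = "hardware_configuring"


@dataclass
class Node:
    """Toy model of a bare metal node record (not Ironic's real model)."""
    uuid: str
    maintenance_mode: bool = False
    sub_status: Optional[MaintenanceStep] = None

    def report(self, step: MaintenanceStep) -> None:
        # A sub-status only makes sense while the node is under maintenance.
        if not self.maintenance_mode:
            raise ValueError("sub-status is only valid in maintenance mode")
        self.sub_status = step

    def is_available(self) -> bool:
        # Nodes in maintenance mode are kept apart from "available" ones.
        return not self.maintenance_mode


node = Node(uuid="node-1", maintenance_mode=True)
node.report(MaintenanceStep.HARDWARE_ENROLLING)
print(node.is_available(), node.sub_status.value)
```

The flag answers "can this node be scheduled?", while the sub-status answers "what is the ramdisk doing right now?", so the two are kept as separate fields.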

Sun Jing, 2013-11-22

------------------------------------------------
Devananda, can you give some additional details about the alternative approaches, programmatic and declarative?

Thanks.
Vladimir Kozhukalov, 2013-11-25

------------------------------------------------
Most of this has been implemented already here: https://github.com/rackerlabs/teeth-agent + https://github.com/rackerlabs/ironic-teeth-driver

Thanks,
Jay Faulkner, 2014-03-05

-----------------------------------------------
Jay,
     Can you give more info on how the teeth agent + driver connect to Ironic and the ramdisk? (Links to the patches would be great.) Will the teeth agent + driver be shipped in a separate library that Ironic can refer to? I see this blueprint has superseded the https://blueprints.launchpad.net/ironic/+spec/discovery-ramdisk blueprint; does this one contain all the items mentioned in that blueprint?

Thanks,
Ling Gao, 2014-03-07

----------------------------------------------
Hi Ling,
Check the repositories Jay linked to for the separate code. There is currently a patch up to the infra group to make teeth-agent a separate project under the bare metal provisioning program (https://review.openstack.org/#/c/79088/). ironic-teeth-driver will eventually go upstream into Ironic.

Our method here is pretty close to the blueprint you mentioned. We do not yet support auth tokens, but expect to in the future. How it works:
- An image containing the agent is PXE booted on the machine to be provisioned.
- The agent exposes a REST API for Ironic to issue commands to.
- The agent sends hardware info to the Ironic API and the URL to the REST API is registered to a node.
- When Ironic decides to provision this node, it issues commands to write the image, write a configdrive, and reboot from disk.
- When the machine is torn down, the agent image is PXE booted again, and Ironic issues "decommission" commands, such as "secure erase disks".
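
The steps above can be sketched as a small in-memory simulation. This is only an illustration: `FakeAgent`, `FakeIronic`, the command names, and the example URLs are all hypothetical, and plain method calls stand in for the HTTP requests that would go to the agent's REST API.

```python
class FakeAgent:
    """Stand-in for the agent's REST API; command names are illustrative."""

    def __init__(self, api_url, hardware_info):
        self.api_url = api_url
        self.hardware_info = hardware_info
        self.received = []  # commands received, in order

    def run(self, command, **params):
        self.received.append(command)
        return {"command": command, "params": params, "status": "SUCCEEDED"}


class FakeIronic:
    """Stand-in for the Ironic side of the conversation."""

    def __init__(self):
        self.agents = {}    # node id -> agent (real code would keep the URL)
        self.hardware = {}  # node id -> hardware info reported by the agent

    def register(self, node_id, agent):
        # The agent reports hardware info and its REST API URL after booting.
        self.hardware[node_id] = agent.hardware_info
        self.agents[node_id] = agent

    def deploy(self, node_id, image_url, configdrive):
        agent = self.agents[node_id]
        return [
            agent.run("write_image", image_url=image_url),
            agent.run("write_configdrive", data=configdrive),
            agent.run("reboot_from_disk"),
        ]

    def tear_down(self, node_id):
        # The agent image has been PXE booted again at this point.
        return [self.agents[node_id].run("secure_erase_disks")]


agent = FakeAgent("http://10.0.0.5:9999", {"cpus": 8, "disks": ["/dev/sda"]})
ironic = FakeIronic()
ironic.register("node-1", agent)
ironic.deploy("node-1", "http://images.example.com/ubuntu.img", "configdrive-data")
ironic.tear_down("node-1")
print(agent.received)
```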

Does that make sense?

Thanks,
Jim Rollenhagen, 2014-03-10

-----------------------------------------------

As of last week, the ironic-python-agent is an OpenStack project: http://git.openstack.org/cgit/openstack/ironic-python-agent/tree/ - the driver that matches it is still WIP and will be put into a Gerrit merge request sometime soon (this week?), but the code is already open source and accessible on GitHub here: https://github.com/rackerlabs/ironic-teeth-driver.

This blueprint is well on its way to implementation.

--Jay Faulkner, 2014-03-25

-----------------------------------------------

Jim,
    Thanks for the detailed description. Yes, it is a very good design! I have one question: when Ironic asks the teeth-agent to perform a task such as a firmware update, will the firmware update driver (plugin) be downloaded dynamically to the node, or are all the drivers (plugins) already loaded with the agent? We do not want to make the ramdisk image too big, for performance reasons.
    Another question: when does the real OS image get downloaded? I hope it is downloaded by the teeth agent with an HTTP call (e.g. wget) when one of the commands the teeth agent gets is "deploy image". This way, the real OS image will not have to be downloaded to the node by PXE/TFTP if the admin just wants to perform other tasks like firmware update or node discovery.

Thanks,
Ling Gao, 2014-03-26

-----------------------------------------------

I believe the plan has been that most required things (disk images, firmwares, etc) would be stored in swift, and the agent would download them on demand using a temporary swift URL.
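
For reference, a temporary Swift URL is a normal object URL plus an expiry time and an HMAC signature over the method, expiry, and object path. The sketch below follows the scheme described in the Swift TempURL middleware documentation. The function name, host, account, and key are made up for illustration, and SHA1 is an assumption: it is the digest historically documented for TempURL, but a given deployment may require a different one.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(base_url, path, key, ttl_seconds, method="GET", now=None):
    """Build a Swift-style temporary URL for one object.

    base_url: scheme and host of the Swift endpoint (hypothetical here)
    path:     object path, e.g. /v1/AUTH_account/container/object
    key:      the account's temp URL key (a shared secret)
    """
    start = time.time() if now is None else now
    expires = int(start + ttl_seconds)
    # The signed body is the method, expiry timestamp, and path, one per line.
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return f"{base_url}{path}?temp_url_sig={sig}&temp_url_expires={expires}"


url = make_temp_url(
    "https://swift.example.com",
    "/v1/AUTH_demo/images/ubuntu.img",
    key="not-a-real-key",
    ttl_seconds=3600,
)
print(url)
```

The agent would only need the resulting URL, not Swift credentials, and the link stops working once the expiry time has passed.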

Also, in our prototypes, we've been bootstrapping servers into iPXE and downloading the agent image over HTTP. I believe we'll want to implement it this way in Ironic as well.

-Jay Faulkner, 2014-03-27

-----------------------------------------------

Hey Ling, good questions!

The current plan is to have all plugins loaded into the ramdisk image. We'll want to use iPXE and HTTP like Jay mentioned, to avoid performance issues with a large ramdisk.

The OS image is downloaded on demand over HTTP when a bare metal instance is deployed. We'd like to implement some caching in the scheduler to write images ahead of time, as well, but that is an optimization for later. For example, if Nova knows that 50% of deployed instances use an Ubuntu 14.04 image, that image should be cached on 50% of free nodes.
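
A rough sketch of that proportional caching idea follows. It is not an existing scheduler feature: `plan_image_cache`, the image names, and the usage counts are hypothetical, and rounding down is just one possible policy.

```python
def plan_image_cache(usage_counts, free_nodes):
    """Assign free nodes to images in proportion to how often each is deployed.

    usage_counts: image name -> number of deployed instances using it
    free_nodes:   list of free node ids
    Shares are rounded down, so a few nodes may stay unassigned.
    """
    total = sum(usage_counts.values())
    if total == 0:
        return {}
    plan = {}
    remaining = list(free_nodes)
    # Most popular images first, so they get first pick of the nodes.
    for image, count in sorted(usage_counts.items(), key=lambda kv: -kv[1]):
        share = (len(free_nodes) * count) // total
        plan[image] = remaining[:share]
        remaining = remaining[share:]
    return plan


usage = {"ubuntu-14.04": 50, "centos-6": 30, "debian-7": 20}
nodes = [f"node-{i}" for i in range(10)]
for image, assigned in plan_image_cache(usage, nodes).items():
    print(image, len(assigned))
```

With these example numbers, the image used by 50% of instances is planned onto 5 of the 10 free nodes.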

Does that answer your questions?

// Jim Rollenhagen, 2014-03-27

----------------------------------------------

Jim and Jay,
     Thanks for the responses. It looks great!

Ling Gao, 2014-03-27

------------------------------------------------

I am getting up to speed on this topic and just want to make sure that I understand this correctly. It sounds to me like all the plugins (e.g., firmware update tool, hardware config tool, etc.) will be loaded into the ramdisk, but the deploy image or firmware image will be downloaded on demand by the agent. Is this correct? Who is supposed to build the ramdisk image and decide which plugins are included in the image? The plugins can be vendor-specific, for instance a vendor-specific firmware update tool, right? Thanks!

newbie 04-21-14

------------------------------------------------------------------------------------------------------------------------
iPXE was mentioned in this chain of discussion. Just want to point out that iPXE does not support UEFI boot mode. Therefore, IPA will need to use some other mechanism (e.g., PXE or something else) for systems with UEFI boot mode.

  iron1 04-21-14

------------------------------------------------------------------------------------------------------------------------
newbie - This is correct. Building the ramdisk is up to the operator, and a reference image builder is included in the repository. There is a pluggable hardware manager for the exact reason of vendor-specific functionality - one could include additional plugins via entry points and 'pip install'.
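
As a rough illustration of the pluggable hardware manager idea, the sketch below picks the most specific manager that claims to support the machine. The class names, the `supports` method, the `priority` attribute, and the "Acme" vendor are all hypothetical and do not reflect the agent's actual interface. A real build would discover the manager classes through entry points instead of a hard-coded list.

```python
class GenericHardwareManager:
    """Fallback manager that works on any machine (illustrative only)."""
    priority = 1

    def supports(self, system_vendor):
        return True

    def update_firmware(self, image_path):
        raise NotImplementedError("no generic firmware update available")


class AcmeHardwareManager(GenericHardwareManager):
    """Vendor-specific plugin for a made-up vendor called Acme."""
    priority = 10

    def supports(self, system_vendor):
        return system_vendor == "Acme"

    def update_firmware(self, image_path):
        # A real plugin would invoke the vendor's flashing tool here.
        return f"flashed {image_path} with the Acme tool"


def pick_manager(managers, system_vendor):
    """Return the highest-priority manager that supports this machine."""
    candidates = [m for m in managers if m.supports(system_vendor)]
    return max(candidates, key=lambda m: m.priority)


installed = [GenericHardwareManager(), AcmeHardwareManager()]
manager = pick_manager(installed, "Acme")
print(type(manager).__name__)
print(manager.update_firmware("/tmp/bios.bin"))
```

An operator who only installs the generic manager still gets a working agent, and adding a vendor package changes behaviour without touching the agent itself.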

iron1 - http://ipxe.org/download says "iPXE supports the EFI and UEFI environments, as well as the standard PC BIOS. You can build an EFI driver ROM using the .efirom image format." I don't see the issue?

Sorry for the slow responses.

// Jim Rollenhagen, 2014-05-07

------------------------------
As I understand it, the needs laid out by this BP are met by the ironic-python-agent project, but this BP does not appear to have been used for tracking that work.

The ironic-python-agent *driver* was proposed separately, and that was approved recently. https://blueprints.launchpad.net/ironic/+spec/agent-driver

-Devananda

Work Items

[a-gordeev] Get rid of modes. Introduce granular extensions: DONE
[a-gordeev] Arbitrary flow driver: DONE
[kozhukalov] Comprehensive data driven disk partitioning: INPROGRESS