Ironic python agent partitioning extension

Registered by Vladimir Kozhukalov

We need a partitioning extension that prepares node disks before an image is written to them.

IPA is supposed to follow these points:
* Support GPT/MSDOS plain partitions.
* Support primary and logical partitions for the MSDOS table.
* Support non-fixed partition sizes, e.g. the user sets a minimum required size and a priority for each partition, and the extension grows the partitions over the whole disk according to their priorities and minimum sizes.
* Support geometry-specific partition schemes, i.e. fixed sizes. A fixed size is equivalent to minsize=maxsize and priority=highest.
* Support MD capabilities (mirror, stripe, linear).
* Do not support LVM, as LVM is out of Ironic's scope.
* Support a scheme-based API, i.e. the user defines the entire partition scheme instead of creating partitions one by one.
* Reliably recognize hard drives across various kernel versions and udev configs, based on udevadm properties such as ID_WWN, ID_SERIAL, DEVPATH, etc.
* We plan to use image metadata (e.g. Glance image properties) to find out whether an image is appropriate for deployment on a partition or an MD device. Image metadata can also define the desired partition scheme using geometry-independent capabilities (partition min size, partition priority, growing partition).
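The min size / max size / priority rules above leave the allocation order open. Below is a minimal sketch, assuming sizes in MiB, that a larger `priority` value is served first, and that a fixed `size` behaves as minsize=maxsize with the highest priority. `allocate_sizes` is a hypothetical helper, not part of IPA:

```python
from typing import Dict, List


def allocate_sizes(disk_size: int, partitions: List[Dict]) -> List[int]:
    """Return one size (MiB) per partition, in the order given.

    Assumptions (not fixed by the blueprint): sizes are in MiB, a larger
    "priority" value is served first, "maxsize": "grow" means unbounded,
    and a fixed "size" means minsize == maxsize with the highest priority.
    """
    mins, maxs, prios = [], [], []
    for p in partitions:
        if "size" in p:  # fixed size: minsize == maxsize, served first
            mins.append(p["size"])
            maxs.append(p["size"])
            prios.append(float("inf"))
        else:
            mins.append(p.get("minsize", 0))
            maxs.append(float("inf") if p["maxsize"] == "grow" else p["maxsize"])
            prios.append(p.get("priority", 0))

    sizes = list(mins)  # every partition first gets its minimum
    free = disk_size - sum(sizes)
    if free < 0:
        raise ValueError("disk too small for the requested minimum sizes")

    # Hand out the remaining space, highest priority first, up to each maxsize.
    for i in sorted(range(len(partitions)), key=lambda i: -prios[i]):
        extra = min(free, maxs[i] - sizes[i])
        sizes[i] += extra
        free -= extra
    return sizes
```

For a hypothetical 200 GiB (204800 MiB) /dev/sda with the three partitions from the API example, this yields [1024, 102400, 101376]: every partition first gets its minimum, partition 2 is then grown to its 102400 MiB cap, and partition 3 absorbs the rest.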

-------------------------------------------------------------
* API
-------------------------------------------------------------
POST /commands
DATA: {
    "name": "partition.partition",
    "params": {"scheme": [
        {
            "uspec": {
                "DEVNAME": "/dev/sda",
                "ID_SERIAL": "DISKSERIAL",
                "ID_WWN": "DISKWWN",
                "DEVTYPE": "disk",
                "DEVPATH": "/devices/pci0000:00/0000:00:0d.0/host2/target2:0:0/2:0:0:0/block/sda",
                "DEVLINKS": ["/dev/block/8:0", "/dev/disk/by-id/DISKID", "/dev/disk/by-path/DISKPATH"]
            },
            "type": "disk",
            "table": "msdos",
            "scheme_id": 0,
            "partitions": [
                {"size": 1024, "type": "primary", "flags": ["boot"], "scheme_id": 1},
                {"minsize": 10240, "maxsize": 102400, "priority": 20, "type": "primary", "flags": [], "scheme_id": 2},
                {"minsize": 20480, "maxsize": "grow", "priority": 10, "type": "logical", "flags": [], "scheme_id": 3}
            ]
        },
        {
            "uspec": {
                "DEVNAME": "/dev/sdb",
                ...
            },
            "type": "disk",
            "table": "gpt",
            "scheme_id": 4,
            "partitions": [
                {"size": 1024, "type": "primary", "flags": ["boot"], "scheme_id": 5},
                {"minsize": 0, "maxsize": "grow", "priority": 100, "type": "primary", "flags": [], "scheme_id": 6}
            ]
        },
        {
            "uspec": {
                "DEVNAME": "/dev/sdc",
                ...
            },
            "type": "disk",
            "scheme_id": 7
        },
        {
            "type": "md",
            "level": "mirror",
            "devices": [1, 5],
            "spare": [7]
        }
    ]}
}
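Matching a `uspec` against the disks actually present could look like the sketch below. It assumes the agent collects one property dict per disk, e.g. by parsing the output of `udevadm info --query=property --name=/dev/sda`, and that identifiers are tried in the (assumed) order ID_WWN, ID_SERIAL, DEVPATH, DEVNAME. `parse_udev_properties` and `match_uspec` are hypothetical names:

```python
from typing import Dict, List

# Assumed precedence, strongest first. DEVNAME (sda, sdb, ...) can change
# between boots and kernel versions, so it is only used as a last resort.
MATCH_KEYS = ("ID_WWN", "ID_SERIAL", "DEVPATH", "DEVNAME")


def parse_udev_properties(text: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines printed by `udevadm info --query=property`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key.strip()] = value.strip()
    return props


def match_uspec(uspec: Dict, devices: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the one device matching the strongest identifier in the uspec.

    Only the strongest key present in the uspec is compared; there is no
    fallback to weaker keys, so a mismatched WWN cannot silently resolve to
    whatever disk currently happens to be called /dev/sda.
    """
    key = next((k for k in MATCH_KEYS if k in uspec), None)
    if key is None:
        raise ValueError("uspec contains no supported identifier")
    hits = [d for d in devices if d.get(key) == uspec[key]]
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} devices match {key}={uspec[key]!r}")
    return hits[0]
```

With this precedence, a uspec that says DEVNAME=/dev/sdb but carries the WWN of the disk now named /dev/sda still resolves to that disk, which is the point of matching on udev properties rather than on kernel names.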

Blueprint information

Status:
Complete
Approver:
aeva black
Priority:
Not
Drafter:
Vladimir Kozhukalov
Direction:
Needs approval
Assignee:
Vladimir Kozhukalov
Definition:
Obsolete
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
Jim Rollenhagen

Whiteboard

All the existing code in the agent assumes that it is given an image containing partitions and writes it to a block device. I'd like to know how this looks in the end: how does the agent determine which disks to partition, and how will you ensure the image you're writing is suitable for writing into a partition rather than directly onto a block device?

I also have concerns with MD support, as supporting Linux software RAID would bring in an operating-system/image-dependent feature, which could increase scope. Does this mean we should support Windows software RAID / dynamic disks as well? I'd just like the scope around this to be well defined before we get too deep in.
- JayF 2014-04-17

---
Right, if you want, you can place partitions into the image and copy the image as-is to a disk. But there are cases where a node has several disks and you want them combined into one large, reliable space. Those use cases mostly come from deployments of long-term nodes that must run 24x7. You simply cannot know anything about a node's hard drives at image-build time, except perhaps the minimum required size.

Some notes about MD:
* We can place the necessary attributes in the image metadata so that Ironic and the Ironic agent can tell whether a particular image can be deployed on an MD device.
* The image certainly needs to be Linux with the MD module.
* If we deploy the kernel and initrd into a /boot partition on the node (vs. PXE/iPXE), the bootloader needs to understand MD metadata in order to read the kernel and initrd.
* The MD device where /boot is located needs to be a mirror (not a stripe or any other level). Maybe I'm wrong, but AFAIK modern bootloaders are not smart enough to work with levels other than mirror.
* The initrd needs to contain the MD kernel modules in order to assemble MD arrays. We no longer need to build mdadm.conf into the initrd; the recommended way nowadays is to assemble MD devices automatically based on their MD metadata alone.

- vkozhukalov 2014-04-17
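The `md` entry in the API could be turned into an `mdadm --create` call roughly as follows. This is a sketch: the mapping from `scheme_id` to resolved device paths, the array name and the `holds_boot` flag are hypothetical inputs (the API example carries none of them), and the mirror-only rule for /boot encodes the assumption from the notes above. mdadm itself accepts `mirror`, `stripe` and `linear` as `--level` values:

```python
from typing import Dict, List

# Level names from the blueprint; mdadm accepts them as --level values
# (mirror = raid1, stripe = raid0).
SUPPORTED_LEVELS = ("mirror", "stripe", "linear")


def mdadm_create_argv(md: Dict, paths: Dict[int, str], md_name: str,
                      holds_boot: bool = False) -> List[str]:
    """Build an `mdadm --create` argument list for one "md" scheme entry.

    `paths` maps scheme_id -> resolved device path; `md_name` and
    `holds_boot` are hypothetical inputs not present in the scheme.
    """
    level = md["level"]
    if level not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported MD level: {level!r}")
    if holds_boot and level != "mirror":
        # Assumption: bootloaders can only read /boot from a mirror.
        raise ValueError("the MD device holding /boot must be a mirror")
    members = [paths[i] for i in md["devices"]]
    spares = [paths[i] for i in md.get("spare", [])]
    argv = ["mdadm", "--create", md_name,
            f"--level={level}", f"--raid-devices={len(members)}"]
    if spares:
        argv.append(f"--spare-devices={len(spares)}")
    # Active members first, spares after them.
    return argv + members + spares
```

For the API example, with scheme_ids 1, 5 and 7 resolved to /dev/sda1, /dev/sdb1 and /dev/sdc, this produces `mdadm --create /dev/md0 --level=mirror --raid-devices=2 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc`. Returning an argument list rather than a shell string keeps it usable with `subprocess.run` without shell quoting.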

--------------

I'm going to put a few comments in the etherpad regarding some of this.

- JayF 2014-04-17

-------------

Updated the API in https://etherpad.openstack.org/p/ironic-disk-partitioning. Removed the highly granular API and added an extension API with just one command, 'allocate'. The user sends a partitioning specification and the extension does everything in one go.

- vkozhukalov 2014-04-21
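With the scheme-based API, a client sends the whole specification in one request. Below is a minimal sketch using only the Python standard library. It uses the `partition.partition` command name from the API example in the description (the etherpad version calls the command 'allocate'), and the agent address and port are hypothetical:

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical agent address; the port is an assumption.
AGENT_URL = "http://agent.example:9999"


def build_partition_request(scheme: List[Dict]) -> urllib.request.Request:
    """Wrap a whole partition scheme in a single POST /commands request."""
    body = {"name": "partition.partition", "params": {"scheme": scheme}}
    return urllib.request.Request(
        AGENT_URL + "/commands",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; the response format is not specified in this blueprint.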

--------------

The new API LGTM, but I want Russell to have a look to be sure. Can you move the spec into the BP or move the old stuff in the etherpad into an obsolete section? Just want to be 100% sure what we're agreeing on.

-Jay 2014-04-24

--------------

Actually, one other thing as well: how do you handle building partitions for an unknown disk size? For instance, should there be a way to indicate that the last partition should 'expand to fit' or similar? Or perhaps reuse the behavior we already have in part of the agent, where we carve out a configdrive starting from the *end* of the disk.

When we're talking about direct geometries, we're still talking about hardware-specific config. I'd prefer that partitioning be possible in a way that's not 100% tied to specific hardware/RAID/disk sizes.

-Jay 2014-04-24
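One way to read the 'expand to fit' suggestion together with carving the configdrive from the end of the disk: lay out fixed-size partitions from the start, reserve the configdrive at the end, and let the last regular partition take the gap. The function name, the MiB units and the 1 MiB start offset are assumptions for illustration:

```python
from typing import List, Tuple


def layout(disk_mib: int, fixed_mib: List[int], configdrive_mib: int,
           start_mib: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in MiB, end-exclusive, one per partition.

    Fixed-size partitions are laid out from the start of the disk, the config
    drive is carved from the end, and the last regular partition expands to
    fill the gap. disk_mib is assumed to be the usable size (for GPT, the
    backup header area at the end is already excluded).
    """
    spans = []
    pos = start_mib  # assumed 1 MiB alignment of the first partition
    for size in fixed_mib:
        spans.append((pos, pos + size))
        pos += size
    cd_start = disk_mib - configdrive_mib
    if cd_start <= pos:
        raise ValueError("no room left for the expanding partition")
    spans.append((pos, cd_start))        # 'expand to fit' partition
    spans.append((cd_start, disk_mib))   # config drive at the end of the disk
    return spans
```

For example, a 10240 MiB disk with one 1024 MiB fixed partition and a 64 MiB configdrive gives [(1, 1025), (1025, 10176), (10176, 10240)], without the image needing to know the disk size in advance.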

----------------
Updated BP description. Added geometry independent capabilities. Wrote down some points explicitly.

-vkozhukalov 2014-04-25

---------------

First of all, (arbitrary) partitioning and software raid are totally different. This blueprint talks about both of them. And secondly, both are outside the scope of Ironic, in my opinion.

Cinder, which is the OpenStack service responsible for providing block storage, provides raw block devices to instances -- not partitions.
We should not expand the scope of Ironic to include describing block storage (already covered by Cinder) or partition topology (not part of OpenStack today) without much more consideration and bringing this before the technical committee.

Furthermore, I don't believe either of these are necessary for Ironic's mission: provision instances on physical hardware.
Support for deploying a single-partition image is necessary, and we have this today: ironic creates the root partition, sized based on flavor, and copies the image in, which resizes on first boot.
Support for deploying a whole-disk image is coming. This will include any partition(s) within that image.
Support for hardware RAID configuration is also within Ironic's scope -- this is a necessary precursor to writing any image to a host which supports hw raid. It is also outside the scope of any other project.

Finally, one could make a case for supporting a "software RAID'ed root volume" and that, without support from Ironic, this is impossible -- but I believe it should be possible to create a whole-disk image which will rebuild a RAID1 on firstboot (though I haven't done this yet).

-devananda, 2014-04-28
