support for cpu, accelerator, and network architectures

Registered by Brian Schott on 2011-03-10

Nova should support CPU architectures, accelerator architectures, and network interfaces as part of the definition of an instance type (or flavor, in Rackspace API parlance). We propose adding cpu_arch, cpu_info, xpu_arch, xpu_info, xpus, net_arch, net_info, and net_mbps as attributes to the instance_types, instances, and compute_nodes tables. Conceptually, this information is handled the same way as the existing memory_mb, local_gb, and vcpus fields: it is defined in instance_types and copied as columns into the instances table when instances are created. The architecture-aware scheduler will compare these additional fields when selecting target compute_nodes (nova-compute services).
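A minimal sketch of the comparison the architecture-aware scheduler would make, assuming plain Python records in place of Nova's database models. Field names follow the proposed columns; the `*_free` capacity fields, the `node_satisfies`/`filter_hosts` helpers, and the `"fermi"` accelerator label are hypothetical. Accelerator and network architectures only constrain placement when the instance type sets them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceType:
    # Subset of the proposed instance_types columns (hypothetical record).
    name: str
    vcpus: int
    memory_mb: int
    local_gb: int
    cpu_arch: str = "x86_64"
    xpu_arch: Optional[str] = None
    xpus: int = 0
    net_arch: Optional[str] = None
    net_mbps: int = 0


@dataclass
class ComputeNode:
    # Free capacity as a scheduler might track it (hypothetical fields).
    host: str
    vcpus_free: int
    memory_mb_free: int
    local_gb_free: int
    cpu_arch: str = "x86_64"
    xpu_arch: Optional[str] = None
    xpus_free: int = 0
    net_arch: Optional[str] = None
    net_mbps_free: int = 0


def node_satisfies(node: ComputeNode, itype: InstanceType) -> bool:
    """True if the node can host one more instance of this type."""
    if node.cpu_arch != itype.cpu_arch:
        return False
    # Accelerator and network architecture only constrain when requested.
    if itype.xpu_arch is not None and (
        node.xpu_arch != itype.xpu_arch or node.xpus_free < itype.xpus
    ):
        return False
    if itype.net_arch is not None and node.net_arch != itype.net_arch:
        return False
    # Same style of capacity check as the existing vcpus/memory_mb/local_gb fields.
    return (
        node.vcpus_free >= itype.vcpus
        and node.memory_mb_free >= itype.memory_mb
        and node.local_gb_free >= itype.local_gb
        and node.net_mbps_free >= itype.net_mbps
    )


def filter_hosts(nodes: List[ComputeNode], itype: InstanceType) -> List[str]:
    return [n.host for n in nodes if node_satisfies(n, itype)]


nodes = [
    ComputeNode("node1", 8, 16384, 100),
    ComputeNode("gpu1", 8, 16384, 100, xpu_arch="fermi", xpus_free=2),
]
gpu_type = InstanceType("x1.gpu", 4, 8192, 20, xpu_arch="fermi", xpus=1)
print(filter_hosts(nodes, gpu_type))  # ['gpu1']
```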

Blueprint information

Status: Complete
Approver: Vish Ishaya
Priority: Medium
Drafter: Lorin Hochstein
Direction: Approved
Assignee: USC-ISI
Definition: Obsolete
Series goal: None
Implementation: Good progress
Milestone target: None
Started by: Brian Schott on 2011-03-10
Completed by: Jinwoo "Joseph" Suh on 2012-05-11

Whiteboard

==============
--- Ken Pepple:
==============

First off, this will be a welcome feature.

Some questions on design / implementation:

Q1. How will we extend this later? I fear a day when we have 100+ columns...
 Q1b. Should we move all but the most important top-line fields (CPU arch, vCPUs, etc.) into a single "extended args" column with JSON contents?
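One way Q1b could look, as a sketch only: keep a few top-line fields as ordinary columns and fold everything else into a single JSON `extended_args` column. The choice of top-level columns and the `to_row`/`from_row` helpers are assumptions for illustration.

```python
import json

# Assumed split: these stay as ordinary columns; everything else goes to JSON.
TOP_LEVEL_COLUMNS = {"name", "vcpus", "memory_mb", "local_gb", "cpu_arch"}


def to_row(attrs):
    """Split a flat attribute dict into top-level columns plus extended_args."""
    row = {k: v for k, v in attrs.items() if k in TOP_LEVEL_COLUMNS}
    extra = {k: v for k, v in attrs.items() if k not in TOP_LEVEL_COLUMNS}
    row["extended_args"] = json.dumps(extra, sort_keys=True)
    return row


def from_row(row):
    """Inverse of to_row: merge the decoded JSON back into one flat dict."""
    attrs = {k: v for k, v in row.items() if k != "extended_args"}
    attrs.update(json.loads(row.get("extended_args") or "{}"))
    return attrs


attrs = {"name": "x1.huge", "vcpus": 8, "memory_mb": 65536, "local_gb": 160,
         "cpu_arch": "x86_64", "xpu_arch": "fermi", "xpus": 2}
row = to_row(attrs)
print(row["extended_args"])  # {"xpu_arch": "fermi", "xpus": 2}
```

The trade-off is that SQL queries can no longer filter on fields inside the JSON blob, so a scheduler would have to decode it for each candidate.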

Q2. Are all attributes (cpu-arch, cpu-info, etc.) "required", or will we support "optional" attributes? For example, maybe I want an nVidia GPU but don't require one.
 Q2a. If so, how do we implement something like this? Add it to the JSON structure with something like "optional-args: xxx" for each *-info column?

Q3. What does Nova return when it is unable to schedule due to a lack of extended attributes? There should be something special here to say "we have capacity, just not to the specs you want" instead of simply failing. If so, we push the "try this one, and if not, try that one" logic back to devops. Depending on the answer to Q2, we may want to retry with non-required attributes.
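A sketch of how the distinction in Q3 could be surfaced, assuming a two-pass filter: capacity first, then the new architecture attributes. The exception classes, the dict keys, and `pick_host` are hypothetical and are not existing Nova names.

```python
class SchedulingError(Exception):
    """Hypothetical base class for scheduling failures."""


class NoCapacity(SchedulingError):
    """No host has enough free vcpus/memory for the request at all."""


class SpecsNotSatisfied(SchedulingError):
    """Some hosts have capacity, but none match the requested architecture."""


def pick_host(nodes, request):
    # First pass: capacity only (the fields Nova already schedules on).
    with_capacity = [n for n in nodes
                     if n["free_vcpus"] >= request["vcpus"]
                     and n["free_memory_mb"] >= request["memory_mb"]]
    if not with_capacity:
        raise NoCapacity("no host has enough free vcpus/memory")
    # Second pass: the proposed architecture attributes.
    matching = [n for n in with_capacity if n["cpu_arch"] == request["cpu_arch"]]
    if not matching:
        raise SpecsNotSatisfied(
            f"{len(with_capacity)} host(s) have capacity, "
            f"but none provide cpu_arch={request['cpu_arch']}")
    return matching[0]["host"]


nodes = [{"host": "n1", "free_vcpus": 8, "free_memory_mb": 16384, "cpu_arch": "x86_64"}]
print(pick_host(nodes, {"vcpus": 2, "memory_mb": 2048, "cpu_arch": "x86_64"}))  # n1
```

Because both errors share a base class, a caller that does not care about the difference can still catch `SchedulingError` alone.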

Q4. How does this play with the APIs? Launching images should be fine (we just supply the instance_type name or flavorId), but the *-describe tools probably won't like this.

Suggestions:

 S1. How do we implement this in nova-manage? I don't believe we can sustain the current approach of positional arguments, since (a) we now have too many, (b) too many are optional (which is problematic in our current implementation), and (c) quoting JSON on the command line will flat out suck.

I see three options:
 Option 1: move to kwargs-style parameters for extended attributes
  # nova-manage instance_type create x1.huge 1 128 20 20 cpu-arch=x86

 Option 2: parse the definition from an input file
  # nova-manage instance_type create from_file /tmp/new_instance_type

 Option 3: keep "nova-manage instance_type create" the same and add new command to add attributes (making it a two+ step process)
  # nova-manage instance_type create x1.huge 1 128 20 20
  x1.huge created
  # nova-manage instance_type add_attributes x1.huge cpu-info='{"model":"Nehalem", "features":["rdtscp", "xtpr"]}'
  x1.huge modified
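A rough sketch of Option 1's argument handling, assuming the arguments arrive as a list of strings: bare values stay positional, `key=value` pairs become extended attributes, and values that parse as JSON are decoded. `parse_create_args` is hypothetical and is not how nova-manage parses its arguments.

```python
import json


def parse_create_args(argv):
    """Split argv into positional values and key=value extended attributes."""
    positional, extended = [], {}
    for arg in argv:
        key, sep, value = arg.partition("=")  # split on the first '=' only
        if not sep:
            if extended:
                raise ValueError(f"positional argument {arg!r} after key=value")
            positional.append(arg)
            continue
        # Normalize cpu-arch -> cpu_arch to match the column names.
        key = key.replace("-", "_")
        try:
            extended[key] = json.loads(value)  # JSON objects, lists, numbers
        except ValueError:
            extended[key] = value  # plain strings such as x86
    return positional, extended


pos, ext = parse_create_args(
    ["x1.huge", "1", "128", "20", "20", "cpu-arch=x86",
     'cpu-info={"model": "Nehalem", "features": ["rdtscp", "xtpr"]}'])
print(pos)  # ['x1.huge', '1', '128', '20', '20']
print(ext)
```

Splitting on the first `=` keeps JSON values intact, but the shell still needs single quotes around the JSON, which is exactly the pain point in (c).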

 S2. nova-manage instance_type list should show only the basics (add cpu_arch, but not the other new attributes). We can show the extended info with nova-manage instance_type list <instance_type_name>.

 S3. We should probably add a description field to the instance_types table. Some of these instance types will need more description for end users than the name alone can provide.

==============
--- Brian Schott :
==============

Thanks for the review!

A1: We can extend this by documenting the semantics of the respective *_info fields. At this point we were reluctant to make massive changes to the schema, given the ongoing debate between centralized and distributed schedulers.

A1b: The cpu_info field was added to the compute_nodes table by the NTT Data live migration branch. We thought it was good symmetry to replicate this approach in the instances and instance_types tables and to extend it with net_ and xpu_ fields for network interfaces and accelerators.

A2: cpu_arch is certainly required (although it defaults to x86_64). If xpu_arch is set, it should be treated as required.
A2a: We could add semantics for an optional_features list in the *_info fields. We would have to discuss those semantics in the scheduler blueprint.
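A sketch of the required/optional semantics discussed in A2, assuming the *_info columns hold JSON with a `features` list (as in the cpu-info example under S1) and that the instance type may add an `optional_features` list. Hard constraints filter hosts; optional features only rank the survivors, so they can never cause a scheduling failure. The helper names are hypothetical.

```python
import json

DEFAULT_CPU_ARCH = "x86_64"  # A2: required, but defaults to x86_64


def _info(record, field):
    """Decode a *_info JSON column; NULL or empty means no constraints."""
    raw = record.get(field)
    return json.loads(raw) if raw else {}


def host_matches(itype, node):
    """Hard constraints: cpu_arch, xpu_arch when set, and required features."""
    itype_arch = itype.get("cpu_arch") or DEFAULT_CPU_ARCH
    node_arch = node.get("cpu_arch") or DEFAULT_CPU_ARCH
    if itype_arch != node_arch:
        return False
    if itype.get("xpu_arch") and itype["xpu_arch"] != node.get("xpu_arch"):
        return False
    required = set(_info(itype, "cpu_info").get("features", []))
    offered = set(_info(node, "cpu_info").get("features", []))
    return required <= offered


def optional_score(itype, node):
    """Soft preference: how many optional features the node also offers."""
    wanted = set(_info(itype, "cpu_info").get("optional_features", []))
    offered = set(_info(node, "cpu_info").get("features", []))
    return len(wanted & offered)


def rank_hosts(itype, nodes):
    """Drop hosts that fail hard constraints; order the rest by optional score."""
    candidates = [n for n in nodes if host_matches(itype, n)]
    # sorted() is stable, so ties keep their original order.
    return sorted(candidates, key=lambda n: optional_score(itype, n), reverse=True)


itype = {"cpu_arch": "x86_64",
         "cpu_info": json.dumps({"features": ["xtpr"], "optional_features": ["rdtscp"]})}
nodes = [
    {"host": "a", "cpu_arch": "x86_64", "cpu_info": json.dumps({"features": ["xtpr"]})},
    {"host": "b", "cpu_arch": "x86_64", "cpu_info": json.dumps({"features": ["xtpr", "rdtscp"]})},
    {"host": "c", "cpu_arch": "i686", "cpu_info": json.dumps({"features": ["xtpr", "rdtscp"]})},
]
print([n["host"] for n in rank_hosts(itype, nodes)])  # ['b', 'a']
```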

A3: That is the danger of optional attributes. Today on EC2, people often handle this by submitting multiple run_instances requests, possibly falling back to smaller instance types or different availability zones as less-desirable options (running more, smaller nodes, or paying for extra bandwidth to span zones) until their needs are met. We can't easily handle conditional feature requirements, and doing so would risk complicating the scheduler to the point that users won't understand its behavior.
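The client-side fallback described in A3 can be sketched as a loop over preferences, assuming a caller-supplied `run_instances` function that raises on failure. The `CapacityError` class, the preference tuples, and the stand-in launcher are hypothetical; this is not the EC2 or Nova client API.

```python
class CapacityError(Exception):
    """Hypothetical error raised when a launch request cannot be fulfilled."""


def launch_with_fallback(run_instances, preferences):
    """Try each (instance_type, zone, count) option in order of preference.

    Returns the first successful result; raises CapacityError listing every
    failure if no option can be fulfilled.
    """
    failures = []
    for instance_type, zone, count in preferences:
        try:
            return run_instances(instance_type=instance_type, zone=zone, count=count)
        except CapacityError as exc:
            failures.append(f"{instance_type}@{zone}: {exc}")
    raise CapacityError("; ".join(failures) or "no options given")


# Stand-in launcher that has no x1.huge capacity anywhere.
def fake_run_instances(instance_type, zone, count):
    if instance_type == "x1.huge":
        raise CapacityError("no capacity")
    return [f"{instance_type}-{zone}-{i}" for i in range(count)]


# Prefer one large node; fall back to four smaller ones.
print(launch_with_fallback(fake_run_instances,
                           [("x1.huge", "zone-a", 1), ("m1.large", "zone-a", 4)]))
```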

A4: The euca-describe-* APIs don't really have to change; the instance type name is sufficient. It would be nice to extend them to display the *_arch fields, but it is not essential.

S1: I prefer option 1 or 3. The TXRX fields you added at the last minute are also deployment-specific and should be optional.

S2: Good idea. I'll add that as a user story for a future sprint.

S3: Possibly. I haven't looked at how nova-dashboard does things. Deployments are more likely to want a verbose formatted description, a couple of marketing quotes, and pricing information, which is probably outside the scope of Nova.

