Scheduler: Extensible Resource Tracking (partial)

Registered by Paul Murray

The current resource tracking supports a fixed set of resource types, including memory, disk, and vcpu. A provider may want to allocate resources according to metrics that do not fit this set directly, or to augment the set. Examples include total network bandwidth and CPU performance.

Blueprint information

Status:
Complete
Approver:
Russell Bryant
Priority:
Medium
Drafter:
Paul Murray
Direction:
Approved
Assignee:
Paul Murray
Definition:
Approved
Series goal:
Accepted for juno
Implementation:
Implemented
Milestone target:
2014.2
Started by
Paul Murray
Completed by
John Garbutt

Whiteboard

Seems like all the patches are up, marking as "Needs code review" so we don't defer this till Juno (yet)

Sponsors: John Garbutt

Before we approve this, it would be good just to have a rough description of how you plan to report the stats up to the scheduler, and how you plan to persist them. I know that might all change soon, but I guess an extra column with a json blob is one approach that could work OK --johnthetubaguy

Added specification above - yes, of course - it's the JSON blob approach. More detail in the spec.
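
As a rough illustration of the "extra column with a JSON blob" idea discussed above, the sketch below packs a dictionary of extra resource values into a single text value on the compute side and unpacks it on the scheduler side. It is only a sketch under stated assumptions: the helper names, the resource keys, and the dictionary standing in for a compute_nodes row are all hypothetical and are not taken from the Nova code.

```python
import json


def pack_extra_resources(resources):
    """Serialize a dict of extra resource values into one column value.

    Hypothetical helper: the name and the key layout are assumptions.
    """
    return json.dumps(resources, sort_keys=True)


def unpack_extra_resources(blob):
    """Decode the column value; an empty or missing column means no extras."""
    if not blob:
        return {}
    return json.loads(blob)


# Compute side: a plain dict stands in for a compute_nodes row (assumption).
row = {
    "memory_mb": 32768,
    "vcpus": 16,
    "extra_resources": pack_extra_resources({
        "bandwidth_mbps_total": 10000,
        "bandwidth_mbps_used": 2500,
    }),
}

# Scheduler side: decode the blob and derive the free amount.
extra = unpack_extra_resources(row["extra_resources"])
free_bandwidth = extra["bandwidth_mbps_total"] - extra["bandwidth_mbps_used"]
```

Keeping all extra values in one column means a new resource type needs no schema change and no additional join, which is the trade-off raised in the comments that follow.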

[jyh: I have a similar idea, not sure if it's the right one? https://docs.google.com/document/d/1gI_GE0-H637lTRIyn2UPfQVebfk5QjDi6ohObt6MIc0 --jyh]

[belliott] Your proposal adds a column that basically does what the compute node stats table already does. Based on previous discussions about wanting to avoid lots of SQL joins, I think your method is better. However, it would be unfortunate to have two methods of doing the same thing, so can we see a plan to migrate the stats table to this new JSON column?

[PaulMurray] That does make sense. Once the mechanism is in place it can be used to "normalise" the way this information is handled certainly. The resources part is required to implement other blueprints that have been around in a consistent way (e.g. cpu-entitlement, network-bandwidth-entitlement, others that are cropping up now). The stats handling in resource tracker is like a hard coded version of the plugins I will add. It should be easy to convert it - I would suggest adding a blueprint to cover that work as a follow on. What do you think? I would be happy to do it as a subsequent step.

[belliott] Paul: Yes, would you mind adding a blueprint to fold the stats code and usage into your new JSON-encoded method? (whether or not you have time to do the dev, it'd be nice to capture this formally.)

[jay-lau-513] I was also going to log such a bp to do this. Just finished a very simple doc for "user defined metrics collection" in here: https://docs.google.com/document/d/1D9TtAtjVjwZyj-w83Q3rErA2dn3bDB3QYCGTtakARTg/edit?usp=sharing

[PaulMurray] ^^^^ seems to be protected - can you make it public please

Why is this not solvable using UBS and ceilometer? "Because not everyone wants to use ceilometer" is not the answer I'm looking for :) --dansmith

Etherpad from design summit session - Extensible Scheduler Metrics: https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetrics

I remember that it was brought up, but I don't recall the no-ceilometer justification (nor do I see it in the etherpad). I just think it needs to be documented here, since UBS is also in the works. --dansmith

Hi Dan, Ceilometer wasn't discussed - it was decided in IRC meetings prior to the session that it was orthogonal, and it was kept out to keep focus. The idea is that we can communicate data through the database now, or through anything else when the time comes. The point of this bp is that there is no extension mechanism for accounting of the statically allocated resources (claims at the resource tracker and consume at host state). So even with usage data available, I can't schedule according to allocation (except for the memory, disk and vcpu we have now). Providing this extensibility is the main difference - using a different column in the database is practical because of the different update rates (i.e. one is updated on instance creation/termination, the other continuously).
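
To make the distinction between allocation accounting and usage data more concrete, here is a minimal sketch of what a pluggable, statically allocated resource could look like. The class names, method names, and the "bandwidth_mbps" resource are hypothetical illustrations, not the plugin interface that was actually proposed or merged.

```python
class BandwidthResource:
    """Hypothetical plugin: tracks statically allocated network bandwidth."""

    name = "bandwidth_mbps"

    def __init__(self, total):
        self.total = total
        self.used = 0

    def test(self, requested):
        """Return True if the requested amount still fits on this host."""
        return self.used + requested <= self.total

    def add_instance(self, requested):
        self.used += requested

    def remove_instance(self, requested):
        self.used -= requested

    def write(self, resources):
        """Add this resource's totals to the dict reported to the scheduler."""
        resources[self.name + "_total"] = self.total
        resources[self.name + "_used"] = self.used


class ExtensibleTracker:
    """Hypothetical tracker that delegates claims to its plugins."""

    def __init__(self, plugins):
        self.plugins = {plugin.name: plugin for plugin in plugins}

    def claim(self, request):
        """Claim every requested resource, or none if any plugin refuses."""
        for name, plugin in self.plugins.items():
            if not plugin.test(request.get(name, 0)):
                return False
        for name, plugin in self.plugins.items():
            plugin.add_instance(request.get(name, 0))
        return True

    def release(self, request):
        for name, plugin in self.plugins.items():
            plugin.remove_instance(request.get(name, 0))

    def report(self):
        resources = {}
        for plugin in self.plugins.values():
            plugin.write(resources)
        return resources


tracker = ExtensibleTracker([BandwidthResource(total=1000)])
first = tracker.claim({"bandwidth_mbps": 600})   # fits on the host
second = tracker.claim({"bandwidth_mbps": 600})  # would exceed the total
reported = tracker.report()
```

In this sketch the accounting changes only when an instance is claimed or released, which matches the point above that allocation data updates at instance creation and termination rather than continuously.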

[jay-lau-513] @Paul, my fault, it is now published. https://docs.google.com/document/d/1D9TtAtjVjwZyj-w83Q3rErA2dn3bDB3QYCGTtakARTg/edit?usp=sharing

https://wiki.openstack.org/wiki/ExtensibleResourceTracking#Relation_to_Utilization_Aware_Scheduling

Gerrit topic: https://review.openstack.org/#q,topic:bp/extensible-resource-tracking,n,z

Addressed by: https://review.openstack.org/60258 <--- merged
    Add extra_resources field to compute_nodes table

Addressed by: https://review.openstack.org/61773
    Use extra_resources field in filter scheduler

Addressed by: https://review.openstack.org/71557
    Add extensible resources to resource tracker

Addressed by: https://review.openstack.org/66959
    Add extra_resources to ComputeNode object

Seems like this is not really ready for review, deferring to Juno --johnthetubaguy (26th Feb 2014)

So it's not dependent on the delayed BP, let's bring this back --johnthetubaguy

The following patch is from bp network-bandwidth-entitlement, it has extensible resource tracking plugins here: https://review.openstack.org/71570

Apologies, this missed the deadline for Feature Freeze. Please rebase patches as soon as Juno opens, and we will try to get this in during that period. --johnthetubaguy (5th March 2014)

Unapproved - please re-submit via nova-spec --johnthetubaguy (20th March 2014)

Addressed by: https://review.openstack.org/86050
    Propose: Extensible Resource Tracking

Gerrit topic: https://review.openstack.org/#q,topic:bp/was,n,z

Updated series goal, to ensure all fields match what is required for an approved blueprint --johnthetubaguy (8th May 2014)

Sorry, this has not got enough positive reviews to make it in time for juno-1, moving to juno-2 --johnthetubaguy 10th June 2014

Not all the reviews have +2s, and not all are close to approval, so moving to juno-3. But please move this back to juno-2, should you get your patches approved in time. --johnthetubaguy 21st July 2014

Addressed by: https://review.openstack.org/109301
    Fix Over-writing virt driver stats field

Addressed by: https://review.openstack.org/109643
    Add extensible resources to resource tracker (2)

Addressed by: https://review.openstack.org/110672
    Implement a new "vcpu_ratio" resource

We do have this merged:
https://review.openstack.org/#/c/109643/
(but there was talk of a revert).
And this looks a little blocked now:
https://review.openstack.org/#/c/61773
So we call this partially complete now.
--johnthetubaguy 2nd September 2014

Work Items

Add extra_resources field in compute_nodes table: DONE
Use extra_resources field in filter scheduler: POSTPONED
Track extra_resources in resource tracker: DONE

Dependency tree

* Blueprints in grey have been implemented.