Define backend activities to standardize control of backend tasks

Registered by Caitlin Bestler

Standardize a class/handle that provides minimal control over non-OpenStack activities created by a volume driver, so that common code, especially taskflows, can coordinate multi-step procedures efficiently even when a non-OpenStack backend activity is implementing them.

Blueprint information

Status:
Complete
Approver:
John Griffith
Priority:
Undefined
Drafter:
Caitlin Bestler
Direction:
Needs approval
Assignee:
Victor Ordaz
Definition:
Obsolete
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
Sean McGinnis

Whiteboard

(smcginnis): Marking obsolete as this has been sitting out there for a long time. If this is still needed, please submit a new bp.

A backend activity represents ongoing work being done by a Cinder backend.
This may be a process, thread, eventlet, or simply an object used within a
state machine. Most importantly, it is not controlled by nova-compute.

Whether backend activities need to be tracked depends on the volume driver.
When the backend device is just a raw device controlled directly by the volume
driver, there is never anything to track separately: all work is accounted
for and tracked by the volume driver itself. But volume drivers are free to launch
non-OpenStack activities to do work such as creating a volume backup, or copying
the contents of one volume to a different volume on a different backend.

While volume drivers already have the option to delegate work to a backend, there
is no mechanism to monitor the progress of these backend tasks. Work is either done
by a nova-compute controlled entity, or there is effectively no control
available. The only information available is when the activity completes,
and whether it succeeded or failed.

Enabling backend activities requires additional coding and development work by
the backend vendor, so their use is almost certain to enable performance optimizations
(why else would the vendor develop them?). However, operators may be
reluctant to enable these optimizations if doing so means losing visibility and control. A backend
activity may replicate the content in 20% less time, but is that worthwhile if it means having
no way to know how close to completion the replication is?

Backend activity objects are created by the volume driver to provide a
minimal handle for controlling backend activity. They give the system administrator
visibility and some control, so enabling an optimization would not mean surrendering to
whatever delays the vendor may introduce during what the vendor considers to be
unusual circumstances.

A backend activity has the following methods or attributes:

   progress: returns an 'n of m' indicator of how much of the scheduled
   activity has been completed. An indication of '0 of 0' is used when the
   activity has no anticipated completion.

   self-restarting: This is a reporting attribute. If present and true
   this activity is persistent and will run to completion (or indefinitely).
   There is no need to restart the activity when the storage target restarts.

   suspend: request that no further progress be made on this activity.

   resume: cancels a suspended state.

   cancel: requests that the backend activity be aborted and rolled back.

   status: once an activity is complete, this indicates the end result.
   A zero status indicates success; a non-zero status indicates an error.

   <tbd>: reports on resources used for this activity, such as elapsed time
   and total network traffic consumed.
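The attribute list above maps naturally onto a small abstract class. What follows is a minimal Python sketch, not an existing Cinder API. The names `BackendActivity`, `SimulatedActivity` and `CANCELLED` are hypothetical. The sketch also assumes that `status()` returns None while the activity is still running, although the text only defines status after completion. The resource-usage `<tbd>` item is left out.

```python
import abc
from typing import Optional, Tuple

CANCELLED = 1  # hypothetical non-zero status meaning "cancelled and rolled back"


class BackendActivity(abc.ABC):
    """Minimal handle on work a volume driver delegated to its backend (hypothetical)."""

    #: Reporting attribute: True if the backend persists this activity and
    #: will carry it on by itself after the storage target restarts.
    self_restarting: bool = False

    @abc.abstractmethod
    def progress(self) -> Tuple[int, int]:
        """Return (n, m): n units done of m scheduled; (0, 0) if open-ended."""

    @abc.abstractmethod
    def suspend(self) -> None:
        """Request that no further progress be made."""

    @abc.abstractmethod
    def resume(self) -> None:
        """Cancel a previous suspend request."""

    @abc.abstractmethod
    def cancel(self) -> None:
        """Request that the activity be aborted and rolled back."""

    @abc.abstractmethod
    def status(self) -> Optional[int]:
        """None while running (assumption); 0 on success, non-zero on error."""


class SimulatedActivity(BackendActivity):
    """In-memory stand-in for a backend job, e.g. for unit tests."""

    def __init__(self, total_units: int) -> None:
        self._done = 0
        self._total = total_units
        self._suspended = False
        self._status: Optional[int] = None

    def step(self) -> None:
        """Simulate the backend finishing one more unit of work."""
        if self._suspended or self._status is not None or self._total == 0:
            return  # suspended, already finished, or open-ended ('0 of 0')
        self._done += 1
        if self._done >= self._total:
            self._status = 0

    def progress(self) -> Tuple[int, int]:
        return (self._done, self._total)

    def suspend(self) -> None:
        self._suspended = True

    def resume(self) -> None:
        self._suspended = False

    def cancel(self) -> None:
        if self._status is None:
            self._done = 0          # pretend the backend rolled the work back
            self._status = CANCELLED

    def status(self) -> Optional[int]:
        return self._status
```

A volume driver would return a concrete subclass wrapping its vendor job handle. The simulated version only shows the intended suspend, resume and cancel semantics.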

The following volume driver methods may yield an activity rather than
modifying the state of the volume, if the volume driver advertises the
can-snapshot-active-volume capability:

   copy_volume
   backup_volume
   migrate_volume
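Common code calling these methods has to handle both outcomes. A driver either updated the volume state itself (the classic path) or handed back an activity. The following is a hypothetical Python sketch. The operation names come from the list above. The `(volume, *args)` calling convention, the `capabilities` dict on the driver and `call_driver_op` are assumptions, not Cinder's actual driver interface.

```python
from typing import Any, Dict, Optional

# Capability name as spelled in the text; publishing it through a
# `capabilities` dict on the driver is an assumption of this sketch.
CAN_YIELD_ACTIVITY = "can-snapshot-active-volume"

# Driver operations that, per the list above, may yield an activity.
ACTIVITY_CAPABLE_OPS = {"copy_volume", "backup_volume", "migrate_volume"}


def call_driver_op(driver: Any, op_name: str, volume: Dict[str, Any],
                   *args: Any) -> Optional[Any]:
    """Call a driver operation and return its backend activity, if any.

    Returns the activity handle when the driver advertises the capability
    and hands one back. Otherwise returns None, meaning the driver took
    the classic path and updated the volume state itself.
    """
    if op_name not in ACTIVITY_CAPABLE_OPS:
        raise ValueError("unsupported operation: %s" % op_name)
    capabilities = getattr(driver, "capabilities", {})
    result = getattr(driver, op_name)(volume, *args)
    if capabilities.get(CAN_YIELD_ACTIVITY) and result is not None:
        # The volume state is left untouched; the caller tracks the activity.
        return result
    # A driver that did not advertise the capability is never treated as
    # returning an activity, whatever it returned.
    return None
```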

The definition of backend activities, combined with the features described
in the snapshot-replicate and volume-driver-capabilities blueprints, can
enable taskflows to perform complex multi-step procedures that are
optimized for backend execution, without requiring that every multi-step
process be described for each volume driver.
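A taskflow step that depends on a backend activity has to wait for it without giving up visibility. The following is a minimal polling sketch. It assumes the activity exposes the `progress()`, `status()` and `cancel()` methods listed earlier, with `status()` returning None while the activity runs. The `wait_for_activity` helper, its cancel-on-timeout policy and the injectable `sleep`/`clock` hooks are illustrative choices, not part of the taskflow library.

```python
import time
from typing import Callable, Optional, Protocol, Tuple


class Activity(Protocol):
    """Subset of the backend-activity interface this helper relies on."""

    def progress(self) -> Tuple[int, int]: ...

    def status(self) -> Optional[int]: ...

    def cancel(self) -> None: ...


def wait_for_activity(activity: Activity,
                      poll_interval: float = 5.0,
                      timeout: Optional[float] = None,
                      on_progress: Optional[Callable[[int, int], None]] = None,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Block a flow step until a backend activity finishes.

    Returns the final status (0 on success). If `timeout` seconds pass
    first, the activity is cancelled and TimeoutError is raised.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = activity.status()
        if status is not None:
            return status
        if on_progress is not None:
            done, total = activity.progress()
            on_progress(done, total)  # e.g. report "n of m" to the operator
        if deadline is not None and clock() >= deadline:
            activity.cancel()
            raise TimeoutError("backend activity did not complete in time")
        sleep(poll_interval)
```

Because the wait is a plain function, a flow can run several of these in sequence and still report progress between steps. One example is snapshot, replicate, then fail over.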

These simple steps can be combined in complex patterns. One example would
be a taskflow that creates a hot standby for a volume, keeps it nearly in
sync with the active volume, and controls failover to this hot standby.

-----
This requires some form of capabilities publishing from each volume driver. This is probably closely related to the reporting that avishay has proposed. My notes on this:

Define a volume driver method to optimize replication of a snapshot between two volume
backends controlled by the volume driver.

Default algorithms will be used when replicating snapshots across different volume-drivers,
or when a volume driver feels that there is no optimization for it to perform.

snapshot_replicate(snapshot, source_server, destination_server[, force_full_copy])

   snapshot: the snapshot to be replicated.
   source_server: the volume server where the snapshot is currently found.
   destination_server: the volume server where the snapshot is to be created.
   force_full_copy: optional parameter that suppresses the use of vendor-specific
      options that would optimize the replication based on prior replication
      of earlier snapshots of the same object.

As with the other replicate methods, this method may yield an external-activity.
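The split between driver-optimized and default replication can be sketched as a small dispatcher. This is hypothetical glue code. The driver hook follows the signature proposed above, but `generic_full_copy` stands in for whatever default algorithm common code would provide. Two parts are conventions of this sketch only: "backends controlled by the same volume driver" is approximated by comparing driver classes, and a driver returns `NotImplemented` to signal "no optimization to offer".

```python
from typing import Any, Optional


def generic_full_copy(snapshot: Any, source_server: str,
                      destination_server: str) -> dict:
    """Placeholder for the common-code default replication algorithm."""
    return {"method": "generic", "snapshot": snapshot,
            "from": source_server, "to": destination_server}


def replicate_snapshot(snapshot: Any, source_driver: Any, dest_driver: Any,
                       source_server: str, destination_server: str,
                       force_full_copy: bool = False) -> Optional[Any]:
    """Prefer the driver's snapshot_replicate(); fall back to the default.

    The driver's result may be None (finished synchronously), a backend
    activity, or NotImplemented (declined; this sketch's convention).
    """
    same_driver = type(source_driver) is type(dest_driver)  # assumption
    hook = getattr(source_driver, "snapshot_replicate", None)
    if same_driver and hook is not None:
        result = hook(snapshot, source_server, destination_server,
                      force_full_copy=force_full_copy)
        if result is not NotImplemented:
            return result
    # Different drivers, or the driver declined: use the default algorithm.
    return generic_full_copy(snapshot, source_server, destination_server)
```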

<jdg>
Looks like a great session topic for the summit. We'll have to break this down and come up with actionable items to implement, keeping in mind our ideology of maintaining consistent behaviors and results no matter what driver is in use.

Also in the meantime take a look at the report stats/capabilities that we've implemented and use for filter scheduling. We need to standardize on that and clean it up during the next release.

Gerrit topic: https://review.openstack.org/#q,topic:bp/backend-activity,n,z

Addressed by: https://review.openstack.org/44362
    Taskflow persistence

-----------------------
Further thoughts on implementation from Caitlin Bestler:

Basically, status tracking could move off the volume. Tracking status on
the volume makes all volume actions stateful, even when the backend device
is capable of using zero-cost snapshots to perform those actions in a
stateless manner that has no impact on the availability or usability of
the volume.

The current method requires modifying the state of the volume. The
backend routine is free to unblock very quickly, and normal access to
the volume can be enabled right away if the user is willing to pass
a "force" flag on their calls. But apart from specific knowledge about
a specific backend, there is no real guidance to tell the user whether
the force flag is safe to use. It is safe with a ZFS backend. We want
users to be more confident in using stateless operations.

However, we need to maintain compatibility with stateful Volume Drivers.

The following options are worth considering:

Dual states: exercised by the Volume Driver.

    The Volume Driver *always* modifies the Flow state. As with the
    current solution it may set that state to an "in-progress" value
    and return, or it can block until the operation is complete.

    The Volume Driver *may* modify the state of the Volume as
    appropriate.

    Pros:
        This is undoubtedly the correct code for the best long-term
        maintainability and understandability of the code.
    Cons:
        Nexenta would be proposing code that has the highest impact
        on *other* Volume Drivers. Others should confirm that this is
        the simplest design, since Nexenta is obviously biased here.

Dual states with attribute: volume state is set by the Flow code

    The volume driver modifies the Flow state, as above.

    If the Volume Driver has *not* published a "stateless-volume-
    operations" attribute then the flow code will change the state
    of the volume to indicate that it is being used by an ongoing
    flow operation.

    Pros: Volume drivers only have to change which state field they
    are updating. It is the same change to all drivers. Those that
    can do stateless operations simply publish an attribute that
    indicates this, and the rest is automatic.

    Cons: Has almost Rube Goldberg levels of indirection.
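Option 2 comes down to a few lines of flow code. This is a hypothetical illustration with several simplifications. Volumes and flow state are plain dicts. The `in-flow` volume state and the `saved_volume_status` key are invented. The driver is modelled as a callable that only ever writes `flow_state["state"]`, either "in-progress" or "done".

```python
from typing import Any, Callable, Dict

# Attribute name from the text; how drivers publish it is assumed here.
STATELESS_ATTR = "stateless-volume-operations"
# Hypothetical volume state meaning "in use by an ongoing flow operation".
IN_FLOW = "in-flow"

FlowState = Dict[str, Any]
Volume = Dict[str, Any]


def finish_step(flow_state: FlowState, volume: Volume) -> None:
    """Restore the volume state saved by start_step(), if any."""
    saved = flow_state.pop("saved_volume_status", None)
    if saved is not None:
        volume["status"] = saved


def start_step(flow_state: FlowState, volume: Volume,
               capabilities: Dict[str, Any],
               driver_call: Callable[[FlowState], None]) -> None:
    """Run one flow step: mark the volume busy unless the driver is stateless."""
    if not capabilities.get(STATELESS_ATTR):
        flow_state["saved_volume_status"] = volume["status"]
        volume["status"] = IN_FLOW
    try:
        driver_call(flow_state)  # driver updates only the flow state
    except Exception:
        finish_step(flow_state, volume)
        raise
    if flow_state.get("state") == "done":
        finish_step(flow_state, volume)  # synchronous driver: release now
```

For drivers that return with "in-progress", whatever monitors the flow would call `finish_step()` once the flow state reaches "done".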

Dual option on modification:

    The volume driver normally modifies the Volume State, but may
    optionally indicate that it has modified the Flow State
    instead.

    Pros: No changes required to existing stateful Volume Drivers
    other than ignoring the extra information about what Flow is
    invoking them.

    Cons: Makes the flow code more complex.
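The extra flow-code complexity of option 3 comes from one branch on the driver's return value. The following is a hypothetical Python sketch. The `Modified` marker and `invoke_driver` are invented for illustration. Existing stateful drivers keep returning None and simply ignore the extra `flow_state` argument.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


class Modified(Enum):
    """Hypothetical marker a driver may return to say which state it set."""
    VOLUME_STATE = "volume"
    FLOW_STATE = "flow"


def invoke_driver(driver_call: Callable[[State, State], Optional[Modified]],
                  volume: State, flow_state: State) -> State:
    """Return the state record the flow must watch for completion."""
    result = driver_call(volume, flow_state)
    if result is Modified.FLOW_STATE:
        # Driver tracked the operation in the flow state; volume untouched.
        return flow_state
    # Legacy path (None) or an explicit VOLUME_STATE: watch the volume.
    return volume
```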

(?)
