openstack-entropy

db/dir abstraction to allow both dir/file based as well as db based audit/repair

Registered by Sulochan Acharya on 2014-04-18

We need a way to abstract the source of scripts/hosts/schedules. Currently entropy uses a file/dir based approach; we need to allow both file based and database based backends.

One approach is to have config define what the engine should use for backend:

entropy_backend = dir or file or sqlalchemy or something

then have an entry point defined for each method:

dir = entropy.backends.impl_dir:DirBackend
sqlalchemy = entropy.backends.impl_sqlalchemy:SqlalchemyBackend
mongodb = entropy.backends.impl_mongodb:MongoDBBackend
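
A minimal sketch of how the engine might resolve the configured name to a backend class, assuming the "module:Class" strings above are kept in a plain dict. The names BACKENDS, resolve and load_backend are hypothetical, and the entropy.backends modules do not exist yet; a real implementation would more likely register these as setuptools entry points and load them through a plugin library.

```python
import importlib

# Hypothetical registry mirroring the entry points listed above.
BACKENDS = {
    "dir": "entropy.backends.impl_dir:DirBackend",
    "sqlalchemy": "entropy.backends.impl_sqlalchemy:SqlalchemyBackend",
    "mongodb": "entropy.backends.impl_mongodb:MongoDBBackend",
}


def resolve(spec):
    """Import and return the object named by a 'module:attribute' string."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError("expected 'module:attribute', got %r" % (spec,))
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def load_backend(name, registry=BACKENDS):
    """Return the backend class registered under the entropy_backend name."""
    try:
        spec = registry[name]
    except KeyError:
        raise ValueError("unknown entropy_backend %r" % (name,))
    return resolve(spec)
```

With this shape, the engine would only ever call load_backend(conf_value) and never import a backend module directly.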

For database:

Audits are defined for "a host (fqdn or ip)" or a "group of hosts (computes | hypervisors | control nodes)",
possibly in a YAML file:

Example:

compute_audits.yaml
----------------------------
---
compute:
    audits:
        check_nova_config_exists:
            run_every: 1day
        check_nova_compute_log_file_exists:
            run_every: 30mins

hypervisor_audits.yaml
----------------------
---
hypervisor:
    audits:
        vm_count:
            run_every: 2days

other_audits.yaml
------------------------
---
nova-api.mydomain.com:
    audits:
        vmbooter:
            run_every: 1hour
        api_availability:
            run_every: 30mins

Obviously these are contrived examples, put down just for discussion.
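
The run_every values in these examples ("1day", "30mins", "2days", "1hour") would need to be turned into intervals before anything can be scheduled. A rough sketch of one way to do that, assuming only the three units that appear in the examples are accepted; parse_run_every is a hypothetical helper, not existing entropy code.

```python
import re
from datetime import timedelta

# Maps the unit spelled in the YAML to a timedelta keyword argument.
_UNITS = {"min": "minutes", "hour": "hours", "day": "days"}
_PATTERN = re.compile(r"^(\d+)\s*(min|hour|day)s?$")


def parse_run_every(value):
    """Convert strings such as '30mins' or '1day' into a timedelta."""
    match = _PATTERN.match(value.strip().lower())
    if match is None:
        raise ValueError("unsupported run_every value: %r" % (value,))
    count, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(count)})
```

Rejecting unknown units outright, rather than guessing, keeps a typo in an audit file from silently producing the wrong schedule.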

Defined Tables (discuss more):
1) hosts = holds record for each host entropy can act on
2) enabled_audits [auditor specific table] = holds info on currently defined audits
3) disabled_audits [auditor specific table] = holds info on currently disabled audits
4) audits [auditor specific table] = for every host, if "host_type" == "compute" add every enabled compute audit,
                                     or if "host_type" == "hypervisor" add every enabled hypervisor audit,
                                     or else for audits defined in other_audits.yaml add the relevant audit to the group
                                     or host whose hostname == fqdn from other_audits

"audits" table holds a message in the following format for every audit|host pair:
(required columns or keys)
"id": id of this record
"host": name of the host (fqdn)
"ip_address": ip address of the host
"audit_name": name of audit (vmbooter)
"next_run": datetime
"created_at": datetime
"updated_at": datetime
"disabled": True or False
"status": new or processing or processed
"result": [Hold the latest result]
...
...
(add more columns etc)
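
For discussion, the keys above could be written down as a record type. This is only a sketch: the class name AuditRecord, the Python types and the default values are assumptions, since the list above gives only the key names and the allowed status values.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

VALID_STATUSES = ("new", "processing", "processed")


@dataclass
class AuditRecord:
    """One audit|host pair, using the keys listed above."""
    id: int
    host: str                              # fqdn of the host
    ip_address: str
    audit_name: str                        # e.g. "vmbooter"
    next_run: Optional[datetime] = None    # unset until the first run
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)
    disabled: bool = False
    status: str = "new"
    result: Optional[str] = None           # holds the latest result

    def __post_init__(self):
        # Reject anything outside new / processing / processed.
        if self.status not in VALID_STATUSES:
            raise ValueError("invalid status %r" % (self.status,))
```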

The first run of an audit will set its "next_run" value. The engine will have a periodic looping call that checks
for any audit record with a next_run value < now() and sets its status to "new". The engine will have a second task that
will fetch these messages and put them in an internal queue (Queue.Queue), and the engine will simply spawn new executors
depending on the defined max_workers value.
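
The two periodic tasks and the executor pool could look roughly like the sketch below. It uses plain dicts in place of database rows and the Python 3 queue module. Several things here are assumptions rather than part of the proposal: the function names, the per-record "run_every" key, skipping records that are disabled or not yet "processed", and having the executor write back the result and the next "next_run".

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def mark_due(records, now):
    """Task 1: set status to 'new' for finished records whose next_run passed."""
    for record in records:
        if record["disabled"] or record["status"] != "processed":
            continue
        if record["next_run"] is not None and record["next_run"] < now:
            record["status"] = "new"


def enqueue_new(records, work_queue):
    """Task 2: move every enabled 'new' record onto the internal queue."""
    for record in records:
        if record["status"] == "new" and not record["disabled"]:
            record["status"] = "processing"
            work_queue.put(record)


def drain(work_queue, run_audit, max_workers):
    """Run everything currently queued on at most max_workers executors."""
    def execute(record):
        record["result"] = run_audit(record)
        record["status"] = "processed"
        record["next_run"] = datetime.now() + record["run_every"]
        return record

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        while True:
            try:
                record = work_queue.get_nowait()
            except queue.Empty:
                break
            futures.append(pool.submit(execute, record))
        return [future.result() for future in futures]
```

In the engine, mark_due and enqueue_new would each run from their own looping call, and drain would be replaced by long-lived workers; it is written as a one-shot function here only to keep the sketch short.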

For dir/file based (need more discussion):

Keep the current structure. However, we might have to run every audit on an executor with an infinite while loop for scheduling,
and might also have to do this for every host we want to run audits on? Need more discussion on this.

Blueprint information

Status:: Not started

Approver:: None

Priority:: Undefined

Drafter:: Sulochan Acharya

Direction:: Needs approval

Assignee:: None

Definition:: New

Series goal:: None

Implementation:: Unknown

Milestone target:: None

Whiteboard

JH: An idea, more along the lines of a central scheduler.

Every X minutes the scheduler refreshes itself (either from files or from a set of db tables). As a result of this refresh, the scheduler will adjust its own internal time/scheduling tables to match the new data inputs.

The scheduler will expose this information via a blocking method, `wait_next`; the engine itself will call into this method to get the next audit to trigger (or, in case multiple audits are due at the same time, it will return a list of those audits). This would likely use something like the iterator protocol in Python (wait_next being a thing that yields back every time an event needs to occur).

The engine would wait for wait_next to yield back one or more audits to do, and then it would be the engine's responsibility to fire off these audits via some mechanism (threads, posting on a message queue, or other). The engine would then go back to waiting for the next event (and repeat). This helps keep the 'time table' as something internal to the scheduler and allows the engine to be concerned only with executing the audits reliably (and not care about the schedule of those audits).
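
A rough sketch of the wait_next idea as a Python generator backed by a heap. The class name Scheduler, the add method, and the injectable clock and sleep callables are all assumptions made for illustration; the periodic refresh from files or db tables is left out.

```python
import heapq
import time


class Scheduler:
    """Keeps the time table internal and hands due audits to the engine."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._heap = []  # entries are (due_time, audit_name, interval)

    def add(self, audit_name, interval):
        """Register an audit to run every `interval` seconds."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (self._clock() + interval, audit_name, interval))

    def wait_next(self):
        """Block until something is due, then yield the list of due audits."""
        while self._heap:
            due = self._heap[0][0]
            delay = due - self._clock()
            if delay > 0:
                self._sleep(delay)
            batch = []
            # Collect every audit sharing this due time and reschedule it.
            while self._heap and self._heap[0][0] <= due:
                _, name, interval = heapq.heappop(self._heap)
                batch.append(name)
                heapq.heappush(self._heap, (due + interval, name, interval))
            yield batch
```

The engine side would then be a plain loop, `for audits in scheduler.wait_next(): ...`, firing each batch off through whatever mechanism it uses.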

(keekz) For the database table layout: what advantage would we gain by having both an enabled_audits and a disabled_audits table? It seems like we could do the same thing with a generic, single "audits" table (using whatever name) that has a "disabled" column that can be flipped on/off. All audits can be in the same table, and we can enable/disable them using the disabled column as necessary. This would be more in line with the nova db format, such as with the services table and being able to disable a compute using the disabled column. Additionally, I'd like to have a "disabled_reason" column (like we have with computes), so we can add notes about why an audit is disabled.
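
To make the single-table suggestion concrete, here is a small sketch using sqlite3 from the standard library. Only the "disabled" and "disabled_reason" columns come from the comment above; the other columns and the helper names set_disabled and enabled_audits are assumptions for illustration.

```python
import sqlite3

# One table for all audits; disabled is a flag instead of a second table.
SCHEMA = """
CREATE TABLE audits (
    id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    audit_name TEXT NOT NULL,
    disabled INTEGER NOT NULL DEFAULT 0,
    disabled_reason TEXT
)
"""


def set_disabled(conn, audit_id, disabled, reason=None):
    """Flip the disabled flag; the reason is cleared when re-enabling."""
    conn.execute(
        "UPDATE audits SET disabled = ?, disabled_reason = ? WHERE id = ?",
        (int(disabled), reason if disabled else None, audit_id),
    )
    conn.commit()


def enabled_audits(conn):
    """Return (host, audit_name) for every audit that is not disabled."""
    rows = conn.execute(
        "SELECT host, audit_name FROM audits WHERE disabled = 0 ORDER BY id"
    )
    return rows.fetchall()
```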
