Expand Use of Salt Across Infrastructure

Registered by Anita Kuno

A place to put together our musings, plans, suggestions, and observations as we take the existing salt in our infrastructure and sprinkle more around.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

Currently we have a salt module in the config repo: https://github.com/openstack-infra/config/tree/master/modules/salt

<jeblair> i would really love to use os packages of puppet.
<jeblair> or just salt.
<fungi> [needs more salt]
<jeblair> fungi, mordred: i have thoughts on the multi-arch mirror problem.
<jeblair> fungi, mordred: i think i want to move the mirror to static.o.o, and run mirror-builders on dedicated jenkins slaves which rsync at the end of the job.
<mordred> jeblair: I love thoughts
<jeblair> fungi, mordred: they will be bare slaves with a private key in order to do the rsync over ssh
<jeblair> fungi, mordred: they will be dedicated because they'll have a private key, of course.
<jeblair> fungi, mordred: as jenkins slaves, we get the benefit of having the build script output discoverable by anyone for debugging
<mordred> jeblair: I'm assuming that we'd be moving to a non-generated index that way?
<mordred> jeblair: as in, just use apache indexing?
<jeblair> fungi, mordred: ah, yes, and using mod-autoindex on the vhost on static.o.o
<mordred> yeah. sounds good to me
<mordred> jeblair: don't want to beat our heads against uploading to swift again? :)
<jeblair> mordred: though... now that i think about that, i can't actually think of a compelling reason to use autoindex...
<jeblair> mordred: can we still have the nice two-layer directory structure with autoindex?
<fungi> jeblair: mordred: how about having pypi.openstack.org be a jenkins slave which just runs the job to rebuild the index after each arch-specific slave syncs its portion of the mirror?
<jeblair> mordred: as far as swift goes -- there is no autoindex on the cdn, only a static index, and there's a 15 minute delay when updating a file there.
<mordred> jeblair: ah. blech
<mordred> ok
<jeblair> mordred: so we would have to generate the index, and then it could be up to 15 minutes before the new index takes effect
<mordred> static it is
<mordred> jeblair: do we really care about the 2-level?
<mordred> jeblair: and pip spiders, so I believe that autoindex would work just fine
<mordred> with 2 levels
<jeblair> mordred: not sure, i kind of like it when i go to debug something, but i'll play around and see what it looks like.
<mordred> jeblair: ++
<mordred> jeblair: we could also do what fungi suggested
<jeblair> yeah, just reading/thinking about that
<fungi> well, what i suggested is certainly more effort and strife than just relying on autoindexes
<mordred> also, anteaya has been looking at the salt-based-peer-communication problem with UtahDave
<jeblair> mordred: i know; she mentioned that her first target was refreshing puppet, but i plan for the second to be updating the mirror
<mordred> jeblair: awesome
<mordred> jeblair: I believe I owe UtahDave a chat on what we're wanting with that ... perhaps we should just do that in here some time
<jeblair> mordred: with wheels, are we going to want two separate mirrors, or a combined one with files for multiple arch's?
<jeblair> mordred: (i haven't looked into what wheel requires)
<mordred> jeblair: mechanically it should work exactly the same as what you are proposing above
<mordred> jeblair: directory of wheels per machine, can be rsynced
<mordred> SIGH
<jeblair> mordred, fungi: cool, i'll probably just use rsync (only comparing filenames, to deal with the fact that the two builders will have duplicate files) and autoindex then.
<mordred> jeblair: wheel support for pip looks like it might not be released until late 2014

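A rough sketch of the rsync step discussed above (the account name and paths are made up for illustration): rsync's --ignore-existing flag approximates the "only compare filenames" behaviour, so the two arch-specific builders never clobber each other's duplicate files, and leaving out --delete means neither builder removes the other's packages:

  rsync -av --ignore-existing /srv/mirror-build/ mirror@static.openstack.org:/srv/static/pypi/

The vhost on static.o.o would then only need mod_autoindex enabled (Options +Indexes) on that directory for pip to spider it.
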
<mordred> UtahDave: the tl;dr is that we want to have some of our nodes trigger action on other nodes
<mordred> so salt peer communication seems like a potentially nice way to do that

<fungi> UtahDave: i've heard bits and pieces. don't think we've ever laid it all out... i think we wanted to start with using salt to trigger puppet updates (puppet agent normally runs on its own timer to check back in with the master for updates)
<fungi> clearly that gets us a flexible migration path to start moving stuff out of puppet configuration over time if we want
<fungi> but beyond that, i'm a little fuzzy on where we were headed with it
<UtahDave> OK, so would you trigger those updates from the salt-master or would jenkins trigger this on a minion?
<fungi> i think for that part we'd have something watching for changes in a git repository, apply them on the master and then signal all the slaves to check for new configuration at the puppet master
<fungi> clarkb2: may have had some thoughts on that, but he's still missing in action at the moment
<fungi> the current puppet update pain is that we have a cron job on the puppet master pulling from a git repo, and then timers in puppet agent on the various servers checking the master for updates
<UtahDave> fungi: so this is something you'd like the master to do on a cron? Or would you initiate this manually when you determine you're ready?
<fungi> no, sorry, was explaining how it works now (which is clearly suboptimal)
<fungi> i think we're looking for something involving only active events, no timers
<fungi> update to git repository triggers update of puppet master's configuration triggers other servers to check the puppet master and apply new configuration
<fungi> so the first stage of that might be a jenkins job triggered from the post pipeline on zuul for changes to our config repository
<fungi> which then told the salt master to pull updates from the git mirror and signal the salt minions to update puppet configurations
<UtahDave> fungi: that's very doable.
<fungi> over time that job could grow to tell the salt master to do other things too, like applying other sorts of configuration
<UtahDave> sure.
<fungi> and then maybe, a bit at a time, we reimplement the things we're doing outside of puppet, but continue to use salt as the channel by which things are managed
<UtahDave> Yeah, that makes a lot of sense.
<fungi> and then eventually there's an empty husk of puppet doing nothing, which can be dismantled when we're ready
<fungi> anyway, that was my reading-between-the-lines interpretation of what's been batted around in the recent past. jeblair or clarkb2 may have a better grip on it
<UtahDave> OK, those things are all doable.
<UtahDave> I'm more than willing to help out wherever you need me.
<fungi> UtahDave: awesome. i guess let's make sure i haven't completely misinterpreted things and that the rest of the group are on the same page, but we're thrilled to have some assistance
<fungi> at this point we've got a salt master on the same vm as our puppet master and salt installed in a minion capacity on at least some servers, but key management may or may not be taken care of yet
<fungi> which is to say, i'm not really sure what the current state is there

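A sketch of how that trigger chain might look using salt's peer (publish) system, assuming the jenkins slave running the post job is itself a salt minion with an accepted key; the module name puppetutil and the target patterns are illustrative, not existing code:

  # run by the jenkins job on the trigger slave
  salt-call publish.publish 'puppetmaster*' puppetutil.update_puppet_master
  salt-call publish.publish '*' puppetutil.run_puppet_agent

publish.publish asks the master to fan the call out to the targeted minions, and the master only honours it if the peer acl allows that minion/function pair (see the acl sketch further down).
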
<mordred> UtahDave: looks like the above description is pretty spot on
<mordred> UtahDave: I think fungi already said this - but essentially we do not want a jenkins slave to run on the salt master
<mordred> because that's just a recipe for disaster because jenkins is an insecure thing that we do not trust at all
<fungi> mordred: i didn't say that specifically, but i agree that's better avoided
<UtahDave> mordred: no problem.
<mordred> cool. I'm not repeating things then :)
<UtahDave> So you'd rather just trigger off git repo changes, right?
<fungi> which does make the "maybe we trigger the initial update with a jenkins job" suggestion a tougher nut to crack
<mordred> well, sort of
<mordred> we already have a thing that knows how to cause events to happen when repos change
<mordred> those things can be jenkins jobs, or they could in theory be something else
<mordred> to start with, jenkins jobs seem fine
<mordred> if we run a jenkins job on a node which is a salt minion
<mordred> and we configure the peer system to let that minion send a specific signal to salt saying "hey guys, update your puppet"
<jeblair> (all jenkins slaves are currently salt minions, but have no keys)

<mordred> assuming we put keys on minions
<mordred> and if we make it something that's idempotent, like "salt.update_puppet_cron" or "salt.run_puppet_agent"
<mordred> then compromising that machine would at most allow an attacker to spawn our cron job over and over again :)
<fungi> mordred: this is where my grasp of salt wasn't sufficient. i didn't realize it could act on changes signaled by its minions, but if so that does open up many avenues for this
<UtahDave> mordred: exactly. You can give extremely specific rights to a minion

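The "extremely specific rights" live in the peer section of the master config (/etc/salt/master). A minimal sketch, reusing the hypothetical trigger-slave name and module functions used elsewhere on this whiteboard:

  peer:
    'jenkins-trigger.openstack.org':
      - puppetutil.update_puppet_master
      - puppetutil.run_puppet_agent

Only that minion, and only those two idempotent functions, can be published; compromising the slave then buys an attacker nothing more than extra puppet runs.
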
<mordred> UtahDave: so does that make a general amount of sense?
<UtahDave> Yeah, for sure. This will work really well
<mordred> yay!
<UtahDave> :)
<mordred> so, I guess steps are a) get keys on things b) develop a salt module that does the update_puppet_master and run_puppet_agent steps and then c) make the peer acl file on the master grant permissions for those commands to a minion that we want to trigger them

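Step (b) could be a small custom execution module dropped into _modules/ in the config repo and pushed out with saltutil.sync_modules; a sketch in which the module name, function names, and the commands/paths they run are all assumptions to be confirmed against the actual puppet master setup:

  # _modules/puppetutil.py -- hypothetical custom salt execution module

  def update_puppet_master():
      '''Pull the latest configuration onto the puppet master (replaces the cron pull).'''
      # __salt__ is injected by salt; cmd.run_all returns retcode/stdout/stderr
      return __salt__['cmd.run_all']('git pull --ff-only',
                                     cwd='/opt/config',  # assumed checkout path
                                     runas='root')

  def run_puppet_agent():
      '''One-shot puppet run, i.e. the "puppet agent --test" mentioned below.'''
      return __salt__['cmd.run_all']('puppet agent --test')
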
<UtahDave> Can you point me to the location in your repos where I can find the exact commands/steps needed to execute puppet on each node?
<jeblair> UtahDave: it's currently a running puppet agent, so we're not actually running that command... but we can tell you what it would be
<jeblair> UtahDave: "puppet agent --test"

(?)

Work Items

Work items:
Have keys created for the salt master and minions, have the master get each minion's public key, and have each minion's public and private keys uploaded (rough commands sketched after this list); started here: https://review.openstack.org/#/c/25066/1/launch/launch-node.py : INPROGRESS
Launch a slave for salt, started here: https://review.openstack.org/#/c/25018/ : INPROGRESS
Develop a salt module that does the update_puppet_master and run_puppet_agent steps : TODO
Make the peer acl file on the master grant permissions for those commands to a minion that we want to trigger them : TODO

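For the key work item above, the plan amounts to preseeding keys rather than relying on auto-accept; roughly (the minion id is an example, paths assume the default salt pki_dir):

  salt-key --gen-keys=new-node.openstack.org --gen-keys-dir=/tmp/keys
  # master side: install the public half as
  #   /etc/salt/pki/master/minions/new-node.openstack.org
  # minion side: upload both halves as
  #   /etc/salt/pki/minion/minion.pub and /etc/salt/pki/minion/minion.pem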