[MiniDNS] Master Tracking Blueprint

Registered by Kiall Mac Innes on 2014-01-26

The intent of this blueprint is to describe the reasoning for, and the advantages and disadvantages of, implementing a "Mini DNS" server directly in Designate.

See the linked specification for details.

Blueprint information

Status:
Complete
Approver:
Designate Core
Priority:
Medium
Drafter:
Kiall Mac Innes
Direction:
Needs approval
Assignee:
Kiall Mac Innes
Definition:
Drafting
Series goal:
Accepted for liberty
Implementation:
Implemented
Milestone target:
1.0.0
Started by
Tim Simmons on 2015-05-29
Completed by
Tim Simmons on 2015-05-29

Related branches

Whiteboard

MiniDNS Deep dive
add/change server to a pool/syncing
needs testing
evaluate library options to minimize re-implementation
is the license an issue?
top contender: http://dnspython.org/
or DIY? Example: http://pastie.org/private/w9mvs6xgtqzil6wwuflx5a
more diagrams! data flow crud
white boarded diagram: http://bit.ly/1jBUoax - e-mail <email address hidden> if you want the diagram.ly export of this
code organization
operations/deploy - reference arch
is the storage layer scalable, and does it deal with lag properly to ensure eventual consistency?
TSIG as poolkey?
Load testing? What are upper limits of Scale? requests/dbsize

Entry Points:

* mdns-designate-scoped-tsig
* mdns-storage-objects
* mdns-core-mdns-service
* mdns-dns-implementation (DIY Route)

====== MiniDNS and Server Pools Operational Scenarios ======

The following catalogs DNS operational scenarios that need to be considered in the design and implementation of mini dns (and server pools) for Designate. Please add questions, scenarios and/or answers and thoughts. Please include your handle at the end of your question/scenario so we can seek clarification if we need to.

Q1: What does a nameserver start-up from scratch look like, and how do we get the initial zone definitions for the config file and the initial zonefiles?

A1: Provide a utility like the 'sync' command we currently have to provide the list of zones for the nameserver based on the region of the nameserver. The config for the nameserver will be created from this list. The nameserver can then start a series of AXFRs (from mini dns) to get all the zones. As expected, this can take quite a bit of time depending on the number of zones.
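
As a rough illustration of the "config created from this list" step in A1, the sketch below renders BIND-style slave zone stanzas from a list of zone names. The function name render_slave_config, the zone_dir layout and the example addresses are assumptions made for illustration; this is not the actual sync utility.

```python
def render_slave_config(zone_names, masters, zone_dir="slaves"):
    """Render one BIND-style slave zone stanza per zone name.

    zone_names: iterable of fully qualified zone names (trailing dot allowed).
    masters:    list of mini dns addresses the nameserver should AXFR from.
    """
    master_list = " ".join(f"{m};" for m in masters)
    stanzas = []
    for name in sorted(zone_names):
        # BIND config files conventionally omit the trailing dot.
        bare = name.rstrip(".")
        stanzas.append(
            f'zone "{bare}" {{\n'
            f"    type slave;\n"
            f"    masters {{ {master_list} }};\n"
            f'    file "{zone_dir}/{bare}.zone";\n'
            f"}};"
        )
    return "\n\n".join(stanzas)


if __name__ == "__main__":
    # Hypothetical zone list, as a sync-style utility might return it.
    print(render_slave_config(["example.org.", "example.com."], ["192.0.2.10"]))
```

Once the nameserver loads a config like this, it requests an AXFR for each listed zone from the configured masters.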

Q2: How about the zone creates and deletes going on (through server pool) while the new nameserver is transferring the original list of zones?

A2: Nameservers can typically accommodate both operations simultaneously: transferring the original set of zones while also handling deletes and creates. If a nameserver has yet to load a zone from the original list and that zone is deleted, the zone is removed from the original list/config and the nameserver does not request an AXFR for it from mini dns. On the other hand, if a new zone is created that did not exist on the original list, the server pool gets it to the nameserver and it is included in the nameserver config.
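
The behaviour described in A2 can be sketched as a pending-transfer list that is edited while the initial load is still running. The class and method names here are hypothetical, and queueing a newly created zone for transfer is an assumption about how a nameserver would treat it.

```python
class InitialLoad:
    """Tracks zones that still need their first AXFR from mini dns."""

    def __init__(self, zone_names):
        # A dict keeps insertion order and gives O(1) removal.
        self._pending = dict.fromkeys(zone_names)

    def on_delete(self, zone):
        # A zone deleted before it was loaded is simply never transferred.
        self._pending.pop(zone, None)

    def on_create(self, zone):
        # A zone created during the load joins the end of the queue.
        self._pending.setdefault(zone)

    def next_zone(self):
        """Return the next zone to AXFR, or None when the load is done."""
        if not self._pending:
            return None
        zone = next(iter(self._pending))
        del self._pending[zone]
        return zone
```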

Q3: What happens when DNS requests come in to the nameserver while it is loading the initial set of zones?
A: For zones that are already loaded, the nameserver will return a successful response. For zones that are yet to be loaded, the nameserver would return SERVFAIL as it does not know about the zone yet.

Q3: How does nameserver maintenance work (e.g. the nameserver will fall behind on updates)?
A3: It is worth noting that, no matter what we do, there will be a period after the nameserver is brought back online where its zone data will be out of date. Operators will need a mechanism outside of Designate for removing a nameserver from active rotation. This may involve external load balancers, public VIPs being moved to another server, etc. This is outside the scope of Designate.

If a nameserver is disabled for maintenance it will become stale in 2 ways.

1) Changes to zones will not be applied

Standard DNS AXFRs let us catch up.

2) Zone creates/deletes made during the maintenance period will not have been applied.

For extended maintenance periods, we may choose to reuse the "sync" utility from Q1/A1.
For short maintenance periods, we have two options:

1) Aim for an eventually consistent model where <some service> is periodically comparing the list of zones. "<some service>" will likely be the pool manager as they are implemented.

2) Implement tracking of "tasks". A create zone API call might cause "create zone task", and multiple (1 per nameserver) sub-tasks.

Any tasks which have not been completed within N minutes could be re-issued, allowing for efficient catch-up.
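
A minimal sketch of the task-tracking option (2), assuming one sub-task per nameserver and a fixed timeout. The SubTask fields, the reissue_stale name and the five-minute default are all assumptions for illustration; the source only says "N minutes".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SubTask:
    zone: str
    nameserver: str
    action: str                      # "create" or "delete"
    issued_at: datetime
    completed_at: Optional[datetime] = None


def reissue_stale(subtasks, now, max_age=timedelta(minutes=5)):
    """Return sub-tasks not completed within max_age, marking them re-issued."""
    stale = [
        t for t in subtasks
        if t.completed_at is None and now - t.issued_at > max_age
    ]
    for task in stale:
        # Resetting the timestamp restarts the timeout for the retry.
        task.issued_at = now
    return stale
```

Only the nameservers that missed a change are retried, which is what makes the catch-up cheaper than comparing full zone lists.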

Q4: What if a nameserver goes down and has to be brought up and caught up?
A4: This is similar to the "nameserver maintenance work" scenario.

I believe the Q3 answer (nameserver maintenance) is applicable here.

Q5: How does a nameserver replacement happen?
A5: I believe a mix of Q1 and Q3's answer is applicable here.

New/replacement nameservers begin as out of service (disabled at the LB, VIP living on a different server, etc.), are caught up using "sync", and are finally placed into service.

Q6: What happens when zones are created directly on the backend without going through designate-api?
A6: Designate has no reliable way to know if these 3rd party zones are stale zones deleted from designate months ago, or if they are valid unmanaged zones.

Designate should, by default, delete these zones as it finds them.

Designate should also, optionally, choose to log warnings for these zones for operations to manually act upon.
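
The two behaviours proposed in A6 can be sketched as a set difference followed by either a delete or a warning. The function names and the delete_zone callback are hypothetical and stand in for whatever backend call would actually remove a zone.

```python
import logging

LOG = logging.getLogger(__name__)


def find_unmanaged_zones(designate_zones, backend_zones):
    """Zones present on the backend that Designate does not know about."""
    return sorted(set(backend_zones) - set(designate_zones))


def handle_unmanaged_zones(designate_zones, backend_zones, delete_zone,
                           warn_only=False):
    """Delete unmanaged zones by default, or only log them if warn_only."""
    unmanaged = find_unmanaged_zones(designate_zones, backend_zones)
    for name in unmanaged:
        if warn_only:
            LOG.warning("Zone %s exists on the backend but not in Designate",
                        name)
        else:
            delete_zone(name)
    return unmanaged
```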

Do we have any better solutions?

Q7: What if the mini dns service becomes unresponsive or goes down? How do we get the backend updated with changes that occurred during the downtime?
A7: Where possible, the nameservers should be configured with a list of mDNS services as masters. This allows any mDNS server to fail without service interruption.

For nameservers without the ability to list multiple masters (e.g. PowerDNS), we'll hopefully be able to supply a DNS name to slave from; this name would have multiple IPs associated with it. Again, hopefully, PowerDNS and others will Do The Right Thing.

Q8: How can it possibly support DNSSEC? Please note that a single zone has to share Key-Signing-Keys between all servers. This means it requires some key-distribution mechanism; otherwise it will fail validation on the client side. (asked by <email address hidden>)
A8: We have several options for implementing this. My personal favorite is, from the point of view of the nameservers (BIND/PowerDNS/etc), to "simulate" offline-signing. The nameservers would have no access to the signing keys, and would AXFR pre-signed zones from mDNS.

mDNS may choose to sign these zones "on the fly" as an AXFR is initiated, or Central may choose to sign these zones as changes are made.

Signing on-the-fly in mDNS is appealing as it allows for having multiple views of a zone. For example, in the future we may choose to implement the ability to mark a RecordSet as being "internal" only. This RRSet would then only be AXFR'd over to the nameservers handling requests from inside the cloud. This would require that multiple variations of the zone be signed.
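
To make the multiple-views idea concrete, here is a sketch of selecting which RRSets go into an AXFR for a given view. The RecordSet class, its internal flag and the view names are hypothetical (the flag is only a possible future feature), and signing itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSet:
    name: str
    rtype: str
    records: tuple
    internal: bool = False   # hypothetical "internal only" marker


def recordsets_for_view(recordsets, view):
    """Return the RRSets that should be transferred for the given view."""
    if view == "internal":
        # Nameservers inside the cloud see every RRSet.
        return list(recordsets)
    if view == "external":
        # Public nameservers never receive internal-only RRSets.
        return [rs for rs in recordsets if not rs.internal]
    raise ValueError(f"unknown view: {view!r}")
```

Each view yields a different set of records, so each would need its own signed variation of the zone.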

Finally - We should, long term, support receiving pre-signed zones from customers (We AXFR from their in-house nameservers, and simply re-publish the zone for them).

Side note: All persistent storage of secret keys should be handled by Barbican.

