diff --git a/README.md b/README.md index b7d1c23e..6d814650 100644 --- a/README.md +++ b/README.md @@ -96,6 +96,7 @@ Please read, enjoy and help improve our documentation! | [function guide](docs/function-guide.md) | for mgmt developers | | [resource guide](docs/resource-guide.md) | for mgmt developers | | [style guide](docs/style-guide.md) | for mgmt developers | +| [service API guide](docs/service-guide.md) | for external developers | | [godoc API reference](https://godoc.org/github.com/purpleidea/mgmt) | for mgmt developers | | [prometheus guide](docs/prometheus.md) | for everyone | | [puppet guide](docs/puppet-guide.md) | for puppet sysadmins | diff --git a/docs/service-guide.md b/docs/service-guide.md new file mode 100644 index 00000000..bc5bbf72 --- /dev/null +++ b/docs/service-guide.md @@ -0,0 +1,145 @@ +# Service API design guide + +This document is intended as a short instructional design guide in building a +service management API. It is certainly intended for someone who wishes to use +`mgmt` resources and functions to interact with their facilities, however it may +be of more general use as well. Hopefully this will help you make smarter design +considerations early on, and prevent some amount of unnecessary technical debt. + +## Main aspects + +What follows are some of the most common considerations which you may wish to +take into account when building your service. This list is non-exhaustive. Of +particular note, as of the writing of this document, many of these designs are +not taken into account or not well-handled or implemented by the major API +("cloud") providers. + +### Authentication + +#### The status-quo + +Many services naturally require you to authenticate yourself. Usually the +initial user who sets up the account and provides credit card details will need +to download secret credentials in order to access the service. The onus is on +the user to keep those credentials private, and to prevent leaking them. It is +convenient (and insecure) to store them in `git` repositories containing scripts +and configuration management code. Since it's likely you will use multiple +different services, it also means you will have a ton of different credentials +to guard. + +#### An alternative + +Instead, build your service to accept a public key that you store in the users +account. Only consumers that can correctly sign messages matching this public +key should be authorized. This mechanism is well-understood by anyone who has +ever uploaded their public SSH key to a server. You can use SSH keys, GPG keys, +or even get into Kerberos if that's appropriate. Best of all, if you and other +services use a standardized mechanism like GPG, a user might only need to keep +track of their single key-pair, even when they're using multiple services! + +### Events + +#### The problem + +People have been building "[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete)" +and "[REST](https://en.wikipedia.org/wiki/REST)"ful API's for years. The biggest +missing part that most of them don't provide is events. If users want to know +when a resource changes, they have to repeatedly poll the server, which is both +network intensive, and introduces latency. When services were simpler, this +wasn't as much of a consideration, but these days it matters. An embarrassingly +small number of major software vendors implement these correctly, if at all. + +#### Why events? + +The `mgmt` tool is different from most other static tools in that it allows +reading streams of incoming data, and stream of change events from resources we +are managing. If an event API is not available, we can still poll, but this is +not as desirable. An event-capable API doesn't prevent polling if that's +preferred, you can always repeat a read request periodically. + +#### Variants + +The two common mechanisms for receiving events are "callbacks" and +"long-polling". In the former, the service contacts the consumer when something +happens. In the latter, the consumer opens a connection, and the service either +closes the connection or sends the reply, when it's ready. Long-polling is often +preferred since it doesn't require an open firewall on the consumers side. +Callbacks are preferred because it's often cheaper for the service to implement +that. It's also less reliable since it's hard to know if the callback message +wasn't received because it was dropped, or if there just wasn't an event. And it +requires static timeouts when retrying a callback message, and so on. It's best +to implement long-polling or something equivalent at a minimum. + +#### "Since" requests + +When making an event request, some API's will let you tack on a "since" style +parameter that tells the endpoint that we're interested in all of the events +_since_ a particular timestamp, or _since_ a particular sequence ID. This can be +very useful if missing an intermediate event is a concern. Implement this if you +can, but it's better for all concerned if purely declarative facilities are all +that is required. It also forces the endpoint to maintain some state, which may +be undesirable for them. + +#### Out of band + +Some providers have the event system tacked on to a separate facility. If it's +not part of the core API, then it's not useful. You shouldn't have to configure +a separate system in order to start getting events. + +### Batching + +With so many resources, you might expect to have 1000's of long-polling +connections all sitting open and idle. That can't be efficient! It's not, which +is why good API's need a batching facility. This lets the consumer group +together many watches (all waiting on a long-poll) inside of a single call. That +way, a single connection might only be needed for a large amount of information. + +### Don't auto-generate junk + +Please build an elegant API. Many services auto-generate a "phone book" SDK of +junk. It might seem inevitable, so if you absolutely need to do this, then put +some extra effort into making it idiomatic. If I'm using an SDK generated for +`golang` and I see an internal `foo.String` wrapper, then chances are you have +designed your API and code to be easier to maintain for you, instead of +prioritizing your customers. Surely the total volume of all customer code is +more than your own, so why optimize for that instead of the putting the customer +first? + +### Resources and functions + +`Mgmt` has a concept of "resources" and "functions". Resources are used in an +idempotent model to express desired state and perform that work, and "functions" +are used to receive and pull data into the system. That separation has shown to +be an elegant one. Consider it when designing your API's. For example, if some +vital information can only be obtained after performing a modifying operation, +then it might signal that you're missing some sort of a lookup or event-log +system. Design your API's to be idempotent, this solves many distributed-system +problems involving receiving duplicate messages, and so on. + +## Using mgmt as a library + +Instead of building a new service from scratch, and re-inventing the typical +management and CLI layer, consider using `mgmt` as a library, and directly +benefiting from that work. This has not been done for a large production +service, but the author believes it would be quite efficient, particularly if +your application is written in golang. It's equivalently easy to do it for other +languages as well, you just end up with two binaries instead of one. (Or you can +embed the other binary into the new golang management tool.) + +## Cloud API considerations + +Many "cloud" companies have a lot of technical debt and a lot of customers. As a +result, it might be very hard for them to improve their API's, particularly +without breaking compatibility promises for their existing customers. As a +result, they should either add a versioned API, which lets newer consumers get +the benefit, or add new parallel services which offer the modern features. If +they don't, the only solution is for new competitors to build-in these better +efficiencies, eventually offering better value to cost ratios, which will then +make legacy products less lucrative and therefore unmaintainable as compared to +their competitors. + +## Suggestions + +If you have any ideas for suggestions or other improvements to this guide, +please let us know! I hope this was helpful. Please reach out if you are +building an API that you might like to have `mgmt` consume!