docs: Add a guide for writing API services

Hopefully this is useful to companies who want to design their services properly to support modern tooling.
2024-08-16 23:38:27 -04:00
parent 7596f5b572
commit c5dc9c7650
2 changed files with 146 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -96,6 +96,7 @@ Please read, enjoy and help improve our documentation!
 | [function guide](docs/function-guide.md) | for mgmt developers |
 | [resource guide](docs/resource-guide.md) | for mgmt developers |
 | [style guide](docs/style-guide.md) | for mgmt developers |
 | [service API guide](docs/service-guide.md) | for external developers |
 | [godoc API reference](https://godoc.org/github.com/purpleidea/mgmt) | for mgmt developers |
 | [prometheus guide](docs/prometheus.md) | for everyone |
 | [puppet guide](docs/puppet-guide.md) | for puppet sysadmins |
--- a/docs/service-guide.md
+++ b/docs/service-guide.md
@@ -0,0 +1,145 @@
 # Service API design guide
 This document is intended as a short instructional design guide for building a
 service management API. It is certainly intended for someone who wishes to use
 `mgmt` resources and functions to interact with their facilities; however, it
 may be of more general use as well. Hopefully this will help you make smarter
 design decisions early on, and prevent some amount of unnecessary technical debt.
 ## Main aspects
 What follows are some of the most common considerations which you may wish to
 take into account when building your service. This list is non-exhaustive. Of
 particular note, as of the writing of this document, many of these designs are
 not taken into account or not well-handled or implemented by the major API
 ("cloud") providers.
 ### Authentication
 #### The status-quo
 Many services naturally require you to authenticate yourself. Usually the
 initial user who sets up the account and provides credit card details will need
 to download secret credentials in order to access the service. The onus is on
 the user to keep those credentials private, and to prevent leaking them. It is
 convenient (and insecure) to store them in `git` repositories containing scripts
 and configuration management code. Since it's likely you will use multiple
 different services, it also means you will have a ton of different credentials
 to guard.
 #### An alternative
 Instead, build your service to accept a public key that you store in the user's
 account. Only consumers that can correctly sign messages matching this public
 key should be authorized. This mechanism is well-understood by anyone who has
 ever uploaded their public SSH key to a server. You can use SSH keys, GPG keys,
 or even get into Kerberos if that's appropriate. Best of all, if you and other
 services use a standardized mechanism like GPG, a user might only need to keep
 track of their single key-pair, even when they're using multiple services!
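The public-key approach can be sketched in Go as follows. This is only a minimal illustration under stated assumptions: it uses Ed25519 from the standard library rather than SSH or GPG keys, and the `Account`, `Consumer`, `NewConsumer`, and `Authorize` names are hypothetical, not part of `mgmt` or of any provider's API.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// Account is what the service stores for a user: a name and a public key.
// Nothing in it is secret. (Hypothetical type, for illustration only.)
type Account struct {
	Name      string
	PublicKey ed25519.PublicKey
}

// Consumer is the client side. The private key never leaves it.
type Consumer struct {
	name string
	priv ed25519.PrivateKey
	pub  ed25519.PublicKey
}

// NewConsumer generates a fresh key pair for the named user.
func NewConsumer(name string) (*Consumer, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	return &Consumer{name: name, priv: priv, pub: pub}, nil
}

// Account returns the public half, which is all the consumer uploads.
func (c *Consumer) Account() Account {
	return Account{Name: c.name, PublicKey: c.pub}
}

// Sign signs a request message with the consumer's private key.
func (c *Consumer) Sign(msg []byte) []byte {
	return ed25519.Sign(c.priv, msg)
}

// Authorize runs on the service. It reports whether sig is a valid signature
// of msg for the public key stored in the account.
func Authorize(acct Account, msg, sig []byte) bool {
	if len(acct.PublicKey) != ed25519.PublicKeySize {
		return false // ed25519.Verify panics on a wrongly sized key
	}
	return ed25519.Verify(acct.PublicKey, msg, sig)
}

func main() {
	alice, err := NewConsumer("alice")
	if err != nil {
		panic(err)
	}
	acct := alice.Account() // uploaded once, like an SSH public key

	msg := []byte("GET /v1/servers")
	sig := alice.Sign(msg)

	fmt.Println(Authorize(acct, msg, sig))                          // signature matches
	fmt.Println(Authorize(acct, []byte("DELETE /v1/servers"), sig)) // different message
}
```

A real service would also sign a timestamp or nonce along with the request so that a captured signature cannot simply be replayed.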
 ### Events
 #### The problem
 People have been building "[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete)"
 and "[REST](https://en.wikipedia.org/wiki/REST)"ful APIs for years. The biggest
 thing that most of them are missing is events. If users want to know when a
 resource changes, they have to repeatedly poll the server, which both wastes
 network traffic and introduces latency. When services were simpler, this
 wasn't as much of a consideration, but these days it matters. An embarrassingly
 small number of major software vendors implement these correctly, if at all.
 #### Why events?
 The `mgmt` tool is different from most other static tools in that it allows
 reading streams of incoming data, and streams of change events from resources we
 are managing. If an event API is not available, we can still poll, but this is
 not as desirable. An event-capable API doesn't prevent polling if that's
 preferred; you can always repeat a read request periodically.
 #### Variants
 The two common mechanisms for receiving events are "callbacks" and
 "long-polling". In the former, the service contacts the consumer when something
 happens. In the latter, the consumer opens a connection, and the service either
 closes the connection or sends the reply, when it's ready. Long-polling is often
 preferred since it doesn't require an open firewall on the consumer's side.
 Callbacks are sometimes preferred because they're often cheaper for the service
 to implement. They're also less reliable, since it's hard to know whether a
 callback message wasn't received because it was dropped, or because there just
 wasn't an event. They also require static timeouts when retrying a callback
 message, and so on. It's best to implement long-polling or something equivalent
 at a minimum.
 #### "Since" requests
 When making an event request, some APIs will let you tack on a "since" style
 parameter that tells the endpoint that we're interested in all of the events
 _since_ a particular timestamp, or _since_ a particular sequence ID. This can be
 very useful if missing an intermediate event is a concern. Implement this if you
 can, but it's better for all concerned if purely declarative facilities are all
 that is required. Supporting "since" requests also forces the endpoint to
 maintain some state, which may be undesirable for the provider.
 #### Out of band
 Some providers have the event system tacked on to a separate facility. If it's
 not part of the core API, then it's not useful. You shouldn't have to configure
 a separate system in order to start getting events.
 ### Batching
 With so many resources, you might expect to have thousands of long-polling
 connections all sitting open and idle. That can't be efficient! It's not, which
 is why good APIs need a batching facility. This lets the consumer group
 together many watches (all waiting on a long-poll) inside of a single call. That
 way, only a single connection might be needed for a large amount of information.
 ### Don't auto-generate junk
 Please build an elegant API. Many services auto-generate a "phone book" SDK of
 junk. It might seem inevitable, so if you absolutely need to do this, then put
 some extra effort into making it idiomatic. If I'm using an SDK generated for
 `golang` and I see an internal `foo.String` wrapper, then chances are you have
 designed your API and code to be easier to maintain for you, instead of
 prioritizing your customers. Surely the total volume of all customer code is
 more than your own, so why optimize for your own convenience instead of putting
 the customer first?
 ### Resources and functions
 `Mgmt` has a concept of "resources" and "functions". Resources are used in an
 idempotent model to express desired state and perform that work, and "functions"
 are used to receive and pull data into the system. That separation has proven to
 be an elegant one. Consider it when designing your APIs. For example, if some
 vital information can only be obtained after performing a modifying operation,
 then it might signal that you're missing some sort of a lookup or event-log
 system. Design your APIs to be idempotent; this solves many distributed-system
 problems involving receiving duplicate messages, and so on.
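To make the idempotency point concrete, here is a small Go sketch. It is not `mgmt`'s actual resource API; the `Store`, `Ensure`, and `Lookup` names are hypothetical. The modifying call expresses desired state, so receiving the same request twice is harmless, and the read-only lookup is kept separate from it.

```go
package main

import "fmt"

// Store is a hypothetical service backend that maps volume names to sizes.
type Store struct {
	sizes  map[string]int
	writes int // how many times the backend was actually modified
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{sizes: make(map[string]int)}
}

// Ensure declares the desired size of a volume. It returns true only if it
// had to change something. Compare with a non-idempotent "grow by N" call,
// where a duplicated message would grow the volume twice.
func (s *Store) Ensure(name string, size int) bool {
	if cur, ok := s.sizes[name]; ok && cur == size {
		return false // already converged; a duplicate request is a no-op
	}
	s.sizes[name] = size
	s.writes++
	return true
}

// Lookup is the read-only counterpart. It never modifies state, so no
// information is reachable only as a side effect of a write.
func (s *Store) Lookup(name string) (int, bool) {
	size, ok := s.sizes[name]
	return size, ok
}

func main() {
	s := NewStore()
	fmt.Println(s.Ensure("data", 10)) // first request does the work
	fmt.Println(s.Ensure("data", 10)) // retried or duplicated request changes nothing
	fmt.Println(s.Lookup("data"))
	fmt.Println(s.writes)
}
```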
 ## Using mgmt as a library
 Instead of building a new service from scratch, and re-inventing the typical
 management and CLI layer, consider using `mgmt` as a library, and directly
 benefiting from that work. This has not been done for a large production
 service, but the author believes it would be quite efficient, particularly if
 your application is written in golang. It's just as easy to do for other
 languages as well; you just end up with two binaries instead of one. (Or you can
 embed the other binary into the new golang management tool.)
 ## Cloud API considerations
 Many "cloud" companies have a lot of technical debt and a lot of customers. As a
 result, it might be very hard for them to improve their APIs, particularly
 without breaking compatibility promises for their existing customers. They
 should therefore either add a versioned API, which lets newer consumers get
 the benefit, or add new parallel services which offer the modern features. If
 they don't, the only solution is for new competitors to build in these better
 efficiencies, eventually offering better value-to-cost ratios, which will then
 make legacy products less lucrative and therefore harder to maintain than those
 of their competitors.
 ## Suggestions
 If you have any suggestions or other improvements to this guide,
 please let us know! I hope this was helpful. Please reach out if you are
 building an API that you might like to have `mgmt` consume!