# Service API design guide

This document is a short design guide for building a service management API. It is primarily intended for anyone who wishes to use `mgmt` resources and functions to interact with their facilities; however, it may be of more general use as well. Hopefully this will help you make smarter design decisions early on, and prevent some amount of unnecessary technical debt.

## Main aspects

What follows are some of the most common considerations which you may wish to take into account when building your service. This list is non-exhaustive.

Of particular note, as of the writing of this document, many of these designs are either ignored or poorly implemented by the major API ("cloud") providers.

### Authentication

#### The status quo

Many services naturally require you to authenticate yourself. Usually the initial user who sets up the account and provides credit card details will need to download secret credentials in order to access the service. The onus is on the user to keep those credentials private, and to prevent leaking them. It is convenient (and insecure) to store them in `git` repositories containing scripts and configuration management code. Since it's likely you will use multiple different services, it also means you will have a ton of different credentials to guard.

#### An alternative

Instead, build your service to accept a public key that you store in the user's account. Only consumers that can correctly sign messages with the matching private key should be authorized. This mechanism is well-understood by anyone who has ever uploaded their public SSH key to a server. You can use SSH keys, GPG keys, or even get into Kerberos if that's appropriate. Best of all, if you and other services use a standardized mechanism like GPG, a user might only need to keep track of their single key-pair, even when they're using multiple services!
### Events

#### The problem

People have been building "[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete)" and "[REST](https://en.wikipedia.org/wiki/REST)"ful APIs for years. The biggest piece that most of them are missing is events. If users want to know when a resource changes, they have to repeatedly poll the server, which is both network intensive and introduces latency. When services were simpler, this wasn't as much of a consideration, but these days it matters. An embarrassingly small number of major software vendors implement events correctly, if at all.

#### Why events?

The `mgmt` tool is different from most other static tools in that it allows reading streams of incoming data, and streams of change events from the resources we are managing. If an event API is not available, we can still poll, but this is not as desirable. An event-capable API doesn't prevent polling if that's preferred; you can always repeat a read request periodically.

#### Variants

The two common mechanisms for receiving events are "callbacks" and "long-polling". In the former, the service contacts the consumer when something happens. In the latter, the consumer opens a connection, and the service either closes the connection or sends the reply when an event is ready.

Long-polling is often preferred since it doesn't require an open firewall on the consumer's side. Callbacks are sometimes preferred by providers because they're often cheaper for the service to implement. They're also less reliable, since it's hard to know whether a callback message wasn't received because it was dropped, or because there just wasn't an event. They also require static timeouts when retrying a callback message, and so on. It's best to implement long-polling or something equivalent at a minimum.
#### "Since" requests

When making an event request, some APIs will let you tack on a "since" style parameter that tells the endpoint that we're interested in all of the events _since_ a particular timestamp, or _since_ a particular sequence ID. This can be very useful if missing an intermediate event is a concern. Implement this if you can, but it's better for all concerned if purely declarative facilities are all that is required. A "since" facility also forces the endpoint to maintain some state, which may be undesirable for the provider.

#### Out of band

Some providers have the event system tacked on to a separate facility. If it's not part of the core API, then it's not useful. You shouldn't have to configure a separate system in order to start getting events.

### Batching

With so many resources, you might expect to have thousands of long-polling connections all sitting open and idle. That can't be efficient! It's not, which is why good APIs need a batching facility. This lets the consumer group together many watches (all waiting on a long-poll) inside of a single call. That way, only a single connection might be needed for a large amount of information.

### Don't auto-generate junk

Please build an elegant API. Many services auto-generate a "phone book" SDK of junk. It might seem inevitable, so if you absolutely need to do this, then put some extra effort into making it idiomatic. If I'm using an SDK generated for `golang` and I see an internal `foo.String` wrapper, then chances are you have designed your API and code to be easier for you to maintain, instead of prioritizing your customers. Surely the total volume of all customer code is more than your own, so why optimize for your own code instead of putting the customer first?

### Resources and functions

`Mgmt` has a concept of "resources" and "functions". Resources are used in an idempotent model to express desired state and perform that work, and "functions" are used to receive and pull data into the system.
That separation has proven to be an elegant one. Consider it when designing your APIs. For example, if some vital information can only be obtained after performing a modifying operation, then it might signal that you're missing some sort of a lookup or event-log system. Design your APIs to be idempotent; this solves many distributed-system problems involving receiving duplicate messages, and so on.

## Using mgmt as a library

Instead of building a new service from scratch, and re-inventing the typical management and CLI layer, consider using `mgmt` as a library, and directly benefiting from that work. This has not been done for a large production service, but the author believes it would be quite efficient, particularly if your application is written in golang. It's similarly easy to do for other languages; you just end up with two binaries instead of one. (Or you can embed the other binary into the new golang management tool.)

## Cloud API considerations

Many "cloud" companies have a lot of technical debt and a lot of customers. As a result, it might be very hard for them to improve their APIs, particularly without breaking compatibility promises for their existing customers. They should therefore either add a versioned API, which lets newer consumers get the benefit, or add new parallel services which offer the modern features. If they don't, the only solution is for new competitors to build in these better efficiencies, eventually offering better value-to-cost ratios, which will then make legacy products less lucrative and therefore unmaintainable as compared to their competitors.

## Suggestions

If you have any suggestions or other improvements to this guide, please let us know! I hope this was helpful. Please reach out if you are building an API that you might like to have `mgmt` consume!