Unified storage IO

November 12, 2016

Why not just use store X?

Durability should be simple and flexible. When I started to design replikativ I investigated many different IO options, including many key-value and document stores such as CouchDB, Riak, MongoDB and IndexedDB. My trouble was that I wanted cross-platform code that also targets ClojureScript, while I didn’t need a particular storage access pattern from the start. Committing to any particular storage is also premature in most cases, as IO-specific requirements change over the lifecycle of an application. In my experience people tend to use the one or two storage solutions they know instead of starting with something minimalistic and plugging in a more elaborate storage solution when their application needs it. This complects the features the application really requires with those that are used just because the storage solution prescribes them.

With ClojureScript in particular a problem arises, as the IO model is not a local decision that can be factored away: either you introduce callbacks everywhere or you default to core.async, besides other more exotic options. You then need to wrap all your code in go-routines for ClojureScript, and hence platform-neutral code is affected in the same way that code is affected by IO in Haskell. Since I wanted to have a reasonable default, we picked the then-new core.async, which gives a sane sequential programming model. This put me out of luck, as all previous JVM libraries and approaches use synchronous IO. I also wanted to start with the minimum necessary, which for me was the semantics of a Clojure hash-map.
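To make the difference between the two styles concrete, here is a minimal sketch using only core.async. The functions `read-value`, `read-value-cb`, `sum-cb` and `sum-async` and the map `db` are made up for illustration and are not part of any library; the "IO" is simulated with a plain map lookup.

```clojure
(ns example.async-io
  (:require [clojure.core.async :refer [chan go put! <! <!!]]))

;; Hypothetical asynchronous read: delivers the value on a channel.
;; The :not-found default avoids putting nil, which channels do not allow.
(defn read-value [db k]
  (let [ch (chan 1)]
    (put! ch (get db k :not-found))
    ch))

;; The same read in callback style.
(defn read-value-cb [db k callback]
  (callback (get db k :not-found)))

;; Callback style: every dependent step nests one level deeper.
(defn sum-cb [db callback]
  (read-value-cb db :a
                 (fn [a]
                   (read-value-cb db :b
                                  (fn [b]
                                    (callback (+ a b)))))))

;; core.async style: the code reads sequentially, but it has to live
;; inside a go block and itself returns a channel.
(defn sum-async [db]
  (go (+ (<! (read-value db :a))
         (<! (read-value db :b)))))

(def db {:a 1 :b 2})

(sum-cb db println)   ; prints 3
(<!! (sum-async db))  ; => 3 (blocking take, JVM only)
```

Note how `sum-async` returns a channel again, so its callers have to be go blocks too. This is the sense in which the decision spreads through all platform-neutral code.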

The situation has barely changed, as truly platform-neutral code for IO operations like storage and network is not yet well established in the Clojure(Script) community. konserve (for storage IO) and kabel (for network IO) are efforts to change that and to significantly facilitate development with core.async.

Flexibility by simplicity

Figure: a historic piece of stone with letters. A simple and flexible way to store things reliably; both the material and the language can be switched to address different tradeoffs.

I designed konserve, and we use it in production, as a simple interface for common tasks like session storage in backends, file storage for caches of binary data in Clojure(Script) and in general any way to durably store state when Datomic’s query capabilities are not necessary. Furthermore, such a building block should facilitate advanced storage concepts built on persistent data structures, e.g. the hitchhiker-tree or durable-persistence, and particularly the CRDTs in replikativ. So what is desired?

By reducing the storage interface to a simple key-value store with a core.async interface and edn serialization, one gets a very good tradeoff: there is hardly any interface left to reason about.

(<!! (k/assoc-in store [:bar] 42))   ;; store 42 under the key :bar
(<!! (k/update-in store [:bar] inc)) ;; => [42 43], the old and the new value
(<!! (k/get-in store [:bar]))        ;; => 43

Opting out

It is still possible to get gradually more direct access to the underlying store, e.g. for performance or for specific features like transactional safety or batch processing, similar to the way Clojure exposes the underlying JVM primitives and interfaces. Furthermore, konserve reuses popular Clojure libraries for its backends where possible, so as not to reinvent the wheel and to allow comfortable direct access. Konserve provides a reasonable default store that requires no dependencies or setup, both for the JVM (filestore) and for the browser (IndexedDB).
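As a rough illustration of the default store on the JVM, the sketch below opens a filestore and runs a few operations against it. It assumes the konserve API as it looked around the time of writing (`konserve.filestore/new-fs-store` and `konserve.core`); newer releases may name the constructor differently. The namespace name, the directory `/tmp/konserve-example` and the stored data are made up for the example.

```clojure
(ns example.filestore
  (:require [konserve.filestore :refer [new-fs-store]]
            [konserve.core :as k]
            [clojure.core.async :refer [<!!]]))

;; Opening the store is asynchronous as well, so the store itself
;; is taken from a channel. The directory is an assumption.
(def store (<!! (new-fs-store "/tmp/konserve-example")))

;; Values are plain edn. The first element of the key path names the
;; stored value, the remaining elements address positions inside it.
(<!! (k/assoc-in store ["session" :user] {:name "alice"}))
(<!! (k/get-in store ["session" :user :name])) ; read inside the nested value
(<!! (k/exists? store "session"))              ; check whether the key is present
(<!! (k/dissoc store "session"))               ; remove the key again
```

On the JVM the blocking take `<!!` is convenient at the REPL; in ClojureScript the same calls would sit inside a `go` block and use `<!`.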

Performance is fairly good for small key-value pairs accessed in parallel, which replikativ exploits, for instance, by using Merkle-tree-like structures. In general you pay a negligible cost for the konserve protocol and the usual cost for edn serialization, depending on the serializer. More advanced storage features are meant to be implemented on top; for instance, a fast append-log with reduction is already implemented and used in replikativ. A write-through caching scheme could also become attractive, as well as a small REST interface exposing konserve operations.
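The append-log can be sketched as follows. This assumes the `append`, `log` and `reduce-log` functions of `konserve.core` and the in-memory store from `konserve.memory`, as I understand the API of that time, with `reduce-log` taking the store, the key, a reducing function and an initial accumulator. The key `:events` and the event maps are made up for the example.

```clojure
(ns example.append-log
  (:require [konserve.memory :refer [new-mem-store]]
            [konserve.core :as k]
            [clojure.core.async :refer [<!!]]))

;; An in-memory store keeps the example free of any setup.
(def store (<!! (new-mem-store)))

;; Append two hypothetical events to the log stored under :events.
(<!! (k/append store :events {:type :deposit :amount 10}))
(<!! (k/append store :events {:type :deposit :amount 32}))

;; Read the whole log back as a sequence of elements.
(def events (<!! (k/log store :events)))

;; Fold over the log without materializing it first:
;; here the amounts of all events are summed up.
(def total
  (<!! (k/reduce-log store :events
                     (fn [acc {:keys [amount]}] (+ acc amount))
                     0)))
```

Appending only touches the tail of the log, which is why this pattern stays cheap even when the log grows.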

Dealing with errors in core.async is not always easy, so it can be helpful to use superv.async instead of plain core.async. Errors are returned on the channel as wrapped exceptions and need to be rethrown wherever you expect them. I hope you find konserve a useful storage default and report back in the Gitter chat or on the Clojure mailing list. Happy storing! :)
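The rethrowing pattern can be sketched with plain core.async. The helper `throw-if-error` and the function `failing-read` are hypothetical names for this example only; superv.async packages the same idea into its own macros and adds supervision on top.

```clojure
(ns example.errors
  (:require [clojure.core.async :refer [go <!!]]))

;; Hypothetical helper: rethrow a value taken from a channel
;; if it is an exception, otherwise pass it through unchanged.
(defn throw-if-error [v]
  (if (instance? Throwable v)
    (throw v)
    v))

;; Simulated failing operation: the error is not thrown,
;; it is returned as a value on the channel.
(defn failing-read []
  (go (ex-info "Could not read key." {:key :bar})))

;; Without the helper the exception object would be treated as
;; a regular result. With it, normal try/catch works again.
(def result
  (try
    (throw-if-error (<!! (failing-read)))
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))
;; result => {:key :bar}
```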

Unified storage IO - November 12, 2016 - christian weilbach