10.5.18

Taking a Glance at Containers, Packages and Development

Time to dust off this blog.

A long time ago, I worked on (yet another) open-source implementation of an RPC plugin (code generation plus a client/server framework) for handling the "services" that are part of protobuf. I had briefly looked at existing C++ solutions, but I remember I primarily cared about finding out what it would take to build one myself. I searched for ages for a suitable build tool, ultimately settling on gyp and ninja. And then I still needed to handle dependencies.

At some point I must have lost interest. Many years have passed, and Bazel was open-sourced in the meantime. What (subjectively) changed more than anything else is that the world has moved on: Docker containers and cloud setups are now the norm rather than the exception, the word "DevOps" was invented, Kubernetes is a thing. So people must have a way to deal with APIs. There is something called Swagger that can document APIs and also generate client- and server-side code, for many languages and targeting language-specific frameworks...

Docker for Development

Apart from the revolution this means for operations and delivery, it led me to think that a key part of the development story is addressed this way as well, namely declaring dependencies and integrating them into the build process. Indeed, I found a nice article by Travis Reeder that describes exactly these advantages, along with helpful examples and command-line references.
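To make this concrete, here is a minimal sketch of a development image that declares the toolchain and dependencies a build needs. The base image, package names, and build invocation are illustrative assumptions, not taken from any particular project:

    # Development image pinning the toolchain the build needs.
    # Base image and package choices are illustrative only.
    FROM debian:stable
    RUN apt-get update && apt-get install -y --no-install-recommends \
            g++ ninja-build protobuf-compiler libprotobuf-dev \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /src
    # The source is mounted from the host at run time, e.g.:
    #   docker build -t dev-image .
    #   docker run --rm -v "$PWD":/src dev-image ninja -C build

The point is that the toolchain versions live in the image rather than on individual developer machines.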

Build tools, Modularity and Packages

Modularity is pretty much essential to serious development. I remember from university times that the academic view of modularity was framed primarily in terms of programming languages (in the old days through "separate compilation", later through type systems). Nowadays, nobody considers providing a programming language implementation in isolation: Rust has the notion of crates and modules, and what's more, cargo, the tool that manages crates, is introduced right in the tutorial. npm would be another good example. In Java-land, there is a distinction between putting libraries on the class path and managing external dependencies with build tools like Maven or Gradle.

Just to state the obvious: the old view of modularity as a low-level problem (linking and type systems) ignores the fact that a "modern" build tool has to handle package/module/crate repositories (multiple of them). On the other hand, the emergence of such repositories (plural) has problems of its own, as when the "left-pad" package was pulled from npm and broke builds for many people. "npm disaster" is an entertaining search query, but it does lead to not-easily-resolved questions about the right granularity of modules/packages.

Image or Container?

With Docker as an emerging "modularity mechanism", it is very interesting to observe that in conversation (or documentation) people confuse a Docker image with a Docker container. If you are confused, this post will get you back on the right path.
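A short shell session makes the distinction tangible (the names myapp and web are hypothetical):

    docker build -t myapp:1.0 .          # builds an image from a Dockerfile
    docker run -d --name web myapp:1.0   # starts a container, an instance of that image
    docker images                        # lists images: the static, immutable artifacts
    docker ps -a                         # lists containers: running or stopped instances

One image can back any number of containers, much as one executable can back any number of processes.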

There is a parallel between a "module" as something that can be built and has an interface, and a "module" as something that is linked and part of a running system. Since a container is a whole system in itself, networking is part of its interface. When building a larger system from containers as building blocks, connecting them requires tackling "low-level" configuration of an entirely different kind (see the Kubernetes docs) than what "module linking" meant in the 80s. Still, I'd consider this a parallel.

Where are the best practices?

A quick search reveals that 53k+ projects do indeed have Dockerfiles, including e.g. tensorflow, which explicitly distinguishes deployment and development Docker images. So while I initially thought I had not slept through many essential changes, it seems on the contrary that things are happening at an entirely different level. What conventions should one follow, and what are the pitfalls?

Fortunately for me, there are smart people who are good at writing articles about these things, so I can appreciate the problems and tentative solutions/best practices mentioned in Top 5 Kubernetes Best Practices. Hat-tip for the "Make sure your microservices aren't too micro" advice.

D.C. al Coda

How can there be an end to this? Containerization seems to be a perpetual journey. At this point, I would have liked to offer tips based on a "typical" example of a development project and how one would use Docker to set it up. However, I am not very active in open source development these days, and the one project I want to get back to most isn't a developer tool at all. Fortunately, I believe best practices for "Docker for development" aren't going to be any different from the best practices for using Docker in general.

There is an interesting point here: in order to build a container image, one still has to solve the problem of gathering the dependencies. So gathering dependencies at build time has only been shifted one level (up? down?), though the constraints are very different: an organization can find easy ways to make Docker images available to its developers, and only the team that provides the Docker image has to gather the dependencies. This appears to me much saner than a development workflow that makes connections to external repositories on the developer's box at build time.
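As a sketch of what that shift looks like in practice: a platform team publishes a base image with the dependencies already gathered, and product builds simply start from it. The registry, image name, and build entry point below are hypothetical:

    # Base image prepared once by the team that owns dependency gathering;
    # registry, name and tag are hypothetical.
    FROM registry.example.com/platform/build-base:2.1
    COPY . /src
    WORKDIR /src
    # Whatever the project's build entry point happens to be:
    RUN ./build.sh

Only the build-base image ever needs to talk to external repositories; everything downstream stays inside the organization.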