19.7.20

Modules, components and their limits

I read Clear explanation of Rust's module system. The TL;DR the module system in rust is a way to (somewhat) break/hide the link between what things are defined and what source files the things are defined in. A rust module provides control over visibility across source files and makes explicit how references reach across "modules".

The post starts with a request to ignore what "module" means in other languages, and this plunged me into thoughts on all the things that are called modules. It is somewhat sad but to be expected that a basic word like "module" (or "package") could never have a standard meaning, despite all these programming languages supposedly built while we have all this PL theory.

The more academic vocabulary has "module" as a language concept that comes with a scope and visibility, a way to bundle types and named values. It's somehwat independent of installation, deployment (the only touch point being support for separate compilation), but - to my knowledge - there is no talk of an "installed-at-a-version" module. So the rust module concept is pretty consistent with that. Arranging a set of modules that fit together is the programmer's problem, not the researcher's.

Alas, there is this thing that happens at the level of Rust crates/npm packages, and it is really a different problem. Nevertheless, those things are also called modules in some languages (C++20 modules, Java modules, go modules).

In Java, source files in the same directory make a Java package and they come with a way to control visibility. The association between files and packages is rigid. More recently, Java modules were added to express dependencies between modules, visibility of packages - something like a "packages of packages", but really more on the level of Rust crates.

I am reminded of "components" (if anybody remembers component-oriented programming) because deployment is part of this game - the notion of dependencies only makes sense in the context of some installation. But then components used to be about runtime services (in addition to code), object brokers and such and here we would actually like to talk about less dynamic things, just code and functionality in the form of source and/or libraries. Still, components are things that get assembled and exist at versions, so I'd say it's an appropriate, if old-fashioned name.

Now, such a (static) component would abstractly describe what npm packages or Rust crates achieve, and also form of Debian development packages (sigh) where you install a C library in binary form and its header files. That gives us even more of a twist now, which is that C++ 20 modules are really modules in the sense of achieving grouping for visibility and scope (check) but at the same time that whole notion has a huge impact on C++ since it should actually make it easier to deploy C++ components. A relevant quote from the C++20 link above:

After a module is compiled once, the results are stored in a binary file that describes all the exported types, functions and templates. That file can be processed much faster than a header file, and can be reused by the compiler every place where the module is imported in a project.

So while a C++20 module is language concept, it is foreseeable that one day all self-respecting newly developed libraries will be deployed as (installed at a version) modules. Oh well. There was really no reason to hope that a standard meaning for "modules" and the concept "above" could be justified in any way, across languages. At least, there remains an insight that it really makes sense to deal with the two levels separately.

I sure wish we could all call the installed-at-a-version thing "component", and the language concept "modules". Just two paragraphs on how that might sound:

Every language needs a module system: a way to deal with grouping, interfaces (in terms of making visible - "exporting" only the interface and keeping the rest internal). Including files was always bad and indicates the lack of a module system. It's when one needs to deal with independently developed, deployed and release units that there comes the problem of dependencies, evolution and versions, and these things are the components. Components are related to modules in that they need to contain at least one of them for talking about the things that are provided and brought in scope.

If there is a dependency between versions of components and there are backwards incompatible (!) changes in the depended-upon component (or a left-pad situation), then this is something that needs to be dealt through changes to the depending component. That isn't something that module systems (or any other language design) can help much with.

No comments: