
Longevity of systems and data

How do we build digital systems that last?
[Image: A rock off the coast of Kauai, stylized with the 8Bit Photo Lab mobile app / Jesse Kriss]

"Consider tomorrow" is one of Monorail's values.

One of the ways I interpret this phrase is through the question of longevity: if someone is using my software to create or catalog data they value, how do I make sure their investment of time and care is respected over the long term?

It's not as simple as "keep the service running" or "build a company that lasts a long time." Those may be admirable goals, but on a time scale of decades or more they're not realistic, and certainly not guaranteed. So much of the substance of our lives and culture is digital that, if we're serious about keeping the important stuff, we should be thinking on the timescale of lifetimes.

What's the equivalent of having our great grandparents' photo albums? I can't imagine it means passing our Instagram login credentials down through the generations.


There are risks in adopting any new digital tool or service. Will it shut down, get too expensive, or stray too far from your own values? If so, what do you do with your photos, tweets, and friend connections? Businesses face a version of this, too: what's the cost of switching to an alternative or building an internal solution, and is it worth it? Nobody likes to feel trapped or extorted.

Many services do have a data export feature, but that doesn't mean it's especially useful. Maybe it's fine for backup, but you have to remember to actually export your data, and what do you do with it other than save it somewhere, just in case?

Open source software can be helpful, but it's not for everybody. It's good to have free software that you can run yourself, but it's almost always fiddly and complicated, and often takes a huge time investment. You still don't know if any given open source project is going to be well maintained over the long term, and you usually have to pick the open source solution right at the beginning in order to reap the potential longevity benefit later.

Some people have pushed hard on the data ownership side, including Tim Berners-Lee, the inventor of the world wide web. He's created the Solid project, where applications connect to user-owned data sources, but that requires both users and services to commit to a whole new infrastructure. A worthy experiment, sure, but it's risky for software developers to rely on a new, bespoke tech stack that may or may not take off.

I think a more useful approach is to consider the layers of software services and look at their relative longevity, flexibility, and degree of openness independently, and then see how those properties can inform the design of new systems.

Online services

As users of software systems and tools, we usually interact with apps. The predominant type today is either a web application or a native app that requires a corresponding web service to function in any meaningful way. The site must be up, and you need an internet connection to use it. End users have no control over changes or upgrades–everybody essentially gets the same version.

This approach is very popular, and legitimately useful, but all kinds of things can go wrong: the service might be temporarily unavailable, the company might fold, it might get too expensive, they might decide not to operate in your country, or the product, terms, or community might change in unacceptable ways.

There is often some opportunity for end users to adapt these services to their needs using APIs provided by the service, but the company chooses which interfaces to implement, and can allow or block specific types of usage, like third-party clients.

Apps

With native apps, you have an application installed on your phone, tablet, or computer. If the service goes down or disappears, you can still use the app, and you probably have some control over whether or not you upgrade to a newly released version. One catch here is that many apps require an internet connection for at least some functionality, and they may also require ongoing contact with licensing servers, especially if they're paid for on a subscription basis.

The risks here are similar to online services, but with a bit of a delay: you're likely to be ok for a while using an old, unsupported app, as long as you don't need a subscription. Eventually, it may not run on more recent hardware or software, or it might have security vulnerabilities that aren't patched. If you're using an app that saves out files, though–like a photo app or a text editor–then you can probably at least do something useful with those.

Programming languages and runtimes

This layer is about how the code is written and how it runs. There's no way to skip this part–this is fundamentally what software is–but technology choices affect what's required to run the software, as well as the knowledge and tools needed to maintain and edit it. Compiled languages, like C, get translated into machine instructions that run directly on specific processors, whereas interpreted languages, like JavaScript, require an additional piece of software–the runtime–to execute the program, and typically work across a number of environments and on different hardware. That makes them generally more portable, at the cost of needing that extra program, which could be a command line tool for running scripts or something like a web browser.

Interpreted languages usually also mean that the code itself is more open to inspection, whether or not it's released under an open source license. In a web browser, you can use the built-in developer tools to see what's going on under the hood. This improves the odds that an abandoned application can at least be reverse engineered.

Language choices impact long term maintainability, too. COBOL makes the news when old legacy systems need to be updated and vanishingly few people know how to read or write it. Some people even invent languages and runtimes with the specific goal of staying maintainable in the far future.

Protocols

This is how systems talk to each other. In the early days of the internet, there was a lot of work on protocols, and many are still in widespread use today. TCP/IP is the set of lower level networking protocols that essentially everything on the internet uses, HTTP is built on top of that, and old email protocols like SMTP, POP, and IMAP were developed in the 1980s and still power email systems globally.

Protocols are fundamentally about interoperability. The whole reason they exist is so that disparate systems, created by different people, potentially using different hardware and different programming languages, can talk to each other and successfully form a larger, heterogeneous, networked system. Using a standard protocol is great for longevity and flexibility–it will remain viable as long as enough people use it, and it can survive across many implementations over time.

In recent decades, though, commercial incentives have pushed most companies to closed, proprietary systems. They don't want interoperability–they want to control the whole ecosystem. Luckily, some of those earlier protocols have become ubiquitous enough that we can reasonably rely on them, and there's nothing stopping us from creating new (and even weird) protocols that better suit our purposes, constraints, and values.

File formats

Remember files? We definitely don't use them as much as we used to–so much is stored in cloud services. We don't even know what the file format is for something like a Google Doc.

File formats are all about how the data is actually stored on disk. The simplest examples are things like text files and image files, where a single artifact is encoded in a single file, but the category also includes more complex container formats and standardized folder structures.

Two important questions are whether a file format is publicly documented, and if it's free to use. MP3s–remember those?–are a publicly documented format, but the method of encoding and decoding was patented until fairly recently. "Documented and free" is certainly the best combination for longevity and flexibility, though in practice even closed, undocumented formats are likely to be reverse engineered, and patents expire over time.


So what does all this mean for the process of system design?

One path through the design process is to repeat the question "What would happen if the outermost layer disappears?" If the hosted service goes down, now what? If the app no longer functions, now what?

You can even think of this as a change-oriented (or even decay-oriented) version of the famous Eliel Saarinen quote:

Always design a thing by considering it in its next larger context — a chair in a room, a room in a house, a house in an environment, an environment in a city plan.

Perhaps we could say:

Always design a thing by considering changes in its next larger context. Can the chair outlive the room? Can the house withstand dramatic change in the environment?

Aside from "will it still work?" it's also important to understand the expected durability and longevity of the components themselves: what choices of language, runtimes, protocols, and file formats are more likely to be maintained, understood, and ported to future systems? This can be evaluated by looking at the history of maintenance (is it old and still widely used?) as well as simplicity (how hard is it to understand, repair, and reimplement?).


How would we build something following this framework? Let's work through a concrete example: a system for saving and sharing photos.

Let's start from the outermost layer and work our way down.

Layer 1 could be a web application, hosted on a domain that's owned and operated by some person, company, or institution. You go to something like photos.monorail.tech to get started, and you see a user interface for uploading photos, editing captions, and viewing photos. This interface is built with HTML, CSS, and JavaScript.

The web is still a great platform: you don't need to download anything, people generally know how to use it, it works on a wide variety of devices, it's a rich platform for developing interactive user interfaces, and the languages and runtimes are well established and well maintained, with a long history of backward compatibility, and built on broadly adopted protocols.

What happens if this layer goes away, and photos.monorail.tech just stops working one day?

[Screenshot: "This photos.monorail.tech page can't be found"]
In this case, it doesn't exist because I haven't built it yet. But you get the idea.

Well, you could hope that there was an export function and that you used it before the site disappeared, but we can do better than that.

Let's say that Layer 2 is a desktop app that shares much of the same HTML, CSS, and JavaScript user interface code, operates as a single-user version of the photo service, and syncs that user's data from the hosted service down to the computer whenever an internet connection is available.

Now we have something that preserves the photos, data, and whatever browsing and searching capability exists, even if the site is temporarily or permanently unavailable, and it doesn't require users to maintain their own backup practice.
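To make the idea concrete, here's a minimal sketch of that sync step, in Python for illustration–the /api/photos endpoint, the response shape, and the local folder layout are all hypothetical, not a description of a real service:

```python
import json
import pathlib
import urllib.request

SERVICE = "https://photos.monorail.tech"        # the hosted service (hypothetical)
LOCAL = pathlib.Path.home() / "MonorailPhotos"  # local copy on this computer (hypothetical)

def sync():
    """Pull down any photos we don't already have, plus their metadata."""
    LOCAL.mkdir(parents=True, exist_ok=True)
    # Assume the service publishes a JSON index of this user's photos:
    # [{"id": ..., "filename": ..., "caption": ...}, ...]
    with urllib.request.urlopen(f"{SERVICE}/api/photos") as resp:
        index = json.load(resp)
    for photo in index:
        dest = LOCAL / photo["filename"]
        if not dest.exists():  # only fetch what's new since the last sync
            urllib.request.urlretrieve(
                f"{SERVICE}/api/photos/{photo['id']}/original", dest
            )
    # Keep the metadata on disk, right next to the files themselves
    (LOCAL / "index.json").write_text(json.dumps(index, indent=2))

if __name__ == "__main__":
    sync()
```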

Ok, what if that layer collapses? Imagine the desktop app uses frameworks that are no longer available in newer versions of the OS, or your new computer has a different kind of processor that wasn't even invented when the original app was made.

We want to make sure the data is accessible, understandable, and useful, even if these things happen.

Layer 3 in this case is the file format, which is independent of language, runtime, or implementation. The desktop app's data store could be a SQLite database for photo metadata, including captions, comments, and any other relevant information, plus the image files themselves, stored as regular files on disk. The database entries reference the file paths, so the metadata can be linked to the files. Any additional related files, like thumbnails, are stored similarly.
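As a sketch, using Python's standard sqlite3 module–the table and column names here are illustrative, not a real schema:

```python
import sqlite3

# The entire database is a single ordinary file, stored next to the photos.
db = sqlite3.connect("photos.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS photos (
    id       INTEGER PRIMARY KEY,
    path     TEXT NOT NULL,  -- relative path to the image file on disk
    taken_at TEXT,           -- ISO 8601 timestamp
    caption  TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id       INTEGER PRIMARY KEY,
    photo_id INTEGER NOT NULL REFERENCES photos(id),
    author   TEXT,
    body     TEXT
);
""")
db.execute(
    "INSERT INTO photos (path, taken_at, caption) VALUES (?, ?, ?)",
    ("originals/2024/kauai-rock.jpg", "2024-03-14T09:30:00Z",
     "A rock off the coast of Kauai"),
)
db.commit()
```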

SQLite has been around since 2000, is extremely widely adopted, and has tooling for pretty much every conceivable environment. Most databases run on a server, with clients connecting, but SQLite databases are files that are accessed directly by application code, using open source libraries.

Even without documentation, this file structure would be easy to understand, and could even be used natively by a new application. That is, someone could write a new application that operates directly on the SQLite database and the filesystem, rather than needing to write an importer or converter, letting people continue to add photos and maintain their collection, or even sync the data to a new hosted server.
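That future application would need nothing more than a SQLite library for whatever language it happens to be written in. Reading the collection back–again in Python, against the illustrative schema above–is just ordinary SQL:

```python
import sqlite3

db = sqlite3.connect("photos.db")
# Any future tool can browse the whole collection with a single query.
for path, caption in db.execute("SELECT path, caption FROM photos ORDER BY taken_at"):
    print(path, "-", caption or "(no caption)")
```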

Technical details aside, this means the data–and the effort that people put into it–can significantly outlast the original service, the original application, the original code, and even the programming languages and protocols used in the original implementation.

Could this layer last for 100 years or more? I think so. Even in the worst case, it's a format that's simple enough that it could be migrated every few decades with relatively little effort.


Consider tomorrow

We have the tools to build this way.

It's not even all that complicated, technically. This framing also provides additional positive pressure towards simplicity, maintainability, and building on proven, supported technology–good general design principles. And it doesn't even require open sourcing your code.

It does require a choice and an investment.

It requires a choice to respect the time and energy of the people who use the tools we build, beyond the time horizon of a particular company's profits. And it requires an investment in design work to make something that can last, even if not every part of it lasts forever.

I think it's worth it. Let's give it a try.


This is your friendly reminder that paid subscribers can comment on posts, and will get access to software I build. Sign up here!

Have you run into the negative consequences of tools or platforms that weren't designed with longevity in mind? I'd love to hear your stories.