Git and the Intermittent Network

A personal experience with network failures

Within my organisation, we have been moving toward modern web-based platforms. This shift offers many benefits to our users in terms of availability, sustainability, and capacity for growth, and I have been one of its leading proponents.

While modern web-based services are the norm, and desirable, there are risks associated with them that should be considered for mitigation. Primarily, these risks revolve around the centralisation of services and the network availability of clients. These risks are well understood, and most tools used by modern development teams were designed with them in mind; however, as the internet becomes more pervasive and stable, it is easy to lose sight of its limitations.

A Personal Experience

As I write this (2020–11–03), I am experiencing an internet outage.

The outage struck mid-meeting and has left me unable to connect to the online resources required to complete my corporate objectives:

  1. No connection to the production server that needs repairs (I cannot even inspect its logs)
  2. No connection to Microsoft Azure to conduct experimental work in our laboratory environment
  3. All forms of meeting communication have been cut off (MS Teams, Webex, VOIP telephone)
  4. An Outlook plugin cannot connect to its encryption server and is frozen while attempting to show me an encrypted email
  5. I can’t take any of the corporate training I’m supposed to do, or read the manual for that new tool I’m investigating

I’m completely dead in the water.

According to my provider, a fibre-optic line has been cut “somewhere between Halifax and Montreal” resulting in massive connectivity loss for the region. The only productive task left to me is to write up an assessment of the current failure, on my local device, and upload it to the network when communication is re-established.

Hold on… Did I just describe doing work locally and uploading it later? That is a well-known caching strategy, commonly used to resolve network latency issues.

The internet, as we understand it, has not always been as accessible, available, or reliable as we have come to expect.

Scene from “A Boy and His Dog” (1975), which depicts the kind of conditions the Internet was designed to continue operating under. [Fair Use]

It is worth remembering that the internet was initially designed as a distributed communication tool to allow the military to continue operating remote computers in the event of massive node loss (dating back to 1966). The assumption that network availability can be lost has underpinned much of the internet’s growth, and it is built into the fabric of the network.

In the early days of general access to internet services, connections to the network were intermittent: dial-up connections were initiated for short periods of time and then dropped.

As networks became more common and robust, collaboratively developed software (open source) began to be shared across the network rather than via letter carrier and print (through the “Share” catalogue). Unfortunately, not all network connections around the world are created equal, and some users suffered frequent disruptions to connectivity.

What was “Share”?

In the mid-1950s, a user organisation for scientific applications … was formed. One of its most important functions was serving as a clearinghouse for contributed software subroutines. The organisation was called Share … and the contributed routines became the first library of reusable software.

— Robert L. Glass, “Facts and Fallacies of Software Engineering”

Out of this sense of sharing evolved several clearing houses such as SourceForge, Tigris, and eventually GitLab and GitHub. Unfortunately, even these clearing houses were subject to disruption.

In the highly competitive days just after Y2K, several organisations rose and fell in rapid succession, and crises formed when code-hosting platforms were switched off due to corporate takeovers, sabotage, copyright-infringement lawsuits, or plain bankruptcy. Thousands of hours of work were lost simply because centralised servers were turned off. Modern VCS (Version Control System) solutions such as Git evolved in this environment: every node retains the complete history of work, so a project can be fully recovered from a single surviving distributed node.

CAP Theorem (CouchDB: The Definitive Guide)

In more recent years, the CAP theorem has been used to explain that high availability comes at certain costs, and that once those costs are accepted, it can offer benefits like those seen by the BBC news service during the Russo-Georgian conflict. During that period, communication lines were severed, meaning that correspondents and readers could not communicate across the national boundary. Service continued to be delivered on each side of the boundary, allowing reporters to continue reporting and commenters to continue offering feedback throughout the conflict. Automated synchronisation of news reports and on-the-ground reader comments occurred during periods when alternate communication paths were established. (ERROR: I could have sworn this case was described in CouchDB: The Definitive Guide… I can’t seem to find the reference anymore.)

Each of these historic scenarios has common elements:

  • contributors are forced to disconnect from communication and wait
  • contributors wish to continue to prepare their communications
  • “caching” is used to overcome communication latency, allowing people to continue working locally until the connection is reestablished.

This “batching” or “caching” offers a means of mitigating connectivity issues with web-based development platforms.
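
As a rough sketch of this pattern (in Python, with a hypothetical outbox directory and a placeholder server URL, not a real service), the idea is simply: always write work to the local disk first, and flush the local cache to the server only when a connection attempt succeeds.

```python
import pathlib
import urllib.request

CACHE = pathlib.Path("outbox")          # hypothetical local "outbox" directory
SERVER = "https://example.org/publish"  # placeholder endpoint, not a real service

def save_locally(name: str, text: str) -> None:
    """Caching the work on local disk always succeeds, even mid-outage."""
    CACHE.mkdir(exist_ok=True)
    (CACHE / name).write_text(text, encoding="utf-8")

def flush_cache() -> None:
    """Try to upload everything in the outbox; keep whatever fails for later."""
    for path in sorted(CACHE.glob("*")):
        request = urllib.request.Request(SERVER, data=path.read_bytes(), method="POST")
        try:
            urllib.request.urlopen(request, timeout=10)
        except OSError:
            continue        # network still down: leave the file cached
        path.unlink()       # uploaded successfully: remove it from the cache

# Work continues during the outage...
save_locally("outage-assessment.md", "A fibre line was cut between Halifax and Montreal...")
# ...and the cache is flushed once connectivity is re-established.
flush_cache()
```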

Web-Based Publishing

Regardless of the content being published, content development has many common elements as it progresses. Whether it is the dynamic content of software or the static content of news videos, there is a common process for creating and distributing the content online.

To use the example of an individual publishing an article to their newspaper (or perhaps their blog), they (the Contributor) would connect to the internet (NetProviderA) and type their article into Open Journal Systems, WordPress, or Medium (the Server). They can continue to type into the software on the server, perhaps running some spelling and grammar checks, until they hit the “publish” button. At this point the article can be retrieved by the consumer whenever they want.

Using web-based development tools, the process for software development is the same. The contributor would connect to the internet, edit their document on the server, and indicate readiness to “publish”, which would make the application available to its consumers.

Intermittent Connections

Looking at the historic development of the internet, and at the current issue at hand, we can see that there is a risk associated with the network being unavailable. We cannot consider the server in isolation; we must also account for the network between the contributor and the server.

We can see that if the Contributor’s network connection is terminated, they are unable to perform any work. Sticking with the newspaper-article example, the author may have a wonderful idea in mind, or may know of a flaw in the argument, or (frankly) may just want to make some progress toward the publishing deadline; unfortunately, they are stopped.

There is another layer that can be used to overcome this issue: the local computer, which can act as a cache for work. The Contributor can type their document on their local computer and save it to their local disk.

Given this internet history, we can look to VCS tools to help solve the problem. SVN and Git (as well as their predecessors and competitors) were developed in an environment where work needed to be buffered against a future connection: work is performed locally and stored locally until it is possible to transmit it.

This has been an ongoing evolution, and Linus Torvalds specifically developed Git to resolve buffering issues he saw in SVN.
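
As a minimal sketch of that buffering in practice (assuming a repository with a remote named origin and a branch named main; the file name is hypothetical), work can be committed locally during an outage and pushed once the connection returns:

```python
import subprocess

def git(*args: str) -> bool:
    """Run a git command in the current repository; return True on success."""
    return subprocess.run(["git", *args]).returncode == 0

# Committing is entirely local -- it needs no network at all.
git("add", "article.md")  # hypothetical file name
git("commit", "-m", "Draft written during the outage")

# Pushing does need the network; if it fails, the commit simply waits in the
# local repository until a later attempt succeeds.
if not git("push", "origin", "main"):
    print("Still offline; the work is stored locally, push again later.")
```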

Tip

Git is not a simple upgrade of SVN; there are trade-offs between the two products. Git stores a complete copy of the repository database on every node, whereas SVN keeps the full database only on the central server, with each client holding a working copy. Git stores a full snapshot of each state in its database, while SVN stores a sequence of state changes (deltas).

The end result of these differences is that Git is always recoverable (any single node can rebuild the entire system), but SVN requires less storage space and works better when large binary files are involved.

While Git is now dominant, many users of large binary files (Engineering Diagrams, Cartography) continue to prefer SVN.
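
As a concrete illustration of that recoverability, here is a minimal sketch (again Python driving git; the replacement remote URL and remote name are hypothetical) of rebuilding a lost central repository from any surviving clone:

```python
import subprocess

# URL of a freshly created, empty replacement repository (hypothetical).
NEW_REMOTE = "git@example.org:team/project.git"

def git(*args: str) -> None:
    """Run a git command in the surviving clone, stopping on any error."""
    subprocess.run(["git", *args], check=True)

# Point the surviving clone at the replacement server...
git("remote", "add", "rebuilt-origin", NEW_REMOTE)
# ...and push every local branch and tag: the clone's own database already
# holds the project's complete history, so nothing else is required.
git("push", "rebuilt-origin", "--all")
git("push", "rebuilt-origin", "--tags")
```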

Local or Web

Using local development tools and synchronising periodic changes is a common practice that lets us communicate only the changes we are committed to, but it also offers the benefit of resolving latency issues. During 2020, lock-downs have resulted in many of us having to work from home, remote from our workplaces. We are using networks that were established for scenarios demanding far less resilience (binge-watching movies) for situations that demand significant resilience (earning the income that pays for groceries). For those of us whose livelihoods have become tied to these networks for the first time, this can be surprising. When the internet temporarily fails and we are left unable to make progress, it can be distressing to both our managers and ourselves.

This does not mean I believe that local tools are better than web-based tools.

For many years, my favourite platform was Cloud9, an online web-based IDE that allowed workstations to be stood up on demand. This let me maintain several development environments that met various needs. The ability to pick up work from anywhere in the world allowed me to continue working on projects from a hotel courtyard in Ecuador, on an old, indestructible RCA Cambio. Because the vendor supplied the powerful remote computer, I could work from a $100 machine. It also meant I received software upgrades immediately and could work from any cheap hardware I could scrounge up.

There are trade-offs to be considered, and that is what this has been about. Be aware of the trade-offs before wholly committing to one solution or the other. IDE vendors want you to be tied to their tool, and that loses many of the benefits of distributed VCS platforms. On the other hand, we have computing networks; take advantage of them.

In the end I recommend a balanced approach that takes the lessons from the rich history of information sharing that is the internet.

Use web-based IDEs, but use generic ones. Do not depend on always having access to the vendor’s editor. Instead, maintain regular local pulls from your VCS repository, and use programming languages and data formats that are based on simple text. This lets you switch to a local copy during network outages and protects you from vendor lock-in.
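
One way to keep that habit honest is to automate it. A minimal sketch (Python, with a hypothetical list of local clone directories) that fetches each repository on a schedule and simply skips any remote that cannot be reached:

```python
import pathlib
import subprocess

# Hypothetical locations of the local clones to keep fresh.
REPOS = [
    pathlib.Path.home() / "src" / "website",
    pathlib.Path.home() / "src" / "analysis-tools",
]

for repo in REPOS:
    # Fetch all remotes and branches so the full history is mirrored locally;
    # a failure (for example, the network is down) is reported and skipped.
    result = subprocess.run(["git", "-C", str(repo), "fetch", "--all"])
    if result.returncode != 0:
        print(f"Could not reach the remote for {repo}; will try again next run.")
```

Run on a schedule (cron on Linux or macOS, Task Scheduler on Windows), this keeps an up-to-date copy of every project on the local disk, ready for the next outage.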

Further Reading

As ever, Wikipedia is the place to start. I recommend reading the article on Version Control Systems.

There are several generic, web-based IDEs that I have enjoyed using:

Interestingly, each of these can be served on your corporate network (to protect your institution’s intellectual property), or installed on your local computer to allow you to continue working when your network gets nuked.

Atomic Blast, Nevada, 1951 (Wikimedia)

Having discovered a passion for business data analysis in my teens, I love to share the beauty of data and complex systems with other devs and clients alike.