# Content Addressable System

Marfeel uses a Git-based flow to deploy files. Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. This means that, at its core, Git is a simple key-value data store: you can insert any kind of content into it, and it gives you back a key that you can use to retrieve that content at any time.
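The key-value idea can be illustrated with a minimal sketch. The `ContentStore` class and its `put`/`get` methods are hypothetical names used only for illustration, and the key here is the plain SHA-1 of the content, which is a simplification of what Git actually hashes.

```python
import hashlib


class ContentStore:
    """In-memory key-value store whose keys are derived from the values."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key is the SHA-1 of the content itself, so identical
        # content always maps to the same key and is stored only once.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"body { color: black; }\n")
same_key = store.put(b"body { color: black; }\n")
```

Inserting the same bytes twice yields the same key, which is what makes deduplication and integrity checks cheap in a store of this kind.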

Git uses Merkle trees as its fundamental underlying data structure. Essentially, a Merkle tree is a tree in which each node is labeled with the cryptographic hash (SHA-1, in Git's case) of its contents, which include the labels of its children.
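As a rough sketch of that labeling rule, the snippet below hashes leaves from their content and inner nodes from their children's names and labels. The functions `leaf_label` and `node_label` are hypothetical, and the encoding of a node is deliberately simplified; it is not Git's real tree object format.

```python
import hashlib


def leaf_label(content: bytes) -> str:
    # A leaf is labeled with the hash of its own content.
    return hashlib.sha1(content).hexdigest()


def node_label(children: dict) -> str:
    # An inner node is labeled with the hash of its children's names and
    # labels, so any change below it changes its own label as well.
    h = hashlib.sha1()
    for name in sorted(children):
        h.update(f"{name} {children[name]}\n".encode())
    return h.hexdigest()


css = leaf_label(b".container { border: 1px solid black; }\n")
styles = node_label({"main.css": css})

scripts_v1 = node_label({"main.js": leaf_label(b"console.log('v1')\n")})
scripts_v2 = node_label({"main.js": leaf_label(b"console.log('v2')\n")})

root_v1 = node_label({"scripts": scripts_v1, "styles": styles})
root_v2 = node_label({"scripts": scripts_v2, "styles": styles})
```

Changing a single script changes the label of its folder and of the root, while the untouched `styles` subtree keeps its label and can be shared by both versions.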

Using content-addressable references (the SHA-1 of the content) rather than file names as identifiers gives a strong guarantee that we serve the same content regardless of the file name.

That also means that each deploy is immutable. We always serve the contents of the same tree under a domain. When we finish processing new deploys, we only swap the tree to serve. Having immutable trees also prevents us from showing mixed content (serving different files from different branches).

No changes go live on your site’s public URL before all changes have been uploaded. Once all the changes are ready, the new version of the site immediately goes live on the CDN.

This means deploys are atomic, and your site is never in an inconsistent state while you’re uploading a new deploy.

With FTP or S3 uploads, each file is just pushed live one after the other, so you can easily get into situations where a new HTML page is live before the supporting assets (images, scripts, CSS) have been uploaded. And if your connection cuts out in the middle of an upload, your site could get stuck in a broken state for a long time.

Atomic deploys guarantee that your site is always consistent.
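The swap described above can be modeled in a few lines. This is only a sketch under stated assumptions: the `Site` class, its method names, and the `v1`/`v2` tree ids are hypothetical and do not describe Marfeel's actual implementation.

```python
class Site:
    """Hypothetical model of a site that serves exactly one immutable tree."""

    def __init__(self):
        self._trees = {}   # tree id -> {path: content}
        self._live = None  # id of the tree currently served

    def upload(self, tree_id, files):
        # Uploading stores a complete tree but does not make it visible.
        self._trees[tree_id] = dict(files)

    def publish(self, tree_id):
        # Going live is a single reference swap, so readers see either
        # the old tree or the new one, never a mix of both.
        if tree_id not in self._trees:
            raise KeyError(f"unknown tree: {tree_id}")
        self._live = tree_id

    def serve(self, path):
        return self._trees[self._live][path]


site = Site()
site.upload("v1", {"index.html": "old page", "main.css": "old styles"})
site.publish("v1")

site.upload("v2", {"index.html": "new page", "main.css": "new styles"})
before_swap = site.serve("index.html")  # the old tree is still live
site.publish("v2")
after_swap = site.serve("index.html")
```

Because every request is resolved against one tree id, an interrupted upload of `v2` would simply leave `v1` live.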

To fully understand the internal mechanics of a content-addressable repository, let’s go through the process of creating a simple HTML page with an associated CSS and JS file.

foo@bar:~/Developer$ mkdir contentAddressable
foo@bar:~/Developer$ cd contentAddressable
foo@bar:~/Developer/contentAddressable$ git init
Initialized empty Git repository in /Developer/contentAddressable/.git/

Add the following files. Source index.jsp:

<!doctype html>
<html>
<head>
  <title>The HTML5 Herald</title>
  <link rel="stylesheet" href="styles/main.css">
  <script src="scripts/main.js"></script>
</head>

<body>
  <div class="container"></div>
</body>
</html>

Source styles/main.css:

.container {
	border: 1px solid black;
	background: #999;
}

Source scripts/main.js:

document.querySelector(".container").style.border = "3px solid red"

Commit the files to git:

foo@bar:~/Developer/contentAddressable|master$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	index.jsp
	scripts/
	styles/

nothing added to commit but untracked files present (use "git add" to track)

foo@bar:~/Developer/contentAddressable|master$ git add .
foo@bar:~/Developer/contentAddressable|master$ git commit -m "First Commit"
[master (root-commit) c3694c6] First Commit
 3 files changed, 17 insertions(+)
 create mode 100644 index.jsp
 create mode 100644 scripts/main.js
 create mode 100644 styles/main.css

Git internally stores these files under the folder .git/objects using the SHA-1 of the content as their file name:

foo@bar:~/Developer/contentAddressable|master$ tree .git/objects
.git/objects
├── 22
│   └── 2c559fe2643740b3220215d9b97f369870cb13
├── 72
│   └── 1203cff658f05463d577a66cff6627b0071221
├── 76
│   └── 5eba6cdb7cd4f9cb9b63deb71a2762056695e2
├── 92
│   └── eedb5da79a153ff2b3685ceb2e67c2b5e2718d
├── c3
│   └── 694c6459957d559142755096641706c1ee2ee1
├── c7
│   └── 860dc9e8f829432b95a77ef65eee3bc56730e6
├── eb
│   └── bb1a3d69e040f7ecffc8fbedf9756cf9ae0390
├── info
└── pack
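To make this layout concrete, here is a small sketch that writes and reads a loose object the way Git lays it out on disk: the content is prefixed with a `blob <size>` header, hashed, zlib-compressed, and stored under a folder named after the first two hex characters of the hash. The helper names and the temporary directory are assumptions for illustration; the sketch handles blobs only and ignores packfiles.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> str:
    # Git hashes a small header ("blob <size>\0") followed by the content.
    data = b"blob %d\x00" % len(content) + content
    sha = hashlib.sha1(data).hexdigest()
    # The first two hex characters name the folder, the other 38 the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return sha


def read_loose_object(objects_dir: Path, sha: str) -> bytes:
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    # Drop the header; everything after the first NUL byte is the content.
    _header, _, content = raw.partition(b"\x00")
    return content


objects_dir = Path(tempfile.mkdtemp()) / "objects"
sha = write_loose_object(objects_dir, b"hello world\n")
restored = read_loose_object(objects_dir, sha)
```

The folder split exists only to keep directories small; the full 40-character hash is still the object's identifier.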

You can inspect the content of any of these objects using git cat-file -p. Bear in mind that the hash you pass must include the 2 characters of the parent folder name:

foo@bar:~/Developer/contentAddressable|master$ git cat-file -p 765eba6cdb7cd4f9cb9b63deb71a2762056695e2

<!doctype html>
<html>
<head>
  <title>The HTML5 Herald</title>
  <link rel="stylesheet" href="styles/main.css">
  <script src="scripts/main.js"></script>
</head>

<body>
  <div class="container"></div>
</body>
</html>

Similarly, you could run git show 765eba6cdb7cd4f9cb9b63deb71a2762056695e2.

Let’s now make a simple change to the code and rename class="container" to class="c1". This means we have to change both the HTML file (the markup) and the JS file (the querySelector expression).

foo@bar:~/Developer/contentAddressable|master$ git branch feature1
foo@bar:~/Developer/contentAddressable|master$ git checkout feature1
foo@bar:~/Developer/contentAddressable|feature1$ vi index.jsp
foo@bar:~/Developer/contentAddressable|feature1$ vi scripts/main.js
foo@bar:~/Developer/contentAddressable|feature1$ git commit -a -m "Rename container selector to c1"
[feature1 d4ecea6] Rename container selector to c1
 2 files changed, 2 insertions(+), 2 deletions(-)

At this point we have 2 branches:

  • master: contains the old class="container"
  • feature1: contains the new class="c1", with changes in main.js and index.jsp

We can verify this by running git ls-tree -r on both branches:

foo@bar:~/Developer/contentAddressable|feature1$ git ls-tree -r master
100644 blob 765eba6cdb7cd4f9cb9b63deb71a2762056695e2	index.jsp
100644 blob 92eedb5da79a153ff2b3685ceb2e67c2b5e2718d	scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6	styles/main.css

foo@bar:~/Developer/contentAddressable|feature1$ git ls-tree -r feature1
100644 blob 72378aac61444489ca5d48611322f5d4f511ea7d	index.jsp
100644 blob 6165b405f423d195d5da81fa8661fe6bcef2b195	scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6	styles/main.css

As expected, the SHA-1 of index.jsp and main.js differs between the two branches, while the one for main.css remains the same (thus the object is reused).

git ls-tree lets you list the mappings between hashes and file paths. You can also use git ls-files --stage.
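The comparison between the two branches can also be done mechanically. The sketch below parses the two listings shown above (copied verbatim, with a tab before each path) and reports which paths changed and which blobs are shared; `parse_ls_tree` is a hypothetical helper written for this example.

```python
def parse_ls_tree(output: str) -> dict:
    """Map each path to its blob hash from `git ls-tree -r` output."""
    mapping = {}
    for line in output.strip().splitlines():
        meta, path = line.split("\t", 1)
        _mode, _kind, sha = meta.split()
        mapping[path] = sha
    return mapping


master = parse_ls_tree(
    "100644 blob 765eba6cdb7cd4f9cb9b63deb71a2762056695e2\tindex.jsp\n"
    "100644 blob 92eedb5da79a153ff2b3685ceb2e67c2b5e2718d\tscripts/main.js\n"
    "100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6\tstyles/main.css\n"
)
feature1 = parse_ls_tree(
    "100644 blob 72378aac61444489ca5d48611322f5d4f511ea7d\tindex.jsp\n"
    "100644 blob 6165b405f423d195d5da81fa8661fe6bcef2b195\tscripts/main.js\n"
    "100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6\tstyles/main.css\n"
)

# A path changed if its hash differs; otherwise the same object is reused.
changed = sorted(p for p in master if master[p] != feature1.get(p))
reused = sorted(p for p in master if master[p] == feature1.get(p))
```

Here `changed` holds index.jsp and scripts/main.js, and `reused` holds styles/main.css, matching the listings.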

For any given file, you can compute its SHA-1 by running git hash-object. The -w flag additionally writes the object into the Git object store:

foo@bar:~/Developer/contentAddressable|feature1$ git hash-object -w index.jsp
72378aac61444489ca5d48611322f5d4f511ea7d
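The value printed by git hash-object is not the plain SHA-1 of the file bytes: Git first prepends a header with the object type and size. The following sketch reproduces that calculation for blobs; the function name `git_blob_hash` is an assumption made for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


empty = git_blob_hash(b"")
hello = git_blob_hash(b"hello world\n")
plain = hashlib.sha1(b"hello world\n").hexdigest()
```

Running `echo 'hello world' | git hash-object --stdin` should print the same value as `hello`, while `plain` differs because it lacks the header.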