Understanding IPFS in Depth(4/6): What is MultiFormats?
Every Choice in Computing has a Tradeoff. It’s Time to Make Future-Proof Systems.

re></div><h2 id="aac7">Multiaddr Format</h2><p id="a48b">A multiaddr value is a <i>recursive</i> <code>(TLV)+</code> (type-length-value repeating) encoding. It has two forms:</p><ul><li>a <i>human-readable version</i> to be used when printing to the user (UTF-8)</li><li>a <i>binary-packed version</i> to be used in storage, transmissions on the wire, and as a primitive in other formats.</li></ul><h2 id="84c4">The human-readable version</h2><ul><li>path notation nests protocols and addresses, for example: <code>/ip4/127.0.0.1/udp/4023/quic</code> (this is the repeating part).</li><li>a protocol MAY be only a code, or also have an address value (nested under a <code>/</code>) (eg. <code>/quic</code> and <code>/ip4/127.0.0.1</code>)</li><li>the <i>type</i> <code><addr-protocol-str-code></code> is a string code identifying the network protocol. The table of protocols is configurable. The default table is the <a href="http://multiformats.io/multiaddr/multicodec">multicodec</a> table.</li><li>the <i>value</i> <code><addr-value></code> is the network address value, in natural string form.</li></ul><figure id="f0ac"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*SPnxhvcjUtx9NXcwtMkMYQ.png"><figcaption>Multiaddr human-readable Regex</figcaption></figure><h2 id="487d">The binary-packed version</h2><ul><li>the <i>type</i> <code><addr-protocol-code></code> is a variable integer identifying the network protocol. The table of protocols is configurable. 
The default table is the <a href="http://multiformats.io/multiaddr/multicodec">multicodec</a> table.</li><li>the <i>length</i> is an <a href="https://github.com/multiformats/unsigned-varint">unsigned variable integer</a> counting the length of the address value, in bytes.</li><li><b>The <i>length</i> is omitted by protocols that have an exact address value size, or no address value.</b></li><li>the <i>value</i> <code><addr-value></code> is the network address value, of length <code>L</code>.</li><li><b>The <i>value</i> is omitted by protocols that have no address value.</b></li></ul><figure id="b340"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MTXnkER00xqy_YgEagU-wQ.png"><figcaption>MultiAddr Binary Regex</figcaption></figure><h2 id="ab22">Implementations</h2><p id="f998">You can find a number of <a href="http://multiformats.io/multiaddr/#implementations">multiaddr implementations</a> in multiple languages.</p><h2 id="45eb">Tutorial</h2><p id="c017">You can find a hands-on tutorial on multiaddr at the <a href="https://hackernoon.com/understanding-ipfs-in-depth-4-6-what-is-multiformats-cf25eef83966#4a20">end of this post</a>.</p><h1 id="3d59">Multibase</h1><p id="cfa1">Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base64, base58) binary data appearing in text.</p><p id="93c6">When text is encoded as bytes, we can usually use a one-size-fits-all encoding (UTF-8) because we’re always encoding to the same set of 256 bytes (+/- the NUL byte). When that doesn’t work, usually for historical or performance reasons, we can usually infer the encoding from the context.</p><p id="8bce">However, when bytes are encoded as text (using a base encoding), the choice of base encoding is often restricted by the context. Worse, these restrictions can change based on where the data appears in the text. In some cases, we can only use <code>[a-z0-9]</code>.
In others, we can use a larger set of characters but need a compact encoding. This has led to a large set of “base encodings”, one for every use case. Unlike when encoding text to bytes, we can't just standardize around a single base encoding because there is no optimal encoding for all cases.</p><p id="ff7c">Unfortunately, it’s not always clear <i>what</i> base encoding is used; that’s where multibase comes in. It answers the question:</p><p id="6841"><i>Given data d encoded into text s, what base is it encoded with?</i></p><h2 id="8a87">Multibase Format</h2><p id="7990">The format is:</p><div id="545c"><pre><span class="hljs-section"><base-encoding-character></span><span class="hljs-section"><base-encoded-data></span></pre></div><p id="044c">Where <code><base-encoding-character></code> is used according to the <a href="https://github.com/multiformats/multibase/blob/master/multibase.csv">multibase table</a>.</p><p id="3cc5">Here is an example to show how it works.</p><p id="52fc">Consider the following encodings of the same binary string:</p><div id="36cc"><pre>4D756C74696261736520697320617765736F6D6521205C6F2F <span class="hljs-comment"># base16 (hex)</span>
JV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP <span class="hljs-comment"># base32</span>
YAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt <span class="hljs-comment"># base58</span>
TXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== <span class="hljs-comment"># base64</span></pre></div><p id="3bbe">And consider the same encodings with their multibase prefix:</p><div id="d523"><pre>F4D756C74696261736520697320617765736F6D6521205C6F2F <span class="hljs-comment"># base16 F</span>
BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP <span class="hljs-comment"># base32 B</span>
zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt <span class="hljs-comment"># base58 z</span>
MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== <span class="hljs-comment"># base64 M</span></pre></div><p id="ef65">The base prefixes used are: <code>F, B, z, M</code>.</p><p id="3af4">Now, you can write
self-descriptive encoded text :)</p><h2 id="5313">Implementations</h2><p id="44a8">You can find a number of <a href="https://github.com/multiformats/multibase#implementations">multibase implementations</a> in multiple languages.</p><p id="5eb9">Now, the next two, <i>Multicodec</i> and <i>Multistream</i>, are closely related, so I will try to explain the motivation behind both, but you may need to read the two sections together to understand either one.</p><h2 id="3eac">Tutorial</h2><p id="c222">You can find a hands-on tutorial on multibase at the <a href="https://hackernoon.com/understanding-ipfs-in-depth-4-6-what-is-multiformats-cf25eef83966#56cc">end of this post</a>.</p><h1 id="6d63">Multicodec</h1><h2 id="8463">Motivation</h2><p id="3062"><a href="https://github.com/multiformats/multistream">Multistreams</a> are self-describing protocol/encoding streams. Multicodec uses an agreed-upon “protocol table”. It is designed for use in short strings, such as keys or identifiers (e.g., <a href="https://github.com/ipld/cid">CID</a>).</p><h2 id="fe29">How does the protocol work?</h2><p id="e999"><code>multicodec</code> is a <i>self-describing multiformat</i>; it wraps other formats with a tiny bit of self-description.
A multicodec identifier is a varint.</p><p id="0ef2">A chunk of data identified by multicodec will look like this:</p><div id="4cff"><pre><span class="hljs-section"><multicodec></span><span class="hljs-section"><encoded-data></span>
<span class="hljs-comment"># To reduce the cognitive load, we sometimes might write the same line as:</span>
<span class="hljs-section"><mc></span><span class="hljs-section"><data></span></pre></div><p id="b0b3">Another useful scenario is when using the multicodec as part of the keys to access data, for example:</p><div id="d211"><pre><span class="hljs-comment"># suppose we have a value and a key to retrieve it</span>
<span class="hljs-string">"<key>"</span> -> <span class="hljs-variable"><value></span></pre></div><div id="e87a"><pre><span class="hljs-comment"># we can use multicodec with the key to know what codec the value is in</span>
<span class="hljs-string">"<mc><key>"</span> -> <span class="hljs-variable"><value></span></pre></div><p id="beb9">It is worth noting that multicodec works very well in conjunction with <a href="https://github.com/multiformats/multihash">multihash</a> and <a href="https://github.com/multiformats/multiaddr">multiaddr</a>, as you can prefix those values with a multicodec to tell what they are.</p><h2 id="4d4d">Multicodec Protocol Tables</h2><p id="0443">Multicodec uses “<a href="https://github.com/multiformats/multicodec/blob/master/table.csv">protocol tables</a>” to agree upon the mapping from a multicodec code to the protocol or format it identifies.
These tables can be application specific, though, as <a href="https://github.com/multiformats/multihash">with</a> other <a href="https://github.com/multiformats/multiaddr">multiformats</a>, we will keep a globally agreed-upon table with common protocols and formats.</p><h2 id="1f9b">Multicodec Path, also known as multistream</h2><p id="a4e7">Multicodec defines a table for the most common data serialization formats that can be expanded over time or on a per-application basis. However, in order for two programs to talk with each other, they need to know beforehand which table or table extension is being used.</p><p id="cf3c">In order to enable self-descriptive data formats or streams that can be dynamically described, without the formal step of adding a binary-packed code to a table, we have <a href="https://github.com/multiformats/multistream"><code>multistream</code></a>, so that applications can adopt multiple data formats for their streams and with that create different protocols.</p><p id="7359">Now let’s answer a few questions to understand its significance.</p><h2 id="345d">Why Multicodec?</h2><p id="2a0f">Because <a href="https://github.com/multiformats/multistream">multistream</a> is too long for identifiers. We needed something shorter.</p><h2 id="1b60">Why varints?</h2><p id="1ad7">So that there is no limit on the number of protocols.</p><h2 id="6240">Don’t we have to agree on a table of protocols?</h2><p id="3479">Yes, but we already have to agree on what protocols themselves are, so this is not so hard. The table even leaves some room for custom protocol paths, or you can use your own tables. The standard table is only for common things.</p><h2 id="4368">Where did multibase go?</h2><p id="7868">For a period of time, the multibase prefixes lived in this table. However, multibase prefixes are <i>symbols</i> that may map to <i>multiple</i> underlying byte representations (that may overlap with byte sequences used for other multicodecs).
Including them in a table for binary/byte identifiers led to more confusion than it solved.</p><p id="f11b">You can still find the table in <a href="https://github.com/multiformats/multibase/blob/master/multibase.csv">multibase.csv</a>.</p><h2 id="da1b">Implementations</h2><p id="394a">You can find a number of <a href="https://github.com/multiformats/multicodec#implementations">multicodec implementations</a> in multiple languages.</p><h2 id="7b30">Tutorial</h2><p id="6140">You can find a hands-on tutorial on multicodec at the <a href="https://hackernoon.com/understanding-ipfs-in-depth-4-6-what-is-multiformats-cf25eef83966#d578">end of this post</a>.</p><h1 id="4a4d">Multistream</h1><h2 id="bdc4">Motivation</h2><p id="4205">Multistreams are self-describing protocol/encoding streams. (Note that a file is a stream.) Multistream is designed to address the perennial problem:</p><p id="c0aa"><i>I have a bitstring, what codec is the data coded with?</i></p><p id="8252">Instead of arguing about which data serialization library is the best, let’s just pick the simplest one now, and build <i>upgradability</i> into the system. Choices are never <i>forever</i>. Eventually, all systems are changed. So, embrace this fact of reality, and build change into your system now.</p><p id="54fc">Multistream frees you from the tyranny of past mistakes. Instead of trying to figure it all out beforehand, or continuing to use something that we can all agree no longer fits, why not allow the system to <i>evolve</i> and <i>grow</i> with the use cases of today, not yesterday?</p><p id="9e16">To decode an incoming stream of data, a program must either</p><ol><li>know the format of the data a priori, or</li><li>learn the format from the data itself.</li></ol><p id="a460">(1) precludes running protocols that may provide one of many kinds of formats without prior agreement on which.
multistream makes (2) neat using self-description.</p><p id="993c">Moreover, this self-description allows straightforward layering of protocols without having to implement support in the parent (or encapsulating) one.</p><h2 id="ad59">How does the protocol work?</h2><p id="8caa"><code>multistream</code> is a <i>self-describing multiformat</i>; it wraps other formats with a tiny bit of self-description:</p><div id="3a73"><pre><span class="hljs-tag"><<span class="hljs-name">varint-len</span>></span>/<span class="hljs-tag"><<span class="hljs-name">codec</span>></span>\n<span class="hljs-tag"><<span class="hljs-name">encoded-data</span>></span></pre></div><p id="e2c6">For example, let’s encode a JSON doc:</p><div id="ece8"><pre><span class="hljs-comment">// encode some json</span>
<span class="hljs-keyword">const</span> buf = <span class="hljs-title class_">Buffer</span>.<span class="hljs-title function_">from</span>(<span class="hljs-title class_">JSON</span>.<span class="hljs-title function_">stringify</span>({ <span class="hljs-attr">hello</span>: <span class="hljs-string">'world'</span> }))</pre></div><div id="4d52"><pre>const prefixedBuf = multistream.addPrefix('json', buf) // prepends multicodec ('json')
console.log(prefixedBuf)
// <Buffer<span class="hljs-number"> 06 </span>2f 6a<span class="hljs-number"> 73 </span>6f 6e 2f 7b<span class="hljs-number"> 22 </span>68<span class="hljs-number"> 65 </span>6c 6c 6f<span class="hljs-number"> 22 </span>3a<span class="hljs-number"> 22 </span>77 6f<span class="hljs-number"> 72 </span>6c<span class="hljs-number"> 64 </span>22 7d></pre></div><div id="19ca"><pre><span class="hljs-built_in">console</span>.<span class="hljs-built_in">log</span>(prefixedBuf.toString(<span class="hljs-string">'hex'</span>))
<span class="hljs-comment">// 062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d</span></pre></div><div id="7391"><pre><span class="hljs-comment">// let's get the Codec and then get the data back</span></pre></div><div id="4dfa"><pre>const codec = multicodec.getCodec(prefixedBuf)
<span class="hljs-built_in">console</span>.<span class="hljs-built_in">log</span>(codec)
<span class="hljs-comment">// json</span></pre></div><div id="0189"><pre>console<span class="hljs-selector-class">.log</span>(multistream<span class="hljs-selector-class">.rmPrefix</span>(prefixedBuf)<span class="hljs-selector-class">.toString</span>())
<span class="hljs-comment">// {"hello":"world"}</span></pre></div><p id="d4eb">So, <code>prefixedBuf</code> is:</p><div id="88c0"><pre><span class="hljs-symbol">hex:</span> <span class="hljs-number">062</span>f<span class="hljs-number">6</span>a<span class="hljs-number">736</span>f<span class="hljs-number">6e2</span>f<span class="hljs-number">7</span>b<span class="hljs-number">2268656</span><span class="hljs-keyword">c</span><span class="hljs-number">6</span><span class="hljs-keyword">c</span><span class="hljs-number">6</span>f<span class="hljs-number">223</span>a<span class="hljs-number">22776</span>f<span class="hljs-number">726</span><span class="hljs-keyword">c</span><span class="hljs-number">64227</span>d
<span class="hljs-symbol">ascii:</span> /json\n<span class="hljs-string">"{"</span>hello<span class="hljs-string">":"</span>world<span class="hljs-string">"}"</span></pre></div><p id="e664">Note that the ASCII version does not show the varint at the beginning, so keep that in mind when comparing the two.</p><h2 id="0eee">The Protocol Path</h2><p id="2869"><code>multistream</code> allows us to specify different protocols in a universal namespace, that way being able to recognize, multiplex, and embed them easily.
We use the notion of a <code>path</code> instead of an <code>id</code> because it is meant to be a Unix-friendly URI.</p><p id="8043">A good path name should be decipherable — meaning that if some machine or developer — who has no idea about your protocol — encounters the path string, they should be able to look it up and resolve how to use it.</p><p id="a243">An example of a good path name is:</p><div id="7e47"><pre>/bittorrent<span class="hljs-meta">.org</span>/<span class="hljs-number">1.0</span></pre></div><p id="dfab">An example of a <i>great</i> path name is:</p><div id="e593"><pre>/ipfs/Qmaa4Rw81a3a1VEx4LxB7HADUAXvZFhCoRdBzsMZyZmqHD/ipfs.protocol /http/w3id.org/ipfs/1.1.0</pre></div><p id="2903">These path names happen to be resolvable — not just in a “multistream muxer(e.g <a href="https://github.com/multiformats/multistream-select">multistream-select</a>)” but on the internet as a whole (provided the program (or OS) knows how to use the <code>/ipfs</code> and <code>/http</code> protocols).</p><p id="59a1">Now, let’s answer a few questions to understand its significance.</p><h2 id="5310">Why Multistream?</h2><p id="002c">Today, people speak many languages and use common ones as an interface. But every “common language” has evolved over time, or even fundamentally switched. Why should we expect programs to be any different?</p><p id="c552">And the reality is they’re not. Programs use a variety of encodings. Today we like JSON. Yesterday, XML was all the rage. XDR solved everything, but it’s kinda retro. Protobuf is still too cool for school. capnp (“cap and proto”) is for cerealization hipsters.</p><p id="7801">The one problem is figuring out what we’re speaking. Humans are pretty smart, we pick up all sorts of languages over time. And we can always resort to pointing and grunting (the ASCII of humanity).</p><p id="a395">Programs have a harder time. You can’t keep piping JSON into a protobuf decoder and hope they align. So we have to help them out a bit. 
That’s what multistream is for.</p><h2 id="5b95">Full paths are too big for my use case, is there something smaller?</h2><p id="1484">Yes, check out <a href="https://github.com/multiformats/multicodec/blob/master/README.md">multicodec</a>. It uses a varint and a table to achieve the same thing.</p><h2 id="7705">Implementations</h2><p id="8a68">You can find a number of <a href="https://github.com/multiformats/multistream#implementations">multistream implementations</a> in multiple languages.</p><h2 id="e509">Tutorial</h2><p id="75c9">You can find a hands-on tutorial on multistream at the <a href="https://hackernoon.com/understanding-ipfs-in-depth-4-6-what-is-multiformats-cf25eef83966#520b">end of this post</a>.</p><h1 id="f9f8">Multistream-Select</h1><h2 id="dce4">Motivation</h2><p id="ccba">Some protocols have sub-protocols or protocol suites. Often, these sub-protocols are optional extensions. Selecting which protocol to use, or even knowing what is available to choose from, is not simple.</p><p id="4419">What if there was a protocol that allowed mounting or nesting other protocols, and made it easy to select which protocol to use? (This is sort of like ports, but managed at the protocol level rather than by the OS, and human-readable.)</p><h2 id="eb83">How does the Protocol work?</h2><p id="3a33">The actual protocol is very simple. It is a multistream protocol itself, so it has a multicodec header. It also has a set of other protocols available to be used by the remote side. The remote side must enter:</p><div id="eb39"><pre>> <multistream-<span class="hljs-selector-tag">header</span>>
> <multistream-<span class="hljs-selector-tag">header</span>-for-whatever-protocol-that-we-want-<span class="hljs-selector-tag">to</span>-<span class="hljs-attribute">speak</span>></pre></div><p id="f178">For example:</p><div id="b954"><pre><span class="hljs-meta prompt_">> </span><span class="language-bash">/ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0</span>
<span class="hljs-meta prompt_">> </span><span class="language-bash">/ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3</span></pre></div><ul><li>The <code><multistream-header-of-multistream></code> ensures a protocol selection is happening.</li><li>The <code><multistream-header-for-whatever-protocol-is-then-selected></code> hopefully describes a valid protocol listed.
Otherwise, we return a <code>na</code>("not available") error:</li></ul><div id="e48f"><pre><span class="hljs-built_in">na</span>\n</pre></div><div id="52f3"><pre><span class="hljs-comment"># in hex (note the varint prefix = 3)</span> <span class="hljs-comment"># 0x036e610a</span></pre></div><p id="c803">for example:</p><div id="5e02"><pre><span class="hljs-meta prompt_"># </span><span class="language-bash">open connection + send multicodec headers, inc <span class="hljs-keyword">for</span> a protocol not available</span> <span class="hljs-meta prompt_">> </span><span class="language-bash">/ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0</span> <span class="hljs-meta prompt_">> </span><span class="language-bash">/ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/some-protocol-that-is-not-available</span></pre></div><div id="449e"><pre><span class="hljs-comment"># open connection + signal protocol not available.</span> < <span class="hljs-regexp">/ipfs/</span>QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2<span class="hljs-regexp">/multistream-select/</span><span class="hljs-number">0.3</span>.<span class="hljs-number">0</span> < na</pre></div><div id="f4a5"><pre><span class="hljs-comment"># send a selection of a valid protocol + upgrade the conn and send traffic</span> <span class="hljs-punctuation">></span> <span class="hljs-string">/ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3</span> <span class="hljs-punctuation">></span> <span class="hljs-string"><dht-traffic></span> <span class="hljs-punctuation">></span> <span class="hljs-string">...</span></pre></div><div id="9b73"><pre><span class="hljs-comment"># receive a selection of the protocol + sent traffic</span> < <span class="hljs-regexp">/ipfs/</span>QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb<span class="hljs-regexp">/ipfs-dht/</span><span class="hljs-number">0.2</span>.<span class="hljs-number">3</span> < <dht-traffic> < ...</pre></div><p id="f7ce"><b>Note 
1:</b> Every multistream message is a “length-prefixed-message”, which means that every message is prepended by a varint that describes the size of the message.</p><p id="5f9c"><b>Note 2:</b> Every multistream message is appended by a <code>\n</code> character, this character is included in the byte count that is accounted for by the prepended varint.</p><h2 id="5775">Listing</h2><p id="a581">It is also possible to “list” the available protocols. A list message is simply:</p><div id="932f"><pre><span class="hljs-built_in">ls</span>\n</pre></div><div id="2eb3"><pre><span class="hljs-comment"># in hex (note the varint prefix = 3)</span> <span class="hljs-attribute">0x036c730a</span></pre></div><p id="712d">So a remote side asking for a protocol listing would look like this:</p><div id="8619"><pre><span class="hljs-comment"># request</span> <span class="hljs-section"><multistream-header-for-multistream-select></span> <span class="hljs-attribute">ls</span>\n</pre></div><div id="917f"><pre><span class="hljs-comment"># response</span> <span class="hljs-variable"><varint-total-response-size-in-bytes></span><span class="hljs-variable"><varint-number-of-protocols></span> <span class="hljs-variable"><multicodec-of-available-protocol></span> <span class="hljs-variable"><multicodec-of-available-protocol></span> <span class="hljs-variable"><multicodec-of-available-protocol></span> ...</pre></div><p id="24ce">For example</p><div id="a7d7"><pre><span class="hljs-meta prompt_"># </span><span class="language-bash">send request</span> <span class="hljs-meta prompt_">> </span><span class="language-bash">/ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0</span> <span class="hljs-meta prompt_">> </span><span class="language-bash"><span class="hljs-built_in">ls</span></span></pre></div><div id="0853"><pre><span class="hljs-comment"># get response</span> < <span class="hljs-regexp">/ipfs/</span>QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2<span 
class="hljs-regexp">/multistream-select/</span><span class="hljs-number">0.3</span>.<span class="hljs-number">0</span> < <span class="hljs-regexp">/ipfs/</span>QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb<span class="hljs-regexp">/ipfs-dht/</span><span class="hljs-number">0.2</span>.<span class="hljs-number">3</span> < <span class="hljs-regexp">/ipfs/</span>QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb<span class="hljs-regexp">/ipfs-dht/</span><span class="hljs-number">1.0</span>.<span class="hljs-number">0</span> < <span class="hljs-regexp">/ipfs/</span>QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb<span class="hljs-regexp">/ipfs-bitswap/</span><span class="hljs-number">0.4</span>.<span class="hljs-number">3</span> < <span class="hljs-regexp">/ipfs/</span>QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb<span class="hljs-regexp">/ipfs-bitswap/</span><span class="hljs-number">1.0</span>.<span class="hljs-number">0</span></pre></div><div id="747b"><pre># send selection, upgrade connection, and start protocol traffic <span class="hljs-meta prompt_">></span> <span class="language-javascript">/ipfs/<span class="hljs-title class_">QmVXZiejj3</span>sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/<span class="hljs-number">0.2</span><span class="hljs-number">.3</span></span> <span class="hljs-meta prompt_">></span> <span class="language-javascript"><ipfs-dht-request-<span class="hljs-number">0</span>></span> <span class="hljs-meta prompt_">></span> <span class="language-javascript"><ipfs-dht-request-<span class="hljs-number">1</span>></span> <span class="hljs-meta prompt_">></span> <span class="language-javascript">...</span></pre></div><div id="a548"><pre><span class="hljs-comment"># receive selection, and upgraded protocol traffic.</span> <span class="hljs-section">< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3 < <ipfs-dht-response-0></span> <span class="hljs-section">< <ipfs-dht-response-1></span> <span class="hljs-section">< 
...</span></pre></div><h2 id="ce47">Example</h2><div id="ae5f"><pre><span class="hljs-comment"># greeting</span> > <span class="hljs-regexp">/http/mu</span>ltiproto.io<span class="hljs-regexp">/multistream-select/</span><span class="hljs-number">1.0</span> < <span class="hljs-regexp">/http/mu</span>ltiproto.io<span class="hljs-regexp">/multistream-select/</span><span class="hljs-number">1.0</span>
<span class="hljs-comment"># list available protocols</span> > <span class="hljs-regexp">/http/mu</span>ltiproto.io<span class="hljs-regexp">/multistream-select/</span><span class="hljs-number">1.0</span> > ls < <span class="hljs-regexp">/http/g</span>oogle.com<span class="hljs-regexp">/spdy/</span><span class="hljs-number">3</span> < <span class="hljs-regexp">/http/</span>w3c.org<span class="hljs-regexp">/http/</span><span class="hljs-number">1.1</span> < <span class="hljs-regexp">/http/</span>w3c.org<span class="hljs-regexp">/http/</span><span class="hljs-number">2</span> < <span class="hljs-regexp">/http/</span>bittorrent.org/<span class="hljs-number">1.2</span> < <span class="hljs-regexp">/http/gi</span>t-scm.org/<span class="hljs-number">1.2</span> < <span class="hljs-regexp">/http/i</span>pfs.io<span class="hljs-regexp">/exchange/</span>bitswap/<span class="hljs-number">1</span> < <span class="hljs-regexp">/http/i</span>pfs.io<span class="hljs-regexp">/routing/</span>dht/<span class="hljs-number">2.0</span>.<span class="hljs-number">2</span> < <span class="hljs-regexp">/http/i</span>pfs.io<span class="hljs-regexp">/network/</span>relay/<span class="hljs-number">0.5</span>.<span class="hljs-number">2</span>
<span class="hljs-comment"># select protocol</span> > <span class="hljs-regexp">/http/mu</span>ltiproto.io<span class="hljs-regexp">/multistream-select/</span><span class="hljs-number">1.0</span> > ls > <span class="hljs-regexp">/http/</span>w3id.org<span class="hljs-regexp">/http/</span><span class="hljs-number">1.1</span> > GET <span class="hljs-regexp">/ HTTP/</span><span class="hljs-number">1.1</span> > < <span class="hljs-regexp">/http/</span>w3id.org<span class="hljs-regexp">/http/</span><span class="hljs-number">1.1</span> < HTTP/<span class="hljs-number">1.1</span> <span class="hljs-number">200</span> OK < Content-Type: text/html; charset=UTF-<span class="hljs-number">8</span> < Content-Length: <span class="hljs-number">12</span> < < Hello World</pre></div><h2 id="fb58">Implementations</h2><p id="b4a8">You can find a number of <a href="https://github.com/multiformats/multistream-select">multistream-select implementations</a> in multiple languages.</p><h2 id="85c0">Tutorial</h2><p id="c509">You can find a hands-on tutorial on multistream-select at the <a href="https://hackernoon.com/understanding-ipfs-in-depth-4-6-what-is-multiformats-cf25eef83966#18a1">end of this post</a>.</p><h1 id="4dd8">Multigram</h1><p id="fb3f">Multigram operates on datagrams, which can be UDP packets, Ethernet frames, etc., and which are unreliable and unordered. All it does is prepend a field to the packet, which signifies the protocol of this packet. The endpoints of the connection can then use different packet handlers per protocol.</p><p id="6809">As Multigram is WIP, I will not go through it in depth. But if you want to track its development, you can visit <a href="https://github.com/ipfs/specs/pull/123">here</a>.</p><p id="da46">Alright, now that we have covered the “what”, let’s play with it a bit to get a taste of its power 🔥</p><h1 id="e833">Let’s Play with Multiformats 🔥🔥🔥</h1><p id="3e64">Here we will play a bit with Multihash, Multiaddr, and Multibase.
You can also find the complete code for this tutorial <a href="https://github.com/vasa-develop/ultimate-ipfs-series/tree/master/multiformats_tut">here</a>.</p><p id="86fe">We will use the JS implementations for our tutorial.</p><h2 id="f1fc">Project Setup</h2><p id="52f1">Create a folder named <code>multiformats_tut</code> . Now, go into the folder.</p><h2 id="3b42">Installation</h2><p id="1222">Make sure that you have installed <code>npm</code> and <code>nodejs</code> on your system.</p><p id="8b17">Run this command.</p><div id="5a17"><pre>npm <span class="hljs-keyword">install </span><span class="hljs-keyword">multiaddr </span><span class="hljs-keyword">multihashes </span><span class="hljs-keyword">multibase</span></pre></div><h2 id="5045">Let’s write some code</h2><h2 id="da70">Multihash</h2><p id="9490">Create a file <code>multihash_tut.js</code> inside the folder.</p> <figure id="d024"> <div> <div>
<iframe class="gist-iframe" src="/gist/vasa-develop/194c7a6ee22b4732e3e207387ac663f4.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe></div></div></figure><p id="eb92">Now, run the code using <code>node multihash_tut.js</code>.
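</p><p>To make the byte layout concrete, here is a minimal, self-contained sketch of what a multihash looks like at the byte level. It is not the tutorial file, and it does not use the <code>multihashes</code> package; it relies only on Node's built-in <code>crypto</code> module. The helper names <code>sha256Multihash</code> and <code>decodeMultihash</code> are hypothetical, and the sketch assumes the sha2-256 code <code>0x12</code> from the shared table and single-byte varints for both header fields, which holds for this hash function.</p>

```javascript
const crypto = require('crypto')

// Multihash layout: <hash-function-code><digest-length><digest>.
// Assumption: 0x12 is the table code for sha2-256. Both header values are
// below 0x80 here, so each one fits in a single varint byte.
const SHA2_256_CODE = 0x12

// Hypothetical helper: hash the data and prepend the two header bytes.
function sha256Multihash (data) {
  const digest = crypto.createHash('sha256').update(data).digest()
  return Buffer.concat([Buffer.from([SHA2_256_CODE, digest.length]), digest])
}

// Hypothetical helper: split a multihash back into its three parts.
function decodeMultihash (mh) {
  const code = mh[0]
  const length = mh[1]
  const digest = mh.subarray(2)
  if (digest.length !== length) throw new Error('digest length mismatch')
  return { code, length, digest }
}

const mh = sha256Multihash(Buffer.from('multihash'))
console.log(mh.toString('hex')) // begins with 1220 (code 0x12, length 0x20)
console.log(decodeMultihash(mh))
```

<p>A real implementation would decode both header fields as full varints and look the code up in the table; this sketch skips both steps to stay short.</p><p>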
You will see the same output that we added in the comments.</p><h2 id="4a20">Multiaddr</h2><p id="c1a5">Create a file <code>multiaddr_tut.js</code> inside the folder.</p> <figure id="d6e6"> <div> <div>
<iframe class="gist-iframe" src="/gist/vasa-develop/bc1629e7057f56cc46bc7ce1aba5b216.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe></div></div></figure><p id="489b">Now run the code using <code>node multiaddr_tut.js</code>.
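</p><p>The relationship between the human-readable and binary-packed forms described in the Multiaddr section can also be sketched without any library. The following is a rough illustration, not the <code>multiaddr</code> package's API: the trimmed <code>PROTOCOLS</code> table and the helper names are hypothetical, and the codes (ip4 = 4, tcp = 6, udp = 273, quic = 460) are assumed from the default protocol table as I understand it. Input validation is deliberately omitted.</p>

```javascript
// Hypothetical, trimmed protocol table for this sketch only: the protocol
// code and the fixed size of its address value in bytes (0 = no value).
const PROTOCOLS = {
  ip4: { code: 4, size: 4 },
  tcp: { code: 6, size: 2 },
  udp: { code: 273, size: 2 },
  quic: { code: 460, size: 0 }
}

// Unsigned varint: 7 bits per byte, least-significant group first,
// with the high bit set on every byte except the last.
function encodeVarint (n) {
  const bytes = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return Buffer.from(bytes)
}

// ip4 values are four octets; tcp/udp values are a big-endian 16-bit port.
function encodeValue (name, value) {
  if (name === 'ip4') return Buffer.from(value.split('.').map(Number))
  const port = Buffer.alloc(2)
  port.writeUInt16BE(Number(value))
  return port
}

function multiaddrToBytes (str) {
  const parts = str.split('/').slice(1)
  const chunks = []
  for (let i = 0; i < parts.length; i++) {
    const name = parts[i]
    const proto = PROTOCOLS[name]
    if (!proto) throw new Error('unknown protocol: ' + name)
    chunks.push(encodeVarint(proto.code))
    // Fixed-size values carry no length prefix; protocols without a value
    // (such as quic) contribute only their code.
    if (proto.size > 0) chunks.push(encodeValue(name, parts[++i]))
  }
  return Buffer.concat(chunks)
}

const bytes = multiaddrToBytes('/ip4/127.0.0.1/udp/4023/quic')
console.log(bytes.toString('hex')) // 047f00000191020fb7cc03
```

<p>Every protocol in this trimmed table has a fixed-size value or none at all, so the length field never appears; a variable-length value would need a varint length between the code and the value.</p><p>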
You will see the same output that we added in the comments.</p><h2 id="56cc">Multibase</h2><p id="bd6f">Create a file <code>multibase_tut.js</code> inside the folder.</p> <figure id="ace0"> <div> <div>
<iframe class="gist-iframe" src="/gist/vasa-develop/34e629d3d882303b987fbe8e948ce8da.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe></div></div></figure><p id="c631">Now run the code using <code>node multibase_tut.js</code>.
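</p><p>As a library-free illustration of the prefix idea, the sketch below reproduces the base16 and base64 lines from the Multibase section using only Node's <code>Buffer</code>. It is not the <code>multibase</code> package's API: the <code>BASES</code> map and the two helper functions are hypothetical names, and only the prefixes <code>F</code> and <code>M</code> are covered.</p>

```javascript
// Only two entries, matching the example in the Multibase section:
// 'F' = base16 (uppercase hex), 'M' = base64 with padding.
const BASES = {
  F: {
    encode: (buf) => buf.toString('hex').toUpperCase(),
    decode: (str) => Buffer.from(str, 'hex')
  },
  M: {
    encode: (buf) => buf.toString('base64'),
    decode: (str) => Buffer.from(str, 'base64')
  }
}

// Prepend the one-character prefix so the text describes its own base.
function multibaseEncode (prefix, buf) {
  const base = BASES[prefix]
  if (!base) throw new Error('unsupported multibase prefix: ' + prefix)
  return prefix + base.encode(buf)
}

// Read the first character to pick the decoder, then decode the rest.
function multibaseDecode (text) {
  const base = BASES[text[0]]
  if (!base) throw new Error('unsupported multibase prefix: ' + text[0])
  return base.decode(text.slice(1))
}

const data = Buffer.from('Multibase is awesome! \\o/')
console.log(multibaseEncode('F', data))
// F4D756C74696261736520697320617765736F6D6521205C6F2F
console.log(multibaseEncode('M', data))
// MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==
console.log(multibaseDecode('MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==').toString())
// Multibase is awesome! \o/
```

<p>Supporting base32 or base58 would mean adding an entry with its own encoder and decoder; the prefix dispatch stays the same.</p><p>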
You will see the same output that we added in the comments.</p><h2 id="d578">Multicodec</h2><p id="70d0">You can find multicodec tutorial <a href="https://github.com/multiformats/js-multicodec#usage">here</a>.</p><h2 id="18a1">Multistream</h2><p id="5aef">You can find multistream tutorial <a href="https://github.com/multiformats/js-multistream-select#usage">here</a>.</p><h2 id="520b">Multistream-select</h2><p id="2b84">You can find multistream-select tutorial <a href="https://github.com/multiformats/js-multistream-select#usage">here</a>.</p><p id="789b">Congratulations🎉🎉 You now have the power to Future-proof a lot of things.</p><p id="50cc">That’s it for this part. In the next part, we will explore Libp2p. You can check it out here:</p><div id="0942" class="link-block"> <a href="https://readmedium.com/understanding-ipfs-in-depth-5-6-what-is-libp2p-f8bf7724d452"> <div> <div> <h2>Understanding IPFS in Depth(5/6): What is Libp2p?</h2> <div><h3>What’s it’s Significance, How it works and Building a libp2p Chat Application</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*RTHchFhxC-Qrf12SC4DzOg.jpeg)"></div> </div> </div> </a> </div><p id="0ea5">Thanks for reading ;)</p><p id="7e9e"><b>About the Author</b></p><p id="a752"><a href="http://vaibhavsaini.com/">Vaibhav Saini</a> is a Co-Founder of <a href="http://towardsblockchain.com"><b><i>TowardsBlockchain</i></b></a><b><i>, </i></b><i>an MIT Cambridge Innovation Center incubated startup.</i></p><p id="41d8">He works as Senior blockchain developer and has worked on several blockchain platforms including Ethereum, Quorum, EOS, Nano, Hashgraph, IOTA etc.</p><p id="6e4e">He is a Speaker, Writer and a drop-out from <a href="http://www.iitd.ac.in/">IIT Delhi</a>.</p></article></body>

This post is a continuation (part 4) of the “Understanding IPFS in Depth” series, which will help anybody understand the underlying concepts of IPFS. If you want an overview of what IPFS is and how it works, then you should check out the first part too 😊
In part 3, we discussed the significance of IPNS (InterPlanetary Name System), how it works, and its technical specification. We also went through a tutorial in which we created and hosted a website entirely on the IPFS stack. You can check it out here:
In this part, we are going to dive deep into Multiformats. We will explore:
If you like high-tech Web3 concepts like Multiformats explained in simple words with interactive tutorials, then head here.
I hope you learn a lot about IPFS from this series. Let’s get started!
Every choice in computing has a tradeoff.
This includes formats, algorithms, encodings, and so on. And even with a great deal of planning, decisions may lead to breaking changes down the road, or to solutions which are no longer optimal. Allowing systems to evolve and grow, without introducing breaking changes is important.
To understand the need for Multiformats, let’s take the example of Git. A lot of people use it every single day through services like GitHub, GitLab, Bitbucket, etc. We know that Git uses hashes for a lot of things. Right now Git uses SHA-1 as its hashing algorithm.
These hashing algorithms play a very important role. They keep things secure. Not just in git, but in healthcare, global financial systems, and governments too.

The way they work is that they are one-way functions. So, you can get an output (hash) from an input (something) using the hashing function, but it’s practically impossible to get the input from the output.
But with time, as more powerful computers and better attacks are developed, some of these hash functions have started failing; meaning that it becomes feasible to craft two different inputs with the same output (a collision), hence breaking the security of the systems that rely on the function. This is what happened to the MD5 hash function.
So, just like MD5, someday SHA-1 will have to be replaced (practical collisions for it have already been demonstrated), and then we will need to use a better hashing function. But the problem here is that, as these algorithms are hard-wired into the ecosystem, it’s really hard to make such changes. Plus, what happens to all the code that was using the old SHA-1? All of that will be rendered incompatible…that sucks!
And this problem of non-future-proofing and incompatibility is not just limited to the hashing algorithms. The network protocols are also a prime host of these problems.
Take the example of HTTP/2, which was introduced in 2015. From a network’s viewpoint, HTTP/2 made a few notable changes. As it’s a binary protocol, any device that assumes it’s HTTP/1.1 is going to break. And that meant changing everything that uses HTTP/1.1, including browsers and web servers.
So, summing up there are a number of problems that we face:
These problems not only break things but also make the whole development cycle slow, as it takes a lot of time to carefully shift the whole system.
Almost every system that we see today was NEVER designed with the fact in mind that someday it is going to be outdated.
This is not the way we want our future to be. So, we need to embrace the fact that things change.
The Multiformats Project introduces a set of standards/protocols that embrace this fact and allow multiple protocols to co-exist, so that even if there is a breaking change, the ecosystem still supports all the versions of the protocol.
Now that we know “Why”, let’s see “What”…
The Multiformats Project is a collection of protocols which aim to future-proof systems, today. They do this mainly by enhancing format values with self-description. This allows interoperability, protocol agility, and helps us avoid lock-in.
The self-describing aspects of the protocols have a few stipulations:
Currently, we have the following multiformat protocols:
Each of the projects has its list of implementations in various languages.
We will go through each of these multiformat protocols and try to understand how they work.
Multihash is a protocol for differentiating outputs from various well-established hash functions, addressing size + encoding considerations. It is useful to write applications that future-proof their use of hashes and allow multiple hash functions to coexist.
Multihash is particularly important in systems which depend on cryptographically secure hash functions. Attacks may break the cryptographic properties of secure hash functions. These cryptographic breaks are particularly painful in large tool ecosystems, where tools may have made assumptions about hash values, such as function and digest size. Upgrading becomes a nightmare, as all tools which make those assumptions would have to be upgraded to use the new hash function and new hash digest length. Tools may face serious interoperability problems or error-prone special casing. As we discussed earlier, there can be a number of problems in the git example:
This is precisely where Multihash shines. It was designed for upgrading.
When using Multihash, a system warns the consumers of its hash values that these may have to be upgraded in case of a break. Even though the system may still only use a single hash function at a time, the use of multihash makes it clear to applications that hash values may use different hash functions or be longer in the future. Tooling, applications, and scripts can avoid making assumptions about the length, and read it from the multihash value instead. This way, the vast majority of tooling — which may not do any checking of hashes — would not have to be upgraded at all. This vastly simplifies the upgrade process, avoiding the waste of hundreds or thousands of software engineering hours, deep frustrations, and high blood pressure.
A multihash follows the TLV (type-length-value) pattern.
<hash-func-type> is an unsigned variable integer identifying the hash function. There is a default table, and it is configurable. The default table is the multicodec table.
<digest-length> is an unsigned variable integer counting the length of the digest, in bytes.
<digest-value> is the hash function digest, with a length of exactly <digest-length> bytes.
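As a minimal sketch of the layout above, the following builds multihashes by hand with Node's built-in crypto module and verifies data against them by reading the hash function and digest length from the value itself. The codes 0x11 (sha1), 0x12 (sha2-256) and 0x13 (sha2-512) come from the default multicodec table; the helper names `toMultihash` and `verify` are made up for this illustration, and the sketch assumes that codes and lengths stay below 128, so each varint fits in a single byte.

```javascript
const crypto = require('crypto');

// Codes from the default multicodec table, mapped to Node's hash names.
const HASHES = {
  0x11: 'sha1',
  0x12: 'sha256',
  0x13: 'sha512',
};

// Build <hash-func-type><digest-length><digest-value>.
// Assumption: code and length are < 128, so each varint is a single byte.
function toMultihash(code, data) {
  const digest = crypto.createHash(HASHES[code]).update(data).digest();
  return Buffer.concat([Buffer.from([code, digest.length]), digest]);
}

// Read the function and length from the value instead of assuming them.
function verify(mh, data) {
  const [code, length] = mh;
  const algo = HASHES[code];
  if (!algo) throw new Error(`unknown hash function code: ${code}`);
  const digest = mh.subarray(2, 2 + length);
  if (digest.length !== length) throw new Error('truncated multihash');
  const expected = crypto.createHash(algo).update(data).digest();
  return expected.equals(digest);
}

const oldHash = toMultihash(0x11, 'hello'); // sha1
const newHash = toMultihash(0x12, 'hello'); // sha2-256
console.log(oldHash.length, newHash.length);                     // 22 34
console.log(verify(oldHash, 'hello'), verify(newHash, 'hello')); // true true
console.log(verify(newHash, 'tampered'));                        // false
```

Because `verify` never hard-codes a function or a digest size, values produced with the old and the new hash function can coexist, which is the upgrade path described above.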

To understand the significance of multihash format, let’s use some visual aid.






I hope this sums up Multihash pretty well.
You can find a number of multihash implementations in multiple languages.
You can find a hands-on tutorial on multihash at the end of this post.
Multiaddr is a format for encoding addresses from various well-established network protocols. It is useful to write applications that future-proof their use of addresses and allow multiple transport protocols and addresses to coexist.
The current network addressing scheme in the internet IS NOT self-describing. Addresses of the following forms leave much to interpretation and side-band context. The assumptions they make cause applications to also make those assumptions, which causes lots of “this type of address”-specific code. The network addresses and their protocols rust into place, and cannot be displaced by future protocols because the addressing prevents change.
For example, consider:
127.0.0.1:9090 # ip4. is this TCP? or UDP? or something else?
[::1]:3217 # ip6. is this TCP? or UDP? or something else?
http://127.0.0.1/baz.jpg
http://foo.com/bar/baz.jpg
//foo.com:1234
# use DNS, to resolve to either ip4 or ip6, but definitely use
# tcp after. or maybe quic... >.<
# these default to TCP port :80.
Instead, when addresses are fully qualified, we can build applications that will work with network protocols of the future, and do not accidentally ossify the stack.
/ip4/127.0.0.1/udp/9090/quic
/ip6/::1/tcp/3217
/ip4/127.0.0.1/tcp/80/http/baz.jpg
/dns4/foo.com/tcp/80/http/bar/baz.jpg
/dns6/foo.com/tcp/443/https
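A multiaddr value is a recursive (TLV)+ (type-length-value repeating) encoding. It has two forms:
A human-readable version, to be used when printing to the user (UTF-8).
A binary-packed version, to be used in storage, transmissions on the wire, and as a primitive in other formats.
The human-readable version:
Path notation nests protocols and addresses, for example: /ip4/127.0.0.1/udp/4023/quic (this is the repeating part).
A protocol MAY be only a code, or also have an address value (nested under a /) (e.g. /quic and /ip4/127.0.0.1).
The type <addr-protocol-str-code> is a string code identifying the network protocol. The table of protocols is configurable. The default table is the multicodec table.
The value <addr-value> is the network address value, in natural string form.
The binary-packed version:
The type <addr-protocol-code> is a variable integer identifying the network protocol. The table of protocols is configurable. The default table is the multicodec table.
The length is an unsigned variable integer counting the length of the address value, in bytes. The length is omitted by protocols that have an exact address value size, or no address value.
The value <addr-value> is the network address value, of length L. The value is omitted by protocols that have no address value.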
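To make the two forms concrete, here is a small hand-rolled sketch that converts a human-readable multiaddr into its binary-packed form and back. It only knows two protocols, ip4 (code 4, fixed 4-byte value) and tcp (code 6, fixed 2-byte port), so no length field is ever written; the `PROTOCOLS` table and both function names are assumptions made for this illustration, not the API of any multiaddr library.

```javascript
// Illustrative subset of the protocol table. Both protocols have a
// fixed-size value, so the length field is omitted for them.
const PROTOCOLS = {
  ip4: {
    code: 4,
    size: 4,
    encode: (v) => Buffer.from(v.split('.').map(Number)),
    decode: (b) => Array.from(b).join('.'),
  },
  tcp: {
    code: 6,
    size: 2,
    encode: (v) => {
      const b = Buffer.alloc(2);
      b.writeUInt16BE(Number(v)); // ports are stored big-endian
      return b;
    },
    decode: (b) => String(b.readUInt16BE(0)),
  },
};

function multiaddrToBytes(str) {
  const parts = str.split('/').slice(1); // drop the empty piece before the leading "/"
  const chunks = [];
  for (let i = 0; i < parts.length; i += 2) {
    const proto = PROTOCOLS[parts[i]];
    if (!proto) throw new Error(`unknown protocol: ${parts[i]}`);
    // Codes below 128 fit in a single varint byte.
    chunks.push(Buffer.from([proto.code]), proto.encode(parts[i + 1]));
  }
  return Buffer.concat(chunks);
}

function bytesToMultiaddr(buf) {
  let out = '';
  let i = 0;
  while (i < buf.length) {
    const name = Object.keys(PROTOCOLS).find((n) => PROTOCOLS[n].code === buf[i]);
    if (!name) throw new Error(`unknown protocol code: ${buf[i]}`);
    const { size, decode } = PROTOCOLS[name];
    out += `/${name}/${decode(buf.subarray(i + 1, i + 1 + size))}`;
    i += 1 + size;
  }
  return out;
}

const packed = multiaddrToBytes('/ip4/127.0.0.1/tcp/80');
console.log(packed.toString('hex'));   // 047f000001060050
console.log(bytesToMultiaddr(packed)); // /ip4/127.0.0.1/tcp/80
```

Reading the hex output left to right gives 04 (ip4), 7f000001 (127.0.0.1), 06 (tcp) and 0050 (port 80): the repeating type-value pairs described above.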
You can find a number of multiaddr implementations in multiple languages.
You can find a hands-on tutorial on multiaddr at the end of this post.
Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base64, base58, etc.) binary appearing in text.
When text is encoded as bytes, we can usually use a one-size-fits-all encoding (UTF-8) because we’re always encoding to the same set of 256 bytes (+/- the NUL byte). When that doesn’t work, usually for historical or performance reasons, we can usually infer the encoding from the context.
However, when bytes are encoded as text (using a base encoding), the choice of base encoding is often restricted by the context. Worse, these restrictions can change based on where the data appears in the text. In some cases, we can only use [a-z0-9]. In others, we can use a larger set of characters but need a compact encoding. This has led to a large set of “base encodings”, one for every use-case. Unlike when encoding text to bytes, we can’t just standardize around a single base encoding because there is no optimal encoding for all cases.
Unfortunately, it’s not always clear what base encoding is used; that’s where multibase comes in. It answers the question:
Given data d encoded into text s, what base is it encoded with?
The format is:
<base-encoding-character><base-encoded-data>
Where <base-encoding-character> is used according to the multibase table.
Here is an example to show how it works.
Consider the following encodings of the same binary string:
4D756C74696261736520697320617765736F6D6521205C6F2F # base16 (hex)
JV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP # base32
YAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt # base58
TXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== # base64
And consider the same encodings with their multibase prefix:
F4D756C74696261736520697320617765736F6D6521205C6F2F # base16 F
BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP # base32 B
zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt # base58 z
MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw== # base64 M
The base prefixes used are: F, B, z, M.
Now, you can write self-descriptive encoded text :)
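The prefixing scheme is easy to reproduce by hand. The sketch below covers only two rows of the multibase table, the ones used in the examples above (F for upper-case base16 and M for padded base64), and relies on Node's built-in Buffer encodings; `BASES`, `multibaseEncode` and `multibaseDecode` are names invented for this illustration rather than the API of the multibase package.

```javascript
// Two entries of the multibase table, as used in the examples above:
// 'F' = base16 (upper case), 'M' = base64 with padding.
const BASES = {
  F: {
    encode: (buf) => buf.toString('hex').toUpperCase(),
    decode: (text) => Buffer.from(text, 'hex'),
  },
  M: {
    encode: (buf) => buf.toString('base64'),
    decode: (text) => Buffer.from(text, 'base64'),
  },
};

function multibaseEncode(prefix, buf) {
  const base = BASES[prefix];
  if (!base) throw new Error(`unsupported base prefix: ${prefix}`);
  return prefix + base.encode(buf);
}

// The first character tells us how to decode the rest.
function multibaseDecode(text) {
  const base = BASES[text[0]];
  if (!base) throw new Error(`unsupported base prefix: ${text[0]}`);
  return base.decode(text.slice(1));
}

const data = Buffer.from('Multibase is awesome! \\o/');
const asHex = multibaseEncode('F', data);
const asB64 = multibaseEncode('M', data);
console.log(asHex); // F4D756C74696261736520697320617765736F6D6521205C6F2F
console.log(asB64); // MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==
console.log(multibaseDecode(asHex).equals(multibaseDecode(asB64))); // true
```

A consumer only needs `multibaseDecode`: it never has to be told which base was used, because the string carries that information itself.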
You can find a number of multibase implementations in multiple languages.
You can find a hands-on tutorial on multibase at the end of this post.
Now, the next two, Multicodec and Multistream, are a bit inter-related, so I will try to explain the motivation behind both, but you may need to read both sections to understand each one of them.
Multistreams are self-describing protocol/encoding streams. Multicodec, by contrast, uses an agreed-upon “protocol table”. It is designed for use in short strings, such as keys or identifiers (e.g. a CID).
multicodec is a self-describing multiformat; it wraps other formats with a tiny bit of self-description. A multicodec identifier is a varint.
A chunk of data identified by multicodec will look like this:
<multicodec><encoded-data>
# To reduce the cognitive load, we sometimes might write the same line as:
<mc><data>
Another useful scenario is when using the multicodec as part of the keys to access data, for example:
# suppose we have a value and a key to retrieve it
"<key>" -> <value>
# we can use multicodec with the key to know what codec the value is in
"<mc><key>" -> <value>
It is worth noting that multicodec works very well in conjunction with multihash and multiaddr, as you can prefix those values with a multicodec to tell what they are.
Multicodec uses “protocol tables” to agree upon the mapping from a multicodec code to the protocol or format it identifies. These tables can be application specific, though, as with other multiformats, a globally agreed-upon table with common protocols and formats is maintained.
Multicodec defines a table for the most common data serialization formats that can be expanded over time or on a per-application basis. However, in order for two programs to talk with each other, they need to know beforehand which table or table extension is being used.
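Since a multicodec identifier is just an unsigned varint placed in front of the data, the whole mechanism can be sketched in a few lines. The varint routines below follow the usual 7-bits-per-byte scheme; 0x55 is the `raw` entry of the default table, while 300 is an arbitrary code picked only to show what a two-byte varint looks like. The function names are assumptions for this example, not a library API.

```javascript
// Unsigned varint: 7 bits per byte, high bit set on all but the last byte.
function encodeVarint(n) {
  const bytes = [];
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  bytes.push(n);
  return Buffer.from(bytes);
}

// Returns the decoded value and how many bytes it occupied.
function decodeVarint(buf) {
  let value = 0;
  let shift = 0;
  for (let i = 0; i < buf.length; i++) {
    value += (buf[i] & 0x7f) * 2 ** shift;
    if ((buf[i] & 0x80) === 0) return { value, size: i + 1 };
    shift += 7;
  }
  throw new Error('unterminated varint');
}

// <multicodec><encoded-data>
const addPrefix = (code, data) => Buffer.concat([encodeVarint(code), data]);
const splitPrefix = (buf) => {
  const { value, size } = decodeVarint(buf);
  return { code: value, data: buf.subarray(size) };
};

const small = addPrefix(0x55, Buffer.from('hi')); // `raw` in the default table
const large = addPrefix(300, Buffer.from('hi'));  // illustrative code only
console.log(small.toString('hex'));               // 556869
console.log(large.toString('hex'));               // ac026869
console.log(splitPrefix(large).code);             // 300
console.log(splitPrefix(large).data.toString());  // hi
```

Both sides still have to agree on what a given code means, which is exactly the role of the table discussed above.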
In order to enable self-descriptive data formats or streams that can be dynamically described, without the formal step of adding a binary packed code to a table, we have multistream, so that applications can adopt multiple data formats for their streams and with that create different protocols.
Now let’s answer a few questions to understand its significance.
Why multicodec? Why not just use multistream?
Because multistream is too long for identifiers. We needed something shorter.
Why varints?
So that we have no limitation on protocols.
Don’t we have to agree on a table of protocols?
Yes, but we already have to agree on what protocols themselves are, so this is not so hard. The table even leaves some room for custom protocol paths, or you can use your own tables. The standard table is only for common things.
Why aren’t the multibase prefixes in this table?
For a period of time, the multibase prefixes lived in this table. However, multibase prefixes are symbols that may map to multiple underlying byte representations (that may overlap with byte sequences used for other multicodecs). Including them in a table for binary/byte identifiers led to more confusion than it solved.
You can still find the table in multibase.csv.
You can find a number of multicodec implementations in multiple languages.
You can find a hands-on tutorial on multicodec at the end of this post.
A multistream is a self-describing protocol/encoding stream (note that a file is a stream, too). It is designed to address the perennial problem:
I have a bitstring, what codec is the data coded with?
Instead of arguing about which data serialization library is the best, let’s just pick the simplest one now, and build upgradability into the system. Choices are never forever. Eventually, all systems are changed. So, embrace this fact of reality, and build change into your system now.
Multistream frees you from the tyranny of past mistakes. Instead of trying to figure it all out beforehand, or continuing to use something that we can all agree no longer fits, why not allow the system to evolve and grow with the use cases of today, not yesterday?
To decode an incoming stream of data, a program must either (1) know the format of the data a priori, or (2) learn the format from the data itself.
(1) precludes running protocols that may provide one of many kinds of formats without prior agreement on which. multistream makes (2) neat using self-description.
Moreover, this self-description allows straightforward layering of protocols without having to implement support in the parent (or encapsulating) one.
multistream is a self-describing multiformat; it wraps other formats with a tiny bit of self-description:
<varint-len>/<codec>\n<encoded-data>
For example, let’s encode a JSON doc:
// encode some json
// (multistream is assumed to be a helper module exposing addPrefix, getCodec and rmPrefix)
const buf = Buffer.from(JSON.stringify({ hello: 'world' }))
const prefixedBuf = multistream.addPrefix('json', buf) // prepends the multistream header for 'json'
console.log(prefixedBuf)
// <Buffer 06 2f 6a 73 6f 6e 2f 7b 22 68 65 6c 6c 6f 22 3a 22 77 6f 72 6c 64 22 7d>
console.log(prefixedBuf.toString('hex'))
// 062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d
// let's get the codec and then get the data back
const codec = multistream.getCodec(prefixedBuf)
console.log(codec)
// json
console.log(multistream.rmPrefix(prefixedBuf).toString())
// {"hello":"world"}
So, prefixedBuf is:
hex: 062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d
ascii: /json\n{"hello":"world"}
Note that the ASCII version does not show the varint at the beginning, so you should account for it.
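For readers who want to see the framing without any library, here is a self-contained sketch of the three helpers used above. It takes the format line literally, so the header is "/" + codec + "\n" and its last byte is a newline (0x0a); the function bodies are my own stand-ins, not the implementation of a published package, and the sketch assumes the header is shorter than 128 bytes so that its length varint is one byte.

```javascript
// Frame a payload as <varint-len>/<codec>\n<encoded-data>.
// Assumption: the header is shorter than 128 bytes (one-byte varint).
function addPrefix(codec, data) {
  const header = Buffer.from(`/${codec}\n`);
  if (header.length >= 128) throw new Error('header too long for this sketch');
  return Buffer.concat([Buffer.from([header.length]), header, data]);
}

// The header occupies bytes 1..len; drop the leading "/" and trailing "\n".
function getCodec(buf) {
  const len = buf[0];
  return buf.subarray(2, len).toString();
}

// Skip the length byte plus the header to get the payload back.
function rmPrefix(buf) {
  return buf.subarray(1 + buf[0]);
}

const doc = Buffer.from(JSON.stringify({ hello: 'world' }));
const framed = addPrefix('json', doc);
console.log(framed.subarray(0, 7).toString('hex')); // 062f6a736f6e0a
console.log(getCodec(framed));                      // json
console.log(rmPrefix(framed).toString());           // {"hello":"world"}
```

The leading 06 is the varint length of the header: five bytes for "/json" plus one for the newline.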
multistream allows us to specify different protocols in a universal namespace, that way being able to recognize, multiplex, and embed them easily. We use the notion of a path instead of an id because it is meant to be a Unix-friendly URI.
A good path name should be decipherable — meaning that if some machine or developer — who has no idea about your protocol — encounters the path string, they should be able to look it up and resolve how to use it.
An example of a good path name is:
/bittorrent.org/1.0
An example of a great path name is:
/ipfs/Qmaa4Rw81a3a1VEx4LxB7HADUAXvZFhCoRdBzsMZyZmqHD/ipfs.protocol
/http/w3id.org/ipfs/1.1.0
These path names happen to be resolvable — not just in a “multistream muxer(e.g multistream-select)” but on the internet as a whole (provided the program (or OS) knows how to use the /ipfs and /http protocols).
Now, let’s answer a few questions to understand its significance.
Why multistream?
Today, people speak many languages and use common ones as an interface. But every “common language” has evolved over time, or even fundamentally switched. Why should we expect programs to be any different?
And the reality is they’re not. Programs use a variety of encodings. Today we like JSON. Yesterday, XML was all the rage. XDR solved everything, but it’s kinda retro. Protobuf is still too cool for school. capnp (“cap and proto”) is for cerealization hipsters.
The one problem is figuring out what we’re speaking. Humans are pretty smart, we pick up all sorts of languages over time. And we can always resort to pointing and grunting (the ASCII of humanity).
Programs have a harder time. You can’t keep piping JSON into a protobuf decoder and hope they align. So we have to help them out a bit. That’s what multistream is for.
Full paths are too long for my use case. Is there something shorter?
Yes, check out multicodec. It uses a varint and a table to achieve the same thing.
You can find a number of multistream implementations in multiple languages.
You can find a hands-on tutorial on multistream at the end of this post.
Some protocols have sub-protocols or protocol-suites. Often, these sub-protocols are optional extensions. Selecting which protocol to use — or even knowing what is available to choose from — is not simple.
What if there was a protocol that allowed mounting or nesting other protocols, and made it easy to select which protocol to use? (This is sort of like ports, but managed at the protocol level, not the OS, and human-readable.)
The actual protocol is very simple. It is a multistream protocol itself, so it has a multistream header. It also has a set of other protocols available to be used by the remote side. The remote side must enter:
> <multistream-header>
> <multistream-header-for-whatever-protocol-that-we-want-to-speak>
for example:
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
<multistream-header-of-multistream> ensures a protocol selection is happening.
<multistream-header-for-whatever-protocol-is-then-selected> hopefully describes a valid protocol listed. Otherwise, we return a na ("not available") error:
na\n
# in hex (note the varint prefix = 3)
# 0x036e610a
for example:
# open connection + send multistream headers, inc for a protocol not available
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/some-protocol-that-is-not-available
# open connection + signal protocol not available.
< /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
< na
# send a selection of a valid protocol + upgrade the conn and send traffic
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
> <dht-traffic>
> ...
# receive a selection of the protocol + sent traffic
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< <dht-traffic>
< ...
Note 1: Every multistream message is a “length-prefixed-message”, which means that every message is prepended by a varint that describes the size of the message.
Note 2: Every multistream message is appended by a \n character, this character is included in the byte count that is accounted for by the prepended varint.
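The two notes translate directly into code. The sketch below frames and unframes messages according to them and adds a toy responder that answers a selection with either the requested protocol or `na`. The protocol paths are shortened stand-ins for the long /ipfs/... paths above, `respond` is a hypothetical helper written for this example, and messages are assumed to be shorter than 128 bytes so the length varint is a single byte.

```javascript
// Every message is <varint-len><text>\n, and the "\n" counts toward the length.
// Assumption: messages are shorter than 128 bytes (one-byte varint).
function frame(text) {
  const body = Buffer.from(`${text}\n`);
  if (body.length >= 128) throw new Error('message too long for this sketch');
  return Buffer.concat([Buffer.from([body.length]), body]);
}

// Split a buffer of back-to-back framed messages into their text lines.
function unframe(buf) {
  const lines = [];
  let i = 0;
  while (i < buf.length) {
    const len = buf[i];
    lines.push(buf.subarray(i + 1, i + len).toString()); // strip the trailing "\n"
    i += 1 + len;
  }
  return lines;
}

// Shortened stand-in for the multistream-select header used above.
const SELECT = '/multistream-select/0.3.0';

// Hypothetical responder: echo the select header, then echo a supported
// protocol or answer "na" for an unsupported one.
function respond(request, supported) {
  const [header, wanted] = unframe(request);
  if (header !== SELECT) throw new Error('not a multistream-select request');
  const answer = supported.includes(wanted) ? wanted : 'na';
  return Buffer.concat([frame(SELECT), frame(answer)]);
}

console.log(frame('ls').toString('hex')); // 036c730a
console.log(frame('na').toString('hex')); // 036e610a

const request = Buffer.concat([frame(SELECT), frame('/ipfs-dht/0.2.3')]);
console.log(unframe(respond(request, ['/ipfs-dht/0.2.3']))[1]);     // /ipfs-dht/0.2.3
console.log(unframe(respond(request, ['/ipfs-bitswap/1.0.0']))[1]); // na
```

The two hex strings match the `ls` and `na` encodings quoted in this section, which is a quick way to check that the newline really is included in the byte count.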
It is also possible to “list” the available protocols. A list message is simply:
ls\n
# in hex (note the varint prefix = 3)
0x036c730a
So a remote side asking for a protocol listing would look like this:
# request
<multistream-header-for-multistream-select>
ls\n
# response
<varint-total-response-size-in-bytes><varint-number-of-protocols>
<multicodec-of-available-protocol>
<multicodec-of-available-protocol>
<multicodec-of-available-protocol>
...
For example:
# send request
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> ls
# get response
< /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/1.0.0
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-bitswap/0.4.3
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-bitswap/1.0.0
# send selection, upgrade connection, and start protocol traffic
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
> <ipfs-dht-request-0>
> <ipfs-dht-request-1>
> ...
# receive selection, and upgraded protocol traffic.
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< <ipfs-dht-response-0>
< <ipfs-dht-response-1>
< ...
# greeting
> /http/multiproto.io/multistream-select/1.0
< /http/multiproto.io/multistream-select/1.0
# list available protocols
> /http/multiproto.io/multistream-select/1.0
> ls
< /http/google.com/spdy/3
< /http/w3c.org/http/1.1
< /http/w3c.org/http/2
< /http/bittorrent.org/1.2
< /http/git-scm.org/1.2
< /http/ipfs.io/exchange/bitswap/1
< /http/ipfs.io/routing/dht/2.0.2
< /http/ipfs.io/network/relay/0.5.2
# select protocol
> /http/multiproto.io/multistream-select/1.0
> ls
> /http/w3id.org/http/1.1
> GET / HTTP/1.1
>
< /http/w3id.org/http/1.1
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 12
<
< Hello World
You can find a number of multistream-select implementations in multiple languages.
You can find a hands-on tutorial on multistream-select at the end of this post.
Multigram operates on datagrams, which can be UDP packets, Ethernet frames, etc. and which are unreliable and unordered. All it does is prepend a field to the packet, which signifies the protocol of this packet. The endpoints of the connection can then use different packet handlers per protocol.
As Multigram is a WIP, I will not go through it in depth. But if you want to track its development, you can visit here.
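The dispatch idea itself (tag each packet, then hand it to a per-protocol handler) can still be sketched. Note that the one-byte tag and the handler registry below are assumptions made purely for illustration; they are not the Multigram wire format, which is still being worked out.

```javascript
// Hypothetical tagging scheme: one leading byte identifies the protocol.
// This is NOT the Multigram wire format, only the dispatch idea.
const handlers = new Map();

function register(tag, handler) {
  handlers.set(tag, handler);
}

// Prepend the protocol tag to an outgoing datagram.
function wrap(tag, payload) {
  return Buffer.concat([Buffer.from([tag]), payload]);
}

// Each datagram is self-contained, so it can be routed on its own,
// even if packets arrive out of order or some are lost.
function dispatch(packet) {
  const handler = handlers.get(packet[0]);
  if (!handler) throw new Error(`no handler for protocol tag ${packet[0]}`);
  return handler(packet.subarray(1));
}

register(1, (payload) => `chat: ${payload.toString()}`);
register(2, (payload) => `ping: ${payload.length} bytes`);

console.log(dispatch(wrap(1, Buffer.from('hello')))); // chat: hello
console.log(dispatch(wrap(2, Buffer.alloc(8))));      // ping: 8 bytes
```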
Alright, now as we have covered “What”, let’s play with it a bit to get a taste of its power 🔥
Here we will play a bit with Multihash, Multiaddr, and Multibase. You can also find the complete code for this tutorial here.
We will use the JS implementations for our tutorial.
Create a folder named multiformats_tut . Now, go into the folder.
Make sure that you have installed npm and nodejs on your system.
Run this command.
npm install multiaddr multihashes multibase
Create a file multihash_tut.js inside the folder.
Now, run the code using node multihash_tut.js .
You will see the same output that we added in the comments.
Create a file multiaddr_tut.js inside the folder.
Now run the code using node multiaddr_tut.js .
You will see the same output that we added in the comments.
Create a file multibase_tut.js inside the folder.
Now run the code using node multibase_tut.js .
You will see the same output that we added in the comments.
You can find a multicodec tutorial here.
You can find a multistream tutorial here.
You can find a multistream-select tutorial here.
Congratulations🎉🎉 You now have the power to Future-proof a lot of things.
That’s it for this part. In the next part, we will explore Libp2p. You can check it out here:
Thanks for reading ;)
Hold down the clap button if you liked the content! It helps me gain exposure.

About the Author
Vaibhav Saini is a Co-Founder of TowardsBlockchain, an MIT Cambridge Innovation Center incubated startup.
He works as a senior blockchain developer and has worked on several blockchain platforms including Ethereum, Quorum, EOS, Nano, Hashgraph, IOTA, etc.
He is a Speaker, Writer and a drop-out from IIT Delhi.
Want to learn more? Check out my previous articles.
Clap 50 times and follow me on Twitter: @vasa_develop