Communication Protocols and Distributed Systems
Discussions about the future of the Internet often turn to the innovations that could shape it. One of the key technologies may be the InterPlanetary File System (IPFS), a peer-to-peer (p2p) file-sharing system whose goal is to radically change the way information is distributed around the world.
IPFS combines several innovations in communication protocols and distributed systems into a file system unlike any other. To appreciate the breadth and depth of what IPFS is trying to achieve, it helps to understand the technological advances that make it possible.
For two computers to exchange information, they need a common set of rules, known as a communication protocol. Until the early 1980s, when the first widely adopted communication protocols appeared, computers existed largely as isolated devices that could not talk to one another.
Communication protocols are usually layered into stacks (known as protocol suites), with each layer responsible for particular functions. Beyond the protocols themselves, it is important to understand how the computers on a network relate to one another, which is known as the system architecture. There are several types, but only two matter here: client-server and peer-to-peer networks. The Internet is dominated by client-server relationships built on the Internet protocol suite.
Of these, the Hypertext Transfer Protocol (HTTP) is the basis for communication on the web. The client-server model and HTTP have served the Internet quite reliably for most of its history, and they solved many scalability and security problems along the way. But control over the data remains with whoever controls the server, whether a legitimate operator or an attacker, and the model was never designed to move the large volumes of data the web handles today.
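The client-server exchange described above can be sketched with Python's standard library. This is a minimal illustration, not part of IPFS or HTTP itself; the handler, message, and port choice are all invented for the demo.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server alone decides what data exists and who may read it:
        # the central point of control (and failure) described above.
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port; the address is illustrative.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client can only ask; it has no say in where the data lives.
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    data = resp.read().decode()

server.shutdown()
print(data)  # -> hello from the server
```

If the server goes away, so does the content. That single dependency is exactly what the peer-to-peer designs below try to remove.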
Interplanetary File System (IPFS)
IPFS aims to address the weaknesses of the client-server model and HTTP with a new open-source p2p file-sharing system, a synthesis of several new and existing innovations. Hundreds of developers around the world have contributed to IPFS; its main components are described below.
Distributed Hash Tables
A hash table is a data structure that stores information as key/value pairs. In a distributed hash table (DHT), the data is spread across a network of computers and coordinated so that nodes can access and look it up efficiently. The main benefits of DHTs are decentralization, fault tolerance, and scalability.
Nodes do not require central coordination, the system keeps working even when nodes fail, and a DHT can scale to accommodate millions of nodes. Together, these properties generally make the system more resilient than client-server architectures.
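The core DHT idea, that every participant can compute which node holds a key without asking a coordinator, can be sketched with consistent hashing. This is a simplified illustration, not IPFS's actual routing algorithm (which is based on Kademlia), and the node names are invented:

```python
import hashlib
from bisect import bisect_left

def hash_to_id(name):
    """Hash a name into a shared integer ID space."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16)

# Invented node names, placed on a hash ring by their IDs.
nodes = sorted(hash_to_id(n) for n in ["node-a", "node-b", "node-c", "node-d"])

def responsible_node(key):
    """A key lives on the first node at or after its hash on the ring."""
    i = bisect_left(nodes, hash_to_id(key))
    return nodes[i % len(nodes)]  # wrap around the end of the ring

# Every participant computes the same answer with no central coordinator,
# and removing a node only remaps the keys that node was responsible for.
owner = responsible_node("my-file.txt")
```

Because the mapping is pure arithmetic over hashes, any node can route a lookup toward the right peer, which is what makes the system decentralized and able to survive individual node failures.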
The popular BitTorrent file-sharing system successfully coordinates data transfer between millions of nodes using an innovative data-exchange protocol, but it is limited to the torrent ecosystem. IPFS implements a generalized version of this protocol, called BitSwap, which works as a marketplace for any type of data.
Merkle DAG
A Merkle DAG is a mixture of a Merkle tree and a directed acyclic graph (DAG). Merkle trees ensure that the data blocks exchanged on p2p networks are correct, intact, and unaltered. This verification is performed by organizing data blocks with cryptographic hash functions: a hash function is simply a function that takes an input and computes a unique alphanumeric string (the hash) corresponding to that input.
It is easy to verify that an input produces a given hash, but practically impossible to guess the input from the hash alone. Individual data blocks form the "leaf nodes"; their hashes are combined and hashed again into "non-leaf nodes", and this continues until all the data blocks are represented by a single root hash. A DAG, simply put, is a way to model topological sequences of information that contain no cycles.
A simple example of a DAG is a family tree. A Merkle DAG is essentially a data structure in which hashes are used to reference data blocks and objects in a DAG. This creates several useful properties: all content in IPFS can be uniquely identified, since each data block has a unique hash, and the data is resistant to unauthorized modification.
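The leaf-to-root hashing described above can be sketched in a few lines. This is a minimal Merkle-root computation using SHA-256, not IPFS's actual block format:

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash the leaves, then pair and re-hash until one root remains."""
    level = [h(b) for b in blocks]              # leaf nodes
    while len(level) > 1:
        if len(level) % 2:                      # odd count: duplicate the last
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)

# Changing any single block changes the root, which is how peers detect
# corrupted or tampered data.
tampered = merkle_root([b"block-0", b"block-X", b"block-2", b"block-3"])
assert root != tampered
```

A peer that knows only the root hash can verify every block it receives, which is what lets untrusted nodes exchange data safely.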
IPFS Version Control Systems
Another powerful feature of the Merkle DAG structure is that it enables distributed version control systems (VCS). The most popular example is Git, the system behind GitHub, which allows developers to work on projects easily, collaboratively, and simultaneously. Git stores and manages files using a Merkle DAG.
This allows users to independently duplicate and edit multiple versions of a file, store those versions, and later merge their changes back into the original. IPFS uses a similar model for its data objects: as long as the objects corresponding to the original data and any new versions are reachable, the entire file history can be retrieved. And because data blocks are stored locally across the network and can be cached indefinitely, IPFS objects can be stored permanently. IPFS also does not depend on the Internet protocol suite for transport.
Data can be distributed over overlay networks built on top of other networks. These properties are noteworthy because they are key ingredients of a censorship-resistant web. This can be a useful tool for protecting freedom of speech against the spread of Internet censorship around the world, but we must also be aware of its potential for abuse by bad actors.
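The versioning model described above can be sketched as hash-linked objects: each version's ID is the hash of its content plus a link to the previous version, so the full history can be walked from the newest ID alone. The object fields and in-memory store here are invented for illustration; real IPFS objects use a richer format:

```python
import hashlib
import json

store = {}  # object ID -> object; a stand-in for blocks held across the network

def put(content, prev):
    """Store a version object; its ID is the hash of its entire content."""
    obj = {"content": content, "prev": prev}
    oid = hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    store[oid] = obj
    return oid

v1 = put("first draft", None)
v2 = put("revised draft", v1)   # links back to v1 by its hash

def history(oid):
    """Walk the prev links from the newest ID back to the original."""
    out = []
    while oid is not None:
        out.append(store[oid]["content"])
        oid = store[oid]["prev"]
    return out

print(history(v2))  # -> ['revised draft', 'first draft']
```

Because each ID commits to the previous version's hash, no earlier version can be silently altered without changing every ID after it, which is the same property Git relies on.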
Self Certifying File System (SFS)
The last important component of IPFS we will look at is the Self-Certifying File System (SFS), a distributed file system that requires no special permissions for data exchange. It is "self-certifying" because the data served to a client is authenticated by the file name itself (which is signed by the server). As a result, remote content can be accessed securely with the transparency of local storage.
IPFS builds on this concept in the InterPlanetary Name System (IPNS), an SFS that uses public-key cryptography to self-certify objects published by users of the network. We mentioned earlier that all objects in IPFS can be uniquely identified; the same is true of hosts.
Each node on the network has a public key, a private key, and a node ID, which is the hash of its public key. Nodes can therefore use their private keys to sign any data objects they publish, and the authenticity of that data can be verified with the sender's public key.
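The self-certifying part of this scheme can be shown in miniature: because a node's ID is the hash of its public key, anyone holding the key can check the ID with no outside authority. The key bytes below are random placeholders rather than a real keypair, and the signature step itself is out of scope here since it needs an asymmetric-crypto library:

```python
import hashlib
import os

# Placeholder bytes stand in for a real public key.
public_key = os.urandom(32)
node_id = hashlib.sha256(public_key).hexdigest()

def key_matches_id(key, claimed_id):
    """Anyone can re-hash the key and compare; no outside authority needed."""
    return hashlib.sha256(key).hexdigest() == claimed_id

assert key_matches_id(public_key, node_id)          # the genuine key checks out
assert not key_matches_id(os.urandom(32), node_id)  # an impostor's key fails
```

Once the key is known to belong to the claimed node ID, it can then be used to verify signatures on anything that node publishes.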
Why It Matters
IPFS offers high throughput, low latency, data distribution, decentralization, and security. It can be used to deliver website content, to provide global file storage with automatic versioning and backup, and to enable secure file sharing and encrypted communication.
It is also used as a complementary file system for public blockchains and other p2p applications. Storing a kilobyte of data in an Ethereum smart contract can currently cost several dollars, a major hurdle at a time of massive growth in new decentralized applications (DApps). IPFS is compatible with smart contracts and blockchain data, so it can add reliable and affordable storage capacity to the Ethereum ecosystem. A separate protocol known as IPLD (InterPlanetary Linked Data) aims to make Ethereum blockchain data natively addressable from IPFS.
Despite IPFS's impressive capabilities, some problems have not yet been fully solved. First, IPNS content addressing is currently not very user friendly: a typical IPNS link is a long string of hash characters rather than a readable name.
Such links can be mapped to simpler names through the Domain Name System (DNS), but this introduces an external point of failure for content distribution, although the content remains reachable through the original IPNS address. Some users also report that IPNS name resolution can be slow, with delays of up to several seconds.
IPFS also offers few incentives for nodes to keep data available long term. Nodes may clear cached data to save space, which means that, in theory, files can eventually "disappear" if no node storing them remains online. At current usage levels this is not a significant problem, but in the long run, keeping large amounts of data backed up will require strong economic incentives.
IPFS is a very ambitious undertaking. Using it is fascinating, and understanding the technical machinery that makes it possible is even more so. If it succeeds, IPFS and its companion protocols could provide fault-tolerant infrastructure for the next generation of the Internet: a network that, as it should be by definition, is truly widespread, secure, and transparent.