How IPFS is broken

I once fell for this talk about “content-addressing”. It sounds very nice. You know a certain file exists, you know there are probably people who have it, but you don’t know where or if it is hosted on a domain somewhere. With content-addressing you can just say “start” and the download will start. You don’t have to care.

Other magic properties that address common frustrations: webpages don’t go offline, links don’t break, valuable content always finds its way, other people will distribute your website for you, any content can be transmitted easily to people near you without anyone having to rely on third-party centralized servers.

But you know what? Saying a thing is good doesn’t automatically make it possible and working. For example: saying stuff is addressed by its content doesn’t change the fact that the internet is “location-addressed”, and you still have to know where the peers that have the data you want are and connect to them.
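To make the distinction concrete, here is a minimal sketch, not how IPFS actually does it. It assumes a plain SHA-256 hex digest as the address (real IPFS identifiers wrap the digest in extra encoding), and the `providers` index and the example address are hypothetical. The point is only that the hash tells you what you want and how to verify it, while something else still has to tell you where it is.

```python
import hashlib

# Hypothetical index: content address -> network locations that hold the data.
providers: dict[str, list[str]] = {}

def content_address(data: bytes) -> str:
    # Assumption: plain SHA-256 hex digest stands in for a real content ID.
    return hashlib.sha256(data).hexdigest()

def announce(data: bytes, location: str) -> str:
    # A peer tells the index "I have this content, reach me at this location".
    addr = content_address(data)
    providers.setdefault(addr, []).append(location)
    return addr

def locate(addr: str) -> list[str]:
    # Knowing the hash is not enough: someone must answer "who has this?".
    return providers.get(addr, [])

def verify(addr: str, data: bytes) -> bool:
    # What the hash does give you: a check that the bytes are the right ones.
    return content_address(data) == addr

addr = announce(b"hello", "192.0.2.10:4001")
```

Here `locate(addr)` returns the one location that announced the content, and a hash nobody announced resolves to an empty list, no matter how valid the hash is.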

And what is the solution for that? A DHT!

DHT?

Turns out DHTs have a terrible incentive structure (as you would expect, no one wants to hold and serve data they don’t care about to others for free), and the IPFS experience proves it doesn’t work even in a network as small as the IPFS of today.
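The incentive problem is easier to see in a sketch of how a Kademlia-style DHT (the design the IPFS DHT is based on) decides who is responsible for a key. This is a simplification under stated assumptions: node names are hypothetical, IDs are derived by hashing the name, and there is no routing or networking at all. In the real network what lands on those nodes is mostly records about who provides the content rather than the content itself, but the selection rule is the same: closeness of IDs, not interest.

```python
import hashlib

def to_id(name: str) -> int:
    # Assumption: IDs and keys live in the same 256-bit space (SHA-256 here).
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # Kademlia's distance metric is the bitwise XOR of the two IDs.
    return a ^ b

def closest_nodes(key: int, nodes: list[str], k: int = 2) -> list[str]:
    # The k nodes whose IDs are nearest to the key are the ones expected
    # to store and serve the record, whether or not they care about it.
    return sorted(nodes, key=lambda n: xor_distance(to_id(n), key))[:k]

nodes = ["alice", "bob", "carol", "dave"]  # hypothetical peers
key = to_id("some-content-nobody-here-asked-for")
responsible = closest_nodes(key, nodes)
```

Whoever ends up in `responsible` got there by an accident of hashing, and is expected to answer queries about that key for free.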

If you have run an IPFS client you’ll have noticed how much it clogs your computer. Or maybe you haven’t, if you are very rich and have a really powerful computer, but still, it’s not something suitable to run all over the world, on web pages, servers and mobile devices. I imagine there may be a lot of unoptimized code and technical debt responsible for these and other problems, but the DHT is certainly the biggest part of it. IPFS can open up to 1000 connections by default and suck up all your bandwidth, and that’s just for exchanging keys with other DHT peers.

Even if you’re in “client” mode and limit your connections, you’ll still get overwhelmed by connections that do stuff I don’t understand. And it makes no sense to run an IPFS node as a client: that defeats the entire purpose of making every person host the files they have, and of content-addressability in general; it centralizes the network and brings back the client/server dichotomy that IPFS was created to replace.

Connections?

So, DHTs are a fatal flaw for a network that plans to be big and interplanetary. But that’s not the only problem.

Finding content on IPFS is the slowest experience ever, and for some reason I don’t understand, downloading is even slower. Even if you are on the same LAN as another machine that has the content you need, it will still take hours to download a small file you could transfer in seconds with scp. And that’s assuming IPFS managed to find the other machine; otherwise your command will just be stuck for days.

Now even if you ignore that IPFS objects should be content-addressable and not location-addressable and, knowing which peer has the content you want, you go there and explicitly tell IPFS to connect to the peer directly, maybe you can get some seconds of (slow) download, but then IPFS will drop the connection and the download will stop. Sometimes – but not always – it helps to add the peer address to your bootstrap nodes list (but notice this isn’t something you should be doing at all).

IPFS Apps?

Now consider the kind of marketing IPFS does: it tells people to build “apps” on IPFS. It sponsors “databases” on top of IPFS. It basically advertises itself as a place where developers can just connect their apps to and all users will automatically be connected to each other, data will be saved somewhere between them all and immediately available, everything will work in a peer-to-peer manner.

Except it doesn’t work that way at all. “libp2p”, the IPFS library for connecting people, is broken and is rewritten every 6 months, but they keep their beautiful landing pages that say everything works magically and you can just plug it in. I’m not saying they should have everything perfect, but at least they should be honest about what they truly have in place.

It’s impossible to connect to other people. After years there’s still no js-ipfs and go-ipfs interoperability (and yet they advertise there will be python-ipfs, haskell-ipfs, whoknowswhat-ipfs), connections get dropped, and there are many other problems.

So basically all IPFS “apps” out there are just apps that want to connect two peers but can’t do it manually because browsers and the IPv4/NAT network don’t provide easy ways to do it and WebRTC is hard and requires servers. They have nothing to do with “content-addressing” anything, they are not trying to build “a forest of merkle trees” nor to distribute or archive content so it can be accessed by all. I don’t understand why IPFS has changed its core message to this “full-stack p2p network” thing instead of the basic content-addressable idea.

IPNS?

And what about the database stuff? How can you “content-address” a database with values that are supposed to change? Their approach is to just save all values, past and present, and then use new DHT entries to communicate which is the newest value. This is the IPNS thing.
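Roughly, the idea looks like the sketch below. It is only an illustration under assumptions: two in-memory dictionaries stand in for the network, the name `my-db` is hypothetical, and the signatures and expiry that real name records carry are left out. Every version stays around under its own hash, and a separate mutable record says which hash is current.

```python
import hashlib

blobs: dict[str, bytes] = {}            # immutable: address -> content
names: dict[str, tuple[int, str]] = {}  # mutable: name -> (sequence, address)

def put(data: bytes) -> str:
    # Content-addressed storage: old values are never overwritten.
    addr = hashlib.sha256(data).hexdigest()
    blobs[addr] = data
    return addr

def publish(name: str, data: bytes) -> str:
    # Store the new value, then bump the sequence number on the name record.
    addr = put(data)
    seq = names.get(name, (0, ""))[0] + 1
    names[name] = (seq, addr)
    return addr

def resolve(name: str) -> bytes:
    # Two lookups: first the name record, then the content it points at.
    _, addr = names[name]
    return blobs[addr]

v1 = publish("my-db", b"balance=1")
v2 = publish("my-db", b"balance=2")
```

After the second `publish`, `resolve("my-db")` gives the new value while the old one is still sitting in `blobs` under `v1`. Note that the name lookup is a location-style lookup: nothing about the name itself tells you what the content is.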

Apparently, just after coming up with the idea of content-addressability, IPFS folks realized this would never be able to replace the normal internet, as no one would even know what kinds of content existed or when some content was updated. And they didn’t want to coexist with the normal internet, they wanted to replace it all, because this message is bolder and gets more funding, maybe?

So they invented IPNS, the name system that introduces location-addressability back into the system that was supposed to be only content-addressable.

And how do they manage to do it? Again, DHTs. And does it work? Not really. It’s limited and slow, much slower than normal content-addressing fetches, and most of the time it doesn’t even work after hours. Still, although developers will tell you it is not working yet, the IPFS marketing talks about it as if it were a real thing.

Archiving content?

The main use case I had for IPFS was to store content that I personally cared about and that other people might care about too, like old articles from dead websites, videos, and sometimes entire websites before they’re taken down.

So I did that. Over many months I’ve archived stuff on IPFS. The IPFS API and CLI don’t make it easy to track where stuff is. The pin command doesn’t help, as it just throws your pinned hash into a sea of hashes and subhashes and you’re never able to find what you have pinned again.

The IPFS daemon has a fake filesystem that is half-baked in functionality but allows you to locally address things by names in a tree structure. Very hard to update or add new things to it, but still doable. It allows you to give names to hashes, basically. I even began to write a wrapper for it, but suddenly after many weeks of careful content curation and distribution all my entries in the fake filesystem were gone.

Despite not having lost any of the files I did lose everything, as I couldn’t find them in the sea of hashes I had on my own computer. After some digging and help from IPFS developers I managed to recover a part of it, but it involved hacks. My things vanished because of a bug in the fake filesystem. The bug was fixed, but soon after I experienced a similar (new) bug. After that I even tried to build a service for hash archival and discovery, but as all the problems listed above began to pile up I eventually gave up. There were also problems with content canonicalization, with the code the IPFS daemon uses to serve default HTML content over HTTP, with the IPFS browser extension, and others.

Future-proof?

One of the core advertised features of IPFS was that it made content future-proof. I’m not sure they used this expression, but basically you have content, you hash that, you get an address that never expires for that content, now everybody can refer to the same thing by the same name. Actually, it’s better: content is split and hashed in a merkle-tree, so there’s fine-grained deduplication, people can store only chunks of files and when a file is to be downloaded lots of people can serve it at the same time, like torrents.
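As a rough illustration of the chunking and deduplication idea (not the actual IPFS format), the sketch below splits data into fixed-size chunks, stores each chunk under its SHA-256 hash, and derives a root hash from the list of chunk hashes. The flat two-level tree, the tiny 4-byte chunk size and the function names are all assumptions made for the example.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def chunk(data: bytes, size: int) -> list[bytes]:
    # Fixed-size splitting; the last chunk may be shorter.
    return [data[i:i + size] for i in range(0, len(data), size)]

def add_file(store: dict[str, bytes], data: bytes, size: int = 4) -> tuple[str, list[str]]:
    # Store every chunk under its own hash; identical chunks collapse into one entry.
    leaves = []
    for piece in chunk(data, size):
        addr = h(piece)
        store[addr] = piece
        leaves.append(addr)
    # The root names the whole file by hashing the ordered list of chunk hashes.
    root = h("".join(leaves).encode())
    return root, leaves

store: dict[str, bytes] = {}
root1, leaves1 = add_file(store, b"AAAABBBBCCCC")
root2, leaves2 = add_file(store, b"AAAABBBBDDDD")
```

The two files have different roots but share their first two chunks, so `store` ends up holding four chunks instead of six, and any peer holding one of those chunks could serve it for either file.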

But then come the protocol upgrades. IPFS has used different kinds of hashing algorithms, different ways to format the hashes, and will change the default algorithm for building the merkle-trees, so basically the same content now has a gigantic number of possible names/addresses, which defeats the entire purpose, and yes, files hashed using different strategies aren’t automagically compatible.
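A small sketch shows why this multiplies names. It is not IPFS’s real tree-building code: the flat tree, the chunk sizes and the `root_hash` helper are assumptions for illustration. The same bytes are hashed three times, changing only the chunk size or the hash function.

```python
import hashlib

def root_hash(data: bytes, chunk_size: int, algo: str = "sha256") -> str:
    # Hash each fixed-size chunk, then hash the ordered list of chunk hashes.
    leaves = [
        hashlib.new(algo, data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.new(algo, "".join(leaves).encode()).hexdigest()

data = b"the same bytes, hashed three different ways" * 10

a = root_hash(data, chunk_size=64)
b = root_hash(data, chunk_size=128)                # different chunking
c = root_hash(data, chunk_size=64, algo="sha512")  # different hash function
```

`a`, `b` and `c` are three unrelated addresses for identical content, and nothing in any one of them lets you derive the others without having the data and knowing the parameters.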

Actually, the merkle algorithm could have been changed by each person on a file-by-file basis since the beginning (you could for example split a book file by chapter or page instead of by chunks of bytes), although probably no one ever did that. I know it’s not easy to come up with the perfect hashing strategy on the first go, but the way these matters are being approached makes me wonder whether IPFS promoters are really worried about future-proofing, or maybe we’re just in Beta phase forever.

Ethereum?

This is also a big problem. IPFS is built by Ethereum enthusiasts. I can’t read the minds of the people behind IPFS, but I would imagine that, like the Ethereum people, they have a poor understanding of incentives, and they tend towards scammer-like behavior: getting a ton of funds from investors in exchange for promises they don’t know they can fulfill (like Filecoin and IPFS itself) based on half-truths, changing stuff in the middle of the road because some top managers decided they wanted a change (move fast and break things), and squatting on fancy names like “distributed web”.

The way they market IPFS (which is not the main thing IPFS was initially designed to do) as a “peer-to-peer cloud” is very seductive for Ethereum developers, just like Ethereum itself is: a place somewhere that will run your code for you so you don’t have to host a server or have any responsibility, and then Infura will serve the content to everybody. In the same vein, Infura is also hosting and serving IPFS content for Ethereum developers these days for free. Ironically, just like the Ethereum hoax of peer-to-peer money, the IPFS peer-to-peer network may begin to work better for end users as things get more and more centralized.