
“We’re almost there in finding a truly clear use case for Bitcoin,” jokes a commentator as he points to a fresh application of Bitcoin’s proof of work (PoW). And no, it’s not about stashing your digital gold – this time it’s about something practical: warding off aggressive bots that scrape the internet.
And it’s not a small problem. One FOSS project reported that a single crawler raked in a whopping 73 TB of zipped HTML files in May 2024 – 10 TB of it in a single day. The bill? Over $5,000 in bandwidth costs. In the past, one IP scoured 100 pages; now 100 IPs each grab one page. The good news? Those bots often come from Big Tech clouds like Amazon or Google. You could block them, but those giants have IPs like grains of sand. Anubis, with its open-source proof-of-work approach, seems a smarter answer: it’s fast, free, and forces bots into a resource battle they’d rather avoid. The project explains it as follows:
“Anubis uses a multi-threaded proof of work check to ensure that users’ browsers are up-to-date and support modern standards.”
Sounds fancy, but a clever coder can probably bypass it. That’s not the point, though – the goal is to curb the resource hunger of those AI crawlers. The downside? It also blocks search-engine crawlers, so your site won’t get indexed. But hey, who still relies on Google Search? “It’s only good for your grocery list,” someone sneers. The search giant seems lost in its own bureaucracy – with revenue the size of a small country’s GDP, that might not be so strange.
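Anubis’s actual check runs as JavaScript in the visitor’s browser; the sketch below only shows the general hashcash-style idea behind it in Python. The function names, the challenge string, and the difficulty value are all my own assumptions for illustration – the asymmetry is the point: the client burns thousands of hashes to find a nonce, while the server verifies it with a single hash.

```python
import hashlib
import itertools

def solve_challenge(challenge: str, difficulty: int) -> int:
    """Find a nonce so that SHA-256(challenge + nonce) starts with
    `difficulty` hex zeros. This is the expensive part the bot must pay."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The server checks with one hash what the client paid many to find."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# A visitor solves the challenge once per session; a scraper hitting
# thousands of pages pays this cost over and over.
nonce = solve_challenge("example-challenge", difficulty=4)
assert verify("example-challenge", nonce, 4)
```

Raising the difficulty by one hex digit makes the client’s job roughly sixteen times more expensive while the server’s verification cost stays constant – that is the resource battle the bots would rather avoid.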
Half jokingly, someone suggested a return to the ’90s: pages full of blue links to index the web. But AI models seem to be the modern way of organizing information. They just lag on fresh data and lean on search engines for the current stuff. That puts OpenAI and Anthropic up against Google in a battle for the search crown. Or maybe an open-source challenger will emerge? Imagine you publish a new page – who do you tell? For people there’s Twitter; for bots, Google. But why always Google? “We’ve grown up with it so much that we don’t even think about alternatives,” says an insider.
What if there were a decentralized server where publishers ping their new pages? No need for Google, and the data is open to everyone. It targets server administrators rather than consumers, but it’s exactly those administrators who have been overlooked for too long. You don’t need a billion-dollar data center to run a script that checks the latest URLs. Publishers usually want you to find their stuff – that’s the difference with the ‘follow-the-link’ scrapers.
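The ping idea above can be sketched as a tiny in-memory registry: publishers announce new URLs, and any crawler polls for what appeared since its last visit, instead of blindly re-downloading terabytes. Everything here – the class name, the methods, the polling-by-timestamp design – is hypothetical, not an existing protocol.

```python
import time
from collections import deque

class PingRegistry:
    """Minimal sketch of a 'publisher ping' server: publishers announce
    fresh URLs, crawlers poll for anything newer than their last check."""

    def __init__(self, max_entries: int = 100_000):
        # Bounded deque: old announcements fall off, keeping memory flat.
        self._entries = deque(maxlen=max_entries)  # (timestamp, url) pairs

    def announce(self, url: str) -> None:
        """Called by a publisher right after it puts a new page online."""
        self._entries.append((time.time(), url))

    def since(self, timestamp: float) -> list[str]:
        """Called by a crawler: everything announced after `timestamp`."""
        return [url for t, url in self._entries if t > timestamp]

registry = PingRegistry()
registry.announce("https://example.org/new-post")
print(registry.since(0.0))  # → ['https://example.org/new-post']
```

A crawler only stores the timestamp of its last poll and fetches the delta – exactly the kind of script a server administrator could run without a billion-dollar data center.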
Perhaps we should prohibit less and organize more. For up-to-date info, smart protocols are the answer; with heavy model crawlers, it’s about property rights. “Nobody would complain if we got something back for that crawling,” it is said. Right now it feels like they replace you without even a click. Yet that’s shortsighted: these bots don’t create new worlds, they archive what we make. And with open-source models like Llama or DeepSeek on your laptop, the power is back with us.
For the end user, they are info organizers on steroids – but always check the source, because they sometimes talk nonsense. They don’t replace us; they make us more efficient. It’s a net gain, provided you go along with the shift. People create, bots collect. Those bots just need to stop wasting terabytes along the way. The big players should compensate for that, but the real fix? A collective reinvention of how pages talk to each other – an open-source publisher-server protocol. Until then? Let them sweat on proof of work. The crypto world shows the way!