What is nostrdb?

I thought it might be interesting to do a quick technical writeup of nostrdb and the new nostrdb profile searching. This is the first text search index within nostrdb. If you're interested in the nitty-gritty details, this article is for you!

For the most part, nostrdb has been a copy of the design of strfry, in terms of its indices and multi-process architecture. strfry doesn't support search (yet), so this is a novel feature of nostrdb. What is strfry? If it weren't for strfry, nostr wouldn't work. It is simply the best way to implement a nostr database and relay. SQL servers are just too slow to serve dynamic queries at scale.

I wanted something like strfry that I could embed into native nostr clients like Damus. This is how nostrdb was born!

First, let's look at nostrdb at a high level.

nostrdb

nostrdb is an embedded library for native nostr clients. It uses the Lightning Memory-Mapped Database (LMDB) for very efficient querying. This allows it to skip SQL query parsing and query planning. We don't really need all that with nostr. nostr search filters are a bit more restrictive than SQL, so we can build custom indices for the most common nostr queries. This is a huge win CPU-wise: nostr queries can be very dynamic, and skipping the query parsing and planning saves CPU and battery.

Before we get into any of that, let's look at how you would use the library from the highest level. The entrypoint for notes in nostrdb is the ndb_process_event function. When you receive nostr events from another relay, this function is called for each event.

Event processing

ndb_process_event('["EVENT","subid",{"kind":1, ...}]')

We begin by queueing the event for processing in the multi-threaded ingester. This allows it to return immediately and not block the client.
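To make the "queue it and return immediately" idea concrete, here is a minimal Python sketch. It is not nostrdb's code or API: the Ingester class, its method names, and the worker count are all made up for illustration, and the slow validation work is just a placeholder.

```python
import queue
import threading


class Ingester:
    """Hypothetical stand-in for a multi-threaded ingester (not nostrdb's API)."""

    def __init__(self, num_workers=4):
        self.inbox = queue.Queue()
        self.processed = []
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def process_event(self, raw_json):
        # Only enqueue here; the caller never waits on parsing or validation.
        self.inbox.put(raw_json)

    def _run(self):
        while True:
            raw = self.inbox.get()
            if raw is None:  # shutdown sentinel
                return
            # Placeholder for the slow part (JSON parsing, signature checks).
            with self._lock:
                self.processed.append(raw)

    def close(self):
        # One sentinel per worker, queued behind all pending events.
        for _ in self._workers:
            self.inbox.put(None)
        for worker in self._workers:
            worker.join()
```

The caller only ever touches process_event, so a slow signature check on one note never stalls the thread that is reading from the websocket or drawing the UI.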

The ingester is multi-threaded because validating note signatures can be pretty slow. We want note processing to be as quick as possible so nostrdb doesn't have a bottleneck here.

nostrdb is very smart about not burning CPU when it doesn't need to. The custom JSON parser pauses when it finds the note ID field, looks up that note in the database to see if we already have it, and stops parsing if we do. This saves CPU for large notes like contact lists, and skips the need to re-validate the signature as well. Even strfry doesn't make this optimization, so nostrdb has a speed, CPU, and battery life advantage here.
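Here is a rough Python sketch of that early-exit idea. It is only an approximation: nostrdb uses a custom streaming parser, while this sketch cheats with a regex to find the id before doing any real parsing. The ingest function, the seen_ids set (standing in for the database lookup), and the stats dict are all hypothetical names.

```python
import json
import re

# Matches a 64-character lowercase hex "id" field. A real parser tracks JSON
# structure; a regex like this could be fooled by similar text inside "content".
ID_FIELD = re.compile(r'"id"\s*:\s*"([0-9a-f]{64})"')


def ingest(raw_json, seen_ids, stats):
    """Return the parsed event, or None if we already have this note."""
    match = ID_FIELD.search(raw_json)
    if match and match.group(1) in seen_ids:
        # Already stored: skip the full parse and the signature check.
        stats["skipped"] += 1
        return None

    event = json.loads(raw_json)[2]  # ["EVENT", subid, {...}]
    # Signature validation would happen here (omitted in this sketch).
    seen_ids.add(event["id"])
    stats["stored"] += 1
    return event
```

The win grows with note size: for a contact list with thousands of tags, a duplicate costs one cheap scan and one lookup instead of a full parse plus a signature verification.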

During the processing step, nostrdb will detect different note types such as profile metadata. It will look at the name and display_name fields and add a custom index for searching user profiles. Since keys in LMDB are lexicographically sorted and support range queries, our profile search index key is simply (name + pubkey + created_at). This allows you to do a ranged key lookup on "jb", which positions the db cursor at the first record that starts with jb. LMDB uses a B+ tree, so this is not a linear scan and it is very fast! Eventually this index will be used for implementing NIP-50 search on the nostrdb relay interface, which is coming sometime in the future.
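The ranged lookup can be sketched in Python with a sorted list and binary search standing in for LMDB's B+ tree. The exact byte layout below (a zero-byte separator after the name, a big-endian timestamp) is my assumption for the example, not necessarily how nostrdb encodes its keys, and ProfileIndex is a made-up class.

```python
import bisect
import struct


def profile_key(name, pubkey_hex, created_at):
    # Assumed layout: name bytes, a zero separator, 32-byte pubkey, u64 timestamp.
    return (name.encode() + b"\x00" + bytes.fromhex(pubkey_hex)
            + struct.pack(">Q", created_at))


class ProfileIndex:
    """Sorted byte keys; a stand-in for an LMDB table with a cursor."""

    def __init__(self):
        self.keys = []

    def add(self, name, pubkey_hex, created_at):
        bisect.insort(self.keys, profile_key(name, pubkey_hex, created_at))

    def search(self, prefix, limit=10):
        needle = prefix.encode()
        # Binary search to the first key >= prefix (like a range seek).
        i = bisect.bisect_left(self.keys, needle)
        names = []
        # Walk forward while keys still share the prefix.
        while (i < len(self.keys) and len(names) < limit
               and self.keys[i].startswith(needle)):
            names.append(self.keys[i].split(b"\x00", 1)[0].decode())
            i += 1
        return names
```

With profiles named alice, jack, jb55, and jbloggs in the index, search("jb") seeks straight to jb55 and stops as soon as a key no longer starts with jb, so the cost depends on the number of matches rather than the number of profiles.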

Once we're done writing the indices and validating the note, we store nostr notes in a custom, compact binary format that is easy to access directly in memory from any programming language. We also do this with profile records by leveraging flatbuffers.
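To show what "easy to access directly in memory" means, here is a small Python sketch that reads fields straight out of a packed buffer at fixed offsets. The layout is invented for this example; it is neither nostrdb's actual note format nor a flatbuffer, and pack_note and NoteView are hypothetical names.

```python
import struct

# Made-up layout: 32-byte id, 32-byte pubkey, u64 created_at, u32 kind,
# u32 content length, then the content bytes. Little-endian, no padding.
HEADER = struct.Struct("<32s32sQII")


def pack_note(note_id, pubkey, created_at, kind, content):
    body = content.encode()
    return HEADER.pack(note_id, pubkey, created_at, kind, len(body)) + body


class NoteView:
    """Reads fields on demand from a buffer without a deserialization pass."""

    def __init__(self, buf):
        self.buf = memoryview(buf)

    @property
    def created_at(self):
        return struct.unpack_from("<Q", self.buf, 64)[0]

    @property
    def kind(self):
        return struct.unpack_from("<I", self.buf, 72)[0]

    @property
    def content(self):
        (length,) = struct.unpack_from("<I", self.buf, 76)
        start = HEADER.size
        return bytes(self.buf[start:start + length]).decode()
```

Each property only touches the bytes it needs. If the buffer were a memory-mapped database page instead of a Python bytes object, the same reads would come straight out of the page cache.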

So in the end, what does this achieve? It enables you to store as much nostr data as you want with near zero query and serialization overhead. Since the data stored in nostrdb is just flatbuffers, you can access data directly from the operating system's page cache (just a pointer to memory) and from any programming language via the flatbuffers schema and codegen tools. It's so fast that it is all but guaranteed to be one of the fastest things in your codebase. You can even run it in your main UI thread so you can worry about other things such as UX and design without introducing complex async logic in your user interface.

What's next

nostrdb is already partially integrated into Damus iOS. Damus uses nostrdb's compact note format for storing notes in memory, but eventually everything will be switched to use nostrdb directly. The next version of Damus testflight will remove the in-memory and core-data profile cache and switch to nostrdb profiles. Damus currently has a complex in-memory trie data structure for profile searches, but it only knows about profiles it has seen during the current session. This is a common source of confusion: sometimes Damus doesn't auto-complete profiles it has already seen in the past. nostrdb profile searches will allow @ mentions and user search to find every profile it has ever seen in realtime. This will be a huge usability win for Damus and other clients looking to adopt nostrdb.

Right now nostrdb exposes its functionality via direct function calls: ndb_get_profile, etc. The plan is that most of these calls won't be necessary. Once nostrdb has nostr filter parsing, it will be able to support dynamic queries of the kind you would expect from your typical nostr relay. This will turn nostrdb-powered clients into relays themselves.

Once nostrdb is more relay-like, then we will be able to leverage strfry's negentropy set-reconciliation queries to only fetch notes that we don't already have. This will be insanely useful for reducing bandwidth usage when querying strfry relays. Eventually it may make the most sense to just let nostrdb do all the websocket querying behind the scenes, becoming a kind of local multiplexing relay.

The future of nostrdb is very exciting. I plan on using it in Damus NoteDeck and Damus Android. Why duplicate all this work in every client? nostrdb will make developing native clients much easier.

Support

Damus and nostrdb are mainly supported by donations. nostrdb is open source and MIT-licensed. Damus is GPL. We are dedicated to building the best and most free open source tech on nostr. If you would like to support our work, please consider buying our merch!

Thanks for getting this far! Until next time...