Why Nostr? What is Njump?
2023-05-07 18:51:57
in reply to

mikedilger on Nostr: Very cool. Thanks for the explanations. On the separate mmap, I'm writing the ...

Very cool. Thanks for the explanations.

On the separate mmap, I'm writing the serialized OpaqueEvent first, then writing the indexes. On a crash we might have a useless event that didn't get indexed, but it wouldn't be corrupt. The writer locks a write-lock, appends, then updates the 'end' pointer at the beginning of the map, then releases the write lock.

Also, these events are opaque (customer-ready, pre-validated). I never access the fields.

I noticed strfry does an index scan of these flatbuffers (of stripped down events) when necessary, using the fields to filter. I'm instead trying to use multiple btree-like indexes:
id -> offset
(created_at, id) -> offset
(pubkey, created_at, id) -> offset
(event_kind, created_at, id) -> offset
(tagdata, created_at, id) -> offset
Each key has to be unique per event, hence the id on each. And the created_at is to scan a since/until range. It's a lot of data to store, but I think that is okay on a personal relay. I don't think it would cause more page faults and cache misses.

I'm finding the set of events that match the filter by doing a functional intersection over the results from these filters. I say functional because I don't collect each result into a set and then intersect them, I intersect as I go: first index: collect the offsets in a hashset; second index: check each in the previous hashset and if it exists, copy to the next hashset (thus functionally being a set intersection) otherwise drop it.

Maybe this is a dumb algorithm. Databases are more my brother's thing, but I'm stubbernly trying to reinvent them myself (I would never learn as well otherwise).

I think deleted events will just be marked as such in another hash table, and a periodic rewrite will filter them out. I haven't really worked that out and now that I'm thinking about it, it could be quite expensive.

A lot of these OS level features like sendfile() or pwritev() become very difficult to use when relying on an upstream library that doesn't expose it's API in a way to utilize those. If I use tungstenite for websockets, I don't think I can write into it's outbound buffers at a low level.

At this point I'm really still exploring. It will change.
Author Public Key
npub1acg6thl5psv62405rljzkj8spesceyfz2c32udakc2ak0dmvfeyse9p35c