Written by Jonathan Johnson. Published 2022-02-26.
What is BonsaiDb?
BonsaiDb is a new database aiming to be the most developer-friendly Rust database. BonsaiDb has a unique feature set geared at solving many common data problems. We have a page dedicated to answering the question: What is BonsaiDb?
It's the last weekend of the month, and I want to make it a habit at the end of each month to reflect on the previous month as well as talk about what I'm currently focused on or what is coming next.
I released our initial alpha on February 5. I am ecstatic about the warm reception. Our Discord server has grown with several active users who have been putting BonsaiDb through its paces. Several bugs have been discovered and fixed across the releases that shipped in February.
In addition to these releases, the underlying storage layer (Nebari) has been updated as well with several new features, bug fixes, and performance improvements. The primary performance improvements that have been released have centered around better usage of Rust's type system to avoid extra allocations.
@ModProg noticed that I had open issues for adding derive macros for some common traits. He became the second contributor to the BonsaiDb repository when these macros were added in v0.1.1.
I have an important design philosophy with BonsaiDb: Everything should be as ergonomic as possible without macros. Macros should be used to improve an already-well-designed API. I was comfortable writing schemas by hand, but I recognized how often I copied and pasted boilerplate code and then tweaked small portions -- a good sign that a macro would help!
These macros are built on a new crate @ModProg wrote: attribute-derive. I've made a few updates to the macros since his pull request, and it's been a wonderful experience. If you are looking into writing an attribute proc macro, you might want to consider using this crate to simplify parsing and support more Rust-y syntax.
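To give a rough idea of what the macros enable, here is the kind of collection definition they make possible. This is an illustrative sketch: the attribute syntax follows the derive macros as I understand them, and the `Author` fields shown are made up rather than taken from any code in this post.

```rust
use bonsaidb::core::schema::Collection;
use serde::{Deserialize, Serialize};

// The derive macro generates the `Collection` implementation that would
// otherwise be written by hand. The fields below are illustrative only.
#[derive(Debug, Serialize, Deserialize, Collection)]
#[collection(name = "authors")]
struct Author {
    key: String,
    name: Option<String>,
}
```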
I've tried to be very responsive to all questions, feedback, and issues. It's fun seeing BonsaiDb evaluated in new ways. As I recalled in the LZ4 Compression feature introduction, I introduced that feature the day after a potential user ran into space issues when evaluating BonsaiDb against their current Sled code.
One morning, @ModProg tried running BonsaiDb on a Raspberry Pi, which led to us diagnosing a bug/edge case that has since been fixed in the sysinfo crate. It was a fun morning debugging BonsaiDb, an async codebase, over a Discord screen share showing an SSH session into the Raspberry Pi running GDB!
A few days ago, the same user who was testing a somewhat large dataset reported an issue. It turned out to be a long-standing bug in Nebari's transaction log reading code. The unit tests were inadequate, and the only reason it was noticed was that LZ4 decompression failed once a transaction log entry was large enough to be stored compressed across multiple pages. The associated data isn't currently used outside of the `list_executed_transactions()` function, but it is eventually planned to be a core part of replication. A `cargo update` is all that is needed for BonsaiDb to adopt the updated version of Nebari.
Despite the bug that had just been found, I still felt fairly confident in BonsaiDb's stability. Yet it showed that I hadn't really tested a big dataset. With a user already trying BonsaiDb against a large dataset, I felt I should do my own testing to be as confident in BonsaiDb's abilities as I can be.
After looking through a few locations, I stumbled upon the Open Library Data Dumps. The data set I'm going to talk about in this post contains these entities:
Collection | Records
---|---
Authors | 9,145,605
Editions | 33,102,390
Works | 24,010,896
The uncompressed file this data is loaded from is 51,447 MB. It contains additional data beyond the types listed, so direct size comparisons can't be done post-import.
Writing an importer was a fun project. Since the data being imported is too large to read into RAM before processing, I started with a streaming TSV parser that uses flume to send parsed records to another task. This uses a bounded channel which will park the TSV parser thread if it outpaces the code that saves the records to the disk.
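As a minimal sketch of that setup -- not the actual importer code -- the producer half might look like this. The `RawRecord` type, the channel capacity, and the assumption that the record key is the second tab-separated column and the JSON document is the last one are all illustrative:

```rust
use std::{fs::File, io::{BufRead, BufReader}, thread};

/// One line from the dump. The real importer deserializes the JSON column
/// into typed records; a raw pair keeps this sketch short.
struct RawRecord {
    key: String,
    json: String,
}

/// Spawns a thread that streams the TSV dump and sends each record over a
/// bounded flume channel. The capacity (10,000) is an arbitrary choice.
fn spawn_tsv_reader(path: &str) -> anyhow::Result<flume::Receiver<RawRecord>> {
    let file = BufReader::new(File::open(path)?);
    // Bounded channel = backpressure: `send` blocks (parking this thread)
    // whenever the importer falls behind, so the ~50 GB file is never
    // buffered in memory.
    let (sender, receiver) = flume::bounded(10_000);
    thread::spawn(move || {
        for line in file.lines().flatten() {
            // Open Library dumps are tab-separated; this assumes the record
            // key is the second column and the JSON document is the last.
            let mut columns = line.split('\t');
            let key = columns.nth(1).unwrap_or_default().to_string();
            let json = columns.last().unwrap_or_default().to_string();
            if sender.send(RawRecord { key, json }).is_err() {
                // The receiving side hung up; stop parsing.
                break;
            }
        }
    });
    Ok(receiver)
}
```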
The data importing logic is fairly straightforward. I have one method that receives records from the TSV parser and creates batched transactions.
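The consuming half of the sketch reuses `RawRecord` and the channel from above. The batch size is arbitrary, and `apply_batch` is a hypothetical stand-in for the importer's method that deserializes the records and applies each batch as a single BonsaiDb transaction:

```rust
/// Receives parsed records and applies them in batches so that each batch
/// becomes one transaction instead of one transaction per record.
async fn import_records(
    receiver: flume::Receiver<RawRecord>,
    database: &Database,
) -> anyhow::Result<()> {
    const BATCH_SIZE: usize = 1_000; // arbitrary; tune for your workload

    let mut batch = Vec::with_capacity(BATCH_SIZE);
    while let Ok(record) = receiver.recv_async().await {
        batch.push(record);
        if batch.len() == BATCH_SIZE {
            let full = std::mem::replace(&mut batch, Vec::with_capacity(BATCH_SIZE));
            apply_batch(database, full).await?;
        }
    }
    // The parser thread finished and dropped its sender; flush the remainder.
    if !batch.is_empty() {
        apply_batch(database, batch).await?;
    }
    Ok(())
}

/// Hypothetical stand-in for the importer's insertion method: deserialize
/// each record's JSON and insert all of them within a single transaction.
async fn apply_batch(_database: &Database, _batch: Vec<RawRecord>) -> anyhow::Result<()> {
    todo!("deserialize the JSON and apply one BonsaiDb transaction per batch")
}
```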
Importing the data with LZ4 compression enabled took 60m56s, and with compression disabled it took 64m12s. The resulting file sizes are interesting:
Collection | LZ4 Uncompacted | LZ4 Compacted | Uncompacted | Compacted
---|---|---|---|---
Authors | 13,316 MB | 3,675 MB | 23,508 MB | 4,159 MB
Editions | 74,459 MB | 33,155 MB | 118,330 MB | 38,372 MB
Works | 40,443 MB | 12,082 MB | 71,601 MB | 13,565 MB
BonsaiDb uses Nebari, which uses an append-only file format for its database. As the internal trees are built, old nodes are left behind on disk. The only way to reclaim the disk space currently is to compact the database -- a process which rewrites all the active data to a fresh file and swaps the files atomically once fully synchronized. Compacting the compressed database took 6m55s, and compacting the uncompressed database took 7m20s.
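Compaction can also be triggered from code. The sketch below assumes the compaction entry point is exposed on the connection trait; the exact method name and trait may differ depending on the BonsaiDb version you're using:

```rust
use bonsaidb::core::connection::Connection;

/// Rewrite all active data into fresh files and atomically swap them in,
/// reclaiming the space the append-only format leaves behind.
/// Assumption: `compact()` is available on the connection trait; verify
/// against the BonsaiDb release you're using.
async fn reclaim_space(database: &Database) -> anyhow::Result<()> {
    database.compact().await?;
    Ok(())
}
```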
After testing importing and overwriting the database several times, my confidence level in BonsaiDb grew. I built a simple CLI that allows querying authors, editions, and works by their unique ID. Everything was working great.
I wanted to add support for showing all the books (works) that an author wrote. In the data coming from Open Library, the `Work` type has a list of author roles. This is the `View` definition that enables this query:
```rust
#[derive(View, Debug, Clone)]
#[view(name = "by-author", collection = Work, key = String, value = u32)]
struct WorksByAuthor;

impl CollectionViewSchema for WorksByAuthor {
    type View = Self;

    fn map(
        &self,
        document: CollectionDocument<<Self::View as View>::Collection>,
    ) -> ViewMapResult<Self::View> {
        document
            .contents
            .authors
            .into_iter()
            .filter_map(|role| role.author)
            .map(|author| {
                // Emit one mapping per author, normalizing shortened
                // `/a/` keys to the full `/authors/` form.
                document
                    .header
                    .emit_key_and_value(author.into_key().replace("/a/", "/authors/"), 1)
            })
            .collect()
    }
}
```
The data has some sanitization issues, including that keys are sometimes shortened from `/authors/UNIQUEID` to `/a/UNIQUEID`. Because importing the data takes so long, I chose to clean up the data post-import rather than force myself to re-run the import with some code to clean it up while importing.
The entire function to display an author in the CLI is:
```rust
async fn summarize(&self, database: &Database) -> anyhow::Result<()> {
    if let Some(name) = &self.name {
        println!("Name: {}", name);
    }
    if let Some(bio) = &self.bio {
        println!("Biography:\n{}", bio.value())
    }

    // Query the WorksByAuthor view, retrieving the matching Work documents
    // alongside the view mappings.
    let works = database
        .view::<WorksByAuthor>()
        .with_key(self.key.clone())
        .query_with_collection_docs()
        .await?;
    if !works.is_empty() {
        println!("Works:");
        for work in works.documents.values() {
            if let Some(title) = &work.contents.title {
                println!("{}: {}", work.contents.key, title)
            }
        }
    }
    Ok(())
}
```
While this was easy to implement, I knew that this was going to be a painful process to run. BonsaiDb's views are lazily evaluated (with the exception of unique views). The first time a view is accessed, its `map()` function must be invoked for each document stored in the collection. For the Works collection, that means processing 24 million rows.

The initial query took over an hour. Subsequent queries are fast: the `open-library author OL1394865A` command completes in less than 400ms once the view is indexed.
I really wanted to be able to turn this example into a benchmarking suite, and the only way to do that without driving myself crazy was to start optimizing.
The initial implementation of the view mapper was aimed at being correct, not necessarily fast. It was originally designed to operate within Sled, which allowed non-transactional reads and writes. The thought was that I could simply parallelize some loops and magically the view mapping system would be faster because Sled would handle ensuring the data was written to disk correctly with multi-threaded writes.
After switching to Nebari, the view mapper always runs in a transactional context. Each view mapping operation created its own transaction -- the slowest way to write to the database. The biggest priority was refactoring the view mapper to batch operations into larger transactions.
My initial refactoring did just that and lowered the execution time significantly (~70%), but it also introduced a subtle bug: not all mappings were being saved. I didn't notice the issue until much later in the refactoring process.
One issue I ran into is that Nebari's `TransactionTree` required an exclusive borrow of the `ExecutingTransaction`. This prevented reading documents in one thread while writing view entries in another. I refactored Nebari's transaction handling code to allow this sort of parallelization.
After adopting the new functionality, here are the view mapping times on the compressed dataset:
Collection | Size | View | Unoptimized (LZ4) | Optimized (LZ4) | Optimized | Uncompacted | Compacted
---|---|---|---|---|---|---|---
Works | 12,082 MB | WorksByAuthor | 1h02m03s | 10m16s | 10m13s | 16,241 MB | 1,863 MB
Editions | 33,155 MB | EditionsByWork | 26m52s | 12m23s | 12m48s | 17,507 MB | 3,396 MB
While preparing this post, I re-ran the pre-optimization code and the optimized code to verify my notes on timings. When I initially started working on optimization, I hadn't yet implemented `EditionsByWork`. I've verified that the output of both the unoptimized and optimized versions matches a random set of entries that I compared against the Open Library website. To clarify some of the numbers: all sizes listed above are the compressed sizes; including every size made the table too large to be easily understood.
At this stage, I no longer have any low-hanging fruit from an algorithmic perspective. The next step in optimization would be to start profiling, but I've reached my initial goals, so I'm going to focus on wrapping up this pull request by finishing this example. I would enjoy turning this into a benchmark suite, but I'm not excited about how long such a suite would take to fully debug.
I'm going to continue prioritizing any stability or performance issues as they are reported. There have been a few repeated requests that I'm going to be working on soon: a "best practices" overview, internal architecture documentation, and a high-level roadmap.
Several people have expressed interest in learning more about BonsaiDb's internals and roadmap with a goal of potentially contributing. I will be working on those requests soon. In the meantime, I want to extend my offer to anyone reading this: if you have any questions about any part of BonsaiDb, don't hesitate to ask! Regardless of your experience with Rust, if you're excited at the idea of helping build BonsaiDb, I will be happy to help you get up to speed.
Aside from improved documentation and project planning, I was working on building a persistent job queue, which I will be returning to and hope to ship in March. I'm building this feature as a standalone crate that will allow users to create independent job queues with different job distribution strategies, and the long-term goal is for this crate to be the first "plugin" for BonsaiDb.
Our homepage has basic setup instructions and a list of examples. We have also started writing a user's guide, and we strive to keep the documentation thorough.
We would love to hear from you if you have questions or feedback. We have community Discourse forums and a Discord server, but we also welcome anyone to open an issue.
We dream big with BonsaiDb, and we believe that it can simplify writing and deploying complex, data-driven applications in Rust. We would love additional contributors who have similar passions and ambitions.
Lastly, if you build something with one of our libraries, we would love to hear about it. Nothing makes us happier than hearing people are building things with our crates!