
Marginal Gains: Major Impact

December 16, 2025 · 6 min read

Project Maintainer

In professional cycling, the concept of marginal gains became famous through Team Sky. Rather than chasing dramatic breakthroughs, they focused on making hundreds of small improvements: slightly better bike fit, marginally lighter components, improved sleep, cleaner nutrition. None of these changes mattered much on its own, but together they reshaped performance and helped the team dominate the sport for years.

Software systems, especially large distributed ones, work much the same way. Rarely does a single feature transform everything overnight. More often, real progress comes from careful attention to small details: shaving latency here, reducing contention there, simplifying a hot path, rethinking a data structure.

Stalwart v0.15 is very much a release in this spirit. It does not introduce a long list of headline features. Instead, it is the result of revisiting core subsystems—spam filtering, search, storage, and data access—and making many targeted improvements that, together, have a significant impact on performance, reliability, and usability.

Rethinking Spam Classification

Stalwart v0.14 and earlier included a spam classifier that was a direct port of the one used by Rspamd. This classifier is grounded in Bayesian theory and refines the naive approach in two ways: it extracts features using orthogonal sparse bigrams (OSB), and it combines probabilities using the inverse chi-square distribution. This approach is well understood and robust, particularly when training data is limited. It produces reasonable results quickly and has a long track record in production systems.

However, it comes with a significant cost in distributed environments. Both Rspamd and Stalwart v0.14 relied on OSB-5, which generates a very large number of features per message. Each of these features was stored in Redis. Even with aggressive caching, training or classifying a single message could involve hundreds or even thousands of round trips to Redis. At scale, this becomes a bottleneck: latency increases, throughput drops, and horizontal scaling becomes inefficient.
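To give a sense of why OSB-5 produces so many features, here is a minimal sketch of sparse-bigram generation with a window of five tokens. It is only an illustration: the function name `osb_features` and the `left|gap|right` feature format are hypothetical, and neither Rspamd nor Stalwart is claimed to tokenize exactly this way.

```python
def osb_features(tokens, window=5):
    """Pair each token with each of the next (window - 1) tokens.

    The gap between the two tokens is part of the feature, so
    "free _ prize" and "free prize" produce different features.
    """
    features = []
    for i, left in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break
            features.append(f"{left}|{gap}|{tokens[j]}")
    return features


tokens = "limited time offer click here to claim your free prize now".split()
features = osb_features(tokens)
print(len(tokens), "tokens ->", len(features), "features")
```

Each token contributes up to four features, so a message with a few hundred tokens yields on the order of a thousand features. If every feature needs its own lookup in a remote store, the round trips described above add up quickly.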

For v0.15, we went back to first principles and redesigned the spam classifier from scratch, guided by more recent research. We evaluated several models and ultimately settled on a logistic regression classifier trained using the FTRL-Proximal (Follow the Regularized Leader) algorithm. This algorithm—famously used by Google for large-scale online learning—is particularly well suited to spam classification workloads where models must be updated continuously and efficiently.
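For readers unfamiliar with FTRL-Proximal, the following is a minimal sketch of the per-coordinate update for logistic regression over binary features. It follows the commonly published form of the algorithm and is not Stalwart's implementation; the class name, the method names, and the hyperparameter defaults are assumptions chosen for illustration.

```python
import math


class FtrlProximal:
    """Logistic regression trained online with per-coordinate FTRL-Proximal."""

    def __init__(self, num_features, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        # Hyperparameter defaults are illustrative, not tuned values.
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * num_features  # adjusted gradient sums
        self.n = [0.0] * num_features  # sums of squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight at exactly zero
        sign = 1.0 if z > 0 else -1.0
        step = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / step

    def predict(self, indices):
        """Return the estimated probability that the message is spam."""
        score = sum(self._weight(i) for i in set(indices))
        score = max(-35.0, min(35.0, score))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, indices, label):
        """Learn from one message; label is 1 for spam and 0 for ham."""
        active = set(indices)
        p = self.predict(active)
        g = p - label  # gradient of the log loss for a binary feature
        for i in active:
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


model = FtrlProximal(num_features=16)
for _ in range(50):
    model.update([0, 1, 2], 1)  # features seen in spam
    model.update([3, 4, 5], 0)  # features seen in ham
print(model.predict([0, 1, 2]) > 0.5, model.predict([3, 4, 5]) < 0.5)
```

The point of the sketch is that each training step touches only the features present in the message, and the model state is a pair of flat arrays. That is what makes continuous, incremental updates cheap.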

One immediate benefit of this approach is that Stalwart can now support collaborative filtering out of the box. Multiple users can benefit from a single shared classifier trained on aggregated data, dramatically improving accuracy in environments with many accounts. At the same time, individual users can still maintain their own personal classifiers trained solely on their own messages.

The new classifier also adopts feature hashing (often called the hashing trick) to keep the feature space compact and predictable. This significantly reduces memory usage and improves cache locality. For very large deployments, cuckoo feature hashing is available to further reduce hash collisions. If you are interested in the theoretical background, see the original feature hashing paper, "Feature Hashing for Large Scale Multitask Learning", and the cuckoo feature hashing paper, "Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics".
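As a rough illustration of the hashing trick, the sketch below maps arbitrary tokens into a fixed table of 2²⁰ slots. CRC32 is used only because it ships with the Python standard library; the hash function Stalwart actually uses is not stated here, and the name `hash_features` is hypothetical.

```python
import zlib

NUM_FEATURES = 1 << 20  # 2**20 slots, matching the default mentioned in this post


def hash_features(tokens, num_features=NUM_FEATURES):
    """Map tokens to a sparse {index: signed count} vector of fixed width."""
    vector = {}
    for token in tokens:
        h = zlib.crc32(token.encode("utf-8"))
        index = h % num_features
        # The top bit of the hash picks a sign, so colliding tokens tend to
        # cancel out instead of always reinforcing each other.
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
        vector[index] = vector.get(index, 0.0) + sign
    return vector


vector = hash_features(["free", "prize", "free"])
print(len(vector), "non-zero slots out of", NUM_FEATURES)
```

No vocabulary needs to be stored or synchronized between nodes: any token, including one never seen before, maps directly to a slot in the weight table.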

With the default configuration of 2²⁰ features, the entire model fits in approximately 4 MB of memory and is loaded only once after each training cycle. The result is a classifier that is both faster and more accurate than the previous version, particularly in distributed deployments where network overhead matters.
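The 4 MB figure is consistent with one 32-bit weight per feature. The post does not state the weight width, so the four bytes per weight below are an assumption:

```python
num_features = 2 ** 20
bytes_per_weight = 4  # assumption: one 32-bit float per feature

model_bytes = num_features * bytes_per_weight
print(model_bytes, "bytes =", model_bytes / 2 ** 20, "MiB")
```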

We also evaluated RetVec (Resilient and Efficient Text Vectorizer), the embedding technique used by Gmail. RetVec excels at generating compact semantic representations of email content, but it is primarily designed to feed neural networks and deep learning models. For now, logistic regression offers a better balance of simplicity, performance, and operational transparency. That said, we plan to ship a pre-trained RetVec model alongside BERT in a future release.

A Faster, Simpler Search Layer

Search is another area where small architectural choices have outsized effects. In Stalwart v0.15, the search layer has been substantially rewritten.

For deployments using PostgreSQL or MySQL, Stalwart now leverages the built-in full-text search capabilities of the database instead of relying on a custom implementation. This reduces complexity, improves query planning, and allows the database to do what it already does well.

We have also added support for Meilisearch, a lightweight, fast search engine with excellent performance characteristics and simple operational semantics. Meilisearch offers low-latency full-text search, typo tolerance, and efficient indexing, making it a good fit for many Stalwart deployments.

For large installations backed by FoundationDB, we plan to significantly improve the built-in search functionality by embedding Seekstorm. Until that work is complete, we recommend pairing FoundationDB with an external search engine such as OpenSearch or Meilisearch to achieve the best performance.

Faster Database Access and Leaner Storage

Stalwart v0.15 includes a number of optimizations to the database access layer. We have reduced the number of reads and writes required to store and retrieve messages, particularly along hot paths such as IMAP and JMAP access. The result is noticeably faster message retrieval and improved overall responsiveness under load.

In parallel, we revisited how email metadata is stored and reduced some serialization overhead. This lowers disk usage and improves cache efficiency, which again compounds into better performance at scale.

Individually, these changes are modest. Collectively, they make the system feel tighter and more predictable under real-world workloads.

Meet Us at FOSDEM 2026

We are excited to announce that Stalwart will be present at FOSDEM 2026 in Brussels, Belgium.

Our talk, "Stalwart: Can Open Source do Gmail-scale Email?", builds naturally on the marginal gains theme. While v0.15 focuses on incremental improvements, the talk zooms out to the other end of the spectrum: what it takes to design and operate a truly large-scale email system.

Using a 1,024-node cluster as a concrete example, we will explore how modern providers store and index petabytes of messages, survive hardware failures without data loss, and run spam and phishing filtering across billions of daily deliveries. We will walk through the architectural patterns behind distributed storage, large-scale spam filtering, MTA queue management, and load balancing for IMAP, JMAP, and SMTP.

We will also discuss cluster coordination, orchestration, autoscaling, and how to reason about failure before it happens. The goal is to give attendees a practical understanding of how planet-scale email systems are built, and how those same principles can be applied using open-source technology.

If you are attending FOSDEM, we would love to meet you, answer questions, and talk about where Stalwart is heading next.

Looking Ahead

Stalwart v0.15 is a release shaped by the philosophy of marginal gains. There are no flashy new features, but there are dozens of small improvements that add up to something meaningful: faster spam classification, better scalability, simpler search, leaner storage, and more predictable performance.

If you are already running Stalwart, we encourage you to try v0.15 and let us know how it performs in your environment. Your feedback continues to guide where we focus next.

The team is already working on future releases that build on this foundation. With the core systems now leaner and more robust, we can continue to add new capabilities without compromising performance or reliability.

As with cycling, progress comes from steady, thoughtful refinement. Stalwart v0.15 is one more step in that direction.

Elevating Performance and Flexibility

December 27, 2023 · 3 min read

Mauro D.

Project Maintainer

We are excited to announce the release of Stalwart Mail Server v0.5.0. As we approach the end of the year, this significant update marks a major advancement in our journey to provide a robust, efficient, and versatile mail server solution. This latest version incorporates a range of performance enhancements, storage layer improvements, and new features, designed to elevate your email server experience.

Performance Enhancements

Stalwart v0.5.0 introduces several improvements to how messages are handled and stored. Messages are now parsed only once, and the resulting offsets are stored in the database. This eliminates the need to parse a message on every FETCH request, which reduces server load and response times. The server also now performs full-text indexing in the background. Finally, we have optimized our database access functions for faster interactions with the underlying data store.
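The parse-once idea can be sketched in a few lines. The example below records only where the header block and the body begin and end; a real server would presumably record offsets for every MIME part. The function names `index_message` and `fetch_section` are hypothetical and do not reflect Stalwart's internal API.

```python
def index_message(raw: bytes) -> dict:
    """Parse once, at delivery time: record (start, end) byte offsets."""
    sep = raw.find(b"\r\n\r\n")  # a blank line separates headers from body
    if sep == -1:
        return {"header": (0, len(raw)), "body": (len(raw), len(raw))}
    return {"header": (0, sep + 2), "body": (sep + 4, len(raw))}


def fetch_section(raw: bytes, offsets: dict, section: str) -> bytes:
    """Serve a request by slicing with the stored offsets, without re-parsing."""
    start, end = offsets[section]
    return raw[start:end]


raw = b"Subject: hi\r\nFrom: a@example.com\r\n\r\nHello\r\n"
offsets = index_message(raw)  # stored alongside the message metadata
print(fetch_section(raw, offsets, "body"))
```

With the offsets stored, answering a request for a message section becomes a byte-range read rather than a full parse.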

Storage Layer Improvements

Stalwart v0.5.0 expands the options for storage backends. In addition to FoundationDB and SQLite, users can now choose RocksDB, PostgreSQL, or MySQL as their storage backend, offering flexibility to suit different operational needs. Blob storage is also more versatile: blobs can now be stored in any of the supported data stores rather than only in the file system or S3/MinIO, which makes it possible to keep all data in a single store. Full-text search can be performed internally or delegated to Elasticsearch. Additionally, spam databases can now be stored in any of the supported data stores or in Redis, removing the requirement for an SQL server to use the spam filter.

Internal Directory

With the introduction of an internal directory in Stalwart v0.5.0, user account, group, and mailing list management can now be conducted directly within Stalwart, eliminating the dependency on external LDAP or SQL directories. This feature is complemented by the addition of an HTTP API, offering a more accessible and programmable interface for managing users, groups, domains, and mailing lists.

Additional Features

To improve compatibility with older IMAP clients, Stalwart v0.5.0 now supports the IMAP4rev1 \Recent flag. The server also supports LDAP bind authentication, which is needed for LDAP servers such as lldap that do not expose the userPassword attribute. Another significant improvement is the automated handling of spam: messages marked as spam by the filter can now be automatically moved to the user's Junk Mail folder.

Conclusion

As we release Stalwart Mail Server v0.5.0, we also want to take a moment to wish everyone a Happy New Year. This new version is a testament to our continuous efforts to evolve and adapt to the needs of our users. We believe that Stalwart v0.5.0 will not only meet but exceed your expectations, whether you're setting up a new mail server or upgrading an existing one.

For more details, visit our website, and don't forget to join our Discord community to share your experiences, get support, and connect with other Stalwart users.

Here's to a new year filled with success, innovation, and secure email communications!
