4 posts tagged with "spam"

Marginal Gains: Major Impact

December 16, 2025 · 6 min read

Project Maintainer

In professional cycling, the concept of marginal gains became famous through Team Sky. Rather than chasing dramatic breakthroughs, they focused on making hundreds of small improvements: slightly better bike fit, marginally lighter components, improved sleep, cleaner nutrition. None of these changes mattered much on their own, but together they reshaped performance—and helped dominate the sport for years.

Software systems, especially large distributed ones, work much the same way. Rarely does a single feature transform everything overnight. More often, real progress comes from careful attention to small details: shaving latency here, reducing contention there, simplifying a hot path, rethinking a data structure.

Stalwart v0.15 is very much a release in this spirit. It does not introduce a long list of headline features. Instead, it is the result of revisiting core subsystems—spam filtering, search, storage, and data access—and making many targeted improvements that, together, have a significant impact on performance, reliability, and usability.

Rethinking Spam Classification

Stalwart v0.14 (and earlier) included a spam classifier which was a direct port of the classifier used by Rspamd. This classifier is grounded in Bayesian theory and uses more advanced methods to combine probabilities, including sparse bigrams (OSB) and the inverse chi-square distribution. This approach is well understood and robust, particularly when training data is limited. It produces reasonable results quickly and has a long track record in production systems.

However, it comes with a significant cost in distributed environments. Both Rspamd and Stalwart v0.14 relied on OSB-5, which generates a very large number of features per message. Each of these features was stored in Redis. Even with aggressive caching, training or classifying a single message could involve hundreds or even thousands of round trips to Redis. At scale, this becomes a bottleneck: latency increases, throughput drops, and horizontal scaling becomes inefficient.

For v0.15, we went back to first principles and redesigned the spam classifier from scratch, guided by more recent research. We evaluated several models and ultimately settled on a logistic regression classifier trained using the FTRL-Proximal (Follow the Regularized Leader) algorithm. This algorithm—famously used by Google for large-scale online learning—is particularly well suited to spam classification workloads where models must be updated continuously and efficiently.

One immediate benefit of this approach is that Stalwart can now support collaborative filtering out of the box. Multiple users can benefit from a single shared classifier trained on aggregated data, dramatically improving accuracy in environments with many accounts. At the same time, individual users can still maintain their own personal classifiers trained solely on their own messages.

The new classifier also adopts feature hashing (often called the hashing trick) to keep the feature space compact and predictable. This significantly reduces memory usage and improves cache locality. For very large deployments, cuckoo feature hashing is available to further reduce hash collisions. If you are interested in the theoretical background, the original feature hashing paper is available at Feature Hashing for Large Scale Multitask Learning and the cuckoo feature hashing paper at Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics.

With the default configuration of 2²⁰ features, the entire model fits in approximately 4 MB of memory and is loaded only once after each training cycle. The result is a classifier that is both faster and more accurate than the previous version, particularly in distributed deployments where network overhead matters.

We also evaluated RetVec (Resilient and Efficient Text Vectorizer), the embedding technique used by Gmail. RetVec excels at generating compact semantic representations of email content, but it is primarily designed to feed neural networks and deep learning models. For now, logistic regression offers a better balance of simplicity, performance, and operational transparency. That said, we plan to ship a pre-trained RetVec model alongside BERT in a future release.

A Faster, Simpler Search Layer

Search is another area where small architectural choices have outsized effects. In Stalwart v0.15, the search layer has been substantially rewritten.

For deployments using PostgreSQL or MySQL, Stalwart now leverages the built-in full-text search capabilities of the database instead of relying on a custom implementation. This reduces complexity, improves query planning, and allows the database to do what it already does well.

We have also added support for Meilisearch, a lightweight, fast search engine with excellent performance characteristics and simple operational semantics. Meilisearch offers low-latency full-text search, typo tolerance, and efficient indexing, making it a good fit for many Stalwart deployments.

For large installations backed by FoundationDB, we plan to significantly improve the built-in search functionality by embedding Seekstorm. Until that work is complete, we recommend pairing FoundationDB with an external search engine such as OpenSearch or Meilisearch to achieve the best performance.

Faster Database Access and Leaner Storage

Stalwart v0.15 includes a number of optimizations to the database access layer. We have reduced the number of reads and writes required to store and retrieve messages, particularly along hot paths such as IMAP and JMAP access. The result is noticeably faster message retrieval and improved overall responsiveness under load.

In parallel, we revisited how email metadata is stored and reduced some serialization overhead. This lowers disk usage and improves cache efficiency, which again compounds into better performance at scale.

Individually, these changes are modest. Collectively, they make the system feel tighter and more predictable under real-world workloads.

Meet Us at FOSDEM 2026

We are excited to announce that Stalwart will be present at FOSDEM 2026 in Brussels, Belgium.

Our talk, Stalwart: Can Open Source do Gmail-scale Email?, builds naturally on the marginal gains theme. While v0.15 focuses on incremental improvements, the talk zooms out to the other end of the spectrum: what it takes to design and operate a truly large-scale email system.

Using a 1,024-node cluster as a concrete example, we will explore how modern providers store and index petabytes of messages, survive hardware failures without data loss, and run spam and phishing filtering across billions of daily deliveries. We will walk through the architectural patterns behind distributed storage, large-scale spam filtering, MTA queue management, and load balancing for IMAP, JMAP, and SMTP.

We will also discuss cluster coordination, orchestration, autoscaling, and how to reason about failure before it happens. The goal is to give attendees a practical understanding of how planet-scale email systems are built, and how those same principles can be applied using open-source technology.

If you are attending FOSDEM, we would love to meet you, answer questions, and talk about where Stalwart is heading next.

Looking Ahead

Stalwart v0.15 is a release shaped by the philosophy of marginal gains. There are no new flashy features, but there are dozens of small improvements that add up to something meaningful: faster spam classification, better scalability, simpler search, leaner storage, and more predictable performance.

If you are already running Stalwart, we encourage you to try v0.15 and let us know how it performs in your environment. Your feedback continues to guide where we focus next.

The team is already working on future releases that build on this foundation. With the core systems now leaner and more robust, we can continue to add new capabilities without compromising performance or reliability.

As with cycling, progress comes from steady, thoughtful refinement. Stalwart v0.15 is one more step in that direction.

Goodbye Spam: Introducing Faster, Smarter Spam Filtering

January 6, 2025 · 3 min read

Mauro D.

Project Maintainer

As we step into 2025, we're excited to share some significant enhancements to Stalwart Mail Server version 0.11.0, starting with a complete overhaul of its built-in spam filter. These changes bring dramatic improvements in speed, ease of use, and flexibility while addressing feedback from our community. Here’s a closer look at what’s new.

A Faster, Smarter Spam Filter

In earlier versions of Stalwart Mail Server, the spam filter was implemented as a Sieve script. This design choice was inspired by platforms like Rspamd, which use scripting languages like Lua to allow customizations. However, over time, we identified two key challenges with this approach. First, because it was an interpreted script, the spam filter’s performance was slightly slower than we’d like. Second, many users found it complicated to update the script when adding custom rules or configuring custom DNSBL (Domain Name System Blocklist) servers.

To address these issues, we rewrote the spam filter entirely in Rust. The result is a system that is five times faster than before, delivering superior performance while keeping resource usage minimal. Moreover, defining new rules or adding DNSBL servers is now as simple as editing the configuration file—no scripting expertise required. This shift eliminates complexity while maintaining the high level of customization our users expect. For those who still need advanced control, Stalwart continues to support custom Sieve scripts and expressions, ensuring maximum flexibility.

Enhanced Training

One of the most requested features we’ve added is the ability for end users to train their own spam filter Bayesian model. Now, users can customize their spam filtering by simply moving messages to and from the "Junk Mail" folder or by adding and removing the $Junk flag. This personalized approach allows each account to maintain its own tailored spam filter, providing greater accuracy and user satisfaction.

Improved Performance

This update isn’t just about the spam filter. We’ve also made broader performance enhancements to Stalwart Mail Server. Previously, we relied on LRU (Least Recently Used) caches. With this release, we’ve switched to scan-resistant S3-FIFO caches, offering better performance under heavy workloads. Additionally, we’ve optimized Stalwart’s handling of large distributed SMTP queues, ensuring smoother operation in clustered environments. These changes make Stalwart even more capable of handling demanding enterprise setups.

Meet Us at FOSDEM'25

We’re thrilled to announce that Stalwart Mail Server will be featured at FOSDEM’25! Join us on February 1st at 12:00 PM in Brussels, where we’ll showcase these new features and share insights into what’s coming next for Stalwart. This is a fantastic opportunity to connect with our team, ask questions, and explore how Stalwart can power your email infrastructure.

Upgrade Today

These improvements are available now, and we’re confident they’ll make a big difference for administrators and users alike. Whether you’re drawn to the speed of the new spam filter, the enhanced training capabilities, or the overall performance boosts, this update is designed to help you get the most out of Stalwart Mail Server.

As always, thank you for choosing Stalwart. We’re committed to delivering a reliable, feature-rich email server that evolves with your needs. Here’s to a productive and spam-free 2025!

Revolutionize Your Email Workflow with AI

October 7, 2024 · 6 min read

Mauro D.

Project Maintainer

We are happy to announce the release of Stalwart Mail Server v0.10.3, which introduces support for AI models —a powerful new feature now available to Enterprise Edition users as well as our GitHub and OpenCollective sponsors. With this feature, Stalwart Mail Server can be integrated with both self-hosted and cloud-based Large Language Models (LLMs), bringing advanced email processing capabilities like never before.

This integration allows you to use AI models for a variety of tasks, including enhanced spam filtering, threat detection, and intelligent email classification. Whether you choose to host your own models with LocalAI or leverage cloud-based services like OpenAI or Anthropic, this release provides the flexibility to incorporate cutting-edge AI into your email infrastructure.

Unlocking the Power of AI

With the introduction of AI model integration, Stalwart Mail Server can now analyze email content more deeply than traditional filters ever could. For instance, in the realm of spam filtering and threat detection, AI models are highly effective at identifying patterns and detecting malicious or unsolicited content. The system works by analyzing both the subject and body of incoming emails through the lens of an LLM, providing more accurate detection and filtering.

In addition to bolstering security, AI integration enhances email classification. By configuring customized prompts, administrators can instruct AI models to categorize emails based on their content, leading to more precise filtering and organization. This is particularly useful for enterprises managing a high volume of messages that span various topics and departments, as AI-driven filters can quickly and intelligently sort messages into categories like marketing, personal correspondence, or work-related discussions.

The flexibility of using either self-hosted or cloud-based AI models means that Stalwart can be tailored to your infrastructure and performance needs. Self-hosting AI models ensures full control over data and privacy, while cloud-based models offer ease of setup and access to highly optimized, continuously updated language models.

LLMs in Sieve Scripts

One of the most exciting features of this release is the ability for users and administrators to access AI models directly from Sieve scripts. Stalwart extends the Sieve scripting language by introducing the llm_prompt function, which allows users to send prompts and email content to the AI model for advanced processing.

For example, the following Sieve script demonstrates how an AI model can be used to classify emails into specific folders based on the content:

require ["fileinto", "vnd.stalwart.expressions"];

# Base prompt for email classification
let "prompt" '''You are an AI assistant tasked with classifying personal emails into specific folders. 
Your job is to analyze the email's subject and body, then determine the most appropriate folder for filing. 
Use only the folder names provided in your response.
If the category is not clear, respond with "Inbox".

Classification Rules:
- Family:
   * File here if the message is signed by a Doe family member
   * The recipient's name is John Doe
- Cycling:
   * File here if the message is related to cycling
   * File here if the message mentions the term "MAMIL"
- Work:
   * File here if the message mentions "Dunder Mifflin Paper Company, Inc." or any part of this name
   * File here if the message is related to paper supplies
   * Only classify as Work if it seems to be part of an existing sales thread or directly related to the company's operations
- Junk Mail:
   * File here if the message is trying to sell something and is not work-related
   * Remember that John lives a minimalistic lifestyle and is not interested in purchasing items
- Inbox:
   * Use this if the message doesn't clearly fit into any of the above categories

Analyze the following email and respond with only one of these folder names: Family, Cycling, Work, Junk Mail, or Inbox.
''';

# Prepare the base Subject and Body
let "subject" "thread_name(header.subject)";
let "body" "body.to_text";

# Send the prompt, subject, and body to the AI model
let "llm_response" "llm_prompt('gpt-4', prompt + '\n\nSubject: ' + subject + '\n\n' + body, 0.6)";

# Set the folder name
if eval "contains(['Family', 'Cycling', 'Work', 'Junk Mail'], llm_response)" {
    fileinto "${llm_response}";
}

This example demonstrates how the llm_prompt function can be used to classify emails into different categories such as Family, Cycling, Work, or Junk Mail based on the content. The AI model analyzes the message’s subject and body according to the classification rules defined in the prompt and returns the most appropriate folder name. The email is then automatically filed into the correct folder, making it easier to organize incoming messages based on their content.

Self-Hosted or Cloud-Based

With this new feature, Stalwart Mail Server allows for seamless integration with both self-hosted and cloud-based AI models. If you prefer full control over your infrastructure, you can opt to deploy models on your own hardware using solutions like LocalAI. Self-hosting gives you complete ownership over your data and ensures compliance with privacy policies, but it may require significant computational resources, such as GPU acceleration, to maintain high performance.

Alternatively, you can integrate with cloud-based AI providers like OpenAI or Anthropic, which offer access to powerful, pretrained models with minimal setup. Cloud-based models provide cutting-edge language processing capabilities, but you should be aware of potential costs, as these providers typically charge based on the number of tokens processed. Whether you choose self-hosted or cloud-based models, Stalwart gives you the flexibility to tailor the AI integration to your specific needs.

Available for Enterprise Users and Sponsors

This exciting AI integration feature is exclusively available for Enterprise Edition users as well as GitHub and OpenCollective monthly sponsors. If you want to harness the full potential of AI-powered email processing in Stalwart Mail Server, upgrading to the Enterprise Edition or becoming a sponsor is a great way to access this feature and other advanced capabilities.

Try It Out Today!

The release of Stalwart Mail Server v0.10.3 marks a major milestone in our journey toward building intelligent, highly customizable email management solutions. By combining traditional email filtering with the power of LLMs, Stalwart gives you the tools to take your email infrastructure to the next level, enhancing security, organization, and automation in ways that were previously impossible. We’re excited to see how you’ll use this new feature to optimize your email workflows!

Introducing Advanced Spam and Phishing Filtering

October 25, 2023 · 3 min read

Mauro D.

Project Maintainer

In today's digital age, the safety and authenticity of your emails are paramount. With that in mind, we're happy to announce the release of the Spam and Phishing filter in Stalwart Mail Server v0.4.0. This release is packed with features that not only enhance your email security but also ensure a seamless communication experience.

Here's a deep dive into what's new:

Comprehensive Filtering Rules: We've crafted a set of rules that stand shoulder-to-shoulder with the best solutions out there.
Statistical Spam Classifier: Empower your server with a classifier that constantly learns, adapts, and keeps spam at bay.
DNS Blocklists (DNSBLs): Safeguard your users' inboxes from notorious spammers through meticulous checks on IP addresses, domains, and hashes.
Collaborative Digest-Based Filtering: By integrating digest-based spam filtering, we ensure even greater accuracy in weeding out unwanted emails.
Phishing Protection: Defend against cunning phishing tactics, from homographic URL attacks to deceptive sender spoofing.
Trusted Replies Tracking: By recognizing and prioritizing genuine replies, we ensure your genuine conversations remain uninterrupted.
Sender Reputation: An automated system that assesses sender credibility based on their IP, ASN, domain, and email address.
Greylisting: An added shield against spam, by temporarily holding back unfamiliar senders.
Spam Traps: Crafty decoy email addresses that help us catch and scrutinize spam, ensuring your users' inboxes remain clutter-free.
Built-in & Ready to Roll: No dependency on third-party software. Unbox and deploy – it's that simple!

Comparative Analysis

While we have immense respect for both RSpamd and SpamAssassin, it's essential to highlight some distinctions. RSpamd stands out for its speed and standalone capabilities but necessitates additional configuration and maintenance. Meanwhile, SpamAssassin, built on Perl, might not deliver the same speed as RSpamd due to its heavy reliance on regular expressions.

Stalwart Mail Server's spam and phishing filter offers a level of protection equivalent to both RSpamd and SpamAssassin with one notable advantage: speed. Since the message remains within the server during the entire filtering process, it's considerably quicker. Furthermore, while third-party solutions re-execute checks for DMARC, DKIM, SPF, and ARC, Stalwart has already performed these, making our built-in filter more efficient and streamlined.

In essence, with Stalwart Mail Server, you receive a blend of speed, efficiency, and top-tier protection.

Conclusion

In essence, with Stalwart Mail Server v0.4.0, you're not just getting an email server, but a comprehensive, fast, and efficient email security solution.

We're committed to continuous innovation and ensuring that your communication remains genuine, secure, and spam-free. Upgrade to Stalwart Mail Server v0.4.0 and experience the difference today!

Rethinking Spam Classification​

A Faster, Simpler Search Layer​

Faster Database Access and Leaner Storage​

Meet Us at FOSDEM 2026​

Looking Ahead​

A Faster, Smarter Spam Filter​

Enhanced Training​

Improved Performance​

Meet Us at FOSDEM'25​

Upgrade Today​

Unlocking the Power of AI​

LLMs in Sieve Scripts​

Self-Hosted or Cloud-Based​

Available for Enterprise Users and Sponsors​

Try It Out Today!​

Comparative Analysis​

Conclusion​

Rethinking Spam Classification

A Faster, Simpler Search Layer

Faster Database Access and Leaner Storage

Meet Us at FOSDEM 2026

Looking Ahead

A Faster, Smarter Spam Filter

Enhanced Training

Improved Performance

Meet Us at FOSDEM'25

Upgrade Today

Unlocking the Power of AI

LLMs in Sieve Scripts

Self-Hosted or Cloud-Based

Available for Enterprise Users and Sponsors

Try It Out Today!

Comparative Analysis

Conclusion