Auto-detecting recruiter responses from forwarded emails with confidence-gated automation and one-click undo.
I wanted job tracking to stop breaking at the inbox.
That sounds small, but it is one of the biggest failure points in job-search products. Applications scatter across weeks, companies reply from inconsistent addresses, subject lines drift, and users eventually stop maintaining their pipeline by hand because the administrative cost becomes annoying enough to ignore.
So I built a system where the user forwards recruiter mail to a private address and the product decides whether that message should update state, stage a review, or do nothing.
Parsing email is not the hard part. The hard part is touching user state with enough accuracy that the automation feels helpful instead of reckless.
That means the real requirements are:
- identify who the message is about
- identify what kind of message it is
- decide whether the system is confident enough to act
- preserve a correction path when it is wrong
That last requirement matters as much as the classifier itself.
The system works because most of it is deterministic.
I did not want the model deciding everything from scratch every time. The product behaves much better when the first passes are simple, inspectable, and cheap:
- verify the webhook
- identify the user from the forwarding alias
- normalize sender identity
- match likely company
- classify obvious message types
- reject risky or conflicting cases before they become automatic updates
Only the ambiguous cases need a model.
That was the right shape for this problem. Email classification sounds like an AI problem until you look at the volume of cases that are repetitive, structured, and better served by rules plus guardrails.
Here is a real path. A recruiter at Stripe sends a rejection email. The user has "Stripe - Senior Engineer" in their pipeline with status "applied."
The webhook arrives with a cryptographic signature. If the signature fails, the message is dropped. If it passes, the system extracts the forwarding alias to identify the user.
Next, the sender address [email protected] goes through brand extraction. The system strips the domain, handles multi-part TLDs correctly (so amazon.co.uk resolves to "amazon," not "co"), and normalizes to the brand "stripe."
Then the subject and body are scanned against keyword lists for rejection, interview, offer, and other common message types. This message hits rejection language. Zero hits on anything else. No conflicting signals. Category: rejection. The classifier does not just count hits - it checks for conflicts. If the same email contains both rejection and offer language, the system forces manual review instead of guessing which one is right.
The normalized brand is matched against the user's pipeline. The system finds the job, and because it was a domain-level match (not fuzzy text matching), it gets full confidence.
Confidence is mapped into an operational tier: high enough to auto-apply, plausible enough to stage for review, or too uncertain to act on. Each tier has a different product behavior rather than just displaying a number. This message qualifies for auto-apply.
But before the auto-apply executes, a guardrail fires: is the current job status already something positive like "offer" or "accepted"? If so, a rejection email cannot overwrite it, regardless of confidence. That guardrail exists because one misclassified email should not undo what might be the most important state in the user's pipeline. In this case the job is "applied," so the update goes through.
The system updates the job to "rejected," records the response timing, and writes a log entry that preserves the previous state. The user sees the update in their pipeline with a one-click undo. If they reject it, the job reverts and the log is marked as overridden.
The whole pipeline ran without a model. The model path exists for cases where the brand cannot be matched deterministically or the email language is ambiguous enough that keyword patterns are not sufficient.
There are really two separate questions:
- Which application does this email belong to?
- What does the email mean?
The matching path works best when it stays conservative. Domain and brand normalization handle a lot, but there are obvious traps:
- generic mailbox providers
- recruiters writing from personal addresses
- hiring platforms that do not map cleanly to one employer
- companies with several active roles for the same user
The classifier has similar problems. Some messages are easy. Some are mixed. Some are only meaningful in context. Some contain language that points in more than one direction at once.
That is why the product needs conflict handling just as much as it needs confident classification.
The model exists for ambiguity, not for the whole pipeline.
When deterministic rules cannot safely resolve company, role, or message category, the fallback path tries to produce a narrower interpretation of what the email likely means. But even there, the result is not treated as self-justifying. It still has to pass through confidence and guardrail logic before it can change state automatically.
That design mattered a lot.
The useful mental model is not "AI email reader." It is "deterministic triage with model-assisted resolution for the leftover cases."
The system only feels trustworthy if it fails in the right direction.
That means two things:
- confidence needs to be translated into operational behavior rather than just displayed as a number
- certain conflicts should block automation entirely
In practice, the system behaves much better when it distinguishes between:
- safe enough to auto-apply
- plausible but review-worthy
- too ambiguous or too risky to automate
That matters more than squeezing out a few extra auto-updates. The product earns trust by being conservative where the cost of a wrong state transition is high.
One of the best product decisions in this system was making every automatic change reversible.
The point of automation here is not to prove that the classifier is brilliant. It is to reduce manual maintenance while keeping the user in control.
So the product has to preserve:
- what changed
- what the prior state was
- whether the update was automatic
- whether the user later confirmed, rejected, or overrode it
That is what turns the feature from a hidden background process into something people can actually trust.
I did not want the sync log to become a second mailbox.
The system needs enough information to audit decisions and explain behavior, but not so much that it stores full raw message history forever just because the feature can.
Store enough for accountability. Do not turn the classifier into an excuse for unnecessary retention.
The system works because it is not trying to be maximally autonomous.
The good version of this feature is:
- deterministic where possible
- model-assisted where necessary
- conservative on risky transitions
- explicit about confidence
- reversible when it gets things wrong
That is a much better product than a system that boasts about "fully automated job inbox intelligence" and quietly corrupts pipeline state in edge cases.
- split the ingestion path into cleaner provider and classification modules
- improve handling for recruiter platforms and indirect sender identities
- expand phrase support beyond the current English-first assumption
- keep tightening calibration between classifier confidence and auto-apply behavior
This is part of P.A.T.H.O.S.. The broader product context is in How I built P.A.T.H.O.S..