env: scheme: concurrent resolution loses the value (ai) #3

Open
opened 2026-05-25 13:26:00 +02:00 by heiko · 0 comments
Owner

env: scheme: concurrent resolution loses the value (ai)

Summary

The env: scheme in secret.Get() unsets the referenced environment
variable after the first read. When two goroutines call
secret.Get("env:FOO") concurrently, one of them races into a
"variable already unset" state and receives the empty string (or the
fallback, if the spec is chained). This is a data-loss footgun for any
caller that resolves the same env-backed secret from multiple
goroutines.

Reproducer (illustrative)

os.Setenv("PWD_FOO", "hunter2")

var wg sync.WaitGroup
results := make([]string, 4)
for i := range results {
    i := i
    wg.Add(1)
    go func() {
        defer wg.Done()
        v, _ := secret.Get("env:PWD_FOO")
        results[i] = v
    }()
}
wg.Wait()
// expected: all four entries == "hunter2"
// observed: one entry == "hunter2", the other three == ""

Impact

Downstream dmarc will resolve IMAP account passwords (and similar
credentials) from a long-running daemon. Workers retry on connection
failure; each retry re-resolves the spec. The current behavior means a
single transient IMAP fault followed by a re-resolve returns an empty
password, which then fails authentication permanently — even though
the env var was correctly set at process startup.

In dmarc we are working around this by wrapping every secret.Get
call in a process-local Resolver that caches the first-success value
under a mutex with singleflight-style inflight dedup. That works for
us, but every consumer of this library will need to reinvent the same
wrapper, and the destructive-by-default semantics is itself surprising.

Requested resolution

One of:

  1. Make env: non-destructive by default. Introduce env-once:
    (or env!:) as the opt-in destructive variant. Existing callers
    that depend on the unset-after-read behavior change one byte;
    everyone else stops being footgunned.
  2. Make env: internally concurrency-safe. A mutex around the
    env access + return-cached-value semantics. Behavior change is
    invisible to single-threaded callers, fixes the race for
    multi-threaded ones.
  3. Document the limitation loudly at the top of the package
    godoc, with a recommended wrapper pattern, and accept that every
    consumer reimplements it. Lowest-effort fix; worst long-term cost.

Strong preference for (1) — explicit semantics are friendlier to new
users than implicit-magic-mutex behavior.

  • Errors returned from Get() may include the spec string in the
    message text (e.g., file:/etc/foo: no such file). Consumers
    redacting secrets in logs have to sanitize the error themselves.
    A GetSafe() returning sanitized errors, or typed errors, would
    remove that work.
  • No syntactic validator without resolution. Validate(spec) error
    (parse-only) lets callers reject malformed TOML at decode time.
  • No scheme introspection. Scheme(spec) (string, error) returning
    the first scheme of a chain would let callers render
    <scheme>:[redacted] without reimplementing the || parser.

These are surfaced for context; only the env: race is blocking
production use.

Severity

Blocking prod use for any multi-goroutine consumer of
filesystem/env-backed credentials, which includes dmarc once IMAP
import lands (planned for feature/imap-import, after
feature/config-and-logging merges).

# `env:` scheme: concurrent resolution loses the value (ai) ## Summary The `env:` scheme in `secret.Get()` unsets the referenced environment variable after the first read. When two goroutines call `secret.Get("env:FOO")` concurrently, one of them races into a "variable already unset" state and receives the empty string (or the fallback, if the spec is chained). This is a data-loss footgun for any caller that resolves the same env-backed secret from multiple goroutines. ## Reproducer (illustrative) ```go os.Setenv("PWD_FOO", "hunter2") var wg sync.WaitGroup results := make([]string, 4) for i := range results { i := i wg.Add(1) go func() { defer wg.Done() v, _ := secret.Get("env:PWD_FOO") results[i] = v }() } wg.Wait() // expected: all four entries == "hunter2" // observed: one entry == "hunter2", the other three == "" ``` ## Impact Downstream `dmarc` will resolve IMAP account passwords (and similar credentials) from a long-running daemon. Workers retry on connection failure; each retry re-resolves the spec. The current behavior means a single transient IMAP fault followed by a re-resolve returns an empty password, which then fails authentication permanently — even though the env var was correctly set at process startup. In dmarc we are working around this by wrapping every `secret.Get` call in a process-local `Resolver` that caches the first-success value under a mutex with singleflight-style inflight dedup. That works for us, but every consumer of this library will need to reinvent the same wrapper, and the destructive-by-default semantics is itself surprising. ## Requested resolution One of: 1. **Make `env:` non-destructive by default.** Introduce `env-once:` (or `env!:`) as the opt-in destructive variant. Existing callers that depend on the unset-after-read behavior change one byte; everyone else stops being footgunned. 2. **Make `env:` internally concurrency-safe.** A mutex around the env access + return-cached-value semantics. Behavior change is invisible to single-threaded callers, fixes the race for multi-threaded ones. 3. **Document the limitation loudly** at the top of the package godoc, with a recommended wrapper pattern, and accept that every consumer reimplements it. Lowest-effort fix; worst long-term cost. Strong preference for (1) — explicit semantics are friendlier to new users than implicit-magic-mutex behavior. ## Related observations (low priority, separate issues if you want) - Errors returned from `Get()` may include the spec string in the message text (e.g., `file:/etc/foo: no such file`). Consumers redacting secrets in logs have to sanitize the error themselves. A `GetSafe()` returning sanitized errors, or typed errors, would remove that work. - No syntactic validator without resolution. `Validate(spec) error` (parse-only) lets callers reject malformed TOML at decode time. - No scheme introspection. `Scheme(spec) (string, error)` returning the first scheme of a chain would let callers render `<scheme>:[redacted]` without reimplementing the `||` parser. These are surfaced for context; only the `env:` race is blocking production use. ## Severity **Blocking prod use** for any multi-goroutine consumer of filesystem/env-backed credentials, which includes `dmarc` once IMAP import lands (planned for `feature/imap-import`, after `feature/config-and-logging` merges).
heiko changed title from env: scheme: concurrent resolution loses the value to env: scheme: concurrent resolution loses the value (ai) 2026-05-25 13:27:10 +02:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
heiko/secret#3
No description provided.