Daily Log - 2026-07-01: path filters, provider limits, certificate chains, and health checks
A sanitized log from a day split between site automation, cloud-provider edge cases, certificate tooling, and a noisy gateway investigation. The details are private; the useful parts are the patterns.
1. Path filters are product decisions
I tightened this public site's deploy workflow so pushes only trigger publishing when docs or blog
content changes. The mechanics are simple: add paths under the push trigger and name the
directories that matter. The reason was also simple: I only wanted content-related changes to run
the publish pipeline.
The interesting part is not the YAML. It is the product decision hidden inside the filter.
If the deploy runs only for content paths, then changes to build inputs such as site config, theme code, static assets, dependencies, or the workflow itself will not publish the site. That can be exactly right for a content-only publishing pipeline, but it should be an explicit tradeoff.
Since the repo is public, here are the commits:
- 04ae1c317 fixed MDX truncate markers in older posts so Docusaurus would stop warning about untruncated blog entries.
- a737ec6dd fixed the remark pre-commit flow to validate files using Docusaurus-relative paths, then cleaned up the content fallout from that stricter check.
- 69a4c6513 added the actual workflow path filter for docs/** and blog/**.
- 106ba3cd1 updated the Node setup steps in the GitHub Actions workflow.
Checklist I would use before adding a path filter:
- List every file category that can affect the final artifact.
- Decide whether the workflow is a content publisher or a full site deploy.
- Add only the paths that match that intent.
- Call out the excluded paths in the review so nobody is surprised later.
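One way to make the first checklist item concrete is to run a list of changed files through the same filter and see what would be skipped. This is a minimal sketch, assuming the filter is docs/** and blog/**; classify_paths is a hypothetical helper, and shell case patterns only approximate how GitHub Actions evaluates path globs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one changed path per line on stdin and label it
# "publish" if it falls under docs/ or blog/, otherwise "skip".
# In a case pattern, * also matches slashes, so docs/* covers nested files.
classify_paths() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    case "$path" in
      docs/*|blog/*) printf 'publish\t%s\n' "$path" ;;
      *)             printf 'skip\t%s\n' "$path" ;;
    esac
  done
}
```

Piping `git diff --name-only HEAD~1 HEAD` into classify_paths before merging shows which changes in a commit would not trigger a publish. Every "skip" line is a candidate for the review callout.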
2. GCP provider limits need first-class product behavior
The day's deepest debugging thread was a GCP limit: a service account can only hold 10 user-managed keys. GCP documents that in the IAM quotas and limits page under "Service account keys for a service account". A rotation system that creates provider keys has to treat that limit as part of its own lifecycle, not as a surprise error from the provider.
The failure mode is easy to miss. Each rotated object can correctly manage the key it knows about, but the provider counts every key on the shared account: keys from sibling objects, grace periods, manual keys, deleted objects that left provider state behind, and anything created outside the product.
The immediate workaround was to delete excess keys directly in GCP, which freed a slot and allowed rotation to succeed again. That is a useful recovery step, but not a durable product behavior.
The recurrence-prevention suggestions were:
- Pre-flight the service account's current key count before creating a new key.
- Reconcile at the service-account scope and clean stale or orphaned keys before hitting the cap.
- Stop hiding provider-key delete failures; log them, emit a metric, or retry.
- Clean provider-side keys when deleting the object that created or tracks them.
- Return an error that says "GCP service account key limit" instead of a generic precondition failure.
- Operationally, avoid many rotating objects sharing one service account when possible.
- Set a default key expiry at the project, folder, or organization level with constraints/iam.serviceAccountKeyExpiryHours. GCP notes that service-account keys do not expire by default; setting this policy makes newly created keys expire after a chosen lifetime.
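The pre-flight suggestion can be sketched in a few lines of shell. This is an illustration, not the product's implementation: preflight_key_slot and list_user_managed_keys are hypothetical names, and the sketch assumes gcloud is installed and authenticated. The check is also not atomic, so another writer can still take the last slot between the count and the create.

```shell
#!/usr/bin/env bash
# Documented GCP limit: user-managed keys per service account.
KEY_LIMIT=10

# Provider call, isolated in its own function so it can be stubbed.
# Assumes gcloud is installed and authenticated.
list_user_managed_keys() {
  gcloud iam service-accounts keys list \
    --iam-account="$1" --managed-by=user --format='value(name)'
}

# Hypothetical pre-flight: returns 0 if a key slot is free, 1 if the account
# is at the limit, and 2 if the key list could not be read.
preflight_key_slot() {
  local account="$1" keys count
  if ! keys=$(list_user_managed_keys "$account"); then
    echo "cannot list keys for $account; not attempting rotation" >&2
    return 2
  fi
  # Count non-empty lines, one per key.
  count=$(printf '%s\n' "$keys" | awk 'NF { n++ } END { print n + 0 }')
  if [ "$count" -ge "$KEY_LIMIT" ]; then
    echo "GCP service account key limit: $account holds $count of $KEY_LIMIT user-managed keys; clean up stale keys before rotating" >&2
    return 1
  fi
  echo "$account holds $count of $KEY_LIMIT user-managed keys"
}
```

The count covers the whole service account, which is what the provider enforces, rather than only the keys one rotated object knows about.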
The reusable lesson:
- Before creating provider-side credentials, check the provider's quota or object count.
- Reconcile at the provider scope, not only at the local object scope.
- Treat cleanup failures as observable events, not silent best-effort footnotes.
- When a provider limit is hit, return an error that names the limit and points at the cleanup path.
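Making cleanup failures observable can be as small as a retry loop that logs every failed attempt. A minimal sketch, assuming gcloud is installed and authenticated; delete_key_visibly, delete_provider_key, and RETRY_DELAY_SECONDS are hypothetical names, and a real system would likely emit a metric where this writes to stderr.

```shell
#!/usr/bin/env bash
# Provider call, isolated in its own function so it can be stubbed.
# $1 = service account email, $2 = key ID.
delete_provider_key() {
  gcloud iam service-accounts keys delete "$2" --iam-account="$1" --quiet
}

# Hypothetical wrapper: retry a key delete and report every failure on stderr
# instead of swallowing it. $3 = maximum attempts (default 3).
delete_key_visibly() {
  local account="$1" key_id="$2" attempts="${3:-3}" try=1
  while [ "$try" -le "$attempts" ]; do
    if delete_provider_key "$account" "$key_id"; then
      echo "deleted key $key_id on $account (attempt $try)"
      return 0
    fi
    echo "WARN: deleting key $key_id on $account failed (attempt $try of $attempts)" >&2
    if [ "$try" -lt "$attempts" ]; then
      sleep "${RETRY_DELAY_SECONDS:-2}"
    fi
    try=$((try + 1))
  done
  echo "ERROR: key $key_id on $account was not deleted after $attempts attempts" >&2
  return 1
}
```

The non-zero exit matters as much as the log lines: the caller can decide whether a stale provider key should block the operation or only be recorded for later reconciliation.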
"The API returned precondition failed" is a symptom. "This account is at the provider's key limit, and these stale keys are blocking rotation" is an operator-ready diagnosis.
3. Certificate helpers should output what consumers need
A certificate generation helper that only writes a leaf cert and a combined PEM is often one format short of useful. Real deployments usually need several shapes:
- A leaf certificate for inspection or direct use.
- A private key.
- A full chain for TLS endpoints that must present intermediates.
- A Kubernetes TLS secret using the certificate chain.
- A certificate item in the secret-management system so the same material can be consumed by other automation.
The two traps I hit again:
- Usage text drifts from reality when optional arguments are added over time.
- Passing sensitive values as command-line arguments can leak them through local process listings.
For local lab scripts, guarded CLI steps are fine: check whether each tool exists, print what is being created, and make Kubernetes writes idempotent with --dry-run=client -o yaml | kubectl apply -f -. For shared systems, prefer file-based or stdin-based secret input where the CLI supports it.
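The "check whether each tool exists" guard might look like the sketch below. require_tools is a hypothetical helper name; it reports every missing tool before failing, so one run tells you everything that needs installing.

```shell
#!/usr/bin/env bash
# Hypothetical guard: report every missing tool on stderr, then fail once.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A lab script would call it once near the top, for example `require_tools openssl kubectl || exit 1`, before creating any files or secrets.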
The gist of the lab script:
- Keep one lab CA and use it to sign every leaf certificate.
- Generate the leaf certificate for the public lab hostname.
- Add the in-cluster Kubernetes service DNS name to the certificate SANs: <service>.<namespace>.svc.cluster.local.
- Add localhost and 127.0.0.1 to the SANs so the same certificate works through kubectl port-forward.
- Write a full-chain PEM by concatenating the leaf certificate and the lab CA certificate.
- Put that full-chain certificate into the Kubernetes TLS secret, not just the leaf certificate.
The useful OpenSSL config shape is:
[alt_names]
DNS.1 = gateway.example.test
DNS.2 = *.gateway.example.test
DNS.3 = gateway.default.svc.cluster.local
DNS.4 = localhost
IP.1 = 127.0.0.1
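To show where that block sits, here is a sketch of the signing step. It is an illustration rather than the actual lab script: write_leaf_config and issue_leaf are hypothetical helpers, the [req], [dn], and [v3_req] sections are my assumption about a minimal surrounding config, and the key size and 30-day lifetime are placeholder choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal OpenSSL config around [alt_names].
write_leaf_config() {
  cat > "$1" <<'EOF'
[req]
prompt = no
distinguished_name = dn

[dn]
CN = gateway.example.test

[v3_req]
subjectAltName = @alt_names

[alt_names]
DNS.1 = gateway.example.test
DNS.2 = *.gateway.example.test
DNS.3 = gateway.default.svc.cluster.local
DNS.4 = localhost
IP.1 = 127.0.0.1
EOF
}

# Hypothetical helper: sign a leaf with an existing lab CA and build the chain.
# $1 = CA cert, $2 = CA key, $3 = config file, $4 = output directory.
issue_leaf() {
  local ca_crt="$1" ca_key="$2" cnf="$3" out="$4"
  # New unencrypted key plus a CSR whose subject comes from the config.
  openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$out/leaf.key" -out "$out/leaf.csr" -config "$cnf" || return 1
  # Sign with the lab CA; the SANs come from [v3_req] in the same config.
  openssl x509 -req -in "$out/leaf.csr" \
    -CA "$ca_crt" -CAkey "$ca_key" -CAcreateserial \
    -days 30 -extensions v3_req -extfile "$cnf" \
    -out "$out/leaf.crt" || return 1
  # Full chain: leaf first, then the CA certificate.
  cat "$out/leaf.crt" "$ca_crt" > "$out/fullchain.pem"
}
```

Order matters in the chain file: the leaf goes first, followed by the certificate that signed it.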
And the Kubernetes write is intentionally idempotent:
kubectl create secret tls "$K8S_SECRET_NAME" \
  --cert="$CHAIN_FILE" \
  --key="$KEY_FILE" \
  --namespace "$KUBERNETES_NAMESPACE" \
  --dry-run=client -o yaml | kubectl apply -f -
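After the write, it is cheap to confirm that the secret really holds the chain and not just the leaf. A small sketch, assuming kubectl access to the cluster and a base64 that accepts -d; count_secret_certs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check: count the certificates stored in a TLS secret's tls.crt.
# $1 = namespace, $2 = secret name.
# A full chain should report 2 or more; a leaf-only secret reports 1.
count_secret_certs() {
  local namespace="$1" secret="$2"
  kubectl get secret "$secret" --namespace "$namespace" \
    -o jsonpath='{.data.tls\.crt}' |
    base64 -d |
    grep -c 'BEGIN CERTIFICATE'
}
```

With the variables from the command above, that would be `count_secret_certs "$KUBERNETES_NAMESPACE" "$K8S_SECRET_NAME"`.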
4. Health-check investigations need exact evidence
Another thread was a gateway producing noisy 405-style alerts. The tempting move is to search the
log bundle for 405 and stop at the first large-looking count. That is how false positives happen:
timestamps, IDs, and unrelated fields can all contain the same digits.
The better pattern:
- Search for the exact error signature, not just the status number.
- Search for structured fields such as status: 405 instead of bare 405.
- Separate access logs from application exception logs.
- Compare method, path, status, remote address, and user-agent together.
- Verify that the captured log window actually overlaps the alert window.
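The gap between a bare search and a structured one is easy to measure on a single log file. A minimal sketch, assuming the status is logged as "status: <code>"; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count lines that contain the digits 405 anywhere: ports, durations, IDs.
# Pass a single file so grep -c prints a bare number.
count_loose_405() {
  grep -c '405' "$1"
}

# Count lines whose structured status field is exactly 405.
count_status_405() {
  grep -Ec 'status: 405([^0-9]|$)' "$1"
}
```

If the two numbers differ by much, the loose count was mostly measuring ports, durations, and IDs. Note that grep -c exits non-zero when the count is 0, which matters under set -e.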
In this case, the useful conclusion was not "we found the caller." It was "this bundle does not contain the reported storm, so we need the exact alert source, query, time window, and raw sample line." That is still progress: it keeps the investigation from building a root cause on data that isn't in the evidence.
The access log format had the fields needed for a better search:
ACCESS: StartTime: <ns>, RemoteAddr: <ip:port>, Method: <method>, Path: <path>, status: <code>, ResponseSize: <bytes>, RequestDuration: <ms>, RequestId: <id>, ClientId: <id>, UserAgent: <ua>, ...
So instead of counting loose 405 strings, I grouped only structured 4xx access entries by method,
path, and status:
for pod in /path/to/extracted/pods/*; do
  [ -d "$pod" ] || continue
  echo
  echo "== $(basename "$pod") =="
  # Count only structured ACCESS fields: method, path, and a 4xx status.
  awk '
    match($0, /Method: [^,]+, Path: [^,]+, status: 4[0-9][0-9]/) {
      s = substr($0, RSTART, RLENGTH)
      gsub(/^Method: /, "", s)
      gsub(/, Path: /, "\t", s)
      gsub(/, status: /, "\t", s)
      count[s]++
    }
    END {
      # One line per combination: count, method, path, status.
      for (k in count) print count[k] "\t" k
    }
  ' "$pod"/access.log* 2>/dev/null | sort -nr
done
That script answers a narrower question: "which HTTP method, endpoint, and 4xx status combinations
are actually present in these access logs?" It also avoids the trap where 405 appears inside a
timestamp, request duration, port, or ID and looks like an HTTP status when it is not.
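The same access-log format also supports the window check from the list above: read the earliest and latest StartTime in the bundle and compare them with the alert window. This sketch assumes StartTime is a Unix epoch in nanoseconds, which the format only hints at with <ns>, and that the alert window is known in epoch seconds. log_window and window_overlaps are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the earliest and latest StartTime in the given files as epoch seconds.
# Assumes StartTime is Unix epoch nanoseconds; the last 9 digits are dropped
# as a string so awk never has to hold a 19-digit number.
log_window() {
  awk '
    match($0, /StartTime: [0-9]+/) {
      ns = substr($0, RSTART + 11, RLENGTH - 11)
      if (length(ns) <= 9) next
      s = substr(ns, 1, length(ns) - 9)
      if (min == "" || s + 0 < min + 0) min = s
      if (max == "" || s + 0 > max + 0) max = s
    }
    END { if (min != "") print min, max }
  ' "$@"
}

# Succeed if two closed intervals overlap.
# $1 $2 = log start and end, $3 $4 = alert start and end (epoch seconds).
window_overlaps() {
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

If window_overlaps fails for a bundle, the right next step is to ask for different logs, not to keep searching these.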
Health checks remain a classic source of phantom errors. Pod probes, cloud load-balancer probes, gateway probes, and third-party internal load balancers can all be configured in different places. When a service exposes multiple ports, every one of those components may have its own idea of the right protocol, port, and path.