This is the third entry in “The Migration Tax” series. You can read the previous entry here.
Most teams “intend” to do fixity. Then the backlog hits, the queue backs up, and checksums get demoted to “we’ll validate later”—which is like promising to check your parachute after you jump. Fixity isn’t a task you sprinkle on top; it’s core metadata that must travel with the object from its first breath to its centennial birthday. If you treat checksums like a sidecar, they’ll fall off the motorcycle the moment you hit a pothole (tape recall, POSIX copy, S3 upload, random “smart” tool that “helpfully” rewrites headers…).
Let’s make fixity first-class and stop role-playing “Schrödinger’s Archive.”
What “fixity first” actually means (and why you care)
Fixity = a verifiable claim that the content you have now is bit-for-bit the same as the content you had then. You prove that by calculating a checksum (hash) and carrying that value forward in a place that’s hard to lose and easy to read.
- If you don’t capture the checksum at first contact, everything you do afterward is based on vibes.
- If you calculate checksums over and over at each hop, you’re wasting CPU and—ironically—introducing new failure windows.
- If you stash checksums in a random CSV “for later,” Future You will hate Present You.
Principle: Compute once, carry always, verify cheaply, rehash only when necessary.
Hash families: “good enough” vs. cryptographic (and when to use which)
You don’t need SHA-512 to detect bit rot (silent corruption). A fast, non-cryptographic checksum like xxHash64 or CRC32C has an astronomically low chance of missing a random media flip. Use fast hashes to continuously guard living data. Use cryptographic hashes (SHA-256/512) whenever you’re:
- Publishing or exchanging data outside your boundary.
- Providing cross-system assurance between unlike stacks (filesystems ↔ object stores ↔ tape).
- Working in audited contexts where you don’t want to debate collision theory on a conference call.
Pattern that works:
- At ingress/staging: compute SHA-256 and xxHash64.
- Store both in metadata.
- Use xxHash64 for frequent, cheap guardrails.
- Use SHA-256 at boundaries, recalls, and audits.
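Here’s a minimal sketch of that “compute once” step, assuming Linux xattrs and the third-party xxhash package; the path and chunk size are illustrative:
import datetime
import hashlib
import os
import xxhash  # third-party: pip install xxhash

def ingest_hashes(path, chunk=8 * 1024 * 1024):
    """Read the file once, computing SHA-256 and xxHash64 in the same pass."""
    sha, xx = hashlib.sha256(), xxhash.xxh64()
    with open(path, 'rb') as f:
        while block := f.read(chunk):
            sha.update(block)
            xx.update(block)
    # Carry both values with the file so downstream hops never recompute blindly
    os.setxattr(path, 'user.hash.sha256', sha.hexdigest().encode())
    os.setxattr(path, 'user.hash.xx64', xx.hexdigest().encode())
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec='seconds')
    os.setxattr(path, 'user.fixity.ts', ts.encode())
    return sha.hexdigest(), xx.hexdigest()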
Where to put the checksum so it doesn’t die
Short answer: in the object itself, as metadata—not just in a side database. Do both.
POSIX & clustered filesystems
All of these support extended attributes (xattrs) in the user.* namespace, which you can read/write with getfattr/setfattr or your language’s xattr library:
- IBM Spectrum Scale (GPFS): xattrs supported; also ILM policies can copy/migrate attributes. Use keys like:
user.hash.sha256 = hex
user.hash.xx64 = hex
user.fixity.ts = ISO 8601
user.fixity.src = "tape://<barcode>/path"
- Lustre: xattrs supported across MDT/OST; same user.* convention.
- BeeGFS: xattrs supported; beware of path-based operations via gateways—verify attribute preservation on toolchains.
- CephFS: POSIX-like xattrs; can flow through snapshots and replication—test your mds/journal behavior.
- ScoutFS: supports xattrs; many teams already use user.* for lineage/provenance—piggyback cleanly.
- ZFS call-out: ZFS already has end-to-end block checksums (Fletcher, SHA-256). Keep using that; still store an application-level SHA-256 in xattrs for cross-system moves (object, tape, cloud).
Example (Linux):
# Compute once at first touch (illustrative)
sha256sum file.bin | awk '{print $1}' | xargs -I{} setfattr -n user.hash.sha256 -v {} file.bin
xxhsum -H1 file.bin | awk '{print $1}' | xargs -I{} setfattr -n user.hash.xx64 -v {} file.bin   # -H1 = XXH64
date -Iseconds | xargs -I{} setfattr -n user.fixity.ts -v {} file.bin
Object storage (S3-compatible, including Ceph RGW, MinIO, cloud)
- Object Metadata: x-amz-meta-hash-sha256: <hex>, x-amz-meta-hash-xx64: <hex>, x-amz-meta-fixity-ts: 2025-09-26T12:34:56Z.
- Tags: use short keys if you’ll filter/search: HashValid=True, SHA256Valid=True.
- ETag Reality Check: for multipart uploads, the ETag is not an MD5 of the content, so never treat it as a checksum; record the MPU part size and part count so you can recompute a synthetic ETag for audits.
Tape (LTFS or managed by HSM)
- Manifests next to payloads (BagIt-style) with SHA-256 lines.
- If you encapsulate small files in container files (tar, dar, zip), store the container’s hash plus per-member hashes in a manifest that travels with the tape and is mirrored in your DB.
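For a concrete picture, a BagIt-style manifest-sha256.txt is just one line per payload file: hash, whitespace, relative path (the filenames below are hypothetical placeholders):
<sha256-hex>  data/reel_0001.mov
<sha256-hex>  data/reel_0001_captions.srt
Record the container’s own SHA-256 in the same manifest set (and in your DB) so you can verify the wrapper before you bother unpacking members.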
End-to-end flow (your visual)
[Tape / Source]
    │ stage
    ▼
[Staging FS] --compute→ (xxHash64 + SHA-256) --store→ xattrs/user.* --log→ DB
    │ verify-current (compare to manifest/source metadata)
    ▼
[Hash-compare gate]
    ├─pass→ [Object Write] --write→ x-amz-meta-hash-* --record→ MPU partsize/count
    └─fail→ [Quarantine] (mismatch: retry/re-read/second-copy)

[Post-Write Verify] --read-sample/full→ rehash → compare(xattrs/SHA-256)
    ├─> [Mark HashValid=True] (tags/DB)
    └─> [Escalate] (if mismatch; see triage below)
Pin this above your desk. If your pipeline skips boxes, it’s not a pipeline; it’s a rumor.
Eliminating redundant hashing (and saving CPUs for real work)
Redundant hashing happens when each hop distrusts the last hop but forgets the hash already exists. Don’t re-hash because you lost the value; re-hash because your policy told you to verify.
Rules of engagement:
- Propagate SHA-256 and xxHash64 forward as metadata (xattrs → object metadata).
- When writing to object storage, trust but verify: compare the destination readback hash to the source xattr before deleting the stage file (see the sketch after this list).
- On recall, compute the hash while streaming and compare to the stored value on the fly. Don’t stage twice.
- Sample where safe, full-verify where required. (E.g., 100% for preservation masters, 1–5% for derivatives, plus rolling spot checks.)
Result: one heavy compute at first contact, lightweight comparisons thereafter.
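A sketch of that “verify before delete” gate, assuming the source hash lives in user.hash.sha256 on the staged file and the destination is S3-compatible; bucket, key, and path arguments are illustrative:
import hashlib
import os
import boto3

s3 = boto3.client('s3')

def verify_then_release(stage_path, bucket, key, chunk=8 * 1024 * 1024):
    """Re-read the destination object, hash the stream, and compare to the source xattr
    before the staged copy is allowed to go away."""
    src_sha = os.getxattr(stage_path, 'user.hash.sha256').decode()
    body = s3.get_object(Bucket=bucket, Key=key)['Body']
    sha = hashlib.sha256()
    for block in iter(lambda: body.read(chunk), b''):
        sha.update(block)
    if sha.hexdigest() != src_sha:
        return False          # mismatch: keep the stage file and start the triage runbook
    os.remove(stage_path)     # destination proved itself; release the staged copy
    return True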
Validating at recall (because “we wrote it once” is not a warranty)
Every recall is a chance to catch silent corruption early—from tape, object bit-flips, or that one node with a grudge.
Recall flow:
- Read → stream hash (xxHash64 for speed).
- Compare to stored xxHash64. If mismatch, retry from alternate path/sibling drive.
- If mismatch persists, compute SHA-256 to rule out hash-family artifacts.
- If still off, this is a data incident. Quarantine the asset and raise a mismatch incident with its provenance.
Why stream hashing? Because hashing after you write again is two I/Os and a lie.
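A minimal recall-verify sketch, assuming the stored value sits in user.hash.xx64 on the recalled path and the xxhash package is available; in a real pipeline you’d fold the hash into the recall stream itself rather than re-reading afterward:
import os
import xxhash  # third-party: pip install xxhash

def verify_on_recall(path, chunk=8 * 1024 * 1024):
    """Hash the recalled bytes in one pass and compare to the stored xxHash64."""
    stored = os.getxattr(path, 'user.hash.xx64').decode()
    xx = xxhash.xxh64()
    with open(path, 'rb') as f:
        while block := f.read(chunk):
            xx.update(block)
    if xx.hexdigest() != stored:
        raise RuntimeError(f'fixity mismatch on recall: {path}')  # hand off to the triage runbook
    return True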
Triage for mismatches (don’t panic; don’t hand-wave)
When stored_hash != computed_hash:
- Retry the read (I/O can lie temporarily).
- Recompute with the other family (xxHash64 vs. SHA-256) to rule out bugs.
- Check transforms: did anything between hash points legitimately change the bytes (re-chunking, compression, encryption, a tool that rewrites headers)?
- Consult provenance: last known-good SHA-256, size, and timestamp; if you have multiple independent copies (tape A, tape B, cloud), compare all three.
- Decide: restore from alt copy, re-ingest, or mark unrestorable (and stop pretending otherwise).
Incident math to keep you honest:
MismatchRate = mismatches / verified_in_window
Alarm if MismatchRate > 0.01% over 100k assets / 24h
VerificationDebt = objects_written - objects_verified
If VerificationDebt grows for a week, you’re running on hope.
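Both counters are cheap to compute from whatever event log you already keep; a sketch with the thresholds above baked in (function and argument names are illustrative):
def fixity_health(mismatches, verified, written):
    mismatch_rate = mismatches / verified if verified else 0.0
    verification_debt = written - verified
    alarm = verified >= 100_000 and mismatch_rate > 0.0001   # 0.01% over the window
    return mismatch_rate, verification_debt, alarm

# Example: 15 mismatches out of 120,000 verifications, 150,000 writes in the window
print(fixity_health(15, 120_000, 150_000))   # rate 0.0125% -> alarm fires; debt = 30,000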
Filesystem-specific notes (so your junior can implement this by lunch)
Spectrum Scale (GPFS)
- Use setfattr/getfattr for xattrs.
- Policy engine (ILM) can copy files; verify your migration rules preserve xattrs.
- Consider a small SQLite/Postgres table keyed by inode+fsid or path UUID to mirror fixity events for fast reporting.
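A minimal sketch of that mirror table (SQLite here for brevity; the schema and column names are assumptions, not anything GPFS-specific):
import sqlite3

con = sqlite3.connect('fixity.db')
con.execute("""
CREATE TABLE IF NOT EXISTS fixity_events (
    fsid    TEXT NOT NULL,      -- filesystem UUID
    inode   INTEGER NOT NULL,   -- or a stable path UUID
    path    TEXT,
    sha256  TEXT,
    xx64    TEXT,
    event   TEXT,               -- ingest | verify | recall | mismatch
    ts      TEXT,               -- ISO 8601
    PRIMARY KEY (fsid, inode, ts)
)""")
con.commit()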
Lustre
- xattrs survive across OSTs; verify your lfs copy/mirror commands preserve them.
- Be mindful of striping: align your hasher concurrency to stripe count to avoid cache thrash.
BeeGFS
- xattrs supported; beware of proxy/gateway tools (rsync with -X/-A as needed).
- Use beegfs-ctl --getentryinfo alongside xattrs to correlate location vs. fixity events.
CephFS
- xattrs travel with snapshots; replicate across MDS.
- For object (RGW): use both metadata (x-amz-meta-*) and tags (for cheap filters). Store MPU part size & part count for synthetic ETag recompute.
ScoutFS
- Great place to store lineage (user.fixity.src, user.hash.*).
- Pair with a tiny event log (append-only) so you can answer “when did we last prove Copy-3 exists for this asset?” without scraping the universe.
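That event log can be as dumb as an append-only JSON-lines file; a sketch with illustrative field names:
import datetime
import json

def log_fixity_event(logfile, asset_id, event, sha256_hex, copy_id):
    """Append one immutable line per fixity event: never update, only add."""
    record = {
        'asset': asset_id,
        'event': event,      # e.g. 'ingest', 'verify', 'recall', 'mismatch'
        'sha256': sha256_hex,
        'copy': copy_id,     # e.g. 'tape-A', 'tape-B', 'cloud'
        'ts': datetime.datetime.now(datetime.timezone.utc).isoformat(timespec='seconds'),
    }
    with open(logfile, 'a') as f:
        f.write(json.dumps(record) + '\n')

Answering “when did we last prove Copy-3 exists?” is then a scan over one file, not a crawl of the namespace.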
ZFS
- Keep ZFS checksums on (obviously). Still store user.hash.sha256 for cross-system transfers (object/cloud).
Everything POSIX
- Standardize keys:
user.hash.sha256 = 64-hex
user.hash.xx64 = 16-hex
user.fixity.ts = ISO 8601
user.fixity.family = sha256,xx64
user.mpu.partsize = bytes (if relevant)
user.mpu.partcount = int
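A small helper to read those keys back consistently, assuming Linux and Python’s os.getxattr; keys missing on a given file are simply skipped:
import os

FIXITY_KEYS = (
    'user.hash.sha256', 'user.hash.xx64',
    'user.fixity.ts', 'user.fixity.family',
    'user.mpu.partsize', 'user.mpu.partcount',
)

def read_fixity(path):
    """Return whichever standardized fixity xattrs are present on the file."""
    out = {}
    for key in FIXITY_KEYS:
        try:
            out[key] = os.getxattr(path, key).decode()
        except OSError:      # key not set on this file
            pass
    return out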
Everything S3-ish (reality check): user metadata is (mostly) immutable post-upload
- Object metadata (your x-amz-meta-* keys) must be set at PUT time (or at InitiateMultipartUpload for MPUs). You can’t patch metadata in place later. The only way to add/modify it after the fact is a self-copy with MetadataDirective='REPLACE', which creates a new version of the object.
- Tags are different: you can add/modify after upload (PutObjectTagging) without rewriting the object.
- ETag on MPU still isn’t a content MD5. Backfilling x-amz-meta-mpu-partsize/x-amz-meta-mpu-parts later requires a copy anyway; otherwise your “synthetic ETag” audit won’t work.
- Checksums (AWS system feature): you can now send/validate x-amz-checksum-sha256/-sha1/-crc32/-crc32c at upload. Those are system checksums, distinct from your user metadata (keep both if you care about portability across S3-alikes).
“So doing it today isn’t much help?” — exactly. Here’s the practical split:
Next time (correct-by-construction)
- On PUT or InitiateMultipartUpload, set: x-amz-meta-hash-sha256, x-amz-meta-hash-xx64, x-amz-meta-fixity-ts, x-amz-meta-mpu-partsize, x-amz-meta-mpu-parts (MPU: part info known at completion; record it), plus tags HashValid=True, SHA256Valid=True, Provenance=GPFS|TapeA:BARCODE.
- Optionally include AWS checksums (x-amz-checksum-sha256, etc.) so S3 validates the wire.
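A sketch of that correct-by-construction PUT (single-part; bucket, key, and hash values are assumed to come from your staging step, and ChecksumAlgorithm needs a reasonably recent boto3):
import datetime
import boto3

s3 = boto3.client('s3')

def put_with_fixity(bucket, key, stage_path, sha256_hex, xx64_hex):
    """Single-part PUT that carries fixity metadata, tags, and an S3 wire checksum."""
    with open(stage_path, 'rb') as f:
        s3.put_object(
            Bucket=bucket, Key=key, Body=f,
            Metadata={                      # boto3 adds the x-amz-meta- prefix for you
                'hash-sha256': sha256_hex,
                'hash-xx64': xx64_hex,
                'fixity-ts': datetime.datetime.utcnow().isoformat(timespec='seconds') + 'Z',
            },
            Tagging='HashValid=True&SHA256Valid=True&Provenance=GPFS',
            ChecksumAlgorithm='SHA256',     # asks S3 to validate the upload on the wire
        )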
Today (backfill plan for existing objects)
- Discover gaps: use S3 Inventory or a bucket listing to find objects missing x-amz-meta-hash-sha256 (and friends).
- Compute hashes without re-downloading if you can: prefer your on-prem xattrs/manifests; otherwise stream from S3 with range GETs in a controlled lane.
- Self-copy to write metadata (creates a new version; charges requests; requires restore for Glacier/Deep Archive tiers; respects Object Lock); see the boto3 pattern below.
- Write/merge tags via PutObjectTagging (safe post-upload).
- Versioning/Lock: if Object Lock (Compliance) is active and retention unexpired, you cannot replace metadata (copy will be blocked). Plan windows or use legal holds appropriately.
- KMS/SSE-C: include the source encryption headers on COPY; otherwise you’ll get 400s and a headache.
- Audit: after backfill, HEAD a sample to confirm headers/tags (see the HEAD pattern below); keep a “BackfillDebt = total − updated” counter until it hits zero.
Minimal boto3 patterns (drop-in)
Backfill metadata via self-copy (single-part)
import boto3, datetime
s3 = boto3.client('s3')
def backfill_metadata(bucket, key, sha256_hex, xx64_hex, partsize=None, parts=None):
    meta = {
        'hash-sha256': sha256_hex,
        'hash-xx64': xx64_hex,
        'fixity-ts': datetime.datetime.utcnow().isoformat(timespec='seconds') + 'Z'
    }
    if partsize: meta['mpu-partsize'] = str(partsize)
    if parts: meta['mpu-parts'] = str(parts)
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={'Bucket': bucket, 'Key': key},
        Metadata=meta,                    # bare keys; boto3 adds the x-amz-meta- prefix
        MetadataDirective='REPLACE'
    )
Tag (or re-tag) after upload
def set_tags(bucket, key, tags):
    s3.put_object_tagging(
        Bucket=bucket, Key=key,
        Tagging={'TagSet': [{'Key': k, 'Value': v} for k, v in tags.items()]}
    )

# Example:
set_tags('my-bucket', 'path/obj', {
    'HashValid': 'True', 'SHA256Valid': 'True', 'Provenance': 'GPFS'
})
Multipart copy skeleton (very large objects)
def multipart_copy_replace_metadata(bucket, key, meta, part_size_bytes=128*1024*1024):
    # bare keys; boto3 adds the x-amz-meta- prefix
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key, Metadata=meta)
    upload_id = mpu['UploadId']
    size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
    parts = []
    for i, start in enumerate(range(0, size, part_size_bytes), start=1):
        rng = f'bytes={start}-{min(start + part_size_bytes, size) - 1}'
        resp = s3.upload_part_copy(
            Bucket=bucket, Key=key, UploadId=upload_id, PartNumber=i,
            CopySource={'Bucket': bucket, 'Key': key}, CopySourceRange=rng)
        parts.append({'PartNumber': i, 'ETag': resp['CopyPartResult']['ETag']})
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={'Parts': parts})
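Audit a backfilled sample via HEAD
A sketch of the audit step from the backfill plan above; the keys list is illustrative and the return value feeds your BackfillDebt counter:
def audit_backfill(bucket, keys):
    """HEAD each sampled key and count objects still missing the fixity metadata."""
    missing = 0
    for key in keys:
        head = s3.head_object(Bucket=bucket, Key=key)
        if 'hash-sha256' not in head.get('Metadata', {}):   # boto3 strips x-amz-meta-
            missing += 1
    return missing   # still-missing objects contribute to BackfillDebt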
TL;DR you can paste into your doc
- Set user metadata at upload; you cannot patch it later without a COPY (REPLACE) that makes a new version.
- Tags are patchable post-upload; metadata isn’t.
- MPU ETag ≠ content MD5 — store mpu-partsize & mpu-parts if you ever want synthetic ETag audits.
- Backfill = rewrite (request charges, version churn, restore first if in cold tiers). Plan a batch operation (S3 Batch + Lambda/Step Functions) and track BackfillDebt to zero.
There you go—you’ve got the “do it right next time” and the “we didn’t, now what?” playbooks in one place.
“But isn’t hashing expensive?” (only if you do it wrong)
- Compute once at first touch. You’re already reading the bytes—fold the hash into that I/O.
- Reuse the value—propagate via xattrs/metadata.
- Verify smartly—streaming comparison on recall, full verify on high-value classes, rolling samples elsewhere.
- Budget CPU—pin hashers to cores, align with stripes/placement, and cap concurrency so caches don’t fight you.
- Offload—BLAKE3 or GPU hashing can help, but fix your I/O path first. Hashers starved on I/O measure your patience, not your integrity.
“Junior admin” summary (show this during onboarding)
- Every file/object gets two hashes at first touch: SHA-256 (portable proof) and xxHash64 (fast guardrail).
- We store them in xattrs (POSIX) and metadata/tags (S3).
- We never delete source until the destination proves it matches.
- On recall, we stream-hash and compare before we do anything heroic.
- Mismatches follow a runbook (retry → second copy → alt hash → quarantine → incident).
- Our dashboards track MismatchRate and VerificationDebt; if either budges, we care.
A little snark to keep you awake
If your fixity plan is “the storage vendor said they do checksums,” that’s adorable. When auditors ask your system for proof, try replying, “Trust me, bro.” See how that plays in production.
So what — your next five moves
- Standardize keys (user.hash.*, x-amz-meta-hash-*, HashValid tag). Write it down.
- Instrument the pipeline to compute at first touch and propagate forward.
- Add recall-verify gates (stream hash on read; compare; quarantine on mismatch).
- Publish SLOs for verification windows and mismatch alarms.
- Ship the dashboard (MismatchRate, VerificationDebt, Verify throughput, Quarantine queue age).
Do this and your “digital preservation” stops being a poster and becomes a habit your systems can prove—without a séance.
