Validating ASTM E1394 Instrument Output with Python

Problem Statement

ASTM E1394 is a framed, checksum-protected wire format, and the failure mode this page solves is a specimen result that reaches the LIMS structurally intact but silently wrong — a truncated frame that reassembles into a plausible number, a checksum that was never verified, or an R-record field that lands in the wrong OBX because the parser trusted the delimiter count instead of enforcing it. The correct build treats the ingestion boundary as a control point: reconstruct multi-frame transmissions deterministically, verify the modulo-256 checksum before any field is read, coerce the record into a typed model, and route every failure to a quarantine with an auditable reason — so a serial baud mismatch or a NAK storm can never masquerade as a clean result.

This is the ASTM-specific implementation of the stage specified in Schema Validation & Error Handling; it inherits that stage’s ingress/egress contract and its “quarantine, never guess” invariant, and adds only the framing and checksum logic that the ASTM E1394 protocol demands. It sits inside the broader Instrument Data Ingestion & HL7/CSV Pipelines tier, one layer above durable commit.

Prerequisites

Runtime: Python 3.11+ (datetime.UTC was added in 3.11; the X | None union syntax needs 3.10+).
Libraries: pydantic>=2.6 for typed record models; pytest for the verification suite. The framing, checksum, and orchestration code uses only the standard library (asyncio, collections, dataclasses, datetime, logging, re, uuid).
Instrument firmware: analyzers configured for the ASTM E1394 low-level protocol (ENQ/ACK line discipline) over RS-232 or a TCP bridge. Confirm the frame checksum mode matches the standard’s modulo-256 sum; a handful of legacy firmware builds emit a non-standard XOR checksum and must be normalized at the serial/FTP polling architecture boundary before this stage sees them.
Regulatory baseline: the laboratory has documented systems for ensuring results are accurately and reliably sent from the instrument interface to the final report (a CLIA requirement under 42 CFR §493.1291(a)) and an append-only audit store for ingestion events.
Upstream contract: raw byte frames arrive already captured and staged with an immutable tracking identifier by the polling layer. This stage never opens a socket itself; it validates bytes handed to it.

Step-by-Step Implementation

Step 1: Reconstruct the frame and verify the ASTM checksum

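An ASTM frame opens with <STX> (0x02) and a single frame-number digit (0 to 7, counting modulo 8), carries the record text, and closes with <ETX> (0x03) on the last frame of a record or <ETB> (0x17) on an intermediate frame. Two hexadecimal checksum characters follow, then <CR><LF>. The checksum is the 8-bit modular sum of every byte from the frame-number character through <ETX> or <ETB> inclusive. (This low-level framing is specified in the companion standard ASTM E1381; E1394 defines the records carried inside the frames.) Verify the checksum before reading any field: a corrupted frame will never become well-formed by re-parsing it. Raise explicit, coded exceptions so the orchestration layer can route each failure class deterministically.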

python

from __future__ import annotations

import logging
import re
from datetime import datetime, UTC

logger = logging.getLogger("astm_validator")


class ASTMValidationError(Exception):
    """Structural or semantic ASTM frame failure, carrying a routable code."""

    def __init__(self, code: str, message: str, frame_id: str | None = None):
        self.code = code
        self.message = message
        self.frame_id = frame_id
        super().__init__(f"[{code}] {message}")


def astm_checksum(payload: bytes) -> str:
    """ASTM low-level (E1381) checksum: modulo-256 sum of the payload, two upper-hex digits."""
    return f"{sum(payload) & 0xFF:02X}"


def reconstruct_frame(raw_frame: bytes, frame_id: str) -> bytes:
    """Strip line noise, verify STX/terminator boundaries and checksum, return the text bytes.

    Layout: <STX> <frame_no> <text...> <ETX|ETB> <CS_hi> <CS_lo> <CR> <LF>
    The checksum covers <frame_no> through <ETX>/<ETB> inclusive. ETB ends an
    intermediate frame whose record continues in the next frame; ETX ends the last.
    """
    content = raw_frame.strip(b"\r\n")
    if len(content) < 5:  # STX + frame_no + terminator + 2 checksum chars
        raise ASTMValidationError("FRAME_TRUNCATED", "Too few bytes for checksum extraction", frame_id)
    if content[0:1] != b"\x02":
        raise ASTMValidationError("MISSING_STX", "Frame does not begin with STX (0x02)", frame_id)

    # Record text may not contain ETX or ETB, so the first one found is the terminator.
    ends = [pos for pos in (content.find(b"\x03"), content.find(b"\x17")) if pos != -1]
    if not ends:
        raise ASTMValidationError("MISSING_ETX", "Neither ETX (0x03) nor ETB (0x17) found in frame", frame_id)
    term_pos = min(ends)
    if len(content) < term_pos + 3:
        raise ASTMValidationError("FRAME_TRUNCATED", "Checksum characters missing after terminator", frame_id)

    cs_payload = content[1:term_pos + 1]                      # frame_no .. ETX/ETB inclusive
    expected = content[term_pos + 1:term_pos + 3].decode("ascii", errors="replace").upper()
    computed = astm_checksum(cs_payload)
    if computed != expected:
        raise ASTMValidationError(
            "CHECKSUM_MISMATCH", f"computed={computed}, expected={expected}", frame_id
        )
    return content[2:term_pos]                                # text bytes; frame_no and terminator dropped

The strict length guard and the exclusive/inclusive boundary arithmetic are what keep reassembly deterministic: the same bytes always yield the same verdict, which is what makes replay from the quarantine safe and lets an auditor reconstruct months later exactly why a frame was accepted or rejected.
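A worked example makes the boundary arithmetic concrete. The sketch below wraps the standard E1394 header record text `H|\^&`, plus the <CR> that ends every record, in a single frame. `build_frame` is a hypothetical helper for illustration, not part of the pipeline above, and it assumes the frame layout described in Step 1.

```python
STX, ETX, ETB, CR, LF = b"\x02", b"\x03", b"\x17", b"\r", b"\n"


def astm_checksum(payload: bytes) -> str:
    """Modulo-256 sum of the payload as two upper-case hex digits."""
    return f"{sum(payload) & 0xFF:02X}"


def build_frame(frame_no: int, text: bytes, last: bool = True) -> bytes:
    """Hypothetical helper: wrap record text as STX FN text ETX|ETB C1 C2 CR LF."""
    fn = str(frame_no % 8).encode("ascii")          # frame numbers count modulo 8
    covered = fn + text + (ETX if last else ETB)    # checksum range: FN through terminator
    return STX + covered + astm_checksum(covered).encode("ascii") + CR + LF


# Header record text "H|\^&" plus the CR that ends every record.
header_frame = build_frame(1, b"H|\\^&\r")
# Hand check: 0x31+0x48+0x7C+0x5C+0x5E+0x26+0x0D+0x03 = 485, and 485 mod 256 = 229 = 0xE5,
# so header_frame == b"\x021H|\\^&\r\x03E5\r\n"
```

Including <STX> in the sum, a common porting mistake, gives E7 instead of E5, which is exactly the off-by-one that the strict slice in reconstruct_frame rules out.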

Step 2: Enforce the R-record schema with Pydantic v2

A verified checksum proves the bytes are intact, not that they mean what the LIMS expects. Split the |-delimited R-record and coerce it into a frozen pydantic model so field constraints — identifier format, unit presence, result shape — are declarative data rather than scattered if branches. A ValidationError here is a semantic failure, distinct from the transport failures of Step 1, and must carry its own code.

python

from pydantic import BaseModel, Field, ValidationError, field_validator

# Accepts either an abnormal-flag token (L/H/N/A/U, optionally prefixed by < or >)
# or a plain signed decimal number such as 140, 5.1 or -2.
_FLAG_RE = re.compile(r"[<>]?[LHNAU]+|[+-]?\d+(?:\.\d+)?")


class ASTMResult(BaseModel):
    model_config = {"frozen": True}

    sequence: str = Field(pattern=r"^\d{1,3}$")
    test_code: str = Field(min_length=1, max_length=15)   # single-letter codes such as K are valid
    result_value: str
    units: str = Field(min_length=1)          # a result without a unit is uninterpretable
    reference_range: str | None = None
    observed_at: datetime                     # ingestion time, not the instrument's test-completed time

    @field_validator("result_value")
    @classmethod
    def numeric_or_flag(cls, v: str) -> str:
        if not _FLAG_RE.fullmatch(v):
            raise ValueError(f"result must be numeric or an ASTM flag (L, H, >H, N): {v!r}")
        return v


def parse_r_record(data: bytes, frame_id: str) -> ASTMResult:
    """Map an R-record's pipe-delimited fields onto the canonical result model."""
    fields = data.decode("utf-8", errors="replace").split("|")
    if not fields or fields[0].strip() != "R":
        raise ASTMValidationError("NOT_A_RESULT_RECORD", f"expected R-record, got {fields[:1]}", frame_id)
    if len(fields) < 5:
        raise ASTMValidationError("MALFORMED_RECORD", "insufficient pipe-delimited fields", frame_id)
    try:
        return ASTMResult(
            sequence=fields[1].strip(),
            test_code=fields[2].strip(),
            result_value=fields[3].strip(),
            units=fields[4].strip(),
            reference_range=fields[5].strip() if len(fields) > 5 and fields[5].strip() else None,
            observed_at=datetime.now(UTC),
        )
    except ValidationError as ve:
        raise ASTMValidationError("SCHEMA_VIOLATION", str(ve), frame_id) from ve

The patient identifier lives in the E1394 P-record, not the R-record, so this model deliberately does not invent one. Result-to-patient linkage is resolved from the enclosing message's P and O records by the Schema Validation & Error Handling layer before commit, which keeps this function's contract to a single record type.

Step 3: Route failures through a quarantine with an immutable audit trail

Validation is only trustworthy if every outcome, pass or fail, is recorded and every failure is recoverable. Wrap the two steps above in an async orchestrator that emits an AuditRecord for each frame, quarantines failures with the original hex so they can be replayed, and exposes an emergency pause that an operator or an automated circuit breaker can trigger when a run degrades. process_frame is declared async so it composes with the rest of the pipeline, but the parse itself is short, synchronous CPU work; move it to asyncio.to_thread only if profiling shows it stalling the event loop.

python

import asyncio
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)                       # frozen: a decision cannot be edited after it is recorded
class AuditRecord:
    frame_id: str
    decided_at: datetime
    status: str                       # VALIDATED | QUARANTINED
    error_code: str | None = None
    destination: str = "PENDING"


class ASTMIngestionPipeline:
    def __init__(self, max_concurrency: int = 10, quarantine_size: int = 1000):
        self.queue: asyncio.Queue[bytes] = asyncio.Queue()
        # In-memory staging only: bounded deques evict their oldest entries when full,
        # so flush both to the append-only audit store rather than treating them as the record.
        self.audit_trail: deque[AuditRecord] = deque(maxlen=10_000)
        self.quarantine: deque[dict] = deque(maxlen=quarantine_size)
        self.max_concurrency = max_concurrency
        self._running = asyncio.Event()       # set = processing, cleared = paused
        self._running.set()

    def emergency_pause(self, reason: str) -> None:
        self._running.clear()
        logger.critical("EMERGENCY_PAUSE: %s", reason)

    def resume(self) -> None:
        self._running.set()
        logger.info("PIPELINE_RESUMED")

    def _quarantine(self, frame_id: str, raw_frame: bytes, code: str, detail: str) -> None:
        now = datetime.now(UTC)
        self.audit_trail.append(AuditRecord(frame_id, now, "QUARANTINED", code, destination="QUARANTINE"))
        self.quarantine.append({
            "frame_id": frame_id, "error_code": code, "detail": detail,
            "raw_hex": raw_frame.hex(), "at": now.isoformat(),   # hex keeps the exact bytes for replay
        })

    async def process_frame(self, raw_frame: bytes) -> ASTMResult | None:
        frame_id = uuid.uuid4().hex                 # full 128-bit ID: truncated IDs can collide
        try:
            data = reconstruct_frame(raw_frame, frame_id)
            result = parse_r_record(data, frame_id)
        except ASTMValidationError as e:
            self._quarantine(frame_id, raw_frame, e.code, e.message)
            return None
        except Exception as e:                      # unexpected defect: quarantine instead of killing the worker
            logger.exception("UNEXPECTED_ERROR frame_id=%s", frame_id)
            self._quarantine(frame_id, raw_frame, "INTERNAL_ERROR", repr(e))
            return None
        self.audit_trail.append(AuditRecord(frame_id, datetime.now(UTC), "VALIDATED", destination="LIMS_COMMIT"))
        return result

    async def worker(self) -> None:
        while True:
            raw_frame = await self.queue.get()
            try:
                await self._running.wait()          # while paused, hold the frame rather than drop it
                await self.process_frame(raw_frame)
            finally:
                self.queue.task_done()

    async def start(self) -> None:
        await asyncio.gather(*(self.worker() for _ in range(self.max_concurrency)))

A validated ASTMResult is handed to the async batch processing workers, which own throughput, idempotency, and the transactional LIMS commit; this stage owns only structural truth and never mutates a clinical value.
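The pipeline exposes emergency_pause but leaves the decision to trip it to the caller. One way to automate that is a sliding-window breaker over recent outcomes, sketched below. `FailureRateBreaker` is a hypothetical helper, and its window, threshold, and minimum sample size are assumptions to tune per analyzer, not values from any standard.

```python
from collections import deque
from collections.abc import Callable


class FailureRateBreaker:
    """Trip once the quarantine rate over the last `window` frames exceeds `threshold`."""

    def __init__(self, on_trip: Callable[[str], None], window: int = 50,
                 threshold: float = 0.2, min_samples: int = 20):
        self._outcomes: deque[bool] = deque(maxlen=window)   # True = frame was quarantined
        self._on_trip = on_trip
        self._threshold = threshold
        self._min_samples = min_samples
        self.tripped = False

    def record(self, quarantined: bool) -> None:
        self._outcomes.append(quarantined)
        if self.tripped or len(self._outcomes) < self._min_samples:
            return                                           # too little evidence, or already tripped
        rate = sum(self._outcomes) / len(self._outcomes)
        if rate > self._threshold:
            self.tripped = True
            self._on_trip(f"quarantine rate {rate:.0%} over last {len(self._outcomes)} frames")

    def reset(self) -> None:
        """Call after an operator resumes the pipeline."""
        self._outcomes.clear()
        self.tripped = False
```

In the pipeline you would construct it with `on_trip=pipeline.emergency_pause` and call `breaker.record(result is None)` after each process_frame. The minimum sample keeps a single bad frame at startup from pausing the line.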

Step 4: Assemble multi-frame transmissions with a bounded buffer

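Serial links fragment: a record too long for one frame is split across several frames (intermediate frames end in <ETB>, the last in <ETX>), and a slow read can split one frame across two socket reads. Accumulate bytes in a bounded buffer and only invoke Step 1 once a complete frame, from <STX> through the checksum characters and the trailing <CR>, is present, so a partial read is never validated as if it were whole and a stream that never produces a terminator surfaces as an explicit BUFFER_OVERFLOW rather than a garbled parse.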

python

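class FrameAssembler:
    """Accumulates raw bytes and yields complete STX .. ETX|ETB C1 C2 CR frames."""

    def __init__(self, max_buffer: int = 64 * 1024):
        self._buf = bytearray()
        self._max = max_buffer

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buf.extend(chunk)
        frames: list[bytes] = []
        while True:
            start = self._buf.find(b"\x02")
            if start == -1:                                  # only ENQ/ACK/EOT/LF noise buffered
                self._buf.clear()
                break
            del self._buf[:start]                            # drop control bytes before STX
            ends = [p for p in (self._buf.find(b"\x03"), self._buf.find(b"\x17")) if p != -1]
            # Record text ends in CR before ETX, so splitting on the first CR would cut a frame
            # short; it is complete only once terminator, both checksum chars and CR have arrived.
            if not ends or len(self._buf) < min(ends) + 4:
                break
            end = min(ends) + 4                              # one past the trailing CR
            frames.append(bytes(self._buf[:end]))
            del self._buf[:end]                              # the LF is discarded as noise next pass
        if len(self._buf) > self._max:                       # runaway stream: refuse to grow unbounded
            self._buf.clear()
            raise ASTMValidationError("BUFFER_OVERFLOW", "frame exceeded max buffer without terminator")
        return frames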

Verification & Testing

Confirm the two behaviors that matter most: a tampered byte fails the checksum before any field is read, and a clean frame round-trips into a typed result. Pin them with pytest.

python

import pytest


def _frame(body: bytes) -> bytes:
    """Build a valid ASTM frame: STX + body + ETX + checksum + CR."""
    payload = body + b"\x03"                                  # frame_no already in body
    cs = astm_checksum(payload).encode("ascii")
    return b"\x02" + payload + cs + b"\r"


def test_valid_r_record_round_trips():
    frame = _frame(b"1R|1|Na|140|mmol/L|136-145")
    data = reconstruct_frame(frame, "t1")
    result = parse_r_record(data, "t1")
    assert result.test_code == "Na"
    assert result.result_value == "140"
    assert result.units == "mmol/L"


def test_single_flipped_bit_fails_checksum():
    frame = bytearray(_frame(b"1R|1|K|5.1|mmol/L|3.5-5.1"))
    frame[6] ^= 0x01                                          # corrupt one data byte
    with pytest.raises(ASTMValidationError) as exc:
        reconstruct_frame(bytes(frame), "t2")
    assert exc.value.code == "CHECKSUM_MISMATCH"


def test_missing_unit_is_a_schema_violation():
    frame = _frame(b"1R|1|Cl|101||98-107")
    data = reconstruct_frame(frame, "t3")
    with pytest.raises(ASTMValidationError) as exc:
        parse_r_record(data, "t3")
    assert exc.value.code == "SCHEMA_VIOLATION"

Expected results: all three pass. The first proves the boundary arithmetic and field mapping agree; the second proves corruption is caught at the transport tier (CHECKSUM_MISMATCH) and never reaches the schema; the third proves a unit-less result is rejected as a SCHEMA_VIOLATION rather than committed. Extend the suite with golden-file fixtures — a captured multi-frame transmission fed byte-by-byte through FrameAssembler — and assert the quarantine receives exactly the frames you corrupted.
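A golden-file test can be approximated before you have a capture by building the transmission inside the test. The self-contained sketch below assumes a well-formed H/P/R/L message, flips one bit inside the R-record, feeds the stream one byte per read through a minimal splitter (a stand-in for FrameAssembler, not the same class), and asserts that exactly the corrupted frame is rejected. All names in it are hypothetical; swap in real fixture bytes and the real pipeline once you have them.

```python
STX, ETX, ETB = b"\x02", b"\x03", b"\x17"


def checksum(payload: bytes) -> str:
    """Modulo-256 sum over frame number through ETX/ETB, as two upper-hex digits."""
    return f"{sum(payload) & 0xFF:02X}"


def frame(fn: int, text: bytes, last: bool = True) -> bytes:
    """Build one frame: STX FN text ETX|ETB C1 C2 CR LF."""
    body = str(fn % 8).encode("ascii") + text + (ETX if last else ETB)
    return STX + body + checksum(body).encode("ascii") + b"\r\n"


def split_stream(chunks):
    """Yield frames (STX through the CR after the checksum) from fragmented input."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while True:
            start = buf.find(STX)
            if start == -1:                               # only ENQ/EOT/LF noise: discard it
                buf.clear()
                break
            del buf[:start]
            ends = [p for p in (buf.find(ETX), buf.find(ETB)) if p != -1]
            if not ends or len(buf) < min(ends) + 4:      # wait for C1, C2 and CR
                break
            end = min(ends) + 4
            yield bytes(buf[:end])
            del buf[:end]


def is_valid(raw: bytes) -> bool:
    content = raw.rstrip(b"\r\n")
    if len(content) < 5:
        return False
    return checksum(content[1:-2]) == content[-2:].decode("ascii", "replace").upper()


def test_byte_by_byte_replay_quarantines_only_corrupted_frame():
    captured = (b"\x05"                                   # ENQ before the first frame
                + frame(1, b"H|\\^&\r")
                + frame(2, b"P|1|PAT001\r")
                + frame(3, b"R|1|^^^K|5.1|mmol/L\r")
                + frame(4, b"L|1|N\r")
                + b"\x04")                                # EOT after the last frame
    corrupted = bytearray(captured)
    corrupted[corrupted.index(b"5.1")] ^= 0x01            # flip one bit inside frame 3
    accepted, quarantined = [], []
    for f in split_stream(bytes([b]) for b in corrupted):  # one byte per read
        (accepted if is_valid(f) else quarantined).append(f)
    assert len(accepted) == 3
    assert len(quarantined) == 1 and b"R|1|" in quarantined[0]
```

Because the splitter always ends a frame at the CR after the checksum and treats the LF as noise, the frames it yields are identical whether the stream arrives in one chunk or one byte at a time.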

Compliance Note

Verifying the checksum and quarantining every unverifiable frame is the auditable control behind the CLIA requirement in 42 CFR §493.1291(a) that the laboratory have adequate manual or electronic systems to ensure results are accurately and reliably sent from the point of data entry, including interfaced systems, to the final report destination: a frame that fails checksum or schema is never released, and the coded quarantine record is the evidence. Each AuditRecord is attributable to a specific frame and timestamped in UTC, and each quarantine entry keeps the original raw_hex. Once both are persisted to the append-only audit store (the in-memory deques are bounded and evict their oldest entries), the trail supports the secure, time-stamped audit-trail requirement of 21 CFR 11.10(e), letting a reviewer reconstruct the exact bytes and the exact reason for every accept-or-reject decision.

Troubleshooting

Every frame from one analyzer fails with CHECKSUM_MISMATCH but the data looks fine

The firmware is almost certainly emitting a non-standard checksum (an XOR or a modulo variant that excludes <ETX>) rather than the E1394 modulo-256 sum of frame-number-through-ETX. Capture a known-good frame, compute the expected value by hand, and if it disagrees with astm_checksum, normalize the checksum at the serial/FTP polling architecture boundary before this stage — do not loosen verification here.
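Computing the candidates side by side makes that hand check quick. The diagnostic below is a sketch with hypothetical variant names: it reports which checksum rule reproduces the two characters the analyzer actually sent, assuming a single captured frame with the layout <STX> FN text <ETX|ETB> C1 C2 <CR><LF>.

```python
from functools import reduce
from operator import xor


def checksum_candidates(frame: bytes) -> dict[str, str]:
    """Standard checksum plus common non-standard variants, each as two upper-hex digits."""
    content = frame.rstrip(b"\r\n")
    fn_to_term = content[1:-2]                      # frame number through ETX/ETB (standard range)
    sums = {
        "mod256_fn_to_etx": fn_to_term,             # the standard rule
        "mod256_stx_to_etx": content[:-2],          # STX wrongly included
        "mod256_fn_excl_etx": content[1:-3],        # terminator wrongly excluded
    }
    result = {name: f"{sum(p) & 0xFF:02X}" for name, p in sums.items()}
    result["xor_fn_to_etx"] = f"{reduce(xor, fn_to_term, 0):02X}"
    return result


def identify_checksum(frame: bytes) -> list[str]:
    """Names of the variants whose value matches the checksum the analyzer transmitted."""
    sent = frame.rstrip(b"\r\n")[-2:].decode("ascii", "replace").upper()
    return [name for name, value in checksum_candidates(frame).items() if value == sent]
```

For the header frame `<STX>1H|\^&<CR><ETX>E5<CR><LF>` only the standard rule matches; a firmware build that XORs the same bytes would send 2F instead. Check several captured frames before concluding, since one frame can match more than one rule by coincidence.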

Frames intermittently fail with FRAME_TRUNCATED under load

The reader is handing partial frames to reconstruct_frame directly. Route all raw bytes through FrameAssembler.feed first so a frame split across two socket reads is reassembled before validation, and wrap the read in asyncio.wait_for with a timeout so an unresponsive analyzer surfaces as an explicit stall instead of a half-frame.
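A minimal read wrapper along those lines is sketched below. `read_chunk` and `AnalyzerStalled` are hypothetical names, and the 15-second default is a placeholder; take the real timer values from your analyzer's interface specification. Each returned chunk would go straight to FrameAssembler.feed.

```python
import asyncio


class AnalyzerStalled(Exception):
    """Raised when the analyzer sends nothing within the read timeout."""


async def read_chunk(reader: asyncio.StreamReader, timeout: float = 15.0,
                     size: int = 4096) -> bytes:
    """Read up to `size` bytes, turning a silent link into an explicit stall."""
    try:
        chunk = await asyncio.wait_for(reader.read(size), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise AnalyzerStalled(f"no bytes from analyzer within {timeout}s") from exc
    if not chunk:                                   # b"" means the peer closed the link
        raise ConnectionError("analyzer closed the link")
    return chunk
```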

The instrument keeps re-sending the same frame in a NAK loop

Your ACK/NAK line discipline is rejecting a frame the analyzer believes is valid, so it retransmits. If checksum validation fails three consecutive times for the same frame number, stop NAK-ing, call emergency_pause, and alert biomedical engineering. A persistent mismatch is a wiring or firmware fault, and the same discipline described in handling HL7 ACK timeouts in clinical data pipelines applies to the ASTM link.
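The three-strikes rule can live in a small state machine next to the ACK/NAK logic. `NakGuard` below is a hypothetical sketch: it counts consecutive checksum failures for one frame number, answers NAK until the limit, then calls a fault callback (which you would wire to emergency_pause plus an alert) and returns HALT instead of another NAK.

```python
from collections.abc import Callable


class NakGuard:
    """Stop NAK-ing after `limit` consecutive checksum failures for the same frame number."""

    def __init__(self, on_fault: Callable[[str], None], limit: int = 3):
        self._on_fault = on_fault
        self._limit = limit
        self._frame_no: int | None = None
        self._failures = 0

    def on_frame(self, frame_no: int, checksum_ok: bool) -> str:
        """Return the reply to act on: "ACK", "NAK", or "HALT"."""
        if checksum_ok:
            self._frame_no, self._failures = None, 0
            return "ACK"
        if frame_no != self._frame_no:                   # a different frame failed: restart the count
            self._frame_no, self._failures = frame_no, 0
        self._failures += 1
        if self._failures >= self._limit:
            self._on_fault(f"frame {frame_no} failed checksum {self._failures} times in a row")
            return "HALT"
        return "NAK"
```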

Result values with site-specific flags such as ! or TNP are being rejected as SCHEMA_VIOLATION

The numeric_or_flag validator only accepts the flag characters in _FLAG_RE. If your analyzer uses site-specific abnormal codes (for example ! for a panic value or TNP for test-not-performed), extend the pattern deliberately and document each accepted token — never widen it to .*, which would let genuinely malformed values through the boundary.
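One way to keep the allowlist deliberate is to make the documentation the data: each accepted token sits in a dictionary next to its meaning, and the pattern is generated from the keys with re.escape so nothing broader than the listed tokens can match. The token meanings below are illustrative placeholders for your site's own documentation, and `SITE_FLAGS`, `RESULT_RE`, and `is_acceptable_result` are hypothetical names.

```python
import re

# Hypothetical site allowlist: every accepted flag token is documented with its meaning.
SITE_FLAGS: dict[str, str] = {
    "L": "below reference range",
    "H": "above reference range",
    ">H": "above high limit, per analyzer manual",
    "N": "normal",
    "!": "panic value (site-specific)",
    "TNP": "test not performed (site-specific)",
}

# Numeric form: optional sign, digits, optional decimal part.
_NUMERIC = r"[+-]?\d+(?:\.\d+)?"
_TOKENS = "|".join(re.escape(token) for token in sorted(SITE_FLAGS, key=len, reverse=True))
RESULT_RE = re.compile(rf"{_NUMERIC}|{_TOKENS}")


def is_acceptable_result(value: str) -> bool:
    """True only for a number or an exact, documented flag token."""
    return RESULT_RE.fullmatch(value) is not None
```

Using RESULT_RE in place of _FLAG_RE leaves numeric_or_flag unchanged, and adding a token becomes a reviewed change to SITE_FLAGS rather than an edit to a regular expression.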

Quarantined frames cannot be replayed because the payload is unreadable

Confirm you are storing raw_frame.hex() and not the decoded string. A frame is quarantined precisely because decoding or validation failed, so the decoded form is lossy; only the original bytes can be re-fed through FrameAssembler after the root cause is fixed.
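The difference is easy to demonstrate. In the sketch below the two helpers are hypothetical names and the sample frame is made up, with a stray 0xFF byte standing in for line noise: decoding to text replaces that byte irreversibly, while the hex form round-trips to the exact original bytes.

```python
def quarantine_payload(raw_frame: bytes) -> str:
    """Hypothetical helper: serialise a frame losslessly for the quarantine record."""
    return raw_frame.hex()


def replay_bytes(raw_hex: str) -> bytes:
    """Hypothetical helper: recover the exact original bytes for re-feeding."""
    return bytes.fromhex(raw_hex)


# Made-up frame with a stray 0xFF byte (line noise, not valid UTF-8); the checksum is arbitrary.
raw = b"\x021R|1|^^^K|5.1|mmol/L\xff\r\x03A0\r\n"

decoded_round_trip = raw.decode("utf-8", errors="replace").encode("utf-8")
hex_round_trip = replay_bytes(quarantine_payload(raw))

print(decoded_round_trip == raw)   # False: 0xFF became U+FFFD, three bytes in UTF-8
print(hex_round_trip == raw)       # True
```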

Schema Validation & Error Handling — the stage specification this page implements, with the full ingress/egress contract and tiered error taxonomy.
Serial & FTP Polling Architectures — the upstream layer that captures raw ASTM bytes and normalizes non-standard checksums before this stage.
Handling HL7 ACK timeouts in clinical data pipelines — the ACK/NAK and circuit-breaker discipline that governs the ASTM low-level protocol.
Building a Python FTP watcher for hematology analyzers — a sibling ingestion build for analyzers that export files instead of streaming frames.
Converting legacy CSV instrument logs to HL7 ORU messages — the transformation stage a validated ASTM result feeds on its way to the LIMS.
