
Streaming Excel to a Database Without Losing a Single Row

· 28 min read
Series parts
  1. Part 1: Walking Is the Most Expensive Warehouse Operation
  2. Part 2: Three Generations of a Warehouse Routing Engine
  3. Part 3: Running a Warehouse System on a 4 GB Server with No Docker
  4. Part 4: Streaming Excel to a Database Without Losing a Single Row

Part 3 ended with the API surface table and two endpoints flagged for deeper treatment: GET /warehouse/snapshot (the simple export path) and POST /warehouse/update (the hard import path). Both endpoints move warehouse inventory data in and out of the system, but the engineering effort behind them is wildly asymmetric. The export is one stored procedure call, one serialization step, one HTTP response, a few dozen lines of code that worked on the first try. The import is a gRPC streaming pipeline with row-by-row validation, typed error codes, a staging table, and an atomic database swap, built that way because a single malformed row in a bulk update can corrupt every inventory position in the database if you’re not careful.

This post covers both paths end-to-end, then the database architecture that makes them possible: the Shared Database Pattern, the stored procedures that serve as a contract layer with the live ERP, and the SQL Server 2008 to 2019 migration. It closes with honest lessons learned from building a production system alone for 2.5 years.

Why Excel

Warehouse staff needed two things from the data layer: offline snapshots they could print, annotate, and share with colleagues during physical inventory counts; and a way to bulk-update the warehouse when stock arrived, positions moved, or the team reorganized shelf contents. They had no direct database access, and giving them database access was never a serious option. A stray UPDATE against the live ERP could break order processing for the entire company.

Excel was the interface because it was already the interface. The warehouse team used Excel for everything: ordering, scheduling, reporting. Asking them to learn a different tool for inventory management would have added training friction to a system whose value depended on immediate adoption. The system met them where they were: click a button, get a spreadsheet; edit the spreadsheet, upload it back. The complexity of making that round-trip safe was mine to absorb, not theirs.

The Export Path

The simple case first. An operator clicks “Download Snapshot” in the React UI. The frontend sends GET /warehouse/snapshot to the REST API. The controller calls the manager, which calls the repository, which executes a single stored procedure (Snapshot export) against the ERP database. The procedure returns the full warehouse state: every inventory slot, its item, and its current quantity. The manager serializes the result into an Excel file with a canonical 4-column schema and returns it as a binary HTTP response. The browser downloads warehouse-snapshot.xlsx.

Warehouse snapshot export: one stored procedure, one serialization
// packages/rest-api/src/warehouse/manager.ts

async function exportWarehouseSnapshot(
  repo: WarehouseRepository
): Promise<Buffer> {
  // One stored procedure call returns the full warehouse state
  const slots = await repo.getSnapshot();

  // Serialize to the canonical 4-column schema
  return generateExcel(slots, {
    columns: [
      { header: 'Position',    key: 'position'    },
      { header: 'Item ID',     key: 'itemId'      },
      { header: 'Quantity',    key: 'quantity'    },
      { header: 'Description', key: 'description' },
    ],
    sheetName: 'Warehouse Snapshot',
  });
}

The 4-column schema is deliberate: slot position (e.g., A4.2), human-friendly item ID (e.g., ESP32S3), available quantity (e.g., 45), and a description. These same four columns appear in both the export and the import. An operator can export the current warehouse state, open the file in Excel, edit quantities or add new rows, and upload the modified file as an import. The schema round-trips by design: the output format of the export is the input format of the import. This meant the warehouse team could treat the Excel file as a working document: export the truth, edit it to reflect reality after a stock delivery or a shelf reorganization, import the changes. No separate data entry tool, no manual transcription between formats.
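That round-trip guarantee is easiest to keep when both paths share one canonical column definition. The sketch below is illustrative, not the actual package code: the `WAREHOUSE_COLUMNS` constant and `headersMatch` helper are hypothetical names.

```typescript
// Hypothetical shared definition: one source of truth for the schema
// used by both the export serializer and the import parser.
interface WarehouseColumn {
  header: string; // column title shown in Excel
  key: string;    // property name on the parsed row object
}

const WAREHOUSE_COLUMNS: WarehouseColumn[] = [
  { header: 'Position',    key: 'position'    },
  { header: 'Item ID',     key: 'itemId'      },
  { header: 'Quantity',    key: 'quantity'    },
  { header: 'Description', key: 'description' },
];

// The import parser can reject a file whose header row deviates from
// the canonical schema before any row-level validation runs.
function headersMatch(firstRow: string[]): boolean {
  return (
    firstRow.length === WAREHOUSE_COLUMNS.length &&
    WAREHOUSE_COLUMNS.every((col, i) => firstRow[i] === col.header)
  );
}
```

With the schema in one constant, a header change on the export side automatically propagates to import validation, which is what makes the "output of the export is the input of the import" property hard to break by accident.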

The Excel exporter is a utility package in the monorepo (Part 3 describes the package structure). It wraps the exceljs library with the warehouse-specific column definitions and sheet formatting. One function, one stored procedure, one download. The entire export path is roughly 40 lines of code across the manager, repository, and utility. It worked on the first deployment and never broke.

Export Path Walkthrough

The export flow, from button click to file download:

  1. React UI: click "Download Snapshot"
  2. REST API: GET /warehouse/snapshot
  3. Stored procedure: execute Snapshot export
  4. Excel export: serialize the 4-column schema
  5. Download: warehouse-snapshot.xlsx

Why the Import Path Is Hard

The export is a read. The import is a write to a live ERP database. Three risks make this categorically different from the export.

Partial state. If the import processes 200 rows and crashes on row 147, what happens to the first 146? Without explicit transaction boundaries, they’ve already been written to production. The warehouse database is now in a state that doesn’t match either the old reality or the new one. Some positions reflect the Excel file, others reflect the previous state. An operator looking at the system would see inconsistencies they can’t explain, and the only recovery would be re-importing the entire file after manually verifying which rows actually made it.

Bad data overwriting good data. If row 83 contains a typo, A4.2 instead of A4.12, or a negative quantity, or an item ID that doesn’t exist in the ERP, and the system writes it to production without validation, it corrupts a real inventory record. The warehouse team wouldn’t discover the error until an operator walks to position A4.2 and finds the wrong item, or the optimizer routes someone to a position that no longer contains what the database claims.

Memory pressure. A large Excel file loaded entirely into memory in the REST API process, on a 4 GB server already running the WASM optimization engine and serving route requests, risks an out-of-memory crash that takes down the entire API. The export path is safe because the stored procedure result set is bounded by the warehouse size (fewer than 100 positions, a few hundred rows at most). The import path is riskier because the file size depends on what the operator uploads, and I couldn’t predict or control that.

Every design decision in the import pipeline exists to mitigate one of these three risks: streaming to avoid memory exhaustion, validation to reject bad data before it touches production, and staging with an atomic swap to prevent partial state.

The Import Flow: gRPC Client-Side Streaming

When an operator uploads an Excel file through the React UI, the frontend sends it to POST /warehouse/update as a multipart form upload. The REST API receives the file, parses it into rows using the CSV parser utility package, and then, instead of writing those rows directly to the database, opens a gRPC stream to the CSV updater service (the separate process described in Part 3) and sends each row individually.

gRPC client-side streaming: rows sent one at a time
// packages/rest-api/src/warehouse/manager.ts

async function streamWarehouseUpdate(
  rows: WarehouseRow[],
  updaterClient: WarehouseUpdaterClient
): Promise<UpdateResult> {
  return new Promise((resolve, reject) => {
    const stream = updaterClient.updateWarehouse(
      (err, result) => {
        if (err) return reject(err);
        resolve(result);
      }
    );

    // Respect backpressure: when write() returns false the stream's
    // internal buffer is full, so wait for 'drain' before sending more.
    (async () => {
      for (const row of rows) {
        if (!stream.write(row)) {
          await new Promise<void>((drained) => stream.once('drain', drained));
        }
      }
      stream.end();
    })().catch(reject);
  });
}

The streaming architecture solves the memory problem. The REST API parses the Excel file and writes rows to the gRPC stream incrementally. The CSV updater receives rows one at a time and processes them incrementally. At no point does either process hold the entire file’s worth of validated data in memory simultaneously; the stream provides natural backpressure, and each row is processed (validated, staged) as it arrives.

The separate gRPC service solves the isolation problem. The CSV updater runs as its own Node.js process with its own systemd service unit (shown in Part 3’s deployment section). If the import hangs, whether from a slow stored procedure, a lock contention issue, or an unexpected edge case in the validation logic, the REST API continues serving route optimization requests without interruption. If the updater crashes, systemd restarts it automatically, and the next import starts clean.

gRPC solves the protocol problem. Client-side streaming is a first-class pattern in gRPC’s protobuf contract: the client opens a stream, sends messages one at a time, and the server responds with a single result after all messages are received. REST has no standard equivalent. HTTP chunked transfer encoding can approximate streaming, but it lacks the typed message framing, backpressure semantics, and error propagation that gRPC provides through protobuf. For a pipeline where each streamed row carries structured data and the server needs to return structured errors, gRPC’s contract was a natural fit.

On client-side streaming: the client opens a single HTTP/2 connection and sends multiple protobuf-encoded messages (rows) sequentially over the same stream. The server receives each message as it arrives, but withholds its response until the client signals the stream is complete (by calling stream.end()). The server then sends a single response message, in this case an UpdateResult containing the success flag, row count, and any validation errors. The key difference from REST is that the connection stays open for the entire duration of the upload, and the server can begin processing rows before the last one has been sent.
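The server side of that contract can be sketched as a function over an async iterable of rows. This is a simplified model, not the actual service code: `handleUpdateStream`, `validate`, and `stage` are illustrative names, and the real handler wires the same logic to gRPC stream events.

```typescript
// Model of the server side of the client-streaming RPC: consume rows
// incrementally, validate each one, stage the valid ones, and respond
// once with an aggregate result after the stream ends.
interface WarehouseRow { position: string; itemId: string; quantity: number; }
interface UpdateResult { success: boolean; rowsProcessed: number; errors: string[]; }

async function handleUpdateStream(
  rows: AsyncIterable<WarehouseRow>,
  validate: (row: WarehouseRow, rowNumber: number) => string | null,
  stage: (row: WarehouseRow) => Promise<void>
): Promise<UpdateResult> {
  const errors: string[] = [];
  let rowsProcessed = 0;

  // Each row is handled as it arrives; the full file is never held
  // in memory at once.
  for await (const row of rows) {
    rowsProcessed++;
    const error = validate(row, rowsProcessed);
    if (error) {
      errors.push(error); // collect every error, don't stop at the first
      continue;
    }
    await stage(row);
  }

  // The single response is computed only after the client ends the stream;
  // the atomic swap runs later, and only when the error list is empty.
  return { success: errors.length === 0, rowsProcessed, errors };
}
```

Because each `stage` call is awaited before the next row is pulled, the iterable is consumed incrementally, which is the same property the gRPC stream gives the real service.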

The Protobuf Contract

The protobuf definition specifies the exact shape of every message that crosses the gRPC boundary: the row data, the two categories of validation errors, and the final result.

warehouse-updater.proto: typed rows, typed errors
// proto/warehouse-updater.proto

message WarehouseRow {
  string position = 1;   // e.g., "A4.2"
  string item_id  = 2;   // e.g., "ESP32S3"
  int32  quantity = 3;    // non-negative integer
  string description = 4; // human-readable label
}

message ParseError {
  enum Code {
    INVALID_POSITION_FORMAT = 0;
    NON_NUMERIC_QUANTITY    = 1;
    NEGATIVE_QUANTITY       = 2;
    EMPTY_ITEM_ID           = 3;
  }
  Code   code    = 1;
  int32  row     = 2; // 1-indexed row in the Excel file
  string detail  = 3; // human-readable explanation
}

message SemanticError {
  enum Code {
    ITEM_NOT_FOUND     = 0;
    POSITION_NOT_FOUND = 1;
  }
  Code   code    = 1;
  int32  row     = 2;
  string detail  = 3;
}

message UpdateResult {
  bool                success         = 1;
  int32               rows_processed  = 2;
  repeated ParseError parse_errors    = 3;
  repeated SemanticError semantic_errors = 4;
}

service WarehouseUpdater {
  // Client streams rows one at a time; server responds with the result.
  rpc UpdateWarehouse (stream WarehouseRow) returns (UpdateResult);
}

Two error types, each with explicit integer codes. ParseError covers structural problems: malformed position formats that don’t match {Aisle}{Rack}.{Shelf}, non-numeric or negative quantities, empty item IDs. SemanticError covers logical problems: an item ID that doesn’t exist in the ERP’s item table, or a position code that doesn’t correspond to any slot in the warehouse grid. The distinction matters for the operator: parse errors mean “fix your spreadsheet,” semantic errors mean “this item or position doesn’t exist in the system.”

Error codes are explicit integers, not string messages. The REST API can map each code to a localized, human-readable message for the React frontend without parsing free-text error strings. If the Italian warehouse team needed error messages in Italian, only the frontend’s translation table would change; the protobuf contract, the gRPC service, and the validation logic would remain untouched. The error codes are the contract; the human-readable messages are a presentation concern.
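A sketch of that mapping on the presentation side: the enum mirrors `ParseError.Code` from the proto above, while the message strings and the `describeParseError` helper are illustrative, the kind of translation table only the frontend would need to change.

```typescript
// Mirror of ParseError.Code from warehouse-updater.proto.
enum ParseErrorCode {
  INVALID_POSITION_FORMAT = 0,
  NON_NUMERIC_QUANTITY = 1,
  NEGATIVE_QUANTITY = 2,
  EMPTY_ITEM_ID = 3,
}

// Presentation-layer concern: swap this table per locale without
// touching the protobuf contract or the validation logic.
const parseErrorMessages: Record<ParseErrorCode, string> = {
  [ParseErrorCode.INVALID_POSITION_FORMAT]:
    'Position must look like A4.2 (aisle letter, rack, dot, shelf)',
  [ParseErrorCode.NON_NUMERIC_QUANTITY]: 'Quantity must be a number',
  [ParseErrorCode.NEGATIVE_QUANTITY]: 'Quantity cannot be negative',
  [ParseErrorCode.EMPTY_ITEM_ID]: 'Item ID is required',
};

function describeParseError(code: ParseErrorCode, row: number): string {
  return `row ${row}: ${parseErrorMessages[code]}`;
}
```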

Validation Rules

Each streamed row passes through validation before it can reach the staging table. The rules are strict and exhaustive because the alternative, partial validation with fallback behavior, would mean bad data reaching production under conditions the operator can’t predict or understand.

Position format. The position string must match the {Aisle}{Rack}.{Shelf} encoding introduced in Part 1: a letter followed by digits, a dot, then more digits. A4.2 passes. A4 fails (no shelf). 4A.2 fails (aisle must be a letter). The regex is simple; the point is that it runs on every row, not just the ones that look suspicious.
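A minimal sketch of that check, assuming single-letter aisles and purely numeric rack and shelf components (the production regex may differ in detail):

```typescript
// {Aisle}{Rack}.{Shelf}: one uppercase letter, digits, a dot, digits.
const POSITION_RE = /^[A-Z][0-9]+\.[0-9]+$/;

function isValidPosition(position: string): boolean {
  return POSITION_RE.test(position);
}
```

Multi-digit racks and shelves pass (A4.12), which is exactly why the A4.2-vs-A4.12 typo class needs the semantic position-existence check as well.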

Item existence. The item ID must exist in the ERP’s item master table. This is a database lookup, not a format check. It catches typos that happen to be valid strings but don’t correspond to real items. ESP32S3 exists, ESP32S4 doesn’t, and the validation tells the operator exactly which row contains the problem.

Numeric quantity. Must be a valid non-negative integer. Rejects text values, negative numbers, empty cells, and floating-point values. A quantity of 0 is valid; it has special semantics described below.

Slot reference semantics. This is where the import becomes more than a simple data load. Each row in the Excel file carries one of three implicit operations, determined by the relationship between the row’s data and the current production state:

  • Create. The row references a position-item pair that doesn’t currently exist in production. The import creates a new inventory slot, meaning the item is now stored at a position where it wasn’t before. This happens when the warehouse team adds stock to new locations.
  • Update. The position-item pair already exists in production, but the quantity differs. The import overwrites the old quantity with the new one. This is the common case after a physical count reveals discrepancies.
  • Destroy. The row sets the quantity to 0 for a position-item pair that exists in production. The import removes that inventory slot entirely; the item is no longer stored at that position. This happens when the warehouse team empties a shelf during reorganization.

The warehouse team could perform a full reorganization, moving items between positions, clearing old shelves, stocking new ones, entirely through a single Excel import. Export the current state, edit it to reflect the new layout, upload. The create/update/destroy semantics meant they never needed to distinguish between “add this” and “change this” and “remove this” in the spreadsheet; the system inferred the operation from the data.

On the import semantics: the three slot reference semantics mean the import is not an idempotent PUT. It’s a declarative state transfer: the Excel file describes what the warehouse should look like for the positions it covers, and the system computes the diff against production. Positions not mentioned in the file are untouched. Positions mentioned with quantity > 0 are created or updated. Positions mentioned with quantity = 0 are destroyed. The atomic swap described below applies this diff as a single transaction.
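That diff can be modeled as follows, keying slots by a hypothetical `position|itemId` string. This is a sketch of the semantics, not the production implementation, which expresses the same logic through the staging table and the atomic swap.

```typescript
type SlotKey = string; // e.g., `${position}|${itemId}`
interface SlotOp { key: SlotKey; op: 'create' | 'update' | 'destroy'; quantity: number; }

// Compute the create/update/destroy operations implied by an import,
// given the current production quantities and the imported quantities.
function computeDiff(
  production: Map<SlotKey, number>,
  imported: Map<SlotKey, number>
): SlotOp[] {
  const ops: SlotOp[] = [];
  for (const [key, quantity] of imported) {
    const current = production.get(key);
    if (quantity === 0) {
      // Quantity 0 destroys the slot reference, if one exists
      if (current !== undefined) ops.push({ key, op: 'destroy', quantity });
    } else if (current === undefined) {
      ops.push({ key, op: 'create', quantity });
    } else if (current !== quantity) {
      ops.push({ key, op: 'update', quantity });
    }
    // Unchanged quantities, and positions absent from the file, are untouched
  }
  return ops;
}
```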

Staging Table and Atomic Swap

Validated rows don’t go directly into the production InventorySlots table. They land in a staging table, a temporary holding area that mirrors the production table’s schema but is invisible to the rest of the system. The optimization engine, the item lookup endpoints, the snapshot export: none of them read from the staging table. It’s a quarantine zone.

Once all rows are validated and staged, a stored procedure (Warehouse update) executes inside a single SQL Server transaction. Three steps, all or nothing:

Atomic swap: staging to production in one transaction
-- Stored Procedure: Warehouse update (simplified)

BEGIN TRANSACTION

  -- Step 1: Delete current production data for affected positions
  DELETE FROM InventorySlots
  WHERE position_id IN (
    SELECT position_id FROM StagingInventory
  );

  -- Step 2: Copy validated staging data into production
  INSERT INTO InventorySlots (position_id, item_id, quantity)
  SELECT position_id, item_id, quantity
  FROM StagingInventory
  WHERE quantity > 0;  -- quantity = 0 means "destroy this slot reference"

  -- Step 3: Clean up the staging table
  DELETE FROM StagingInventory;

COMMIT TRANSACTION
-- If any step fails, the entire transaction rolls back.
-- Production data is never in a partial state.

Step 1 deletes the current production data for every position that appears in the staging table. Step 2 copies the staged rows into production, but only rows with quantity > 0. Rows with zero quantity are the “destroy” semantic, and they accomplish their goal through Step 1’s delete without being re-inserted. Step 3 cleans up the staging table.

If any step fails, whether a constraint violation, a deadlock, or a disk error, the entire transaction rolls back. The production table returns to its pre-import state. No partial writes, no inconsistencies, no rows from the Excel file mixed with rows from the previous state.

This is a swap, not an upsert. The distinction matters. An upsert (INSERT ... ON CONFLICT UPDATE) would update existing rows and insert new ones, but it wouldn’t handle the “destroy” case. Positions that existed in production but were removed from the Excel file (set to quantity 0) would survive the upsert and remain as stale inventory records. The swap approach deletes everything for the affected positions first, then inserts only what the Excel file says should exist. The Excel file is the single source of truth for the positions it covers; the swap enforces that guarantee.

On atomicity: an atomic transaction means that all operations within the transaction boundary either succeed together or fail together. There is no intermediate state visible to other database connections. While the Warehouse update procedure is executing, other queries (the route optimizer, the item lookup API) see the old production data until the COMMIT completes. The moment the commit succeeds, they see the new data. SQL Server’s transaction isolation guarantees this; it’s not application-level locking, it’s the database engine’s fundamental consistency mechanism.

Import Pipeline Simulator

[Interactive demo: ten example Excel rows stream through validation. Row 3 (position 4A.2, malformed), row 5 (item XYZZY, not found), and row 7 (quantity -5, negative) fail, so the entire import is rejected at validation and the staging table stays empty.]

Error Handling and Partial Failure

Three things can go wrong during an import, and the system handles each differently.

Validation errors during streaming. The gRPC service validates every row as it arrives and collects all parse and semantic errors into a list. It does not stop at the first error; it validates the entire file and returns every problem at once. The import is then rejected entirely; nothing reaches the staging table, and the production database is untouched. The operator receives the full error list in the React UI and can fix every problem in the Excel file before retrying. Rejecting the entire import on any validation failure is a deliberate choice: a partial import (apply the valid rows, skip the invalid ones) would leave the warehouse in a state that matches neither the old reality nor the operator’s intended new reality. All or nothing is easier to reason about, and the operator always knows exactly what state the database is in.
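A sketch of that collect-everything pass: field names follow the 4-column schema, while `validateAll` and the `itemExists` lookup are illustrative stand-ins for the streamed per-row validation and the ERP item-master query.

```typescript
interface Row { position: string; itemId: string; quantity: number; }
interface RowError { row: number; detail: string; }

// Validate every row and return every problem at once; a non-empty
// result means the whole import is rejected and production untouched.
function validateAll(
  rows: Row[],
  itemExists: (itemId: string) => boolean
): RowError[] {
  const errors: RowError[] = [];
  rows.forEach((r, i) => {
    const rowNum = i + 1; // 1-indexed, matching the Excel file
    if (!/^[A-Z][0-9]+\.[0-9]+$/.test(r.position))
      errors.push({ row: rowNum, detail: `bad position ${r.position}` });
    if (!Number.isInteger(r.quantity) || r.quantity < 0)
      errors.push({ row: rowNum, detail: `bad quantity ${r.quantity}` });
    if (r.itemId === '')
      errors.push({ row: rowNum, detail: 'empty item ID' });
    else if (!itemExists(r.itemId))
      errors.push({ row: rowNum, detail: `item ${r.itemId} not found` });
  });
  return errors;
}
```

Note there is no early return: a file with ten bad rows produces ten errors in one response, so the operator fixes the spreadsheet once instead of replaying a fix-retry loop.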

Transaction failure. If the stored procedure’s atomic swap fails, whether from a deadlock with another ERP process accessing the same tables, a constraint violation from a concurrent write, or a disk space error, the SQL Server transaction rolls back entirely. The staging table is cleaned up. The production data is unchanged. The gRPC service returns an error, the REST API forwards it to the frontend, and the operator can retry. Because the swap is a single transaction, there is no ambiguity about what happened: either the entire import succeeded, or nothing changed.

gRPC service crash during import. If the CSV updater process crashes while rows are being streamed, whether from an unhandled exception, an out-of-memory error, or a segfault in a native module, the REST API detects the broken gRPC stream and returns an error to the frontend. The staging table may contain partial data from rows that were validated before the crash, but because the atomic swap stored procedure hasn’t executed, production is untouched. The staging table is cleaned up on the next import attempt. The operator sees a generic “import failed” error and retries; the system’s state is exactly what it was before the import started.

The common thread across all three failure modes: production data never enters an inconsistent state. The staging table absorbs the risk. The atomic swap is the only path from staging to production, and it either completes fully or rolls back entirely.

Excel Round-Trip Demo

[Interactive demo: a production table (A4.2 ESP32S3 ×45, B3.5 C100nF ×200, C7.1 M8x16 ×340, E5.3 XT60-M ×52, H2.7 LED5mm ×1200, I3.4 RES10K ×500) alongside an editable Excel panel. Changing a quantity marks the row as an update, setting it to 0 marks a destroy, and re-importing applies the resulting diff.]

On the operator experience: the round-trip means the warehouse team doesn’t need a second tool for data entry. Export the current state, edit it in the same Excel they already know, upload the changes. Failed imports produce a clear list of every bad row (“row 14: item XYZZY not found,” “row 23: position B99.4 doesn’t exist”), not cryptic database errors. Production data is never in a partial state. The worst that happens on a failed import is that nothing changes, and the operator fixes the spreadsheet and tries again.

The Database: Shared by Design

The warehouse log system reads and writes the same Microsoft SQL Server database that Vivaldi’s ERP uses for order management, invoicing, and inventory tracking. No separate database, no replication, no synchronization jobs. The system queries the ERP’s item master table to validate imports, reads order line tables to resolve batch orders, and writes inventory slot data that the ERP’s own workflows can also see.

Shared databases are generally considered an anti-pattern in service-oriented architecture. The canonical advice is to give each service its own data store, communicate through APIs, and accept the complexity of eventual consistency. That advice exists for good reasons: shared databases create tight coupling between services, schema changes in one service can break another, and a misbehaving query from one service can degrade performance for all others.

None of those reasons applied here. There was one engineer (me), one application consuming the warehouse tables, and one ERP that owned the schema. Introducing a separate database would have meant building a synchronization layer to keep the two data stores consistent, and getting that synchronization wrong would be far worse than the coupling I was trying to avoid. The ERP’s item table changes when someone adds a new product; my system needs to see that change immediately, not after a replication lag. The ERP’s order table changes when a customer places an order; the warehouse system needs to resolve that order within seconds of the barcode scan. A separate database with eventual consistency would have added latency, complexity, and an entire category of “stale data” bugs, all to solve a coupling problem that didn’t exist for a single-engineer project.

Stored procedures serve as the contract layer between the warehouse log system and the ERP. The system never runs raw SQL against ERP tables; every database interaction goes through a stored procedure with a defined signature. If the ERP vendor changes a column name or reorganizes a table, only the stored procedure’s internals change. The REST API, the gRPC services, and the optimization engine are insulated from the schema change because they call the procedure by name and receive a result set with stable column names. The stored procedures are the seam between two systems that share physical storage but have independent release cycles.

Five stored procedures handle all database interactions:

  • Item resolver: given ORC order codes, returns every requested item with all positions and available quantities. Used by route optimization (Part 2).
  • Order lookup: returns which order codes requested each item. Used by the order detail UI.
  • Warehouse update: atomic swap; copies the staging table to production within a single transaction. Used by the Excel import (this post).
  • Snapshot export: full warehouse state export for Excel download. Used by the Excel export (this post).
  • Open orders list: lists all open order codes for the batch order UI. Used by the order selection UI.

The stored procedures are the only code that touches ERP tables directly. The REST API calls procedures; the gRPC services call procedures; the optimizer consumes procedure results. If I had to redraw the system’s trust boundary, it would sit at the stored procedure layer. Everything above it is the warehouse log system, everything below it is the ERP.

Stored Procedure Contracts

[Interactive diagram: inputs, outputs, and transaction logic for each of the five procedures; the two covered in this post are highlighted.]

SQL Server Migration: 2008 to 2019

The ERP database migrated from SQL Server 2008 to SQL Server 2019 during the project’s lifetime. This wasn’t my decision; Vivaldi’s IT team drove the migration as part of a broader infrastructure upgrade. But I had to update every stored procedure to work on the new version and take advantage of features that didn’t exist in 2008.

There is no direct upgrade path from SQL Server 2008 to 2019. SQL Server 2019 accepts in-place upgrades only from 2012 or later, so the options were a two-step upgrade (2008 to 2012 to 2019, each step requiring a full compatibility check and potential downtime) or a backup/restore to a fresh 2019 instance. Vivaldi’s IT team chose the backup/restore path, which meant testing every stored procedure against the new instance before cutting over.

The most useful change was STRING_AGG. SQL Server 2008 has no built-in function for concatenating values from multiple rows into a single comma-separated string. The Item resolver procedure needed this for its output format: when an item exists at three warehouse positions, the procedure returns a single row with all positions concatenated. In 2008, I wrote this using the FOR XML PATH('') trick: convert the values to XML, strip the XML tags, and you get a comma-separated string. It works, but it’s unreadable:

-- SQL Server 2008 polyfill for STRING_AGG
STUFF((
  SELECT ',' + p.position_code
  FROM Positions p
  JOIN InventorySlots s ON s.position_id = p.id
  WHERE s.item_id = i.id
  FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 1, '')

SQL Server 2017 introduced STRING_AGG as a native aggregate function. The same query becomes:

-- SQL Server 2017+ native
STRING_AGG(p.position_code, ',')

One line instead of six. More importantly, STRING_AGG communicates intent, “concatenate these values,” where the XML trick communicates mechanism. Every stored procedure that aggregated strings was rewritten to use the native function.

The migration had one subtle gotcha that cost me an afternoon of debugging. Restoring a SQL Server 2008 backup onto a 2019 instance preserves the original database’s compatibility level. The engine is 2019, but the database behaves as if it’s running on 2008. New syntax, new functions, new query optimizer behaviors are all disabled until you explicitly raise the compatibility level:

ALTER DATABASE VivaldiERP SET COMPATIBILITY_LEVEL = 150; -- SQL Server 2019

I wrote the updated stored procedures, tested them against the 2019 instance, and they failed with syntax errors on STRING_AGG. The function existed in the engine but was inaccessible because the database was still running at compatibility level 100 (SQL Server 2008). Raising the compatibility level fixed it immediately, but the error message, “STRING_AGG is not a recognized built-in function name,” gave no hint that the problem was the compatibility level, not the SQL Server version. If you’re migrating SQL Server databases: check the compatibility level first. The engine version and the database compatibility level are independent settings, and the database level is what determines which features are available.

On the compatibility level: this gotcha is a consequence of SQL Server’s backwards-compatibility guarantee. A database restored from a backup behaves identically to the source version, regardless of the target engine’s capabilities, until the DBA explicitly opts in to the new behavior. This is a feature, not a bug. It means you can migrate the engine without risking query plan regressions, but it violates the principle of least surprise for anyone who assumes “new engine = new features.”

Lessons Learned

I’m ending the series with lessons, not conclusions, because the honest takeaways from 2.5 years of solo development are more useful than a summary of what I’ve already described.

I should have set up CI/CD from the start. Manual SCP deployment was “good enough” for two years, and I kept telling myself I’d automate it later. I never did. The cost was invisible until it wasn’t: at least twice I deployed a build that worked on my machine but failed on the server due to a Node.js version mismatch or a missing native dependency. A basic GitHub Actions pipeline with a staging step would have caught both issues before they reached production. The barrier wasn’t technical; it was convincing Vivaldi’s IT team to open SSH access to a CI runner, and I chose to spend that political capital elsewhere. In hindsight, that was a mistake. Every manual deployment was a dice roll where the odds were fine 98% of the time and catastrophic the other 2%.

I have no real performance data. The REST API measures WASM execution time per request and logs it via Pino.js, but those measurements aren’t persisted. I can’t state the p95 response time over the project’s lifetime, the distance cache hit rate, or how memory usage trended across months of production operation. I know the system was “fast enough” because operators never complained about wait times, but “fast enough” is not a number. The cache holds up to 4,560 entries (Part 2 explains why); I never measured how quickly it filled or what the hit rate was after the warm-up period. If I were building this again, structured metrics from day one: response time percentiles, memory snapshots, distance cache statistics, stored procedure execution times. The logging infrastructure was there (Pino.js, the gRPC pipeline from Part 3); I just never wrote a metrics persistence layer to go with it.

The core query has a performance anti-pattern. The Item resolver stored procedure uses LIKE '%...', a leading-wildcard pattern match, to filter order codes. Leading wildcards prevent SQL Server from using an index on that column; the query engine falls back to a full table scan. I wrote it this way because the ERP’s order code format was inconsistent; some codes had prefixes, some didn’t, and a leading-wildcard match was the simplest way to handle all variants. A table-valued parameter approach (pass the exact codes as a structured parameter, join against them) would scale better if order volume increased. At Vivaldi’s throughput, dozens of batch orders per day, not thousands, it was never a bottleneck. But it’s the kind of decision that would bite hard in a higher-throughput environment, and I documented it here so the next engineer who touches this code knows where the landmine is.

The first request after a restart is slow. Startup pays three costs simultaneously: WASM module instantiation, grid singleton initialization (the OnceCell from Part 2), and a completely cold JPS distance cache. Subsequent requests are fast because everything is warm. A single warm-up request fired from the systemd ExecStartPost hook, sending a dummy batch order to /items/sort-batch-orders immediately after the process starts, would have eliminated this latency spike entirely. I never added it because restarts were rare (the system ran for weeks between deploys) and the delay was a few seconds at most. An operator hitting the system at the exact moment of a restart would wait noticeably longer for their first route; every request after that would be instant. The fix is trivial; the fact that I never implemented it is a reminder that “rare and minor” problems accumulate into a system that’s slightly worse than it could be.

~74% test coverage sounds fine until you look at where the gaps are. The shared utility packages had 100% coverage: every domain type, every validation function, every configuration parser. The environment configuration package had 100% coverage. The critical path through the WASM optimization engine had none. The stored procedures had no automated tests at all; I tested them manually against a development database and hoped the production schema was close enough. A single number, “74% coverage,” is misleading because it treats a tested utility function and an untested database query as equally important. Test coverage as a metric is only meaningful when you know what isn’t covered. In this codebase, the untested code was the code that mattered most.

The Final Lesson

Compile-time guarantees and operational simplicity mattered more than theoretical optimality. The multi-phase greedy solver is theoretically suboptimal for pure TSP distance; simulated annealing would find shorter tours. But the greedy solver is deterministic, it encodes ergonomic priorities that stochastic optimization can’t express structurally, and it never produces a route that surprises an operator. The atomic swap is theoretically more expensive than an upsert; it deletes and re-inserts instead of updating in place. But the swap guarantees that production data is always consistent, and the warehouse team never has to wonder whether a failed import left stale rows behind. The staging table adds a step that a direct write wouldn’t need. The gRPC streaming adds a process boundary that an in-process function wouldn’t need. The systemd deployment adds manual steps that Docker would automate.

Every one of these choices traded theoretical efficiency for operational predictability. A system that’s slightly slower but always correct, always recoverable, and always debuggable by one engineer six months later with no context, that’s the system worth building. The warehouse team didn’t care about theoretical distance optimality or database write throughput. They cared that the routes were sensible, the imports were safe, and the system was running when they showed up to work in the morning. For 2.5 years, it was.

Development Statistics

  • Total commits: 319 (across all branches)
  • Active development: September 2019 – March 2022 (2.5 years)
  • Peak activity: February 2020 (53 commits in one month)
  • Longest dormancy: May 2021 – December 2021 (8 months)
  • TypeScript packages: 14
  • Rust crates: 2
  • WASM binary size: ~144 KB
  • Stored procedures: 10+
  • Test coverage: ~74%
  • Optimization generations: 3 (TypeScript → C++/N-API → Rust/WASM)
  • Database: SQL Server 2008 → 2019
  • Languages used: TypeScript, C++ → Rust + WASM, SQL, Haskell (removed)