There are two ways teams usually add observability.

The first is deliberate: initialize telemetry before the application boots so you capture startup, request lifecycle, and error context from the beginning.

The second is common: bolt things on later, discover blind spots during incidents, and slowly fill them in while promising to clean it up “next sprint.”

This gateway leans toward the first path, and that is one of the strongest signals that it was built with production in mind.

The key design choice

In package.json, both the start and dev scripts run Node with --import ./src/instrument.js, so the instrumentation module is loaded before src/server.js.

That small startup detail does a lot of work. It ensures OpenTelemetry and Sentry are initialized before the Express app is imported and before any requests are handled. If you care about capturing the full request path and startup behavior, initialization order matters.
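As a sketch, the scripts section might look something like this (the exact script contents, including --watch for dev, are assumptions based on the description above):

```json
{
  "scripts": {
    "start": "node --import ./src/instrument.js src/server.js",
    "dev": "node --watch --import ./src/instrument.js src/server.js"
  }
}
```

The point is that --import runs the instrumentation module before the entry point is evaluated, so nothing the app does escapes the telemetry hooks.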

Too many services get this wrong and then wonder why their traces start halfway through the stack.

What this setup includes

src/instrument.js wires together three observability layers:

  • OpenTelemetry auto-instrumentation with an OTLP exporter to Honeycomb
  • Sentry error and log ingestion
  • Pino-based application and HTTP logging through src/services/logger.js

I like this combination because each tool is doing a distinct job:

  • traces explain request flow and dependency timing
  • logs explain local events and debugging detail
  • Sentry captures failures and gives operators an incident workflow

The service is not trying to force one tool to solve every problem.
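As a rough sketch, the wiring in such an instrument.js might look like the following. This is not the gateway's actual code: the package names, the Honeycomb endpoint, and the environment variable names are all assumptions, though they follow the usual conventions for these libraries.

```javascript
// src/instrument.js (hypothetical sketch)
// Loaded via --import before src/server.js, so hooks are in place
// before Express or any HTTP client module is required.
import * as Sentry from '@sentry/node';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

// Sentry for errors; DSN comes from the environment.
Sentry.init({ dsn: process.env.SENTRY_DSN });

// OpenTelemetry auto-instrumentation exporting OTLP traces to Honeycomb.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'https://api.honeycomb.io/v1/traces',
    headers: { 'x-honeycomb-team': process.env.HONEYCOMB_API_KEY },
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```

The Pino logger lives in src/services/logger.js and is pulled in by the app itself; the bootstrap file only needs to own the pieces that must exist before the first module is imported.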

The privacy detail I was happy to see

beforeSend() removes x-api-key and authorization headers before events leave the process. That is one of those details that separates “we installed Sentry” from “we thought about operating this safely.”

Telemetry systems are where sensitive data goes to become permanent if you are careless. Scrubbing secrets at the boundary is not glamorous work, but it is the sort of thing mature teams automate early.
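The hook itself can be a small pure function. Here is a sketch; the helper name and the header list are assumptions, but the shape of the hook (receive the event, mutate or drop it, return it) is Sentry's documented beforeSend contract:

```javascript
// Hypothetical sketch of header scrubbing for a Sentry beforeSend hook.
const SENSITIVE_HEADERS = ['x-api-key', 'authorization'];

function scrubHeaders(event) {
  // Sentry events carry request headers under event.request.headers.
  const headers = event.request && event.request.headers;
  if (headers) {
    for (const name of SENSITIVE_HEADERS) {
      delete headers[name];
    }
  }
  return event;
}

// Wired up as: Sentry.init({ dsn, beforeSend: scrubHeaders })
```

Keeping the scrubber a standalone function also makes it trivially unit-testable, which matters for code whose only job is to prevent leaks.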

What is good and what is missing

The code already captures a lot of value with relatively little machinery. Prometheus metrics exist in src/services/metrics.js, request IDs are added in src/middleware/requestContext.js, upstream attempts are logged with timing in src/services/httpClient.js, and Express error handling is connected to Sentry.

But the observability story is not finished, which makes it interesting.

The custom metrics expose a request counter labeled by endpoint, mode, and status. That is useful, but it is only the start of a RED-style metrics view. I would want latency histograms, cache hit ratios, breaker-open counters, and region-level upstream health metrics before calling this fully mature.
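A Prometheus histogram is conceptually simple: cumulative bucket counters plus a running sum and count. A dependency-free sketch of what the gateway would record per request (in practice you would use a library such as prom-client; the bucket bounds here are assumptions):

```javascript
// Minimal sketch of a Prometheus-style duration histogram.
class DurationHistogram {
  constructor(buckets = [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]) {
    this.buckets = buckets;
    // One counter per bucket, plus a final slot for +Inf.
    this.counts = new Array(buckets.length + 1).fill(0);
    this.sum = 0;
    this.count = 0;
  }

  observe(seconds) {
    this.sum += seconds;
    this.count += 1;
    // Prometheus buckets are cumulative: every bucket whose upper
    // bound is >= the observed value is incremented.
    for (let i = 0; i < this.buckets.length; i++) {
      if (seconds <= this.buckets[i]) this.counts[i] += 1;
    }
    this.counts[this.buckets.length] += 1; // +Inf always increments
  }
}
```

With duration histograms alongside the existing counter, operators get the rate/errors/duration triad instead of rate and errors alone.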

Why I still like the current approach

This code is a good example of choosing the right first 80 percent. It does not try to build an internal observability platform. It uses standard tools, initializes them early, and keeps the integration close to the app lifecycle.

That is exactly the sort of restraint I want in a service like this. Instrumentation should make the system clearer, not become a second system that also needs constant care.

What I would improve next

I would add:

  • Prometheus histograms for request duration and upstream latency
  • explicit cache metrics for LRU and Redis tiers
  • region labels for upstream error counts
  • a small dashboard that correlates health probes, breaker events, and request latency

With those additions, operators could answer the important questions faster: is the gateway slow, is one region unhealthy, or is the cache just cold?

The lesson

Good observability is not about collecting everything. It is about collecting the right signals early enough that you can explain the system under stress.

Preloading telemetry before app.listen() is a strong architectural move because it says observability is part of the runtime contract, not an afterthought. The tools may change later. That design instinct should not.