Observability in a Flask + Celery App Is Easy Until You Instrument It Twice
Most observability tutorials assume a simpler world than the one production Python apps actually live in.
They assume:
- one app process
- one startup path
- one instrumentation moment
- one idea of request lifecycle
Trek Point is not that world.
We have:
- a Flask app factory
- Gunicorn-style web processes
- Celery workers
- SQLAlchemy engines that should be instrumented once
- requests, Redis, and task execution crossing process boundaries
That means the hard part of observability is not “how do we emit spans?” It is “how do we avoid producing a noisy, misleading mess?”
Why We Used Both Sentry and OpenTelemetry
I do not believe a single tool cleanly solves every observability need for most product teams.
For Trek Point:
- Sentry gives us application error visibility and a familiar debugging workflow
- OpenTelemetry gives us a path for traces and logs across Flask, SQLAlchemy, Celery, Redis, and outbound HTTP
Those tools are not redundant. They answer different questions.
When a request crashes, Sentry is often the fastest route to the error. When a request is merely slow, fragmented across services, or degraded somewhere in a queue-backed path, tracing becomes more valuable.
That division of labor is healthy.
The Real Problem Was Instrumentation Lifecycle
What bit us conceptually was not how to turn tracing on. It was when instrumentation happens.
In an app-factory world, create_app() may run more often than you think:
- once for the web app
- again in worker contexts
- sometimes twice per process depending on boot paths and imports
That makes “instrument everything during startup” trickier than it sounds. If you patch SQLAlchemy, Flask, Celery, requests, or Redis repeatedly, you can end up with warnings, duplicate hooks, or inconsistent runtime behavior.
That is why I liked the discipline in our telemetry setup: treat cross-cutting instrumentors as per-process singletons, guard them carefully, and only instrument the app itself when needed.
This is the kind of detail that does not show up in architecture diagrams but absolutely matters in production.
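The per-process singleton discipline can be sketched in a few lines. This is an illustrative pattern, not Trek Point's actual code; the names (instrument_process, _instrumented_pids) are assumptions:

```python
import os
import threading

_lock = threading.Lock()
_instrumented_pids = set()

def instrument_process(instrumentors):
    """Run each cross-cutting instrumentor at most once per OS process.

    Safe to call from every create_app() invocation: only the first call
    in a given process actually patches anything; later calls are no-ops.
    """
    pid = os.getpid()
    with _lock:
        if pid in _instrumented_pids:
            return False  # already instrumented in this process
        _instrumented_pids.add(pid)
    for instrument in instrumentors:
        instrument()  # e.g. a library instrumentor's .instrument() method
    return True
```

Keying on the PID also means that pre-fork workers (Gunicorn, Celery prefork) re-instrument after fork, which is usually what you want: exporters and background threads generally do not survive a fork.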
SQLAlchemy Was a Good Example
Database instrumentation is often deceptively stateful.
If you instrument after engines are already created, you can miss things. If you instrument too broadly on every app startup, you can get duplicate instrumentation warnings. In a codebase with an app factory and worker imports, the timing matters.
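One way to make that timing explicit is to guard engine instrumentation with a marker on the engine itself. This is a sketch, not Trek Point's real setup; attach_sql_hooks stands in for a real instrumentor call such as OpenTelemetry's SQLAlchemyInstrumentor, and the guard attribute name is invented:

```python
def instrument_engine_once(engine, attach_sql_hooks):
    """Attach SQL tracing hooks to an engine exactly once.

    A second create_app() in the same process sees the marker and skips
    re-patching, avoiding duplicate-instrumentation warnings.
    """
    if getattr(engine, "_sql_hooks_attached", False):
        return engine  # already patched: no-op
    attach_sql_hooks(engine)
    engine._sql_hooks_attached = True
    return engine
```

The key design point is that the guard lives with the engine, not with the app: it is the engine's lifecycle, not the app factory's, that determines whether instrumentation has already happened.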
That is why observability code deserves the same design care as business logic. It is not just config.
Logs, Traces, and Errors Need a Shared Mental Model
One thing I try to avoid is collecting every possible signal without deciding how engineers should use them.
The better question is:
“What debugging story are we trying to support?”
For Trek Point, the useful story looked something like this:
- an exception reaches Sentry
- traces show the request path, SQL timing, Redis behavior, and outbound requests
- task execution can be correlated when work moves from the request thread to a Celery worker
- logs can be exported with the same service identity into the same telemetry backend
That is much better than a tool-by-tool rollout where each signal exists in isolation.
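The request-to-worker correlation hinges on carrying trace context across the process boundary. OpenTelemetry's Celery instrumentation normally handles this automatically; as a mental model, here is a manual sketch assuming the W3C traceparent header and invented function names:

```python
def inject_trace_context(task_headers, traceparent):
    """Enqueue side: copy the caller's traceparent into the task headers.

    `traceparent` is the W3C Trace Context header value for the current
    request span, or None if the caller is not being traced.
    """
    out = dict(task_headers)  # do not mutate the caller's headers
    if traceparent is not None:
        out["traceparent"] = traceparent
    return out

def extract_trace_context(task_headers):
    """Worker side: recover the traceparent, or None for untraced tasks."""
    return task_headers.get("traceparent")
```

Once both sides agree on that header, the task span becomes a child of the request span and the queue-backed path shows up as one trace instead of two disconnected fragments.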
Production Deployment Details Matter
Telemetry setup is one of those areas where local success tells you almost nothing.
A production-ready setup has to account for:
- exporter configuration
- service naming
- sampling strategy
- process model
- whether instrumentation is safe under repeated boot
I have seen plenty of teams “add OpenTelemetry” and still end up blind because the lifecycle assumptions were wrong. Instrumentation is code. It needs to be reviewed with runtime behavior in mind.
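One way to force those decisions into view is a small config object read once at boot, so service naming and sampling are explicit rather than implied. The environment variable names below are the standard OpenTelemetry ones; the dataclass itself and the "trek-point" default are assumptions for illustration:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class TelemetryConfig:
    service_name: str
    exporter_endpoint: str
    sample_ratio: float

    @classmethod
    def from_env(cls, env=None):
        """Read telemetry settings once, validating the sampling ratio."""
        env = os.environ if env is None else env
        ratio = float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0"))
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"sample ratio out of range: {ratio}")
        return cls(
            service_name=env.get("OTEL_SERVICE_NAME", "trek-point"),
            exporter_endpoint=env.get(
                "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"
            ),
            sample_ratio=ratio,
        )
```

Making this a frozen object that each process type (web, worker) constructs at boot also gives you one obvious place to review lifecycle assumptions instead of scattering os.environ reads across the codebase.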
What I’d Encourage More Teams to Do
Treat observability setup as a first-class subsystem, not a wrapper around environment variables.
That means:
- document how each process type is instrumented
- guard singleton patchers carefully
- decide what each telemetry tool is responsible for
- trace the paths users actually care about, not just happy-path web requests
In products like Trek Point, some of the most interesting failures happen between the request and the worker, or between the upload and the derived media. If your observability story stops at Flask requests, you are missing half the product.
The Main Lesson
The difficulty of observability in Python is rarely “can we install the package?” The difficulty is making instrumentation reflect the real execution model of the app.
In Trek Point, the good work was not just turning on tracing. It was being explicit about repeated startup paths, singleton instrumentation, and how web requests, database work, outbound calls, Redis, and Celery should appear as one understandable system.
That is what observability should do: make a multi-part product feel legible when it misbehaves.