Observability in a Flask + Celery App Is Easy Until You Instrument It Twice
Most observability tutorials assume a simpler world than the one production Python apps actually live in.
They assume:
- one app process
- one startup path
- one instrumentation moment
- one idea of request lifecycle
Trek Point is not that world.
We have:
- a Flask app factory
- Gunicorn-style web processes
- Celery workers
- SQLAlchemy engines that should be instrumented once
- requests, Redis, and task execution crossing process boundaries
That means the hard part of observability is not “how do we emit spans?” It is “how do we avoid producing a noisy, misleading mess?”
Why We Used Both Sentry and OpenTelemetry
I do not believe a single tool cleanly solves every observability need for most product teams.
For Trek Point:
- Sentry gives us application error visibility and a familiar debugging workflow
- OpenTelemetry gives us a path for traces and logs across Flask, SQLAlchemy, Celery, Redis, and outbound HTTP
Those tools are not redundant. They answer different questions.
When a request crashes, Sentry is often the fastest route to the error. When a request is merely slow, fragmented across services, or degraded somewhere in a queue-backed path, tracing becomes more valuable.
That division of labor is healthy.
The Real Problem Was Instrumentation Lifecycle
What bit us conceptually was not how to turn tracing on. It was when instrumentation happens.
In an app-factory world, create_app() may run more often than you think:
- once for the web app
- again in worker contexts
- sometimes twice per process depending on boot paths and imports
That makes “instrument everything during startup” trickier than it sounds. If you patch SQLAlchemy, Flask, Celery, requests, or Redis repeatedly, you can end up with warnings, duplicate hooks, or inconsistent runtime behavior.
That is why I liked the discipline in our telemetry setup: treat cross-cutting instrumentors as per-process singletons, guard them carefully, and only instrument the app itself when needed.
This is the kind of detail that does not show up in architecture diagrams but absolutely matters in production.
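The per-process singleton discipline can be sketched in a few lines. This is an illustrative pattern, not Trek Point's actual code; the names (instrument_process, _instrumented_pids) are assumptions:

```python
import os
import threading

_lock = threading.Lock()
_instrumented_pids = set()

def instrument_process(instrumentors):
    """Run each cross-cutting instrumentor at most once per OS process.

    Safe to call from every create_app() invocation: only the first call
    in a given process actually patches anything; later calls are no-ops.
    """
    pid = os.getpid()
    with _lock:
        if pid in _instrumented_pids:
            return False  # already instrumented in this process
        _instrumented_pids.add(pid)
    for instrument in instrumentors:
        instrument()  # e.g. a library instrumentor's .instrument() method
    return True
```

Keying on the PID also means that pre-fork workers (Gunicorn, Celery prefork) re-instrument after fork, which is usually what you want: exporters and background threads generally do not survive a fork.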
SQLAlchemy Was a Good Example
Database instrumentation is often deceptively stateful.
If you instrument after engines are already created, you can miss things. If you instrument too broadly on every app startup, you can get duplicate instrumentation warnings. In a codebase with an app factory and worker imports, the timing matters.
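One way to make that timing explicit is to guard engine instrumentation with a marker on the engine itself. This is a sketch, not Trek Point's real setup; attach_sql_hooks stands in for a real instrumentor call such as OpenTelemetry's SQLAlchemyInstrumentor, and the guard attribute name is invented:

```python
def instrument_engine_once(engine, attach_sql_hooks):
    """Attach SQL tracing hooks to an engine exactly once.

    A second create_app() in the same process sees the marker and skips
    re-patching, avoiding duplicate-instrumentation warnings.
    """
    if getattr(engine, "_sql_hooks_attached", False):
        return engine  # already patched: no-op
    attach_sql_hooks(engine)
    engine._sql_hooks_attached = True
    return engine
```

The key design point is that the guard lives with the engine, not with the app: it is the engine's lifecycle, not the app factory's, that determines whether instrumentation has already happened.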
That is why observability code deserves the same design care as business logic. It is not just config.
Logs, Traces, and Errors Need a Shared Mental Model
One thing I try to avoid is collecting every possible signal without deciding how engineers should use them.
The better question is:
“What debugging story are we trying to support?”
For Trek Point, the useful story looked something like this:
- an exception reaches Sentry
- traces show the request path, SQL timing, Redis behavior, and outbound requests
- task execution can be correlated when work moves from the request thread to a Celery worker
- logs can be exported with the same service identity into the same telemetry backend
That is much better than a tool-by-tool rollout where each signal exists in isolation.
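The request-to-worker correlation hinges on carrying trace context across the process boundary. OpenTelemetry's Celery instrumentation normally handles this automatically; as a mental model, here is a manual sketch assuming the W3C traceparent header and invented function names:

```python
def inject_trace_context(task_headers, traceparent):
    """Enqueue side: copy the caller's traceparent into the task headers.

    `traceparent` is the W3C Trace Context header value for the current
    request span, or None if the caller is not being traced.
    """
    out = dict(task_headers)  # do not mutate the caller's headers
    if traceparent is not None:
        out["traceparent"] = traceparent
    return out

def extract_trace_context(task_headers):
    """Worker side: recover the traceparent, or None for untraced tasks."""
    return task_headers.get("traceparent")
```

Once both sides agree on that header, the task span becomes a child of the request span and the queue-backed path shows up as one trace instead of two disconnected fragments.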
Production Deployment Details Matter
Telemetry setup is one of those areas where local success tells you almost nothing.
A production-ready setup has to account for:
- exporter configuration
- service naming
- sampling strategy
- process model
- whether instrumentation is safe under repeated boot
I have seen plenty of teams “add OpenTelemetry” and still end up blind because the lifecycle assumptions were wrong. Instrumentation is code. It needs to be reviewed with runtime behavior in mind.
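One way to force those decisions into view is a small config object read once at boot, so service naming and sampling are explicit rather than implied. The environment variable names below are the standard OpenTelemetry ones; the dataclass itself and the "trek-point" default are assumptions for illustration:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class TelemetryConfig:
    service_name: str
    exporter_endpoint: str
    sample_ratio: float

    @classmethod
    def from_env(cls, env=None):
        """Read telemetry settings once, validating the sampling ratio."""
        env = os.environ if env is None else env
        ratio = float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0"))
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"sample ratio out of range: {ratio}")
        return cls(
            service_name=env.get("OTEL_SERVICE_NAME", "trek-point"),
            exporter_endpoint=env.get(
                "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"
            ),
            sample_ratio=ratio,
        )
```

Making this a frozen object that each process type (web, worker) constructs at boot also gives you one obvious place to review lifecycle assumptions instead of scattering os.environ reads across the codebase.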
What I’d Encourage More Teams to Do
Treat observability setup as a first-class subsystem, not a wrapper around environment variables.
That means:
- document how each process type is instrumented
- guard singleton patchers carefully
- decide what each telemetry tool is responsible for
- trace the paths users actually care about, not just happy-path web requests
In products like Trek Point, some of the most interesting failures happen between the request and the worker, or between the upload and the derived media. If your observability story stops at Flask requests, you are missing half the product.
The Main Lesson
The difficulty of observability in Python is rarely “can we install the package?” The difficulty is making instrumentation reflect the real execution model of the app.
In Trek Point, the good work was not just turning on tracing. It was being explicit about repeated startup paths, singleton instrumentation, and how web requests, database work, outbound calls, Redis, and Celery should appear as one understandable system.
That is what observability should do: make a multi-part product feel legible when it misbehaves.