As an ex-app engineer though, I kind of prefer my queue logic to be in code, in Git, but maybe with the right tooling, you can change my mind. :)
What's the story for version control, debugging, testing, releasing? It'd be cool to have everything together for data locality and simplifying the stack, but it feels like you'd lose a lot of useful knowledge about how to do stuff "properly".
https://github.com/microsoft/duroxide (also OSS), the durable execution framework that pg_durable is built on, itself supports function versions. We can leverage that to get similar support in pg_durable.
That said, we did hand-build a simple job queue (just lock, poll, reserve on a column, poll and update reservation to mark job done) on top of postgres at my previous startup. Something like pgque would have made that much more polished.
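For anyone who hasn't built one of these, here is a rough sketch of the "poll, reserve on a column, mark done" pattern described above. It is not the code from that startup and not how pgque or pg_durable work; the table layout and the function names (`make_queue`, `enqueue`, `reserve`, `complete`) are made up for illustration, and it uses Python's built-in sqlite3 as a stand-in so it runs without a server. On real Postgres with concurrent workers you would more likely claim the row with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3


def make_queue(conn):
    # Hypothetical schema: one row per job, with a reservation column.
    conn.execute(
        """CREATE TABLE jobs (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               reserved_by TEXT,
               done INTEGER NOT NULL DEFAULT 0
           )"""
    )


def enqueue(conn, payload):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def reserve(conn, worker):
    """Poll for the oldest unreserved job and claim it for `worker`."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE reserved_by IS NULL AND done = 0 "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The extra "reserved_by IS NULL" guard means a job that was
        # claimed in the meantime is not claimed twice.
        cur = conn.execute(
            "UPDATE jobs SET reserved_by = ? "
            "WHERE id = ? AND reserved_by IS NULL",
            (worker, row[0]),
        )
        return row if cur.rowcount == 1 else None


def complete(conn, job_id, worker):
    """Mark a job done, but only if `worker` holds the reservation."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET done = 1 WHERE id = ? AND reserved_by = ?",
            (job_id, worker),
        )
    return cur.rowcount == 1
```

This leaves out everything that makes a queue pleasant in production (reservation timeouts, retries, dead-lettering), which is roughly the polish a dedicated extension would add.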
i have always had maintenance packages for this type of stuff. if i could deploy them alongside the database itself that could be kind of cool.
but yeah i agree with you that i do prefer having this in the code layer.
How is this project at all comparable to something like Temporal? Am I misunderstanding the limitation implied by this particular recommendation?
It's an interesting technical achievement I guess, but it's very bizarre to try and read this
  SELECT df.start(
    @> (
      ($$SELECT ... FROM demo.invoices WHERE status = 'pending'$$ |=> 'inv')
      ~> df.if_rows('inv',
           $$UPDATE ... SET status = 'processing'$$
           ~> (df.http(...) |=> 'resp')
           ~> df.if($$SELECT $r.ok$$,
                -- classify, branch, wait for signal ...
              ),
           df.sleep(5)
         )
    ),
    'invoice-approval-pipeline'
  );

The steps are:
1. Get all the pending invoices
2. Set their state to "processing"
3. Call out to an external service/process to do the actual processing, wait for a response.
4. If the response is OK, do something
5. Wait 5 seconds and then start again.
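For comparison, the five steps above written as ordinary application code might look roughly like the sketch below. Everything in it is hypothetical: `run_pipeline_once`, `run_loop`, and the four callables passed in are names invented here, not part of pg_durable. It also keeps no durable state, so a crash mid-run loses its place; it is only meant to make the control flow concrete.

```python
import time


def run_pipeline_once(fetch_pending, mark_processing, call_service, on_ok):
    """Run steps 1-4 once; return how many invoices were handled."""
    invoices = list(fetch_pending())       # 1. get the pending invoices
    if not invoices:
        return 0
    mark_processing(invoices)              # 2. set their state to "processing"
    response_ok = call_service(invoices)   # 3. call the external service, wait
    if response_ok:                        # 4. if the response is OK, do something
        on_ok(invoices)
        return len(invoices)
    return 0


def run_loop(step, delay_seconds=5.0, max_iterations=None, sleep=time.sleep):
    """5. Wait, then start again. `max_iterations=None` loops forever."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        step()
        sleep(delay_seconds)
        iterations += 1
    return iterations
```

The callables are injected so the loop can be exercised without a database or a real HTTP call, e.g. `run_loop(lambda: run_pipeline_once(fetch, mark, call, handle), max_iterations=3)`.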
Not sure I love the syntax and the way SQL is embedded between the $$
But it is in the database, can be updated and modified in the same way as all the other stored procedures/functions, allows job control, I assume other control structures for parallel steps etc.
Gonna go read the doco now.
Not trying to dismiss the project - it looks like a lot of hard work has gone in and somebody has a use for it. I just come from an Airflow-style external-orchestrator frame of mind, where durability state is managed in postgres but the control flow is kept out of it. Sorry if I came off as a bit snarky.
Why would I want to store my control flow in the database and not in code? It feels strange.
Not trying to dismiss the project, I'm just not getting it yet I think.
This one seems to be a more database-specific use case. The advantage is probably that you can track the exact state of the job in the database itself, rather than having to cross-reference the workflow log with the codebase and trace through it line by line to figure out what the state is. Plus I assume it's less overhead and latency, and operationally one less thing to spin up.
[1] https://learn.microsoft.com/en-us/azure/durable-task/common/...
Indeed Durable tasks is an exceptional project and was a unique innovation at the time.
pg_durable brings the same reliability and durability semantics to long-running operations within the database.
We have tons of interesting scenarios on the roadmap. Stay tuned! :)
For example, you can't use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...
Also, for example, you don't get ultra-wide, high-dimensionality vectors.
It is nice that they are open-sourcing pg_durable, but how about adopting the table-stakes features I'd get with AWS?
Don't need to synchronize the backups with anything else that is part of the same data store, good for ETL pipelines and other state machine type jobs.
If your ETL is mostly SQL anyway, then having the actual job being run on the same server helps as well.
Also, if all the "state" is in one database, then you have a better chance of getting consistent backups.
We use Postgres for that on https://transport.data.gouv.fr (Elixir app which does a fair bit of processing), and it helps.
Not familiar yet with pg_durable though, but I have used or implemented similar solutions and can relate.
One would be able to trigger maintenance jobs via simple lambda functions whose duration is capped.
The provider is an extensibility point. We just shipped the simplest version of it. Happy to take contribs if someone sends a pgmq-based provider!