I think you're operating at a scale that's small enough that there's little risk.
You'll be able to iterate if you run into anything that doesn't work. You should, however, be clear on what problem you and your team are solving, and not just "get some RAG".
Per "how to handle dynamic queries", it's admittedly pretty different b/c we're an ORM (https://joist-orm.io/) that "fetches entities" instead of adhoc SQL queries, but our pattern for "variable number of filters/joins" looks like:
    // each filter value may be undefined if the caller didn't supply it
    const { date, name, status } = args.filter;
    // undefined values (and their joins) are pruned from the generated query
    await em.find(Employee, { date, name, employer: { status } });
Where the "shape" of the query is static, but `em.find` will drop/prune any filters/joins that are set to `undefined`.
So you get this nice "declarative / static structure" that gets "dynamically pruned to only what's applicable for the current query", instead of trying to jump through "how do I string together knex .orWhere clauses for this?" hoops.
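To make the pruning concrete, here's a rough sketch of the idea (my own simplification, not Joist's actual internals):

    // Simplified illustration: recursively strip undefined keys before
    // building SQL, so absent filters never reach the query.
    type Filter = { [key: string]: unknown };

    function prune(filter: Filter): Filter {
      const result: Filter = {};
      for (const [key, value] of Object.entries(filter)) {
        if (value === undefined) continue; // drop unset filters
        result[key] =
          typeof value === "object" && value !== null
            ? prune(value as Filter)
            : value;
      }
      return result;
    }

    // prune({ date: undefined, name: "Bob", employer: { status: undefined } })
    // => { name: "Bob", employer: {} } -- and an employer subtree with no
    //    remaining conditions means the join itself can be skipped.

The nice part is that the caller never branches; the query builder does.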
Out of curiosity: the post you linked mentions that it won't work for renames. What's the approach for these and other types of procedural migrations, such as data transformations (i.e. splitting a column, changing a type, etc.)?
With a declarative model, would you run the migration and follow immediately with a one-off script?
For both data migrations and renames, there isn't really a one-size-fits-all solution. That's actually true of data changes and renames in imperative (incremental) migration tools too; they just don't acknowledge it, and at scale these operations aren't really viable anyway. They inherently require careful coordination with application deploys, which can't be timed to land at the exact moment the migration completes, so you need to guard against user-facing errors or data corruption from the intermediate/inconsistent state.
With row data migrations on large tables, there's also risk of long/slow transactions destroying prod DB performance due to MVCC impact (pile-up of old row versions). So at minimum you need to break up a large data change into smaller chunked transactions, and have application logic to account for these migrations being ongoing in the background in a non-atomic fashion.
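For example, a minimal sketch of a chunked backfill (assuming Postgres via node-postgres; the table and column names here are made up):

    import { Client } from "pg";

    // Backfill a new column in small id-range batches so no single long
    // transaction pins old row versions (MVCC bloat) or holds locks.
    async function backfillDisplayName(client: Client, batchSize = 1000): Promise<void> {
      const { rows } = await client.query("SELECT max(id) AS max_id FROM users");
      const maxId = Number(rows[0].max_id ?? 0);
      for (let lastId = 0; lastId < maxId; lastId += batchSize) {
        // Each statement runs as its own short transaction (autocommit).
        await client.query(
          `UPDATE users
              SET display_name = full_name
            WHERE id > $1 AND id <= $2
              AND display_name IS NULL`,
          [lastId, lastId + batchSize]
        );
        // Optionally sleep here to throttle load on the primary.
      }
    }

The application has to tolerate rows in both states (display_name set or still NULL) until the backfill finishes, which is the non-atomic part.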
That all said, to answer from a mechanical standpoint of "how do companies using declarative schema management also handle data migrations or renames":
At large scale, companies tend to implement custom/in-house data migration frameworks. Or for renames, they're often just outright banned, at least for any table with user-facing impact.
At smaller scale, yeah, you can just pair a declarative tool for schema changes with an imperative migration tool for non-schema changes. They aren't really mutually exclusive, and some larger schema management systems support multiple paradigms.
Seems great for really small apps where you want your resource definitions colocated with the code using them. I'd imagine the benefits start to break down as your infrastructure gets more complicated.
The bigger answer is that if you're proficient and happy with CDK or anything else to wire resources up, you're probably not going to see much (if any) benefit.
True, I have written my share of CloudFormation custom resources.
Funny anecdote: when I was at AWS, it was faster for an SA to write a Terraform module and get it merged into Terraform than it was for us to wait for AWS to add support for the same resource in CloudFormation. They're getting much better now.
I think the "make" in the title is a bit misleading; the author is actually just advocating for having a consistent file you use for ad hoc scripting and testing in your application.
The thrust of the article could be summarized as: if you type more than one command into the shell, make a script.
I did too, and I've had a challenging time convincing people outside of those ecosystems that this is possible and reasonable, and that we've been doing it for over a decade.
I worked on a product that was built around planning and estimation with ranged estimates (2-4h, 1-3d, etc.).
2-12d conveys a very different story than 6-8d. Are the ranges precise? Nope, but they're useful in conveying uncertainty, which is something that gets dropped in any system that collapses estimates to a single point.
That said, people tend to just collapse ranges, so I guess we all lose in the end.
In agile, 6-8d is considered totally reasonable variance, while 2-12d simply isn't permitted. If that's the level of uncertainty -- i.e. people simply can't decide on points -- you break it up into a small investigation story for this sprint, then decide for the next sprint whether it's worth doing once you have a more accurate estimate. You would never just blindly decide to do it or not if you had no idea if it could be 2 or 12 days. That's a big benefit of the approach, to de-risk that kind of variance up front.
> you break it up into a small investigation story for this sprint, then decide for the next sprint whether it's worth doing
That's just too slow for business in my experience though. Rightly or wrongly, they want it now, not in a couple of sprints.
So what we do is we put both the investigation and the implementation in the same sprint, use the top of the range for the implementation, and re-evaluate things mid-sprint once the investigation is done.
Of course this messes up predictability, and agile people don't like it, but they don't have better ideas for handling it either.
Not sure if we're not agile enough or too agile for Scrum.
That's definitely one way of doing it! And totally valid.
I think it often depends a lot on who the stakeholders are and what their priorities are. If the particular feature is urgent then of course what you describe is common. But when the priority is to maximize the number of features you're delivering, I've found that the client often prefers to do the bounded investigation and then work on another feature that is better understood within the same sprint, then revisit the investigation results at the next meeting.
But yes -- nothing prevents you from making mid-sprint reevaluations.
If you measure how long a hundred "3-day tasks" actually take, in practice you'll find a range of about 2-12 days. The variance doesn't end up getting de-risked, and it doesn't mean the 3-day estimate was a bad guess either. The error bars just tend to be about that big.
If a story-sized task takes 4x more effort than expected, something really went wrong. If it's blocked and it gets delayed then fine, but you can work on other stories in the meantime.
I'm not saying it never happens, but the whole reason for the planning poker process is to surface the things that might turn a 3-point story into a 13-point story, with everyone around the table trying to imagine what could go wrong.
You should not be getting 2-12 variance unless it's a brand-new team working on a brand-new project, learning how to do everything for the first time. I can't count how many sprint meetings I've been in; that level of variance is not normal for the sizes of stories that fit into sprints.
Try systematically collecting some fine-grained data comparing your team's initial time estimates against the actual working time spent on each ticket, and see what distribution you end up with.
Make sure you account for how often someone comes back from working on a 3-point story and says "actually, after getting started on this it turned out to be four 3-point tasks rather than one, so I'm creating new tickets." Or "my first crack at solving this didn't work out, so I'm going to try another approach."
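As a sketch of the kind of analysis I mean (the ticket shape here is invented):

    // Compare actual time against the original estimate per ticket and look
    // at the spread of the ratio, not just its average.
    interface Ticket {
      estimateDays: number;
      actualDays: number; // include any follow-up/spin-off tickets it spawned
    }

    function ratioPercentiles(tickets: Ticket[]) {
      const ratios = tickets
        .map((t) => t.actualDays / t.estimateDays)
        .sort((a, b) => a - b);
      const at = (p: number) => ratios[Math.floor(p * (ratios.length - 1))];
      return { p10: at(0.1), p50: at(0.5), p90: at(0.9) };
    }

    // If p10 comes out near 0.7 and p90 near 4, your "3-day tasks" really
    // do span roughly 2-12 days in practice.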
That's literally what retrospectives are for. You do them at the end of every sprint.
Granted, they're point estimates, not time estimates, but it's the same idea -- what was our velocity this sprint, which tickets turned out easier than expected, which ones harder, how can we learn from this to be more accurate going forward, and/or how do we improve our processes?
Your tone suggests you think you've found some flaw. You don't seem to realize this is explicitly part of sprints.
I'm describing my experiences with variances based on many, many, many sprints.
I think it really depends on how teams use their estimates. If you're locking in an estimate and have to stick with it for a week or a month, you're right, that's terrible.
If you don't strictly work on a sprint schedule, then I think it's reasonable to have high-variance estimates, and then as soon as you learn more, you update the estimate.
I've seen lots of different teams do lots of different things. If they work for you and you're shipping with reliable results then that's excellent.
It natively supports vector embeddings, which seems like it could be nice. The SQLite extensions I've tried for vector embeddings have been a challenge to get working (may just be me though).