Hey all!

I’m seeking guidance. I have a lot of apps that use ORMs like Mongoose, Sequelize, or SQLAlchemy, or even just init-db scripts with raw SQL. The point is that a lot of apps make changes to the data layer. When we write apps, they tend to dictate how they expect the database to look. Sometimes an upgrade is no trivial feat - and even if you have a CI/CD pipeline in place, what does that look like for upgrading something like Lemmy in a container?

If you have apps like these in production, how are you handling upgrades? Both developer and DevOps perspectives are welcome.

I imagine the devs can offer insight into how the backend recognizes database state, while the DevOps folks can describe process and implementation.

  • astral_avocado@programming.dev

    Off the top of my head: we use AWS ECS, which provides a rolling upgrade method. Push the new container image to ECR (from GitHub Actions, after the tests pass), run the upgrade command, and new containers start booting. Once they pass their health checks, the load balancer starts serving traffic to them. Once they’re live, the old containers are removed.
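
    For what a step like that can look like when driven from a pipeline, here’s a minimal sketch using boto3; the cluster, service, and image names are placeholders, not anything from this thread:

    ```python
    # Rough sketch of an ECS rolling deployment step (names are placeholders).
    import boto3

    ecs = boto3.client("ecs", region_name="us-east-1")

    # Register a new revision of the task definition pointing at the image
    # that CI just pushed to ECR.
    task_def = ecs.register_task_definition(
        family="my-app",
        containerDefinitions=[{
            "name": "my-app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
            "essential": True,
            "memory": 512,
        }],
    )

    # Point the service at the new revision; ECS rolls containers over and the
    # load balancer only routes to tasks that pass their health checks.
    ecs.update_service(
        cluster="my-cluster",
        service="my-app-service",
        taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    )
    ```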

    We also use a blue/green deployment method so we don’t have to worry about breaking the production database with database changes.

    • Gnubyte@lemdit.comOP

      Thanks for weighing in. Yeah! This is basically what I am thinking I’ll have to do. I just tried GitHub Actions and runners with a very small internal app and I liked it. I’ve never worked closely with AWS, but I’ve been trained in and used Azure a few times, and it’s basically the same thing on my end.

      Robust tests, larger conditional workflows in GitHub Actions, and some sort of staggered rollout are the conclusion I think I’m arriving at.

      • astral_avocado@programming.dev

        AWS is expensive and confusingly structured, but I’ve been impressed with the ECS stuff. The UI for all of it is also way overcomplicated and stupid, but once you have it working it works.

  • cerberus@lemmy.world

    Versioning is one way: if you have an API, you can expose different versions, and the behavior can change depending on the version.

    E.g. let’s say you have contact information coming through an API… the database can largely remain the same, but if you want to add a field or change a field’s name, you could bump the version. When the app is ready to use the new version, switch the app over.
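
    A minimal sketch of that idea, assuming a small Flask service; the routes and field names (phone vs. phone_number) are invented for illustration:

    ```python
    # Hypothetical example: the same contact record served under two API versions.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Pretend this row comes from the existing contacts table.
    CONTACT = {"id": 1, "name": "Ada Lovelace", "phone": "555-0100"}

    @app.route("/v1/contacts/<int:contact_id>")
    def get_contact_v1(contact_id):
        # Old shape: clients expect the field to be called "phone".
        return jsonify({"id": CONTACT["id"], "name": CONTACT["name"],
                        "phone": CONTACT["phone"]})

    @app.route("/v2/contacts/<int:contact_id>")
    def get_contact_v2(contact_id):
        # New shape: the field was renamed to "phone_number"; the database
        # column didn't have to change, only the response mapping.
        return jsonify({"id": CONTACT["id"], "name": CONTACT["name"],
                        "phone_number": CONTACT["phone"]})

    if __name__ == "__main__":
        app.run()
    ```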

    Rereading your question once more, I believe you can just resolve all your issues with a common API.

  • JWolf@programming.dev

    For me (a programmer) it usually depends on the upgrade.

    The system I am working on contains like 5-10 containers (depending on customer needs).

    In general I try to have as few connections between services as possible - low coupling, high cohesion. No two containers share a database schema, because that makes database updates way more difficult. Two services have to talk to each other through a well-defined API, which can be versioned.

    1. Application bugfix: Sending out a bugfix usually only affects a single container. We upload the new Docker image to the server, shut down the old instance, and start the new container, reusing all the settings of the old one (see the sketch after this list). This does result in 10-20 seconds of downtime, which is completely acceptable in my case.

    2. Application feature: When we implement a big new feature we usually also need to update the database (new tables/columns). First take a backup of the database before the upgrade, then, like with the bugfix, upload and start the new container. In our case the application pushes out the database update itself (C# .NET EF Core). If it is an even bigger update, multiple Docker containers may be upgraded at the same time.

    3. Database version upgrade: We check the release notes for the specific database on how to upgrade it, and follow those steps.

    4. Other utility containers (Nginx, Elasticsearch, …): Same as the database - check the release notes for breaking changes to make sure the upgrade goes well.
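
    As a rough sketch of the bugfix rollout in item 1, here is what it could look like with the Docker SDK for Python; the image name, ports, and settings are placeholders, and in practice this may just be a couple of docker CLI or compose commands:

    ```python
    # Hypothetical rollout of a bugfix image: stop the old container, start the
    # new one with the same settings. Names and ports are placeholders.
    import docker

    client = docker.from_env()

    old = client.containers.get("my-app")
    settings = {
        "name": "my-app",
        "environment": {"DATABASE_URL": "postgres://app@db/app"},
        "ports": {"8080/tcp": 8080},
        "restart_policy": {"Name": "always"},
    }

    # Short downtime window: remove the old instance, then start the new image
    # with the same settings it had before.
    old.stop()
    old.remove()
    client.containers.run("registry.example.com/my-app:1.2.4", detach=True, **settings)
    ```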

    Always, always, always make sure to test the update in a controlled environment that resembles the production environment as closely as possible (maybe even a copy of production, if I am lucky).

    • netwren@lemmy.world

      This also assumes you have adequate testing capabilities, which are important so that you can evaluate unit behavior, integration behavior, and performance characteristics.

      In the cloud you might make your testing infra ephemeral and spin it up whenever you need to run the tests (using your IaC capabilities).

      That would at least partially surface any issues with the app coordinating with the database.

      I would also assume DBAs can write some test scripts to execute as well.
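
      A rough sketch of the ephemeral-test-infra idea, assuming CloudFormation driven by boto3 (the stack name, template file, and the pytest invocation are placeholders; any IaC tool would follow the same create/test/destroy shape):

      ```python
      # Hypothetical ephemeral test environment: create it, test against it, destroy it.
      import subprocess

      import boto3

      cfn = boto3.client("cloudformation", region_name="us-east-1")
      stack_name = "app-integration-tests"

      # Spin up a throwaway copy of the app + database from an IaC template.
      with open("test-env.yaml") as f:
          cfn.create_stack(StackName=stack_name, TemplateBody=f.read())
      cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

      try:
          # Run whatever test suite exercises the app against the fresh database.
          subprocess.run(["python", "-m", "pytest", "tests/integration"], check=True)
      finally:
          # Tear the environment down so it only costs money while testing.
          cfn.delete_stack(StackName=stack_name)
          cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
      ```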

  • saucyloggins@lemmy.world

    We do migrations for schema on app startup (built into the app).
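
    In a Python app that could look roughly like this sketch, assuming Alembic with an alembic.ini shipped alongside the app (the details vary by framework/ORM):

    ```python
    # Sketch: apply any pending schema migrations before the app starts serving.
    from alembic import command
    from alembic.config import Config

    def run_migrations():
        # Assumes an alembic.ini and migrations directory shipped with the app.
        cfg = Config("alembic.ini")
        command.upgrade(cfg, "head")

    if __name__ == "__main__":
        run_migrations()
        # ...then start the web server / worker as usual.
    ```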

    Any general data changes are done outside the pipeline as a pre- or post-deployment step.

    Migrations on startup aren’t perfect; you have to be careful that they don’t take too long.

    Honestly the pros outweigh the cons. You can fire up a new site/DB without much effort, which is something we do often.

  • bignavy@programming.dev

    Junior-ish DevOps with some blue/green experience.

    It’s a very thorny problem, and I think your willingness to put up with the trade-offs is really what would drive your choice of architecture pattern.

    Most of our blue/green deployment types use a unitary database behind the backend infra. There are a lot of ways to implement changes to the database (mostly done through scripting in the pipeline; we don’t typically use Hibernate or other functionality that wants to control the schema more directly), and it avoids the pain of trying to manage consistency across multiple DB instances. It helps that most of our databases are document types, so a lot of DB changes can be implemented via a flag. But I’ve seen some SQL implementations for table changes that lend themselves to blue/green - you just have to be very mindful not to bork the current live app with what you’re doing in the background. It requires some planning - not just “shove the script into source control and fire the pipeline.”
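
    For the SQL table changes mentioned above, the usual blue/green-friendly shape is expand/contract; here is a sketch with made-up table and column names, where each phase stays backward compatible with whichever app version is still live:

    ```python
    # Hypothetical expand/contract migration so old and new app versions can
    # coexist during a blue/green cutover. Executed from a pipeline script.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://app@db/app")  # placeholder DSN

    # Phase 1 (before deploying green): additive only - the blue app ignores
    # the new column, so nothing breaks.
    with engine.begin() as conn:
        conn.execute(text(
            "ALTER TABLE contacts ADD COLUMN IF NOT EXISTS phone_number TEXT"
        ))
        conn.execute(text(
            "UPDATE contacts SET phone_number = phone WHERE phone_number IS NULL"
        ))

    # Phase 2 (after green is live and blue is retired): contract - drop the
    # old column only once nothing reads it anymore.
    # with engine.begin() as conn:
    #     conn.execute(text("ALTER TABLE contacts DROP COLUMN phone"))
    ```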

    If we were using SQL with a tightly integrated schema and/or we couldn’t feature flag, I think we’d have to monkey around with blue/greening the database as well. But consistency is non-trivial, especially depending on what kind of app it is. At least one time, when a colleague set up a database stream between AWS accounts, he managed to create a circular dependency, which… well, it wasn’t prod, so it wasn’t a big deal, but it easily could’ve been. The data transfer fees are really what kill you. We managed to triple our dev AWS bill prototyping some database streams at one point. Some of it was undoubtedly inefficient code, but the point stands. With most blue/green infra, your actual costs are a lot less than 2x what a ‘unitary’ infra would cost, because most of the infra is pay-per-use and isn’t needed except when you deploy new code anyway. But database consistency, at least when we tried it, was way MORE expensive than just 2x the cost of a unitary DB, because of the compute and transfer fees.

  • Sparrow_1029@programming.dev

    Though we are moving to Kubernetes & Helm soon, we currently use migration scripting tools (like Alembic) for schema and data migrations on app start, and our infrastructure/DevOps team uses Ansible for deployment. We don’t have CI/CD straight to production yet—it’s still a manual process—but I hope to change that as our organization starts using k8s.
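
    For anyone unfamiliar with Alembic, a schema change lives in a small revision script roughly like this sketch (the table and column names are made up); running `alembic upgrade head` at app start then applies whatever revisions the database is missing:

    ```python
    # Sketch of an Alembic revision script (normally generated by `alembic revision`).
    # Table and column names are illustrative only.
    from alembic import op
    import sqlalchemy as sa

    # Revision identifiers used by Alembic.
    revision = "a1b2c3d4e5f6"
    down_revision = None
    branch_labels = None
    depends_on = None

    def upgrade():
        # Additive schema change plus a data migration in the same revision.
        op.add_column("users", sa.Column("display_name", sa.String(255), nullable=True))
        op.execute("UPDATE users SET display_name = username WHERE display_name IS NULL")

    def downgrade():
        op.drop_column("users", "display_name")
    ```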

    • Gnubyte@lemdit.comOP

      Godspeed. I hope the transition goes well. If you need to baby-step towards it, I felt like Docker Swarm was easier to approach, but Kubernetes is far more standard. I recommend budgeting training into the rollout if your shop can afford it. For CI/CD I recently had a great experience with GitHub and GitHub Actions, but I had a coworker set up on-premise GitLab in the past too.

      Somewhat of a tangent - my experience with Alembic over four years is that it is leagues better than dealing with manual SQL, and it’s also very easy to understand what you’re looking at. But I have to say that when I used Sequelize in Node.js, its ability to auto-sync and auto-upgrade the schema made Alembic look silly.

      In regard to my own post, I think what I’m mostly seeing for now is that each new deployment is going to need an internal smoke test, then a staggered rollout of updates.

      • Sparrow_1029@programming.dev

        I’ll look into Sequelize! Also, we are undergoing training right now. I have some previous experience with k8s from $lastJob, but I’m sure my knowledge is out of date, so I’m glad to be doing it.

  • MyNameIsIgglePiggle@sh.itjust.works

    Great thing about mongoose / MongoDB is that you don’t need to define the schema, and I’ve rarely had bugs crop up as a result. Upgrade and go!
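
    A small sketch with pymongo of why that tends to work: documents written by old and new app versions can coexist in the same collection, as long as readers tolerate missing fields (the collection and field names here are made up):

    ```python
    # Hypothetical example: old and new document shapes living side by side.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    contacts = client["app"]["contacts"]

    # Written by the old version of the app.
    contacts.insert_one({"name": "Ada Lovelace", "phone": "555-0100"})

    # Written by the upgraded version, which added a new field.
    contacts.insert_one({"name": "Grace Hopper", "phone": "555-0199",
                         "timezone": "America/New_York"})

    # Readers just need to tolerate documents that predate the new field.
    for doc in contacts.find():
        print(doc["name"], doc.get("timezone", "unknown"))
    ```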

  • Lysergid@lemmy.ml

    First of all, what you are describing is integration via the DB. Unless that was a conscious choice, I would avoid this approach. A system is much easier to manage when a DB/schema is used by at most one app. You have two ways to achieve that: move towards microservices, or towards a monolith. Yes, a monolith is still great for some use cases.

    If we’re talking about your current system state, every app should make changes in the most backward-compatible way. In the worst case this leads to duplication. Say you want to change table X because app A needs it, but app B also uses this table. Instead of modifying X, you create a table Y that satisfies the needs of A and make sure data is written into Y as well as into X, to keep B working.
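
    A sketch of that dual-write idea using SQLAlchemy - contacts_x and contacts_y stand in for X and Y, and the DSN is a placeholder: app A writes to both tables in one transaction until B has migrated off X.

    ```python
    # Hypothetical dual-write while table X (old shape, used by app B) and
    # table Y (new shape, used by app A) coexist.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://app_a@db/shared")  # placeholder DSN

    def save_contact(name: str, phone: str) -> None:
        # One transaction keeps X and Y consistent with each other.
        with engine.begin() as conn:
            conn.execute(
                text("INSERT INTO contacts_x (name, phone) VALUES (:name, :phone)"),
                {"name": name, "phone": phone},
            )
            conn.execute(
                text("INSERT INTO contacts_y (full_name, phone_number) "
                     "VALUES (:name, :phone)"),
                {"name": name, "phone": phone},
            )

    save_contact("Ada Lovelace", "555-0100")
    ```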

    • Gnubyte@lemdit.comOP

      Reading what you wrote here - I think this confirms my looming suspicion, which is that there is no standard today for upgrading Docker containers, since upgrades vary app to app. For example, if I have a docker-compose deployment where service A is Lemmy and service B is Postgres, then the app - service A in this case - has to have its own logic for handling upgrades or code migrations.

      In other words, the upgrade process depends on how the developer writes the software, independent of Docker/k8s/VMs or whatever deployment strategy you are running.

      I think what I was hoping for by asking was that there’d be a newer, smooth, standardized way to do software upgrades besides A/B testing or staggered rollouts, but I’m not really seeing that.

      I’m not super familiar with Lemmy’s codebase, but it looks like they’re using the Diesel ORM and handle migrations on a case-by-case basis for some major changes: https://github.com/LemmyNet/lemmy/blob/main/src/code_migrations.rs