So, for those of you who were refreshing the page and looking at our wonderful maintenance page: it took way longer than we planned! I'll do a full write-up after I've dealt with a couple of timeout issues.

Here is a bonus meme.

So? How’d it go…

Exactly how we wanted it to go… except with a HUGE timeframe.
As part of the initial testing with object storage, I tested using a backup of our files. I validated that the files were synced and that our image service could retrieve them from the object store.

What I did not account for was the latency to Backblaze from Australia, how our image service handled migrations, and Backblaze's response times:

  • Latency from au-east to us-west is about 150 to 160ms.
  • The image service was single-threaded.
  • Response times for adding a file are around 700ms to 1500ms (inclusive of latency).

We had 43,000 files totaling ~15GB of image data. With response times of up to 1.5 seconds per image, and only one image being processed at a time, yep, that is a best-case scenario of 43,000 seconds, or just under 12 hours of transfer time, at an average of 1 second per image.
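
As a rough back-of-the-envelope check (using the averages above, nothing more precise), the numbers work out like this:

    # Back-of-the-envelope migration estimate using the rough averages quoted above.
    files = 43_000
    best_hours = files * 1.0 / 3600    # ~11.9 h at an average of 1 s per image
    worst_hours = files * 1.5 / 3600   # ~17.9 h at 1.5 s per image
    print(f"best ~{best_hours:.1f} h, worst ~{worst_hours:.1f} h")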

The total migration took around 19 hours as seen by our pretty transfer graph:

So, not good, but are we okay now?

That was the final migration we will need to do for the foreseeable future. We have enough storage to last over a year at the current rate of database growth, with the option to purchase more storage on a yearly basis.
I would really like to purchase a dedicated server before that happens, and if more and more amazing people keep joining our monthly donations on the Reddthat Open Collective, I believe that can happen.

Closing thoughts

I would like to take this opportunity to apologise for the miscalculation of the downtime, as well as for not fully understanding the operational requirements of our object storage usage.
I may have also been quite vocal on the Lemmy Admin Matrix channel regarding the lack of a multi-threaded option for our image service. I hope my sleep-deprived ramblings were coherent enough not to rub anyone the wrong way.
A big final thank you to everyone who is still here, posting, commenting and enjoying our little community. Seeing our community thrive gives me great hope for our future.

As always. Cheers,
Tiff

PS.

The bot defence mentioned in our last post was unfortunately not acting as we hoped and didn't protect us from a bot wave, so I've turned registration applications back on for the moment.

PPS. I see the people on reddit talking about Reddthat. You rockstars!


Edit:

Instability and occasional timeouts

There seems to be a memory leak in Lemmy v0.18 and v0.18.1, which some other admins have reported as well and which has been plaguing us. Our server would be running completely fine, and then BAM, we'd be using more memory than was available and Lemmy would restart. These restarts lasted about 5-15 seconds, and if you hit one it meant super long page loads or your mobile client saying “network error”.

Temporary Solution: Buy more RAM.
We now have double the amount of memory courtesy of our open collective contributors, and our friendly VPS host.

In the time I have been making this edit I have already seen it survive a memory spike without crashing, so I'd count that as a win!
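
If you want to keep an eye on spikes like this yourself, a tiny watcher along these lines would do; psutil, the five-second interval and the 90% threshold are just illustrative choices here, not what we actually run:

    # Minimal memory-spike watcher: prints a line whenever RAM usage crosses a
    # threshold. The threshold and interval are illustrative, not our real setup.
    import time
    import psutil

    THRESHOLD = 90.0  # percent of total RAM

    while True:
        used = psutil.virtual_memory().percent
        if used >= THRESHOLD:
            print(f"{time.strftime('%H:%M:%S')} memory at {used:.0f}%")
        time.sleep(5)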

Picture Issues

This leaves us with the picture issues. It seems the picture migration hit an error: a few of the pictures never made it across, or the internal database was corrupted! Unfortunately there is no going back, and those images… were lost or left in limbo.

If you see something like the image below, make sure you let the community/user know:

Also, if you have uploaded a profile picture or background, you can check to make sure it is still there! <3 Tiff

  • RedM

    Me staring into the abyss waiting for reddthat to come back online

  • Parculis Marcilus

    Fortunately and unfortunately my depression kicked in so I didn’t experience anxiety like everyone else. Otherwise I’ll need to keep myself drunk during the downtime.

  • RavenerA

    I noticed that some images wouldn’t load for me, is that normal? Like going to my profile now I cannot see my profile picture/banner.

    • TiffOPMA

      Well that isn’t great! No it is not normal. Our picture service should only be using our object storage. I just removed our local storage… Can you re-upload it and see if that fixes it?

      • RavenerA

        yes a re-upload fixes it, I was just wondering if the images were corrupted on your side or something, hopefully not.

        • TiffOPMA

          Bad news: our internal image database, which is a KV store, suffered corruption as part of the instabilities. As all the images were there (the file counts matched), I can only surmise the corruption is in the KV store itself.

          This is unfortunately corroborated by the logs:

          reddthatcom-pictrs-1 |    0: Error in DB
          reddthatcom-pictrs-1 |    1: Error in sled
          reddthatcom-pictrs-1 |    2: Required field was not present
          

          We’ll just have to take it as it comes. But moving forward, I'll be backing up more than just our postgres db!
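
          For anyone doing the same, here's a very rough sketch of what "backing up more than just postgres" could look like; the container name, database user and volume path are placeholders, not our actual setup:

              # Sketch only: dump the Lemmy postgres DB and archive the pict-rs volume
              # (which holds the sled KV store). Names and paths below are placeholders.
              import subprocess
              import tarfile
              from datetime import date

              stamp = date.today().isoformat()

              # Dump postgres via the database container (assumes a container called "lemmy-db").
              with open(f"lemmy-{stamp}.sql", "wb") as out:
                  subprocess.run(
                      ["docker", "exec", "lemmy-db", "pg_dump", "-U", "lemmy"],
                      stdout=out,
                      check=True,
                  )

              # Archive the pict-rs data directory as well, ideally with pict-rs stopped
              # so the sled files aren't being written to mid-backup.
              with tarfile.open(f"pictrs-{stamp}.tar.gz", "w:gz") as tar:
                  tar.add("volumes/pictrs", arcname="pictrs")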

          • RavenerA

            no worries, it’s all fine. It’s good to learn from these type of mistakes early on before we grow too much, thank you for all your work, must be tough being a sysadmin.

  • Lodion 🇦🇺@aussie.zone

    Have you looked at Wasabi? S3 API compatible object storage in Sydney. Best of all… no API or egress charges. One drawback… minimum charge of 1TB per month, but I imagine that is more than offset by no API/egress fees. Working well for me over on aussie.zone 😀 🇦🇺

    • TiffOPMA

      o/ Lodion

      Yeah, I looked at Wasabi but chose B2 because it was cheaper at under 1TB/month, as I wanted to keep month-on-month costs as low as possible to ensure I had enough capital to survive at least a year. Tbh, I didn't see the no egress fees! That's insane! I'll probably move later on once everything is "quieter". July 1st is right around the corner after all.

      Fortunately, once everything is in the cache it's all fine. Looking back, the migration never would have gone smoothly: migrating ~50k files, single-threaded, regardless of where the storage is, takes a long time! If I could do it again I'd go object storage from the very start and eat the $6 or $10/m. Unfortunately, here we are. I also never expected pictrs to cache as much as it did from federated instances. Definitely the biggest "gotcha" I've seen so far in hosting Lemmy.

      Now that we've migrated to object storage, it's quite easy to move between providers. You don't need to use the pictrs migrate command (at this point in time); you can perform an out-of-band sync from one object store to the other, stop pictrs, perform a final sync, modify the pictrs config, and start it again, as pictrs doesn't hold any information about the object storage endpoint in its sled-db.
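
      If it helps anyone, a very rough sketch of that out-of-band sync using boto3 is below; the endpoints, buckets and credentials are placeholders, and in practice a tool like rclone does the same job in one command:

          # Sketch of the out-of-band sync: copy anything the destination bucket
          # doesn't have yet. Endpoints, buckets and credentials are placeholders.
          import boto3
          from botocore.exceptions import ClientError

          src = boto3.client("s3", endpoint_url="https://src.example.com",
                             aws_access_key_id="SRC_KEY", aws_secret_access_key="SRC_SECRET")
          dst = boto3.client("s3", endpoint_url="https://dst.example.com",
                             aws_access_key_id="DST_KEY", aws_secret_access_key="DST_SECRET")
          SRC_BUCKET, DST_BUCKET = "pictrs-old", "pictrs-new"

          for page in src.get_paginator("list_objects_v2").paginate(Bucket=SRC_BUCKET):
              for obj in page.get("Contents", []):
                  key = obj["Key"]
                  try:
                      dst.head_object(Bucket=DST_BUCKET, Key=key)
                      continue  # already synced
                  except ClientError:
                      pass
                  body = src.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
                  dst.upload_fileobj(body, DST_BUCKET, key)

          # Run once while pict-rs is up, stop pict-rs, run again for stragglers,
          # then point the pict-rs config at the new endpoint/bucket and start it.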

      🇦🇺 🤜

      • Lodion 🇦🇺@aussie.zone

        I’m using the provided pict-rs from the docker-compose. I cheated and just used s3fs for /files in the pict-rs volume.