edit: you are right, it’s the I/O WAIT that is destroying my performance:
%Cpu(s): 0,3 us, 0,5 sy, 0,0 ni, 50,1 id, 49,0 wa, 0,0 hi, 0,1 si, 0,0 st
I could clearly see it in nmon (pressing d, then l, then -), as @SayCyberOnceMore suggested. I’m not quite sure what to do about it, since it’s simply my sdb1 drive, a Samsung 1TB 2.5" HDD. I have now ordered a 2TB SSD and may reinstall from scratch on that new drive as sda1. I realize that only treats the symptom, not the root cause, so I should probably look for that root cause too. But that’s for another Lemmy thread!

I really don’t understand what is causing this. I run a few very small containers and everything is fine, but when I start something bigger like Photoprism, Immich, or even MariaDB or PostgreSQL, something causes the system load to keep rising without end.

Notably, top doesn’t show anything special: nothing eats RAM, nothing uses 100% CPU. And yet the load rises fast. If I leave it be, my ssh session loses its connection. Logging onto the host itself shows a load of over 50, or even over 70. I don’t grok how a system can even get that high.
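For context: on Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state `D`), which usually means a task blocked on disk I/O. That is how the load can climb to 50 or 70 while top shows almost no CPU use. Below is a minimal Python sketch, assuming a Linux `/proc` filesystem, that lists the threads currently in state `D`. `parse_stat` and `blocked_tasks` are hypothetical helper names, not part of any existing tool.

```python
import os


def parse_stat(line):
    """Return (id, comm, state) from one /proc/<pid>/task/<tid>/stat line.

    comm sits in parentheses and may itself contain spaces or ')',
    so take everything between the first '(' and the LAST ')'.
    """
    start = line.index("(")
    end = line.rindex(")")
    task_id = int(line[:start])
    comm = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return task_id, comm, state


def blocked_tasks(proc="/proc"):
    """Return (pid, tid, comm) for every thread in uninterruptible sleep ('D')."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    tid_num, comm, state = parse_stat(f.read())
            except OSError:
                continue  # thread exited between listdir() and read()
            if state == "D":
                found.append((int(pid), tid_num, comm))
    return found


if __name__ == "__main__":
    # Run it a few times while the load climbs; names that keep showing up are suspects.
    for pid, tid, comm in blocked_tasks():
        print(f"pid={pid} tid={tid} {comm}")
```

Threads are scanned under `/proc/<pid>/task/` rather than only `/proc/<pid>/stat` because the load average counts individual threads, and a database typically blocks on I/O in worker threads rather than its main one.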

My server is an older Intel i7 with 16GB RAM running Ubuntu 22.04 LTS.

How can I troubleshoot this, when ‘top’ doesn’t show any culprit and it does not seem to be caused by any one specific container?

(This makes me wonder how people run anything at all on a Raspberry Pi. My machine isn’t “beefy”, but a Pi has far less power still.)

  • Neuromancer@lemm.ee · 10 months ago

    Run top and paste the top portion of its output.

    I would suspect it is I/O wait. You can get into disk contention if multiple containers are fighting over the same disk. You will notice the I/O queue building up, which shows that processes are waiting on I/O transactions.
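    One rough way to watch that queue is a minimal Python sketch, assuming the Linux `/proc/diskstats` layout (after major, minor and device name, stat field 9 is "I/Os currently in progress" and field 10 is milliseconds spent doing I/O). It samples twice and prints, per device, the number of in-flight I/Os and the share of the interval the device was busy, the same idea as the %util column in `iostat -x`. `parse_diskstats` and `busy_percent` are hypothetical names.

```python
import time


def parse_diskstats(text):
    """Map device name -> (in_flight, io_ms) from /proc/diskstats text.

    Columns are: major, minor, name, then the stat fields. Stat field 9
    (column index 11) is "I/Os currently in progress"; stat field 10
    (column index 12) is "milliseconds spent doing I/Os".
    """
    stats = {}
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 14:
            continue  # not a full stats line
        stats[cols[2]] = (int(cols[11]), int(cols[12]))
    return stats


def busy_percent(before, after, interval_ms):
    """Percent of the interval each device had I/O in flight (like iostat's %util)."""
    return {
        dev: 100.0 * (after[dev][1] - before[dev][1]) / interval_ms
        for dev in after
        if dev in before
    }


if __name__ == "__main__":
    interval_s = 1.0
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    util = busy_percent(first, second, interval_s * 1000)
    for dev, (in_flight, _) in sorted(second.items()):
        if dev.startswith(("loop", "ram")):
            continue  # skip virtual devices
        print(f"{dev:10} in_flight={in_flight:4d} busy={util.get(dev, 0.0):5.1f}%")
```

    A spinning disk that stays near 100% busy with a persistently non-zero in-flight count while several containers are running fits the contention described above.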

    %Cpu(s): 67.4 us, 13.0 sy, 0.0 ni, 19.4 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st

    See the field labeled wa: that is I/O wait, the percentage of time the CPU sits idle while waiting for I/O to complete.
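    The wa figure can be reproduced by hand. Roughly speaking, top diffs the counters on the aggregate `cpu` line of `/proc/stat` between refreshes, and wa is the iowait delta divided by the total delta. Below is a minimal Python sketch of that arithmetic, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal). `cpu_times` and `iowait_percent` are hypothetical names.

```python
import time


def cpu_times(text):
    """Return the first 8 aggregate CPU counters from /proc/stat text:
    user, nice, system, idle, iowait, irq, softirq, steal (clock ticks).
    guest/guest_nice are skipped because they are already counted in user/nice.
    """
    for line in text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent idle with I/O outstanding (top's 'wa')."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = cpu_times(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = cpu_times(f.read())
    print(f"wa = {iowait_percent(first, second):.1f}%")
```

    Because iowait is idle time, a high wa together with a high load means the CPUs are not overloaded. They are waiting on the disk, which is why nothing in top's process list looks busy.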

    If that is high, you can increase the cache Linux uses for writes, BUT if the system crashes you risk losing data that has not yet been saved to disk.
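    The reply doesn’t say which cache it means. Assuming it is the kernel’s dirty page cache, the relevant knobs are the `vm.dirty_*` sysctls. These govern how much written data may sit in RAM before it is flushed to disk, and that data is lost on a crash. Below is a read-only Python sketch that prints the current settings and how much data is dirty or under writeback right now. `read_meminfo_kb` is a hypothetical helper, and the script changes nothing.

```python
def read_meminfo_kb(text, keys=("Dirty", "Writeback")):
    """Pick the kB values for the given keys out of /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])
    return values


# Writeback thresholds (percent of memory) and max age of dirty data (1/100 s).
VM_KNOBS = ("dirty_ratio", "dirty_background_ratio", "dirty_expire_centisecs")

if __name__ == "__main__":
    for knob in VM_KNOBS:
        with open(f"/proc/sys/vm/{knob}") as f:
            print(f"vm.{knob} = {f.read().strip()}")
    with open("/proc/meminfo") as f:
        for name, kb in read_meminfo_kb(f.read()).items():
            print(f"{name}: {kb} kB")
```

    The values can then be changed with `sysctl -w vm.dirty_ratio=N` (N being whatever you choose). Larger values leave more unwritten data in RAM, which is exactly the crash-risk trade-off mentioned above.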