It seems that monitoring is killing the SSD from time to time. I do not have much evidence, but there appears to be a race condition under which the SSD's firmware can crash. The system then loses its storage, which causes all user-space services to stop functioning. At first we suspected smartctl, but it is not active on a second system that showed the same crash. Since snmpd is involved, my current guess is lm-sensors, which reads a temperature sensor of the SSD.
Michael confirmed to me yesterday that this bug is fixed.