Nerandel's crash: the mess

Or how a big update crashes everything.

Thanks to @BioTheWolf for translating the original blog post. You rock :3

I think I can write a short article about the beautiful crash that happened on August 13th of this year. Not really a crash, but something like it. No disk corruption (link in French) or any other crap this time, just an update that was a bit huge...

August 13th 2018, the night that turned to drama

410 packages, and off we go! Pom popopom popopom pom-
Ouuuuhhhhh! *crack* Ouch ouch ouch ouch! *light crash* Oooooooo!

Could we call that an evocative scene? Maybe. The results of this big update surprised me, because many things were broken. On one side, systemd refused to start something with "Failed at step USER spawning /usr/bin/python: No such process". On the other, a Python update (3.6.6 => 3.7) broke my bot on Aranel and forced me to move to the rewrite version of

The image below is me in "recovery position".

Me in recovery position after update

So, I found myself with a broken system and a bot that refused to start. I had two choices:

  • Move the bot to Nerandel and install Python 3.6 there
  • Port the bot code to the rewrite version of

I made the first choice.

Everything is really broken

A friend told me that the Lutim subdomain was returning 502 Bad Gateway. Strange... I looked at the logs to see what was happening:

Failed at step USER spawning /usr/bin/perl: No such process

Again? Well, reboot.

A few moments later...

Failed at step USER spawning /usr/bin/perl: No such process

Oh, great... I have to reinstall \o/

I dumped the contents of the SSD into a file on the server disk, in case I'd need one of its partitions later, by running dd if=/dev/sdb of=ssd.img bs=8M.
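For anyone who hasn't used dd: bs=8M simply means the disk is read and written in blocks of 8 MiB until the end of the device. Here is a minimal Python sketch of the same idea, roughly equivalent to that dd call and not what I actually ran. The names dump_device and CHUNK_SIZE are made up for illustration, and /dev/sdb is only the SSD's device path from the command above (reading it usually requires root), so the demo copies a temporary file instead.

```python
import os
import tempfile

# Mirrors dd's bs=8M: read and write 8 MiB (8 * 1024 * 1024 bytes) at a time.
CHUNK_SIZE = 8 * 1024 * 1024


def dump_device(src_path: str, dst_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src_path into dst_path chunk by chunk, like `dd if=src of=dst bs=8M`.

    Returns the number of bytes copied. Hypothetical helper, for illustration only.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means we reached the end of the device/file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Demo on a temporary file instead of a real disk, so no root is needed.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "fake_ssd.bin")
        dst = os.path.join(tmp, "ssd.img")
        with open(src, "wb") as f:
            f.write(os.urandom(20 * 1024 * 1024))  # 20 MiB: two full chunks plus a partial one
        n = dump_device(src, dst)
        print(f"copied {n} bytes")  # copied 20971520 bytes
```

On a real machine the call would look like dump_device("/dev/sdb", "ssd.img"), run as root. The fixed chunk size only bounds memory use; the image file ends up byte-for-byte identical to the source either way.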

Next day, August 14th

Luckily, it was a day off :p

I took the opportunity to reinstall everything and to run Gitea on PostgreSQL, because it's good. The day was long, and I was interrupted now and then by some RP¹ requests. As for the distribution, the server now runs Debian Stretch and, for the moment, it works without "eating" the two tons of resources it doesn't have.

Following days

Well, I'm putting things back in their original places. Yes, the DB² data is all gone because I changed the RDBMS, but the rest should be good to go. One thing at a time, and everything should be back to normal.

For those wondering: no, Arch is not very stable for a production server, and it can literally break in your hands without warning x')

PS: Because I like to troll from time to time and crack a joke... is this a stable OS for prod?


¹ RolePlay
² DataBase