open.mp forum


Burgershot Outage Post-mortem

author
Southclaws

@southclaws


started
Apr 18, 2019


Hey everyone!



This is a post-mortem of the forum outage, which was reported on 2019-04-17 at 21:28 GMT and resolved on 2019-04-18 at 00:56 GMT.



The outage was caused by a disk filling up with logs from another service running on the same machine.



The reason the logs of this service reached 335237897538 bytes (roughly 312 GiB) is currently unknown. It seems that Docker does not rotate logs by default, so a service that runs indefinitely keeps appending to a single file. The service in question has been online since October 2018, and its log output has built up substantially since then.







-rw-r----- 1 root root 335237897538 Apr 17 23:45 384a7fd0aff65a82d3dfb406767edcbf5a16d321404a5b1848cfdc3ead95f624-json.log
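For anyone wanting to check their own node for the same problem, here is a minimal sketch of a scan for oversized `*-json.log` files. The function names, the threshold, and the example directory are illustrative assumptions and not what was actually run during the outage.

```python
from pathlib import Path


def find_large_logs(root, threshold_bytes):
    """Return (path, size) pairs for *-json.log files at or above the threshold, largest first."""
    hits = []
    for path in Path(root).rglob("*-json.log"):
        try:
            size = path.stat().st_size
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if size >= threshold_bytes:
            hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


def human_gib(num_bytes):
    """Format a byte count as GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"
```

It could be called as `find_large_logs("/var/lib/docker/containers", 2**30)`, assuming Docker's usual default data directory (reading it normally requires root). For scale, `human_gib(335237897538)` gives `312.2 GiB`.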





The node is configured with the default logging driver: https://docs.docker.com/config/containers/logging/json-file/
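The json-file driver documented at that link accepts `max-size` and `max-file` options, which cap the size and number of log files per container. As a hedged sketch, the snippet below builds a `daemon.json` fragment with those options. The values `10m` and `3` are illustrative assumptions, not the settings chosen for this node, and such a change applies only to containers created after the daemon is restarted.

```python
import json


def build_daemon_config(max_size="10m", max_file=3):
    """Build a daemon.json fragment that enables json-file log rotation.

    The defaults are example values only. Docker expects log-opts values
    to be strings, so max_file is converted.
    """
    return {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": max_size,
            "max-file": str(max_file),
        },
    }


print(json.dumps(build_daemon_config(), indent=2))
```

With these example values, a container's logs would be capped at roughly three files of 10 MB each instead of growing without limit.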



Steps to Prevent



To prevent this from happening again, I am going to do something I have been meaning to do for a long time: move log aggregation to an external service. Which service is yet to be decided, but I should have some time before the disk fills up again.



In the meantime, I will be configuring Alertmanager (Prometheus) to alert me (and potentially other staff members) about disk usage problems ahead of time, so we can deal with them before they cause an outage.
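To illustrate the kind of condition such an alert would express, here is a minimal Python sketch of a disk usage check plus a simple linear estimate of time until the disk is full. This is not Alertmanager or Prometheus configuration. The function names and the 80% threshold are assumptions made for the example.

```python
import shutil


def used_fraction(path="/"):
    """Return the fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_alert(fraction_used, threshold=0.80):
    """Return True once usage reaches the (assumed) alert threshold."""
    return fraction_used >= threshold


def hours_until_full(free_before, free_now, hours_between):
    """Linearly extrapolate from two free-space samples to hours until zero free space.

    Returns None when free space is not shrinking, since no estimate is possible.
    """
    consumed = free_before - free_now
    if consumed <= 0 or hours_between <= 0:
        return None
    return free_now / (consumed / hours_between)
```

For example, if free space dropped from 100 GB to 80 GB over 2 hours, `hours_until_full(100, 80, 2)` returns `8.0`, which would leave time to act before the disk fills.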



-



Thank you for your understanding, we live and learn!
