My Philosophy on Alerting, based on my observations while I was a Site Reliability Engineer at Google
Author: Rob Ewaschuk
Link: Google Docs
This is the recommended reading on alerting practices for Prometheus, the open-source monitoring system that has become popular recently; see http://prometheus.io/docs/practices/alerting/
Main idea:
Keep alerting simple, alert on symptoms, have good consoles to allow pinpointing causes, and avoid having pages where there is nothing to do.
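To make "alert on symptoms" concrete, here is a minimal sketch of a paging decision that looks only at what users experience (the share of failed requests) rather than at an internal cause such as CPU load. The `WindowStats` type, the `should_page` function, and the 1% threshold and 100-request minimum are illustrative assumptions, not taken from the original document or from Prometheus.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts for one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def should_page(
    stats: WindowStats,
    max_error_ratio: float = 0.01,  # assumed threshold: 1% of requests failing
    min_requests: int = 100,        # assumed floor: below this, traffic is too thin to judge
) -> bool:
    """Page only when a user-visible symptom (failed requests) crosses the threshold."""
    # With very little traffic a handful of failures is noise, so do not page.
    if stats.total_requests < min_requests:
        return False
    error_ratio = stats.failed_requests / stats.total_requests
    return error_ratio > max_error_ratio
```

A window with 1000 requests and 50 failures would page under these assumed defaults, while 1000 requests with 5 failures would not. Finding the cause is then left to consoles and dashboards rather than to additional pages.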
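Thoughts after reading:
All learning progresses from knowledge to skill and finally to methodology, and the skills of ops work are no exception. Ops folks, get your alerting right and enjoy a peaceful New Year holiday.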
