
https://aws.amazon.com/message/41926/
At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.
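1 holyghost Mar 3, 2017 I wonder how many hours of boring little videos this guy is going to have to sit through |
2 XiaoFaye Mar 3, 2017 Hard to imagine that a batch command like this doesn't require a review... |
3 acoder2013 Mar 3, 2017 So Amazon's engineers are just so-so too, hahaha |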
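4 just4test Mar 3, 2017 So there's no bot guarding something like server removal? ''' Operation denied. This operation would affect the following subsystems: Index subsystem: 30% of capacity removed, remaining capacity cannot sustain production load. Placement subsystem: 20% of capacity removed, remaining capacity cannot sustain N+1. To force this command, retry with the --fuckyou flag. ''' |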
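6 stevele Mar 3, 2017 People would still have to actually use it, though |
8 21grams Mar 3, 2017 The command was typed wrong? Shouldn't this have been turned into a script? |
9 vingz Mar 3, 2017 Not every maintenance procedure can be automated |
11 eyp82 Mar 3, 2017 They were probably using ansible or something like it |
12 bingwenshi Mar 3, 2017 @21grams They did use a script, but one of the parameters was entered incorrectly |
13 okampfer Mar 3, 2017 I still remember gitlab's rm -rf / from last time |
14 matrix67 Mar 3, 2017 If it's a playbook, it must be ansible, right? salt doesn't call it that. |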
15 vindurriel Mar 3, 2017 >> Removing a significant portion of the capacity caused each of these systems to require a full restart There should be a way to improve this, right? |
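18 taowen Mar 3, 2017 This shows the abstraction level of their ops automation is still too low. A company this big is still using ansible for work this low-level. I thought AWS ops had long since risen above the crude pleasures of bash |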
19 donghui Mar 4, 2017 via iPad One slip and you've deleted the wrong thing |
20 xiaq Mar 4, 2017 via iPad The "playbook" here probably refers to an incident-handling manual |
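
The guard imagined in reply 4 can be sketched roughly as below. This is a minimal illustration, not how AWS's tooling works: `Subsystem`, `min_required`, `check_removal`, and the `force` flag are all hypothetical names, and the capacity floors are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str
    total_servers: int
    min_required: int  # hypothetical floor needed to carry production load


class RemovalDenied(Exception):
    """Raised when a removal would leave a subsystem below its floor."""


def check_removal(subsystems, removals, force=False):
    """Check a planned removal against each subsystem's capacity floor.

    removals maps subsystem name -> number of servers to remove.
    Returns the list of violation messages; raises RemovalDenied if
    there are any violations and force is not set.
    """
    violations = []
    for sub in subsystems:
        count = removals.get(sub.name, 0)
        remaining = sub.total_servers - count
        if remaining < sub.min_required:
            pct = 100 * count / sub.total_servers
            violations.append(
                f"{sub.name}: {pct:.0f}% of capacity removed, "
                f"{remaining} servers left, {sub.min_required} required"
            )
    if violations and not force:
        raise RemovalDenied("Operation denied.\n" + "\n".join(violations))
    return violations
```

Called with a small removal, `check_removal` returns an empty list; called with one that crosses a floor, it raises unless `force=True` is passed, which mirrors the "retry with a flag" escape hatch in the mock output.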