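This topic was created 4394 days ago; the information in it may have changed since then.
I'm dealing with a XenServer dom0 crash that looks like a hardware problem (probably the processor). I need to look up the MCE error codes, but the kern.log on XenServer dom0 (CentOS 5.x) is in a non-standard format that mcelog cannot read directly, so it has to be converted first. I found a perl one-liner, but I know nothing about perl; I tried debugging it without success and can't make sense of the error.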
cat kern.log | perl -ne '/CPU(\d+), BANK(\d+), addr (\S+), state (\S+)/; print "CPU $1 BANK $2 TSC 00000000000000\nMISC 0000000000000000 ADDR $3 \nSTATUS $4\n" > mce_fmtd.log'
Can't take log of 0 at -e line 1, <> line 1.
Is there a way to get the same result with a bash script, using awk, sed, and the GNU command-line tools?
The kern.log format looks like this:
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: enter dom0 mce vIRQ handler
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: No more urgent data
Apr 16 16:42:10 server109 kernel: [CPU8, BANK0, addr db484ffc0, state cc0000c00001009f]
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: No more nonurgent data
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: enter dom0 mce vIRQ handler
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: No more urgent data
Apr 16 16:42:10 server109 kernel: [CPU8, BANK0, addr db484ffc0, state cc0005000001009f]
Apr 16 16:42:10 server109 kernel: MCE_DOM0_LOG: No more nonurgent data
Apr 16 16:42:26 server109 kernel: MCE_DOM0_LOG: enter dom0 mce vIRQ handler
Apr 16 16:42:26 server109 kernel: MCE_DOM0_LOG: No more urgent data
Apr 16 16:42:26 server109 kernel: [CPU8, BANK0, addr db484fcc0, state cc0004c00001009f]
Apr 16 16:42:26 server109 kernel: MCE_DOM0_LOG: No more nonurgent data
Apr 16 16:42:57 server109 kernel: MCE_DOM0_LOG: enter dom0 mce vIRQ handler
Apr 16 16:42:57 server109 kernel: MCE_DOM0_LOG: No more urgent data
Apr 16 16:42:57 server109 kernel: [CPU8, BANK0, addr db484ef40, state cc0002800001009f]
Apr 16 16:42:57 server109 kernel: MCE_DOM0_LOG: No more nonurgent data
7 replies
 | | 1 11138 Apr 22, 2014 cat kern.log | perl -ne '/CPU(\d+), BANK(\d+), addr (\S+), state (\S+)/; print "CPU $1 BANK $2 TSC 00000000000000\nMISC 0000000000000000 ADDR $3 \nSTATUS $4\n"' > mce_fmtd.log |
 | | 2 terry Apr 22, 2014 Actually, all I need to know is what target format
cat kern.log | perl -ne '/CPU(\d+), BANK(\d+), addr (\S+), state (\S+)/; print "CPU $1 BANK $2 TSC 00000000000000\nMISC 0000000000000000 ADDR $3 \nSTATUS $4\n"'
is supposed to turn the text below into.
CPU8, BANK0, addr db484ffc0, state cc0000c00001009f
Is it converted to
CPU 8, BANK 0, TSC 00000000000000 MISC 0000000000000000 ADDR $3 STATUS $4
like this?
Thanks in advance. |
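 | | 3 terry Apr 22, 2014 @11138 Thanks. After a careful look I spotted the problem. I really couldn't concentrate right before leaving work, and I didn't notice that the single quotes were wrapping the redirection operator... |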
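The fix in reply 1 moves `> mce_fmtd.log` outside the single quotes. Inside the quotes the shell hands the `>` to perl as program text, where it is most likely parsed as a numeric comparison against `mce_fmtd . log`, that is, the `log` function applied to the current input line. A line starting with "Apr" has the numeric value 0, which would explain "Can't take log of 0". The quoting behaviour itself can be sketched in plain shell, without perl; the temporary directory and the file name `out.txt` below are made up for illustration.

```shell
# Hypothetical demo: a '>' inside single quotes is data, not a redirection.
demo_dir=$(mktemp -d)

# Quoted: the shell passes the whole string to printf as one argument.
# Nothing is redirected, so the text (including '>') goes to stdout.
quoted_output=$(printf '%s\n' "hello > $demo_dir/quoted.txt")

# Unquoted: the shell itself performs the redirection and creates out.txt.
printf '%s\n' 'hello' > "$demo_dir/out.txt"

# Only the unquoted form created a file.
created_files=$(ls "$demo_dir")
```

After running this, `created_files` holds only `out.txt`, while `quoted_output` still contains the literal `>` and the path.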
 | | 4 terry Apr 22, 2014 Here is the result, for anyone interested.
After decoding:
sudo mcelog --cpu=westmere --ascii --no-dmi --ignorenodev --file mce_fmtd.log > mce_decoded.log
CPU 8 BANK 0 TSC 00000000000000
MISC 0000000000000000 ADDR db484ef40
Hardware event. This is not a software error.
CPU 8 BANK 0
MISC 0 ADDR db484ef40
MCG status:
MCi status:
Error overflow
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 10
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 0
STATUS cc0002800001009f MCGSTATUS 0
STATUS cc0002800001009f]
CPU 8 BANK 0 TSC 00000000000000
MISC 0000000000000000 ADDR db484ef40
Hardware event. This is not a software error.
Anyone familiar with this will see at a glance what the problem is... |
 | | 5 ngn999 Apr 22, 2014 cat kern.log | perl -ne 'print "CPU $1 BANK $2 TSC 00000000000000\nMISC 0000000000000000 ADDR $3 \nSTATUS $4\n" if /CPU(\d+), BANK(\d+), addr (\S+), state (\S+)/' > mce_fmtd.log |
 | | 6 ngn999 Apr 22, 2014 On the command line, perl can be used as sed + awk |
 | | 7 leoYu Apr 23, 2014 This can also be solved with sed; OP, feel free to email me |
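No sed version was actually posted in the thread. The following is a minimal sketch of what one might look like, assuming the log lines always have the `[CPU<n>, BANK<n>, addr <hex>, state <hex>]` shape shown in the question. The function name `convert_mce` is hypothetical. Unlike the perl one-liners above, it matches the closing `]` explicitly so the bracket does not leak into the STATUS value, and it omits the trailing space after the address.

```shell
# convert_mce (hypothetical name): read kern.log lines on stdin and print
# the three-line CPU/MISC/STATUS layout used in the thread as mcelog input.
# -n plus the trailing 'p' flag prints only lines where the pattern matched,
# so the MCE_DOM0_LOG chatter lines are dropped.
convert_mce() {
    sed -n 's/.*\[CPU\([0-9][0-9]*\), BANK\([0-9][0-9]*\), addr \([0-9a-fA-F][0-9a-fA-F]*\), state \([0-9a-fA-F][0-9a-fA-F]*\)\].*/CPU \1 BANK \2 TSC 00000000000000\
MISC 0000000000000000 ADDR \3\
STATUS \4/p'
}
```

It would be used the same way as the perl version, for example `convert_mce < kern.log > mce_fmtd.log`, with the redirection outside the quotes. The backslash-newline sequences in the replacement are the portable way to emit line breaks with sed, which is why the continuation lines start at column 0.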