32 Debugging


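Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian Kernighan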

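The Bash shell contains no built-in debugger, and only bare-bones debugging-specific commands and constructs. Syntax errors or outright typos in a script generate cryptic error messages that are often of no help in debugging a non-functional script.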

Example 32-1. A buggy script

    #!/bin/bash
    # ex74.sh
    # This is a buggy script. Where is the error?
    a=37
    if [$a -gt 27 ]
    then
      echo $a
    fi
    exit $?   # 0! Why?

Output from script:

    ./ex74.sh: [37: command not found

What is wrong with the above script? (Hint: look at what follows the if.)
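
The hint points at the missing space after `[`. Bash reads `[$a` as the command name `[37`, fails to find it, and the if treats that failure as a false condition. Because no branch runs, the if construct itself returns 0, which is why `exit $?` reports success. The sketch below is one way to confirm both points; the variable names `result` and `status` are introduced here for illustration and are not part of the original script.

```shell
#!/bin/bash
# Corrected form of the test in ex74.sh (a sketch, not the author's fix).
a=37

if [ "$a" -gt 27 ]   # "[" is a command, so a space must separate it from its arguments.
then
  result="$a"
  echo "$result"
fi

# An "if" whose condition fails and that has no "else" branch returns 0.
if false; then :; fi
status=$?
echo "status of an if with no branch taken: $status"
```

Quoting `"$a"` is an extra precaution: it keeps the test well-formed even if the variable is empty.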

Example 32-2. Missing keyword

    #!/bin/bash
    # missing-keyword.sh
    # What error message will this script generate?
    for a in 1 2 3
    do
      echo "$a"
    # done     # Required keyword 'done' is commented out.
    exit 0     # Will not exit here!

    # From the command line, after the script terminates:
    #   echo $?    # 2

Output from script:

    missing-keyword.sh: line 10: syntax error: unexpected end of file

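Note that the error message does not necessarily reference the line in which the error occurs, but the line where the Bash interpreter finally becomes aware of the error.
Error messages may disregard comment lines in a script when reporting the line number of a syntax error.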
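
To see where Bash reports a missing `done`, the sketch below writes a small broken script to a temporary file and parses it with `bash -n`, which checks syntax without running anything. The temporary file and the variable names are assumptions made for this illustration, and the exact wording of the message varies between Bash versions.

```shell
#!/bin/bash
# Sketch: where does Bash report a missing 'done'?
script=$(mktemp) || exit 1
printf '%s\n' \
  '#!/bin/bash' \
  'for a in 1 2 3' \
  'do' \
  '  echo "$a"' \
  '# done   (commented out on purpose)' \
  'exit 0' > "$script"

# -n parses the script without executing it.
message=$(bash -n "$script" 2>&1)
status=$?

echo "$message"               # The error is reported at the end of the file, not at the 'for'.
echo "exit status: $status"   # Non-zero, because parsing failed.
rm -f "$script"
```
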
What if the script executes, but does not work as expected? This is the all too familiar logic error.

Example 32-3.

    #!/bin/bash
    # This script is supposed to delete all files in the current directory
    #+ whose names contain spaces.
    # It doesn't work. Why not?
    badname=`ls | grep ' '`
    # Try this:
    # echo "$badname"
    rm "$badname"
    exit 0

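Try to find out what is wrong with Example 32-3 by uncommenting the echo "$badname" line. Echo statements are useful for seeing whether what you expect is actually what you get.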

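In this particular case, rm "$badname" will not give the desired results because $badname should not be quoted. Placing it in quotes ensures that rm has only one argument, so it can match only one filename. A partial fix is to remove the quotes from $badname and to reset $IFS to contain only a newline, IFS=$'\n'. However, there are simpler ways of going about it.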

    # Correct methods of deleting files whose names contain spaces.
    rm *\ *
    rm *" "*
    rm *' '*
    # Thank you. S.C.
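
The glob forms work because the shell expands the pattern into one word per filename, so embedded spaces never get split. A slightly more defensive variant, sketched below, loops over the same pattern; it runs in a scratch directory created with `mktemp -d` so that no real files are touched. The file names and the `remaining` variable are made up for this example.

```shell
#!/bin/bash
# Sketch: delete only the files whose names contain a space.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch "plain.txt" "has space.txt" "two  spaces here"

for f in *' '*             # One word per matching filename.
do
  [ -e "$f" ] || continue  # If nothing matches, the pattern is left unexpanded.
  rm -- "$f"               # "--" guards against names that begin with a dash.
done

remaining=$(ls)
echo "$remaining"          # plain.txt
cd / && rm -rf "$workdir"
```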

Summarizing the symptoms of a buggy script:

  1. It bombs with a "syntax error" message, or
  2. It runs, but does not work as expected (logic error).
  3. It runs, works as expected, but has nasty side effects (logic bomb).

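Tools for debugging non-working scripts include:

  1. Inserting echo statements at critical points in the script to trace the variables, and otherwise give a snapshot of what is going on.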

        ### debecho (debug-echo), by Stefano Falsetto ###
        ### Will echo passed parameters only if DEBUG is set to a value. ###
        debecho () {
          if [ ! -z "$DEBUG" ]; then
            echo "$1" >&2
            #         ^^^ to stderr
          fi
        }

        DEBUG=on
        Whatever=whatnot
        debecho $Whatever   # whatnot

        DEBUG=
        Whatever=notwhat
        debecho $Whatever   # (Will not echo.)
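  2. Using the tee filter to check processes or data flows at critical points.
  3. Setting option flags -n -v -x

    sh -n scriptname checks for syntax errors without actually running the script. This is the equivalent of inserting set -n or set -o noexec into the script. Note that certain types of syntax errors can slip past this check.

    sh -v scriptname echoes each command before executing it. This is the equivalent of inserting set -v or set -o verbose in the script.

    The -n and -v flags work well together. sh -nv scriptname gives a verbose syntax check.

    sh -x scriptname echoes each command in abbreviated form, after its expansions have been performed, as it is executed. This is the equivalent of inserting set -x or set -o xtrace in the script.

    Inserting set -u or set -o nounset in the script runs it, but gives an unbound variable error message and aborts the script at each attempt to use an undeclared variable.

        set -u   # Or   set -o nounset
        # Setting a variable to null will not trigger the error/abort.
        # unset_var=
        echo $unset_var   # Unset (and undeclared) variable.
        echo "Should not echo!"
        # sh t2.sh
        # t2.sh: line 6: unset_var: unbound variable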
  4. Using an "assert" function to test a variable or condition at critical points in a script. (This is an idea borrowed from C.)

    Example 32-4. Testing a condition with an assert

    ```
    #!/bin/bash
    # assert.sh

    #######################################################################
    assert ()                 #  If condition false,
    {                         #+ exit from script
                              #+ with appropriate error message.
      E_PARAM_ERR=98
      E_ASSERT_FAILED=99

      if [ -z "$2" ]          #  Not enough parameters passed
      then                    #+ to assert() function.
        return $E_PARAM_ERR   #  No damage done.
      fi

      lineno=$2

      if [ ! $1 ]
      then
        echo "Assertion failed:  \"$1\""
        echo "File \"$0\", line $lineno"    # Give name of file and line number.
        exit $E_ASSERT_FAILED
      # else
      #   return
      #   and continue executing the script.
      fi
    } # Insert a similar assert() function into a script you need to debug.
    #######################################################################

    a=5
    b=4
    condition="$a -lt $b"     #  Error message and exit from script.
                              #  Try setting "condition" to something else
                              #+ and see what happens.

    assert "$condition" $LINENO
    # The remainder of the script executes only if the "assert" does not fail.

    # Some commands.
    # Some more commands . . .
    echo "This statement echoes only if the \"assert\" does not fail."
    # . . .
    # More commands . . .

    exit $?
    ```
  5. Using the $LINENO variable and the caller builtin.

  6. Trapping at exit.

    The exit command in a script triggers a signal 0, terminating the process, that is, the script itself. [1] It is often useful to trap the exit, forcing a "printout" of variables, for example. The trap must be set before the script reaches its exit, so it normally goes at the very top of the script.
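
The tee filter and the caller builtin from the list above come without an example, so here is a minimal sketch of both. Everything in it is an assumption made for illustration: the pipeline, the temporary checkpoint file, and the function name `where_am_i` do not appear in the original text.

```shell
#!/bin/bash
# Sketch: tee at a checkpoint, then $LINENO and caller.

checkpoint=$(mktemp) || exit 1

# tee copies the sorted stream into $checkpoint and passes it on unchanged.
largest=$(printf '%s\n' 3 1 2 | sort -n | tee "$checkpoint" | tail -n 1)
sorted=$(tr '\n' ' ' < "$checkpoint")
rm -f "$checkpoint"
echo "largest: $largest   seen at checkpoint: $sorted"

# "caller 0" reports the line, function and file this function was called from.
where_am_i () {
  caller_info=$(caller 0)
  echo "called from: $caller_info" >&2
}

here=$LINENO   # Number of this line.
where_am_i     # caller reports the number of this line, that is, here + 1.
```

Sending the trace to stderr, as debecho does, keeps it out of any data the script writes to stdout.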

Trapping signals

trap
Specifies an action on receipt of a signal; also useful for debugging.

A signal is a message sent to a process, either by the kernel or another process, telling it to take some specified action (usually to terminate). For example, hitting a Control-C sends a user interrupt, an INT signal, to a running program.

A simple instance:

    trap '' 2
    # Ignore interrupt 2 (Control-C), with no action specified.

    trap 'echo "Control-C disabled."' 2
    # Message when Control-C pressed.

Example 32-5. Trapping at exit

    #!/bin/bash
    # Hunting variables with a trap.

    trap 'echo Variable Listing --- a = $a  b = $b' EXIT
    #  EXIT is the name of the signal generated upon exit from a script.
    #
    #  The command specified by the "trap" doesn't execute until
    #+ the appropriate signal is sent.

    echo "This prints before the \"trap\" --"
    echo "even though the script sees the \"trap\" first."
    echo

    a=39
    b=36

    exit 0
    #  Note that commenting out the 'exit' command makes no difference,
    #+ since the script exits in any case after running out of commands.

Example 32-6. Cleaning up after Control-C

    #!/bin/bash
    # logon.sh: A quick 'n dirty script to check whether you are on-line yet.

    umask 177  # Make sure temp files are not world readable.

    TRUE=1
    LOGFILE=/var/log/messages
    #  Note that $LOGFILE must be readable
    #+ (as root, chmod 644 /var/log/messages).
    TEMPFILE=temp.$$
    #  Create a "unique" temp file name, using process id of the script.
    #     Using 'mktemp' is an alternative.
    #     For example:
    #     TEMPFILE=`mktemp temp.XXXXXX`
    KEYWORD=address
    #  At logon, the line "remote IP address xxx.xxx.xxx.xxx"
    #                      appended to /var/log/messages.
    ONLINE=22
    USER_INTERRUPT=13
    CHECK_LINES=100
    #  How many lines in log file to check.

    trap 'rm -f $TEMPFILE; exit $USER_INTERRUPT' TERM INT
    #  Cleans up the temp file if script interrupted by control-c.

    echo

    while [ $TRUE ]  # Endless loop.
    do
      tail -n $CHECK_LINES $LOGFILE > $TEMPFILE
      #  Saves last 100 lines of system log file as temp file.
      #  Necessary, since newer kernels generate many log messages at log on.
      search=`grep $KEYWORD $TEMPFILE`
      #  Checks for presence of the "IP address" phrase,
      #+ indicating a successful logon.

      if [ ! -z "$search" ] #  Quotes necessary because of possible spaces.
      then
         echo "On-line"
         rm -f $TEMPFILE    #  Clean up temp file.
         exit $ONLINE
      else
         echo -n "."        #  The -n option to echo suppresses newline,
                            #+ so you get continuous rows of dots.
      fi

      sleep 1
    done

    #  Note: if you change the KEYWORD variable to "Exit",
    #+ this script can be used while on-line
    #+ to check for an unexpected logoff.

    # Exercise: Change the script, per the above note,
    #           and prettify it.

    exit 0


    # Nick Drage suggests an alternate method:

    while true
      do ifconfig ppp0 | grep UP 1> /dev/null && echo "connected" && exit 0
      echo -n "."   # Prints dots (.....) until connected.
      sleep 2
    done

    # Problem: Hitting Control-C to terminate this process may be insufficient.
    #+         (Dots may keep on echoing.)
    # Exercise: Fix this.


    # Stephane Chazelas has yet another alternative:

    CHECK_INTERVAL=1

    while ! tail -n 1 "$LOGFILE" | grep -q "$KEYWORD"
    do echo -n .
       sleep $CHECK_INTERVAL
    done
    echo "On-line"

    # Exercise: Discuss the relative strengths and weaknesses
    #           of each of these various approaches.
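
The comments in logon.sh mention mktemp as an alternative way to name the temp file. The sketch below combines it with an EXIT trap so the file is removed however the work ends, whether by normal completion or by a signal. The function name `run_job` and its contents are hypothetical; the function body is written in parentheses so that it runs in a subshell and its traps do not leak into the calling script.

```shell
#!/bin/bash
# Sketch: remove a temp file no matter how the job ends.

run_job () (
  tmp=$(mktemp) || exit 1
  trap 'rm -f "$tmp"' EXIT     # Runs on any exit from this subshell.
  trap 'exit 1' INT TERM       # Turn a signal into an exit, which then fires EXIT.
  echo "working data" > "$tmp"
  echo "$tmp"                  # Report the name so the caller can check it afterwards.
)

tmpfile=$(run_job)
if [ -e "$tmpfile" ]; then
  echo "temp file still present"
else
  echo "temp file was cleaned up"
fi
```

Routing INT and TERM through exit means the cleanup command is written only once, in the EXIT trap.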

Example 32-7. A Simple Implementation of a Progress Bar

    #! /bin/bash
    # progress-bar2.sh
    # Author: Graham Ewart (with reformatting by ABS Guide author).
    # Used in ABS Guide with permission (thanks!).

    # Invoke this script with bash. It doesn't work with sh.

    interval=1
    long_interval=10

    {
      trap "exit" SIGUSR1
      sleep $interval; sleep $interval
      while true
      do
        echo -n '.'     # Use dots.
        sleep $interval
      done; } &         # Start a progress bar as a background process.

    pid=$!
    trap "echo !; kill -USR1 $pid; wait $pid"  EXIT        # To handle ^C.

    echo -n 'Long-running process '
    sleep $long_interval
    echo ' Finished!'

    kill -USR1 $pid
    wait $pid              # Stop the progress bar.
    trap EXIT

    exit $?

Note
The DEBUG argument to trap causes a specified action to execute before every command in a script. This permits tracing variables, for example.

Example 32-8. Tracing a variable

    #!/bin/bash

    trap 'echo "VARIABLE-TRACE> \$variable = \"$variable\""' DEBUG
    # Echoes the value of $variable before every command.

    variable=29; line=$LINENO

    echo "  Just initialized \$variable to $variable in line number $line."

    let "variable *= 3"; line=$LINENO
    echo "  Just multiplied \$variable by 3 in line number $line."

    exit 0

    #  The "trap 'command1 . . . command2 . . .' DEBUG" construct is
    #+ more appropriate in the context of a complex script,
    #+ where inserting multiple "echo $variable" statements might be
    #+ awkward and time-consuming.

    # Thanks, Stephane Chazelas for the pointer.

Output of script:

    VARIABLE-TRACE> $variable = ""
    VARIABLE-TRACE> $variable = "29"
      Just initialized $variable to 29.
    VARIABLE-TRACE> $variable = "29"
    VARIABLE-TRACE> $variable = "87"
      Just multiplied $variable by 3.
    VARIABLE-TRACE> $variable = "87"

Of course, the trap command has other uses aside from debugging, such as disabling certain keystrokes within a script (see Example A-43).

Example 32-9. Running multiple processes (on an SMP box)

    #!/bin/bash
    # parent.sh
    # Running multiple processes on an SMP box.
    # Author: Tedman Eng

    #  This is the first of two scripts,
    #+ both of which must be present in the current working directory.

    LIMIT=$1         # Total number of process to start
    NUMPROC=4        # Number of concurrent threads (forks?)
    PROCID=1         # Starting Process ID
    echo "My PID is $$"

    function start_thread() {
      if [ $PROCID -le $LIMIT ] ; then
        ./child.sh $PROCID&
        let "PROCID++"
      else
        echo "Limit reached."
        wait
        exit
      fi
    }

    while [ "$NUMPROC" -gt 0 ]; do
      start_thread;
      let "NUMPROC--"
    done

    while true
    do
      trap "start_thread" SIGRTMIN
    done

    exit 0


    # ======== Second script follows ========


    #!/bin/bash
    # child.sh
    # Running multiple processes on an SMP box.
    # This script is called by parent.sh.
    # Author: Tedman Eng

    temp=$RANDOM
    index=$1
    shift
    let "temp %= 5"
    let "temp += 4"
    echo "Starting $index  Time:$temp" "$@"
    sleep ${temp}
    echo "Ending $index"
    kill -s SIGRTMIN $PPID

    exit 0


    # ======================= SCRIPT AUTHOR'S NOTES ======================= #
    #  It's not completely bug free.
    #  I ran it with limit = 500 and after the first few hundred iterations,
    #+ one of the concurrent threads disappeared!
    #  Not sure if this is collisions from trap signals or something else.
    #  Once the trap is received, there's a brief moment while executing the
    #+ trap handler but before the next trap is set.  During this time, it may
    #+ be possible to miss a trap signal, thus miss spawning a child process.

    #  No doubt someone may spot the bug and will be writing
    #+ . . . in the future.

    # ===================================================================== #

    # ----------------------------------------------------------------------#


    #################################################################
    # The following is the original script written by Vernia Damiano.
    # Unfortunately, it doesn't work properly.
    #################################################################

    #!/bin/bash

    #  Must call script with at least one integer parameter
    #+ (number of concurrent processes).
    #  All other parameters are passed through to the processes started.

    INDICE=8        # Total number of process to start
    TEMPO=5         # Maximum sleep time per process
    E_BADARGS=65    # No arg(s) passed to script.

    if [ $# -eq 0 ] # Check for at least one argument passed to script.
    then
      echo "Usage: `basename $0` number_of_processes [passed params]"
      exit $E_BADARGS
    fi

    NUMPROC=$1              # Number of concurrent process
    shift
    PARAMETRI=( "$@" )      # Parameters of each process

    function avvia() {
      local temp
      local index
      temp=$RANDOM
      index=$1
      shift
      let "temp %= $TEMPO"
      let "temp += 1"
      echo "Starting $index Time:$temp" "$@"
      sleep ${temp}
      echo "Ending $index"
      kill -s SIGRTMIN $$
    }

    function parti() {
      if [ $INDICE -gt 0 ] ; then
        avvia $INDICE "${PARAMETRI[@]}" &
        let "INDICE--"
      else
        trap : SIGRTMIN
      fi
    }

    trap parti SIGRTMIN

    while [ "$NUMPROC" -gt 0 ]; do
      parti;
      let "NUMPROC--"
    done

    wait
    trap - SIGRTMIN

    exit $?

    : <<SCRIPT_AUTHOR_COMMENTS
    I had the need to run a program, with specified options, on a number of
    different files, using a SMP machine. So I thought [I'd] keep running
    a specified number of processes and start a new one each time . . . one
    of these terminates.

    The "wait" instruction does not help, since it waits for a given process
    or *all* process started in background. So I wrote [this] bash script
    that can do the job, using the "trap" instruction.
      --Vernia Damiano
    SCRIPT_AUTHOR_COMMENTS

Note
trap '' SIGNAL (two adjacent apostrophes) disables SIGNAL for the remainder of the script. trap SIGNAL restores the functioning of SIGNAL once more. This is useful to protect a critical portion of a script from an undesirable interrupt.

    trap '' 2  # Signal 2 is Control-C, now disabled.
    command
    command
    command
    trap 2     # Reenables Control-C
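
The effect of the empty trap can be checked without touching the keyboard. In the sketch below, `kill -INT $$` stands in for a user pressing Control-C, and the variable `survived` exists only for this illustration. It uses `trap - INT`, the explicit spelling for resetting a signal to its default action, in place of the one-argument form shown above.

```shell
#!/bin/bash
# Sketch: shield a critical section from Control-C, then restore the default.

trap '' INT        # Ignore SIGINT from here on.
kill -INT $$       # Would normally terminate the script; now it is ignored.
survived=yes
echo "still running after SIGINT: $survived"

trap - INT         # "-" resets SIGINT to its default action.
```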