When All You Have is a Hammer

Today we’re gonna talk about bash.
I sometimes see adverts which basically say “Learn Scripting with BASH!” and I can’t help but feel they miss the whole point of bash.
I mean yes, bash is a scripting language, but it’s better described as a shell that supports scripting.
Why is this important? Because using the wrong tool can work in the short term but give you trouble down the road.
I spent a lot of my early days writing long, messy shell scripts when I really should have used python (or ruby, or anything else of that nature). I wrote a network manager in bash, for goodness' sake. This happened because I learned bash early on but neglected to learn when to use it, and when not to.
Bash vs Python
Let’s compare bash with a “real” scripting language like python. The key difference is intention. Shells like bash are meant to be interactive, pretty much by definition. Languages like python are meant to run scripts, and their interactive mode is secondary.
Some key differences:
- Bash makes it easy to run external commands, complete with I/O redirection and unnamed pipes. Python is geared toward its own functions and libraries; invoking external commands takes more ceremony.
- Python has first-class support for numbers, strings, arrays, dicts, and custom objects. Bash is heavily string-oriented.
- By default, bash ignores most errors and keeps running. Python stops when it encounters an unhandled error.
- Name resolution is looser in bash: functions can shadow external commands, shell variables shadow environment variables, variables are global by default, and bare words are treated as strings. Python's namespaces and scoping are more structured.
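The difference in error handling is easy to demonstrate. A minimal sketch (the failing part runs in a child shell so the demo itself exits cleanly):

```shell
#!/bin/bash
# By default, a failing command does not stop the script.
false                     # exit status 1, but execution continues
echo "still running"      # this line is reached

# With 'set -e', the same failure aborts the shell on the spot.
bash -c 'set -e; false; echo "never printed"'
echo "child exited with status $?"   # child exited with status 1
```

Python, by contrast, would raise an exception at the first failure and stop unless you explicitly caught it.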
The Shell Niche
Bash is great when used as intended: for operating system administration.
The focus on external programs makes it easy to leverage the OS’s core utilities and other tools.
This is part of Unix philosophy: instead of a monolithic program that does everything, have a bunch of small utilities that each do one thing, and do it well.
grep, sed, awk, rm, cat, etc. are independent programs, potentially written in different languages by different people working at different organizations.
(In reality, they’re usually all written in C by GNU or BSD, but the point stands.) A shell ties them all together.
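As a toy illustration of that composition, here is a word-frequency counter built entirely from single-purpose utilities, with the shell doing nothing but plumbing:

```shell
#!/bin/bash
# Count the most frequent words in some text using only small,
# single-purpose tools; no one program does all of this.
printf 'the cat sat on the mat the end\n' |
    tr -s ' ' '\n' |   # tr:   one word per line
    sort |             # sort: group identical words together
    uniq -c |          # uniq: count each group
    sort -rn |         # sort: most frequent first
    head -n 3          # head: keep the top three
# the top line will be '3 the' (whitespace-padded by uniq -c)
```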
A second niche for shells is scripting for barebones OS tools, resource-limited environments, certain embedded systems, etc. Your home router is unlikely to have python, but it probably has a shell in there somewhere.
Bash Strengths
Pipes
Pipes are probably bash’s biggest strong point. Consider this piece of code:
curl -sSi "$url" | tee >(sed "s/^/[$(date -I)] /" >> /tmp/curls.log) | sed '1,/^\s*$/d' | less
This is an abomination... er, a glorious poundcake of a one-liner.
It downloads a url, displays the results onscreen, and keeps a log of the downloaded data.
The log includes HTTP headers and has the current date prepended to each line, while the displayed text is the plain HTTP response without headers.
The heavy lifting is performed by external programs – curl downloads the url, sed slices and dices, less makes the output manageable on a terminal.
Bash invokes the programs with minimal ceremony and ties it all together with pipes.
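The >(...) construct (process substitution) is what lets tee feed a second pipeline. Here's a stripped-down sketch of the same pattern; the log path is just an example, and we deliberately don't read the log back, since bash versions differ on when a process substitution finishes:

```shell
#!/bin/bash
# tee passes the stream through to stdout while >(...) diverts a
# second copy into another command (here, an upper-cased log).
printf 'hello\nworld\n' |
    tee >(tr 'a-z' 'A-Z' >> /tmp/demo.log) |
    sed 's/^/seen: /'
# prints:
#   seen: hello
#   seen: world
```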
Adjustable Fault Tolerance
By default, bash has a high tolerance for errors. This is useful when used interactively (because humans mistype things) but less helpful when writing a script.
In particular:
- Most errors are non-fatal.
- Errors in the middle of a pipe are not reflected in the return code.
- References to unset variables are basically ignored.
Each of these can be adjusted.
- set -e causes bash to exit on error.
- set -o pipefail surfaces pipeline errors.
- set -u triggers an error when an unset variable is referenced.
Note that this is only the tip of the iceberg. Each of the above options has caveats that need to be understood, and there are plenty of other bash flags. I recommend further reading to better understand them. Traps can also help.
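Put together, these flags form what's often called "strict mode". A small sketch, run in a child shell so the failure can be observed from outside:

```shell
#!/bin/bash
# "Strict mode": each flag tightens one default behavior.
#   set -e           exit on most command failures
#   set -u           referencing an unset variable is an error
#   set -o pipefail  a failure anywhere in a pipeline fails the pipeline
bash -c 'set -euo pipefail
         false | wc -l > /dev/null   # pipefail surfaces the failure, -e exits
         echo "never reached"'
echo "strict child exited with status $?"   # strict child exited with status 1
```

Without pipefail, that pipeline would "succeed", because the return code of a pipeline is normally that of its last command (wc), which is perfectly happy.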
Bash Weaknesses
Data Structures
While bash does support the data types you’d expect of a scripting language, it has a clear preference for strings.
Setting and referencing strings is easy:
# setting variables
x=foo # bare word is fine
y='b a z' # quotes allow spaces
z=$y # copy a string (no quotes needed)
y=bar # overwrites y='b a z'
# referencing variables
printf '%s.' $x # 'foo.'
printf '%s.' $y # 'bar.'
printf '%s.' $z # 'b.a.z.' (bash parsed the spaces)
printf '%s.' "$z" # 'b a z.' (quotes prevent space parsing)
With arrays it’s… less easy:
# create an array
arr=(foo bar 'b a z') # three-element array ['foo', 'bar', 'b a z']
# now let's try to copy arr
b=$arr # nope, that's wrong! we just get 'foo' this way
c=${arr[@]} # also wrong! we get 'foo bar b a z'
d=(${arr[@]}) # still wrong! we get ['foo', 'bar', 'b', 'a', 'z']
e=("${arr[@]}") # THIS is how you do it
# now let's try to set arr to a string
arr='qux' # guess what this does...
# examine arr
printf '%s.' "$arr" # 'qux.' (makes sense, we set arr=qux)
printf '%s.' "${arr[@]}" # 'qux.bar.b a z.' (wait, arr is still an array?!)
# examine e
printf '%s.' "${e[@]}" # 'foo.bar.b a z.' (at least e was correctly copied)
And don’t even get me started on dicts.
We can complain about the syntax, but the real problem is that it’s easy to do things wrong. Bash loves to join and separate things by spaces, and will often do so by default. Bash also does not support multidimensional arrays, or any structure where the elements are not strings.
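For completeness: bash 4+ does have dicts in the form of associative arrays, though the ceremony is considerable. A quick sketch (note that declare -A is mandatory, and that macOS still ships bash 3.2, which lacks them entirely):

```shell
#!/bin/bash
# Associative arrays require an explicit declaration; forget the -A
# and you silently get an ordinary indexed array instead.
declare -A ages
ages[alice]=30
ages['bob smith']=25      # keys containing spaces need quotes
echo "${ages[alice]}"     # 30
echo "${#ages[@]}"        # 2 (number of entries)
echo "${!ages[@]}"        # the keys, in unspecified order
```

And true to form, the values are still strings: there is no nesting and no arrays-of-arrays.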
Silent Failures
In a lot of cases bash’s fault tolerance can be a good thing, but in certain cases it’s downright nasty.
Look at this script (but don’t try to run it). What’s wrong with it?
01 │ #!/bin/bash
02 │ # create temporary directory
03 │ tmp_dir="/tmp/foo"
04 │ mkdir -p "$tmp_dir"
05 │
06 │ # download a provided url, save in the tmpdir
07 │ # also strip CR chars ('\r')
08 │ url=$1
09 │ curl -s "$uri" | tr -d '\r' > $tmp_dir/data
11 │
12 │ # exit if the download failed
13 │ if [[ $? -ne 0 ]]; then
14 │     echo "Failed downloading $url!" >&2
15 │     exit 1
16 │ fi
17 │
18 │ # view the downloaded data
19 │ less "$tmp_dir/data"
20 │
21 │ # ask if the downloaded data should be kept
22 │ echo -n "Keep? [Y/n] "
23 │ read resp
24 │ if [[ $resp == *[Nn]* ]]; then
25 │     rm -rf "$tmpdir/"
26 │ else
27 │     echo "File stored in $tmpdir/"
28 │ fi
The first real problem is on line 9: the variable name is mistyped as $uri instead of $url.
Since uri is unset, this line is effectively
curl -s "" | tr -d '\r' > $tmp_dir/data
In scripts, curl is often invoked with -s to suppress progress reporting, but this also suppresses error messages (adding -S alongside -s would restore them).
In this case curl silently fails, tr returns success because it successfully operated on 0 bytes of input, $tmp_dir/data is created as a 0-byte file, and the command as a whole returns success.
Lines 13-16 are meant to handle a download failure, but the return code $? comes straight from tr which is happy as a clam.
There’s nothing wrong with line 19 by itself, but since $tmp_dir/data is 0 bytes long the user will be staring at a blank terminal until they quit less.
The real kicker is line 25. The variable name is mistyped as $tmpdir instead of $tmp_dir.
(Admit it, we’ve all done this at least once.)
Since tmpdir is unset, this line becomes:
rm -rf /
Well that’s a little terrifying. This was supposed to be a silly little script to download a webpage. WTF?
Thankfully this is common enough that rm has built-in protection against removing /.
Conclusion
Bash and other shells have evolved over the years to be highly suited for operating system administration and related scripting. However, it is often tempting to use them outside their niche because they do certain things very easily. When creating a new script, think carefully about which language to use.
| Created: | 2020-11-21 |
| Updated: | 2020-12-29 |
| Tags: | bash |

