Die shell script, DIE!
In this post, I'll show how easy it is to convert fragile shell scripts to Python scripts using sh.py. As an example, I'll use a simple script that checks your HTML code from the command line using the W3C validator.
Now you ask me, why the hell would Python be better than shell scripts? I love the simplicity of a short Bash script. But shell scripts are simply too brittle to be safely used in production:
- shell scripts lack some very useful features: exceptions and try/catch blocks, stack traces displayed in case of failure, classes and modules, built-in dictionaries, assertions, list comprehensions...
- Python has more built-in static checks: uninitialized variables and undefined functions can be detected before execution, and horrific syntax errors like
foo () { = 42; }
(this is valid Bash code!) are not tolerated
- Python has a syntax that is easier to read and understand, and Python code is generally easier to maintain
- Python comes with "batteries included", has a HUGE community and tens of thousands of PyPI packages, so you don't need to reinvent the wheel. On the other hand, code reuse among shell scripts tends to be difficult.
- many Unix utility commands (awk, grep, sed...) can be replaced by simple Python code, meaning fewer command invocations and faster execution
- finally, shell scripts can cause this kind of issue
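To make the point about exceptions concrete, here is a minimal sketch of how a failing command surfaces in Python. It only uses the standard subprocess module rather than sh.py, and the helper name run_checked is made up for this illustration.

```python
import subprocess
import sys


def run_checked(argv):
    """Run a command and return its exit code, reporting failures via an exception."""
    try:
        # check=True raises CalledProcessError on any non-zero exit status,
        # whereas a shell script without `set -e` would silently carry on.
        subprocess.run(argv, check=True)
    except subprocess.CalledProcessError as error:
        print("command failed with exit code", error.returncode, file=sys.stderr)
        return error.returncode
    return 0


if __name__ == "__main__":
    # Hypothetical failing command: a child Python process exiting with status 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_checked(failing))
```

The failure cannot go unnoticed: either you handle the exception as above, or the script stops with a stack trace.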
To begin with, here is our initial Bash script:
To paraphrase the code, this script retrieves the vnu.jar validator from GitHub on the first run, and uses it to check the HTML code passed via standard input or as a filename.
Hence, the script downloads and uncompresses a ZIP archive file. Moreover, it filters any mustache string patterns out of the HTML file.
Now, compare it to the following equivalent Python code:
First observation: the code is definitely longer, with 88% more characters to be exact. But look at the benefits!
- the script keeps its general structure: pipe-based, functional, applying simple filters on the input stream
- all commands are checked at import time so they are guaranteed to exist: java, wget... By the way, you are not limited to Linux coreutils: you can use any command on your system.
- try/except blocks!
- in case of failure, you can inspect the objects at runtime with pdb
- finally, two command invocations (grep and awk) have been replaced by native Python inline generators. I prefer avoiding unnecessary command calls, but in the process of migrating this script I initially stuck with them:
return grep('-vF', 'meta http-equiv="X-UA-Compatible"', _in=input_pipe, _iter=True)
return sed('-e', r's/{[{%][^{}]\+[%}]}/DUMMY_MUSTACHE/g', _in=input_pipe, _iter=True)
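For comparison, native replacements for these two calls could look roughly like the sketch below. This is not the original listing: the function names drop_ua_compatible and mask_mustache are invented here, and the regular expression is my translation of the sed pattern above into Python syntax.

```python
import re

# Same pattern as the sed expression, translated to Python regex syntax.
MUSTACHE = re.compile(r"\{[{%][^{}]+[%}]\}")


def drop_ua_compatible(lines):
    """Rough equivalent of: grep -vF 'meta http-equiv="X-UA-Compatible"'."""
    return (line for line in lines if 'meta http-equiv="X-UA-Compatible"' not in line)


def mask_mustache(lines):
    """Rough equivalent of the sed substitution: mask each mustache tag."""
    return (MUSTACHE.sub("DUMMY_MUSTACHE", line) for line in lines)


if __name__ == "__main__":
    html = [
        '<meta http-equiv="X-UA-Compatible" content="IE=edge">',
        "<p>{{ user.name }}</p>",
        "<p>{% if ok %}yes{% endif %}</p>",
    ]
    # The generators chain lazily, just like commands in a shell pipeline.
    for line in mask_mustache(drop_ua_compatible(html)):
        print(line)
```

Both filters are lazy generators, so they keep the pipe-like structure of the script without spawning any extra process.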
I hope I convinced you: next time you write a script, think about Python!
But BEWARE, even Python scripts can become spaghetti code monsters.
EDIT[17/08/2015]: as long as the interpreter is installed on your system, you can run a standalone Python script just like a Bash script and easily benefit from the THOUSANDS of libraries on PyPI, including sh.py, by invoking pip FROM YOUR SCRIPT!
import logging
import pip
pip.main(['install', '--user', 'retrying==1.3.3', 'requests==2.7.0', 'sh==1.11'])
# if you're using the logging module with pip 6.0 or newer, you'll need this bugfix (cf. https://github.com/pypa/pip/issues/3043):
logging.root.handlers = []
import requests
from retrying import retry
from sh import grep, sed
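A word of caution: pip.main() is no longer available since pip 10, so the snippet above only works with older pip versions. A sketch of the same idea for newer versions is to run pip as a subprocess of the current interpreter. The helper names pip_install_command and pip_install are hypothetical, chosen for this example only.

```python
import subprocess
import sys


def pip_install_command(requirements, user=True):
    """Build the argv that runs pip under the current interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")
    return command + list(requirements)


def pip_install(requirements, user=True):
    """Install the given requirement specifiers; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(requirements, user=user))


if __name__ == "__main__":
    # Only print the command here; call pip_install(...) to actually install.
    # Same pinned versions as in the snippet above.
    print(pip_install_command(["retrying==1.3.3", "requests==2.7.0", "sh==1.11"]))
```

Calling pip_install([...]) at the top of your script, before the third-party imports, gives the same effect as the pip.main() call.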