First, we will specify all the required packages and their required minimum versions; looking at the preceding code, you can see that Spark 2.3.1 requires Java 1.8+ and Python 3.4 or higher (and we will always be checking for these two environments). Additionally, if you want to use R or Scala, the minimal requirements for these two packages are 3.1 and 2.11, respectively. Maven, as mentioned earlier, will be used to compile the Spark sources, and for doing that, Spark requires at least the 3.3.9 version of Maven.
Next, we parse the command-line arguments:
if [ "$_args_len" -ge 0 ]; then
while [[ "$#" -gt 0 ]]
do
key="$1"
case $key in
-m|--Maven)
_check_Maven_req="true"
shift # past argument
;;
-r|--R)
_check_R_req="true"
shift # past argument
;;
-s|--Scala)
_check_Scala_req="true"
shift # past argument
;;
*)
shift # past argument
esac
done
fi
You, as a user, can specify whether you want to check additionally for R, Scala, and Maven dependencies. To do so, run the following code from your command line (the following code will check for all of them):
./checkRequirements.sh -s -m -r
The following is also a perfectly valid usage:
./checkRequirements.sh --Scala --Maven --R
Next, we call three functions: printHeader, checkJava, and checkPython. The printHeader function is nothing more than just a simple way for the script to state what it does and it really is not that interesting, so we will skip it here; it is, however, fairly self-explanatory, so you are welcome to peruse the relevant portions of the checkRequirements.sh script yourself.
Next, we will check whether Java is installed. First, we just print to the Terminal that we are performing checks on Java (this is common across all of our functions, so we will only mention it here):
function checkJava() {
echo
echo "##########################"
echo
echo "Checking Java"
echo
Following this, we will check if the Java environment is installed on your machine:
if type -p java; then
echo "Java executable found in PATH"
_java=java
elif [[ -n "$JAVA_HOME" ]] && [[ -x "$JAVA_HOME/bin/java" ]]; then
echo "Found Java executable in JAVA_HOME"
_java="$JAVA_HOME/bin/java"
else
echo "No Java found. Install Java version $_java_required or higher first or specify JAVA_HOME variable that will point to your Java binaries."
exit
fi
First, we use the type command to check if the java command is available; the type -p command returns the location of the java binary if it exists. This also implies that the bin folder containing Java binaries has been added to the PATH.
If you are certain you have the binaries installed (be it Java, Python, R, Scala, or Maven), you can jump to the Updating PATH section in this recipe to see how to let your computer know where these binaries live.
If this fails, we will revert to checking if the JAVA_HOME environment variable is set, and if it is, we will try to see if it contains the required java binary: [[ -x "$JAVA_HOME/bin/java" ]]. Should this fail, the program will print the message that no Java environment could be found and will exit (without checking for other required packages, like Python).
If, however, the Java binary is found, then we can check its version:
_java_version=$("$_java" -version 2>&1 | awk -F '"' '/version/ {print $2}')
echo "Java version: $_java_version (min.: $_java_required)"
if [[ "$_java_version" < "$_java_required" ]]; then
echo "Java version required is $_java_required. Install the required version first."
exit
fi
echo
We first execute the java -version command in the Terminal, which would normally produce an output similar to the following screenshot:
We then pipe the previous output to awk to split (the -F switch) the rows at the quote '"' character (and will only use the first line of the output as we filter the rows down to those that contain /version/) and take the second (the $2) element as the version of the Java binaries installed on our machine. We will store it in the _java_version variable, which we also print to the screen using the echo command.
If you do not know what
awk is or how to use it, we recommend this book from Packt:
http://bit.ly/2BtTcBV.
Finally, we check if the _java_version we just obtained is lower than _java_required. If this evaluates to true, we will stop the execution, instead telling you to install the required version of Java.
The logic implemented in the checkPython, checkR, checkScala, and checkMaven functions follows in a very similar way. The only differences are in what binary we call and in the way we check the versions:
- For Python, we run "$_python" --version 2>&1 | awk -F ' ' '{print $2}', as checking the Python version (for Anaconda distribution) would print out the following to the screen: Python 3.5.2 :: Anaconda 2.4.1 (x86_64)
- For R, we use "$_r" --version 2>&1 | awk -F ' ' '/R version/ {print $3}', as checking the R's version would write (a lot) to the screen; we only use the line that starts with R version: R version 3.4.2 (2017-09-28) -- "Short Summer"
- For Scala, we utilize "$_scala" -version 2>&1 | awk -F ' ' '{print $5}', given that checking Scala's version prints the following: Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
- For Maven, we check "$_mvn" --version 2>&1 | awk -F ' ' '/Apache Maven/ {print $3}', as Maven prints out the following (and more!) when asked for its version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T00:58:13-07:00)
If you want to learn more, you should now be able to read the other functions with ease.