I am very close to solving this issue so we can all move on to developing and testing!
I have found where the right headers and native Hadoop libraries live on the Multivac cluster, so this is how to link against Hadoop on Multivac:
```bash
# Optional
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf

# Very important when executing the C/C++ code: this helps the JVM find and
# link the libraries (classpath), if I am not mistaken
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server:/opt/cloudera/parcels/CDH/lib/
export CLASSPATH=$CLASSPATH:`hadoop classpath --glob`

# How to link C/C++ against the right headers and Hadoop native libraries
g++ -o cpp_exe2 main.cpp -std=c++11 -I/opt/cloudera/parcels/CDH/include/ -L/opt/cloudera/parcels/CDH/lib/ -lhdfs
```
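For context, here is a minimal sketch of what a `main.cpp` for this kind of test could look like (not the exact code in the repo, just an illustration of the libhdfs calls, assuming the headers and `-lhdfs` flags from above): it reads the target path from stdin and creates a small file there.

```cpp
// Hypothetical sketch (not the exact repo code): read an HDFS path from stdin
// and create a small test file there via libhdfs.
#include <hdfs.h>
#include <fcntl.h>
#include <iostream>
#include <string>

int main() {
    std::string path;
    std::getline(std::cin, path);  // e.g. /user/[USERNAME]/tmp/test.txt

    // "default" resolves the NameNode from the Hadoop configs found on the CLASSPATH
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == nullptr) {
        std::cerr << "failed to connect to HDFS" << std::endl;
        return 1;
    }

    hdfsFile out = hdfsOpenFile(fs, path.c_str(), O_WRONLY | O_CREAT, 0, 0, 0);
    if (out == nullptr) {
        std::cerr << "failed to open " << path << " for writing" << std::endl;
        hdfsDisconnect(fs);
        return 1;
    }

    const std::string msg = "hello from libhdfs\n";
    hdfsWrite(fs, out, msg.c_str(), static_cast<tSize>(msg.size()));
    hdfsFlush(fs, out);
    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```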
I have tested the compiled binary on the server:
```bash
./cpp_exe2  # enter
/user/[USERNAME]/tmp/test.txt  # enter
```

This successfully created the file test.txt on HDFS at the given path! (hooray)
Now I am going to tackle the other issue: this compiled binary cannot be executed through Spark's RDD pipe on the cluster, because the other machines/executors don't have access to the LD_LIBRARY_PATH setting. This shouldn't be hard; I am going to find a way to either add it to the Spark session or distribute it to all clients as a config with ZooKeeper.
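For example, something along these lines might do it with Spark alone (untested so far; the job script name is just a placeholder): ship the binary to the executors with `--files` and set LD_LIBRARY_PATH on every executor via `spark.executorEnv.*`.

```bash
# Untested sketch: let Spark distribute the binary and the library path itself
spark-submit \
  --master yarn \
  --files cpp_exe2 \
  --conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server:/opt/cloudera/parcels/CDH/lib/ \
  my_pipe_job.py  # placeholder for the job that calls rdd.pipe("./cpp_exe2")
                  # (the executable bit may need to be restored on the executors)
```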
This is the GitHub repo I made to collect some code and instructions for the future. It is not complete yet, but you are welcome to keep an eye on it.