HDFS access

hdfs
multivac-dsl

(Hubert Naacke) #1

Hi Maziyar,
I tried your sample code
val pubmedDF = spark.read
.option("delimiter", "|").csv("/projects/equipe/datasets/pubmed/MH_SH_items.gz")

but I got the error:
org.apache.hadoop.security.AccessControlException: Permission denied:

It seems I do not have permission to read /projects/equipe/datasets/pubmed

regards,


(Maziyar Panahi) #2

Hi @hnaacke,

Could you try again and see if you get the same error? Thanks.

PS: Also, you should give write permission to users if you want them to be able to write something in your notebooks :slight_smile:


(Hubert Naacke) #3

Hi @mpanahi,
It works fine. Thanks.
Could you rename /projects/equipe to /projects/epique (which is the name of the project)?
Could you also add the (simpler) plain-text version of the MeSH classification?
ftp://nlmpubs.nlm.nih.gov/online/mesh/MESH_FILES/meshtree/mtrees2018.bin
and maybe the README file:
https://mbr.nlm.nih.gov/Download/2017/Data/README
thanks,

How can I grant read permission on files stored in /user/hnaacke/ to the other members of the Epique project
(is there an epique group)?


(Maziyar Panahi) #4

Hi @hnaacke,

I was wondering about the name! I renamed it to epique.

For any files, you can simply browse to /projects/epique in Hue and choose Upload.


As for the second question: if there is data that needs to be shared, you can put it inside /projects/epique and other users can access it. Your own directory is for your personal files/directories/results. However, if you wish to make something readable to all users, not just your teammates, you can simply right-click it, change the permissions, and grant read/write/execute to Others.
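If you prefer to do this from a Spark/Scala session instead of Hue, a minimal sketch using the Hadoop FileSystem API could look like the following (the directory path and mode are illustrative assumptions, not the actual project settings, and you can only change permissions on paths you own):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Reuse the Hadoop configuration that the running Spark session already carries.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Example only: make a directory under /user/hnaacke readable and traversable
// by group and others. 755 (octal) = rwxr-xr-x; Scala has no octal literals,
// hence the parseInt.
val dir = new Path("/user/hnaacke/shared")
fs.setPermission(dir, new FsPermission(Integer.parseInt("755", 8).toShort))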


(Hubert Naacke) #5

Hi again,
I tried to upload a file into /projects/epique/datasets/pubmed as you suggested, but I got this error:
Error: AccessControlException: Permission denied: user=hnaacke, access=WRITE, inode="/projects/epique/datasets/pubmed":mpanahi:mpanahi:drwxr-xr-x (error 500)


(Maziyar Panahi) #6

Can you please try one more time?
Also, for real-time support you can use https://chat.iscpif.fr and look for the #multivac channel to communicate through chat.


(Hubert Naacke) #7

It works, thanks.


(Hubert Naacke) #8

Regarding the gz data file that is too slow to load into Spark, I will wait for the Parquet version to be available.


(Maziyar Panahi) #9

Sure, I'll make the Parquet files and add instructions to the wiki on how to read Parquet partitions.
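In the meantime, reading a partitioned Parquet dataset from Spark generally looks like the sketch below (the output path and the partition column "year" are assumptions, not the final layout):

import org.apache.spark.sql.functions.col

// Spark discovers partition columns from the directory layout
// (e.g. .../year=2018/...), so a plain read is enough.
val pubmedParquet = spark.read
  .parquet("/projects/epique/datasets/pubmed/parquet")  // hypothetical path

// Filtering on a partition column lets Spark prune partitions
// instead of scanning the whole dataset.
val subset = pubmedParquet.filter(col("year") === 2018)
subset.show(10)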


(Maziyar Panahi) #10