HDFS access

hdfs
multivac-dsl

(Hubert Naacke) #1

Hi Maziyar,
I tried your sample code
val pubmedDF = spark.read
.option("delimiter", "|").csv("/projects/equipe/datasets/pubmed/MH_SH_items.gz")

but I got the error:
org.apache.hadoop.security.AccessControlException: Permission denied:

It seems I do not have permission to read /projects/equipe/datasets/pubmed

regards,


(Maziyar Panahi) #2

Hi @hnaacke,

Could you try again and see if you get the same error? Thanks.

PS: Also, you should give write permission to users if you want them to be able to write something in your notebooks :slight_smile:


(Hubert Naacke) #3

Hi @mpanahi,
It works fine. Thanks.
Could you rename /projects/equipe to /projects/epique (which is the name of the project)?
Could you also add the (simpler) plain-text version of the MeSH classification?
ftp://nlmpubs.nlm.nih.gov/online/mesh/MESH_FILES/meshtree/mtrees2018.bin
and maybe the README file:
https://mbr.nlm.nih.gov/Download/2017/Data/README
thanks,

How can I grant read permission on files stored in /user/hnaacke/ to the other members of the Epique project
(is there an epique group)?


(Maziyar Panahi) #4

Hi @hnaacke,

I was wondering about the name! I renamed it to epique.

For any files, you can simply browse to /projects/epique in Hue and choose Upload.


As for the second question: if there is data that needs to be shared, you can put it inside /projects/epique and other users can access it. Your own directory is for your personal files/directories/results. However, if you wish to make something readable to all users, not just your teammates, you can simply right-click it, change the permissions, and grant read/write/execute to Others.
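If you prefer to do this from a Spark/Scala session instead of Hue, a minimal sketch using the Hadoop FileSystem API could look like the following (the directory path and mode are illustrative assumptions, not the actual project settings, and you can only change permissions on paths you own):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Reuse the Hadoop configuration that the running Spark session already carries.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Example only: make a directory under /user/hnaacke readable and traversable
// by group and others. 755 (octal) = rwxr-xr-x; Scala has no octal literals,
// hence the parseInt.
val dir = new Path("/user/hnaacke/shared")
fs.setPermission(dir, new FsPermission(Integer.parseInt("755", 8).toShort))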


(Hubert Naacke) #5

Hi again,
I tried to upload a file into /projects/epique/datasets/pubmed as you suggested, but I got this error:
Error: AccessControlException: Permission denied: user=hnaacke, access=WRITE, inode="/projects/epique/datasets/pubmed":mpanahi:mpanahi:drwxr-xr-x (error 500)


(Maziyar Panahi) #6

Can you please try one more time?
Also, for real-time support you can use https://chat.iscpif.fr and look for the #multivac channel to communicate through chat.


(Hubert Naacke) #7

It works, thanks.


(Hubert Naacke) #8

Regarding the gz data file that is too slow to load into Spark, I will wait for the Parquet version to be available.


(Maziyar Panahi) #9

Sure, I'll make the Parquet files and add instructions to the wiki on how to read Parquet partitions.
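In the meantime, reading a partitioned Parquet dataset from Spark generally looks like the sketch below (the output path and the partition column "year" are assumptions, not the final layout):

import org.apache.spark.sql.functions.col

// Spark discovers partition columns from the directory layout
// (e.g. .../year=2018/...), so a plain read is enough.
val pubmedParquet = spark.read
  .parquet("/projects/epique/datasets/pubmed/parquet")  // hypothetical path

// Filtering on a partition column lets Spark prune partitions
// instead of scanning the whole dataset.
val subset = pubmedParquet.filter(col("year") === 2018)
subset.show(10)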


(Maziyar Panahi) #10