Some ideas about Multi-user Multi-tenant OpenMOLE

multiuser
openmole

(Maziyar Panahi) #1

Hi!

For my recent development work (Multivac Hadoop as a Service), I had to try a few solutions for multi-tenant scientific notebooks where users can log in via LDAP and run their code (Scala, Python and R) on my Hadoop cluster (Spark/Hive/HDFS).

I noticed some interesting approaches taken by the developers to overcome the major issues in multi-user and multi-tenant environments (isolation, security, user impersonation, ownership, etc.).

I can see three main areas when it comes to installing and configuring any of these notebooks:

  1. Authentication: simple (based on Linux user/group), or LDAP/Active Directory, possibly with Kerberos.
  2. Spawner: create a new process for each user, create a new server for each user, or create a new Docker container (managed by Kubernetes).
  3. Connection: how to connect to different components (HDFS, Spark (Livy or Toree), or kernels like R/Python, etc.).
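
To make the split a bit more concrete, here is a rough Python sketch of the three areas as separate, pluggable pieces. Every name in it (`Authenticator`, `Spawner`, `Connector`, `InMemoryAuthenticator`, `login`, and so on) is made up for illustration and does not come from OpenMOLE or from any of the notebooks. The in-memory authenticator compares plain-text passwords and is only a stand-in for a real LDAP/AD backend.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    name: str
    groups: tuple = ()


class Authenticator(ABC):
    """Area 1: turn credentials into a User, or None on failure."""

    @abstractmethod
    def authenticate(self, username: str, password: str) -> Optional[User]:
        ...


class Spawner(ABC):
    """Area 2: give each user an isolated workspace (process, server, container)."""

    @abstractmethod
    def spawn(self, user: User) -> str:
        ...


class Connector(ABC):
    """Area 3: attach a user's workspace to a backend service."""

    @abstractmethod
    def connect(self, user: User, workspace: str) -> str:
        ...


class InMemoryAuthenticator(Authenticator):
    """Toy stand-in for LDAP/AD: accounts maps name -> (password, groups)."""

    def __init__(self, accounts):
        self._accounts = dict(accounts)

    def authenticate(self, username, password):
        entry = self._accounts.get(username)
        if entry is None or entry[0] != password:
            return None
        return User(username, tuple(entry[1]))


class PerUserWorkspaceSpawner(Spawner):
    """Hands out one named workspace per user and reuses it on later logins."""

    def __init__(self):
        self._workspaces = {}

    def spawn(self, user):
        return self._workspaces.setdefault(user.name, f"workspace-{user.name}")


class EchoConnector(Connector):
    """Pretends to connect by returning a description of the session."""

    def __init__(self, backend):
        self.backend = backend

    def connect(self, user, workspace):
        return f"{self.backend}:{workspace} as {user.name}"


def login(auth, spawner, connector, username, password):
    """Run the three areas in order: authenticate, spawn, connect."""
    user = auth.authenticate(username, password)
    if user is None:
        raise PermissionError(f"authentication failed for {username}")
    workspace = spawner.spawn(user)
    return connector.connect(user, workspace)
```

The point of the split is that each piece could be swapped independently, for example replacing the toy authenticator with an LDAP one without touching the spawner or the connector.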

Now the last part can have different names in OpenMOLE (Grid, Cloud, AdHoc, etc.), but I think the rest is pretty much the same.

So I thought we could start by looking at these examples and see how we can learn from them and adopt their solutions in OpenMOLE to get a multi-user, multi-tenant feature.

Apache Hue
Apache Zeppelin
JupyterHub

They all support most of the authentication methods (simple, LDAP/AD). They take different approaches when it comes to spawning and connecting to existing services. For instance, JupyterHub spawns a new Jupyter Notebook server for each connected user, or it can spawn a new Docker container (this needs to be installed via a plugin), whereas the other two create a new process on the current system.
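
As a rough illustration of the "one process per user" approach, here is a minimal Python sketch. The class name `ProcessPerUserSpawner` and the idle child command are assumptions for the example; this is not how Hue, Zeppelin or JupyterHub actually implement spawning. The child also runs under the same OS account as the parent, so it shows process isolation only, not user impersonation.

```python
import subprocess
import sys


class ProcessPerUserSpawner:
    """Keeps at most one running child process per user name."""

    def __init__(self, command=None):
        # Stand-in for a real per-user server: a Python process that just idles.
        self._command = command or [
            sys.executable, "-c", "import time; time.sleep(60)"
        ]
        self._procs = {}

    def spawn(self, username):
        """Return the PID of the user's process, starting one if needed."""
        proc = self._procs.get(username)
        # poll() returns None while the process is still running.
        if proc is None or proc.poll() is not None:
            proc = subprocess.Popen(self._command)
            self._procs[username] = proc
        return proc.pid

    def stop(self, username):
        """Terminate the user's process, if any, and forget about it."""
        proc = self._procs.pop(username, None)
        if proc is not None:
            proc.terminate()
            proc.wait()

    def stop_all(self):
        for username in list(self._procs):
            self.stop(username)
```

A container-based spawner would keep the same `spawn`/`stop` shape and only change what gets started for each user.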

We have used Hue and Zeppelin heavily for the last 5 months. Because of the LDAP integration, I can control the scheduler/queue on YARN and do resource management easily. It also allows for better access control over HDFS ownership or Hive DB/table access. Basically, the user logs in with his/her LDAP account and everything is taken care of.
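
To illustrate the idea of tying resource management to the LDAP account, here is a small Python sketch that picks a scheduler queue from a user's group membership. The group names, queue names and the `queue_for` function are all hypothetical; the actual mapping on the cluster is not shown here.

```python
DEFAULT_QUEUE = "default"

# Hypothetical mapping, checked in order: the first matching group wins.
GROUP_TO_QUEUE = [
    ("admins", "priority"),
    ("researchers", "research"),
    ("students", "teaching"),
]


def queue_for(groups):
    """Return the queue for a user, given the groups from their directory entry."""
    membership = set(groups)
    for group, queue in GROUP_TO_QUEUE:
        if group in membership:
            return queue
    # Users with no known group fall back to the shared default queue.
    return DEFAULT_QUEUE
```

Because the lookup depends only on the groups attached to the login, adding a user to a group in the directory is enough to change where their jobs are scheduled.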

Some screenshots of how they look in my implementations:

Apache Zeppelin: [screenshot not included]

Apache Hue: [screenshot not included]

Best,
Maziyar


#2

Thanks Maz, we’ll definitely look into that.