Problem with session consistency on Zeppelin

(David Chavalarias) #1

We have experienced several session disconnections using Zeppelin. The last one, during a computation, gave the error:
Error with 400 StatusCode: "requirement failed: Session isn’t active."

(Maziyar Panahi) #2

Hi David,

I checked the sessions for today. They lasted between 30 minutes and 1 hour, so it is not a timeout or a disconnection. Usually when this happens there is an error in the code.

I checked this specific application, application_1516970792259_0019. You are not supposed to cache something that big. Cache is for final results or for data that you know can fit in the memory of the executors. The app has many errors from exceeding the memory. Caching can not only cause crashes but also slow down the process, since it has to put the data on disk if the data can’t fit in memory.
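
To give a rough idea of what "fits in the memory of the executors" means, here is a minimal sketch in plain Python. It assumes Spark's unified memory model with the default settings of Spark 2.x and later (300 MB reserved per executor, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5). The function names and all the numbers in the usage lines are hypothetical and are not taken from this cluster. It is only a back-of-the-envelope check: the in-memory size of cached data is often larger than its size on disk.

```python
# Back-of-the-envelope check: does a dataset fit in executor storage memory?
# Assumes Spark's unified memory defaults; function names are hypothetical.

RESERVED_MB = 300  # memory Spark sets aside in each executor JVM


def storage_memory_mb(executor_heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Storage memory (MB) per executor that is protected from eviction."""
    usable = (executor_heap_mb - RESERVED_MB) * memory_fraction
    return usable * storage_fraction


def fits_in_cache(dataset_mb, num_executors, executor_heap_mb):
    """True if the dataset is no larger than the total protected storage memory."""
    total = num_executors * storage_memory_mb(executor_heap_mb)
    return dataset_mb <= total


# Hypothetical cluster: 10 executors with a 4 GB heap each.
print(storage_memory_mb(4096))         # protected storage per executor, in MB
print(fits_in_cache(8000, 10, 4096))   # an 8 GB dataset
print(fits_in_cache(50000, 10, 4096))  # a 50 GB dataset
```

With these assumed numbers each executor protects a bit over 1 GB for cached data, so the 8 GB dataset fits across 10 executors and the 50 GB one does not.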

PS: This error is really hard to reproduce. It depends on how many times you call an action like .count() and bring data to the driver, or force the executors to read the data from disk and load it back into memory.

I highly recommend not caching anything over 10 million. Please try again without caching and see how it goes. Usually the parallelization and distributed computation are enough to speed everything up.

(David Chavalarias) #3

Thanks a lot, Maziyar, for the investigation. We note the recommendation about caching!

(Maziyar Panahi) #4