Many executions failing on remote environment vo.complex-systems.eu


(Aymericvie) #1

Hi,
About 50% of my jobs are failing on the computing grid vo.complex-systems.eu.
Everything runs fine in local execution, so something seems to be going wrong on the grid side. Is there any workaround?
Full error below.
Thanks!
Best,
Aymeric

OpenMOLE job execution failed on remote environment
org.openmole.core.exception.InternalProcessingError: Job status is FAILED
DETAILS:
Stdout was:
https://sbgse1.in2p3.fr:443/dpm/in2p3.fr/home/vo.complex-systems.eu/
lpnhe-wn103.in2p3.fr
Wed, 11 Nov 2020 16:25:00 +0100
MemTotal: 65777196 kB
MemFree: 22376720 kB
MemAvailable: 43821472 kB
Buffers: 1836 kB
Cached: 21159844 kB
SwapCached: 38056 kB
Active: 25014364 kB
Inactive: 16455452 kB
Active(anon): 13740876 kB
Inactive(anon): 6626608 kB
Active(file): 11273488 kB
Inactive(file): 9828844 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 67108860 kB
SwapFree: 66876668 kB
Dirty: 1824 kB
Writeback: 0 kB
AnonPages: 20269352 kB
Mapped: 289564 kB
Shmem: 59344 kB
Slab: 1184004 kB
SReclaimable: 839500 kB
SUnreclaim: 344504 kB
KernelStack: 20800 kB
PageTables: 133380 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 99997456 kB
Committed_AS: 43679764 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 412236 kB
VmallocChunk: 34325399548 kB
Percpu: 9984 kB
HardwareCorrupted: 0 kB
AnonHugePages: 569344 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 156412 kB
DirectMap2M: 6107136 kB
DirectMap1G: 62914560 kB
core file size (blocks, -c) 2097151
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256609
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 256609
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
_CONDOR_ANCESTOR_31795=31855:1605107740:4208917403
_CONDOR_JOB_PIDS=
DIRAC=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot
MANPATH=:/usr/man:/usr/share/man
HOSTNAME=lpnhe-wn103.in2p3.fr
DIRAC_PROCESSORS=1
VO_VO_HESS_EXPERIMENT_EU_SW_DIR=/grid/software/vo.hess-experiment.eu
GFAL_PLUGIN_DIR=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib/gfal2-plugins
HISTSIZE=1000
DIRACPYTHON=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/bin/python2.7
VO_VO_FRANCE_GRILLES_FR_DEFAULT_SE=grid05.lal.in2p3.fr
GLOBUS_PATH=/usr
GLOBUS_LOCATION=/usr
TMPDIR=/var/lib/condor/execute/dir_31795
VO_OPS_DEFAULT_SE=lpnse1.in2p3.fr
JOBID=107554894
PYTHONUNBUFFERED=yes
GT_PROXY_MODE=old
DPM_HOST=lpnse1.in2p3.fr
LCAS_LOG_LEVEL=1
_CONDOR_SCRATCH_DIR=/var/lib/condor/execute/dir_31795
VO_VO_LPNHE_IN2P3_FR_DEFAULT_SE=grid05.lal.in2p3.fr
DIRAC_WHOLENODE=False
X509_CERT_DIR=/etc/grid-security/certificates
DIRACLIB=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib
GLITE_LOCATION_LOG=/var/log/glite
http_proxy=http://lpnhe-cache.in2p3.fr:3128
USER=comsya113
_CHIRP_DELAYED_UPDATE_PREFIX=Chirp*
LD_LIBRARY_PATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib/mysql:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib:/usr/lib64/classads:/usr/lib64:/usr/lib
TEMP=/var/lib/condor/execute/dir_31795
VO_DTEAM_SW_DIR=/grid/software/dteam
BATCH_SYSTEM=HTCondor
XRD_RUNFORKHANDLER=1
VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
GLITE_LOCATION_TMP=/tmp
DIRACROOT=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot
_CONDOR_CHIRP_CONFIG=/var/lib/condor/execute/dir_31795/.chirp.config
LCMAPS_LOG_LEVEL=1
ATLAS_LOCAL_AREA=/grid/software/atlas
VO_LHCB_DEFAULT_SE=lpnse1.in2p3.fr
HTCONDOR_JOBID=48245.1
CONDORCE_COLLECTOR_HOST=lpnhe-ce02.in2p3.fr:9619
VO_OPS_SW_DIR=/grid/software/ops
VO_ATLAS_DEFAULT_SE=lpnse1.in2p3.fr
GLOBUS_IO_IPV6=TRUE
VO_VO_FRANCE_GRILLES_FR_SW_DIR=/grid/software/vo.france-grilles.fr
LCMAPS_DEBUG_LEVEL=0
MAIL=/var/spool/mail/comsya113
PATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/bin:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/scripts:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/bin:/usr/bin:/usr/externals/bin:/usr/sbin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lib/jvm/java/bin
DPNS_HOST=lpnse1.in2p3.fr
_CONDOR_BIN=/usr/bin
VO_CMS_DEFAULT_SE=node12.datagrid.cea.fr
VO_DTEAM_DEFAULT_SE=lpnse1.in2p3.fr
VO_AUGER_DEFAULT_SE=grid05.lal.in2p3.fr
PWD=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894
PYTHONOPTIMIZE=x
LANG=en_US.UTF-8
_CONDOR_WRAPPER_ERROR_FILE=/var/lib/condor/execute/dir_31795/.job_wrapper_failure
VO_VO_COMPLEX_SYSTEMS_EU_DEFAULT_SE=grid05.lal.in2p3.fr
VO_VO_CTA_IN2P3_FR_DEFAULT_SE=node12.datagrid.cea.fr
_CONDOR_ANCESTOR_1553=1841:1604484011:751937813
X509_VOMS_DIR=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/etc/grid-security/vomsdir
DCOMMANDS_PPID=31897
DIRACSCRIPTS=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/scripts
_CONDOR_SLOT=slot1_13
PERLLIB=/usr/lib64/perl
VO_VO_LPNHE_IN2P3_FR_SW_DIR=/grid/software/vo.lpnhe.in2p3.fr
MYPROXY_SERVER=myproxy.grif.fr
DIRACSITE=LCG.LPNHE.fr
SSL_CERT_DIR=/etc/grid-security/certificates
HISTCONTROL=ignoredups
VO_LHCB_SW_DIR=/cvmfs/lhcb.cern.ch
VO_VO_HESS_EXPERIMENT_EU_DEFAULT_SE=polgrid4.in2p3.fr
HOME=/home/grid/vo.complex-systems.eu/comsya113
GLITE_LOCATION_VAR=/var/glite
DIRACJOBID=107554894
SHLVL=10
TERMINFO=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/share/terminfo:/usr/share/terminfo:/etc/terminfo
GLOBUS_TCP_PORT_RANGE=20000,25000
_CONDOR_MACHINE_AD=/var/lib/condor/execute/dir_31795/.machine.ad
MACHINEFEATURES=/etc/machinefeatures
OPENSSL_CONF=/tmp
X509_USER_PROXY=/var/lib/condor/execute/dir_31795/tmp9QTZym
DIRACBIN=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/bin
ARC_PLUGIN_PATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib/arc
DYLD_LIBRARY_PATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib/mysql:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/lib:
VO_AUGER_SW_DIR=/cvmfs/auger.egi.eu
GFAL_CONFIG_DIR=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/etc/gfal2.d
AGENT_WORKDIRECTORY=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/work/WorkloadManagement/JobAgent
LCG_GFAL_INFOSYS=topbdii.grif.fr:2170,cclcgtopbdii01.in2p3.fr:2170
PYTHONPATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot:/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot
TMP=/var/lib/condor/execute/dir_31795
LOGNAME=comsya113
SINGULARITY_BINDPATH=/cvmfs,/grid/software/cms/SITECONF:/cvmfs/cms.cern.ch/SITECONF
PYTHONPATH_SAVE=/usr/lib64/python
ATLAS_SITE_NAME=GRIF-LPNHE
LESSOPEN=||/usr/bin/lesspipe.sh %s
OMP_NUM_THREADS=1
_CONDOR_JOB_AD=/var/lib/condor/execute/dir_31795/.job.ad
LCAS_DEBUG_LEVEL=0
VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw
GLOBUS_FTP_CLIENT_IPV6=TRUE
GLITE_LOCATION=/usr
_CONDOR_JOB_IWD=/var/lib/condor/execute/dir_31795
VO_VO_CTA_IN2P3_FR_SW_DIR=/grid/software/vo.cta.in2p3.fr
_CONDOR_ANCESTOR_1841=31795:1605107739:3755701182
VO_VO_COMPLEX_SYSTEMS_EU_SW_DIR=/grid/software/vo.complex-systems.eu
SITE_NAME=GRIF
RRD_DEFAULT_FONT=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/Linux_x86_64_glibc-2.17/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf
DIRACPLAT=Linux_x86_64_glibc-2.17
_=/usr/bin/env
/var/lib/condor/execute/dir_31795/tmp9QTZym

stderr was:

+ echo https://sbgse1.in2p3.fr:443/dpm/in2p3.fr/home/vo.complex-systems.eu/
+ hostname
+ date -R
+ cat /proc/meminfo
+ ulimit -n 10240
+ ulimit -a
+ env
+ echo /var/lib/condor/execute/dir_31795/tmp9QTZym
+ unset http_proxy
+ unset https_proxy
+ BASEPATH=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894
+ CUR=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ test -e /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ mkdir /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ export HOME=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ HOME=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ cd /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ export OPENMOLE_HOME=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ OPENMOLE_HOME=/var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
++ uname -m
+ '[' x86_64 = x86_64 ']'
+ curl --connect-timeout 1800 --max-time 1800 --cert /var/lib/condor/execute/dir_31795/tmp9QTZym --key /var/lib/condor/execute/dir_31795/tmp9QTZym --cacert /var/lib/condor/execute/dir_31795/tmp9QTZym --capath /etc/grid-security/certificates -f -L https://sbgse1.in2p3.fr:443/dpm/in2p3.fr/home/vo.complex-systems.eu/openmole-32e6da9c-9536-4c03-960c-b0bebe78c363/persistent/1600347188561_e8baf727-bbc6-4391-8468-024b9cb97d53 -o /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254/jvm.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   640  100   640    0     0   1662      0 --:--:-- --:--:-- --:--:--  1662
 79 58.9M   79 46.9M    0     0  38.4M      0  0:00:01  0:00:01 --:--:-- 38.4M
100 58.9M  100 58.9M    0     0  42.1M      0  0:00:01  0:00:01 --:--:-- 67.4M
+ tar -xzf jvm.tar.gz
+ rm -f jvm.tar.gz
+ curl --connect-timeout 1800 --max-time 1800 --cert /var/lib/condor/execute/dir_31795/tmp9QTZym --key /var/lib/condor/execute/dir_31795/tmp9QTZym --cacert /var/lib/condor/execute/dir_31795/tmp9QTZym --capath /etc/grid-security/certificates -f -L https://sbgse1.in2p3.fr:443/dpm/in2p3.fr/home/vo.complex-systems.eu/openmole-32e6da9c-9536-4c03-960c-b0bebe78c363/persistent/1600347185488_f4525922-7d9f-44d7-adb4-ce93dd3260e1 -o /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254/openmole.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   639  100   639    0     0   1691      0 --:--:-- --:--:-- --:--:--  1694
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
curl: (22) The requested URL returned error: 403 Forbidden
+ RETURNCODE=22
+ cd ..
+ rm -rf /var/lib/condor/execute/dir_31795/DIRAC_Az3btHpilot/107554894/ws18254
+ exit 22
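
In case it helps with triage: since roughly half of the jobs fail, it may be worth checking whether they all stop on the same curl error. Below is a minimal Python sketch that scans stderr dumps for curl failure lines and tallies them by curl exit code and HTTP status. The function names `curl_failures` and `tally` are made up for illustration (they are not part of OpenMOLE or DIRAC), and the script assumes each stderr dump contains a line shaped like the `curl: (22) ...` line in the trace above.

```python
import re
from collections import Counter

# Matches curl's failure line, e.g.
# "curl: (22) The requested URL returned error: 403 Forbidden"
CURL_ERROR = re.compile(r"curl: \((\d+)\) (.*)")
# Pulls the HTTP status out of the message when curl reports one.
HTTP_STATUS = re.compile(r"error: (\d{3})")


def curl_failures(stderr_text):
    """Return (curl_exit_code, http_status_or_None, message) for each curl error line."""
    failures = []
    for line in stderr_text.splitlines():
        match = CURL_ERROR.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        message = match.group(2).strip()
        status = HTTP_STATUS.search(message)
        failures.append((code, int(status.group(1)) if status else None, message))
    return failures


def tally(stderr_texts):
    """Count (curl_exit_code, http_status) pairs across several stderr dumps."""
    counts = Counter()
    for text in stderr_texts:
        for code, status, _message in curl_failures(text):
            counts[(code, status)] += 1
    return counts


# Sample taken from the failing job above (the progress rows are omitted).
sample = """\
curl: (22) The requested URL returned error: 403 Forbidden
RETURNCODE=22
exit 22
"""

print(curl_failures(sample))
print(tally([sample, sample]))
```

If every failed job shows the same (22, 403) pair on the `openmole.tar.gz` download, that would point at access to that file on the storage element rather than at the workflow itself.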