HTCondor Workshop, CERN, December 8-9, 2014

Agenda

https://indico.cern.ch/event/272794/

Introduction to High Throughput Computing - G. Thain

HTC: maximize sum of (completed) job run times in a given amount of wall clock time

  • Over a long period of time
  • No attempt to optimize WC time of a particular job
  • Tension between maximizing the number of machines aggregated (minimizing constraints on them) and the number of jobs run (having jobs running everywhere)

HTCondor manages jobs and machines

  • Jobs
    • Survive crashes, network glitches
    • Each job with its logs, policy...
  • Machine's owner is King!
    • Owner's policy trumps all
    • After a job is done, every trace is removed
    • HTCondor knows the resources a machine has
  • Also manages data without needing a shared file system: sandboxes
    • HTCondor knows the sandbox size
    • Supports third-party transfers
  • Many security protocols
  • Scalability up to 200K jobs running per schedd
    • Scale out achieved by adding more schedd
  • Network: can accommodate almost any configuration, including firewalls and a single inbound port
  • Supports workflow: DAGs, complex/huge workflows and not only bag of tasks

Discussion

  • Does HTCondor normalize CPU time?
    • No. HTCondor attempts to grab any CPU available without trying to optimize the duration of any single job

Using HTCondor - T. Tannenbaum

Requirements, preferences and (custom) attributes

  • Used by jobs to express what they need and what kind of job they are
  • Used by machines to express what they provide (and their preferred jobs)
  • HTCondor brings them together: matchmaking between job requirements and machine resources
  • Expressed in ClassAds: language allowing to express/define a set of key/value pairs
    • Attributes can be literals or expressions: in expressions, references, operators, built-in functions
    • Semi-structured, no schema
    • (logical) expressions can return true, false, undefined and error. 'undefined' is a value that can be tested. '=?=' and '=!=' never return value 'undefined'.
    • True and False can be used in numerical expressions and respectively evaluate to 1 and 0
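
A minimal sketch of what a machine ClassAd fragment might look like (names and values are illustrative; the annotations are added for this note and are not part of the ad):

    MyType = "Machine"
    Name = "slot1@wn01.example.org"
    Cpus = 8
    Memory = 16384                                  # MiB
    HasMatlab = True                                # a custom attribute
    Requirements = (TARGET.RequestMemory <= Memory)
    # '=?=' tests identity: (HasMatlab =?= True) is never 'undefined',
    # even on machines where HasMatlab is not defined at all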

Matchmaking between ClassAds is based on their attributes Requirements and Rank

  • Rank: float, the higher the better

Jobs are organized in 'universes': local, grid, parallel, ...

  • HTCondor supports many universes
  • Universe describes a "category" of resources

Jobs submitted with a submit file used to build the job ClassAds

  • Submit files are close to ClassAds but they are not ClassAds
  • Requested CPUs, memory... are folded into the job's Requirements expression
  • Several macros to refer to other ClassAds attributes, e.g. attributes in machine ClassAds
  • A submit file can be used to submit multiple jobs: this set of jobs is called a 'cluster'
  • A job ID has the format clusterID.procID
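
A minimal sketch of a submit file along these lines (executable and file names are hypothetical):

    universe       = vanilla
    executable     = analyze.sh
    arguments      = input.$(Process).dat
    request_cpus   = 1
    request_memory = 2048       # MB, folded into the job's Requirements
    output         = job.$(Cluster).$(Process).out
    error          = job.$(Cluster).$(Process).err
    log            = job.$(Cluster).log
    queue 10                    # 10 jobs forming one cluster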

Useful commands for troubleshooting problems:

  • condor_q: display job status
  • condor_status: display machine status
  • condor_ssh_to_job: ssh session co-located to the job
    • Very restricted connection, controlled by HTCondor
  • condor_chirp: access to job files

DAG submitted with condor_submit_dag

  • DAG defined in the DAG file, describing relationships between jobs (nodes) making the DAG
  • Executable submitted is DAGMan
    • DAGMan starts each job with its own submit file
  • In case of errors, DAGMan continues until it cannot make any more progress: at this point a rescue file is created
  • The DAG can be restarted at the point it failed using the rescue file
  • DAGs can be nested and spliced
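
A minimal sketch of a DAG file for a diamond-shaped workflow (submit file names are hypothetical):

    # diamond.dag: A feeds B and C, which both feed D
    JOB A a.sub
    JOB B b.sub
    JOB C c.sub
    JOB D d.sub
    PARENT A CHILD B C
    PARENT B C CHILD D
    RETRY A 2        # retry node A up to 2 times before declaring it failed

Submitted with 'condor_submit_dag diamond.dag'.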

HTCondor Administration Basics - G. Thain

Job lifecycle: Idle -> XferIn -> Running -> XferOut -> Complete

  • XferIn and XferOut: transfer of sandboxes, can take a while
  • In Idle and Running states, can become Held
  • In Running state, can move to Suspend state
    • In Suspend state, the job remains on the machine
  • In Running and Complete states, can move back to Idle: happens in particular if something wrong happened on the machine
    • Job restarted from scratch

Machine "life cycle"

  • startd advertises machine capabilities through a ClassAd sent to a collector
  • The collector is a directory of all the machines available
  • negotiator passes the selected machine(s) (provisions machines) to schedd that claims the machine resources
    • One condor_shadow is created per running job
    • schedd is a database: plays a role similar to queues in traditional batch systems. Has information on all jobs.
  • schedd: many possible in the same pool

condor_master: the master Condor daemon, present on every Condor machine (except submit-only machines)

  • Acts like init for the Condor daemons: manages/monitors them

Submission side

  • condor_submit is one tool to talk to schedd
    • Other possibilities: Python bindings, SOAP...
  • Not much policy configuration on submit side: mostly about scalability and security
  • schedd is managing all the jobs submitted to it: 200K max per schedd
    • Add more schedds if more jobs are needed
    • Forked by condor_master
    • Forks a condor_shadow
    • Possible to have several schedd on the same machine, even though unusual
  • condor_master also starts condor_procd, which rebuilds the process trees of all the processes running on the machine

Execute side primarily managed by startd

  • Also one condor_starter per running job
  • Machine ClassAds built by startd from live information and configuration file
    • Configuration mainly about policy
  • Supports preemption and eviction policy
  • startd started by condor_master and responsible for starting a condor_starter per job
    • Restarting startd kills jobs

"Middle" side: collector and negotiator, acting as the central manager

  • collector and negotiator started by condor_master
    • condor_userprio is the main tool to interact with them
  • condor_master also starts a condor_procd, as on the submit machine
  • Not a master node
  • Not a bottleneck for performance
  • Stateless

Installing/Configuring Condor

  • Tar ball
  • Native package for Debian, RH derivatives...
    • meta-RPM is htcondor in Fedora, condor in CentOS/SL
  • Minimal configuration required: pool creation
    • File locations have defaults but may require adjustments depending on local configuration
    • Configuration attributes are called knobs
    • All daemons share the same configuration: main configuration file by default is /etc/condor/condor_config that includes other files
    • Config files can use macros with syntax $(A): macro is evaluated when the variable is used, not when it is assigned
    • Configuration can be defined in environment variables starting with _CONDOR_: these override any definition of the same knob in the config files
    • condor_config_val allows to query config files for specific knobs or even to set them. Can restrict query to knobs changed from default values
    • condor_reconfig tells daemons to reread their config files: some knobs cannot be updated by reconfig but require a restart
  • 2 types of configuration: Personal and Distributed Condor
    • Personal Condor: all daemons on the same machine, good for experimenting/learning
    • Minimal knobs: DAEMON_LIST (all daemons), CONDOR_HOST (localhost), ALLOW_WRITE (localhost) (see the sketch after this list)
  • Starting Condor: 'service condor start' or 'condor_master -f'
    • Restarting HTCondor: condor_restart
    • Checking daemons: 'ps auxww | grep [Cc]ondor'
    • All daemons except condor_procd run with effective UID 'condor', but they are started as root and can become root if needed
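
A minimal Personal Condor sketch along the lines above (all daemons and all write access restricted to the local machine):

    # e.g. in /etc/condor/condor_config.local, on top of the default config
    CONDOR_HOST = 127.0.0.1
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
    ALLOW_WRITE = 127.0.0.1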

Real (distributed) pools can be harder if every machine has its own policy

  • Decide which UID jobs should run as
    • Nobody: the safest from the machine perspective but not necessarily convenient for all use cases, in particular if accessing a shared FS
      • Does not protect one job from another
    • Submitting user: the most useful from the user perspective, generally required for shared FS
    • Slot user: a user per slot, protects jobs from one another
      • Traceability a bit harder, but an audit trail is maintained (all the information required to map a proxy to a given job slot is logged)
    • Main knobs defined on execute nodes: UID_DOMAIN (also required on submit nodes), STARTER_ALLOW_RUNAS_OWNER, EXECUTE_LOGIN_IS_DEDICATED, SLOTx_USER (1 per slot)
    • Different schedds open to different users allow implementing different mapping policies
  • Shared file systems: not required but properly handled if present
    • Not necessarily present on all execute nodes: FILESYSTEM_DOMAIN allows advertising which file systems are available (used by submit and execute nodes)
    • Can force file transfer even when a shared file system is available
  • Central manager: DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
  • Debugging the pool: condor_q, condor_status, condor_ping, condor_who

Condor monitoring: see slides

  • Lots of possible pitfalls!

Networking in HTCondor - G. Thain

Condor assumes that every daemon has only one address: no longer the case when a machine has multiple interfaces, uses NAT...

  • Several knobs decide the policy for picking the address: by default, the address the daemon used to contact the collector is taken

Firewalls: can define HIGHPORT/LOWPORT either for both directions or INBOUND only

  • Also Condor Connection Broker (CCB) that allows to bypass firewalls by reversing connections: no need of HIGH/LOWPORT in this case
    • Requires one machine that is not firewalled: generally the collector
    • Does not work with standard universe

startd and schedd require a potentially high number of ports: risk of port exhaustion, to be monitored

  • Strange behaviour difficult to diagnose when it happens
  • The shared port feature allows reducing this need
    • Defined by USE_SHARED_PORT and requires an additional daemon SHARED_PORT
    • Does not work with the standard universe
    • Default since last release (last spring)
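
A sketch of the corresponding knobs, assuming the collector host is reachable and acts as the broker (needed only on releases where shared port is not yet the default):

    # Shared port: all daemons on this host multiplex one inbound TCP port
    USE_SHARED_PORT = TRUE
    DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT
    # CCB: reverse connections through the collector for firewalled nodes
    CCB_ADDRESS = $(CONDOR_HOST)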

Can define "private" networks: 2 machines sharing the same "private" network will use it to communicate

  • Despite the name, the network doesn't have to be private in the IP meaning

CCB, shared ports and private networks can be combined

  • CCB + private network can be a big performance win

Site Experiences

Care and Feeding of HTCondor @FNAL - S. Timm

Authentication/authorization

  • Daemons authenticate through GSI: a proxy is made from the host cert and is shared with all machines
    • Need to know the pool membership
  • User authorization: users mapped to one account per group
    • Do not share home directories between WN and interactive machines: security risks
    • Don't give a shell to accounts
    • Use of glexec to map each DN to a separate account is going away
    • Uses nss_db to share passwd files on each WN

Filesystems and memory

  • Each job executes in its own disk partition: SLOTn_EXECUTE
  • Memory: constraint on ResidentSetSize, killing high-memory jobs with SYSTEM_PERIODIC_REMOVE
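
A sketch in the spirit of this policy; the factor of 2 is illustrative (ResidentSetSize is in KiB, RequestMemory in MiB):

    # Remove any job whose resident set grows past twice what it requested
    SYSTEM_PERIODIC_REMOVE = ResidentSetSize > 2 * 1024 * RequestMemory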

Condor central nodes: 2 negotiator/collector pairs in a high-availability config, as documented

  • negotiator in failover, collector in load balancing

condor_schedd configuration on the general-purpose grid farm: 5 schedds for 10K slots

  • Overkill: 3 would be enough
  • The others are for functional separation of opportunistic jobs
  • Needs a lot of memory: 4GB for 1000 running jobs
  • Few fast cores rather than a large number of slower cores: the schedd is mostly single-threaded

Keeping track of WNs

  • Maintain a list of expected WNs
  • Compare with condor_status

Upgrades: careful testing, always a couple of changed behaviours, some config files moved

  • Do not use dev version (odd second digit)
  • Do not use .0 minor releases
  • Since 8.0, it is possible to upgrade schedd without killing running jobs (???)

Configuration management: use the same condor_config on all nodes, even though Puppet would allow doing otherwise

  • Use nodename-based conditionals
  • schedd configuration put in a separate file
  • See slides for the most important knobs used

Downtimes/draining: the 'peaceful off' feature (condor_off -peaceful) drains each node

Preemption policy: each VO guaranteed a certain number of non-preemptable jobs

  • Also plays with policy to manage the sharing between local users (or users with pledges) and opportunistic jobs

Job filling patterns: both horizontal and vertical (job packing on a node) filling supported

  • FERMI has used horizontal filling since the beginning

Log files: EVENT_LOG allows using a common file for all user logs

  • Very important feature, big time saver
  • Don't be stingy with the log max size and the information logged
    • FNAL running with D_FULLDEBUG for a couple of years

Monitoring HTCondor - A. Lahiff

Condor commands to monitor jobs, machines and users

  • See slides: lots of detailed examples

Ganglia: condor_gangliad gathers ClassAds from the collector and publishes them to Ganglia, spoofing host names

  • Can run on any node
  • Very useful to get the overall picture of the Condor configuration
  • condor_gangliad can invoke a callout, in fact allowing information to be sent to virtually any monitoring system

Nagios: used to check health of the Condor daemons and the CE communication with Condor (schedd health, submission test)

Local tools

  • RAL web app reporting the same kind of information as condor_q
  • Mimic to display the health of the clusters

HTCondor Deployment @RAL - A. Lahiff

Configuration

  • High availability of central managers
  • schedd on each CE
  • Worker nodes
  • condor_ganglia
  • Using partitionable slots and hierarchical accounting groups
  • Managed with Quattor

Using MOUNT_UNDER_SCRATCH + PID namespace + cgroups for CPU and memory
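
A sketch of the knobs behind such a setup (values are illustrative):

    MOUNT_UNDER_SCRATCH = /tmp,/var/tmp     # each job gets a private /tmp
    USE_PID_NAMESPACES = TRUE               # jobs cannot see other processes
    BASE_CGROUP = htcondor                  # track CPU/memory per job via cgroups
    CGROUP_MEMORY_LIMIT_POLICY = soft       # enforce memory as a soft limit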

Multicore jobs

  • GROUP_SORT_EXPR to favor multicore jobs over single-core
  • condor_defrag configured to drain 8 cores rather than whole nodes
  • condor_defrag target adjusted dynamically based on waiting multicore jobs
    • Through a cron job that adjusts DEFRAG_MAX_CONCURRENT_DRAINING using condor_config_val based on the demand
  • Max wasted resources observed with current settings is ~2%
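
A sketch of a condor_defrag configuration along these lines (numbers are illustrative):

    DEFRAG_DRAINING_MACHINES_PER_HOUR = 4.0
    DEFRAG_MAX_CONCURRENT_DRAINING = 10      # the knob the cron job adjusts
    DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8    # "whole" here means 8 free cores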

WN health check implemented as a cron job on the WN

  • In case of a problem, the WN stops accepting new jobs
  • Problem advertised in the machine ClassAd
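
A sketch of how such a check can hook into the startd via the startd cron mechanism (script path and attribute name are hypothetical):

    # The script periodically prints lines like 'NODE_HEALTHY = False',
    # which are merged into the machine ClassAd
    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HEALTH
    STARTD_CRON_HEALTH_EXECUTABLE = /usr/local/sbin/wn_healthcheck
    STARTD_CRON_HEALTH_PERIOD = 300
    # Refuse new jobs when the node is flagged unhealthy
    START = ($(START)) && (NODE_HEALTHY =!= False)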

ElasticSearch used to mine HTCondor history log files

  • Provides dashboards such as completed jobs per VO
  • Faster than condor_history in many situations

Using VM universe to instantiate microCERNVM images

  • Implemented the Vac model through jobs running in the local universe and starting VMs for each VO based on the load
    • Different contextualization for each VO

Also experimented with using clouds to expand the pool based on need

  • First done (successfully) with StratusLab, restarting the work with OpenNebula
  • Using condor_rooster to instantiate VMs and HIBERNATE expressions to shut them down when no longer needed
    • Advertise offline ClassAds representing the type of VMs rather than specific machines: host names used are random strings
    • Offline ClassAds can be used by the negotiator when doing the matchmaking: if an offline machine is actually matched, condor_rooster starts it (online machines are configured to be preferred over offline ones)
    • Use START on the VM to prepare for VM shutdown after a certain time: VM health monitored with the standard cron used at RAL
    • HIBERNATE used to shut down the VM
  • VMs join the Condor pool when they start

HTCondor @IAC (Canarias) - A. Dorta

Using Condor since 2004

  • Pool formed mainly by researchers' desktops

Problems encountered and solutions

  • Licenses and access to hot disks: managed using the concurrency limits feature
  • Noise in offices: condor_time_restrict
  • Guarantee of a minimum disk space: SYSTEM_PERIODIC_REMOVE triggered when going below a threshold
  • Dashboard to allow users to check if/how their machine is used: ConGUSTo (local development)

HTCondor @FNAL - S. Timm

Long experience with different batch systems, moved to Condor in 2006

  • Now running 4 HTCondor clusters
  • CMS T1 and GP grid cluster recently put under the management of the same department after being separate for 8 years: trying to merge both approaches, taking the best of both worlds

3 key technologies for addressing different FNAL needs

  • HTCondor-CE and its use of JobRouter feature in HTCondor
  • Partitionable slots to address different CPU/memory requirements
  • Hierarchical group quotas

Virtual Facility project: goal is to have the local facility provision nodes on commercial clouds on behalf of the various experiments

  • WNs and services both at FNAL and commercial clouds
  • Already demonstrated scalable squid servers and submit nodes with Amazon
  • Next challenge is data caching both inbound and outbound
  • Future steps: leverage load balancing/autoscaling features where they exist, other commercial clouds (Google, Azure), OpenStack

HTCondor @Milan T2 - F. Prelz

First HTCondor distributed pool in INFN in 1997!

  • Still running!
  • 3 days of downtime in 17 years!

Enjoyed the "Condor Way" of concentrating on real problems and offering a flexible approach

  • A "wonderful community" behind and driving the evolution
  • A set of languages and components with rich semantic properties, allowing new expressions to be composed if/when the need arises

Milan T2: modest site with 400 cores and 1.5 PB

  • Running HTCondor since 2008
  • Main issue was the necessity to mimic the queue setup used at other sites and to develop the required scripts for CREAM/BLAH job submission, monitoring and accounting
    • The only other Condor-based grid site being Cambridge

Condor Scripting

Scripting Basics - B. Bockelman

Man pages of Condor commands are very good: look at them!

  • Useful commands for advanced troubleshooting: condor_qedit, condor_fetchlog, condor_drain, condor_ping, condor_sos, condor_tail, condor_who

2 key elements: projections and filtering

  • Filtering: selecting some ClassAds based on some constraint
    • -const option of command line utilities
  • Projections: the attributes you want returned
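
Illustrative examples combining the two (constraints and attribute lists are arbitrary):

    # Held jobs, showing only three attributes
    condor_q -constraint 'JobStatus == 5' -autoformat ClusterId ProcId HoldReason
    # Machines with at least 8 cores
    condor_status -constraint 'TotalCpus >= 8' -autoformat Machine Cpus Memory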

Querying remote daemons: the easiest is to query the collector, as every daemon registers with it

  • Collector: central information service
  • -pool allows selecting the collector
  • Possible to select the daemon we want information about

Some tips

  • Use -af to customize the output format of condor_q and condor_status for specific needs
  • For common queries, prefer the new experimental custom print format (SQL-like), https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalCustomPrintFormats
  • Never use -l in scripts: it returns very verbose output, putting a high load on the queried nodes
  • Do not write shell scripts that assume sanitized input from condor_q: users can modify most of the attributes in their ClassAds; only the ClassAd and XML output formats are sanitized, but no shell-friendly parser is available for these formats
  • Ask the experts before writing scripts: a built-in may exist addressing your needs!

Python Bindings - B. Bockelman

2 Python modules: htcondor and classad

  • Allow querying ClassAds, interacting with most daemons and accessing logs
  • Interaction with the schedd allows submitting, managing and monitoring jobs
  • Interaction with the negotiator allows managing users and their priorities
  • collectors: can use/retrieve a collector list
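
A minimal query sketch with the bindings of that era (the constraint and projection are illustrative):

    import htcondor

    # Locate the pool collector (COLLECTOR_HOST from the local config) and
    # pull a projection of the startd ads, much like condor_status would
    coll = htcondor.Collector()
    ads = coll.query(htcondor.AdTypes.Startd, "Cpus >= 8", ["Name", "Cpus", "Memory"])
    for ad in ads:
        print("%s %s %s" % (ad["Name"], ad.get("Cpus"), ad.get("Memory")))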

Job submission

  • Python bindings use ClassAds instead of a submit file
    • Not the same syntax, a bit more complex
    • 'condor_submit -dump classads' allows to get a starting point from a submit file
  • Supports submission transactions: allows multiple steps, committed only if no error happens at any step of building the submission request
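
A minimal submission sketch with the bindings of that era; the job ad below is deliberately bare, in practice one would start from 'condor_submit -dump' output as noted above:

    import classad
    import htcondor

    schedd = htcondor.Schedd()          # the local schedd by default

    # Raw job-ad attributes, not submit-file keys (Cmd vs. executable, etc.)
    ad = classad.ClassAd()
    ad["Cmd"] = "/bin/sleep"
    ad["Arguments"] = "60"
    ad["Out"] = "sleep.out"
    ad["Err"] = "sleep.err"
    ad["UserLog"] = "sleep.log"

    cluster_id = schedd.submit(ad, 1)   # one transaction, returns the cluster id
    print(cluster_id)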

See slides for examples and details

Job Scheduling

Job/Startd Policy and Config - T. Tannenbaum

Everything is defined through ClassAds: any attribute can be an expression, not just a literal

User policy defined at job submission

  • Requirements and rank
  • Job policy expressions allow defining new job states based on expressions
    • on_exit_xxx

Admin policies regarding jobs

  • system_periodic_xxx (where xxx is a state): allows defining job state transitions based on expressions, applied to all jobs

Admin policies for startd

  • START: the main policy. Defines a condition that must be met for a new job to be accepted.
  • RANK: defines the type of jobs the machine prefers; a float, like job ranks
  • PREEMPT: when the expression is true, asks the startd to kill running jobs.
    • 'PREEMPT = TotalJobRunTime > ...' allows implementing a max time limit
    • If WANT_VACATE is true, a SIGTERM is sent to the job, giving it a chance to exit nicely, for example after doing a checkpoint. The max time before sending the final SIGKILL is MachineMaxVacateTime. The job is then returned to the Idle state, giving it a chance to be rescheduled.
    • If WANT_HOLD is true, the job is not returned to the Idle state but to the Hold state, optionally with a reason defined in WANT_HOLD_REASON.
    • Similar behaviour can also be defined as a job policy in condor_config through the SYSTEM_PERIODIC_xxx knobs, but the startd policy is evaluated more often (every 5s). Also, you may have control only over the submission side (if managing the schedd machine but not the WNs) or only over the execution side (if providing machines to a pool managed by somebody else).
  • Suspension/resume policy can also be defined: a suspended job is not killed and resumes where it was suspended
  • See slides for examples of policies
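
As one concrete illustration, a minimal sketch of a max-run-time policy as described above (the 48-hour limit is illustrative):

    PREEMPT = TotalJobRunTime > 48 * 3600
    WANT_VACATE = TRUE              # SIGTERM first so the job can exit nicely
    MachineMaxVacateTime = 600      # seconds of grace before the final SIGKILL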

Custom attributes can be defined at several places

  • Config file: static attributes
    • Either global for all slots or attached to specific slots if prefixed by SLOTn_ (with 'n' the slot number)
  • From a script: dynamic attributes merged into the ClassAd
    • Script is defined through the startd cron knobs (STARTD_CRON_xxx_EXECUTABLE)
    • Script is run periodically
  • Custom attributes retrieved from the job running in a slot using STARTD_JOB_EXPRS
  • Custom attributes retrieved from all running jobs, whatever the slot, using STARTD_JOB_ATTRS
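
A sketch of a static custom attribute (the attribute name is illustrative):

    # Advertised by every slot; prefix with SLOTn_ to attach it to one slot
    HAS_MATLAB = True
    STARTD_ATTRS = $(STARTD_ATTRS) HAS_MATLAB

Jobs can then match on it, e.g. with 'requirements = (HAS_MATLAB =?= True)' in their submit file.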

Resources advertised by machines

  • Can control the number of CPUs and the memory size advertised
    • Default: number of detected CPUs (hyperthreaded or not, depending on a knob), actual memory size
  • Can define a non-uniform slot configuration, for example one slot with multiple CPUs and the others single-core
    • DEFAULT_RANK can be used to steer single core jobs away from multicore slots if using static slots
  • Can also define slots dedicated to maintenance or test jobs that require a free slot almost immediately
    • Define a slot in excess of the real CPUs, dedicated to these tasks
    • If the tasks using these slots require a good share of the available resources, this can be coupled with a suspension policy for the normal slots
  • See slides and other materials for advanced examples of these use cases

Partitionable slots: allow defining a big slot that can be partitioned to accommodate the actual needs of accepted jobs

  • Defined by adding SLOT_TYPE_n_PARTITIONABLE=true
  • Default partitionable resources are CPU, memory, disk but custom resources can be added
  • A partitionable slot is always unclaimed and dynamically split when jobs are started: this leads to the creation of dynamic slots
  • Drawback: leads to slot fragmentation; condor_defrag is the tool to recover from it, best run on the central manager
  • A few limitations in the current implementation like incompatibility with preemption
    • Work in progress to remove them
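
A sketch of the corresponding configuration, giving the whole machine to a single partitionable slot:

    NUM_SLOTS = 1
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = 100%                  # all CPUs, memory and disk
    SLOT_TYPE_1_PARTITIONABLE = TRUE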

User and Group Scheduling - G. Thain

User-side parameters

  • JobPriority knob in the submit file, or condor_prio for dynamic changes: only affects the relative priority of jobs from a specific user on a specific schedd
  • Job ranking: not as powerful as it seems, since in pool steady state there is often only one free machine...

Central manager (negotiator): a good place to implement central policies applying to all schedds and users

  • Can define a concurrency limit with xxx_LIMIT (xxx being an arbitrary tag)
  • Users have to set 'concurrency_limits = tag' (can be a list) and this is counted against the centrally defined concurrency limit (see the sketch after this list)
    • May require a submit wrapper to ensure that 'concurrency_limits' is always defined when applicable, else a user can "lie"
  • Handy to implement license limits for some software without binding them to a particular machine
  • A user with the same name using 2 different schedd is considered the same if using the same UID_DOMAIN
  • negotiator computes the user priority as Real Priority * Priority Factor
    • Conversely to job ranks, the user priority is better when the number is lower
    • Real Priority computed based on actual usage: starts at 0.5 and asymptotically grows/shrinks to current usage
    • PRIORITY_HALFLIFE is used to define the history/decay rate
    • The Priority Factor can be set/viewed with condor_userprio and is persistently stored: the default is now 100; it allows giving priority to some users (by lowering the number) while still maintaining a group fairshare
    • 'condor_userprio -most' allows to see the effective priority, the priority factor, weighted hours...
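
The concurrency-limit sketch referred to above (tag and value are illustrative):

    # Central manager configuration: at most 10 'matlab' tokens pool-wide
    MATLAB_LIMIT = 10

    # In the submit file: each such job consumes one token
    concurrency_limits = matlab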

Throughput vs. fairness: fundamental tension, preemption is required to have fairness

  • The negotiator implements real preemption, whereas the WN (startd) implements eviction
  • Negotiation cycle
    • Gets all the slot ads
    • Updates user priority info for all users
    • Based on user priority, computes the submitter limit for each user
    • For each user (in priority order), finds the schedd and gets a job: finds all matching machines for the job, sorts the machines, gives the job the best sorted machine
    • Sorting slots is done at 3 different levels: negotiator pre-job rank, job rank, negotiator post-job rank
  • If the matched machine is already claimed, preemption checks are done as defined by PREEMPTION_REQUIREMENTS (nothing to do with the startd PREEMPT)
    • PREEMPTION_REQUIREMENTS can be used to avoid pool thrashing (where job A preempts job B, which then preempts job A or another job...): this is avoided by ensuring the preempting job's priority is sufficiently higher than that of the running job about to be preempted
    • PREEMPTION_RANK allows sorting the slots eligible for preemption
  • Partitionable slots handling weighs the cost of a match against the cost of an unclaimed partitionable slot (the unused part of the partitionable slot)
    • Based on SLOT_WEIGHT
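
A sketch close to the classic manual example; remember that lower user priority values are better, so the running (remote) user's value must be larger for preemption to make sense:

    # Preempt a claim only if it is at least 1 hour old and the preempting
    # user's priority is at least 20% better than the running user's
    PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > 3600 \
        && RemoteUserPrio > SubmittorPrio * 1.2
    # Among eligible slots, preempt the worst-priority user first
    PREEMPTION_RANK = RemoteUserPrio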

Accounting groups: manage priority across groups of users and jobs (as long as they are in the same UID_DOMAIN)

  • Any user can join any group: may require some checks at the schedd level to ensure that a user doesn't join an inappropriate group
    • A submit filter can be used to assign users to groups
    • Accounting_Group knob in submit file
  • condor_userprio can be used to define priority factor of a group
  • In the negotiator, quotas can be defined in terms of "slot weight" (by default, cores)
    • By default quotas are hard quotas, enforced even if there are free resources
    • Use GROUP_AUTOREGROUP to allow going over the limit if there are free resources
  • Fairshare between users is honoured within the group
  • Supports hierarchical groups with hierarchical quotas: quota can be specified as an absolute number or a percentage
    • Can also enable a group to get "surplus" from other groups below their quota, but there is no quota history: no "fairshare" of this surplus over time
    • With preemption requirements, can allow preemption for a group under its quota if the group using a machine is over quota
  • Gotchas: quotas don't know about matching and assume everything matches everything, which may lead to surprises, in particular with partitionable slots when preempting multiple slots
    • condor_defrag may be a better approach than enabling preemption in this kind of configuration
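
A sketch of a hierarchical group quota setup (group names and numbers are illustrative):

    # Negotiator configuration
    GROUP_NAMES = group_atlas, group_atlas.prod, group_cms
    GROUP_QUOTA_group_atlas = 600                 # absolute, in slot weights
    GROUP_QUOTA_DYNAMIC_group_atlas.prod = 0.8    # 80% of the parent's quota
    GROUP_ACCEPT_SURPLUS = TRUE                   # may use others' idle quota

    # In the submit file
    accounting_group = group_atlas.prod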

Put users in a box - G. Thain

3 goals

  • Protect a machine from a job
  • Protect a job from the machine
  • Protect a job from other jobs

Old solutions: limits on resources used by a job, often OS dependent

  • As a PREEMPT expression, but this is not taken into account in real time and misbehaving jobs can eat resources very quickly
  • Through setrlimit calls: USER_JOB_WRAPPER, STARTER_RLIMIT_AS
    • Implemented as hard limit by the OS in real time

Virtualization introduces a number of new issues for management and debugging: too heavyweight

Containers: the promising sandboxing approach

  • Working well with HTCondor ssh_to_job feature for job debugging
  • Can take CPU affinity into account if ASSIGN_CPU_AFFINITY=true
  • Works on Linux and Windows
    • On Linux, uses cgroups to group processes and define limits
  • Implements PID namespaces, a chroot per job (named chroots), and bind-mounted file systems (in particular for /tmp)
  • Work in progress to implement a Docker Universe
    • Docker: cgroups + repo for images + bind mounts

HTCondor Security - T. Tannenbaum

Authz: ALLOW/DENY for several authorization levels

  • READ: query information
  • WRITE: configuration update
  • ADMINISTRATOR: restart of HTCondor
  • DAEMON (includes READ/WRITE): daemon to daemon communications
  • NEGOTIATOR (includes READ): condor_negotiator to other daemons
    • Mainly to avoid several negotiators being started by mistake
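
A sketch of what such an authorization setup can look like (host names are illustrative):

    ALLOW_READ = *
    ALLOW_WRITE = *.example.org
    ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
    ALLOW_NEGOTIATOR = $(CONDOR_HOST)
    ALLOW_DAEMON = $(ALLOW_WRITE)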

https://indico.cern.ch/event/272794/session/2/contribution/17/material/slides/3.pdf contains step-by-step examples/instructions.

HTCondor and European Grid - A. Lahiff

CREAM CE issues

  • Script to publish dynamic information into BDII
  • Recent BLAH scripts don't support HTCondor
  • RAL workaround: took and modernized the scripts from older versions of the CREAM CE that supported Condor
    • Adding support for partitionable slots
  • APEL: now a proper support of HTCondor in development
    • RAL currently relying on a script converting HTCondor history logs to PBS-style + apel-pbs parser

ARC CE: simpler to configure than CREAM CE

  • Single configuration file
  • Accounting directly published to the central accounting database
  • Now supported by all 4 LHC VOs and usable transparently through the WMS
  • HTCondor integration used to be out-of-date but now works out of the box in 4.2.0
    • Pending patch to allow the ARC CE to make use of per-job history files
  • CPU scaling factors are not implemented in HTCondor: at RAL defined as a custom ClassAd attribute and taken into consideration by the ARC CE
  • Current issues
    • No proxy renewal for jobs submitted through the WMS, but a workaround exists: in fact not specific to HTCondor
    • Dynamic information not updated in real time; may improve when using ARC WS, but that requires RFC proxies

Panel Discussions

Panelists: Brian, Todd, Greg, Andrew, Francesco

CERN questions

  • What alternatives to queues for organizing host groups and job priorities?
    • Francesco: tried both adding a queue name as a (custom) ClassAd attribute and having one schedd per queue. The ClassAd approach is probably more Condor-friendly, but it does not guarantee the order in which jobs are picked up by the schedd
    • Todd: if the goal is to adjust the relative priority between different kinds of jobs, including short vs. long, accounting groups should probably be used
    • Andrew: through the ARC CE, users can specify the amount of resources they need and Condor can take actions if they go over these resources (SYSTEM_PERIODIC_REMOVE)
    • Todd: Condor allows taking actions other than killing a job when it is over its requested resources, for example letting it run until there is competition for the resource and preempting it at that point
    • Todd: at Wisconsin, users are no longer asked to specify the resources they need, but to publish in the ClassAd whether their job is periodically checkpointed or not. If it is, it is given access to more resources that can be reclaimed at any point (without losing the work already done, allowing the job to restart later from where it was)

Any way to throttle job submission from a misbehaving user submitting a large number of jobs that fail immediately?

  • Brian: one knob allows an execution node to refuse new jobs if the job turnaround is too high
  • If manual intervention is acceptable, it is easy to put a max job limit for a certain user in a ClassAd

AFS integration

  • Not yet there but working on it. Already have Krb authentication between daemons.

"Crondor jobs": jobs submitted through Condor that repeat indefinitely at a defined interval, emulating a cron job

  • Used at Wisconsin to produce per user and per machine usage reports daily

How to unify grid and local resources when the constraint is to restrict the view of the NFS shared FS to local users only? Currently implemented with 2 different Torque clusters

  • Could use named chroots for grid users to prevent them from seeing local (shared) file systems
  • Might even ask local users to request shared file systems if they really want them, so that they also use a named chroot when they don't need them

How to control/restrict WN admission to a white list without introducing inefficiencies, management nightmares...?

  • At CERN, tried to do it using a certificate map file registering the certificates of all legitimate machines, but experienced long delays when reaching 100K entries...
  • Suggestion: use certificates for trusted hosts that have a specific attribute in their DN or come from a specific CA, and register only the pattern to be matched in the certificate map file
  • CERN: Kerberos could also be investigated
