Service Challenge Meeting at CERN

chaired by Jamie Shiers
Thursday, 24 February 2005 from 10:00 to 17:15 (Europe/Zurich)
at CERN ( 513-1-024 )
Description
This meeting is intended primarily for planning SC2 (March) and SC3 (July).

VRVS: Island Virtual Room

Phone access: 

+41 22 767 7000 and ask for "Service Challenge Meeting" chaired by Jamie Shiers

For visitor portable access, please see 

http://it-div.web.cern.ch/it-div/gencomputing/VisitorPortables.asp

Give Jamie Shiers (IT-GD) as the contact person.

It is proposed that each site address the following questions and give a timetable in weeks:

-- What is your data transfer cluster?
-- If you don't have it yet, when will you have it?
-- What is your network, and how much bandwidth can be used, and when?
-- What is your software: OS, kernel version, Globus version?
-- When do you need exclusive use for performance testing?
-- What software will be tested, and when?
-- What is your SRM implementation / timeline?
-- What are your performance milestones?
  • Thursday, 24 February 2005
    • 10:00 - 10:15 Introduction & Goals of SC2 15'
      Speaker: Jamie Shiers
      Material: PowerPoint file, PDF file
    • 10:15 - 10:45 CERN plans for SC2 30'
      Speaker: Vlado Bahyl
      Material: link, transparencies (PowerPoint file, PDF file)
    • 10:45 - 11:15 CNAF plans for SC2 30'
      This is the time schedule for INFN's SC2 participation.
      
      CNAF Roadmap for SC2
      
      7-11 Feb:           Network set-up and performance testing of the connection to CERN
      14-18 Feb:          Installation of servers
      21 Feb - 11 Mar:    Configuration tuning and software set-up
      14-25 Mar:          SC2
      
      A more detailed action list will follow.
      Speaker:
      Material: PowerPoint file, PDF file
    • 11:15 - 11:45 FZK plans for SC2 30'
      Speaker:
      Material: PowerPoint file, PDF file
    • 11:45 - 12:15 IN2P3 plans for SC2 30'
      Proposal for SC milestones at CCIN2P3
      
      - From Feb 14: set up SRM-dCache on new hardware + tests
      - Feb 15-17: sustained CERN-CC disk-disk transfers (100 MB/s)
      - Weeks 9-10: sustained CERN-CC disk-disk transfers (100 MB/s), or weeks 11-12 if CERN wants all Tier-1 sites at the same time.
        The cluster will be 2 nodes with 256 GB of disk each.
      - Week 17: SRM-dCache available as a front-end to HPSS. We could begin testing writes to tape from CERN.
        The cluster will be 2 dCache pool nodes with 2 TB of disk each.
      - Week 21: 3 days of sustained transfers between CERN and the Lyon tapes. The rate target is 50-60 MB/s.
      - Weeks 27-30: SC3. Expected rate 50-60 MB/s.
      - September 2005: a direct 10 Gb/s link to CERN (dark fiber) will be available.
      - September 2005: we expect to add a few dCache nodes to the cluster.
      - Beginning of 2006: Lyon will be connected to Geant at 10 Gb/s.
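      The sketch below is a rough, hypothetical back-of-envelope check of these figures. It assumes decimal units, and its helper names are illustrative only. At 100 MB/s the 2 x 256 GB SC2 cluster fills in under an hour and a half, so files presumably have to be cleared or recycled during a sustained run. That is an inference from the numbers, not a statement from the plan. The 3-day week-21 tape test moves roughly 13-16 TB.

```python
# Back-of-envelope check of the CCIN2P3 SC2 disk buffer and the week-21 tape test.
# Decimal units (1 TB = 1000 GB = 1,000,000 MB) are assumed throughout.

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

def hours_to_fill(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours needed to fill `capacity_gb` of disk at a sustained `rate_mb_s`."""
    return capacity_gb * 1000 / rate_mb_s / SECONDS_PER_HOUR

def volume_tb(rate_mb_s: float, days: float) -> float:
    """Total data volume in TB moved at `rate_mb_s` for `days` days."""
    return rate_mb_s * days * SECONDS_PER_DAY / 1_000_000

# SC2 cluster: 2 nodes with 256 GB of disk each, fed at 100 MB/s.
sc2_fill_h = hours_to_fill(2 * 256, 100)
# Week-21 test: 3 days sustained at 50-60 MB/s to tape.
tape_low, tape_high = volume_tb(50, 3), volume_tb(60, 3)

print(f"SC2 buffer full after {sc2_fill_h:.2f} h")
print(f"3-day tape test moves {tape_low:.2f}-{tape_high:.2f} TB")
```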
      Speaker:
      Material: PowerPoint file, PDF file
    • 12:15 - 12:45 FNAL plans for SC2 30'
      Speaker:
      Material: PDF file, transparencies (PDF file)
    • 12:45 - 14:15 Lunch 1h30'
      Speaker:
    • 14:15 - 14:45 RAL plans for SC2 30'
      Here is a preliminary proposal, but of course it depends on meshing in with James's plans.
      
      Intention: obtain a 2.5 Gb UKLIGHT link end to end with CERN (needs negotiation). In a short-term test, try to fill as much as possible of the 2 x 1 Gbit/s pipe from RAL. Sustain 100 MB/s for 2 weeks (the last 2 weeks of November).
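      The sketch below shows the unit conversions behind these targets. It assumes decimal units and raw line rates with no protocol overhead, and its function names are hypothetical. On those assumptions, 100 MB/s is 0.8 Gbit/s, about 40% of the 2 x 1 Gbit/s pipe. Two weeks at that rate amount to roughly 121 TB.

```python
# Unit conversions behind the RAL SC2 targets (decimal units assumed:
# 1 MB = 10**6 bytes, 1 Gbit = 10**9 bits, 1 TB = 10**12 bytes).

def mbyte_s_to_gbit_s(rate_mb_s: float) -> float:
    """Convert a rate in MB/s to Gbit/s (8 bits per byte)."""
    return rate_mb_s * 8 / 1000

def gbit_s_to_mbyte_s(rate_gbit_s: float) -> float:
    """Convert a rate in Gbit/s to MB/s."""
    return rate_gbit_s * 1000 / 8

def sustained_volume_tb(rate_mb_s: float, days: float) -> float:
    """Data volume in TB moved at a constant rate for the given number of days."""
    return rate_mb_s * days * 86400 / 1_000_000

target = 100                       # MB/s, the 2-week sustained goal
pipe = gbit_s_to_mbyte_s(2 * 1.0)  # 2 x 1 Gbit/s pipe from RAL, in MB/s
print(f"{target} MB/s = {mbyte_s_to_gbit_s(target):.1f} Gbit/s "
      f"({100 * target / pipe:.0f}% of a {pipe:.0f} MB/s pipe)")
print(f"2 weeks at {target} MB/s = {sustained_volume_tb(target, 14):.2f} TB")
```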
      
      We intend to deploy up to 16 worker nodes as GridFTP servers, starting with 4 but throwing more at the problem if needed. They will be backed by a number of disk servers/RAID arrays to deliver sufficient performance. Exact details are to be finalised after I get resources from the GRIDPP User Board; this will probably mean a significant number of RAID arrays.
      
      We will run this test on a new DCACHE infrastructure in parallel with the existing framework currently used by CMS.
      
      Week ending
      
      11th February. Agreement reached with UK LIGHT (and onward), for end to end 
                     provisioning of lightpath/bandwidth to CERN. Available to deliver at 
                     least 2Gbit capacity.
                     All hardware available - freed from production (except extra network cards?) 
                     Deployment of new DCACHE infrastructure commenced.
                     Hardware benchmark/profiling commenced
      
      18th February  End to end connectivity established to CERN. This depends on
                     UKLIGHT being live at RAL by 14th February, which is our current
                     best estimate from UKERNA. Not within our control.
                     Tier1 attached to UKLIGHT.
                     Local system benchmarking complete
      
      25th February   Network team completes capacity/throughput tests and hands over.
                      DCACHE infrastructure deployed.
      
      11th March      End to end transfers tested over DCACHE SRM - CERN to RAL
      
      18th March      DCACHE tested to peak load (try and achieve 2Gb/s) - depends on bandwidth
                      being made available by UKLIGHT/CERN.
      
      21st March      2-week SC2 starts at 100 MB/s. Runs unattended (this is what Jamie said to me
                      and I take him at his word). Also note that the UK Networkshop is on 22-24th March.
      
       
      Longer Range:
      
      1) Deploy prototype SRM to tape - end of February
      2) Complete internal stress test (not throughput) on SRM to tape by end of March
      3) Deploy new RAID controllers to enhance robot capacity to tape (end April)
      
      We continue to plan/stress test towards 50 MB/s for 1 month in July.
      Speaker:
      Material: PowerPoint file, PDF file
    • 14:45 - 15:15 NIKHEF/SARA plans for SC2 30'
      Speaker: Kors Bos
      Material: PowerPoint file
    • 15:15 - 15:45 BNL plans for SC2 30'
      Speaker:
      Material: PowerPoint file
    • 15:45 - 16:15 Triumf plans for SC2 30'
      TRIUMF's plans for 2005 Service Challenges:
      ===========================================
      
      February 7, 2005
      
      The following outlines the goals we would like to achieve in order
      to participate in the Service Challenges:
      
      A) 10 GigE lightpath tests between Vancouver-Ottawa:
         -------------------------------------------------
      
      February:
      
         The immediate plans are the following:
      
         We have 3 machines available for the test.
         1 Sun Fire V40z quad Opteron (2.4 GHz processors) with 8 GB memory
         but only 2 * 72 + 3 * 144 GByte SCSI drives. We are looking at
         an economical way to connect more storage to this unit, possibly
         by connecting 8-16 SATA disks housed in an external box that
         simply powers them and connects them via SATA cables, through the
         open rear PCI slots, to a pair of RAID cards installed in the Sun.

         We also have 2 Tyan-based dual 2.4 GHz Opteron machines,
         each with 4 GB memory and 16 300 GByte SATA drives
         connected to 2 RocketRaid 1820 controllers.
      
         These need to be configured this week with 64-bit Fedora Core 3
         kernels, with support for:
           - bbftp
           - bonnie++
           - RocketRaid 1820a controllers
           - 10Gbit Intel NICs
           - 10Gbit S2IO NICs
           - xfs
           - iperf
           - cacti monitoring
           - ssh configured to allow easy interconnect
      
         Tests already indicate good xfs read performance in a RAID 5
         configuration: 420 MB/sec is standard, and 620 MB/sec is achievable
         under circumstances that need to be better understood.
      
         xfs writing is currently limited to about 250MB/sec.
      
         We have 2 Intel 10GbE cards and 1 S2IO 10GbE card.
         Tests could thus try aggregating traffic to/from 2 machines
         from/to the third. There are lots of combinations to explore.
      
         We need to establish stable disk-to-disk transfers over
         the next week, at a minimum of 200 MB/sec. As soon as
         we have this, the Ottawa end should be configured likewise
         and transfers to/from Ottawa started.
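         The following is a simplified, hypothetical model of why 200 MB/s looks reachable. It treats a disk-to-disk transfer as going no faster than its slowest stage. It uses the measured figures quoted above: 420 MB/s xfs read and about 250 MB/s xfs write. It adds the 1250 MB/s raw line rate of a 10 GbE card. With these inputs the sink's write speed is the bottleneck. The model ignores CPU, PCI-bus and TCP overheads, so it gives an upper bound rather than a prediction.

```python
# Simplified pipeline model: a disk-to-disk transfer is limited by its slowest stage.
# Hypothetical model: it ignores CPU, PCI-bus, TCP and protocol overheads.

def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the (name, rate in MB/s) of the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]

stages = {
    "source xfs read (RAID 5)": 420.0,          # measured standard read rate
    "10 GbE link (raw line rate)": 10_000 / 8,  # 1250 MB/s
    "sink xfs write": 250.0,                    # current write limit
}
name, rate = bottleneck(stages)
TARGET = 200.0  # MB/s minimum for stable disk-to-disk transfers
print(f"limited by {name} at {rate:.0f} MB/s; target met: {rate >= TARGET}")
```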
      
         We have kept two 8-channel 3-Ware 9500-s8
         SATA RAID cards from Ciara for use in the Sun,
         or alternatively for when the RocketRaids fail to perform
         as required in either read or write mode.
      
         The 10Gbit link between TRIUMF and Ottawa
         should be checked out and established this week.
      
         Consideration should be given to implementing gridftp
         and using it instead of bbftp.
      
      
      B) March Service Challenge hardware and 1 GigE lightpath
         ------------------------------------------------------
      
      February:
        
         By mid-February we will finalize the purchase of a few more servers (4-5).
         These machines will be used in the upcoming service challenges.
         Typically Dual processor/ 2G RAM / RAID 5 with at least 8 disks (2.4+ TB)/ 
         dual GigE (channel bonding). The goal is to aggregate these
         servers to be able to write at a speed of 500 MB/s with an SRM interface.
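         The sketch below assumes the load is spread evenly across servers, and its names are illustrative. On that assumption, the 500 MB/s aggregate means 100-125 MB/s per server for 4-5 servers. That is 40-50% of the 250 MB/s raw line rate of a bonded dual GigE pair. The RAID 5 line additionally assumes that the 2.4 TB figure is the raw capacity of 8 x 300 GB disks, with one disk's worth going to parity.

```python
# Per-server share of the 500 MB/s aggregate write target (decimal units assumed).

AGGREGATE_TARGET = 500.0          # MB/s across all new servers
BONDED_NIC = 2 * 1000 / 8         # dual GigE channel bonding: 250 MB/s raw line rate

def per_server_rate(n_servers: int, aggregate: float = AGGREGATE_TARGET) -> float:
    """Rate each server must sustain if the load is spread evenly."""
    return aggregate / n_servers

def raid5_usable_tb(n_disks: int, disk_tb: float) -> float:
    """Usable RAID 5 capacity: one disk's worth is used for parity."""
    return (n_disks - 1) * disk_tb

for n in (4, 5):
    rate = per_server_rate(n)
    print(f"{n} servers: {rate:.0f} MB/s each, "
          f"{100 * rate / BONDED_NIC:.0f}% of bonded NIC")
# Assumption: the 2.4 TB figure is raw capacity from 8 x 300 GB disks.
print(f"8 x 0.3 TB in RAID 5: {raid5_usable_tb(8, 0.3):.1f} TB usable")
```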
      
      
         1 GigE networking preparation (needed for the end-of-March service challenge):
      
           - A 1 GigE lightpath to CERN can be established immediately.
            TRIUMF has the necessary lambda and optics, but must request
            the lightpath from CANARIE; it would be carried across
            CA*net4 by CANARIE and by Surfnet from either MANLAN in
            New York or STARLIGHT in Chicago to CERN. A request will be
            submitted in the week of 7-11 Feb for a 1 GigE lightpath until
            the end of the year, or until CANARIE can obtain the permanent
            10G lightpath it is currently in the process of procuring.
            A routable address space will also be requested from BCNET.
               
      
      March: Prepare new machines for the March service challenge:
            - installation/configuration of dCache/SRM service on new servers
            - Site tuning / performance tests for stable operations
            - Service Challenge at 100 MB/s (Disk to Disk)
      
      
      C) June Service Challenge and 10 GigE lightpath:
         --------------------------------------------
      
      April-May: 10 GigE networking preparation:    
      
            - 10G lightpath status: CANARIE is currently in the process
              of procuring a permanent 10G lightpath to CERN. TRIUMF
              currently has 10GigE equipment on loan from Foundry.
              A purchased solution awaits clarification on the
              availability of 10GigE WAN PHY 1550nm optics and on
              whether 10 GigE LAN PHY 1550nm optics will be
              available at the BCNET gigapop.
              This will not be known until the end of March.
      
            - A 10G lightpath between TRIUMF and CERN will be requested
              for June 13-24 for a Service Challenge test.
              The specific 10G equipment to be used will be
              determined by the availability of the optics mentioned
              above and cannot be determined at this time.
      
      June: 10 GigE tests between Vancouver - CERN (via Amsterdam)
            - Allocated time slot: 13/6-24/6
            - Site tuning
            - Service Challenge (single site) Disk to Disk at 500 MB/s
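      For scale, here is a sketch assuming decimal units and the nominal line rate. On those assumptions, 500 MB/s is 4 Gbit/s, or 40% of a 10 Gbit/s lightpath. Sustained, it comes to about 43 TB per day.

```python
# What the June single-site target means on a 10 Gbit/s lightpath (decimal units).

LINK_GBIT_S = 10.0
TARGET_MB_S = 500.0

target_gbit_s = TARGET_MB_S * 8 / 1000       # 4.0 Gbit/s
utilisation = target_gbit_s / LINK_GBIT_S    # fraction of nominal link capacity
daily_tb = TARGET_MB_S * 86400 / 1_000_000   # TB moved per day at the target

print(f"{target_gbit_s:.1f} Gbit/s = {utilisation:.0%} of the link; "
      f"{daily_tb:.1f} TB/day")
```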
      
      
      D) Infrastructure and Hardware for Service Challenge (disk/tape to tape):
         ----------------------------------------------------------------------
      
      Summer-Fall: Work on Tier 1 site infrastructure 
                   (computing room preparation / engineering work). 
                   The exact time table is not known yet.
      
      Fall 2005: Acquisition of a tape library system (once the computing room is ready)
                 - Tape library unit (base frame, IBM 3584 or something similar)
                 - 3 drives 
                 - 100-200 tapes
                 - dCache/SRM + tape back-end configuration
                 - site tuning / performance tests
      
      December 2005: Service Challenge (to tape at 50 MB/s)
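      Purely for illustration, assume that all 3 drives write concurrently and share the load evenly. The 50 MB/s tape target then works out to about 16.7 MB/s per drive and roughly 4.3 TB per day.

```python
# Per-drive load implied by the December tape target, assuming all 3 drives
# write concurrently and share the load evenly (an assumption, not from the plan).

TAPE_TARGET_MB_S = 50.0
DRIVES = 3

per_drive = TAPE_TARGET_MB_S / DRIVES            # MB/s each drive must sustain
daily_tb = TAPE_TARGET_MB_S * 86400 / 1_000_000  # TB written to tape per day

print(f"{per_drive:.1f} MB/s per drive, {daily_tb:.2f} TB/day to tape")
```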
      Speaker:
    • 16:15 - 16:30 Update on Computing Models & T2 plans 15'
      Speaker: Jamie Shiers
      Material: PowerPoint file, Word file, PDF file
    • 16:30 - 16:45 SC3 draft milestones 15'
      Speaker: Jamie Shiers
      Material: PowerPoint file, PDF file
    • 16:45 - 17:00 Coordination with 3D Project 15'
      Speaker: Dirk Duellmann
      Material: PDF file, transparencies (PowerPoint file, PDF file)
    • 17:00 - 17:15 Future meetings and events 15'
      The tentative schedule and goals of future SC meetings and workshops will be discussed.
      Speaker: Kors Bos, Jamie Shiers
      Material: PowerPoint file, PDF file