Minutes, GridPP Storage Meeting 17 March 2010 (10:00 - 10:30)
Chair, minutes: Sam Skipsey
Present:
Wahid Bhimji
John Bland
Brian Davies
Matthew Doidge
Stephen Jones
Elena Korolkova
Winnie Lacesso
James Thorne
Chris Walker (as Queen Mary, U London London, U.K. in the chat log)
- Meeting Begins -
Everyone was exhorted to blog more!
-
Storage Workshop Planning
Winnie + James: Network trunking/channel-bonding talks requested. No volunteers to give these yet?
8 spaces left for meeting, so book now!
GridPP have more rooms booked - so we could overflow if needed.
We could reduce the size of the hardware section if needed - the only requested talks are the channel-bonding and the SSD talk. If you want a longer hardware section (and it was requested!) please suggest topics!
-
Long-term Storage Group Projects:
Data access -
There are reordered (and unordered) files at Glasgow. Wahid has been running HammerClouds against them.
Both tests so far were DQ2LOCAL (rfio remote io vs DPM), limited to 100 jobs concurrent.
Monday - HC vs new files (high event rate! - 100Hz)
Tuesday - HC vs old files (even higher event rate! - 130Hz).
Possibly because they're smaller AODs than the normal.
Peter Love has been asked if he can replicate some larger AODs to Glasgow and QMUL so we can test against more representative data.
Brian: could we run at the T1? Speak with ATLAS - can we replicate one to all sites (for comparable tests)?
Wahid: I plan to keep doing different tests (different job widths, different file access methods) etc.
Smaller files more efficient than larger AODs?
Brian: AODs will only get bigger with real data!
Wahid: Event data model group is investing lots of effort in optimising - if they can just split the files for performance gain, then this is silly.
Chris: you get lower outbound utilisation with smaller files. Internally, of course, caching etc. makes smaller files more efficient.
[Erratum, post-meeting: We experience lower WAN utilisation when transferring lots of small files. The solution to this is either to have more FTS channels, or (more sensibly I think) use a version of GridFTP that supports pipelining.]
RAM caching etc. are significant effects.
QMUL currently only has 1 GB of RAM per core, compared with most sites, which have 2 GB/core (and thus potentially much more RAM for caching - this will only get bigger with new procurements bringing even more cores per node).
Checksumming -
Brian tested 6000 files at Lancaster and found no errors. (Action complete!)
Plans to use the Glasgow tests with our tool - currently blocked by the missing Python API issue.
Plans to copy a file in himself, manually corrupt it, and then get the checksum on the file again (to test how deep the checksum goes).
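The corrupt-and-recheck idea can be sketched locally as follows. This is only an illustration under stated assumptions: it uses Adler-32 (an assumption; the minutes do not say which checksum type is in use), it works on a scratch file on local disk rather than on a storage element, and the function names are hypothetical.

```python
import os
import tempfile
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit hex string."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in chunks.
            value = zlib.adler32(chunk, value)
    return "%08x" % (value & 0xFFFFFFFF)


def flip_one_byte(path, offset=0):
    """Corrupt a file in place by inverting every bit of a single byte."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        original = handle.read(1)
        handle.seek(offset)
        handle.write(bytes([original[0] ^ 0xFF]))


def demo():
    """Checksum a scratch file, corrupt it, and checksum it again."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"example payload " * 1024)
        before = adler32_of_file(path)
        flip_one_byte(path, offset=10)
        after = adler32_of_file(path)
    finally:
        os.remove(path)
    return before, after


if __name__ == "__main__":
    print("before: %s  after: %s" % demo())
```

Recomputing from the bytes on disk, as here, will always see the corruption. Whether a storage element does the same, or returns a value stored at upload time, is what the planned test is meant to find out.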
Python lcg_util API issue:
How many people can "import lcg_util" on their UI?
Is this an SL4/5 issue, or an age of UI issue?
Action: everyone, test if they can import lcg_util on their UI and report back.
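For the action above, a small script along these lines could be run on each UI and its output reported back. It is a sketch only: the helper names are hypothetical, and the report format is just a suggestion. It deliberately sticks to very basic Python so that it has a chance of running on older interpreters as well, though that has not been checked here.

```python
import platform
import sys


def can_import(module_name):
    """Try to import a module by name; return (succeeded, error_text)."""
    try:
        __import__(module_name)
    except ImportError:
        # sys.exc_info() is used instead of "except ... as" for old interpreters.
        return False, str(sys.exc_info()[1])
    return True, ""


def report(module_name="lcg_util"):
    """Build a one-line summary of the import result and the local environment."""
    ok, detail = can_import(module_name)
    if ok:
        status = "OK"
    else:
        status = "FAILED (%s)" % detail
    return "import %s: %s | python %s | %s" % (
        module_name,
        status,
        platform.python_version(),
        platform.platform(),
    )


if __name__ == "__main__":
    print(report())
```

Including the Python version and platform string in the report should help separate an SL4/SL5 difference from an age-of-UI difference.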
-
AOB.
Wahid: StoRM 1.5 is out! No more support for 1.4? What are our plans?
At Bristol, Bob will install from scratch?
Wahid will test at ECDF.
Chris: QMUL's old SE (se01) has been upgraded to StoRM 1.5, and is currently having authentication issues.
Doesn't know exactly what the issue is, as the logs are not useful. Help gladly accepted from anyone with ideas. Plan is to get this working, and then upgrade se03 or bring up a new official 1.5 machine.
There is no SL5 StoRM yet, so probably won't install a new SE until there is.
Upgrading, as indicated, may not be simple. (It may be better to install 1.5 from scratch.)
- Meeting Ends -
Chat window log:
[09:49:47] Brian Davies joined
[09:52:30] Stephen Jones joined
[09:58:10] Stephen Jones left
[09:58:13] Winnie Lacesso joined
[09:58:17] Brian Davies left
[09:58:33] Wahid Bhimji joined
[09:59:07] John Bland joined
[09:59:15] Stephen Jones joined
[09:59:46] Brian Davies joined
[10:02:18] James Thorne joined
[10:02:57] Winnie Lacesso I'd like to hear about network trunking/channel-bonding. Never done it but a walkthru would be good please.
[10:03:11] James Thorne That would be very interesting for me too
[10:09:23] Queen Mary, U London London, U.K. joined
[10:11:40] Matthew Doidge joined
[10:16:25] Elena Korolkova joined
[10:25:14] Wahid Bhimji i can at CERN and here.
[10:26:04] Elena Korolkova Sam, could you send email what we should check, please. Thanks
[10:27:09] Winnie Lacesso What do you mean re: python api? Do you mean, do lcg_utils tools work? If the python api is missing do lcg_utils not work?
[10:28:02] Wahid Bhimji Brian - do you not have $LCG_LOCATION//lib/python2.3/site-packages
[10:30:36] John Bland import lcg_util works on our SL4 UI
[10:31:18] John Bland and our SL5 tarball UI
[10:33:13] Stephen Jones left
[10:33:14] James Thorne left
[10:33:14] Queen Mary, U London London, U.K. left
[10:33:14] Elena Korolkova left
[10:33:14] Matthew Doidge left
[10:33:14] Winnie Lacesso left
[10:33:15] John Bland left
[10:33:16] Wahid Bhimji left