Showing posts with label Transfer Tests.

Sunday, 30 September 2007

RAL Network performance

Prior to running any transfer tests, I normally check the Gridmon plots for that site, the relevant T2 and the RAL T1. Using the (non-default) options of 'Metric data on same graph' and 'New graph on new test dest' (and 'new test src'), you generally get a good feel for a site's capacity and any 'slow' links associated with it. Sadly, the Tier 1 has recently been performing dreadfully compared with the other sites.
For example, transfers to Glasgow from Durham, Edinburgh and RAL:

Wednesday, 11 July 2007

Q3 Transfer Tests

OK, new quarter and time to get down to some testing as I'm well overdue. First up: Lancaster. Apart from some user error at this end (a typo in a script) it went pretty smoothly, and the site can happily cope with 25 files in flight:

ganglia plot from Glasgow end:

showing the 10, 15, 20 and 25 file settings on the transfer channel
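(For the record, the channel setting gets bumped between runs with the FTS admin CLI. A minimal sketch - the channel name is illustrative and the -f (number of files) flag is from memory, so check glite-transfer-channel-set --help before trusting it:)

# raise the files-in-flight limit on the (illustrative) channel
# -s points at the FTS service endpoint; -f sets the concurrent-files limit
glite-transfer-channel-set -s "$FTS" -f 25 LANCS-GLASGOW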

I've also started the Oxford tests, but it seems much less happy: when transferring from RAL-T2, even at low numbers of files (5), the CPU load on t2se01 seems awfy, awfy high.


Hmm. Have mailed Pete, but something doesn't look happy...

Friday, 1 June 2007

srm-advisory-delete

OK, I'll admit I occasionally rant about some things I consider "bad", but I'm afraid deleting files from grid storage is just Painful.

To delete 490 files (I'd nuked the first ten within the inner $i loop as a test of xargs) I ended up with this nasty, nasty hack:
for j in `seq -w 2 49`; do
  for i in 0 1 2 3 4 5 6 7 8 9; do
    echo -n "srm://$DPM_HOST:8443/dpm/gla.scotgrid.ac.uk/home/dteam/rt2-gla-5-1/tfr000-file00$j$i "
  done
done | xargs srm-advisory-delete

I mean, even altering it to a single `seq -w 20 499` loop is just, well, Bad and Wrong[*]
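(For completeness, that single-loop variant would look something like this - same DPM path, still leaning on xargs, and no less of a hack:)

# seq -w zero-pads to three digits, so file00$n reproduces the
# five-digit tfr000-file00020..tfr000-file00499 suffixes above
seq -w 20 499 | while read n; do
  echo -n "srm://$DPM_HOST:8443/dpm/gla.scotgrid.ac.uk/home/dteam/rt2-gla-5-1/tfr000-file00$n "
done | xargs srm-advisory-delete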

/rant

Mind you, it worked, as demonstrated by Paul's MonAMI plot of dteam storage use:

Wednesday, 30 May 2007

iperf between Gla-Bristol

In preparation for some serious testing of Bristol's StoRM implementation, Jon and I worked together to run some iperf tests so we know what our target is for FTS tests. Plot below - looks like we've got a 500Mb/s maximum.
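(For reference, this is no more than stock iperf (v2) in server/client mode; the hostname below is a placeholder for the actual Bristol endpoint:)

# Bristol end: run an iperf server
iperf -s

# Glasgow end: 60-second test, 4 parallel TCP streams, report every 10s
# (replace the hostname with the real Bristol machine)
iperf -c bristol-se.example.org -t 60 -P 4 -i 10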

Monday, 14 May 2007

Transfer Channel weirdness

As I still haven't completed this quarter's round of transfer tests, I decided to set up all the scripts and prereqs for doing each T2. I ran my little perl script to get the current (pre-tuned) values for the transfer channels and noticed that they were all set to 5 (table below; a shell sketch of the check follows it). Nope, my script hadn't broken (any more than originally designed, anyway) - it looks like someone changed 'em. Why?

Site                     From RAL  Star  To RAL
UKI-NORTHGRID-SHEF-HEP       5      5      5
UKI-LT2-IC-LESC              5      5      5
UKI-SOUTHGRID-BHAM-HEP       5      5      5
UKI-SOUTHGRID-OX-HEP         5      5      5
UKI-LT2-IC-HEP               5      5      5
UKI-SOUTHGRID-CAM-HEP        5      5      5
UKI-LT2-UCL-HEP              5      5      5
SCOTGRID-EDINBURGH           5      5      5
UKI-LT2-UCL-CENTRAL          5      5      5
UKI-NORTHGRID-MAN-HEP        5      5      5
UKI-SCOTGRID-GLASGOW         5      5      5
UKI-NORTHGRID-LANCS-HEP      5      5      5
UKI-SOUTHGRID-BRIS-HEP       5      5      5
UKI-LT2-BRUNEL               5      5      5
UKI-SCOTGRID-DURHAM          5      5      5
UKI-SOUTHGRID-RALPP          5      5      5
UKI-LT2-RHUL                 5      5      5
UKI-LT2-QMUL                 5      5      5
UKI-NORTHGRID-LIV-HEP        5      5      5
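(My script is perl, but a rough shell equivalent of the check looks like this. $FTS stands for the RAL FTS web-service endpoint, and the grepped field name is from memory:)

# dump the files-in-flight setting for every channel the FTS knows about;
# with no channel argument the CLI lists channels, with one it shows details
for ch in $(glite-transfer-channel-list -s "$FTS"); do
  echo -n "$ch: "
  glite-transfer-channel-list -s "$FTS" "$ch" | grep -i 'number of files'
done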

Monday, 30 April 2007

You're only supposed to blow the bloody doors off!

Apart from the Spring HEPiX 2007 meeting last week (which was excellent, BTW - I'll do a proper writeup soon) I have been pushing on with the transfer tests. Rather than risk stressing the RAL CASTOR service too much while we work on improving the individual T2 sites, Chris Brew (RAL T2) was very helpful and we've been using their dCache pool (lots of beefy disk servers and some spare capacity) to push out to the T2 sites.

I have now knocked up a simple "template"-based script for performing the tests, so that I can run several sites serially and automatically. This means I can set up a screen session to do all the sites within a T2, then come back later and look at the logs. What we've discovered is that we can set the glite-transfer-channel settings much more aggressively than we have been.
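(The template is nothing clever - conceptually it's just a serial loop. A sketch, where do-site-test.sh stands in for the real per-site script:)

# run the test against each site in turn; kick this off inside a
# screen session and read logs/<site>.log afterwards
for site in UKI-SOUTHGRID-OX-HEP UKI-SOUTHGRID-BHAM-HEP UKI-SOUTHGRID-RALPP; do
  ./do-site-test.sh "$site" > "logs/$site.log" 2>&1
done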

Some Examples
* GLA to RALT2 1TB transferred in 3:09:29, Bandwidth = 703.63 Mb/s
* RALT2 to GLA 618G transferred in 2:11:48, Bandwidth = 625.13 Mb/s
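(A quick sanity check of the first figure with bc, assuming decimal units and exactly 10^12 bytes; it matches the quoted 703.63 Mb/s to within rounding:)

$ echo "scale=2; 8*10^12 / (3*3600 + 9*60 + 29) / 10^6" | bc
703.66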

As CMS were also doing tests from the T1, we managed to saturate the 1Gb/s connection at Imperial College. Whoops. However, we also set a new record for HEP throughput at that site (the previous record was ~800Mb/s overnight).

Friday, 23 March 2007

Transfer Slowdown

This week (apart from the GridPP 18 meeting) I have been trying to complete the transfer tests, but am getting terrible rates out of RAL. The CERN network stats to RAL don't look too excessive:


But the iperf stats out of RAL have plummeted - the RAL-Lancaster one below (and they even have a private lightpath) is typical.

Tuesday, 13 March 2007

Sponsored by "No space left on device"

Dear root@Birmingham,
I broke your system

Well, I blame the filetransfer script - it uses your (long-lifetime) myproxy for the transfers and the shorter voms-proxy for the deletes. Result: one filled-up disk when my voms proxy expired. It seems to have recovered OK.
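(The obvious belt-and-braces fix - a sketch of what the script should do before its delete pass, not what it currently does:)

# bail out if the voms proxy has less than an hour left, rather than
# letting the deletes silently fail and the disk fill up
left=$(voms-proxy-info -timeleft 2>/dev/null || echo 0)
if [ "$left" -lt 3600 ]; then
  echo "voms proxy nearly expired ($left s left) - renew before deleting" >&2
  exit 1
fi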

Friday, 9 March 2007

Prettification

With a hat-tip to xkcd, I decided to knock up an ugly script to convert this:
aelwell@ppepc62:~$ ls tra*06T*.log
transfer-2007-03-06T09-52-38.log transfer-2007-03-06T14-38-59.log transfer-2007-03-06T19-06-22.log

consisting of blocks like
Transfer: srm://ralsrma.rl.ac.uk:8443//castor/ads.rl.ac.uk/prod/grid/hep/disk1tape1/dteam/j/jkf/castorTest/1GBcanned000 to srm://svr018.gla.scotgrid.ac.uk:8443/dpm/gla.scotgrid.ac.uk/home/dteam/aetest10/tfr000-file00000
Size: 1000000000.0
FTS Duration: 188
FTS Retries: 0
FTS State: Done
FTS Reason: (null)
Local_Created: 1173174769.27
Local_Submit: 1173174769.3
Submitted: 1173174770.69
Active: 1173174832.97
Done: 1173175041.23
Delete_time: 1173175050.67

into the much prettier



As usual, it's available at http://www.gridpp.ac.uk/wiki/User:Andrew_elwell/Scripts
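(The guts of it are just log scraping - not the actual script, but roughly this sort of thing, pulling per-file duration and final state out of each block:)

# print "duration state" pairs, one per transfer, ready for plotting
awk -F': ' '/^FTS Duration/ {d=$2}
            /^FTS State/    {print d, $2}' transfer-2007-03-06T09-52-38.log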

Thursday, 8 March 2007

voms proxy issues

Another day, another pile of transfer tests. Or not. Problem renewing my voms-proxy.

voms.cern.ch bombed out with:
Error: Could not establish authenticated connection with the server.
GSS Major Status: Unexpected Gatekeeper or Service Name
GSS Minor Status Error Chain:

an unknown error occurred


so I raised a GGUS ticket (19457).

Monday, 5 March 2007

Transfer Tests Milestones

Started to work on the Q1 transfer tests (hey, I know I only have a month to go) - there's a stub on the wiki at http://www.gridpp.ac.uk/wiki/2007-Q1_Transfer_Tests

Next step this afternoon is to try and get the experiment dress rehearsal timetable and ensure we're not clashing.