Infrastructure
This page describes the infrastructure used to support the Bass system.
Infrastructure
Power
How Power is Supplied
All racks in Bass except rack 5 are on the switched grid (also known as the "clean" grid, although that name is no longer accurate, since all power in Sitterson and Brooks is now clean). Rack 5, which holds Bassfiles, is powered via a large UPS on the unswitched grid. Bass is split across two power distribution units (PDUs) located in the machine room. All power connections are 60 amp, three phase.
Here's the first cut at a map of the power:
Racks are numbered from right to left facing south, and units are numbered from top to bottom.
Rack 1, units 1,3,5     breakers 1/3/5      DUIA
Rack 1, units 2,4       breakers 7/9/11     DUIA
Rack 2, units 1,3,5     breakers 13/15/17   DUIA
Rack 2, units 2,4       breakers 19/21/23   DUIA
Rack 3, units 1,3,5     breakers 25/27/29   DUIA
Rack 3, units 2,4       breakers 31/33/35   DUIA
Rack 4, units 1,3,5     breakers 20/22/24   DUIA
Rack 4, units 2,4       breakers 17/18/19   DUIA
Rack 5, all units       UPS
Rack 6, all units       breakers 2/4/6      DUIA
Rack 7, all units       breakers 26/28/30   DUIA
Rack 8, units 1,5       breakers 2/4/6      DUIB
Rack 8, unit 3          breakers 9/11/13    DUIB *
Rack 8, units 2,4       breakers 8/10/12    DUIB
Rack 9, units 1,3,5     breakers 14/16/18   DUIB
Rack 9, units 2,4       breakers 30/32/34   DUIB
Rack 10, units 1,3,5    breakers 38/40/42   DUIB
Rack 10, units 2,4      breakers 24/26/28   DUIB
Rack 11, units 1,3,5    breakers 37/39/41   DUIB
Rack 11, units 2,4      breakers 1/3/5      DUIB
Rack 12, units 1,3,5    breakers 15/17/19   DUIB
Rack 12, units 2,4      breakers 9/11/13    DUIB
* This one we need to confirm: it is attached to both the A and B power leads, so it _should_ be on both 2/4/6 and 8/10/12 on DUIB, rather than 9/11/13 on DUIB.
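Since this map is a first cut, a quick consistency check may help when confirming it against the panels. The following is a minimal Python sketch, not a tool that exists on Bass: it transcribes the rows of the map above and reports any breaker number that is listed for more than one entry on the same panel. The names POWER_MAP and shared_breakers are made up for illustration.

```python
from collections import defaultdict

# (rack, units, breakers, panel) rows transcribed from the power map above.
# Rack 5 is fed from the UPS and has no breaker entry, so it is omitted.
POWER_MAP = [
    (1, "1,3,5", (1, 3, 5), "DUIA"),
    (1, "2,4", (7, 9, 11), "DUIA"),
    (2, "1,3,5", (13, 15, 17), "DUIA"),
    (2, "2,4", (19, 21, 23), "DUIA"),
    (3, "1,3,5", (25, 27, 29), "DUIA"),
    (3, "2,4", (31, 33, 35), "DUIA"),
    (4, "1,3,5", (20, 22, 24), "DUIA"),
    (4, "2,4", (17, 18, 19), "DUIA"),
    (6, "all", (2, 4, 6), "DUIA"),
    (7, "all", (26, 28, 30), "DUIA"),
    (8, "1,5", (2, 4, 6), "DUIB"),
    (8, "3", (9, 11, 13), "DUIB"),
    (8, "2,4", (8, 10, 12), "DUIB"),
    (9, "1,3,5", (14, 16, 18), "DUIB"),
    (9, "2,4", (30, 32, 34), "DUIB"),
    (10, "1,3,5", (38, 40, 42), "DUIB"),
    (10, "2,4", (24, 26, 28), "DUIB"),
    (11, "1,3,5", (37, 39, 41), "DUIB"),
    (11, "2,4", (1, 3, 5), "DUIB"),
    (12, "1,3,5", (15, 17, 19), "DUIB"),
    (12, "2,4", (9, 11, 13), "DUIB"),
]


def shared_breakers(rows):
    """Map (panel, breaker) to the entries using it, for breakers listed more than once."""
    users = defaultdict(list)
    for rack, units, breakers, panel in rows:
        for breaker in breakers:
            users[(panel, breaker)].append((rack, units))
    return {key: who for key, who in users.items() if len(who) > 1}


if __name__ == "__main__":
    for (panel, breaker), who in sorted(shared_breakers(POWER_MAP).items()):
        print(panel, breaker, who)
```

On the map as transcribed, this flags breakers 9, 11 and 13 on DUIB (rack 8 unit 3 and rack 12 units 2,4), which matches the entry already marked for confirmation, and breakers 17 and 19 on DUIA, where the rack 4 units 2,4 entry overlaps both rack 2 entries.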
Upgrade to Power
During load testing, a 100 amp breaker tripped. After further testing we determined that the power available via one of the PDUs was insufficient. In July 2008 we upgraded the power to that PDU to 225 amps and also bypassed the shunt breaker. Load is now fairly evenly distributed across both PDUs. As of late July 2008, stress testing has not tripped any breakers, so we believe this problem is resolved.
It is important to note, however, that the transformer in SN122 is itself limited to 225 amps overall, so that is the real limit on how much power we can supply via the switched grid. For most hardware, the initial power-up sequence draws the most power. In practical terms, this means we should adopt a policy of powering on the hardware nodes in the cluster in sequence rather than all at once.
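One way to picture a sequenced start is to group nodes into batches so that the combined startup draw of any one batch stays under a chosen budget. The sketch below is only an illustration under stated assumptions: the node names, the per-node startup currents and the budget are hypothetical (none have been measured for Bass), and it assumes each batch has settled to its steady-state draw before the next batch is powered on.

```python
def staged_batches(startup_amps, budget_amps):
    """Split nodes, in the order given, into batches whose summed startup draw fits the budget.

    startup_amps: list of (node_name, amps) pairs; the values are hypothetical estimates.
    budget_amps: headroom assumed to be available for the startup surge of one batch.
    """
    batches, current, load = [], [], 0.0
    for node, amps in startup_amps:
        if amps > budget_amps:
            raise ValueError(f"{node} alone exceeds the budget of {budget_amps} A")
        # Close the current batch if adding this node would exceed the budget.
        if current and load + amps > budget_amps:
            batches.append(current)
            current, load = [], 0.0
        current.append(node)
        load += amps
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical nodes and currents, for illustration only.
    nodes = [("node-a", 10), ("node-b", 10), ("node-c", 10), ("node-d", 5)]
    for number, batch in enumerate(staged_batches(nodes, budget_amps=25), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

A real policy would also need the steady-state load of everything already running, since the 225 amp transformer limit applies to the total draw and not just to the surge.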
Data Network
Topology
The Bass cluster has multiple networks. One internal network runs on InfiniBand, using private addresses. There is also an internal network running on gigabit Ethernet (see the department's DNS table for specific addresses). The cluster also uplinks to the production network through a few gigabit Ethernet connections.
Performance
This section stores test data showing performance between various nodes and department servers. Unless otherwise noted, speeds are expressed in megabits per second.
For iperf testing please note that the speed expressed is NIC to NIC, and that the packet size can affect results.
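For anyone repeating these measurements, the sketch below builds an iperf 2 client command line in Python. The flags -c, -t and -i are the ones shown in the raw runs on this page. Mapping the "pkt size" column to -l (buffer length) and "tradeoff mode" to -r is an assumption, since the exact commands used for the table were not recorded here. The function name iperf_client_cmd is made up for illustration.

```python
import shlex


def iperf_client_cmd(server, seconds=30, length=None, tradeoff=False):
    """Build an iperf 2 client command as an argument list.

    server:   host running `iperf -s`
    seconds:  test duration (-t)
    length:   read/write buffer length passed to -l; assumed to be the table's "pkt size"
    tradeoff: add -r, which runs client -> server and then server -> client
    """
    cmd = ["iperf", "-c", server, "-t", str(seconds), "-i", "1"]
    if length is not None:
        cmd += ["-l", str(length)]
    if tradeoff:
        cmd.append("-r")
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch has no side effects.
    for size in (128, 512, 8192):
        print(shlex.join(iperf_client_cmd("quintet.cs.unc.edu", length=size, tradeoff=size != 128)))
```

The list form can be passed directly to subprocess.run if the test should be launched from a script.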
Linux iperf tests, summer 2008
------------------------------------------------------------------------------------
iperf hosts and direction     duration (s)  pkt size  range of throughputs  avg
Server <- Client
------------------------------------------------------------------------------------
Bass-comp4 <- cvs             30            128       319-344               333
                                                      324-347               333
                                                      294-314               299
------------------------------------------------------------------------------------
Bass-comp4 <- quintet         30            128       205-229               220
                                                      200-230               222
                                                      200-221               211
------------------------------------------------------------------------------------
Cvs <- bass-comp4             30            128       181-360               340
                                                      174-364               332
                                                      172-382               345
------------------------------------------------------------------------------------
Quintet <- bass-comp4         30            128       170-367               340
1st two seconds always slow                           173-384               355
                                                      179-373               357
------------------------------------------------------------------------------------
Tradeoff mode:
Client -> server then
Server -> client
------------------------------------------------------------------------------------
Bass-comp4 <- cvs             30            512       611-821               741
                                                      567-742               653
                                                      448-707               598
------------------------------------------------------------------------------------
Cvs <- bass-comp4             30            512       356-886               766
                                                      733-828               820
                                                      555-825               762
------------------------------------------------------------------------------------
Bass-comp4 <- quintet         30            512       373-522               495
                                                      493-538               520
                                                      406-523               505
------------------------------------------------------------------------------------
Quintet <- bass-comp4         30            512       448-499               491
                                                      482-514               500
                                                      397-511               489
------------------------------------------------------------------------------------
Bass-comp4 <- cvs             30            8192      650-912               843
                                                      633-920               819
                                                      817-915               885
------------------------------------------------------------------------------------
Cvs <- bass-comp4             30            8192      675-923               883
                                                      813-919               900
                                                      811-917               898
------------------------------------------------------------------------------------
Bass-comp4 <- quintet         30            8192      435-725               588
                                                      303-852               678
                                                      685-893               827
------------------------------------------------------------------------------------
Quintet <- bass-comp4         30            8192      662-828               823
                                                      809-839               828
                                                      643-832               820
------------------------------------------------------------------------------------
A couple of raw runs from 10/2/08
[hays@bass-comp4 ~]$ sudo iperf -t 30 -i 1 -c quintet.cs.unc.edu
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
Password:
Sorry, try again.
Password:
------------------------------------------------------------
Client connecting to quintet.cs.unc.edu, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 152.2.141.154 port 41399 connected with 152.2.128.80 port 5001
[ 3] 0.0- 1.0 sec 99.6 MBytes 836 Mbits/sec
[ 3] 1.0- 2.0 sec 89.2 MBytes 748 Mbits/sec
[ 3] 2.0- 3.0 sec 91.2 MBytes 765 Mbits/sec
[ 3] 3.0- 4.0 sec 97.9 MBytes 821 Mbits/sec
[ 3] 4.0- 5.0 sec 97.6 MBytes 819 Mbits/sec
[ 3] 5.0- 6.0 sec 94.3 MBytes 791 Mbits/sec
[ 3] 6.0- 7.0 sec 98.4 MBytes 825 Mbits/sec
[ 3] 7.0- 8.0 sec 98.4 MBytes 825 Mbits/sec
[ 3] 8.0- 9.0 sec 98.2 MBytes 824 Mbits/sec
[ 3] 9.0-10.0 sec 98.5 MBytes 826 Mbits/sec
[ 3] 10.0-11.0 sec 97.8 MBytes 820 Mbits/sec
[ 3] 11.0-12.0 sec 97.8 MBytes 821 Mbits/sec
[ 3] 12.0-13.0 sec 98.2 MBytes 824 Mbits/sec
[ 3] 13.0-14.0 sec 96.2 MBytes 807 Mbits/sec
[ 3] 14.0-15.0 sec 94.5 MBytes 793 Mbits/sec
[ 3] 15.0-16.0 sec 97.8 MBytes 820 Mbits/sec
[ 3] 16.0-17.0 sec 97.0 MBytes 814 Mbits/sec
[ 3] 17.0-18.0 sec 94.9 MBytes 796 Mbits/sec
[ 3] 18.0-19.0 sec 97.8 MBytes 821 Mbits/sec
[ 3] 19.0-20.0 sec 98.4 MBytes 825 Mbits/sec
[ 3] 20.0-21.0 sec 97.9 MBytes 822 Mbits/sec
[ 3] 21.0-22.0 sec 97.5 MBytes 818 Mbits/sec
[ 3] 22.0-23.0 sec 95.9 MBytes 805 Mbits/sec
[ 3] 23.0-24.0 sec 97.8 MBytes 821 Mbits/sec
[ 3] 24.0-25.0 sec 89.6 MBytes 752 Mbits/sec
[ 3] 25.0-26.0 sec 97.7 MBytes 820 Mbits/sec
[ 3] 26.0-27.0 sec 98.0 MBytes 822 Mbits/sec
[ 3] 27.0-28.0 sec 91.0 MBytes 764 Mbits/sec
[ 3] 28.0-29.0 sec 98.1 MBytes 823 Mbits/sec
[ 3] 29.0-30.0 sec 97.9 MBytes 821 Mbits/sec
[ 3] 0.0-30.0 sec 2.83 GBytes 809 Mbits/sec
[hays@bass-comp4 ~]$ sudo iperf -t 30 -i 1 -c gilgamesh.cs.unc.edu
Password:
------------------------------------------------------------
Client connecting to gilgamesh.cs.unc.edu, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 152.2.141.154 port 34699 connected with 152.2.131.71 port 5001
[ 3] 0.0- 1.0 sec 66.2 MBytes 555 Mbits/sec
[ 3] 1.0- 2.0 sec 73.6 MBytes 617 Mbits/sec
[ 3] 2.0- 3.0 sec 70.2 MBytes 589 Mbits/sec
[ 3] 3.0- 4.0 sec 78.4 MBytes 658 Mbits/sec
[ 3] 4.0- 5.0 sec 68.6 MBytes 575 Mbits/sec
[ 3] 5.0- 6.0 sec 78.8 MBytes 661 Mbits/sec
[ 3] 6.0- 7.0 sec 67.0 MBytes 562 Mbits/sec
[ 3] 7.0- 8.0 sec 80.8 MBytes 678 Mbits/sec
[ 3] 8.0- 9.0 sec 70.4 MBytes 591 Mbits/sec
[ 3] 9.0-10.0 sec 79.9 MBytes 670 Mbits/sec
[ 3] 10.0-11.0 sec 67.7 MBytes 568 Mbits/sec
[ 3] 11.0-12.0 sec 80.2 MBytes 673 Mbits/sec
[ 3] 12.0-13.0 sec 69.2 MBytes 581 Mbits/sec
[ 3] 13.0-14.0 sec 79.1 MBytes 663 Mbits/sec
[ 3] 14.0-15.0 sec 65.2 MBytes 547 Mbits/sec
[ 3] 15.0-16.0 sec 72.9 MBytes 612 Mbits/sec
[ 3] 16.0-17.0 sec 68.4 MBytes 574 Mbits/sec
[ 3] 17.0-18.0 sec 80.2 MBytes 673 Mbits/sec
[ 3] 18.0-19.0 sec 69.2 MBytes 581 Mbits/sec
[ 3] 19.0-20.0 sec 79.4 MBytes 666 Mbits/sec
[ 3] 20.0-21.0 sec 67.8 MBytes 569 Mbits/sec
[ 3] 21.0-22.0 sec 80.0 MBytes 671 Mbits/sec
[ 3] 22.0-23.0 sec 69.7 MBytes 585 Mbits/sec
[ 3] 23.0-24.0 sec 80.8 MBytes 678 Mbits/sec
[ 3] 24.0-25.0 sec 64.3 MBytes 539 Mbits/sec
[ 3] 25.0-26.0 sec 66.9 MBytes 561 Mbits/sec
[ 3] 26.0-27.0 sec 67.7 MBytes 568 Mbits/sec
[ 3] 27.0-28.0 sec 78.6 MBytes 659 Mbits/sec
[ 3] 28.0-29.0 sec 68.1 MBytes 571 Mbits/sec
[ 3] 29.0-30.0 sec 79.3 MBytes 665 Mbits/sec
[ 3] 0.0-30.0 sec 2.14 GBytes 612 Mbits/sec
[hays@bass-comp4 ~]$
--Hays 10:42, 2 October 2008 (EDT)
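To summarize raw runs like the ones above without reading every line, the per-interval rows can be pulled out with a regular expression. This is a minimal Python sketch, assuming the output format shown above. The helper names are made up for illustration. The unit conversion assumes that iperf counts MBytes in units of 2^20 bytes and Mbits in units of 10^6 bits, which is consistent with the figures in these runs (99.6 MBytes in one second is reported as 836 Mbits/sec).

```python
import re

# Matches rows such as "[ 3] 0.0- 1.0 sec 99.6 MBytes 836 Mbits/sec".
ROW = re.compile(
    r"\]\s*([\d.]+)-\s*([\d.]+) sec\s+([\d.]+) ([MG])Bytes\s+([\d.]+) Mbits/sec"
)


def parse_rows(text):
    """Return (start_s, end_s, mbytes, mbits_per_sec) tuples for every report row."""
    rows = []
    for match in ROW.finditer(text):
        start, end, amount, unit, rate = match.groups()
        mbytes = float(amount) * (1024 if unit == "G" else 1)
        rows.append((float(start), float(end), mbytes, float(rate)))
    return rows


def mbits_per_sec(mbytes, seconds):
    """Convert a transfer in MBytes (2**20 bytes) over a duration to Mbits/sec (10**6 bits)."""
    return mbytes * 2**20 * 8 / 1e6 / seconds


if __name__ == "__main__":
    sample = (
        "[ 3] 0.0- 1.0 sec 99.6 MBytes 836 Mbits/sec\n"
        "[ 3] 1.0- 2.0 sec 89.2 MBytes 748 Mbits/sec\n"
    )
    for start, end, mbytes, rate in parse_rows(sample):
        print(start, end, rate, round(mbits_per_sec(mbytes, end - start)))
```

Note that the final summary row of each run (0.0-30.0 sec) also matches the pattern, so it should be dropped before averaging the one-second intervals.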
