Infrastructure

This page covers the infrastructure used to support the Bass system.

Infrastructure

Power

How Power is Supplied

All racks in Bass except rack 5 are on the switched grid (also called "clean", although that label is no longer accurate, since all power in Sitterson and Brooks is clean). Rack 5, which holds Bassfiles, is powered through a large UPS on the unswitched grid. Bass is split across two power distribution units (PDUs) located in the machine room. All power connections are 60-amp three-phase.
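
As a rough guide to what one 60-amp three-phase connection can carry, the apparent power of a balanced three-phase circuit is sqrt(3) x line-to-line voltage x current. The sketch below assumes a 208 V line-to-line supply, which is common in US machine rooms but is not recorded on this page; substitute the actual supply voltage before relying on the number.

```python
import math

# Assumption: 208 V line-to-line, a common US three-phase distribution voltage.
# The actual voltage of the Bass feeds is not recorded on this page.
ASSUMED_LINE_VOLTAGE = 208.0


def three_phase_kva(amps, line_voltage=ASSUMED_LINE_VOLTAGE):
    """Apparent power (kVA) of a balanced three-phase circuit: sqrt(3) * V_LL * I."""
    return math.sqrt(3) * line_voltage * amps / 1000.0


if __name__ == "__main__":
    # One 60 A three-phase connection at the assumed voltage.
    print(f"60 A: {three_phase_kva(60):.1f} kVA")
```

At the assumed 208 V this works out to about 21.6 kVA of nameplate capacity per connection.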

Here's the first cut at a map of the power:

Racks are numbered from right to left facing south, and units are numbered from top to bottom.

Rack  Units    Breakers    Source
1     1,3,5    1/3/5       DUIA
1     2,4      7/9/11      DUIA
2     1,3,5    13/15/17    DUIA
2     2,4      19/21/23    DUIA
3     1,3,5    25/27/29    DUIA
3     2,4      31/33/35    DUIA
4     1,3,5    20/22/24    DUIA
4     2,4      17/18/19    DUIA
5     all      -           UPS
6     all      2/4/6       DUIA
7     all      26/28/30    DUIA
8     1,5      2/4/6       DUIB
8     3        9/11/13     DUIB *
8     2,4      8/10/12     DUIB
9     1,3,5    14/16/18    DUIB
9     2,4      30/32/34    DUIB
10    1,3,5    38/40/42    DUIB
10    2,4      24/26/28    DUIB
11    1,3,5    37/39/41    DUIB
11    2,4      1/3/5       DUIB
12    1,3,5    15/17/19    DUIB
12    2,4      9/11/13     DUIB


* This entry needs to be confirmed: rack 8, unit 3 is
attached to both the A and B power leads, so it _should_
be on both 2/4/6 and 8/10/12 on DUIB, rather than on
9/11/13 on DUIB.
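
Because the map was transcribed by hand, a short script can flag any breaker that is listed for more than one rack position on the same panel, which is how the rack 8, unit 3 question shows up. This is a hypothetical helper, not part of any existing tooling; the entries are copied from the table above, with rack 5 (on the UPS) left out. Run against the table as written, it reports breakers 9, 11 and 13 on DUIB (rack 8 unit 3 and rack 12 units 2,4), and also breakers 17 and 19 on DUIA (shared between rack 2 and rack 4 units 2,4), which may be worth confirming too.

```python
from collections import defaultdict

# Hypothetical transcription of the power map: (rack, units, breakers, panel).
# Rack 5 is on the UPS and has no breaker entry, so it is omitted.
POWER_MAP = [
    (1, "1,3,5", (1, 3, 5), "DUIA"),
    (1, "2,4", (7, 9, 11), "DUIA"),
    (2, "1,3,5", (13, 15, 17), "DUIA"),
    (2, "2,4", (19, 21, 23), "DUIA"),
    (3, "1,3,5", (25, 27, 29), "DUIA"),
    (3, "2,4", (31, 33, 35), "DUIA"),
    (4, "1,3,5", (20, 22, 24), "DUIA"),
    (4, "2,4", (17, 18, 19), "DUIA"),
    (6, "all", (2, 4, 6), "DUIA"),
    (7, "all", (26, 28, 30), "DUIA"),
    (8, "1,5", (2, 4, 6), "DUIB"),
    (8, "3", (9, 11, 13), "DUIB"),
    (8, "2,4", (8, 10, 12), "DUIB"),
    (9, "1,3,5", (14, 16, 18), "DUIB"),
    (9, "2,4", (30, 32, 34), "DUIB"),
    (10, "1,3,5", (38, 40, 42), "DUIB"),
    (10, "2,4", (24, 26, 28), "DUIB"),
    (11, "1,3,5", (37, 39, 41), "DUIB"),
    (11, "2,4", (1, 3, 5), "DUIB"),
    (12, "1,3,5", (15, 17, 19), "DUIB"),
    (12, "2,4", (9, 11, 13), "DUIB"),
]


def shared_breakers(power_map):
    """Return {(panel, breaker): [(rack, units), ...]} for breakers listed more than once."""
    users = defaultdict(list)
    for rack, units, breakers, panel in power_map:
        for breaker in breakers:
            users[(panel, breaker)].append((rack, units))
    # Keep only breakers claimed by two or more rack positions, sorted by panel then number.
    return {key: racks for key, racks in sorted(users.items()) if len(racks) > 1}


if __name__ == "__main__":
    for (panel, breaker), racks in shared_breakers(POWER_MAP).items():
        print(panel, breaker, racks)
```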

Upgrade to Power

During load testing, a 100-amp breaker tripped. Further testing showed that the power available through one of the PDUs was insufficient. In July 2008 we upgraded the supply to that PDU to 225 amps and also bypassed the shunt breaker. As a result, load is now fairly evenly distributed across both PDUs. As of late July 2008, stress testing has not tripped any breakers, so we believe this problem is resolved.

It is important to note, however, that the transformer in SN122 is itself limited to 225 amps overall, so that is the real limit on how much power we can supply through the switched grid. For most hardware, the initial power-up sequence draws the most power. In practical terms, this means we should adopt a policy of starting the cluster's nodes in sequence rather than all at once.
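
A policy of starting nodes in sequence can be planned as batches: while a batch powers up, its combined startup surge plus the steady-state draw of the nodes already running has to stay under the available current. The sketch below is hypothetical; the current figures are placeholders rather than measurements of Bass hardware, and the `power_on` callback is left to the caller (for example a wrapper around `ipmitool chassis power on`, if the nodes have IPMI).

```python
import time


def plan_startup(nodes, budget_amps, surge_amps, steady_amps):
    """Group nodes into power-on batches that keep total draw within budget_amps.

    While a batch starts, each of its nodes draws surge_amps; nodes from earlier
    batches have settled and draw steady_amps each. All figures are placeholders.
    """
    batches, batch = [], []
    running = 0.0  # steady-state draw of nodes in completed batches
    for node in nodes:
        if batch and running + surge_amps * (len(batch) + 1) > budget_amps:
            # The current batch is full; its nodes settle to steady-state draw.
            batches.append(batch)
            running += steady_amps * len(batch)
            batch = []
        if running + surge_amps * (len(batch) + 1) > budget_amps:
            raise ValueError(f"not enough headroom left to start {node}")
        batch.append(node)
    if batch:
        batches.append(batch)
    return batches


def run_startup(batches, power_on, settle_seconds):
    """Power on each batch, then wait for it to settle before starting the next."""
    for i, batch in enumerate(batches):
        for node in batch:
            power_on(node)
        if i < len(batches) - 1:
            time.sleep(settle_seconds)


if __name__ == "__main__":
    # Placeholder numbers only: 100 A available, 10 A surge and 5 A steady per node.
    nodes = [f"node{i:02d}" for i in range(19)]
    plan = plan_startup(nodes, budget_amps=100, surge_amps=10, steady_amps=5)
    print([len(b) for b in plan])  # [10, 5, 2, 1, 1]
    run_startup(plan, power_on=lambda n: print("power on", n), settle_seconds=0)
```

Batches shrink as the cluster comes up, because the nodes already running eat into the headroom; the order of `nodes` decides which machines start first.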


Data Network

Topology

The Bass cluster has multiple networks. One internal network runs on InfiniBand, using private IP addresses. A second internal network runs on Gigabit Ethernet (see the department's DNS table for the specific addresses). The cluster also uplinks to the production network through a few Gigabit Ethernet connections.

Performance

This section stores test data showing performance between various nodes and department servers. Unless otherwise noted, speeds are expressed in megabits per second.

For iperf tests, note that the reported speed is NIC to NIC, and that packet size can affect the results.
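
The tests below vary a packet size and use a "tradeoff mode". Assuming they were run with iperf 2 (the raw-run output later on this page looks like iperf 2), a client invocation can be assembled as follows: `-t` sets the duration in seconds, `-l` the read/write buffer length, `-i` the reporting interval, and `-r` the tradeoff mode (client to server, then server to client). Mapping the table's "pkt size" column onto `-l` is an assumption, and the helper names are hypothetical. The server end runs `iperf -s`.

```python
import shlex
import subprocess


def iperf_client_cmd(server, seconds=30, buffer_len=None, interval=None, tradeoff=False):
    """Build an iperf 2 client command line as an argument list."""
    cmd = ["iperf", "-c", server, "-t", str(seconds)]
    if buffer_len is not None:
        cmd += ["-l", str(buffer_len)]  # length of each read/write buffer
    if interval is not None:
        cmd += ["-i", str(interval)]  # periodic report interval in seconds
    if tradeoff:
        cmd.append("-r")  # tradeoff mode: test each direction in turn
    return cmd


def run_iperf(cmd):
    """Run the client and return its text report (needs `iperf -s` on the server)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    cmd = iperf_client_cmd("quintet.cs.unc.edu", buffer_len=512, tradeoff=True)
    print(shlex.join(cmd))  # iperf -c quintet.cs.unc.edu -t 30 -l 512 -r
```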


Linux iperf tests, summer 2008 (three runs per host pair)
------------------------------------------------------------------------------------
server <- client          duration (s)   pkt size   throughput range (Mbit/s)   avg
------------------------------------------------------------------------------------
bass-comp4 <- cvs                   30        128                     319-344   333
                                                                      324-347   333
                                                                      294-314   299
------------------------------------------------------------------------------------
bass-comp4 <- quintet               30        128                     205-229   220
                                                                      200-230   222
                                                                      200-221   211
------------------------------------------------------------------------------------
cvs <- bass-comp4                   30        128                     181-360   340
                                                                      174-364   332
                                                                      172-382   345
------------------------------------------------------------------------------------
quintet <- bass-comp4               30        128                     170-367   340
                                                                      173-384   355
                                                                      179-373   357
(first two seconds always slow)
------------------------------------------------------------------------------------
Tradeoff mode (client -> server, then server -> client):
------------------------------------------------------------------------------------
bass-comp4 <- cvs                   30        512                     611-821   741
                                                                      567-742   653
                                                                      448-707   598
------------------------------------------------------------------------------------
cvs <- bass-comp4                   30        512                     356-886   766
                                                                      733-828   820
                                                                      555-825   762
------------------------------------------------------------------------------------
bass-comp4 <- quintet               30        512                     373-522   495
                                                                      493-538   520
                                                                      406-523   505
------------------------------------------------------------------------------------
quintet <- bass-comp4               30        512                     448-499   491
                                                                      482-514   500
                                                                      397-511   489
------------------------------------------------------------------------------------
bass-comp4 <- cvs                   30       8192                     650-912   843
                                                                      633-920   819
                                                                      817-915   885
------------------------------------------------------------------------------------
cvs <- bass-comp4                   30       8192                     675-923   883
                                                                      813-919   900
                                                                      811-917   898
------------------------------------------------------------------------------------
bass-comp4 <- quintet               30       8192                     435-725   588
                                                                      303-852   678
                                                                      685-893   827
------------------------------------------------------------------------------------
quintet <- bass-comp4               30       8192                     662-828   823
                                                                      809-839   828
                                                                      643-832   820
------------------------------------------------------------------------------------
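
A throughput range and average like those in the table can be derived from iperf's per-interval report (the `-i 1` output shown in the raw runs). The sketch below assumes iperf 2 output with rates in Mbits/sec: it pulls out the one-second interval rates, ignores the whole-run summary line, and returns the minimum, maximum and mean. Whether the table's averages were computed this way or taken from iperf's own summary line is not recorded; the function name is hypothetical.

```python
import re

# Matches iperf 2 interval lines such as
# "[  3]  0.0- 1.0 sec  99.6 MBytes    836 Mbits/sec".
# Only Mbits/sec is handled; lines reported in other units are skipped.
INTERVAL_LINE = re.compile(
    r"^\[\s*\d+\]\s+(\d+(?:\.\d+)?)-\s*(\d+(?:\.\d+)?)\s+sec\s+"
    r"\S+\s+\S+\s+(\d+(?:\.\d+)?)\s+Mbits/sec"
)


def summarize(report, interval=1.0):
    """Return (min, max, mean) of the per-interval rates in an iperf report."""
    rates = []
    for line in report.splitlines():
        match = INTERVAL_LINE.match(line.strip())
        if not match:
            continue
        start, end, rate = (float(g) for g in match.groups())
        # Skip the final summary line (e.g. "0.0-30.0 sec"), which spans the whole run.
        if abs((end - start) - interval) > 1e-6:
            continue
        rates.append(rate)
    if not rates:
        raise ValueError("no interval lines found")
    return min(rates), max(rates), sum(rates) / len(rates)


if __name__ == "__main__":
    sample = """\
[  3]  0.0- 1.0 sec  99.6 MBytes    836 Mbits/sec
[  3]  1.0- 2.0 sec  89.2 MBytes    748 Mbits/sec
[  3]  2.0- 3.0 sec  91.2 MBytes    765 Mbits/sec
"""
    print(summarize(sample))  # (748.0, 836.0, 783.0)
```

The sample is the first three interval lines of the 10/2/08 run against quintet; feeding in the full report gives the figures for the whole 30-second run.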


A couple of raw runs from 10/2/08

[hays@bass-comp4 ~]$ sudo iperf -t 30 -i 1 -c quintet.cs.unc.edu
------------------------------------------------------------
Client connecting to quintet.cs.unc.edu, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 152.2.141.154 port 41399 connected with 152.2.128.80 port 5001
[  3]  0.0- 1.0 sec  99.6 MBytes    836 Mbits/sec
[  3]  1.0- 2.0 sec  89.2 MBytes    748 Mbits/sec
[  3]  2.0- 3.0 sec  91.2 MBytes    765 Mbits/sec
[  3]  3.0- 4.0 sec  97.9 MBytes    821 Mbits/sec
[  3]  4.0- 5.0 sec  97.6 MBytes    819 Mbits/sec
[  3]  5.0- 6.0 sec  94.3 MBytes    791 Mbits/sec
[  3]  6.0- 7.0 sec  98.4 MBytes    825 Mbits/sec
[  3]  7.0- 8.0 sec  98.4 MBytes    825 Mbits/sec
[  3]  8.0- 9.0 sec  98.2 MBytes    824 Mbits/sec
[  3]  9.0-10.0 sec  98.5 MBytes    826 Mbits/sec
[  3] 10.0-11.0 sec  97.8 MBytes    820 Mbits/sec
[  3] 11.0-12.0 sec  97.8 MBytes    821 Mbits/sec
[  3] 12.0-13.0 sec  98.2 MBytes    824 Mbits/sec
[  3] 13.0-14.0 sec  96.2 MBytes    807 Mbits/sec
[  3] 14.0-15.0 sec  94.5 MBytes    793 Mbits/sec
[  3] 15.0-16.0 sec  97.8 MBytes    820 Mbits/sec
[  3] 16.0-17.0 sec  97.0 MBytes    814 Mbits/sec
[  3] 17.0-18.0 sec  94.9 MBytes    796 Mbits/sec
[  3] 18.0-19.0 sec  97.8 MBytes    821 Mbits/sec
[  3] 19.0-20.0 sec  98.4 MBytes    825 Mbits/sec
[  3] 20.0-21.0 sec  97.9 MBytes    822 Mbits/sec
[  3] 21.0-22.0 sec  97.5 MBytes    818 Mbits/sec
[  3] 22.0-23.0 sec  95.9 MBytes    805 Mbits/sec
[  3] 23.0-24.0 sec  97.8 MBytes    821 Mbits/sec
[  3] 24.0-25.0 sec  89.6 MBytes    752 Mbits/sec
[  3] 25.0-26.0 sec  97.7 MBytes    820 Mbits/sec
[  3] 26.0-27.0 sec  98.0 MBytes    822 Mbits/sec
[  3] 27.0-28.0 sec  91.0 MBytes    764 Mbits/sec
[  3] 28.0-29.0 sec  98.1 MBytes    823 Mbits/sec
[  3] 29.0-30.0 sec  97.9 MBytes    821 Mbits/sec
[  3]  0.0-30.0 sec  2.83 GBytes    809 Mbits/sec
[hays@bass-comp4 ~]$ sudo iperf -t 30 -i 1 -c gilgamesh.cs.unc.edu
------------------------------------------------------------
Client connecting to gilgamesh.cs.unc.edu, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 152.2.141.154 port 34699 connected with 152.2.131.71 port 5001
[  3]  0.0- 1.0 sec  66.2 MBytes    555 Mbits/sec
[  3]  1.0- 2.0 sec  73.6 MBytes    617 Mbits/sec
[  3]  2.0- 3.0 sec  70.2 MBytes    589 Mbits/sec
[  3]  3.0- 4.0 sec  78.4 MBytes    658 Mbits/sec
[  3]  4.0- 5.0 sec  68.6 MBytes    575 Mbits/sec
[  3]  5.0- 6.0 sec  78.8 MBytes    661 Mbits/sec
[  3]  6.0- 7.0 sec  67.0 MBytes    562 Mbits/sec
[  3]  7.0- 8.0 sec  80.8 MBytes    678 Mbits/sec
[  3]  8.0- 9.0 sec  70.4 MBytes    591 Mbits/sec
[  3]  9.0-10.0 sec  79.9 MBytes    670 Mbits/sec
[  3] 10.0-11.0 sec  67.7 MBytes    568 Mbits/sec
[  3] 11.0-12.0 sec  80.2 MBytes    673 Mbits/sec
[  3] 12.0-13.0 sec  69.2 MBytes    581 Mbits/sec
[  3] 13.0-14.0 sec  79.1 MBytes    663 Mbits/sec
[  3] 14.0-15.0 sec  65.2 MBytes    547 Mbits/sec
[  3] 15.0-16.0 sec  72.9 MBytes    612 Mbits/sec
[  3] 16.0-17.0 sec  68.4 MBytes    574 Mbits/sec
[  3] 17.0-18.0 sec  80.2 MBytes    673 Mbits/sec
[  3] 18.0-19.0 sec  69.2 MBytes    581 Mbits/sec
[  3] 19.0-20.0 sec  79.4 MBytes    666 Mbits/sec
[  3] 20.0-21.0 sec  67.8 MBytes    569 Mbits/sec
[  3] 21.0-22.0 sec  80.0 MBytes    671 Mbits/sec
[  3] 22.0-23.0 sec  69.7 MBytes    585 Mbits/sec
[  3] 23.0-24.0 sec  80.8 MBytes    678 Mbits/sec
[  3] 24.0-25.0 sec  64.3 MBytes    539 Mbits/sec
[  3] 25.0-26.0 sec  66.9 MBytes    561 Mbits/sec
[  3] 26.0-27.0 sec  67.7 MBytes    568 Mbits/sec
[  3] 27.0-28.0 sec  78.6 MBytes    659 Mbits/sec
[  3] 28.0-29.0 sec  68.1 MBytes    571 Mbits/sec
[  3] 29.0-30.0 sec  79.3 MBytes    665 Mbits/sec
[  3]  0.0-30.0 sec  2.14 GBytes    612 Mbits/sec
[hays@bass-comp4 ~]$ 

--Hays 10:42, 2 October 2008 (EDT)