Swarm Mode Load Balancing

Libnetwork

Libnetwork is Docker's container networking library, originally formed by merging the networking-related code from libcontainer and Docker Engine. Its core is the Container Network Model (CNM) it defines.

The Libnetwork CNM defines the network model for Docker containers; any driver developed against this model can work with the docker daemon to provide container networking. Docker's built-in drivers include none, bridge, overlay, and macvlan; third-party drivers include flannel, weave, calico, and others.

CNM

The CNM defines the following three components:

  • Sandbox
    A Sandbox is an isolated network-configuration environment for a Docker container, holding the container's interfaces, routing table, and DNS settings. A Linux network namespace is the standard implementation of a Sandbox. A Sandbox may contain Endpoints from different Networks.

  • Endpoint
    An Endpoint is the interface (a veth pair) through which a Sandbox communicates on a Network. An Endpoint belongs to exactly one Network and exactly one Sandbox, but a single Sandbox can hold multiple Endpoints.

  • Network
    A Network is a uniquely identifiable group of Endpoints that can communicate with each other directly. A Network can be implemented as a Linux bridge, a VLAN, and so on.

[root@swarm-manager ~]# ll /var/run/docker/netns/
total 0
-r--r--r-- 1 root root 0 Aug 5 10:45 1-i6xug49nwd
-r--r--r-- 1 root root 0 Aug 5 10:45 ingress_sbox
[root@swarm-manager ~]# nsenter --net=/var/run/docker/netns/ingress_sbox ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.255.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:0a:ff:00:02 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:ac:12:00:02 txqueuelen 0 (Ethernet)
RX packets 90 bytes 75247 (73.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 123 bytes 10271 (10.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 0 (Local Loopback)
RX packets 6 bytes 504 (504.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6 bytes 504 (504.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@swarm-manager ~]# iptables -t mangle -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
[root@swarm-manager ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn

[root@swarm-manager ~]# nsenter --net=/var/run/docker/netns/1-i6xug49nwd ifconfig
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.255.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 3a:31:2c:7f:21:a8 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 0 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
ether 4e:75:b1:f9:5b:55 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vxlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
ether 3a:31:2c:7f:21:a8 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

With nsenter --net=<SandboxKey> ip addr you can enter a container's Sandbox directly and inspect its network configuration.
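As a minimal sketch, the netns name used under /var/run/docker/netns/ can be derived from a container's SandboxKey with plain parameter expansion; the sample value below is taken from the transcripts in this section, and in a live cluster you would obtain it via `docker inspect -f '{{.NetworkSettings.SandboxKey}}' <container>`:

```shell
# Hypothetical helper: derive the netns name from a container's SandboxKey.
sandbox_key="/var/run/docker/netns/18531514ffd0"  # sample value from this section's transcripts
netns_name="${sandbox_key##*/}"                   # strip the directory prefix
echo "$netns_name"
# You could then inspect the sandbox with:
#   nsenter --net="$sandbox_key" ip addr
```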


Load Balancing

internal load balancing

VIP (based on IPVS)

  • Create a custom overlay network:
    It only needs to be created on a manager node; when a service attaches to this overlay network, Docker automatically creates it on the worker nodes where the service's tasks are scheduled.

    [root@swarm-manager ~]# docker network create -d overlay --subnet 192.168.10.0/24 my-network
    [root@swarm-manager ~]# docker network ls -f name=my-network
    NETWORK ID NAME DRIVER SCOPE
    nvpbs39b6ctz my-network overlay swarm
    [root@swarm-manager ~]# docker network inspect -f {{.IPAM.Config}} my-network
    [{192.168.10.0/24 192.168.10.1 map[]}]
  • Create a service attached to the custom overlay network (when --endpoint-mode is not specified, the default is VIP mode).
    Docker creates a separate network namespace for each overlay network, containing a Linux bridge br0. Endpoints are still implemented as veth pairs: one end becomes eth0 inside the container, the other attaches to br0 in that namespace.
    Besides connecting all the Endpoints, br0 is also attached to a vxlan device that establishes VXLAN tunnels to other hosts; container-to-container traffic travels through these tunnels.

    [root@swarm-manager ~]# docker service create --replicas 2 --name nginx-vip --network my-network nginx
    [root@swarm-manager ~]# docker service inspect -f {{.Endpoint.VirtualIPs}} nginx-vip
    [{nvpbs39b6ctzrfw6vj809kjbu 192.168.10.2/24}]

[root@swarm-node1 ~]# ls -lrt /var/run/docker/netns/
    total 0
    -r--r--r--. 1 root root 0 Aug 6 11:24 ingress_sbox
    -r--r--r--. 1 root root 0 Aug 6 11:24 1-i6xug49nwd
    -r--r--r--. 1 root root 0 Aug 10 16:30 1-vxe1cwk14a
    -r--r--r--. 1 root root 0 Aug 10 16:30 18531514ffd0
    [root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/1-vxe1cwk14a ifconfig
    br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    inet 192.168.10.1 netmask 255.255.255.0 broadcast 0.0.0.0
    ether 5a:f5:4c:f3:98:d5 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    loop txqueuelen 0 (Local Loopback)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    ether 96:57:63:17:80:83 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    vxlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    ether 5a:f5:4c:f3:98:d5 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    [root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/18531514ffd0 ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    inet 192.168.10.4 netmask 255.255.255.0 broadcast 0.0.0.0
    ether 02:42:c0:a8:0a:04 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 172.18.0.3 netmask 255.255.0.0 broadcast 0.0.0.0
    ether 02:42:ac:12:00:03 txqueuelen 0 (Ethernet)
    RX packets 7484 bytes 16821353 (16.0 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4780 bytes 321518 (313.9 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    loop txqueuelen 0 (Local Loopback)
    RX packets 14 bytes 1717 (1.6 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 14 bytes 1717 (1.6 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
  • The service name resolves to the VIP through the built-in DNS server

    [root@swarm-node1 ~]# docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    d66e400533af nginx:latest "nginx -g 'daemon ..." 18 minutes ago Up 18 minutes 80/tcp nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
    # docker exec -it d66e400533af sh
    root@d66e400533af:/# nslookup nginx-vip
    Server: 127.0.0.11
    Address: 127.0.0.11#53

    Non-authoritative answer:
    Name: nginx-vip
    Address: 192.168.10.2

    root@d66e400533af:/# nslookup tasks.nginx-vip
    Server: 127.0.0.11
    Address: 127.0.0.11#53

    Non-authoritative answer:
    Name: tasks.nginx-vip
    Address: 192.168.10.4
    Name: tasks.nginx-vip
    Address: 192.168.10.3
  • The service IP 192.168.10.2 is marked 0x112 (274 decimal) in the OUTPUT chain of the iptables mangle table; IPVS matches that mark and forwards the traffic to the containers at 192.168.10.3 and 192.168.10.4

    [root@swarm-node1 ~]# docker inspect -f {{.NetworkSettings.SandboxKey}} d66e400533af
    /var/run/docker/netns/18531514ffd0
    [root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/18531514ffd0 sh
    sh-4.2# ip add show eth0
    102: eth0@if103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:c0:a8:0a:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.10.4/24 scope global eth0
    valid_lft forever preferred_lft forever
    inet 192.168.10.2/32 scope global eth0
    valid_lft forever preferred_lft forever
    sh-4.2# ip route
    default via 172.18.0.1 dev eth1
    172.18.0.0/16 dev eth1 proto kernel scope link src 172.18.0.3
    192.168.10.0/24 dev eth0 proto kernel scope link src 192.168.10.4
    sh-4.2# iptables -t mangle -S
    -P PREROUTING ACCEPT
    -P INPUT ACCEPT
    -P FORWARD ACCEPT
    -P OUTPUT ACCEPT
    -P POSTROUTING ACCEPT
    -A OUTPUT -d 192.168.10.2/32 -j MARK --set-xmark 0x112/0xffffffff
    sh-4.2# ipvsadm
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
    -> RemoteAddress:Port Forward Weight ActiveConn InActConn
    FWM 274 rr
    -> 192.168.10.3:0 Masq 1 0 0
    -> 192.168.10.4:0 Masq 1 0 0
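Note that iptables shows the fwmark in hexadecimal (0x112) while ipvsadm shows it in decimal (FWM 274); a quick check that these are the same value:

```shell
# 0x112 hex equals 274 decimal, which is why the iptables rule
# "MARK --set-xmark 0x112" corresponds to the ipvsadm virtual service "FWM 274".
printf '%d\n' 0x112
```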

In VIP mode, swarm mode gives the container a NIC "eth0@if103" attached to the overlay network (my-network) and assigns the VIP on it, and also gives it a NIC "eth1@if105" attached to the docker_gwbridge network for reaching external networks.
Any container attached to my-network can reach the service via the service name or the VIP; when the service name is used, the built-in DNS server first resolves it to the VIP.
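As a small sketch of how to read this, the service VIP appears as a /32 alias next to the task's own /24 address on eth0; the sample lines below are copied from the `ip add show eth0` output earlier in this section:

```shell
# Sketch: in VIP mode the service VIP is the /32 alias on the task's eth0.
ip_output='inet 192.168.10.4/24 scope global eth0
inet 192.168.10.2/32 scope global eth0'
vip=$(printf '%s\n' "$ip_output" | awk '$2 ~ /\/32$/ {sub(/\/32$/, "", $2); print $2}')
echo "task IP owns the /24; service VIP is: $vip"
```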

dnsrr (DNS round-robin)

  • Create a service attached to the custom overlay network, with endpoint-mode set to dnsrr

    [root@swarm-manager ~]# docker service create --endpoint-mode dnsrr --replicas 2 --name nginx-dnsrr --network my-network nginx
    [root@swarm-manager ~]# docker service inspect -f {{.Spec.EndpointSpec.Mode}} nginx-dnsrr
    dnsrr
    [root@swarm-node1 ~]# docker inspect -f {{.NetworkSettings.SandboxKey}} b68f0b4465b4
    /var/run/docker/netns/b4efcf686a74
    [root@swarm-node1 ~]# ls -lrt /var/run/docker/netns/
    total 0
    -r--r--r--. 1 root root 0 Aug 6 11:24 ingress_sbox
    -r--r--r--. 1 root root 0 Aug 6 11:24 1-i6xug49nwd
    -r--r--r--. 1 root root 0 Aug 10 16:30 1-vxe1cwk14a
    -r--r--r--. 1 root root 0 Aug 10 16:30 18531514ffd0
    -r--r--r--. 1 root root 0 Aug 10 18:08 b4efcf686a74
    [root@swarm-node1 ~]# docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    b68f0b4465b4 nginx:latest "nginx -g 'daemon ..." About a minute ago Up About a minute 80/tcp nginx-dnsrr.2.kpx2tqqmdugdpwpdwynbyzf9j
    d66e400533af nginx:latest "nginx -g 'daemon ..." 2 hours ago Up 2 hours 80/tcp nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
  • The service name resolves through the built-in DNS to the overlay network address of each container

    [root@swarm-node1 ~]# docker exec -it b68f0b4465b4 bash
    root@b68f0b4465b4:/# nslookup nginx-dnsrr
    Server: 127.0.0.11
    Address: 127.0.0.11#53

    Non-authoritative answer:
Name: nginx-dnsrr
Address: 192.168.10.5
Name: nginx-dnsrr
    Address: 192.168.10.6
    root@b68f0b4465b4:/# ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    inet 192.168.10.6 netmask 255.255.255.0 broadcast 0.0.0.0
    ether 02:42:c0:a8:0a:06 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 172.18.0.4 netmask 255.255.0.0 broadcast 0.0.0.0
    ether 02:42:ac:12:00:04 txqueuelen 0 (Ethernet)
    RX packets 7656 bytes 16832156 (16.0 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 5053 bytes 343122 (335.0 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    loop txqueuelen 0 (Local Loopback)
    RX packets 6 bytes 788 (788.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 6 bytes 788 (788.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
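In dnsrr mode the built-in DNS returns every task IP directly, so clients round-robin over the A records themselves instead of going through a VIP. A sketch of extracting the task IPs, with the sample answer adapted from the nslookup output above:

```shell
# Sketch: list the task IPs returned by the built-in DNS in dnsrr mode.
dns_answer='Name: nginx-dnsrr
Address: 192.168.10.5
Name: nginx-dnsrr
Address: 192.168.10.6'
task_ips=$(printf '%s\n' "$dns_answer" | awk '/^Address:/ {print $2}' | sort)
echo "$task_ips"
```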

ingress load balancing

  • Create a service attached to the custom overlay network, publishing container port 80 to port 80 on the host
    [root@swarm-manager ~]# docker service create --replicas 2 --name nginx-ingress --network my-network --publish 80:80 nginx
    [root@swarm-manager ~]# docker service ls
    ID NAME MODE REPLICAS IMAGE PORTS
    7yzee08a9ryq nginx-dnsrr replicated 2/2 nginx:latest
    qx5epc99yu8q nginx-vip replicated 2/2 nginx:latest
    udiaexlplqq2 nginx-ingress replicated 2/2 nginx:latest *:80->80/tcp
    [root@swarm-manager ~]# docker service inspect -f {{.Endpoint.VirtualIPs}} nginx-ingress
    [{i6xug49nwdsxauqqpli3apvym 10.255.0.5/16} {vxe1cwk14avlfp2xjgymhkhdl 192.168.10.7/24}]
    [root@swarm-node1 ~]# docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    42aed469b4cc nginx:latest "nginx -g 'daemon ..." About an hour ago Up About an hour 80/tcp nginx-ingress.2.u00j8ich8cv4bo0grx1hbsr7q
    b68f0b4465b4 nginx:latest "nginx -g 'daemon ..." About an hour ago Up About an hour 80/tcp nginx-dnsrr.2.kpx2tqqmdugdpwpdwynbyzf9j
    d66e400533af nginx:latest "nginx -g 'daemon ..." 3 hours ago Up 3 hours 80/tcp nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
    [root@swarm-node1 ~]# docker exec -it 42aed469b4cc bash
    root@42aed469b4cc:/# ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    inet 10.255.0.7 netmask 255.255.0.0 broadcast 0.0.0.0
    ether 02:42:0a:ff:00:07 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 172.18.0.5 netmask 255.255.0.0 broadcast 0.0.0.0
    ether 02:42:ac:12:00:05 txqueuelen 0 (Ethernet)
    RX packets 4049 bytes 10745529 (10.2 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 3164 bytes 211730 (206.7 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    inet 192.168.10.9 netmask 255.255.255.0 broadcast 0.0.0.0
    ether 02:42:c0:a8:0a:09 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    loop txqueuelen 0 (Local Loopback)
    RX packets 4 bytes 620 (620.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4 bytes 620 (620.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    root@42aed469b4cc:/# ip add show eth0
    110: eth0@if111: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:ff:00:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.7/16 scope global eth0
    valid_lft forever preferred_lft forever
    inet 10.255.0.5/32 scope global eth0
    valid_lft forever preferred_lft forever
root@42aed469b4cc:/# ip add show eth2
    114: eth2@if115: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:c0:a8:0a:09 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 192.168.10.9/24 scope global eth2
    valid_lft forever preferred_lft forever
    inet 192.168.10.7/32 scope global eth2
    valid_lft forever preferred_lft forever

Besides lo, three NICs are created in the container:

  • eth0: attached to the ingress network; external access to the service comes in through the routing mesh.
  • eth1: when a service publishes ports, swarm gives each of its containers an extra NIC attached to the docker_gwbridge network. Traffic the container initiates toward the outside is SNATed to the external network through docker_gwbridge.
  • eth2: attached to the custom my-network overlay network.

routing mesh

If ports are published when the service is created, swarm mode listens on port 80 on every node through the routing mesh, even on nodes that run no task of the service, and uses iptables to perform reverse NAT. When a client hits port 80 on any node in the cluster, the swarm load balancer routes the request to an active container. If you do not want nodes without a task to listen on the published port, publish in host mode instead, e.g. --publish mode=host,target=80,published=8080.

After the service is created, notice that its Virtual IPs list has two entries: 10.255.0.5 is on the ingress network, and 192.168.10.7 is on the custom my-network. When an external client accesses the service, swarm load balancing proceeds as follows:

1. The user accesses the Nginx service on swarm-node1 (172.16.100.21:80).
2. The iptables rule -A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80 forwards the request to 172.18.0.2:80 inside the ingress sandbox.

[root@swarm-node1 ~]# iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N DOCKER-INGRESS
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -o docker_gwbridge -m addrtype --src-type LOCAL -j MASQUERADE
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i docker_gwbridge -j RETURN
-A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80
-A DOCKER-INGRESS -j RETURN
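As a sketch, the DNAT target in the DOCKER-INGRESS rule, i.e. the ingress sandbox address that published traffic is first forwarded to, can be pulled out with sed; the sample rule is copied from the `iptables -t nat -S` output above:

```shell
# Sketch: extract the DNAT target from the DOCKER-INGRESS rule.
rule='-A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80'
target=$(printf '%s\n' "$rule" | sed -n 's/.*--to-destination \([0-9.:]*\).*/\1/p')
echo "ingress sandbox target: $target"
```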

3. Inside the ingress sandbox, iptables sets a different mark for each published port (here 0x114, 276 decimal).

[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A PREROUTING -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x114/0xffffffff
-A OUTPUT -d 10.255.0.5/32 -j MARK --set-xmark 0x114/0xffffffff

4. IPVS forwards the traffic to the corresponding real servers (container network namespaces) according to the mark:

[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 276 rr
-> 10.255.0.6:0 Masq 1 0 0
-> 10.255.0.7:0 Masq 1 0 0
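Finally, a sketch of listing the real-server IPs behind the fwmark-based virtual service, with the sample table copied from the ipvsadm output above:

```shell
# Sketch: list the backend container IPs for the FWM 276 virtual service.
ipvs_table='FWM 276 rr
-> 10.255.0.6:0 Masq 1 0 0
-> 10.255.0.7:0 Masq 1 0 0'
backends=$(printf '%s\n' "$ipvs_table" | awk '$1 == "->" {split($2, a, ":"); print a[1]}')
echo "$backends"
```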
