
[Tech] Understanding Calico Networking in Kubernetes (Part 2)


📌 This series explores the architecture and network implementation of the Calico CNI component in Kubernetes. This article focuses on how communication works in Calico's IPIP network mode.

1. Calico Network Modes

| Mode  | Packet encapsulation                 | Overlay? |
| ----- | ------------------------------------ | -------- |
| BGP   | None (pod routes distributed by BGP) | No       |
| IPIP  | IPv4-in-IPv4 (extra outer IP header) | Yes      |
| VXLAN | VXLAN (MAC-in-UDP)                   | Yes      |

2. About IPIP

ipip is a layer-3 tunneling protocol supported natively by the Linux kernel; its full name is IPv4 in IPv4. The core idea is to wrap the original IPv4 packet in an additional outer IPv4 header, so that the packet can be carried transparently across different networks.
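
To get a feel for the kernel feature itself, independent of Calico, an ipip tunnel can be created by hand between two hosts. This is only a sketch with hypothetical addresses (10.0.0.1/10.0.0.2 for the underlay, 10.10.10.0/24 for the tunnel):

# On host A (underlay IP 10.0.0.1)
modprobe ipip                                          # load the kernel ipip module
ip tunnel add tun0 mode ipip local 10.0.0.1 remote 10.0.0.2
ip addr add 10.10.10.1/24 dev tun0                     # address inside the tunnel
ip link set tun0 up

# On host B (underlay IP 10.0.0.2), the mirror image
ip tunnel add tun0 mode ipip local 10.0.0.2 remote 10.0.0.1
ip addr add 10.10.10.2/24 dev tun0
ip link set tun0 up

# "ping 10.10.10.2" from host A now travels as an outer 10.0.0.1 -> 10.0.0.2 IPv4
# packet carrying the inner 10.10.10.1 -> 10.10.10.2 packet (20 bytes of overhead)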

3. IPIP in Calico

Check which network mode Calico is currently running in:

# kubectl -n kube-system get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":"2025-12-09T07:33:15Z"}'
  creationTimestamp: "2025-12-09T07:33:15Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "941"
  uid: bfa1b297-4402-4fa7-bfdf-1e1dc629e2cd
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 192.168.0.0/16
  ipipMode: Always
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never

From ipipMode: Always we can see that this Calico installation uses IPIP mode by default.
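
If the calicoctl CLI is available on the node (an assumption; it is not installed everywhere), the same information can be read more compactly, and kubectl alone can print just the field of interest:

# Compact view of the pool including its IPIPMODE / VXLANMODE columns (assumes calicoctl)
calicoctl get ippool default-ipv4-ippool -o wide

# kubectl-only check of the encapsulation mode
kubectl get ippool default-ipv4-ippool -o jsonpath='{.spec.ipipMode}{"\n"}'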

3.1 Pod-to-Pod Communication on the Same Node

Communication between 192.168.79.67 and 192.168.79.68 on the same node:

# kubectl get pods -o wide -A
NAMESPACE   NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE                 NOMINATED NODE   READINESS GATES
default     nginx-deployment-bf744486c-8ppjm   1/1     Running   0          5d17h   192.168.79.68   host-10-16-217-141   <none>           <none>
default     nginx-deployment-bf744486c-cvhhz   1/1     Running   0          5d17h   192.168.79.67   host-10-16-217-141   <none>           <none>
a. Inspect the interfaces

Looking at the pod 192.168.79.67, the corresponding host-side veth interface is 8: cali88e3e62ccbf@if3:

# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 1a:39:0f:40:34:09 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.67/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1839:fff:fe40:3409/64 scope link
       valid_lft forever preferred_lft forever

# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
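
How is that host-side interface located? The suffix in eth0@if8 is the ifindex of the veth peer in the host namespace, so the mapping can be confirmed like this (a sketch reusing the names from the output above):

# Inside the pod netns: note the "@if8" suffix, i.e. the peer's host ifindex
ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip -o link show eth0

# On the host: look up that ifindex to get the cali interface name
ip -o link | grep '^8:'            # -> 8: cali88e3e62ccbf@if3: ...

# Or ask the veth driver for the peer ifindex from the host side
ethtool -S cali88e3e62ccbf | grep peer_ifindex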

Looking at the pod 192.168.79.68, the corresponding host-side veth interface is 9: califc335b22756@if3:

# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether ea:b2:2d:44:f1:73 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.68/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e8b2:2dff:fe44:f173/64 scope link
       valid_lft forever preferred_lft forever

# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
b. Network analysis

The routing table of 192.168.79.67 shows that the pod sends all outbound traffic to 169.254.1.1. This is a dummy gateway address: the pod's eth0 is one end of a veth pair whose other end is the host's caliXXX interface, so even though the gateway IP does not really exist, packets sent out of eth0 still land on the host's caliXXX interface.

[ Pod netns ] eth0 <====veth====> caliXXXX [ Node netns ]
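
The reason ARP for this dummy gateway resolves at all is that Calico enables proxy ARP on the host-side cali interface, so the host answers the pod's ARP request for 169.254.1.1 with that interface's MAC (which is also why the captures below show the ee:ee:ee:ee:ee:ee address). A quick check on the host, reusing the interface and namespace names from above:

# 1 means the host answers ARP on this interface on behalf of other addresses
cat /proc/sys/net/ipv4/conf/cali88e3e62ccbf/proxy_arp

# The neighbour entry the pod holds for 169.254.1.1 (once it has sent traffic)
ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip neigh show 169.254.1.1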

So traffic from pod 192.168.79.67 leaves via eth0 into the host network namespace, where the host's routing table forwards it to the other local pod.

The host has a direct route to each of its local pods:

# ip r
192.168.79.67 dev cali88e3e62ccbf scope link
192.168.79.68 dev califc335b22756 scope link
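
To confirm which of these routes the kernel actually picks for a given destination, `ip route get` can be queried on the host (a quick sketch using the pod IPs above; the exact output fields vary):

# Ask the kernel for its forwarding decision toward the destination pod
ip route get 192.168.79.68
# expected (roughly): 192.168.79.68 dev califc335b22756 src <node IP> uid 0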

So the actual communication path is:

POD1 ====veth====> caliXXX1 [ Node ] ====node===> caliXXX2 [ Node ] ====veth====> POD2
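
For this host-side hop to work at all, the node must be allowed to forward packets between interfaces; a quick sanity check is:

# Must print 1; Kubernetes/Calico nodes need IPv4 forwarding enabled
sysctl net.ipv4.ip_forward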
c. Verify with packet capture

Capturing on host host-10-16-217-141 shows that the caliXXX interfaces see the pod's packets:

# tcpdump -i any -nnee icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:58:25.977391 cali88e3e62ccbf In  ifindex 8 1a:39:0f:40:34:09 ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977492 califc335b22756 Out ifindex 9 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977515 califc335b22756 In  ifindex 9 ea:b2:2d:44:f1:73 ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64
10:58:25.977534 cali88e3e62ccbf Out ifindex 8 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64

In summary, the packet capture matches the analysis above.

3.2 Pod-to-Pod Communication Across Nodes

Communication between 192.168.79.72 and 192.168.232.194 on different nodes:

# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP                NODE                 NOMINATED NODE   READINESS GATES
nginx-deployment-bf744486c-8ppjm    1/1     Running   0          5d17h   192.168.79.68     host-10-16-217-141   <none>           <none>
nginx-deployment-bf744486c-cvhhz    1/1     Running   0          5d17h   192.168.79.67     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-5w77x   1/1     Running   0          12s     192.168.79.72     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-cddm6   1/1     Running   0          9s      192.168.232.194   host-10-16-217-208   <none>           <none>
nginx-deployment2-fb46746f5-j2p6k   1/1     Running   0          28s     192.168.232.193   host-10-16-217-208   <none>           <none>
a. Inspect the interfaces

Looking at the pod 192.168.79.72, the corresponding host-side veth interface is 13: cali58644df6687@if3:

# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:49:52:d3:d0:b7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.72/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5849:52ff:fed3:d0b7/64 scope link
       valid_lft forever preferred_lft forever

# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

Looking at the pod 192.168.232.194, the corresponding host-side veth interface is 7: caliab6a8d8a743@if3:

# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 4a:30:47:7d:15:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.232.194/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4830:47ff:fe7d:15c7/64 scope link
       valid_lft forever preferred_lft forever

# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
b. Network analysis

As in the same-node case, after pod 192.168.79.72 sends traffic out of eth0 into host host-10-16-217-141, the host's routing table forwards it toward the pod on the other node.

On host host-10-16-217-141, the destination 192.168.232.194 matches the following route:

# ip r
192.168.232.192/26 via 172.22.3.64 dev tunl0 proto bird onlink
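
The proto bird tag means the route was installed by BIRD, Calico's BGP daemon, and dev tunl0 sends the traffic into the kernel's ipip device. Two optional checks on the source node (the second assumes calicoctl is installed there):

# Print driver-level details of the ipip device (mode, ttl, pmtu handling)
ip -d link show tunl0

# Show the BGP sessions over which BIRD learned the remote pod CIDRs (assumes calicoctl)
calicoctl node status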

So the actual network path is:

POD1 ====veth====> caliXXX1 [ Node1 ] ====node1===> tunl0 [ Node1 ] ====ipip====> eth0 [ Node2 ]

After the packet enters the destination host host-10-16-217-208:

eth0 [ Node2 ] ===node2===> caliXXX2 [ Node2 ] ====veth====> POD2
c. Verify with packet capture

Capturing on the destination host host-10-16-217-208, eth0 receives the IPIP packets from the source host host-10-16-217-141 (the outer IPs are the node addresses 172.22.1.229 and 172.22.3.64, the inner IPs are the pod addresses):

# tcpdump -i any -nnee proto 4
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:43:22.174218 eth0  In  ifindex 2 fa:16:3e:46:0b:f2 ethertype IPv4 (0x0800), length 124: 172.22.1.229 > 172.22.3.64: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 259, length 64
11:43:22.174323 eth0  Out ifindex 2 fa:16:3e:cd:14:d4 ethertype IPv4 (0x0800), length 124: 172.22.3.64 > 172.22.1.229: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 259, length 64

After tunl0 decapsulates the outer header, the inner packet is forwarded to the caliXXX2 interface by the host's per-pod route:

192.168.232.194 dev caliab6a8d8a743 src 172.22.3.64 uid 0
11:39:07.198229 tunl0 In  ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198313 caliab6a8d8a743 Out ifindex 7 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198345 caliab6a8d8a743 In  ifindex 7 4a:30:47:7d:15:c7 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
11:39:07.198364 tunl0 Out ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64

Again, the packet capture matches the analysis above.

3.3 Pod-to-Service Communication

Services immediately bring kube-proxy to mind; in this cluster kube-proxy runs in iptables mode.

In iptables mode, kube-proxy does three things (a minimal hand-written illustration follows the list):

  1. Intercept traffic destined for the Service IP
  2. Pick a backend Pod
  3. Perform DNAT
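
As referenced above, a heavily simplified illustration of those three steps might look like the rules below. The chain name DEMO-SVC and the 0.5 probability are made up for illustration; the real chains kube-proxy generates are inspected in the rest of this section:

# Illustrative only: hand-written nat rules mimicking kube-proxy's iptables mode
# for a UDP Service 172.16.0.10:53 with two backends (hypothetical chain name)
iptables -t nat -N DEMO-SVC
iptables -t nat -A PREROUTING -p udp -d 172.16.0.10 --dport 53 -j DEMO-SVC      # 1. intercept Service IP traffic
iptables -t nat -A DEMO-SVC -m statistic --mode random --probability 0.5 \
        -p udp -j DNAT --to-destination 192.168.34.193:53                        # 2+3. pick a backend and DNAT
iptables -t nat -A DEMO-SVC -p udp -j DNAT --to-destination 192.168.79.66:53     # fall-through to the other backend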

So how does pod 192.168.79.72 communicate with a Service IP such as 172.16.0.10?

# kubectl get svc -o wide -A
NAMESPACE     NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       kubernetes   ClusterIP   172.16.0.1    <none>        443/TCP                  20d   <none>
kube-system   kube-dns     ClusterIP   172.16.0.10   <none>        53/UDP,53/TCP,9153/TCP   20d   k8s-app=kube-dns
a. How traffic to the Service IP is intercepted

The traffic is intercepted by the following iptables chains, traversed in order:

PREROUTING --> KUBE-SERVICES --> KUBE-SVC-xxxx --> KUBE-SEP-xxxx

Inspect the PREROUTING chain on the host:

# iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target            prot opt in     out     source               destination
55182 2870K cali-PREROUTING   all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:6gwbT8clXdHdC1b1 */
55284 2875K KUBE-SERVICES     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Once the pod's traffic is on the host, it is matched against the KUBE-SERVICES chain:

Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in     out     source               destination
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *      *       0.0.0.0/0            172.16.0.10          /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *      *       0.0.0.0/0            172.16.0.10          /* kube-system/kube-dns:dns-tcp cluster IP */
    0     0 KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  *      *       0.0.0.0/0            172.16.0.10          /* kube-system/kube-dns:metrics cluster IP */
    0     0 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *      *       0.0.0.0/0            172.16.0.1           /* default/kubernetes:https cluster IP */
   65  4547 KUBE-NODEPORTS             all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

In the KUBE-SERVICES chain, UDP packets destined for 172.16.0.10 are handed to the KUBE-SVC-TCOU7JCQXEZGVUNU chain:

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
 pkts bytes target                     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ             udp  --  *      *      !192.168.0.0/16       172.16.0.10          /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SEP-V3WL5PSHR6KK4LJN  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/kube-dns:dns -> 192.168.34.193:53 */ statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-NJ5U6PSIJNX4FJ6P  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/kube-dns:dns -> 192.168.79.66:53 */

Finally, in the KUBE-SEP-V3WL5PSHR6KK4LJN chain, the first rule marks packets whose source is the chosen backend pod 192.168.34.193 itself (KUBE-MARK-MASQ), so they can be SNAT'd/MASQUERADEd in POSTROUTING; the second rule DNATs the packet's destination address and port to 192.168.34.193:53.

Chain KUBE-SEP-V3WL5PSHR6KK4LJN (1 references)
 pkts bytes target           prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ   all  --  *      *       192.168.34.193       0.0.0.0/0            /* kube-system/kube-dns:dns */
    0     0 DNAT             udp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/kube-dns:dns */ udp to:192.168.34.193:53
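
The DNAT can also be observed in the node's connection-tracking table while a DNS query is in flight; this assumes the conntrack-tools package is installed:

# Entries keep the original destination 172.16.0.10 alongside the DNAT'd
# backend address (e.g. 192.168.34.193), making the translation visible
conntrack -L --dst 172.16.0.10 -p udp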
c. Verify with packet capture

Since pod-to-Service traffic is ultimately DNAT'd to a pod IP, the capture looks the same as the pod-to-pod cases above, so it is not repeated here.
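
To exercise the path end to end anyway, one option is to send a DNS query from a pod to the kube-dns Service IP and capture on that pod's cali interface; the query leaves the pod addressed to 172.16.0.10 but arrives at a kube-dns pod IP after DNAT. This is only a sketch: it assumes the pod image ships nslookup, and it reuses the pod and interface names from section 3.2:

# From a test pod: query CoreDNS through the Service IP (image must provide nslookup)
kubectl exec -it nginx-deployment2-fb46746f5-5w77x -- nslookup kubernetes.default.svc.cluster.local 172.16.0.10

# On that pod's node: watch DNS traffic on the pod's host-side cali interface
tcpdump -i cali58644df6687 -nn udp port 53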

4. Conclusion

Calico's networking relies mainly on the Linux kernel's built-in overlay encapsulation, whether IPIP or VXLAN, together with iptables for fine-grained isolation policy. Whether in OpenStack or Kubernetes, the heavy lifting is done by the kernel itself, which is what makes this the general-purpose, in-kernel solution.

