Life is too bug

From NAT to TCP TIME_WAIT

2020.02.25

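Seeing this title, you may wonder what the two have to do with each other. One is about IP address translation and the other is about TCP, so they seem unrelated. Let me explain.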

NAT

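In the IPv4 era, IP addresses fall into two kinds: public addresses and private (LAN) addresses. When a host on a LAN wants to reach something outside, its traffic has to pass through NAT (Network Address Translation), which rewrites the private address to a public one so that requests can be sent and replies received. There are two variants:

  • SNAT: source NAT. When a request is sent, the router rewrites a source address such as 192.168.1.2 to the public IP obtained from the dial-up connection.
  • DNAT: destination NAT. When a packet arrives from outside, the router rewrites the destination from that public IP back to the LAN address 192.168.1.2.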
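The two directions of translation can be sketched as a small mapping table. This is only a toy model under stated assumptions: the `NatTable` class, the starting port 40000, and the address 203.0.113.10 (a documentation-range IP) are made up for illustration and do not describe how any real router is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class NatTable:
    """Toy NAT table: maps (private ip, port) to a public port and back."""

    public_ip: str
    next_port: int = 40000  # hypothetical start of the port range used for mappings
    outbound: dict = field(default_factory=dict)  # (private_ip, private_port) -> public_port
    inbound: dict = field(default_factory=dict)   # public_port -> (private_ip, private_port)

    def translate_out(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet to the public address."""
        key = (src_ip, src_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, dst_port):
        """Rewrite the destination of a returning packet back to the LAN host."""
        return self.inbound.get(dst_port)  # None if no mapping exists


nat = NatTable(public_ip="203.0.113.10")  # documentation-range address, not a real router
print(nat.translate_out("192.168.1.2", 51000))  # ('203.0.113.10', 40000)
print(nat.translate_out("192.168.1.3", 51000))  # ('203.0.113.10', 40001)
print(nat.translate_in(40000))                  # ('192.168.1.2', 51000)
```

Note that both LAN hosts end up sharing one public IP, which is exactly what matters for the TCP problem discussed later.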

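In other words, we are using NAT all the time on the client side. What about the server side? NAT is used there too, typically for load balancing. Take the Taobao home page: Taobao certainly runs more than one server, and it probably uses load balancing together with NAT to distribute requests to its backends.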

dig A taobao.com +short
140.205.220.96
140.205.94.189

TCP

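Why did I start paying attention to this? Mainly because services deployed in containers frequently failed when connecting to MySQL, with errors such as: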

pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'xxxx (timed out)")
Error Code: 2013. Lost connection to MySQL server during query

The main cause was that fast recycling of TCP TIME_WAIT sockets (tcp_tw_recycle) had been enabled. To quote another article:

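1. Enabling both tcp_timestamps and tcp_tw_recycle turns on the per-host PAWS mechanism in the TCP/IP stack.
2. Traffic from different real clients behind the same NAT looks to the server as if it came from a single host.
3. Although they pass through the same NAT, each real client carries its own timestamp value, so the timestamps on packets coming out of the NAT are not guaranteed to be strictly increasing.
4. Once the server's per-host PAWS check is triggered, it drops packets whose timestamps do not satisfy the increasing condition.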
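The four points above can be illustrated with a deliberately simplified simulation. The `PerHostPaws` class and the timestamp values are invented for this sketch; as far as I understand, the real kernel check uses wraparound-safe comparisons and only applies within a time window, so treat this as a picture of the idea rather than the kernel's logic.

```python
class PerHostPaws:
    """Toy per-host PAWS check: remember the last timestamp seen per source IP."""

    def __init__(self):
        self.last_ts = {}  # source ip -> highest TCP timestamp accepted so far

    def accept(self, src_ip, tsval):
        """Return True if the packet passes, False if it would be dropped."""
        last = self.last_ts.get(src_ip)
        if last is not None and tsval < last:
            return False  # timestamp went backwards for this host: drop
        self.last_ts[src_ip] = tsval
        return True


server = PerHostPaws()
nat_ip = "203.0.113.10"  # both clients appear as this one address after NAT

# Client A's timestamp clock happens to be ahead of client B's; values are made up.
packets = [("A", 500_000), ("B", 120_000), ("A", 500_100), ("B", 120_050)]
results = [(client, server.accept(nat_ip, ts)) for client, ts in packets]
print(results)  # [('A', True), ('B', False), ('A', True), ('B', False)]
```

Client B never gets through here, even though each client's own timestamps increase normally. From B's point of view this looks like a connection timeout.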

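The conclusion first

Enable tcp_timestamps, but do not enable tcp_tw_recycle. (tcp_tw_recycle was removed entirely in Linux 4.12, so on newer kernels the option no longer exists.)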
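A quick way to check a host against this recommendation is to read the two values from /proc/sys. The helper names `read_sysctl` and `check_tw_settings` are hypothetical, and the `root` parameter exists only so the functions can be pointed at a different directory; this is a sketch, not a full audit tool.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl such as 'net.ipv4.tcp_timestamps', or None if absent."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, or the key does not exist on this kernel


def check_tw_settings(root="/proc/sys"):
    """Collect warnings about the two settings discussed above."""
    warnings = []
    if read_sysctl("net.ipv4.tcp_timestamps", root) == "0":
        warnings.append("tcp_timestamps is disabled")
    if read_sysctl("net.ipv4.tcp_tw_recycle", root) not in (None, "0"):
        warnings.append("tcp_tw_recycle is enabled")
    return warnings


if __name__ == "__main__":
    print(check_tw_settings() or "settings look fine")
```

A missing tcp_tw_recycle key is treated as fine, since that is the expected state on kernels where the option has been removed.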

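I will skip the basics of the TCP three-way handshake. Below, following the flow diagram and the related Linux kernel TCP parameters, I fill in some details.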

Client                                                Server
                        +                          +
                        |                          |
                        |                          |  LISTEN (listening on a port)
                        |                          |
                        |                          |
+------------------------------------------------------------------------+
 Send SYN (SYN_SENT)    |                          |
                        |                          |
                        |                          |  Reply SYN+ACK (SYN_RCVD)
                        |                          |
 Reply ACK              |                          |
                        |                          |
+------------------------------------------------------------------------+
 Connection established |                          |  Connection established
 Both sides send data   |                          |  Both sides send data
                        |                          |
 GET / HTTP/1.1         |                          |
                        |                          |
                        |                          |  ESTABLISHED
+------------------------------------------------------------------------+
 Send FIN (FIN_WAIT1)   |                          |
                        |                          |
                        |                          |  ACK
                        |                          |  (CLOSE_WAIT)
 (FIN_WAIT2)            |                          |
                        |                          |  Send FIN+ACK
                        |                          |  (LAST_ACK)
                        |                          |
 Send ACK               |                          |
 (TIME_WAIT)            |                          |
                        |                          |
 CLOSED after 2MSL      |                          |  CLOSED
                        +                          +
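The diagram can also be read as two small state machines, one per side. The sketch below encodes only the transitions shown in the diagram (the side that closes first ends up in TIME_WAIT); the event labels are my own wording, and simultaneous open/close and error paths are left out.

```python
# (current state, event) -> next state, limited to the paths in the diagram.
TRANSITIONS = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("LISTEN", "recv SYN, send SYN+ACK"): "SYN_RCVD",
    ("SYN_SENT", "recv SYN+ACK, send ACK"): "ESTABLISHED",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN_WAIT1",
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("FIN_WAIT1", "recv ACK"): "FIN_WAIT2",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("FIN_WAIT2", "recv FIN, send ACK"): "TIME_WAIT",
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}


def walk(start, events):
    """Apply events in order and return the list of states visited."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])
    return states


client = walk("CLOSED", [
    "send SYN", "recv SYN+ACK, send ACK", "send FIN",
    "recv ACK", "recv FIN, send ACK", "2MSL timeout",
])
server = walk("LISTEN", [
    "recv SYN, send SYN+ACK", "recv ACK", "recv FIN, send ACK",
    "send FIN", "recv ACK",
])
print(" -> ".join(client))
print(" -> ".join(server))
```

Only the client path passes through TIME_WAIT, which is why the side that actively closes connections is the one that accumulates TIME_WAIT sockets.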

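The kernel's tunable TCP settings are listed below. Note that the TIME_WAIT duration itself cannot be configured: it is fixed in the kernel source as TCP_TIMEWAIT_LEN (60 seconds). The only related knob is tcp_fin_timeout, whose default is 60; it controls how long an orphaned connection stays in FIN_WAIT2, not the length of TIME_WAIT.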

sysctl -a |grep tcp
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = reno cubic
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_autocorking = 1
net.ipv4.tcp_available_congestion_control = reno cubic
net.ipv4.tcp_available_ulp =
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_challenge_ack_limit = 1000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_demux = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_ecn_fallback = 1
net.ipv4.tcp_fack = 0
net.ipv4.tcp_fastopen = 1
net.ipv4.tcp_fastopen_blackhole_timeout_sec = 3600
net.ipv4.tcp_fastopen_key = 00000000-00000000-00000000-00000000
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_frto = 2
net.ipv4.tcp_fwmark_accept = 0
net.ipv4.tcp_invalid_ratelimit = 500
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_l3mdev_accept = 0
net.ipv4.tcp_limit_output_bytes = 262144
net.ipv4.tcp_low_latency = 0
sysctl: reading key "net.ipv6.conf.all.stable_secret"
net.ipv4.tcp_max_orphans = 16384
net.ipv4.tcp_max_reordering = 300
net.ipv4.tcp_max_syn_backlog = 128
net.ipv4.tcp_max_tw_buckets = 16384
net.ipv4.tcp_mem = 45840        61122   91680
net.ipv4.tcp_min_rtt_wlen = 300
net.ipv4.tcp_min_snd_mss = 48
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_notsent_lowat = 4294967295
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_pacing_ca_ratio = 120
net.ipv4.tcp_pacing_ss_ratio = 200
net.ipv4.tcp_probe_interval = 600
net.ipv4.tcp_probe_threshold = 8
net.ipv4.tcp_recovery = 1
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096        131072  6291456
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_reuse = 2
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.tcp_workaround_signed_windows = 0
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.docker0.stable_secret"
sysctl: reading key "net.ipv6.conf.docker_gwbridge.stable_secret"
sysctl: reading key "net.ipv6.conf.enp0s3.stable_secret"
sysctl: reading key "net.ipv6.conf.enp0s8.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
sysctl: reading key "net.ipv6.conf.veth2465cce.stable_secret"
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
root@ubuntu-bionic:/home/vagrant# uname  -a
Linux ubuntu-bionic 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
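A listing this long is easier to query once it is parsed. The function below is a minimal sketch for turning such output into a dictionary; the name `parse_sysctl_output` is hypothetical, and the sample text is a few lines copied from the listing above. Lines without an "=" (such as the "sysctl: reading key" warnings or the shell prompt) are skipped.

```python
def parse_sysctl_output(text):
    """Turn 'key = value' lines into a dict of key -> list of value fields."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith("net."):
            continue  # warnings, prompts, or anything that is not a net.* setting
        settings[key] = value.split()  # multi-valued keys like tcp_mem become lists
    return settings


sample = """\
net.ipv4.tcp_fin_timeout = 60
sysctl: reading key "net.ipv6.conf.all.stable_secret"
net.ipv4.tcp_mem = 45840        61122   91680
net.ipv4.tcp_available_ulp =
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tw_reuse = 2
"""

settings = parse_sysctl_output(sample)
print(settings["net.ipv4.tcp_fin_timeout"])       # ['60']
print(settings["net.ipv4.tcp_mem"])               # ['45840', '61122', '91680']
print("net.ipv4.tcp_tw_recycle" in settings)      # False
```

Applied to the full listing, the last check also comes out False: tcp_tw_recycle does not appear anywhere in the output of this 4.15 kernel.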


Ref

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

http://perthcharles.github.io/2015/08/27/timestamp-intro/

https://yuanrengu.com/2020/77eef79f.html
