fix(bpf): Monitor use of splice to avoid kernel bug on fast WAN redirecting #507
Draft
jschwinger233 wants to merge 2 commits into daeuniverse:main from
Contributor
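One question: if a task's first TCP connection does not use splice but its second one does, would that break things? That said, this case may be fairly rare; it is worth keeping an eye on.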
Background
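Because of a kernel bug, bpf_msg_redirect and splice cannot be used together (https://github.com/jschwinger233/bpf_msg_redirect_bug_reproducer). As a result, #481 prevents dae-0.6 and glider from working together, because glider uses splice to receive TCP.
(Thanks to the community for the bug report.)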
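This PR addresses that problem. The overall idea: build an allowlist, fastsock_allowlist_map; processes on the allowlist may use the sockmap + msg_redirect direct path, and all others use the traditional kernel network stack. The allowlist is a bpf map keyed by the process name, task->comm. A process must first complete one TCP session; when that session ends, we check whether the process has ever called the splice syscall, and if it has not, we mark it allow=1 so that its next TCP session can take the fast path. Concretely, consider the scenario where curl 1.1.1.2:80 is hijacked by WAN to dae: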
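[…] bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG);, so that we get a callback whenever the state of this TCP session changes. Only a process's first TCP session goes through this probing flow. The second does not, because it looks up the allowlist directly, and by then a record exists whether the verdict was allow or !allow.
[…] (the same task->comm is enough), the allowlist is checked when the TCP connection is established and a record is found: if it says allow, the connection takes the fast TCP path; if !allow, it keeps going through the kernel network stack.
If a glider process goes through the flow above, the check at the end of its first TCP session finds that splice was called, so it is marked allow=0 in the allowlist and never gets drawn into the kernel bug.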
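The flow described above can be sketched as a small userspace model in plain C. This is not the BPF code from this PR: apart from fastsock_allowlist_map and task->comm, every name here (struct entry, on_established, on_splice, on_session_end, the table size) is hypothetical. The model also assumes that a process's first, probed session stays on the kernel network stack, which the description implies but does not state outright.

```c
/* Userspace model of the allowlist flow; NOT the PR's BPF program.
 * All identifiers below are hypothetical and chosen for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16     /* same size as the kernel's TASK_COMM_LEN */
#define MAX_ENTRIES 64  /* arbitrary capacity for this model */

enum path { PATH_KERNEL_STACK, PATH_FAST_REDIRECT };

/* One record per process name, standing in for a fastsock_allowlist_map value. */
struct entry {
    char comm[COMM_LEN];
    bool in_use;
    bool has_verdict;  /* false while the first session is still being probed */
    bool allow;        /* meaningful only when has_verdict is true */
    bool splice_seen;  /* set by the modelled splice tracepoint */
};

static struct entry allowlist[MAX_ENTRIES];

/* Find the record for comm; optionally create it. Returns NULL if absent or full. */
static struct entry *lookup(const char *comm, bool create)
{
    struct entry *free_slot = NULL;

    assert(comm != NULL);
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (allowlist[i].in_use &&
            strncmp(allowlist[i].comm, comm, COMM_LEN - 1) == 0)
            return &allowlist[i];
        if (!allowlist[i].in_use && free_slot == NULL)
            free_slot = &allowlist[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    memset(free_slot, 0, sizeof(*free_slot));
    strncpy(free_slot->comm, comm, COMM_LEN - 1);
    free_slot->in_use = true;
    return free_slot;
}

/* Models the "TCP established" check: pick the data path for this session. */
enum path on_established(const char *comm)
{
    struct entry *e = lookup(comm, true);

    if (e != NULL && e->has_verdict && e->allow)
        return PATH_FAST_REDIRECT;
    /* No verdict yet (probing), verdict is !allow, or table full. */
    return PATH_KERNEL_STACK;
}

/* Models the syscall tracepoint firing for splice. */
void on_splice(const char *comm)
{
    struct entry *e = lookup(comm, true);

    if (e != NULL)
        e->splice_seen = true;
}

/* Models the state callback at session end: record the verdict once. */
void on_session_end(const char *comm)
{
    struct entry *e = lookup(comm, false);

    if (e != NULL && !e->has_verdict) {
        e->allow = !e->splice_seen;
        e->has_verdict = true;
    }
}
```

In this model a curl-like process gets PATH_KERNEL_STACK for its first session and PATH_FAST_REDIRECT afterwards, while a glider-like process that calls on_splice during its first session stays on PATH_KERNEL_STACK. Because the verdict is written only once, the model also reproduces the case raised in the review comment: a process whose first session is splice-free keeps allow=1 even if a later session calls splice.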
Requirements
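Detecting whether the splice syscall has been called requires a syscall tracepoint, so the kernel must be built with CONFIG_HAVE_SYSCALL_TRACEPOINTS=y. (This should not be much of a problem, hopefully.) This PR is essentially a temporary workaround for a kernel bug. I have reported the bug to the kernel community and it has received attention; once a fix lands and is backported to the LTS kernels (6.6, 6.1, 5.15, 5.10), we can consider removing these temporary measures.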
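One way to check this requirement at runtime is to look for the splice event directory in tracefs. The sketch below is only illustrative. It assumes the probe attaches to the syscalls:sys_enter_splice tracepoint, which the description does not confirm, and the function names are hypothetical. On typical kernels these per-syscall event directories appear when CONFIG_FTRACE_SYSCALLS is enabled, which in turn depends on CONFIG_HAVE_SYSCALL_TRACEPOINTS. The check can also report false simply because tracefs is not mounted or the caller lacks permission to read it.

```c
/* Illustrative runtime check for a syscall tracepoint; names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns true if <tracefs_root>/events/<category>/<event> is a directory. */
bool tracepoint_exists(const char *tracefs_root, const char *category,
                       const char *event)
{
    char path[512];
    struct stat st;
    int n = snprintf(path, sizeof(path), "%s/events/%s/%s",
                     tracefs_root, category, event);

    if (n < 0 || (size_t)n >= sizeof(path))
        return false;  /* formatting error or path too long */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Tries the usual tracefs mount point first, then the older debugfs location. */
bool splice_tracepoint_available(void)
{
    return tracepoint_exists("/sys/kernel/tracing", "syscalls",
                             "sys_enter_splice") ||
           tracepoint_exists("/sys/kernel/debug/tracing", "syscalls",
                             "sys_enter_splice");
}
```

A loader could call splice_tracepoint_available() before attaching and, if it returns false, fall back to the kernel network stack for all processes rather than fail outright. That fallback is a design suggestion, not something this PR is known to do.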
Checklist
Full Changelogs
Issue Reference
Closes #[issue number]
Test Result