Bug 6081 - Service DNS resolution failed
Summary: Service DNS resolution failed
Status: NEW
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: Others
Version: 8.6
Hardware: All Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: Jacob
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-03 11:18 UTC by czhenly
Modified: 2023-08-04 19:18 UTC (History)
CC List: 0 users

See Also:


Attachments

Description czhenly 2023-08-03 11:18:31 UTC
I am running localdns in a K8s cluster based on Anolis OS 8.6 (kernel: 4.19.91-26). localdns sometimes takes 2 s to return a response. I captured packets with tcpdump and found that the upstream server answers localdns quickly, but localdns does not reply to the pod right away. It replies to the pod's DNS request only after its second request to the upstream server succeeds.


Version-Release number of selected component (if applicable):

Anolis OS 8.6 (kernel: 4.19.91-26)

localdns image version: node-cache 1.22.20


Expected results:
The localdns pod responds to pod DNS requests quickly.
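
To show how often the delay occurs, a loop like the following could be run inside an affected pod. This is only a minimal sketch: the function names, the attempt count and the 1 s threshold are my own choices, not part of localdns. It assumes the pod's resolver configuration points at localdns. Note that socket.getaddrinfo may send both A and AAAA queries for each lookup.

```python
import socket
import time
from typing import Callable, List, Tuple


def time_lookup(resolve: Callable[[str], object], name: str) -> float:
    """Return the wall-clock seconds one call to resolve(name) takes."""
    start = time.monotonic()
    try:
        resolve(name)
    except OSError:
        # A failed lookup still counts: only the elapsed time matters here.
        pass
    return time.monotonic() - start


def find_slow_lookups(
    resolve: Callable[[str], object],
    name: str,
    attempts: int = 100,
    threshold_s: float = 1.0,
) -> List[Tuple[int, float]]:
    """Run `attempts` lookups and return (attempt index, seconds) for slow ones."""
    slow = []
    for i in range(attempts):
        elapsed = time_lookup(resolve, name)
        if elapsed >= threshold_s:
            slow.append((i, elapsed))
    return slow


def system_resolve(name: str) -> object:
    """Resolve through the system resolver (assumed to point at localdns)."""
    return socket.getaddrinfo(name, None)


# Example call (hypothetical; replace the name with a real service name):
# print(find_slow_lookups(system_resolve, "stream.cloud.aaa.domain"))
```

Each returned tuple gives the attempt index and the elapsed seconds, so the timestamps of slow lookups can be lined up with a packet capture.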
Comment 1 czhenly 2023-08-04 19:10:12 UTC

I used strace to trace the process. The output is as follows:

Trace of a correct response:
```
[pid 212087] 17:14:29.013067 epoll_pwait(4,  <unfinished ...>
[pid 179520] 17:14:29.013074 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 212083] 17:14:29.013084 futex(0xc000658148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 179520] 17:14:29.013165 <... nanosleep resumed>NULL) = 0 <0.000083>
[pid 179520] 17:14:29.013180 futex(0x2521200, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=21550841} <unfinished ...>
[pid 212087] 17:14:29.015350 <... epoll_pwait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=4079288752, u64=140522524311984}}], 128, 21, NULL, 0) = 1 <0.002270>
[pid 212087] 17:14:29.015387 futex(0x2521200, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000010>
[pid 179520] 17:14:29.015411 <... futex resumed>) = 0 <0.002223>
[pid 212087] 17:14:29.015420 read(10,  <unfinished ...>
[pid 179520] 17:14:29.015428 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 212087] 17:14:29.015438 <... read resumed>"/5\201\200\0\1\0\3\0\0\0\1\6stream\5cloud\4aaa\6domain\0\0\1\0\1\300\f\0\5\0\1\0\0\0\n\0\10\5apigw\300\31\3006\0\1\0\1\0\0\2X\0\4\nH\vm\3006\0\1\0\1\0\0\2X\0\4\nH\t\4\0\0)\20\0\0\0\200\0\0\0", 2048) = 105 <0.000013>

```
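
The payload returned by read(10, ...) above looks like a DNS message. As a quick check, the sketch below decodes only its 12-byte header. The helper name parse_dns_header is my own, and I am assuming fd 10 is a UDP socket (so there is no 2-byte TCP length prefix), which the trace does not confirm. The rest of the payload is not decoded here.

```python
import struct

# First 12 bytes of the payload from the read(10, ...) line, written with
# the same octal escapes that strace printed.
header = b"/5\201\200\0\1\0\3\0\0\0\1"


def parse_dns_header(data: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (six big-endian 16-bit fields)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & 0x0200),    # TC bit
        "rcode": flags & 0x000F,              # 0 means NOERROR
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }


print(parse_dns_header(header))
```

Under those assumptions the header describes a non-truncated response with rcode 0, one question and three answers, which is consistent with an answer having arrived on that socket.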

Trace with no response:
```
[pid 179508] 17:17:29.087933 <... write resumed>) = 53 <0.000035>
[pid 179508] 17:17:29.087957 read(14,  <unfinished ...>
[pid 179508] 17:17:29.087972 <... read resumed>0xc000def800, 2048) = -1 EAGAIN (Resource temporarily unavailable) <0.000009>
[pid 179508] 17:17:29.087997 epoll_pwait(4,  <unfinished ...>
[pid 179508] 17:17:29.088023 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0 <0.000019>
[pid 179508] 17:17:29.088038 futex(0x2520e48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 179508] 17:17:29.114482 <... futex resumed>) = 0 <0.026438>
[pid 179508] 17:17:29.114516 epoll_pwait(4,  <unfinished ...>
[pid 179508] 17:17:29.114540 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0 <0.000018>
[pid 179508] 17:17:29.114558 epoll_pwait(4,  <unfinished ...>
```
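
To find long waits in a larger capture, the completed syscalls can be filtered by duration. The sketch below assumes the log has the same layout as the traces above (a [pid N] prefix, a microsecond timestamp, and a trailing <seconds> duration, which is what strace -f -tt -T produces). The names Call, completed_calls and slower_than are my own, and the 10 ms threshold is arbitrary.

```python
import re
from typing import Iterable, List, NamedTuple


class Call(NamedTuple):
    pid: int
    timestamp: str
    syscall: str
    seconds: float


# Matches lines that end in a "<seconds>" duration: either a complete call
# ("name(...) = ret <t>") or a resumed one ("<... name resumed>...) = ret <t>").
LINE_RE = re.compile(
    r"^\[pid\s+(\d+)\]\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:<\.\.\. (\w+) resumed>|(\w+)\()"
    r".*<(\d+\.\d+)>\s*$"
)


def completed_calls(lines: Iterable[str]) -> List[Call]:
    """Parse every line that carries a duration; skip the rest."""
    calls = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # "<unfinished ...>" lines carry no duration
        pid, ts, resumed, direct, secs = m.groups()
        calls.append(Call(int(pid), ts, resumed or direct, float(secs)))
    return calls


def slower_than(calls: Iterable[Call], threshold_s: float) -> List[Call]:
    """Return the calls whose duration is at least threshold_s seconds."""
    return [c for c in calls if c.seconds >= threshold_s]


# Lines copied from the trace above.
sample = [
    "[pid 179508] 17:17:29.087972 <... read resumed>0xc000def800, 2048) = -1 EAGAIN (Resource temporarily unavailable) <0.000009>",
    "[pid 179508] 17:17:29.088038 futex(0x2520e48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>",
    "[pid 179508] 17:17:29.114482 <... futex resumed>) = 0 <0.026438>",
    "[pid 179508] 17:17:29.114540 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0 <0.000018>",
]

for call in slower_than(completed_calls(sample), 0.01):
    print(call)
```

On these sample lines only the futex wait of about 26 ms exceeds the threshold. With a longer capture, a threshold closer to 1 s should isolate the calls that line up with the 2 s delay.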
Comment 2 czhenly 2023-08-04 19:18:15 UTC
(In reply to czhenly from comment #1)

The correct-response trace I posted in comment 1 was the wrong one. Here is the right trace of a correct response:

```
[pid 212083] 17:15:29.034880 <... write resumed>) = 53 <0.000039>
[pid 212083] 17:15:29.034898 read(12,  <unfinished ...>
[pid 212083] 17:15:29.034915 <... read resumed>0xc00098f800, 2048) = -1 EAGAIN (Resource temporarily unavailable) <0.000009>
[pid 212083] 17:15:29.034931 epoll_pwait(4,  <unfinished ...>
[pid 212083] 17:15:29.034946 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0 <0.000009>
[pid 212083] 17:15:29.034964 futex(0xc000658148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 212083] 17:15:29.036643 <... futex resumed>) = 0 <0.001672>
[pid 212083] 17:15:29.036657 epoll_pwait(4, [], 128, 0, NULL, 0) = 0 <0.000010>
[pid 212083] 17:15:29.036686 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
[pid 212083] 17:15:29.036760 <... nanosleep resumed>NULL) = 0 <0.000065>
[pid 212083] 17:15:29.036775 futex(0xc000076548, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 212083] 17:15:29.036801 <... futex resumed>) = 1 <0.000020>
[pid 212083] 17:15:29.036817 epoll_pwait(4,  <unfinished ...>
[pid 212083] 17:15:29.036846 <... epoll_pwait resumed>[{events=EPOLLOUT, data={u32=4083213016, u64=140522528236248}}], 128, 0, NULL, 0) = 1 <0.000022>
[pid 212083] 17:15:29.036870 epoll_pwait(4,  <unfinished ...>
[pid 212083] 17:15:29.055981 <... epoll_pwait resumed>[], 128, 19, NULL, 0) = 0 <0.019090>
[pid 212083] 17:15:29.056034 epoll_pwait(4, [], 128, 0, NULL, 0) = 0 <0.000009>
[pid 212083] 17:15:29.056070 epoll_pwait(4, [], 128, 0, NULL, 0) = 0 <0.000009>
[pid 212083] 17:15:29.056103 epoll_pwait(4,  <unfinished ...>
[pid 212083] 17:15:29.056456 <... epoll_pwait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=4083213016, u64=140522528236248}}], 128, 1, NULL, 0) = 1 <0.000347>
[pid 212083] 17:15:29.056484 recvmsg(24, {msg_name={sa_family=AF_INET, sin_port=htons(47118),
.....

```
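
For comparison, the interval between two trace timestamps can be computed directly. This is a small sketch with a helper name of my own (gap_seconds). It assumes both timestamps fall on the same day, since the traces carry no date.

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Correct case: from the write returning to the recvmsg call.
ok_gap = gap_seconds("17:15:29.034880", "17:15:29.056484")

# No-response case: from the write returning to the last line of that trace,
# where epoll_pwait still has not reported a readable socket.
bad_gap = gap_seconds("17:17:29.087933", "17:17:29.114558")

print(f"correct case: {ok_gap * 1000:.3f} ms")
print(f"no-response case, still waiting after: {bad_gap * 1000:.3f} ms")
```

In the correct case the gap is about 21.6 ms. The no-response excerpt ends about 26.6 ms after the write, so it is too short to show where the rest of the 2 s goes. A longer capture of that case would be needed.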