Domain resolve error (“Name or service not known”) for Azure OpenAI Service domain

Kun-Hung Tsai
Aug 3, 2023

Recently, our company subscribed to the Azure OpenAI service. According to the official documentation, we were supposed to replace the original OpenAI API URL with the URL provided by Azure (e.g., ***.openai.azure.com) to use their service. However, we encountered a "Name or service not known" error. It seemed as though the DNS module of Python was unable to resolve this host within our service.

We also attempted to directly resolve the endpoint from our Kubernetes Pod using the dig and curl commands. Curiously, while dig worked as expected, the curl command resulted in a curl: (6) Could not resolve host: ***.service.openai.azure.com error.
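To reproduce the failure from Python rather than curl, a minimal sketch along these lines may help. It calls the system resolver through socket.getaddrinfo, the same kind of lookup that surfaces the "Name or service not known" message on glibc-based systems. The function name resolve_ipv4 and the hostname in the usage comment are placeholders, not part of our service.

```python
import socket


def resolve_ipv4(host, port=443):
    """Resolve host through the system resolver.

    Returns (addresses, error): a sorted list of IPv4 addresses and None on
    success, or an empty list and the resolver's error text on failure.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror as exc:
        # On glibc systems a failed lookup is typically reported as
        # "Name or service not known".
        return [], str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos}), None


# Example usage (hypothetical hostname, substitute your own endpoint):
#   addresses, error = resolve_ipv4("my-resource.openai.azure.com")
#   print(addresses or error)
```

Running this inside the affected Pod and again on a workstation makes it easier to tell whether the problem lies with the resolver configuration in the container or with the name itself.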

Upon further investigation, we noticed that the dig result for the Azure OpenAI service URL was unusually long compared to typical DNS responses. It took six CNAME records to reach the final A record. The answer section of the dig command looked something like this:

;; ANSWER SECTION:
**.openai.azure.com. 300 IN CNAME eastus.api.cognitive.microsoft.com.
eastus.api.cognitive.microsoft.com. 299 IN CNAME cognitiveuseprod.trafficmanager.net.
cognitiveuseprod.trafficmanager.net. 29 IN CNAME cognitiveuseprod.azure-api.net.
cognitiveuseprod.azure-api.net. 299 IN CNAME apimgmttmsxrvtifzqdtqhxmfmfanqbnjviisx1axz0alfmmew.trafficmanager.net.
apimgmttmsxrvtifzqdtqhxmfmfanqbnjviisx1axz0alfmmew.trafficmanager.net. 299 IN CNAME cognitiveuseprod-eastus-01.regional.azure-api.net.
cognitiveuseprod-eastus-01.regional.azure-api.net. 299 IN CNAME apid69e4f0cd7984aa8a89d9d59fb140df1wjcqcmgrfkzhbxquwhytr.eastus.cloudapp.azure.com.
apid69e4f0cd7984aa8a89d9d59fb140df1wjcqcmgrfkzhbxquwhytr.eastus.cloudapp.azure.com. 9 IN A 20.232.91.180

After some searching, we learned that, according to RFC 1123, DNS messages sent over UDP are limited to 512 bytes, so a larger response is truncated and has to be fetched over TCP:

It is also clear that some new DNS record types defined in the future will contain information exceeding the 512 byte limit that applies to UDP, and hence will require TCP. Thus, resolvers and name servers should implement TCP services as a backup to UDP today, with the knowledge that they will require the TCP service in the future.

Our node-local-dns service logs also confirmed that the DNS query from curl was indeed truncated to 512 bytes:

# dig
[node-local-dns-****] [INFO] **.**.**.**:59001 - 37837 "A IN ***.openai.azure.com. udp 46 false 512" NOERROR qr,rd,ra 674 0.000912329s

# curl
[node-local-dns-****] [INFO] **.**.**.**:42253 - 61289 "A IN ***.openai.azure.com. udp 69 false 4096" NOERROR qr,rd,ra 783 0.013809889s
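The fields in these log lines can be pulled apart with a short script. The sketch below assumes the lines follow the common format of the CoreDNS log plugin, in which the last number inside the quotes is the UDP buffer size advertised by the client and the number after the response flags is the response size in bytes. That field order is an assumption worth checking against your own CoreDNS configuration, and the names LOG_RE and summarize are illustrative.

```python
import re

# Assumed field order (CoreDNS "log" plugin common format):
#   remote:port - id "type class name proto size do bufsize" rcode rflags rsize duration
LOG_RE = re.compile(
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) '
    r'(?P<size>\d+) (?P<do>\S+) (?P<bufsize>\d+)" '
    r'(?P<rcode>\S+) (?P<rflags>\S+) (?P<rsize>\d+) (?P<duration>\S+)'
)


def summarize(line):
    """Extract buffer size, response size, and flags from one log line.

    Returns None if the line does not match the assumed format.
    """
    match = LOG_RE.search(line)
    if match is None:
        return None
    bufsize = int(match["bufsize"])
    rsize = int(match["rsize"])
    flags = match["rflags"].split(",")
    return {
        "proto": match["proto"],
        "bufsize": bufsize,
        "rsize": rsize,
        # True when the response is larger than the client said it can accept.
        "exceeds_bufsize": rsize > bufsize,
        # True when the logged response flags include the truncation bit.
        "tc_flag": "tc" in flags,
    }
```

Applied to the two entries above, the first reports a 512-byte buffer against a 674-byte response, while the second reports a 4096-byte buffer against a 783-byte response.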

According to RFC 5966, when a client does not use EDNS0 (Extension Mechanisms for DNS 0), a server whose UDP response would exceed the 512-byte limit truncates the response and sets the TC flag, and the client should interpret that flag as an indication to retry over TCP. EDNS0 lets the client advertise a larger UDP payload size, so that longer responses can be returned over UDP without truncation.

In the absence of EDNS0 (Extension Mechanisms for DNS 0) (see below),
the normal behaviour of any DNS server needing to send a UDP response
that would exceed the 512-byte limit is for the server to truncate
the response so that it fits within that limit and then set the TC
flag in the response header. When the client receives such a
response, it takes the TC flag as an indication that it should retry
over TCP instead.
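The behaviour described in this passage can be sketched with the Python standard library alone. The code below builds a raw DNS query, optionally appends an EDNS0 OPT record advertising a larger UDP payload size, checks the TC bit in the reply, and retries over TCP when the bit is set. It is an illustration of the protocol under simplifying assumptions (no response ID validation, no retries), not what curl or glibc actually run, and all function names are made up for this example.

```python
import socket
import struct


def build_query(name, qtype=1, edns_bufsize=None, query_id=0x1234):
    """Build a DNS query; append an EDNS0 OPT record if edns_bufsize is set."""
    arcount = 1 if edns_bufsize is not None else 0
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if edns_bufsize is None:
        return header + question
    # OPT pseudo-record: root name, TYPE=41, CLASS carries the UDP payload
    # size, TTL (extended flags)=0, RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, 0, 0)
    return header + question + opt


def is_truncated(response):
    """Return True if the TC bit (0x0200) is set in the header flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full DNS message arrived")
        buf += chunk
    return buf


def query(server, name, edns_bufsize=None, timeout=3.0):
    """Send the query over UDP and fall back to TCP if the reply is truncated."""
    message = build_query(name, edns_bufsize=edns_bufsize)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(message, (server, 53))
        response, _ = udp.recvfrom(65535)
    if not is_truncated(response):
        return response
    # Over TCP each DNS message is preceded by a two-byte length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(message)) + message)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

Calling query with edns_bufsize=None and then with edns_bufsize=4096 against the same resolver shows the two paths: in the first case a long answer comes back truncated and triggers the TCP retry, in the second the resolver may return the full answer over UDP.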

We decided to use EDNS0 and added it directly to our deployment configuration to see if it would solve our problem:

dnsConfig:
  options:
    - name: edns0

After this modification, we were able to successfully query the Azure OpenAI service domain.
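Kubernetes merges dnsConfig options into the options line of the Pod's /etc/resolv.conf, so one way to confirm the setting took effect is to inspect that file from inside the container. The helper below is a small sketch for that check. The function names are illustrative, and the sample file contents in the usage comment are made up rather than copied from our cluster.

```python
def resolver_options(resolv_conf_text):
    """Collect every token that appears on an 'options' line of resolv.conf."""
    options = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (which start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "options":
            options.extend(fields[1:])
    return options


def has_edns0(resolv_conf_text):
    """Return True if the edns0 resolver option is enabled."""
    return "edns0" in resolver_options(resolv_conf_text)


# Example usage inside the Pod:
#   with open("/etc/resolv.conf") as fh:
#       print(has_edns0(fh.read()))
#
# Illustrative contents for which this returns True:
#   nameserver 169.254.20.10
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5 edns0
```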

I hope this article helps those who encounter the same problem. Do check out reference [1] below, which provides a comprehensive explanation of the DNS behavior involved and other potential solutions to this issue.

References

[1] https://easoncao.com/coredns-resolution-truncation-issue-on-kubernetes-kube-dns/
[2] https://wdicc.com/dns-request-in-alpine-image/ (Chinese)
