mtglib/dcprobe: unauthenticated DC verification probe
New leaf package that performs the first step of the MTProto handshake
(req_pq_multi -> resPQ) over the existing obfuscated2 transport. No
auth_key is generated; no long-lived state is introduced. Two TL
messages, one round-trip, no new dependencies.
A generic listener cannot fake the reply because it must echo back our
random nonce in resPQ.
Used by the doctor command in a follow-up commit to distinguish a real
Telegram DC from a generic TCP listener bound to port 443.
Per discussion on #494, this allows external packages (e.g. the upcoming
mtglib/dcprobe for the doctor RPC probe) to reuse the obfuscated2
transport without an internal wrapper.
No public-API change beyond the import path. The only exported names
(Obfuscator, its two methods, and the Secret field) were already
exported within the package.
Deprecate "ip" in favour of "host" for domain fronting
Per review on #480: warn-and-ignore for the IP-shaped paths,
mirroring the net.Dialer.DualStack precedent — a config that
sets only "ip" will warn at startup and effectively disable
domain-fronting until the user switches to "host".
- mtglib.ProxyOpts: add DomainFrontingHost; mark DomainFrontingIP
Deprecated and warn-and-drop in NewProxy.
- internal/config: GetDomainFrontingHost returns only
[domain-fronting].host; deprecated keys are no longer used to
derive the dial target. runProxy logs a startup warning per
deprecated key that is set.
- internal/cli: add --domain-fronting-host; --domain-fronting-ip
flag is parsed only so the runtime warning can fire.
- internal/cli/doctor: redirect the existing 2.3.0 entry at "host"
and add a 2.4.0 entry for [domain-fronting].ip.
- example.config.toml: mark # ip = ... as deprecated.
doctor: use WaitGroup.Go and recover panics in DC probes
Address review feedback on #485:
- switch to sync.WaitGroup.Go (Go 1.25+) for the per-DC goroutine
- recover panics inside the goroutine and record them as that DC's
error, so a single panicking probe no longer crashes the whole
doctor run and the remaining DCs still report their results
Fix SELinux-related permission denied error for containerized apps
reading configs exposed via volumes.
Also make it possible to use port 80 in the frontend.
Each DC dial uses a 10s timeout, and "checkNetwork" iterates 6 DCs
sequentially, so worst case is ~60s when egress is broken. Probing in
parallel collapses the worst case to a single timeout window while
preserving the existing DC-ordered output.
Refs #482
Do not use custom DNS resolver to dial proxy upstreams
Fixes #439.
When `[network] dns = "tls://..."` (or "https://...") is set, the
resulting *net.Resolver gets attached to the base network's NativeDialer
and was previously also handed to golang.org/x/net/proxy.FromURL via
NewProxyNetwork. As a result, the SOCKS5 client used the user's DoT/DoH
resolver to look up the SOCKS server's own hostname (e.g. "xray" inside
a docker compose stack). Public DNS-over-TLS resolvers don't know about
docker-compose service names, k8s service DNS, /etc/hosts entries, or
corporate split-horizon DNS, so the upstream lookup returned NXDOMAIN
and the proxy chain broke with a misleading "lookup xray on
127.0.0.11:53: no such host" error.
The custom DNS resolver exists to bypass DPI poisoning when resolving
public censored names like Telegram DCs or the SNI/fronting host. Proxy
server addresses are almost always internal and should be resolved via
the system resolver instead. This change introduces proxyServerDialer,
which copies the timeout and fallback-delay from the base dialer but
leaves Resolver==nil, and uses it for the SOCKS upstream.
The new internal test asserts the structural property directly: the
returned dialer must not inherit the base's custom resolver.
docs: link example.config.toml as the config reference
The TOML configuration file is documented in example.config.toml -
every option is listed with its default value and an inline comment.
But the README never links to it directly: it just says "please
checkout an example configuration file" / "please check configuration
file example", which is easy to skim past when looking for a config
reference.
This patch:
- makes example.config.toml an explicit link in the "Prepare a
configuration file" section and calls it out as the configuration
documentation;
- adds the same link to the Doppelganger and Metrics sections, which
also point readers at the example file.
No content/option changes - README only.
Address round-two review: rename mtglib privates, reorder, more tests
- mtglib/proxy.go: rename private field domainFrontingIP -> domainFrontingHost
and update DomainFrontingAddress() doc comment to reflect that hostnames
are now accepted. The exported mtglib.ProxyOpts.DomainFrontingIP is
unchanged (public API), so the assignment in NewProxy now reads
`domainFrontingHost: opts.DomainFrontingIP,` which makes the
public-vs-internal naming explicitly visible at the boundary.
- internal/config/{parse,config}.go: reorder so Host comes before IP in
the [domain-fronting] struct. Cosmetic, but signals Host is the
preferred forward path.
- Add TestDomainFrontingHostAcceptsLiteralIP + domain_fronting_host_ip.toml
fixture exercising the documented "host accepts hostname or literal IP"
contract end-to-end.
Follow-up to the previous commit on this branch:
- Rename Config.GetDomainFrontingIP -> GetDomainFrontingHost. The
helper now returns a hostname or an IP, so the old name was a lie.
Drop the unused defaultValue net.IP parameter (every caller passed
nil). Update internal/cli/run_proxy.go and internal/cli/doctor.go;
rename the misleading `ip` local var in doctor.go to `override`.
- Add TOML fixtures (domain_fronting_host.toml, domain_fronting_ip.toml)
so the new field is exercised through the actual Parse()->JSON->Config
path users hit, not just via direct .Set() calls. Plus a positive
backward-compat test confirming an `ip`-only legacy config still
validates and resolves correctly, and a no-fronting test confirming
the unset case returns empty.
- Clarify example.config.toml: `ip` is kept for backward compatibility,
not because it has stricter validation semantics worth choosing over
`host`.
mtglib.ProxyOpts.DomainFrontingIP keeps its name (public API).
The existing `[domain-fronting].ip` only accepts a literal IP. That
forces SNI-router setups to pin a static container address (and a
static docker subnet) so mtg can dial the fronting backend directly
instead of resolving the secret's hostname via DNS, which would loop
back into mtg through the SNI router.
Add a sibling `[domain-fronting].host` that accepts either a hostname
or an IP. Hostnames are resolved at dial time by the native dialer
(Happy Eyeballs / dual-stack), so a docker-DNS or any A+AAAA record
naturally picks the right backend address family per client. Setting
both `host` and `ip` is rejected at validation.
The mtglib API stays backward compatible: ProxyOpts.DomainFrontingIP
is still a plain string and the dial path already calls JoinHostPort +
DialContext, both of which accept hostnames. Only the doc comment was
clarified.
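To illustrate why the dial path needs no change, here is a small sketch.
The helpers frontingAddress and validateFronting are hypothetical and
only mirror the rules stated in this message (hostname or IP accepted
for `host`, literal IP only for `ip`, both together rejected):

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/netip"
	"strconv"
)

// frontingAddress builds the dial target. net.JoinHostPort treats
// hostnames and IPs alike and brackets IPv6 literals when needed.
func frontingAddress(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// validateFronting mirrors the validation rules described above.
func validateFronting(host, ip string) error {
	if host != "" && ip != "" {
		return errors.New("domain-fronting: set either host or ip, not both")
	}
	if ip != "" {
		if _, err := netip.ParseAddr(ip); err != nil {
			return fmt.Errorf("domain-fronting: ip must be a literal IP: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(frontingAddress("caddy", 8443))
	fmt.Println(frontingAddress("2001:db8::1", 443))
	fmt.Println(validateFronting("caddy", "10.0.0.5"))
}
```

The resulting string can be passed straight to (*net.Dialer).DialContext,
which resolves a hostname at dial time.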
Require all detected IP families to match in SNI-DNS check
Previously the check returned OK if any resolved address matched
either the public IPv4 or IPv6. A matching AAAA could mask a
mismatched A record (and vice versa), which is a problem because
most client connectivity is still IPv4: a partial match would
silently pass the warning while DPI still blocks the proxy.
Now each detected IP family must appear in the DNS response; the
warning also reports per-family match status so operators can tell
which record is wrong.
Pass real client IPs through with PROXY protocol v2
Without this, mtg and Caddy see HAProxy's container IP for every
connection, which breaks meaningful logging, abuse handling, and any
IP-based blocklist logic. HAProxy sends a PROXY protocol v2 header on
its TCP backends; mtg enables proxy-protocol-listener, and Caddy wraps
:8443 with a proxy_protocol listener before tls.
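For readers unfamiliar with the wire format, the following sketch parses
the fixed part of a PROXY protocol v2 header for TCP over IPv4. It is an
illustration only: parseV2TCP4 is a hypothetical helper, not the parser
used by mtg or Caddy, and it ignores IPv6, UNIX sockets, the LOCAL
command and TLV extensions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

// v2Signature is the 12-byte magic that starts every v2 header.
var v2Signature = []byte("\r\n\r\n\x00\r\nQUIT\n")

// parseV2TCP4 extracts the original source and destination from a v2
// header that carries the PROXY command for TCP over IPv4.
func parseV2TCP4(hdr []byte) (src, dst netip.AddrPort, err error) {
	if len(hdr) < 16 || !bytes.HasPrefix(hdr, v2Signature) {
		return src, dst, errors.New("not a PROXY v2 header")
	}
	if hdr[12] != 0x21 { // high nibble: version 2, low nibble: PROXY
		return src, dst, errors.New("unsupported version or command")
	}
	if hdr[13] != 0x11 { // AF_INET, STREAM
		return src, dst, errors.New("not TCP over IPv4")
	}
	n := int(binary.BigEndian.Uint16(hdr[14:16]))
	if n < 12 || len(hdr) < 16+n {
		return src, dst, errors.New("truncated address block")
	}

	var s, d [4]byte
	copy(s[:], hdr[16:20])
	copy(d[:], hdr[20:24])
	src = netip.AddrPortFrom(netip.AddrFrom4(s), binary.BigEndian.Uint16(hdr[24:26]))
	dst = netip.AddrPortFrom(netip.AddrFrom4(d), binary.BigEndian.Uint16(hdr[26:28]))
	return src, dst, nil
}

func main() {
	hdr := append([]byte{}, v2Signature...)
	hdr = append(hdr, 0x21, 0x11, 0x00, 0x0C) // ver/cmd, family, length 12
	hdr = append(hdr, 198, 51, 100, 7)        // client address
	hdr = append(hdr, 10, 0, 0, 2)            // proxy-side address
	hdr = append(hdr, 0xD4, 0x31, 0x01, 0xBB) // ports 54321 and 443

	src, dst, err := parseV2TCP4(hdr)
	fmt.Println(src, dst, err)
}
```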
The :80 path (ACME HTTP-01 passthrough) is unchanged — client IP there
is not useful and HAProxy's http mode already adds X-Forwarded-For if
anyone wants it.
Requested in https://github.com/9seconds/mtg/pull/462 review.
The previous wording ("silently routed to the fronting domain")
is inaccurate. In mtglib/proxy.go the blocklist path calls
conn.Close() immediately with no further handshake or fronting;
domain fronting only happens on FakeTLS failures for non-blocked
IPs. Reword to "TCP connection is closed with no response" so
users searching the docs get the same symptom they actually see.
Document firehol_level1 RFC1918 gotcha in blocklist defaults
The default [defense.blocklist] uses firehol_level1.netset, which
includes bogon networks and therefore all RFC1918 ranges. Clients
connecting from a LAN address (e.g. a phone on the home Wi-Fi when
mtg runs at home) are silently rejected with "ip was blacklisted"
and routed to the fronting domain. This is a recurring source of
confusion (see issue #466 for the latest example).
Add a warning next to the urls list in example.config.toml and a
Troubleshooting section in README.md covering the symptom, the
cause, and three resolution paths (disable blocklist, swap for a
narrower list, or use hairpin NAT).
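A quick way to tell whether a given client address falls into the
affected ranges is sketched below. It does not load the firehol list;
it only checks the three RFC1918 prefixes that, per the description
above, the list covers. The helper name isRFC1918 is hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rfc1918 holds the three private IPv4 ranges defined by RFC 1918.
var rfc1918 = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
}

// isRFC1918 reports whether s is an address inside one of those ranges.
func isRFC1918(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false, err
	}
	for _, p := range rfc1918 {
		if p.Contains(addr.Unmap()) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, s := range []string{"192.168.1.20", "172.20.0.3", "203.0.113.7"} {
		private, err := isRFC1918(s)
		fmt.Println(s, private, err)
	}
}
```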
Docs only, no code changes.
Add an ACL that routes /.well-known/acme-challenge/ requests on :80
to Caddy instead of redirecting to HTTPS, so Let's Encrypt certificate
issuance works out of the box.
Also simplify Caddyfile to use Caddy's http_port/https_port directives.
Add docker-compose example with HAProxy SNI router
Turnkey deployment: HAProxy on :443 peeks at the TLS SNI and routes
Telegram clients to mtg while forwarding everything else (including DPI
probes) to a real Caddy web server with automatic HTTPS.
This is the setup recommended in BEST_PRACTICES.md, packaged so that
operators can clone and run it with minimal configuration.
Refs: #458
The SNI-DNS validation that exists in 'mtg doctor' is now also run at
proxy startup. If the secret hostname does not resolve to the server's
public IP, a warning is logged so that operators notice the
misconfiguration before DPI silently blocks the proxy.
The check is best-effort: if the public IP cannot be detected or the
hostname cannot be resolved, a brief warning is emitted and the proxy
starts normally.
Refs: #444, #458
Fixes #457.
OpenBSD has no user-settable per-socket TCP keepalive options:
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT do not exist there, and
keepalive timing is controlled system-wide via the sysctls
net.inet.tcp.keepidle and net.inet.tcp.keepintvl. Go reflects this in
src/net/tcpsockopt_openbsd.go: setKeepAliveIdle / Interval / Count
return ENOPROTOOPT for any non-negative value, and only short-circuit
to nil for negative values that explicitly mean "leave alone".
mtg builds a net.KeepAliveConfig with zero-valued Idle / Interval /
Count whenever the user does not override them in the config (which
is the default and the documented expectation). It then hands that
config to (*TCPConn).SetKeepAliveConfig in two places:
- network/sockopts.go: applied to every connection accepted by
internal/utils.Listener.Accept and to every server-side dial that
goes through the v1 default network.
- network/v2/sockopts.go: applied to every connection produced by
the v2 network's DialContext.
On OpenBSD both calls fail with "set tcp ...: protocol not available".
The user-visible effect is that:
- `mtg doctor` reports the error for every Telegram DC.
- `mtg run` accepts incoming TCP connections at the kernel level but
Listener.Accept then closes each one before the proxy server ever
sees it, so the client appears to hang on a half-open socket and
nothing is logged.
- There is no configuration workaround. Setting [network]
keep-alive.disabled = true only zeroes Enable; Go still calls
setKeepAliveIdle / Interval / Count, which still fail.
This change extracts the keepalive setup behind an applyKeepAlive
helper that has a per-platform implementation, following the same
build-tag pattern already used for sockopts_lowat, sockopts_congestion,
sockopts_reuseaddr and sockopts_usertimeout. On every supported
platform except OpenBSD it still calls SetKeepAliveConfig and the
behaviour is unchanged. On OpenBSD it calls SetKeepAlive(cfg.Enable)
instead, which only flips SO_KEEPALIVE on or off and never touches
the missing per-socket options. OpenBSD users get the system-wide
sysctl-controlled keepalive timing, which is the only thing the
kernel exposes anyway.
Verified by cross-building (`GOOS=openbsd GOARCH=amd64 go build ./...`
and `GOARCH=arm64`) and by running `go test ./network/...` on linux.
As per the RFC, if a TLS server cannot pick a suitable cipher suite from
the client's list, it has to send a handshake_failure alert. For us this
means that we have to route the request to the fronting domain, because
we want to behave exactly like a real web server does. So, if it
misbehaves, so do we.
This PR adds a new setting to the config: `network.timeout`. It defines
the time period within which all handshake procedures and ceremonies
must be completed; otherwise the connection is aborted. This should help
in situations where a connection is established but the client cannot
continue for some reason (for example, an RST sent by some middlebox).