Benchmarks were run with -count=3 for statistical consistency.

| Buffer | Throughput (MB/s) | Underlying Reads | Notes |
|---|---|---|---|
| 4 KB | 7,460-7,700 | 322 | |
| 8 KB | 7,400-7,480 | 322 | |
| 16 KB | 7,470-7,540 | 322 | |
Result: identical read counts (322) and identical throughput within noise. As expected: tls.Conn.Read() serves data from its internal read buffer (a bytes.Buffer), so the relay buffer size never propagates to the underlying socket.
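For context, one way such "underlying reads" can be counted is to wrap the transport net.Conn in a counter before handing it to crypto/tls. The countingConn type below is an illustrative sketch, not necessarily the benchmark's actual harness:

```go
package relaybench

import (
	"net"
	"sync/atomic"
)

// countingConn wraps a net.Conn and counts every Read issued against the
// underlying transport (e.g. by tls.Conn), independent of the relay buffer.
type countingConn struct {
	net.Conn
	reads atomic.Int64
}

func (c *countingConn) Read(p []byte) (int, error) {
	c.reads.Add(1)
	return c.Conn.Read(p)
}
```

Wrapping the server side as tls.Server(&countingConn{Conn: raw}, cfg) and then reading through the resulting tls.Conn yields the same counter value for 4, 8, and 16 KB relay buffers, which is the pattern shown in the table above.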
| Buffer | Throughput (MB/s) | Underlying Reads | Notes |
|---|---|---|---|
| 4 KB | 1,946-1,950 | 1,281 | |
| 8 KB | 1,942-1,946 | 1,281 | |
| 16 KB | 1,935-1,948 | 1,281 | |
Result: Also identical read counts (1,281). Throughput identical.
Why: net.Pipe() delivers data synchronously, so one Write() maps to exactly one Read(). The relay buffer size caps the bytes returned per Read(), but Read() returns only whatever the sender wrote. In real TCP, the kernel decides how much data each read(2) call returns based on how much has already arrived and is queued in the socket receive buffer.
The buffer size only matters when the kernel has more data queued than the buffer can hold. For Telegram traffic over the internet (not localhost), individual TCP segments are typically ~1.4 KB (bounded by the MTU). The kernel may batch several segments per read, but rarely more than 64 KB.
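A minimal sketch of the synchronous pairing described above (illustrative, not the benchmark code): with net.Pipe, a 1,460-byte Write completes only when a matching Read consumes it, so the Read returns 1,460 bytes even into a 16 KB buffer.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	client, server := net.Pipe()

	go func() {
		// One Write of 1,460 bytes (a typical TCP segment payload).
		chunk := make([]byte, 1460)
		client.Write(chunk)
		client.Close()
	}()

	// A 16 KB relay buffer still receives exactly one chunk per Read,
	// because net.Pipe hands over only what the matching Write supplied.
	buf := make([]byte, 16*1024)
	n, _ := server.Read(buf)
	fmt.Println("read bytes:", n) // read bytes: 1460
}
```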
| Scenario | Buffer | Throughput (MB/s) | Reads |
|---|---|---|---|
| Burst | 4 KB | 12,033-12,674 | 1,281 |
| Burst | 16 KB | 12,679-12,751 | 1,281 |
| MTU | 4 KB | 2,816-2,848 | 7,184 |
| MTU | 16 KB | 2,833-2,856 | 7,184 |
Key finding for MTU test: Even with 1,460-byte chunks (simulating real TCP), read counts are identical (7,184) for all buffer sizes. This is because each chunk is smaller than even the 4 KB buffer, so buffer size doesn’t matter.
The throughput difference between burst and MTU modes (~12 GB/s vs ~2.8 GB/s) comes from the overhead of many small writes through net.Pipe(), not from buffer-related syscall counts.
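To make the two modes concrete, the MTU variant of the sender looks roughly like the sketch below (the function name and structure are illustrative; the 1,460-byte chunk size is taken from the description above). The reader side then performs one Read per chunk, so the read count tracks the number of chunks rather than the relay buffer size.

```go
package relaybench

import "io"

// writeMTU pushes payload through the pipe in 1,460-byte chunks, mimicking
// individual TCP segments arriving one at a time; a "burst" sender would
// write the whole payload in one call instead.
func writeMTU(w io.Writer, payload []byte) error {
	const segment = 1460
	for off := 0; off < len(payload); off += segment {
		end := off + segment
		if end > len(payload) {
			end = len(payload)
		}
		if _, err := w.Write(payload[off:end]); err != nil {
			return err
		}
	}
	return nil
}
```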
| Buffer | Throughput (MB/s) | Underlying Reads |
|---|---|---|
| 4 KB | 7,630-7,644 | 322 |
| 16 KB | 7,688-7,823 | 322 |
Same pattern as Test A. TLS layer absorbs the difference.
| Direction | Buffer | Throughput (MB/s) | Reads |
|---|---|---|---|
| tg→client | 4 KB | 392-396 | 10,001 |
| tg→client | 16 KB | 400-402 | 10,001 |
| client→tg | 4 KB | 2,023-2,025 | 64 |
| client→tg | 16 KB | 2,012-2,028 | 64 |
Small messages: every write carries less than 200 bytes, so buffer size is irrelevant.
In practice, relay buffer size does not affect syscall count or throughput. The argument “4 KB buffer = 4× more syscalls” assumes the kernel always has 16 KB of data ready and the application read loop is the bottleneck; the measurements above show neither assumption holds for this traffic.
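For reference, the relay loop whose buffer size is being varied is essentially a manual copy loop like the sketch below (an illustration of the pattern, not the project's exact code). The buffer only bounds how much a single Read may return; it does not force the kernel to deliver that much.

```go
package relaybench

import "io"

// relay copies src into dst through one reusable buffer. bufSize is the
// "relay buffer size" from the tables above: it caps the bytes per Read,
// but each Read still returns only what is actually available.
func relay(dst io.Writer, src io.Reader, bufSize int) (int64, error) {
	buf := make([]byte, bufSize)
	var total int64
	for {
		n, err := src.Read(buf)
		if n > 0 {
			total += int64(n)
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
		}
		if err != nil {
			if err == io.EOF {
				err = nil
			}
			return total, err
		}
	}
}
```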
| Approach | N=100 | N=500 | N=1000 | N=2000 |
|---|---|---|---|---|
| Stack 16KB | 3.2 MB (32 KB/gor) | 16.4 MB (32 KB/gor) | 32.8 MB (32 KB/gor) | 65.5 MB (32 KB/gor) |
| Pool 16KB | 0 MB | 0.03-0.1 MB | 0.4-0.8 MB | 2.1-2.4 MB |
| Pool 4KB | 0 MB | 0-0.1 MB | 0.5-0.7 MB | 2.3-2.5 MB |
Explanation: the pooled variants move the relay buffer off the goroutine stack, so the per-goroutine stack cost collapses. Comparing the two pool sizes, the results are surprisingly similar: both show roughly 2.3 MB of stack overhead at N=2000, and the remaining difference lives in the heap. Even there the gap is small, because sync.Pool reuses buffers and the GC reclaims idle ones.
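The stack and heap figures in these tables can be observed with runtime.MemStats; a rough sketch of that kind of measurement (illustrative only, the real harness may differ):

```go
package relaybench

import (
	"fmt"
	"runtime"
)

// reportMem runs a GC so transient allocations don't skew the numbers, then
// prints the stack and heap currently in use. Comparing snapshots taken
// before, during, and after a burst of relay goroutines yields figures like
// those in the tables above.
func reportMem(label string) {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: stack=%.1f MB heap=%.1f MB\n",
		label,
		float64(m.StackInuse)/(1<<20),
		float64(m.HeapInuse)/(1<<20))
}
```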
| Pool buf size | Idle heap after burst 1 | Active heap during burst 2 |
|---|---|---|
| 4 KB | 5.6-8.1 MB | ~8 MB + 2.7 MB stack |
| 16 KB | 11.9-13.9 MB | ~13 MB + 2.7 MB stack |
9seconds is partially right: After a burst of 500 goroutines, pool holds ~6-14 MB of idle heap (depending on buffer size). This is memory that wouldn’t exist with stack-allocated buffers (which are freed when goroutines exit).
However, this idle heap is temporary: the GC gradually reclaims pooled buffers that sit unused between bursts.

Overall, sync.Pool with relay buffers provides massive stack memory savings (96.5% at 1,000 connections). The trade-off is some idle heap between connection bursts, which the GC reclaims over time. The 16 KB vs 4 KB pool buffer size makes a negligible difference for memory: the savings come from moving the buffer off the stack entirely, not from making it smaller.
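The pooled-buffer pattern being compared against stack arrays looks roughly like this (a sketch assuming one buffer per connection; names, the 16 KB constant, and the callback shape are illustrative):

```go
package relaybench

import "sync"

const relayBufSize = 16 * 1024 // 4 * 1024 in the 4 KB variant

// bufPool hands out fixed-size relay buffers. Storing a pointer to an array
// avoids an extra allocation for the slice header.
var bufPool = sync.Pool{
	New: func() any { return new([relayBufSize]byte) },
}

// handleConn acquires one buffer for the lifetime of the connection and
// returns it to the pool on exit, instead of declaring a [relayBufSize]byte
// array on the goroutine stack (which forces the stack to grow).
func handleConn(run func(buf []byte)) {
	buf := bufPool.Get().(*[relayBufSize]byte)
	defer bufPool.Put(buf)
	run(buf[:])
}
```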
| Scenario | Stack 16KB | Pool 16KB | Pool 4KB |
|---|---|---|---|
| Raw relay (10 MB) | 11,018 MB/s | 10,952 MB/s | 11,004 MB/s |
| TLS relay (10 MB) | 9,788 MB/s | 9,633 MB/s | 9,676 MB/s |
All values are within ±2% noise. No statistically significant difference.
Microbenchmark results:

- sync.Pool.Get() + Put(): 7.3 ns per call (one-time per connection, not per read)
- [16379]byte stack buffer: 0.25 ns

sync.Pool introduces no measurable CPU overhead for relay operations. The ~7 ns per Get/Put is amortized across the entire connection lifetime (millions of ns). Throughput is identical whether using stack-allocated or pool-allocated buffers of any size.
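A sketch of the kind of microbenchmark behind the ~7.3 ns figure (an assumed setup with a 16 KB buffer; place it in a _test.go file and run `go test -bench=.`):

```go
package relaybench

import (
	"sync"
	"testing"
)

var benchPool = sync.Pool{
	New: func() any { return new([16384]byte) },
}

// BenchmarkPoolGetPut measures one Get/Put round trip, the cost paid once
// per connection when relay buffers come from a sync.Pool.
func BenchmarkPoolGetPut(b *testing.B) {
	for i := 0; i < b.N; i++ {
		buf := benchPool.Get().(*[16384]byte)
		benchPool.Put(buf)
	}
}
```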