From dc664afa527d9291944f7b42473aef2a18358f0a Mon Sep 17 00:00:00 2001
From: sanasol
Date: Tue, 27 Jan 2026 06:28:09 +0100
Subject: [PATCH] docs: Update crash investigation - no stable solution found

- jemalloc helps ~30% of the time but not reliable
- Documented all failed approaches (allocators, scheduling, patching variations)
- Added potential alternative approaches (network hooking, proxy, container)
- Status: UNSOLVED

Co-Authored-By: Claude Opus 4.5
---
 docs/STEAMDECK_CRASH_INVESTIGATION.md | 226 +++++++++++++++-----------
 1 file changed, 128 insertions(+), 98 deletions(-)

diff --git a/docs/STEAMDECK_CRASH_INVESTIGATION.md b/docs/STEAMDECK_CRASH_INVESTIGATION.md
index 87e1d50..f50067c 100644
--- a/docs/STEAMDECK_CRASH_INVESTIGATION.md
+++ b/docs/STEAMDECK_CRASH_INVESTIGATION.md
@@ -1,11 +1,23 @@
 # Steam Deck / Ubuntu LTS Crash Investigation
 
+## Status: UNSOLVED
+
+**Last updated:** 2026-01-27
+
+No stable solution found. jemalloc helps occasionally but crashes still occur randomly.
+
+---
+
 ## Problem Summary
 
 The Hytale F2P launcher's client patcher causes crashes on Steam Deck and Ubuntu LTS with the error:
 ```
 free(): invalid pointer
 ```
+or
+```
+SIGSEGV (Segmentation fault)
+```
 
 The crash occurs after successful authentication, specifically right after "Finished handling RequiredAssets".
 
@@ -16,35 +28,73 @@ The crash occurs after successful authentication, specifically right after "Fini
 **Working Systems:**
 - macOS
 - Windows
-- Arch Linux
+- Older Arch Linux (glibc < 2.41)
 
-**Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by our patching.
+**Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by ANY binary patching.
+
+---
+
+## What Was Tried (All Failed)
+
+### Memory Allocators
+| Approach | Result |
+|----------|--------|
+| `LD_PRELOAD=/usr/lib/libjemalloc.so.2` | Works randomly (3/10 times), not stable |
+| `MALLOC_CHECK_=0` | No effect |
+| `MALLOC_PERTURB_=255` | No effect |
+| `GLIBC_TUNABLES=glibc.malloc.tcache_count=0` | No effect |
+
+### Process/Scheduling
+| Approach | Result |
+|----------|--------|
+| `taskset -c 0` (single core) | Game too slow, stuck at connecting |
+| `taskset -c 0,1` or `0-3` | Still crashes |
+| `nice -n 19` | No effect |
+| `chrt --idle 0` | No effect |
+| `strace -f` | No effect |
+
+### Linker/Loading
+| Approach | Result |
+|----------|--------|
+| `LD_BIND_NOW=1` | No effect |
+| Wrapper script with LD_PRELOAD | No effect |
+| Shell spawn with inline LD_PRELOAD | No effect |
+
+### Patching Variations
+| Approach | Result |
+|----------|--------|
+| Null-padding after replacement | Crashes (made it worse) |
+| No null-padding (develop behavior) | Still crashes |
+| Minimal patches (3 instead of 6) | Still crashes |
+| Ultra-minimal (1 patch - domain only) | Still crashes |
+| Skip sentry patch | Still crashes |
+| Skip subdomain patches | Still crashes |
+
+**Key Finding:** Even patching just 1 string (main domain only) causes the crash.
 
 ---
 
 ## String Occurrences Found
 
+### Length-Prefixed Format
+Found by default patcher mode:
+
+| Offset | Content | Notes |
+|--------|---------|-------|
+| 0x1bc5d63 | `hytale.com` | **Surrounded by x86 code!** |
+
 ### UTF-16LE Format (3 occurrences)
-Found by `HYTALE_PATCH_MODE=utf16le`:
-
-| Index | Offset | Before Context | After Context | Likely URL |
-|-------|--------|----------------|---------------|------------|
-| 0 | 0x1bc5ad7 | `try.` | `/2i3...` | `sentry.hytale.com/2...` |
-| 1 | 0x1bc5b3f | `s://` | `/hel` | `https://hytale.com/help...` |
-| 2 | 0x1bc5bc9 | `ore.` | `/?up` | `store.hytale.com/?up...` |
-
-### Length-Prefixed Format (1 occurrence)
-Found by default `length-prefixed` mode:
-
-| Offset | Before | After | Notes |
-|--------|--------|-------|-------|
-| 0x1bc5d63 | `5933b8` | `89338807` | **Surrounded by what looks like x86 code!** |
+| Offset | Content |
+|--------|---------|
+| 0x1bc5ad7 | `sentry.hytale.com/...` |
+| 0x1bc5b3f | `https://hytale.com/help...` |
+| 0x1bc5bc9 | `store.hytale.com/?...` |
 
 ---
 
-## Critical Finding: Binary Diff Analysis
+## Binary Analysis
 
-When patching with length-prefixed mode (single occurrence):
+When patching with length-prefixed mode:
 
 ```
 < 01bc5d60: 5933 b80a 0000 0068 0079 0074 0061 006c  Y3.....h.y.t.a.l
@@ -54,31 +104,18 @@ When patching with length-prefixed mode (single occurrence):
 > 01bc5d70: 006f 006c 002e 0077 0073 8933 8807 0000  .o.l...w.s.3....
 ```
 
-**Structure at 0x1bc5d60:**
+**Structure:**
 ```
-5933 b8  | 0a000000 | 68007900740061006c0065002e0063006f006d | 8933 8807 0000
-???????? | len=10   | h.y.t.a.l.e...c.o.m                      | mov [rbx],esi?
+5933 b8  | 0a000000 | h.y.t.a.l.e...c.o.m | 8933 8807 0000
+???????? | len=10   | string content      | mov [rbx],esi?
 ```
 
-- `5933 b8` before the string - could be code or metadata
+- `5933 b8` before string - could be code or metadata
 - `0a 00 00 00` - .NET length prefix (10 characters)
 - String content in UTF-16LE
 - `89 33` after - this is `mov [rbx], esi` in x86-64!
 
-**The string appears to be embedded near executable code, not in a clean data section.**
-
----
-
-## Test Results Summary
-
-| Test | Occurrences Patched | Auth Works | Crashes |
-|------|---------------------|------------|---------|
-| Length-prefixed (default) | 1 at 0x1bc5d63 | YES | YES |
-| UTF-16LE mode | 3 at 0x1bc5ad7, 0x1bc5b3f, 0x1bc5bc9 | YES | YES |
-| Skip all UTF-16LE | 0 (but legacy fallback patched 4!) | YES | YES |
-| Original unpatched | 0 | NO (wrong issuer) | NO |
-
-**Key Insight:** Even patching just ONE string (the length-prefixed one) causes the crash, yet authentication succeeds before the crash.
+**The string is embedded near executable code, not in a clean data section.**
 
 ---
 
@@ -100,63 +137,75 @@ Crash occurs in `libzstd.so` during `free()` after "Finished handling RequiredAs
 
 ## Hypotheses
 
-### 1. .NET String Interning
-.NET AOT may have precomputed hashes or metadata for interned strings. Modifying the string content breaks the hash, causing memory corruption when the runtime tries to use it.
+### 1. .NET AOT String Metadata (Most Likely)
+.NET AOT may have precomputed hashes, checksums, or relocation info for strings. Modifying string content breaks internal consistency, causing memory corruption when the runtime tries to use related data structures.
 
-### 2. Code/Data Boundary Issue
-The string at 0x1bc5d63 appears to be embedded near x86 code (`89 33` = `mov [rbx], esi`). Modifying it might corrupt instruction decoding or memory layout calculations.
+### 2. Code/Data Interleaving
+The strings are embedded near x86 code (`89 33` = `mov [rbx], esi`). .NET AOT may use relative offsets that get invalidated when we modify nearby bytes.
 
-### 3. Checksums/Integrity
-The binary may have checksums for certain data sections that we're invalidating.
+### 3. Binary Checksums
+The binary may have integrity checks for certain sections that we're invalidating by patching.
 
-### 4. Memory Alignment
-glibc 2.41's stricter heap validation may catch alignment issues that older versions ignore.
+### 4. Timing-Dependent Race Condition
+The fact that it works randomly (~30% of the time with jemalloc) suggests a race condition that's affected by:
+- Memory layout changes from patching
+- Allocator behavior differences
+- CPU scheduling
 
 ---
 
-## Debug Environment Variables
+## Valgrind Results (Misleading)
 
-| Variable | Description | Example |
-|----------|-------------|---------|
-| `HYTALE_AUTH_DOMAIN` | Target domain | `sanasol.ws` |
-| `HYTALE_PATCH_MODE` | `utf16le` or `length-prefixed` | `utf16le` |
-| `HYTALE_SKIP_SENTRY_PATCH` | Skip sentry URL patch | `1` |
-| `HYTALE_SKIP_SUBDOMAIN_PATCH` | Skip subdomain patches | `1` |
-| `HYTALE_PATCH_LIMIT` | Max patches to apply | `1` |
-| `HYTALE_PATCH_SKIP` | Comma-separated indices to skip | `0,2` |
-| `HYTALE_NO_LEGACY_FALLBACK` | Disable legacy fallback | `1` |
-| `HYTALE_NOOP_TEST` | Read/write without patching | `1` |
+- Valgrind showed NO memory corruption errors
+- Game ran successfully under Valgrind (slower execution)
+- This suggested jemalloc would fix it, but it doesn't consistently work
+
+The slowdown from Valgrind likely masks the race condition timing.
 
 ---
 
-## Files & Offsets Reference
+## Current Launcher Implementation
+
+The launcher attempts:
+1. Auto-detect jemalloc at common paths
+2. Auto-install jemalloc via pkexec if not found
+3. Launch game with `LD_PRELOAD` via shell command
+
+But this doesn't provide stable results.
+
+---
+
+## Potential Alternative Approaches (Not Yet Tried)
+
+### 1. LD_PRELOAD Network Hooking
+Instead of patching the binary, hook `getaddrinfo()` / `connect()` to redirect network calls at runtime. No binary modification needed.
+
+### 2. Local Proxy + Certificate
+Run a local HTTPS proxy that intercepts hytale.com traffic and redirects to custom server. Requires installing a custom CA certificate.
+
+### 3. DNS + iptables Redirect
+Use local DNS to resolve hytale.com to localhost, then iptables to redirect to actual custom server. Requires root/sudo.
+
+### 4. Container with Older glibc
+Run the game in a container with glibc < 2.41 where the stricter validation doesn't exist.
+
+### 5. Different Patching Location
+Find strings in a pure data section rather than code-adjacent areas.
+
+---
+
+## Files Reference
 
 **Binary:** `HytaleClient` (ELF 64-bit, ~39.9 MB)
 
-| Offset | Format | Content |
-|--------|--------|---------|
-| 0x1bc5ad7 | UTF-16LE | `sentry.hytale.com/...` |
-| 0x1bc5b3f | UTF-16LE | `https://hytale.com/help...` |
-| 0x1bc5bc9 | UTF-16LE | `store.hytale.com/?...` |
-| 0x1bc5d63 | Length-prefixed | Main session URL (surrounded by code?) |
+**Branch:** `fix/steamdeck-jemalloc-crash`
 
 ---
 
-## SOLUTION FOUND ✓
+## Install jemalloc (Partial Mitigation)
 
-### Root Cause
-The crash is caused by **glibc 2.41's stricter heap validation** catching a pre-existing race condition in the .NET AOT runtime or asset decompression code. Our binary patching triggers this timing-dependent bug, but the patching itself is correct.
+jemalloc may help in some cases (~30% success rate):
 
-### Evidence
-- Valgrind showed NO memory corruption errors
-- Game ran successfully under Valgrind (slower execution avoids the race)
-- Game was manually killed (SIGINT), not crashed
-- 1.4M allocations with no "Invalid free" detected
-
-### Fix: Use jemalloc allocator
-jemalloc handles the race condition gracefully. The launcher now auto-detects and uses jemalloc on Linux.
-
-**Install jemalloc:**
 ```bash
 # Steam Deck / Arch Linux
 sudo pacman -S jemalloc
@@ -168,34 +217,15 @@ sudo apt install libjemalloc2
 sudo dnf install jemalloc
 ```
 
-The launcher automatically uses jemalloc if found at:
-- `/usr/lib/libjemalloc.so.2` (Arch, Steam Deck)
-- `/usr/lib/x86_64-linux-gnu/libjemalloc.so.2` (Debian/Ubuntu)
-- `/usr/lib64/libjemalloc.so.2` (Fedora/RHEL)
-
-**Manual workaround (if launcher doesn't detect):**
-```bash
-LD_PRELOAD=/usr/lib/libjemalloc.so.2 ./Client/HytaleClient ...
-```
-
-**Disable jemalloc (for testing):**
+The launcher automatically uses jemalloc if found. To disable:
 ```bash
 HYTALE_NO_JEMALLOC=1 npm start
 ```
 
 ---
 
-## Previous Investigation (for reference)
+## Conclusion
 
-### Next Steps (COMPLETED)
+**No stable solution found.** The binary patching approach may be fundamentally incompatible with glibc 2.41's stricter heap validation when modifying .NET AOT compiled binaries.
 
-1. ~~Try runtime hooking instead of binary patching~~ - Not needed, jemalloc fixes the issue
-2. ~~Investigate .NET AOT string metadata~~ - Not the root cause
-3. ~~Test on different glibc versions~~ - Confirmed glibc 2.41 specific
-4. ~~Examine libzstd interaction~~ - libzstd's free() was just where the corruption manifested
-
----
-
-## Branch
-
-`fix/patcher-memory-corruption-v2`
+Alternative approaches (network hooking, proxy, container) may be required for reliable Steam Deck / Ubuntu LTS support.
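
The jemalloc handling this patch describes (auto-detect at common paths, launch with `LD_PRELOAD`, opt out via `HYTALE_NO_JEMALLOC=1`) could be sketched roughly as below. This is an illustrative Python sketch only: the real launcher code is not part of this patch, and the function names `find_jemalloc` and `build_game_env` are hypothetical. The candidate paths are the three listed in the removed section of the doc. The pkexec auto-install step is left out.

```python
import os

# Candidate locations taken from the investigation doc
# (Arch / Steam Deck, Debian / Ubuntu, Fedora / RHEL).
JEMALLOC_CANDIDATES = (
    "/usr/lib/libjemalloc.so.2",
    "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2",
    "/usr/lib64/libjemalloc.so.2",
)


def find_jemalloc(candidates=JEMALLOC_CANDIDATES, exists=os.path.isfile):
    """Return the first candidate path that exists, or None if none do.

    `exists` is injectable so the lookup can be exercised without
    touching the real filesystem (hypothetical helper, not launcher API).
    """
    for path in candidates:
        if exists(path):
            return path
    return None


def build_game_env(base_env, jemalloc_path):
    """Return a copy of base_env with jemalloc prepended to LD_PRELOAD.

    The environment is returned unchanged when HYTALE_NO_JEMALLOC=1 is
    set or when no jemalloc library was found.
    """
    env = dict(base_env)
    if env.get("HYTALE_NO_JEMALLOC") == "1" or jemalloc_path is None:
        return env
    existing = env.get("LD_PRELOAD")
    # Keep any library the user already preloads; jemalloc goes first.
    env["LD_PRELOAD"] = f"{jemalloc_path}:{existing}" if existing else jemalloc_path
    return env
```

A caller would pass the result as the child environment, for example `subprocess.run(["./Client/HytaleClient"], env=build_game_env(os.environ, find_jemalloc()))`. Per the results table in this patch, this only avoided the crash in roughly 3 of 10 runs, so it is a partial mitigation at best.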