docs: Update crash investigation - no stable solution found

- jemalloc helps ~30% of the time but not reliable
- Documented all failed approaches (allocators, scheduling, patching variations)
- Added potential alternative approaches (network hooking, proxy, container)
- Status: UNSOLVED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sanasol
2026-01-27 06:28:09 +01:00
parent 2efecd168f
commit dc664afa52


@@ -1,11 +1,23 @@
# Steam Deck / Ubuntu LTS Crash Investigation
## Status: UNSOLVED
**Last updated:** 2026-01-27
No stable solution found. jemalloc helps occasionally but crashes still occur randomly.
---
## Problem Summary
The Hytale F2P launcher's client patcher causes crashes on Steam Deck and Ubuntu LTS with the error:
```
free(): invalid pointer
```
or
```
SIGSEGV (Segmentation fault)
```
The crash occurs after successful authentication, specifically right after "Finished handling RequiredAssets".
@@ -16,35 +28,73 @@ The crash occurs after successful authentication, specifically right after "Fini
**Working Systems:**
- macOS
- Windows
- Arch Linux - Older Arch Linux (glibc < 2.41)
**Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by our patching. **Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by ANY binary patching.
---
## What Was Tried (All Failed)
### Memory Allocators
| Approach | Result |
|----------|--------|
| `LD_PRELOAD=/usr/lib/libjemalloc.so.2` | Works randomly (3/10 times), not stable |
| `MALLOC_CHECK_=0` | No effect |
| `MALLOC_PERTURB_=255` | No effect |
| `GLIBC_TUNABLES=glibc.malloc.tcache_count=0` | No effect |
### Process/Scheduling
| Approach | Result |
|----------|--------|
| `taskset -c 0` (single core) | Game too slow, stuck at connecting |
| `taskset -c 0,1` or `0-3` | Still crashes |
| `nice -n 19` | No effect |
| `chrt --idle 0` | No effect |
| `strace -f` | No effect |
### Linker/Loading
| Approach | Result |
|----------|--------|
| `LD_BIND_NOW=1` | No effect |
| Wrapper script with LD_PRELOAD | No effect |
| Shell spawn with inline LD_PRELOAD | No effect |
### Patching Variations
| Approach | Result |
|----------|--------|
| Null-padding after replacement | Crashes (made it worse) |
| No null-padding (develop behavior) | Still crashes |
| Minimal patches (3 instead of 6) | Still crashes |
| Ultra-minimal (1 patch - domain only) | Still crashes |
| Skip sentry patch | Still crashes |
| Skip subdomain patches | Still crashes |
**Key Finding:** Even patching just 1 string (main domain only) causes the crash.
---
## String Occurrences Found
### Length-Prefixed Format
Found by default patcher mode:
| Offset | Content | Notes |
|--------|---------|-------|
| 0x1bc5d63 | `hytale.com` | **Surrounded by x86 code!** |
### UTF-16LE Format (3 occurrences)
Found by `HYTALE_PATCH_MODE=utf16le`:

| Index | Offset | Before Context | After Context | Likely URL |
|-------|--------|----------------|---------------|------------|
| 0 | 0x1bc5ad7 | `try.` | `/2i3...` | `sentry.hytale.com/2...` |
| 1 | 0x1bc5b3f | `s://` | `/hel` | `https://hytale.com/help...` |
| 2 | 0x1bc5bc9 | `ore.` | `/?up` | `store.hytale.com/?up...` |

| Offset | Content |
|--------|---------|
| 0x1bc5ad7 | `sentry.hytale.com/...` |
| 0x1bc5b3f | `https://hytale.com/help...` |
| 0x1bc5bc9 | `store.hytale.com/?...` |

### Length-Prefixed Format (1 occurrence)
Found by default `length-prefixed` mode:

| Offset | Before | After | Notes |
|--------|--------|-------|-------|
| 0x1bc5d63 | `5933b8` | `89338807` | **Surrounded by what looks like x86 code!** |

---
## Critical Finding: Binary Diff Analysis ## Binary Analysis
When patching with length-prefixed mode (single occurrence): When patching with length-prefixed mode:
```
< 01bc5d60: 5933 b80a 0000 0068 0079 0074 0061 006c Y3.....h.y.t.a.l
@@ -54,31 +104,18 @@ When patching with length-prefixed mode (single occurrence):
> 01bc5d70: 006f 006c 002e 0077 0073 8933 8807 0000 .o.l...w.s.3....
```
**Structure at 0x1bc5d60:** **Structure:**
``` ```
5933 b8 | 0a000000 | 68007900740061006c0065002e0063006f006d | 8933 8807 0000 5933 b8 | 0a000000 | h.y.t.a.l.e...c.o.m | 8933 8807 0000
???????? | len=10 | h.y.t.a.l.e...c.o.m | mov [rbx],esi? ???????? | len=10 | string content | mov [rbx],esi?
``` ```
- `5933 b8` before the string - could be code or metadata - `5933 b8` before string - could be code or metadata
- `0a 00 00 00` - .NET length prefix (10 characters)
- String content in UTF-16LE
- `89 33` after - this is `mov [rbx], esi` in x86-64!
**The string appears to be embedded near executable code, not in a clean data section.** **The string is embedded near executable code, not in a clean data section.**
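
The length-prefix reading described above can be checked directly against a binary. The following is a minimal sketch, assuming the prefix is a 32-bit little-endian character count as the dump suggests; `le32_at` and `utf16_payload_bytes` are hypothetical helper names, not part of the patcher.

```bash
# Hypothetical helper: read a 32-bit little-endian integer from FILE at OFFSET.
# OFFSET may be decimal or 0x-prefixed hex (od accepts both for -j).
le32_at() {
    # od prints the four bytes as unsigned decimals, e.g. " 10 0 0 0".
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    # Fewer than four bytes means the offset is past the end of the file.
    [ "$#" -eq 4 ] || return 1
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Hypothetical helper: byte length of the UTF-16LE payload after the prefix,
# assuming the prefix counts characters (2 bytes each).
utf16_payload_bytes() {
    chars=$(le32_at "$1" "$2") || return 1
    echo $(( chars * 2 ))
}
```

If the dump above is accurate, `le32_at HytaleClient 0x1bc5d63` would report 10, and the 20 payload bytes would end right where the `89 33` bytes begin.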
---
## Test Results Summary
| Test | Occurrences Patched | Auth Works | Crashes |
|------|---------------------|------------|---------|
| Length-prefixed (default) | 1 at 0x1bc5d63 | YES | YES |
| UTF-16LE mode | 3 at 0x1bc5ad7, 0x1bc5b3f, 0x1bc5bc9 | YES | YES |
| Skip all UTF-16LE | 0 (but legacy fallback patched 4!) | YES | YES |
| Original unpatched | 0 | NO (wrong issuer) | NO |
**Key Insight:** Even patching just ONE string (the length-prefixed one) causes the crash, yet authentication succeeds before the crash.
---
@@ -100,63 +137,75 @@ Crash occurs in `libzstd.so` during `free()` after "Finished handling RequiredAs
## Hypotheses
### 1. .NET String Interning ### 1. .NET AOT String Metadata (Most Likely)
.NET AOT may have precomputed hashes or metadata for interned strings. Modifying the string content breaks the hash, causing memory corruption when the runtime tries to use it. .NET AOT may have precomputed hashes, checksums, or relocation info for strings. Modifying string content breaks internal consistency, causing memory corruption when the runtime tries to use related data structures.
### 2. Code/Data Boundary Issue ### 2. Code/Data Interleaving
The string at 0x1bc5d63 appears to be embedded near x86 code (`89 33` = `mov [rbx], esi`). Modifying it might corrupt instruction decoding or memory layout calculations. The strings are embedded near x86 code (`89 33` = `mov [rbx], esi`). .NET AOT may use relative offsets that get invalidated when we modify nearby bytes.
### 3. Checksums/Integrity ### 3. Binary Checksums
The binary may have checksums for certain data sections that we're invalidating. The binary may have integrity checks for certain sections that we're invalidating by patching.
### 4. Memory Alignment ### 4. Timing-Dependent Race Condition
glibc 2.41's stricter heap validation may catch alignment issues that older versions ignore. The fact that it works randomly (~30% of the time with jemalloc) suggests a race condition that's affected by:
- Memory layout changes from patching
- Allocator behavior differences
- CPU scheduling
---
## Debug Environment Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `HYTALE_AUTH_DOMAIN` | Target domain | `sanasol.ws` |
| `HYTALE_PATCH_MODE` | `utf16le` or `length-prefixed` | `utf16le` |
| `HYTALE_SKIP_SENTRY_PATCH` | Skip sentry URL patch | `1` |
| `HYTALE_SKIP_SUBDOMAIN_PATCH` | Skip subdomain patches | `1` |
| `HYTALE_PATCH_LIMIT` | Max patches to apply | `1` |
| `HYTALE_PATCH_SKIP` | Comma-separated indices to skip | `0,2` |
| `HYTALE_NO_LEGACY_FALLBACK` | Disable legacy fallback | `1` |
| `HYTALE_NOOP_TEST` | Read/write without patching | `1` |

## Valgrind Results (Misleading)
- Valgrind showed NO memory corruption errors
- Game ran successfully under Valgrind (slower execution)
- This suggested jemalloc would fix it, but it doesn't consistently work

The slowdown from Valgrind likely masks the race condition timing.

---
## Files & Offsets Reference ## Current Launcher Implementation
The launcher attempts:
1. Auto-detect jemalloc at common paths
2. Auto-install jemalloc via pkexec if not found
3. Launch game with `LD_PRELOAD` via shell command
But this doesn't provide stable results.
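
The detection and launch steps above can be sketched in shell. This is an illustration only, not the launcher's actual code: `find_jemalloc` and `jemalloc_preload` are hypothetical names, the default candidate paths are the three listed in the install section of this document, and the pkexec auto-install step is left out.

```bash
# Hypothetical helper: print the first existing jemalloc library path.
# With no arguments it checks the paths named in this document.
find_jemalloc() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib/libjemalloc.so.2 \
               /usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
               /usr/lib64/libjemalloc.so.2
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: value to use for LD_PRELOAD, or nothing when
# HYTALE_NO_JEMALLOC=1 is set or no library is found.
jemalloc_preload() {
    if [ "${HYTALE_NO_JEMALLOC:-}" = "1" ]; then
        return 0
    fi
    find_jemalloc "$@" || true
}
```

A launch line could then look like `LD_PRELOAD="$(jemalloc_preload)" ./Client/HytaleClient ...`. An empty `LD_PRELOAD` is ignored by the dynamic loader, so the fallback is the default allocator.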
---
## Potential Alternative Approaches (Not Yet Tried)
### 1. LD_PRELOAD Network Hooking
Instead of patching the binary, hook `getaddrinfo()` / `connect()` to redirect network calls at runtime. No binary modification needed.
### 2. Local Proxy + Certificate
Run a local HTTPS proxy that intercepts hytale.com traffic and redirects to custom server. Requires installing a custom CA certificate.
### 3. DNS + iptables Redirect
Use local DNS to resolve hytale.com to localhost, then iptables to redirect to actual custom server. Requires root/sudo.
### 4. Container with Older glibc
Run the game in a container with glibc < 2.41 where the stricter validation doesn't exist.
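
Whether the container route is worth trying on a given machine could be decided by a version check like the one below. It is a sketch under assumptions: the 2.41 threshold is taken from this document's observations, the function names are hypothetical, and `sort -V` requires GNU coreutils or a compatible `sort`. No container runtime is shown because none has been chosen.

```bash
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the host glibc version; getconf reports it as "glibc X.Y".
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# Hypothetical check: succeeds when the given (or host) glibc is 2.41 or newer,
# i.e. the range where this investigation observed the crash.
needs_older_glibc() {
    v="${1:-$(glibc_version)}"
    [ -n "$v" ] && version_ge "$v" 2.41
}
```

Calling `needs_older_glibc` with no argument checks the host. Passing a version string such as `needs_older_glibc 2.36` makes the check easy to exercise without changing systems.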
### 5. Different Patching Location
Find strings in a pure data section rather than code-adjacent areas.
---
## Files Reference
**Binary:** `HytaleClient` (ELF 64-bit, ~39.9 MB)

| Offset | Format | Content |
|--------|--------|---------|
| 0x1bc5ad7 | UTF-16LE | `sentry.hytale.com/...` |
| 0x1bc5b3f | UTF-16LE | `https://hytale.com/help...` |
| 0x1bc5bc9 | UTF-16LE | `store.hytale.com/?...` |
| 0x1bc5d63 | Length-prefixed | Main session URL (surrounded by code?) |

**Branch:** `fix/steamdeck-jemalloc-crash`
---
## SOLUTION FOUND ✓
### Root Cause
The crash is caused by **glibc 2.41's stricter heap validation** catching a pre-existing race condition in the .NET AOT runtime or asset decompression code. Our binary patching triggers this timing-dependent bug, but the patching itself is correct.
### Evidence
- Valgrind showed NO memory corruption errors
- Game ran successfully under Valgrind (slower execution avoids the race)
- Game was manually killed (SIGINT), not crashed
- 1.4M allocations with no "Invalid free" detected
### Fix: Use jemalloc allocator
jemalloc handles the race condition gracefully. The launcher now auto-detects and uses jemalloc on Linux.
**Install jemalloc:**

## Install jemalloc (Partial Mitigation)
jemalloc may help in some cases (~30% success rate):
```bash
# Steam Deck / Arch Linux
sudo pacman -S jemalloc
@@ -168,34 +217,15 @@ sudo apt install libjemalloc2
sudo dnf install jemalloc
```
The launcher automatically uses jemalloc if found at:
- `/usr/lib/libjemalloc.so.2` (Arch, Steam Deck)
- `/usr/lib/x86_64-linux-gnu/libjemalloc.so.2` (Debian/Ubuntu)
- `/usr/lib64/libjemalloc.so.2` (Fedora/RHEL)

**Manual workaround (if launcher doesn't detect):**
```bash
LD_PRELOAD=/usr/lib/libjemalloc.so.2 ./Client/HytaleClient ...
```
**Disable jemalloc (for testing):**
```bash
HYTALE_NO_JEMALLOC=1 npm start
```

The launcher automatically uses jemalloc if found. To disable:
```bash
HYTALE_NO_JEMALLOC=1 npm start
```

---
## Previous Investigation (for reference)
### Next Steps (COMPLETED)
1. ~~Try runtime hooking instead of binary patching~~ - Not needed, jemalloc fixes the issue
2. ~~Investigate .NET AOT string metadata~~ - Not the root cause
3. ~~Test on different glibc versions~~ - Confirmed glibc 2.41 specific
4. ~~Examine libzstd interaction~~ - libzstd's free() was just where the corruption manifested

---
## Branch
`fix/patcher-memory-corruption-v2`

## Conclusion
**No stable solution found.** The binary patching approach may be fundamentally incompatible with glibc 2.41's stricter heap validation when modifying .NET AOT compiled binaries.

Alternative approaches (network hooking, proxy, container) may be required for reliable Steam Deck / Ubuntu LTS support.