mirror of
https://git.sanhost.net/sanasol/hytale-f2p.git
synced 2026-02-26 06:41:47 -03:00
docs: Update crash investigation - no stable solution found
- jemalloc helps ~30% of the time but not reliable
- Documented all failed approaches (allocators, scheduling, patching variations)
- Added potential alternative approaches (network hooking, proxy, container)
- Status: UNSOLVED

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,11 +1,23 @@
|
||||
# Steam Deck / Ubuntu LTS Crash Investigation
|
||||
|
||||
## Status: UNSOLVED
|
||||
|
||||
**Last updated:** 2026-01-27
|
||||
|
||||
No stable solution found. jemalloc helps occasionally but crashes still occur randomly.
|
||||
|
||||
---
|
||||
|
||||
## Problem Summary
|
||||
|
||||
The Hytale F2P launcher's client patcher causes crashes on Steam Deck and Ubuntu LTS with the error:
|
||||
```
free(): invalid pointer
```
or
```
SIGSEGV (Segmentation fault)
```
|
||||
|
||||
The crash occurs after successful authentication, specifically right after "Finished handling RequiredAssets".
|
||||
|
||||
@@ -16,35 +28,73 @@ The crash occurs after successful authentication, specifically right after "Fini
|
||||
**Working Systems:**
- macOS
- Windows
- Arch Linux
- Older Arch Linux (glibc < 2.41)
|
||||
|
||||
**Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by our patching.
|
||||
**Critical Finding:** The UNPATCHED original binary works fine on Steam Deck. The crash is caused by ANY binary patching.
|
||||
|
||||
---
|
||||
|
||||
## What Was Tried (All Failed)
|
||||
|
||||
### Memory Allocators
|
||||
| Approach | Result |
|----------|--------|
| `LD_PRELOAD=/usr/lib/libjemalloc.so.2` | Works randomly (3/10 times), not stable |
| `MALLOC_CHECK_=0` | No effect |
| `MALLOC_PERTURB_=255` | No effect |
| `GLIBC_TUNABLES=glibc.malloc.tcache_count=0` | No effect |
|
||||
|
||||
### Process/Scheduling
|
||||
| Approach | Result |
|----------|--------|
| `taskset -c 0` (single core) | Game too slow, stuck at connecting |
| `taskset -c 0,1` or `0-3` | Still crashes |
| `nice -n 19` | No effect |
| `chrt --idle 0` | No effect |
| `strace -f` | No effect |
|
||||
|
||||
### Linker/Loading
|
||||
| Approach | Result |
|----------|--------|
| `LD_BIND_NOW=1` | No effect |
| Wrapper script with LD_PRELOAD | No effect |
| Shell spawn with inline LD_PRELOAD | No effect |
|
||||
|
||||
### Patching Variations
|
||||
| Approach | Result |
|----------|--------|
| Null-padding after replacement | Crashes (made it worse) |
| No null-padding (develop behavior) | Still crashes |
| Minimal patches (3 instead of 6) | Still crashes |
| Ultra-minimal (1 patch - domain only) | Still crashes |
| Skip sentry patch | Still crashes |
| Skip subdomain patches | Still crashes |
|
||||
|
||||
**Key Finding:** Even patching just 1 string (main domain only) causes the crash.
|
||||
|
||||
---
|
||||
|
||||
## String Occurrences Found
|
||||
|
||||
### Length-Prefixed Format
|
||||
Found by default patcher mode:
|
||||
|
||||
| Offset | Content | Notes |
|--------|---------|-------|
| 0x1bc5d63 | `hytale.com` | **Surrounded by x86 code!** |
|
||||
|
||||
### UTF-16LE Format (3 occurrences)
|
||||
Found by `HYTALE_PATCH_MODE=utf16le`:
|
||||
|
||||
| Index | Offset | Before Context | After Context | Likely URL |
|-------|--------|----------------|---------------|------------|
| 0 | 0x1bc5ad7 | `try.` | `/2i3...` | `sentry.hytale.com/2...` |
| 1 | 0x1bc5b3f | `s://` | `/hel` | `https://hytale.com/help...` |
| 2 | 0x1bc5bc9 | `ore.` | `/?up` | `store.hytale.com/?up...` |
|
||||
|
||||
### Length-Prefixed Format (1 occurrence)
|
||||
Found by default `length-prefixed` mode:
|
||||
|
||||
| Offset | Before | After | Notes |
|--------|--------|-------|-------|
| 0x1bc5d63 | `5933b8` | `89338807` | **Surrounded by what looks like x86 code!** |

| Offset | Content |
|--------|---------|
| 0x1bc5ad7 | `sentry.hytale.com/...` |
| 0x1bc5b3f | `https://hytale.com/help...` |
| 0x1bc5bc9 | `store.hytale.com/?...` |
|
||||
|
||||
---
|
||||
|
||||
## Critical Finding: Binary Diff Analysis
|
||||
## Binary Analysis
|
||||
|
||||
When patching with length-prefixed mode (single occurrence):
|
||||
When patching with length-prefixed mode:
|
||||
|
||||
```
< 01bc5d60: 5933 b80a 0000 0068 0079 0074 0061 006c Y3.....h.y.t.a.l
@@ -54,31 +104,18 @@ When patching with length-prefixed mode (single occurrence):
> 01bc5d70: 006f 006c 002e 0077 0073 8933 8807 0000 .o.l...w.s.3....
```
|
||||
|
||||
**Structure at 0x1bc5d60:**
|
||||
**Structure:**
|
||||
```
5933 b8 | 0a000000 | 68007900740061006c0065002e0063006f006d | 8933 8807 0000
???????? | len=10 | h.y.t.a.l.e...c.o.m | mov [rbx],esi?
5933 b8 | 0a000000 | h.y.t.a.l.e...c.o.m | 8933 8807 0000
???????? | len=10 | string content | mov [rbx],esi?
```
|
||||
|
||||
- `5933 b8` before the string - could be code or metadata
- `5933 b8` before string - could be code or metadata
- `0a 00 00 00` - .NET length prefix (10 characters)
- String content in UTF-16LE
- `89 33` after - this is `mov [rbx], esi` in x86-64!
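
The layout described above (a 4-byte little-endian length followed by UTF-16LE characters) can be searched for with a few lines of Python. This is a minimal sketch, not the patcher's actual code: the function name `find_length_prefixed` and the sample buffer are hypothetical, and the sample is synthetic rather than copied from `HytaleClient`.

```python
import struct


def find_length_prefixed(data: bytes, text: str) -> list[int]:
    """Return every offset where `text` appears as <u32 length><UTF-16LE chars>.

    The length counts characters, not bytes, matching the `0a 00 00 00`
    prefix seen in front of the 10-character string `hytale.com`.
    """
    needle = struct.pack("<I", len(text)) + text.encode("utf-16-le")
    offsets = []
    start = 0
    while (idx := data.find(needle, start)) != -1:
        offsets.append(idx)
        start = idx + 1
    return offsets


# Synthetic buffer: three leading bytes, the prefixed string, then `89 33`.
sample = (
    bytes.fromhex("5933b8")
    + struct.pack("<I", 10)
    + "hytale.com".encode("utf-16-le")
    + bytes.fromhex("8933")
)
print([hex(off) for off in find_length_prefixed(sample, "hytale.com")])
```

The returned offset points at the length prefix, which is consistent with the table above listing 0x1bc5d63 (the position of `0a`) rather than the first character.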
|
||||
|
||||
**The string appears to be embedded near executable code, not in a clean data section.**
|
||||
|
||||
---
|
||||
|
||||
## Test Results Summary
|
||||
|
||||
| Test | Occurrences Patched | Auth Works | Crashes |
|------|---------------------|------------|---------|
| Length-prefixed (default) | 1 at 0x1bc5d63 | YES | YES |
| UTF-16LE mode | 3 at 0x1bc5ad7, 0x1bc5b3f, 0x1bc5bc9 | YES | YES |
| Skip all UTF-16LE | 0 (but legacy fallback patched 4!) | YES | YES |
| Original unpatched | 0 | NO (wrong issuer) | NO |
|
||||
|
||||
**Key Insight:** Even patching just ONE string (the length-prefixed one) causes the crash, yet authentication succeeds before the crash.
|
||||
**The string is embedded near executable code, not in a clean data section.**
|
||||
|
||||
---
|
||||
|
||||
@@ -100,63 +137,75 @@ Crash occurs in `libzstd.so` during `free()` after "Finished handling RequiredAs
|
||||
|
||||
## Hypotheses
|
||||
|
||||
### 1. .NET String Interning
|
||||
.NET AOT may have precomputed hashes or metadata for interned strings. Modifying the string content breaks the hash, causing memory corruption when the runtime tries to use it.
|
||||
### 1. .NET AOT String Metadata (Most Likely)
|
||||
.NET AOT may have precomputed hashes, checksums, or relocation info for strings. Modifying string content breaks internal consistency, causing memory corruption when the runtime tries to use related data structures.
|
||||
|
||||
### 2. Code/Data Boundary Issue
|
||||
The string at 0x1bc5d63 appears to be embedded near x86 code (`89 33` = `mov [rbx], esi`). Modifying it might corrupt instruction decoding or memory layout calculations.
|
||||
### 2. Code/Data Interleaving
|
||||
The strings are embedded near x86 code (`89 33` = `mov [rbx], esi`). .NET AOT may use relative offsets that get invalidated when we modify nearby bytes.
|
||||
|
||||
### 3. Checksums/Integrity
|
||||
The binary may have checksums for certain data sections that we're invalidating.
|
||||
### 3. Binary Checksums
|
||||
The binary may have integrity checks for certain sections that we're invalidating by patching.
|
||||
|
||||
### 4. Memory Alignment
|
||||
glibc 2.41's stricter heap validation may catch alignment issues that older versions ignore.
|
||||
### 4. Timing-Dependent Race Condition
|
||||
The fact that it works randomly (~30% of the time with jemalloc) suggests a race condition that's affected by:
- Memory layout changes from patching
- Allocator behavior differences
- CPU scheduling
|
||||
|
||||
---
|
||||
|
||||
## Debug Environment Variables
|
||||
## Valgrind Results (Misleading)
|
||||
|
||||
| Variable | Description | Example |
|----------|-------------|---------|
| `HYTALE_AUTH_DOMAIN` | Target domain | `sanasol.ws` |
| `HYTALE_PATCH_MODE` | `utf16le` or `length-prefixed` | `utf16le` |
| `HYTALE_SKIP_SENTRY_PATCH` | Skip sentry URL patch | `1` |
| `HYTALE_SKIP_SUBDOMAIN_PATCH` | Skip subdomain patches | `1` |
| `HYTALE_PATCH_LIMIT` | Max patches to apply | `1` |
| `HYTALE_PATCH_SKIP` | Comma-separated indices to skip | `0,2` |
| `HYTALE_NO_LEGACY_FALLBACK` | Disable legacy fallback | `1` |
| `HYTALE_NOOP_TEST` | Read/write without patching | `1` |
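
To illustrate how `HYTALE_PATCH_SKIP` and `HYTALE_PATCH_LIMIT` might narrow down which occurrences get patched, here is a small Python sketch. It is an assumption-laden illustration, not the patcher's implementation: the helper name `select_occurrences` is hypothetical, and applying the skip list before the limit is a guess about ordering.

```python
import os


def select_occurrences(offsets, env=None):
    """Filter candidate offsets using the two debug variables.

    Assumed semantics (not confirmed by the patcher source):
    - HYTALE_PATCH_SKIP: comma-separated occurrence indices to drop.
    - HYTALE_PATCH_LIMIT: maximum number of remaining occurrences to keep.
    """
    env = os.environ if env is None else env

    skip_raw = env.get("HYTALE_PATCH_SKIP", "")
    skip = {int(tok) for tok in skip_raw.split(",") if tok.strip()}
    kept = [off for i, off in enumerate(offsets) if i not in skip]

    limit_raw = env.get("HYTALE_PATCH_LIMIT", "")
    if limit_raw.strip():
        kept = kept[: int(limit_raw)]
    return kept


utf16_offsets = [0x1BC5AD7, 0x1BC5B3F, 0x1BC5BC9]
only_middle = select_occurrences(utf16_offsets, {"HYTALE_PATCH_SKIP": "0,2"})
print([hex(off) for off in only_middle])
```

Passing the environment as a plain dictionary keeps the sketch testable without touching the real process environment.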
|
||||
- Valgrind showed NO memory corruption errors
- Game ran successfully under Valgrind (slower execution)
- This suggested jemalloc would fix it, but it doesn't consistently work
|
||||
|
||||
The slowdown from Valgrind likely masks the race condition timing.
|
||||
|
||||
---
|
||||
|
||||
## Files & Offsets Reference
|
||||
## Current Launcher Implementation
|
||||
|
||||
The launcher attempts:
1. Auto-detect jemalloc at common paths
2. Auto-install jemalloc via pkexec if not found
3. Launch game with `LD_PRELOAD` via shell command
|
||||
|
||||
But this doesn't provide stable results.
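
The detection and launch steps can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the launcher's own code: `find_jemalloc` and `build_env` are hypothetical names, the candidate paths are the ones listed elsewhere in this document, and the pkexec auto-install step is deliberately left out.

```python
import os

# Candidate locations taken from this document's list of common paths.
JEMALLOC_CANDIDATES = (
    "/usr/lib/libjemalloc.so.2",                   # Arch, Steam Deck
    "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2",  # Debian/Ubuntu
    "/usr/lib64/libjemalloc.so.2",                 # Fedora/RHEL
)


def find_jemalloc(candidates=JEMALLOC_CANDIDATES, exists=os.path.isfile):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None


def build_env(base_env, jemalloc_path):
    """Copy base_env and prepend jemalloc to LD_PRELOAD unless disabled."""
    env = dict(base_env)
    if jemalloc_path is None or env.get("HYTALE_NO_JEMALLOC") == "1":
        return env
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{jemalloc_path}:{existing}" if existing else jemalloc_path
    return env


if __name__ == "__main__":
    game_env = build_env(os.environ, find_jemalloc())
    print(game_env.get("LD_PRELOAD", "(jemalloc not preloaded)"))
```

The resulting mapping could be passed as the `env` argument of `subprocess.Popen` when starting the game. Given the roughly 30% success rate reported above, this only reproduces the partial mitigation.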
|
||||
|
||||
---
|
||||
|
||||
## Potential Alternative Approaches (Not Yet Tried)
|
||||
|
||||
### 1. LD_PRELOAD Network Hooking
|
||||
Instead of patching the binary, hook `getaddrinfo()` / `connect()` to redirect network calls at runtime. No binary modification needed.
|
||||
|
||||
### 2. Local Proxy + Certificate
|
||||
Run a local HTTPS proxy that intercepts hytale.com traffic and redirects to custom server. Requires installing a custom CA certificate.
|
||||
|
||||
### 3. DNS + iptables Redirect
|
||||
Use local DNS to resolve hytale.com to localhost, then iptables to redirect to actual custom server. Requires root/sudo.
|
||||
|
||||
### 4. Container with Older glibc
|
||||
Run the game in a container with glibc < 2.41 where the stricter validation doesn't exist.
|
||||
|
||||
### 5. Different Patching Location
|
||||
Find strings in a pure data section rather than code-adjacent areas.
|
||||
|
||||
---
|
||||
|
||||
## Files Reference
|
||||
|
||||
**Binary:** `HytaleClient` (ELF 64-bit, ~39.9 MB)
|
||||
|
||||
| Offset | Format | Content |
|--------|--------|---------|
| 0x1bc5ad7 | UTF-16LE | `sentry.hytale.com/...` |
| 0x1bc5b3f | UTF-16LE | `https://hytale.com/help...` |
| 0x1bc5bc9 | UTF-16LE | `store.hytale.com/?...` |
| 0x1bc5d63 | Length-prefixed | Main session URL (surrounded by code?) |
|
||||
**Branch:** `fix/steamdeck-jemalloc-crash`
|
||||
|
||||
---
|
||||
|
||||
## SOLUTION FOUND ✓
|
||||
## Install jemalloc (Partial Mitigation)
|
||||
|
||||
### Root Cause
|
||||
The crash is caused by **glibc 2.41's stricter heap validation** catching a pre-existing race condition in the .NET AOT runtime or asset decompression code. Our binary patching triggers this timing-dependent bug, but the patching itself is correct.
|
||||
jemalloc may help in some cases (~30% success rate):
|
||||
|
||||
### Evidence
|
||||
- Valgrind showed NO memory corruption errors
- Game ran successfully under Valgrind (slower execution avoids the race)
- Game was manually killed (SIGINT), not crashed
- 1.4M allocations with no "Invalid free" detected
|
||||
|
||||
### Fix: Use jemalloc allocator
|
||||
jemalloc handles the race condition gracefully. The launcher now auto-detects and uses jemalloc on Linux.
|
||||
|
||||
**Install jemalloc:**
|
||||
```bash
# Steam Deck / Arch Linux
sudo pacman -S jemalloc
@@ -168,34 +217,15 @@ sudo apt install libjemalloc2
sudo dnf install jemalloc
```
|
||||
|
||||
The launcher automatically uses jemalloc if found at:
- `/usr/lib/libjemalloc.so.2` (Arch, Steam Deck)
- `/usr/lib/x86_64-linux-gnu/libjemalloc.so.2` (Debian/Ubuntu)
- `/usr/lib64/libjemalloc.so.2` (Fedora/RHEL)
|
||||
|
||||
**Manual workaround (if launcher doesn't detect):**
|
||||
```bash
LD_PRELOAD=/usr/lib/libjemalloc.so.2 ./Client/HytaleClient ...
```
|
||||
|
||||
**Disable jemalloc (for testing):**
|
||||
The launcher automatically uses jemalloc if found. To disable:
|
||||
```bash
HYTALE_NO_JEMALLOC=1 npm start
```
|
||||
|
||||
---
|
||||
|
||||
## Previous Investigation (for reference)
|
||||
## Conclusion
|
||||
|
||||
### Next Steps (COMPLETED)
|
||||
**No stable solution found.** The binary patching approach may be fundamentally incompatible with glibc 2.41's stricter heap validation when modifying .NET AOT compiled binaries.
|
||||
|
||||
1. ~~Try runtime hooking instead of binary patching~~ - Not needed, jemalloc fixes the issue
2. ~~Investigate .NET AOT string metadata~~ - Not the root cause
3. ~~Test on different glibc versions~~ - Confirmed glibc 2.41 specific
4. ~~Examine libzstd interaction~~ - libzstd's free() was just where the corruption manifested
|
||||
|
||||
---
|
||||
|
||||
## Branch
|
||||
|
||||
`fix/patcher-memory-corruption-v2`
|
||||
Alternative approaches (network hooking, proxy, container) may be required for reliable Steam Deck / Ubuntu LTS support.
|
||||
|
||||