Bad Malware Analysis: String Count vs File Size

In this delightfully “bad” foray into malware hunting, we ask whether the sheer amount of printable text inside a binary can betray its nefarious nature. By hashing (oops, counting) strings of lengths 2‑6 bytes in ~500 malicious samples versus 200 tidy Windows libraries, we compute “strings‑per‑KB”. The results are modest but tasty: at a 4‑byte cutoff, benign binaries sport roughly 22 % more strings per kilobyte than their shady cousins—a hint that packed or encrypted malware keeps its chatter to a whisper. Short 2‑byte fragments are just random noise, while 5‑ and 6‑byte strings level out, possibly thanks to debug messages. Bottom line? String density offers a cheeky heuristic, but it’s no silver bullet—still fun to poke at, especially when you love sprinkling a dash of Python over binary mysteries.