Originally published at MOXFIVE

Takeaway

We introduce entropy-based block selection (EBBS or “entropy triage”), a novel method to programmatically repair files rendered unusable after failed encryption processes. MOXFIVE has applied entropy triage while responding to ransomware attacks since 2023, repairing virtual machine disks that could not be decrypted with attacker-provided decryption tools because the prior malicious encryption process failed. The method measures Shannon entropy to select either pre-decryption or post-decryption data blocks to construct a usable file. Note that we simplify cryptographic descriptions for accessibility; using this method requires reverse engineers skilled with cryptography and malware.

Introduction

Paying a ransom to decrypt critical data is devastating for ransomware victims. Yet there’s an even worse outcome: when the decryption tool they paid for fails to work.

We’ve seen these failures play out several times over the years. The cause most commonly traces back to the moment the attacker encrypted the file in the first place. If the attacker fails to complete the encryption process, the partially encrypted file may be corrupted in such a way that the attacker’s decryption tool (“decryptor”) simply cannot function. This is especially problematic for files such as databases and virtual disks, where even small imperfections can render the decrypted output unusable.

Working with our colleagues at Coveware, MOXFIVE identified entropy-based block selection (EBBS, or “entropy triage”), a method to repair these corrupted files in some circumstances. Entropy triage has been most successfully implemented as a feature of Coveware’s Unidecrypt tool, which, through a library of patches, has achieved 90%+ success repairing corrupted virtual disks, significantly improving outcomes for affected clients.

Failed encryption prevents complete decryption

The most typical cryptographic ransom pattern is simple:

  1. Attacker encrypts victim data, rendering the data unusable.
  2. Victim pays the attacker for a decryptor.
  3. Victim uses the decryptor to return the data to a usable state.

Figure 1 shows that ransom story from the standpoint of the victim’s data, represented by our own MOXFIVE phoenix.

Figure 1: Data encrypted by the attacker, then decrypted using the attacker-provided decryptor

Unfortunately, the attacker’s encryption processes are not perfect and sometimes fail mid-deployment. We typically see these encryption failures when the attacker is using a new encryption tool, when the attacker lacks experience, or when the victim interrupts the encryption process. The resulting file is partially encrypted and, without special analysis, operationally indistinguishable from a fully encrypted file. But this partially encrypted state means that the attacker-provided decryptor cannot produce usable output. In many cases, the decryptor blindly runs its cipher against each block – decrypting blocks that the attacker successfully encrypted and scrambling blocks that were left untouched.

Figure 2 shows this failed cryptographic pattern from the standpoint of the victim’s data, where both partially encrypted and partially decrypted data are unusable. This is especially problematic for large files like databases and virtual disks, where even small imperfections can render decrypted data unusable.

Figure 2: Data partially encrypted by failed attacker process; output remains partially encrypted and is unusable

Entropy-based block selection

Readers may notice that in our example the “usable” version of each block in the file can be recognized by the human eye. On a scale as small as our 16-block MOXFIVE phoenix, a user can manually stitch together the discernible blocks from the partially encrypted file and the partially decrypted file.

This evaluation is possible at the byte level, too. Figure 3 provides a byte-level representation of a Hyper-V virtual disk header, unencrypted on the left and encrypted on the right. Notice that the original data has discernible text strings (e.g., “vhdxfile”, “M.i.c.r.o.s.o.f.t.”, etc.) while the encrypted data lacks any recognizable pattern.

Figure 3: Unencrypted and encrypted examples of a virtual disk header viewed in a hex editor

Unfortunately, manual reconstruction is rarely practical in the wild. A real example involving 160KB encryption blocks on a 1TB virtual disk would require over 6 million comparisons – far too many for a human workstream.
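The block count follows directly from the two sizes. As a quick sketch of the arithmetic, the snippet below counts blocks under both the binary and the decimal reading of "KB" and "TB"; which reading the example uses is an assumption, and `block_count` is a hypothetical helper name.

```python
def block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed to cover total_bytes, rounding up."""
    return -(-total_bytes // block_bytes)


# Binary units (assumption): 1 TB = 2**40 bytes, 160 KB = 160 * 1024 bytes.
binary_blocks = block_count(2**40, 160 * 1024)

# Decimal units (assumption): 1 TB = 10**12 bytes, 160 KB = 160,000 bytes.
decimal_blocks = block_count(10**12, 160_000)

print(binary_blocks, decimal_blocks)  # 6710887 6250000
```

Either reading gives more than 6 million blocks, each of which would need a before-and-after comparison.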

Luckily, information science has measurements that can do this work for us! In information theory, Shannon entropy measures “unpredictability” or randomness in data. It is often used to describe how well data can be compressed: highly structured data (like a text file full of repeated phrases) has low entropy and compresses well, while random-looking data (like encrypted files) has high entropy and barely compresses at all. On a scale of 0 to 8 bits of entropy per byte, regular text and structured data typically score between 2 and 4, while encrypted or compressed data usually scores above 6. This makes entropy useful for distinguishing between encrypted and unencrypted data blocks.
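For readers who want to see the measurement itself, here is a minimal sketch of byte-level Shannon entropy, H = -Σ p(b) · log2 p(b), summed over the byte values b that occur in the data. The function name `shannon_entropy` and the sample inputs are illustrative assumptions, not part of any particular tool.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    total = len(data)
    # Sum -p * log2(p) over the observed frequency p of each byte value.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


structured = b"vhdxfile" * 512    # repetitive, header-like bytes
random_like = os.urandom(4096)    # stand-in for ciphertext

print(shannon_entropy(structured))         # 3.0 (eight distinct bytes, equally frequent)
print(shannon_entropy(random_like) > 7.0)  # True
```

Note that a sample of N bytes cannot score above log2(N) bits per byte, which is one reason short excerpts of ciphertext can measure well below 8.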

This apparent randomness can help us guess whether a given block of data is encrypted or not. For example, in Figure 3 the “Original data” entropy is 2.5 and the “Encrypted data” entropy is 6.3. Figure 4 demonstrates this by measuring an encrypted and decrypted block from our phoenix.

Figure 4: Illustrative entropy measurements from unencrypted and encrypted versions of the same data block

Applying entropy triage

Let’s put this all together! We can automate the reconstruction of a partially encrypted file using a simple, block-by-block process. For each fixed-size block, our program performs the following steps:

  1. Measure the block’s entropy
  2. Measure the block’s entropy after running the decryptor
  3. Write to disk the version with the lower entropy measurement
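The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Unidecrypt implementation: `toy_cipher` is a hypothetical XOR stream cipher standing in for a real attacker-provided decryptor, and `entropy_triage`, `BLOCK_SIZE`, and the demo data are invented for the example. A real implementation would need to match the attacker's actual block size, cipher, and key handling.

```python
import hashlib
import io
import math
from collections import Counter
from typing import BinaryIO, Callable, Tuple

BLOCK_SIZE = 4096  # illustrative; a real tool must match the attacker's block size


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def toy_cipher(key: bytes, index: int, block: bytes) -> bytes:
    """Toy XOR stream cipher (hypothetical stand-in for a real decryptor).

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(block):
        seed = key + index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(block, stream))


def entropy_triage(src: BinaryIO, dst: BinaryIO,
                   decrypt_block: Callable[[int, bytes], bytes],
                   block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Write the lower-entropy version of each block; return (kept, decrypted)."""
    kept = decrypted = index = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        candidate = decrypt_block(index, block)  # the block after decryption
        # Steps 1 and 2: measure both versions. Step 3: write the lower one.
        if shannon_entropy(candidate) < shannon_entropy(block):
            dst.write(candidate)
            decrypted += 1
        else:
            dst.write(block)
            kept += 1
        index += 1
    return kept, decrypted


# Demo: four structured blocks; the simulated attacker encrypts only the first two.
key = b"demo-key"
plain_blocks = [((b"record %d: structured virtual disk metadata\n" % i) * 120)[:BLOCK_SIZE]
                for i in range(4)]
original = b"".join(plain_blocks)
damaged = b"".join(toy_cipher(key, i, b) if i < 2 else b
                   for i, b in enumerate(plain_blocks))

restored_file = io.BytesIO()
kept, decrypted = entropy_triage(io.BytesIO(damaged), restored_file,
                                 lambda i, b: toy_cipher(key, i, b))
restored = restored_file.getvalue()
print(kept, decrypted, restored == original)  # 2 2 True
```

In the demo, a blind decryptor would have scrambled the two untouched blocks; comparing entropy per block keeps them as they are and decrypts only the two that were actually encrypted.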

The result is often a usable, fully decrypted file. Figure 5 visualizes the input and output of this process with illustrative measurements for each block.

Figure 5: Demonstrating how to use entropy measurements to produce usable data

Entropy triage has been most successfully deployed in Coveware’s Unidecrypt tool, where it has achieved 90%+ success repairing corrupted virtual disks, significantly improving outcomes for affected clients.

Caveats and limitations

Entropy triage has a number of limitations.

  • Requires a working decryptor and key.
  • Requires reverse engineer(s) skilled with malware and cryptography.
  • Requires sufficient target data in either wholly encrypted or wholly unencrypted state. The method cannot recover lost, over-written, or mangled data.
  • Results may be unreliable when target data is high entropy (e.g., legitimately encrypted or compressed data). Further work on differentiating malicious ciphertext from legitimate ciphertext or compressed data may improve results.
  • Entropy triage is necessarily slower than ordinary decryption exercises. Anecdotally, MOXFIVE observes that EBBS takes 2-3x longer than the decryptor’s ordinary performance.
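The high-entropy limitation is easy to reproduce. The sketch below is an illustration under assumptions: it uses a zlib-compressed stream as the "legitimately compressed" data and a hypothetical `toy_scramble` function in place of a real decryptor run against a block that was never encrypted. Both versions of the block measure near the top of the scale, so the comparison has very little signal to work with.

```python
import hashlib
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def toy_scramble(block: bytes) -> bytes:
    """Hypothetical stand-in for a decryptor applied to an unencrypted block."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(b"demo-key" + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(block, stream))


# Build legitimately compressed data from pseudo-text of randomly chosen words.
rng = random.Random(0)
words = [b"disk", b"sector", b"volume", b"header", b"table", b"index", b"block", b"page"]
text = b" ".join(rng.choice(words) for _ in range(50_000))
compressed_block = zlib.compress(text)[:4096]

before = shannon_entropy(compressed_block)
after = shannon_entropy(toy_scramble(compressed_block))
print(f"untouched compressed block: {before:.2f} bits/byte")
print(f"after blind decryption:     {after:.2f} bits/byte")
```

When the two measurements are this close, choosing the lower one is close to a coin flip, which is why blocks holding compressed or legitimately encrypted data may be selected incorrectly.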

Conclusion

Entropy-based block selection, or “entropy triage,” repairs files corrupted by failed ransomware encryption. The method achieves high success rates in restoring broken files, including virtual disks, by leveraging Shannon entropy to intelligently reconstruct structured data. While implementation requires specialized expertise and comes with inherent limitations, entropy triage may represent the only recovery path for certain critical systems.