
Introducing vmdiff: a tool to find everything that changes on your computer

We're publishing a new tool to diff virtual machine snapshots and view the diffs in-browser. It lets you see every file and process that’s changed between two points in time, and lets us finally answer the question “what happens on my computer when I do X?”. We’re releasing the prototype today on GitHub, and we hope you like it! ☀️☀️ This has nothing to do with AI.

diffswap.gif

vmdiff is like git diff but for the whole computer

Untitled design.png

 

The problem: How do you know what happens on your computer?

Pop quiz, what happens on your computer when you install a new piece of software? What does this do?

dd-installer.png

What happens when you run software? When you change a setting? 

image-20230323-034814.png

What happens when you run this command? Does anyone know for sure?

What happens if you take your hands off the keyboard, go outside, and forget that computers were ever inflicted upon your life? Does anything change on your computer?

How do you know?

These are some of the embarrassingly simple questions I asked myself, and the answer I found to all of them was: nobody knows.

Sure, when you install software, you know some things that change, from e.g. the logs the installer shows you, telling you which files it’s put where. But can you say for sure that’s everything that changed? What about if the installer changes another file that it doesn’t log? What if it just quietly adds write permissions to some of your directories? What if it sets some environment variables that are used by another program?

tl;dr

It is really hard right now to say:

“Here is everything that happens when you run this program.”

Currently we’re at:

“Here is some of what happens when you run this program.

Maybe that’s everything? Who can truly say, the universe is so mysterious ✨”

Okay, you don’t know what happens. So what?

So I wonder what secrets are hiding in the shadows 👀

In the last episode of Icarus Labs, our protagonist discovered that a whole lot of software developers had installed Docker Desktop on their macOS laptops, without knowing that this also installed a Linux virtual machine on their computer. Normally that would just feel a bit violating of the sanctity of one’s MacBook, but this Linux virtual machine also happened to allow us to hide malware very convincingly.

That’s why we’re here today. I thought to myself:

Wow, lots of people actually installed this software without knowing it’s a Linux virtual machine. I wonder what else I’ve installed that has secret stuff in it?

In finding the Docker virtual machine, I had to manually rummage through my files, trying to guess where the Docker app stored its config, where it wrote files, which files belonged to it, and so on, like a caveman. It took a long time. But we are no longer in caveman times, you see. We have blog posts, for example.

This blog post, then, is about the tool I made to mass-produce blog posts like the last one.

Seeing everything that changes on your computer

I decided I was sick of this chaotic and lawless world, and wanted to see a comprehensive list of everything that changed on my computer between two points in time. I wanted the sweet reassurance of being able to say “it’s not on my list, so it didn’t happen”.

I figured the simplest way to test it out was to use a virtual machine.

First I’d use the built-in feature of the VM software (e.g. VMware) to take a snapshot (a saved state of the disk and memory), then make my changes (e.g. running software, changing a setting, anything), then take another snapshot.

do something.png

 

I then wanted to see the difference between the snapshots.

“Like git diff”, I thought to myself.

So I of course googled “diff virtual machine snapshot”, and found… nothing.

Somehow, this didn’t already exist

What do you mean nobody’s done this before? Nobody’s wanted a complete list of what happens when you run a program on your computer? Like, it’s okay if I, just some person, don’t know everything that happens when I run software on my computer. But nobody knows? You live like this? The Great Barrier Reef is dying and you don’t know what happens when you install something on your computer?

By which I mean I uh couldn’t believe that this seemingly basic thing hadn’t been done yet, so I tried making it.

Skipping several months of understanding virtual machines in a way that was far too intimate for me, I ended up with a good-enough prototype.

What does it do?

  • Accepts two virtual machine snapshots (vmdk and vmem files)

  • Diffs all files on both disks, line by line (including deleted files). If it’s not in the list, it didn’t happen

  • Diffs memory (running processes, command lines, and environment variables) on Windows

  • Diffs are also available as local directories, so you can search and process them from the terminal (think grep)
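To make the "diff all files, line by line" idea concrete, here's a minimal sketch that diffs two directory trees. It assumes both snapshots' disks have already been mounted or extracted to local directories, which is the hard part and is not shown. The function names are made up for illustration and this is not vmdiff's actual code.

```python
import difflib
from pathlib import Path


def list_files(root: Path) -> set[str]:
    """Return every regular file under root, as paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def read_lines(path: Path) -> list[str]:
    """Read a file as text lines; undecodable bytes are replaced, not fatal."""
    return path.read_text(encoding="utf-8", errors="replace").splitlines()


def diff_trees(before: Path, after: Path) -> dict[str, list[str]]:
    """Map each added, deleted, or modified relative path to a unified diff."""
    old, new = list_files(before), list_files(after)
    result = {}
    for rel in sorted(old | new):
        # A file missing on one side is treated as empty on that side,
        # so added and deleted files show up as all-plus or all-minus diffs.
        old_lines = read_lines(before / rel) if rel in old else []
        new_lines = read_lines(after / rel) if rel in new else []
        if rel in old and rel in new and old_lines == new_lines:
            continue  # present on both sides with identical text
        result[rel] = list(difflib.unified_diff(
            old_lines, new_lines, f"before/{rel}", f"after/{rel}", lineterm=""))
    return result
```

This sketch only compares text, so two binary files that differ only in undecodable bytes could look identical to it. A real tool would also want to compare raw bytes or hashes.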

For example: What happens when you install Docker Desktop?

Hmmm, I’ve always thought Docker was a bit strange. Why do you have to install an extra app that lives in your taskbar? Why can’t you install it with a package manager? Let’s find out.

Step 1: Take the snapshots

  • Get a Virtual Machine

  • Take a “before” snapshot

  • Install Docker Desktop (or whatever you want to test)

  • Take an “after” snapshot

Now you have two snapshots, and that’s all you need 😎😎
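If your VM software happens to be VMware, the snapshot steps can be scripted instead of clicked. Here's a rough sketch that builds the command for VMware's vmrun tool. It assumes vmrun is on your PATH. The .vmx path and snapshot names are placeholders, and none of this is part of vmdiff itself.

```python
import subprocess


def snapshot_command(vmx_path: str, name: str, host_type: str = "ws") -> list[str]:
    """Build the vmrun command that takes a named snapshot of a VM.

    host_type is "ws" for VMware Workstation or "fusion" for VMware Fusion.
    """
    return ["vmrun", "-T", host_type, "snapshot", vmx_path, name]


def take_snapshot(vmx_path: str, name: str, host_type: str = "ws") -> None:
    """Run vmrun and raise CalledProcessError if it reports a failure."""
    subprocess.run(snapshot_command(vmx_path, name, host_type), check=True)


if __name__ == "__main__":
    vmx = "/path/to/test-vm.vmx"  # hypothetical path, replace with your own
    # Only print the commands here; call take_snapshot() to actually run them.
    print(" ".join(snapshot_command(vmx, "before")))
    print(" ".join(snapshot_command(vmx, "after")))
```

You'd call take_snapshot(vmx, "before"), install the thing you're curious about, then call take_snapshot(vmx, "after").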

Step 2: Run vmdiff

image-20230331-041125.png

Point vmdiff at the two snapshots, and thennnn

Step 3: View the disk and memory diffs in-browser

Point your browser at the delightful new website on localhost:5000 to seeee

filetree.gif

Step 4: Wait is that a virtual machine

Hmmm, what’s this?

image-20230322-035652.png

What’s DockerDesktop.vhdx? It turns out that .vhdx is a format for virtual machine disk files. This might lead you to the surprising but incredibly real conclusion that there was a virtual machine on your computer. Who knows what kind of implications that might have?
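Once you have a list of new files, you don't have to spot the suspicious ones by eye. Here's a small sketch that flags anything with a common virtual disk extension. The extension set and the example paths are my own picks for illustration, and a file extension is only a hint, not proof of what's inside.

```python
from pathlib import PureWindowsPath

# Common virtual machine disk image extensions (not an exhaustive list).
VM_DISK_EXTENSIONS = {".vhd", ".vhdx", ".vmdk", ".vdi", ".qcow2"}


def find_vm_disks(paths: list[str]) -> list[str]:
    """Return the paths whose extension looks like a VM disk image."""
    # PureWindowsPath accepts both "/" and "\" as separators.
    return [p for p in paths
            if PureWindowsPath(p).suffix.lower() in VM_DISK_EXTENSIONS]


if __name__ == "__main__":
    added = [  # hypothetical list of files added between two snapshots
        "C:/example/readme.txt",
        "C:/example/DockerDesktop.vhdx",
    ]
    for path in find_vm_disks(added):
        print("possible VM disk:", path)
```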

Diff memory on Windows

You can also see how the running processes have changed between the two snapshots, like this:

processes.gif
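The comparison itself boils down to set arithmetic. Here's a minimal sketch, assuming you've already pulled a process list out of each memory snapshot (that extraction is the difficult part and isn't shown). The Process type and the sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    pid: int
    name: str
    cmdline: str


def diff_processes(before: list[Process], after: list[Process]):
    """Return (started, exited) processes, each sorted by pid."""
    old, new = set(before), set(after)
    started = sorted(new - old, key=lambda p: p.pid)
    exited = sorted(old - new, key=lambda p: p.pid)
    return started, exited


if __name__ == "__main__":
    # Made-up sample data standing in for two parsed memory snapshots.
    before = [Process(4, "system", ""), Process(812, "shell.exe", "shell.exe")]
    after = [Process(4, "system", ""), Process(2040, "helper.exe", "helper.exe --background")]
    started, exited = diff_processes(before, after)
    for p in started:
        print(f"+ {p.pid} {p.name} {p.cmdline}")
    for p in exited:
        print(f"- {p.pid} {p.name} {p.cmdline}")
```

Because the whole record is compared, a process that keeps its pid but changes its command line shows up as one exit plus one start.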

Browse the diffs in detail via terminal

Maybe you want to see every file that mentions “docker”, sorted by frequency? The tree of diffs is also available in plaintext in a local directory, so you can grep, find, sort, etc.

grep.gif
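If you'd rather script that than chain grep and sort, here's a rough Python equivalent of "every file that mentions docker, sorted by frequency". The directory name is a placeholder for wherever the plaintext diffs end up on your machine.

```python
from collections import Counter
from pathlib import Path


def count_mentions(root: Path, term: str) -> list[tuple[str, int]]:
    """Count case-insensitive occurrences of term in each file under root.

    Returns (relative path, count) pairs, most mentions first.
    """
    counts = Counter()
    needle = term.lower()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        hits = text.count(needle)
        if hits:
            counts[path.relative_to(root).as_posix()] = hits
    return counts.most_common()


if __name__ == "__main__":
    diff_dir = Path("path/to/diff-output")  # hypothetical location
    if diff_dir.is_dir():
        for rel, hits in count_mentions(diff_dir, "docker"):
            print(f"{hits:6d}  {rel}")
```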

You’ll notice that it’s not just the contents of the file that are diffed, but also the metadata, e.g. permissions, timestamps, or whatever “extended file attributes” the OS wants to put on them. For example, here’s a diff showing the macOS “quarantine” attribute changing:

image-20230213-023645.png

Quarantine is used to enable features such as showing you popup boxes that say “this file was downloaded from the internet, are you sure you want to open it?”
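To give a feel for what a metadata diff involves, here's a small sketch that compares a few stat fields between two copies of a file. It only covers permissions, size, and modification time. Extended attributes like quarantine are left out, because Python's standard library doesn't expose them on every platform. The function names are made up for illustration.

```python
import os
import stat
from pathlib import Path


def metadata(path: Path) -> dict[str, object]:
    """Collect a few metadata fields without following symlinks."""
    st = os.stat(path, follow_symlinks=False)
    return {
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
    }


def diff_metadata(before: Path, after: Path) -> dict[str, tuple]:
    """Return {field: (old value, new value)} for every field that differs."""
    old, new = metadata(before), metadata(after)
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}
```

An installer that quietly loosens permissions on a file would show up here as a changed "permissions" entry, even if the file's contents are untouched.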

Bonus feature: Gratuitously fancy CLI

I mean would you look at this?

image-20230403-051253.png

image-20230403-051401.png

It's got tables.

What can you use vmdiff for?

I don’t even know all the uses. There are so many reasons why someone might want to know everything that’s changed on a computer between two points in time.

Here are some examples of when people might use it:

  • Security researchers

    • To analyse what software does (my use case)

      • e.g. “How does this program store its config data/authentication cookies/assets?”

    • Analyse what malware does, of course

      • Existing malware sandboxes list what the malware does (e.g. syscalls, network requests)

        • vmdiff lists the result of running the malware

        • Both are good, just different

  • Software engineers

    • To check that nothing has changed on a computer, e.g. during software testing

    • To diagnose and debug complex problems, when you really need more information on what’s going on inside your computer to figure out why it’s happening

How does it work?

vmdiff diagram.png

If that simple diagram somehow didn’t answer all your questions, more details are on GitHub.

Future work/contributing

  • I’m not going to be working on/maintaining vmdiff for at least 12 months, maybe ever

  • I’d love for someone to steal this genius idea, either forking the prototype, or making their own

Try the prototype out, see what you find!

If you ever need to find out everything that happens on your computer when you do something, the prototype is available on GitHub.

1 comment

Jordan Bertasso
April 6, 2023

So awesome!
