
comm, rsync, Fslint, Meld - compare stuff


sunrat


I love finding new commands that make doing things simple. Today I was using rsync (without --delete) to sync my music collection from one computer to another. It worked great but I ended up with 303 directories on one side and 304 on the other. Hmmm...

I wasn't about to scroll through both directories to find the culprit, so I tried Kompare and Meld, but neither did what I wanted. Off to StackOverflow, where I found the comm command. This command lists the entries that are in dir1 but not in dir2:

comm -23 <(ls dir1 | sort) <(ls dir2 | sort)

From man comm:

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

-1 suppress column 1 (lines unique to FILE1)

-2 suppress column 2 (lines unique to FILE2)

-3 suppress column 3 (lines that appear in both files)
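The column options combine, so the same idea covers all three questions: what is only on the first side, what is only on the second, and what is on both. Here is a minimal sketch of that. The function names are made up for illustration, and it uses a temporary file plus `-` (standard input) instead of `<(...)` so that it also runs in a plain sh, not only in bash.

```shell
# names_in DIR: names directly inside DIR, one per line, in byte order.
# (Like any ls-based listing, this breaks on names that contain newlines.)
names_in() {
    ls -A "$1" | LC_ALL=C sort
}

# only_in_first DIR1 DIR2: names present in DIR1 but missing from DIR2.
# -23 suppresses column 2 (unique to DIR2) and column 3 (common to both).
only_in_first() {
    tmp=$(mktemp) || return 1
    names_in "$2" > "$tmp"
    names_in "$1" | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}

# in_both DIR1 DIR2: names present in both directories.
# -12 suppresses columns 1 and 2, leaving only the common lines.
in_both() {
    tmp=$(mktemp) || return 1
    names_in "$2" > "$tmp"
    names_in "$1" | LC_ALL=C comm -12 - "$tmp"
    rm -f "$tmp"
}
```

Calling `only_in_first dir1 dir2` gives the same kind of list as the one-liner, and swapping the arguments gives the other direction. LC_ALL=C is set on both sort and comm because comm expects its input sorted in the same order it uses itself.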

 

I also inadvertently copied stacks of files that I had renamed on one side but not the other, so I ended up with multiple copies. For this I discovered Fslint, a GUI duplicate-finding program which did a fine job. It found hundreds of files which were the same but had different names. I believe it matches files by their md5 sums.
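For anyone without a GUI handy, the checksum idea can be roughly approximated on the command line. This is only a sketch of the general approach, not a description of how Fslint works internally. It assumes GNU coreutils (md5sum, and uniq with -w and --all-repeated), and find_dupes is a made-up name.

```shell
# find_dupes DIR: print groups of files under DIR that share an MD5 checksum.
# Each output line is "<checksum>  <path>"; groups are separated by a blank line.
find_dupes() {
    find "$1" -type f -exec md5sum {} + |
        LC_ALL=C sort |
        uniq -w32 --all-repeated=separate
}
```

The sort puts identical checksums next to each other, and uniq -w32 compares only the first 32 characters of each line, which is the MD5 hex digest. Files that appear in the same group have the same content regardless of their names.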


securitybreach

Very cool :thumbsup:

I am pretty sure that I have used the comm command in the past, but it was a long time ago. I will have to read up on it again.


I'm making a note of this! I certainly should have used it when I was getting ready to install a new openSUSE and was going to create all new partitions. I copied the contents of my /home to USB but was in a hurry, very carelessly didn't check the results closely, and wound up losing a lot of stuff. (The problem was that I copied through file manager software, not the command line, and some weird error caused the process to terminate before everything had been copied.) It's made me paranoid about checking my copies, but your example will make it easy.
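For a backup like that, the files are usually nested several levels deep, so a plain ls of the top level is not enough. One possible way to extend the comm trick is to feed it recursive listings from find. This is a sketch under that assumption, with made-up function names. It only checks that each path exists on the other side, not that the contents are identical.

```shell
# paths_under DIR: every path below DIR, relative to DIR, in byte order.
paths_under() {
    (cd "$1" && find . | LC_ALL=C sort)
}

# missing_from_copy SRC DST: paths that exist under SRC but not under DST.
missing_from_copy() {
    tmp=$(mktemp) || return 1
    paths_under "$2" > "$tmp"
    paths_under "$1" | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}
```

If `missing_from_copy /home/user /path/to/usb-copy` prints nothing, every file and directory name in the source also exists in the copy. Both paths here are placeholders.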

Edited by ebrke

For completeness, I'd better describe what I did after this. The comm command helped in finding directories which were in one directory but not the other, and Fslint found the files which were the same but had different names. However, I still had hundreds of duplicated files which were the same music but had edited tags and filenames. I ran Meld, which found many of these but crashed when I was partway through checking and deleting the unwanted versions. It also took hours to finish comparing. There were over 200GB of files on each side to check through.

So back to the CLI and rsync to the rescue! Well, sort of. For this I found a brilliant script at StackExchange called diff-dirs, which listed every file that differed between the two sides and took only seconds to do it. I still had to spend a couple of hours manually deleting files in a file manager while referring to the list, but this took less time than Meld needed just to do the check. There's probably a way of automating the deletions as well, but I wanted to manually check the final result.

 

Here's a link to the diff-dirs script on StackExchange - https://unix.stackexchange.com/a/463214
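The linked script is not reproduced here. As an independent sketch of the general idea, rsync can be run in dry-run mode so that it only reports what it would change. The function name list_diffs is made up, and the flags are standard rsync options: -r recurses, -c compares files by checksum, -n makes it a dry run, --delete also reports files that exist only on the second side, and --itemize-changes prints one line per affected path.

```shell
# list_diffs DIR1 DIR2: show what rsync would transfer or delete to make
# DIR2 match DIR1, without touching either directory (-n is a dry run).
list_diffs() {
    rsync -rcn --delete --itemize-changes "$1/" "$2/"
}
```

Nothing is copied or deleted because of -n. Note that -c reads every file on both sides to compute checksums, so on a large collection it will be slower than a comparison based on size and modification time.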

