Duplicate finder performance (2022 edition)

Hard to believe it has been almost exactly four years since my previous duplicate finder performance comparison (2018 edition)! Life has been very busy since, so dupd has not had a release with new functionality (I did release dupd 1.7.1 last year but it only contains build and test fixes, no functionality difference from 1.7).

It’s been so long that it’s time to do another test run even though dupd has not changed, just to see what the duplicate finder performance landscape looks like in 2022.

This time I will be testing:

  • dupd 1.7.1 (functionally same as 1.7 from 2018 so no real update)
  • rmlint 2.9.0 (runner-up from previous test, updated version)
  • jdupes 1.21.0 (third place from previous test, updated version)
  • rdfind 1.4.1 (fourth place from previous test, updated version)
  • fclones 0.27.3 (new one which I had not tried before)
  • yadf 1.0 (new one which I had not tried before)

I’m no longer testing the following ones, as they were too slow last time: duff, fdupes, fslint.

If you know of any other promising duplicate finder please let me know!

The files

The file set has a total of 154205 files. Of these, 18677 have unique sizes and 1216 were otherwise ignored (zero-sized or not regular files). This leaves 134312 files for further processing. Of these, there are 44926 duplicates in 13828 groups (and thus, 89386 unique files). It is the exact same data set as in the 2018 performance test run.
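To make the counts above concrete, here is a minimal sketch of the size-based prefilter they imply: a file whose size is shared with no other file cannot have a duplicate, so only files in same-size groups need their contents compared. The function name `size_candidates` and its input shape are hypothetical, chosen for illustration; this is not the code of dupd or of any tool tested here.

```python
from collections import defaultdict

# Counts quoted in the text above.
TOTAL = 154205
UNIQUE_SIZES = 18677
IGNORED = 1216
DUPLICATES = 44926

# Files left after dropping unique-size and ignored files.
remaining = TOTAL - UNIQUE_SIZES - IGNORED      # 134312
# Files left once the duplicates are set aside.
unique_files = remaining - DUPLICATES           # 89386


def size_candidates(files):
    """Return the paths that share a non-zero size with at least one other file.

    `files` is an iterable of (path, size_in_bytes) pairs. Zero-sized files
    are skipped, mirroring the "otherwise ignored" category in the text.
    """
    by_size = defaultdict(list)
    for path, size in files:
        if size > 0:
            by_size[size].append(path)
    # Only groups with two or more members can contain duplicates.
    return [p for group in by_size.values() if len(group) > 1 for p in group]
```

Only the paths returned by `size_candidates` would need to be read and compared, which is why the number of unique sizes matters so much for the total run time.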

The files are all “real” files. That is, they are all taken from my home file server instead of artificially constructed for the benchmark. There is a mix of all types of files such as source code, documents, images, videos and other misc stuff that accumulates on the file server.

Same as last time, I have the same set of files on an SSD and a HDD on the same machine, so I’ll run each program against both sets.

The cache

Same as last time, I’ll run each program and device combination with both a warm and a cold (flushed) file cache.

The methodology

The methodology is identical to the 2018 test (I’m using the same script to run each program). To recap:

For each tool/media (SSD and HDD) combination, the runs were done as follows:

  1. Clear the filesystem cache (echo 3 > /proc/sys/vm/drop_caches).
  2. Run the scan once, discarding the result.
  3. Repeat 5 times:
    1. For the no-cache runs, clear the cache again.
    2. Run and time the tool.
  4. Report the average of the above five runs as the result.
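The steps above can be sketched roughly as follows. This is not the actual script used for these runs, just a minimal illustration under stated assumptions: the names `drop_caches`, `time_command` and `benchmark` are hypothetical, writing to `/proc/sys/vm/drop_caches` requires root on Linux, and the timeout handling visible in the raw data is left out.

```python
import statistics
import subprocess
import time


def drop_caches():
    """Clear the filesystem cache (step 1). Needs root on Linux."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_command(argv):
    """Run one scan, discard its output, return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(argv, cold_cache, runs=5, clear=drop_caches, timer=time_command):
    """Return (average, individual times) for one tool/media combination.

    `clear` and `timer` are injectable so the loop can be exercised
    without root access or a real duplicate finder installed.
    """
    clear()        # 1. clear the filesystem cache
    timer(argv)    # 2. run the scan once, discarding the result
    times = []
    for _ in range(runs):              # 3. repeat `runs` times
        if cold_cache:
            clear()                    # 3.1 no-cache runs: clear again
        times.append(timer(argv))      # 3.2 run and time the tool
    return statistics.mean(times), times   # 4. report the average
```

With this sketch, a cold-cache HDD run of dupd would be `benchmark(["dupd", "scan", "-q", "-p", "/hdd/files"], cold_cache=True)`, executed as root.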

The command lines and individual run times are included at the bottom of this article.

Results

1. HDD with cache

HDDcache

2. HDD without cache

HDDnocache

3. SSD with cache

SSDcache

4. SSD without cache

SSDnocache

Summary

As always, different tools excel in different scenarios so there isn’t any one that wins them all.

For an overall ranking, let’s average the four finishing positions like I did last time:

Tool      Average   Rankings (HDD cache, HDD no cache, SSD cache, SSD no cache)
dupd      1.8       1, 1, 2, 3
fclones   2.3       3, 4, 1, 1
yadf      3.5       2, 5, 3, 4
rmlint    3.8       6, 3, 4, 2
rdfind    4.5       5, 2, 6, 5
jdupes    5.3       4, 6, 5, 6

This time, dupd was fastest in both HDD scenarios and fclones was fastest in both SSD scenarios. dupd gets the best average rank overall thanks to doing slightly better than fclones in the categories each didn’t win.
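The ranking above can be reproduced from the average times listed in the raw data at the bottom of this article. The sketch below is only an illustration of the arithmetic; the names `MEAN_TIMES`, `finishing_positions` and `average_rank` are made up for this example.

```python
# Average run times in seconds, copied from the raw data in this article.
# Scenario order: HDD cache, HDD no cache, SSD cache, SSD no cache.
MEAN_TIMES = {
    "dupd":    [3.266, 49.266, 3.224, 17.774],
    "fclones": [3.83, 123.584, 2.382, 10.036],
    "yadf":    [3.678, 329.676, 3.26, 24.56],
    "rmlint":  [26.792, 95.686, 6.01, 17.316],
    "rdfind":  [8.814, 91.666, 8.75, 27.822],
    "jdupes":  [5.746, 333.698, 6.138, 32.428],
}


def finishing_positions(times):
    """Map each tool to its finishing position (1 = fastest) per scenario."""
    tools = list(times)
    scenarios = len(next(iter(times.values())))
    positions = {tool: [] for tool in tools}
    for i in range(scenarios):
        ordered = sorted(tools, key=lambda t: times[t][i])
        for place, tool in enumerate(ordered, start=1):
            positions[tool].append(place)
    return positions


def average_rank(times):
    """Average the finishing positions of each tool across all scenarios."""
    return {tool: sum(p) / len(p) for tool, p in finishing_positions(times).items()}


if __name__ == "__main__":
    # The table in the text shows these averages rounded to one decimal.
    for tool, avg in sorted(average_rank(MEAN_TIMES).items(), key=lambda kv: kv[1]):
        print(f"{tool:8s} {avg:.2f}")
```

Averaging positions rather than times keeps one very slow scenario (such as the cold-cache HDD runs) from dominating the overall result.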

The other newcomer, yadf, didn’t excel anywhere but scored consistently enough to come in third place.

In summary, some strong new competition from fclones in the duplicate finding space. I am happy to see that dupd still came in first place (although barely) even though it hasn’t really evolved in the last four years.

I do intend to release a dupd 2.0 at some point but haven’t really had any time to get it into shape. Whenever I get some time to do that, I’ll do another performance test run.

The Raw Data

-----[ yadf : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 328.81
Running 5 times (timeout=3600): yadf -H /hdd/files
Run 0 took 3.77
Run 1 took 3.66
Run 2 took 3.64
Run 3 took 3.67
Run 4 took 3.65
AVERAGE TIME:
3.678


-----[ rmlint : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 92.61
Running 5 times (timeout=3600): rmlint -o fdupes /hdd/files
Run 0 took 26.65
Run 1 took 26.81
Run 2 took 26.73
Run 3 took 26.86
Run 4 took 26.91
AVERAGE TIME:
26.792


-----[ jdupes : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 333.15
Running 5 times (timeout=3600): jdupes -A -H -r -q /hdd/files
Run 0 took 5.74
Run 1 took 5.71
Run 2 took 5.85
Run 3 took 5.71
Run 4 took 5.72
AVERAGE TIME:
5.746


-----[ dupd : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 49.51
Running 5 times (timeout=3600): dupd scan -q -p /hdd/files
Run 0 took 3.31
Run 1 took 3.29
Run 2 took 3.27
Run 3 took 3.22
Run 4 took 3.24
AVERAGE TIME:
3.266


-----[ rdfind : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 92.94
Running 5 times (timeout=3600): rdfind -n true /hdd/files
Run 0 took 8.89
Run 1 took 8.85
Run 2 took 8.78
Run 3 took 8.77
Run 4 took 8.78
AVERAGE TIME:
8.814


-----[ fclones : CACHE KEPT : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 121.67
Running 5 times (timeout=3600): fclones group /hdd/files
Run 0 took 3.89
Run 1 took 3.8
Run 2 took 3.82
Run 3 took 3.82
Run 4 took 3.82
AVERAGE TIME:
3.83


-----[ yadf : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 330.77
Running 5 times (timeout=3600): yadf -H /hdd/files
Run 0 took 327.95
Run 1 took 328.46
Run 2 took 329.36
Run 3 took 329.6
Run 4 took 333.01
AVERAGE TIME:
329.676


-----[ rmlint : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 107.27
Running 5 times (timeout=3600): rmlint -o fdupes /hdd/files
Run 0 took 101.8
Run 1 took 99.69
Run 2 took 97.9
Run 3 took 96.67
Run 4 took 82.37
AVERAGE TIME:
95.686


-----[ jdupes : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 333.89
Running 5 times (timeout=3600): jdupes -A -H -r -q /hdd/files
Run 0 took 333.53
Run 1 took 333.49
Run 2 took 335.22
Run 3 took 332.42
Run 4 took 333.83
AVERAGE TIME:
333.698


-----[ dupd : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 49.42
Running 5 times (timeout=3600): dupd scan -q -p /hdd/files
Run 0 took 49.4
Run 1 took 49.26
Run 2 took 49.29
Run 3 took 49.11
Run 4 took 49.27
AVERAGE TIME:
49.266


-----[ rdfind : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 92.08
Running 5 times (timeout=3600): rdfind -n true /hdd/files
Run 0 took 91.72
Run 1 took 91.66
Run 2 took 91.6
Run 3 took 91.6
Run 4 took 91.75
AVERAGE TIME:
91.666


-----[ fclones : CACHE CLEARED EACH RUN : /hdd/files]------
Running one untimed scan first...
Result/time from untimed run: 121.85
Running 5 times (timeout=3600): fclones group /hdd/files
Run 0 took 125.28
Run 1 took 126.19
Run 2 took 123.47
Run 3 took 121.69
Run 4 took 121.29
AVERAGE TIME:
123.584


-----[ yadf : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 24.14
Running 5 times (timeout=3600): yadf -H /ssd/files
Run 0 took 3.35
Run 1 took 3.26
Run 2 took 3.22
Run 3 took 3.23
Run 4 took 3.24
AVERAGE TIME:
3.26


-----[ rmlint : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 17.43
Running 5 times (timeout=3600): rmlint -o fdupes /ssd/files
Run 0 took 6.23
Run 1 took 6.17
Run 2 took 5.8
Run 3 took 5.64
Run 4 took 6.21
AVERAGE TIME:
6.01


-----[ jdupes : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 33.1
Running 5 times (timeout=3600): jdupes -A -H -r -q /ssd/files
Run 0 took 6.17
Run 1 took 6.09
Run 2 took 6.14
Run 3 took 6.17
Run 4 took 6.12
AVERAGE TIME:
6.138


-----[ dupd : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 17.93
Running 5 times (timeout=3600): dupd scan -q -p /ssd/files
Run 0 took 3.27
Run 1 took 3.18
Run 2 took 3.25
Run 3 took 3.29
Run 4 took 3.13
AVERAGE TIME:
3.224


-----[ rdfind : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 28.14
Running 5 times (timeout=3600): rdfind -n true /ssd/files
Run 0 took 8.76
Run 1 took 8.75
Run 2 took 8.73
Run 3 took 8.71
Run 4 took 8.8
AVERAGE TIME:
8.75


-----[ fclones : CACHE KEPT : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 10.1
Running 5 times (timeout=3600): fclones group /ssd/files
Run 0 took 2.42
Run 1 took 2.39
Run 2 took 2.36
Run 3 took 2.32
Run 4 took 2.42
AVERAGE TIME:
2.382


-----[ yadf : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 24.65
Running 5 times (timeout=3600): yadf -H /ssd/files
Run 0 took 24.65
Run 1 took 24.7
Run 2 took 24.44
Run 3 took 24.48
Run 4 took 24.53
AVERAGE TIME:
24.56


-----[ rmlint : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 17.29
Running 5 times (timeout=3600): rmlint -o fdupes /ssd/files
Run 0 took 17.31
Run 1 took 17.36
Run 2 took 17.33
Run 3 took 17.3
Run 4 took 17.28
AVERAGE TIME:
17.316


-----[ jdupes : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 32.4
Running 5 times (timeout=3600): jdupes -A -H -r -q /ssd/files
Run 0 took 32.51
Run 1 took 32.4
Run 2 took 32.28
Run 3 took 32.47
Run 4 took 32.48
AVERAGE TIME:
32.428


-----[ dupd : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 17.82
Running 5 times (timeout=3600): dupd scan -q -p /ssd/files
Run 0 took 17.73
Run 1 took 17.78
Run 2 took 17.83
Run 3 took 17.79
Run 4 took 17.74
AVERAGE TIME:
17.774


-----[ rdfind : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 27.83
Running 5 times (timeout=3600): rdfind -n true /ssd/files
Run 0 took 27.77
Run 1 took 27.78
Run 2 took 27.78
Run 3 took 27.99
Run 4 took 27.79
AVERAGE TIME:
27.822


-----[ fclones : CACHE CLEARED EACH RUN : /ssd/files]------
Running one untimed scan first...
Result/time from untimed run: 10.01
Running 5 times (timeout=3600): fclones group /ssd/files
Run 0 took 10.06
Run 1 took 10.01
Run 2 took 10.01
Run 3 took 10.04
Run 4 took 10.06
AVERAGE TIME:
10.036

Summer 2019 Monterey Trip

Last year when I sailed to Monterey there was no wind so the trip was nearly all motoring. This weekend was a much nicer round trip to Monterey, 100% sailing this time!

Day 1

Saturday morning was typical Santa Cruz summer gloom, heavy fog and no wind. Did not look promising. By 10am, still nothing. Around 10:30am felt a little bit of wind, not much but I wanted to go sailing and see. By the time I was out by the mile buoy (“SC”) there was enough wind to sail. Just barely though, about 8 kts. I decided to try for Monterey.

The wind never really picked up today, best was around 10 kts but it was steady all day so it worked out! Beam reach on starboard tack all the way.

I arrived at Breakwater Cove Marina in Monterey at 4:30pm, having sailed 23nm in about 5h:30m for an average speed of 4.1 kts.

SC2Monterey

Tied up for the night at Breakwater Cove Marina:

monterey

Day 2

Sunday forecast was showing 5 kts. Oh no… looks like a day of motoring.

I waited as long as possible for the fog to clear and some wind to show up, leaving at 11am. Outside the breakwater there was 7-8 kts of wind. Maybe I can try sailing? Tacking upwind this time, it took a few tacks to get to the R “2” buoy (near Point Pinos lighthouse) and then a long port tack, close hauled all the way to Santa Cruz.

For most of the trip, wind was around 8-10 kts. Enough to keep moving, at least. About 4 nm from Santa Cruz the wind finally started to pick up. First 15 kts for a while, then up to 18. I hove to for a bit to reef the main and continued.

After going slow all day, the final few miles were a fun ride! Wind ended up increasing to 22 kts near the Point Santa Cruz lighthouse. Saw 8 kts SOG on the final stretch, surfing downwind.

Total for the day was 29 nm in 7h:30m for an average of 3.9 kts.

Monterey2SC

2018 Mini Cruise

This month I had the very rare occasion of having some free time so I wanted to get a few consecutive days of sailing done. How about a mini cruise?! October is typically a great sailing month here on the California coast but November is a bit iffy. As winter approaches, the weather alternates between dangerously large swell and glassy seas, with not too many days in the middle.

Still, I had free time now so best to take advantage of it. On the first week the weather was not good, with swells in the 10-15 ft range and winds gusting to 35 knots. The second week looked good though. Too calm for sailing, but that’s how it goes in the fall here.

My plan was to head north from Santa Cruz to San Francisco and then decide based on weather. I had some hope of making it to Drake’s Bay but the weather forecast decided against that. On the day I would’ve been there, forecast predicted gusty winds from the east which didn’t sound very safe. The anchorage there is best in the usual N or NW winds. So instead, I went to Angel Island which is always nice.

Here are notes and photos from each day:

  1. Santa Cruz to Half Moon Bay – 49 nm
  2. Half Moon Bay to Pier 39 (SF) – 32 nm
  3. Pier 39 (SF) to Angel Island – 5.2 nm
  4. At Angel Island
  5. Angel Island to Half Moon Bay – 30 nm
  6. Half Moon Bay to Santa Cruz – 50 nm

So 166 nautical miles in total. Unfortunately, most of that was motoring.

Here is some video of the trip: https://youtu.be/Ndyu3lYQSc8

All in all, despite all the motoring, a fun trip! This was the longest continuous cruise I’ve done so that was a nice experience. I was mostly self-sufficient on the boat during these six days. I didn’t plug in to shore power or fill water tanks at any point. I did buy 4.9 gallons of diesel in Half Moon Bay on day 5.

Of all the improvements I’ve made to the boat, the autopilot is the most significant. I never could have attempted a trip like this without it. Standing at the helm for six to ten hours a day would be too much to even consider. Now with the autopilot, I only had to steer a few minutes a day, in and out of the harbors/anchorages.

2018 Mini Cruise (Day 6)

The final day of the trip was the longer stretch of coast back to Santa Cruz. Going south is always easier as the wind and waves are at our back. Except today there was no wind and no swell, just glassy seas.

N winds 5 to 10 kt. Wind waves 2 ft or less. Mixed swell
NW 1 to 2 ft at 9 seconds and S 2 to 3 ft at 16 seconds. Patchy
smoke.

Would have been nice to have the 10 knots of wind from the forecast, but it was mostly in the range of 0-5 all day. So, another day of nearly all motoring, unfortunately.

The seas were so calm that the jellyfish were on the surface in large numbers, always fun to see. Also saw one fairly small (maybe 4ft) Mola Mola sunning itself.

Didn’t take any photos but I have some video of the jellyfish which I’ll process later.

Arrived in Santa Cruz harbor about two hours after sunset.

Total for the day: 50 nautical miles.

ruta6

2018 Mini Cruise (Day 5)

2018-11-12

The plan for today was uncertain. Having fouled and lost the anchor in Half Moon Bay last time, I didn’t want to anchor out there again. So my plan was to leave early so I could sail all the way to Santa Cruz today if necessary.

With that plan, I left Angel Island before sunrise. The tides cooperated and this allowed me to hit max ebb at the Golden Gate Bridge for a fast ride out on the current conveyor belt.

NE winds 10 to 20 kt. Wind waves 2 to 4 ft.
NW swell 2 to 3 ft at 10 seconds. Areas of smoke.

DSC18A_5453

Visibility was low between the wildfire smoke and the usual fog. Made for a dramatic sunrise:

DSC18A_5457

That’s Alcatraz in the hazy smoke and immediately behind it, the San Francisco skyline (completely hidden in the smoke).

The ebb pushed me out the gate fast (7+ kts SOG). There is often heavy shipping traffic in this area and today was no different. Finally got some good use out of the AIS system! After passing the Golden Gate I could see on the chartplotter that a large tanker was heading in even though I didn’t see the ship until much later.

The seas were very flat so I was able to cut across the south bar earlier than usual. Plenty of dolphins in this area!

Had some really great sailing for a while. Ten knots of wind from the east (very rare direction) made for a beam reach down the coast. A whale quietly surfaced just a boat-length away at one point!

After a while the winds died down as forecast. From there on it was just motor-sailing down to Half Moon Bay. Approaching the PP buoy, I called the harbor to see if any space was available. If not, I’d continue to Santa Cruz for a midnight arrival. However, this time they had some space! So I diverted into the harbor and tied off.

Total for the day: 30 nautical miles.

ruta5

2018 Mini Cruise (Day 4)

No sailing today. Staying a full day and another night at Angel Island.

This is the cruising life! Spent time in the morning just reading and taking in the nice views (although, still way too much smoke in the air).

DSC18A_5375

Then I kayaked over to the island and did some hiking and wandering. The cafe was open today (it was closed yesterday due to smoke, said a sign) so I had lunch there.

DSC18A_5391

The raccoons at Angel Island have no fear of humans, they wander around the crowds of people, fun to watch.

Eventually I paddled back to the boat and did a bit of maintenance. I dug out my other anchor and hooked it up just in case it’s needed. I also noticed the top port cheekblock for the lazy jack has come loose. Can’t fix that here, it’ll have to wait until I climb the mast.

After sunset, a large group of pelicans (over a hundred, I’d say) had a feeding frenzy right in the cove between the boats. It was really fun to see them dive for fish over and over. It was almost dark so the photographs didn’t come out well, but here’s one:

DSC18A_5421

Next day I wanted to get an early start so then it was time to sleep early.

2018 Mini Cruise (Day 3)

After two long days, today was just a short hop over to Angel Island.

Thus, I didn’t have to get up before sunrise and I even had time for a warm breakfast on Pier 39.

The day was windy and choppy, typical San Francisco summer weather, although it is November. So I got a little bit of sailing in and then it was time to turn into Ayala Cove.

It was the first time I had to pick up the mooring balls (fore and aft) singlehanded but it all went smoothly.

After settling in, I went to the island on an inflatable kayak, walked around for a bit and then paddled back to the boat.

Total for the day: 5.2 nautical miles.

ruta3

2018 Mini Cruise (Day 2)

2018-11-09

The second day started way earlier than I had hoped. I woke up at 12:30am to the sound of water slapping against the hull. And wind. That sounds like a lot of wind! Checking the instruments, 22 knots!

Will the anchor hold? To make matters worse, the wind had turned around making the beach north of the anchorage a lee shore.

I was up until 4am monitoring the situation. The wind kept blowing at 20-22 knots and also making large direction changes every few minutes. Not a fun night!

I had the anchor watch running on the primary chartplotter (which I can monitor on a tablet from the comfort of the aft cabin) and also on a handheld GPS as a backup. Fortunately the anchor never budged, the boat remained in a tight circle all night.

Around 4am the wind decreased to 12-15 knots and I finally was able to get a couple hours of sleep.

My plan for the day had been to leave at 7:30am in order to cross the Golden Gate bridge on a favorable tide. So I was up at 7am to prepare and hoist anchor.

Well... long story short, looks like the anchor tangled itself with something very heavy. No wonder it never moved an inch even with the repeated 180-degree wind shifts and 20+ knots. I spent three hours trying to raise it. Even had a rising tide to help. Eventually I had to concede defeat and abandon the anchor and chain on the bottom.

E winds 5 to 15 kt with gusts up to 25 kt...
becoming NW this afternoon.
Wind waves 3 to 4 ft this morning...becoming 2 ft or less.
NW swell around 3 ft at 10 seconds. Patchy smoke this morning.

The rest of the trip, once I finally got moving, was uneventful. Heavy fog at sea and heavy smoke over land (from the wildfires) meant very little visibility in either direction. Flat seas and little to no wind meant nothing but motoring today.

Given the delay in the morning, instead of arriving at the Golden Gate at a moderate flood, I arrived at max ebb, -4.7 knots of current.

The current runs strongest in the deep water channel. However, the shallow area just south of the bridge often has a counter current so I tried to hide there on the approach. Instead of -4.7 knots just a few hundred feet to the left, I actually had 2+ knots of counter current to ride, resulting in 8 knots SOG towards the bridge! Nice.

Eventually, reaching the bridge, I had to get into the ebb at which point my speed over ground dropped from 8 to barely above 1.

ruta2b

Once past the bridge, it was a short motor into Pier 39 where I stayed for the night.

DSC18A_5363

Total for the day: 32 nautical miles.

ruta2

2018 Mini Cruise (Day 1)

2018-11-08

The first day was to be the longest and toughest. Heading north from Santa Cruz means the wind will (nearly always) be on the nose and the swell is against us. This is the typical “bash” of traveling north on the California coast.

The weather had been rough for many days prior to this (swells over 10 ft in the 9-10 second range). As winter approaches, the coastal weather here tends to be either glassy calm or boisterous to stormy.

Today the swell was down to a more reasonable level and the forecast over the next 5-7 days looked nice, so the weather window was open. Time to go!

NW winds 10 to 20 kt with gusts up to 25 kt.
Wind waves 3 to 4 ft. NW swell 4 to 7 ft at 12 seconds.

Being November, the days are short, so I wanted to start as early as possible. I left about an hour before sunrise for the long motor-sail to Half Moon Bay.

Had a little bit of everything in terms of conditions on this first day. Long calm stretches, sometimes windy sometimes not. Also an hour or so of a very wet ride in the bumpier swell.

Davenport, just north of Santa Cruz:

DSC18A_5344

Pigeon Point Lighthouse:

DSC18A_5345

Approaching Half Moon Bay (really, Pillar Point) harbor I called to ask about guest slip availability. Unfortunately, none available! Fortunately there is a large anchorage between the inner and outer breakwaters so that’s where I went for the night.

By 4:30pm I was anchored in the outer harbor. It was very calm inside the harbor but I’m always a bit wary of sleeping while anchored so I let out 100 feet of chain in 12 feet of water.

Due to the large wildfires in California this week, the air was filled with smoke, leading to dramatic sunsets:

DSC18A_5360

Total for the day was 49 nautical miles.

ruta1