EMULAB Forum


Topic: to thread or not to thread that is the question

Roman

  • Global Moderator
to thread or not to thread that is the question
« on: 11 December 2024, 09:36 »

Sorry, no profiler news at the moment.
Oddi's remark about slowness when scanning Progetto snaps is keeping me a bit busy these days. I have already found some hash lookups that can be optimized a bit, and I also wonder how threading influences the scanning speed.
Now John IV would say: I'm getting 1 second for a full scan, so threading rocks. Yes, it does... when you're on an SSD. When you're on an HDD, too many threads may lead to too many seek operations, which may slow the process down.
In the case of an unpacked Progetto collection (we're talking about > 300000 files), each file's CRC32 needs to be calculated, which means a lot of seeking, reading and calculating (even if the actual calculation is done quickly). Using too many threads may slow the process down, so I'm currently benchmarking to find the best value. I'll keep you updated on the results. Even if I see that too many threads slow it down, I still need a way to determine a good value (OK, as a last resort I can show an input box where you simply define it ;-))
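A rough sketch of what such a benchmark could look like, in Python rather than the tool's own code. The function names, the chunk size and the list of thread counts are assumptions made up for illustration; only the idea (CRC32 every file, vary the thread count, compare timings) comes from the post above.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, read in chunks to keep memory use flat."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def time_scan(paths, workers):
    """Hash every file using `workers` threads; return (seconds, crc list)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        crcs = list(pool.map(file_crc32, paths))
    return time.perf_counter() - start, crcs


def benchmark(root, thread_counts=(1, 2, 4, 8, 16, 28)):
    """Time a full scan of `root` once per candidate thread count."""
    paths = [p for p in Path(root).rglob("*") if p.is_file()]
    return {n: time_scan(paths, n)[0] for n in thread_counts}
```

Note that after the first pass the operating system will probably serve many files from its cache, so later passes can look faster than they would on a cold disk. For numbers that reflect real seeking, each thread count would need a cold cache or a different part of the collection.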


Roman

Re: to thread or not to thread that is the question
« Reply #1 on: 11 December 2024, 21:03 »

OK, some quick test runs are done, on an HD.

Scenario 1: typical MAME style, lots of sets with a few files per set, compressed
Scenario 2: typical Progetto style, a small number of sets with a huge number of files per set, decompressed

Using, let's say, 28 threads versus 1 thread for Scenario 1: the more threads, the faster it gets. Good, end of story.

For Scenario 2, you run into a real speed decrease as soon as you use more than 1 thread. In a concrete example of ~40000 files spread over 12 sets, we're talking about 5 minutes (1 thread) versus 22 minutes (28 threads).

Now you'd say: OK, when you have decompressed files, use 1 thread. But sorry, it's not that easy. It depends on the number of files per set, the file sizes and how they are stored on the HD. For Scenario 1 in decompressed form, you can easily get better results with more threads... *sigh*

The threads work on sets, so for Scenario 2 the scanner tries to scan all 28 sets in parallel (OK, there are only 12 available), each with lots of files in it, which leads to massive seeking overhead.
For Scenario 1, compressed, it scans 28 sets (which means just 28 files) in parallel, which seems to work pretty well on an HD.
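To illustrate the "threads work on sets" behaviour, here is a minimal Python sketch, not the actual scanner code. `scan_collection`, `scan_set` and the `scan_file` callback are hypothetical names; the sketch only shows that each worker takes one whole set and walks its files in order, so 12 sets can never keep more than 12 threads busy.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_set(set_name, files, scan_file):
    """One worker handles one set and scans its files one after another."""
    return set_name, [scan_file(f) for f in files]


def scan_collection(sets, scan_file, max_threads=28):
    """Scan `sets` (a dict of set name -> file list) with one thread per set."""
    # More threads than sets is pointless: with 12 sets only 12 can be active.
    workers = max(1, min(max_threads, len(sets)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(scan_set, name, files, scan_file)
            for name, files in sets.items()
        ]
        results = dict(future.result() for future in futures)
    return workers, results
```

With decompressed sets, every active worker is reading a different group of files at the same time, which is where the seeking on an HD comes from. With compressed sets, each worker reads a single archive file.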

I think a general rule of "use max threads when running on compressed data, use 1 thread when running on decompressed data" can be a good start, maybe with a manual override by the user if wanted. We'll see.
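As a sketch only, that starting rule could be written like this in Python. The function name, its parameters and the use of the CPU count as "max threads" are assumptions for illustration, not how the tool actually decides.

```python
import os


def choose_thread_count(compressed, user_override=None, max_threads=None):
    """Pick a thread count: max for compressed data, 1 for decompressed."""
    # A manual value from the user always wins (clamped to at least 1).
    if user_override is not None:
        return max(1, user_override)
    # Assumption: "max threads" defaults to the number of logical CPUs.
    if max_threads is None:
        max_threads = os.cpu_count() or 1
    return max_threads if compressed else 1
```

For example, `choose_thread_count(False, max_threads=28)` gives 1, while `choose_thread_count(False, user_override=4)` lets the user force 4 threads on decompressed data where the test runs show that more threads can still help.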
« Last Edit: 11 December 2024, 21:26 by Roman »

oddi

  • Member
Re: to thread or not to thread that is the question
« Reply #2 on: 12 December 2024, 10:54 »

Great job!!! Thank you, mister.