Thank you for this blog post. Asynchronous code, coroutines, async/await, and parallelising problems are my deep interest, and I blog about them every day.

I think the easiest way to parallelise is to shard your data per thread and treat your multithreaded (or multimachine) architecture as a tree - not a graph - where dataflow doesn't need to pass between tree branches. This is similar to Rust's "no interior mutability" pattern and Rust's data structures.

My machine can lock and unlock 61,570,760 times a second. But it can count to 2 billion in 1 second.

I recently worked at parallelising the A* graph search algorithm that I'm using for code generation/program synthesis. With 16 processes it takes 35 seconds to synthesise a program, but with 3 processes it takes 21 seconds. I think my approach to parallelising A* needs a redesign. We hit Amdahl's law when it comes to parallelising. I need to split up my problem into spaces that don't require synchronisation/serialisation.

EDIT: I've mentioned this whitepaper before ("Scalability! But at what COST?"), but it would be useful reading for anybody working on multithreaded or distributed systems.

In summary: single-threaded programs can easily be faster and more performant (in wall-clock time) than multithreaded or distributed systems, but they don't scale. This is, in a nutshell, the moral of the performance story.

I took a graduate-level class in performance computing that was basically all lab based. What I learned, overwhelmingly and first hand, is that in the performance computing world, what wins is what exploits the hardware intelligently. Theoretical advancements matter too, but usually only in so far as they translate to hardware. There are some special cases where even a slight theoretical gain matters more than how it translates to hardware, but they're limited.

To that end, everything becomes about dividing work in a way that parallelises nicely, exploits the cache well, reduces the need to share information between threads, and so on - and this ultimately comes down to data structures. However, these aren't your normal, fundamental data structures. Each problem sort of has its own exotic, hypothetically ideal data structure, fine-tuned to exploit the machine's resources to the max for just that problem. By the time you're done, they rarely resemble anything intelligible, let alone what the whiteboard version of the algorithm was. In that vein, there are a few general trends that appear over and over again.
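The shard-per-thread idea can be sketched in Python. This is a minimal illustration, not code from the post: each worker writes only to its own shard and its own result slot, so no locks are needed, and the only dataflow is a single merge at the root of the "tree" (the `shard_sum` name and the summing workload are hypothetical).

```python
import threading

def shard_sum(numbers, num_threads=4):
    """Sum a list using one shard per thread; no shared mutable state."""
    # Split the input so each thread owns a disjoint shard.
    shards = [numbers[i::num_threads] for i in range(num_threads)]
    results = [0] * num_threads  # one result slot per thread, never shared

    def worker(idx):
        # Thread idx touches only shards[idx] and results[idx]:
        # no locks, no dataflow between branches.
        results[idx] = sum(shards[idx])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)  # single merge step at the root of the tree
```

For example, `shard_sum(list(range(1000)))` returns `499500`, the same as a sequential sum, without any thread ever reading another thread's data.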
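A rough way to reproduce the lock-versus-count comparison is a micro-benchmark like the one below. This is a sketch, not the author's benchmark, and absolute numbers will differ wildly by machine and language; the point is only that acquiring and releasing a lock costs far more than a plain increment.

```python
import threading
import time

def ops_per_second(fn, iterations=1_000_000):
    """Call fn() in a tight loop and report the achieved rate."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return iterations / elapsed

lock = threading.Lock()

def lock_unlock():
    # One full lock/unlock cycle, the operation being measured.
    lock.acquire()
    lock.release()

counter = 0

def count():
    # A plain increment, for comparison.
    global counter
    counter += 1

if __name__ == "__main__":
    print(f"lock/unlock per second: {ops_per_second(lock_unlock):,.0f}")
    print(f"increments per second:  {ops_per_second(count):,.0f}")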
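Amdahl's law puts a hard ceiling on what adding workers can buy: if a fraction p of the work parallelises and the rest is serial, the best possible speedup on n workers is 1 / ((1 - p) + p/n). A quick illustration (the 90% parallel fraction here is an assumed number for the example, not measured from the A* runs):

```python
def amdahl_speedup(p, n):
    """Maximum speedup for parallel fraction p on n workers (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even if 90% of the work parallelises perfectly, 16 workers
# give at most 6.4x, and infinitely many workers only 10x.
print(round(amdahl_speedup(0.90, 16), 2))      # 6.4
print(round(amdahl_speedup(0.90, 10**9), 2))   # 10.0
```

The formula also doesn't account for synchronisation overhead, which grows with worker count; that overhead is how 16 processes can end up slower than 3 in practice, not just fail to reach the theoretical ceiling.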