tag:blogger.com,1999:blog-90165985761487586562024-03-18T08:53:16.482-07:001024coresThis is an accompanying blog for the www.1024cores.net site about lockfree/waitfree synchronization algorithms and data structures, scalability-oriented architecture, multicore design patterns, high-performance computing, threading technologies and libraries, message-passing systems and related topics.Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.comBlogger20125tag:blogger.com,1999:blog-9016598576148758656.post-3708086536140442082013-02-22T07:23:00.001-08:002013-02-22T07:23:56.620-08:00Go, data races and combiningHi,<br />
<br />
I continue my work on speeding up <a href="http://golang.org/" target="_blank">Go language</a>. The next major release will feature faster parallel garbage collector, goroutine blocking profiler (enabled with -blockprofile flag) and a lot of other <a href="https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/FxELIOik2f4" target="_blank">yummy features</a>. Last but not least -- <a href="http://tip.golang.org/doc/articles/race_detector.html" target="_blank">builtin data race detector</a>. Yay! Just add -race flag.<br />
<br />
The race detector is based on our <a href="https://code.google.com/p/thread-sanitizer/" target="_blank">ThreadSanitizer</a> technology, which is initially developed for C/C++. Now that both C and C++ standards have multithreading support, data races officially banned as "undefined behavior". You can read my essay on data race <a href="http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong" target="_blank">here</a>.<br />
<br />
If you are subscribed to this blog, you are probably interested in synchronization and concurrency stuff. It's not that I am writing a lot lately, but here is at least something new -- <a href="http://software.intel.com/en-us/blogs/2013/02/22/combineraggregator-synchronization-primitive" target="_blank">Combiner/Aggregator Synchronization Primitive</a>.<br /><br />
Best<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com217tag:blogger.com,1999:blog-9016598576148758656.post-68249192416338299062011-11-27T06:52:00.000-08:002011-11-27T06:52:39.803-08:00AddressSanitizer, etcHi,<br />
<br />
While Go becomes faster and more scalable (if you don't yet tried it, don't miss <a href="http://tour.golang.org/" target="_blank">An Online Tour of Go</a>), our team was working on the <a href="http://code.google.com/p/address-sanitizer/" target="_blank">AddressSanitizer</a> (ASan) - a fast memory error detector for C/C++. The tool is really fast, a typical slowdown is only about 2x (while with, for example, Valgrind/Memcheck we observe slowdowns of 20-50x). And the speed is a real game changer here - we build Chromium with ASan and then do interactive browsing, or execute thousands of randomly generated tests and then do dichotomy reduction of crashing tests. It's interesting to observe how the tools spreads basically w/o our efforts, for example, here somebody writes how to <a href="http://ehsanakhgari.org/blog/2011-06-30/building-firefox-address-sanitizer" target="_blank">build Firefox</a> with ASan, and here is an interesting story of <a href="http://blogs.perl.org/users/rurban/2011/11/adventures-with-clang-and-asan.html" target="_blank">"ASanified" Perl</a>.<br />
<br />
Here are some concurrency-related links:<br />
<a href="http://www.concurrencykit.org/index.html" target="_blank">Concurrency Kit</a> provides a plethora of concurrency primitives, safe memory reclamation mechanisms and lock-less and lock-free data structures designed to aid in the design and implementation of high performance concurrent systems.<br />
<a href="http://dzmitryhuba.blogspot.com/search/label/Concurrency" target="_blank">Chasing state of the art</a>: Java synchronization algorithms.<div>Intel has updated and extended (AVX, NUMA) its <a href="http://software.intel.com/en-us/articles/intel-guide-for-developing-multithreaded-applications/" target="_blank">Guide for Developing Multithreaded Applications</a>.</div><div><br />
</div><div><br />
</div><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com184tag:blogger.com,1999:blog-9016598576148758656.post-36376385253728942912011-08-05T05:20:00.000-07:002011-08-05T05:20:42.831-07:00Relacy Race Detector 2.4 and moreHi,<br />
<br />
It's been a long time since I blogged the last time. Since that I had joined <a href="http://www.google.com/">Google</a>, and was busy working on the <a href="http://code.google.com/p/data-race-test/">ThreadSanitizer</a> project, which is one the best race detectors out there (not saying that it's free and open source). Also I am actively contributing to the <a href="http://golang.org/">Go language</a> concurrency/scalability related things, and it already <a href="http://groups.google.com/group/golang-nuts/browse_thread/thread/d8d3e370200b44aa">starts paying off</a>. The language has <a href="http://golang.org/doc/effective_go.html#concurrency">built-in support for concurrency </a>and is very nice and simple to say the least of it. But any of that things requires a separate blog post.<br />
<br />
OK, I've just uploaded Relacy Race Detector version 2.4, you may download it <a href="http://www.1024cores.net/home/downloads/relacy_2_4.zip">here</a>.<br />
Features:<br />
+ Support for futex(FUTEX_WAIT/FUTEX_WAKE) <br />
+ Linux/Darwin performance improved (2.5x for Linux, 7x for Darwin)<br />
Fixes:<br />
+ Fixed a bunch of issues with WaitForMultipleObjects()/SignalObjectAndWait()<br />
+ Fixed rare spurious memory leak reports related to test progress reporting<br />
<br />
<br />
The credit for the release goes to <a href="http://cbloomrants.blogspot.com/">Charles Bloom</a> - check out the blog, you may find a lot of interesting concurrency related stuff there.<br />
<br />
The performance improvement is a result of an interesting optimization related to fiber/ucontext switching, and it deserves a separate post:<br />
<a href="http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers">http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers</a><br />
<br />
Also check out SPAA11 <a href="http://people.csail.mit.edu/edya/publications/LocationBasedMemoryFense-SPAA11.pdf">"Location-Based Memory Fences"</a> paper that I co-authored.<br />
<br />
<br />
P.S. Google office in Moscow is <a href="http://fishki.net/comment.php?id=80465">awesome</a>, people and possibilities are even better. Do not hesitate to <a href="http://www.google.ru/jobs/swe/index.html">submit your CV now</a>... and don't forget to indicate me as a referee :)<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com140tag:blogger.com,1999:blog-9016598576148758656.post-63916187448211682932011-02-02T03:44:00.000-08:002011-02-02T03:44:54.592-08:00Cache-oblivious AlgorithmsHi!<br />
<br />
This time it's about Parallel Computations, and the first real article in the section is titled <a href="http://www.1024cores.net/home/parallel-computing/cache-oblivious-algorithms">Cache-oblivious Algorithms</a>. It introduces the problem of memory bandwidth, the concept of cache hierarchy in modern computers; then describes the underlying idea of cache-oblivious algorithms, and proceeds with an example of how to design a cache-oblivious algorithm for a simple problem.Presented benchmark results show unconditional superiority of the approach.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com345tag:blogger.com,1999:blog-9016598576148758656.post-87742947135398143252011-01-31T10:16:00.000-08:002011-01-31T10:16:58.233-08:00Distributed Reader-Writer MutexA new article <a href="http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex">Distributed Reader-Writer Mutex</a> describes design and implementation of a simple yet scalable rw mutex. The mutex uses an interesting technique of <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks/per-processor-data">per-processor data</a>. Results of a benchmark against plain pthread_rwlock_t are presented.<br />
<br />
By the way, I've changed code license on the site, now all code is covered by the <a href="http://www.1024cores.net/site/1024cores/home/code-license">Simplified BSD License</a>.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com137tag:blogger.com,1999:blog-9016598576148758656.post-46448823151647358032011-01-27T12:14:00.000-08:002011-01-27T12:14:40.711-08:00Per-processor DataA new article <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks/per-processor-data">Per-processor Data</a> is available under <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks">Lockfree/Tips&Tricks</a> section. It introduces an interesting technique of per-processor data, as well as some implementation aspects.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com67tag:blogger.com,1999:blog-9016598576148758656.post-24603696667322040722011-01-27T06:24:00.000-08:002011-01-27T06:24:38.557-08:00Case Study: MultiLane - a concurrent blocking multisetI've published a new article <a href="http://www.1024cores.net/home/scalable-architecture/multilane">Case Study: MultiLane - a concurrent blocking multiset</a>. It's about Dave Dice et al paper of the same name. The paper describes MultiLane technique which can be used to improve scalability of producer-consumer systems. I also describe some alternative architectural approaches to the problem.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com70tag:blogger.com,1999:blog-9016598576148758656.post-23818823501426288362011-01-24T11:29:00.000-08:002011-01-24T11:29:18.153-08:00Lockfree Tips&Tricks<div align="LEFT" lang="en-US" style="margin-bottom: 0in; orphans: 2; widows: 2;"></div><div align="LEFT" lang="en-US" style="margin-bottom: 0in; orphans: 2; widows: 2;"><span class="Apple-style-span" style="line-height: 21px;"> Hi!</span></div><div align="LEFT" lang="en-US" style="margin-bottom: 0in; orphans: 2; widows: 2;"><span class="Apple-style-span" style="line-height: 21px;"><br />
</span></div><div align="LEFT" lang="en-US" style="margin-bottom: 0in; orphans: 2; widows: 2;"><span class="Apple-style-span" style="line-height: 21px;">I've kicked off a new subsection <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks">Lockfree/Tips&Tricks</a>, in which I collect various separate topics related to design and implementation of synchronization algorithms. There are 2 articles for warm-up: <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks/spinning">Spinning</a> (active/passive/hybrid, pros&cons, implementation) and <a href="http://www.1024cores.net/home/lock-free-algorithms/tricks/pointer-packing">Pointer Packing</a>.</span></div><div align="LEFT" lang="en-US" style="margin-bottom: 0in; orphans: 2; widows: 2;"><span class="Apple-style-span" style="line-height: 21px;"><br />
</span></div><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com24tag:blogger.com,1999:blog-9016598576148758656.post-87082728339088379492011-01-22T11:21:00.000-08:002011-01-22T11:21:21.579-08:00Lock-Free Link PackCheck out updated <a href="http://www.1024cores.net/home/lock-free-algorithms/links">Lock-Free Links</a> page. There you will find links to books, blogs, websites, research groups, separate articles and some sources to study. Perhaps, more than I ever needs to know about synchronization algorithms :)<br />
Have I read all that? Well, mostly, but not all :)<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com35tag:blogger.com,1999:blog-9016598576148758656.post-71502700167524117542011-01-21T06:40:00.000-08:002011-01-21T06:40:21.473-08:00Case Study: FastFlow QueueHi!<br />
<br />
I've posted a new article <a href="http://www.1024cores.net/home/lock-free-algorithms/queues/fastflow">Case Study: FastFlow Queue</a>, in which I examine design and implementation of a nonblocking single-producer/single-consumer queue, which used as a base building block in the <a href="http://www.1024cores.net/home/technologies/fastflow">FastFlow</a> multicore programming library. In the second part of the article I design and implement "a better" queue. Benchmark results confirm better performance and scalability.<br />
<br />
By the way, thanks to all who has participated in the poll on <s>left </s>right pane.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com19tag:blogger.com,1999:blog-9016598576148758656.post-21647425134707901612011-01-08T00:12:00.000-08:002011-01-08T00:12:50.197-08:00Alternative web addressesIt seems that Google hosting periodically has outages. Just in case you have problems accessing the main URL, you can use the following one:<br />
<a href="http://sites.google.com/site/1024cores">http://sites.google.com/site/1024cores</a><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com149tag:blogger.com,1999:blog-9016598576148758656.post-8642496889827937132011-01-07T08:56:00.000-08:002011-01-07T08:56:38.667-08:00Differential Reference CountingJust published an article on <a href="http://www.1024cores.net/home/lock-free-algorithms/object-life-time-management/differential-reference-counting">Differential Reference Counting</a> - reference counting algorithm which provides strong thread-safety and permits useful use cases that are not permitted by plain atomic reference counting (like boost::shared_ptr).<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com45tag:blogger.com,1999:blog-9016598576148758656.post-79745181062925046422011-01-05T12:46:00.000-08:002011-01-05T12:46:10.480-08:00Lazy Concurrent InitializationI've added an article on <a href="http://www.1024cores.net/home/lock-free-algorithms/lazy-concurrent-initialization">Lazy Concurrent Initialization</a>. It describes 2 types of initialization - blocking and non-blocking, describes available implementations in Windows API, POSIX threads and C1x/C++0x and provides guidelines for efficient implementation based on atomic operations and fine-grained memory fences.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com22tag:blogger.com,1999:blog-9016598576148758656.post-4134525062950174502011-01-04T08:41:00.000-08:002011-01-04T09:31:25.016-08:00What new materials would you like to appear on 1024cores?Hi!<br />
<br />
I would like you to share you preferences regarding what kinds of information you would like to appear on <a href="http://www.1024cores.net/">1024cores</a> in the near future. You know, I need to kind of prioritize everything I may potentially share there, and your input is important to me.<br />
<br />
I've started a poll on the right pane >>><br />
<br />
I have a lot of thoughts and ideas on lock-free algorithms: about queues and reader-writer problem, about object life-time management, lazy initialization, caching, etc<br />
<br />
I may share more thought on scalable architecture: message-passing and pipelining, overloads, patterns, etc.<br />
<br />
I may share thing related to parallel computations: some principles, caveats, problems and solutions, patterns and anti-patterns, etc.<br />
<br />
Concurrency Myth Busters is going to be a section on common misconceptions around multicore, concurrency and parallelism.<br />
<br />
I may highlight some hardware aspects like caches, NUMA and HyperThreading.<br />
<br />
There are also Links, Libraries and Tools.<br />
<br />
And if you check Other, please, drop a comment here regarding what exactly you mean.<br />
<br />
Also don't hesitate to share here other thoughts, comments and suggestions on the site here.<br />
<br />
Thank you.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com18tag:blogger.com,1999:blog-9016598576148758656.post-68042528105478869562011-01-03T05:48:00.000-08:002011-01-03T06:19:22.970-08:00Parallel Disk IOI've added an article on Parallel Disk IO Subsystem:<br />
<a href="http://www.1024cores.net/home/scalable-architecture/parallel-disk-io">http://www.1024cores.net/home/scalable-architecture/parallel-disk-io</a><br />
It's based on my Wide Finder 2 entry, but it's quite general and describes common aspects and patterns and anti-patterns. It shows how not to under-subscribe nor over-subscribe CPUs and disks, how to keep data hot in caches and memory, how to automatically adapt to various external conditions, etc.<br />
<br />
Here is also a brief write-up on the Wide Finder 2 entry itself:<br />
<a href="http://www.1024cores.net/home/scalable-architecture/wide-finder-2">http://www.1024cores.net/home/scalable-architecture/wide-finder-2</a><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com31tag:blogger.com,1999:blog-9016598576148758656.post-5309953211670149802010-12-30T08:43:00.000-08:002010-12-30T08:43:40.163-08:00Scalable ArchitectureHi!<br />
<br />
I've just kicked off the "Scalable Architecture" section on the site. For starters there is an <a href="http://www.1024cores.net/home/scalable-architecture/introduction">Introduction</a> which describes fundamental concepts of Distribution and Independence, as well as some ways as to how you may reason about architecture in the context of scalability.<br />
There is also <a href="http://www.1024cores.net/home/scalable-architecture/general-recipe">General "Bird's Eye View" Recipe</a> for scalable architecture.<br />
And free-standing <a href="http://www.1024cores.net/home/scalable-architecture/task-scheduling-strategies">introduction to task scheduling strategies</a>.<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com24tag:blogger.com,1999:blog-9016598576148758656.post-84033064776747046822010-12-30T08:28:00.000-08:002010-12-30T08:28:47.481-08:00About Me<span class="Apple-style-span" style="color: #333333; font-family: Georgia, 'Times New Roman', serif; font-size: 14px; line-height: 21px;">You know, I've got a rightful comments <s>Who the #$%^ are you?</s> that there are too many undergrads writing naive things about lock-free algorithms all over the Internet, so I kind of need to prove my "authority" on the matter :)</span><br />
<span class="Apple-style-span" style="color: #333333; font-family: Georgia, 'Times New Roman', serif; font-size: 14px; line-height: 21px;">So I've published the About Me page:</span><br />
<span class="Apple-style-span" style="color: #333333; font-family: Georgia, 'Times New Roman', serif;"><span class="Apple-style-span" style="font-size: 15px; line-height: 21px;"><a href="http://www.1024cores.net/home/about-me">http://www.1024cores.net/home/about-me</a></span></span><br />
<span class="Apple-style-span" style="color: #333333; font-family: Georgia, 'Times New Roman', serif;"><span class="Apple-style-span" style="font-size: 15px; line-height: 21px;"><br />
</span></span><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com55tag:blogger.com,1999:blog-9016598576148758656.post-26306486917425759942010-12-26T09:32:00.000-08:002010-12-26T09:32:27.698-08:00So what is a memory model? And how to cook it?Just posted a new article on fundamentals of memory models in the context of multi-threading. It covers 3 basic properties: Atomicity, Visibility and Ordering, along with some compiler-related and high-level languages aspects:<br />
<a href="http://www.1024cores.net/home/lock-free-algorithms/so-what-is-a-memory-model-and-how-to-cook-it">http://www.1024cores.net/home/lock-free-algorithms/so-what-is-a-memory-model-and-how-to-cook-it</a><div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com17tag:blogger.com,1999:blog-9016598576148758656.post-24962183024535754802010-12-25T03:20:00.000-08:002010-12-25T03:28:55.215-08:001024cores officially goes into public beta<a href="http://www.1024cores.net/">http://www.1024cores.net</a> - a site devoted to concurrency, synchronization algorithms, parallel computations (HPC), multi-threading, multi-core, scalability - officially goes into public beta.<br />
<br />
So what is already there - some materials on synchronization (lock-free) algorithms, some articles on parallel computations (my write-ups for Intel Threading Challenge), and some initial sections on libraries, tools and external references (books, blogs, articles).<br />
Much more is coming (including sections of scalable architecture and hardware aspects), so subscribe to the RSS feed or follow <a href="http://blog.1024cores.net/">http://blog.1024cores.net</a>.<br />
<br />
Also don't hesitate to tweet/buzz/share/blog about the site. I appreciate it. Thank you in advance.<br />
Stay tuned!<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com18tag:blogger.com,1999:blog-9016598576148758656.post-11777390273348435752010-12-20T04:41:00.000-08:002010-12-20T04:41:53.824-08:00Welcome!Hi!<br />
My name is Dmitry Vyukov (aka Dmitiy V'jukov, dvyukov and remark), and this an accompanying blog for the <a href="http://www.1024cores.net/">1024cores</a> site. The site is dedicated to lock-free, wait-free, obstruction-free, atomic-free synchronization algorithms and data structures, scalability-oriented architecture, multi-core/multi-processor design patterns, high-performance computing, threading technologies and libraries, message-passing systems and related topics.<br />
In this blog I am going to post announcement of a new content available on the site, so subscribe to it right now! ;)<div class="blogger-post-footer"><a href="http://www.1024cores.net/" rel="tag" style="display:none">CodeProject</a></div>Dmitry Vyukovhttp://www.blogger.com/profile/10137998824493472445noreply@blogger.com19