Sunday, November 27, 2011

AddressSanitizer, etc


While Go becomes faster and more scalable (if you don't yet tried it, don't miss An Online Tour of Go), our team was working on the AddressSanitizer (ASan) - a fast memory error detector for C/C++. The tool is really fast, a typical slowdown is only about 2x (while with, for example, Valgrind/Memcheck we observe slowdowns of 20-50x). And the speed is a real game changer here - we build Chromium with ASan and then do interactive browsing, or execute thousands of randomly generated tests and then do dichotomy reduction of crashing tests. It's interesting to observe how the tools spreads basically w/o our efforts, for example, here somebody writes how to build Firefox with ASan, and here is an interesting story of "ASanified" Perl.

Here are some concurrency-related links:
Concurrency Kit provides a plethora of concurrency primitives, safe memory reclamation mechanisms and lock-less and lock-free data structures designed to aid in the design and implementation of high performance concurrent systems.
Chasing state of the art: Java synchronization algorithms.
Intel has updated and extended (AVX, NUMA) its Guide for Developing Multithreaded Applications.

Friday, August 5, 2011

Relacy Race Detector 2.4 and more


It's been a long time since I blogged the last time. Since that I had joined Google, and was busy working on the ThreadSanitizer project, which is one the best race detectors out there (not saying that it's free and open source). Also I am actively contributing to the Go language concurrency/scalability related things, and it already starts paying off. The language has built-in support for concurrency and is very nice and simple to say the least of it. But any of that things requires a separate blog post.

OK, I've just uploaded Relacy Race Detector version 2.4, you may download it here.
+ Support for futex(FUTEX_WAIT/FUTEX_WAKE)
+ Linux/Darwin performance improved (2.5x for Linux, 7x for Darwin)
+ Fixed a bunch of issues with WaitForMultipleObjects()/SignalObjectAndWait()
+ Fixed rare spurious memory leak reports related to test progress reporting

The credit for the release goes to Charles Bloom - check out the blog, you may find a lot of interesting concurrency related stuff there.

The performance improvement is a result of an interesting optimization related to fiber/ucontext switching, and it deserves a separate post:

Also check out SPAA11 "Location-Based Memory Fences" paper that I co-authored.

P.S. Google office in Moscow is awesome, people and possibilities are even better. Do not hesitate to submit your CV now... and don't forget to indicate me as a referee :)

Wednesday, February 2, 2011

Cache-oblivious Algorithms


This time it's about Parallel Computations, and the first real article in the section is titled Cache-oblivious Algorithms. It introduces the problem of memory bandwidth, the concept of cache hierarchy in modern computers; then describes the underlying idea of cache-oblivious algorithms, and proceeds with an example of how to design a cache-oblivious algorithm for a simple problem.Presented benchmark results show unconditional superiority of the approach.

Monday, January 31, 2011

Distributed Reader-Writer Mutex

A new article Distributed Reader-Writer Mutex describes design and implementation of a simple yet scalable rw mutex. The mutex uses an interesting technique of per-processor data. Results of a benchmark against plain pthread_rwlock_t are presented.

By the way, I've changed code license on the site, now all code is covered by the Simplified BSD License.

Thursday, January 27, 2011

Per-processor Data

A new article Per-processor Data is available under Lockfree/Tips&Tricks section. It introduces an interesting technique of per-processor data, as well as some implementation aspects.

Case Study: MultiLane - a concurrent blocking multiset

I've published a new article Case Study: MultiLane - a concurrent blocking multiset. It's about Dave Dice et al  paper of the same name. The paper describes MultiLane technique which can be used to improve scalability of producer-consumer systems. I also describe some alternative architectural approaches to the problem.

Monday, January 24, 2011

Lockfree Tips&Tricks


I've kicked off a new subsection Lockfree/Tips&Tricks, in which I collect various separate topics related to design and implementation of synchronization algorithms. There are 2 articles for warm-up: Spinning (active/passive/hybrid, pros&cons, implementation) and Pointer Packing.

Saturday, January 22, 2011

Lock-Free Link Pack

Check out updated Lock-Free Links page. There you will find links to books, blogs, websites, research groups, separate articles and some sources to study. Perhaps, more than I ever needs to know about synchronization algorithms :)
Have I read all that? Well, mostly, but not all :)

Friday, January 21, 2011

Case Study: FastFlow Queue


I've posted a new article Case Study: FastFlow Queue, in which I examine design and implementation of a nonblocking single-producer/single-consumer queue, which used as a base building block in the FastFlow multicore programming library. In the second part of the article I design and implement "a better" queue. Benchmark results confirm better performance and scalability.

By the way, thanks to all who has participated in the poll on left right pane.

Saturday, January 8, 2011

Alternative web addresses

It seems that Google hosting periodically has outages. Just in case you have problems accessing the main URL, you can use the following one:

Friday, January 7, 2011

Differential Reference Counting

Just published an article on Differential Reference Counting - reference counting algorithm which provides strong thread-safety and permits useful use cases that are not permitted by plain atomic reference counting (like boost::shared_ptr).

Wednesday, January 5, 2011

Lazy Concurrent Initialization

I've added an article on Lazy Concurrent Initialization. It describes 2 types of initialization - blocking and non-blocking, describes available implementations in Windows API, POSIX threads and C1x/C++0x and provides guidelines for efficient implementation based on atomic operations and fine-grained memory fences.

Tuesday, January 4, 2011

What new materials would you like to appear on 1024cores?


I would like you to share you preferences regarding what kinds of information you would like to appear on 1024cores in the near future. You know, I need to kind of prioritize everything I may potentially share there, and your input is important to me.

I've started a poll on the right pane >>>

I have a lot of thoughts and ideas on lock-free algorithms: about queues and reader-writer problem, about object life-time management, lazy initialization, caching, etc

I may share more thought on scalable architecture: message-passing and pipelining, overloads, patterns, etc.

I may share thing related to parallel computations: some principles, caveats, problems and solutions, patterns and anti-patterns, etc.

Concurrency Myth Busters is going to be a section on common misconceptions around multicore, concurrency and parallelism.

I may highlight some hardware aspects like caches, NUMA and HyperThreading.

There are also Links, Libraries and Tools.

And if you check Other, please, drop a comment here regarding what exactly you mean.

Also don't hesitate to share here other thoughts, comments and suggestions on the site here.

Thank you.

Monday, January 3, 2011

Parallel Disk IO

I've added an article on Parallel Disk IO Subsystem:
It's based on my Wide Finder 2 entry, but it's quite general and describes common aspects and patterns and anti-patterns. It shows how not to under-subscribe nor over-subscribe CPUs and disks, how to keep data hot in caches and memory, how to automatically adapt to various external conditions, etc.

Here is also a brief write-up on the Wide Finder 2 entry itself: