1024cores

Friday, February 22, 2013

Go, data races and combining

Hi,

I continue my work on speeding up Go language. The next major release will feature faster parallel garbage collector, goroutine blocking profiler (enabled with -blockprofile flag) and a lot of other yummy features. Last but not least -- builtin data race detector. Yay! Just add -race flag.

The race detector is based on our ThreadSanitizer technology, which is initially developed for C/C++. Now that both C and C++ standards have multithreading support, data races officially banned as "undefined behavior". You can read my essay on data race here.

If you are subscribed to this blog, you are probably interested in synchronization and concurrency stuff. It's not that I am writing a lot lately, but here is at least something new -- Combiner/Aggregator Synchronization Primitive.

Best

Sunday, November 27, 2011

AddressSanitizer, etc

Hi,

While Go becomes faster and more scalable (if you don't yet tried it, don't miss An Online Tour of Go), our team was working on the AddressSanitizer (ASan) - a fast memory error detector for C/C++. The tool is really fast, a typical slowdown is only about 2x (while with, for example, Valgrind/Memcheck we observe slowdowns of 20-50x). And the speed is a real game changer here - we build Chromium with ASan and then do interactive browsing, or execute thousands of randomly generated tests and then do dichotomy reduction of crashing tests. It's interesting to observe how the tools spreads basically w/o our efforts, for example, here somebody writes how to build Firefox with ASan, and here is an interesting story of "ASanified" Perl.

Here are some concurrency-related links:
Concurrency Kit provides a plethora of concurrency primitives, safe memory reclamation mechanisms and lock-less and lock-free data structures designed to aid in the design and implementation of high performance concurrent systems.
Chasing state of the art: Java synchronization algorithms.

Intel has updated and extended (AVX, NUMA) its Guide for Developing Multithreaded Applications.

Friday, August 5, 2011

Relacy Race Detector 2.4 and more

Hi,

It's been a long time since I blogged the last time. Since that I had joined Google, and was busy working on the ThreadSanitizer project, which is one the best race detectors out there (not saying that it's free and open source). Also I am actively contributing to the Go language concurrency/scalability related things, and it already starts paying off. The language has built-in support for concurrency and is very nice and simple to say the least of it. But any of that things requires a separate blog post.

OK, I've just uploaded Relacy Race Detector version 2.4, you may download it here.
Features:
+ Support for futex(FUTEX_WAIT/FUTEX_WAKE)
+ Linux/Darwin performance improved (2.5x for Linux, 7x for Darwin)
Fixes:
+ Fixed a bunch of issues with WaitForMultipleObjects()/SignalObjectAndWait()
+ Fixed rare spurious memory leak reports related to test progress reporting

The credit for the release goes to Charles Bloom - check out the blog, you may find a lot of interesting concurrency related stuff there.

The performance improvement is a result of an interesting optimization related to fiber/ucontext switching, and it deserves a separate post:
http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers

Also check out SPAA11 "Location-Based Memory Fences" paper that I co-authored.

P.S. Google office in Moscow is awesome, people and possibilities are even better. Do not hesitate to submit your CV now... and don't forget to indicate me as a referee :)

Wednesday, February 2, 2011

Cache-oblivious Algorithms

Hi!

This time it's about Parallel Computations, and the first real article in the section is titled Cache-oblivious Algorithms. It introduces the problem of memory bandwidth, the concept of cache hierarchy in modern computers; then describes the underlying idea of cache-oblivious algorithms, and proceeds with an example of how to design a cache-oblivious algorithm for a simple problem.Presented benchmark results show unconditional superiority of the approach.

Monday, January 31, 2011

Distributed Reader-Writer Mutex

A new article Distributed Reader-Writer Mutex describes design and implementation of a simple yet scalable rw mutex. The mutex uses an interesting technique of per-processor data. Results of a benchmark against plain pthread_rwlock_t are presented.

By the way, I've changed code license on the site, now all code is covered by the Simplified BSD License.

Thursday, January 27, 2011

Per-processor Data

A new article Per-processor Data is available under Lockfree/Tips&Tricks section. It introduces an interesting technique of per-processor data, as well as some implementation aspects.

Case Study: MultiLane - a concurrent blocking multiset

I've published a new article Case Study: MultiLane - a concurrent blocking multiset. It's about Dave Dice et al paper of the same name. The paper describes MultiLane technique which can be used to improve scalability of producer-consumer systems. I also describe some alternative architectural approaches to the problem.