Thursday, January 27, 2011

Per-processor Data

A new article, "Per-processor Data", is available under the Lockfree/Tips&Tricks section. It introduces the technique of keeping data per processor and discusses some implementation aspects.

67 comments:

  1. FYI http://www.kernel.org/doc/man-pages/online/pages/man2/getcpu.2.html

    VERSIONS
    getcpu() was added in kernel 2.6.19 for x86_64 and i386.

  2. And since glibc doesn't export it, the code to use it is:


    /* This code is in the public domain */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Opaque scratch area taken by the getcpu interface. */
    struct getcpu_cache {
        unsigned long blob[128 / sizeof(long)];
    };

    static long (*vgetcpu)(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);

    /* Look up __vdso_getcpu in the vDSO that the kernel maps into the process. */
    static int init_vgetcpu(void)
    {
        void *vdso;

        dlerror(); /* clear any stale error state */
        vdso = dlopen("linux-vdso.so.1", RTLD_LAZY);
        if (vdso == NULL)
            return -1;
        vgetcpu = dlsym(vdso, "__vdso_getcpu");
        dlclose(vdso);
        return vgetcpu == NULL ? -1 : 0;
    }

    int main(void)
    {
        unsigned cpu, node;
        struct getcpu_cache cache;
        long rc;

        if (init_vgetcpu() < 0) {
            fprintf(stderr, "Unable to locate vgetcpu: %s\n", dlerror());
            return EXIT_FAILURE;
        }

        /* The vDSO entry point returns a negative error code; it does not set errno. */
        rc = vgetcpu(&cpu, &node, &cache);
        if (rc < 0) {
            fprintf(stderr, "vgetcpu failed: %ld\n", rc);
            return EXIT_FAILURE;
        }
        printf("cpu:%u node:%u\n", cpu, node);
        return EXIT_SUCCESS;
    }


    and using it gives:
    $ repeat 10 ./vgetcpu
    cpu:0 node:0
    cpu:1 node:0
    cpu:0 node:0
    cpu:1 node:0
    cpu:0 node:0
    cpu:1 node:0
    cpu:1 node:0
    cpu:0 node:0
    cpu:1 node:0
    cpu:1 node:0

    (This is a Core 2 Duo. I suppose that node is != 0 when this is a hardware thread, but I don't have access to HT machines right now, so somebody could probably confirm that.)

  3. Which seems to work only on x86_64. I must admit I have no idea how the vDSO works on i386 (but who cares anyway ;p)

  4. And I'm wrong: glibc does expose it, through sched_getcpu()... "woopsie" ;)

  5. getcpu is still rather heavyweight compared to a direct call to rdtscp. A vsyscall is definitely cheap in some cases, but it isn't free. Per-processor distribution is nice, but sometimes you may simply want cache line distribution, in which case you can use a simple subscribe/unsubscribe interface (which will also at least help alleviate some hotspot pains).

    For example,
    pid = subscribe(data_structure);
    operation(pid, data_structure);
    ...
    ...
    unsubscribe(data_structure, pid);

    where pid may be a map to a unique cache line. Additionally, if CPU migration is common and you're working with relatively small data sets then locality benefits may not exceed the cost of cache line invalidation.
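The subscribe/unsubscribe interface described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the slot count, the 64-byte cache line size, and the layout of `struct data_structure` are all hypothetical and are not taken from the comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SLOT_COUNT 16                 /* hypothetical number of slots */

/* One slot per subscriber, aligned to an assumed 64-byte cache line. */
struct slot {
    _Alignas(64) atomic_bool in_use;
    long value;                       /* touched only by the current owner */
};

struct data_structure {
    struct slot slots[SLOT_COUNT];
};

/* Claim a free slot; returns its index ("pid") or -1 if all are taken. */
int subscribe(struct data_structure *ds)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&ds->slots[i].in_use, &expected, true))
            return i;
    }
    return -1;
}

/* The owner has exclusive use of its slot, so a plain increment suffices. */
void operation(int pid, struct data_structure *ds)
{
    ds->slots[pid].value++;
}

/* Release the slot so that another thread can claim it. */
void unsubscribe(struct data_structure *ds, int pid)
{
    atomic_store(&ds->slots[pid].in_use, false);
}
```

Because a slot has exactly one owner between subscribe and unsubscribe, the hot path needs no atomic operations. A reader that aggregates across slots while owners are active would need atomic accesses to `value`, which this sketch leaves out.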

  6. @MadCoder Thanks for your suggestions. sched_getcpu() is not quite vgetcpu(). It seems that sched_getcpu() was initially a slow syscall, but I see that there is constant progress going on:
    http://kerneltrap.org/mailarchive/linux-kernel/2008/9/23/3387414
    I gave sched_getcpu() one more try, and it seems that now it's quite fast (potentially implemented with the same SIDT or LSL).
    I will update the article with this and other aspects.

  7. @sbahra
    It seems that sched_getcpu() is rather fast nowadays:
    http://kerneltrap.org/mailarchive/linux-kernel/2008/9/23/3387414
    Indeed, there is no reason why it can't be reimplemented with SIDT/LSL (which are cheaper than RDTSCP because they are not serializing instructions).

    > but sometimes you may simply want cache line distribution

    Well, actually I implied that it's a baseline, and what a lot of people are able to do. So it's vice versa :)
    Per-thread subscription/unsubscription/handoff usually leads to quite complex implementations.

    > if CPU migration is common

    I think then nothing will help you :)

  8. I've updated the article with sched_getcpu(), LSL, and added clear indication that "remember in user-space it's no more than an optimization - a thread can be rescheduled to another CPU straight after it has obtained the number".
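To illustrate the caveat quoted above, here is a minimal sketch of a counter distributed by processor number. The names `MAX_SLOTS`, `counter_add` and `counter_sum` and the 64-byte cache line size are assumptions made for this example and are not taken from the article. Because the thread may migrate right after sched_getcpu() returns, the CPU number only serves as a hint for picking a slot, and every update is still atomic.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdatomic.h>

#define MAX_SLOTS 64                  /* hypothetical upper bound on slots */

/* Each counter sits on its own (assumed 64-byte) cache line. */
struct padded_counter {
    _Alignas(64) atomic_long value;
};

static struct padded_counter counters[MAX_SLOTS];

/* Map the current CPU to a slot; fall back to slot 0 if the call fails. */
static unsigned current_slot(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0;
    return (unsigned)cpu % MAX_SLOTS;
}

/* The slot is only a locality hint, so the update itself must be atomic. */
void counter_add(long delta)
{
    atomic_fetch_add_explicit(&counters[current_slot()].value, delta,
                              memory_order_relaxed);
}

/* Readers pay the cost of visiting every slot. */
long counter_sum(void)
{
    long total = 0;
    for (unsigned i = 0; i < MAX_SLOTS; i++)
        total += atomic_load_explicit(&counters[i].value, memory_order_relaxed);
    return total;
}
```

If two threads end up on the same slot after a migration, the result is still correct; they merely contend for one cache line for a while.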

  9. My point was about the overheads associated with the VDSO (and DSOs in general) on large multi-processor machines (where the topology is not a simple 1-hop network), and not so much about the specific implementation of getcpu. vgetcpu has been in Linux since 2.6.24, and sched_getcpu is implemented in terms of vgetcpu. However, even on smaller multi-processor systems, getcpu is simply not an option, since sched_getcpu may still be implemented in terms of a full system call (RHEL5 is still a standard).

    Subscribe/unsubscribe can provide a guarantee of cache line ownership everywhere while processor ID cannot do this in user-space if hard processor affinity is not used. You can collide very easily. In kernel-space, as you point out the ownership can be guaranteed. For memory-intensive user-space applications if the actual topology is an important factor then it is possible to set hard affinity (on Linux, see sched_setaffinity, simple interface example for RR distribution in http://codepad.org/IKLC2YkE) but this isn't suitable for applications that include a fair bit of thread deactivation and/or thread migration.
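Hard affinity with round-robin distribution, as mentioned above, might look something like the sketch below. It is not the code behind the codepad link; `pin_round_robin` is a hypothetical helper, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to one CPU chosen round-robin from the CPUs it is
 * currently allowed to run on. Returns the chosen CPU number or -1 on error.
 */
int pin_round_robin(unsigned index)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    int count = CPU_COUNT(&allowed);
    if (count < 1)
        return -1;

    /* Walk the allowed set and stop at the (index % count)-th CPU. */
    int target = (int)(index % (unsigned)count);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        if (target-- == 0) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Each worker thread would call this once at startup with its own index. Note that the allowed set shrinks to a single CPU after the call, and threads created afterwards inherit that mask, so workers should be created before any of them pins itself.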

    The additional complexity is totally negligible considering the attractive guarantees it can provide (in an operating system agnostic manner) if subscribe/unsubscribe frequency is low.

  10. Note also that you can use TLS to implement a similar technique while abstracting the ID away from the interface. All of these may be coupled with the nice guarantee that some operating systems give through first-touch allocation policies for NUMA (providing essentially the same benefits as using getcpu, but again, in a portable manner).
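A TLS-based variant could be sketched as follows, assuming C11 `_Thread_local` and hypothetical names (`tls_counter_add`, `tls_counter_sum`, `SLOT_COUNT`). The slot index lives in thread-local storage, so callers never see an ID. The first-touch NUMA aspect is not shown here.

```c
#include <stdatomic.h>

#define SLOT_COUNT 16                 /* hypothetical number of slots */

/* Each counter sits on its own (assumed 64-byte) cache line. */
struct padded_counter {
    _Alignas(64) atomic_long value;
};

static struct padded_counter slots[SLOT_COUNT];
static atomic_uint next_slot;             /* hands out slots round-robin */
static _Thread_local int my_slot = -1;    /* -1 means not assigned yet */

/* Lazily bind the calling thread to a slot on first use. */
static struct padded_counter *local_slot(void)
{
    if (my_slot < 0)
        my_slot = (int)(atomic_fetch_add_explicit(&next_slot, 1,
                                                  memory_order_relaxed)
                        % SLOT_COUNT);
    return &slots[my_slot];
}

/* More threads than slots means sharing, so the update stays atomic. */
void tls_counter_add(long delta)
{
    atomic_fetch_add_explicit(&local_slot()->value, delta,
                              memory_order_relaxed);
}

long tls_counter_sum(void)
{
    long total = 0;
    for (int i = 0; i < SLOT_COUNT; i++)
        total += atomic_load_explicit(&slots[i].value, memory_order_relaxed);
    return total;
}
```

Unlike an explicit subscribe/unsubscribe pair, this sketch never releases a slot when a thread exits; it trades that guarantee for a simpler interface.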

  11. @sbahra What you are saying makes perfect sense.
    However, engineering is always about choice and trade-offs. So I just want to show alternatives and describe their trade-offs.
    By the way, here is a draft of a new article "Distributed Reader-Writer Mutex" (it's not yet officially published, but almost done):
    http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex
    It shows how to build a very simple rw mutex, which still shows very good scalability.
    I've done the same with the per-thread approach, and I can say that the implementation is *significantly* more complicated. The problem is with synchronization between arriving/departing threads and writers, and between departing threads and mutex destruction. And by the way, it's not that easy to catch a thread completion event under Windows.

  12. vgetcpu is a vsyscall on recent Linux kernels, and its implementation is fast and uses RDTSCP or a per-cpu variable:

    http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/x86/vdso/vgetcpu.c

  13. Thanks for the info! Nice to hear!
    Now I'm crossing my fingers for SYS_membarrier.

  14. Hello Dmitry,

    Thank you for the ideas in the paper! I gave those three approaches (RDTSCP, CPUID, SIDT) a try in my microbenchmark: a parallel_for over a team of 16 threads with 1 million tasks, each of which (1) calls get_cpu_num() and (2) reads a buffer of a predefined size from a memory pool and computes a checksum for the buffer (xor of ints). It turns out that reading from 0x7FFE0000 does not work on Windows 7. Using NtQuerySystemTime works just as well, but the return value is in 100 ns intervals, so the simple != comparison in the amortization function needs to be modified too.
    The performance of the different approaches was as you'd expect (or as you predicted): SIDT is the fastest, then amortized RDTSCP, and CPUID is the slowest. The curious thing, however, is that caching the latest proc_num (and the APIC ID or IDT alongside it) in TLS actually makes the code slower. In my case, a search among 16 values (the registered APIC IDs or IDTs for all the logical CPUs I have) worked 5% faster than the cached version of the same code.
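For readers who want to experiment with amortization, here is a minimal sketch. It swaps in a different scheme from the one discussed above: it refreshes the cached processor number every fixed number of calls rather than on a timer tick, and it uses sched_getcpu() on Linux instead of RDTSCP. `REFRESH_INTERVAL` and `get_cpu_num_amortized` are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>

#define REFRESH_INTERVAL 64           /* hypothetical: calls between refreshes */

static _Thread_local int cached_cpu = -1;
static _Thread_local unsigned calls_until_refresh;   /* starts at 0 */

/* Return a possibly stale CPU number, re-querying it only occasionally. */
int get_cpu_num_amortized(void)
{
    if (calls_until_refresh == 0) {
        int cpu = sched_getcpu();
        cached_cpu = cpu < 0 ? 0 : cpu;   /* fall back to 0 on failure */
        calls_until_refresh = REFRESH_INTERVAL;
    }
    calls_until_refresh--;
    return cached_cpu;
}
```

Whether this is faster than querying every time depends on how cheap the underlying query and the TLS access are on a given platform, as the measurements reported above suggest, so it is worth benchmarking both.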

  15. Another finding - if you use GetSystemTimeAsFileTime instead of NtQuerySystemTime on Windows 7, it does the following under the covers:
    mov edx,dword ptr [7FFE0018h]
    mov r8d,dword ptr [7FFE0014h]
    mov eax,dword ptr [7FFE001Ch]
    So I guess the "accessing the address" functionality is still there; only the address has changed a little bit.

  16. Hi Anton,

    Thanks for the info!

    >Turns out reading from 0x7FFE0000 does not work on...

    Of course it is a dirty hack, and production code should include a dynamic check for known OS versions with a fallback to a plain GetTickCount() call.

    I guess that you are working on a relatively modern version of Windows, so it would be interesting to see numbers for GetCurrentProcessorNumber().

    >Curious thing however is that caching latest proc_num (and apic or idt alongside of it) in TLS makes the code slower...

    Hummm... That is *not* as I would expect... Do you implement TLS with __declspec(thread)? Do you prevent several accesses to TLS?
