A detailed look at the performance overhead of passing data through chan in Golang

Updated: Jan 17, 2024 at 11:26:02 by apocelipes
This article takes a detailed look at the overhead that chan in Golang incurs from copying data when sending and receiving, with example code throughout.

This article does not discuss chan's runtime overhead for locking and unlocking, or for maintaining the behavior defined by the memory model.

It explores only the overhead that chan incurs when sending and receiving data because of copying.

Before running the benchmarks, let's go over some basics.

How data flows through a chan

Let's start with buffered chans, which fall into two cases. What about unbuffered chans? We'll get to those shortly.

Case 1: The data sent is being read by a reader

The title of this section may need some explanation. It means that while the sender is sending data, a receiver is already waiting for it.

Picture an empty chan: the sender goroutine is sending data to it, and a receiver goroutine is already waiting to receive data from it.

If you're familiar with chan's memory model, you'll recognize this as a special case of a buffered chan: it behaves the same way as an unbuffered chan, and the runtime in fact handles the two in a similar way.

So sending data to an unbuffered chan can be grouped into case 1 as well.

In this case, even though the chan has a buffer, Go applies an optimization: the chan detects the situation and uses runtime internals to write the data directly into the receiver's memory (usually stack memory), skipping the chan's own buffer, so the data is copied only once.

In this case, it is as if the data flows directly from the sender to the receiver.
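A minimal sketch of the case 1 setup, assuming an unbuffered chan and a hypothetical `point` type and `handoff` helper (neither comes from the benchmark code in this article). The direct copy is a runtime detail that user code cannot observe, so the program only shows the shape of the exchange: a receiver waiting on the chan and a send that completes when the value is handed over.

```go
package main

import "fmt"

// point is a hypothetical 16-byte payload used only for this illustration.
type point struct{ x, y int64 }

// handoff sends p over an unbuffered chan to a receiver goroutine and
// returns the value that the receiver saw.
func handoff(p point) point {
	c := make(chan point)         // unbuffered: no buffer to stage the value in
	result := make(chan point, 1) // carries the received value back to the caller

	go func() {
		v := <-c // the receiver waits here for a sender
		result <- v
	}()

	c <- p // completes only when the receiver takes the value
	return <-result
}

func main() {
	fmt.Println(handoff(point{x: 1, y: 2})) // prints {1 2}
}
```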

Case 2: No reader is reading the data being sent

This case is much simpler, and basically all cases except case 1 fall into this category:

This is the most common case, where the reader and the writer operate on different memory. The writer copies the data into the buffer and returns; if the buffer is full, it blocks until a slot becomes free. The reader copies the data from the buffer into its own memory and marks the corresponding slot as writable; if the buffer is empty, it blocks until there is data to read.

One might ask: what happens when the buffer is full and the sender blocks? After it wakes up, the sender still has to send the data, and that send again falls into either case 1 or case 2. Whether the sender blocks therefore does not change how the data is delivered, so it does not matter here.

In case 2, the data is first copied into the chan's own buffer, and the receiver then copies it from the buffer into its own memory when it reads. In total, the data is copied twice.

In case 2, the chan is like a pool: data flows from the sender into the pool first, and from the pool to the receiver some time later.
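The two copies of case 2 can be sketched as follows. The `point` type and the `viaBuffer` helper are hypothetical names chosen for this illustration; the comments mark where each copy conceptually happens.

```go
package main

import "fmt"

// point is a hypothetical 16-byte payload used only for this illustration.
type point struct{ x, y int64 }

// viaBuffer sends p into a buffered chan while no receiver is waiting,
// then reads it back out.
func viaBuffer(p point) (queued int, got point) {
	c := make(chan point, 1)
	c <- p          // first copy: sender's variable -> chan buffer; returns at once
	queued = len(c) // the value now sits in the buffer
	got = <-c       // second copy: chan buffer -> receiver's variable
	return queued, got
}

func main() {
	queued, got := viaBuffer(point{x: 3, y: 4})
	fmt.Println(queued, got) // prints 1 {3 4}
}
```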

A special case among special cases

We are talking about the empty struct: struct{}. Passing it through a chan has no extra memory overhead, because the empty struct itself occupies no memory. Go treats this case specially, as it does for maps of empty structs.

Of course, although no additional memory is consumed, the memory model stays the same. For convenience you can think of this special case as case 2, just with less memory used.

Why copy?

As we saw in case 1, the runtime is able to manipulate the memory of individual goroutines directly, so why not "move" the data to its destination instead of copying it?

Let's first see what would happen with a "move". Following the conventions of other languages, a moved-from object is no longer accessible; its data is left in an indeterminate state that can only be safely destroyed. Put simply, once a variable's data has been moved elsewhere, the variable should no longer be accessed. Some languages forcibly make the variable inaccessible; in others it remains accessible, but using it is "undefined behavior" that leaves the program in a dangerous state. Go is in an awkward position: it has no way to prevent a variable from being accessed after a move, and it has no notion like "undefined behavior" to cover such cases. Panicking at random would cost both performance and stability.

So moving is not realistic in Go.

Next, consider sharing data between goroutines. For a runtime that can manipulate goroutine memory this takes more work than moving, but it can be done. Sharing may cost some extra CPU, but it does save a lot of memory.

Sharing is also more feasible than moving, because it has little impact on the existing syntax and language design; it fits entirely within the current syntax. There is just one problem: it is unsafe. chan is mostly used in concurrent programs, where shared data can cause serious concurrency-safety problems, most commonly shared data being modified unexpectedly. For a language like Go, whose selling point is convenient and safe concurrency, it would be unacceptable for a built-in concurrency primitive to keep creating such problems.

That leaves only one option: passing data by copying. Copying works within the existing syntax and causes fewer problems than sharing (although the fact that chan copies are shallow is sometimes a breeding ground for concurrency bugs). It is also what the CSP (Communicating Sequential Processes) model that Go follows advocates.
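The difference between copied and shallowly copied fields can be seen in a short sketch. The `message` type and `sendThenMutate` helper are hypothetical; the code runs in a single goroutine with a buffered chan so that the outcome is deterministic.

```go
package main

import "fmt"

// message is a hypothetical payload with one plain field and one slice field.
type message struct {
	id   int
	tags []string
}

// sendThenMutate sends m by value, modifies the sender's variable afterwards,
// and returns what the receiver gets.
func sendThenMutate() message {
	c := make(chan message, 1)
	m := message{id: 1, tags: []string{"a"}}
	c <- m

	m.id = 2        // not visible to the receiver: id was copied
	m.tags[0] = "b" // visible: only the slice header was copied, the array is shared

	return <-c
}

func main() {
	got := sendThenMutate()
	fmt.Println(got.id, got.tags[0]) // prints 1 b
}
```

In a real concurrent program the write to `m.tags[0]` would race with the receiver's read, which is exactly the shallow-copy hazard mentioned above.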

Overhead caused by copying

Since copying is justified and unavoidable, we have no choice but to accept it. That makes the cost of copying critical.

The memory overhead is simple to calculate. In both case 1 and case 2, at any moment there is at most the original plus one copy: in case 1 the sender holds the original and the receiver holds the copy; in case 2 the sender holds the original, and either the chan's buffer or the receiver (after copying the value out of the buffer, which frees the slot) holds the copy. Of course, the sender may already have discarded the original, leaving only one copy in memory. So in the worst case memory consumption doubles.

The CPU cost and the impact on speed are harder to estimate; they can only be determined by benchmarking.

The test design is very simple: a buffered chan is used to measure the overhead of copying data between the chan and goroutines.

The small data is two int64 fields, 16 bytes in total, which easily fits in a cache line:

type SmallData struct {
    a, b int64
}

The medium-sized data is an ordinary business object, 144 bytes in size, with more than a dozen fields:

type Data struct {
	a, b, c, d     int64
	flag1, flag2   bool
	s1, s2, s3, s4 string
	e, f, g, h     uint64
	r1, r2         rune
}

Finally, the large object contains ten medium objects, 1440 bytes in total. Perhaps no one writes code like this, or perhaps real projects contain even heavier objects; I can only pick a reasonable value for the test:

type BigData struct {
	d1, d2, d3, d4, d5, d6, d7, d8, d9, d10 Data
}

Because a chan can block goroutines, each iteration sends a value and then immediately receives it from the chan; otherwise we would have to create and release chans repeatedly, which adds too much noise. This means each value is actually copied twice. We are only interested in the cost of copying memory here; the other factors are held constant and do not affect the comparison. The complete test code looks like this:

package main // the package name is arbitrary

import "testing"

type SmallData struct {
	a, b int64
}

func BenchmarkSendSmallData(b *testing.B) {
	c := make(chan SmallData, 1)
	sd := SmallData{
		a: -1,
		b: -2,
	}
	for i := 0; i < b.N; i++ {
		c <- sd // copy into the buffer
		<-c     // copy out of the buffer
	}
}

func BenchmarkSendSmallPointer(b *testing.B) {
	c := make(chan *SmallData, 1)
	sd := &SmallData{
		a: -1,
		b: -2,
	}
	for i := 0; i < b.N; i++ {
		c <- sd
		<-c
	}
}

type Data struct {
	a, b, c, d     int64
	flag1, flag2   bool
	s1, s2, s3, s4 string
	e, f, g, h     uint64
	r1, r2         rune
}

func BenchmarkSendData(b *testing.B) {
	c := make(chan Data, 1)
	d := Data{
		a: -1, b: -2, c: -3, d: -4,
		flag1: true, flag2: false,
		s1: "甲甲甲", s2: "乙乙乙", s3: "丙丙丙", s4: "丁丁丁",
		e: 4, f: 3, g: 2, h: 1,
		r1: '测', r2: '试',
	}
	for i := 0; i < b.N; i++ {
		c <- d
		<-c
	}
}

func BenchmarkSendPointer(b *testing.B) {
	c := make(chan *Data, 1)
	d := &Data{
		a: -1, b: -2, c: -3, d: -4,
		flag1: true, flag2: false,
		s1: "甲甲甲", s2: "乙乙乙", s3: "丙丙丙", s4: "丁丁丁",
		e: 4, f: 3, g: 2, h: 1,
		r1: '测', r2: '试',
	}
	for i := 0; i < b.N; i++ {
		c <- d
		<-c
	}
}

type BigData struct {
	d1, d2, d3, d4, d5, d6, d7, d8, d9, d10 Data
}

func BenchmarkSendBigData(b *testing.B) {
	c := make(chan BigData, 1)
	d := Data{
		a: -1, b: -2, c: -3, d: -4,
		flag1: true, flag2: false,
		s1: "甲甲甲", s2: "乙乙乙", s3: "丙丙丙", s4: "丁丁丁",
		e: 4, f: 3, g: 2, h: 1,
		r1: '测', r2: '试',
	}
	bd := BigData{
		d1: d, d2: d, d3: d, d4: d, d5: d,
		d6: d, d7: d, d8: d, d9: d, d10: d,
	}
	for i := 0; i < b.N; i++ {
		c <- bd
		<-c
	}
}

func BenchmarkSendBigDataPointer(b *testing.B) {
	c := make(chan *BigData, 1)
	d := Data{
		a: -1, b: -2, c: -3, d: -4,
		flag1: true, flag2: false,
		s1: "甲甲甲", s2: "乙乙乙", s3: "丙丙丙", s4: "丁丁丁",
		e: 4, f: 3, g: 2, h: 1,
		r1: '测', r2: '试',
	}
	bd := &BigData{
		d1: d, d2: d, d3: d, d4: d, d5: d,
		d6: d, d7: d, d8: d, d9: d, d10: d,
	}
	for i := 0; i < b.N; i++ {
		c <- bd
		<-c
	}
}

We also pass pointers as a comparison, since that is another common practice in everyday development.

Test results on Windows 11:

Test results on Linux:

For small data, the overhead of copying is not significant.

Things are less rosy for medium and large data, where performance drops by 20% and 50% respectively.

The results are clear, but it is natural to wonder why the large data, at 10 times the size of the medium data, is only about 2.5 times slower to copy.

The reason is that Go uses SIMD instructions when copying large blocks of memory, which raises the throughput per unit of time. Larger data does copy more slowly, but 10 times the data does not mean 10 times the time.

So the overhead of copying data is hard to ignore.

How to avoid the overhead

Since copying data through chan has a noticeable performance cost, we need some countermeasures. Here are a few ideas.

Only pass small objects

How small is small is a matter of debate. I can only speak from my own experience: anything that fits in one cache line is small.

How big is a cache line? On modern x64 CPUs an L1D cache line is usually 64 bytes, which is the size of 8 ordinary pointers or int64 values.

In our tests the cost of copying small data was almost negligible, so passing only such small data through a chan causes no performance problems.

The one thing to watch out for is string. In the current implementation a string value is itself 16 bytes, but that size does not include the string's data: a string of length 256 and a string of length 1 both have a 16-byte header. Sending a string through a chan copies only this header; the underlying bytes are shared rather than copied, so a string behaves much like a pointer. Strings therefore often look small while referring to a large amount of data.

Pass only pointers

One cache line is a bit small. What if I need to pass 2-3 cache lines of data? This is a common requirement in real development.

The answer was already given by the control group in the benchmark: pass a pointer through the chan.

The benchmark results show that when only pointers are passed, the time is the same no matter how big the data is, because all we copy is the pointer: 8 bytes of data.

This also saves memory: only the pointer is copied, not the data it refers to.

It looks like we've found a silver bullet for passing data through chan: pass only pointers. But there is no silver bullet in this world:

  • Passing a pointer amounts to the "sharing" of data discussed in an earlier section, which easily leads to concurrency-safety problems;
  • For the sender, passing a pointer to a chan will most likely affect escape analysis: the object gets allocated on the heap, and the optimization in case 1 becomes pointless (the runtime is invoked just to write a pointer onto the receiver's stack);
  • For the receiver, working with the data behind the pointer requires one or more dereferences that are hard to optimize away, so hot code may see a visible performance impact (usually smaller than the cost of copying the data, but let performance tests decide);
  • Too many pointers burden the GC.

Keep the drawbacks listed above firmly in mind when passing pointers.
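The first drawback can be made concrete with a small sketch. The `counter` type and `shareThroughChan` helper are hypothetical; everything runs in one goroutine so the result is deterministic, whereas in real concurrent code the same write would be a data race.

```go
package main

import "fmt"

// counter is a hypothetical object shared through a pointer.
type counter struct{ n int }

// shareThroughChan sends a pointer, then modifies the object through the
// sender's pointer before the receiver reads it.
func shareThroughChan() int {
	c := make(chan *counter, 1)
	p := &counter{n: 1}

	c <- p   // only the pointer is copied into the chan
	p.n = 42 // the sender still refers to the same object

	q := <-c
	return q.n // the receiver observes the sender's later write
}

func main() {
	fmt.Println(shareThroughChan()) // prints 42
}
```

A common convention is to treat the send as a transfer of ownership: after sending the pointer, the sender no longer touches the object.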

Use lock-free data structures instead of chan

Most of the time chan is used as a concurrency-safe queue. If a chan has exactly one fixed sender and one fixed receiver, consider a lock-free data structure instead: an SPSCQueue (single-producer, single-consumer queue).

The advantage of lock-free data structures over chan is that there is no mutex and no data-copying overhead.

The disadvantage is that only a single receiver and a single sender are supported, and the implementation is relatively complex, so it takes high-quality code to guarantee safety and correct results. If you cannot find a high-quality library, I suggest that you neither write one yourself nor use this approach at all. (The bad news is that there aren't many reliable lock-free data structure libraries in Go.)
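To give an idea of what such a structure looks like, here is a minimal sketch of a bounded SPSC ring buffer built on `sync/atomic`, assuming Go 1.19 or later. The `SPSCQueue` type, `NewSPSCQueue`, `TryPush` and `TryPop` are hypothetical names, not the API of any existing library. The sketch only illustrates the lock-free indexing: it still copies each element into and out of its slot, and it is not a production-ready implementation.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// SPSCQueue is a hypothetical fixed-capacity ring buffer that is safe for
// exactly one producer goroutine and one consumer goroutine.
type SPSCQueue[T any] struct {
	buf  []T
	mask uint64
	head atomic.Uint64 // next slot to read; written only by the consumer
	tail atomic.Uint64 // next slot to write; written only by the producer
}

// NewSPSCQueue rounds capacity up to a power of two so that indexes can be
// wrapped with a bit mask.
func NewSPSCQueue[T any](capacity int) *SPSCQueue[T] {
	n := 1
	for n < capacity {
		n <<= 1
	}
	return &SPSCQueue[T]{buf: make([]T, n), mask: uint64(n - 1)}
}

// TryPush stores v and reports false if the queue is full. Producer only.
func (q *SPSCQueue[T]) TryPush(v T) bool {
	tail := q.tail.Load()
	if tail-q.head.Load() == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[tail&q.mask] = v
	q.tail.Store(tail + 1) // publish the slot to the consumer
	return true
}

// TryPop removes the oldest element and reports false if the queue is empty.
// Consumer only.
func (q *SPSCQueue[T]) TryPop() (T, bool) {
	var zero T
	head := q.head.Load()
	if head == q.tail.Load() {
		return zero, false // empty
	}
	v := q.buf[head&q.mask]
	q.buf[head&q.mask] = zero // drop the reference so the GC can reclaim it
	q.head.Store(head + 1)    // hand the slot back to the producer
	return v, true
}

func main() {
	q := NewSPSCQueue[int](8)
	const n = 1000

	go func() { // the single producer
		for i := 1; i <= n; i++ {
			for !q.TryPush(i) {
				runtime.Gosched() // queue full: let the consumer run
			}
		}
	}()

	sum := 0
	for received := 0; received < n; { // the single consumer
		if v, ok := q.TryPop(); ok {
			sum += v
			received++
		} else {
			runtime.Gosched() // queue empty: let the producer run
		}
	}
	fmt.Println(sum) // prints 500500
}
```

Unlike a chan, this queue never blocks: callers have to decide what to do when it is full or empty, which is part of the complexity mentioned above.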

Situations where the cost is acceptable

Some systems put correctness and safety first and have a high tolerance for performance loss and resource consumption. For such systems, the overhead of copying data is generally acceptable.

In that case, passing copies is clearly simpler and safer than passing pointers or any of the other techniques.

Another common scenario is that chan is not the performance bottleneck and copying has only a minimal effect on performance. There, too, I tend to pass the data by copying.

Summary

In general, chan is very convenient, and in Go you can hardly avoid using it.

I am not writing this post to scare you, but to point out some of the performance pitfalls that can occur when using chan and how to work around them.

As for how you use chan, besides your actual requirements, performance testing is another important point of reference.

If you ask me, I prefer copying data over passing pointers, unless the data is huge, copying is the performance bottleneck, or the receiver and the sender need to cooperate on the same object. Again, benchmarks and profiles are what I rely on when choosing among these approaches.
