Why is fasthttp 10 times faster than net/http?
An In-depth Analysis of High Performance in fasthttp

fasthttp is an HTTP package developed in Go, emphasizing high performance. It optimizes the “hot path” code in the HTTP request-response cycle, achieving zero memory allocation and outperforming the standard library net/http by 10 times.
This description comes from the official GitHub repository and, like the project's name itself, reflects the author's confidence in the project.
This article won’t cover how to use fasthttp; instead, it focuses on the implementation principles behind its high performance.
Benchmarking
Let’s validate whether fasthttp truly lives up to its claims of outperforming the standard library’s net/http. The following test was run on my laptop, and the results may differ from yours.
mac m1 pro
go version: 1.21.1
fasthttp version: bbc7bd04e2cb3747dad23b1c0a9e6ff22df5d449
net/http server (GOMAXPROCS=1, as set in the command below):
➜ fasthttp git:(master) GOMAXPROCS=1 go test -bench=NetHTTPServerGet -benchmem -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/valyala/fasthttp
BenchmarkNetHTTPServerGet1ReqPerConn 1663719 7108 ns/op 3146 B/op 36 allocs/op
BenchmarkNetHTTPServerGet2ReqPerConn 2102826 6249 ns/op 2745 B/op 28 allocs/op
BenchmarkNetHTTPServerGet10ReqPerConn 2339218 4996 ns/op 2443 B/op 23 allocs/op
BenchmarkNetHTTPServerGet10KReqPerConn 2847196 4155 ns/op 2353 B/op 21 allocs/op
BenchmarkNetHTTPServerGet1ReqPerConn10KClients 1944338 6528 ns/op 3151 B/op 36 allocs/op
BenchmarkNetHTTPServerGet2ReqPerConn10KClients 2161939 5046 ns/op 2812 B/op 28 allocs/op
BenchmarkNetHTTPServerGet10ReqPerConn10KClients 2413830 4391 ns/op 2479 B/op 23 allocs/op
BenchmarkNetHTTPServerGet100ReqPerConn10KClients 3277617 4102 ns/op 2385 B/op 21 allocs/op
PASS
ok github.com/valyala/fasthttp 309.870s
fasthttp server:
➜ fasthttp git:(master) GOMAXPROCS=4 go test -bench=kServerGet -benchmem -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/valyala/fasthttp
BenchmarkServerGet1ReqPerConn-4 10046025 1106 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet2ReqPerConn-4 12885564 902.6 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet10ReqPerConn-4 19840665 598.4 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet10KReqPerConn-4 21422018 529.6 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet1ReqPerConn10KClients-4 13986956 852.9 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet2ReqPerConn10KClients-4 16376418 739.5 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet10ReqPerConn10KClients-4 22365590 519.5 ns/op 0 B/op 0 allocs/op
BenchmarkServerGet100ReqPerConn10KClients-4 20419492 522.2 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/valyala/fasthttp 109.240s
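In these runs, fasthttp takes roughly 6 to 8 times less time per operation than the standard library’s net/http and reports 0 B/op and 0 allocs/op, whereas net/http allocates 21 to 36 objects per operation. Keep in mind that the two commands above use different GOMAXPROCS values, so the comparison is not strictly like-for-like.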
Core Optimization Points
Object Reuse
workerPool
The workerPool object is a pool of workers that handle connections. It controls how established connections are processed by handing each one to a reusable worker, unlike the standard library's net/http, which starts a new goroutine for every connection. The ready field stores idle workerChan objects, while workerChanPool is a sync.Pool from which the workerChan objects themselves are recycled.
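To make the idea concrete, here is a minimal, simplified sketch of worker reuse. It is not fasthttp's actual implementation: the names pool, worker, Serve, and Wait are hypothetical, tasks are plain functions rather than net.Conn values, and the Wait helper exists only so the demo is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for workerChan:
// a long-lived goroutine fed through its own channel.
type worker struct {
	ch chan func()
}

// pool keeps idle workers on a stack, so the most recently
// used worker is picked first.
type pool struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	ready   []*worker
	started int // number of goroutines ever started
}

// Serve hands task to an idle worker and starts a new
// goroutine only when no worker is idle.
func (p *pool) Serve(task func()) {
	p.wg.Add(1)
	p.mu.Lock()
	var w *worker
	if n := len(p.ready); n > 0 {
		w = p.ready[n-1]
		p.ready = p.ready[:n-1]
	} else {
		w = &worker{ch: make(chan func(), 1)}
		p.started++
		go p.run(w)
	}
	p.mu.Unlock()
	w.ch <- task
}

// run executes tasks and puts the worker back on the ready
// stack after each one.
func (p *pool) run(w *worker) {
	for task := range w.ch {
		task()
		p.mu.Lock()
		p.ready = append(p.ready, w)
		p.mu.Unlock()
		p.wg.Done()
	}
}

// Wait blocks until every submitted task has finished and
// its worker is idle again.
func (p *pool) Wait() { p.wg.Wait() }

func main() {
	p := &pool{}
	sum := 0
	for i := 1; i <= 100; i++ {
		i := i
		p.Serve(func() { sum += i })
		p.Wait() // sequential demo: the single worker is idle again
	}
	fmt.Println(sum, p.started) // prints "5050 1"
}
```

All 100 tasks run on the same goroutine here because each one finishes before the next is submitted; under concurrent load the pool would grow only as far as the number of simultaneously busy workers.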
// Such a scheme keeps CPU caches hot (in theory).
type workerPool struct {
	...
	ready []*workerChan
	...
	workerChanPool sync.Pool
}
type workerChan struct {
	lastUseTime time.Time
	ch          chan net.Conn
}
Request/Response Objects
// client.go
var (
	requestPool  sync.Pool
	responsePool sync.Pool
)
// Acquire Request object from pool
func AcquireRequest() *Request {
	...
}
// Release Request object back to the pool
func ReleaseRequest(req *Request) {
	...
}
// Acquire Response object from pool
func AcquireResponse() *Response {
	...
}
// Release Response object back to the pool
func ReleaseResponse(resp *Response) {
	...
}
Cookie Objects
// cookie.go
var cookiePool = &sync.Pool{
	New: func() interface{} {
		return &Cookie{}
	},
}
// Acquire Cookie object from pool
func AcquireCookie() *Cookie {
	...
}
// Release Cookie object back to pool
func ReleaseCookie(c *Cookie) {
	...
}
Other Object Reuse
Almost all objects in fasthttp are reused, leveraging sync.Pool effectively. So Crazy!
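The Acquire/Release pairs above all follow one pattern, which can be sketched with the standard library alone. The message type and the acquireMessage and releaseMessage helpers are hypothetical names chosen for illustration; they are not part of fasthttp.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical pooled object, not a fasthttp type.
type message struct {
	body []byte
}

// Reset clears the contents but keeps the allocated buffer.
func (m *message) Reset() { m.body = m.body[:0] }

var messagePool = sync.Pool{
	New: func() any { return &message{} },
}

// acquireMessage returns a message from the pool, creating
// one only when the pool is empty.
func acquireMessage() *message { return messagePool.Get().(*message) }

// releaseMessage resets the message and returns it to the pool.
// The caller must not touch m afterwards.
func releaseMessage(m *message) {
	m.Reset()
	messagePool.Put(m)
}

func main() {
	m := acquireMessage()
	m.body = append(m.body, "hello"...)
	fmt.Println(string(m.body)) // prints "hello"
	releaseMessage(m)

	m2 := acquireMessage()
	fmt.Println(len(m2.body)) // prints 0: a pooled object is always empty
	releaseMessage(m2)
}
```

Note that sync.Pool gives no guarantee that the second acquire returns the same object; the only guarantee this sketch relies on is that whatever comes out of the pool has been reset.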
[]byte Reuse
In fasthttp, reused objects are reset with their Reset method before being returned to the pool. If an object contains a []byte field, the existing slice is truncated and reused rather than replaced with a newly allocated []byte. For example, the Reset method of the URI object:
// Reset the URI object
// From the method's implementation, it's evident that all fields of type []byte are reused
func (u *URI) Reset() {
	u.pathOriginal = u.pathOriginal[:0]
	u.scheme = u.scheme[:0]
	u.path = u.path[:0]
	u.queryString = u.queryString[:0]
	u.hash = u.hash[:0]
	u.username = u.username[:0]
	u.password = u.password[:0]
	u.host = u.host[:0]
	...
}
Likewise, setters for individual []byte fields write into the existing slice instead of allocating a new one. For instance, these methods of the Cookie object:
func (c *Cookie) SetValue(value string) {
	c.value = append(c.value[:0], value...)
}
func (c *Cookie) SetValueBytes(value []byte) {
	c.value = append(c.value[:0], value...)
}
func (c *Cookie) SetKey(key string) {
	c.key = append(c.key[:0], key...)
}
func (c *Cookie) SetKeyBytes(key []byte) {
	c.key = append(c.key[:0], key...)
}
All these methods truncate the existing []byte field and append the new data into it, so the field's backing array is reused whenever its capacity is sufficient.
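The effect of the append(x[:0], ...) idiom can be checked directly. The following sketch uses two hypothetical helpers, setValue and sameBackingArray, to show that the second write lands in the same backing array as the first.

```go
package main

import "fmt"

// setValue mirrors the Cookie setters: it overwrites dst in
// place when dst already has enough capacity.
func setValue(dst []byte, value string) []byte {
	return append(dst[:0], value...)
}

// sameBackingArray reports whether two slices with non-zero
// capacity start at the same address.
func sameBackingArray(a, b []byte) bool {
	return cap(a) > 0 && cap(b) > 0 && &a[:1][0] == &b[:1][0]
}

func main() {
	buf := make([]byte, 0, 64)
	first := setValue(buf, "session=abc")
	second := setValue(first, "id=42")
	fmt.Println(string(second), sameBackingArray(first, second)) // prints "id=42 true"
}
```

The flip side is that first no longer holds "session=abc" after the second call, which is exactly why values read from reused objects must be copied if they need to outlive the object.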
[]byte and string Conversion
fasthttp provides specific methods for converting between []byte and string, avoiding memory allocation and copying, hence enhancing performance.
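As an illustration of what such a zero-copy conversion can look like, here is a sketch built on the unsafe helpers available since Go 1.20. The function names follow the b2s and s2b helpers of fasthttp, but the bodies are an assumption and may differ from what a given fasthttp version actually does.

```go
package main

import (
	"fmt"
	"unsafe"
)

// b2s reinterprets b as a string without copying.
// The caller must not modify b while the string is in use.
func b2s(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// s2b reinterprets s as a byte slice without copying.
// The returned slice must never be written to.
func s2b(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("GET /index.html")
	s := b2s(b)
	fmt.Println(s)           // prints "GET /index.html"
	fmt.Println(len(s2b(s))) // prints 15
}
```

Both directions trade safety for speed: the string and the slice share memory, so a write through one side silently changes the other.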
High-Performance bytebufferpool
Instead of using the standard library’s bytes.Buffer, fasthttp relies on another package, valyala/bytebufferpool, which avoids memory copies and reuses the underlying byte slices. Interested readers can explore the benchmark results provided in its official documentation.
Avoiding Reflection
In fasthttp, all deep copies of objects are implemented without reflection, which avoids its runtime overhead entirely. For example, the copy implementation of the Cookie object:
// cookie.go
// Implementation of Cookie object copy
func (c *Cookie) CopyTo(src *Cookie) {
	c.Reset()
	c.key = append(c.key, src.key...)
	c.value = append(c.value, src.value...)
	c.expire = src.expire
	c.maxAge = src.maxAge
	c.domain = append(c.domain, src.domain...)
	c.path = append(c.path, src.path...)
	c.httpOnly = src.httpOnly
	c.secure = src.secure
	c.sameSite = src.sameSite
}
As the code above shows, the copy is done by manually copying each field, a primitive yet effective solution. The copy implementations of the Request and Response objects follow the same approach.
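A small self-contained sketch shows why the field-by-field approach yields a true deep copy. The pair type is hypothetical and exists only for this illustration; its CopyTo follows the same receiver convention as the Cookie excerpt, copying src into the receiver.

```go
package main

import "fmt"

// pair is a hypothetical type used only for illustration.
type pair struct {
	key    []byte
	value  []byte
	secure bool
}

// CopyTo copies src into p field by field, reusing p's
// existing buffers where their capacity allows.
func (p *pair) CopyTo(src *pair) {
	p.key = append(p.key[:0], src.key...)
	p.value = append(p.value[:0], src.value...)
	p.secure = src.secure
}

func main() {
	src := &pair{key: []byte("lang"), value: []byte("go"), secure: true}
	var dst pair
	dst.CopyTo(src)
	src.value[0] = 'G' // mutate the source after copying
	fmt.Println(string(dst.key), string(dst.value), dst.secure) // prints "lang go true"
}
```

Because the byte slices are appended rather than assigned, the destination owns its own memory and later changes to the source do not leak into it.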
Issues with fasthttp
While high performance is commendable, it comes with certain trade-offs. The main issues with fasthttp include:
- Reduced Code Readability: Understanding the code without knowledge of fasthttp’s design philosophy can be challenging.
- Increased Development Complexity: Development effort is higher compared to using the standard library, due to the manual object reuse resembling memory management in languages like C/C++.
- Added Developer Cognitive Load: Developers accustomed to the standard library’s development model may easily introduce bugs.
- Asynchronous Processing Risks: When objects are used asynchronously, the core framework’s object reuse mechanism can lead to issues such as objects being returned to the pool prematurely, dangling object pointers, and, more seriously, references to objects that have already been reset. These logic-related issues are difficult to debug.
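The last point is the easiest to get wrong, so here is a deliberately simplified, single-goroutine simulation of it. The ctx type and its reuse method are hypothetical stand-ins for a recycled request object; in a real server the reuse would happen on another goroutine after the handler returns, which makes the symptom intermittent rather than deterministic.

```go
package main

import "fmt"

// ctx is a hypothetical request context whose buffer is recycled.
type ctx struct {
	path []byte
}

// reuse simulates the framework resetting the object and
// handing it to the next request.
func (c *ctx) reuse(path string) {
	c.path = append(c.path[:0], path...)
}

// demo returns what a retained slice and a private copy
// contain after the object has been recycled.
func demo() (retained, copied string) {
	c := &ctx{}
	c.reuse("/orders")

	alias := c.path                        // BUG: aliases the recycled buffer
	safe := append([]byte(nil), c.path...) // safe: private copy

	c.reuse("/logout") // the object now serves another request

	return string(alias), string(safe)
}

func main() {
	retained, copied := demo()
	fmt.Println(retained) // prints "/logout": data from the wrong request
	fmt.Println(copied)   // prints "/orders"
}
```

The practical rule that follows is to copy anything that must outlive the handler before handing it to a goroutine.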
Performance Optimization Techniques for Multi-core Systems
- Use reuseport listening (SO_REUSEPORT allows linear scaling of server performance on multi-core servers; refer to Socket Sharding in NGINX Release 1.9.1 for details).
- Run a separate server instance for each CPU core with GOMAXPROCS=1 (process and CPU binding).
- Ensure that interrupts from multi-queue network cards are evenly distributed among CPU cores; refer to How to achieve low latency with 10Gbps Ethernet for details.
Best Practices for fasthttp
- Reuse objects and []byte buffers as much as possible instead of reallocating.
- Utilize features of []byte.
- Use sync.Pool object pools.
- Profile the program in production; go tool pprof --alloc_objects app mem.pprof usually makes performance bottlenecks easier to identify than go tool pprof app cpu.pprof.
- Write tests and benchmarks for the hot path code.
- Avoid direct type conversion between []byte and string, as it may lead to memory allocation and copying. Refer to the s2b and b2s functions within the fasthttp package.
- Regularly run race detection on the code, typically integrated into CI.
- Use quicktemplate instead of html/template templates.
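For the advice on testing hot path code, one lightweight option is to measure allocations directly with testing.AllocsPerRun from the standard library. The sketch below compares a buffer-reusing helper with one that allocates on every call; reuseBuffer, allocateEachCall, scratch, and sink are hypothetical names, and the same check could equally live in a regular benchmark that calls b.ReportAllocs.

```go
package main

import (
	"fmt"
	"testing"
)

var (
	scratch = make([]byte, 0, 64) // reused across calls
	sink    []byte                // keeps the allocating version on the heap
)

// reuseBuffer is a hypothetical hot path helper that writes
// into the shared scratch buffer.
func reuseBuffer(s string) int {
	scratch = append(scratch[:0], s...)
	return len(scratch)
}

// allocateEachCall does the same work but converts the string
// into a freshly allocated slice every time.
func allocateEachCall(s string) int {
	sink = []byte(s)
	return len(sink)
}

func main() {
	const line = "GET /index.html HTTP/1.1"
	reused := testing.AllocsPerRun(1000, func() { reuseBuffer(line) })
	fresh := testing.AllocsPerRun(1000, func() { allocateEachCall(line) })
	fmt.Printf("reuse: %.0f allocs/op, fresh: %.0f allocs/op\n", reused, fresh)
}
```

The reusing version should report zero allocations per call as long as the input fits into the 64-byte scratch buffer, which makes a regression easy to catch with a simple assertion in a test.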
Conclusion
fasthttp is designed for high-performance edge scenarios. If your business demands high QPS and consistently low latency, then adopting fasthttp is a rational choice. However, if the added development complexity and cognitive load outweigh the performance gains, then fasthttp might not be suitable. In most cases, the standard library net/http is a better choice due to its simplicity, ease of use, and high compatibility (after Go 1.21.0, you might not even need a framework). If your business has low traffic, the performance difference between the two is likely to be negligible.