Goroutine Troubleshooting: Official References, Observability Entry Points, and Common Error Checklist

Overview

This guide is a practical goroutine troubleshooting reference based on the official Go documentation. It covers goroutine lifecycle boundaries, the observability entry points (runtime.NumGoroutine, runtime.Stack, net/http/pprof, the goroutine, block, and mutex profiles, runtime/trace, go vet, and the race detector), the common failure modes (goroutine leaks, deadlocks, channel blocking, closed channel panics, WaitGroup misuse, Mutex and RWMutex contention, missed context cancellation, main exiting early, panics, unbounded goroutine creation, external I/O blocking, and select waits), and a standard investigation workflow.

1. Concept Boundaries

The Go language specification describes a go statement as starting the execution of a function call as an independent concurrent thread of control, a goroutine, within the same address space. After go f(), the call runs in the new goroutine and the calling goroutine does not wait for it to complete. When the function returns, its goroutine terminates. [1]

Therefore, when troubleshooting goroutine problems in Go programs, the core objects are not operating system threads themselves, but goroutine count, lifecycle, blocking locations, scheduling relationships, synchronization relationships, shared-memory access relationships, and cancellation propagation relationships.

The Go language specification also states that when the main function returns, the program exits and does not wait for other non-main goroutines to complete. [1] Both "a goroutine did not finish executing" and "main exited early" are therefore lifecycle problems.


2. Troubleshooting Entry Points from Official Documentation

2.1 Observing Goroutine Count

runtime.NumGoroutine() returns the current number of existing goroutines. The official Go diagnostics documentation explains that this metric can be used to monitor goroutine count and detect goroutine leaks. [2]

Common usage:

go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		// Print the current number of existing goroutines.
		log.Printf("goroutines=%d", runtime.NumGoroutine())
	}
}

Troubleshooting uses:

| Symptom | Observation Method |
| --- | --- |
| Goroutine count keeps increasing | Periodically record runtime.NumGoroutine() |
| Count does not fall after load testing ends | Compare counts before, during, and after load testing |
| Count increases after a specific API call | Record the count at API entry, exit, and async task startup points |
| Periodic growth in production | Export goroutine count as a runtime metric |
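
The before/after comparison in the table can be automated in a test or debug helper. The sketch below is one hedged way to do it, not an official API: goroutineDelta, leakyTask, cleanTask, and the 50 ms settle time are illustrative assumptions, and the count can be noisy when unrelated goroutines start or stop in the background.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// settle is an assumed grace period that lets short-lived goroutines exit.
const settle = 50 * time.Millisecond

// goroutineDelta runs f, waits for settle, and reports how many more
// goroutines exist afterwards than before the call.
func goroutineDelta(f func()) int {
	before := runtime.NumGoroutine()
	f()
	time.Sleep(settle)
	return runtime.NumGoroutine() - before
}

// leakyTask starts a goroutine that blocks forever on a channel nobody sends to.
func leakyTask() {
	ch := make(chan int)
	go func() { <-ch }()
}

// cleanTask starts a goroutine and waits for it to finish.
func cleanTask() {
	done := make(chan struct{})
	go func() { close(done) }()
	<-done
}

func main() {
	fmt.Println("leaky delta:", goroutineDelta(leakyTask))
	fmt.Println("clean delta:", goroutineDelta(cleanTask))
}
```

A positive delta that repeats on every call is a hint to capture goroutine stacks next; a single sample is not proof of a leak.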

2.2 Capturing Goroutine Stacks

runtime.Stack(buf, true) writes the stack trace of the calling goroutine, followed by the traces of all other goroutines, into buf and returns the number of bytes written. [2]

Example:

go
package debugutil

import "runtime"

func DumpAllGoroutines() []byte {
	buf := make([]byte, 1<<20)

	for {
		// Write stack traces for all goroutines.
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}

		// Grow the buffer when it is not large enough.
		buf = make([]byte, len(buf)*2)
	}
}

Troubleshooting uses:

| Stack Information | Corresponding Problem |
| --- | --- |
| Many goroutines stopped at the same channel receive | No sender is sending, the channel was never closed, or the channel is nil |
| Many goroutines stopped at the same channel send | Too few receivers, the buffer is full, or the channel is nil |
| Many goroutines stopped at sync.(*WaitGroup).Wait | Done did not execute, Add/Wait order is wrong, or counter did not return to zero |
| Many goroutines stopped at sync.(*Mutex).Lock | Lock contention, lock not released, or lock-order waiting |
| Many goroutines stopped in I/O calls | Network, file, syscall, or external dependency blocking |
| Many goroutines stopped in select | Waiting for multiple events, but no case is ready |
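
To see which row of the table applies, it helps to tally goroutines by the wait state printed in each stack header. The following is a minimal sketch, assuming the header form "goroutine N [state, duration]:" that current Go releases print; that text layout is not a documented API, and countStates is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"runtime"
	"strings"
)

// countStates tallies goroutines by the state shown in header lines such as
// "goroutine 18 [chan receive]:" or "goroutine 7 [select, 2 minutes]:".
func countStates(dump []byte) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(bytes.NewReader(dump))
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "goroutine ") {
			continue
		}
		open := strings.Index(line, "[")
		end := strings.LastIndex(line, "]")
		if open < 0 || end < open {
			continue
		}
		state := line[open+1 : end]
		// Drop the optional wait duration after the comma.
		if i := strings.Index(state, ","); i >= 0 {
			state = state[:i]
		}
		counts[state]++
	}
	return counts
}

func main() {
	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true)
	for state, c := range countStates(buf[:n]) {
		fmt.Printf("%-20s %d\n", state, c)
	}
}
```

The same function can be fed the contents of a saved goroutine?debug=2 file instead of a live runtime.Stack buffer.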

2.3 net/http/pprof

The official net/http/pprof documentation explains that this package exposes runtime profiling data through HTTP, and the data format can be read by the pprof tool. Importing this package for side effects registers HTTP handlers under /debug/pprof/. [3]

Minimal integration:

go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
)

func main() {
	// Enable block profiling. A rate of 1 records every blocking event.
	runtime.SetBlockProfileRate(1)

	// Enable mutex profiling. A rate of 1 records every contention event.
	runtime.SetMutexProfileFraction(1)

	go func() {
		// Expose pprof endpoints on localhost only in this example.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {}
}

Common commands:

bash
# List available pprof profiles.
curl http://localhost:6060/debug/pprof/

# Capture goroutine stack traces in text form.
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

# Analyze goroutine profile.
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Analyze heap profile.
go tool pprof http://localhost:6060/debug/pprof/heap

# Capture a 30-second CPU profile.
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Analyze block profile after runtime.SetBlockProfileRate is enabled.
go tool pprof http://localhost:6060/debug/pprof/block

# Analyze mutex profile after runtime.SetMutexProfileFraction is enabled.
go tool pprof http://localhost:6060/debug/pprof/mutex

# Capture execution trace.
curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"

# Open execution trace.
go tool trace trace.out

Profiles and endpoints listed by the official Go diagnostics documentation include:

| Profile | Official Troubleshooting Target |
| --- | --- |
| goroutine | Stacks of all current goroutines |
| heap | Heap memory allocation |
| threadcreate | Operating system thread creation |
| block | Locations where goroutines block on synchronization primitives |
| mutex | Lock contention locations |
| profile | CPU profile |
| trace | Execution trace |

The block profile is disabled by default and requires calling runtime.SetBlockProfileRate; the mutex profile is disabled by default and requires calling runtime.SetMutexProfileFraction. [3]
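
When exposing an HTTP port is not an option, the same goroutine profile can be read in-process through runtime/pprof. This is a minimal sketch, assuming a hypothetical helper name goroutineProfileText; pprof.Lookup and Profile.WriteTo are standard library calls, and debug level 2 yields the same text form as the ?debug=2 endpoint.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineProfileText returns the goroutine profile as human-readable text.
func goroutineProfileText() (string, error) {
	p := pprof.Lookup("goroutine")
	if p == nil {
		return "", fmt.Errorf("goroutine profile not found")
	}
	var buf bytes.Buffer
	// debug=2 prints one full stack per goroutine.
	if err := p.WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	text, err := goroutineProfileText()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```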


2.4 runtime/trace

The official runtime/trace documentation explains that an execution trace captures goroutine creation, blocking, unblocking, system call entry, system call exit, system call blocking, GC events, heap size changes, processor start and stop events, and more. [4]

Troubleshooting uses:

| Problem | Trace Observation Point |
| --- | --- |
| Too many goroutines created | goroutine creation events |
| Goroutines blocked for a long time | blocking / unblocking events |
| External calls are slow | syscall enter / exit / block events |
| Scheduling latency | time from goroutine runnable to running |
| GC impact on latency | GC events and goroutine execution timeline |

During testing:

bash
go test -trace=trace.out ./...
go tool trace trace.out

For running services:

bash
curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out
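
A trace can also be recorded around one specific code path with trace.Start and trace.Stop from runtime/trace. The sketch below is illustrative only: captureTrace and spawnWorkers are hypothetical names, and the workload is a stand-in for the code under investigation.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
	"sync"
)

// captureTrace records an execution trace while work runs and returns the raw bytes.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err // for example, tracing is already enabled
	}
	work()
	trace.Stop() // flushes the remaining trace data into buf
	return buf.Bytes(), nil
}

// spawnWorkers is a placeholder workload that creates a few goroutines.
func spawnWorkers() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
		}()
	}
	wg.Wait()
}

func main() {
	data, err := captureTrace(spawnWorkers)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote trace.out; open it with: go tool trace trace.out")
}
```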

2.5 go vet

The official go vet documentation explains that the tool reports suspicious constructs in Go source code. Its analyzers include lostcancel, copylocks, loopclosure, waitgroup, and others. [8]

Common command:

bash
go vet ./...

Typical goroutine-related checks:

| vet Analyzer | Corresponding Problem |
| --- | --- |
| lostcancel | cancel returned by context.WithCancel, context.WithTimeout, or context.WithDeadline is not called |
| copylocks | Lock-related objects such as sync.Mutex or sync.WaitGroup are copied |
| loopclosure | Goroutine closure captures loop variable |
| waitgroup | WaitGroup.Add is called inside a goroutine and may race with Wait |

2.6 Race Detector

The official Go race detector documentation explains that a data race occurs when two goroutines access the same variable concurrently, at least one access is a write, and there is no synchronization constraint. [7]

Common commands:

bash
go test -race ./...
go run -race ./cmd/app
go build -race ./cmd/app

Race detector reports include the conflicting access stacks and the creation stacks of related goroutines. [7]

Notes:

| Official Fact | Troubleshooting Meaning |
| --- | --- |
| The race detector only detects races that actually occur at runtime | Tests or runtime traffic must cover related paths |
| The race detector adds memory and execution-time overhead | Always-on production use needs separate evaluation |
| Reports include goroutine creation stacks | Helps locate which goroutines access shared variables |

3. Common Goroutine Errors and Troubleshooting Methods

3.1 Abnormally Increasing Goroutines / Goroutine Leaks

Symptoms:

| Symptom | Manifestation |
| --- | --- |
| Goroutine count keeps increasing | runtime.NumGoroutine() curve rises monotonically or periodically |
| Goroutines are not released after requests end | Count does not fall after load testing stops |
| Same stack appears repeatedly | Many goroutines stop at the same location in pprof/goroutine?debug=2 |
| Memory grows together | Goroutine count rises together with heap, stack memory, timers, or context objects |

Official basis:

  • runtime.NumGoroutine() returns the current number of existing goroutines. [2]
  • The goroutine profile reports stacks of all current goroutines. [3]
  • The context documentation explains that failing to call CancelFunc leaks child contexts and their children until the parent context is canceled. [5]

Troubleshooting steps:

bash
# 1. Record the current goroutine stacks.
curl -o goroutine_1.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

# 2. Record again after some time.
curl -o goroutine_2.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

# 3. Compare repeatedly growing stacks.
diff -u goroutine_1.txt goroutine_2.txt

Code checkpoints:

| Checkpoint | Corresponding Problem |
| --- | --- |
| Whether go func() is created without limit in loops, requests, or message consumption | Goroutine creation rate exceeds exit rate |
| Whether goroutines listen to ctx.Done() | Whether async tasks exit after upstream cancellation |
| Whether CancelFunc is called on all control paths | Whether context children and timers are released |
| Whether channels are never closed or never sent to | Whether goroutines block forever |
| Whether tickers call Stop | Whether periodic task resources are released |
| Whether external I/O has timeouts | Whether network, database, or RPC calls block for a long time |
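
Several of these checkpoints (observing ctx.Done(), stopping tickers, having a definite exit path) can be seen together in one small background worker. This is a hedged sketch rather than a prescribed pattern; startPoller, runPollerOnce, and the intervals are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startPoller runs poll on every tick until ctx is canceled.
// The returned channel is closed once the goroutine has exited.
func startPoller(ctx context.Context, interval time.Duration, poll func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop() // release the ticker when the goroutine exits
		for {
			select {
			case <-ctx.Done():
				return // exit path tied to upstream cancellation
			case <-ticker.C:
				poll()
			}
		}
	}()
	return done
}

// runPollerOnce starts a poller, cancels it, and reports whether it exited.
func runPollerOnce() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	done := startPoller(ctx, 5*time.Millisecond, func() {})
	time.Sleep(20 * time.Millisecond)
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("poller exited:", runPollerOnce())
}
```

Returning a done channel gives callers and tests a way to confirm that the goroutine really ended instead of assuming it did.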

3.2 Process-Level Deadlock

Symptom:

text
fatal error: all goroutines are asleep - deadlock!

The Go runtime emits this fatal error when every goroutine is blocked and none can be woken, so the program cannot make progress. [10]

Common trigger patterns:

| Pattern | Example |
| --- | --- |
| main goroutine waits on channel receive, but there is no sender | <-ch |
| main goroutine waits on channel send, but there is no receiver | ch <- v |
| all goroutines wait on the same WaitGroup | wg.Wait() |
| goroutines wait on locks in opposite orders | A holds lock1 and waits for lock2; B holds lock2 and waits for lock1 |
| nil channel send / receive | var ch chan int; <-ch |

Troubleshooting steps:

bash
# Reproduce with all goroutine traceback.
GOTRACEBACK=all ./app

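If the process is still running (a deadlock that involves only some goroutines does not trigger the fatal error), capture stacks at runtime: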

bash
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

How to judge:

| Stack State | Direction |
| --- | --- |
| All stopped on channel send / receive | A sender or receiver is missing |
| All stopped at WaitGroup.Wait | WaitGroup counter did not return to zero |
| Multiple goroutines stopped at Lock on different locks | Lock order or lock release path problem |
| Goroutine stopped around nil channel code | Channel was never initialized, or select logic error |
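
For the lock-order case, one common remedy is to acquire locks in a single global order. The sketch below assumes a hypothetical account type ordered by an id field; it illustrates the idea and is not taken from the Go documentation.

```go
package main

import (
	"fmt"
	"sync"
)

type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer locks both accounts in ascending id order, so two transfers in
// opposite directions cannot each hold one lock while waiting for the other.
func transfer(from, to *account, amount int) {
	if from.id == to.id {
		return
	}
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

// runOpposingTransfers performs n transfers in each direction concurrently
// and returns the combined balance afterwards.
func runOpposingTransfers(n int) int {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			transfer(a, b, 1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			transfer(b, a, 1)
		}
	}()
	wg.Wait()
	return a.balance + b.balance
}

func main() {
	fmt.Println("total balance:", runOpposingTransfers(1000))
}
```

If transfer locked from before to unconditionally, the two goroutines could deadlock; with the fixed order the program finishes and the total stays at 200.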

3.3 Channel Send / Receive Blocks Forever

Official basis:

  • Channels provide a communication mechanism among concurrent goroutines. [9]
  • Communication on an unbuffered channel completes only when sender and receiver are both ready. [9]
  • A nil channel is never ready. [9]
  • Sending to a nil channel blocks forever. [9]
  • Receiving from a nil channel blocks forever. [9]

Error patterns:

go
func blockOnNilChannel() {
	var ch chan int

	// This receive blocks forever because ch is nil.
	<-ch
}

go
func blockOnSend() {
	ch := make(chan int)

	// This send blocks because there is no receiver.
	ch <- 1
}

Troubleshooting methods:

| Method | Purpose |
| --- | --- |
| goroutine stack | Determine whether blocked on send or receive |
| block profile | Locate where goroutines block on synchronization primitives |
| trace | View blocking and unblocking timelines |
| code review | Check channel initialization, close, sender, receiver, and buffer capacity |

Commands:

bash
go tool pprof http://localhost:6060/debug/pprof/block
curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out
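
Once a blocked send has been located, a frequent fix is to give it a way out. The following is a minimal sketch, assuming hypothetical helpers sendOrCancel and trySend and a 20 ms timeout chosen only for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendOrCancel tries to send v on ch and gives up when ctx is done.
func sendOrCancel(ctx context.Context, ch chan<- int, v int) error {
	select {
	case ch <- v:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// trySend attempts one send with a short timeout.
func trySend(ch chan<- int) error {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	return sendOrCancel(ctx, ch, 1)
}

func main() {
	var nilCh chan int            // nil: a send can never proceed
	unbuffered := make(chan int)  // no receiver is running
	buffered := make(chan int, 1) // one free slot

	fmt.Println("nil channel:", trySend(nilCh))
	fmt.Println("unbuffered, no receiver:", trySend(unbuffered))
	fmt.Println("buffered with space:", trySend(buffered))
}
```

The first two calls return the context error instead of blocking forever; only the buffered send succeeds.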

3.4 send on closed channel / close closed channel / close nil channel

Official basis:

  • Sending to a closed channel panics. [9]
  • Closing an already closed channel panics. [9]
  • Closing a nil channel panics. [9]
  • Receiving from a closed channel with no remaining values immediately returns the zero value of the element type. [9]

Error pattern:

go
func sendClosedChannel() {
	ch := make(chan int)
	close(ch)

	// This panics because the channel is already closed.
	ch <- 1
}

Troubleshooting methods:

| Symptom | Method |
| --- | --- |
| panic: send on closed channel | Inspect panic stack and locate sender |
| panic: close of closed channel | Inspect panic stack and locate repeated closer |
| panic: close of nil channel | Check channel initialization path |
| Occasional panic | Use go test -race to check whether send / close happen concurrently |

The official Race Detector documentation includes a typical "unsynchronized send and close operations" case. [7]
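
One way to rule out all three panics is to give the channel a single closer that runs only after every sender has finished. This is an illustrative sketch with hypothetical names produce and sum; it also shows the two-value receive that distinguishes a closed channel from a real zero value.

```go
package main

import (
	"fmt"
	"sync"
)

// produce starts n senders and closes the channel exactly once,
// only after every sender has returned.
func produce(n int) <-chan int {
	ch := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			ch <- v
		}(i)
	}
	go func() {
		wg.Wait() // no send can happen after this point
		close(ch)
	}()
	return ch
}

// sum drains ch, then performs one more receive to show the ok flag
// reported by a closed, empty channel.
func sum(ch <-chan int) (total int, okAfterClose bool) {
	for v := range ch {
		total += v
	}
	_, okAfterClose = <-ch
	return total, okAfterClose
}

func main() {
	total, ok := sum(produce(5))
	fmt.Println("total:", total, "ok after close:", ok)
}
```

With five senders carrying 0 through 4, the total is 10 and the final receive reports ok == false.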


3.5 WaitGroup Misuse

Official basis:

  • sync.WaitGroup is a counting semaphore used to wait for a collection of tasks to complete. [6]
  • Add adds delta to the WaitGroup counter. [6]
  • A negative counter panics. [6]
  • Done is equivalent to Add(-1). [6]
  • Wait blocks until the counter reaches zero. [6]
  • The waitgroup analyzer in go vet detects misuse where WaitGroup.Add is called inside a new goroutine. [8]

Common errors:

| Error | Symptom |
| --- | --- |
| Missing Done() after Add(1) | Wait() blocks forever |
| More Done() calls than Add() | panic: sync: negative WaitGroup counter |
| Calling Add() inside a goroutine | Add may race with Wait |
| Copying WaitGroup | Counters are inconsistent across copies |
| Goroutine panics and recovers before Done() executes | Wait() blocks forever |

Incorrect example:

go
func wrongWaitGroup() {
	var wg sync.WaitGroup

	go func() {
		// Wrong: Add may race with Wait.
		wg.Add(1)
		defer wg.Done()
	}()

	wg.Wait()
}

Troubleshooting:

bash
go vet ./...

Runtime investigation:

bash
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

Stack judgment:

| Stack | Meaning |
| --- | --- |
| sync.(*WaitGroup).Wait | Current goroutine waits for counter to reach zero |
| panic: sync: negative WaitGroup counter | Done or Add(-1) count exceeds Add |
| Multiple goroutines stopped at Wait | Counter did not reach zero or task exit path is abnormal |
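
For comparison with the incorrect example, a corrected layout calls Add in the launching goroutine and defers Done inside the worker. A minimal sketch, with squareAll as a hypothetical task:

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares each input in its own goroutine and waits for all of them.
func squareAll(inputs []int) []int {
	out := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1) // Add runs before the goroutine starts, so it cannot race with Wait.
		go func(i, v int) {
			defer wg.Done() // Done runs on every return path of the worker.
			out[i] = v * v  // each goroutine writes only its own slot
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3}))
}
```

The WaitGroup is never copied here: the closures capture the same wg variable, which avoids the copylocks problem listed above.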

3.6 Mutex / RWMutex / Cond Blocking

Official Go diagnostics documentation explains:

  • The block profile shows where goroutines block on synchronization primitives.
  • The mutex profile reports lock contention.
  • The block profile is disabled by default and requires runtime.SetBlockProfileRate.
  • The mutex profile is disabled by default and requires runtime.SetMutexProfileFraction. [3]

Troubleshooting methods:

go
func enableProfiles() {
	// Enable block profiling. A rate of 1 records every blocking event.
	runtime.SetBlockProfileRate(1)

	// Enable mutex profiling. A rate of 1 records every contention event.
	runtime.SetMutexProfileFraction(1)
}

bash
go tool pprof http://localhost:6060/debug/pprof/block
go tool pprof http://localhost:6060/debug/pprof/mutex

Common errors:

| Error | Symptom | Method |
| --- | --- | --- |
| Lock acquired but not released | goroutine stopped at Lock | goroutine stack + mutex profile |
| Inconsistent lock order | Multiple goroutines wait on each other | goroutine stack |
| Slow I/O while holding lock | mutex profile shows long contention | mutex profile + trace |
| Copying a struct containing a lock | lock state copied | go vet -copylocks |
| Cond wait condition not satisfied | goroutine stopped at Cond.Wait | goroutine stack |

3.7 Data Race

Official basis:

Official Go documentation defines a data race as occurring when two goroutines access the same variable concurrently and at least one access is a write, without a synchronization relationship. [7]

Common patterns:

| Error | Example |
| --- | --- |
| Goroutine closure shares loop variable | Multiple goroutines read/write the same loop variable |
| Concurrent map read/write | One goroutine writes a map and another reads it |
| Global variable without lock | Multiple goroutines read/write package-level variable |
| Channel send and close not synchronized | One goroutine sends and another closes |
| Concurrent read/write of primitive variable | bool, int, pointer, and similar direct accesses |

Troubleshooting commands:

bash
go test -race ./...
go run -race ./cmd/app
go build -race ./cmd/app

Report fields to focus on:

| Report Field | Purpose |
| --- | --- |
| conflicting access stack | Locate conflicting read/write |
| goroutine creation stack | Locate goroutine startup point |
| read/write marker | Determine which path writes shared variable |
| file:line | Locate source line |
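
After the report points at the shared variable, the usual repair is to put the access behind an atomic operation or a mutex. The sketch below shows both for a shared counter; countAtomic and countMutex are hypothetical names, and atomic.Int64 assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic increments a shared counter from n goroutines using atomic operations.
func countAtomic(n int) int64 {
	var counter atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1)
		}()
	}
	wg.Wait()
	return counter.Load()
}

// countMutex does the same with a mutex guarding a plain int.
func countMutex(n int) int {
	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countAtomic(100), countMutex(100))
}
```

Replacing counter.Add(1) with an unsynchronized counter++ on a plain variable is the kind of access that go run -race is designed to report.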

3.8 Loop Variable Closure Capture

The official documentation for the loopclosure analyzer in go vet explains that before Go 1.22 a loop variable was shared by all iterations, so a closure started inside the loop could observe a later iteration's value; from Go 1.22 (for modules that declare go 1.22 or later), each iteration has its own variable. [8]

The official Go race detector documentation also lists concurrent access to loop variables as a typical data race example. [7]

Error pattern:

go
func wrongLoopCapture(values []int) {
	for _, v := range values {
		go func() {
			// Before Go 1.22, every iteration shares one v, so this may print a later value.
			println(v)
		}()
	}
}

Compatible style for old semantics:

go
func correctLoopCapture(values []int) {
	for _, v := range values {
		v := v

		go func() {
			// This goroutine captures the per-iteration value.
			println(v)
		}()
	}
}

Troubleshooting commands:

bash
go vet ./...
go test -race ./...

3.9 Context Not Canceled / Cancellation Signal Not Propagated

Official basis:

Official context documentation explains:

  • Context carries deadlines, cancellation signals, and request-scoped values.
  • Calling CancelFunc cancels the child context and its children, removes the parent context's reference to the child, and stops associated timers.
  • Failing to call CancelFunc leaks the child context and its children until the parent context is canceled.
  • Done() returns a channel that is closed when the related work should be canceled. [5]

Error pattern:

go
func wrongContext(parent context.Context) {
	ctx, _ := context.WithTimeout(parent, time.Second)

	go func() {
		select {
		case <-ctx.Done():
			return
		}
	}()
}

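Problem: the CancelFunc is discarded, so it can never be called. The child context and its timer stay alive until the timeout fires or the parent is canceled.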

Troubleshooting:

bash
go vet ./...

The lostcancel analyzer detects cases where cancel is not called. [8]

Runtime investigation:

| Symptom | Method |
| --- | --- |
| Goroutine still exists after request ends | Check whether goroutine stack waits on channel, I/O, or timer |
| Many identical business goroutines in pprof | Check whether they listen to ctx.Done() |
| Timer resources grow | Check whether WithTimeout / WithDeadline calls cancel |
| Downstream calls do not exit | Check whether context is propagated downstream |
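
A corrected counterpart to the error pattern keeps the CancelFunc, defers it, and passes the derived context to the work. This is a sketch under assumptions: fetchWithTimeout, slowWork, and timedOut are hypothetical, and the 20 ms timeout exists only to make the example finish quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchWithTimeout runs work under a timeout derived from parent.
// defer cancel() releases the timer and the child context on every path.
func fetchWithTimeout(parent context.Context, timeout time.Duration, work func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	errc := make(chan error, 1) // buffered so the worker can always send and exit
	go func() { errc <- work(ctx) }()

	select {
	case err := <-errc:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// slowWork is a placeholder task that honors cancellation.
func slowWork(ctx context.Context) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut reports whether slowWork is cut off by a 20 ms timeout.
func timedOut() bool {
	err := fetchWithTimeout(context.Background(), 20*time.Millisecond, slowWork)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", timedOut())
}
```

The buffered error channel matters: if the caller returns on ctx.Done() first, the worker can still deliver its result and exit instead of blocking on the send.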

3.10 main Exits Early and Goroutines Do Not Finish

Official basis:

The Go language specification explains that program execution begins by initializing the main package and then calling main; when that function returns, the program exits and does not wait for other non-main goroutines to complete. [1]

Error pattern:

go
func main() {
	go func() {
		// This goroutine may not finish before main returns.
		doWork()
	}()
}

Symptoms:

| Symptom | Judgment |
| --- | --- |
| Logs are incomplete | Process exited because main returned without waiting |
| Async task did not complete | Goroutine lifecycle was not waited for |
| Test fails occasionally | Async goroutine still runs after test function returns |

Troubleshooting:

| Method | Purpose |
| --- | --- |
| Add exit logs | Determine whether main returned first |
| Use WaitGroup or other synchronization | Make main wait for tasks to finish |
| Use go test -race | Check whether async goroutines access shared state after test ends |
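
As a small illustration of "other synchronization", main can wait on a channel that the goroutine closes when it finishes. runAndWait is a hypothetical helper used only for this sketch.

```go
package main

import "fmt"

// runAndWait starts doWork in a goroutine and returns only after it finishes.
func runAndWait(doWork func()) {
	done := make(chan struct{})
	go func() {
		defer close(done) // signal completion even if doWork returns early
		doWork()
	}()
	<-done
}

func main() {
	runAndWait(func() {
		fmt.Println("work finished before main returned")
	})
}
```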

3.11 Panic Inside Goroutine

The Go language specification explains that after panic is called, the current function stops executing, deferred functions execute in last-in-first-out order, and if there is no recover, the panic continues propagating. [11]

Common symptoms:

| Symptom | Method |
| --- | --- |
| Process exits and prints panic stack | Locate panic goroutine from stack |
| Panic occurs inside async task | Inspect goroutine creation stack or business startup point |
| WaitGroup wait does not return | Goroutine panicked and was recovered before Done ran, or the recovery path skips Done |

Troubleshooting command:

bash
GOTRACEBACK=all ./app

It can also be set in the program:

go
import "runtime/debug"

func init() {
	// Print all goroutine stacks when an unrecovered panic occurs.
	debug.SetTraceback("all")
}
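
Where a panic in a background task should be reported rather than crash the process, the recover call has to live in a deferred function inside that same goroutine, and Done should be deferred so that Wait still returns. The sketch below uses hypothetical helpers goSafe and runTasks; whether to recover at all is a design decision, since swallowing panics can hide bugs.

```go
package main

import (
	"fmt"
	"sync"
)

// goSafe runs f in a new goroutine. A panic in f is recovered and reported
// on errs, and Done still runs, so wg.Wait does not block.
func goSafe(wg *sync.WaitGroup, errs chan<- error, f func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				errs <- fmt.Errorf("recovered panic: %v", r)
			}
		}()
		f()
	}()
}

// runTasks runs every task and returns the number of recovered panics.
func runTasks(tasks ...func()) int {
	var wg sync.WaitGroup
	errs := make(chan error, len(tasks)) // buffered: reporting never blocks
	for _, t := range tasks {
		goSafe(&wg, errs, t)
	}
	wg.Wait()
	close(errs)
	n := 0
	for range errs {
		n++
	}
	return n
}

func main() {
	n := runTasks(
		func() {},
		func() { panic("boom") },
	)
	fmt.Println("recovered panics:", n)
}
```

Deferred calls run in last-in-first-out order, so the recover function runs first and wg.Done runs after the error has been queued.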

3.12 Unbounded Goroutine Creation

Symptoms:

| Symptom | Manifestation |
| --- | --- |
| Goroutine count grows with requests, messages, or tasks | runtime.NumGoroutine() grows linearly with input |
| Many startup points are identical in goroutine profile | go func is in loops, request handling, message consumption, and similar paths |
| Goroutine creation events are dense in trace | Many goroutines are created in a short time |

Troubleshooting:

bash
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out

Code checkpoints:

| Checkpoint | Description |
| --- | --- |
| Whether go func is inside an unbounded loop | Whether goroutine creation count is input-controlled |
| Whether async tasks have exit conditions | Whether goroutines exit after task completion or cancellation |
| Whether task queues have capacity boundaries | Whether creation rate can exceed processing rate |
| Whether ctx.Done() is observed | Whether goroutines exit after upstream cancellation |
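
A capacity boundary can be added with a buffered channel used as a counting semaphore. The following is a hedged sketch: processBounded is a hypothetical helper, and the peak counter exists only to make the bound observable in the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processBounded runs handle for every item with at most limit goroutines
// in flight, and returns the highest concurrency it observed.
func processBounded(items []int, limit int, handle func(int)) int64 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak atomic.Int64

	for _, it := range items {
		sem <- struct{}{} // blocks while limit goroutines are running
		wg.Add(1)
		go func(it int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this worker ends

			cur := inFlight.Add(1)
			for {
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			handle(it)
			inFlight.Add(-1)
		}(it)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	items := make([]int, 50)
	peak := processBounded(items, 4, func(int) {})
	fmt.Println("peak concurrency:", peak)
}
```

Because the slot is acquired before the go statement, the producer loop itself slows down when workers are busy, which keeps the goroutine count bounded by limit instead of by input size.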

3.13 Goroutines Blocked on External I/O or Syscalls

The official runtime/trace documentation explains that trace captures system call entry, exit, and blocking events. [4]

Common symptoms:

| Symptom | Method |
| --- | --- |
| Goroutine stack stopped at network read/write | Check network call location and timeout configuration |
| Goroutine stack stopped at database or RPC call | Check external dependency call location |
| Long syscall block time in trace | Use go tool trace to inspect syscall block |
| Goroutine count grows but CPU is not high | Check whether many goroutines wait on I/O |

Troubleshooting commands:

bash
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out
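
When the stacks show goroutines parked in a network read, the timeout configuration is the first thing to verify. A minimal sketch of a read deadline follows; it uses an in-memory net.Pipe so it needs no real network, and readWithDeadline and silentPeerTimesOut are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithDeadline reads once from conn and fails if no data arrives within d.
func readWithDeadline(conn net.Conn, d time.Duration) ([]byte, error) {
	if err := conn.SetReadDeadline(time.Now().Add(d)); err != nil {
		return nil, err
	}
	buf := make([]byte, 1024)
	n, err := conn.Read(buf)
	return buf[:n], err
}

// silentPeerTimesOut reads from a pipe whose other end never writes and
// reports whether the read ended with a deadline error.
func silentPeerTimesOut() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	_, err := readWithDeadline(client, 20*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("read timed out:", silentPeerTimesOut())
}
```

Without the deadline, the Read call would block for as long as the peer stays silent, and the goroutine would show up in every goroutine profile taken in the meantime.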

3.14 select Waits Forever

select is often used to wait for channels, context cancellation, timers, and similar events. When no case can proceed and there is no default branch, the current goroutine blocks until one can.

Common pattern:

go
func waitForever(ch <-chan int) {
	select {
	case <-ch:
		return
	}
}

Troubleshooting:

| Symptom | Method |
| --- | --- |
| Goroutine stack stopped at select | Check the channel or context corresponding to each case |
| Goroutine does not exit after context cancellation | Check whether select contains <-ctx.Done() |
| Timer branch does not trigger | Check timer creation, reset, and stop paths |
| Channel branch does not trigger | Check sender, closer, and buffer capacity |
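
A select that always has a way out typically combines the data channel with a cancellation case and a timer case. The sketch below is illustrative; receive, demoReceive, errTimeout, and errClosed are hypothetical names, and the 20 ms timeout is chosen only for the demonstration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var (
	errTimeout = errors.New("timed out waiting for value")
	errClosed  = errors.New("channel closed")
)

// receive waits for a value but also returns on cancellation or after timeout.
func receive(ctx context.Context, ch <-chan int, timeout time.Duration) (int, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop() // release the timer on every return path

	select {
	case v, ok := <-ch:
		if !ok {
			return 0, errClosed
		}
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	case <-timer.C:
		return 0, errTimeout
	}
}

// demoReceive calls receive with a background context and a short timeout.
func demoReceive(ch <-chan int) (int, error) {
	return receive(context.Background(), ch, 20*time.Millisecond)
}

func main() {
	ready := make(chan int, 1)
	ready <- 7
	fmt.Println(demoReceive(ready))

	silent := make(chan int) // nobody ever sends
	fmt.Println(demoReceive(silent))
}
```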

4. Standard Troubleshooting Workflow

4.1 Confirm Goroutine Count First

bash
# If the application exports metrics, query the goroutine count metric.
# If not, expose runtime.NumGoroutine() in logs or diagnostics endpoints.

Judgment:

| Result | Next Step |
| --- | --- |
| Count is stable | Focus on local blocking, race, panic, or external I/O |
| Count keeps growing | Capture goroutine profiles and compare growing stacks |
| Count periodically grows and falls | Investigate business cycles, scheduled tasks, connection pools, or queue consumers |
| Count suddenly spikes | Check loop creation, request storms, message backlog, or external call blocking |

4.2 Capture Goroutine Profile

bash
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

Analysis dimensions:

| Dimension | Content |
| --- | --- |
| Number of identical stacks | Which type of goroutine is most numerous |
| Blocking point | channel, WaitGroup, Mutex, I/O, select |
| Creation point | Business path where go func is located |
| Whether context is included | Whether cancellation signal is observed |
| Whether concentrated in one API or task | Whether related to business traffic entry |

4.3 Enable Block / Mutex Profile

go
func enableBlockingDiagnostics() {
	// Enable block profiling.
	runtime.SetBlockProfileRate(1)

	// Enable mutex profiling.
	runtime.SetMutexProfileFraction(1)
}

bash
go tool pprof http://localhost:6060/debug/pprof/block
go tool pprof http://localhost:6060/debug/pprof/mutex

Applicable problems:

| Profile | Scenario |
| --- | --- |
| block | Synchronization blocking on channel, select, WaitGroup, Cond, and similar primitives |
| mutex | Mutex / RWMutex lock contention |

4.4 Capture Trace

bash
curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out

Applicable problems:

| Problem | Trace Purpose |
| --- | --- |
| Too many goroutines created | Observe creation events |
| Goroutines runnable for a long time but not running | Observe scheduling latency |
| Syscall blocking | Observe syscall block |
| GC affects latency | Observe GC and goroutine execution timeline |
| Complex task chains | Observe goroutine unblock relationships |

4.5 Run Static and Dynamic Checks

bash
go vet ./...
go test -race ./...

Mapping:

| Command | Finds |
| --- | --- |
| go vet | suspicious constructs such as lostcancel, copylocks, loopclosure, waitgroup |
| go test -race | data races that actually occur at runtime |
| go run -race | race detection during local execution |
| go build -race | binary built with race detector |

5. Common Error Summary

| No. | Error Type | Main Symptom | Main Troubleshooting Method |
| --- | --- | --- | --- |
| 1 | Abnormally increasing goroutines / leak | Count keeps increasing, does not fall after request ends | runtime.NumGoroutine, goroutine profile, context checks |
| 2 | Process-level deadlock | all goroutines are asleep - deadlock | GOTRACEBACK=all, goroutine stack, block profile |
| 3 | Channel receive blocks forever | goroutine stopped at <-ch | goroutine stack, sender check |
| 4 | Channel send blocks forever | goroutine stopped at ch <- v | goroutine stack, buffer/receiver check |
| 5 | nil channel blocks | send/receive never continues | channel initialization path check |
| 6 | send on closed channel | panic | panic stack, race detector |
| 7 | close closed channel | panic | panic stack, closer check |
| 8 | close nil channel | panic | channel initialization path check |
| 9 | WaitGroup missing Done | Wait blocks forever | goroutine stack, go vet |
| 10 | WaitGroup Add/Wait race | occasional wait abnormality | go vet -waitgroup, code path check |
| 11 | WaitGroup counter negative | panic | panic stack, Add/Done count check |
| 12 | Mutex not released | goroutine stopped at Lock | goroutine stack, mutex profile |
| 13 | Locks waiting on each other | multiple goroutines wait on each other | goroutine stack, mutex profile |
| 14 | Lock object copied | abnormal lock state | go vet -copylocks |
| 15 | Data Race | nondeterministic result, race report | go test -race |
| 16 | Loop variable closure capture | goroutine uses wrong variable value | go vet -loopclosure, race detector |
| 17 | context cancel not called | context children, timers, or goroutines not released | go vet -lostcancel, goroutine profile |
| 18 | goroutine does not listen for cancellation | task still runs after request cancellation | goroutine stack, context check |
| 19 | main exits early | async task not completed | main lifecycle check, synchronization wait |
| 20 | panic inside goroutine | process panic or task exits abnormally | GOTRACEBACK=all, panic stack |
| 21 | unbounded goroutine creation | count grows rapidly with input | goroutine profile, trace |
| 22 | external I/O blocking | goroutine stopped in network, RPC, DB call | goroutine stack, trace |
| 23 | select waits forever | goroutine stopped at select | goroutine stack, case condition check |
| 24 | Cond wait not woken | goroutine stopped at Cond.Wait | goroutine stack, block profile |
| 25 | channel close semantics misuse | receive zero value causes business misjudgment | check value, ok := <-ch usage |

6. Conclusion

Goroutine troubleshooting can be summarized into five categories of objective evidence:

  1. Count evidence: runtime.NumGoroutine().
  2. Stack evidence: runtime.Stack, /debug/pprof/goroutine?debug=2.
  3. Blocking evidence: block profile, mutex profile.
  4. Timeline evidence: runtime/trace, go tool trace.
  5. Code evidence: go vet, race detector, panic stack.

For abnormally increasing goroutines, the direct evidence is the goroutine count trend and repeated stacks. For deadlocks, the direct evidence is the runtime fatal message and the blocking stacks of all goroutines. For channel, WaitGroup, Mutex, context, data race, and similar problems, the official Go documentation provides the corresponding semantic descriptions, runtime tools, or static checking entry points.

References

[1] The Go language specification explains that a go statement starts an independently executing goroutine, and the current execution flow does not wait for it to finish; it also states that after main returns, the program exits and does not wait for other non-main goroutines. (Go)

[2] Official Go diagnostics documentation explains that runtime.NumGoroutine can be used to monitor goroutine count and detect goroutine leaks; runtime.Stack can output the current and all goroutine stacks. (Go)

[3] Official net/http/pprof documentation explains that this package exposes /debug/pprof/ profiling entry points; official Go diagnostics documentation lists uses of profiles such as goroutine, heap, threadcreate, block, and mutex. (Go Packages)

[4] runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction enable block and mutex profiles; official runtime/trace documentation explains that trace captures events such as goroutine creation, blocking, unblocking, system calls, and GC. (Go Packages)

[5] Official context documentation explains the role of CancelFunc, and that failing to call CancelFunc leaks child contexts and their children; the channel returned by Done() is closed on cancellation. (Go Packages)

[6] Official sync.WaitGroup documentation explains its counter, Add, Done, and Wait semantics, and that a negative counter panics. (Go Packages)

[7] Official Go Data Race Detector documentation defines data races and explains that go test -race, go run -race, and go build -race can be used for detection; reports include conflicting access stacks and goroutine creation stacks. (Go)

[8] Official go vet documentation explains that it checks suspicious constructs in Go source code; related analyzers include waitgroup, copylocks, loopclosure, and lostcancel. (Go Packages)

[9] The Go language specification explains channel semantics including blocking, nil channels, closed channels, send, receive, and close. (Go)

[10] Go runtime source and runtime documentation include diagnostic basis such as all goroutines are asleep - deadlock!, GOTRACEBACK, debug.SetTraceback, and SIGQUIT stack dumps. (Go)

[11] The Go language specification explains that after panic, the current function stops executing, deferred functions execute in last-in-first-out order, and the panic propagates along the call stack until recovered or the program terminates. (Go)
