Trade-Offs Among High Availability, High Performance, and High Concurrency

Overview

A reliability-engineering analysis of conflicts among high availability, high performance, and high concurrency across resources, time, consistency, and complexity, with a production-oriented trade-off framework.

Abstract

High availability, high performance, and high concurrency are three core engineering goals in distributed-system design. High availability focuses on continuously providing service under failures, overload, dependency exceptions, and release changes. High performance focuses on request latency, throughput efficiency, and resource utilization. High concurrency focuses on the system's ability to handle many simultaneous requests. These goals can reinforce each other, but they also conflict across resources, complexity, consistency, and failure boundaries. Google SRE defines the four golden signals for user-facing system monitoring as latency, traffic, errors, and saturation. These signals correspond to performance, concurrency, failure rate, and resource boundaries, and provide an important observation basis for three-high trade-off analysis. (sre.google)

Based on the AWS Well-Architected Reliability Pillar, Google SRE, Microsoft Azure Architecture Center, and Kubernetes documentation, this article summarizes typical conflicts among high availability, high performance, and high concurrency, and proposes engineering trade-off principles for production systems.

Keywords: high availability; high performance; high concurrency; reliability engineering; rate limiting; circuit breaking; retry; resource isolation; degradation

1. Introduction

In distributed systems, performance, concurrency capacity, and availability are often all included as architecture goals. The AWS Well-Architected Reliability Pillar describes reliability as the ability of a system to work correctly and consistently when needed, and emphasizes resilience, recovery capability, quotas, capacity management, and disaster recovery mechanisms. (AWS Documentation)

In practice, high-performance optimization may reduce call-chain length, reduce validation, add caches, expand connection pools, or increase batch sizes. High-concurrency design may add threads, queues, connections, and asynchronous processing capacity. High-availability design requires redundancy, rate limiting, circuit breaking, isolation, degradation, rollback, monitoring, and recovery. These measures can conflict in resource consumption, request latency, data consistency, and system complexity.

Google SRE's overload-handling chapter points out that even with effective load balancing, some parts of a system will eventually become overloaded. Reliable systems must handle overload gracefully by returning degraded responses, reducing computation cost, or shedding load. (sre.google) This means the goal in high-concurrency scenarios is not to accept unlimited requests, but to preserve core service capability when the system exceeds its capacity boundary.

2. Concept Definitions

2.1 High Availability

High availability means that a system can continue to provide service that meets business expectations when components fail, dependencies become abnormal, networks jitter, traffic spikes, or releases change. The AWS Reliability Pillar emphasizes automatic recovery, horizontal scaling, capacity management, fault isolation, and disaster recovery as ways to improve reliability. (AWS Documentation)

High availability usually focuses on:

text

Availability SLA
Error rate
Timeout rate
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
Failure blast radius
Degradation capability
Rollback capability

2.2 High Performance

High performance means that a system completes request processing with low latency, high throughput, and low resource cost. In Google SRE's golden signals, latency measures request processing time, traffic measures demand, and saturation measures resource usage. (sre.google)

High performance usually focuses on:

text

Average response time
P95 / P99 / P999 latency
Throughput
CPU utilization
Memory usage
GC time
Network I/O
Disk I/O

2.3 High Concurrency

High concurrency means that a system maintains stable processing capacity when many requests arrive at the same time. Typical mechanisms include horizontal scaling, load balancing, caching, asynchronous peak shaving, rate limiting, connection pools, thread pools, and data sharding.

High concurrency usually focuses on:

text

QPS
TPS
Concurrent connections
Active thread-pool count
Queue length
Connection-pool wait time
Message backlog
Resource saturation

3. Conflict Model Among the Three Goals

Conflicts among high availability, high performance, and high concurrency mainly come from four factors:

text

Resource conflict: CPU, memory, connections, threads, database connections, and network bandwidth are limited.
Time conflict: low latency, retries, timeout waiting, and recovery time constrain each other.
Consistency conflict: caches, read/write splitting, and async processing can affect freshness or consistency.
Complexity conflict: isolation, degradation, circuit breaking, canary release, and rollback increase system complexity.

Therefore, three-high design cannot optimize one goal in isolation. Production systems must make trade-offs based on capacity boundaries, failure boundaries, and business priority.

4. Conflicts Between High Performance and High Availability

4.1 Timeout Settings: Single-Call Success Rate vs. Resource Release

In remote calls, longer timeouts may improve the success probability of a single call, but they also extend the time during which threads, connections, and memory are occupied. Shorter timeouts release resources faster, but may increase failure responses during brief jitter.

Google SRE production service best practices mention well-behaved queuing mechanisms and dynamic timeouts under high load, combined with graceful load shedding. (sre.google) This shows that timeouts are not merely performance parameters; they control resource occupancy and overload propagation.

Direction	Benefit	Risk
Extend timeout	Improves single-call success probability	Threads and connections remain occupied, recovery becomes slower
Shorten timeout	Fast failure and resource release	Failure rate may rise under brief jitter
Layered timeout	Controls total call budget	Requires unified chain governance

Trade-off by request type:

text

Core synchronous requests: short timeout + limited retry + degradation.
Weak dependencies: shorter timeout + default value or partial response.
Background jobs: longer timeout is acceptable, but must be isolated from online paths.

4.2 Retry: Transient Recovery vs. Failure Amplification

Retries can handle temporary network errors and brief dependency jitter. The Azure Retry Pattern documentation states that retry improves stability when an application connects to a service or network resource that has transient failures. (Microsoft Learn)

But retries can also amplify failures. Azure's Retry Storm Antipattern says clients should limit retry count and duration, and generally should not retry for a long time. (Microsoft Learn) Azure's Circuit Breaker Pattern further explains that Retry Pattern and Circuit Breaker Pattern have different goals: retry assumes the operation will eventually succeed, while a circuit breaker prevents operations that are likely to fail from continuing. (Microsoft Learn)

Direction	Benefit	Risk
Many retries	Better recovery from occasional failures	Retry storm when downstream is abnormal
No retry	Does not amplify downstream pressure	Exposes transient failures directly
Budgeted retry	Balances success rate and stability	Needs timeout, idempotency, and circuit breaking

Engineering trade-offs:

text

Retry idempotent read requests in limited form.
Do not automatically retry non-idempotent writes by default.
Retries must respect the total request timeout.
Use backoff and random jitter.
Stop retrying when the downstream is overloaded.
Avoid multiple layers retrying at the same time.

4.3 Cache: Low Latency vs. Consistency and Expiration Risk

Caching reduces database pressure and request latency, but introduces consistency, penetration, breakdown, and avalanche risks. Google SRE overload guidance mentions relying on local copies or not-fully-fresh data during overload to reduce the cost of accessing authoritative storage. (sre.google) This means caches can serve as degradation and overload-protection mechanisms, but only when the business permits some data staleness.

Direction	Benefit	Risk
Heavy caching	Low latency and lower database pressure	Stale reads, dirty reads, cache avalanche
Minimal caching	Fresher data	Higher database pressure
Multi-level cache	Low latency and strong traffic absorption	Complex invalidation and version management

Engineering trade-offs:

text

Cache data that allows brief inconsistency.
Use caches carefully for strongly consistent data.
Use mutex rebuild or logical expiration for hot data.
Cache null values with short TTL to prevent penetration.
Use versions, push, and local snapshots for configuration data.

5. Conflicts Between High Performance and High Concurrency

5.1 Thread-Pool Size: Parallelism vs. Context Switching

Expanding thread pools can improve short-term concurrent processing capacity, but too many threads increase context switching, memory usage, and scheduling overhead. When downstream services slow down, large thread pools also accumulate many waiting requests, making recovery harder.

Direction	Benefit	Risk
Large thread pool	More parallel processing	CPU jitter, memory pressure, failure accumulation
Small thread pool	Controlled resources	Limited peak capacity
Isolated thread pools	Clear failure boundaries	More configuration and governance complexity

Engineering trade-offs:

text

Use independent thread pools for core APIs.
Use independent thread pools for slow tasks.
Use independent thread pools for weak dependency calls.
Thread-pool queues must be bounded.
Rejection policies must be observable.

5.2 Connection-Pool Size: Single-Request Wait Time vs. Downstream Protection

Increasing a database connection pool can reduce application-side connection wait time, but it also transfers more concurrency pressure to the database. A database is a shared state system and often a key bottleneck in high-concurrency paths. Oversized connection pools can slow the whole database through connection count, lock contention, CPU, and I/O pressure.

Direction	Benefit	Risk
Expand connection pool	Less application-side waiting	Database is saturated
Shrink connection pool	Protects database	More application waiting or rejection
Isolate pools by business	Protects core paths	Higher cost and configuration complexity

Engineering trade-offs:

text

Calculate connection-pool size from database capacity and instance count.
Isolate core and non-core queries.
Set SQL execution timeouts.
Continuously govern slow SQL.
Rate-limit at the application side when database pressure is high.

5.3 Batch Size: Throughput vs. Tail Latency

Batching improves throughput, but larger batches increase per-batch duration, memory usage, rollback cost, and lock-hold time.

Direction	Benefit	Risk
Large batches	Higher throughput	Higher latency and failure cost
Small batches	Low latency and faster recovery	Lower throughput
Count + time thresholds	Balances throughput and latency	More implementation complexity

Engineering trade-offs:

text

Use small batches and short windows for online paths.
Use larger batches for offline paths.
Support dynamic batch size in consumer paths.
Split and retry after failure.
Monitor per-batch duration, backlog, and consumption latency.

6. Conflicts Between High Concurrency and High Availability

6.1 Rate Limiting: Accepted Request Volume vs. System Stability

High concurrency tends to accept more requests. High availability requires active rate limiting, degradation, or rejection when capacity is exceeded. Google SRE states that overload handling is fundamental to reliable services; under overload, a service can return degraded responses or reduce the work it performs. (sre.google)

Google Cloud load shedding guidance also says that load shedding aims to maintain nominal service capability when traffic exceeds system capacity by dropping some requests and letting clients retry. (Google Cloud)

Direction	Benefit	Risk
No rate limiting	Accepts more short-term requests	Full-chain avalanche under overload
Strong rate limiting	Protects core services	Some requests are rejected
Tiered rate limiting	Prioritizes core paths	More strategy complexity

Engineering trade-offs:

text

Guarantee core APIs.
Degrade non-core APIs first.
Throttle background jobs.
Rate-limit by user, tenant, API, and resource.
Set protection thresholds for downstream dependencies separately.

6.2 Queuing: Peak Absorption vs. Hidden Failure

Queues can absorb traffic spikes, but unbounded queues hide overload and make requests wait for a long time. As waiting time grows, clients or upstream systems may trigger timeouts and retries, amplifying pressure.

Google SRE service best practices mention well-behaved queuing and dynamic timeouts under high load, with graceful load shedding when necessary. (sre.google)

Direction	Benefit	Risk
Large queue	Strong peak absorption	High queue latency and late failure exposure
Small queue	Exposes overload quickly	Weak peak absorption
Bounded queue	Controls resource upper bound	Requires rejection and degradation strategy

Engineering trade-offs:

text

Online request queues need explicit limits.
Queue waiting time must be monitored.
When queues are full, fail fast or degrade.
Async task queues should combine consumer capacity and downstream throttling.

6.3 Resource Isolation: Utilization vs. Fault Isolation

Resource co-location improves utilization but reduces fault isolation. Azure Bulkhead Pattern documentation states that the bulkhead pattern isolates application elements into pools so that other pools can continue to operate when one pool fails. It also requires trade-offs around cost, performance, and manageability. (Microsoft Learn)

Direction	Benefit	Risk
Resource mixing	Higher utilization and lower cost	Non-core work affects core business
Full isolation	Clear failure boundary	Higher cost and lower utilization
Tiered isolation	Balances stability and cost	Requires capacity planning

Engineering trade-offs:

text

Isolate core transaction paths.
Separate reports, exports, log consumption, and online paths.
Separate core Redis and ordinary cache Redis.
Separate core MQ topics and log topics.
Separate core databases and analytical databases.

Kubernetes documentation provides Pod and container resource requests and limits. CPU documentation states that a container cannot use more CPU than its configured limit, and can obtain the CPU guaranteed by its request when spare CPU is available. (Kubernetes) These mechanisms implement resource boundaries, but architecture design must still define workload priority and capacity needs.

7. Trade-Offs Around Consistency

7.1 Read/Write Splitting: Read Performance vs. Freshness

Read/write splitting reduces primary database pressure, but primary-replica replication usually has delay. For order status, payment status, permission changes, and inventory deduction, stale reads can cause business errors.

Principles:

text

Read from primary after write.
Read critical state from primary.
Use replicas only when eventual consistency is acceptable.
Remove replicas automatically when lag exceeds a threshold.

7.2 Asynchronization: Response Speed vs. Reliable Delivery

Asynchronization shortens main-path response time, but if async tasks are only placed into a local thread pool, process restart, rejection, or task exceptions can lose business results. Tasks that affect business results need durable messages, Outbox, or transactional messaging.

Principles:

text

Discardable tasks may use local thread pools.
Critical business tasks should use MQ or Outbox.
Consumers must be idempotent.
Failed tasks need retry, compensation, or dead-letter queues.

8. Trade-Offs Around Observability

High-performance optimization may reduce log, metric, and trace overhead, but insufficient observability affects failure diagnosis and recovery. Google SRE defines latency, traffic, errors, and saturation as the four golden signals for user-facing systems. (sre.google) The SRE Workbook also states that monitoring may include metrics, text logs, structured event logs, distributed tracing, and event introspection. (sre.google)

Therefore, observability trade-offs should not be based only on storage cost. They must also consider diagnosis and recovery needs.

Principles:

text

Retain error logs longer than ordinary logs.
Retain ordinary logs for a shorter period.
Keep high-resolution metrics short term and downsample long term.
Sample normal traces and fully sample error traces.
Store audit logs independently and retain them long term.

9. Summary of Typical Conflict Points

Conflict	High-Performance Tendency	High-Concurrency Tendency	High-Availability Tendency	Main Risk
Timeout	Wait longer to improve success rate	Support more pending requests	Fail fast and release resources	Long resource occupancy
Retry	Improve single-call success	Amplified traffic	Limit retry budget	Retry storm
Cache	Reduce latency	Absorb read traffic	Control expiration and degradation	Avalanche, dirty data
Thread pool	Increase parallelism	Accept more requests	Bounded isolation	Queue buildup
Connection pool	Reduce wait	Increase access concurrency	Protect database	Downstream overload
Queue	Smooth processing	Peak shaving	Bounded waiting	Hidden latency
Batch	Increase throughput	Improve consumption capacity	Limit failure cost	Higher tail latency
Async	Shorten main path	Carry peaks	Reliable delivery and compensation	Task loss
Read/write split	Reduce read latency	Scale reads	Ensure critical consistency	Stale reads
Rate limiting	May reduce success count	Control inflow	Protect core system	Some requests rejected
Resource isolation	Adds overhead	Reduces sharing	Shrinks blast radius	Higher cost
Observability	Adds runtime overhead	Adds collection pressure	Supports diagnosis and recovery	Higher cost

10. Trade-Off Framework

10.1 Business Correctness Is a Precondition

For payment, order, inventory, permission, and risk-control systems, data correctness and security boundaries are preconditions. Performance optimization must not skip permission checks, parameter validation, idempotency checks, or audit records.

Possible strategies:

text

Idempotency key
Unique index
State-machine constraint
Permission cache
Batch authorization
Async audit logging with reliable delivery

10.2 High Availability Defines the Production Stability Boundary

When traffic exceeds capacity, protect core paths instead of accepting all requests unconditionally. Google SRE and Google Cloud load-shedding guidance both emphasize degradation, load shedding, and work reduction during overload. (sre.google)

Possible strategies:

text

Entry rate limiting
API rate limiting
Resource rate limiting
Circuit breaking
Degradation
Isolation
Bounded queue
Canary release
Automatic rollback

10.3 High Concurrency Is a Capacity Planning Goal

High-concurrency design should be based on a capacity model, not simply larger thread pools, connection pools, or queues. System capacity should be evaluated across entry traffic, service instances, thread pools, connection pools, databases, caches, MQ, and network bandwidth.

Possible strategies:

text

Capacity load testing
Layered rate limiting
Async peak shaving
Hotspot cache
Read/write splitting
Database/table sharding
Consumer throttling
Tenant quota

10.4 High Performance Is a Constrained Optimization Goal

High-performance optimization should be done only when correctness, availability, and capacity boundaries are preserved. Performance metrics should not focus only on averages; they should include tail latency, error rate, timeout rate, and saturation. Google SRE's golden signals observe systems through latency, traffic, errors, and saturation. (sre.google)

Possible strategies:

text

Reduce invalid computation
Optimize SQL and indexes
Reduce serialization cost
Optimize object allocation
Cache hot data
Batch while limiting batch size
Reuse connections
Reduce lock contention

11. Trade-Off Patterns in Different Business Scenarios

11.1 Transactions, Payments, and Orders

These systems usually prioritize correctness and availability:

text

Correctness > high availability > consistency > high concurrency > high performance

Design focus:

text

Idempotent writes
Read critical state from primary
Short transactions
State-machine constraints
Traceable payment results
Reliable async compensation
No unbudgeted retries

11.2 Content, Feed, and Recommendation Systems

These systems often allow eventual consistency and care more about concurrency and availability:

text

High availability > high concurrency > high performance > strong consistency

Design focus:

text

Cache first
Degrade weak dependencies
Return default content when recommendation fails
Allow temporarily stale data
Return partial core-page results

11.3 Logs, Search, and Monitoring Systems

These systems have high write volume and complex queries, and must avoid dragging down business systems:

text

High availability > high concurrency > cost > real-time freshness > single-call performance

Design focus:

text

Async writes
Consumer-side throttling
Read/write isolation
Hot/cold data tiers
Sampling and downsampling
Storage protection

11.4 Configuration Centers, Registry Centers, and Governance Rule Systems

These are control-plane systems. When the control plane fails, existing data-plane capability should not be affected:

text

High availability > correctness > consistency > high performance

Design focus:

text

Client local snapshot
Pull after push failure
Rollbackable configuration versions
Change audit
Data plane continues with existing rules when control plane fails

11.5 Flash-Sale, Promotion, and Campaign Systems

These systems face traffic bursts. The design focus is protecting core paths:

text

High availability > high concurrency > correctness > high performance

Design focus:

text

Entry rate limiting
Queuing
Inventory preheating
Hotspot isolation
Async peak shaving
Fast failure
Shortest possible core path

12. Engineering Decision Checklist

Use these questions when making three-high design trade-offs:

text

1. Will this optimization increase downstream pressure?
2. Will this optimization expand the failure blast radius?
3. Will this optimization reduce observability?
4. Will this optimization expose failure later?
5. Is a total request timeout configured?
6. Are there unbudgeted retries?
7. Are there unbounded queues?
8. Are thread pools or connection pools oversized?
9. Will cache expiration cause concentrated fallback?
10. Does the database have protection thresholds?
11. Can weak dependencies degrade?
12. Are core and non-core paths isolated?
13. Are async tasks delivered reliably?
14. Are write requests idempotent?
15. Does read/write splitting handle read-after-write consistency?
16. Do releases support canary and rollback?
17. Does load testing cover P95, P99, error rate, and saturation?
18. Does monitoring cover latency, traffic, errors, and saturation?
19. Does rate limiting cover entry, API, user, tenant, resource, and downstream dimensions?
20. Can failures be diagnosed, recovered, and reviewed?

13. Conclusion

High availability, high performance, and high concurrency contain multidimensional conflicts. High-performance optimization may increase consistency risk, reduce observability, or expand the blast radius. High-concurrency design may improve intake capacity by expanding thread pools, connection pools, and queues, but can also overload downstream systems and cause avalanches. High-availability design lowers failure impact through rate limiting, degradation, circuit breaking, isolation, and rollback, but introduces overhead and complexity.

Based on reliability engineering principles, the general trade-off order for production systems can be summarized as:

text

Correctness
  |
High availability
  |
High concurrency
  |
High performance
  |
Cost optimization

This order is not a fixed conclusion for every business scenario, but it is a general constraint model when no explicit business exception exists. For payment, order, permission, and inventory systems, correctness and availability should take precedence over local performance. For content, recommendation, logging, and monitoring systems, cache, async processing, sampling, and eventual consistency can be used within business-acceptable boundaries to gain concurrency capacity. For control-plane systems, control-plane failure must not break existing data-plane capability.

The essence of three-high design is not maximizing single-point performance. It is keeping the system controllable in normal, overloaded, and failed states under capacity boundaries, failure boundaries, and business priorities.

References

1 AWS. Reliability Pillar - AWS Well-Architected Framework. (AWS Documentation) 2 Google SRE. Handling Overload. (sre.google) 3 Google SRE. Monitoring Distributed Systems. (sre.google) 4 Google SRE. Production Services Best Practices. (sre.google) 5 Microsoft Azure Architecture Center. Retry Pattern. (Microsoft Learn) 6 Microsoft Azure Architecture Center. Circuit Breaker Pattern. (Microsoft Learn) 7 Microsoft Azure Architecture Center. Retry Storm Antipattern. (Microsoft Learn) 8 Microsoft Azure Architecture Center. Bulkhead Pattern. (Microsoft Learn) 9 Kubernetes Documentation. Resource Management for Pods and Containers. (Kubernetes) 10 Kubernetes Documentation. Assign CPU Resources to Containers and Pods. (Kubernetes)

Chinese Reference

Read the original Chinese article

Trade-Offs Among High Availability, High Performance, and High Concurrency ​

Overview ​

Abstract ​

1. Introduction ​

2. Concept Definitions ​

2.1 High Availability ​

2.2 High Performance ​

2.3 High Concurrency ​

3. Conflict Model Among the Three Goals ​

4. Conflicts Between High Performance and High Availability ​

4.1 Timeout Settings: Single-Call Success Rate vs. Resource Release ​

4.2 Retry: Transient Recovery vs. Failure Amplification ​

4.3 Cache: Low Latency vs. Consistency and Expiration Risk ​

5. Conflicts Between High Performance and High Concurrency ​

5.1 Thread-Pool Size: Parallelism vs. Context Switching ​

5.2 Connection-Pool Size: Single-Request Wait Time vs. Downstream Protection ​

5.3 Batch Size: Throughput vs. Tail Latency ​

6. Conflicts Between High Concurrency and High Availability ​

6.1 Rate Limiting: Accepted Request Volume vs. System Stability ​

6.2 Queuing: Peak Absorption vs. Hidden Failure ​

6.3 Resource Isolation: Utilization vs. Fault Isolation ​

7. Trade-Offs Around Consistency ​

7.1 Read/Write Splitting: Read Performance vs. Freshness ​

7.2 Asynchronization: Response Speed vs. Reliable Delivery ​

8. Trade-Offs Around Observability ​

9. Summary of Typical Conflict Points ​

10. Trade-Off Framework ​

10.1 Business Correctness Is a Precondition ​

10.2 High Availability Defines the Production Stability Boundary ​

10.3 High Concurrency Is a Capacity Planning Goal ​

10.4 High Performance Is a Constrained Optimization Goal ​

11. Trade-Off Patterns in Different Business Scenarios ​

11.1 Transactions, Payments, and Orders ​

11.2 Content, Feed, and Recommendation Systems ​

11.3 Logs, Search, and Monitoring Systems ​

11.4 Configuration Centers, Registry Centers, and Governance Rule Systems ​

11.5 Flash-Sale, Promotion, and Campaign Systems ​

12. Engineering Decision Checklist ​

13. Conclusion ​

References ​

Chinese Reference ​

Join the discussion

Trade-Offs Among High Availability, High Performance, and High Concurrency

Overview

Abstract

1. Introduction

2. Concept Definitions

2.1 High Availability

2.2 High Performance

2.3 High Concurrency

3. Conflict Model Among the Three Goals

4. Conflicts Between High Performance and High Availability

4.1 Timeout Settings: Single-Call Success Rate vs. Resource Release

4.2 Retry: Transient Recovery vs. Failure Amplification

4.3 Cache: Low Latency vs. Consistency and Expiration Risk

5. Conflicts Between High Performance and High Concurrency

5.1 Thread-Pool Size: Parallelism vs. Context Switching

5.2 Connection-Pool Size: Single-Request Wait Time vs. Downstream Protection

5.3 Batch Size: Throughput vs. Tail Latency

6. Conflicts Between High Concurrency and High Availability

6.1 Rate Limiting: Accepted Request Volume vs. System Stability

6.2 Queuing: Peak Absorption vs. Hidden Failure

6.3 Resource Isolation: Utilization vs. Fault Isolation

7. Trade-Offs Around Consistency

7.1 Read/Write Splitting: Read Performance vs. Freshness

7.2 Asynchronization: Response Speed vs. Reliable Delivery

8. Trade-Offs Around Observability

9. Summary of Typical Conflict Points

10. Trade-Off Framework

10.1 Business Correctness Is a Precondition

10.2 High Availability Defines the Production Stability Boundary

10.3 High Concurrency Is a Capacity Planning Goal

10.4 High Performance Is a Constrained Optimization Goal

11. Trade-Off Patterns in Different Business Scenarios

11.1 Transactions, Payments, and Orders

11.2 Content, Feed, and Recommendation Systems

11.3 Logs, Search, and Monitoring Systems

11.4 Configuration Centers, Registry Centers, and Governance Rule Systems

11.5 Flash-Sale, Promotion, and Campaign Systems

12. Engineering Decision Checklist

13. Conclusion

References

Chinese Reference