Cloud Native
Cloud native refers to designing, developing, and operating applications that take full advantage of the features and capabilities of cloud computing platforms.
I plan to explore this ecosystem extensively to understand the trajectory of modern computing:
The Projects sub-node here hosts an index of the tooling I've queued up for exploration.
All the major practical components of the cloud native landscape (as of 0x22AB) are being developed in Go, so I'm beginning to explore the language with the aim of contributing competently to the CNCF landscape.
1. The Path to Cloud Native
- The primary force driving the evolution of computing services so far has been the pressure to scale.
- 1950s : the mainframe computer, with all the logic and data residing together as one monolith.
- 1980s : networks of personal computers encouraged some changes. Some application logic could be off-loaded to these PCs (partitioning into the Client-Server Architecture) : the first major move towards decoupled services.
- 1990s : the dotcom rush led to an explosion of Software as a Service (SaaS). The logistics of operating such a service (developing, deploying, maintaining) grew increasingly complex, which further encouraged decoupling the business layer of the Client-Server architecture into multiple Microservices.
- 2000s : AWS popularized Infrastructure as a Service (IaaS), giving rise to the term Cloud Computing.
2. Important Terms (check out System Design)
2.1. Scalability
2.2. Loose Coupling
2.3. Resilience
2.4. Manageability
2.5. Observability
4. Cloud Native Patterns
- sourced from chapter 4 of BOOK: Cloud Native Go
4.1. context.Context
- read up : https://pkg.go.dev/context
- used to idiomatically convey cancellation signals, deadline abstractions (timeouts), etc. across API boundaries (see the sketch after this list)
- Context values are safe for simultaneous use by multiple goroutines
- read up : https://dave.cheney.net/2017/01/26/context-is-for-cancelation
- also read up : https://dave.cheney.net/2017/08/20/context-isnt-for-cancellation
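A minimal sketch of the cancellation mechanics; the worker function and the durations below are illustrative, not from the package docs:
#+begin_src go
package main

import (
	"context"
	"fmt"
	"time"
)

// work polls ctx.Done() and returns as soon as the context is
// cancelled or its deadline expires.
func work(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // context.Canceled or context.DeadlineExceeded
		case <-time.After(200 * time.Millisecond):
			fmt.Println("still working...")
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel() // always release the context's resources

	fmt.Println(work(ctx)) // prints "context deadline exceeded" after ~1s
}
#+end_src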
4.2. Stability Patterns
4.2.1. Circuit Breaker
- https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
- automatically degrades service functions in response to a likely fault, preventing larger or cascading failures by eliminating recurring errors and providing reasonable error responses
- 2 major components :
- Circuit
- Breaker
- Consider the scenario :
- User Request <-> Business Logic Layer <-> Data Layer
- the Business Logic layer implements a Breaker around the Circuit (the Data Layer here) that trips when the Data Layer fails
- it will retry after sensible pauses and backoff strategies
- and won't continue issuing requests while failure is assured.
- Code
Consider the Circuit type, which specifies the signature of the function interacting with the database. It must include an error among its return values.
type Circuit func(context.Context) (string, error)
The Breaker function accepts a function of type Circuit and some failure-threshold information (here, the number of consecutive failures tolerated before the circuit is broken, i.e. opened), and returns a wrapped circuit with the same signature.
#+begin_src go
func Breaker(circuit Circuit, failureThreshold uint) Circuit {
	consecutiveFailures := 0
	lastAttempt := time.Now()

	// the mutex guards the closure's administrative data:
	// the failure count and the last-attempt time
	var m sync.RWMutex

	return func(ctx context.Context) (string, error) {
		m.RLock()

		d := consecutiveFailures - int(failureThreshold)
		if d >= 0 {
			// circuit is open: only retry after an exponentially
			// backed-off window has passed
			shouldRetryAt := lastAttempt.Add(time.Second * 2 << d)
			if !time.Now().After(shouldRetryAt) {
				m.RUnlock()
				return "", errors.New("service unreachable")
			}
		}

		m.RUnlock()

		// circuit closed (or retry window reached): issue the request
		response, err := circuit(ctx)

		m.Lock() // write lock over the closure's shared state
		defer m.Unlock()

		lastAttempt = time.Now()

		if err != nil {
			consecutiveFailures++
			return response, err
		}

		consecutiveFailures = 0
		return response, nil
	}
}
#+end_src
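A brief usage sketch, assuming the Circuit and Breaker definitions above; getUser is a hypothetical stand-in for the data-layer call:
#+begin_src go
// getUser is a hypothetical stand-in for a real database query.
var getUser Circuit = func(ctx context.Context) (string, error) {
	return "", errors.New("connection refused") // simulate a failing data layer
}

func main() {
	protected := Breaker(getUser, 3) // open the circuit after 3 consecutive failures

	for i := 0; i < 6; i++ {
		_, err := protected(context.Background())
		// first 3 calls: "connection refused"; once open: "service unreachable"
		// until the backoff window has passed
		fmt.Println(err)
	}
}
#+end_src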
4.2.2. Debounce
- etymological origins in electronic circuits : https://www.geeksforgeeks.org/switch-debounce-in-digital-circuits/
- limits the frequency of a function invocation so that only the first or last in a cluster of calls is actually performed.
- the pattern is native to JavaScript but ports to other languages as needed; proceeding here in Go
- 2 components:
- Circuit : the computation to be regulated
- Debounce : A closure over Circuit that manages the calls
- similar logic to Circuit Breakers in that the closure maintains the rate limiting logic and state
- Code
- on each call of the closure returned by Debounce, regardless of the outcome, a time interval is set.
- calls before the expiry of that interval are ignored; any call after it is passed along to the inner Circuit function.
- this is "function-first" : i.e. cache the result of the first call and ignore the later calls
- Alternatively, a "function-last" implementation will accumulate a series of requests before calling Circuit
- This could be useful when the inner circuit needs some kick-starting corpus of inputs (think autocompletion)
- can be employed if the response can be delayed a little and increased latency is not an issue.
The Core Circuit can be a function as follows
type Circuit func(context.Context) (string, error)
The Debounce-wrapped closure can then be structured as follows (function-first):
#+begin_src go
func DebounceFirst(circuit Circuit, d time.Duration) Circuit {
	var threshold time.Time
	var result string // result cache
	var err error
	var m sync.Mutex

	return func(ctx context.Context) (string, error) {
		m.Lock()

		defer func() {
			threshold = time.Now().Add(d)
			m.Unlock()
		}()

		if time.Now().Before(threshold) {
			// return the cached result before the threshold
			return result, err
		}

		// if expired, compute and cache the result
		// in the enclosed variable result
		result, err = circuit(ctx)

		return result, err
	}
}
#+end_src
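A brief usage sketch for DebounceFirst; fetchQuote below is a hypothetical stand-in for the wrapped call:
#+begin_src go
// fetchQuote is a hypothetical stand-in for the expensive call being debounced.
var fetchQuote Circuit = func(ctx context.Context) (string, error) {
	return "a fresh quote", nil
}

func main() {
	debounced := DebounceFirst(fetchQuote, 500*time.Millisecond)

	// Only the first call reaches fetchQuote; the rapid follow-ups are
	// served from the cached result until calls stop arriving for 500ms.
	for i := 0; i < 3; i++ {
		res, _ := debounced(context.Background())
		fmt.Println(res)
	}
}
#+end_src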
A function-last implementation needs a little more bookkeeping:
#+begin_src go
func DebounceLast(circuit Circuit, d time.Duration) Circuit {
	var threshold time.Time = time.Now()
	var ticker *time.Ticker
	var result string
	var err error
	var once sync.Once
	var m sync.Mutex

	return func(ctx context.Context) (string, error) {
		m.Lock()
		defer m.Unlock()

		// every call pushes the threshold out by d
		threshold = time.Now().Add(d)

		once.Do(func() {
			// poll until the cluster of calls has quieted down
			ticker = time.NewTicker(time.Millisecond * 100)

			go func() {
				defer func() {
					m.Lock()
					ticker.Stop()
					once = sync.Once{} // reset so the next cluster starts a new poller
					m.Unlock()
				}()

				for {
					select {
					case <-ticker.C:
						m.Lock()
						if time.Now().After(threshold) {
							// quiet period elapsed: finally call the circuit
							result, err = circuit(ctx)
							m.Unlock()
							return
						}
						m.Unlock()
					case <-ctx.Done():
						m.Lock()
						result, err = "", ctx.Err()
						m.Unlock()
						return
					}
				}
			}()
		})

		return result, err
	}
}
#+end_src
4.2.3. Retry
- https://learn.microsoft.com/en-us/azure/architecture/patterns/retry
- accounts for a possible transient fault in a distributed system by transparently retrying a failed operation
- 2 components:
- Effector : interacts with the service
- Retry : accepts the effector, returning a closure over it
- Code
- like the Circuit, the effector will have a function signature as follows
type Effector func(context.Context) (string, error)
Retry takes in parameters such as the number of retries and the delay between them, returning a closure with the same signature as the Effector:
#+begin_src go
func Retry(effector Effector, retries int, delay time.Duration) Effector {
	return func(ctx context.Context) (string, error) {
		for r := 0; ; r++ {
			response, err := effector(ctx)
			if err == nil || r >= retries {
				return response, err
			}

			log.Printf("Attempt %d failed; retrying in %v", r+1, delay)

			select {
			case <-time.After(delay):
			case <-ctx.Done():
				return "", ctx.Err()
			}
		}
	}
}
#+end_src
Emulating an error-prone function to try Retry out:
#+begin_src go
var count int

func EmulateTransientError(ctx context.Context) (string, error) {
	count++
	if count <= 3 {
		return "intentional fail", errors.New("error")
	} else {
		return "success", nil
	}
}

func main() {
	r := Retry(EmulateTransientError, 5, 2*time.Second)
	res, err := r(context.Background())
	fmt.Println(res, err)
}
#+end_src
4.2.4. Throttle
- limits the frequency of a function call to some maximum number of invocations per unit of time
- See Rate Limiting Algorithms
- diff w/ Debounce
- debounce collates clusters of calls (across flexible durations) into representative boundary calls
- throttle limits the number of calls within a relatively fixed duration
- 2 components
- Effector : the function being regulated
- Throttle : the wrapping closure over Effector, implementing the rate-limiting layer
- Code
type Effector func(context.Context) (string, error)
#+begin_src go
func Throttle(e Effector, max uint, refill uint, d time.Duration) Effector {
	var tokens = max // token bucket, starts full
	var once sync.Once
	var m sync.Mutex // guards tokens against the refill goroutine

	return func(ctx context.Context) (string, error) {
		if ctx.Err() != nil {
			return "", ctx.Err()
		}

		// start the refill ticker exactly once, on the first call
		once.Do(func() {
			ticker := time.NewTicker(d)

			go func() {
				defer ticker.Stop()

				for {
					select {
					case <-ctx.Done():
						return
					case <-ticker.C:
						m.Lock()
						t := tokens + refill
						if t > max {
							t = max
						}
						tokens = t
						m.Unlock()
					}
				}
			}()
		})

		m.Lock()
		if tokens <= 0 {
			m.Unlock()
			return "", fmt.Errorf("too many calls")
		}
		tokens--
		m.Unlock()

		return e(ctx)
	}
}
#+end_src
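A brief usage sketch, assuming the Effector and Throttle definitions above; getHeadlines is a hypothetical stand-in for the regulated call. The bucket below allows 5 calls and refills 1 token per second:
#+begin_src go
// getHeadlines is a hypothetical stand-in for the rate-limited upstream call.
var getHeadlines Effector = func(ctx context.Context) (string, error) {
	return "headlines", nil
}

func main() {
	throttled := Throttle(getHeadlines, 5, 1, time.Second)

	// The first 5 calls in quick succession succeed; the rest are rejected
	// with "too many calls" until the ticker refills the bucket.
	for i := 0; i < 7; i++ {
		res, err := throttled(context.Background())
		fmt.Println(res, err)
	}
}
#+end_src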
4.2.5. Timeout
- allows a process to stop waiting for an answer once it's clear that an answer may not be coming
- 3 components:
- client : calls a slow function
- SlowFunction : a long-running function
- Timeout : a wrapper over the slow function
- straightforward if the function already accepts a context.Context in Go:
- Code
#+begin_src go
ctx := context.Background()
ctxt, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()

result, err := SomeFunction(ctxt)
#+end_src
- This isn't usually the case though
- in such a case, build a closure that respects the context, then select over the injected timeout context and the result of the slow function run in a goroutine
- the slow function needs to be converted into a context-respecting wrapper as follows
#+begin_src go
type SlowFunction func(string) (string, error)

type WithContext func(context.Context, string) (string, error)

func Timeout(f SlowFunction) WithContext {
	return func(ctx context.Context, arg string) (string, error) {
		chres := make(chan string, 1) // channel for results (buffered so the goroutine doesn't leak on timeout)
		cherr := make(chan error, 1)  // channel for errors

		go func() {
			res, err := f(arg) // dispatch the slow function
			chres <- res
			cherr <- err
		}()

		select {
		case res := <-chres: // done before the timeout
			return res, <-cherr
		case <-ctx.Done(): // in case of timeout
			return "", ctx.Err()
		}
	}
}
#+end_src
- finally, using Timeout looks like this:
#+begin_src go
func main() {
	ctx := context.Background()
	ctxt, cancel := context.WithTimeout(ctx, 1*time.Second)
	defer cancel()

	timeout := Timeout(Slow)
	res, err := timeout(ctxt, "some input")
	fmt.Println(res, err)
}
#+end_src
- an alternative to context.Context (which remains the preferred approach) is the time.After function : https://pkg.go.dev/time#After (see the sketch below)
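A minimal sketch of that alternative, reusing the Slow placeholder from the example above; it races the result channel against a fixed timer instead of a context deadline:
#+begin_src go
func main() {
	chres := make(chan string, 1) // buffered so the goroutine never leaks

	go func() {
		res, _ := Slow("some input") // the slow call from above
		chres <- res
	}()

	select {
	case res := <-chres:
		fmt.Println(res)
	case <-time.After(1 * time.Second): // fixed timeout, no context involved
		fmt.Println("timed out")
	}
}
#+end_src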
4.3. Concurrency Patterns
5. CNCF (Cloud Native Computing Foundation)
The CNCF is a vendor-neutral open source community that fosters the adoption and advancement of cloud-native technologies. It defines cloud-native as:
- Container-packaged: Software is packaged as container images.
- Dynamically orchestrated: Containers are managed by orchestrators like Kubernetes.
- Microservices-oriented: Applications are composed of loosely coupled, modular services.
- Automated: Infrastructures and pipelines are managed declaratively and automated.
5.1. Canonical Layers to Cloud-Native
5.3. Projects
5.3.1. DevOps
| Project    | Utility                  |
|------------|--------------------------|
| Argo       | CI/CD                    |
| Flux       | CI/CD                    |
| Helm       | App. Def. & Image Builds |
| KEDA       | AutoScaling              |
| Kubernetes | Orchestration            |
5.3.2. Compute
| Project    | Utility           |
|------------|-------------------|
| containerd | Container Runtime |
| CRI-O      | Container Runtime |
5.3.3. Storage
5.3.4. Networking
5.3.5. Security
| Project           | Utility                |
|-------------------|------------------------|
| Falco             | Security & Compliance  |
| Open Policy Agent | Security & Compliance  |
| SPIFFE            | Key Management         |
| SPIRE             | Key Management         |
| TUF               | Update System Security |
5.3.6. Meta
| Project    | Utility             |
|------------|---------------------|
| Fluentd    | Logging             |
| Harbor     | Container Registry  |
| Jaeger     | Distributed Tracing |
| Prometheus | Monitoring & Alerts |