Debugging Kubernetes Part 1: An Introduction
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

When it comes to software engineering and application development, cloud native has become commonplace in many teams' vernacular. When people survey the world of cloud native, they often come away with the perspective that the entire cloud-native process is meant for large enterprise applications. A few years ago, that may have been the case, but with the advancement of tooling and services surrounding systems such as Kubernetes, the barrier to entry has been substantially lowered. Even so, does adopting cloud-native practices for applications consisting of a few microservices make a difference?

Just as cloud native has become commonplace, the shift-left movement has made inroads into many organizations' processes. Shifting left is a focus on application delivery from the outset of a project, where software engineers are just as focused on the delivery process as they are on writing application code. Shifting left implies that software engineers understand deployment patterns and technologies as well as implement them earlier in the SDLC. Shifting left using cloud native with microservices development may sound like a string of contemporary buzzwords, but there's real benefit to be gained in combining these closely related topics.

Fostering a Deployment-First Culture

Process is necessary within any organization. Processes are broken down into manageable tasks across multiple teams, with the objective being an efficient path by which an organization sets out to reach a goal. Unfortunately, organizations can get lost in their processes. Teams and individuals focus on doing their tasks as best as possible, and at times, so much so that the goal for which the process is defined gets lost. Software development lifecycle (SDLC) processes are not immune to this problem. In any given organization, if individuals on application development teams are asked how they perceive their objectives, responses can include:

"Completing stories"
"Staying up to date on recent tech stack updates"
"Ensuring their components meet security standards"
"Writing thorough tests"

Most of the answers provided would demonstrate a commitment to the process, which is good. However, what is the goal? The goal of the SDLC is to build software and deploy it. Whether it be an internal or SaaS application, deploying software helps an organization meet an objective. When presented with the statement that the goal of the SDLC is to deliver and deploy software, just about anyone who participates in the process would say, "Well, of course it is." Teams often lose sight of this "obvious" directive because they're far removed from the actual deployment process. A strategic investment in the process can close that gap.

Cloud-native abstractions bring a common domain and dialogue across disciplines within the SDLC. Kubernetes is a good basis upon which cloud-native abstractions can be leveraged. Not only does Kubernetes' usefulness span applications of many shapes and sizes, but when it comes to the SDLC, Kubernetes can also be the environment used on systems ranging from local engineering workstations, through the entire delivery cycle, and on to production.
Bringing the deployment platform all the way "left" to an engineer's workstation has everyone in the process speaking the same language, and deployment becomes a focus from the beginning of the process. Various teams in the SDLC may look at "Kubernetes everywhere" with skepticism. However, the work done to reduce Kubernetes' footprint for systems such as edge devices has made running Kubernetes on a workstation very manageable. Introducing teams to Kubernetes through automation allows them to absorb the platform iteratively. The most important thing is building a deployment-first culture.

Plan for Your Deployment Artifacts

With all teams and individuals focused on the goal of getting their applications to production as efficiently and effectively as possible, how does the evolution of application development shift? The shift is subtle. With a shift-left mindset, there aren't necessarily a lot of new tasks, so the shift is in where the tasks take place within the overall process. When a detailed discussion of application deployment begins with the first line of code, existing processes may need to be updated.

Build Process

If software engineers are to deploy to their personal Kubernetes clusters, are they able to build and deploy enough of an application that they're not reliant on code running on a system beyond their workstation? And there is more to consider than just application code. Is a database required? Does the application use a caching system? It can be challenging to review an existing build process and refactor it for workstation use. The CI/CD build process may need to be re-examined to consider how it can be invoked on a workstation. For most applications, the build process can be refactored in such a way that the goal of local build and deployment is met while the refactored process continues to serve the existing CI/CD pipeline. For new projects, begin by designing the build process for the workstation. The build process can then be added to a CI/CD pipeline. The local build and CI/CD build processes should strive to share as much code as possible; this keeps the entire team up to date on how the application is built and deployed.

Build Artifacts

The primary deliverables of a build process are the build artifacts. For cloud-native applications, these include container images (e.g., Docker images) and deployment packages (e.g., Helm charts). When an engineer executes the build process on their workstation, the artifacts will likely need to be published to a repository, such as a container registry or chart repository. The build process must be aware of context. Existing processes may already be aware of their context, with settings for environments ranging from test and staging to production. Workstation builds become an additional context. Given this awareness of context, build processes can publish artifacts to workstation-specific registries and repositories. For cloud-native development, and in keeping with the local workstation paradigm, container registries and chart repositories are deployed as part of the workstation Kubernetes cluster. As the process moves from build to deploy, maintaining build context includes accessing resources within the current context.

Parameterization

Central to this entire process is that key components of the build and deployment process definition should not be duplicated per runtime environment.
For example, if a container image is built and published one way on the local workstation and another way in the CI/CD pipeline, how long will it be before the two diverge? Most likely, sooner than expected. Divergence in a build process creates divergence across environments, which leads to divergence between teams and erodes the deployment-first culture. That may sound a bit dramatic, but as soon as any code forks — without a deliberate plan to merge the forks — the code eventually becomes, for all intents and purposes, unmergeable.

Parameterizing the build and deployment process is required to maintain a single set of build and deployment components. Parameters define build context, such as the registries and repositories to use. Parameters define deployment context as well, such as the number of pod replicas to deploy or resource constraints. As the process is created, lean toward over-parameterization. It's easier to maintain a parameter as a constant than to extract a parameter from an existing process later.

Figure 1. Local development cluster

Cloud-Native Microservices Development in Action

In addition to the deployment-first culture, cloud-native microservices development requires tooling support that doesn't impede the day-to-day tasks performed by an engineer. If engineers can be shown a new pattern for development that allows them to be more productive with only a minimum-to-moderate level of understanding of new concepts, while still using their favorite tools, they will embrace the paradigm. While engineers may push back or be skeptical about a new process, once the impact on their productivity is tangible, they will be energized to adopt the new pattern.

Easing Development Teams Into the Process

Changing culture is about getting teams on board with adopting a new way of doing something. The next step is execution. Shifting left requires that software engineers move from designing and writing application code to becoming an integral part of the design and implementation of the entire build and deployment process. This means learning new tools and exploring areas in which they may not have a great deal of experience. Human nature tends to resist change. Software engineers may look at this entire process and think, "How can I absorb this new process and these new tools while trying to maintain a schedule?" It's a valid question. However, software engineers are typically fine with incorporating a new development tool or process that helps them and the team without drastically disrupting their daily routine. Whether beginning a new project or refactoring an existing one, adopting a shift-left engineering process requires introducing new tools in a way that allows software engineers to remain productive while iteratively learning the new tooling. This starts with automating and documenting the build-out of their new development environment — their local Kubernetes cluster. It also requires listening to the team's concerns and suggestions, as this will be their daily environment.

Dev(elopment) Containers

The Development Containers specification is a relatively new advancement based on an existing concept in supporting development environments. Many engineering teams have leveraged virtual desktop infrastructure (VDI) systems, where a developer's workstation is hosted on virtualized infrastructure.
Companies that implement VDI environments like the centralized control they provide, and software engineers like the idea of a pre-packaged environment that contains all the components required to develop, debug, and build an application. What software engineers do not like about VDI environments are the network issues that make their IDEs sluggish and frustrating to use. Development containers leverage the same concept as VDI environments but bring it to a local workstation, allowing engineers to use their locally installed IDE while being remotely connected to a running container. This way, the engineer has the experience of local development while connected to a running container. Development containers do require an IDE that supports the pattern.

What makes the use of development containers so attractive is that engineers can attach to a container running within a Kubernetes cluster and access services as configured for an actual deployment. In addition, development containers support a first-class development experience, including all the tools a developer would expect to be available in a development environment. From a broader perspective, development containers aren't limited to local deployments. When configured for access, cloud environments can provide the same first-class development experience. Here, the deployment abstraction provided by containerized orchestration layers really shines.

Figure 2. Microservice development container configured with dev containers

The Synergistic Evolution of Cloud-Native Development Continues

There's a synergy across shift-left, cloud-native, and microservices development. Together they present a pattern for application development that can be adopted by teams of any size. Tooling continues to evolve, making practical use of the technologies involved in cloud-native environments accessible to everyone involved in the application delivery process. It is a culture change that entails a change in mindset while learning new processes and technologies. It's important that teams aren't burdened with a collection of manual processes where they feel their productivity is being lost. Automation helps ease teams into the adoption of the pattern and technologies. As with any other organizational change, upfront planning and preparation are important. Just as important is involving the teams in the plan. When individuals have a say in change, ownership and adoption become a natural outcome.

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.
Debugging application issues in a Kubernetes cluster can often feel like navigating a labyrinth. Containers are ephemeral by design and intended to be immutable once deployed. This presents a unique challenge when something goes wrong and we need to dig into the issue. Before diving into the debugging tools and techniques, it's essential to grasp the core problem: why modifying container instances directly is a bad idea. This blog post will walk you through the intricacies of Kubernetes debugging, offering insights and practical tips to effectively troubleshoot your Kubernetes environment.

The Problem With Kubernetes

The Immutable Nature of Containers

One of the fundamental principles of Kubernetes is the immutability of container instances. This means that once a container is running, it shouldn't be altered. Modifying containers on the fly can lead to inconsistencies and unpredictable behavior, especially as Kubernetes orchestrates the lifecycle of these containers, replacing them as needed. Imagine trying to diagnose an issue only to realize that the container you're investigating has been modified, making it difficult to reproduce the problem consistently.

The idea behind this immutability is to ensure that every instance of a container is identical to any other instance. This consistency is crucial for achieving reliable, scalable applications. If you start modifying containers, you undermine this consistency, leading to a situation where one container behaves differently from another, even though they are supposed to be identical.

The Limitations of kubectl exec

We often start our journey in Kubernetes with commands such as:

$ kubectl exec -ti <pod-name> -- /bin/bash

This logs into a container and feels like accessing a traditional server with SSH. However, this approach has significant limitations. Containers often lack basic diagnostic tools—no vim, no traceroute, sometimes not even a shell. This can be a rude awakening for those accustomed to a full-featured Linux environment. Additionally, if a container crashes, kubectl exec becomes useless as there's no running instance to connect to. This tool is insufficient for thorough debugging, especially in production environments.

Consider the frustration of logging into a container only to find out that you can't even open a simple text editor to check configuration files. This lack of basic tools means that you are often left with very few options for diagnosing problems. Moreover, the minimalistic nature of many container images, designed to reduce their attack surface and footprint, exacerbates this issue.

Avoiding Direct Modifications

While it might be tempting to install missing tools on the fly using commands like apt-get install vim, this practice violates the principle of container immutability. In production, installing packages dynamically can introduce new dependencies, potentially causing application failures. The risks are high, and it's crucial to maintain the integrity of your deployment manifests, ensuring that all configurations are predefined and reproducible.

Imagine a scenario where a quick fix in production involves installing a missing package. This might solve the immediate problem but could lead to unforeseen consequences. Dependencies introduced by the new package might conflict with existing ones, leading to application instability. Moreover, this approach makes it challenging to reproduce the exact environment, which is vital for debugging and scaling your application.
Enter Ephemeral Containers

The solution to the aforementioned problems lies in ephemeral containers. Kubernetes allows the creation of these temporary containers within the same pod as the application container you need to debug. These ephemeral containers are isolated from the main application, ensuring that any modifications or tools installed do not impact the running application.

Ephemeral containers provide a way to bypass the limitations of kubectl exec without violating the principles of immutability and consistency. By launching a separate container within the same pod, you can inspect and diagnose the application container without altering its state. This approach preserves the integrity of the production environment while giving you the tools you need to debug effectively.

Using kubectl debug

The kubectl debug command is a powerful tool that simplifies the creation of ephemeral containers. Unlike kubectl exec, which logs into the existing container, kubectl debug creates a new container within the same namespace. This container can run a different OS, mount the application container's filesystem, and provide all necessary debugging tools without altering the application's state. This method ensures you can inspect and diagnose issues even if the original container is not operational. For example, let's consider a scenario where we're debugging a container using an ephemeral Ubuntu container:

kubectl debug -it <pod-name> --image=ubuntu --share-processes --copy-to=<myapp-debug>

This command launches a new Ubuntu-based container alongside the application container (in a debug copy of the pod, since --copy-to is used), providing a full-fledged environment to diagnose the application container. Even if the original container lacks a shell or crashes, the debug container remains operational, allowing you to perform necessary checks and install tools as needed. It relies on the fact that we can have multiple containers in the same pod; that way, we can inspect the filesystem of the debugged container without physically entering that container.

Practical Application of Ephemeral Containers

To illustrate, let's delve deeper into how ephemeral containers can be used in real-world scenarios. Suppose you have a container that consistently crashes due to a mysterious issue. By deploying an ephemeral container with a comprehensive set of debugging tools, you can monitor the logs, inspect the filesystem, and trace processes without worrying about the constraints of the original container environment.

For instance, you might encounter a situation where an application container crashes due to an unhandled exception. By using kubectl debug, you can create an ephemeral container that shares the same network namespace as the original container. This allows you to capture network traffic and analyze it to understand if there are any issues related to connectivity or data corruption.
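If you are curious about what kubectl debug is doing on your behalf, the following is a minimal sketch that uses the Kubernetes Go client (client-go) to attach an ephemeral debug container to a running pod. The namespace, pod name, target container, and image below are placeholders, and error handling is kept to a minimum; treat it as an illustration of the ephemeral containers subresource rather than a production utility.

package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	const namespace = "default"        // placeholder
	const podName = "myapp"            // placeholder
	const targetContainer = "myapp"    // placeholder

	// Build a client from the local kubeconfig (the same credentials kubectl uses).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Append an ephemeral debug container that targets the application container.
	pod.Spec.EphemeralContainers = append(pod.Spec.EphemeralContainers, corev1.EphemeralContainer{
		EphemeralContainerCommon: corev1.EphemeralContainerCommon{
			Name:  "debugger",
			Image: "ubuntu",
			Stdin: true,
			TTY:   true,
		},
		TargetContainerName: targetContainer,
	})

	// Ephemeral containers are modified through their own pod subresource.
	_, err = clientset.CoreV1().Pods(namespace).UpdateEphemeralContainers(ctx, podName, pod, metav1.UpdateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("ephemeral debug container added; attach with: kubectl attach -it %s -c debugger", podName)
}

When targeting a running pod (without --copy-to), kubectl debug performs essentially this update on the pod's ephemeralcontainers subresource and then attaches to the newly created container.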
Security Considerations

While ephemeral containers reduce the risk of impacting the production environment, they still pose security risks. It's critical to restrict access to debugging tools and ensure that only authorized personnel can deploy ephemeral containers. Treat access to these systems with the same caution as handing over the keys to your infrastructure. Ephemeral containers, by their nature, can access sensitive information within the pod. Therefore, it is essential to enforce strict access controls and audit logs to track who is deploying these containers and what actions are being taken. This ensures that the debugging process does not introduce new vulnerabilities or expose sensitive data.

Interlude: The Role of Observability

While tools like kubectl exec and kubectl debug are invaluable for troubleshooting, they are not replacements for comprehensive observability solutions. Observability allows you to monitor, trace, and log the behavior of your applications in real time, providing deeper insights into issues without the need for intrusive debugging sessions. These tools aren't meant for everyday debugging: that role should be occupied by various observability tools. I will discuss observability in more detail in an upcoming post.

Command Line Debugging

While tools like kubectl exec and kubectl debug are invaluable, there are times when you need to dive deep into the application code itself. This is where we can use command line debuggers. Command line debuggers allow you to inspect the state of your application at a very granular level, stepping through code, setting breakpoints, and examining variable states. Personally, I don't use them much. For instance, Java developers can use jdb, the Java Debugger, which is analogous to gdb for C/C++ programs. Here's a basic rundown of how you might use jdb in a Kubernetes environment:

1. Set Up Debugging

First, you need to start your Java application with debugging enabled. This typically involves adding a debug flag to your Java command. However, as discussed in my post here, there's an even more powerful way that doesn't require a restart:

java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -jar myapp.jar

2. Port Forwarding

Since the debugger needs to connect to the application, you'll set up port forwarding to expose the debug port of your pod to your local machine. This is important, as JDWP is dangerous to expose beyond your own machine:

kubectl port-forward <pod-name> 5005:5005

3. Connecting the Debugger

With port forwarding in place, you can now connect jdb to the remote application:

jdb -attach localhost:5005

From here, you can use jdb commands to set breakpoints, step through code, and inspect variables. This process allows you to debug issues within the code itself, which can be invaluable for diagnosing complex problems that aren't immediately apparent through logs or superficial inspection.

Connecting a Standard IDE for Remote Debugging

I prefer IDE debugging by far. I never used JDB for anything other than a demo. Modern IDEs support remote debugging, and by leveraging Kubernetes port forwarding, you can connect your IDE directly to a running application inside a pod. To set up remote debugging, we start with the same steps as command line debugging: configuring the application and setting up port forwarding.

1. Configure the IDE

In your IDE (e.g., IntelliJ IDEA, Eclipse), set up a remote debugging configuration. Specify the host as localhost and the port as 5005.

2. Start Debugging

Launch the remote debugging session in your IDE. You can now set breakpoints, step through code, and inspect variables directly within the IDE, just as if you were debugging a local application.

Conclusion

Debugging Kubernetes environments requires a blend of traditional techniques and modern tools designed for container orchestration. Understanding the limitations of kubectl exec and the benefits of ephemeral containers can significantly enhance your troubleshooting process.
However, the ultimate goal should be to build robust observability into your applications, reducing the need for ad-hoc debugging and enabling proactive issue detection and resolution. By following these guidelines and leveraging the right tools, you can navigate the complexities of Kubernetes debugging with confidence and precision. In the next installment of this series, we’ll delve into common configuration issues in Kubernetes and how to address them effectively.
In this article, I want to discuss Testcontainers and Go: how to integrate them into a project, and why it is necessary.

Testcontainers Review

Testcontainers is a tool that enables developers to utilize Docker containers during testing, providing isolation and maintaining an environment that closely resembles production. Why do we need to use it? Some points:

Importance of Writing Tests
Ensures code quality by identifying and preventing errors.
Facilitates safer code refactoring.
Acts as documentation for code functionality.

Introduction to Testcontainers
Library for managing Docker containers within tests.
Particularly useful when applications interact with external services.
Simplifies the creation of isolated testing environments.

Testcontainers-go Support in Golang
Port of the Testcontainers library for Golang.
Enables the creation and management of Docker containers directly from tests.
Streamlines integration testing by providing isolated and reproducible environments.
Ensures test isolation, preventing external factors from influencing results.
Simplifies setup and teardown of containers for testing.
Supports various container types, including databases, caches, and message brokers.

Integration Testing
Offers isolated environments for integration testing.
Convenient methods for starting, stopping, and obtaining container information.
Facilitates seamless integration of Docker containers into the Golang testing process.

So, the key point to highlight is that we don't preconfigure the environment outside of the code; instead, we create an isolated environment from the code. This allows us to achieve isolation for both individual tests and all tests simultaneously. For example, we can set up a single MongoDB for all tests and work with it within integration tests. However, if we need to add Redis for a specific test, we can do so through the code. Let's explore its application through an example of a portfolio management service developed in Go.

Service Description

The service is a REST API designed for portfolio management. It utilizes MongoDB for data storage and Redis for caching queries. This ensures fast data access and reduces the load on the primary storage.

Technologies
Go: The programming language used to develop the service.
MongoDB: Document-oriented database employed for storing portfolio data.
Docker and Docker Compose: Used for containerization and local deployment of the service and its dependencies.
Testcontainers-go: Library for integration testing using Docker containers in Go tests.

Testing Using Testcontainers

Testcontainers allows integration testing of the service under conditions closely resembling a real environment, using Docker containers for dependencies.
Let's provide an example of a function to launch a MongoDB container in tests:

Go

func RunMongo(ctx context.Context, t *testing.T, cfg config.Config) testcontainers.Container {
	// mongoImage, listener, and mongoPort are assumed to be package-level constants (not shown here).
	mongodbContainer, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        mongoImage,
			ExposedPorts: []string{listener},
			WaitingFor:   wait.ForListeningPort(mongoPort),
			Env: map[string]string{
				"MONGO_INITDB_ROOT_USERNAME": cfg.Database.Username,
				"MONGO_INITDB_ROOT_PASSWORD": cfg.Database.Password,
			},
		},
		Started: true,
	})
	if err != nil {
		t.Fatalf("failed to start container: %s", err)
	}
	return mongodbContainer
}

And part of the example:

Go

package main_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestMongoIntegration(t *testing.T) {
	ctx := context.Background()

	// Replace cfg with your actual configuration
	cfg := config.Config{
		Database: struct {
			Username   string
			Password   string
			Collection string
		}{
			Username:   "root",
			Password:   "example",
			Collection: "test_collection",
		},
	}

	// Launching the MongoDB container
	mongoContainer := RunMongo(ctx, t, cfg)
	defer mongoContainer.Terminate(ctx)

	// Here you can add code for initializing MongoDB, for example, creating a client to interact with the database
	// Here you can run tests using the started MongoDB container
	// ...

	// Example test that checks if MongoDB is available
	if err := checkMongoAvailability(ctx, mongoContainer, t); err != nil {
		t.Fatalf("MongoDB is not available: %s", err)
	}

	// Here you can add other tests in your scenario
	// ...
}

// Function to check the availability of MongoDB
func checkMongoAvailability(ctx context.Context, container testcontainers.Container, t *testing.T) error {
	host, err := container.Host(ctx)
	if err != nil {
		return err
	}
	port, err := container.MappedPort(ctx, "27017")
	if err != nil {
		return err
	}

	// Here you can use host and port to create a client and check the availability of MongoDB
	// For example, attempt to connect to MongoDB and execute a simple query
	_, _ = host, port
	return nil
}

How to run the tests:

go test ./... -v

This test will use Testcontainers to launch a MongoDB container and then conduct integration tests using the started container. Replace `checkMongoAvailability` with the tests you need. Please ensure that you have the necessary dependencies installed before using this example, including the `testcontainers-go` library and other libraries used in your code.

Now we need to relocate the MongoDB Testcontainer setup into the primary test method so that the container is started only a single time.
Go

var mongoAddress string

func TestMain(m *testing.M) {
	ctx := context.Background()
	cfg := CreateCfg(database, collectionName)

	mongodbContainer, err := RunMongo(ctx, cfg)
	if err != nil {
		log.Fatal(err)
	}

	mappedPort, err := mongodbContainer.MappedPort(ctx, "27017")
	if err != nil {
		log.Fatal(err)
	}
	mongoAddress = "mongodb://localhost:" + mappedPort.Port()

	code := m.Run()

	// Terminate the container explicitly: os.Exit would skip deferred calls.
	if err := mongodbContainer.Terminate(ctx); err != nil {
		log.Fatalf("failed to terminate container: %s", err)
	}
	os.Exit(code)
}

Note that this variant of RunMongo accepts only the context and config and returns an error instead of failing through *testing.T, since TestMain has no *testing.T to work with. And now, our test should be:

Go

func TestFindByID(t *testing.T) {
	ctx := context.Background()

	cfg := CreateCfg(database, collectionName)
	cfg.Database.Address = mongoAddress

	client := GetClient(ctx, t, cfg)
	defer client.Disconnect(ctx)

	collection := client.Database(database).Collection(collectionName)

	testPortfolio := pm.Portfolio{
		Name:    "John Doe",
		Details: "Software Developer",
	}

	insertResult, err := collection.InsertOne(ctx, testPortfolio)
	if err != nil {
		t.Fatal(err)
	}
	savedObjectID, ok := insertResult.InsertedID.(primitive.ObjectID)
	if !ok {
		t.Fatal("InsertedID is not an ObjectID")
	}

	service, err := NewMongoPortfolioService(cfg)
	if err != nil {
		t.Fatal(err)
	}

	foundPortfolio, err := service.FindByID(ctx, savedObjectID.Hex())
	if err != nil {
		t.Fatal(err)
	}

	assert.Equal(t, testPortfolio.Name, foundPortfolio.Name)
	assert.Equal(t, testPortfolio.Details, foundPortfolio.Details)
}
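The test above calls a GetClient helper that isn't shown in the article. Purely as an illustration (the helper name and the cfg.Database fields are taken from the snippets above, but the body below is a guess, not the author's code), a minimal version using the official go.mongodb.org/mongo-driver (v1) could look like this:

Go

func GetClient(ctx context.Context, t *testing.T, cfg config.Config) *mongo.Client {
	// Requires go.mongodb.org/mongo-driver/mongo and go.mongodb.org/mongo-driver/mongo/options.
	// Connect to the MongoDB container started in TestMain via its mapped address.
	opts := options.Client().
		ApplyURI(cfg.Database.Address).
		SetAuth(options.Credential{
			Username: cfg.Database.Username,
			Password: cfg.Database.Password,
		})
	client, err := mongo.Connect(ctx, opts)
	if err != nil {
		t.Fatalf("failed to connect to MongoDB: %s", err)
	}
	if err := client.Ping(ctx, nil); err != nil {
		t.Fatalf("failed to ping MongoDB: %s", err)
	}
	return client
}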
Ok, but Do We Already Have Everything Inside the Makefile?

Let's figure it out—what advantages do Testcontainers offer now? Long before, we used to write tests and describe the environment in a Makefile, where scripts were used to set up the environment. Essentially, it was the same Docker Compose and the same environment setup, but we did it in one place and for everyone at once. Does it make sense for us to migrate to Testcontainers? Let's conduct a brief comparison between these two approaches.

Isolation and Autonomy
Testcontainers ensure the isolation of the testing environment during tests. Each test launches its own container, guaranteeing that changes made by one test won't affect others.

Ease of Configuration and Management
Testcontainers simplifies configuring and managing containers. You don't need to write complex Makefile scripts for deploying databases; instead, you can use the straightforward Testcontainers API within your tests.

Automation and Integration With Test Suites
Utilizing Testcontainers enables the automation of container startup and shutdown within the testing process. This easily integrates into test scenarios and frameworks.

Quick Test Environment Setup
Launching containers through Testcontainers is swift, expediting the test environment preparation process. There's no need to wait for containers to be ready, as is the case when using a Makefile.

Enhanced Test Reliability
Starting a container in a test brings the testing environment closer to reality. This reduces the likelihood of false positives and increases test reliability.

In conclusion, incorporating Testcontainers into tests streamlines the testing process, making it more reliable and manageable. It also facilitates using a broader spectrum of technologies and data stores.

Conclusion

It's worth mentioning that delaying transitions from old approaches to newer and simpler ones is not advisable. Often, this leads to the accumulation of significant complexity and requires ongoing maintenance. Most of the time, our scripts set up an entire test environment right on our computers, but why? In the test environment, we have everything — Kafka, Redis, and Istio with Prometheus. Do we need all of this just to run a couple of integration tests for the database? The answer is obviously no. The main idea of such tests is complete isolation from external factors and writing them as close to the subject domain and integrations as possible. As practice shows, these tests fit well into CI/CD under the profile or stage named e2e, allowing them to be run in isolation wherever you have Docker! Ultimately, if you have a less powerful laptop or prefer running everything in runners or on your company's resources, this case is for you!

Thank you for your time, and I wish you the best of luck! I hope the article proves helpful!

Code: DrSequence/testcontainer-contest
Read more: Testcontainers MongoDB Module
Twenty years ago, software was eating the world. Then around a decade ago, containers started eating software, heralded by the arrival of open source OCI standards. Suddenly, developers were able to package an application artifact in a container — sometimes all by themselves. And each container image could technically run anywhere — especially in cloud infrastructure. No more needing to buy VM licenses, look for rack space and spare servers, and no more contacting the IT Ops department to request provisioning.

Unfortunately, the continuing journey of deploying containers throughout all enterprise IT estates hasn't been all smooth sailing. Dev teams are confronted with an ever-increasing array of options for building and configuring multiple container images to support unique application requirements and different underlying flavors of commercial and open-source platforms. Even if a developer becomes an expert in docker build, and the team has enough daily time to keep track of changes across all components and dependencies, they are likely to see functional and security gaps appearing within their expanding container fleet.

Fortunately, we are seeing a bright spot in the evolution of Cloud Native Buildpacks, an open-source implementation project pioneered at Heroku and adopted early at Pivotal, which is now under the wing of the CNCF. Paketo Buildpacks is an open-source implementation of Cloud Native Buildpacks currently owned by the Cloud Foundry Foundation. Paketo automatically compiles and encapsulates developer application code into containers. Here's how this latest iteration of buildpacks supports several important developer preferences and development team initiatives.

Open Source Interoperability

Modern developers appreciate the ability to build on open-source technology whenever they can, but it's not always that simple to decide between open-source solutions when vendors and end-user companies have already made architectural decisions and set standards. Even in an open-source-first shop, many aspects of the environment will be vendor-supported and offer opinionated stacks for specific delivery platforms.

Developers love to utilize buildpacks because they allow them to focus on coding business logic rather than the infinite combinations of deployment details. Dealing with both source and deployment variability is where Paketo differentiates itself from previous containerization approaches. It doesn't matter whether the developer codes in Java, Go, Node.js, or Python: Paketo can compile ready-to-run containers. And it doesn't matter which cloud IaaS resource or on-prem server it runs on.

"I think we're seeing a lot more developers who have a custom platform with custom stacks, but they keep coming back to Paketo Buildpacks because they can actually plug them into a modular system," said Forest Eckhardt, contributor and maintainer to the Paketo project. "I think that adoption is going well, a lot of the adopters that we see are DevOps or Operations leaders who are trying to deliver applications for their clients and external teams."

Platform Engineering With Policy

Platform engineering practices give developers shared, self-service resources and environments for development work, reducing setup costs and time, and encouraging code, component, and configuration reuse.
These common platform engineering environments can be offered within a self-service internal portal or an external partner development portal, sometimes accompanied by support from a platform team that curates and reviews all elements of the platform. If the shared team space has too many random uploads, developers will not be able to distinguish the relative utility or safety of various unvalidated container definitions and packages. Proper governance means giving developers the ability to build to spec — without having to slog through huge policy checklists.

Buildpacks take much of the effort and risk out of the "last mile" of platform engineering. Developers can simply bring their code, and Paketo Buildpacks detects the language, gathers dependencies, and builds a valid container image that fits within the chosen methodology and policies of the organization.

DevOps-Speed Automation

In addition to empowering developers with self-service resources, automating everything as much as possible is another core tenet of the DevOps movement. DevOps is usually represented as a continuous infinity loop, where each change the team promotes in the design/development/build/deploy lifecycle should be executed by automated processes, including production monitoring and feedback to drive the next software delivery cycle. Any manual intervention in the lifecycle should be looked at as the next potential constraint to be addressed. If developers are spending time setting up Dockerfiles and validating containers, that's less time spent creating new functionality or debugging critical issues.

Software Supply Chain Assurance

Developers want to move fast, so they turn to existing code and infrastructure examples that are working for peers. Heaps of downloadable packages and source code snippets are ready to go on npm, Stack Overflow, and Docker Hub – many with millions of downloads and lots of upvotes and review stars. The advent of such public development resources and git-style repositories offers immense value for the software industry as a whole, but by nature, it also provides an ideal entry point for software supply chain (or SSC) attacks. Bad actors can insert malware, and irresponsible ones can leave behind vulnerabilities. Scanning an application once exploits are baked in can be difficult.

It's about time the software industry started taking a page from other discrete industries like high-tech manufacturing and pharmaceuticals that rely on tight governance of their supply chains to maximize customer value with reduced risk. For instance, an automotive brand would want to know the provenance of every part that goes into a car they manufacture: a complete bill of materials (or BOM) including both its supplier history and its source material composition.

Paketo Buildpacks automatically generates an SBOM (software bill of materials) during each build process, attached to the image, so there's no need to rely on external scanning tools. The SBOM documents information about every component in the packaged application, for instance, that it was written with Go version 1.22.3, even though that original code was compiled.

The Intellyx Take

Various forms of system encapsulation routines have been around for years, well before Docker appeared. Hey, containers even existed on mainframes. But there's something distinct about this current wave of containerization for a cloud-native world.
Paketo Buildpacks provides application delivery teams with total flexibility in selecting their platforms and open-source components of choice, with automation and reproducibility. Developers can successfully build the same app, in the same way, thousands of times in a row, even if underlying components are updated. That’s why so many major development shops are moving toward modern buildpacks, and removing the black box around containerization — no matter what deployment platform and methodology they espouse. ©2024 Intellyx B.V. Intellyx is editorially responsible for this document. At the time of writing, Cloud Foundry Foundation is an Intellyx customer. No AI bots were used to write this content. Image source: Adobe Express AI.
We have a somewhat bare-bones chat service in our series so far. Our service exposes endpoints for managing topics and letting users post messages in topics. For a demo, we have been using a makeshift in-memory store that shamelessly provides no durability guarantees. A basic and essential building block in any (web) service is a data store (for storing, organizing, and retrieving data securely and efficiently). In this tutorial, we will improve the durability, organization, and persistence of data by introducing a database.

There are several choices of databases: in-memory (a very basic form of which we have used earlier), object-oriented databases, key-value stores, relational databases, and more. We will not repeat an in-depth comparison of these here and instead defer to others. Furthermore, in this article, we will use a relational (SQL) database as our underlying data store. We will use the popular GORM library (an ORM framework) to simplify access to our database. There are several relational databases available, both free as well as commercial. We will use Postgres (a very popular, free, lightweight, and easy-to-manage database) for our service. Postgres is also an ideal choice for a primary source-of-truth data store because of the strong durability and consistency guarantees it provides.

Setting Up the Database

A typical pattern when using a database in a service is:

|---------------|     |---------|     |---------|     |----|
| Request Proto | <-> | Service | <-> | ORM/SQL | <-> | DB |
|---------------|     |---------|     |---------|     |----|

A gRPC request is received by the service (we have not shown the REST Gateway here).
The service converts the model proto (e.g., Topic) contained in the request (e.g., CreateTopicRequest) into the ORM library's models.
The ORM library generates the necessary SQL and executes it on the DB (and returns any results).

Setting Up Postgres

We could go the traditional way of installing Postgres (by downloading and installing its binaries for the specific platforms). However, this is complicated and brittle. Instead, we will start using Docker (and Docker Compose) going forward for a compact, developer-friendly setup.

Set Up Docker

Set up Docker Desktop for your platform following the instructions.

Add Postgres to Docker Compose

Now that Docker is set up, we can add different containers to this so we can build out the various components and services OneHub requires.

docker-compose.yml:

version: '3.9'
services:
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - ./.pgdata:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

That's it. A few key things to note are:

The Docker Compose file is an easy way to get started with containers - especially on a single host without needing complicated orchestration engines (hint: Kubernetes). The main part of Docker Compose files are the service sections that describe the containers for each of the services that Docker Compose will be executing as a "single unit in a private network."
This is a great way to package multiple related services needed for an application and bring them all up and down in one step instead of having to manage them one by one individually. The latter is not just cumbersome, but also error-prone (manual dependency management, logging, port checking, etc.).

For now, we have added one service - postgres - running on port 5432.

Since the services are running in an isolated context, environment variables can be set to initialize/control the behavior of the services. These environment variables are read from a specific .env file (below). This file can also be passed as a CLI flag or as a parameter, but for now, we are using the default .env file. Some configuration parameters here are the Postgres username, password, and database name.

.env:

POSTGRES_DB=onehubdb
POSTGRES_USER=postgres
POSTGRES_PASSWORD=docker
ONEHUB_DB_ENDOINT=postgres://postgres:docker@postgres:5432/onehubdb
PGADMIN_LISTEN_PORT=5480
PGADMIN_DEFAULT_EMAIL=admin@onehub.com
PGADMIN_DEFAULT_PASSWORD=password

All data in a container is transient and is lost when the container is shut down. In order to make our database durable, we will store the data outside the container and map it as a volume. This way, from within the container, Postgres will read/write to its local directory (/var/lib/postgresql/data) even though all reads/writes are sent to the host's file system (./.pgdata).

Another great benefit of using Docker is that all the ports used by the different services are "internal" to the network that Docker creates. This means the same postgres service (which runs on port 5432) can be run on multiple Docker environments without having their ports changed or checked for conflicts. This works because, by default, ports used inside a Docker environment are not exposed outside the Docker environment. Here we have chosen to expose port 5432 explicitly in the ports section of docker-compose.yml.

That's it. Go ahead and bring it up:

docker compose up

If all goes well, you should see a new Postgres database created and initialized with our username, password, and DB parameters from the .env file. The database is now ready:

onehub-postgres-1 | 2023-07-28 22:52:32.199 UTC [1] LOG: starting PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
onehub-postgres-1 | 2023-07-28 22:52:32.204 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
onehub-postgres-1 | 2023-07-28 22:52:32.204 UTC [1] LOG: listening on IPv6 address "::", port 5432
onehub-postgres-1 | 2023-07-28 22:52:32.209 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
onehub-postgres-1 | 2023-07-28 22:52:32.235 UTC [78] LOG: database system was shut down at 2023-07-28 22:52:32 UTC
onehub-postgres-1 | 2023-07-28 22:52:32.253 UTC [1] LOG: database system is ready to accept connections

The OneHub Docker application should now show up in Docker Desktop and should look something like this:

(Optional) Set Up a DB Admin Interface

If you would like to query or interact with the database (outside code), pgAdmin and adminer are great tools. They can be downloaded as native application binaries, installed locally, and played with. This is a great option if you would like to manage multiple databases (e.g., across multiple Docker environments).

... Alternatively ...

If it is for this single project and downloading yet another (native app) binary is undesirable, why not just include it as a service within Docker itself!?
With that added, our docker-compose.yml now looks like this:

docker-compose.yml:

version: '3.9'
services:
  pgadmin:
    image: dpage/pgadmin4
    ports:
      - ${PGADMIN_LISTEN_PORT}:${PGADMIN_LISTEN_PORT}
    environment:
      PGADMIN_LISTEN_PORT: ${PGADMIN_LISTEN_PORT}
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_DEFAULT_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_DEFAULT_PASSWORD}
    volumes:
      - ./.pgadmin:/var/lib/pgadmin
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - ./.pgdata:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

The accompanying environment variables are in our .env file:

.env:

POSTGRES_DB=onehubdb
POSTGRES_USER=postgres
POSTGRES_PASSWORD=docker
ONEHUB_DB_ENDOINT=postgres://postgres:docker@postgres:5432/onehubdb
PGADMIN_LISTEN_PORT=5480
PGADMIN_DEFAULT_EMAIL=admin@onehub.com
PGADMIN_DEFAULT_PASSWORD=password

Now you can simply visit the pgAdmin web console in your browser. Use the email and password specified in the .env file and off you go! To connect to the Postgres instance running in the Docker environment, simply create a connection to postgres (NOTE: container-local DNS names within the Docker environment are the service names themselves). On the left-side Object Explorer panel, (right) click on Servers >> Register >> Server... and give a name to your server ("postgres"). In the Connection tab, use the hostname "postgres" and set the names of the database, username, and password as set in the .env file for the POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD variables respectively. Click Save, and off you go!

Introducing Object Relational Mappers (ORMs)

Before we start updating our service code to access the database, you may be wondering why the gRPC service itself is not packaged in our docker-compose.yml file. Without this, we would still have to start our service from the command line (or a debugger). This will be detailed in a future post.

In a typical database, initialization (after the user and DB setup) would entail creating and running SQL scripts to create tables, checking for new versions, and so on. One example of a table creation statement (that can be executed via psql or pgAdmin) is:

CREATE TABLE topics (
  id TEXT NOT NULL PRIMARY KEY,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  name TEXT NOT NULL,
  users TEXT[]
);

Similarly, an insertion would also have been a manual construction of SQL statements, e.g.:

INSERT INTO topics (id, name) VALUES ('1', 'Taylor Swift');

... followed by a verification of the saved results:

select * from topics;

This can get pretty tedious (and error-prone, with vulnerability to SQL injection attacks). SQL expertise is highly valuable, but deep fluency is seldom feasible for every developer - especially across the different standards, different vendors, etc. Even though Postgres does a great job of being as standards-compliant as possible, for developers, some ease of use with databases is highly desirable. Here, ORM libraries are indispensable, especially for developers not dealing with SQL on a regular basis (e.g., yours truly).

ORMs (Object Relational Mappers) provide an object-like interface to a relational database. This simplifies access to data in our tables (i.e., rows) as application-level classes (Data Access Objects). Table creations and migrations can also be managed by ORM libraries.
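For a taste of the difference, here is a hedged sketch of the same insert and lookup expressed through GORM. It assumes a *gorm.DB handle named db and a Topic model along the lines of the one defined in the next section; it is meant as an illustration of the ORM style rather than the exact OneHub code.

// Create a topic row - GORM builds and executes the INSERT for us.
topic := Topic{BaseModel: BaseModel{Id: "1"}, Name: "Taylor Swift"}
if err := db.Create(&topic).Error; err != nil {
	log.Fatal(err)
}

// Read it back - roughly equivalent to "select * from topics where id = '1'".
var found Topic
if err := db.First(&found, "id = ?", "1").Error; err != nil {
	log.Fatal(err)
}
log.Println("Found topic:", found.Name)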
Behind the scenes, ORM libraries are generating and executing SQL queries on the underlying databases they are accessing. There are downsides to using an ORM:

ORMs still incur a learning cost for developers during adoption. Interface design choices can play a role in impacting developer productivity.
ORMs can be thought of as a schema compiler. The underlying SQL generated by them may not be straightforward or efficient. This results in ORM access to a database being slower than raw SQL, especially for complex queries. However, for complex queries or complex data access patterns, other scalability techniques may need to be applied (e.g., sharding, denormalization, etc.).
The queries generated by ORMs may not be clear or straightforward, resulting in increased debugging times on slow or complex queries.

Despite these downsides, ORMs can be put to good use when not overly relied upon. We shall use a popular ORM library, GORM. GORM comes with a great set of examples and documentation, and the quick start is a great starting point.

Create DB Models

GORM models are our DB models. GORM models are simple Golang structs with struct tags on each member to identify the member's database type. Our User, Topic, and Message models are simply this:

Topic, Message, User Models

package datastore

import (
	"time"

	"github.com/lib/pq"
)

type BaseModel struct {
	CreatedAt time.Time
	UpdatedAt time.Time
	Id        string `gorm:"primaryKey"`
	Version   int // used for optimistic locking
}

type User struct {
	BaseModel
	Name        string
	Avatar      string
	ProfileData map[string]interface{} `gorm:"type:json"`
}

type Topic struct {
	BaseModel
	CreatorId string
	Name      string         `gorm:"index:SortedByName"`
	Users     pq.StringArray `gorm:"type:text[]"`
}

type Message struct {
	BaseModel
	ParentId    string
	TopicId     string    `gorm:"index:SortedByTopicAndCreation,priority:1"`
	CreatedAt   time.Time `gorm:"index:SortedByTopicAndCreation,priority:2"`
	SourceId    string
	UserId      string
	ContentType string
	ContentText string
	ContentData map[string]interface{} `gorm:"type:json"`
}

Why are these models needed when we have already defined models in our .proto files? Recall that the models we use need to reflect the domain they are operating in. For example, our gRPC structs (in .proto files) reflect the models and programming models from the application's perspective. If/when we build a UI, view-models would reflect the UI/view perspectives (e.g., a FrontPage view model could be a merge of multiple data models). Similarly, when storing data in a database, the models need to convey intent and type information that can be understood and processed by the database. This is why GORM expects data models to have annotations on their (struct) member variables to convey database-specific information like column types, index definitions, index column orderings, etc.

A good example of this in our data model is the SortedByTopicAndCreation index (which, as the name suggests, helps us list the messages in a topic sorted by their creation timestamp). Database indexes are one or more (re)organizations of data in a database that speed up retrievals of certain queries (at the cost of increased write times and storage space). We won't go into indexes deeply. There are fantastic resources that offer a deep dive into the various internals of database systems in great detail (and would be highly recommended). The increased writes and storage space must be considered when creating more indexes in a database.
We have (in our service) been mindful about creating more indexes and have kept these to the bare minimum (to suit certain types of queries). As we scale our services (in future posts), we will revisit how to address these costs by exploring asynchronous and distributed index-building techniques.

Data Access Layer Conventions

We now have DB models. We could at this point directly call the GORM APIs from our service implementation to read and write data from our (Postgres) database; but first, a brief note on the conventions we have chosen.

Motivations

Database use can be thought of as lying between two extremes:

On one extreme, a "database" can be treated as a better filesystem with objects written by some key to prevent data loss. Any structure, consistency guarantees, optimization, or indexes are fully the responsibility of the application layer. This gets very complicated, error-prone, and hard to manage very fast.
On the other extreme, the database engine is used as the undisputed brain (the kitchen sink) of your application. Every data access for every view in your application is offered (only) by one or very few (possibly complex) queries. This view, while localizing data access in a single place, also makes the database a bottleneck and its scalability daunting. In reality, vertical scaling (provisioning beefier machines) is the easiest but most expensive solution - which most vendors will happily recommend in such cases. Horizontal scaling (getting more machines) is hard, as increased data coupling and probabilities of node failures (network partitions) mean more complicated and careful tradeoffs between consistency and availability.

Our sweet spot is somewhere in between. While ORMs (like GORM) provide an almost 1:1 interface compatibility between SQL and the application needs, being judicious with SQL remains advantageous and should be based on the (data and operational) needs of the application. For our chat application, some desirable (data) traits are:

Messages from users must not be lost (durability).
Ordering of messages is important (within a topic).
Few standard query types: CRUD on Users, Topics, and Messages; message ordering by timestamp, limited to either within a topic or by a user (for the last N messages).

Given that our data "shapes" are simple and that reads dominate writes (i.e., 1 message posted is read by many participants on a topic), we are choosing to optimize for write consistency, simplicity, and read availability (within a reasonable latency). Now we are ready to look at the query patterns/conventions.

Unified Database Object

First, we will add a simple data access layer that will encapsulate all the calls to the database for each particular model (topics, messages, users). Let us create an overarching "DB" object that represents our Postgres DB (in db/db.go):

type OneHubDB struct {
	storage *gorm.DB
}

This holds the GORM database handle (possibly with a connection) to the underlying DB. The Topic Store, User Store, and Message Store modules all operate on this single DB instance (via GORM) to read/write data from their respective tables (topics, users, messages). Note that this is just one possible convention. We could have instead used three different DB (gorm.DB) instances, one for each entity type: e.g., TopicDB, UserDB, and MessageDB.
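For completeness, here is a sketch of how such a OneHubDB could be constructed and how the tables for the models above can be created using GORM's auto-migration. The constructor name and the use of AutoMigrate are illustrative choices rather than the exact OneHub code; the DSN is a standard Postgres connection string like the one in our .env file.

package datastore

import (
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

// NewOneHubDB opens a GORM connection to Postgres and ensures that the
// tables (and indexes) declared via the struct tags above exist.
func NewOneHubDB(dsn string) (*OneHubDB, error) {
	db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
	if err != nil {
		return nil, err
	}
	if err := db.AutoMigrate(&User{}, &Topic{}, &Message{}); err != nil {
		return nil, err
	}
	return &OneHubDB{storage: db}, nil
}

Called once at service startup (for example, with the connection string from the .env file), this gives all the stores a single shared handle.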
Use Custom IDs Instead of Auto-Generated Ones

We are choosing to generate our own primary keys (IDs) for topics, users, and messages instead of depending on the auto-increment (or auto-ID) generation of the database engine, for the following reasons: An auto-generated key is localized to the database instance that generates it. This means that if/when we add more partitions to our databases (for horizontal scaling), these keys will need to be synchronized, and migrating existing keys to avoid duplicates at a global level is much harder. Auto-increment keys offer little randomness, making it easy for attackers to "iterate" through all entities. Sometimes we may simply want string keys that are custom assignable if they are available (for SEO purposes). Auto-generated keys also carry no attribution (a central/global key server, by contrast, can annotate keys for analytics purposes). For these purposes, we have added a GenId table that keeps track of all used IDs so we can perform collision detection, etc.:

type GenId struct {
	Class     string `gorm:"primaryKey"`
	Id        string `gorm:"primaryKey"`
	CreatedAt time.Time
}

Naturally, this is not a scalable solution when the data volume is large, but it suffices for our demo, and when needed, we can move this table to a different DB and still preserve the keys/IDs. Note that GenId itself is also managed by GORM and uses a combination of Class + Id as its primary key. An example of this is Class=Topic and Id=123. Random IDs are assigned by the application in a simple manner:

func randid(maxlen int) string {
	max_id := int64(math.Pow(36, float64(maxlen)))
	randval := rand.Int63() % max_id
	return strconv.FormatInt(randval, 36)
}

func (tdb *OneHubDB) NextId(cls string) string {
	for {
		// 4-character IDs are enough for this demo (they match the IDs seen
		// in the example output later in this post)
		gid := GenId{Id: randid(4), Class: cls, CreatedAt: time.Now()}
		err := tdb.storage.Create(&gid).Error
		if err == nil {
			return gid.Id
		}
		log.Println("ID Create Error: ", err)
	}
}

The randid method generates a maxlen-character string of random characters: it takes a random 63-bit integer modulo max_id (where max_id = 36^maxlen) and formats it in base 36. The NextId method is used by the different entity create methods (below) to repeatedly generate random IDs until there is no collision. In case you are worried about excessive collisions or are interested in understanding their probabilities, you can learn about them here.

Judicious Use of Indexes

Indexes are very beneficial for speeding up certain data retrieval operations at the expense of increased writes and storage. We have limited our use of indexes to a small handful of cases where strong consistency was needed (and could be scaled easily): topics sorted by name (for an alphabetical listing of topics), and messages sorted by topic and creation timestamp (for the natural ordering of a message list). What is the impact of this on our application? Let us find out.

Topic Creations and Indexes

When a topic is created (or updated), an index write is required. Topic creations/updates are relatively low-frequency operations (compared to message postings), so a slightly increased write latency is acceptable. In a more realistic chat application, a topic creation is a bit more heavyweight due to the need to check permissions, apply compliance rules, etc., so this latency hit is acceptable. Furthermore, this index is only needed when "searching" for topics, and even an asynchronous index update would have sufficed.

Message Related Indexes

To consider the usefulness of indexes related to messages, let us look at some usage numbers.
This is a very simple application, so these scalability issues most likely won't be a concern (feel free to skip this section). If your goals are a bit more lofty, looking at Slack's usage numbers we can estimate/project some usage numbers for our own demo to make it interesting: Number of daily active topics: 100. Number of active users per topic: 10. Messages sent by an active user in a topic: every 5 minutes (allowing time to type, read other messages, research, think, etc.). Thus, the number of messages created each day is: = 100 * 10 * (1440 minutes in a day / 5 minutes) ~ 288k messages per day ~ 3 messages per second. In the context of these numbers, writing roughly 3 messages per second, even with an extra index (or three), is a load we can handle comfortably in a typical database that can handle 10k IOPS, which is rather modest. It is easy to wonder whether this scales as the number of topics, the number of active users per topic, or the creation frenzy increases. Let us consider a more intense setup (in a larger or busier organization). Instead of the numbers above, if we had 10k topics and 100 active users per topic with a message every minute (instead of every 5 minutes), our write QPS would be: WriteQPS: = 10000 * 100 * (1440 / 1) = 1.44B messages per day ~ 17k messages per second. That is quite a considerable blow-up. We can solve this in a couple of ways: Accept a higher latency on writes - for example, instead of requiring a write to happen in a few milliseconds, accept an SLO of, say, 500ms. Update indexes asynchronously - this doesn't get us that much further, as the number of writes in the system has not changed - only the when has changed. Shard our data. Let us look at sharding! Our write QPS is an aggregate. On a per-topic level, it is quite low (17k / 10000 ~ 1.7 qps). However, user behavior for our application is such that activity on a topic is fairly isolated. We only need our messages to be consistent and ordered within a topic - not globally. We now have the opportunity to dynamically scale our databases (or the Messages tables) to be partitioned by topic IDs. In fact, we could build a layer (a control plane) that dynamically spins up database shards and moves topics around, reacting to load as and when needed. We will not go to that extreme here, but this series is tending towards just that, especially in the context of SaaS applications. The annoyed reader might be wondering whether this deep dive was needed right now! Perhaps not - but by understanding our data and user experience needs, we can make careful tradeoffs. Going forward, such mini-dives will help us quickly evaluate tradeoffs (e.g., when building/adding new features).

Store Specific Implementations

Now that we have our basic DB and common methods, we can move on to each entity's method implementations. For each of our entities, we will create the basic CRUD methods: Create, Update, Get, Delete, List/Search. The Create and Update methods are combined into a single "Save" method that does the following: if an ID is not provided, treat it as a create (assigning a new ID with the NextId method if necessary); if an ID is provided, treat it as an update-or-insert (upsert) operation. Since we have a base model, Create and Update set the CreatedAt and UpdatedAt fields respectively. The Delete method is straightforward. The only notable point is that instead of leveraging GORM's cascading-delete capabilities, we delete the related entities in a separate call.
We will not worry about consistency issues resulting from this (e.g., errors in subsequent delete calls). For the Get method, we will fetch using a standard GORM get-query pattern based on the common id column we use for all models. If an entity does not exist, we return nil.

Users DB

Our user entity methods are pretty straightforward using the above conventions. The Delete method additionally deletes all Messages sent by the user before deleting the user itself. This ordering ensures that if the deletion of messages fails, the user deletion won't proceed, giving the caller a chance to retry.

package datastore

import (
	"errors"
	"log"
	"strings"
	"time"

	"gorm.io/gorm"
)

func (tdb *OneHubDB) SaveUser(user *User) (err error) {
	db := tdb.storage
	user.UpdatedAt = time.Now()
	if strings.Trim(user.Id, " ") == "" {
		// an ID must be assigned (by the caller) before saving
		return InvalidIDError
	}
	result := db.Save(user)
	err = result.Error
	if err == nil && result.RowsAffected == 0 {
		user.CreatedAt = time.Now()
		err = tdb.storage.Create(user).Error
	}
	return
}

func (tdb *OneHubDB) DeleteUser(userId string) (err error) {
	// delete the user's messages first so a failure here blocks the user deletion
	err = tdb.storage.Where("user_id = ?", userId).Delete(&Message{}).Error
	if err == nil {
		err = tdb.storage.Where("id = ?", userId).Delete(&User{}).Error
	}
	return
}

func (tdb *OneHubDB) GetUser(id string) (*User, error) {
	var out User
	err := tdb.storage.First(&out, "id = ?", id).Error
	if err != nil {
		log.Println("GetUser Error: ", id, err)
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return nil, nil
		} else {
			return nil, err
		}
	}
	return &out, err
}

func (tdb *OneHubDB) ListUsers(pageKey string, pageSize int) (out []*User, err error) {
	query := tdb.storage.Model(&User{}).Order("name asc")
	if pageKey != "" {
		// pagination by pageKey is not implemented yet; the offset stays 0 for now
		count := 0
		query = query.Offset(count)
	}
	if pageSize <= 0 || pageSize > tdb.MaxPageSize {
		pageSize = tdb.MaxPageSize
	}
	query = query.Limit(pageSize)
	err = query.Find(&out).Error
	return out, err
}

Topics DB

Our topic entity methods are also pretty straightforward using the above conventions. The Delete method additionally deletes all messages in the topic before deleting the topic itself. This ordering ensures that if the deletion of messages fails, the topic deletion won't proceed, giving the caller a chance to retry.
Topic entity methods:

package datastore

import (
	"errors"
	"log"
	"strings"
	"time"

	"gorm.io/gorm"
)

/////////////////////// Topic DB

func (tdb *OneHubDB) SaveTopic(topic *Topic) (err error) {
	db := tdb.storage
	topic.UpdatedAt = time.Now()
	if strings.Trim(topic.Id, " ") == "" {
		// an ID must be assigned (by the caller) before saving
		return InvalidIDError
	}
	result := db.Save(topic)
	err = result.Error
	if err == nil && result.RowsAffected == 0 {
		topic.CreatedAt = time.Now()
		err = tdb.storage.Create(topic).Error
	}
	return
}

func (tdb *OneHubDB) DeleteTopic(topicId string) (err error) {
	// delete the topic's messages first so a failure here blocks the topic deletion
	err = tdb.storage.Where("topic_id = ?", topicId).Delete(&Message{}).Error
	if err == nil {
		err = tdb.storage.Where("id = ?", topicId).Delete(&Topic{}).Error
	}
	return
}

func (tdb *OneHubDB) GetTopic(id string) (*Topic, error) {
	var out Topic
	err := tdb.storage.First(&out, "id = ?", id).Error
	if err != nil {
		log.Println("GetTopic Error: ", id, err)
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return nil, nil
		} else {
			return nil, err
		}
	}
	return &out, err
}

func (tdb *OneHubDB) ListTopics(pageKey string, pageSize int) (out []*Topic, err error) {
	query := tdb.storage.Model(&Topic{}).Order("name asc")
	if pageKey != "" {
		// pagination by pageKey is not implemented yet; the offset stays 0 for now
		count := 0
		query = query.Offset(count)
	}
	if pageSize <= 0 || pageSize > tdb.MaxPageSize {
		pageSize = tdb.MaxPageSize
	}
	query = query.Limit(pageSize)
	err = query.Find(&out).Error
	return out, err
}

Messages DB

Message entity methods:

package datastore

import (
	"errors"
	"strings"
	"time"

	"gorm.io/gorm"
)

func (tdb *OneHubDB) GetMessages(topic_id string, user_id string, pageKey string, pageSize int) (out []*Message, err error) {
	user_id = strings.Trim(user_id, " ")
	topic_id = strings.Trim(topic_id, " ")
	if user_id == "" && topic_id == "" {
		return nil, errors.New("either topic_id or user_id or both must be provided")
	}
	query := tdb.storage
	if topic_id != "" {
		query = query.Where("topic_id = ?", topic_id)
	}
	if user_id != "" {
		query = query.Where("user_id = ?", user_id)
	}
	if pageKey != "" {
		// pagination by pageKey is not implemented yet; the offset stays 0 for now
		offset := 0
		query = query.Offset(offset)
	}
	// note: this method uses a hard-coded cap instead of tdb.MaxPageSize
	if pageSize <= 0 || pageSize > 10000 {
		pageSize = 10000
	}
	query = query.Limit(pageSize)
	err = query.Find(&out).Error
	return out, err
}

// Get messages in a topic paginated and ordered by creation time stamp
func (tdb *OneHubDB) ListMessagesInTopic(topic_id string, pageKey string, pageSize int) (out []*Message, err error) {
	// pageKey and pageSize are not applied yet
	err = tdb.storage.Where("topic_id = ?", topic_id).Order("created_at asc").Find(&out).Error
	return
}

func (tdb *OneHubDB) GetMessage(msgid string) (*Message, error) {
	var out Message
	err := tdb.storage.First(&out, "id = ?", msgid).Error
	if err != nil {
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return nil, nil
		} else {
			return nil, err
		}
	}
	return &out, err
}

func (tdb *OneHubDB) ListMessages(topic_id string, pageKey string, pageSize int) (out []*Message, err error) {
	query := tdb.storage.Where("topic_id = ?", topic_id).Order("created_at asc")
	if pageKey != "" {
		// pagination by pageKey is not implemented yet; the offset stays 0 for now
		count := 0
		query = query.Offset(count)
	}
	if pageSize <= 0 || pageSize > tdb.MaxPageSize {
		pageSize = tdb.MaxPageSize
	}
	query = query.Limit(pageSize)
	err = query.Find(&out).Error
	return out, err
}

func (tdb *OneHubDB) CreateMessage(msg *Message) (err error) {
	msg.CreatedAt = time.Now()
	msg.UpdatedAt = time.Now()
	result := tdb.storage.Model(&Message{}).Create(msg)
	err = result.Error
	return
}

func (tdb *OneHubDB) DeleteMessage(msgId string) (err error) {
	err = tdb.storage.Where("id = ?", msgId).Delete(&Message{}).Error
	return
}

func (tdb *OneHubDB) SaveMessage(msg *Message) (err error) {
	db := tdb.storage
	q := db.Model(msg).Where("id = ? and version = ?", msg.Id, msg.Version)
	msg.UpdatedAt = time.Now()
	result := q.UpdateColumns(map[string]interface{}{
		"updated_at":   msg.UpdatedAt,
		"content_type": msg.ContentType,
		"content_text": msg.ContentText,
		"content_data": msg.ContentData,
		"user_id":      msg.UserId, // was msg.SourceId, which looks like a copy/paste slip
		"source_id":    msg.SourceId,
		"parent_id":    msg.ParentId,
		"version":      msg.Version + 1, // optimistic locking: bump the version on every successful write
	})
	err = result.Error
	if err == nil && result.RowsAffected == 0 {
		// Must have failed due to versioning
		err = MessageUpdateFailed
	}
	return
}

The Messages entity methods are slightly more involved. Unlike the other two, the Messages entity methods also include searching by topic and searching by user (for convenience). This is done in the GetMessages method, which provides paginated (and ordered) retrieval of messages for a topic or by a user.

Write Converters To/From Service/DB Models

We are almost there. Our database is ready to read/write data. It just needs to be invoked by the service. Going back to our original plan:

|---------------|     |-----------|     |--------|     |------|
| Request Proto | <-> |  Service  | <-> |  GORM  | <-> |  DB  |
|---------------|     |-----------|     |--------|     |------|

We have our service models (generated by the protobuf tools) and we have our DB models that GORM understands. We will now add converters to convert between the two. Converters for entity X follow these conventions: a method XToProto of type func(input *datastore.X) (out *protos.X), and a method XFromProto of type func(input *protos.X) (out *datastore.X). With that, one of our converters (for Topics) is quite simple (and boring):

package services

import (
	"log"

	"github.com/lib/pq"
	ds "github.com/panyam/onehub/datastore"
	protos "github.com/panyam/onehub/gen/go/onehub/v1"
	"google.golang.org/protobuf/types/known/structpb"
	tspb "google.golang.org/protobuf/types/known/timestamppb"
)

func TopicToProto(input *ds.Topic) (out *protos.Topic) {
	var userIds map[string]bool = make(map[string]bool)
	for _, userId := range input.Users {
		userIds[userId] = true
	}
	out = &protos.Topic{
		CreatedAt: tspb.New(input.BaseModel.CreatedAt),
		UpdatedAt: tspb.New(input.BaseModel.UpdatedAt),
		Name:      input.Name,
		Id:        input.BaseModel.Id,
		CreatorId: input.CreatorId,
		Users:     userIds,
	}
	return
}

func TopicFromProto(input *protos.Topic) (out *ds.Topic) {
	out = &ds.Topic{
		BaseModel: ds.BaseModel{
			CreatedAt: input.CreatedAt.AsTime(),
			UpdatedAt: input.UpdatedAt.AsTime(),
			Id:        input.Id,
		},
		Name:      input.Name,
		CreatorId: input.CreatorId,
	}
	if input.Users != nil {
		var userIds []string
		for userId := range input.Users {
			userIds = append(userIds, userId)
		}
		out.Users = pq.StringArray(userIds)
	}
	return
}

The full set of converters can be found here - Service/DB Models Converters.

Hook Up the Converters in the Service Definitions

Our last step is to invoke the converters above in the service implementation. The methods are pretty straightforward. For example, for the TopicService we have:

CreateTopic: During creation we allow custom IDs to be passed in. If an entity with that ID already exists, the request is rejected. If an ID is not passed in, a random one is assigned. The Creator and Name parameters are required fields. The topic is converted to a "DBTopic" model and saved by calling the SaveTopic method.

UpdateTopic: All our Update<Entity> methods follow a similar pattern: Fetch the existing entity (by ID) from the DB. Update the entity fields based on fields marked in the update_mask (so patches are allowed). Update with any extra entity-specific operations (e.g., AddUsers, RemoveUsers, etc.)
- these are just for convenience so the caller would not have to provide an entire "final" users list each time. Convert the updated proto to a "DB Model." Call SaveTopic on the DB. SaveTopic uses the "version" field in our DB to perform an optimistically concurrent write. This ensures that by the time the model is loaded and it is being written, a write by another request/thread will not be overwritten. The Delete, List and Get methods are fairly straightforward. The UserService and MessageService also are implemented in a very similar way with minor differences to suit specific requirements. Testing It All Out We have a database up and running (go ahead and start it with docker compose up). We have converters to/from service and database models. We have implemented our service code to access the database. We just need to connect to this (running) database and pass a connection object to our services in our runner binary (cmd/server.go): Add an extra flag to accept a path to the DB. This can be used to change the DB path if needed. var ( addr = flag.String("addr", ":9000", "Address to start the onehub grpc server on.") gw_addr = flag.String("gw_addr", ":8080", "Address to start the grpc gateway server on.") db_endpoint = flag.String("db_endpoint", "", fmt.Sprintf("Endpoint of DB where all topics/messages state are persisted. Default value: ONEHUB_DB_ENDPOINT environment variable or %s", DEFAULT_DB_ENDPOINT)) ) Create *gorm.DB instance from the db_endpoint value. We have already created a little utility method for opening a GORM-compatible SQL DB given an address: cmd/utils/db.go: package utils import ( // "github.com/panyam/goutils/utils" "log" "strings" "github.com/panyam/goutils/utils" "gorm.io/driver/postgres" "gorm.io/driver/sqlite" "gorm.io/gorm" ) func OpenDB(db_endpoint string) (db *gorm.DB, err error) { log.Println("Connecting to DB: ", db_endpoint) if strings.HasPrefix(db_endpoint, "sqlite://") { dbpath := utils.ExpandUserPath((db_endpoint)[len("sqlite://"):]) db, err = gorm.Open(sqlite.Open(dbpath), &gorm.Config{}) } else if strings.HasPrefix(db_endpoint, "postgres://") { db, err = gorm.Open(postgres.Open(db_endpoint), &gorm.Config{}) } if err != nil { log.Println("Cannot connect DB: ", db_endpoint, err) } else { log.Println("Successfully connected DB: ", db_endpoint) } return } Now let us create the method OpenOHDB, which is a simple wrapper that also checks for a db_endpoint value from an environment variable (if it is not provided) and subsequently opens a gorm.DB instance needed for a OneHubDB instance: func OpenOHDB() *ds.OneHubDB { if *db_endpoint == "" { *db_endpoint = cmdutils.GetEnvOrDefault("ONEHUB_DB_ENDPOINT", DEFAULT_DB_ENDPOINT) } db, err := cmdutils.OpenDB(*db_endpoint) if err != nil { log.Fatal(err) panic(err) } return ds.NewOneHubDB(db) } With the above two, we need a simple change to our main method: func main() { flag.Parse() ohdb := OpenOHDB() go startGRPCServer(*addr, ohdb) startGatewayServer(*gw_addr, *addr) } Now we shall also pass the ohdb instance to the GRPC service creation methods. And we are ready to test our durability! Remember we set up auth in a previous part, so we need to pass login credentials, albeit fake ones (where password = login + "123"). Create a Topic curl localhost:8080/v1/topics -u auser:auser123 | json_pp { "nextPageKey" : "", "topics" : [] } That's right. We do not have any topics yet so let us create some. 
curl -X POST localhost:8080/v1/topics \
 -u auser:auser123 \
 -H 'Content-Type: application/json' \
 -d '{"topic": {"name": "First Topic"}}' | json_pp

Yielding: { "topic" : { "createdAt" : "1970-01-01T00:00:00Z", "creatorId" : "auser", "id" : "q43u", "name" : "First Topic", "updatedAt" : "2023-08-04T08:14:56.413050Z", "users" : {} } }

Let us create a couple more:

curl -X POST localhost:8080/v1/topics \
 -u auser:auser123 \
 -H 'Content-Type: application/json' \
 -d '{"topic": {"name": "First Topic", "id": "1"}}' | json_pp

curl -X POST localhost:8080/v1/topics \
 -u auser:auser123 \
 -H 'Content-Type: application/json' \
 -d '{"topic": {"name": "Second Topic", "id": "2"}}' | json_pp

curl -X POST localhost:8080/v1/topics \
 -u auser:auser123 \
 -H 'Content-Type: application/json' \
 -d '{"topic": {"name": "Third Topic", "id": "3"}}' | json_pp

With a list query returning: { "nextPageKey" : "", "topics" : [ { "createdAt" : "1970-01-01T00:00:00Z", "creatorId" : "auser", "id" : "q43u", "name" : "First Topic", "updatedAt" : "2023-08-04T08:14:56.413050Z", "users" : {} }, { "createdAt" : "1970-01-01T00:00:00Z", "creatorId" : "auser", "id" : "dejc", "name" : "Second Topic", "updatedAt" : "2023-08-05T06:52:33.923076Z", "users" : {} }, { "createdAt" : "1970-01-01T00:00:00Z", "creatorId" : "auser", "id" : "zuoz", "name" : "Third Topic", "updatedAt" : "2023-08-05T06:52:35.100552Z", "users" : {} } ] }

Get Topic by ID

We can do a listing as in the previous section. We can also obtain individual topics:

curl localhost:8080/v1/topics/q43u -u auser:auser123 | json_pp

{ "topic" : { "createdAt" : "1970-01-01T00:00:00Z", "creatorId" : "auser", "id" : "q43u", "name" : "First Topic", "updatedAt" : "2023-08-04T08:14:56.413050Z", "users" : {} } }

Send and List Messages on a Topic

Let us send a few messages on the "First Topic" (id = "q43u"):

curl -X POST localhost:8080/v1/topics/q43u/messages -u 'auser:auser123' -H 'Content-Type: application/json' -d '{"message": {"content_text": "Message 1"}}'
curl -X POST localhost:8080/v1/topics/q43u/messages -u 'auser:auser123' -H 'Content-Type: application/json' -d '{"message": {"content_text": "Message 2"}}'
curl -X POST localhost:8080/v1/topics/q43u/messages -u 'auser:auser123' -H 'Content-Type: application/json' -d '{"message": {"content_text": "Message 3"}}'

Now to list them:

curl localhost:8080/v1/topics/q43u/messages -u 'auser:auser123' | json_pp

{ "messages" : [ { "contentData" : null, "contentText" : "Message 1", "contentType" : "", "createdAt" : "0001-01-01T00:00:00Z", "id" : "hlso", "topicId" : "q43u", "updatedAt" : "2023-08-07T05:00:36.547072Z", "userId" : "auser" }, { "contentData" : null, "contentText" : "Message 2", "contentType" : "", "createdAt" : "0001-01-01T00:00:00Z", "id" : "t3lr", "topicId" : "q43u", "updatedAt" : "2023-08-07T05:00:39.504294Z", "userId" : "auser" }, { "contentData" : null, "contentText" : "Message 3", "contentType" : "", "createdAt" : "0001-01-01T00:00:00Z", "id" : "8ohi", "topicId" : "q43u", "updatedAt" : "2023-08-07T05:00:42.598521Z", "userId" : "auser" } ], "nextPageKey" : "" }

Conclusion

Who would have thought setting up and using a database would be such a meaty topic? We covered a lot of ground here that will both give us a good "functioning" service and a foundation for implementing new ideas in the future: We chose a relational database - Postgres - for its strong modeling capabilities, consistency guarantees, performance, and versatility.
We also chose an ORM library (GORM) to improve our velocity and portability if we need to switch to another relational data store. We wrote data models that GORM could use to read/write from the database. We eased the setup by hosting both Postgres and its admin UI (pgAdmin) in a Docker Compose file. We decided to use GORM carefully and judiciously to balance velocity with minimal reliance on complex queries. We discussed some conventions that will help us along in our application design and extensions. We also addressed a way to assess, analyze, and address scalability challenges as they might arise and use that to guide our tradeoff decisions (e.g., type and number of indexes, etc). We wrote converter methods to convert between service and data models. We finally used the converters in our service to offer a "real" persistent implementation of a chat service where messages can be posted and read. Now that we have a "minimum usable app," there are a lot of useful features to add to our service and make it more and more realistic (and hopefully production-ready). Take a breather and see you soon in continuing the exciting adventure! In the next post, we will look at also including our main binary (with gRPC service and REST Gateways) in the Docker Compose environment without sacrificing hot reloading and debugging.
In the rapidly evolving landscape of cloud computing, deploying Docker images across multiple Amazon Web Services (AWS) accounts presents a unique set of challenges and opportunities for organizations aiming for scalability and security. According to the State of DevOps Report 2022, 50% of DevOps adopters are recognized as elite or high-performing organizations. This guide offers a comprehensive blueprint for leveraging AWS services—such as ECS, CodePipeline, and CodeDeploy — combined with the robust Blue/Green deployment strategy, to facilitate seamless Docker deployments. It also emphasizes employing best security practices within a framework designed to streamline and secure deployments across AWS accounts. By integrating CloudFormation with a cross-account deployment strategy, organizations can achieve an unparalleled level of control and efficiency, ensuring that their infrastructure remains both robust and flexible. Proposed Architecture The architecture diagram showcases a robust AWS deployment model that bridges the gap between development and production environments through a series of orchestrated services. It outlines how application code transitions from the development stage, facilitated by AWS CodeCommit, through a testing phase, and ultimately to production. This system uses AWS CodePipeline for continuous integration and delivery, leverages Amazon ECR for container image storage, and employs ECS with Fargate for container orchestration. It provides a clear, high-level view of the path an application takes from code commit to user delivery. Prerequisites To successfully implement the described infrastructure for deploying Docker images on Amazon ECS with a multi-account CodePipeline and Blue/Green deployment strategy, several prerequisites are necessary. Here are the key prerequisites: Create three separate AWS accounts: Development, Test, and Production. Install and configure the AWS Command Line Interface (CLI) and relevant AWS SDKs for scripting and automation. Fork the aws-cicd-cross-account-deployment GitHub repo and add all the files to your CodeCommit. Environment Setup This guide leverages a comprehensive suite of AWS services and tools, meticulously orchestrated to facilitate the seamless deployment of Docker images on Amazon Elastic Container Service (ECS) across multiple AWS accounts. Before we start setting up the environment, use this code repo for the relevant files mentioned in the steps below. 1. IAM Roles and Permissions IAM roles: Create IAM roles required for the deployment process. Use cross-account.yaml template in CloudFormation to create cross-account IAM roles in Test and Production accounts, allowing necessary permissions for cross-account interactions. 
YAML AWSTemplateFormatVersion: "2010-09-09" Parameters: CodeDeployRoleInThisAccount: Type: CommaDelimitedList Description: Names of existing Roles you want to add to the newly created Managed Policy DevelopmentAccCodePipelinKMSKeyARN: Type: String Description: ARN of the KMS key from the Development/Global Resource Account DevelopmentAccCodePipelineS3BucketARN: Type: String Description: ARN of the S3 Bucket used by CodePipeline in the Development/Global Resource Account DevelopmentAccNumber: Type: String Description: Account Number of the Development Resources Account Resources: CrossAccountAccessRole: Type: 'AWS::IAM::Role' Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Principal: AWS: - !Join [ ":", [ "arn","aws","iam:",!Ref DevelopmentAccNumber,"root" ] ] Service: - codedeploy.amazonaws.com - codebuild.amazonaws.com Action: - 'sts:AssumeRole' Policies: - PolicyName: CrossAccountServiceAccess PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - 's3:List*' - 's3:Get*' - 's3:Describe*' Resource: '*' - Effect: Allow Action: - 's3:*' Resource: !Ref DevelopmentAccCodePipelineS3BucketARN - Effect: Allow Action: - 'codedeploy:*' - 'codebuild:*' - 'sns:*' - 'cloudwatch:*' - 'codestar-notifications:*' - 'chatbot:*' - 'ecs:*' - 'ecr:*' - 'codedeploy:Batch*' - 'codedeploy:Get*' - 'codedeploy:List*' Resource: '*' - Effect: Allow Action: - 'codedeploy:Batch*' - 'codedeploy:Get*' - 'codedeploy:List*' - 'kms:*' - 'codedeploy:CreateDeployment' - 'codedeploy:GetDeployment' - 'codedeploy:GetDeploymentConfig' - 'codedeploy:GetApplicationRevision' - 'codedeploy:RegisterApplicationRevision' Resource: '*' - Effect: Allow Action: - 'iam:PassRole' Resource: '*' Condition: StringLike: 'iam:PassedToService': ecs-tasks.amazonaws.com KMSAccessPolicy: Type: 'AWS::IAM::ManagedPolicy' Properties: PolicyDocument: Version: '2012-10-17' Statement: - Sid: AllowThisRoleToAccessKMSKeyFromOtherAccount Effect: Allow Action: - 'kms:DescribeKey' - 'kms:GenerateDataKey*' - 'kms:Encrypt' - 'kms:ReEncrypt*' - 'kms:Decrypt' Resource: !Ref DevelopmentAccCodePipelinKMSKeyARN Roles: !Ref CodeDeployRoleInThisAccount S3BucketAccessPolicy: Type: 'AWS::IAM::ManagedPolicy' Properties: PolicyDocument: Version: '2012-10-17' Statement: - Sid: AllowThisRoleToAccessS3inOtherAccount Effect: Allow Action: - 's3:Get*' Resource: !Ref DevelopmentAccCodePipelineS3BucketARN Effect: Allow Action: - 's3:ListBucket' Resource: !Ref DevelopmentAccCodePipelineS3BucketARN Roles: !Ref CodeDeployRoleInThisAccount 2. CodePipeline Configuration Stages and actions: Configure CodePipeline actions for source, build, and deploy stages by running the pipeline.yaml in CloudFormation. Source repository: Use CodeCommit as the source repository for all the files. Add all the files from the demo-app GitHub folder to the repository. 3. Networking Setup VPC Configuration: Utilize the vpc.yaml CloudFormation template to set up the VPC. Define subnets for different purposes, such as public and private. YAML Description: This template deploys a VPC, with a pair of public and private subnets spread across two Availability Zones. It deploys an internet gateway, with a default route on the public subnets. It deploys a pair of NAT gateways (one in each AZ), and default routes for them in the private subnets. 
Parameters: EnvVar: Description: An environment name that is prefixed to resource names Type: String VpcCIDR: #Description: Please enter the IP range (CIDR notation) for this VPC Type: String PublicSubnet1CIDR: Description: Please enter the IP range (CIDR notation) for the public subnet in the first Availability Zone Type: String PublicSubnet2CIDR: Description: Please enter the IP range (CIDR notation) for the public subnet in the second Availability Zone Type: String PrivateSubnet1CIDR: Description: Please enter the IP range (CIDR notation) for the private subnet in the first Availability Zone Type: String PrivateSubnet2CIDR: Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone Type: String DBSubnet1CIDR: Description: Please enter the IP range (CIDR notation) for the private subnet in the first Availability Zone Type: String DBSubnet2CIDR: Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone Type: String vpcname: #Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone Type: String Resources: VPC: Type: AWS::EC2::VPC Properties: CidrBlock: !Ref VpcCIDR EnableDnsSupport: true EnableDnsHostnames: true Tags: - Key: Name Value: !Ref vpcname InternetGateway: Type: AWS::EC2::InternetGateway Properties: Tags: - Key: Name Value: !Ref EnvVar InternetGatewayAttachment: Type: AWS::EC2::VPCGatewayAttachment Properties: InternetGatewayId: !Ref InternetGateway VpcId: !Ref VPC PublicSubnet1: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 0, !GetAZs '' ] CidrBlock: !Ref PublicSubnet1CIDR MapPublicIpOnLaunch: true Tags: - Key: Name Value: !Sub ${EnvVar} Public Subnet (AZ1) PublicSubnet2: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 1, !GetAZs '' ] CidrBlock: !Ref PublicSubnet2CIDR MapPublicIpOnLaunch: true Tags: - Key: Name Value: !Sub ${EnvVar} Public Subnet (AZ2) PrivateSubnet1: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 0, !GetAZs '' ] CidrBlock: !Ref PrivateSubnet1CIDR MapPublicIpOnLaunch: false Tags: - Key: Name Value: !Sub ${EnvVar} Private Subnet (AZ1) PrivateSubnet2: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 1, !GetAZs '' ] CidrBlock: !Ref PrivateSubnet2CIDR MapPublicIpOnLaunch: false Tags: - Key: Name Value: !Sub ${EnvVar} Private Subnet (AZ2) DBSubnet1: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 0, !GetAZs '' ] CidrBlock: !Ref DBSubnet1CIDR MapPublicIpOnLaunch: false Tags: - Key: Name Value: !Sub ${EnvVar} DB Subnet (AZ1) DBSubnet2: Type: AWS::EC2::Subnet Properties: VpcId: !Ref VPC AvailabilityZone: !Select [ 1, !GetAZs '' ] CidrBlock: !Ref DBSubnet2CIDR MapPublicIpOnLaunch: false Tags: - Key: Name Value: !Sub ${EnvVar} DB Subnet (AZ2) NatGateway1EIP: Type: AWS::EC2::EIP DependsOn: InternetGatewayAttachment Properties: Domain: vpc NatGateway2EIP: Type: AWS::EC2::EIP DependsOn: InternetGatewayAttachment Properties: Domain: vpc NatGateway1: Type: AWS::EC2::NatGateway Properties: AllocationId: !GetAtt NatGateway1EIP.AllocationId SubnetId: !Ref PublicSubnet1 NatGateway2: Type: AWS::EC2::NatGateway Properties: AllocationId: !GetAtt NatGateway2EIP.AllocationId SubnetId: !Ref PublicSubnet2 PublicRouteTable: Type: AWS::EC2::RouteTable Properties: VpcId: !Ref VPC Tags: - Key: Name Value: !Sub ${EnvVar} Public Routes DefaultPublicRoute: Type: AWS::EC2::Route 
DependsOn: InternetGatewayAttachment Properties: RouteTableId: !Ref PublicRouteTable DestinationCidrBlock: 0.0.0.0/0 GatewayId: !Ref InternetGateway PublicSubnet1RouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: RouteTableId: !Ref PublicRouteTable SubnetId: !Ref PublicSubnet1 PublicSubnet2RouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: RouteTableId: !Ref PublicRouteTable SubnetId: !Ref PublicSubnet2 PrivateRouteTable1: Type: AWS::EC2::RouteTable Properties: VpcId: !Ref VPC Tags: - Key: Name Value: !Sub ${EnvVar} Private Routes (AZ1) DefaultPrivateRoute1: Type: AWS::EC2::Route Properties: RouteTableId: !Ref PrivateRouteTable1 DestinationCidrBlock: 0.0.0.0/0 NatGatewayId: !Ref NatGateway1 PrivateSubnet1RouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: RouteTableId: !Ref PrivateRouteTable1 SubnetId: !Ref PrivateSubnet1 PrivateRouteTable2: Type: AWS::EC2::RouteTable Properties: VpcId: !Ref VPC Tags: - Key: Name Value: !Sub ${EnvVar} Private Routes (AZ2) DefaultPrivateRoute2: Type: AWS::EC2::Route Properties: RouteTableId: !Ref PrivateRouteTable2 DestinationCidrBlock: 0.0.0.0/0 NatGatewayId: !Ref NatGateway2 PrivateSubnet2RouteTableAssociation: Type: AWS::EC2::SubnetRouteTableAssociation Properties: RouteTableId: !Ref PrivateRouteTable2 SubnetId: !Ref PrivateSubnet2 NoIngressSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupName: "no-ingress-sg" GroupDescription: "Security group with no ingress rule" VpcId: !Ref VPC Outputs: VPC: Description: A reference to the created VPC Value: !Ref VPC PublicSubnets: Description: A list of the public subnets Value: !Join [ ",", [ !Ref PublicSubnet1, !Ref PublicSubnet2 ]] PrivateSubnets: Description: A list of the private subnets Value: !Join [ ",", [ !Ref PrivateSubnet1, !Ref PrivateSubnet2 ]] PublicSubnet1: Description: A reference to the public subnet in the 1st Availability Zone Value: !Ref PublicSubnet1 PublicSubnet2: Description: A reference to the public subnet in the 2nd Availability Zone Value: !Ref PublicSubnet2 PrivateSubnet1: Description: A reference to the private subnet in the 1st Availability Zone Value: !Ref PrivateSubnet1 PrivateSubnet2: Description: A reference to the private subnet in the 2nd Availability Zone Value: !Ref PrivateSubnet2 NoIngressSecurityGroup: Description: Security group with no ingress rule Value: !Ref NoIngressSecurityGroup 4. ECS Cluster and Service Configuration ECS clusters: Create two ECS clusters: one in the Test account and one in the Production account. Service and task definitions: Create ECS services and task definitions in the Test Account using new-ecs-test-infra.yaml CloudFormation templates. YAML Parameters: privatesubnet1: Type: String privatesubnet2: Type: String Resources: ECSService: Type: AWS::ECS::Service # DependsOn: HTTPListener # DependsOn: HTTPSListener Properties: LaunchType: FARGATE Cluster: new-cluster DesiredCount: 0 TaskDefinition: new-taskdef-anycompany DeploymentController: Type: CODE_DEPLOY HealthCheckGracePeriodSeconds: 300 SchedulingStrategy: REPLICA NetworkConfiguration: AwsvpcConfiguration: AssignPublicIp: DISABLED Subnets: [!Ref privatesubnet1 , !Ref privatesubnet2] LoadBalancers: - TargetGroupArn: arn:aws:elasticloadbalancing:us-east-1:487269258483:targetgroup/TargetGroup1/6b75e9eb3289df56 ContainerPort: 80 ContainerName: anycompany-test Create ECS services and task definitions in the Test account using new-ecs-prod-infra.yaml CloudFormation templates. 
YAML Parameters: privatesubnet1: Type: String privatesubnet2: Type: String Resources: ECSService: Type: AWS::ECS::Service # DependsOn: HTTPListener # DependsOn: HTTPSListener Properties: LaunchType: FARGATE Cluster: new-cluster DesiredCount: 0 TaskDefinition: new-anycompany-prod DeploymentController: Type: CODE_DEPLOY HealthCheckGracePeriodSeconds: 300 SchedulingStrategy: REPLICA NetworkConfiguration: AwsvpcConfiguration: AssignPublicIp: DISABLED Subnets: [!Ref privatesubnet1 , !Ref privatesubnet2] LoadBalancers: - TargetGroupArn: arn:aws:elasticloadbalancing:us-east-1:608377680862:targetgroup/TargetGroup1/d18c87e013000697 ContainerPort: 80 ContainerName: anycompany-test 5. CodeDeploy Blue/Green Deployment CodeDeploy configuration: Configure CodeDeploy for Blue/Green deployments. Deployment groups: Create specific deployment groups for each environment. Deployment configurations: Configure deployment configurations based on your requirements. 6. Notification Setup (SNS) SNS configuration: Manually create an SNS topic for notifications during the deployment process. Notification content: Configure SNS to send notifications for manual approval steps in the deployment pipeline. Pipeline and Deployment 1. Source Stage CodePipeline starts with the source stage, pulling Docker images from the CodeCommit repository. 2. Build Stage The build stage involves building and packaging the Docker images and preparing them for deployment. 3. Deployment to Development Upon approval, the pipeline deploys the Docker images to the ECS cluster in the Development account using a Blue/Green deployment strategy. 4. Testing in Development The deployed application in the Development environment undergoes testing and validation. 5. Deployment to Test If testing in the Development environment is successful, the pipeline triggers the deployment to the ECS cluster in the Test account using the same Blue/Green strategy. 6. Testing in Test The application undergoes further testing in the Test environment. 7. Manual Approval After successful testing in the Test environment, the pipeline triggers an SNS notification and requires manual approval to proceed. 8. Deployment to Production After the approval, the pipeline triggers the deployment to the ECS cluster in the Production account using the Blue/Green strategy. 9. Final Testing in Production The application undergoes final testing in the Production environment. 10. Completion The pipeline completes, and the new version of the application is running in the Production environment. Conclusion In this guide, we’ve explored the strategic approach to deploying Docker images across multiple AWS accounts using a combination of ECS, CodePipeline, CodeDeploy, and the reliability of Blue/Green deployment strategies, all through the power of AWS CloudFormation. This methodology not only enhances security and operational efficiency but also provides a scalable infrastructure capable of supporting growth. By following the steps outlined, organizations can fortify their deployment processes, embrace the agility of Infrastructure as Code, and maintain a robust and adaptable cloud environment. Implementing this guide's recommendations allows businesses to optimize costs by utilizing AWS services such as Fargate and embracing DevOps practices. The Blue/Green deployment strategy minimizes downtime, ensuring resources are utilized efficiently during transitions. With a focus on DevOps practices and the use of automation tools like AWS Code Pipeline, operational overhead is minimized. 
CloudFormation templates automate resource provisioning, reducing manual intervention and ensuring consistent and repeatable deployments.
For containerized applications, it is hard to track down problems resulting from memory overuse. If usage goes beyond the container memory limit, an application can fail silently without leaving any trace. In this article, I'll go through some of the techniques that can be used to identify the source of memory consumption in a Java container application.

Memory Type

In a typical Java application, memory can be broadly divided into heap and non-heap. Heap memory can be set by providing the relevant JVM parameters when starting the application. Non-heap memory consists of native memory used by the JVM itself or by any library the application uses via JNI (Java Native Interface).

Method

For heap memory, a heap dump can be taken and analyzed using heap dump analysis tools. One of the best tools for heap dump analysis is Eclipse MAT. Java also provides a mechanism to track native memory allocation (Native Memory Tracking), but it may not reveal all memory allocated by native libraries. Native memory is normally allocated through the default memory allocator, malloc. Jemalloc is a general-purpose malloc implementation that can be swapped in and configured to track native allocations and generate heap profile dumps. These heap profiles can then be analyzed with the jeprof utility, which generates a heap allocation report highlighting the memory used by functions in the application.

Analysis

Below is a memory analysis of a sample containerized Java application. The application loads a sample TensorFlow model to exercise native memory and runs in a Docker container. Docker reports a memory consumption of 254MB. Let's try to pinpoint the source of that consumption.

Total Memory

To get a sense of the total memory being used by the application process, we can check its Resident Set Size (RSS), the total committed memory that resides in main memory (RAM). There are multiple utilities that can help check this, such as top, ps, or pmap. Checking RSS does not help pinpoint the root source of usage. For the sample application, using the command below, the total RSS is 376MB.

Shell
ps --no-header -o rss $(pidof java)

Heap Analysis

The Eclipse MAT tool reports a total retained heap of 2.2MB, which is way below the total memory consumption shown by Docker and indicates that the majority of consumption comes from the non-heap area.

Native Memory Analysis

Upon reviewing the native memory summary using the command below, the total tracked usage appears to be approximately 99MB. However, this value is still less than the total memory consumption and does not accurately identify the root cause of the issue.

Shell
jcmd $(pidof java) VM.native_memory \
 | grep -P "Total.*committed=" \
 | grep -o -P "(?<=committed=)[0-9]+(?=KB)"

Off-Heap Memory Analysis

An analysis using jemalloc and jeprof reveals that native memory usage is primarily attributed to the TensorFlow library, with a total consumption of approximately 112MB. This provides a clear indication of the source of native memory usage and can be investigated further to minimize any excessive consumption.

Conclusion

Java memory analysis is critical, especially for container-based applications. Knowing the source of memory consumption in an application can help us understand its memory requirements and lower application costs by removing unnecessary consumption.
When checking memory consumption, all types of memory and their sources need to be pinpointed. Heap dump analysis pinpoints the sources of heap memory consumption, while jemalloc and jeprof are useful for pinpointing the sources of native memory consumption. Sample Application Code Link: https://github.com/parveensaini/JavaContainerMemoryAnalysis
As we delve into the dynamic world of Kubernetes, understanding its core components and functionalities becomes pivotal for anyone looking to make a mark in the cloud computing and containerization arena. Among these components, static pods hold a unique place, often overshadowed by more commonly discussed resources like deployments and services. In this comprehensive guide, we will unveil the power of static pods, elucidating their utility, operational principles, and how they can be an asset in your Kubernetes arsenal.

Understanding Static Pods

Static pods are Kubernetes pods that are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike other pods that are controlled by the Kubernetes API server, static pods are defined by placing their configuration files directly on a node's filesystem, which the kubelet periodically scans and ensures that the pods defined in these configurations are running.

Why Use Static Pods?

Static pods serve several critical functions in a Kubernetes environment:

Cluster Bootstrapping They are essential for bootstrapping a Kubernetes cluster before the API server is up and running. Since they do not depend on the API server, they can be used to deploy the control plane components as static pods.

Node-Level System Pods Static pods are ideal for running node-level system components, ensuring that these essential services remain running, even if the Kubernetes API server is unreachable.

Simplicity and Reliability For simpler deployments or edge environments where high availability is not a primary concern, static pods offer a straightforward and reliable deployment option.

Creating Your First Static Pod

Let’s walk through the process of creating a static pod. You'll need access to a Kubernetes node to follow along.

1. Access Your Kubernetes Node First, SSH into your Kubernetes node:

ssh your_username@your_kubernetes_node

2. Create a Pod Definition File Create a simple pod definition file. Let’s deploy an Nginx static pod as an example. Save the following configuration in /etc/kubernetes/manifests/nginx-static-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-static-pod
  labels:
    role: myrole
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80

3. Configure the kubelet to Use This Directory Ensure the kubelet is configured to monitor the /etc/kubernetes/manifests directory for pod manifests. This is typically set by the --pod-manifest-path kubelet command-line option.

4. Verify the Pod Is Running After a few moments, use the docker ps command (or crictl ps if you're using CRI-O or containerd) to check that the Nginx container is running:

docker ps | grep nginx

Or, if your cluster allows it, you can check from the Kubernetes API server with:

kubectl get pods --all-namespaces | grep nginx-static-pod

Note that while you can see the static pod through the API server, you cannot manage it (delete, scale, etc.) through the API server.

Advantages of Static Pods

Simplicity: Static pods are straightforward to set up and manage on a node-by-node basis. Self-sufficiency: They can operate independently of the Kubernetes API server, making them resilient in scenarios where the API server is unavailable. Control plane bootstrapping: Static pods are instrumental in the initial setup of a Kubernetes cluster, particularly for deploying control plane components.
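As noted in step 4 above, static pods do show up in the API server, but only as read-only "mirror" pods created by the kubelet. If you want to spot them programmatically, here is a minimal Go sketch using client-go; treat the kubeconfig handling and the kubernetes.io/config.mirror annotation key (which the kubelet sets on mirror pods in current releases) as assumptions to verify against your cluster version:

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (e.g., ~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// List pods in all namespaces and flag the ones mirrored from static manifests.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		if _, isMirror := p.Annotations["kubernetes.io/config.mirror"]; isMirror {
			fmt.Printf("static (mirror) pod: %s/%s on node %s\n", p.Namespace, p.Name, p.Spec.NodeName)
		}
	}
}

Deleting such a mirror pod through the API has no lasting effect; the kubelet simply recreates it from the manifest on disk, which is exactly the behavior the considerations below build on.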
Considerations and Best Practices While static pods offer simplicity and independence from the Kubernetes API server, they also come with considerations that should not be overlooked: Cluster management: Static pods are not managed by the API server, which means they do not benefit from some of the orchestration features like scaling, lifecycle management, and health checks. Deployment strategy: They are best used for node-specific tasks or cluster bootstrapping, rather than general application deployment. Monitoring and logging: Ensure that your node-level monitoring and logging tools are configured to include static pods. Conclusion Static pods, despite their simplicity, play a critical role in the Kubernetes ecosystem. They offer a reliable method for running system-level services directly on nodes, independent of the cluster's control plane. By understanding how to deploy and manage static pods, you can ensure your Kubernetes clusters are more robust and resilient. Whether you're bootstrapping a new cluster or managing node-specific services, static pods are a tool worth mastering. This beginner's guide aims to demystify static pods and highlight their importance within Kubernetes architectures. As you advance in your Kubernetes journey, remember that the power of Kubernetes lies in its flexibility and the diversity of options it offers for running containerized applications. Static pods are just one piece of the puzzle, offering a unique blend of simplicity and reliability for specific use cases. I encourage you to explore static pods further, experiment with deploying different applications as static pods, and integrate them into your Kubernetes strategy where appropriate. Happy Kubernetes-ing!
Kubernetes is a highly popular container orchestration platform designed to manage distributed applications at scale. With many advanced capabilities for deploying, scaling, and managing containers, it allows software engineers to build a highly flexible and resilient infrastructure. Additionally, it is important to note that it is open-source software that provides a declarative approach to application deployment and enables seamless scaling and load balancing across multiple nodes. With built-in fault tolerance and self-healing capabilities, Kubernetes ensures high availability and resiliency for your applications. One of the key advantages of Kubernetes is its ability to automate many operational tasks, abstracting the underlying complexities of the infrastructure, allowing developers to focus on application logic, and optimizing the performance of solutions.

What Is ChatGPT?

You've probably heard a lot about ChatGPT: it's a renowned language model that has revolutionized the field of natural language processing (NLP). Built by OpenAI, ChatGPT is powered by advanced artificial intelligence algorithms and trained on massive amounts of text data. ChatGPT's versatility goes beyond virtual assistants and chatbots, as it can be applied to a wide range of natural language processing applications. Its ability to understand and generate human-like text makes it a valuable tool for automating tasks that involve understanding and processing written language. The underlying technology behind ChatGPT is based on deep learning and transformer models. The ChatGPT training process involves exposing the model to large amounts of text data from a variety of sources. This extensive training helps it learn the intricacies of the language, including grammar, semantics, and common patterns. Furthermore, the ability to tune the model with specific data means it can be tailored to perform well in specific domains or specialized tasks.

Integrating ChatGPT (OpenAI) With Kubernetes: Overview

Integrating Kubernetes with ChatGPT makes it possible to automate tasks related to the operation and management of applications deployed in Kubernetes clusters. Consequently, leveraging ChatGPT allows you to seamlessly interact with Kubernetes using text or voice commands, which in turn enables the execution of complex operations with greater efficiency. Essentially, with this integration, you can streamline various tasks such as: Deploying applications Scaling resources Monitoring cluster health

The integration empowers you to take advantage of ChatGPT's contextual language generation capabilities to communicate with Kubernetes in a natural and intuitive manner. Whether you are a developer, system administrator, or DevOps professional, this integration can revolutionize your operations and streamline your workflow. The outcome is more room to focus on higher-level strategic initiatives and improved overall productivity.

Benefits of Integrating ChatGPT (OpenAI) With Kubernetes

Automation: This integration simplifies and automates operational processes, reducing the need for manual intervention. Efficiency: Operations can be performed quickly and with greater accuracy, optimizing time and resources. Scalability: Kubernetes provides automatic scaling capabilities, allowing applications managed through ChatGPT to scale without additional effort. Monitoring: ChatGPT can provide real-time information about the state of Kubernetes clusters and applications, facilitating issue detection and resolution.
How To Integrate ChatGPT (OpenAI) With Kubernetes: A Step-By-Step Guide

At this point, we assume you already have a suitable environment for the integration, including a Kubernetes installation and an OpenAI account for the ChatGPT calls. Let's proceed to show you how to configure the credentials for ChatGPT to access Kubernetes, using the `kubernetes-client` lib in the automation script for interactions with Kubernetes. First, create your token on the OpenAI platform. We will forward status messages to Slack, and in case of problems in Kubernetes, ChatGPT will propose possible solutions to apply. Now let's configure the AgentChatGPT script; remember to set the three placeholders at the top of the script: your OpenAI API key (sent as the Bearer token in the Authorization header), your Slack token (passed to WebClient), and your Slack channel ID.

Python
import time

import requests
from slack_sdk import WebClient
from kubernetes import client, config

OPENAI_API_KEY = "<your token>"
SLACK_TOKEN = "<your token>"
SLACK_CHANNEL_ID = "<your channel id>"

# Function to interact with the GPT model
def interact_chatgpt(message):
    endpoint = "https://api.openai.com/v1/chat/completions"
    prompt = "User: " + message
    response = requests.post(
        endpoint,
        headers={
            "Authorization": "Bearer " + OPENAI_API_KEY,
            "Content-Type": "application/json",
        },
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "system", "content": prompt}],
        },
    )
    response_data = response.json()
    chatgpt_response = response_data["choices"][0]["message"]["content"]
    return chatgpt_response

# Function to send a notification to Slack
def send_notification_slack(message):
    slack_client = WebClient(token=SLACK_TOKEN)
    response = slack_client.chat_postMessage(channel=SLACK_CHANNEL_ID, text=message)
    return response

# Kubernetes configuration
config.load_kube_config()
v1 = client.CoreV1Api()

# Collecting Kubernetes cluster metrics, logs, and events
def get_information_cluster():
    # Node status ("metrics") of the Kubernetes cluster
    metrics = v1.list_node()
    # Logs of a specific pod (replace the placeholders with a real pod and namespace)
    logs = v1.read_namespaced_pod_log("POD_NAME", "NAMESPACE")
    # Events across all namespaces
    events = v1.list_event_for_all_namespaces()
    return metrics, logs, events

# Troubleshooting based on the collected information
def identify_problems(metrics, logs, events):
    problems = []
    # Analyze node conditions and flag nodes that are not Ready
    for metric in metrics.items:
        if metric.status.conditions is None or metric.status.conditions[-1].type != "Ready":
            problems.append(f"The node {metric.metadata.name} is not ready.")
    # Analyze the logs and flag errors
    if "ERROR" in logs:
        problems.append("Errors were found in pod logs.")
    # Analyze events and flag warnings
    for event in events.items:
        if event.type == "Warning":
            problems.append(f"A warning event has been logged: {event.message}")
    return problems

# Kubernetes cluster monitoring
def monitoring_cluster_kubernetes():
    while True:
        metrics, logs, events = get_information_cluster()
        problems = identify_problems(metrics, logs, events)
        # Deal with each identified problem individually
        # (may include corrective actions, additional notifications, etc.)
        for problem in problems:
            print(f"Identified problem: {problem}")
            # Ask ChatGPT for a troubleshooting recommendation
            chatgpt_response = interact_chatgpt(problem)
            # Send a notification to Slack with the issue description and the recommendation
            slack_message = f"Identified problem: {problem}\nRecommendation: {chatgpt_response}"
            send_notification_slack(slack_message)
        # Wait a time interval between checks
        time.sleep(60)  # Wait for 1 minute before performing the next check

# Running the ChatGPT agent and monitoring the Kubernetes cluster
if __name__ == "__main__":
    monitoring_cluster_kubernetes()

Now use the Dockerfile example below to build a container with the ChatGPT agent; remember that it is necessary to mount a volume with your kubeconfig:

Dockerfile
# Define the base image
FROM python:3.9-slim

# Copy the Python script to the working directory of the image
COPY agent-chatgpt.py /app/agent-chatgpt.py

# Define the working directory of the image
WORKDIR /app

# Install required dependencies
RUN pip install requests slack_sdk kubernetes

# Run the Python script when the image starts
CMD ["python", "agent-chatgpt.py"]

Congratulations! If everything is properly configured, at some point during monitoring you will receive Slack messages containing the identified problem and ChatGPT's recommendation.

Best Practices for Using Kubernetes With ChatGPT (OpenAI)

Security Implement appropriate security measures to protect access to Kubernetes by ChatGPT.

Logging and Monitoring Implement robust logging and monitoring practices within your Kubernetes cluster. Use tools like Prometheus, Grafana, or Elasticsearch to collect and analyze logs and metrics from both the Kubernetes cluster and the ChatGPT agent. This will provide valuable insights into the performance, health, and usage patterns of your integrated system.

Error Handling and Alerting Establish a comprehensive error handling and alerting system to promptly identify and respond to any issues or failures in the integration. Essentially, set up alerts and notifications for critical events, such as failures in communication with the Kubernetes API or unexpected errors in the ChatGPT agent. This will help you proactively address problems and ensure smooth operation.

Scalability and Load Balancing Plan for scalability and load balancing within your integrated setup. Consider utilizing Kubernetes features like horizontal pod autoscaling and load balancing to efficiently handle varying workloads and user demands. This will ensure optimal performance and responsiveness of your ChatGPT agent while maintaining the desired level of scalability.

Backup and Disaster Recovery Implement backup and disaster recovery mechanisms to protect your integrated environment. Regularly back up critical data, configurations, and models used by the ChatGPT agent. Furthermore, create and test disaster recovery procedures to minimize downtime and data loss in the event of system failures or disasters.

Continuous Integration and Deployment Implement a robust CI/CD (Continuous Integration/Continuous Deployment) pipeline to streamline the deployment and updates of your integrated system. Additionally, automate the build, testing, and deployment processes for both the Kubernetes infrastructure and the ChatGPT agent to ensure a reliable and efficient release cycle.
Documentation and Collaboration
Maintain detailed documentation of your integration setup, including configurations, deployment steps, and troubleshooting guides. Also, encourage collaboration and knowledge sharing among team members working on the integration. This will facilitate better collaboration, smoother onboarding, and effective troubleshooting in the future.
By incorporating these additional recommendations into your integration approach, you can further enhance the reliability, scalability, and maintainability of your Kubernetes and ChatGPT integration.
Conclusion
Integrating Kubernetes with ChatGPT (OpenAI) offers numerous benefits for managing operations and applications within Kubernetes clusters. By adhering to the best practices and following the step-by-step guide provided in this resource, you will be well-equipped to leverage the capabilities of ChatGPT for automating tasks and optimizing your Kubernetes environment. The combination of Kubernetes' advanced container orchestration capabilities and ChatGPT's contextual language generation empowers you to streamline operations, enhance efficiency, enable scalability, and facilitate real-time monitoring. Whether it's automating deployments, scaling applications, or troubleshooting issues, the integration of Kubernetes and ChatGPT can significantly improve the management and performance of your Kubernetes infrastructure. As you embark on this integration journey, remember to prioritize security measures, ensure continuous monitoring, and consider customizing the ChatGPT model with Kubernetes-specific data for more precise results. Maintaining version control and keeping track of Kubernetes configurations will also prove invaluable for troubleshooting and future updates.
What Are Cloud-Native Applications?
Cloud-native applications mark a change in how software is created and rolled out, making use of the capabilities of cloud computing environments. These apps are structured as a set of services known as microservices, which interact through clear APIs. Containerization tools like Docker are commonly used to package each microservice along with its dependencies to ensure consistency across setups and enable portable deployment. Platforms like Kubernetes automate the management of these apps, handling tasks like scaling, load balancing, and service discovery. DevOps methods that stress collaboration between development and operations teams play a central role in the cloud-native approach by enabling continuous integration, continuous delivery, and swift iteration. With flexibility and scalability at their core, cloud-native applications can adapt resources dynamically to meet changing workloads for performance and cost effectiveness. Furthermore, they prioritize resilience, with fault tolerance measures in place to handle failures gracefully and maintain availability. Embracing cloud-native principles enables organizations to speed up innovation, boost agility, and streamline their software development processes.
The Runtime Security Model
The Runtime Security Model pertains to the security measures and protocols implemented while an application is actively running. It involves a range of strategies and technologies aimed at safeguarding the application and its infrastructure from security risks during operation. Some key elements of the Runtime Security Model are:
Access Controls: Enforcing access controls in real time ensures that only authorized users or processes can interact with the application and its data. This includes setting up authentication mechanisms like multi-factor authentication (MFA) or OAuth to verify user identities and enforce proper authorization rules.
Encryption: Encrypting data as the application runs helps prevent unauthorized access or interception. This involves encrypting data during transmission using protocols like HTTPS or TLS, as well as encrypting data at rest using strong encryption algorithms and secure storage methods.
Runtime Monitoring: Continuous monitoring of the application's runtime environment is crucial for detecting and responding to security threats or irregularities. This involves keeping track of activities, auditing events, and monitoring system and network traffic.
Vulnerability Management: Consistently assessing the application and its components is important to catch any weaknesses and uphold a secure environment. Using automated tools for vulnerability checks can aid in spotting and ranking vulnerabilities by their severity, making it easier to address them.
Container Security: When utilizing containerization technology for deploying the application, it is vital to focus on container security. This includes activities like scanning container images for vulnerabilities, monitoring container behavior during runtime, and implementing security measures at the container orchestration layer.
Secure Configuration Management: Secure configuration management of the application and its operating environment plays a key role in reducing potential attack surfaces and minimizing security threats. This involves steps such as hardening operating systems, securing network settings, and deactivating services or functions that could create vulnerabilities.
Runtime Threat Detection and Response: Having mechanisms in place for identifying and responding to threats in real time during operation is essential for handling security incidents. Techniques like analysis with machine learning algorithms or leveraging threat intelligence feeds can aid in recognizing suspicious activities or potential breaches and enhancing security posture.
Types of Cloud Native Environments
Cloud-native environments can be classified based on the technologies and deployment models they use.
Virtual Machines (VMs): In environments based on VMs, applications are deployed within virtual servers. Each VM operates with its own operating system, ensuring separation between applications. Hypervisors handle the distribution of resources (such as CPU, memory, and storage) to VMs. Cloud service providers offer various sizes and configurations of VM instances for users to deploy and scale applications as required.
Containers: Containers act as packages that bundle an application and its necessary components, facilitating consistent deployment across settings. Cloud-native environments that rely on containers employ technologies like Docker to bundle applications into containers. These containers utilize the host operating system's kernel, resulting in lower overhead compared to virtual machines (VMs). Kubernetes serves as a platform for managing containerized applications at scale.
Container Services: Container services platforms offer a managed environment for deploying, orchestrating, and scaling applications without users needing to handle the complexities of the underlying infrastructure. These platforms simplify container orchestration tasks and allow developers to focus on building and deploying their applications effectively.
Serverless Functions: With serverless functions, developers can run functions or code segments without the need to manage servers or infrastructure. Cloud providers allocate resources dynamically to execute these functions based on events or triggers. Serverless functions are typically stateless, event-triggered, and short-lived, making them ideal for event-driven architectures, real-time data processing, and microservices applications. Some examples of serverless platforms are AWS Lambda, Google Cloud Functions, and Azure Functions.
Cloud Native Application Security Best Practices
Securing cloud-native applications involves a strategy that covers all levels of the application stack, ranging from the underlying infrastructure to the actual application code. Let's explore some guidelines for ensuring security in cloud-native applications:
Secure Development Practices: Use secure coding techniques and guidelines like the OWASP Top 10 to prevent security risks such as injection attacks, XSS, CSRF, and others. Incorporate code reviews, static code checks, and automated security assessments (like SAST and DAST) during development to pinpoint and fix security weaknesses at an early stage.
Container Security: Scan container images frequently for vulnerabilities by utilizing tools such as Clair, Trivy, or Anchore. Make sure that container images originate from trusted sources, opt for minimal base images, and include only essential dependencies. Implement security measures during runtime, like SELinux, AppArmor, or seccomp profiles, to restrict container privileges and minimize the risk of attacks.
Network Security: Utilize network segmentation and firewalls to control the movement of data between parts of the application.
Incorporate encryption methods like TLS/SSL to safeguard data in transit from eavesdropping and interception by third parties. Employ web application firewalls (WAFs) to screen HTTP traffic for both malicious content and security threats.
API Security: Authenticate API requests by utilizing API keys, OAuth tokens, or JWT tokens. Set up usage restrictions, control the flow of traffic, and enforce access rules to deter misuse and counteract DDoS attacks. Sanitize input data to ward off injection attacks and uphold the integrity of data.
Logging and Monitoring: Set up a system for logging and monitoring to keep tabs on security incidents and unusual events. Make use of SIEM (security information and event management) tools to gather and correlate security logs from multiple sources to detect threats and respond to incidents. Create alerts and automated responses for suspicious activities or security breaches.
Incident Response and Disaster Recovery: Maintain a plan for responding to incidents that details steps for recognizing, containing, and recovering from security issues, and review it routinely. Confirm the effectiveness of backup and disaster recovery protocols to safeguard data integrity and reduce disruptions in case of an intrusion or breakdown.
Cloud Native Security Tools and Platforms
Various security tools and platforms are available to tackle the challenges of safeguarding cloud-native applications and environments. Below are some standout examples categorized by their functions:
1. Container Security
Docker Security Scanning
Docker Security Scanning is a feature offered by Docker Hub, the registry service for storing Docker container images. It enables users to check Docker container images for security issues and receive alerts about any vulnerabilities found. Here's a breakdown of how Docker Security Scanning operates:
Uploading Images: When a user uploads a Docker image to Docker Hub, it is queued for security scanning.
Detecting Vulnerabilities: Docker Hub utilizes databases of known vulnerabilities to scan through the layers of the container image, looking for security flaws in operating system packages, libraries, and dependencies integrated into the image.
Security Alerts: After completing the scanning process, Docker Hub generates security alerts highlighting any vulnerabilities discovered in the image. These alerts detail information about each vulnerability, such as its severity level, affected components, and recommended steps for fixing it.
Clair
Clair is an open-source tool used for static vulnerability analysis of container images. It was created by CoreOS, which is now part of Red Hat. It is commonly utilized in container security processes to identify and address security flaws in Docker and OCI (Open Container Initiative) images. Let's delve into Clair and explore its functionality:
Detecting Vulnerabilities: Clair analyzes container images and their layers to detect known security vulnerabilities present in the operating system packages, libraries, and dependencies included in the image. It compares the components within the image with an updated database of known vulnerabilities obtained from security advisories.
Architecture Design: Clair is structured with an architecture that allows for scalable vulnerability scanning. It comprises components such as a database (commonly PostgreSQL), a REST API server, and worker processes responsible for fetching vulnerability data and carrying out scanning operations.
Analyzing Static Data: Clair analyzes container images without running them, enabling swift and lightweight vulnerability checks. It extracts metadata from image manifests and scrutinizes layers to gather details about installed packages, libraries, and their respective versions.
CVE Matching: Clair compares the elements in container images against the Common Vulnerabilities and Exposures (CVE) database to identify any vulnerabilities. It provides information on each vulnerability, such as its CVE ID, severity rating, and impacted versions, as well as references and advisories.
Integration With Container Orchestration Platforms: Clair can be connected with container orchestration platforms like Kubernetes to automate vulnerability scans during deployment. There are plugins and extensions for integration with popular container runtime environments and orchestrators.
Customization and Extensibility: Clair is highly customizable and flexible, allowing users to personalize vulnerability scanning policies, set scanning thresholds, and link up with external systems and tools. Users can create custom plugins and extensions to expand Clair's capabilities and integrate it into existing security processes and toolsets.
Anchore Engine
The Anchore Engine is an open-source container security platform that focuses on analyzing, evaluating, and validating container images for security vulnerabilities, compliance with policies, and adherence to industry standards. It allows organizations to uphold security protocols and guarantee that containerized applications are built and deployed securely in their environments. Here is an overview of the Anchore Engine along with its features:
Vulnerability Assessment: The Anchore Engine conducts vulnerability assessments on container images, pinpointing established security vulnerabilities in operating system packages, libraries, and dependencies. It uses databases like CVE (Common Vulnerabilities and Exposures) to compare components within container images with known vulnerabilities.
Policy Assessment: Users can set up and enforce security policies through the Anchore Engine that define configurations, package versions, and vulnerability thresholds for container images. It assesses container images against these policies to ensure alignment with security best practices and organizational guidelines.
Image Digest Analysis and Metadata Evaluation: The Anchore Engine scrutinizes metadata from container images, such as the image digest, layer data, and package manifests, to offer insights into their contents and interdependencies. This assists users in understanding the makeup of container images while identifying security threats or compliance concerns.
Customizable Policies and Whitelists: Users have the option to craft security policies and whitelists customized for their distinct needs and scenarios. Anchore Engine offers policy customization options, allowing organizations to adjust vulnerability severity levels, blacklist packages, and conduct compliance checks according to their risk tolerance and regulatory requirements.
Seamless Integration With CI/CD Pipelines: Anchore Engine smoothly integrates with CI/CD pipelines to automate security assessments and ensure policy adherence throughout the container lifecycle. It provides plugins and APIs for integration with CI/CD tools, enabling automated vulnerability scanning and policy enforcement during the build and deployment stages.
Notification System and Alerts: Anchore Engine alerts users about security vulnerabilities, policy breaches, and compliance concerns found in container images via email notifications, webhook alerts, and connections to external notification systems. This enables prompt responses to address security issues and maintain compliance with security standards.
Scalability and Performance Optimization: Anchore Engine is built for scalability, supporting analysis and scanning of container images across distributed environments. By leveraging processing and caching mechanisms, it enhances performance efficiency while reducing scanning durations, ensuring swift security assessments of container images at scale.
Container Orchestration Security
Securing container orchestration involves protecting both the orchestration platform itself and the containerized workloads it oversees. As platforms such as Kubernetes, Docker Swarm, and Apache Mesos gain popularity for orchestrating and scaling containerized applications, prioritizing security measures becomes crucial.
Kubernetes Security Policy: A Kubernetes feature that sets security rules at the pod level by controlling access and managing volume mounts.
Kube Bench: A tool that assesses Kubernetes clusters against the industry best practices defined in the CIS Kubernetes Benchmark.
Docker Swarm: Docker Swarm is Docker's native clustering and orchestration tool. It simplifies the orchestration of containers by providing features like load balancing and service discovery.
Sysdig Secure: A container security platform that includes runtime threat detection, vulnerability management, and compliance checks for Kubernetes setups.
2. Serverless Security
AWS Lambda Security Best Practices: AWS provides guidelines on securing serverless applications, specifically on AWS Lambda.
OWASP Serverless Top 10: The OWASP Serverless Top 10 project highlights security risks in serverless setups and provides effective mitigation strategies.
Snyk: Snyk is a platform dedicated to identifying and fixing vulnerabilities in open-source dependencies.
3. API Security
API security involves the practices, methods, and technologies used to safeguard APIs from unauthorized access, data breaches, and malicious attacks. As APIs serve a central function in software development by facilitating communication and data interchange among various systems, ensuring their security is crucial for protecting sensitive data and upholding the reliability of applications and services. Here are some essential elements of API security:
Authentication: Employ robust authentication techniques to confirm the identity of API users and guarantee that only approved individuals and applications can reach protected resources. This may involve utilizing API keys, OAuth tokens, JWT (JSON Web Tokens), or client certificates for authentication.
Authorization: Enforce access controls and authorization policies to limit access to API endpoints and resources based on the roles, permissions, and privileges of users. Implement role-based access control (RBAC) or attribute-based access control (ABAC) to establish and oversee authorization rules.
Encryption: Secure sensitive data transmitted through APIs by encrypting it to prevent interception or monitoring. Utilize transport layer security (TLS/SSL) to encrypt communications between clients and servers, ensuring data confidentiality and integrity.
Input Validation: To ensure the safety of our systems, we carefully sanitize any data that comes from API users.
This helps protect against code injection attacks, such as SQL injection or XSS (cross-site scripting). By using validation and sanitization techniques, we make sure to filter and clean user input before using it in our processes.
Rate Limiting and Throttling: Set up measures to control the flow of API requests in order to prevent misuse, denial-of-service (DoS) attacks, and brute-force attacks. By setting limits on how many requests can be made based on factors like user identity, IP address, or API key, we reduce the risk of overwhelming the system and depleting resources.
Audit Logging: Keeping track of all activities within our APIs is vital for monitoring access attempts and security incidents. By logging these events, we can keep an eye on user actions, detect anomalous behavior, and investigate security concerns promptly. Detailed audit logs contain information such as API requests made, responses received, user identities involved, timestamps for each action taken, and the outcomes of those actions.
API Gateway: Use an API gateway as a central hub for managing and securing all of your APIs. Gateways help enforce security policies across APIs by handling tasks like authentication checks, authorization verification, data encryption, and request rate control. With features such as access control mechanisms, traffic management tools, real-time monitoring capabilities, and in-depth analytics reports, they enhance the security posture and operational efficiency of your APIs.
Regularly test the security of APIs by conducting security assessments like penetration testing, vulnerability scanning, and code reviews. This helps to identify and fix security flaws, misconfigurations, and vulnerabilities to ensure the security of APIs and their related components.
4. Google Cloud Security Command Center
Google Cloud Security Command Center (Cloud SCC) is a security management and data protection platform provided by Google Cloud Platform (GCP). It offers comprehensive insights into and oversight of security and compliance risks across GCP infrastructure, services, and applications. Key features of Google Cloud Security Command Center include:
Asset Inventory: Cloud SCC offers a complete view of all cloud assets deployed in an organization's GCP environment, such as virtual machines, containers, databases, storage buckets, and networking resources. It automatically classifies cloud assets while providing metadata and contextual information about each asset.
Security Findings: Cloud SCC consolidates security findings and insights from GCP security services like Google Cloud Monitoring and Google Cloud Logging, as well as from third-party security tools. It prioritizes security threats like vulnerabilities, misconfigurations, or suspicious activities across resources. Moreover, it offers advice for addressing these issues.
Vulnerability Assessment: Through integration with tools like Google Cloud Security Scanner and third-party vulnerability management solutions, Cloud SCC conducts automated vulnerability scans to assess the security status of cloud assets. By pinpointing known vulnerabilities in operating systems, software packages, and dependencies, it furnishes reports on vulnerabilities along with guidance for remediation.
Threat Detection: Cloud SCC includes built-in threat detection capabilities to promptly identify and address security threats and suspicious activities.
It relies on machine learning algorithms, anomaly detection methods, and threat intelligence sources to scrutinize cloud logs and telemetry data for indicators of compromise (IOCs) and security incidents.
Policy Monitoring and Enforcement: Cloud SCC empowers organizations to establish and uphold security policies and compliance requirements for resources through Security Health Analytics and Policy Intelligence. It constantly watches resources for compliance breaches, misconfigurations, and deviations from security policies, issuing alerts and notifications for resolution.
Data Risk Assessment: Cloud SCC provides tools for assessing data risks to help organizations pinpoint sensitive data, such as personally identifiable information (PII), intellectual property, and confidential data, stored in GCP services. It evaluates data usage trends, access controls, and encryption configurations to assess data security risks and compliance status.
Compliance Reporting: Cloud SCC includes predefined compliance frameworks such as CIS benchmarks, GDPR regulations, and HIPAA standards. It generates compliance reports along with dashboards that assist organizations in demonstrating adherence to regulatory mandates and industry norms.
5. Security Information and Event Management (SIEM)
Security information and event management (SIEM) is a cybersecurity approach that involves gathering, consolidating, scrutinizing, and correlating security data from sources across an organization's IT environment. SIEM solutions offer a unified view of security events, alarms, and occurrences, empowering organizations to effectively spot, investigate, and address security risks. Key elements and functionalities of SIEM solutions encompass:
Data Gathering: SIEM solutions amass security-related data from sources like network devices, servers, endpoints, applications, cloud services, and security utilities. Data inputs may include logs, events, alarms, flow records, configuration files, and threat intelligence feeds.
Standardization and Consolidation: SIEM platforms normalize and consolidate security data from different sources into a uniform format for examination and correlation. This process involves parsing information accurately while categorizing and aligning security events to streamline analysis and correlation.
Analysis and Correlation: SIEM solutions correlate security events from multiple sources to pinpoint trends, irregularities, and possible security incidents. They leverage correlation rules, heuristics, statistical analysis, and machine learning algorithms to detect malicious activities, threats, and attack patterns.
Alerting and Notification: SIEM systems generate alerts and notifications for security events and incidents that meet predefined criteria or thresholds. They send out notifications, display dashboards, and generate reports to alert security teams about possible security breaches, policy infringements, or unusual activities.
Responding to Incidents: SIEM solutions aid in the detection of and response to incidents by offering tools for probing security events, examining evidence, and performing root cause analysis. They empower security teams to assess, prioritize, and address security incidents efficiently.
Ensuring Compliance and Generating Reports: SIEM platforms assist in monitoring compliance status and generating reports by offering predefined compliance templates, audit trails, and reporting functionalities. They assist organizations in demonstrating adherence to regulatory mandates, industry norms, and internal policies through automated reporting procedures.
Integrating Systems and Streamlining Processes: SIEM solutions integrate with other security tools and technologies to enhance their capabilities while streamlining security workflows. They enable connections with threat intelligence platforms, endpoint detection and response (EDR) solutions, incident response tools, and security orchestration and automation platforms for a cohesive approach.
Adaptability and Efficiency: SIEM platforms are designed for adaptability and efficiency so they can manage large datasets securely while catering to the demands of large-scale implementations. They utilize distributed architectures along with data partitioning and data compression techniques to enhance performance.
Conclusion
Embracing cloud-native applications revolutionizes software development and leverages cloud computing's power for innovation and agility through microservices, Docker, and Kubernetes. However, robust security practices are essential to safeguard these environments effectively. With a holistic security approach, organizations can unlock cloud-native benefits while mitigating risks and ensuring resilience in modern software ecosystems.