In an era where data is the new oil, effectively utilizing data is crucial for the growth of every organization. This is especially the case when it comes to taking advantage of vast amounts of data stored in cloud platforms like Amazon S3 (Simple Storage Service), which has become a central repository for everything from web application content to big data analytics. It is not enough to store this data durably; you also need to query and analyze it effectively. This enables you to gain valuable insights, find trends, and make data-driven decisions that can lead your organization forward. Without a querying capability, the data stored in S3 would not be of much benefit. To avoid such scenarios, Amazon Web Services (AWS) provides tools that make data queries accessible and powerful. Glue Crawler is best suited to classify and catalog data. Athena is a service for quick ad hoc queries. Redshift Spectrum is a solid analytical engine capable of processing complex queries at scale. Each tool has its niche and provides a flexible approach for querying data according to your needs and the complexity of the tasks.

Exploring Glue Crawler for Data Cataloging

With the vast quantities of data stored on Amazon S3, finding an efficient way to sort and make sense of this data is important. This leads us to Glue Crawler. It is like an automated librarian who can organize, classify, and update library books without human intervention. Glue Crawler does the same with Amazon S3 data. It automatically scans your storage, recognizes different data formats, and infers schemas that it registers in the AWS Glue Data Catalog. This process simplifies what would otherwise be a hard manual task. Glue Crawler generates metadata tables by crawling structured and semi-structured data to organize it for query and analysis.

The importance of a current data catalog cannot be overstated. A well-maintained catalog serves as a road map for stored data. An updated catalog ensures that when you use tools such as Amazon Athena or Redshift Spectrum, you work against the most current data structure, which streamlines the query process. In addition, a centralized metadata repository improves collaboration between teams by providing a common understanding of the data layout.

To make the most of your Glue Crawler, here are some best practices:

Classify Your Data: Use classifiers to teach Glue Crawler about the different data types. Whether JSON, CSV, or Parquet, accurate classification ensures the generated schema is as precise as possible.
Schedule Regular Crawls: Data changes over time, so schedule crawls to keep the catalog updated. This can be done daily, weekly, or even after a particular event, depending on how frequently your data is updated.
Use Exclusions: Not all data needs to be crawled. Set exclusion patterns for temporary or redundant files to save time and reduce costs.
Review Access Policies: Check that the correct permissions are in place. Crawlers need access to the data they are expected to crawl, and users need the right permissions to access the updated catalog.

By following these tips, you can ensure that Glue Crawler works harmoniously with your data and improves the data environment. Adopting these best practices improves the data discovery process and lays a solid foundation for the next step in the data query process.
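As a concrete illustration of these practices, a crawler with a daily schedule and exclusion patterns can be created from the AWS CLI roughly as follows (a sketch only; the bucket, IAM role, and database names are placeholders to replace with your own):

Shell
# Create a crawler that scans an S3 prefix, skips temporary files,
# and refreshes the Data Catalog every night at 02:00 UTC.
aws glue create-crawler \
  --name sales-data-crawler \
  --role arn:aws:iam::123456789012:role/GlueCrawlerRole \
  --database-name sales_catalog \
  --targets '{"S3Targets":[{"Path":"s3://my-analytics-bucket/sales/","Exclusions":["**/_temporary/**","**.tmp"]}]}' \
  --schedule "cron(0 2 * * ? *)"

# Run the crawler once immediately instead of waiting for the schedule.
aws glue start-crawler --name sales-data-crawler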
Harnessing the Power of Amazon Athena for Query Execution

Imagine a scenario in which you are sorting through an enormous amount of data, looking for that decisive insight hidden deep inside. Imagine doing this in just a few clicks and commands, without complex server configurations. Amazon Athena, an interactive query service, is tailor-made for this: it analyzes data directly in Amazon S3 using standard SQL. Amazon Athena is like having a powerful search engine for your data lake. It is serverless, meaning you do not have to manage the underlying infrastructure. You don't need to set up or maintain servers; you only pay for the queries you run. Athena automatically scales, executes queries in parallel, and delivers quick results even with large amounts of data and complex queries.

The advantages of Amazon Athena are numerous, especially in the context of ad hoc queries. First, it provides simplicity. With Athena, you can start querying data using standard SQL without learning new languages or managing infrastructure. Second, there is the cost aspect. You pay per query, i.e., only for the data scanned by your query, making it a cost-effective option for all kinds of use cases. Finally, Athena is very flexible: you can query data in various formats such as CSV, JSON, ORC, Avro, and Parquet directly from S3 buckets.

To maximize Athena's benefits, consider these best practices:

Compress your data: Compressing your data can significantly reduce the data scanned by each query, resulting in faster performance and lower costs.
Use columnar formats: Store data in columnar formats such as Parquet or ORC. These formats are optimized for high-performance reading and help reduce costs by scanning only the columns required for your query.
Partition your data: By partitioning your data according to commonly filtered columns, Athena can skip unnecessary partitions, improving performance and reducing the amount of data scanned.
Avoid SELECT *: Be specific about the required columns. Using SELECT * can scan more data than necessary.

By following these best practices, you can improve the performance of your queries as well as manage costs. As mentioned in the previous section, having well-organized and classified data is essential. Athena benefits directly from this organization: if the underlying data is properly structured and cataloged, it can be processed more efficiently.
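To make these practices concrete, here is a sketch of how a partitioned, Parquet-backed table could be declared and queried in Athena (the table, bucket, and column names are illustrative):

SQL
-- External table over Parquet files, partitioned by event date.
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
  request_id  string,
  user_id     string,
  status_code int
)
PARTITIONED BY (event_date string)
STORED AS PARQUET
LOCATION 's3://my-analytics-bucket/access-logs/';

-- Register the partitions that already exist under the S3 prefix.
MSCK REPAIR TABLE access_logs;

-- Select only the needed columns and restrict the partitions scanned.
SELECT user_id, COUNT(*) AS requests
FROM access_logs
WHERE event_date BETWEEN '2024-04-01' AND '2024-04-07'
GROUP BY user_id;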
Leveraging Redshift Spectrum for Scalable Query Processing

Redshift Spectrum is an extension of Amazon's cloud data warehouse service, Redshift. It allows users to run SQL queries directly on data stored in Amazon S3 without loading or converting it first. This makes it possible to analyze large amounts of structured and unstructured data from Redshift. The integration is seamless: point Redshift Spectrum at the S3 data lake, define a schema, and start querying using standard SQL.

Traditional data warehouse solutions often require significant pre-processing and data movement before analysis. This not only increases complexity but can also delay insights. By contrast, Redshift Spectrum offers a more agile approach. You keep your data where it is, in Amazon S3, and bring the computing power to it. This method eliminates the time-consuming ETL (extract, transform, load) process and opens the door to real-time analytics at scale. Furthermore, because you pay only for the queries you run, you can save significantly compared to traditional solutions, where hardware and storage costs are a factor.

Several tactics can be utilized to maximize the benefits of Redshift Spectrum. First, storing data in a columnar format increases efficiency, since it enables Redshift Spectrum to read only the columns required by a query. Partitioning data by frequently filtered columns can also enhance performance by reducing the amount of data that needs to be examined. Moreover, consider the size of the files stored in S3: many small files can result in higher overhead, whereas very large files may not be easily parallelized. Striking the right balance is key. Another factor in cost-efficient querying is controlling the volume of data scanned by each query. To minimize Redshift Spectrum charges, restrict the amount of data scanned by using WHERE clauses to filter out unnecessary data, thereby decreasing the data volume processed by Redshift Spectrum. Finally, continuously monitoring and analyzing query patterns can help pinpoint opportunities to improve data structures or query designs for better performance and lower costs.

Conclusion

As we conclude, it is worth recapping the main points. In this article, we have explored the intricacies of retrieving information from Amazon S3. We looked at the significance of having a strong data catalog and how Glue Crawler streamlines its creation and upkeep. We also examined Amazon Athena, a tool that enables quick and easy serverless ad hoc querying. Finally, we discussed how Redshift Spectrum expands on the features of Amazon Redshift by allowing queries on S3 data, providing a strong alternative to conventional data warehouses. These tools are more than just standalone units; they are components of a unified ecosystem that, when combined, can create a robust framework for analyzing data.
This article describes the GitHub Copilot tool and the main guidelines and assumptions regarding its use in software development projects. The guidelines concern both the tool’s configuration and its application in everyday work and assume the reader will use GitHub Copilot with IntelliJ IDEA (via a dedicated plugin). GitHub Copilot: What Is It? GitHub Copilot is an AI developer assistant that uses a generative AI model trained for all programming languages available in GitHub repositories. The full description and documentation of the tool is available here. There are other similar tools on the market, such as OpenAI Codex, JetBrains AI Assistant, or Tabnine, but GitHub Copilot stands out due to the following features: The largest and most diverse collection for training an AI model – GitHub repositories Estimated usage share – currently approx. 40-50% (according to Abhay Mishra’s article based on undisclosed industry insights), but the market is very dynamic Support for popular technologies – we’ve tested it with the Java programming language, Scala, Kotlin, Groovy, SQL, Spring, Dockerfile, OpenShift, Bash Very good integration with the JetBrains IntelliJ IDEA IDE Low entry-level due to quick and easy configuration, general ease of use, clear documentation, and many usage examples on the internet A wide range of functionalities, including: Suggestions while writing code Generating code based on comments in natural language Taking existing code into account when generating a new code snippet Creating unit tests Chat – allows you to ask questions regarding code, language, and technology, as well as suggests corrections for simplifying the code CLI – support for working in the console and creating bash scripts Our Goals Our main goal for using GitHub Copilot was to improve the efficiency of writing code and its quality. In addition, we intended it to support and assist us in work in which programmers lack knowledge and experience. Here are the specific goals that we wanted our development team to achieve by using GitHub Copilot: 1. Accelerating Development Generating code fragments Generating SQL queries Hints for creating and modifying OpenShift and Dockerfile configuration files Faster search for solutions using the chat function, e.g., explanation of regular expressions, operation of libraries or framework mechanisms 2. Improving Code Quality Generating unit tests with edge cases – both in Java and Groovy languages Suggesting corrections and simplifications in our own code 3. Working With Less Frequently Used Technologies Explaining and generating code (including unit tests) in Scala and Kotlin Support while using “legacy” solutions like Activiti, etc. Support in creating and understanding configuration files 4. More Efficient Administrative Work in the Console Using CLI Functions Tool Limitations Guidelines Since GitHub Copilot is based on generative AI, you must always remember that it may generate incorrect code or responses. Therefore, when using the tool, you must be aware of potential limitations and apply the principle of limited trust and verification. The main limitations are presented in the table below. Limitation Description Limited scope of knowledge The tool is based on code found in GitHub repositories. Some problems, or complex structures, languages or data notations, have poor representation in the training sets Dynamic development and features in the beta phase The tool is developing very dynamically. 
Patches and updates appear every week or every several weeks, which indicates that many elements of the tool are not working properly. Some functionalities, such as GitHub Copilot CLI, are still in beta Inaccurate code The tool provider informs that the generated code may not meet the user’s expectations, may not solve the actual problem, and may contain errors Inaccurate chat responses When using chat, the accuracy of the answer depends largely on the question or command formulated. The documentation says that “Copilot Chat is not designed to answer non-coding questions”, so there are possible answers, especially in areas not strictly related to the code (design, etc.), that will not be appropriate or even sensible Dangerous code The training set (repositories) may also contain code elements that violate security rules, both in the security and safety sense, such as API keys, network scanning, IP addresses, code that overloads resources or causes memory leaks, etc. To minimize the negative impact of the identified GitHub Copilot limitations, you should always: Check alternative suggestions (using Ctrl+[ and Ctrl+], etc.) and choose the ones that best suit a given situation Read and analyze the correctness of the generated code Test and run code in pre-production environments – primarily locally and in the development environment Submit the generated code to code review Important: Never deploy the code generated by GitHub Copilot to production environments without performing the above checks. Configuration Guidelines In this section, we’ll present the basic information regarding the pricing plans (with advantages and disadvantages for each option, as seen from the perspective of our intended goals) and personal account configuration (for both GitHub Copilot and the IntelliJ IDEA plugin). Pricing Plans GitHub Copilot offers three subscription plans with different scopes of offered functionality and cost. In our case, two plans were worth considering: Copilot Individual or Copilot Business. The Copilot Enterprise plan additionally offers access to chat via the github.com website and generating summaries for pull requests, which was unimportant for our assumed goals (but it may be different in your case). Both plans’ main advantages and disadvantages are presented in the table below. Plan Advantages Disadvantages GitHub Copilot Individual Lower cost at $10/month/user Offers the key functionality required to achieve the intended goals Lack of control over tool configuration and user access by the organization GitHub Copilot Business Offers the key functionality required to achieve the intended goals Control over tool configuration and user access by the organization Higher cost at $19/month/user In our case, Copilot Business was the better option, especially because it allows full control over the configuration and access to the tool for developers in the team. If you’re working on your own, the Copilot Individual plan might be enough. Account Configuration You can configure GitHub Copilot when purchasing a subscription plan, and the settings can also be changed after activating the account in the organization’s account settings on GitHub. At the account level, there were two key parameters for our use case to configure in GitHub Copilot, described in the table below. 
Option: Suggestions matching public code
Available options: Allowed and Blocked. Determines whether to show or to block code suggestions that overlap with public code (matches of roughly 150 characters).
Recommended setting: Blocked. This option reduces the risk of duplicating code from public repositories, thus reducing the uncertainty about the copyright ownership of the code.

Option: Allow GitHub to use my code snippets for product improvements
Available options: Yes and No. Determines whether GitHub, its affiliates, and third parties may use user code snippets to explore and improve GitHub Copilot suggestions, related product models, and features.
Recommended setting: No. If you plan to use GitHub Copilot for commercial purposes, GitHub and its associated entities should not use user code due to copyright considerations.

Here is a detailed description and instructions for changing configuration options in your GitHub account.

IntelliJ IDEA Plugin Configuration

To enable GitHub Copilot in the IntelliJ IDEA IDE, you must install the GitHub Copilot plugin from the JetBrains Marketplace. Installation is done via the IDE in the plugin settings. After installation, log in to your GitHub account with your device code. You can find detailed instructions for installing and updating the plugin here. The GitHub Copilot plugin for the IntelliJ IDEA IDE offers the ability to configure the following things:
Automatic submission of suggestions
The way suggestions are displayed
Automatic plugin updates
Supported languages
Keyboard shortcuts

In our case, using the default plugin settings was recommended because they ensure good working comfort and are consistent with the existing tool documentation. Any changes to the configuration can be made by each user according to their own preferences.

Our GitHub Copilot plugin settings in IntelliJ IDEA
Our keymap settings for GitHub Copilot in IntelliJ IDEA

How To Use GitHub Copilot in IntelliJ

Here are some guidelines for using key functionalities that will help you use the GitHub Copilot tool optimally.

Generating Application Code

When To Use
Creating classes
Creating fields, methods, constructors
Writing code snippets inside methods

How To Use
By writing code and using automatic suggestions - it's always worth checking other suggestions using the Ctrl+] / Ctrl+[ keys
By writing concise and precise comments in natural English
Using the chat function - the chat can generate a fragment of code in response to a query (see examples in the section "Using the GitHub Copilot Chat" below) and allows you to quickly generate code using the Copy Code Block or Insert Code Block at Cursor buttons that appear in the section with code in the chat window

Writing Unit Tests

When To Use
Creating new classes and methods that we want to cover with unit tests
Coverage of existing classes and methods with unit tests

How To Use
By writing a comment in the test class. For example, if you write // Unit test in JUnit for CurrencyService, Copilot generates a matching test class. It is possible to generate individual test methods by entering in the comment the test case that the method is to test. Similarly, you can generate mocks in the test class.
Using the chat - you can select the GitHub Copilot > Generate Test option from the context menu, enter the /tests command, or write an instruction in natural language, e.g., Generate unit test for class CurrencyService. In response, you will receive a descriptive explanation of the test structure and the code of the entire test class.
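For a hypothetical CurrencyService with a simple convert method, the generated class could look roughly like the following (an illustrative sketch of the kind of test Copilot tends to produce, not verbatim tool output):

Java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

// Illustrative only: CurrencyService and its convert(amount, rate) method are assumed here.
class CurrencyServiceTest {

    private final CurrencyService currencyService = new CurrencyService();

    @Test
    void shouldConvertAmountUsingGivenRate() {
        // 100 units at a rate of 1.10 should give 110.
        assertEquals(110.0, currencyService.convert(100.0, 1.10), 0.0001);
    }

    @Test
    void shouldRejectNegativeAmount() {
        // Edge case: negative amounts are expected to be rejected.
        assertThrows(IllegalArgumentException.class,
                () -> currencyService.convert(-1.0, 1.10));
    }
}

As with any generated test, verify that the asserted behavior is what you actually want before committing it.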
Generating SQL Queries and Stored Procedures

When To Use
When writing DDL, DML, and DQL queries that will be used in the application
During analysis of data and data-related errors in the database
When writing scripts and stored procedures

How To Use
IMPORTANT: you must have a database connection configured in IntelliJ IDEA or DataGrip
By writing queries and using automatic suggestions
By writing a comment, e.g., if you write -- get party data for account, Copilot generates the corresponding query

Creating OpenShift Configuration or Other Configuration Files

When To Use
Creating or modifying configuration files
Analysis of directives, their options and values, and configuration mechanisms

How To Use
By writing directives and using automatic suggestions
Using the chat - you can select the directive and choose GitHub Copilot > Explain This from the context menu, enter the /explain command, or write a query in natural language about a given configuration element

Using the BASH Console

When To Use
When trying to use obscure console commands
For an explanation of a command's operation and its options
To find the right command to perform a task
When writing BASH scripts

How To Use
IMPORTANT: to use the CLI tool, install GitHub CLI with the gh-copilot extension according to the instructions. Currently, the tool offers two commands, summarized below:
gh copilot suggest - e.g., gh copilot suggest "find IP number in text file" returns grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}' <filename>
gh copilot explain - e.g., gh copilot explain "curl -k" explains that curl is used to issue web requests, e.g., download web pages, and that -k or --insecure allows curl to perform insecure SSL connections and transfers

How To Use GitHub Copilot Chat

We've written a separate chapter for the GitHub Copilot Chat, as there are several use cases worth talking about. Let's go through them individually and discuss specific guidelines for each case.

Creating New Functionalities

When To Use
When you are looking for a solution to a problem, such as creating a website, a method that performs a specific task, error handling for a given block of code/method/class, etc.

How To Use
Enter a query in natural English regarding the functionality you are looking for. It should concern topics related to programming: code, frameworks/libraries, services, architecture, etc. Below is an example for the query: How to get currency exchange data?

Using Regular Expressions

When To Use
When you need to create and verify a regular expression

How To Use
Enter a query in natural English regarding the pattern you are looking for. The example below shows a generated method with an incorrect pattern, a query, and a response with an explanation and corrected code.

Finding Errors in the Code

When To Use
When you create new classes or methods
When analyzing a class or method that causes errors

How To Use
You can select the code and choose GitHub Copilot > Fix This from the context menu, enter the /fix command, or write an instruction in natural English, e.g., Find possible errors in this class. You can narrow the command down to a specific method name or error type.
For example, for a simple class, explanations of potential errors were obtained, and the chat generated code to handle these errors: Explanation of Existing Code When To Use When you don’t understand what exactly a module, class, method, piece of code, regular expression, etc., does When you don’t know the framework or library mechanism used How To Use In a class or method, you can select GitHub Copilot > Explain this from the context menu, type the /explain command, or write a query in natural English about the problematic code element, e.g., Explain what is this class doing. The example below presents an explanation of the class and its methods. This applies to the class generated in the bug-finding example Simplify Existing Code When To Use When the code is complicated and difficult to understand or unnecessarily extensive When refactoring the code How To Use In a class or selected method or code fragment, you can select GitHub Copilot > Simplify This from the context menu, type the /simplify command, or write a query in natural English. An example of a simple method of refactoring for a class is below: The result: Summary: A Powerful Tool, as Long as You’re Cautious As you can see, GitHub Copilot can be a powerful tool in a software developer’s arsenal. It can speed up and simplify various processes and day-to-day tasks. However, as with all things related to generative AI, you can never fully trust this tool – therefore, the crucial rule is to always read, review, and test what it creates.
"Data is the new oil" is a saying I often hear, and it couldn't be more accurate in today's highly interconnected world. Data migration is crucial for organizations worldwide, from startups aiming to scale rapidly to enterprises seeking to modernize IT infrastructure. However, as a tech enthusiast, I've often found myself navigating the complexities of moving large volumes of data across different environments. A data migration that is poorly planned or executed (whether it is a one-time event or ongoing replication), performed manually without automation scripts, or not tested well can cause issues during the migration and increase delays or downtime. To take this challenge head-on, I've interacted with several technology heads to understand how AWS DMS eases and streamlines data migration journeys. AWS DMS provides a platform to execute migrations effectively with minimal downtime. I've also realized that we can completely automate this process using Terraform IAC to trigger migration from any supported source database to the target database. Using Terraform, we can create the infrastructure required for the target nodes and the AWS DMS resources, which can complete the data migration automatically.

In this blog, we'll dive deep into the intricacies of data migration using AWS DMS and Terraform IAC. We'll learn:

What is AWS Database Migration Service (AWS DMS)?
How to automate data migration using AWS DMS and Terraform IAC
Key benefits and features of AWS DMS

Let's get started!

1. What Is AWS DMS (Database Migration Service)?

AWS DMS (Database Migration Service) is a cloud-based tool that facilitates database migration to the AWS Cloud by replicating data from any supported source to any supported target. It also supports change data capture (CDC) functionality, which replicates data from source to target on an ongoing basis.

AWS DMS Architectural Overview

Use Cases of AWS DMS

AWS Database Migration Service (AWS DMS) supports many use cases, from like-to-like migrations to complex cross-platform transitions.

Homogeneous Data Migration

Homogeneous database migration migrates data between identical or similar databases. This one-step process is straightforward due to the consistent schema structure and data types between the source and target databases.

Homogeneous Database Migration

Heterogeneous Database Migration

Heterogeneous database migration involves transferring data between different databases, such as Oracle to Amazon Aurora, Oracle to PostgreSQL, or SQL Server to MySQL. This process requires converting the source schema and code to match the target database. Using the AWS Schema Conversion Tool, this migration becomes a two-step procedure: schema transformation and data migration. Source schema and code conversion involves transforming tables, views, stored procedures, functions, data types, synonyms, etc. Any objects that the AWS Schema Conversion Tool can't automatically convert are clearly marked for manual conversion to complete the migration.
DMS Schema Conversion
Heterogeneous Database Migrations

Prerequisites for AWS DMS

The following are prerequisites for AWS DMS data migration:
Access to source and target endpoints through firewall and security groups
Source endpoint connection
Target endpoint connection
Replication instance
Target schema or database
CloudWatch event to trigger the Lambda function
Lambda function to start the replication task
Resource limit increase

AWS DMS Components

Before migrating with AWS DMS, let's understand its components.

Replication Instance

A replication instance is a managed Amazon EC2 instance that hosts replication tasks. It connects to the source data store, reads and formats the data for the target, and loads it into the target data store.

Replication Instance

Source and Target Endpoints

AWS DMS uses endpoints to connect to source and target databases, allowing it to migrate data from a source endpoint to a target endpoint.

Supported source endpoints include Google Cloud for MySQL, Amazon RDS for PostgreSQL, Microsoft SQL Server, Oracle Database, Amazon DocumentDB, PostgreSQL, Microsoft Azure SQL Database, IBM DB2, Amazon Aurora with MySQL compatibility, MongoDB, Amazon RDS for Oracle, Amazon S3, Amazon RDS for MariaDB, Amazon RDS for Microsoft SQL Server, MySQL, Amazon RDS for MySQL, Amazon Aurora with PostgreSQL compatibility, MariaDB, and SAP Adaptive Server Enterprise (ASE).

Supported target endpoints include PostgreSQL, SAP Adaptive Server Enterprise (ASE), Google Cloud for MySQL, IBM DB2, MySQL, Amazon RDS for Microsoft SQL Server, Oracle Database, Amazon RDS for MariaDB, Amazon Aurora with MySQL compatibility, MariaDB, Amazon S3, Amazon RDS for PostgreSQL, Microsoft SQL Server, Amazon DocumentDB, Microsoft Azure SQL Database, Amazon RDS for Oracle, MongoDB, Amazon Aurora with PostgreSQL compatibility, and Amazon RDS for MySQL.

Replication Tasks

Replication tasks facilitate smooth data transfer from a source endpoint to a target endpoint. This involves specifying the necessary tables and schemas for migration and any special processing requirements such as logging, control table data, and error handling. Creating a replication task is a crucial step before starting the migration; it includes defining the migration type, the source and target endpoints, and the replication instance. A replication task uses one of the following migration types:

Full Load: Migrates existing data only.
Full Load with CDC (Change Data Capture): Migrates existing data and continuously replicates changes.
CDC Only (Change Data Capture): Continuously replicates only the changes in data.
Validation Only: Focuses solely on data validation.

These types lead to three main phases:

Migration of Existing Data (Full Load): AWS DMS transfers data from the source tables to the target tables.
Cached Changes Application: While the full load is in progress, changes to the tables being loaded are cached on the replication server. Once the full load for a table is complete, AWS DMS applies the cached changes.
Ongoing Replication (Change Data Capture): Initially, a backlog of transactions causes some lag between the source and target databases. Over time, this backlog is processed, and a steady replication flow is achieved.

In this way, AWS DMS guides the data migration process methodically, maintaining data integrity and consistency.
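Before moving on to the remaining components, here is a minimal Terraform sketch of the core resources described so far: a replication instance, a pair of endpoints, and a full-load-and-cdc replication task. All names, hosts, and settings are placeholders, and a real module would also wire in networking, IAM, and Secrets Manager lookups.

HCL
resource "aws_dms_replication_instance" "this" {
  replication_instance_id    = "demo-dms-instance"      # placeholder name
  replication_instance_class = "dms.t3.medium"
  allocated_storage          = 100
}

resource "aws_dms_endpoint" "source" {
  endpoint_id   = "onprem-mysql-source"
  endpoint_type = "source"
  engine_name   = "mysql"
  server_name   = "onprem-mysql.example.internal"       # placeholder host
  port          = 3306
  username      = var.source_username                   # e.g., resolved from Secrets Manager
  password      = var.source_password
}

resource "aws_dms_endpoint" "target" {
  endpoint_id   = "rds-mysql-target"
  endpoint_type = "target"
  engine_name   = "mysql"
  server_name   = var.target_host                       # e.g., the RDS instance address
  port          = 3306
  username      = var.target_username
  password      = var.target_password
}

resource "aws_dms_replication_task" "app_db" {
  replication_task_id      = "app-db-full-load-and-cdc"
  migration_type           = "full-load-and-cdc"
  replication_instance_arn = aws_dms_replication_instance.this.replication_instance_arn
  source_endpoint_arn      = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn      = aws_dms_endpoint.target.endpoint_arn

  # Migrate every table in the "app" schema; adjust the selection rules as needed.
  table_mappings = jsonencode({
    rules = [{
      "rule-type"   = "selection"
      "rule-id"     = "1"
      "rule-name"   = "include-app-schema"
      "rule-action" = "include"
      "object-locator" = {
        "schema-name" = "app"
        "table-name"  = "%"
      }
    }]
  })
}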
CloudWatch Events AWS CloudWatch EventBridge delivers notifications about AWS DMS events, such as replication task initiation/deletion and replication instance creation/removal. EventBridge receives these events and directs notifications based on predefined rules. Lambda Function We use an AWS Lambda function to initiate replication tasks. When an event signaling task creation occurs in AWS DMS, the Lambda function is automatically triggered by the configured EventBridge rules. Resource Limits In managing AWS Database Migration Service (DMS), we adhere to default resource quotas, which serve as soft limits. With assistance from AWS support tickets, these limits can be increased as needed to ensure optimal performance. Critical AWS DMS resource limits include: Endpoints per user account: 1000 (default) Endpoints per replication instance: 100 (default) Tasks per user account: 600 (default) Tasks per replication instance: 200 (default) Replication instances per user account: 60 (default) For example, to migrate 100 databases from an On-Prem MySQL source to RDS MySQL, we use the following calculation: Tasks per database: 1 Endpoints per database: 2 Endpoints per replication instance: 100 Total tasks per replication instance = Endpoints per replication instance / Endpoints per database = 100 / 2 = 50. This means we can migrate up to 50 databases per replication instance. Using two replication instances, we can migrate all 100 databases efficiently in one go. This approach exemplifies the strategic use of resource quotas for effective database migration. How To Automate Data Migration With Terraform IaC: Overview Terraform and DMS automate and secure data migration, simplifying the process while managing AWS infrastructure efficiently. Here's a step-by-step overview of this seamless and secure migration process: Step 1: Fetching Migration Database List Retrieve a list of databases to be migrated. Step 2: Database Creation (Homogeneous Migration) Create target schema or database structures to prepare for data transition in case of homogeneous data migrations. Step 3: Replication Subnet Group Creation Create replication subnet groups to ensure seamless network communication for data movement. Step 4: Source/Target Connection Endpoints Equip each database set for migration with source and target connection. Step 5: Replication Instance Creation Create replication instances to handle the data migration process. Step 6: Lambda Integration With Cloud Watch Events Integrate a CloudWatch event and Lambda function to initiate replication tasks. Step 7: Replication Task Creation and Assignment Create and assign replication tasks to replication instances, setting up the migration. Step 8: Migration Task Initiation Migration tasks are initiated for each database. Migration Process & Workflow Diagram Architecture Overview for Data Migration Automation AWS DMS with Terraform Infrastructure as Code (IAC) automates the data migration. The data migration automation process begins with the dynamic framework of Jenkins pipelines. This framework uses various input parameters to customize and tailor the migration process, offering flexibility and adaptability. Here's a detailed overview of the architecture: AWS DMS Architecture with Terraform IAC Step 1: Jenkins Pipeline Parameters The Jenkins pipeline for AWS DMS starts by defining essential input parameters, such as region and environment details, Terragrunt module specifics, and migration preferences. 
Key input parameters include:

AWS_REGION: Populates the region list from the repository.
APP_ENVIRONMENT: Populates the application environment list from the repository.
TG_MODULE: Populates the Terragrunt module folder list from the repository.
TG_ACTION: Allows users to select the Terragrunt action (plan, validate, or apply).
TG_EXTRA_FLAGS: Allows users to pass extra flags to Terragrunt.
FETCH_DBLIST: Determines how the migration DB list is generated (AUTOMATIC or MANUAL).
CUSTOM_DBLIST: Custom SQL Server database list for migration, used when FETCH_DBLIST is set to MANUAL.
MIGRATION_TYPE: Allows users to choose the DMS migration type (full-load, full-load-and-cdc, cdc).
START_TASKS: Allows users to turn migration task execution on or off.
TEAMS: MS Teams channel for build notifications.

Step 2: Execution Stages

Based on the input parameters, the pipeline progresses through distinct execution stages:

Source Code Checkout for IAC: The pipeline begins by checking out the source code for IAC, establishing a solid foundation for the following steps.
Migration Database List: Depending on the selected FETCH_DBLIST mode, the pipeline automatically fetches the migration database list from the source instance or uses a manually provided list.
Schema or Database Creation: The necessary schema or database structures for data migration are created on the target instance.
Terraform/Terragrunt Execution: The pipeline executes Terraform or Terragrunt modules to facilitate the AWS DMS migration process.
Notifications: Updates are sent via email or MS Teams throughout the migration process.

Step 3: Automatic and Manual List Fetching
With FETCH_DBLIST set to AUTOMATIC, the migration database list is fetched automatically from the source instance using a shell script. Alternatively, users can manually provide a selective list for migration.

Step 4: Migration Types
The Terraform/Terragrunt module initiates CDC, full-load-and-cdc, and full-load migrations based on the migration type specified in MIGRATION_TYPE.

Step 5: Automation Control
The migration tasks are initiated either manually or automatically, controlled by START_TASKS.

Step 6: Credentials Management
For security, database credentials are retrieved from AWS Secrets Manager while executing the DMS Terraform/Terragrunt modules.

Step 7: Endpoint Creation
Establish endpoints for the target and source instances, facilitating seamless connection and data transfer.

Step 8: Replication Instances
Create replication instances based on the database count or quota limits.

Step 9: CloudWatch Integration
Configure AWS CloudWatch events to trigger a Lambda function after AWS DMS replication tasks are created.

Step 10: Replication Task Configuration
Create replication tasks for individual databases and assign them to available replication instances for optimized data transfer.

Step 11: Task Automation
Replication tasks in the Ready state are started automatically by the Lambda function.

Step 12: Monitoring Migration
Use the AWS DMS Console for real-time monitoring of data migration progress, gaining insights into the migration journey.

Step 13: Ongoing Changes
Seamlessly replicate ongoing changes into the target instance after the initial load, ensuring data consistency.

Step 14: Automated Validation
Automatically validate migrated data against source and target instances based on the provided validation configurations to reinforce data integrity.

Step 15: Completion and Configuration
Ensure user migration and database configurations are completed post-validation.
Step 16: Target Testing and Validation Update the application configuration to use the target instance for testing to ensure functionality. Step 17: Cutover Replication Execute cutover replication from the source instance after thorough testing, taking a final snapshot of the source instance to conclude the process. Key Features and Benefits of AWS DMS With Terraform AWS DMS with Terraform IAC offers several benefits: cost-efficiency, ease of use, minimized downtime, and robust replication. Cost Optimization AWS DMS Migration offers a cost-effective model as it costs as per compute resources and additional log storage. Ease of Use The migration process is simplified with no need for specific drivers or application installations and often no changes to the source database. One-click resource creation streamlines the entire migration journey. Continuous Replication and Minimal Downtime AWS DMS ensures continuous source database replication, even while operational, enabling minimal downtime and seamless database switching. Ongoing Replication Maintaining synchronization between source and target databases with ongoing replication tasks ensures data consistency. Diverse Source/Target Support AWS DMS supports migrations from like-to-like (e.g., MySQL to MySQL) to heterogeneous migrations (e.g., Oracle to Amazon Aurora) across SQL, NoSQL, and text-based targets. Database Consolidation AWS DMS with Terraform can easily consolidate multiple source databases into a single target database, which applies to homogeneous and heterogeneous migrations. Efficiency in Schema Conversion and Migration AWS DMS minimizes manual effort in tasks such as migrating users, stored procedures, triggers, and schema conversion while validating the target database against application functionality. Automated Provisioning With Terraform IAC Leverage Terraform for automated creation and destruction of AWS DMS replication tasks, ideal for managing migrations involving multiple databases. Automated Pipeline Integration Integrate seamlessly with CI/CD pipelines for efficient migration management, monitoring, and progress tracking. Conclusion This blog talks in detail about how the combination of AWS DMS and Terraform IAC can be used to automate data migration. The blog serves as a guide, exploring the synergy between these technologies and equipping businesses with the tools for optimized digital transformation.
In this blog on AWS, I will do a comparison study among two EC2 initialization/configuration tools — User Data and AMI, which help in the configuration and management of EC2 instances. EC2 User Data EC2 User Data is a powerful feature of EC2 instances that allows you to automate tasks and customize your instances during the bootstrapping process. It’s a versatile tool that can be used to install software, configure instances, and even perform complex setup tasks. User Data refers to data that is provided by the user when launching an instance. This data is generally used to perform automated configuration tasks and bootstrap scripts when the instance boots for the first time. Purpose To automate configuration tasks and software installations when an instance is launched. Key Features Automation of Initial Configuration It can include scripts (e.g., shell scripts), commands, or software installation instructions. Runs on First Boot Executes only once during the initial boot (first start) of the instance unless specified otherwise. Use Cases Initialization Tasks Set up environment variables, download and install software packages, configure services, and more when the instance starts. One-Time Setup Run scripts that should only be executed once at the instance’s first boot. Dynamic Configurations Apply configurations that might change frequently and are specific to each instance launch. EC2 AMI An Amazon Machine Image (AMI) is a master image for the creation of EC2 instances. It is a template that contains a software configuration (operating system, application server, and applications) necessary to launch an EC2 instance. You can create your own AMI or use pre-built ones provided by AWS or AWS Marketplace vendors. Purpose To provide a consistent and repeatable environment for launching instances. Key Features Pre-Configured Environment Includes everything needed to boot the instance, including the operating system and installed applications. Reusable and Shareable Once created, an AMI can be used to launch multiple instances, shared with other AWS accounts, or even made public. Use Cases Base Images Create standardized base images with all necessary configurations and software pre-installed. Consistency Ensure that all instances launched from the same AMI have identical configurations. Faster Deployments Launch instances faster since the AMI already includes the required software and configurations. Key Differences Scripting vs. Pre-Configured User Data allows you to run a script when you launch an instance, automating tasks like installing software, writing files, or otherwise configuring the new instance. AMIs contain a snapshot of a configured instance, meaning all the software and settings are preserved. Dynamic Configuration vs. Quick Launch User Data is a flexible way to handle the instance configuration dynamically at the time of instance launch. Using an AMI that has software pre-installed can speed up instance deployment. Uniformity vs. Immutable With User Data, you can use a single AMI for all your instances and customize each instance on launch. AMIs are immutable, so each instance launched from the AMI has the same configuration. Late Binding vs. Early Binding Changes to User Data can be made at any time prior to instance launch, giving you more flexibility to adjust your instance’s behavior. Since the AMI is pre-configured, changes to the instance configuration must be made by creating a new AMI ONLY. Stateless vs. 
Stateful User Data is generally designed to be stateless, meaning the configuration is specified each time you launch a new instance and is not saved with the instance. Once an AMI is created, it represents the saved state of an instance. This can include installed software, system settings, and even data. Resource Intensive vs. Resource Efficient With User Data, running complex scripts can be resource-intensive and can delay the time it takes for an instance to become fully operational. Since everything in an AMI is pre-configured, fewer startup resources are needed. Size Limitation vs. No Size Limitation User Data is limited to 16 KB. There are no specific size limitations for AMIs, other than the size of the EBS volume or instance storage. Security Sensitive data in User Data should be handled carefully, as it’s visible in the EC2 console and through the API. AMIs can be encrypted, and access can be restricted to specific AWS accounts. However, once an AMI is launched, its settings and data are exposed to the account that owns the instance. Troubleshooting Errors in User Data scripts can sometimes be difficult to troubleshoot, especially if they prevent the instance from starting correctly. Errors in AMIs are easier to troubleshoot since you can start and stop instances, taking snapshots at various states for analysis. Commonalities Instance Initialization and Configuration Both User Data and AMIs are used to configure EC2 instances. User Data allows for dynamic script execution at boot time, while AMIs provide a snapshot of a pre-configured system state, including the operating system and installed applications. Automation Both tools enhance the automation capabilities of AWS EC2. User Data automates the process of setting up and configuring a new instance at launch, whereas AMIs automate the deployment of new instances by providing a consistent, repeatable template for instance creation. Scalability User Data and AMIs both support scalable deployment strategies. User Data can be used to configure instances differently based on their role or purpose as they are launched, adapting to scalable environments. AMIs allow for the rapid scaling of applications by launching multiple identical instances quickly and efficiently. Customization Both provide mechanisms for customizing EC2 instances. With User Data, users can write scripts that apply custom configurations every time an instance is launched. With AMIs, users can create a customized image that includes all desired configurations and software, which can be reused across multiple instance launches. Integration With AWS Services Both integrate seamlessly with other AWS services. For example, both can be utilized alongside AWS Auto Scaling to ensure that new instances are configured properly as they enter the service pool. They also work with AWS Elastic Load Balancing to distribute traffic to instances that are either launched from a custom AMI or configured via User Data. Security and Compliance Both can be configured to adhere to security standards and compliance requirements. For AMIs, security configurations, software patches, and compliance settings can be pre-applied. For User Data, security scripts and configurations can be executed at launch to meet specific security or compliance criteria. Version Control and Updates In practice, both User Data and AMIs can be version-controlled. For User Data, scripts can be maintained in source control repositories and updated as needed. For AMIs, new versions can be created following updates or changes, allowing for rollback capabilities and history tracking.
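To make the User Data side of this comparison concrete, a minimal script of the kind passed at launch might look like the following (illustrative only; Amazon Linux 2023 and the nginx package are assumptions, and distribution-specific commands would need adjusting):

Shell
#!/bin/bash
# Runs once as root on the first boot of the instance (Amazon Linux 2023 assumed).
dnf update -y
dnf install -y nginx
systemctl enable --now nginx

# Drop a simple page so the instance can be identified after launch.
echo "Configured via User Data on $(hostname -f)" > /usr/share/nginx/html/index.html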
Conclusion

In essence, while User Data is suited for dynamic and specific configurations at instance launch, AMIs provide a way to standardize and expedite deployments across multiple instances. This is just an attempt to clear up ambiguities between the EC2 initialization/configuration tools User Data and AMI. I hope you find this article helpful in understanding these two important EC2 configuration tools of AWS. Thank you for reading! Please don’t forget to like and share, and feel free to share your thoughts in the comments section.
As a Linux administrator or even if you are a newbie who just started using Linux, having a good understanding of useful commands in troubleshooting network issues is paramount. We'll explore the top 10 essential Linux commands for diagnosing and resolving common network problems. Each command will be accompanied by real-world examples to illustrate its usage and effectiveness. 1. ping Example: ping google.com Shell test@ubuntu-server ~ % ping google.com -c 5 PING google.com (142.250.189.206): 56 data bytes 64 bytes from 142.250.189.206: icmp_seq=0 ttl=58 time=14.610 ms 64 bytes from 142.250.189.206: icmp_seq=1 ttl=58 time=18.005 ms 64 bytes from 142.250.189.206: icmp_seq=2 ttl=58 time=19.402 ms 64 bytes from 142.250.189.206: icmp_seq=3 ttl=58 time=22.450 ms 64 bytes from 142.250.189.206: icmp_seq=4 ttl=58 time=15.870 ms --- google.com ping statistics --- 5 packets transmitted, 5 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 14.610/18.067/22.450/2.749 ms test@ubuntu-server ~ % Explanation ping uses ICMP protocol, where ICMP stands for internet control message protocol and ICMP is a network layer protocol used by network devices to communicate. ping helps in testing the reachability of the host and it will also help in finding the latency between the source and destination. 2. traceroute Example: traceroute google.com Shell test@ubuntu-server ~ % traceroute google.com traceroute to google.com (142.250.189.238), 64 hops max, 52 byte packets 1 10.0.0.1 (10.0.0.1) 6.482 ms 3.309 ms 3.685 ms 2 96.120.90.197 (96.120.90.197) 13.094 ms 10.617 ms 11.351 ms 3 po-301-1221-rur01.fremont.ca.sfba.comcast.net (68.86.248.153) 12.627 ms 11.240 ms 12.020 ms 4 ae-236-rar01.santaclara.ca.sfba.comcast.net (162.151.87.245) 18.902 ms 44.432 ms 18.269 ms 5 be-299-ar01.santaclara.ca.sfba.comcast.net (68.86.143.93) 14.826 ms 13.161 ms 12.814 ms 6 69.241.75.42 (69.241.75.42) 12.236 ms 12.302 ms 69.241.75.46 (69.241.75.46) 15.215 ms 7 * * * 8 142.251.65.166 (142.251.65.166) 21.878 ms 14.087 ms 209.85.243.112 (209.85.243.112) 14.252 ms 9 nuq04s39-in-f14.1e100.net (142.250.189.238) 13.666 ms 192.178.87.152 (192.178.87.152) 12.657 ms 13.170 ms test@ubuntu-server ~ % Explanation Traceroute shows the route packets take to reach a destination host. It displays the IP addresses of routers along the path and calculates the round-trip time (RTT) for each hop. Traceroute helps identify network congestion or routing issues. 3. netstat Example: netstat -tulpn Shell test@ubuntu-server ~ % netstat -tuln Active LOCAL (UNIX) domain sockets Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr aaf06ba76e4d0469 stream 0 0 0 aaf06ba76e4d03a1 0 0 /var/run/mDNSResponder aaf06ba76e4d03a1 stream 0 0 0 aaf06ba76e4d0469 0 0 aaf06ba76e4cd4c1 stream 0 0 0 aaf06ba76e4ccdb9 0 0 /var/run/mDNSResponder aaf06ba76e4cace9 stream 0 0 0 aaf06ba76e4c9e11 0 0 /var/run/mDNSResponder aaf06ba76e4d0b71 stream 0 0 0 aaf06ba76e4d0aa9 0 0 /var/run/mDNSResponder test@ubuntu-server ~ % Explanation Netstat displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. It's useful for troubleshooting network connectivity, identifying open ports, and monitoring network performance. 4. 
ifconfig/ip Example: ifconfig or ifconfig <interface name> Shell test@ubuntu-server ~ % ifconfig en0 en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500 options=6460<TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM> ether 10:9f:41:ad:91:60 inet 10.0.0.24 netmask 0xffffff00 broadcast 10.0.0.255 inet6 fe80::870:c909:df17:7ed1%en0 prefixlen 64 secured scopeid 0xc inet6 2601:641:300:e710:14ef:e605:4c8d:7e09 prefixlen 64 autoconf secured inet6 2601:641:300:e710:d5ec:a0a0:cdbb:79a7 prefixlen 64 autoconf temporary inet6 2601:641:300:e710::6cfc prefixlen 64 dynamic nd6 options=201<PERFORMNUD,DAD> media: autoselect status: active test@ubuntu-server ~ % Explanation ifconfig and ip commands are used to view and configure network parameters. They provide information about the IP address, subnet mask, MAC address, and network status of each interface. 5. tcpdump Example:tcpdump -i en0 tcp port 80 Shell test@ubuntu-server ~ % tcpdump -i en0 tcp port 80 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on en0, link-type EN10MB (Ethernet), snapshot length 524288 bytes 0 packets captured 55 packets received by filter 0 packets dropped by kernel test@ubuntu-server ~ % Explanation Tcpdump is a packet analyzer that captures and displays network traffic in real-time. It's invaluable for troubleshooting network issues, analyzing packet contents, and identifying abnormal network behavior. Use tcpdump to inspect packets on specific interfaces or ports. 6. nslookup/dig Example: nslookup google.com or dig Shell test@ubuntu-server ~ % nslookup google.com Server: 2001:558:feed::1 Address: 2001:558:feed::1#53 Non-authoritative answer: Name: google.com Address: 172.217.12.110 test@ubuntu-server ~ % test@ubuntu-server ~ % dig google.com ; <<>> DiG 9.10.6 <<>> google.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46600 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;google.com. IN A ;; ANSWER SECTION: google.com. 164 IN A 142.250.189.206 ;; Query time: 20 msec ;; SERVER: 2001:558:feed::1#53(2001:558:feed::1) ;; WHEN: Mon Apr 15 22:55:35 PDT 2024 ;; MSG SIZE rcvd: 55 test@ubuntu-server ~ % Explanation nslookup and dig are DNS lookup tools used to query DNS servers for domain name resolution. They provide information about the IP address associated with a domain name and help diagnose DNS-related problems such as incorrect DNS configuration or server unavailability. 7. iptables/firewalld Example: iptables -L or firewall-cmd --list-all Shell test@ubuntu-server ~# iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy DROP) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination test@ubuntu-server ~# Explanation iptables and firewalld are firewall management tools used to configure packet filtering and network address translation (NAT) rules. They control incoming and outgoing traffic and protect the system from unauthorized access. Use them to diagnose firewall-related issues and ensure proper traffic flow. 8. ss Example: ss -tulpn Shell test@ubuntu-server ~# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port udp UNCONN 0 0 *:161 *:* udp UNCONN 0 0 *:161 *:* test@ubuntu-server ~# Explanation ss is a utility to investigate sockets. 
It displays information about TCP, UDP, and UNIX domain sockets, including listening and established connections, connection state, and process IDs. ss is useful for troubleshooting socket-related problems and monitoring network activity. 9. arp Example: arp -a Shell test@ubuntu-server ~ % arp -a ? (10.0.0.1) at 80:da:c2:95:aa:f7 on en0 ifscope [ethernet] ? (10.0.0.57) at 1c:4d:66:bb:49:a on en0 ifscope [ethernet] ? (10.0.0.83) at 3a:4a:df:fe:66:58 on en0 ifscope [ethernet] ? (10.0.0.117) at 70:2a:d5:5a:cc:14 on en0 ifscope [ethernet] ? (10.0.0.127) at fe:e2:1c:4d:b3:f7 on en0 ifscope [ethernet] ? (10.0.0.132) at bc:d0:74:9a:51:85 on en0 ifscope [ethernet] ? (10.0.0.255) at ff:ff:ff:ff:ff:ff on en0 ifscope [ethernet] mdns.mcast.net (224.0.0.251) at 1:0:5e:0:0:fb on en0 ifscope permanent [ethernet] ? (239.255.255.250) at 1:0:5e:7f:ff:fa on en0 ifscope permanent [ethernet] test@ubuntu-server ~ % Explanation arp (Address Resolution Protocol) displays and modifies the IP-to-MAC address translation tables used by the kernel. It resolves IP addresses to MAC addresses and vice versa. arp is helpful for troubleshooting issues related to network device discovery and address resolution. 10. mtr Example: mtr Shell test.ubuntu.com (0.0.0.0) Tue Apr 16 14:46:40 2024 Keys: Help Display mode Restart statistics Order of fields quit Packets Ping Host Loss% Snt Last Avg Best Wrst StDev 1. 10.0.0.10 0.0% 143 0.8 9.4 0.7 58.6 15.2 2. 10.0.2.10 0.0% 143 0.8 9.4 0.7 58.6 15.2 3. 192.168.0.233 0.0% 143 0.8 9.4 0.7 58.6 15.2 4. 142.251.225.178 0.0% 143 0.8 9.4 0.7 58.6 15.2 5. 142.251.225.177 0.0% 143 0.8 9.4 0.7 58.6 15.2 Explanation mtr (My traceroute) combines the functionality of ping and traceroute into a single diagnostic tool. It continuously probes network paths between the host and a destination, displaying detailed statistics about packet loss, latency, and route changes. Mtr is ideal for diagnosing intermittent network problems and monitoring network performance over time. Mastering these commands comes in handy for troubleshooting network issues on Linux hosts.
Is it possible to build a time-tracking app in just a few hours? It is, and in this article, I'll show you how! I’m a senior backend Java developer with 8 years of experience in building web applications. I will show you how satisfying and revolutionary it can be to save a lot of time on building my next one. The approach I use is as follows: I want to create a time-tracking application (I called it Timelog) that integrates with the ClickUp API. It offers a simple functionality that will be very useful here: creating time entries remotely. In order to save time, I will use some out-of-the-box functionalities that the Openkoda platform offers. These features are designed with developers in mind. Using them, I can skip building standard features that are used in every web application (over and over again). Instead, I can focus on the core business logic. I will use the following pre-built features for my application needs: Login/password authentication User and organization management Different user roles and privileges Email sender Logs overview Server-side code editor Web endpoints creator CRUDs generator Let’s get started! Timelog Application Overview Our sample internal application creates a small complex system that can then be easily extended both model-wise and with additional business logic or custom views. The main focus of the application is to: Store the data required to communicate with the ClickUp API. Assign users to their tickets. Post new time entries to the external API. To speed up the process of building the application, we relied on some of the out-of-the-box functionalities mentioned above. At this stage, we used the following ones: Data model builder (Form) - Allows us to define data structures without the need to recompile the application, with the ability to adjust the data schema on the fly Ready-to-use management functionalities - With this one, we can forget about developing things like authentication, security, and standard dashboard view. Server-side code editor - Used to develop a dedicated service responsible for ClickUp API integration, it is coded in JavaScript all within the Openkoda UI. WebEndpoint builder - Allows us to create a custom form handler that uses a server-side code service to post time tracking entry data to the ClickUp servers instead of storing it in our internal database Step 1: Setting Up the Architecture To implement the functionality described above and to store the required data, we designed a simple data model, consisting of the following five entities. ClickUpConfig, ClickUpUser, Ticket, and Assignment are designed to store the keys and IDs required for connections and messages sent to the ClickUp API. The last one, TimeEntry, is intended to take advantage of a ready-to-use HTML form (Thymeleaf fragment), saving a lot of time on its development. The following shows the detailed structure of a prepared data model for the Timelog ClickUp integration. 
ClickUpConfig apiKey - ClickUp API key teamId - ID of space in ClickUp to create time entry in ClickUpUser userId - Internal ID of a User clickUpUserId - ID of a user assigned to a workspace in ClickUp Ticket name - Internal name of the ticket clickUpTicketid - ID of a ticket in ClickUp to create time entries Assignment userId - Internal ID of a User ticketId - Internal ID of a Ticket TimeEntry userId - Internal ID of a User ticketId - Internal ID of a ticket date - Date of a time entry durationHours - Time entry duration provided in hours durationMinutes - Time entry duration provided in minutes description - Short description for created time entry We want to end up with five data tiles on the dashboard: Step 2: Integrating With ClickUp API We integrated our application with the ClickUp API specifically using its endpoint to create time entries in ClickUp. To connect the Timelog app with our ClickUp workspace, it is required to provide the API Key. This can be done using either a personal API token or a token generated by creating an App in the ClickUp dashboard. For information on how to retrieve one of these, see the official ClickUp documentation. In order for our application to be able to create time entries in our ClickUp workspace, we need to provide some ClickUp IDs: teamId: This is the first ID value in the URL after accessing your workspace. userId: To check the user’s ClickUp ID (Member ID), go to Workspace -> Manage Users. On the Users list, select the user’s Settings and then Copy Member ID. taskId: Task ID is accessible in three places on the dashboard: URL, task modal, and tasks list view. See the ClickUp Help Center for detailed instructions. You can recognize the task ID being prefixed by the # sign - we use the ID without the prefix. Step 3: Data Model Magic With Openkoda Openkoda uses the Byte Buddy library to dynamically build entity and repository classes for dynamically registered entities during the runtime of our Spring Boot application. Here is a short snippet of entity class generation in Openkoda (a whole service class is available on their GitHub). Java dynamicType = new ByteBuddy() .with(SKIP_DEFAULTS) .subclass(OpenkodaEntity.class) .name(PACKAGE + name) .annotateType(entity) .annotateType(tableAnnotation) .defineConstructor(PUBLIC) .intercept(MethodCall .invoke(OpenkodaEntity.class.getDeclaredConstructor(Long.class)) .with((Object) null)); Openkoda provides a custom form builder syntax that defines the structure of an entity. This structure is then used to generate both entity and repository classes, as well as HTML representations of CRUD views such as a paginated table with all records, a settings form, and a simple read-only view. All of the five entities from the data model described earlier have been registered in the same way, only by using the form builder syntax. The form builder snippet for the Ticket entity is presented below. JavaScript a => a .text("name") .text("clickUpTaskId") The definition above results in having the entity named Ticket with a set of default fields for OpenkodaEntity and two custom ones named “name” and “clickUpTaskId”. 
The database table structure for the dynamically generated Ticket entity is as follows: Markdown Table "public.dynamic_ticket" Column | Type | Collation | Nullable | Default ------------------+--------------------------+-----------+----------+----------------------- id | bigint | | not null | created_by | character varying(255) | | | created_by_id | bigint | | | created_on | timestamp with time zone | | | CURRENT_TIMESTAMP index_string | character varying(16300) | | | ''::character varying modified_by | character varying(255) | | | modified_by_id | bigint | | | organization_id | bigint | | | updated_on | timestamp with time zone | | | CURRENT_TIMESTAMP click_up_task_id | character varying(255) | | | name | character varying(255) | | | The last step of a successful entity registration is to refresh the Spring context so that it recognizes the new repository beans and Hibernate acknowledges the new entities. This can be done by restarting the application from the Admin Panel (Monitoring section). Our final result is an auto-generated full CRUD for the Ticket entity. Auto-generated Ticket settings view: Auto-generated all Tickets list view: Step 4: Setting Up Server-Side Code as a Service We implemented the ClickUp API integration using Openkoda Server-Side Code, keeping the API call logic separate as a service. The exported JS functions can then be reused in the logic of custom form view request handlers. We created a JavaScript service that provides the functions responsible for ClickUp API communication. Openkoda uses GraalVM to run any JS code fully on the backend server. Our ClickupAPI server-side code service has only one function (postCreateTimeEntry), which is all we need to meet our Timelog application requirements. JavaScript export function postCreateTimeEntry(apiKey, teamId, duration, description, date, assignee, taskId) { let url = `https://api.clickup.com/api/v2/team/${teamId}/time_entries`; let timeEntryReq = { duration: duration, description: '[Openkoda Timelog] ' + description, billable: true, start: date, assignee: assignee, tid: taskId, }; let headers = {Authorization: apiKey}; return context.services.integrations.restPost(url, timeEntryReq, headers); } To use this service later on in WebEndpoints, it is enough to use the standard JS import expression import * as clickupAPI from 'clickupAPI';. Step 5: Building Time Entry Form With Custom GET/POST Handlers Here, we prepare the essential screen for our demo application: the time entry form, which posts data to the ClickUp API. All is done in the Openkoda user interface by providing simple HTML content and some JS code snippets. The View The HTML fragment is as simple as the one posted below. We used a ready-to-use form Thymeleaf fragment (see the form tag), and the rest of the code is the standard structure of a Thymeleaf template. HTML <!--DEFAULT CONTENT--> <!DOCTYPE html> <html xmlns:th="http://www.thymeleaf.org" xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout" lang="en" layout:decorate="~{${defaultLayout}}"> <body> <div class="container"> <h1 layout:fragment="title"/> <div layout:fragment="content"> <form th:replace="~{generic-forms::generic-form(${TimeEntry}, 'TimeEntry', '', '', '', 'Time Entry', #{template.save}, true)}"></form> </div> </div> </body> </html> HTTP Handlers Once we have the simple HTML code for the view, we need to provide the actual form object required for the generic form fragment (${TimeEntry}). 
We do this inside a GET endpoint as a first step, and after that, we set the currently logged-in user's ID so there's a default value selected when entering the time entry view. JavaScript flow .thenSet("TimeEntry", a => a.services.data.getForm("TimeEntry")) .then(a => a.model.get("TimeEntry").dto.set("userId", a.model.get("userEntityId"))) Lastly, the POST endpoint is registered to handle the actual POST request sent from the form view (HTML code presented above). It implements the scenario where a user enters the time entry form, provides the data, and then sends the data to the ClickUp server. The following POST endpoint JS code: Receives the form data. Reads the additional configuration from the internal database (like the API key, team ID, or ClickUp user ID). Prepares the data to be sent. Triggers the clickupAPI service to communicate with the remote API. JavaScript import * as clickupAPI from 'clickupAPI'; flow .thenSet("clickUpConfig", a => a.services.data.getRepository("clickupConfig").search( (root, query, cb) => { let orgId = a.model.get("organizationEntityId") != null ? a.model.get("organizationEntityId") : -1; return cb.or(cb.isNull(root.get("organizationId")), cb.equal(root.get("organizationId"), orgId)); }).get(0) ) .thenSet("clickUpUser", a => a.services.data.getRepository("clickupUser").search( (root, query, cb) => { let userId = a.model.get("userEntityId") != null ? a.model.get("userEntityId") : -1; return cb.equal(root.get("userId"), userId); }) ) .thenSet("ticket", a => a.form.dto.get("ticketId") != null ? a.services.data.getRepository("ticket").findOne(a.form.dto.get("ticketId")) : null) .then(a => { let durationMs = (a.form.dto.get("durationHours") != null ? a.form.dto.get("durationHours") * 3600000 : 0) + (a.form.dto.get("durationMinutes") != null ? a.form.dto.get("durationMinutes") * 60000 : 0); return clickupAPI.postCreateTimeEntry( a.model.get("clickUpConfig").apiKey, a.model.get("clickUpConfig").teamId, durationMs, a.form.dto.get("description"), a.form.dto.get("date") != null ? (new Date(a.services.util.toString(a.form.dto.get("date")))).getTime() : Date.now(), a.model.get("clickUpUser").length ? a.model.get("clickUpUser").get(0).clickUpUserId : -1, a.model.get("ticket") != null ? a.model.get("ticket").clickUpTaskId : '') }) Step 6: Our Application Is Ready! This is it! I built a complete application that is capable of storing the data of users, their ticket assignments, and all the properties required for the ClickUp API connection. It provides a Time Entry form that covers the ticket selection, date, duration, and description inputs of a single time entry and sends the data from the form straight to the integrated API. And let's not forget all of the pre-built functionalities available in Openkoda, like authentication, user account management, logs overview, etc. As a result, the total time to create the Timelog application was only a few hours. What I have built is just a simple app with one main functionality, but there are many ways to extend it, e.g., by adding new structures to the data model, by developing more of the ClickUp API integration, or by creating more complex screens like the calendar view below. If you follow almost exactly the same scenario as the one I presented in this case, you will be able to build any other simple (or not so simple) business application, saving time on repetitive and boring features and focusing on the core business requirements. 
I can think of several applications that could be built in the same way, such as a legal document management system, a real estate application, or a travel agency system, just to name a few. As an experienced software engineer, I always enjoy implementing new ideas and seeing the results quickly. In this case, that is exactly what I did: I spent a minimal amount of time creating a fully functional application tailored to my needs and skipped the monotonous work. The .zip package with all the code and configuration files is available on my GitHub.
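As a side note, if you want to verify the API key and the IDs gathered in Step 2 before configuring them in the application, the same ClickUp endpoint used by the postCreateTimeEntry service can be exercised directly from a terminal. This is only a sketch: the token, team ID, assignee, and task ID below are placeholders, and the JSON fields mirror the request body shown earlier. Shell
# create a 30-minute time entry (duration is in milliseconds, start is an epoch timestamp in ms)
curl -X POST "https://api.clickup.com/api/v2/team/$TEAM_ID/time_entries" \
  -H "Authorization: $CLICKUP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"duration": 1800000, "description": "[Openkoda Timelog] smoke test", "billable": true, "start": 1713270000000, "assignee": 12345678, "tid": "abc123"}'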
1. Use "&&" to Link Two or More Commands Use “&&” to link two or more commands when you want the previous command to be succeeded before the next command. If you use “;” then it would still run the next command after “;” even if the command before “;” failed. So you would have to wait and run each command one by one. However, using "&&" ensures that the next command will only run if the preceding command finishes successfully. This allows you to add commands without waiting, move on to the next task, and check later. If the last command ran, it indicates that all previous commands ran successfully. Example: Shell ls /path/to/file.txt && cp /path/to/file.txt /backup/ The above example ensures that the previous command runs successfully and that the file "file.txt" exists. If the file doesn't exist, the second command after "&&" won't run and won't attempt to copy it. 2. Use “grep” With -A and -B Options One common use of the "grep" command is to identify specific errors from log files. However, using it with the -A and -B options provides additional context within a single command, and it displays lines after and before the searched text, which enhances visibility into related content. Example: Shell % grep -A 2 "java.io.IOException" logfile.txt java.io.IOException: Permission denied (open /path/to/file.txt) at java.io.FileOutputStream.<init>(FileOutputStream.java:53) at com.pkg.TestClass.writeFile(TestClass.java:258) Using grep with -A here will also show 2 lines after the “java.io.IOException” was found from the logfile.txt. Similarly, Shell grep "Ramesh" -B 3 rank-file.txt Name: John Wright, Rank: 23 Name: David Ross, Rank: 45 Name: Peter Taylor, Rank: 68 Name Ramesh Kumar, Rank: 36 Here, grep with -B option will also show 3 lines before the “Ramesh” was found from the rank-file.txt 3. Use “>” to Create an Empty File Just write > and then the filename to create an empty file with the name provided after > Example: Shell >my-file.txt It will create an empty file with "my-file.txt" name in the current directory. 4. Use “rsync” for Backups "rsync" is a useful command for regular backups as it saves time by transferring only the differences between the source and destination. This feature is especially beneficial when creating backups over a network. Example: Shell rsync -avz /path/to/source_directory/ user@remotehost:/path/to/destination_directory/ 5. Use Tab Completion Using tab completion as a habit is faster than manually selecting filenames and pressing Enter. Typing the initial letters of filenames and utilizing Tab completion streamlines the process and is more efficient. 6. Use “man” Pages Instead of reaching the web to find the usage of a command, a quick way would be to use the “man” command to find out the manual of that command. This approach not only saves time but also ensures accuracy, as command options can vary based on the installed version. By accessing the manual directly, you get precise details tailored to your existing version. Example: Shell man ps It will get the manual page for the “ps” command 7. Create Scripts For repetitive tasks, create small shell scripts that chain commands and perform actions based on conditions. This saves time and reduces risks in complex operations. Conclusion In conclusion, becoming familiar with these Linux commands and tips can significantly boost productivity and streamline workflow on the command line. 
By using techniques like command chaining, context-aware searching, efficient file management, and automation through scripts, users can save time, reduce errors, and optimize their Linux experience.
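As an illustration of tip 7, several of the tips above can be combined into one small script. This is only a sketch; the paths, remote host, and log location are placeholders to adapt to your environment. Shell
#!/bin/bash
# simple backup helper: create/truncate the log first, then sync, chained with "&&"
LOG=/tmp/backup-$(date +%F).log
> "$LOG" && rsync -avz /path/to/source_directory/ user@remotehost:/path/to/destination_directory/ >> "$LOG" 2>&1 && echo "backup OK" >> "$LOG"
# show surrounding context if rsync reported an error in the log
grep -A 2 -B 2 "rsync error" "$LOG" || true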
Debugging Terraform providers is crucial for ensuring the reliability and functionality of infrastructure deployments. Terraform providers, written in languages like Go, can have complex logic that requires careful debugging when issues arise. One powerful tool for debugging Terraform providers is Delve, a debugger for the Go programming language. Delve allows developers to set breakpoints, inspect variables, and step through code, making it easier to identify and resolve bugs. In this blog, we will explore how to use Delve effectively for debugging Terraform providers. Set Up Delve for Debugging a Terraform Provider Shell # For Linux sudo apt-get install -y delve # For macOS brew install delve Refer here for more details on the installation. Debug Terraform Provider Using VS Code Follow the steps below to debug the provider: Download the provider code. We will use the IBM Cloud Terraform Provider for this debugging example. Update the provider's main.go code as shown below to support debugging. Go package main import ( "flag" "log" "github.com/IBM-Cloud/terraform-provider-ibm/ibm/provider" "github.com/IBM-Cloud/terraform-provider-ibm/version" "github.com/hashicorp/terraform-plugin-sdk/v2/plugin" ) func main() { var debug bool flag.BoolVar(&debug, "debug", true, "Set to true to enable debugging mode using delve") flag.Parse() opts := &plugin.ServeOpts{ Debug: debug, ProviderAddr: "registry.terraform.io/IBM-Cloud/ibm", ProviderFunc: provider.Provider, } log.Println("IBM Cloud Provider version", version.Version) plugin.Serve(opts) } Launch VS Code in debug mode. Refer here if you are new to debugging in VS Code. Create the launch.json using the configuration below. JSON { "version": "0.2.0", "configurations": [ { "name": "Debug Terraform Provider IBM with Delve", "type": "go", "request": "launch", "mode": "debug", "program": "${workspaceFolder}", "internalConsoleOptions": "openOnSessionStart", "args": [ "-debug" ] } ] } In VS Code, click "Start Debugging". This starts the provider in debug mode. To let the Terraform CLI attach to the debugger, the console prints the environment variable TF_REATTACH_PROVIDERS. Copy it from the console and set it as an environment variable in the terminal running the Terraform code. Now, in the VS Code instance where the provider code is in debug mode, open the Go code and set breakpoints. To learn more about breakpoints in VS Code, refer here. Execute 'terraform plan' followed by 'terraform apply' and notice that the provider breakpoint is triggered as part of the terraform apply execution. This helps to debug the Terraform run and understand the behavior of the provider code for the particular inputs supplied in Terraform. Debug Terraform Provider Using DLV Command Line Follow the steps below to debug the provider using the command line. To learn more about the dlv command-line commands, refer here. Follow steps 1 and 2 from the "Debug Terraform Provider Using VS Code" section. In the terminal, navigate to the provider Go code and run go build -gcflags="all=-N -l" to compile the code with optimizations and inlining disabled. To execute the compiled Terraform provider binary and begin a debug session, run dlv exec --accept-multiclient --continue --headless <path to the binary> -- -debug from the directory where the build output is present. For the IBM Cloud Terraform provider, use dlv exec --accept-multiclient --continue --headless ./terraform-provider-ibm -- -debug In another terminal, where the Terraform code will be run, set TF_REATTACH_PROVIDERS as an environment variable. 
Notice the "API server" details in the above command output. In another (third) terminal, connect to the DLV server and start issuing DLV client commands. Set a breakpoint using the break command. Now we are set to debug the Terraform provider when Terraform scripts are executed. Issue continue in the DLV client terminal to resume execution until a breakpoint is hit. Now execute terraform plan and terraform apply and notice the client stopping at the breakpoint. Use DLV CLI commands such as step, stepout, and continue to control the execution. This provides a way to debug the Terraform provider from the command line. Remote Debugging and CI/CD Pipeline Debugging The following are extensions of the debugging workflow using the dlv command-line tool. Remote Debugging Remote debugging allows you to debug a Terraform provider running on a remote machine or environment. Debugging in CI/CD Pipelines Debugging in CI/CD pipelines involves setting up your pipeline to run Delve and attach to your Terraform provider for debugging. This can be challenging due to the ephemeral nature of CI/CD environments. One approach is to use conditional logic in your pipeline configuration to only enable debugging when a specific environment variable is set. For example, you can use the following script in your pipeline configuration to start Delve and attach to your Terraform provider: YAML - name: Debug Terraform Provider if: env(DEBUG) == 'true' run: | dlv debug --headless --listen=:2345 --api-version=2 & sleep 5 # Wait for Delve to start export TF_LOG=TRACE terraform init terraform apply Best Practices for Effective Debugging With Delve Here are some best practices for effective debugging with Delve, along with tips for improving efficiency and minimizing downtime: Use version control: Always work with version-controlled code. This allows you to easily revert changes if debugging introduces new issues. Start small: Begin debugging with a minimal, reproducible test case. This helps isolate the problem and reduces the complexity of debugging. Understand the code: Familiarize yourself with the codebase before debugging. Knowing the code structure and expected behavior can speed up the debugging process. Use logging: Add logging statements to your code to track the flow of execution and the values of important variables. This can provide valuable insights during debugging. Use breakpoints wisely: Set breakpoints strategically at critical points in your code. Too many breakpoints can slow down the debugging process. Inspect variables: Use the print (p) command in Delve to inspect the values of variables. This can help you understand the state of your program at different points in time. Use conditional breakpoints: Use conditional breakpoints to break execution only when certain conditions are met. This can help you focus on specific scenarios or issues. Use stack traces: Use the stack command in Delve to view the call stack. This can help you understand the sequence of function calls leading to an issue. Use goroutine debugging: If your code uses goroutines, use Delve's goroutine debugging features to track down issues related to concurrency. Automate debugging: If you're debugging in a CI/CD pipeline, automate the process as much as possible to minimize downtime and speed up resolution. By following these best practices, you can improve the efficiency of your debugging process and minimize downtime caused by issues in your code. 
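To make the command-line flow above more concrete, the three terminals might look roughly like the following. This is a sketch only: the port, the TF_REATTACH_PROVIDERS value, and the breakpoint location are placeholders that depend on your own provider build and code. Shell
# terminal 1: start the provider binary under the headless Delve server
dlv exec --accept-multiclient --continue --headless ./terraform-provider-ibm -- -debug

# terminal 2: point Terraform at the provider started above (value is printed by the provider)
export TF_REATTACH_PROVIDERS='<value printed in terminal 1>'
terraform plan && terraform apply

# terminal 3: attach a client to the address shown as "API server listening at ..."
dlv connect 127.0.0.1:<port>
# then, at the (dlv) prompt: break <function or file:line>, followed by continue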
Conclusion In conclusion, mastering the art of debugging Terraform providers with Delve is a valuable skill that can significantly improve the reliability and performance of your infrastructure deployments. By setting up Delve for debugging, exploring advanced techniques like remote debugging and CI/CD pipeline debugging, and following best practices for effective debugging, you can effectively troubleshoot issues in your Terraform provider code. Debugging is not just about fixing bugs; it's also about understanding your code better and improving its overall quality. Dive deep into Terraform provider debugging with Delve, and empower yourself to build a more robust and efficient infrastructure with Terraform.
The AIDocumentLibraryChat project has been extended to support questions for searching relational databases. The user can input a question, and an embeddings search then finds the relevant database tables and columns to answer it. The AI/LLM then gets the database schemas of the relevant tables and, based on the found tables and columns, generates a SQL query that answers the question with a result table. Dataset and Metadata The open-source dataset that is used has 6 tables with relations to each other. It contains data about museums and works of art. To get useful queries from the questions, the dataset has to be supplied with metadata, and that metadata has to be turned into embeddings. To enable the AI/LLM to find the needed tables and columns, it needs to know their names and descriptions. For all data tables, like the museum table, metadata is stored in the column_metadata and table_metadata tables. Their data can be found in the files column_metadata.csv and table_metadata.csv. They contain a unique ID, the name, the description, etc. of the table or column. That description is used to create the embeddings that the question embeddings are compared with. The quality of the description makes a big difference in the results because the embedding is more precise with a better description. Providing synonyms is one option to improve the quality. The table metadata also contains the schema of the table, so that only the relevant table schemas are added to the AI/LLM prompt. Embeddings To store the embeddings in Postgresql, the vector extension is used. The embeddings can be created with the OpenAI endpoint or with the ONNX library that is provided by Spring AI. Three types of embeddings are created: Tabledescription embeddings Columndescription embeddings Rowcolumn embeddings The Tabledescription embeddings have a vector based on the table description, and the embedding has the tablename, the datatype = table, and the metadata id in its metadata. The Columndescription embeddings have a vector based on the column description, and the embedding has the tablename, the dataname with the column name, the datatype = column, and the metadata id in its metadata. The Rowcolumn embeddings have a vector based on the content of a row's column value. That is used for the style or subject of an artwork, so that these values can be used in the question. The metadata has the datatype = row, the column name as dataname, the tablename, and the metadata id. Implement the Search The search has 3 steps: Retrieve the embeddings Create the prompt Execute query and return result Retrieve the Embeddings To read the embeddings from the Postgresql database with the vector extension, Spring AI uses the VectorStore class in the DocumentVSRepositoryBean: Java @Override public List<Document> retrieve(String query, DataType dataType) { return this.vectorStore.similaritySearch( SearchRequest.query(query).withFilterExpression( new Filter.Expression(ExpressionType.EQ, new Key(MetaData.DATATYPE), new Value(dataType.toString())))); } The VectorStore provides a similarity search for the user's query. The query is turned into an embedding, and with the FilterExpression for the datatype in the metadata values, the results are returned. 
The TableService class uses the repository in the retrieveEmbeddings method: Java private EmbeddingContainer retrieveEmbeddings(SearchDto searchDto) { var tableDocuments = this.documentVsRepository.retrieve( searchDto.getSearchString(), MetaData.DataType.TABLE, searchDto.getResultAmount()); var columnDocuments = this.documentVsRepository.retrieve( searchDto.getSearchString(), MetaData.DataType.COLUMN, searchDto.getResultAmount()); List<String> rowSearchStrs = new ArrayList<>(); if(searchDto.getSearchString().split("[ -.;,]").length > 5) { var tokens = List.of(searchDto.getSearchString() .split("[ -.;,]")); for(int i = 0;i<tokens.size();i = i+3) { rowSearchStrs.add(tokens.size() <= i + 3 ? "" : tokens.subList(i, tokens.size() >= i +6 ? i+6 : tokens.size()).stream().collect(Collectors.joining(" "))); } } var rowDocuments = rowSearchStrs.stream().filter(myStr -> !myStr.isBlank()) .flatMap(myStr -> this.documentVsRepository.retrieve(myStr, MetaData.DataType.ROW, searchDto.getResultAmount()).stream()) .toList(); return new EmbeddingContainer(tableDocuments, columnDocuments, rowDocuments); } First, documentVsRepository is used to retrieve the document with the embeddings for the tables/columns based on the search string of the user. Then, the search string is split into chunks of 6 words to search for the documents with the row embeddings. The row embeddings are just one word, and to get a low distance, the query string has to be short; otherwise, the distance grows due to all the other words in the query. Then the chunks are used to retrieve the row documents with the embeddings. Create the Prompt The prompt is created in the TableService class with the createPrompt method: Java private Prompt createPrompt(SearchDto searchDto, EmbeddingContainer documentContainer) { final Float minRowDistance = documentContainer.rowDocuments().stream() .map(myDoc -> (Float) myDoc.getMetadata().getOrDefault(MetaData.DISTANCE, 1.0f)).sorted().findFirst().orElse(1.0f); LOGGER.info("MinRowDistance: {}", minRowDistance); var sortedRowDocs = documentContainer.rowDocuments().stream() .sorted(this.compareDistance()).toList(); var tableColumnNames = this.createTableColumnNames(documentContainer); List<TableNameSchema> tableRecords = this.tableMetadataRepository .findByTableNameIn(tableColumnNames.tableNames()).stream() .map(tableMetaData -> new TableNameSchema(tableMetaData.getTableName(), tableMetaData.getTableDdl())).collect(Collectors.toList()); final AtomicReference<String> joinColumn = new AtomicReference<String>(""); final AtomicReference<String> joinTable = new AtomicReference<String>(""); final AtomicReference<String> columnValue = new AtomicReference<String>(""); sortedRowDocs.stream().filter(myDoc -> minRowDistance <= MAX_ROW_DISTANCE) .filter(myRowDoc -> tableRecords.stream().filter(myRecord -> myRecord.name().equals(myRowDoc.getMetadata() .get(MetaData.TABLE_NAME))).findFirst().isEmpty()) .findFirst().ifPresent(myRowDoc -> { joinTable.set(((String) myRowDoc.getMetadata() .get(MetaData.TABLE_NAME))); joinColumn.set(((String) myRowDoc.getMetadata() .get(MetaData.DATANAME))); tableColumnNames.columnNames().add(((String) myRowDoc.getMetadata() .get(MetaData.DATANAME))); columnValue.set(myRowDoc.getContent()); this.tableMetadataRepository.findByTableNameIn( List.of(((String) myRowDoc.getMetadata().get(MetaData.TABLE_NAME)))) .stream().map(myTableMetadata -> new TableNameSchema( myTableMetadata.getTableName(), myTableMetadata.getTableDdl())).findFirst() .ifPresent(myRecord -> tableRecords.add(myRecord)); }); var 
messages = createMessages(searchDto, minRowDistance, tableColumnNames, tableRecords, joinColumn, joinTable, columnValue); Prompt prompt = new Prompt(messages); return prompt; } First, the min distance of the rowDocuments is filtered out. Then a list row of documents sorted by distance is created. The method createTableColumnNames(...) creates the tableColumnNames record that contains a set of column names and a list of table names. The tableColumnNames record is created by first filtering for the 3 tables with the lowest distances. Then the columns of these tables with the lowest distances are filtered out. Then the tableRecords are created by mapping the table names to the schema DDL strings with the TableMetadataRepository. Then the sorted row documents are filtered for MAX_ROW_DISTANCE and the values joinColumn, joinTable, and columnValue are set. Then the TableMetadataRepository is used to create a TableNameSchema and add it to the tableRecords. Now the placeholders in systemPrompt and the optional columnMatch can be set: Java private final String systemPrompt = """ ... Include these columns in the query: {columns} \n Only use the following tables: {schemas};\n %s \n """; private final String columnMatch = """ Join this column: {joinColumn} of this table: {joinTable} where the column has this value: {columnValue}\n """; The method createMessages(...) gets the set of columns to replace the {columns} placeholder. It gets tableRecords to replace the {schemas} placeholder with the DDLs of the tables. If the row distance was beneath the threshold, the property columnMatch is added at the string placeholder %s. Then the placeholders {joinColumn}, {joinTable}, and {columnValue} are replaced. With the information about the required columns the schemas of the tables with the columns and the information of the optional join for row matches, the AI/LLM is able to create a sensible SQL query. Execute Query and Return Result The query is executed in the createQuery(...) method: Java public SqlRowSet searchTables(SearchDto searchDto) { EmbeddingContainer documentContainer = this.retrieveEmbeddings(searchDto); Prompt prompt = createPrompt(searchDto, documentContainer); String sqlQuery = createQuery(prompt); LOGGER.info("Sql query: {}", sqlQuery); SqlRowSet rowSet = this.jdbcTemplate.queryForRowSet(sqlQuery); return rowSet; } First, the methods to prepare the data and create the SQL query are called and then queryForRowSet(...) is used to execute the query on the database. The SqlRowSet is returned. The TableMapper class uses the map(...) method to turn the result into the TableSearchDto class: Java public TableSearchDto map(SqlRowSet rowSet, String question) { List<Map<String, String>> result = new ArrayList<>(); while (rowSet.next()) { final AtomicInteger atomicIndex = new AtomicInteger(1); Map<String, String> myRow = List.of(rowSet .getMetaData().getColumnNames()).stream() .map(myCol -> Map.entry( this.createPropertyName(myCol, rowSet, atomicIndex), Optional.ofNullable(rowSet.getObject( atomicIndex.get())) .map(myOb -> myOb.toString()).orElse(""))) .peek(x -> atomicIndex.set(atomicIndex.get() + 1)) .collect(Collectors.toMap(myEntry -> myEntry.getKey(), myEntry -> myEntry.getValue())); result.add(myRow); } return new TableSearchDto(question, result, 100); } First, the result list for the result maps is created. Then, rowSet is iterated for each row to create a map of the column names as keys and the column values as values. This enables returning a flexible amount of columns with their results. 
createPropertyName(...) adds the index integer to the map key to support duplicate key names. Summary Backend Spring AI supports creating prompts with a flexible amount of placeholders very well. Creating the embeddings and querying the vector table is also very well supported. Getting reasonable query results needs the metadata that has to be provided for the columns and tables. Creating good metadata is an effort that scales linearly with the amount of columns and tables. Implementing the embeddings for columns that need them is an additional effort. The result is that an AI/LLM like OpenAI or Ollama with the "sqlcoder:70b-alpha-q6_K" model can answer questions like: "Show the artwork name and the name of the museum that has the style Realism and the subject of Portraits." The AI/LLM can within boundaries answer natural language questions that have some fit with the metadata. The amount of embeddings needed is too big for a free OpenAI account and the "sqlcoder:70b-alpha-q6_K" is the smallest model with reasonable results. AI/LLM offers a new way to interact with relational databases. Before starting a project to provide a natural language interface for a database, the effort and the expected results have to be considered. The AI/LLM can help with questions of small to middle complexity and the user should have some knowledge about the database. Frontend The returned result of the backend is a list of maps with keys as column names and values column values. The amount of returned map entries is unknown, because of that the table to display the result has to support a flexible amount of columns. An example JSON result looks like this: JSON {"question":"...","resultList":[{"1_name":"Portrait of Margaret in Skating Costume","2_name":"Philadelphia Museum of Art"},{"1_name":"Portrait of Mary Adeline Williams","2_name":"Philadelphia Museum of Art"},{"1_name":"Portrait of a Little Girl","2_name":"Philadelphia Museum of Art"}],"resultAmount":100} The resultList property contains a JavaScript array of objects with property keys and values. To be able to display the column names and values in an Angular Material Table component, these properties are used: TypeScript protected columnData: Map<string, string>[] = []; protected columnNames = new Set<string>(); The method getColumnNames(...) of the table-search.component.ts is used to turn the JSON result in the properties: TypeScript private getColumnNames(tableSearch: TableSearch): Set<string> { const result = new Set<string>(); this.columnData = []; const myList = !tableSearch?.resultList ? [] : tableSearch.resultList; myList.forEach((value) => { const myMap = new Map<string, string>(); Object.entries(value).forEach((entry) => { result.add(entry[0]); myMap.set(entry[0], entry[1]); }); this.columnData.push(myMap); }); return result; } First, the result set is created and the columnData property is set to an empty array. Then, myList is created and iterated with forEach(...). For each of the objects in the resultList, a new Map is created. For each property of the object, a new entry is created with the property name as the key and the property value as the value. The entry is set on the columnData map and the property name is added to the result set. The completed map is pushed on the columnData array and the result is returned and set to the columnNames property. Then a set of column names is available in the columnNames set and a map with column name to column value is available in the columnData. 
The template table-search.component.html contains the Material table: HTML @if(searchResult && searchResult.resultList?.length) { <table mat-table [dataSource]="columnData"> <ng-container *ngFor="let disCol of columnNames" matColumnDef="{{ disCol }}"> <th mat-header-cell *matHeaderCellDef>{{ disCol }}</th> <td mat-cell *matCellDef="let element">{{ element.get(disCol) }}</td> </ng-container> <tr mat-header-row *matHeaderRowDef="columnNames"></tr> <tr mat-row *matRowDef="let row; columns: columnNames"></tr> </table> } First, the searchResult is checked for existence and for objects in the resultList. Then, the table is created with the columnData array of maps as its data source. The table header row is set with <tr mat-header-row *matHeaderRowDef="columnNames"></tr> to contain the columnNames. The table rows and columns are defined with <tr mat-row *matRowDef="let row; columns: columnNames"></tr>. The cells are created by iterating the columnNames like this: <ng-container *ngFor="let disCol of columnNames" matColumnDef="{{ disCol }}">. The header cells are created like this: <th mat-header-cell *matHeaderCellDef>{{ disCol }}</th>. The table cells are created like this: <td mat-cell *matCellDef="let element">{{ element.get(disCol) }}</td>. element is the map for the current columnData array element, and the map value is retrieved with element.get(disCol). Summary Frontend The new Angular syntax makes the templates more readable. The Angular Material table component is more flexible than expected and supports unknown numbers of columns very well. Conclusion Questioning a database with the help of an AI/LLM takes some effort for the metadata, and the users need a rough idea of what the database contains. AI/LLMs are not a natural fit for query creation because SQL queries require correctness. A pretty large model was needed to get the required query correctness, and GPU acceleration is required for productive use. A well-designed UI where the user can drag and drop the columns of the tables into the result table might be a good alternative for the requirements. Angular Material Components support drag and drop very well. Before starting such a project, the customer should make an informed decision on which alternative fits the requirements best.
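If you want to try the setup described above locally, the two prerequisites mentioned in this article, the Postgresql vector extension and the sqlcoder model for Ollama, can be prepared roughly as follows. This is a sketch only: the database name is a placeholder, and it assumes the pgvector extension package and Ollama are already installed. Shell
# enable the pgvector extension in the target database
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"
# pull the model used for SQL generation in this article (large download; GPU acceleration recommended)
ollama pull sqlcoder:70b-alpha-q6_K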
The Advantages of Elastic APM for Observing the Tested Environment I first used the Elastic Application Performance Monitoring (Elastic APM) solution in 2019, on microservice-based projects for which I was responsible for performance testing. At that time, the first versions of Elastic APM were being released. I was attracted by the easy installation of the agents, the numerous protocols supported by the Java agent (see Elastic supported technologies), including the Apache HttpClient used in JMeter, the agents for other languages (Go, .NET, Node.js, PHP, Python, Ruby), and the quality of the APM dashboards in Kibana. I found the information displayed in the Kibana APM dashboards to be relevant and not too verbose. The Java agent monitoring is simple but displays essential information on the machine's OS and JVM. The open-source aspect and the fact that the main functions of the tool are free were also decisive. I have since generalized the use of the Elastic APM solution in the performance environments of all projects. With Elastic APM, I have the timelines of the different calls and exchanges between web services, the SQL queries executed, the exchange of messages via JMS queues, and monitoring. I also have quick access to errors or exceptions thrown in Java applications. Why Integrate Elastic APM in Apache JMeter By adding Java APM agents to web applications, we get the timelines of the called services in the Kibana dashboards. However, we mostly remain at the level of individual REST API calls, because we do not have the notion of a page. For example, page PAGE01 will make the following API calls: /rest/service1 /rest/service2 /rest/service3 On another page, PAGE02 will make the following calls: /rest/service2 /rest/service4 /rest/service5 /rest/service6 The third page, PAGE03, will make the following calls: /rest/service1 /rest/service2 /rest/service4 In this example, service2 is called on 3 different pages and service4 on 2 pages. If we look in the Kibana dashboard for service2, we will find the union of the calls corresponding to the 3 pages, but we don't have the notion of a page. We cannot answer the question "On this page, what is the breakdown of time across the different REST calls?", even though, for a user of the application, the notion of page response time is important. The goal of the jmeter-elastic-apm tool is to add the notion of a page, which already exists in JMeter as the Transaction Controller. This starts in JMeter by creating an APM transaction and then propagating the transaction identifier (traceparent) with the Elastic agent to the HTTP REST requests to the web services, because the APM Agent recognizes the Apache HttpClient library and can instrument it. In the HTTP request, the APM Agent adds the identifier of the APM transaction to the header of the HTTP request. The headers added are traceparent and elastic-apm-traceparent. We start from the notion of the page in JMeter (Transaction Controller) and go down to the HTTP calls of the web application (gestdoc) hosted in Tomcat. In the case of an application composed of multiple web services, we will see in the timeline the different web services called over HTTP(s) or JMS and the time spent in each web service. The following is an example of a technical architecture for a performance test with Apache JMeter and the Elastic APM Agent to test a web application hosted in Apache Tomcat. How the jmeter-elastic-apm Tool Works jmeter-elastic-apm adds Groovy code before a JMeter Transaction Controller to create an APM transaction before a page. 
In the JMeter Transaction Controller, we find the HTTP samplers that make REST HTTP(s) calls to the services. The Elastic APM Agent automatically adds a new traceparent header containing the identifier of the APM transaction because it recognizes the Apache HttpClient of the HTTP sampler. The Groovy code then terminates the APM transaction to indicate the end of the page. The jmeter-elastic-apm tool automates the addition of this Groovy code before and after the JMeter Transaction Controller. The jmeter-elastic-apm tool is open source on GitHub (see the link in the Conclusion section of this article). This JMeter script is simple, with 3 pages in 3 JMeter Transaction Controllers. After launching the jmeter-elastic-apm tool with the ADD action, the JMeter Transaction Controllers are surrounded by Groovy code to create an APM transaction before the JMeter Transaction Controller and close the APM transaction after the JMeter Transaction Controller. In the "groovy begin transaction apm" sampler, the Groovy code calls the Elastic APM API (simplified version): Groovy Transaction transaction = ElasticApm.startTransaction(); Scope scope = transaction.activate(); transaction.setName(transactionName); // contains JMeter Transaction Controller Name In the "groovy end transaction apm" sampler, the Groovy code calls the Elastic APM API (simplified version): Groovy transaction.end(); Configuring Apache JMeter With the Elastic APM Agent and the APM Library Start Apache JMeter With Elastic APM Agent and Elastic APM API Library Declare the Elastic APM Agent (use this URL to find the APM Agent): Add the Elastic APM Agent somewhere in the filesystem (it could be in <JMETER_HOME>\lib, but this is not mandatory). In <JMETER_HOME>\bin, modify jmeter.bat or setenv.bat and add the Elastic APM configuration like so: Shell set APM_SERVICE_NAME=yourServiceName set APM_ENVIRONMENT=yourEnvironment set APM_SERVER_URL=http://apm_host:8200 set JVM_ARGS=-javaagent:<PATH_TO_AGENT_APM_JAR>\elastic-apm-agent-<version>.jar -Delastic.apm.service_name=%APM_SERVICE_NAME% -Delastic.apm.environment=%APM_ENVIRONMENT% -Delastic.apm.server_urls=%APM_SERVER_URL% 2. Add the Elastic APM library: Copy the Elastic APM API library to <JMETER_HOME>\lib\apm-agent-api-<version>.jar. This library is used by the JSR223 Groovy code. Use this URL to find the APM library. Recommendations on the Impact of Adding Elastic APM in JMeter The APM Agent will intercept and modify all HTTP sampler calls, and this information will be stored in Elasticsearch. It is preferable to deliberately disable the HTTP requests for static elements (images, CSS, JavaScript, fonts, etc.), which can generate a large number of requests but are not very useful for analyzing the timeline. In the case of heavy load testing, it is recommended to change the elastic.apm.transaction_sample_rate parameter to sample only a portion of the calls so as not to saturate the APM Server and Elasticsearch. This elastic.apm.transaction_sample_rate parameter can be declared in <JMETER_HOME>\jmeter.bat or setenv.bat, but also in a JSR223 sampler with a short piece of Groovy code in a setUp thread group. The following Groovy code records only 50% of the samples: Groovy import co.elastic.apm.api.ElasticApm; // update elastic.apm.transaction_sample_rate ElasticApm.setConfig("transaction_sample_rate","0.5"); Conclusion The jmeter-elastic-apm tool allows you to easily integrate the Elastic APM solution into JMeter and add the notion of a page to the timelines of the Kibana APM dashboards. 
Elastic APM + Apache JMeter is an excellent solution for understanding how the environment works during a performance test with simple monitoring, quality dashboards, time breakdown timelines in the different distributed application layers, and the display of exceptions in web services. Over time, the Elastic APM solution only gets better. I strongly recommend it, of course, in a performance testing context, but it also has many advantages in the context of a development environment used for developers or integration used by functional or technical testers. Links Command Line Tool jmeter-elastic-apm JMeter plugin elastic-apm-jmeter-plugin Elastic APM Guides: APM Guide or Application performance monitoring (APM)
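For completeness, on Linux or macOS the agent configuration shown earlier for jmeter.bat/setenv.bat goes into <JMETER_HOME>/bin/setenv.sh instead, which the jmeter startup script sources automatically. The following is only a sketch with placeholder paths and values: Shell
#!/bin/sh
# <JMETER_HOME>/bin/setenv.sh
APM_SERVICE_NAME=yourServiceName
APM_ENVIRONMENT=yourEnvironment
APM_SERVER_URL=http://apm_host:8200
export JVM_ARGS="-javaagent:/path/to/elastic-apm-agent-<version>.jar -Delastic.apm.service_name=${APM_SERVICE_NAME} -Delastic.apm.environment=${APM_ENVIRONMENT} -Delastic.apm.server_urls=${APM_SERVER_URL}"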