Language Resources

DZone's Featured Languages Resources

.NET 9 and C# 13: New Features and Improvements

By Naga Santhosh Reddy Vootukuri

CORE

The development of the .NET platform and C# language moves forward with the launch of .NET 9 and C# 13, introducing a range of enhancements and advancements to boost developer efficiency, speed, and safety. This article delves into upgrades and new features in these releases giving developers a detailed look. Figure courtesy of Microsoft .NET 9 .NET 9 introduces a range of improvements, to the .NET ecosystem with a strong focus on AI and building cloud-native distributed applications by releasing .NET Aspire, boosting performance and enhancements to .NET libraries and frameworks. Here are some notable highlights: .NET Aspire It's an opinionated stack that helps in developing .NET cloud-native applications and services. I recently wrote and published an article related to this on DZone. Performance Improvements .NET 9 is focused on optimizing cloud-native apps, and performance is a key aspect of this optimization. Several performance-related improvements have been made in .NET 9, including: 1. Faster Exceptions Exceptions are now 2-4x faster in .NET 9, thanks to a more modern implementation. This improvement means that your app will spend less time handling exceptions, allowing it to focus on its core functionality. 2. Faster Loops Loop performance has been improved in .NET 9 through loop hoisting and induction variable widening. These optimizations allow loops to run faster and more efficiently, making your app more responsive. 3. Dynamic PGO Improvements Dynamic PGO (Profile-Guided Optimization) has been improved in .NET 9, reducing the cost of type checks. This means that your app will run faster and more efficiently, with less overhead from type checks. 4. RyuJIT Improvements RyuJIT, the .NET Just-In-Time compiler, has been improved in .NET 9 to inline more generic methods. This means that your app will run faster, with less overhead from method calls. 5. Arm64 Code Optimizations Arm64 code can now be written to be much faster using SVE/SVE2 SIMD instructions on Arm64. This means that your app can take advantage of the latest Arm64 hardware, running faster and more efficiently. 6. Server GC Mode The new server GC mode in .NET 9 has been shown to reduce memory usage by up to 2/3 in some benchmarks. This means that your app will use less memory, reducing costs and improving performance. These performance-related improvements in .NET 9 mean that your app will run faster, leaner, and more efficiently. Whether you're building a cloud-native app or a desktop app, .NET 9 has the performance optimizations you need to succeed. AI-Related Improvements These AI-related improvements in .NET enable developers to build powerful applications with AI capabilities, integrate with the AI ecosystem, and monitor and observe AI app performance. Multiple partnerships include Qdrant, Milvus, Weaviate, and more to expand the .NET AI ecosystem. It is easy to integrate with Semantic Kernel, Azure SQL, and Azure AI search. Feature Improvement Benefit Tensor<T> New type for tensors Effective data handling and information flow for learning and prediction Smart Components Prebuilt controls with end-to-end AI features Infuse apps with AI capabilities in minutes OpenAI SDK Official .NET library for OpenAI Delightful experience and parity with other programming languages Monitoring and Observing Features for monitoring and tracing AI apps Reliable, performant, and high-quality outcomes Note: There is some integration work within the .NET Aspire team, semantic kernel, and Azure to utilize the .NET Aspire dashboard to collect and track metrics. Web-Related Improvements Improved performance, security, and reliability Upgrades to existing ASP.NET Core features for modern cloud-native apps Built-in support for OpenAPI document generation Ability to generate OpenAPI documents at build-time or runtime Customizable OpenAPI documents using document and operation transformers These improvements aim to enhance the web development experience with .NET and ASP.NET Core, making it easier to build modern web apps with improved quality and fundamentals. Caching Improvements With HybridCache As one of my favorites, I will explain more in-depth along with code samples in a different article about HybridCache. In short, The HybridCache API in ASP.NET Core is upgraded to provide a more efficient and scalable caching solution. It introduces a multi-tier storage approach, combining in-process (L1) and out-of-process (L2) caches, with features like "stampede" protection and configurable serialization. This results in significantly faster performance, with up to 1000x improvement in high cache hit rate scenarios. C# 13: Introducing New Language Features C# 13 brings forth a range of language elements aimed at enhancing code clarity, maintainability, and developer efficiency. Here are some key additions: params collections: The params keyword is no longer restricted to just array types. It can now be used with any collection type that is recognized, including System.Span<T>, System.ReadOnlySpan<T>, and types that implement System.Collections.Generic.IEnumerable<T> and have an Addmethod. This provides greater flexibility when working with methods that need to accept a variable number of arguments. In the code snippet below, the PrintNumbers method accepts a params of type List<int>[], which means you can pass any number of List<int> arguments to the method. C# public void PrintNumbers(params List<int>[] numbersLists) { foreach (var numbers in numbersLists) { foreach (var number in numbers) { Console.WriteLine(number); } } } PrintNumbers(new List<int> {1, 2, 3}, new List<int> {4, 5, 6}, new List<int> {7, 8, 9}); New lock object: System.Threading.Lock has been introduced to provide better thread synchronization through its API. New escape sequence: You can use \e as a character literal escape sequence for the ESCAPE character, Unicode U+001B. Method group natural type improvements: This feature makes small optimizations to overload resolution involving method groups. Implicit indexer access in object initializers: The ^ operator allows us to use an indexer directly within an object initializer. Conclusion C# 13 and .NET 9 mark a crucial step towards the advancement of C# programming and the .NET environment. The latest release brings a host of new features and improvements that enhance developer productivity, application performance, and security. By staying up-to-date with these changes, developers can leverage these advancements to build more robust, efficient, and secure applications. Happy coding! More

Data Analysis and Automation Using Python

By Sandip Gami

Organizations heavily rely on data analysis and automation to drive operational efficiency. In this piece, we will look into the basics of data analysis and automation with examples done in Python which is a high-level programming language used for general-purpose programming. What Is Data Analysis? Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data so as to identify useful information, draw conclusions, and support decision-making. It is an essential activity that helps in transforming raw data into actionable insights. The following are key steps involved in data analysis: Collecting: Gathering data from different sources. Cleaning: Removing or correcting inaccuracies and inconsistencies contained in the collected dataset. Transformation: Converting the collected dataset into a format that is suitable for further analysis. Modeling: Applying statistical or machine learning models on the transformed dataset. Visualization: Representing the findings visually by creating charts, and graphs among others using suitable tools such as MS Excel or Python's matplotlib library. The Significance of Data Automation Data automation involves the use of technology to execute repetitive tasks associated with handling large datasets with minimal human intervention required. Automating these processes can greatly improve their efficiency thereby saving time for analysts who can then focus more on complex duties. Some common areas where it’s employed include: Data ingestion: Automatically collecting and storing data from various sources. Data cleaning and transformation: Using scripts or tools (e.g., Python Pandas library) for preprocessing the collected dataset before performing other operations on it like modeling or visualization. Report generation: Creating automated reports or dashboards that update themselves whenever new records arrive at our system etcetera. Data integration: Combining information obtained from multiple sources so as to get a holistic view when analyzing it further down during the decision-making process. Introduction to Python for Data Analysis Python is a widely used programming language for data analysis due to its simplicity, readability, and vast libraries available for statistical computing. Here are some simple examples that demonstrate how one can read large datasets as well as perform basic analysis using Python: Reading Large Datasets Reading datasets into your environment is one of the initial stages in any data analysis project. For this case, we will need the Pandas library which provides powerful data manipulation and analysis tools. Python import pandas as pd # Define the file path to the large dataset file_path = 'path/to/large_dataset.csv' # Specify the chunk size (number of rows per chunk) chunk_size = 100000 # Initialize an empty list to store the results results = [] # Iterate over the dataset in chunks for chunk in pd.read_csv(file_path, chunksize=chunk_size): # Perform basic analysis on each chunk # Example: Calculate the mean of a specific column chunk_mean = chunk['column_name'].mean() results.append(chunk_mean) # Calculate the overall mean from the results of each chunk overall_mean = sum(results) / len(results) print(f'Overall mean of column_name: {overall_mean}') Basic Data Analysis Once you have loaded the data, it is important to conduct some preliminary examination on it so as to familiarize yourself with its contents. Performing Aggregated Analysis There are times you might wish to perform a more advanced aggregated analysis over the entire dataset. For instance, let’s say we want to find the sum of a particular column across the whole dataset by processing it in chunks. Python # Initialize a variable to store the cumulative sum cumulative_sum = 0 # Iterate over the dataset in chunks for chunk in pd.read_csv(file_path, chunksize=chunk_size): # Calculate the sum of the specific column for the current chunk chunk_sum = chunk['column_name'].sum() cumulative_sum += chunk_sum print(f'Cumulative sum of column_name: {cumulative_sum}') Missing Values Treatment in Chunks It is common for missing values to exist during data preprocessing. Instead, here is an instance where missing values are filled using the mean of each chunk. Python # Initialize an empty DataFrame to store processed chunks processed_chunks = [] # Iterate over the dataset in chunks for chunk in pd.read_csv(file_path, chunksize=chunk_size): # Fill missing values with the mean of the chunk chunk.fillna(chunk.mean(), inplace=True) processed_chunks.append(chunk) # Concatenate all processed chunks into a single DataFrame processed_data = pd.concat(processed_chunks, axis=0) print(processed_data.head()) Final Statistics From Chunks At times, there is a need to get overall statistics from all chunks. This example illustrates how to compute the average and standard deviation of an entire column by aggregating outcomes from each chunk. Python import numpy as np # Initialize variables to store the cumulative sum and count cumulative_sum = 0 cumulative_count = 0 squared_sum = 0 # Iterate over the dataset in chunks for chunk in pd.read_csv(file_path, chunksize=chunk_size): # Calculate the sum and count for the current chunk chunk_sum = chunk['column_name'].sum() chunk_count = chunk['column_name'].count() chunk_squared_sum = (chunk['column_name'] ** 2).sum() cumulative_sum += chunk_sum cumulative_count += chunk_count squared_sum += chunk_squared_sum # Calculate the mean and standard deviation overall_mean = cumulative_sum / cumulative_count overall_std = np.sqrt((squared_sum / cumulative_count) - (overall_mean ** 2)) print(f'Overall mean of column_name: {overall_mean}') print(f'Overall standard deviation of column_name: {overall_std}') Conclusion Reading large datasets in chunks using Python helps in efficient data processing and analysis without overwhelming system memory. By taking advantage of Pandas’ chunking functionality, various tasks involving data analytics can be done on large datasets while ensuring scalability and efficiency. The provided examples illustrate how to read large datasets in portions, address missing values, and perform aggregated analysis; thus providing a strong foundation for working with huge amounts of data in Python. More

Strengthening Cloud Environments Through Python and SQL Integration

By Rajesh Remala

dovpanda: Unlock Pandas Efficiency With Automated Insights

By Balaji Dhamodharan

Machine Learning With Python: Data Preprocessing Techniques

By Karthik Rajashekaran

Beyond a Query Language: How GQL Is Shaping the Future of Graph Databases

Since the recent release of the GQL (Graph Query Language) standard by ISO, there have been many discussions among graph database vendors and research institutions on how it will influence the industry. Apparently, its prevalence is backed by the wide applications of graph databases across diverse sectors — from recommendation engines to supply chains, a standard unified language for querying and managing graph databases is needed. The significance of GQL lies in its ability to replace multiple database-specific query languages with a single, standardized one. This facilitates the interoperability between graph databases and calls for the end of dependence on certain graph database vendors. Moreover, beyond the query language, GQL defines what a graph database should be and what key characteristics it should own are finally standardized, laying a far-reaching influential foundation for the development of the graph database industry. In this article, I will walk you through some important terms of GQL and explore its transformative potential for the industry. Key Terms and Definitions of GQL GQL aims to establish a unified, declarative graph database query language that is both compatible with modern data types and can intuitively express the complex logic of a graph. It defines a comprehensive and robust framework for interacting with property graph databases, including DQL, DML, and DDL, providing a modern and flexible approach to graph data management and analysis. Below are some key definitions of GQL that developers or users of graph databases should be aware of. Property Graph Data Model GQL operates on a data model including nodes (vertices) and edges (relationships), which allows for pattern-based analysis and flexible data addition. The data model is specifically tailored for Property Graph Databases in that GQL is based on relatively mature graph query languages with wide applications, absorbing its advantages and settling the new standards. Resource Description Framework (RDF), once another type of graph data model, is not included in GQL as a standard graph data model. With the GQL definition, it is apparent that the property graph data model is the de facto standard. Graph Pattern Matching (GPM) GPM language defined by GQL enables users to write simple queries for complex data analysis. While traditional graph database query languages support single pattern matching, GQL further facilitates complex pattern matching across multiple patterns. For example, GQL supports path aggregation, grouping variables, and nested pattern matching with optional filtering, offering expressive capabilities to handle more sophisticated business logic. GQL Schema GQL allows for both schema-free graphs, which accept any data, and mandatory schema graphs, which are constrained by a predefined graph type specified in a “GQL schema”. This dual approach of GQL supporting two types of schema caters to a wide range of data management needs, from the flexibility of schema-free graphs to the precision of schema-constrained ones. Schema-free graphs allow adding new attributes to nodes or relationships at any time without modifying the data model. This adaptability is beneficial when dealing with complex and changing data, but from another perspective, schema-free graphs shift the burden of data management complexities, such as handling data consistency and data quality, onto developers. On the contrary, the mandatory schema graph offers a rigid framework that guarantees data consistency and integrity. The deterministic data structure within a mandatory schema makes any data changes clear and manageable. Furthermore, the predefined data structure enhances the comprehensibility and usability of data, which brings optimized query processes for both users and systems. While the mandatory schema graphs may sacrifice some flexibility, the trade-off is often justified in production environments where data structures are well-defined and the output data exhibits regular patterns. Graph Types Graph types are templates that restrict the contents of a graph to specific node and edge types, offering a certain level of data control and structure. Under the GQL definition, a graph type can be applied to multiple graphs, which means that the same graph structure type can be shared in different applications, making it more flexible. For example, data of a business might differ in different departments, among various regions and the data permissions may be isolated from each other. Under this situation, using the same graph type can facilitate business management as multiple graphs with the same graph type enable permission management and data privacy compliance regulations. Notable Advancements of GQL Separation of GQL-Catalog and GQL-Data GQL defines a persistent and extensible catalog initialization runtime environment with reference to SQL: GQL-catalog. GQL lists its stored data objects, including various metadata, such as graph, graph type, procedure, function, etc. GQL-catalog can be independently maintained or upgraded from the data itself, which allows for flexible permission management and a unified, standardized approach to catalog management. Multi-Graph Joint Query GQL enables multi-graph joint queries. By using different graph expressions in the query process, users can perform operations such as union, conditional rules, and join can be performed on different graphs. This capability benefits scenarios such as anti-fraud investigations and the integration of public and private knowledge graphs, where cross-referencing public and private data sets is crucial. These scenarios require both data isolation and integrated data analysis due to data compliance, maintenance, and other reasons. Therefore, the data needs to be split into multiple graphs, but they need to be combined to complete a certain business requirement. Supporting Undirected Graph Different from the previous definition of graph databases where relationships always have a direction, GQL allows undirected graphs. In some scenarios, there is naturally no direction of relationships between vertices, such as friendships. While these relationships could be modeled as directed, doing so would necessitate two separate edges, making the modeling and querying process complicated. Conclusion In summary, the standardization of GQL is a significant step forward for the graph database industry. Not only does it provide a simplified user experience, but GQL also regulates what property graph databases are and what features they should own, referring to real-world use cases. It boosts the transformative potential of graph databases for all industries where they are leveraged.

By Yang Fang

A Guide to Regression Analysis Forecasting in Python

Regression analysis is a technic of estimating the value of a dependent variable based on the values of independent values. These models largely explore the possibilities of studying the impact of independent variables on dependent variables. In this article, we will focus on estimating the value of revenue (dependent variable) based on the historical demand (independent values) coming from various demand channels like Call, Chat, and Web inquiries. I will use Python libraries like statsmodels and sklearn to develop an algorithm to forecast revenue. Once we can predict the revenue, this empowers the businesses to strategies the investment, and prioritize the customers and demand channels with an aim to grow the revenue. Data Ingestion and Exploration In this section, I will detail the test data, data ingestion using pandas, and data exploration to get familiar with the data. I am using a summarized demand gen dataset, containing the following data attributes. The following code block would ingest the input file “DemandGenRevenue.csv” and list the sample records. Python import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import os import statsmodels.formula.api as sm from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet from sklearn.metrics import mean_squared_error, r2_score from sklearn.model_selection import train_test_split from sklearn.model_selection import cross_val_score from sklearn.model_selection import GridSearchCV import warnings warnings.simplefilter(action='ignore', category=FutureWarning) df = pd.read_csv("DemandGenRevenue.csv") df.head() Python df.columns df.info() df.describe().T The following code can be used to design Scatter plots to explore the linearity assumption between the independent variables — Call, Chat, Web Inquiry, and the dependent variable — Revenue. Python sns.pairplot(df, x_vars=["call", "chat", "Web_Inquiry"], y_vars="Revenue", kind="reg") Let's explore the normality assumption of the dependent variable — Revenue using Histograms. Python df.hist(bins=20) Before we start working on the model, let's explore the relationship between each independent variable and the dependent variable using Linear regression plots. Python sns.lmplot(x='call', y='Revenue', data=df) sns.lmplot(x='chat', y='Revenue', data=df) sns.lmplot(x='Web_Inquiry', y='Revenue', data=df) Forecasting Model In this section, I will delve into the model preparation using statsmodels and sklearn libraries. We will build a linear regression model based on the demand coming from calls, chats, and web inquiries. Python X = df.drop('Revenue', axis=1) y = df[["Revenue"]] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=46) The following code will build a Linear regression model to forecast the revenue. Python lin_model = sm.ols(formula="Revenue ~ call + chat + Web_Inquiry", data=df).fit() print(lin_model.params, "\n") Use the below code to explore the coefficients of the linear model. Python print(lin_model.summary()) The code below can be used to define various models and loop through the models to forecast, for simplicity's sake, we will only focus on the Linear Regression Model. Python results = [] names = [] models = [('LinearRegression', LinearRegression())] for name, model in models: model.fit(X_train, y_train) y_pred = model.predict(X_test) result = np.sqrt(mean_squared_error(y_test, y_pred)) results.append(result) names.append(name) msg = "%s: %f" % (name, result) print(msg) Now that we have the model ready, let's try and forecast the revenue based on the input. If we were to get 1000 calls, 650 chats, and 725 Web Inquiries, based on the historical data we can expect $64.5M of revenue. Python new_data = pd.DataFrame({'call': [1000], 'chat': [650], 'Web_Inquiry': [725]}) Forecasted_Revenue = lin_model.predict(new_data) print("Forecasted Revenue:", int(Forecasted_Revenue)) The code below provides another set of inputs to test the model, if the demand center receives 2000 calls, 1200 chats, and 250 web inquiries, the model forecasts the revenue at $111.5M of revenue. Python new_data = pd.DataFrame({'call': [2000], 'chat': [1200], 'Web_Inquiry': [250]}) Forecasted_Revenue = lin_model.predict(new_data) print("Forecasted Revenue:", int(Forecasted_Revenue)) Conclusion In the end, Python offers multiple libraries to implement the forecasting, statsmodels and sklearn lays a solid foundation for implementing a linear regression model to predict the outcomes based on historical data. I would suggest continued Python exploration for working on enterprise-wide sales and marketing data to analyze historical trends and execute models to predict future sales and revenue. Darts is another Python library I would recommend implementing time series-based anomaly detection and user-friendly forecasting based on models like ARIMA to deep neural networks.

By Kapil Kumar Sharma

Exploring WebAssembly for Cloud-Native Development

I’m a senior solution architect and polyglot programmer interested in the evolution of programming languages and their impact on application development. Around three years ago, I encountered WebAssembly (Wasm) through the .NET Blazor project. This technology caught my attention because it can execute applications at near-native speed across different programming languages. This was especially exciting to me as a polyglot programmer since my programming expertise ranges across multiple programming languages including .NET, PHP, Node.js, Rust, and Go. Most of the work I do is building cloud-native enterprise applications, so I have been particularly interested in advancements that broaden Wasm’s applicability in cloud-native development. WebAssembly 2.0 was a significant leap forward, improving performance and flexibility while streamlining integration with web and cloud infrastructures to make Wasm an even more powerful tool for developers to build versatile and dynamic cloud-native applications. I aim to share the knowledge and understanding I've gained, providing an overview of Wasm’s capabilities and its potential impact on the cloud-native development landscape. Polyglot Programming and the Component Model My initial attraction to WebAssembly stemmed from its capability to enhance browser functionalities for graphic-intensive and gaming applications, breaking free from the limitations of traditional web development. It also allows developers to employ languages like C++ or Rust to perform high-efficiency computations and animations, offering near-native performance within the browser environment. Wasm’s polyglot programming capability and component model are two of its flagship capabilities. The idea of leveraging the strengths of various programming languages within a unified application environment seemed like the next leap in software development. Wasm offers the potential to leverage the unique strengths of various programming languages within a single application environment, promoting a more efficient and versatile development process. For instance, developers could leverage Rust's speed for performance-critical components and .NET's comprehensive library support for business logic to optimize both development efficiency and application performance. This led me to Spin, an open-source tool for the creation and deployment of Wasm applications in cloud environments. To test Wasm’s polyglot programming capabilities, I experimented with the plugins and middleware models. I divided the application business logic into one component, and the other component with the Spin component supported the host capabilities (I/O, random, socket, etc.) to work with the host. Finally, I composed with http-auth-middleware, an existing component model from Spin for OAuth 2.0, and wrote more components for logging, rate limit, etc. All of them were composed together into one app and run on the host world (Component Model). Cloud-Native Coffeeshop App The first app I wrote using WebAssembly was an event-driven microservices coffeeshop app written in Golang and deployed using Nomad, Consul Connect, Vault, and Terraform (you can see it on my GitHub). I was curious about how it would work with Kubernetes, and then Dapr. I expanded it and wrote several use cases with Dapr such as entire apps with Spin, polyglot apps (Spin and other container apps with Docker), Spin apps with Dapr, and others. What I like about it is the speed of start-up time (it’s very quick to get up and running), and the size of the app – it looks like a tiny but powerful app. The WebAssembly ecosystem has matured a lot in the past year as it relates to enterprise projects. For the types of cloud-native projects I’d like to pursue, it would benefit from a more developed support system for stateful applications, as well as an integrated messaging system between components. I would love to see more capabilities that my enterprise customers need such as gRPC or other communication protocols (Spin currently only supports HTTP), data processing and transformation like data pipelines, a multi-threading mechanism, CQRS, polyglot programming language aggregations (internal modular monolith style or external microservices style), and content negotiation (XML, JSON, Plain-text). We also need real-world examples demonstrating Wasm’s capabilities to tackle enterprise-level challenges, fostering a better understanding and wider technology adoption. We can see how well ZEISS does from their presentation at KubeCon in Paris last month. I would like to see more and more companies like them involved in this game, then, from the developer perspective, we will benefit a lot. Not only can we easily develop WebAssembly apps, but many enterprise scenarios shall also be addressed, and we will work together to make WebAssembly more handy and effective. The WebAssembly Community Sharing my journey with the WebAssembly community has been a rewarding part of my exploration, especially with the Spin community who have been so helpful in sharing best practices and new ideas. Through tutorials and presentations at community events, I've aimed to contribute to the collective understanding of WebAssembly and cloud-native development, and I hope to see more people sharing their experiences. I will continue creating tutorials and educational content, as well as diving into new projects using WebAssembly to inspire and educate others about its potential. I would encourage anyone getting started to get involved in the Wasm community of your choice to accelerate your journey. WebAssembly’s Cloud-Native Future I feel positive about the potential for WebAssembly to change how we do application development, particularly in the cloud-native space. I’d like to explore how Wasm could underpin the development of hybrid cloud platforms and domain-specific applications. One particularly exciting prospect is the potential for building an e-commerce platform based on WebAssembly, leveraging its cross-platform capabilities and performance benefits to offer a superior user experience. The plugin model existed for a long time in the e-commerce world (see what Shopify did), and with WebAssembly’s component model, we can build the application with polyglot programming languages such as Rust, Go, TypeScript, .NET, Java, PHP, etc. WebAssembly 2.0 supports the development of more complex and interactive web applications, opening the door for new use cases such as serverless stateless functions, data transformation, and the full-pledge of web API functionalities, moving into edge devices (some embedded components). New advancements like WASI 3.0 with asynchronous components are bridging the gaps. I eagerly anticipate the further impact of WebAssembly on our approaches to building and deploying applications. We’re just getting started.

By Thang Chung

Parsing Structured Environment Variables in Rust

I'm in the process of adding more components to my OpenTelemetry demo (again!). The new design deploys several warehouse services behind the inventory service so the latter can query the former for data via their respective HTTP interface. I implemented each warehouse on top of a different technology stack. This way, I can show OpenTelemetry traces across several stacks. Anyone should be able to add a warehouse in their favorite tech stack if it returns the correct JSON payload to the inventory. For this, I want to make the configuration of the inventory "easy"; add a new warehouse with a simple environment variable pair, i.e., the endpoint and its optional country. The main issue is that environment variables are not structured. I searched for a while and found a relevant post. Its idea is simple but efficient; here's a sample from the post: Properties files FOO__1__BAR=setting-1 #1 FOO__1__BAZ=setting-2 #1 FOO__2__BAR=setting-3 #1 FOO__2__QUE=setting-4 #1 FIZZ__1=setting-5 #2 FIZZ__2=setting-6 #2 BILL=setting-7 #3 Map-like structure Table-like structure Just a value With this approach, I could configure the inventory like this: YAML services: inventory: image: otel-inventory:1.0 environment: WAREHOUSE__0__ENDPOINT: http://apisix:9080/warehouse/us #1 WAREHOUSE__0__COUNTRY: USA #2 WAREHOUSE__1__ENDPOINT: http://apisix:9080/warehouse/eu #1 WAREHOUSE__2__ENDPOINT: http://warehouse-jp:8080 #1 WAREHOUSE__2__COUNTRY: Japan #2 OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317 OTEL_RESOURCE_ATTRIBUTES: service.name=inventory OTEL_METRICS_EXPORTER: none OTEL_LOGS_EXPORTER: none Warehouse endpoint Set country You can see the three warehouses configured in the above. Each has an endpoint/optional country pair. My first attempt looked like the following: Rust lazy_static::lazy_static! { //1 static ref REGEXP_WAREHOUSE: Regex = Regex::new(r"^WAREHOUSE__(\d)__.*").unwrap(); } std::env::vars() .filter(|(key, _)| REGEXP_WAREHOUSE.find(key.as_str()).is_some()) //2 .group_by(|(key, _)| key.split("__").nth(1).unwrap().to_string()) //3 .into_iter() //4 .map(|(_, mut group)| { //5 let some_endpoint = group.find(|item| item.0.ends_with("ENDPOINT")); //6 let endpoint = some_endpoint.unwrap().1; let some_country = group //7 .find(|item| item.0.ends_with("COUNTRY")) .map(|(_, country)| country); println! {"Country pair is: {:?}", some_country}; (endpoint, some_country).into() //8 } .collect::<Vec<_>>() For making constants out of code evaluated at runtime Filter out warehouse-related environment variable Group by index Back to an Iter with the help of itertools Consist of just the endpoint or the endpoint and the country Get the endpoint Get the country Into a structure — irrelevant I encountered issues several times when I started the demo. The code somehow didn't find the endpoint at all. I chose this approach because I've been taught that it's more performant to iterate throughout the key-value pairs of a map than iterate through its key only and then get the value in the map. I tried to change to the latter. Rust lazy_static! { static ref REGEXP_WAREHOUSE_ENDPOINT: Regex = Regex::new(r"^WAREHOUSE__(?<index>\d)__ENDPOINT.*").unwrap(); //1 } std::env::vars() .filter(|(key, _)| REGEXP_WAREHOUSE_ENDPOINT.find(key.as_str()).is_some()) //2 .map(|(key, endpoint)| { let some_warehouse_index = REGEXP_WAREHOUSE_ENDPOINT.captures(key.as_str()).unwrap(); //3//4 println!("some_warehouse_index: {:?}", some_warehouse_index); let index = some_warehouse_index.name("index").unwrap().as_str(); let country_key = format!("WAREHOUSE__{}__COUNTRY", index); //5 let some_country = var(country_key); //6 println!("endpoint: {}", endpoint); (endpoint, some_country).into() }) .collect::<Vec<_>>() Change the regex to capture only the endpoint-related variables Filter out warehouse-related environment variable I'm aware that the filter_map() function exists, but I think it's clearer to separate them here Capture the index Create the country environment variable from a known string, and the index Get the country With this code, I didn't encounter any issues. Now that it works, I'm left with two questions: Why doesn't the group()/find() version work in the deployed Docker Compose despite working in the tests? Is anyone interested in making a crate out of it? To Go Further Structured data in environment variables lazy_static crate envconfig crate

By Nicolas Fränkel

CORE

Why Use Rust Over C++ for IoT Solution Development

The Internet of Things has become integral to our daily routines, and devices are increasingly becoming smart. As this domain expands, there's an urgent need to guarantee these software-enabled devices' security, productivity, and efficiency. Hence, the Rust programming language is becoming the second popular choice after C++ for IoT device developers. This article will explore why Rust is becoming a favored choice for embedded IoT development and how it can be effectively used in this field. In IoT development, C++ has always been a go-to solution when speaking about IoT and embedded systems. Also, this language has an experienced development community and is widely used by engineers worldwide. However, recently, Rust came into play and showed its potential. So, we decided to explore why developers keep leaning toward embedded programming with Rust over tried-and-proven C++. History of Rust Software Development Services Rust, a modern system programming language, was initially conceptualized by Mozilla and the broader development community. It was designed for secure, swift, and parallel application development, eliminating potential memory and security challenges related to embedded solutions & custom IoT development. Since its inception in 2006, Rust language has undergone many changes and improvements and was finally introduced as an open-source ecosystem in 2010. Beyond the development community, major corporations like Microsoft, Google, Amazon, Facebook, Intel, and GitHub also support and finance Rust, furthering its development and usage. This undoubtedly speeds up its growth and increases its attractiveness for use. Rust vs. C++ Dilemma: Why Everyone Is Shifting From C++ to Rust in Embedded System Creation Rust and C++ programming languages are powerful tools for high-performance application development. For embedded IoT applications, several crucial factors influence development speed, security, and reliability beyond the foundational software. Below are the Top 5 most significant factors: 1. Security and Memory Management A standout feature of Rust is its compile-time security system. This ensures that many memory-related issues, like memory leaks and buffer overflows, are detected and addressed during the compilation phase, leading to more dependable and maintainable code. Rust employs a unique ownership system and moves semantics that proficiently handles object lifetimes, mitigating data access conflicts. However, this uniqueness can raise the entry barrier, particularly for newer developers, who might find these techniques somewhat unconventional. The C++ language also provides memory control, but it requires more careful programming. It’s susceptible to pitfalls like memory leaks and unsafe data access if not handled precisely. 2. Performance Rust aims to be competitive in C++ performance. The Rust compiler generates efficient machine code, and thanks to a secure type of system, Rust can predictably optimize code. C++ also safeguards high performance and provides a wide range of tools for optimization. 3. Code Syntax and Readability Rust offers state-of-the-art and clean syntax that helps create readable and understandable code. The Rust template system (traits) makes the code more expressive, legible, and easily extendable. C++ has historical syntax, which may be less intuitive and readable for some developers. 4. Integration and Multitasking Rust provides a convenient way to integrate with C and C++ through a Foreign Function Interface (FFI), which makes it easier to port existing projects but still requires additional effort. The Rust tenure and type systems exclude “data race” and help create secure multitasking applications. Rust also supports threads and competitive multitasking from the box. C++ also provides multitasking but can be integrated with C code with little or no effort. 5. Ecosystem and Community Rust has an active and rapidly growing development community. Cargo–Rust’s dependency and build management system make development more convenient and predictable. C++ also has a large and experienced community and an extensive ecosystem of libraries and tools that exceed the volumes of Rust. As we can see, Rust offers IoT app developers advanced security features that prevent many common errors and result in more reliable, clear code. It also benefits from active community support and utilizes the Cargo system for efficient dependency management and compilation. At the same time, Rust provides numerous tools and out-of-the-box libraries that allow results comparable to those of C++ but with significantly less effort and code. Yet, Rust still trails C++ in ecosystem maturity, C integration, and accessibility for Rust software development beginners. Real-Life Case of Using Rust for IoT Device Development: Smart Monitoring System for Toddlers The Sigma Software team was engaged as a technical partner to help develop a product that simplifies diverse childcare routines for parents. Namely, we were to build software for a baby monitoring device connected to the ESP32-S3 MCU. Our team was looking for the best-fit solution that could provide us with everything needed for successful delivery: multitasking capabilities, a secure coding environment, and interfaces with a network, microphone, and speaker connections. We saw the potential of Rust to fulfill these requirements as it had a robust ecosystem that allowed us to integrate the required functionality without much effort. Even though we chose Rust as our primary tool, we also effectively integrated specific C and C++ libraries using the Foreign Function Interface (FFI). As a result, it took us just six months from the project’s kick-off to the release of its beta version. One month later, the solution was already on the market and available for purchase. Over the next half-year, we refined and expanded its functionalities, including remote control, regime planning, and smooth integration options into the user’s existing ecosystem. The functionality expansion went smoothly, without much effort, and without leaving behind the smell of code, thus reducing the need for refactoring to a minimum. This project, completed by a trio of developers in just over a year, has reached over 5,000 households, underscoring Rust's viability in IoT development. C++ vs. Rust: Final Thoughts Unlike C++, Using Rust in embedded systems creation has a learning curve. Yes, this requires more time at the start of the project, as developers need to learn the language's innovations and features. Yes, finding, refining, or partially porting the necessary libraries for use in a specific solution will take longer. But the result is beautiful and readable code that expands quickly. Hence, a productive, safe, and lightweight solution is needed for embedded IoT applications.

By Den Smyrnov

How to Iterate Over Multiple Lists Sequentially in Python

Python list is a versatile data structure that allows you to easily store a large amount of data in a compact manner. Lists are widely used by Python developers and support many useful functions out-of-the-box. Often you may need to work with multiple lists or a list of lists and iterate over them sequentially, one after another. There are several simple ways to do this. In this article, we will learn how to go through multiple Python lists in a sequential manner. Let us say you have the following 3 lists. Python L1=[1,2,3] L2=[4,5,6] L3=[7,8,9] 1. Using itertools.chain() itertools is a very useful Python library that provides many functions to easily work with iterable data structures such as list. You can use the itertools.chain() function to quickly go through multiple lists sequentially. Here is an example of iterating through lists L1, L2, and L3 using the chain() function. Python >>> for i in itertools.chain(L1,L2,L3): print i 1 2 3 4 5 6 7 8 9 Using itertools is one of the fastest and most memory-efficient ways to go through multiple lists since it uses iterators. This is because iterators only return one item at a time, instead of storing a copy of the entire iterable in memory, as is the case of for loop. 2. Using for Loop Sometimes you may have a list of lists, as shown below. Python L4 = [L1, L2, L3] print L4 [[1, 2, 3], [4, 5, 6], [7, 8, 9]] In such cases, you can use a nested for loop to iterate through these lists. Python >>> for i in L4: for j in i: print j 1 2 3 4 5 6 7 8 9 Alternatively, you can also use itertools.chain() to go through a list of lists. Python >>> for i in itertools.chain(L4): for j in i: print j 1 2 3 4 5 6 7 8 9 3. Using Star Operator The above-mentioned methods work with most Python versions. But if you use Python 3+, then you can also avail the star (*) operator to quickly unpack a list of lists. Python for i in [*L1, *L2, *L3]: print(i) 1 2 3 4 5 6 7 8 9 4. Using itertools.izip() So far, in each of the above cases, all items of the first list are displayed, followed by all items of the second list, and so on. But sometimes you may need to sequentially process the first item of each list, followed by the second item of each list, and so on. For this kind of sequential order, you need to use the itertools.izip() function. Here is an example to illustrate it. Python for i in itertools.izip(*L4): for j in i: print j 1 4 7 2 5 8 3 6 9 Notice the difference in sequence. In this case, the output is the first item of each list (1, 4, 7), followed by the second item on each list (2, 5, 8), and so on. This is different from the sequence of the first list items (1, 2, 3) followed by second list items (4, 5, 6), and so on. Conclusion In this article, we have learned several simple ways to sequentially iterate over multiple lists in Python. Basically, there are two ways to do this. The first approach is when you need to process all items of one list before moving to the next one. The second approach is where you need to process the first item of each list then the second item of each list and so on. In the first case, you can use the itertools.chain() function, a for loop, or a star(*) operator. In the second case, you need to use the itertools.izip() function.

By Sreeram Sreenivasan

Enhancing Web Scraping With Large Language Models: A Modern Approach

During my early days as a Data Engineer (which dates back to 2016), I had the responsibility of scraping data from different websites. Web scraping is all about making use of tools that are automated to get vast amounts of data from the websites, usually from their HTML. I remember building around the application, digging into the HTML code, and trying to figure out the best solutions for scraping all the data. One of my main challenges was dealing with frequent changes to the websites: for example, the Amazon pages I was scraping changed every one to two weeks. One thought that occurred to me when I started reading about Large Language Models (LLMs) was, "Can I avoid all those pitfalls I faced using LLMs to structure data from webpages?" Let's see if I can. Web Scraping Tools and Techniques At the time, the main tools I was using were Requests, BeautifulSoup, and Selenium. Each service has a different purpose and is targeted at different types of web environments. Requests is a Python library that can be used to easily make HTTP requests. This library performs GET and POST operations against URLs provided in the requests. It is frequently used to fetch HTML content that can be parsed by BeautifulSoup. BeautifulSoup is a Python library for parsing HTML and XML documents, it constructs a parse tree from page source that allows you to access the various elements on the page easily. Usually, it is paired with other libraries like Requests or Selenium that provide the HTML source code. Selenium is primarily employed for websites that have a lot of JavaScript involved. Unlike BeautifulSoup, Selenium does not simply analyze HTML code: it interacts with websites by emulating user actions such as clicks and scrolling. This facilitates the data extraction from websites that create content dynamically. These tools were indispensable when I was trying to extract data from websites. However, they also posed some challenges: code, tags, and structural elements had to be regularly updated to accommodate changes in the website's layout, complicating long-term maintenance. What Are Large Language Models (LLMs)? Large Language Models (LLMs) are next-generation computer programs that can learn through reading and analyzing vast amounts of text data. At this age, they are gifted with the amazing capability to write in a human-like narrative making them efficient agents to process language and comprehend the human language. The outstanding ability shone through in that kind of situation, where the text context was really important. Integrating LLMs Into Web Scraping The web scraping process can be optimized in a great measure when implementing LLMs into it. We need to take the HTML code from a webpage and feed it into the LLM, which will pull out the objects it refers to. Therefore, this tactic helps in making maintenance easy, as the markup structure can evolve, but the content itself does not usually change. Here’s how the architecture of such an integrated system would look: Getting HTML: Use tools like Selenium or Requests to fetch the HTML content of a webpage. Selenium can handle dynamic content loaded with JavaScript, while Requests is suited for static pages. Parsing HTML: Using BeautifulSoup, we can parse out this HTML as text, thus removing the noise from the HTML (footer, header, etc.). Creating Pydantic models: Type the Pydantic model in which we are going to scrape. This makes sure that the data typed and structured conforms to the pre-defined schemas. Generating prompts for LLMs: Design a prompt that will inform the LLM what information has to be extracted. Processing by LLM: The model reads the HTML, understands it, and employs the instructions for data processing and structuring. Output of structured data: The LLM will provide the output in the form of structured objects which are defined by the Pydantic model. This workflow helps to transform HTML (unstructured data) into structured data using LLMs, solving problems such as non-standard design or dynamic modification of the web source HTML. Integration of LangChain With BeautifulSoup and Pydantic This is the static webpage selected for the example. The idea is to scrape all the activities listed there and present them in a structured way. This method will extract the raw HTML from the static webpage and clean it before the LLM processes it. Python from bs4 import BeautifulSoup import requests def extract_html_from_url(url): try: # Fetch HTML content from the URL using requests response = requests.get(url) response.raise_for_status() # Raise an exception for bad responses (4xx and 5xx) # Parse HTML content using BeautifulSoup soup = BeautifulSoup(response.content, "html.parser") excluded_tagNames = ["footer", "nav"] # Exclude elements with tag names 'footer' and 'nav' for tag_name in excluded_tagNames: for unwanted_tag in soup.find_all(tag_name): unwanted_tag.extract() # Process the soup to maintain hrefs in anchor tags for a_tag in soup.find_all("a"): href = a_tag.get("href") if href: a_tag.string = f"{a_tag.get_text()} ({href})" return ' '.join(soup.stripped_strings) # Return text content with preserved hrefs except requests.exceptions.RequestException as e: print(f"Error fetching data from {url}: {e}") return None The next step is to define the Pydantic objects that we are going to scrape from the webpage. Two objects need to be created: Activity: This is a Pydantic object that represents all the metadata related to the activity, with its attributes and data types specified. We have marked some fields as Optional in case they are not available for all activities. Providing a description, examples, and any metadata will help the LLM to have a better definition of the attribute. ActivityScraper: This is the Pydantic wrapper around the Activity. The objective of this object is to ensure that the LLM understands that it is needed to scrape several activities. Python from pydantic import BaseModel, Field from typing import Optional class Activity(BaseModel): title: str = Field(description="The title of the activity.") rating: float = Field(description="The average user rating out of 10.") reviews_count: int = Field(description="The total number of reviews received.") travelers_count: Optional[int] = Field(description="The number of travelers who have participated.") cancellation_policy: Optional[str] = Field(description="The cancellation policy for the activity.") description: str = Field(description="A detailed description of what the activity entails.") duration: str = Field(description="The duration of the activity, usually given in hours or days.") language: Optional[str] = Field(description="The primary language in which the activity is conducted.") category: str = Field(description="The category of the activity, such as 'Boat Trip', 'City Tours', etc.") price: float = Field(description="The price of the activity.") currency: str = Field(description="The currency in which the price is denominated, such as USD, EUR, GBP, etc.") class ActivityScrapper(BaseModel): Activities: list[Activity] = Field("List of all the activities listed in the text") Finally, we have the configuration of the LLM. We will use the LangChain library, which provides an excellent toolkit to get started. A key component here is the PydanticOutputParser. Essentially, this will translate our object into instructions, as illustrated in the Prompt, and also parse the output of the LLM to retrieve the corresponding list of objects. Python from langchain.prompts import PromptTemplate from langchain.output_parsers import PydanticOutputParser from langchain_openai import ChatOpenAI from dotenv import load_dotenv load_dotenv() llm = ChatOpenAI(temperature=0) output_parser = PydanticOutputParser(pydantic_object = ActivityScrapper) prompt_template = """ You are an expert making web scrapping and analyzing HTML raw code. If there is no explicit information don't make any assumption. Extract all objects that matched the instructions from the following html {html_text} Provide them in a list, also if there is a next page link remember to add it to the object. Please, follow carefulling the following instructions {format_instructions} """ prompt = PromptTemplate( template=prompt_template, input_variables=["html_text"], partial_variables={"format_instructions": output_parser.get_format_instructions} ) chain = prompt | llm | output_parser The final step is to invoke the chain and retrieve the results. Python url = "https://www.civitatis.com/es/budapest/" html_text_parsed = extract_html_from_url(url) activites = chain.invoke(input={ "html_text": html_text_parsed }) activites.Activities Here is what the data looks like. It takes 46 seconds to scrape the entire webpage. Python [Activity(title='Paseo en barco al anochecer', rating=8.4, reviews_count=9439, travelers_count=118389, cancellation_policy='Cancelación gratuita', description='En este crucero disfrutaréis de las mejores vistas de Budapest cuando se viste de gala, al anochecer. El barco es panorámico y tiene partes descubiertas.', duration='1 hora', language='Español', category='Paseos en barco', price=21.0, currency='€'), Activity(title='Visita guiada por el Parlamento de Budapest', rating=8.8, reviews_count=2647, travelers_count=34872, cancellation_policy='Cancelación gratuita', description='El Parlamento de Budapest es uno de los edificios más bonitos de la capital húngara. Comprobadlo vosotros mismos en este tour en español que incluye la entrada.', duration='2 horas', language='Español', category='Visitas guiadas y free tours', price=27.0, currency='€') ... ] Demo and Full Repository I have created a quick demo using Streamlit available here. In the first part, you are introduced to the model. You can add as many rows as you need and specify the name, type, and description of each attribute. This will automatically generate a Pydantic model to be used in the web scraping component. The next part allows you to enter a URL and scrape all the data by clicking the button on the webpage. A download button will appear when the scraping has finished, allowing you to download the data in JSON format. Feel free to play with it! Conclusion LLM provides new possibilities for efficiently extracting data from non-structured data such as websites, PDFs, etc. The automatization of web scraping by LLM not only will save time but also ensure the quality of the data retrieved. However, sending raw HTML to the LLM could increase the token cost and make it inefficient. Since HTML often includes various tags, attributes, and content, the cost can quickly rise. Therefore, it is crucial to preprocess and clean the HTML, removing all the unnecessary metadata and non-used information. This approach will help use LLM as a data extractor for webs while maintaining a decent cost. The right tool for the right job!

By Nacho Corcuera

Exploring Reactive Programming in Kotlin Coroutines With Spring Boot: A Comparison With WebFlux

Reactive programming has become increasingly popular in modern software development, especially in building scalable and resilient applications. Kotlin, with its expressive syntax and powerful features, has gained traction among developers for building reactive systems. In this article, we’ll delve into reactive programming using Kotlin Coroutines with Spring Boot, comparing it with WebFlux, another choice for reactive programming yet more complex in the Spring ecosystem. Understanding Reactive Programming Reactive programming is a programming paradigm that deals with asynchronous data streams and the propagation of changes. It focuses on processing streams of data and reacting to changes as they occur. Reactive systems are inherently responsive, resilient, and scalable, making them well-suited for building modern applications that need to handle high concurrency and real-time data. Kotlin Coroutines Kotlin Coroutines provides a way to write asynchronous, non-blocking code in a sequential manner, making asynchronous programming easier to understand and maintain. Coroutines allow developers to write asynchronous code in a more imperative style, resembling synchronous code, which can lead to cleaner and more readable code. Kotlin Coroutines vs. WebFlux Spring Boot is a popular framework for building Java and Kotlin-based applications. It provides a powerful and flexible programming model for developing reactive applications. Spring Boot’s support for reactive programming comes in the form of Spring WebFlux, which is built on top of Project Reactor, a reactive library for the JVM. Both Kotlin Coroutines and WebFlux offer solutions for building reactive applications, but they differ in their programming models and APIs. 1. Programming Model Kotlin Coroutines: Kotlin Coroutines use suspend functions and coroutine builders like launch and async to define asynchronous code. Coroutines provide a sequential, imperative style of writing asynchronous code, making it easier to understand and reason about. WebFlux: WebFlux uses a reactive programming model based on the Reactive Streams specification. It provides a set of APIs for working with asynchronous data streams, including Flux and Mono, which represent streams of multiple and single values, respectively. 2. Error Handling Kotlin Coroutines: Error handling in Kotlin Coroutines is done using standard try-catch blocks, making it similar to handling exceptions in synchronous code. WebFlux: WebFlux provides built-in support for error handling through operators like onErrorResume and onErrorReturn, allowing developers to handle errors in a reactive manner. 3. Integration With Spring Boot Kotlin Coroutines: Kotlin Coroutines can be seamlessly integrated with Spring Boot applications using the spring-boot-starter-web dependency and the kotlinx-coroutines-spring library. WebFlux: Spring Boot provides built-in support for WebFlux, allowing developers to easily create reactive RESTful APIs and integrate with other Spring components. Show Me the Code The Power of Reactive Approach Over Imperative Approach The provided code snippets illustrate the implementation of a straightforward scenario using both imperative and reactive paradigms. This scenario involves two stages, each taking 1 second to complete. In the imperative approach, the service responds in 2 seconds as it executes both stages sequentially. Conversely, in the reactive approach, the service responds in 1 second by executing each stage in parallel. However, even in this simple scenario, the reactive solution exhibits some complexity, which could escalate significantly in real-world business scenarios. Here’s the Kotlin code for the base service: Kotlin @Service class HelloService { fun getGreetWord() : Mono<String> = Mono.fromCallable { Thread.sleep(1000) "Hello" } fun formatName(name:String) : Mono<String> = Mono.fromCallable { Thread.sleep(1000) name.replaceFirstChar { it.uppercase() } } } Imperative Solution Kotlin fun greet(name:String) :String { val greet = helloService.getGreetWord().block(); val formattedName = helloService.formatName(name).block(); return "$greet $formattedName" } Reactive Solution Kotlin fun greet(name:String) :Mono<String> { val greet = helloService.getGreetWord().subscribeOn(Schedulers.boundedElastic()) val formattedName = helloService.formatName(name).subscribeOn(Schedulers.boundedElastic()) return greet .zipWith(formattedName) .map { it -> "${it.t1} ${it.t2}" } } In the imperative solution, the greet function awaits the completion of the getGreetWord and formatName methods sequentially before returning the concatenated result. On the other hand, in the reactive solution, the greet function uses reactive programming constructs to execute the tasks concurrently, utilizing the zipWith operator to combine the results once both stages are complete. Simplifying Reactivity With Kotlin Coroutines To simplify the complexity inherent in reactive programming, Kotlin’s coroutines provide an elegant solution. Below is a Kotlin coroutine example demonstrating the same scenario discussed earlier: Kotlin @Service class CoroutineHelloService() { suspend fun getGreetWord(): String { delay(1000) return "Hello" } suspend fun formatName(name: String): String { delay(1000) return name.replaceFirstChar { it.uppercase() } } fun greet(name:String) = runBlocking { val greet = async { getGreetWord() } val formattedName = async { formatName(name) } "${greet.await()} ${formattedName.await()}" } } In the provided code snippet, we leverage Kotlin coroutines to simplify reactive programming complexities. The HelloServiceCoroutine class defines suspend functions getGreetWord and formatName, which simulates asynchronous operations using delay. The greetCoroutine function demonstrates an imperative solution using coroutines. Within a runBlocking coroutine builder, it invokes suspend functions sequentially to retrieve the greeting word and format the name, finally combining them into a single greeting string. Conclusion In this exploration, we compared reactive programming in Kotlin Coroutines with Spring Boot to WebFlux. Kotlin Coroutines offer a simpler, more sequential approach, while WebFlux, based on Reactive Streams, provides a comprehensive set of APIs with a steeper learning curve. Code examples demonstrated how reactive solutions outperform imperative ones by leveraging parallel execution. Kotlin Coroutines emerged as a concise alternative, seamlessly integrated with Spring Boot, simplifying reactive programming complexities. In summary, Kotlin Coroutines excels in simplicity and integration, making them a compelling choice for developers aiming to streamline reactive programming in Spring Boot applications.

By Dursun Koç

CORE

How To Optimize the Salesforce CRM Analytics Dashboards Using SAQL

Salesforce Analytics Query Language (SAQL) is a Salesforce proprietary query language designed for analyzing Salesforce native objects and CRM Analytics datasets. SAQL enables developers to query, transform, and project data to facilitate business insights by customizing the CRM dashboards. SAQL is very similar to SQL (Structured Query Language); however, it is designed to explore data within Salesforce and has its own unique syntax which is somewhat like Pig Latin (pig-ql). You can also use SAQL to implement complex logic while preparing datasets using dataflows and recipes. Key Features Key features of SAQL include the following: It enables users to specify filter conditions, and group and summarize input data streams to create aggregated values to derive actionable insights and analyze trends. SAQL supports conditional statements such as IF-THEN-ELSE and CASE. This feature can be used to execute complex conditions for data filtering and transformation. SAQL DATE and TIME-related functions make it much easier to work with date and time attributes, allowing users to execute time-based analysis, like comparing the data over various time intervals. Supports a variety of data transformation functions to cleanse, format, and typecast data to alter the structure of data to suit the requirements SAQL enables you to create complex calculated fields using existing data fields by applying mathematical, logical, or string functions. SAQL provides seamless integration with the Salesforce objects and CRM Analytics datasets. SAQL queries can be used to design visuals like charts, graphs, and dashboards within the Salesforce CRM Analytics platform. The rest of this article will focus on explaining the fundamentals of writing the SAQL queries, and delve into a few use cases where you can use SAQL to analyze the Salesforce data. Basics of SAQL Typical SAQL queries work like any other ETL tool: queries load the datasets, perform operations/transformations, and create an output data stream to be used in visualization. SAQL statements can run into multiple lines and are concluded with a semicolon. Every line of the query works on a named stream, which can serve as input for any subsequent statements in the same query. The following SAQL query can be used to create a data stream to analyze the opportunities booked in the previous year by month. SQL 1. q = load "OpportunityLineItems"; 2. q = filter q by 'StageName' == "6 - Closed Won" and date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["1 year ago".."1 year ago"]; 3. q = group q by ('CloseDate_Year', 'CloseDate_Month'); 4. q = foreach q generate q.'CloseDate_Year' as 'CloseDate_Year', q.'CloseDate_Month' as 'CloseDate_Month', sum(q.'ExpectedTotal__c') as 'Bookings'; 5. q = order q by ('CloseDate_Year' asc, 'CloseDate_Month' asc); 6. q = limit q 2000; Line Number Description 1 This statement loads the CRM analytics dataset named “OpportunityLineItems” into an input stream q. 2 The input stream q is filtered to look for the opportunities closed won in the previous year. This is similar to the WHERE clause in SQL. 3 The statement focuses on grouping the records by the close date year and month so that we can visualize this data by the months. This is similar to the GROUP BY clause in SQL. 4 Statement 4 is selecting the attributes we want to project from the input stream. Here the expected total is being summed up for each group. 5 Statement 5 is ordering the records by the close of the year and month so that we can create a line chart to visualize this by month. 6 The last statement in the code above focuses on restricting the stream to a limited number of rows. This is mainly used for debugging purposes. Joining Multiple Data Streams The SAQL cogroup function joins input data streams like Salesforce objects or CRM analytics datasets. The data sources being joined should have a related column to facilitate the join. cogroup also supports the execution of both INNER and OUTER joins. For example, if you had two datasets, with one containing sales data and another containing customer data, you could use cogroup to join them based on a common field like customer ID. The resultant data stream contains both fields from both tables. Use Case The following code block can be used for a data stream for NewPipeline and Bookings for the customers. The pipeline built and bookings are coming from two different streams. We can join these two streams by Account Name. SQL q = load "Pipeline_Metric"; q = filter q by 'Source' in ["NewPipeline"]; q = group q by 'AccountName'; q = foreach q generate q.'AccountName' as 'AccountName', sum(ExpectedTotal__c) as 'NewPipeline'; q1 = load "Bookings_Metric"; q1 = filter q1 by 'Source' in ["Bookings"]; q1 = group q1 by 'AccountName'; q1 = foreach q1 generate q1.'AccountName' as 'AccountName', sum(q1.ExpectedTotal__c) as 'Bookings'; q2 = cogroup q by 'AccountName', q1 by 'AccountName'; result = foreach q2 generate q.'AccountName' as 'AccountName', sum(q.'NewPipeline') as 'NewPipeline',sum(q1.'Bookings') as 'Bookings'; You can also use a left outer cogroup to join the right data table with the left. This will result in all the records from the left data stream and all the matching records from the right stream. Use the coalesce function to replace all the null values from the right stream with another value. In the example above, if you want to report all the accounts with or without bookings, you can use the query below. SQL q = load "Pipeline_Metric"; q = filter q by 'Source' in ["NewPipeline"]; q = group q by 'AccountName'; q = foreach q generate q.'AccountName' as 'AccountName', sum(ExpectedTotal__c) as 'NewPipeline'; q1 = load "Bookings_Metric"; q1 = filter q1 by 'Source' in ["Bookings"]; q1 = group q1 by 'AccountName'; q1 = foreach q1 generate q1.'AccountName' as 'AccountName', sum(q1.ExpectedTotal__c) as 'Bookings'; q2 = cogroup q by 'AccountName' left, q1 by 'AccountName'; result = foreach q2 generate q.'AccountName' as 'AccountName', sum(q.'NewPipeline') as 'NewPipeline', coalesce(sum(q1.'Bookings'), 0) as 'Bookings'; Top N Analysis Using Windowing SAQL enables Top N analysis across value groups using the windowing functions within the input data stream. These functionalities are utilized for deriving the moving averages, cumulative totals, and rankings within the groups. You can specify the set of records where you want to execute these calculations using the “over” keyword. SAQL allows you to specify an offset to identify the number of records before and after the selected row. Optionally you can choose to work on all the records within a partition. These records are called windows. Once the set of records is identified for a window, you can apply an aggregation function to all the records within the defined window. Optionally you can create partitions to group the records based on a set of fields and perform aggregate calculations for each partition independently. Use Case The following SAQL code can be used to prepare data for the percentage contribution of new pipelines for each customer to the total pipeline by the region and the ranking of these customers by the region. SQL q = load "Pipeline_Metric"; q = filter q by 'Source' in ["NewPipeline"]; q = group q by ('Region','AccountName'); q = foreach q generate q.'Region' as 'Region',q.'AccountName' as 'AccountName', ((sum('ExpectedTotal__c')/sum(sum('ExpectedTotal__c')) over ([..] partition by 'Region')) * 100) as 'PCT_PipelineContribution', rank() over ([..] partition by ('Region') order by sum('ExpectedTotal__c') desc ) as 'Rank'; q = filter q by 'Rank' <=5; Data Aggregation: Grand Totals and Subtotals With SAQL SAQL offers rollup and grouping functions to aggregate the data streams based on pre-defined groups. While the rollup construct is used with the group by statement, grouping is used as part of foreach statements while projecting the input data stream. The rollup function aggregates the input data stream at various levels of hierarchy allowing you to create calculated fields on summarized datasets at higher levels of granularity. For example, in case you have datasets by the day, rollup can be used to aggregate the results by week, month, or year. The grouping function is used to group data based on specific dimensions or fields in order to segment the data into meaningful subsets for analysis. For example, you might group sales data by product category or region to analyze performance within each group. Use Case Use the code below to prepare data for the total number of accounts and accounts engaged by the region and theater. Also, add the grand total to look at the global numbers and subtotals for both regions and theaters. SQL q = load "ABXLeadandOpportunities_Metric"; q = filter q by 'Source' == "ABX Opportunities" and 'CampaignType' == "Growth Sprints" and 'Territory_Level_01__c' is not null; q = foreach q generate 'Territory_Level_01__c' as 'Territory_Level_01__c','Territory_Level_02__c' as 'Territory_Level_02__c','Territory_Level_03__c' as 'Territory_Level_03__c', q.'AccountName' as 'AccountName',q.'OId' as 'OId','MarketingActionedOppty' as 'MarketingActionedOppty','AccountActionedAcct' as 'AccountActionedAcct','ADRActionedOppty' as 'ADRActionedOppty','AccountActionedADRAcct' as 'AccountActionedADRAcct'; q = group q by rollup ('Territory_Level_01__c', 'Territory_Level_02__c'); q = foreach q generate case when grouping('Territory_Level_01__c') == 1 then "TOTAL" else 'Territory_Level_01__c' end as 'Level1', case when grouping('Territory_Level_02__c') == 1 then "LEVEL1 TOTAL" else 'Territory_Level_02__c' end as 'Level2', unique('AccountName') as 'Total Accounts',unique('AccountActionedAcct') as 'Engaged',((unique('AccountActionedAcct') / unique('AccountName'))) as '% of Engaged'; q = limit q 2000; Filling the Missing Date Fields You can use the fill() function to create a record for missing date, week, month, quarter, and year records in your dataset. This comes very handy when you want to show the result as 0 for these missing days/weeks/months instead of not showing them at all. Use Case The following SAQL code allows you to track the number of tasks for the sales agents by the days of the week. In case the agents are on PTO you want to show 0 tasks. SQL q = load "Tasks_Metric"; q = filter q by 'Source' == "Tasks"; q = filter q by date('MetricDate_Year', 'MetricDate_Month', 'MetricDate_Day') in [dateRange([2024,4,23], [2024,4,30])]; q = group q by ('MetricDate_Year', 'MetricDate_Month', 'MetricDate_Day'); q = foreach q generate q.'MetricDate_Year' as 'MetricDate_Year', q.'MetricDate_Month' as 'MetricDate_Month', q.'MetricDate_Day' as 'MetricDate_Day', unique(q.'Id') as 'Tasks'; q = order q by ('MetricDate_Year' asc, 'MetricDate_Month' asc, 'MetricDate_Day' asc); q = limit q 2000; The code above will be missing two days where there were no tasks created. You can use the code below to fill in the missing days. SQL q = load "Tasks_Metric"; q = filter q by 'Source' == "Tasks"; q = filter q by date('MetricDate_Year', 'MetricDate_Month', 'MetricDate_Day') in [dateRange([2024,4,23], [2024,4,30])]; q = group q by ('MetricDate_Year', 'MetricDate_Month', 'MetricDate_Day'); q = foreach q generate q.'MetricDate_Year' as 'MetricDate_Year', q.'MetricDate_Month' as 'MetricDate_Month', q.'MetricDate_Day' as 'MetricDate_Day', unique(q.'Id') as 'Tasks'; q = fill q by (dateCols=(MetricDate_Year, MetricDate_Month, MetricDate_Day, "Y-M-D")); q = order q by ('MetricDate_Year' asc, 'MetricDate_Month' asc, 'MetricDate_Day' asc); q = limit q 2000; You can also specify the start date and end date to populate the missing records between these dates. Conclusion In the end, SAQL has proven itself as a powerful tool for the Salesforce developer community, empowering them to extract actionable business insights from the CRM datasets using capabilities like filtering, aggregation, windowing, time-analysis, blending, custom calculation, Salesforce integration, and performance optimization. In this article, we have explored various capabilities of this technology and focused on targeted use cases. As a next step, I would recommend continuing your learnings by exploring Salesforce documentation, building your data models using dataflow, and using SAQL capabilities to harness the true potential of Salesforce as a CRM.

By Kapil Kumar Sharma

How To Integrate Weather Tracking Into Web Applications With PHP

Weather tracking is a common requirement for many web applications, ranging from personal projects to commercial services. PHP, a server-side scripting language, can be a powerful tool for retrieving and displaying weather information on websites. In this beginner's guide, we'll explore how you can use PHP to track weather data and integrate it into your web applications. Getting Started With Weather APIs To track Weather using PHP, we'll need access to weather data from a reliable source. Fortunately, there are several weather APIs available that provide developers with access to real-time and forecast weather information. One popular weather API is OpenWeatherMap. It offers a wide range of weather data, including current weather conditions, forecasts, and historical weather data. To get started, you'll need to sign up for an API key, which you can obtain by registering on the OpenWeatherMap website. Retrieving Weather Data With PHP Once you have obtained your API key, you can use PHP to retrieve weather data from the OpenWeatherMap API. Here's a simple example of how you can make a request to the API and display the current weather conditions: PHP <?php // Replace 'YOUR_API_KEY' with your actual OpenWeatherMap API key $apiKey = 'YOUR_API_KEY'; // City and country code for the location you want to retrieve weather data for $city = 'London'; $countryCode = 'UK'; // API endpoint URL $url = "http://api.openweathermap.org/data/2.5/weather?q={$city},{$countryCode}&appid={$apiKey}"; // Make a request to the API $response = file_get_contents($url); // Decode the JSON response $data = json_decode($response, true); // Check if the request was successful if ($data && $data['cod'] === 200) { // Extract relevant weather information $weatherDescription = $data['weather'][0]['description']; $temperature = round($data['main']['temp'] - 273.15, 1); // Convert temperature from Kelvin to Celsius // Display weather information echo "<h2>Current Weather in {$city}, {$countryCode}</h2>"; echo "<p><strong>Temperature:</strong> {$temperature} °C</p>"; echo "<p><strong>Description:</strong> {$weatherDescription}</p>"; } else { // Display error message if request fails echo 'Failed to retrieve weather data.'; } ?> In this example, we construct a URL with the desired city and country code, along with our API key. We then make a request to the OpenWeatherMap API using file_get_contents() and decode the JSON response using json_decode(). Finally, we extract relevant weather information from the response and display it on the webpage. Enhancements and Considerations Error handling: It's important to implement error handling to gracefully handle situations where the API request fails or returns unexpected data. Caching: Consider implementing caching mechanisms to reduce the number of API requests and improve performance. Display formatting: You can enhance the display of weather information by incorporating CSS styling and additional details such as wind speed, humidity, and atmospheric pressure. Localization: Make your weather-tracking application accessible to users worldwide by supporting multiple languages and units of measurement. Security: Keep your API key secure by avoiding hardcoding it directly into your PHP files. Consider using environment variables or configuration files to store sensitive information. Conclusion Tracking weather using PHP can be a valuable addition to your web applications, providing users with up-to-date weather information for their desired locations. By leveraging weather APIs such as OpenWeatherMap and incorporating PHP to retrieve and display weather data, you can create dynamic and engaging experiences for your website visitors. With the foundational knowledge provided in this guide, you can explore further customization and integration possibilities to meet the specific needs of your projects.

By Santosh Sahu

Languages

DZone's Featured Languages Resources

Top Languages Experts

The Latest Languages Topics