Gautam Goswami

CORE

Founder at DataView

Bangalore, IN

Joined Sep 2020

https://dataview.in/

About

Enthusiastic about learning and sharing knowledge on Big Data, Data Science, and related advances, including data streaming platforms, through the knowledge-sharing platform Dataview.in. Presently serving as Head of Engineering & Data Streaming at Irisidea TechSolutions, Bangalore, India. https://www.irisidea.com/gautam-goswami/

Stats

Reputation:	1299
Pageviews:	248.1K
Articles:	32
Comments:	2

Expertise

Big Data

Articles

Partitioning Hot and Cold Data Tier in Apache Kafka Cluster for Optimal Performance

Discover how partitioning the hot and cold data tiers in an Apache Kafka cluster lets us optimize storage resources based on data characteristics.

June 28, 2024 · 5,149 Views · 2 Likes

Real-Time Data Transfer from Apache Flink to Kafka to Druid for Analysis/Decision-Making

This article outlines the steps to transfer processed data from Flink 1.18.1 to a topic on Kafka 2.13-3.7.0 (Kafka 3.7.0 built for Scala 2.13).

April 15, 2024 · 2,032 Views · 2 Likes

Streaming Real-Time Data From Kafka 3.7.0 to Flink 1.18.1 for Processing

Flink seamlessly integrates with Kafka and offers robust support for exactly-once semantics, ensuring each event is processed precisely once. Learn more here.

March 10, 2024 · 10,516 Views · 2 Likes

Why Apache Kafka and Apache Flink Work Well Together to Boost Real-Time Data Analytics

Use Flink and Kafka to create reliable, scalable, low-latency real-time data processing pipelines with fault tolerance and exactly-once processing guarantees.

February 13, 2024 · 4,091 Views · 1 Like

Integrating Rate-Limiting and Backpressure Strategies Synergistically To Handle and Alleviate Consumer Lag in Apache Kafka

Kafka consumer lag is the difference between the most recent message in a Kafka topic and the last message processed by a consumer. This lag arises when the consumer cannot keep pace with the rate at which new messages are produced and appended to the topic.

January 23, 2024 · 2,373 Views · 2 Likes

Leveraging Apache Kafka for the Distribution of Large Messages

In this article, we will explore the architectural approach for separating the actual payload (the large video file) from the message intended to be circulated via Kafka.

December 19, 2023 · 3,838 Views · 3 Likes

The Zero Copy Principle With Apache Kafka

In computing, the zero-copy technique prevents the CPU from being used to copy data between memory regions.

November 17, 2023 · 2,664 Views · 1 Like

Understanding Supervisor in Apache Druid

A supervisor is a built-in part of Druid that makes it easier to ingest, analyze, and monitor data in real time. Learn more!

October 16, 2023 · 2,595 Views · 3 Likes

Causes and Remedies of Poison Pill in Apache Kafka

A poison pill is a message in a Kafka topic that consistently fails when consumed, regardless of the number of consumption attempts.

September 25, 2023 · 3,936 Views · 3 Likes

Apache Kafka’s Built-In Command Line Tools

I want to highlight the five scripts/tools that I believe will have the biggest influence on your development work, mostly related to real-time data stream processing.

August 21, 2023 · 2,345 Views · 2 Likes

The Significance of Deep Storage in Apache Druid

Druid’s deep storage guarantees long-term data persistence, even if data is deleted from the live cluster after compaction.

July 7, 2023 · 3,176 Views · 2 Likes

Forging Druid With Apache Kafka for Real-Time Streaming Analytics

Apache Druid, a real-time analytics database, can be leveraged very effectively where real-time ingestion, fast query performance, and high uptime are crucial.

June 16, 2023 · 4,393 Views · 1 Like

Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, we should first carefully examine the replication process in the Kafka broker.

June 1, 2023 · 3,776 Views · 1 Like

Handling Bad Messages via DLQ by Configuring JDBC Kafka Sink Connector

When the JDBC Kafka sink connector hits an error or encounters bad data, the unprocessed messages are forwarded to the DLQ (dead letter queue).

April 11, 2023 · 4,801 Views · 1 Like

Streaming Data to RDBMS via Kafka JDBC Sink Connector Without Leveraging Schema Registry

This article covers the biggest difficulty with the JDBC sink connector: it requires knowledge of the schema of data that has already landed on the Kafka topic.

February 22, 2023 · 7,021 Views · 2 Likes

Intrinsic Aspects of Apache ZooKeeper and Their Importance

This article explores ZNodes, sessions, watches, quorum, transactions, and local storage and snapshots, all aspects of Apache ZooKeeper.

January 23, 2023 · 1,704 Views · 1 Like

Internal Components of Apache ZooKeeper and Their Importance

In this article, readers will learn about the internal components of Apache ZooKeeper. The key concept is the zNode, which can act as a file or a directory.

January 20, 2023 · 4,520 Views · 2 Likes

Resolve Apache Kafka Starting Issue Installed on Single/Multi-Node Cluster

Without integrating Apache ZooKeeper, Kafka alone won’t be able to form the complete Kafka cluster.

January 12, 2023 · 3,411 Views · 1 Like

Processing of Streaming Data: Kappa vs Lambda Architectures

In today’s Big Data landscape, Lambda architecture is a new archetype for handling vast amounts of data. How does it compare to Kappa architecture?

August 19, 2022 · 5,732 Views · 1 Like

The Lakehouse: An Uplift of Data Warehouse Architecture

This article highlights how the traditional data warehouse architecture pattern has been enhanced and transformed, eventually evolving into the data lakehouse.

April 5, 2022 · 5,336 Views · 5 Likes

A Short Introduction to Apache Iceberg

This tutorial shows how to use Apache Iceberg to address data consistency and performance issues. Read on to see how it can help you!

August 20, 2021 · 8,841 Views · 3 Likes

Confluent’s Kafka REST Proxy, The Silk Route for Data Movement to Operational Kafka Cluster

In this article, I detail the steps to integrate the prebuilt version of Confluent REST Proxy with a running multi-broker Apache Kafka cluster.

June 13, 2021 · 18,432 Views · 3 Likes

Resolving Permission Issue in Multi-node Hadoop Cluster

When we configure and deploy a multi-node Hadoop cluster or add new DataNodes, an SSH permission issue often arises in communication between the Hadoop daemons.

April 22, 2021 · 6,866 Views · 2 Likes

Data Ingestion From RDBMS: Leveraging Confluent's JDBC Kafka Connector

Kafka Connect plays a significant role in streaming data between Apache Kafka and other data systems. This article covers importing data from a database into a Kafka topic.

April 17, 2021 · 6,997 Views · 3 Likes

How Checksum Smartly Manages Data Integrity in HDFS

In short, data integrity can be defined as an assurance of the accuracy and consistency of data throughout its entire life cycle.

March 16, 2021 · 5,457 Views · 2 Likes

Resolving a Common Error in Apache Zookeeper

Explains how to resolve “Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain” when starting Apache ZooKeeper.

Updated January 28, 2021 · 10,739 Views · 4 Likes

Streaming Data From Files Into Multi-Broker Kafka Clusters

FileSource and FileSink connectors can be leveraged to stream data from a text file into a topic on a multi-broker Apache Kafka cluster and subsequently sink it to another file.

January 16, 2021 · 9,221 Views · 3 Likes

Coupling Schema Registry (Confluent) With Multi-Broker Apache Kafka Cluster

We explain the steps to couple Confluent Schema Registry with an existing, operational multi-broker Apache Kafka cluster (local deployment).

December 15, 2020 · 4,677 Views · 1 Like

Install and Configuration of Apache Hive-3.1.2 on Multi-Node

Apache Hive is a data warehouse system built on top of Apache Hadoop. Hive can be used for easy data summarization, and more!

December 2, 2020 · 14,921 Views · 2 Likes

Setup Zookeeper Cluster – A Minute Chore

Apache ZooKeeper’s functionality is not directly visible to end clients, yet it serves as the backbone that coordinates popular components like Hadoop.

November 24, 2020 · 10,077 Views · 3 Likes

Comments

Apache Kafka in a Smart City Architecture

Mar 15, 2021 · Kai Wähner

Nice read.

Install and Configure Confluent Platform (Kafka) in AWS EC2 Instance RHEL 8

Dec 01, 2020 · Enrico Rafols Dela Cruz

Nicely explained.