Your absolute guide to managing Hadoop Logging Configurations

Hadoop has become an essential component of many companies' infrastructure. Different distributions are maintained and managed by different companies, such as Cloudera, Databricks and AWS. The distribution managed by AWS is named EMR. This distribution is supposedly fully managed by AWS (though not everything is). One of the things that…

Using Spark For Data Exploration

Spark is actively supported by the Apache open source community, and it is used in production by many well-known companies. In this blog, the focus is on productionizing Apache Spark. I will discuss the use cases of Spark and how to enable each of them in a production environment.…

Productionizing Apache Spark (Data Pipelines)

Apache Spark On Production (for Data Pipelines). This is the second post about running Spark on production; you can read the first post from here. In the first post, we talked briefly about Spark, then discussed the data exploration use case and compared the different tools available. In…