Your absolute guide to managing Hadoop Logging Configurations

Hadoop has become an essential component of many companies' infrastructure. Different distributions are maintained and managed by different companies, such as Cloudera, Databricks and AWS. The distribution managed by AWS is named EMR. It is supposedly fully managed by AWS (though not everything is). One of the things that…

Using Spark For Data Exploration

Spark is actively supported by the Apache open source community, and it is used in production by many well-known companies. In this blog, the focus will be on productionizing Apache Spark. I will discuss the use cases of Spark and how to enable each of them in a production environment.…

Productionizing Apache Spark (Data Pipelines)

Apache Spark in Production (for Data Pipelines): This is the second post about running Spark in production; you can read the first post here. In the first post, we talked briefly about Spark, then discussed the data exploration use case and compared the different tools available. In…

Hive, a must-know tool for any data engineer

Hive is a data warehouse system built on top of Hadoop for querying and managing data sets. Who? Hive was created by Facebook and is now widely adopted by many firms, including Netflix, Facebook and Bookings. Why? Not everyone is fond of writing Java programs for every problem…

Simple Redis: A Simple Interface For Using Redis

In my last post, I talked briefly about Redis and how to install it. In this post, I will go deeper and introduce a very simple interface for using Redis from Java in seconds. First, I would like to introduce you to some important commands in…

Redis: Installation and configuration

Redis is a popular caching layer and in-memory database that is used in many large-scale projects. Redis is used by Twitter, GitHub, Pinterest, Snapchat, StackOverflow and Flickr. It supports data structures such as strings, hashes, lists, sets, sorted sets, bitmaps and geospatial indexes with radius queries. Some common…

Intro to Hadoop and HDInsight in Microsoft Azure

Hi, in this blog post, I will try to give you some info about Hadoop and Microsoft's distribution of Hadoop, which is called HDInsight. Hadoop is one of the most famous NoSQL and big data solutions. Hadoop is already used by big entities like Facebook, Twitter, Yahoo and many…