Posts

Approach to generate case Classes from the Complex Nested JSON Schema in Spark

I worked on big data projects doing ETL (Extract-Transform-Load) jobs from Oracle RDBMS into the big data ecosystem (Spark, HDFS, and Hive). Doing ETL for lots of clients isn't an easy task: you have to convert many Oracle ETL queries to Spark queries, and you need to handle both built-in and custom functions, often writing UDFs for them. To do all this work we need a framework to maintain code, run unit test cases, and script validations. In that framework, the case class plays an important role during extraction and validation, especially when handling Parquet files. A case class is the metadata for the tables or data files we are dealing with. Usually it contains the field (column) names and their datatypes; the case class lets Spark understand the t...
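The role described above can be sketched with a minimal Scala example; the table and field names here are hypothetical, not from the post:

```scala
// Hypothetical case class serving as metadata for an EMPLOYEES table:
// each field mirrors a column name and its datatype.
case class Employee(empId: Long, name: String, salary: Double)

object SchemaSketch {
  def main(args: Array[String]): Unit = {
    // With a SparkSession in scope, the case class would give a typed view
    // of a Parquet file, e.g. (sketch only, path is illustrative):
    //   import spark.implicits._
    //   val ds = spark.read.parquet("/data/employees").as[Employee]
    val row = Employee(1L, "Ada", 95000.0)
    println(s"${row.empId} ${row.name} ${row.salary}")
  }
}
```

The point is that the field names and types in the case class must match the columns in the underlying file, so Spark can map records onto it.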

Processing Large JSON dataset with Spark SQL with better performance and optimization