Approach to Generate Case Classes from a Complex Nested JSON Schema in Spark

I have worked on big data projects that run ETL (Extract-Transform-Load) jobs from an Oracle RDBMS into the big data ecosystem (Spark, HDFS, and Hive). Doing ETL for many clients isn't an easy task: you have to convert a lot of Oracle ETL queries into Spark queries, and you need to translate both built-in and custom functions, often writing UDFs to handle them. For all this work we need a framework to maintain the code, run unit test cases, and script the validation.

In that framework, the case class plays an important role during extraction and validation, especially when handling Parquet files. A case class is the metadata for the tables or data files we are dealing with: usually it contains the field (column) names and their data types, and the case class makes Spark understand the types of the data it reads.
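As a minimal sketch of this idea, here is how a case class can act as table metadata when reading a Parquet extract into a typed Dataset. The `Customer` class, its fields, and the file path are all hypothetical, assumed only for illustration:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical metadata for an extracted Oracle table: each field name
// and Scala type mirrors a column name and its data type.
case class Customer(customerId: Long, name: String, balance: Double)

object CaseClassMetadataExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("case-class-metadata")
      .master("local[*]") // assumption: local run, just for the sketch
      .getOrCreate()
    import spark.implicits._ // brings in Encoders for case classes

    // Reading a Parquet extract as a typed Dataset: Spark checks the
    // file's columns against the case class fields at analysis time,
    // so a missing column or mismatched type fails early.
    val customers: Dataset[Customer] =
      spark.read.parquet("/data/extracts/customer.parquet").as[Customer]

    customers.printSchema()
    spark.stop()
  }
}
```

With the case class in place, validation code can work against `Dataset[Customer]` instead of an untyped `DataFrame`, which is what makes it useful as shared metadata across extraction and unit tests.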