- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce on EC2.
- Involved in the complete Software Development Life Cycle (SDLC) to develop the application.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Integrated Hive with Tableau Desktop reports and published them to Tableau Server.
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables in them.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (a sketch follows this list).
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
- Developed Spark scripts to import large files from Amazon S3 buckets (sketched below).
- Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables (sketched below).
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Involved in performance tuning of Hive from the design, storage, and query perspectives.
- Ran Hive scripts through Hive, Impala, Hive on Spark, and some through Spark SQL.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (sketched below).
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Developed a Spark API to import data into HDFS from Teradata and created Hive tables on top of it (sketched below).
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing that data in HDFS.
- Experience with different compression techniques such as Gzip, LZO, Snappy, and Bzip2.
- Experienced in working with different file formats: Avro, Parquet, RC, and ORC.
- Experience with Azure components such as Azure SQL Database and Data Factory.
- Experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Experience in scheduling jobs using the Oozie coordinator, Oozie bundles, and crontab.
- Rich experience in automating Sqoop and Hive queries using Oozie workflows.
- Experience in creating DStreams from sources such as Flume and Kafka and performing different Spark transformations and actions on them (sketched below).
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Experience in designing tables and views for reporting using Impala.
- Experience in developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
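A minimal sketch of the Kafka consumer work described above, using the plain Kafka clients API from Scala. The broker address, consumer group, and topic name are placeholders rather than details from the original project:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("group.id", "event-consumer-group")    // placeholder group id
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "earliest")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events")) // placeholder topic

    try {
      while (true) {
        // Poll the topic and process each record in the returned batch.
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala)
          println(s"offset=${record.offset} key=${record.key} value=${record.value}")
      }
    } finally consumer.close()
  }
}
```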
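The S3 import bullet is essentially one read and one write in Spark once the `s3a://` connector is on the classpath; the bucket, prefix, and HDFS target below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object S3Import {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("S3Import").getOrCreate()

    // Hypothetical bucket and prefix; credentials are assumed to come from an
    // instance profile or the usual fs.s3a.* Hadoop configuration.
    val df = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/raw/2019/*.csv")

    // Land the data in HDFS as Parquet for downstream Hive processing.
    df.write.mode("overwrite").parquet("hdfs:///data/raw/2019")

    spark.stop()
  }
}
```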
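One way the JSON insert-and-update bullet can work in practice is the overwrite-and-deduplicate pattern, since plain Hive tables have no in-place UPDATE. The landing path, table, and column names (`events_hive`, `event_id`, `ts`) are assumptions for the sketch, not the original schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object JsonToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical landing path where the HTTP source drops JSON files.
    val incoming = spark.read.json("hdfs:///landing/http_events/")
      .select("event_id", "payload", "ts") // hypothetical columns
    val existing = spark.table("events_hive")

    // Union old and new rows and keep the newest row per key: this emulates
    // insert-plus-update for a table format without in-place updates.
    val newestPerKey = Window.partitionBy("event_id").orderBy(col("ts").desc)
    val merged = existing.unionByName(incoming)
      .withColumn("rn", row_number().over(newestPerKey))
      .where("rn = 1")
      .drop("rn")

    // Stage the merged result first so we never overwrite a table while
    // reading from it, then swap it into the target table.
    merged.write.mode("overwrite").saveAsTable("events_hive_staging")
    spark.table("events_hive_staging")
      .write.mode("overwrite").insertInto("events_hive")

    spark.stop()
  }
}
```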
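For the partitioned Parquet tables, the sketch below uses Spark SQL with Hive support, matching the Hive-on-Spark and Spark SQL bullets; table and column names are hypothetical, and only the partitioning-plus-Snappy part is shown. Bucketing would be added with a `CLUSTERED BY ... INTO n BUCKETS` clause in the DDL:

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvroToParquet")
      .enableHiveSupport()
      .getOrCreate()

    // Snappy is commonly the default Parquet codec; set it explicitly anyway.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
    // Allow writing all partitions of the target in one dynamic insert.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical schema; the source Avro table is assumed to exist already.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_parquet (
        |  order_id BIGINT,
        |  amount   DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic partition column goes last in the SELECT list.
    spark.sql(
      """INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
        |SELECT order_id, amount, order_date FROM sales_avro""".stripMargin)

    spark.stop()
  }
}
```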
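The Teradata-to-HDFS import can be done with Spark's generic JDBC reader; the host, database, table, and credential handling below are placeholders, and the Teradata JDBC driver jar is assumed to be on the driver and executor classpaths:

```scala
import org.apache.spark.sql.SparkSession

object TeradataImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TeradataImport")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder connection details; the password is read from the
    // environment rather than hard-coded.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=sales")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
      .load()

    // saveAsTable registers a Hive table whose files live in HDFS.
    orders.write.mode("overwrite").saveAsTable("staging.orders")

    spark.stop()
  }
}
```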
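Finally, the DStream bullet maps onto the spark-streaming-kafka-0-10 direct stream API; the brokers, group id, topic, and batch interval below are illustrative only:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaDStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaDStream")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092", // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "stream-group",   // placeholder group id
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // A transformation chain (map, filter) followed by an action (print)
    // that runs once per batch interval.
    stream.map(_.value)
      .filter(_.nonEmpty)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```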