
Job Summary

The Big Data Engineer is responsible for designing, developing, and maintaining large-scale data processing systems. The role includes building efficient data pipelines, working with distributed technologies, and ensuring data availability, reliability, and performance for analytics and business use cases.

Key Responsibilities

1. Data Pipeline Development

Build scalable ETL/ELT pipelines for ingesting and transforming large datasets.

Develop batch and real-time data processing solutions using Apache Spark, Kafka, Hive, Flink, or similar tools.

Optimize data workflows for performance and cost.
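The extract–transform–load shape described above can be sketched in plain Python. This is an illustrative toy only (a real pipeline would ingest from Kafka, S3, or HDFS and run on Spark or Flink); the record layout and the in-memory sink are assumptions made for the example.

```python
import csv
import io

# Hypothetical raw events; a real pipeline would ingest from Kafka, S3, or HDFS.
RAW = """user_id,amount
1,10.5
2,abc
1,4.5
"""

def extract(text):
    """Parse CSV text into dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop malformed records, cast types, and aggregate spend per user."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # would route to a dead-letter queue in production
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + amount
    return totals

def load(totals, sink):
    """Write results to a sink (here, an in-memory dict)."""
    sink.update(totals)

sink = {}
load(transform(extract(RAW)), sink)
print(sink)  # {'1': 15.0} -- the malformed record for user 2 is dropped
```

Keeping each stage a separate function mirrors how batch ETL jobs are typically structured and tested in isolation.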

2. Big Data Ecosystem Ownership

Work with Hadoop ecosystem components (HDFS, Hive, HBase, Oozie, MapReduce).

Manage and maintain data lakes, data warehouses, and big data clusters.

Implement partitioning, schema design, and optimized storage formats (Parquet, ORC).
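The partitioning mentioned above usually means Hive-style `key=value` directory layouts, which let engines like Spark and Hive skip entire partitions when a query filters on the partition keys. A minimal sketch of such a layout helper (the `dt`/`region` keys and base path are hypothetical):

```python
from pathlib import PurePosixPath

def partition_path(base, dt, region, part=0):
    """Build a Hive-style partitioned file path: .../dt=YYYY-MM-DD/region=XX/part-00000.parquet.
    Engines prune directories whose key=value segments fail a query's filter."""
    return PurePosixPath(base) / f"dt={dt}" / f"region={region}" / f"part-{part:05d}.parquet"

print(partition_path("/data/events", "2024-01-01", "eu"))
# /data/events/dt=2024-01-01/region=eu/part-00000.parquet
```

Columnar formats such as Parquet and ORC pair well with this layout because each file also carries min/max statistics that enable further row-group skipping.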

3. Cloud Data Engineering

Develop data solutions on cloud platforms (AWS, Azure, GCP).

Use cloud-native big data services:

AWS: EMR, Glue, S3, Kinesis, Redshift

Azure: Databricks, ADLS, Synapse

GCP: Dataflow, BigQuery, Dataproc

4. Data Governance & Security
Ensure compliance with security policies and governance frameworks.

Maintain documentation, metadata, and data lineage.

5. Collaboration

Work closely with data scientists, BI engineers, software engineers, and product owners.

Translate business requirements into scalable technical solutions.

Support analytics and machine learning workloads with high-quality datasets.

6. Performance Optimization

Tune Spark jobs, SQL queries, and cluster configurations.

Optimize data storage, compression, and file organization for faster processing.
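Cluster and job tuning of the kind described above is often expressed as Spark configuration overrides. The config keys below are real Spark settings; the values are situational examples, not recommendations:

```python
# Illustrative Spark tuning knobs (real config keys; example values only).
TUNING = {
    "spark.sql.shuffle.partitions": "400",            # shuffle parallelism sized to the data
    "spark.sql.adaptive.enabled": "true",             # AQE: coalesce partitions, mitigate skew
    "spark.executor.memory": "8g",
    "spark.sql.files.maxPartitionBytes": "268435456", # 256 MiB input splits
}

def as_submit_args(conf):
    """Render a config dict as spark-submit --conf flags."""
    return [f"--conf {k}={v}" for k, v in sorted(conf.items())]

print(as_submit_args(TUNING)[0])  # --conf spark.executor.memory=8g
```

In practice these values are derived from observed shuffle sizes and executor metrics rather than fixed up front.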

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field.

Strong programming experience with Python, Java, or Scala.

Expertise in Apache Spark (batch + streaming) and Hadoop ecosystem.

Strong SQL skills with experience in large-scale distributed databases.

Experience with ETL tools and orchestration frameworks (Airflow, NiFi, Luigi).

Familiarity with cloud platforms (AWS, Azure, or GCP).

Preferred Qualifications

Experience with streaming tools (Kafka, Kinesis, Flink, Spark Streaming).

Knowledge of NoSQL databases (Cassandra, MongoDB, Redis).

Experience with Docker/Kubernetes for containerized data applications.

Understanding of data warehousing, data lakes, and Lakehouse architecture.

Experience with DevOps and CI/CD practices.

Strong exposure to performance tuning of distributed systems.

Soft Skills

Strong analytical and problem-solving abilities.

Excellent communication and collaboration skills.

Ability to work in a fast-paced, agile environment.

Detail-oriented with a proactive approach.

Number of Vacancies: 6