Big Data Architect
Full Time
Big Basket
Job Summary
The Big Data Engineer is responsible for designing, developing, and maintaining large-scale data processing systems. The role includes building efficient data pipelines, working with distributed technologies, and ensuring data availability, reliability, and performance for analytics and business use cases.
Key Responsibilities
1. Data Pipeline Development
Build scalable ETL/ELT pipelines for ingesting and transforming large datasets.
Develop batch and real-time data processing solutions using Apache Spark, Kafka, Hive, Flink, or similar tools.
Optimize data workflows for performance and cost.
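As a rough illustration of the batch side of such a pipeline, here is a minimal extract-transform-load sketch in plain Python (the function names, field names, and sample data are invented for the example; a production pipeline would use Spark or a similar engine):

```python
import csv
import io

def extract(raw_csv: str):
    # Parse raw CSV text into one dictionary per row.
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    # Normalize types and drop malformed records.
    out = []
    for r in rows:
        try:
            out.append({"user_id": int(r["user_id"]),
                        "amount": round(float(r["amount"]), 2)})
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric fields
    return out

def load(rows, sink):
    # Append transformed records to the target store (here, a list stand-in).
    sink.extend(rows)
    return len(rows)

raw = "user_id,amount\n1,19.991\n2,abc\n3,5\n"
sink = []
loaded = load(transform(extract(raw)), sink)  # the "abc" row is dropped
```

The same extract/transform/load separation carries over directly to Spark DataFrames or a streaming consumer.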
2. Big Data Ecosystem Ownership
Work with Hadoop ecosystem components (HDFS, Hive, HBase, Oozie, MapReduce).
Manage and maintain data lakes, data warehouses, and big data clusters.
Implement partitioning, schema design, and optimized storage formats (Parquet, ORC).
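For a concrete picture of partitioning, Hive-style layouts encode partition keys as key=value directories in the storage path; a small sketch (the bucket, table, and column names are hypothetical):

```python
def partition_path(base: str, table: str, dt: str, country: str) -> str:
    # Hive-style partition layout: key=value directories, with
    # columnar (Parquet) files at the leaf level so queries can
    # prune partitions by dt and country without scanning data.
    return f"{base}/{table}/dt={dt}/country={country}/part-00000.parquet"

p = partition_path("s3://lake", "orders", "2024-01-01", "IN")
```

Engines such as Hive, Spark, and Presto recognize this layout automatically and skip partitions that a query's filters rule out.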
3. Cloud Data Engineering
Develop data solutions on cloud platforms (AWS, Azure, GCP).
Use cloud-native big data services:
AWS: EMR, Glue, S3, Kinesis, Redshift
Azure: Databricks, ADLS, Synapse
GCP: Dataflow, BigQuery, Dataproc
4. Data Governance and Security
Ensure compliance with security policies and governance frameworks.
Maintain documentation, metadata, and data lineage.
5. Collaboration
Work closely with data scientists, BI engineers, software engineers, and product owners.
Translate business requirements into scalable technical solutions.
Support analytics and machine learning workloads with high-quality datasets.
6. Performance Optimization
Tune Spark jobs, SQL queries, and cluster configurations.
Optimize data storage, compression, and file organization for faster processing.
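One common file-organization optimization is compacting many small files toward a target size (128 MB is a typical default for HDFS/S3 workloads); the target output file count is simple arithmetic, sketched here (the function name and target size are illustrative assumptions):

```python
import math

def target_file_count(total_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    # Aim for output files near the target size; always write at least one.
    return max(1, math.ceil(total_bytes / target_file_bytes))

# e.g. 10,000 small files of ~50 KB each compact into a handful of files
n = target_file_count(10_000 * 50 * 1024)
```

Fewer, larger files reduce task-scheduling and metadata overhead in Spark and Hive jobs.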
Required Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
Strong programming experience with Python, Java, or Scala.
Expertise in Apache Spark (batch + streaming) and Hadoop ecosystem.
Strong SQL skills with experience in large-scale distributed databases.
Experience with ETL tools and orchestration frameworks (Airflow, NiFi, Luigi).
Familiarity with cloud platforms (AWS, Azure, or GCP).
Preferred Qualifications
Experience with streaming tools (Kafka, Kinesis, Flink, Spark Streaming).
Knowledge of NoSQL databases (Cassandra, MongoDB, Redis).
Experience with Docker/Kubernetes for containerized data applications.
Understanding of data warehousing, data lakes, and Lakehouse architecture.
Experience with DevOps and CI/CD practices.
Strong exposure to performance tuning of distributed systems.
Soft Skills
Strong analytical and problem-solving abilities.
Excellent communication and collaboration skills.
Ability to work in a fast-paced, agile environment.
Detail-oriented with a proactive approach.