Big Data Engineer
Recooty - Chicago, IL
Job Description
Job Title: Sr. Big Data Engineer
Location: San Francisco, CA (open to remote)
Duration: 6 months (will extend; multiple consultants on this team have been there 2+ years)
Interview: 2 rounds (1st round: 1-hour technical video interview; 2nd round: 30-minute personality call, largely a formality)

We are looking for a strong Big Data Engineer with experience in Spark, Scala, SQL, and Azure. The Architecture and Platform organizations are seeking an experienced Big Data Engineer to build analytics and ML platforms that collect, store, process, and analyze huge data sets spread across the organization. The platform will provide frameworks for quickly rolling out new data analysis for data-driven products and microservices, and will enable machine/deep learning infrastructure that operationalizes data science models for broad consumption.

You'll partner end-to-end with Product Managers and Data Scientists to understand customer requirements, design prototypes, and bring ideas into production. You need to be an expert in design, coding, and scripting: writing high-quality code consistent with our standards, creating new standards as necessary, and demonstrating correctness with pragmatic automated tests. You'll review the work of other engineers to improve quality and engineering practices, and participate in continuing education programs to grow your skills as a member of an Agile engineering team.

Requirements:
- Ideally 5-8 years of experience as a Software Engineer, building distributed, scalable, and reliable data pipelines that ingest and process data at scale, both in batch and in real time.
- Strong knowledge of programming languages and tools including Java, Scala, Spark, SQL, Hive, and Elasticsearch.
- Familiarity with most tools in the Hadoop ecosystem, particularly Spark with Scala (or Java if not Scala).
- Experience with streaming technologies such as Spark Streaming, Flink, or Apache Beam, along with Kafka, is a plus.
- Working experience with NoSQL databases such as Cassandra, HBase, MongoDB, and/or Couchbase is beneficial.
- Prior knowledge of Machine Learning or Deep Learning is a plus (this can be learned on the job).

Project details: You will work with the Marketing and Supply Chain side on a Personalization initiative, managing data feeds to and from third-party vendors that handle analytics, marketing, and operations for email and catalog campaigns. Eventually you will move into Machine Learning in areas such as product recommendations on the site. The team works in Spark with Scala to ingest transaction and clickstream data and generate associations and product recommendations. You will be involved in both batch processing and real-time streaming projects, creating Spark jobs on Azure and using Azure tools for scheduling and workflow management of batch jobs; a minimal streaming sketch appears after the responsibilities list below. The team is currently migrating from Teradata to Microsoft Azure, building a new data platform on Spark and developing a data pipeline from transactional systems, processed in Spark with a framework written in Scala or Java.

Key Responsibilities (illustrated in the sketches following this list):
- Basic transformations (filter, map) and actions (count, groupBy) using the DataFrame API.
- Iterating over Scala collections.
- Spark parallelism: data ingestion from an external RDBMS, and local transformations.
- Data warehouse work: dimensions and facts, and when to do a full load vs. an incremental load.
- Basic software engineering principles.
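As a rough illustration of the first two responsibilities, here is a minimal Spark-in-Scala sketch of DataFrame transformations and actions, plus iteration over a driver-side Scala collection. The input path and the column names (amount, category) are hypothetical placeholders, not taken from this posting.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DataFrameBasics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DataFrameBasics").getOrCreate()
        import spark.implicits._

        // Transformations are lazy: filter and withColumn only build a plan.
        val tx = spark.read.parquet("/data/transactions")  // hypothetical path
          .filter($"amount" > 0)                           // filter transformation
          .withColumn("amountUsd", $"amount" / 100)        // map-style column transform

        // Actions trigger execution.
        println(s"rows: ${tx.count()}")                    // count action
        val byCategory = tx.groupBy($"category")           // groupBy + aggregate
          .agg(sum($"amountUsd").as("totalUsd"))
        byCategory.show()

        // Iterating over an ordinary Scala collection on the driver.
        val categories: Seq[String] =
          byCategory.select($"category").as[String].collect().toSeq
        categories.foreach(println)

        spark.stop()
      }
    }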
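For the parallelism and RDBMS-ingestion item, a sketch of a partitioned JDBC read follows. The connection URL, table name, credentials, and bounds are assumptions for illustration; the partitioning options are what split the read into parallel range queries.

    import org.apache.spark.sql.SparkSession

    object JdbcIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("JdbcIngest").getOrCreate()

        // partitionColumn/lowerBound/upperBound/numPartitions split the read
        // into parallel range queries, one per Spark task.
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://example-host:1433;database=sales") // hypothetical
          .option("dbtable", "dbo.orders")                                    // hypothetical
          .option("user", sys.env("DB_USER"))        // credentials from the environment
          .option("password", sys.env("DB_PASSWORD"))
          .option("partitionColumn", "order_id")     // hypothetical numeric key
          .option("lowerBound", "1")
          .option("upperBound", "10000000")
          .option("numPartitions", "16")
          .load()

        // Local (narrow) transformations run per partition with no shuffle.
        val cleaned = orders.filter(orders("status") =!= "CANCELLED")
        cleaned.write.mode("overwrite").parquet("/data/raw/orders") // hypothetical sink
        spark.stop()
      }
    }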
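For the dimensions-and-facts item, one common pattern (an assumption here, not this team's documented approach) is to fully reload small dimensions while incrementally appending only fact rows newer than a watermark, as in this sketch; all paths and the last_modified column are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object WarehouseLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WarehouseLoad").getOrCreate()
        import spark.implicits._

        // Full load: a small, slowly changing dimension is rewritten each run.
        spark.read.parquet("/staging/dim_product")
          .write.mode("overwrite").parquet("/warehouse/dim_product")

        // Incremental load: append only fact rows newer than the stored
        // watermark. Assumes the warehouse fact table already exists.
        val watermarkRow = spark.read.parquet("/warehouse/fact_sales")
          .agg(max($"last_modified")).first()
        val incoming = spark.read.parquet("/staging/fact_sales")
        val newRows =
          if (watermarkRow.isNullAt(0)) incoming // empty table: take everything
          else incoming.filter($"last_modified" > watermarkRow.getTimestamp(0))
        newRows.write.mode("append").parquet("/warehouse/fact_sales")

        spark.stop()
      }
    }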
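And for the real-time side mentioned in the project details, a minimal Structured Streaming sketch that lands clickstream events from Kafka; the broker address, topic, and paths are hypothetical, and the posting names Spark Streaming and Kafka but not this exact setup.

    import org.apache.spark.sql.SparkSession

    object ClickstreamIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ClickstreamIngest").getOrCreate()

        // Requires the spark-sql-kafka-0-10 connector on the classpath.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
          .option("subscribe", "clickstream")               // hypothetical topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Append raw events to storage; downstream batch jobs parse and aggregate.
        events.writeStream
          .format("parquet")
          .option("path", "/data/raw/clickstream")          // hypothetical sink
          .option("checkpointLocation", "/chk/clickstream") // hypothetical checkpoint
          .start()
          .awaitTermination()
      }
    }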
Created: 2025-03-01