Data Engineer

January 31, 2022

Job Description

Overview:

The Data Engineer should have at least 5-10 years of experience. In this role, you will be responsible for the analysis, design, documentation, development, unit testing, and support of data integration and database object development for software applications.

As the Data Engineer, you will provide support and guidance on Data Integration and on best practices for Crowe’s Master Data Management (MDM) and Data Governance initiatives.

Responsibilities:

Collaborate with stakeholders and development team members to achieve business results. Work closely with other engineers to integrate databases with other applications.
Design, develop, and implement database applications and solutions for managing and integrating data between operational systems, data repositories, and reporting and analytical applications. This includes, but is not limited to, ETL, stored procedures, views, and functions that support Master Data Management (MDM) initiatives.
Recommend and provide guidance on Data Integration and database development, T-SQL best practices, and standards to development team members as needed.
Create and propose technical design documentation that includes current and future functionality, affected database objects, specifications, and flows/diagrams detailing the proposed database and/or Data Integration implementation.
Participate in industry and other professional networks to stay aware of industry standards, trends, and best practices and to strengthen organizational and technical knowledge.
Provide support for investigating and troubleshooting production issues.
Participate in the establishment of group standards and processes and in the Communities of Practice.
Work continually to improve source code performance using industry-standard methodologies.
Create, maintain, and advise on optimal data pipeline architecture, from data ingestion to target, for various data initiatives.

Design and implement ETL or ELT processes.

Assemble large, complex data sets that meet functional / non-functional business requirements.

Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

Build the infrastructure required for optimal extraction, transformation, and loading of data (structured and semi-structured) from a wide variety of data sources using SQL and Azure technologies. Exposure to Snowflake is a huge plus.

Create scripts that provide details about data quality, operational efficiency and other critical metrics.

Work with stakeholders, including executives, data owners, and design teams, to assist with data-related technical issues and support their data infrastructure needs.

Design and implement solutions that ensure adherence to the applicable regulations and client contractual obligations.

Create data tools for Data Transformation, Business Unit, and Data Science team members that assist them in building and optimizing our product into an innovative industry leader.

Work with data and analytics experts to strive for greater functionality in our data systems.

Assist with applying and monitoring data quality metrics from the enterprise quality library on the key fields of datasets.

Develop scripts and processes to migrate data across platforms.

Act as an expert on all facets of the data life cycle and advise product development teams on data modeling.

Identify sensitive and regulated data elements within product schemas that are mandated to be de-identified in non-production environments.

Help product development teams build de-identified development, testing, and training datasets that comply with regulatory requirements.

Collaborate with the Information Security team to ensure that de-identified datasets are appropriate for usage.

Register the de-identified datasets in the Azure data catalog with the finalized metadata and other contextual information.

Experience ensuring a high standard of quality and handling PII and PHI data.

Knowledge, Skills, and Abilities

3+ years of experience with SQL Server 2019 and Microsoft Azure
3+ years of experience with Data Integration technologies and principles
Advanced knowledge of T-SQL, including complex SQL queries (e.g., using various joins and sub-queries) and best practices
Experience with index design and T-SQL performance tuning techniques
Experience integrating data from structured and unstructured formats: flat files, XML, EDI, JSON, and Excel
Working knowledge of and experience with online transaction processing (OLTP) and online analytical processing (OLAP) databases and schemas
Experience with Technical Design and Data Modeling
Advanced SQL knowledge and experience with relational databases and query authoring, as well as working familiarity with a variety of databases.

Experience building and optimizing ‘big data’ pipelines, architectures, and data sets.

Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.

Strong analytic skills related to working with semi-structured datasets.

Ability to build processes supporting data transformation, data structures, metadata, dependency, and workload management.

A successful history of manipulating, processing and extracting value from large disconnected datasets.

Working knowledge of message queuing, stream processing, and highly scalable, high-volume data stores.

Strong project management and organizational skills.

Experience supporting and working with cross-functional teams in a dynamic environment.

Candidates should also have experience using the following software/tools:

Deep experience with MS SQL, SSIS and Azure technologies (must have).

Deep experience with object-oriented and functional scripting languages: Shell, PowerShell, Python, JavaScript, C#, etc. (must have).

Prior experience with data de-identification, scrubbing, and masking (must have).

Exposure to synthetic data generation tools, methods, and architectures (nice to have).

Experience with Snowflake (nice to have).

Experience with NoSQL databases, including Azure Cosmos DB, Redis and MongoDB (nice to have).

Experience with high-volume data tools: Azure Blob Storage, Azure Data Lake, HDFS, etc. (nice to have).

Experience with data de-identification tools, such as Redgate Data Masker (nice to have).

The following knowledge is not required, but is preferred:

Experience in distributed architectures such as Microservices, SOA, and RESTful APIs
Master Data Management (MDM) tools and systems, specifically Profisee, and how they are used