Why Separate the Data Engineer from the Data Scientist Role?

By Ed Crowley, CEO, Virtulytix, Inc.


A Data Scientist is “a person employed to analyze and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making.”[1] The job title emerged in the early days of data science to describe the all-encompassing skill set required to build data science solutions, particularly in predictive and prescriptive analytics, machine learning, and other data science applications.

Most people tend to think of a Data Scientist as being very focused on developing and applying advanced statistical models and modelling techniques. Interestingly, real data science is mostly about working with data. According to IBM Data Scientist Tom Konchan, data science work is 80% data engineering and 20% analytics and statistics. This is exactly what we have found in our practice.

At Virtulytix, we develop SaaS predictive analytics solutions, create and deploy custom analytics solutions, and consult with clients on solutions development. In essence, we are a data analytics shop focused on the Industrial IoT, and one of our biggest challenges is how to get the most efficiency and effectiveness out of our data science team. One of the ways we do this is by separating the Data Scientist and Data Engineer roles.

Clearly, the two roles have many similarities and require some of the same skills. However, they are different enough to warrant splitting them to gain the most efficiency. As we see it, the two roles require slightly different skills, as shown below:

 

[Image: Screen Shot 2018-05-30 at 2.27.04 PM.png]

 

So while Data Engineering leans heavily on system integration, data structures, and computer science, Data Scientists still need at least a working knowledge of these areas. And while the Data Scientist has deep experience in statistical techniques, data mining, and data visualization, the Data Engineer needs at least a working knowledge of these areas. So the skill sets are not mutually exclusive; the difference is a matter of the depth of skills each position requires.

By separating the roles, we find that model development, deployment, and implementation all happen faster because our team members specialize. It is important to acknowledge that our Data Scientists and Data Engineers work as a very close-knit team, with almost continuous daily interaction. “Throwing things over the wall” just doesn’t work in this area because the disciplines are so interrelated.

[1] https://www.google.com/search?q=definition+of+data+scientist&oq=definition+of+data+scientist&aqs=chrome..69i57.9878j0j4&sourceid=chrome&ie=UTF-8

[2] https://en.wikipedia.org/wiki/Data_science