By Nameeta Raj
What constitutes the backbone of any data science project? In my view, it is not the statisticians, the analytics, or the predictive model, but the data: its quality and its quantity. So who is the chief enabler that lets a firm obtain this data? None other than the data engineers.
Data needs to flow in, not once or twice, but continuously, and in most cases in real time. Data engineers build robust pipelines to enable this continuous inflow. A business problem with accessible data to solve it sounds like a tempting proposition for jump-starting a data science project, but who will extract the useful data from 41 million tweets or a host of images posted on social media? If the existing volume of data and the variety of forms it takes are not challenge enough, the velocity at which it grows compounds the problem. A data science team without data engineering skills can send its projects down the drain.
A well-conceived data science idea is like a seed: once sown, it survives only if it receives the right amount of water, sunlight, and nutrients. In a similar fashion, data science projects depend on data engineers to supply data in the initial phases and to build pipelines that ensure a continuous inflow of data throughout the project life cycle. Data engineering and data science skills can overlap, and we have seen data scientists who create pipelines. But if you want to ensure optimal resource usage and increased productivity, it is time to introduce data engineering expertise into your analytics team.