
Data Engineer vs Data Scientist – What’s the Difference?

05 Aug, 2021

People unfamiliar with data operations sometimes ask: is a data engineer the same as a data scientist? The answer is no. However, the two roles share enough similarities that they are closely intertwined.

To help you get the whole picture, I’ll explain the difference between a data scientist and a data engineer in more detail.

Data Engineering vs Data Science – General Overview

Broadly speaking, data engineers develop and maintain solutions (such as cloud infrastructure) that enable data scientists to access and analyze data at scale. Data engineering involves developing, constructing, testing, and maintaining architectures such as databases and large-scale processing systems. As a result, the role is relatively close to DevOps with a data specialization: data engineers should know how to develop data models, build data pipelines, and oversee ETL (extract, transform, load) workflows.
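To make the ETL part more concrete, here is a minimal sketch of a single extract-transform-load step in Python. It is an illustration under stated assumptions, not a production pipeline: the CSV layout, the column names user_id and amount, and the payments table are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in a real pipeline this would come from a file, API, or queue.
RAW_CSV = """user_id,amount
1,19.99
2,
3,5.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert column types."""
    return [
        (int(row["user_id"]), float(row["amount"]))
        for row in rows
        if row["amount"].strip()
    ]


def load(records, conn):
    """Load: write the cleaned records into a hypothetical payments table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, conn)
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())
```

In a real setup, each step would usually be scheduled and monitored by an orchestration tool rather than called directly, but the split into extract, transform, and load stays the same.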

Data scientists, on the other hand, set up and train predictive models using the data they receive from data engineers. This requires developing insights and running experiments on data samples, so data scientists spend most of their time forming, testing, and tweaking machine learning algorithms. They also present their analyses to executives and decision-makers.


However, this is only a theoretical (and simplified) perspective, so let’s dive deeper into the technical specifics.

The Difference Between Data Scientist and Data Engineer – Technical Perspective

As I mentioned in the intro, data engineering and data science operate in similar areas, which makes their responsibilities intertwined. Both experts will sometimes even perform similar tasks. The specifics, however, depend on how an organization defines these roles.

Take data cleaning, for instance. Depending on whom you ask, you can end up with a completely different picture of this responsibility. Some will say it’s an area data scientists cover; others will point out that it’s part of the ETL workflow, which is operated by data engineers. This discrepancy probably stems from the fact that both roles do have a lot to do with data cleaning, but at different levels. Data scientists will often clean data manually, while data engineers automate the cleaning processes. And there’s a concrete reason for that.

But before I explain what this reason is, let’s take a look at this graph for a broader responsibility comparison:

 

[Figure: Data Scientist vs Data Engineer, a comparison of responsibilities]

Experiments vs Production

This different focus within the same areas comes from the fact that data scientists mainly conduct experiments, so they work on data samples. Data engineers, on the other hand, operate on large data streams and batches, automate data pipelines, and, finally, deploy models to production.

Getting back to data cleaning, here’s an example. We’ve already established that both roles clean data, but they usually do it differently. When a data scientist cleans data during experiments, the files they’re working on may have, for example, 10 000 rows each. In contrast, when data engineers clean data, they may operate on files that have… 10 000 000 000 rows. You wouldn’t want to go through such a file manually, right? Of course, data scientists don’t really clean data row by row either. The difference is that data engineers automate the process, so cleaning happens as soon as new data arrives. That’s why their work is all about repeatability and auditability.
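The difference between cleaning a sample once and cleaning every incoming batch automatically can be sketched in a few lines of Python. This is a hypothetical example: the email field, the two cleaning rules (drop rows with a missing email, drop duplicate emails within a batch), and the audit log structure are assumptions chosen for illustration. What it shows is the engineering angle described above: the same deterministic rules run on every batch, and each run leaves an audit record.

```python
from dataclasses import dataclass


@dataclass
class CleaningReport:
    """Audit record describing what happened to one batch (hypothetical schema)."""
    batch_id: str
    rows_in: int
    rows_out: int
    dropped_missing: int
    dropped_duplicates: int


def clean_batch(batch_id, rows, audit_log):
    """Apply the same cleaning rules to a batch and append an audit record.

    Assumed rules: normalize the email field, drop rows without an email,
    and drop duplicate emails within this batch only.
    """
    seen = set()
    cleaned = []
    missing = duplicates = 0
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email:
            missing += 1
            continue
        if email in seen:
            duplicates += 1
            continue
        seen.add(email)
        cleaned.append({**row, "email": email})
    audit_log.append(CleaningReport(batch_id, len(rows), len(cleaned), missing, duplicates))
    return cleaned


if __name__ == "__main__":
    audit_log = []
    # Simulated batches "arriving" one after another.
    incoming = {
        "batch-001": [{"email": "Ann@Example.com"}, {"email": "ann@example.com "}, {"email": None}],
        "batch-002": [{"email": "bob@example.com"}],
    }
    for batch_id, rows in incoming.items():
        clean_batch(batch_id, rows, audit_log)
    for report in audit_log:
        print(report)
```

Because every batch produces a CleaningReport, anyone can later check how many rows were dropped and why, which is what makes the process auditable. Note that duplicates are only detected within a single batch; catching duplicates across batches would require persistent state.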

Data Engineer vs Data Scientist – An Example of Cooperation Within a Project

To give you a more practical insight into what cooperation between a data engineer and a data scientist may look like, I’ll take you through an example ML project.

Start

To start working, our data experts need data (you can call me Paulo Coelho). In commercial projects, this data will be delivered either by the client or by data engineers (after they find a suitable source). During this initial stage, data engineers will also clean the data for the first time, although only roughly, since at this point the main goal is to deliver the data.

Next, data scientists will start experimenting on a sample taken from the cleaned data. They will clean it further, perform exploratory data analysis, engineer new features, and, most importantly, work on machine learning models. At some point, they will likely also ask data engineers for more usable data.

At the same time, data engineers will start automating processes and building the underlying infrastructure that will empower data scientists. They can also further automate data cleaning based on the data polished by data scientists.

Cycles

At this stage, the work of both roles will likely proceed in cycles: data scientists train more models based on new data, data engineers develop and automate the relevant architecture, and so on. This stage ends when data scientists develop a satisfactory predictive model (with code).

With that model, data engineers can automate model training and take care of hosting and monitoring. At the same time, data scientists work on getting more insights from the data source and, once data engineers are done with their part, collect and analyze feedback.

Production

Finally, if the model is deemed good enough and ready for production, data engineers start the production phase.
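Deciding whether a retrained model is "good enough" is often automated as a quality gate in the training pipeline. The sketch below is a hypothetical illustration, not a prescribed method: a simple least-squares line stands in for a real model, and the promotion criteria (a mean absolute error no higher than max_error and lower than current_error, the error of the model already in production) are assumptions.

```python
def train(xs, ys):
    """Fit y = a*x + b by ordinary least squares (a stand-in for real model training).

    Requires at least two distinct x values; otherwise sxx is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error on a holdout set."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def retrain_and_maybe_promote(train_data, holdout, current_error, max_error):
    """Retrain, evaluate on held-out data, and decide whether to promote.

    Hypothetical gate: promote only if the new model meets the absolute
    threshold AND beats the model currently in production.
    """
    model = train(*train_data)
    error = mean_abs_error(model, *holdout)
    promote = error <= max_error and error < current_error
    return model, error, promote


if __name__ == "__main__":
    model, error, promote = retrain_and_maybe_promote(
        train_data=([0, 1, 2, 3], [1, 3, 5, 7]),
        holdout=([4, 5], [9, 11]),
        current_error=0.5,
        max_error=0.1,
    )
    print(model, error, promote)
```

Evaluating on held-out data rather than on the training set keeps the gate from rewarding a model that merely memorizes its inputs.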

Additionally, if visualization is needed, data scientists deliver the WHAT (what is worth visualizing), and data engineers work on the HOW (how this visualization can be done technically).

Data Engineer vs Data Scientist Salary

Another very popular question is: who gets paid more – data engineer or data scientist?

The answer very much depends on the region in question. In the US, on paper, data scientists earn slightly more. According to Glassdoor, the average annual base pay of a data scientist in the United States is $115,594, while data engineers are reported to make around $111,246 per year.

In the UK, however, the opposite is true. Indeed estimates that British data scientists earn £50,769 per year, while data engineers reportedly make an average of £56,376 per year.

In Poland, where I work, Devire reports that data engineers employed in the capital make 18 000 PLN per month on average, while data scientists earn an average of 20 000 PLN.

Yet, I must point out that the typical responsibilities of data scientists and data engineers may be understood differently in Europe and the US – and that’s where these slight salary discrepancies probably come from.

Ultimately, I think it’s safe to say that both roles pay similarly. And since data services are currently in high demand, experts providing data science solutions and data engineering services should expect relatively high salaries.

Data Engineer vs Data Scientist vs… Data Analyst?

There’s also a third data role I’d like to mention briefly: the data analyst.

While data scientists work on new ways of using and analyzing data, data analysts mainly make sense of data that already has a clear purpose. In practice, data analysts analyze numeric data (and possibly other types of data) to help executives with their decision-making.

Compared to data engineers and data scientists, the data analyst is usually an entry-level position in the world of data. Edureka reports that the main prerequisites for landing a data analyst job are a relevant bachelor’s degree and good statistical knowledge. Technical skills are a plus that can help an applicant stand out among other candidates.

Data Engineer vs Data Scientist – Summary

As you can see, the data engineer and data scientist are two different roles – but they have a lot in common.

However, I must emphasize again that the understanding of data engineering and data science may vary significantly by region. For instance, in the US, data engineering responsibilities are less strictly defined than in Europe, so the role is viewed more broadly.

But if you’re in an organization that’s looking to develop its own data solutions, you won’t have to worry about these issues. A software partner will start by learning (or identifying) your business needs and adjusting roles and projects to them, so you won’t have to define data engineering and data science roles on your own. In this case, the data scientists will usually start by interviewing you and analyzing your operations and data. Thanks to both a good technical skill set and a deeper understanding of your business, they will be able to work on insights that deliver optimal results.
