1. Data Strategist
Ideally, before a company collects any data, it hires a data strategist—a senior professional who understands how data can create value for businesses.
According to the well-known data strategist Bernard Marr, there are four main ways companies in different fields can use data science:
They can make data-driven decisions.
Data can help create smarter products and services.
Companies can use data to improve business processes.
They could create a new revenue stream via data monetization.
Companies often outsource such data roles. They hire external consultants to devise a plan that aligns with the organizational strategy.
Once a firm has a data strategy in place, it is time to ensure data availability. This is when a data architect comes into play.
2. Data Architect
A data architect (or data modeler) plans out high-level database structures. This involves the planning, organization, and management of information within a firm, ensuring its accuracy and accessibility. In addition, they must assess the needs of business stakeholders and optimize schemas to address them.
Such data roles are of crucial importance. Without proper data architecture, key business questions may remain unanswered due to the lack of coherence between different tables in the database.
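As a minimal sketch of what coherence between tables can look like, the example below uses Python's built-in sqlite3 module with two hypothetical tables, customers and orders, linked by a foreign key. The table names, column names, and sample rows are illustrative assumptions, not a recommended production schema.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
"""

def build_database():
    """Create an in-memory database with the schema and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0)])
    conn.commit()
    return conn

def revenue_by_region(conn):
    """Answer a business question that needs both tables to line up."""
    rows = conn.execute(
        "SELECT c.region, SUM(o.amount) "
        "FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()
    return dict(rows)

conn = build_database()
print(revenue_by_region(conn))
```

Because orders.customer_id must point at an existing customer, a question such as "revenue by region" can be answered with a single join. Without that shared key, the two tables could not be combined reliably.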
A data architect is a senior professional and often a consultant. To become one, you'd need a solid resume and rigorous preparation for the interview process.
3. Data Engineer
The role of data engineers and data architects often overlaps—especially in smaller businesses. But there are key differences.
Data engineers build the infrastructure, organize tables, and set up the data to match the use cases defined by the architect. What's more, they handle the so-called ETL process, which stands for Extract, Transform, and Load. This involves retrieving data, transforming it into a usable format, and moving it to a repository (the firm's database). Simply put, they pipe data into tables correctly.
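The three ETL steps can be sketched in a few lines of Python using only the standard library. This is a toy illustration under stated assumptions: the CSV content, the orders table, and the cleaning rules are all made up, and real pipelines usually rely on dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; column names and values are made up for illustration.
RAW_CSV = """order_id,amount,currency
1, 19.99 ,usd
2,5.00,USD
3,,usd
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, then normalize types and casing."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), float(amount),
                      row["currency"].strip().upper()))
    return clean

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each step in its own function makes it easier to test the transformation rules without touching the source or the database.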
Typically, they receive many ad-hoc ETL-related tasks throughout their work but rarely interact with business stakeholders directly. This is one of the best-paid data science roles, and for good reason. You need a plethora of skills to work in this position, including software engineering.
Okay, let's recap.
As you can see, the jobs in data science are interlinked and complement each other, but each position has slightly different requirements. First come data strategists who define how data can serve business goals. Next, the architect plans the database schemas necessary to achieve the objectives. Lastly, the engineers build the infrastructure and pipe the data into tables.
4. Data Analyst
Data analysts explore, clean, analyze, visualize, and present information, providing valuable insights for the business. They typically use SQL to access the database.
Next, they leverage a programming language like Python or R to clean and analyze data and rely on visualization tools, such as Power BI or Tableau, to present the findings.
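A rough sketch of that workflow, assuming a hypothetical sales table: SQL retrieves the rows, and plain Python cleans and summarizes them. The in-memory SQLite database stands in for the firm's database, and the printed summary stands in for a dashboard. An analyst would more likely reach for a library such as pandas, which is left out here to keep the example self-contained.

```python
import sqlite3
import statistics

# Hypothetical sales table standing in for the firm's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0),
     ("South", 200.0), ("South", None), ("South", 100.0)],
)

def fetch_sales(conn):
    """Access: pull the raw rows with SQL."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()

def summarize(rows):
    """Clean and analyze: drop missing amounts, then average per region."""
    by_region = {}
    for region, amount in rows:
        if amount is None:
            continue  # cleaning step: ignore missing values
        by_region.setdefault(region, []).append(amount)
    return {region: statistics.mean(values)
            for region, values in by_region.items()}

summary = summarize(fetch_sales(conn))
for region, average in sorted(summary.items()):
    print(f"{region}: {average:.2f}")
```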
Side note: What is data analysis?
Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.
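To make the difference between describing and exploring a sample concrete, here is a small sketch using Python's statistics module. The daily order counts are invented, and the 1.5 * IQR rule is just one common convention for flagging unusual points. A confirmatory step, such as a formal hypothesis test, is not shown.

```python
import statistics

# Hypothetical daily order counts; the values are made up for illustration.
daily_orders = [12, 14, 15, 15, 16, 18, 95]

def describe(values):
    """Descriptive statistics: summarize the sample as it is."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def find_outliers(values):
    """Exploratory check: flag points beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

summary = describe(daily_orders)
outliers = find_outliers(daily_orders)
print(summary)
print("Possible outliers:", outliers)
```

Here the mean (about 26.4) sits well above the median (15). The exploratory check points to the single value of 95 as the reason, and that observation could then be examined in a confirmatory analysis.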
Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.
5. Business Intelligence Analyst
Data analysts' and BI analysts' duties overlap to a certain extent, but the latter have more of a reporting role. Their main focus is on building meaningful reports and dashboards and updating them frequently. More importantly, they have to satisfy stakeholders' informational needs at different levels of the organization.
6. Data Scientist
A data scientist has the skills of a data analyst but can also leverage machine learning and deep learning to create models and make predictions based on past data.
We can distinguish three main types of data scientists:
Traditional data scientists
Research scientists
Applied scientists
A traditional data scientist does all sorts of tasks, including data exploration, advanced statistical modeling, experimentation via A/B testing, and building and tuning machine learning models.
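As one example of experimentation via A/B testing, the sketch below runs a two-sided two-proportion z-test with Python's standard library. The conversion counts are invented, and the z-test is only one of several ways to evaluate an experiment. It assumes reasonably large, independent samples.

```python
import math
from statistics import NormalDist

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: variant B converts 130 of 1000 visitors, A 100 of 1000.
z, p = ab_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone. The significance threshold should be chosen before the experiment starts.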
Research scientists primarily work on developing new machine learning models for large companies.
Applied scientists, frequently hired by big tech and other large companies, hold one of the highest-paid jobs in data science. These specialists combine data science and software engineering skills to productionize models.
Larger companies prefer this combined skillset because it allows one person to oversee the entire ML implementation process, from model building to productionization, which leads to quicker results. An applied scientist can work with data, model it for machine learning, select the correct algorithm, train the model, fine-tune hyperparameters, and then put the model in production.
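The end-to-end flow described above can be sketched at toy scale. The example assumes a one-variable linear model trained by gradient descent on synthetic data, with the learning rate as the only hyperparameter. "Production" is reduced to a JSON round trip. Every name here is hypothetical, and real projects would use an ML framework and a proper serving setup.

```python
import json

def train(points, lr, epochs=200):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}

def mse(model, points):
    """Mean squared error of the model on a set of (x, y) points."""
    return sum((model["w"] * x + model["b"] - y) ** 2
               for x, y in points) / len(points)

# Synthetic data following y = 2x + 1, split into training and validation sets.
data = [(i / 20, 2 * (i / 20) + 1) for i in range(20)]
train_set, valid_set = data[::2], data[1::2]

# Hyperparameter tuning: keep the learning rate with the lowest validation error.
candidates = [0.01, 0.1, 0.5]
best_lr = min(candidates, key=lambda lr: mse(train(train_set, lr), valid_set))
model = train(train_set, best_lr)

# "Productionizing" here just means serializing the parameters for reuse.
artifact = json.dumps(model)

def predict(serialized_model, x):
    """Load the serialized parameters and score a new input."""
    params = json.loads(serialized_model)
    return params["w"] * x + params["b"]

print(best_lr, round(predict(artifact, 3.0), 3))
```

The point of the sketch is the shape of the workflow: prepare data, train, tune on held-out data, then hand over an artifact that another system can load and call.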
As you can see, there's a significant overlap between data scientists, data analysts, and BI analysts. The image below is a simplified illustration of the similarities and differences between these data science roles.
7. ML Ops Engineer
Companies that don't have applied scientists hire ML Ops engineers. They are responsible for putting the ML models prepared by traditional data scientists into production.
In many instances, ML Ops engineers are former data scientists who have developed an engineering skillset. Their main responsibilities are to put the ML model in production and fix it if something breaks.
8. Data Product Manager
The last role we discuss in this article is that of a data product manager. The person in this position is accountable for the success of a data product. They consider the bigger picture, identifying what product needs to be created, when to build it, and what resources are necessary.
A significant focus of such data science roles is data availability—determining whether to collect data internally or find ways to acquire it externally. Ultimately, product managers strategize the most effective ways to execute the production process.