Monday, September 4, 2023

Interview for NLP and ML Profile (2 Sep 2023) - For Data Scientist Role at Accenture

Problem 1

First question related to NLP.

What is ‘Dependency Parsing’?

Dependency Parsing is the process of analyzing the grammatical structure of a sentence to find related words and the type of relationship between them.

Each relationship:

- Has one head and a dependent that modifies the head.

- Is labeled according to the nature of the dependency between the head and the dependent.

Ref: towardsdatascience

In the above diagram, there exists a relationship between car and black because black modifies the meaning of car. Here, car acts as the head and black is a dependent of the head. The nature of the relationship here is amod which stands for “Adjectival Modifier”. It is an adjective or an adjective phrase that modifies a noun.

sentence = 'Deemed universities charge huge fees'

Using SpaCy

Using NLTK and NetworkX

Visualization using DOT Definition
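
As a rough illustration of the head/dependent/label structure described above, here is a minimal sketch that writes a DOT definition for the example sentence using only the standard library. The arcs are hand-annotated for illustration (an assumption, not the output of SpaCy or NLTK), and `to_dot` and `find_root` are hypothetical helper names.

```python
# Hand-annotated dependency arcs for the example sentence (an assumption,
# not parser output). Each arc is (head, dependent, relation label).
sentence = "Deemed universities charge huge fees"

arcs = [
    ("universities", "Deemed", "amod"),   # "Deemed" modifies "universities"
    ("charge", "universities", "nsubj"),  # subject of the verb
    ("fees", "huge", "amod"),             # "huge" modifies "fees"
    ("charge", "fees", "dobj"),           # direct object of the verb
]


def to_dot(arcs, name="dependencies"):
    """Return a Graphviz DOT definition with one labeled edge per arc."""
    lines = [f"digraph {name} {{"]
    for head, dependent, label in arcs:
        lines.append(f'    "{head}" -> "{dependent}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


def find_root(arcs):
    """The root is the only word that is a head but never a dependent."""
    heads = {head for head, _, _ in arcs}
    dependents = {dependent for _, dependent, _ in arcs}
    (root,) = heads - dependents
    return root


dot_source = to_dot(arcs)
print(dot_source)
print("root:", find_root(arcs))
```

The printed DOT text can be pasted into any Graphviz renderer. With a real parser, the same three fields (head, dependent, label) would come from the parser's output instead of the hand-written list.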

Problem 2

2.1) What is NER?

2.2) How do you build a custom NER?

Answer 2.1

NER refers to Named Entity Recognition.

Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Ref: custom-named-entity-recognition-with-bert

Ref: train-ner-with-custom-training-data-using-spacy

Answer 2.2

Most articles you find through a Google search on how to build a custom NER point to a supervised-learning solution to this problem.

Two very common approaches to building Custom NER using supervised learning are:

1. Using BERT

2. Using SpaCy

SpaCy uses labeled data in the form of Annotations.
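
To make the idea of annotations concrete, here is a minimal sketch of the tuple layout commonly seen in SpaCy training tutorials: the text plus a list of character-offset spans with labels. Treat the exact layout as an assumption about what the referenced articles use; `annotate` is a hypothetical helper and the example sentence is made up for illustration.

```python
def annotate(text, entities):
    """Build one training example: (text, {"entities": [(start, end, label)]}).

    `entities` maps an entity string to its label. Offsets are character
    positions with an exclusive end, located with str.find (first match only).
    """
    spans = []
    for phrase, label in entities.items():
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        spans.append((start, start + len(phrase), label))
    return (text, {"entities": sorted(spans)})


# Illustrative sentence and labels (assumptions, not real training data).
example = annotate(
    "Accenture interviewed the candidate on 2 Sep 2023",
    {"Accenture": "ORG", "2 Sep 2023": "DATE"},
)
print(example)
```

In practice the spans usually come from an annotation tool rather than from string search, since the same phrase can occur more than once in a text.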

Ref: Custom_Named_Entity_Recognition_with_BERT

Ref: custom-named-entity-recognition-using-spacy

Problem 3

How do you decide if you need Machine Learning for solving a problem?

Solution 3

This is a subjective question.

You can start by explaining the definition of Machine Learning as given by Tom Mitchell:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Then you can ask whether the problem falls into one of the categories that Machine Learning can solve, such as:

1. Classification

2. Regression

3. Clustering

4. Outlier Detection

Or is it a problem related to Data Mining, such as Frequent Pattern mining or Market Basket analysis?

Problem 4

How do you select an approach / a model for a Machine Learning problem?

Solution 4

Questions to ask the client so that you can figure out which model will work best:

1. Explainable AI

Do you need to explain every prediction to the client or customer?

If yes, you need Explainable AI, that is, an interpretable model such as a Decision Tree or Logistic Regression.

If no, you can go with Black Box AI models such as Artificial Neural Networks and other Deep Learning models.

2. Prediction Time

Does the model need to give out predictions instantly?

You need to know the prediction time of your model and compare it with that of the other choices you have. For example, for the problem of Outlier Detection, the prediction time of Isolation Forest is lower than that of an ANN.
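
One way to compare candidates on prediction time is to measure it directly. The following is a minimal sketch using only the standard library. `cheap_model` and `costly_model` are hypothetical stand-ins for the predict functions of two trained models, and the input rows are synthetic.

```python
import time


def mean_prediction_time(predict, rows, repeats=5):
    """Average wall-clock seconds per row for a prediction callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        for row in rows:
            predict(row)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(rows))


# Hypothetical stand-ins for the predict functions of two trained models.
def cheap_model(row):
    return sum(row) > 1.0


def costly_model(row):
    total = 0.0
    for _ in range(200):  # simulate extra computation per prediction
        total += sum(value * value for value in row)
    return total > 1.0


# Synthetic input rows, for illustration only.
rows = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(100)]

timings = {
    "cheap_model": mean_prediction_time(cheap_model, rows),
    "costly_model": mean_prediction_time(costly_model, rows),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} microseconds per prediction")
```

With real models, pass the model's own predict function and a representative sample of production inputs, and run the measurement on hardware similar to the deployment target.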

3. Complexity

Do you need a simple model, or would a complex model be acceptable?

4. Maintainability

The project stakeholders may have specific requirements, such as maintainability and limited model complexity. As such, a model that has lower skill but is simpler and easier to understand may be preferred.

5. Available resources

If the available resources (in terms of ‘computing power’ and ‘training data’) are not limited, you can consider deploying a Deep Learning based solution such as an LLM.

On a side note:

Methods to assist in choosing the set of candidate models

# Data transformation (statistics)

# Exploratory data analysis

# Model specification

In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include.

# Scientific method (as in Hypothesis Testing)

Ref: wikipedia

Problem 5

What experience do you have of the ‘Cloud’?
