Monday, October 2, 2023
Monday, September 4, 2023
Interview for NLP and ML Profile (2 Sep 2023) - For Data Scientist Role at Accenture
Problem 1
First question related to NLP.
What is ‘Dependency Parsing’?
Dependency parsing is the process of analyzing the grammatical structure of a sentence to find related words and the type of relationship between them.
Each relationship:
- Has one head and a dependent that modifies the head.
- Is labeled according to the nature of the dependency between the head and the dependent.
In the above diagram, there exists a relationship between car and black because black modifies the meaning of car. Here, car acts as the head and black is a dependent of the head. The nature of the relationship here is amod which stands for “Adjectival Modifier”. It is an adjective or an adjective phrase that modifies a noun.
sentence = 'Deemed universities charge huge fees'
Using SpaCy
Using NLTK and NetworkX
Visualization using DOT Definition
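As a rough sketch of the DOT approach, the parse of the example sentence can be written as (head, dependent, label) triples and turned into a DOT definition. The arcs below are hand-annotated assumptions for illustration, not output from SpaCy or NLTK, and `to_dot` is a hypothetical helper name.

```python
# Hand-annotated dependency arcs (assumed for illustration, not parser output).
# Each arc is (head, dependent, label).
sentence = 'Deemed universities charge huge fees'
arcs = [
    ('universities', 'Deemed', 'amod'),   # 'Deemed' modifies 'universities'
    ('charge', 'universities', 'nsubj'),  # subject of the verb
    ('fees', 'huge', 'amod'),             # 'huge' modifies 'fees'
    ('charge', 'fees', 'dobj'),           # direct object of the verb
]


def to_dot(arcs, name='dependencies'):
    """Build a Graphviz DOT definition with one labeled edge per arc."""
    lines = [f'digraph {name} {{']
    for head, dependent, label in arcs:
        lines.append(f'    "{head}" -> "{dependent}" [label="{label}"];')
    lines.append('}')
    return '\n'.join(lines)


# The root is the only word that never appears as a dependent.
dependents = {dependent for _, dependent, _ in arcs}
root = next(word for word in sentence.split() if word not in dependents)

dot_source = to_dot(arcs)
print('root:', root)
print(dot_source)
```

The resulting text can be pasted into any Graphviz renderer. With a real parser, the arcs would come from its output rather than being typed by hand.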
Problem 2
2.1) What is NER?
2.2) How do you build a custom NER?
Answer 2.1
NER refers to Named Entity Recognition.
Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Ref: custom-named-entity-recognition-with-bert
train-ner-with-custom-training-data-using-spacy
Answer 2.2
Most articles you would find through a Google search on how to build a custom NER point to a solution that treats it as a supervised learning problem.
Two very common approaches to building Custom NER using supervised learning are:
1. Using BERT
2. Using SpaCy
SpaCy uses labeled data in the form of Annotations.
Ref: Custom_Named_Entity_Recognition_with_BERT
Ref: custom-named-entity-recognition-using-spacy
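To make the annotation idea concrete, here is a minimal sketch of the tuple-style training example often seen in SpaCy NER tutorials: the text plus a dictionary of character-offset entity spans. The sample sentence, the labels, and the `annotate` helper are assumptions made up for illustration. Check the format your SpaCy version expects before training.

```python
def annotate(text, spans):
    """Return (text, {'entities': [(start, end, label), ...]}).

    `spans` maps an entity's surface string to its label. Offsets are
    character positions in `text`, with `end` exclusive.
    """
    entities = []
    for surface, label in spans.items():
        start = text.find(surface)
        if start == -1:
            raise ValueError(f'{surface!r} not found in text')
        entities.append((start, start + len(surface), label))
    return (text, {'entities': sorted(entities)})


# Hypothetical training example with two labeled entities.
example = annotate(
    'Accenture interviewed a candidate on 2 Sep 2023',
    {'Accenture': 'ORG', '2 Sep 2023': 'DATE'},
)
print(example)
```

Building the offsets programmatically avoids off-by-one mistakes, which are a common cause of misaligned entities when annotations are typed by hand.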
Problem 3
How do you decide if you need Machine Learning for solving a problem?
Solution 3
This is a subjective question.
You can start by explaining the definition of Machine Learning as given by Tom Mitchell:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Then you can ask whether the problem falls into one of the categories that Machine Learning can solve, such as:
1. Classification
2. Regression
3. Clustering
4. Outlier Detection
Or is it a problem related to Data Mining, like Frequent Pattern Mining or Market Basket Analysis?
Problem 4
How do you select an approach / a model for a Machine Learning problem?
Solution 4
Questions to ask the client so that you can figure out which model will work best:
1. Explainable AI
Do you need to explain every prediction to the client or customer?
If yes, you need Explainable AI as in Decision Tree, Logistic Regression, etc.
If no, you can go with Black Box AI models like Artificial Neural Networks, Deep Learning, etc.
2. Prediction Time
Does the model need to give out predictions instantly?
You need to know the prediction time of your model and compare it with the other choices you have. For example, for the problem of Outlier Detection, the prediction time of an Isolation Forest is often lower than that of an ANN, though this depends on model size and hardware.
3. Complexity
Is a simple model needed, or would a complex model be acceptable?
4. Maintainability
The project stakeholders may have specific requirements, such as maintainability and limited model complexity. As such, a model that has lower skill but is simpler and easier to understand may be preferred.
5. Available resources
If the available resources (in terms of ‘computing power’ and ‘training data’) are not limited, you can think of deploying a Deep Learning based solution such as an LLM.
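For the prediction-time question above, one rough way to compare candidates is to time their predict calls on the same batch. This is a sketch only: `mean_prediction_time` is a hypothetical helper, and the two predict functions are stand-ins for fitted models, not real Isolation Forest or ANN implementations.

```python
import time


def mean_prediction_time(predict, batch, repeats=5):
    """Average wall-clock seconds taken by predict(batch) over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(batch)
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-ins for the predict functions of two fitted models.
def cheap_predict(rows):
    return [sum(row) > 1.0 for row in rows]


def costly_predict(rows):
    # The extra inner loop mimics a heavier model.
    return [sum(x * w for x in row for w in range(200)) > 1.0 for row in rows]


batch = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(1000)]
timings = {
    'cheap': mean_prediction_time(cheap_predict, batch),
    'costly': mean_prediction_time(costly_predict, batch),
}
fastest = min(timings, key=timings.get)
print(timings, 'fastest:', fastest)
```

In practice you would pass each fitted model's own predict method and a batch that resembles production traffic, since timings vary with batch size and hardware.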
On a side note:
Methods to assist in choosing the set of candidate models
# Data transformation (statistics)
# Exploratory data analysis
# Model specification
In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include.
# Scientific method (as in Hypothesis Testing)
Ref: wikipedia
Problem 5
What experience do you have of the ‘Cloud’?
Tags: Natural Language Processing, Machine Learning
Deep Learning Roadmap: A Step-by-Step Guide to Learning Deep Learning
Introduction
Deep Learning, a subfield of Artificial Intelligence, has made astounding strides in recent years, powering everything from image recognition to language translation. If you're eager to embark on your journey into the world of Deep Learning, it's essential to have a roadmap. In this article, we'll provide you with a concise guide on the key milestones and steps to navigate as you master the art of Deep Learning.
Step 1: The Foundation - Understand Machine Learning Basics
Before diving deep, ensure you have a solid grasp of Machine Learning concepts. Familiarize yourself with supervised and unsupervised learning, regression, classification, and model evaluation. Books like "Machine Learning for Dummies" can be a great starting point.
Step 2: Python Proficiency
Python is the lingua franca of Deep Learning. Learn Python and its libraries, particularly NumPy, Pandas, and Matplotlib. Understanding Python is crucial as it's the primary language for developing Deep Learning models.
Step 3: Linear Algebra and Calculus
Deep Learning involves complex mathematics. Brush up on your linear algebra (vectors, matrices, eigenvalues) and calculus (derivatives, gradients) as they form the foundation of neural network operations.
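As a small illustration of why derivatives matter, here is a sketch of gradient descent on a one-variable function. The function, starting point, and learning rate are arbitrary choices for the example. Training a neural network applies the same update rule to many weights at once.

```python
def f(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad_f(w):
    """Derivative of f with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size, chosen by hand

for step in range(100):
    # Move against the gradient to reduce the loss.
    w -= learning_rate * grad_f(w)

print('w:', w, 'loss:', f(w))
```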
Step 4: Dive into Neural Networks
RNNs are crucial for sequential data, such as natural language processing and time series analysis. Study RNN architectures, vanishing gradient problems, and LSTM/GRU networks.
Step 7: Deep Dive into Deep Learning Frameworks
Become proficient in popular Deep Learning frameworks like TensorFlow and PyTorch. These libraries simplify building and training complex neural networks.
Step 8: Projects and Hands-On Practice
For text-related tasks, delve into NLP. Learn about word embeddings, recurrent models for text, and pre-trained language models like BERT.
Step 10: Advanced Topics
Explore advanced Deep Learning topics like Generative Adversarial Networks (GANs), Reinforcement Learning, and transfer learning. Stay updated with the latest research through journals, conferences, and online courses.
Step 11: Model Optimization and Deployment
Understand model optimization techniques to make your models efficient. Learn how to deploy models in real-world applications using cloud services or on-device deployment.
Step 12: Continuous Learning
Deep Learning is a rapidly evolving field. Stay up-to-date with the latest research papers, attend conferences like NeurIPS and CVPR, and join online forums and communities to learn from others.
Conclusion
Full Stack Data Science with Python Course on Github
Monday, August 7, 2023
Enhancing AI Risk Management in Financial Services with Machine Learning
Introduction:
In your Python script, start by importing the necessary libraries:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

Load and preprocess your dataset:
data = pd.read_csv("financial_data.csv")
X = data.drop("risk_label", axis=1)
y = data["risk_label"]

Step 3: Train-Test Split and Data Scaling
Split the data into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Scale the features for better model performance:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Step 4: Implement ML Models
In this example, we'll use two powerful ML models: Random Forest and XGBoost.
- Random Forest Classifier:
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train_scaled, y_train)
rf_predictions = rf_model.predict(X_test_scaled)
rf_accuracy = accuracy_score(y_test, rf_predictions)
print("Random Forest Accuracy:", rf_accuracy)
print(classification_report(y_test, rf_predictions))
- XGBoost Classifier:
xgb_model = XGBClassifier(n_estimators=100, random_state=42)
xgb_model.fit(X_train_scaled, y_train)
xgb_predictions = xgb_model.predict(X_test_scaled)
xgb_accuracy = accuracy_score(y_test, xgb_predictions)
print("XGBoost Accuracy:", xgb_accuracy)
print(classification_report(y_test, xgb_predictions))

Step 5: Evaluate and Compare
Evaluate the models' performance using accuracy and classification reports. Compare their results to determine which model is better suited for your risk management goals.
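When comparing the models, it helps to look beyond accuracy, because two models with similar accuracy can differ a lot in how many risky cases they catch. The sketch below computes accuracy, precision, and recall by hand for two sets of predictions. The labels and predictions are made-up toy values, and `binary_metrics` is a hypothetical helper, not part of scikit-learn.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        'accuracy': correct / len(pairs),
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data (assumed): 1 marks a risky case, 0 a safe one.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
model_a_pred = [1, 0, 0, 0, 0, 0, 0, 0]  # cautious: misses two risky cases
model_b_pred = [1, 0, 1, 1, 0, 0, 0, 1]  # catches all, one false alarm

results = {
    'model_a': binary_metrics(y_true, model_a_pred),
    'model_b': binary_metrics(y_true, model_b_pred),
}
for name, metrics in results.items():
    print(name, metrics)
```

For risk management, recall on the risky class is often the number to watch, since a missed risk usually costs more than a false alarm.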
Conclusion:
AI-driven risk management is revolutionizing the financial services industry. By harnessing the capabilities of machine learning, financial institutions can accurately assess risks, make informed decisions, and ultimately ensure their stability and growth. In this article, we've demonstrated how to implement risk management using the best ML models in Python. Experiment with different models, fine-tune hyperparameters, and explore more advanced techniques to tailor the solution to your specific financial service needs. The future of risk management lies at the intersection of AI and finance, and now is the time to embrace its potential.
AI and Financial Risk Management – Critical Insights for Banking Leaders
I hope this article was helpful. If you have any questions, please feel free to leave a comment below.