Ques 1. Single Choice Correct
Which of the following models gives higher weightage to stop words?
- Bag of words model (Correct)
- TF-IDF model

Bag of words counts a stop word like any other word, so frequent stop words dominate the vector. TF-IDF scales them down because they occur in almost every document.

Ques 2. Suppose you want to create a TF-IDF vector and you learn that your current data may not contain all the words you are likely to see in real life. You then decide to use the dictionary of the language as the features for the vector. Which of these problems are you likely to run into?
- TF-IDF values for some words will be 0.
- TF-IDF vectorization throws a divide-by-zero error for words not in the corpus. (Theoretically correct: IDF is computed from log(N / df), and a dictionary word that never occurs in the corpus has df = 0.)
- TF-IDF values for all words will be less than 1.

Practically, scikit-learn's feature_extraction module does not produce features at test time for words it did not see at training time, as the following experiment shows.
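To make the answers to Ques 1 and Ques 2 concrete, here is a minimal pure-Python sketch of the IDF formula that scikit-learn documents for its `smooth_idf` option (natural log, plus 1). The tokenizer is a simplified assumption, not scikit-learn's default token pattern, and `dusk` simply stands in for any dictionary word missing from the corpus.

```python
import math
import re

# Same toy corpus as the scikit-learn experiment in this post.
CORPUS = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]


def tokenize(text):
    # Simplified tokenizer (an assumption, not scikit-learn's token_pattern):
    # lowercase the text and keep runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


# Each document reduced to its set of distinct terms, for counting df.
DOCS = [set(tokenize(d)) for d in CORPUS]


def idf(term, smooth=False):
    """IDF in the form scikit-learn documents (natural log):
    smooth=False -> ln(N / df) + 1
    smooth=True  -> ln((1 + N) / (1 + df)) + 1
    """
    n = len(DOCS)
    df = sum(1 for doc in DOCS if term in doc)
    if smooth:
        return math.log((1 + n) / (1 + df)) + 1
    return math.log(n / df) + 1  # ZeroDivisionError when df == 0


# Ques 1: "the" occurs in every document, so it gets the smallest IDF;
# a bag-of-words count would weight it like any content word.
print(round(idf("the"), 4))     # 1.0
print(round(idf("second"), 4))  # 2.3863

# Ques 2: a dictionary word that never occurs in the corpus has df == 0.
try:
    idf("dusk")
except ZeroDivisionError:
    print("idf('dusk'): division by zero")
print(round(idf("dusk", smooth=True), 4))  # 2.6094
```

With `smooth_idf=True` (scikit-learn's default), one is added to N and to every df, "as if an extra document was seen containing every term", so an unseen dictionary word gets the largest IDF instead of an error.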
```python
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer

print(sklearn.__version__)  # 0.24.1

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# smooth_idf=False: idf = ln(N / df) + 1, with no +1 added to df
# min_df=0: keep every term, however rare
vectorizer = TfidfVectorizer(smooth_idf=False, min_df=0)
X = vectorizer.fit_transform(corpus)
# Learned vocabulary, in column order:
# ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

print("X:")
print(X)
print()
print("X.shape: " + str(X.shape))  # (4, 9)
print()

# 'Ashish', 'Jain' and 'speaking' were not seen during fit,
# so transform() gives them no column.
corpus_test = ['This is Ashish Jain speaking.']
X_test = vectorizer.transform(corpus_test)
print("X_test:")
print(X_test)
print("X_test.shape: ")
print(X_test.shape)
print("type(X_test): " + str(type(X_test)))
```

Output
```text
(base) $ python "tfidf - experimenting with parameters.py"
0.24.1
X:
  (0, 1)    0.4694172843223779
  (0, 2)    0.6172273175654565
  (0, 6)    0.3645443967613799
  (0, 3)    0.3645443967613799
  (0, 8)    0.3645443967613799
  (1, 5)    0.6095324555037936
  (1, 1)    0.6578266523342082
  (1, 6)    0.25543053926412473
  (1, 3)    0.25543053926412473
  (1, 8)    0.25543053926412473
  (2, 4)    0.5324851898636142
  (2, 7)    0.5324851898636142
  (2, 0)    0.5324851898636142
  (2, 6)    0.223143128752028
  (2, 3)    0.223143128752028
  (2, 8)    0.223143128752028
  (3, 1)    0.4694172843223779
  (3, 2)    0.6172273175654565
  (3, 6)    0.3645443967613799
  (3, 3)    0.3645443967613799
  (3, 8)    0.3645443967613799

X.shape: (4, 9)

X_test:
  (0, 8)    0.7071067811865475
  (0, 3)    0.7071067811865475
X_test.shape: 
(1, 9)
type(X_test): <class 'scipy.sparse.csr.csr_matrix'>
```

Only 'this' (column 8) and 'is' (column 3) appear in X_test. 'Ashish', 'Jain' and 'speaking' were not in the training vocabulary, so they produce no features and the shape stays (1, 9).

Ques 3. Single Choice Correct
Which of these lexical resources are not available as part of NLTK?
a. List of names
b. Pronunciations
c. French to English dictionary
d. None of the above (Correct)

NLTK ships all three: the names corpus, the CMU Pronouncing Dictionary, and the Swadesh comparative wordlists, which include French-to-English word pairs.

Ques 4. Using WordNet, match the following for the word 'dusk'.
Col A -- Col B
a. Synonyms other than the word itself -- 1. Night
b. Hypernym -- 2. Fall
c. Hyponym -- 3. Hour
d. Lemmas -- 4. Twilight
NA -- 5. Crepuscle
Correct Answer: a - 4, b - 3, c - 1, d - 2, 4, 5

Ques 5. Single Choice Correct
Arrange the steps below in the correct order of the NLP pipeline.
1. Apply machine learning techniques to build models.
2. Cleaning the input text data.
3. Deploying the model and making predictions on new data.
4. Normalizing the data.
5. Validating the model built.
6. Feature engineering
7. Collecting textual data
a. 7 2 4 6 1 5 3 (Correct)
b. 7 4 2 6 1 3 5
c. 7 1 5 2 6 3 4

Ques 6. Single Choice Correct
Which of these applications is unlikely to use a TF-IDF model during implementation?
a. Chatbot
b. Sentiment analyzer
c. Information extraction (Correct)
d. Topic modeling
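For contrast with the answer to Ques 6, here is a hedged illustration of how a chatbot can use TF-IDF: a minimal pure-Python sketch of a retrieval-based bot that returns the stored reply whose question has the highest TF-IDF cosine similarity to the user's message. The `FAQ` data and the `tokenize`, `vectorize` and `reply` helpers are hypothetical, and this is only one common chatbot design, not the only one.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ data for a retrieval-based chatbot (illustration only).
FAQ = {
    "What are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "Where is your office located?": "Our office is in the city centre.",
}


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())


questions = list(FAQ)
docs = [Counter(tokenize(q)) for q in questions]
n = len(docs)

# Smoothed IDF, the same formula scikit-learn uses with smooth_idf=True.
vocab = set().union(*docs)
idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in vocab}


def vectorize(counts):
    # Term count times IDF, then L2-normalised. Words outside the FAQ
    # vocabulary are dropped, much like TfidfVectorizer.transform().
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


faq_vecs = [vectorize(d) for d in docs]


def reply(user_text):
    query = vectorize(Counter(tokenize(user_text)))
    # Cosine similarity of unit vectors is just their dot product.
    scores = [sum(query.get(t, 0.0) * w for t, w in fv.items()) for fv in faq_vecs]
    best = max(range(n), key=lambda i: scores[i])
    return FAQ[questions[best]]


print(reply("I forgot my password, how can I reset it?"))
```

In practice one would more likely use scikit-learn's `TfidfVectorizer` together with `sklearn.metrics.pairwise.cosine_similarity`; the hand-rolled version just keeps the arithmetic visible.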
Wednesday, July 20, 2022
NLP Questions and Answers (Set 3 of 6 Questions)