Saturday, June 29, 2024

Trying some prompts, listing all models, and trying the embeddings model of Google's Generative AI package

View All Articles on Large Language Models: Lessons in Technology
Step 1: Create an API key for free by logging into Google AI Studio 

[Screenshots A-F: walkthrough of the key-creation flow in Google AI Studio]
Your free API key is created. Copy it and save it somewhere.

Trying a couple of things...

import google.generativeai as genai

API_KEY = 'A...o'
genai.configure(api_key=API_KEY)

model = genai.GenerativeModel()
response = model.generate_content('Teach me about how an LLM works')
print(response.text)

**Understanding Large Language Models (LLMs)**

**Introduction:**
LLMs are advanced machine learning models trained on vast amounts of text data. They can generate human-like text, translate languages, write different types of content, and perform various other text-related tasks.

**Working Principle:**
LLMs are trained on billions of words of text using a technique called transformer neural networks. These networks process sequential data and learn the relationships between words and phrases within the text.

**Training Process:**
1. **Massive Data Collection:** LLMs are trained on enormous datasets of text from various sources, such as books, articles, news, and social media posts.
2. **Tokenization:** The text is broken down into individual words and phrases called "tokens."
3. **Encoding:** Tokens are converted into numerical representations using embedding techniques.
4. **Transformer Architecture:** The transformer neural network processes the encoded tokens, identifying patterns and learning relationships within the text.
5. **Parameter Optimization:** The model adjusts its internal parameters (weights and biases) to minimize a loss function, which measures how well it predicts the next words in the text.
6. **Fine-tuning:** After general training, the LLM can be fine-tuned for specific tasks, such as language translation or content generation.

**Capabilities:**
* **Text Generation:** LLMs can create original, fluent, and grammatically correct text in response to a given prompt.
* **Language Translation:** They can translate text from one language to another with high accuracy.
* **Question Answering:** LLMs can extract information from text and answer questions related to the content.
* **Summarization:** They can condense large amounts of text into concise and informative summaries.
* **Content Creation:** LLMs can generate various types of content, including articles, stories, poems, and website copy.

**Applications:**
* **Natural Language Processing (NLP):** Sentiment analysis, text classification, chatbots
* **Content Generation:** Creative writing, marketing materials, news articles
* **Education:** Language learning, essay writing assistance
* **Research:** Literature analysis, data extraction
* **Customer Service:** Automated response systems, chat support

**Limitations:**
* **Bias:** LLMs can exhibit biases present in their training data.
* **Factual Errors:** They are not immune to factual errors in the sources they are trained on.
* **Limited Contextual Understanding:** They may struggle to understand the full context of complex text.
* **Ethics:** The use of LLMs raises ethical concerns about authorship, misinformation, and deepfakes.
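The "Tokenization" and "Encoding" steps described above can be sketched in a few lines of plain Python. This is a toy illustration only: real LLMs use learned subword tokenizers (e.g. BPE or SentencePiece) and trained embedding matrices, not whitespace splitting.

```python
# Toy illustration of the "Tokenization" and "Encoding" training steps.
# Real tokenizers split text into subword units; here we just split on spaces.

text = "the cat sat on the mat"

# Step 2 (Tokenization): break the text into tokens
tokens = text.split()

# Build a vocabulary mapping each unique token to an integer id
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}

# Step 3 (Encoding): convert tokens to numerical ids the model can process
token_ids = [vocab[tok] for tok in tokens]

print(tokens)     # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(token_ids)  # [0, 1, 2, 3, 0, 4]
```

Note how the repeated token "the" maps to the same id both times; the embedding layer would then turn each id into a dense vector.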

Listing all models

import pprint

for model in genai.list_models():
    pprint.pprint(model)

Model(name='models/chat-bison-001', base_model_id='', version='001', display_name='PaLM 2 Chat (Legacy)', description='A legacy text-only model optimized for chat conversations', input_token_limit=4096, output_token_limit=1024, supported_generation_methods=['generateMessage', 'countMessageTokens'], temperature=0.25, top_p=0.95, top_k=40)

Model(name='models/text-bison-001', base_model_id='', version='001', display_name='PaLM 2 (Legacy)', description='A legacy model that understands text and generates text as an output', input_token_limit=8196, output_token_limit=1024, supported_generation_methods=['generateText', 'countTextTokens', 'createTunedTextModel'], temperature=0.7, top_p=0.95, top_k=40)

Model(name='models/embedding-gecko-001', base_model_id='', version='001', display_name='Embedding Gecko', description='Obtain a distributed representation of a text.', input_token_limit=1024, output_token_limit=1, supported_generation_methods=['embedText', 'countTextTokens'], temperature=None, top_p=None, top_k=None)

Model(name='models/gemini-1.0-pro', base_model_id='', version='001', display_name='Gemini 1.0 Pro', description='The best model for scaling across a wide range of tasks', input_token_limit=30720, output_token_limit=2048, supported_generation_methods=['generateContent', 'countTokens'], temperature=0.9, top_p=1.0, top_k=None)

Model(name='models/gemini-1.0-pro-001', base_model_id='', version='001', display_name='Gemini 1.0 Pro 001 (Tuning)', description='The best model for scaling across a wide range of tasks. This is a stable model that supports tuning.', input_token_limit=30720, output_token_limit=2048, supported_generation_methods=['generateContent', 'countTokens', 'createTunedModel'], temperature=0.9, top_p=1.0, top_k=None)

Model(name='models/gemini-1.0-pro-latest', base_model_id='', version='001', display_name='Gemini 1.0 Pro Latest', description='The best model for scaling across a wide range of tasks. This is the latest model.', input_token_limit=30720, output_token_limit=2048, supported_generation_methods=['generateContent', 'countTokens'], temperature=0.9, top_p=1.0, top_k=None)

Model(name='models/gemini-1.0-pro-vision-latest', base_model_id='', version='001', display_name='Gemini 1.0 Pro Vision', description='The best image understanding model to handle a broad range of applications', input_token_limit=12288, output_token_limit=4096, supported_generation_methods=['generateContent', 'countTokens'], temperature=0.4, top_p=1.0, top_k=32)

Model(name='models/gemini-1.5-flash', base_model_id='', version='001', display_name='Gemini 1.5 Flash', description='Fast and versatile multimodal model for scaling across diverse tasks', input_token_limit=1048576, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-1.5-flash-001', base_model_id='', version='001', display_name='Gemini 1.5 Flash 001', description='Fast and versatile multimodal model for scaling across diverse tasks', input_token_limit=1048576, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens', 'createCachedContent'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-1.5-flash-latest', base_model_id='', version='001', display_name='Gemini 1.5 Flash Latest', description='Fast and versatile multimodal model for scaling across diverse tasks', input_token_limit=1048576, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-1.5-pro', base_model_id='', version='001', display_name='Gemini 1.5 Pro', description='Mid-size multimodal model that supports up to 1 million tokens', input_token_limit=2097152, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-1.5-pro-001', base_model_id='', version='001', display_name='Gemini 1.5 Pro 001', description='Mid-size multimodal model that supports up to 1 million tokens', input_token_limit=2097152, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens', 'createCachedContent'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-1.5-pro-latest', base_model_id='', version='001', display_name='Gemini 1.5 Pro Latest', description='Mid-size multimodal model that supports up to 1 million tokens', input_token_limit=2097152, output_token_limit=8192, supported_generation_methods=['generateContent', 'countTokens'], temperature=1.0, top_p=0.95, top_k=64)

Model(name='models/gemini-pro', base_model_id='', version='001', display_name='Gemini 1.0 Pro', description='The best model for scaling across a wide range of tasks', input_token_limit=30720, output_token_limit=2048, supported_generation_methods=['generateContent', 'countTokens'], temperature=0.9, top_p=1.0, top_k=None)

Model(name='models/gemini-pro-vision', base_model_id='', version='001', display_name='Gemini 1.0 Pro Vision', description='The best image understanding model to handle a broad range of applications', input_token_limit=12288, output_token_limit=4096, supported_generation_methods=['generateContent', 'countTokens'], temperature=0.4, top_p=1.0, top_k=32)

Model(name='models/embedding-001', base_model_id='', version='001', display_name='Embedding 001', description='Obtain a distributed representation of a text.', input_token_limit=2048, output_token_limit=1, supported_generation_methods=['embedContent'], temperature=None, top_p=None, top_k=None)

Model(name='models/text-embedding-004', base_model_id='', version='004', display_name='Text Embedding 004', description='Obtain a distributed representation of a text.', input_token_limit=2048, output_token_limit=1, supported_generation_methods=['embedContent'], temperature=None, top_p=None, top_k=None)

Model(name='models/aqa', base_model_id='', version='001', display_name='Model that performs Attributed Question Answering.', description='Model trained to return answers to questions that are grounded in provided sources, along with estimating answerable probability.', input_token_limit=7168, output_token_limit=1024, supported_generation_methods=['generateAnswer'], temperature=0.2, top_p=1.0, top_k=40)
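A practical use of this listing is to pick out models that support the generation method you need. The sketch below filters a small hard-coded subset of the records above so it runs offline; with an API key configured you would instead loop over genai.list_models() and read each model's supported_generation_methods attribute.

```python
# Filter models by a required generation method.
# The records are a hard-coded subset of the listing above, used here so the
# snippet runs without an API key; genai.list_models() would supply the real ones.

models = [
    {"name": "models/gemini-1.5-flash", "methods": ["generateContent", "countTokens"]},
    {"name": "models/embedding-001", "methods": ["embedContent"]},
    {"name": "models/text-embedding-004", "methods": ["embedContent"]},
    {"name": "models/aqa", "methods": ["generateAnswer"]},
]

# Keep only the models that can produce embeddings
embedders = [m["name"] for m in models if "embedContent" in m["methods"]]
print(embedders)  # ['models/embedding-001', 'models/text-embedding-004']
```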

Getting Embeddings for Input Text

response = genai.generate_embeddings(model="models/embedding-gecko-001", text='Hello World!')
print(response)

{'embedding': [-0.020664843, 0.0005969583, 0.041870195, ..., -0.032485683]}
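Embedding vectors like the one above are usually compared with cosine similarity. A dependency-free sketch (the short 3-d vectors here are made up for illustration; real embedding outputs have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d vectors standing in for real embedding outputs
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [-0.3, 0.1, -0.2]

print(cosine_similarity(v1, v2))  # ~1.0: identical direction, maximally similar
print(cosine_similarity(v1, v3))  # negative: pointing in opposing directions
```

Texts whose embeddings have a cosine similarity close to 1 are semantically close; this is the basis of semantic search over embeddings.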
Tags: Technology,Large Language Models,

Set up Conda Environment For Google's Generative AI package

View all Anaconda (Environment, Kernel and Package Management) Articles: Lessons in Technology
Step 1: Create your env.yml file


name: googleai_202406
channels:
- conda-forge
dependencies:
- python=3.12
- ipykernel
- jupyter
- pip
- pip:
    - google-generativeai

Step 2: Create conda environment using the above env.yml 

(base) $ conda env create -f env.yml 

Step 3: Activate the environment

(base) $ conda activate googleai_202406

Step 4: Test the installation of "google-generativeai" by displaying package details 

(googleai_202406) $ conda list google-generativeai
# packages in environment at /home/ashish/anaconda3/envs/googleai_202406:
#
# Name                    Version                   Build  Channel
google-generativeai       0.7.1                    pypi_0    pypi

(googleai_202406) $ pip show google-generativeai
Name: google-generativeai
Version: 0.7.1
Summary: Google Generative AI High level API client library and tools.
Home-page: https://github.com/google/generative-ai-python
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: /home/ashish/anaconda3/envs/googleai_202406/lib/python3.12/site-packages
Requires: google-ai-generativelanguage, google-api-core, google-api-python-client, google-auth, protobuf, pydantic, tqdm, typing-extensions
Required-by: 

(googleai_202406) $ 

Step 5: Set up a kernel corresponding to the above 'conda environment'

(googleai_202406) $ python -m ipykernel install --user --name googleai_202406

# Reference: pypi.org    
Tags: Anaconda,Technology,

Thursday, June 20, 2024

10 Interview Questions on Cypher Queries and Knowledge Graph Using Neo4j (For Data Scientist Role) - Jun 2024

To See All Interview Preparation Articles: Index For Interviews Preparation
Question 1: Write a CREATE query for the following nodes, where the relationship from ROOT to each of the other nodes is 'HAS_CHILD'.

ROOT 
|--BROKER
|--PROVIDER
|--MEMBER

Answer:

CREATE (root:ROOT),
       (broker:BROKER),
       (provider:PROVIDER),
       (member:MEMBER),
       (root)-[:HAS_CHILD]->(broker),
       (root)-[:HAS_CHILD]->(provider),
       (root)-[:HAS_CHILD]->(member)

~~~

Question 2: Write a DELETE query to delete all nodes and relationships in a graph. 

Answer:
MATCH (n) DETACH DELETE n


~~~

Question 3: Write a query to get a count for all nodes of a given label:

Answer:

MATCH (n:Person)
RETURN count(n) as count


~~~

Question 4: There are three EPIC nodes in my graph. 
Each node has a numerical property CUSTOM_ID.
Now, I want to retrieve the node with the largest CUSTOM_ID.

Answer:

MATCH (n:EPIC)
RETURN n
ORDER BY n.CUSTOM_ID DESC
LIMIT 1

~~~ 

Question 5: Write a query to get a node by property value in Neo4j.

Answer:


MATCH (n) 
WHERE n.name = 'Mark' 
RETURN n


~~~

Question 6: Delete a node with a given property.

Answer:
MATCH (n:Person {name: 'Tom Hanks'})
DELETE n


~~~

Question 7: Delete only the nodes having the label ENTITY.

Answer:

MATCH (n:ENTITY)
DELETE n

Note: DELETE fails if a matched node still has relationships; use DETACH DELETE n to delete such nodes along with their relationships.

~~~

Question 8: Return number of EPIC nodes in the knowledge graph.

Answer:

MATCH (epic:EPIC)
RETURN count(epic) as count

~~~

Question 9: Write a query to get the EPIC node with largest numerical property of CUSTOM_ID. 

Answer:

MATCH (epic:EPIC)
RETURN epic
ORDER BY epic.CUSTOM_ID DESC
LIMIT 1

~~~

Question 10: What are some of the use cases where the Betweenness Centrality algorithm is used?

Answer:
The Betweenness Centrality Algorithm is a powerful tool used to understand the roles of nodes in a graph and their impact on the network. Here are some use cases where it finds application:

Supply Chain Risk Analysis: In supply chain processes, Betweenness Centrality helps identify critical nodes that act as bridges between different parts of the network. For example, when transporting a product internationally, it can pinpoint bottleneck nodes during cargo ship stops in intermediate ports [1].

Power Grid Contingency Analysis: The algorithm is used to analyze power grid networks, identifying critical nodes that affect the flow of electricity. Due to its computational intensity, this application often requires supercomputers [2].

Community Detection and Network Routing: Betweenness Centrality assists in Girvan–Newman community detection and network routing tasks. It helps find influential nodes that connect different communities or guide information flow [2].

Artificial Intelligence and Skill Characterization: Skill characterization in AI relies on identifying influential nodes. Betweenness Centrality helps determine which nodes play a crucial role in spreading information or resources [2].

Epidemiology and Rumor Spreading: In epidemiology, it identifies nodes that influence the spread of diseases. Similarly, it helps analyze rumor propagation in social networks [1].

Transportation Networks: The algorithm is applied to transportation networks, such as road or rail systems, to find critical nodes affecting traffic flow or resource distribution [1].

Remember, Betweenness Centrality is about detecting nodes that serve as bridges, allowing information or resources to flow efficiently across a graph.

[1] graphable.ai
[2] computationalsocialnetworks.springeropen.com
[3] nature.com
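The "bridge node" idea can be made concrete with a brute-force betweenness computation on a tiny graph. This is a toy sketch for intuition only; Neo4j's Graph Data Science library implements the efficient Brandes algorithm for real graphs.

```python
from itertools import combinations

def simple_paths(graph, s, t, path=None):
    # Enumerate all simple paths from s to t (fine for tiny toy graphs)
    path = (path or []) + [s]
    if s == t:
        yield path
        return
    for n in graph[s]:
        if n not in path:
            yield from simple_paths(graph, n, t, path)

def betweenness(graph):
    # Score each node by how many shortest paths between other nodes pass through it
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        paths = list(simple_paths(graph, s, t))
        if not paths:
            continue
        shortest = min(len(p) for p in paths)
        shortest_paths = [p for p in paths if len(p) == shortest]
        for p in shortest_paths:
            for v in p[1:-1]:  # interior nodes only
                score[v] += 1.0 / len(shortest_paths)
    return score

# A path graph A - B - C - D: B and C act as bridges between the endpoints
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(graph))  # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```

The endpoints score zero while the interior nodes score highest: remove B or C and the graph splits, which is exactly the "bottleneck" property the use cases above exploit.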

---
Tags: Database,Technology

Wednesday, June 12, 2024

Index of Book Lists And Downloads

Downloads

Tags: List of Books,

Graph Machine Learning Books (Jun 2024)

To See All Tech Related Book Lists: Index of Book Lists And Downloads
Download Books
1.
Graph Machine Learning: Take Graph Data to the Next Level by Applying Machine Learning Techniques and Algorithms
Enrico Deusebio, 2021

2.
Graph-Powered Machine Learning
Alessandro Negro, 2021

3.
Graph Representation Learning
William L. Hamilton, 2020

4.
Deep Learning on Graphs
Jiliang Tang, 2021

5.
Graph-Powered Analytics and Machine Learning with TigerGraph
Alexander Thomas, 2023

6.
Graph Neural Networks: Foundations, Frontiers, and Applications
2022

7.
Graph Algorithms: Practical Examples in Apache Spark and Neo4j
Amy E. Hodler, 2019

8.
Building Knowledge Graphs
Jim Webber, 2023

9.
Graph Algorithms for Data Science: With Examples in Neo4j
Tomaž Bratanic, 2024

10.
Graph Neural Networks in Action
Keita Broadwater, 2024

11.
Hands-On Graph Neural Networks Using Python: Practical Techniques and Architectures for Building Powerful Graph and Deep Learning Apps with PyTorch
Maxime Labonne, 2023

12.
The Practitioner's Guide to Graph Data: Applying Graph Thinking and Graph Technologies to Solve Complex Problems
Denise Koessler Gosnell, 2020

13.
Algorithms in C, Part 5: Graph Algorithms
Robert Sedgewick, 2001

14.
Mining of Massive Datasets
Jeffrey Ullman, 2011

15.
Machine Learning for Text
Charu C. Aggarwal, 2018

16.
Knowledge Graphs: Fundamentals, Techniques, and Applications
Craig A. Knoblock, 2021

17.
Networks, Crowds, and Markets: Reasoning about a Highly Connected World
Jon Kleinberg, 2010

18.
Graph-based Natural Language Processing and Information Retrieval
Dragomir R. Radev, 2011

19.
Designing and Building Enterprise Knowledge Graphs
(Synthesis Lectures on Data, Semantics, and Knowledge) 
Juan Sequeda, Ora Lassila
Morgan & Claypool (2021)
Tags: Machine Learning,List of Books,

Saturday, June 1, 2024

Interview Questions For Big Data Engineer (2 Years of Experience)

To See All Interview Preparation Articles: Index For Interviews Preparation
1. How comfortable are you in Python?
2. How comfortable are you in PySpark?
3. How comfortable are you in Scala?
4. And shell scripting?

---

1. What is the difference between list and tuple?
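A quick illustration of the key differences (lists are mutable; tuples are immutable and therefore hashable):

```python
lst = [1, 2, 3]   # list: mutable
tup = (1, 2, 3)   # tuple: immutable

lst[0] = 99       # fine: lists can be modified in place
try:
    tup[0] = 99   # tuples cannot be modified
except TypeError as e:
    print("tuple is immutable:", e)

# Because tuples are immutable they are hashable, so they can be dict keys
d = {tup: "ok"}
print(lst)      # [99, 2, 3]
print(d[tup])   # ok
```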

2. What are the 3 ways to work on a dataset in PySpark? (RDD, Spark SQL, and Pandas Dataframe)

3. What is lazy evaluation?
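Spark transformations are lazy: they only build an execution plan, and no work happens until an action runs. The same behaviour can be mimicked with a plain Python generator (an analogy, not Spark itself):

```python
log = []

def transform(xs):
    # Generator body runs only when the result is consumed,
    # much like a Spark transformation that only records lineage
    for x in xs:
        log.append(x)   # side effect shows when work actually happens
        yield x * 2

pipeline = transform([1, 2, 3])
print(log)               # [] -- defining the pipeline did no work yet

result = list(pipeline)  # the "action": forces evaluation
print(log)               # [1, 2, 3]
print(result)            # [2, 4, 6]
```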

4. What is the opposite of lazy evaluation? (Eager evaluation)

5. What is the regular expression?
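A regular expression is a pattern language for matching text. For example, with Python's re module:

```python
import re

# Pattern: one or more consecutive digits
pattern = re.compile(r"\d+")

text = "Order 66 shipped in 3 days"
print(pattern.findall(text))        # ['66', '3']
print(pattern.search("abc"))        # None -- no digits present
```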

6. What does grep command do?

7. What does find command do?

8. What is the difference between find and grep?

9. What does sed command do?

10. What does awk command do?

11. What is narrow transformation? (Like map())

12. What is wide transformation? (Like groupby and reduceby)

13. What is the difference between narrow transformation and wide transformation?
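Without a Spark cluster at hand, the distinction can be sketched in plain Python: a narrow transformation (like map) processes each partition independently, while a wide transformation (like groupBy/reduceByKey) must gather matching keys from every partition, i.e. a shuffle. This is an analogy of the data movement, not actual Spark code:

```python
from collections import defaultdict

# Two "partitions" of (key, value) records
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("b", 4)],
]

# Narrow transformation: map() works within each partition;
# no data moves between partitions
mapped = [[(k, v * 10) for k, v in part] for part in partitions]

# Wide transformation: reduceByKey needs all values for a key from
# ALL partitions, so the data is "shuffled" together first
shuffled = defaultdict(list)
for part in mapped:
    for k, v in part:
        shuffled[k].append(v)
reduced = {k: sum(vs) for k, vs in shuffled.items()}

print(mapped)   # [[('a', 10), ('b', 20)], [('a', 30), ('b', 40)]]
print(reduced)  # {'a': 40, 'b': 60}
```

The shuffle step is what makes wide transformations expensive in Spark: it implies network I/O between executors.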

14. How would you rate yourself in Hive?

15. Write an SQL query to get the current date from the Hive SQL interface. (current_date, current_timestamp)

16. Take out the year from the date. (year(date_col))

17. How would you turn 'a;b;c' into three rows:
a
b
c
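In Hive this is typically done with explode() over split() (e.g. SELECT exploded FROM t LATERAL VIEW explode(split(col, ';')) x AS exploded). The same split-then-flatten idea in Python:

```python
# Split a delimited string into one value per "row"
value = "a;b;c"
rows = value.split(";")
for row in rows:
    print(row)
# a
# b
# c
```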

18. What is Spark session? (Entry point to create Spark context)

19. What is spark context?

20. Which of the two has the broader scope?

21. Is there any other context object we need to know about?

22. There is a CSV file. You have to load this CSV data into an RDD, SQL dataframe, and Pandas dataframe.
Tags: Big Data,Interview Preparation,