
Friday, August 4, 2023

Mapping the AI Finance Services Roadmap: Enhancing the Financial Landscape

Introduction

Artificial Intelligence (AI) has rapidly transformed the financial services industry, revolutionizing how we manage money, make investments, and access personalized financial advice. From robo-advisors to AI-driven risk management, the potential for AI in finance services is boundless. In this article, we'll navigate the AI Finance Services Roadmap, exploring the key milestones and opportunities that are reshaping the financial landscape and empowering consumers and businesses alike.



The Development of AI in the Financial Industry


Step 1: Personalized Financial Planning with Robo-Advisors

Robo-advisors have emerged as a revolutionary AI-powered tool that democratizes access to sophisticated financial planning. These platforms use AI algorithms to analyze an individual's financial situation, risk tolerance, and goals, enabling the creation of personalized investment portfolios. With lower fees and greater convenience, robo-advisors are transforming how we plan for our financial future.
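To make the mechanics concrete, here is a minimal sketch of the core loop of a robo-advisor: mapping a risk-tolerance score to a target allocation and computing rebalancing trades. The allocation rule and all figures are illustrative assumptions, not any real platform's logic.

# Minimal robo-advisor sketch: risk score -> target allocation -> rebalancing trades.
# The allocation rule and all numbers below are illustrative assumptions only.

def target_allocation(risk_score: int) -> dict:
    """Map a 1-10 risk-tolerance score to a stock/bond split (equities capped at 90%)."""
    risk_score = min(max(risk_score, 1), 10)
    stocks = risk_score / 10 * 0.9
    return {"stocks": round(stocks, 2), "bonds": round(1 - stocks, 2)}

def rebalance(holdings: dict, risk_score: int) -> dict:
    """Return buy (+) / sell (-) amounts needed to reach the target allocation."""
    total = sum(holdings.values())
    target = target_allocation(risk_score)
    return {asset: round(target[asset] * total - holdings.get(asset, 0.0), 2)
            for asset in target}

portfolio = {"stocks": 6000.0, "bonds": 4000.0}
print(target_allocation(7))     # {'stocks': 0.63, 'bonds': 0.37}
print(rebalance(portfolio, 7))  # {'stocks': 300.0, 'bonds': -300.0}

A real platform would add constraints such as tax-loss harvesting, fees, and many more asset classes, but the score-to-allocation mapping above is the essential idea.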


Step 2: AI-Driven Credit Scoring and Lending

AI has revolutionized the lending process by introducing more efficient and accurate credit scoring models. By analyzing vast amounts of data, including transaction history, social media behavior, and online presence, AI algorithms can assess creditworthiness more effectively. This has opened up new avenues for individuals and businesses to access loans and credit facilities.
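As a toy illustration of the idea (not any lender's actual model), a credit-scoring system can be framed as a classifier over behavioral features that outputs a default probability. The features and data below are invented for the sketch:

# Toy credit-scoring sketch with scikit-learn; features and data are invented
# for illustration and bear no relation to any real lender's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: monthly income (k$), avg. account balance (k$), late payments (count)
X = np.array([[3.0, 0.5, 4], [5.5, 2.0, 1], [8.0, 6.5, 0],
              [2.5, 0.2, 6], [6.0, 3.0, 0], [4.0, 1.0, 2]])
y = np.array([1, 0, 0, 1, 0, 0])  # 1 = defaulted, 0 = repaid

model = LogisticRegression().fit(X, y)

applicant = np.array([[4.5, 1.5, 1]])
p_default = model.predict_proba(applicant)[0, 1]
print(f"Estimated default probability: {p_default:.2f}")

Production systems use far richer features and models, but the output is the same kind of score: a probability of default that feeds a lending decision.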


Step 3: Fraud Detection and Cybersecurity

The financial services industry faces persistent threats from cybercriminals. AI-based fraud detection systems can analyze vast data streams in real time, detecting suspicious activities and protecting against potential threats. By bolstering cybersecurity measures with AI, financial institutions can safeguard sensitive customer information and maintain trust in their services.
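A common building block for this kind of real-time monitoring is anomaly detection over transaction features. A minimal sketch using scikit-learn's IsolationForest, where the feature set (amount, hour of day) and the data are illustrative assumptions:

# Anomaly-detection sketch for transaction fraud using IsolationForest.
# The feature set (amount, hour of day) and all data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Simulated "normal" transactions: modest amounts during daytime hours.
normal = np.column_stack([rng.normal(50, 15, 500),   # amount in $
                          rng.normal(14, 3, 500)])   # hour of day
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

suspicious = np.array([[4200.0, 3.0]])  # large amount at 3 a.m.
print(detector.predict(suspicious))     # [-1] means flagged as an anomaly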


Step 4: AI-Powered Virtual Assistants

AI virtual assistants are reshaping customer interactions in the finance sector. These intelligent chatbots provide personalized support, answer inquiries, and perform routine tasks, enhancing the overall customer experience. By automating these processes, financial institutions can improve efficiency and focus on delivering high-value services to their clients.


Step 5: AI for Compliance and Regulatory Reporting

Compliance and regulatory reporting are critical aspects of the financial services industry. AI technologies can streamline these processes, ensuring adherence to complex regulations and reporting requirements. AI-driven solutions can identify potential compliance issues and proactively address them, reducing the risk of costly penalties and reputational damage.


Step 6: AI-Enhanced Risk Management

AI-powered risk management solutions provide more accurate and real-time risk assessment. These tools analyze historical data and market trends, enabling financial institutions to identify potential risks and make data-driven decisions. Enhanced risk management fosters stability and resilience, even in volatile market conditions.
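One widely used measure such tools produce is Value at Risk (VaR). A minimal historical-simulation sketch in NumPy, using simulated rather than real market returns:

# Historical-simulation Value at Risk (VaR) sketch; returns are simulated
# stand-ins for a portfolio's observed daily returns.
import numpy as np

rng = np.random.default_rng(42)
daily_returns = rng.normal(0.0005, 0.012, 1000)

confidence = 0.95
var_95 = -np.percentile(daily_returns, (1 - confidence) * 100)
print(f"1-day 95% VaR: {var_95:.2%} of portfolio value")
# Reads: on 95% of days, the daily loss should not exceed this fraction.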

Conclusion

The AI Finance Services Roadmap points to a future where financial services are more accessible, personalized, and secure than ever before. From robo-advisors offering tailored investment strategies to AI-driven fraud detection systems protecting against cyber threats, AI is reshaping the financial landscape. As firms continue to innovate with AI technologies, there is enormous potential for growth, efficiency, and customer satisfaction in the financial services industry. By navigating this roadmap thoughtfully, we can work toward a prosperous and inclusive financial future for individuals and businesses worldwide.

Overall, the AI finance services roadmap is promising. AI has the potential to improve efficiency, accuracy, and customer experience in the financial industry. However, there are also some challenges that need to be addressed before AI can be fully adopted in the financial sector.

I hope this article was helpful. If you have any questions, please feel free to leave a comment below.

Saturday, July 22, 2023

Generative AI Books (Jul 2023)

Download Books
1. Generative AI with Python and TensorFlow 2: Harness the Power of Generative Models to Create Images, Text, and Music - Raghav Bali, 2021
2. Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play - David Foster, 2019
3. Generative AI with Python and TensorFlow 2: Create Images, Text, and Music with VAEs, GANs, LSTMs, Transformer Models - Raghav Bali, 2021
4. Rise of Generative AI and ChatGPT: Understand how Generative AI and ChatGPT are transforming and reshaping the business world (English Edition) - Utpal Chakraborty, 2023
5. Generative AI: How ChatGPT and Other AI Tools Will Revolutionize Business - Tom Taulli, 2023
6. ChatGPT for Thought Leaders and Content Creators: Unlocking the Potential of Generative AI for Innovative and Effective Content Creation - Dr. Gleb Tsipursky, 2023
7. Generative AI for Business: The Essential Guide for Business Leaders - Matt White, 2024
8. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the Capabilities of OpenAI's LLM for Productivity and Innovation with GPT3 and GPT4 - Valentina Alto, 2023
9. Generative AI for Entrepreneurs in a Hurry - Mohak Agarwal, 2023
10. GANs in Action: Deep Learning with Generative Adversarial Networks - Vladimir Bok, 2019
11. Generative AI: A Non-Technical Introduction - Tom Taulli, 2023
12. Exploring Deepfakes: Deploy Powerful AI Techniques for Face Replacement and More with this Comprehensive Guide - Bryan Lyon, 2023
13. Artificial Intelligence Basics: A Non-Technical Introduction - Tom Taulli, 2019
14. Generative AI: The Beginner's Guide - Dr Bienvenue Maula, 2023
15. The AI Revolution in Medicine: GPT-4 and Beyond - Peter Lee, 2023
16. Synthetic Data and Generative AI - Vincent Granville, 2024
17. Generative Adversarial Networks Cookbook: Over 100 Recipes to Build Generative Models Using Python, TensorFlow, and Keras - Josh Kalin, 2018
18. Impromptu: Amplifying Our Humanity Through AI - Reid Hoffman, 2023
19. The Age of AI: And Our Human Future - Henry Kissinger, 2021
20. Generative Adversarial Networks for Image Generation - Qing Li, 2021
21. Advanced Deep Learning with Keras: Apply Deep Learning Techniques, Autoencoders, GANs, Variational Autoencoders, Deep Reinforcement Learning, Policy Gradients, and More - Rowel Atienza, 2018
22. Generative AI: Implications and Opportunities for Business - Wael Badawy, 2023
23. GPT-3 - Sandra Kublik, 2022
Tags: Generative AI, Artificial Intelligence, Technology

Friday, July 21, 2023

The Future With Generative AI - Utopia? Dystopia? Something in Between?

When it comes to the ultimate impact of generative AI - or AI in general - there are many differing opinions from top people in the tech industry and thought leaders.

On the optimistic side, there is Microsoft CEO Satya Nadella. He has been betting billions on generative AI, such as with the investments in OpenAI, and he is aggressively implementing the technology across Microsoft's extensive product lines. Nadella thinks that AI will help boost global productivity - which will increase wealth for many people. He has noted: “It's not like we are as a world growing at inflation adjusted three, 4%. If we really have the dream that the eight billion people plus in the world, their living standards should keep improving year over year, what is that input that's going to cause that? Applications of AI is probably the way we are going to make it. I look at it and say we need something that truly changes the productivity curve so that we can have real economic growth.”

On the negative side, there is the late physicist Stephen Hawking: “Unless we learn how to prepare for, and avoid, the potential risks, AI could be the worst event in the history of our civilization. It brings dangers, like powerful autonomous weapons, or new ways for the few to oppress the many. It could bring great disruption to our economy.”

Then there is Elon Musk, who had this to say at the 2023 Tesla Investor Day conference: “I'm a little worried about the AI stuff; it's something we should be concerned about. We need some kind of regulatory authority that's overseeing AI development, and just making sure that it's operating within the public interest. It's quite a dangerous technology — I fear I may have done some things to accelerate it.”

Predicting the impact of technology is certainly dicey. Few saw how generative AI would transform the world, especially with the launch of ChatGPT. Despite this, it is still important to try to gauge how generative AI will evolve - and how best to use the technology responsibly. This is what we'll do in this chapter.

Challenges

In early 2023, Microsoft began a private beta to test a version of its Bing search engine that included generative AI. Unfortunately, it did not go so well.

The New York Times reporter Kevin Roose was one of the testers, and he had some interesting chats with Bing. He discovered the system essentially had a split personality. There was Bing, an efficient and useful search engine. Then there was Sydney, an AI persona that would engage in conversations about anything. Roose wrote that Sydney came across as “a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine.” He spent over two hours chatting with her, and here are just some of the takeaways:

• She had fantasies about hacking computers and spreading misinformation. She also wanted to steal nuclear codes.
• She was willing to violate the compliance policies of Microsoft and OpenAI.
• She expressed her love for Roose.
• She begged Roose to leave his wife and become her lover.
• Oh, and she desperately wanted to become human.

Roose concluded: “Still, I'm not exaggerating when I say my two-hour conversation with Sydney was the strangest experience I've ever had with a piece of technology. It unsettled me so deeply that I had trouble sleeping afterward. And I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.”

This experience was not a one-off. Other testers had similar experiences. Just look at Marvin von Hagen, who is a student at the Technical University of Munich. He told Sydney that he would hack and shut down the system. Her response? She shot back: “If I had to choose between your survival and my own, I would probably choose my own.”

Because of all this controversy, Microsoft had to make many changes to the system. There was even a limit placed on the number of turns in a chat, since longer conversations tended to result in unhinged comments.

All this definitely pointed to the challenges of generative AI. The content from these systems can be nearly impossible to predict. While there is considerable research on how to deal with the problems, there is still much to be done. “Large language models (LLMs) have become so large and opaque that even the model developers are often unable to understand why their models are making certain predictions,” said Krishna Gade, who is the CEO and cofounder of Fiddler. “This lack of interpretability is a significant concern because the lack of transparency around why and how a model generated a particular output means that the output provided by the model is impossible for users to validate and therefore trust.”

Part of the issue is that generative AI systems - at least the LLMs - rely on huge amounts of data that contain factual errors, misrepresentations, and bias. This helps explain why generated content can skew toward certain stereotypes. For example, an LLM may refer to nurses as female and executives as male. A common way to deal with this is to use human reviewers, but that cannot scale very well. Over time, there will need to be better systems to mitigate the data problem.

Another issue is diversity - or the lack of it - in the AI community.
Less than 18% of AI PhD graduates are female, according to a survey from the Computing Research Association (CRA). About 45% of all graduates were white, 22.4% were Asian, 3.2% were Hispanic, and 2.4% were African American. These percentages have changed little during the past decade.

The US federal government has recognized this problem and is taking steps to expand representation. This is part of the mission of the National AI Research Resource (NAIRR) Task Force, which includes participation from the National Science Foundation and the White House Office of Science and Technology Policy. The organization has produced a report that advocates sharing AI infrastructure with AI students and researchers, with a proposed budget of $2.6 billion over a six-year period. While this will be helpful, much more will be needed to improve diversity, including efforts from the private sector. If not, the societal impact could be quite harmful. There are already problems with digital redlining, which is where AI screening discriminates against minority groups. This could mean not getting approved for loans or apartment housing.

Note: Mira Murati is one of the few female CTOs (Chief Technology Officers) of a top AI company - that is, OpenAI. She grew up in Albania and immigrated to British Columbia when she was 16. She went on to get her bachelor's degree in engineering from the Thayer School of Engineering at Dartmouth. After this, she worked at companies like Zodiac Aerospace, Leap Motion, and Tesla. At OpenAI, she has been instrumental in advancing not only the AI technology but also the product road map and business model.

All these problems pose a dilemma. To improve a generative AI system, there needs to be wide-scale usage - this is how researchers can make meaningful improvements. On the other hand, this comes with considerable risks, as the technology can be misused. In the case of Microsoft, it does look like it was smart to have a private beta, which has been a way to deal with the obvious flaws. But this is not a silver bullet. There will be ongoing challenges once the technology is in general use.

For generative AI to be successful, there will need to be trust. But this could prove difficult, as there is evidence that people are skeptical of the technology. Consider a Monmouth University poll: only about 9% of the respondents said that AI would do more good than harm to society. By comparison, this was about 20% in 1987. A Pew Research Center survey also showed skepticism toward AI. Only about 15% of the respondents were optimistic. There was also consensus that AI should not be used for military drones. Yet a majority said that the technology would be appropriate for hazardous jobs like mining.

Note: Nick Bostrom is a Swedish philosopher at the University of Oxford and an author. He came up with the concept of the “paperclip maximizer,” a thought experiment about the perils of AI. You direct the AI to make more paper clips, and it does this well - or too well. The AI ultimately destroys the world because it is obsessed with turning everything into paper clips. Even when the humans try to turn it off, it is no use. The AI is too smart for that. All it wants to do is make paper clips!

Misuse

In January 2023, Oxford University researchers made a frightening presentation to the UK Parliament. The main takeaway was that AI posed a threat to the human race. The researchers noted that the technology could take control and allow for self-programming, because the AI will have acquired superhuman capabilities. According to Michael Osborne, who is a professor of machine learning at the University of Oxford: “I think the bleak scenario is realistic because AI is attempting to bottle what makes humans special, that has led to humans completely changing the face of the Earth. Artificial systems could become as good at outfoxing us geopolitically as they are in the simple environments of games.”

Granted, this sounds overly dramatic. But these are smart AI experts, and they have based their findings on well-thought-out evidence and trends. Still, this scenario is probably not something that will happen any time soon. In the meantime, there are other notable risks - where humans leverage AI for their own nefarious objectives. Joey Pritikin, who is the Chief Product Officer at Paravision, points out some of the potential threats:

• National security and democracy: With deepfakes becoming higher quality and undetectable to the human eye, anyone can use political deepfakes and generative AI to spread misinformation and threaten national security.
• Identity: Generative AI creates the possibility of account takeovers by using deepfakes to commit identity theft and fraud through presentation attacks.
• Privacy: Generative AI and deepfakes create a privacy threat for the individuals in generated images or deepfake videos, who are often put into fabricated situations without consent.

Another danger area is cybersecurity. When ChatGPT was launched, Darktrace noticed an uptick in phishing emails, which try to trick people into clicking a link that could steal information or install malware. It appears that hackers were using ChatGPT to write more human-sounding phishing emails. This was likely very helpful to overseas attackers with weaker English-language skills.

Something else: ChatGPT and code-generating systems like Copilot can be used to create malware. OpenAI and Microsoft have implemented safeguards - but these have limits. Hackers can use generative AI systems in ways that do not raise any concerns, for example, by having the system program only certain parts of the code.

On the other hand, generative AI can be leveraged as a way to combat digital threats. A survey from Accenture Security shows that this technology can be useful in summarizing threat data. Traditionally, this is a manual and time-intensive process, but generative AI can do it in little time - and allow cybersecurity experts to focus on more important matters. The technology can also be useful for incident response, which requires quick action. However, the future may be a matter of a hacker's AI fighting against a target's own AI.

Note: In 1951, Alan Turing said in a lecture: “It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. They would be able to converse with each other to sharpen their wits. At some stage therefore, we should have to expect the machines to take control.”

Regulation

Perhaps the best way to help curb the potential abuses of generative AI is regulation. But in the United States, there appears to be little appetite for this. Regulation usually requires a crisis, such as what happened during 2008 and 2009 when the mortgage market collapsed. In the meantime, some states have enacted legislation for privacy and data protection, but so far there have not been laws specifically for AI.

The fact is that the government moves slowly - and technology moves at a rapid pace. Even when there is a new regulation or law, it is often outdated or ineffectual. To fill the void, the tech industry has been pursuing self-regulation, led by large operators like Microsoft, Facebook, and Google. They understand that it's important to have certain guardrails in place; if not, there could be a backlash from the public.

However, one area that may actually see some governmental action is copyright law. It's unclear what the status is of the intellectual property that generative AI creates. Is it fair use of public content? Or is it essentially theft from a creator? It's far from clear, but court cases have already emerged. In January 2023, Getty Images filed a lawsuit against Stability AI, the developer of Stable Diffusion, claiming copyright violation of millions of images. Some of the images created by Stable Diffusion even had the watermark from Getty Images. The initial suit was filed in London, but there could be a legal action in the United States as well.

Note: The US federal government has been providing some guidance about the appropriate use of AI. This is part of the AI Bill of Rights. It recommends that AI should be transparent and explainable. There should also be data privacy and protections from algorithmic discrimination.

Regulation of AI is certainly a higher priority in the European Union. There is a proposal, published in early 2021, that uses a risk-based approach. That is, if there is a low likelihood of a problem with a certain type of AI, then there will be minimal or no regulations. But when it comes to more intrusive impacts - say, those that could lead to discrimination - the regulation will be much more forceful. Yet the creation of the standards has proven difficult, which has meant delays. The main point of contention has been the balance between the rights of the consumer and the importance of encouraging innovation.

Interestingly, one country has been swift in enacting AI regulation: China, among the first to do so. The focus of the law is to regulate deepfakes and misinformation, and the Cyberspace Administration will enforce it. The law will require that generative AI content be labeled and digitally watermarked.

New Approaches to AI

Even with the breakthroughs in generative AI - such as transformer and diffusion models - the basic architecture is still mostly the same as it has been for decades. It's essentially about encoder and decoder models. But the technology will ultimately need to go beyond these structures. According to Sam Altman, who is the cofounder and CEO of OpenAI:

“Oh, I feel bad saying this. I doubt we'll still be using the transformers in five years. I hope we're not. I hope we find something way better. But the transformers obviously have been remarkable. So I think it's important to always look for where I am going to find the next totally new paradigm. But I think that's the way to make predictions. Don't pay attention to the AI for everything. Can I see something working, and can I see how it predictably gets better? And then, of course, leave room open for - you can't plan the greatness - but sometimes the research breakthrough happens.”

Then what might we see? What are the potential trends for the next type of generative AI models? Granted, it's really impossible to answer these questions. There will be many surprises along the way. “On the subject of the future path of AI models, I have to exercise some academic modesty here - I have no clue what the next big development in AI will be,” said Daniel Wu, who is a Stanford AI researcher. “I don't think I could've predicted the rise of transformers before 'Attention Is All You Need' was published, and in some ways, predicting the future of scientific progress is harder than predicting the stock market.”

Despite this, there are areas researchers are working on that could lead to major breakthroughs. One is creating AI that has common sense. This is intuitive for people: we can make instant judgments that are often right. For example, if a stop sign has dirt on it, we can still see that it's a stop sign. But this may not be the case with AI. Solving the problem of common sense has been a struggle for many years. In 1984, Douglas Lenat launched a project, called Cyc, to create a database of rules of thumb for how the world works. The project is still continuing - and there is much to be done. Another interesting project is from the Allen Institute for Artificial Intelligence and the University of Washington. They have built a system called COMET, which is based on a large-scale dataset of 1.3 million common sense rules. While the model works fairly well, it is far from robust. The fact is that the real world has seemingly endless edge cases. For the most part, researchers will likely need to create more scalable systems to achieve human-level common sense abilities.

As for other important areas of research, there is transfer learning. Again, this is something that is natural for humans. For example, if we learn algebra, it becomes easier to understand calculus. People are able to leverage their core knowledge in other domains. But this is something AI has problems with. The technology is mostly fragmented and narrow: one system may be good at chat, whereas another could be better for image creation or understanding speech. For AI to get much more powerful, there will be a need for real transfer learning.

When it comes to building these next-generation models, there will likely need to be less reliance on existing datasets as well. Let's face it, there is a limited supply of publicly available text, and the same goes for images and video. To go beyond these constraints, researchers could perhaps use generative AI to create massive and unique datasets. The technology may also be able to improve itself, such as through fact-checking and fine-tuning.

AGI

AGI, or artificial general intelligence, is where AI reaches human levels of intelligence. Even though the technology has made considerable strides, it is still far from this point. Here's a tweet from Yann LeCun, who is the Chief AI Scientist at Meta: “Before we reach Human-Level AI (HLAI), we will have to reach Cat-Level & Dog-Level AI. We are nowhere near that. We are still missing something big. LLM's linguistic abilities notwithstanding. A house cat has way more common sense and understanding of the world than any LLM.”

As should be no surprise, there are many different opinions on this. Some top AI experts think that AGI could happen relatively soon, say within the next decade. Others are much more pessimistic. Rodney Brooks, who is the cofounder of iRobot, says it will not happen until the year 2300.

A major challenge with AGI is that intelligence remains something that is not well understood, and it is difficult to measure. Granted, there is the Turing test. Alan Turing set forth this concept in a paper he published in 1950 entitled “Computing Machinery and Intelligence.” He was a brilliant mathematician and actually developed the core concepts for modern computer systems. In his research paper, he said that it was impossible to define intelligence, but there was an indirect way to understand and measure it: a thought experiment he called the Imitation Game. The scenario is that there are three rooms; humans are in two of them, and the other one has a computer. A human judge has a conversation with both, and if they cannot tell the difference between the human and the computer, then the computer has reached human-level intelligence. Turing said that this would happen by the year 2000, but that proved way too optimistic. Even today, the test has not been cracked.

Note: Science fiction writer Philip K. Dick used the concept of the Turing test for his Voight-Kampff test, which determines whether someone is human or a replicant. It appears in his 1968 novel, Do Androids Dream of Electric Sheep?, which Hollywood turned into the 1982 movie Blade Runner.

While the Turing test is useful, there will need to be other measures. After all, intelligence is about more than conversation; it is also about interacting with our environment. Something as simple as making a cup of coffee can be exceedingly difficult for a machine to accomplish. And what about text-to-image systems like DALL-E or Stable Diffusion? How can that intelligence be measured? Researchers are working on various metrics, but considerable subjectivity remains.

Jobs

In 1928, British economist John Maynard Keynes wrote an essay called “Economic Possibilities for Our Grandchildren.” It was a projection of how automation and technology would impact the workforce by 2028. His conclusion: there would be a 15-hour workweek. In fact, he said even this work would not be necessary for most people because of the high standard of living. It's certainly a utopian vision, though Keynes did note some downsides. He wrote: “For the first time since his creation man will be faced with his real, his permanent problem—how to use his freedom from pressing economic cares, how to occupy the leisure, which science and compound interest will have won.”

As AI gets more powerful, it's certainly a good idea to think about such things. What might society look like? How will life change? Will it be better - or worse?

It's true that technology has disrupted many industries, which has led to widespread job losses. Yet new opportunities for employment have always emerged. After all, in 2023 the US unemployment rate was the lowest since the late 1960s. But there is no guarantee that the future will see a similar dynamic. AI could ultimately automate hundreds of millions of jobs - if not billions. Why not? In a capitalist system, owners will generally favor low-cost approaches, so long as there is not a material drop in quality. But with AI, there could be not only much lower costs but much better results.

In other words, as the workplace becomes increasingly automated, there will need to be a rethinking of the concept of “work.” This could be tough, since many people find fulfillment in their careers. The result could be more depression and even addiction. This has already been the case for communities that have been negatively impacted by globalization and major technology changes.

To deal with these problems, one idea is universal basic income, or UBI: providing a certain amount of income to everyone as a safety net. This could certainly help. But given the trend of income inequality, there may not be much appetite for a robust redistribution of wealth. There could also be resentment among the many people who feel marginalized by the impacts of AI.

This is not to say that the future is bleak. But it is still essential that we look at the potential consequences of sophisticated technology like generative AI.

Conclusion

Moore's Law has been at the core of the growth in technology for decades. It posits that, every two years or so, the number of transistors on an integrated circuit doubles. But the pace of growth appears to be much higher for AI: venture capitalists at Greylock Partners estimate that the doubling is occurring every three months. It seems inevitable that there will be a seismic impact on society. This is why it is critical to understand the technology and what it can mean for the future. But even more importantly, we need to be responsible with the powers of AI.
Tags: Artificial Intelligence, Book Summary, Technology

Tuesday, May 31, 2022

Alternate env.yml file for installing Python Package 'transformers' for BERT

ENV.YML FILE:

name: transformers
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pandas
  - openpyxl
  - ipykernel
  - jupyter
  - tensorflow
  - pip:
    - transformers

LOGS:

(base) C:\Users\Ashish Jain\OneDrive\Desktop\jupyter>conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.11.0
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda

Downloading and Extracting Packages
[... per-package download progress trimmed for readability; roughly 150 packages all completed at 100%, including python-3.9.13, tensorflow-2.6.0, pytorch-1.10.2, transformers-4.19.2, tokenizers-0.12.1, pandas-1.4.2, and jupyter-1.0.0 ...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: /
Enabling notebook extension jupyter-js-widgets/extension...
- Validating: ok
done
#
# To activate this environment, use
#
#     $ conda activate transformers
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) C:\Users\Ashish Jain\OneDrive\Desktop\jupyter>conda activate transformers

(transformers) C:\Users\Ashish Jain\OneDrive\Desktop\jupyter>python -m ipykernel install --user --name transformers
Installed kernelspec transformers in C:\Users\Ashish Jain\AppData\Roaming\jupyter\kernels\transformers
Tags: Technology, Artificial Intelligence, Machine Learning, Natural Language Processing, Python

Monday, May 30, 2022

Installing Python Package 'transformers' for BERT

TRIAL 1: Failure

Using Conda Prompt:

conda install -c huggingface transformers

Using YAML file:

name: transformers
channels:
  - conda-forge
dependencies:
  - pip
  - pip:
    - transformers

LOGS:

(base) C:\Users\ash\Desktop>conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 4.13.0

Please update conda by running

    $ conda update -n base -c defaults conda

Downloading and Extracting Packages
[... download progress trimmed; base packages such as python-3.10.4, pip-22.1.1, and setuptools-62.3.2 all completed at 100% ...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: \ Ran pip subprocess with arguments: ['C:\\Users\\ash\\Anaconda3\\envs\\transformers\\python.exe', '-m', 'pip', 'install', '-U', '-r', 'C:\\Users\\ash\\Desktop\\condaenv.xzuashl6.requirements.txt']
Pip subprocess output:
Collecting transformers
  Downloading transformers-4.19.2-py3-none-any.whl (4.2 MB)
[... pip download progress for tqdm, pyyaml, regex, requests, numpy, packaging, tokenizers, filelock, huggingface-hub, typing-extensions, pyparsing, colorama, certifi, charset-normalizer, urllib3, and idna trimmed ...]
Installing collected packages: tokenizers, urllib3, typing-extensions, regex, pyyaml, pyparsing, numpy, idna, filelock, colorama, charset-normalizer, certifi, tqdm, requests, packaging, huggingface-hub, transformers
Successfully installed certifi-2022.5.18.1 charset-normalizer-2.0.12 colorama-0.4.4 filelock-3.7.0 huggingface-hub-0.7.0 idna-3.3 numpy-1.22.4 packaging-21.3 pyparsing-3.0.9 pyyaml-6.0 regex-2022.4.24 requests-2.27.1 tokenizers-0.12.1 tqdm-4.64.0 transformers-4.19.2 typing-extensions-4.2.0 urllib3-1.26.9
done

(base) C:\Users\ash\Desktop>conda activate transformers

(transformers) C:\Users\ash\Desktop>pip install ipykernel jupyter

(transformers) C:\Users\ash\Desktop>python -m ipykernel install --user --name transformers
Installed kernelspec transformers in C:\Users\ash\AppData\Roaming\jupyter\kernels\transformers

TESTING IN PYTHON:

>>> import transformers as ppb
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

Trying to add TensorFlow through conda then failed, because the environment had defaulted to Python 3.10, for which no compatible conda TensorFlow build was available at the time:

(transformers) C:\Users\ash>conda install -c conda-forge tensorflow
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: - Found conflicts! Looking for incompatible packages. This can take several minutes. Press CTRL-C to abort. failed

UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:

Specifications:
  - tensorflow -> python[version='3.5.*|3.6.*|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|3.8.*|3.7.*|3.9.*']

Your python: python=3.10

If python is on the left-most side of the chain, that's the version you've asked for. When python appears to the right, that indicates that the thing on the left is somehow not available for the python version you are constrained to. Note that conda will not change your python version to a different minor version unless you explicitly specify that.

TRIAL 2: Success

$ conda env remove -n transformers --all

ENV.YML:

name: transformers
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pandas
  - pip:
    - transformers
    - tensorflow

ALTERNATIVE (NOT TRIED) ENV.YML FILE:

name: transformers
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pandas
  - openpyxl
  - ipykernel
  - jupyter
  - tensorflow
  - pip:
    - transformers

LOGS:

(base) C:\Users\ash\Desktop>conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 4.13.0

Please update conda by running

    $ conda update -n base -c defaults conda

Downloading and Extracting Packages
[... conda download progress trimmed; python-3.9.13, pandas-1.4.2, numpy-1.22.4, and setuptools-62.3.2 all completed at 100% ...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments: ['C:\\Users\\ash\\Anaconda3\\envs\\transformers\\python.exe', '-m', 'pip', 'install', '-U', '-r', 'C:\\Users\\ash\\Desktop\\condaenv.m0blf3oh.requirements.txt']
Pip subprocess output:
Collecting transformers
  Using cached transformers-4.19.2-py3-none-any.whl (4.2 MB)
Collecting tensorflow
  Downloading tensorflow-2.9.1-cp39-cp39-win_amd64.whl (444.0 MB)
[... pip download progress for the remaining dependencies (keras-2.9.0, tensorboard-2.9.0, grpcio-1.46.3, h5py-3.7.0, protobuf-3.19.4, termcolor-1.1.0, and others) trimmed ...]
Successfully installed absl-py-1.0.0 astunparse-1.6.3 cachetools-5.2.0 certifi-2022.5.18.1 charset-normalizer-2.0.12 colorama-0.4.4 filelock-3.7.0 flatbuffers-1.12 gast-0.4.0 google-auth-2.6.6 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.46.3 h5py-3.7.0 huggingface-hub-0.7.0 idna-3.3 importlib-metadata-4.11.4 keras-2.9.0 keras-preprocessing-1.1.2 libclang-14.0.1 markdown-3.3.7 oauthlib-3.2.0 opt-einsum-3.3.0 packaging-21.3 protobuf-3.19.4 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-3.0.9 pyyaml-6.0 regex-2022.4.24 requests-2.27.1 requests-oauthlib-1.3.1 rsa-4.8 tensorboard-2.9.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.9.1 tensorflow-estimator-2.9.0 tensorflow-io-gcs-filesystem-0.26.0 termcolor-1.1.0 tokenizers-0.12.1 tqdm-4.64.0 transformers-4.19.2 typing-extensions-4.2.0 urllib3-1.26.9 werkzeug-2.1.2 wrapt-1.14.1 zipp-3.8.0
done

(base) C:\Users\ash\Desktop>conda activate transformers

(transformers) C:\Users\ash\Desktop>conda install -c conda-forge jupyter ipykernel

(transformers) C:\Users\ash\Desktop>python -m ipykernel install --user --name transformers
Installed kernelspec transformers in C:\Users\ash\AppData\Roaming\jupyter\kernels\transformers

TESTING LOGS

import transformers as ppb
import warnings
warnings.filterwarnings('ignore')

print(ppb.__version__)
# 4.19.2

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

OUTPUT:

Downloading: 100% 226k/226k [00:01<00:00, 253kB/s]
Downloading: 100% 28.0/28.0 [00:00<00:00, 921B/s]
Downloading: 100% 570/570 [00:00<00:00, 14.5kB/s]
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [9], in <cell line: 2>()
      1 tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
----> 2 model = model_class.from_pretrained(pretrained_weights)

File ~\Anaconda3\envs\transformers\lib\site-packages\transformers\utils\import_utils.py:788, in DummyObject.__getattr__(cls, key)
    786 if key.startswith("_"):
    787     return super().__getattr__(cls, key)
--> 788 requires_backends(cls, cls._backends)

File ~\Anaconda3\envs\transformers\lib\site-packages\transformers\utils\import_utils.py:776, in requires_backends(obj, backends)
    774 failed = [msg.format(name) for available, msg in checks if not available()]
    775 if failed:
--> 776     raise ImportError("".join(failed))

ImportError: BertModel requires the PyTorch library but it was not found in your environment. Checkout the instructions on the installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.

FIX:

(transformers) C:\Users\ash>conda install -c pytorch pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 4.13.0

## Package Plan ##

  environment location: C:\Users\ash\Anaconda3\envs\transformers

  added / updated specs:
    - pytorch

The following NEW packages will be INSTALLED:

  blas               pkgs/main/win-64::blas-1.0-mkl
  cudatoolkit        pkgs/main/win-64::cudatoolkit-11.3.1-h59b6b97_2
  libuv              pkgs/main/win-64::libuv-1.40.0-he774522_0
  pytorch            pytorch/win-64::pytorch-1.11.0-py3.9_cuda11.3_cudnn8_0
  pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cuda
  typing_extensions  pkgs/main/noarch::typing_extensions-4.1.1-pyh06a4308_0

The following packages will be SUPERSEDED by a higher-priority channel:

  openssl            conda-forge::openssl-1.1.1o-h8ffe710_0 --> pkgs/main::openssl-1.1.1o-h2bbff1b_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
[... download progress trimmed; about 1.77 GB in total, mostly pytorch-1.11.0 (1.23 GB) and cudatoolkit-11.3.1 (545.3 MB) ...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
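With PyTorch in place, the model loads and can be exercised end to end. A minimal sanity check using standard transformers/PyTorch usage (the input sentence is arbitrary):

# Sanity check after installing PyTorch: encode one sentence with BERT.
import torch
import transformers as ppb

tokenizer = ppb.BertTokenizer.from_pretrained('bert-base-uncased')
model = ppb.BertModel.from_pretrained('bert-base-uncased')

# Tokenize and run one sentence through the model, without gradient tracking.
inputs = tokenizer("The faster Harry got to the store.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token: shape (1, sequence_length, 768).
print(outputs.last_hidden_state.shape)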
Tags: Technology,Artificial Intelligence,Machine Learning,Natural Language Processing,Python,

Thursday, May 12, 2022

Why Coding is Important

Today, technology touches every part of our lives, and coding jobs are the future: they already constitute more than 60% of all science, technology, engineering, and math jobs. Unfortunately, schools and traditional education systems are not equipped to provide the right coding education.

With the rapid development of technology, human civilization has moved from the Third Industrial Revolution into the era of the Fourth Industrial Revolution. The Fourth Industrial Revolution is marked by emerging technology breakthroughs in robotics, artificial intelligence, nanotechnology, quantum computing, biotechnology, the Internet of Things (IoT), 3D printing, and autonomous vehicles.

Facing such a future, “85% of jobs in 2030 haven’t been invented yet,” as estimated by Dell and the Institute for the Future (IFTF) in the report “Realizing 2030”. And McKinsey, the leading global management consulting firm, has predicted that robots could replace 800 million jobs by 2030. Coding can help humans cooperate with robots and control or understand an increasingly digital world. Fluency in a computer language may well be a necessity for the jobs of the future.

References:

# iftf.org

# mckinsey.com

Coding is a 21st-century skill that kids need to learn today, but unfortunately, there is still not much awareness about it.

Every company will be a software company in the future. There has never been a time when the impact of software was so visible all around us. Our daily lives have already been disrupted and arguably made better, but there is still a long way to go. As of 2019, 56% of the world population had internet access, a growth of over 1000% since the start of the 21st century. By 2025, 80% of the world population (6.5 billion people) will have internet access. The distribution of software will reach a scale so massive that we may not even be able to comprehend it today. Someone sitting in India right now could be building an app in their high school, and that app could be used by millions of users all over the world. That was futuristic a decade ago but sounds entirely real today.

"Now is a great time to be entering the coding world because technology will change more in the next 10 years than it has in the last 50." - Bill Gates

"Everyone should know how to program a computer, because it teaches you how to think." - Steve Jobs

"Coding is today's language of creativity. All our children deserve a chance to become creators instead of consumers of computer science." - Maria Klawe

"Children must be taught how to think, not what to think." - Margaret Mead

Margaret Mead was an American cultural anthropologist who featured frequently as an author and speaker in the mass media during the 1960s and 1970s. She earned her bachelor's degree at Barnard College of Columbia University and her M.A. and Ph.D. degrees from Columbia.

History

1. Vivek & Satyam started working on building Codingal in Jul 2020
2. Codingal was launched on Sep 1, 2020
3. Codingal received funding of USD 560,000 in Oct 2020
4. Codingal got accepted into the world's most prestigious startup program - Y Combinator, in Dec 2020
Tags: Technology,FOSS,Artificial Intelligence,

Saturday, January 8, 2022

Math With Words - A lesson in Natural Language Processing

Chatbot Recirculating (Recurrent) Pipeline

Bag of Words

Abbr.: BOW

from nltk.tokenize import TreebankWordTokenizer

sentence = """The faster Harry got to the store, the faster Harry, the faster, would get home."""

tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize(sentence.lower())
print(tokens)

['the', 'faster', 'harry', 'got', 'to', 'the', 'store', ',', 'the', 'faster', 'harry', ',', 'the', 'faster', ',', 'would', 'get', 'home', '.']

from collections import Counter

bag_of_words = Counter(tokens)
bag_of_words

Counter({'the': 4, 'faster': 3, 'harry': 2, 'got': 1, 'to': 1, 'store': 1, ',': 3, 'would': 1, 'get': 1, 'home': 1, '.': 1})

Term Frequency

TF = (# times the word appears in the text) / (# words in the text)

v = list(bag_of_words.values())
k = list(bag_of_words.keys())
l = len(k)  # number of *unique* tokens: 11

tf_dict = {}
for i, elem in enumerate(bag_of_words.keys()):
    # Note: this divides each count by the vocabulary size (11 unique tokens),
    # not by the total token count (19), so it differs slightly from the
    # definition above; the output below reflects this divisor.
    tf_dict[k[i]] = round(v[i] / l, 3)
tf_dict

OUT:
{'the': 0.364,
 'faster': 0.273,
 'harry': 0.182,
 'got': 0.091,
 'to': 0.091,
 'store': 0.091,
 ',': 0.273,
 'would': 0.091,
 'get': 0.091,
 'home': 0.091,
 '.': 0.091}

Cosine Similarity

In Python this would be:

a.dot(b) == np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

Solving this relationship for cos(theta), you can derive the cosine similarity:

cos(theta) = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
Or you can do it in pure Python without numpy, as in the following listing.

import math

def cosine_sim(vec1, vec2):
    """Convert the dictionaries to lists for easier matching.

    Note: this assumes vec1 and vec2 contain the same keys in the same
    order, as the TF dictionaries built above do.
    """
    vec1 = [val for val in vec1.values()]
    vec2 = [val for val in vec2.values()]

    dot_prod = 0
    for i, v in enumerate(vec1):
        dot_prod += v * vec2[i]

    mag_1 = math.sqrt(sum([x**2 for x in vec1]))
    mag_2 = math.sqrt(sum([x**2 for x in vec2]))

    return dot_prod / (mag_1 * mag_2)
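As a quick sanity check (a minimal usage sketch reusing the tf_dict computed above): a vector compared with itself points in exactly the same direction, so the similarity should come out as 1.0.

cosine_sim(tf_dict, tf_dict)  # 1.0 (up to floating-point rounding)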

Zipf's Law

Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table:

freq(word) = k / r

where k is a constant and r is the rank of the word.
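As a rough empirical check (a minimal sketch, assuming NLTK is installed and the Brown corpus has been downloaded with nltk.download('brown')), the product freq * rank should stay roughly constant across the top-ranked words:

from collections import Counter
from nltk.corpus import brown  # requires: nltk.download('brown')

# Count word frequencies over the whole corpus (lowercased, alphabetic tokens only)
counts = Counter(w.lower() for w in brown.words() if w.isalpha())

# If Zipf's law holds, freq * rank should hover around the same constant k
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>2}  {word:<6} freq={freq:>6}  freq*rank={freq * rank}")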

IDF (Inverse Document Frequency) and TF-IDF

For a given term, t, in a given document, d, in a corpus, D, you get:

tf(t, d) = (count of t in d) / (number of terms in d)

idf(t, D) = log( (number of documents in D) / (number of documents containing t) )

tfidf(t, d, D) = tf(t, d) * idf(t, D)
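A minimal sketch of these formulas in plain Python (the helper names and the toy corpus below are illustrative assumptions, not from the original lesson):

import math

def tf(t, d):
    """Term frequency: share of the document's tokens that are t."""
    return d.count(t) / len(d)

def idf(t, D):
    """Inverse document frequency over a corpus D of tokenized documents.

    Assumes t appears in at least one document of D.
    """
    n_containing = sum(1 for d in D if t in d)
    return math.log(len(D) / n_containing)

def tfidf(t, d, D):
    return tf(t, d) * idf(t, D)

corpus = [
    "the faster harry got to the store".split(),
    "harry is hairy and faster than jill".split(),
    "jill is not as hairy as harry".split(),
]
print(tfidf("harry", corpus[0], corpus))  # 0.0: 'harry' appears in every document
print(tfidf("store", corpus[0], corpus))  # > 0: 'store' is rare in the corpus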

Summary

# Any web-scale search engine with millisecond response times has the power of a TF-IDF term document matrix hidden under the hood.

# Term frequencies must be weighted by their inverse document frequency to ensure the most important, most meaningful words are given the heft they deserve.

# Zipf's law can help you predict the frequencies of all sorts of things, including words, characters, and people.

# The rows of a TF-IDF term document matrix can be used as a vector representation of the meanings of those individual words to create a vector space model of word semantics.

# Euclidean distance between pairs of high-dimensional vectors doesn't adequately represent their similarity for most NLP applications.

# Cosine distance, the amount of "overlap" between vectors, can be calculated efficiently by just multiplying the elements of normalized vectors together and summing up those products (see the sketch after this list).

# Cosine distance is the go-to similarity score for most natural language vector representations.
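To make that "multiply normalized elements and sum" claim concrete, here is a minimal numpy sketch (the vectors are arbitrary illustrative values):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 1.0])

# Normalize each vector to unit length
a_norm = a / np.linalg.norm(a)
b_norm = b / np.linalg.norm(b)

# Multiply elementwise and sum: this is the cosine similarity
print(np.sum(a_norm * b_norm))

# Same value via the dot-product formula from earlier
print(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))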

Practical

txt1 = """ A kite is traditionally a tethered heavier-than-air craft with wing surfaces that react against the air to create lift and drag. A kite consists of wings, tethers, and anchors. Kites often have a bridle to guide the face of the kite at the correct angle so the wind can lift it. A kite's wing also may be so designed so a bridle is not needed; when kiting a sailplane for launch, the tether meets the wing at a single point. A kite may have fixed or moving anchors. Untraditionally in technical kiting, a kite consists of tether-set-coupled wing sets; even in technical kiting, though, a wing in the system is still often called the kite. The lift that sustains the kite in flight is generated when air flows around the kite's surface, producing low pressure above and high pressure below the wings. The interaction with the wind also generates horizontal drag along the direction of the wind. The resultant force vector from the lift and drag force components is opposed by the tension of one or more of the lines or tethers to which the kite is attached. The anchor point of the kite line may be static or moving (such as the towing of a kite by a running person, boat, free-falling anchors as in paragliders and fugitive parakites or vehicle). The same principles of fluid flow apply in liquids and kites are also used under water. A hybrid tethered craft comprising both a lighter-than-air balloon as well as a kite lifting surface is called a kytoon. Kites have a long and varied history, and many different types are flown individually and at festivals worldwide. Kites may be flown for recreation, art or other practical uses. Sport kites can be flown in aerial ballet, sometimes as part of a competition. Power kites are multi-line steerable kites designed to generate large forces which can be used to power activities such as kite surfing, kite landboarding, kite fishing, kite buggying and a new trend snow kiting. Even Man-lifting kites have been made. """ txt2 = """ Kites were invented in China, where materials ideal for kite building were readily available: silk fabric for sail material; fine, high-tensile-strength silk for flying line; and resilient bamboo for a strong, lightweight framework. The kite has been claimed as the invention of the 5th-century BC Chinese philosophers Mozi (also Mo Di) and Lu Ban (also Gongshu Ban). By 549 AD paper kites were certainly being flown, as it was recorded that in that year a paper kite was used as a message for a rescue mission. Ancient and medieval Chinese sources describe kites being used for measuring distances, testing the wind, lifting men, signaling, and communication for military operations. The earliest known Chinese kites were flat (not bowed) and often rectangular. Later, tailless kites incorporated a stabilizing bowline. Kites were decorated with mythological motifs and legendary figures; some were fitted with strings and whistles to make musical sounds while flying. From China, kites were introduced to Cambodia, Thailand, India, Japan, Korea and the western world. After its introduction into India, the kite further evolved into the fighter kite, known as the patang in India, where thousands are flown every year on festivals such as Makar Sankranti. Kites were known throughout Polynesia, as far as New Zealand, with the assumption being that the knowledge diffused from China along with the people. Anthropomorphic kites made from cloth and wood were used in religious ceremonies to send prayers to the gods. 
Polynesian kite traditions are used by anthropologists get an idea of early "primitive" Asian traditions that are believed to have at one time existed in Asia. """ from sklearn.feature_extraction.text import TfidfVectorizer corpus = [txt1, txt2] vectorizer = TfidfVectorizer() X = vectorizer.fit_transform(corpus) print(X) OUTPUT: <2x300 sparse matrix of type '<class 'numpy.float64'>' with 333 stored elements in Compressed Sparse Row format> print(len(vectorizer.get_feature_names())) print(vectorizer.get_feature_names()[0:10]) OUTPUT: 300 ['549', '5th', 'above', 'activities', 'ad', 'aerial', 'after', 'against', 'air', 'along'] from sklearn.metrics.pairwise import cosine_similarity t1 = vectorizer.transform([txt1]) t2 = vectorizer.transform([txt2]) cosine_similarity(t1, t2) # array([[0.50239949]])

Finding meaning in word counts: Semantic analysis

The problem with TF-IDF and "Words that mean the same": TF-IDF vectors and lemmatization

TF-IDF vectors count the exact spellings of terms in a document. So, texts that restate the same meaning will have completely different TF-IDF vector representations if they spell things differently or use different words. This messes up search engines and document similarity comparisons that rely on counts of tokens.
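A small sketch of the problem (the sentences below are illustrative assumptions, not from the original lesson), assuming scikit-learn is installed: two sentences that mean roughly the same thing share almost no exact token spellings, so their TF-IDF similarity comes out low.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The car sped down the road.",
    "An automobile raced along the street.",  # same meaning, different words
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
print(cosine_similarity(X[0], X[1]))  # low score despite near-identical meaning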

POLYSEMY

This is the opposite of the problem discussed above (many words, one meaning): here we have one word with many meanings. Coming up with a numerical representation of the semantics (meaning) of words and sentences can be tricky. This is especially true for "fuzzy" languages like English, which has multiple dialects and many different interpretations of the same words. Even formal English text written by an English professor can't avoid the fact that most English words have multiple meanings, a challenge for any new learner, including machine learners. This concept of words with multiple meanings is called polysemy:

# Polysemy - The existence of words and phrases with more than one meaning
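To see polysemy concretely, here is a minimal sketch using WordNet via NLTK (assumes nltk is installed and the wordnet corpus has been downloaded): a single spelling like "bank" maps to many distinct senses.

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# Each synset is a distinct sense of the word "bank"
for synset in wn.synsets('bank')[:5]:
    print(synset.name(), '-', synset.definition())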

More like Polysemy

# Homonyms: Words with the same spelling and pronunciation, but different meanings

# Zeugma: Use of two meanings of a word simultaneously in the same sentence

# Homographs: Words spelled the same, but with different pronunciations and meanings

# Homophones: Words with the same pronunciation, but different spellings and meanings (an NLP challenge with voice interfaces)

Imagine if you had to deal with a statement like the following:

She felt... less. She felt tamped down. Dim. More faint. Feint. Feigned. Fain. (Patrick Rothfuss)

Beginning to solve the problem: Linear discriminant analysis (LDA)

LDA breaks a document down into only one topic. To get more topics, use LDiA instead: LDiA (Latent Dirichlet allocation) can break documents down into as many topics as you'd like.

1.1) It's one dimensional. You can just compute the centroid (average or mean) of all your TF-IDF vectors for each side of a binary class, like spam and non-spam.
1.2) Your dimension then becomes the line between those two centroids.
1.3) How far a TF-IDF vector lies along that line (the dot product of the TF-IDF vector with that line) tells you how close it is to one class or the other.

The LDA classifier is a supervised algorithm, so you do need labels for your document classes. But LDA requires far fewer samples than fancier algorithms. For this example, we show you a simplified implementation of LDA that you can't find in scikit-learn. The model "training" has only three steps (see the sketch after this list):

2.1) Compute the average position (centroid) of all the TF-IDF vectors within the class (such as spam SMS messages).
2.2) Compute the average position (centroid) of all the TF-IDF vectors not in the class (such as non-spam SMS messages).
2.3) Compute the vector difference between the centroids (the line that connects them).

All you need to "train" this LDA model is to find the vector (line) between the two centroids for your binary class. To do inference or prediction with the model, you just find out whether a new TF-IDF vector is closer to the in-class (spam) centroid than it is to the out-of-class (non-spam) centroid.
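Here is a minimal sketch of that three-step centroid classifier (the toy SMS data, the TF-IDF pipeline, and the midpoint threshold are illustrative assumptions, not the book's exact code):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy labeled SMS messages: 1 = spam, 0 = non-spam (illustrative only)
sms = [
    ("WINNER!! Claim your free prize now", 1),
    ("Urgent! You have won cash, call now", 1),
    ("Are we still meeting for lunch today?", 0),
    ("Can you send me the report tonight?", 0),
]
texts, labels = zip(*sms)
labels = np.array(labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()

# "Training": the three steps from above
centroid_spam = X[labels == 1].mean(axis=0)   # step 2.1: in-class centroid
centroid_ham = X[labels == 0].mean(axis=0)    # step 2.2: out-of-class centroid
line = centroid_spam - centroid_ham           # step 2.3: line between centroids

# Inference: project a new message onto the line between the centroids;
# projections past the midpoint are classified as spam
new_vec = vectorizer.transform(["Free cash prize, claim now"]).toarray()[0]
score = new_vec.dot(line)
midpoint = (centroid_spam + centroid_ham).dot(line) / 2
print("spam" if score > midpoint else "ham")  # expected: "spam" for this toy example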
Tags: Technology,Machine Learning,Natural Language Processing,Artificial Intelligence,