Tuesday, August 23, 2022

Compare two dictionaries in Python



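def compare_dict(x, y):
    """Compare dict x against dict y, reporting everything from x's side."""
    # Keys present in both dicts whose values are equal.
    shared_items = {k: x[k] for k in x if k in y and x[k] == y[k]}
    # Keys present in both dicts whose values differ (x's value is kept).
    differing_values = {k: x[k] for k in x if k in y and x[k] != y[k]}
    # Keys that exist only in x.
    differing_keys = [k for k in x if k not in y]

    return {
        "shared_items_in_x": shared_items,
        "differing_values_in_x": differing_values,
        "differing_keys_in_x": differing_keys
    }

def is_dict_in_list(d, l):
    """Return True if some dict in l contains every key/value pair of d."""
    for k in l:
        cd = compare_dict(d, k)
        if not cd['differing_values_in_x'] and not cd['differing_keys_in_x']:
            return True
    return False

def purify_list_of_dicts(inlist):
    """Flatten a list of lists of dicts and drop dicts already covered by an earlier one."""
    # Flatten one level: [[d1, d2], [d3]] -> [d1, d2, d3].
    nlist = [i for j in inlist for i in j]

    olist = []
    for d in nlist:
        if not is_dict_in_list(d, olist):
            olist.append(d)

    return olist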

Tags: Python,

Thursday, August 11, 2022

Using Sentiment to Detect Bots on Twitter : Are Humans more Opinionated than Bots (Dickerson, Jul 2022)

Download Research Paper

Abstract

In many Twitter applications, developers collect only a limited sample of tweets and a local portion of the Twitter network. Given such Twitter applications with limited data, how can we classify Twitter users as either bots or humans? We develop a collection of network-, linguistic-, and application oriented variables that could be used as possible features, and identify specific features that distinguish well between humans and bots. In particular, by analyzing a large dataset relating to the 2014 Indian election, we show that a number of sentiment related factors are key to the identification of bots, significantly increasing the Area under the ROC Curve (AUROC). The same method may be used for other applications as well.

A. Previous Work

There has been recent interest in the detection of malicious and/or fake users from both the online social networks and computer networking communities. For instance, Wang [4] looks at graph-based features to identify bots on Twitter, while Yang, Harkreader, and Gu [5] combine similar graph-based features with syntactic metrics to build their classifiers. Thomas et al. [6] use a similar set of features to provide a retrospective analysis of a large set of recently-suspended Twitter accounts. Boshmaf et al. [7] instead create bots (rather than detecting them), claiming that 80% of bots are undetectable and that Facebook’s Immune system [8] was unable to detect their bots. Lee, Caverlee, and Webb [9] create “honeypot” accounts to lure both humans and spammers into the open, then provide a statistical analysis of the malicious accounts they identified.

In computer networks research, the detection of Sybil accounts in computer networks has been applied to social network data; these techniques tend to rely on the “fast mixing” property of a network (which may not exist in social networks [10]) and do not scale to the size of present-day social networks (e.g., SybilInfer [3] runs in time O(|V|^2 log |V|), which is intractable for networks with millions of users).

[4] A. H. Wang, “Detecting spam bots in online social networking sites: A machine learning approach,” in Conference on Data and Applications Security and Privacy. ACM, 2010, pp. 335–342.
[5] C. Yang, R. C. Harkreader, and G. Gu, “Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers,” in Recent Advances in Intrusion Detection. Springer, 2011, pp. 318–337.
[6] K. Thomas, C. Grier, D. Song, and V. Paxson, “Suspended accounts in retrospect: An analysis of Twitter spam,” in Internet Measurement Conference (IMC). ACM, 2011, pp. 243–258.
[7] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “The socialbot network: When bots socialize for fame and money,” in Annual Computer Security Applications Conference (ACSAC). ACM, 2011, pp. 93–102.
[8] T. Stein, E. Chen, and K. Mangla, “Facebook immune system,” in Workshop on Social Network Systems (SNS). ACM, 2011.
[9] K. Lee, J. Caverlee, and S. Webb, “Uncovering social spammers: Social honeypots + machine learning,” in Annual ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010, pp. 435–442.
[10] A. Mohaisen, A. Yun, and Y. Kim, “Measuring the mixing time of social graphs,” in Internet Measurement Conference (IMC). ACM, 2010, pp. 383–389.

V. CONCLUSION

In many real-world applications, developers are only able to collect tweets from the Twitter API that directly address a set of topics of interest (TOI) relevant to the application. Moreover, in such applications, developers also typically only collect a local portion of the Twitter network. As a consequence, many traditional primarily network-based methods for detecting bots are less or not effective (e.g., if the topics are quite specific, not discussed by very popular people, or not retweeted much), since a sparse subset of the global network and tweet database based on a set TOI is insufficient. The SentiBot framework presented in this paper addresses the classification of users as human versus bot in such applications. In order to achieve this, SentiBot relies on four classes of variables (or features) related to tweet syntax, tweet semantics, user behavior, and network-centric user properties. In particular, we introduce a large set of sentiment variables, including combinations of sentiment and network variables— to our knowledge, this is the first time such sentiment-based features have been used in bot detection. In addition, we introduce variables related to topics of interest. We apply a suite of classical machine learning algorithms to identify: (i) users who are bots and (ii) TOI-independent features that are particularly important in distinguishing between bots and humans. Based on an analysis of over 7.7 million tweets and 550,000 users associated with the recently concluded 2014 Indian election (where there were reports of social media campaigns), we were able to show that the use of sentiment variables significantly improved the accuracy of our classification. In particular, the Area under the ROC Curve (AUROC) increased from 0.65 to 0.73. As an AUROC of 0.5 represents random guessing, this reflects 53% improvement in accuracy. 
In addition, we discovered that (in our dataset): 1) Bots flip-flop much less frequently than humans in terms of sentiment; 2) When humans express positive sentiment, they tend to express stronger positive sentiment than bots; 3) A similar (but slightly more nuanced) trend holds in terms of expression of negative sentiments by humans; and 4) Humans disagree more with the general sentiment of the application's Twitter population than bots. Our results can feed into many applications. For instance, when assessing which Twitter users are influential on a given topic, we must discount for bots—which requires methods like those presented in this paper to identify bots. When identifying the expected spread of a sentiment through Twitter, we again must discount for bots. The paper presents a general framework within which applications can identify bots using the relatively limited local data they have.
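
The 53% figure quoted in the conclusion can be reproduced if the improvement is measured relative to the chance level of 0.5 rather than from zero. That reading is an assumption on my part (the excerpt does not spell out the formula), and the function name below is only illustrative:

```python
def improvement_over_chance(auroc_before, auroc_after, chance=0.5):
    """Relative gain in AUROC, measured as distance above the chance level."""
    return (auroc_after - chance) / (auroc_before - chance) - 1.0


# AUROC values quoted in the conclusion: 0.65 without and 0.73 with sentiment features.
gain = improvement_over_chance(0.65, 0.73)
print(f"{gain:.0%}")  # prints 53%
```

Measured from zero instead, the same change would be only about 12% (0.73 / 0.65), so the baseline matters when reading the claim.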
Tags: Natural Language Processing

Fluoxetine (SSRI (Selective Serotonin Reuptake Inhibitor))

Fluoxetine, sold under the brand names Prozac and Sarafem, among others, is an antidepressant of the selective serotonin reuptake inhibitor class. It is used for the treatment of major depressive disorder, obsessive–compulsive disorder, bulimia nervosa, panic disorder, and premenstrual dysphoric disorder.

Fluoxetine Uses

Fluoxetine is used in the treatment of depression, panic disorder and obsessive-compulsive disorder.

How Fluoxetine works

Fluoxetine is a selective serotonin reuptake inhibitor (SSRI) antidepressant. It works by increasing the levels of serotonin, a chemical messenger in the brain. This improves mood and physical symptoms of depression and also relieves symptoms of panic and obsessive disorders.

Common side effects of Fluoxetine

Weakness, Insomnia (difficulty in sleeping), Nervousness, Anxiety, Blurred vision, Decreased libido, Fatigue, Frequent urge to urinate, Gastrointestinal disturbance, Headache, Palpitations, Prolonged QT interval

Composition

Fluoxetine: 20 mg Capsule
Tags: Medicine,

Wednesday, August 10, 2022

Classification of Twitter Accounts into Automated Agents and Human Users (Zafar Gilani, Jul 2022)

Download Research Paper

Abstract

Online social networks (OSNs) have seen a remarkable rise in the presence of surreptitious automated accounts. Massive human user-base and business-supportive operating model of social networks (such as Twitter) facilitates the creation of automated agents. In this paper we outline a systematic methodology and train a classifier to categorise Twitter accounts into ‘automated’ and ‘human’ users. To improve classification accuracy we employ a set of novel steps. First, we divide the dataset into four popularity bands to compensate for differences in types of accounts. Second, we create a large ground truth dataset using human annotations and extract relevant features from raw tweets. To judge accuracy of the procedure we calculate agreement among human annotators as well as with a bot detection research tool. We then apply a Random Forests classifier that achieves an accuracy close to human agreement. Finally, as a concluding step we perform tests to measure the efficacy of our results.

Index Terms

Social network analysis; account classification; automated agents; bot detection

Our work has the following contributions:

(i) Use of raw historical data (60 million tweets) for attribute collection and account classification (722,109 tweets) to cater for stealthier agents that are harder to discern from humans;
(ii) A Twitter dataset divided into user popularity bands, further partitioned into lists of agents and humans (for reasons refer to §IV) using a human annotation task. This serves as a large ground truth dataset;
(iii) 14 novel features from a total feature-set of 21 attributes (see §IV);
(iv) Performance evaluation of current state of the art in bot detection by calculating agreement between human annotators and BOTORNOT;
(v) Application of supervised learning approach – Random Forests classifier – for non-partisan account categorisation;
(vi) Identification of a distinct group of features (using ablation tests) that are most informative for classifying automated agents within each popularity band (cf. Table VIII); and
(vii) Hypotheses (cf. Table I) verification against our findings using t-tests (see §VI).

Infotainment

References

12: Datasets can be found here – https://goo.gl/SigsQB. Classifier is available as a part of Stweeler. (Note: the link is not accessible to the public.)
Tags: Natural Language Processing

Triclenz Shampoo (Sulphate Free Hair Cleanser by Curatio)

Tags: Medicine,

Monday, August 8, 2022

Accessing Twitter API From Two Systems. One With Firewall and Second Without Firewall

This note is less about accessing the Twitter API and more about cyber security: you run a curl command and, based on its output, try to figure out the firewall settings of the system.

System 1 Configuration With Strict Firewall Where Our Curl Command For Accessing Twitter API is Not Working:

(base) C:\Users\ash\Desktop>systeminfo
OS Name:                   Microsoft Windows 10 Enterprise
OS Version:                10.0.19042 N/A Build 19042

Processor(s):              1 Processor(s) Installed.
                              [01]: AMD64 Family 23 Model 24 Stepping 1 AuthenticAMD ~2100 Mhz
BIOS Version:              HP R79 Ver. 01.10.03, 3/24/2020

Network Card(s):           4 NIC(s) Installed.
                              [01]: Realtek RTL8822BE 802.11ac PCIe Adapter
                                    Connection Name: Wi-Fi
                                    DHCP Enabled:    Yes
                                    DHCP Server:     192.168.1.1
                                    IP address(es)
                                    [01]: 192.168.1.100
                                    [02]: fe80::b1b2:6d59:f669:1b96
                                    [03]: 2401:4900:47f1:b174:70f4:de28:6287:b1c9
                                    [04]: 2401:4900:47f1:b174:b1b2:6d59:f669:1b96
                              [02]: Realtek PCIe GbE Family Controller
                                    Connection Name: Ethernet
                                    Status:          Media disconnected
                              [03]: Bluetooth Device (Personal Area Network)
                                    Connection Name: Bluetooth Network Connection
                                    Status:          Media disconnected
                              [04]: Check Point Virtual Network Adapter For Endpoint VPN Client
                                    Connection Name: Ethernet 2
                                    DHCP Enabled:    Yes
                                    DHCP Server:     10.79.251.145
                                    IP address(es)
                                    [01]: 10.79.251.146
                                    [02]: fe80::3df2:2a4:b2e1:cb0
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                              Virtualization Enabled In Firmware: Yes
                              Second Level Address Translation: Yes
                              Data Execution Prevention Available: Yes   


System 2 Without Strict Firewall Where Curl Command is Working:


C:\Users\Ashish Jain>systeminfo

Host Name:                 LAPTOP-79RV456R
OS Name:                   Microsoft Windows 10 Home Single Language
OS Version:                10.0.19043 N/A Build 19043
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          Ashish Jain
Registered Organization:
Product ID:                00327-35105-52167-AAOEM
Original Install Date:     3/14/2021, 6:33:25 AM
System Boot Time:          7/14/2022, 5:34:13 PM
System Manufacturer:       LENOVO
System Model:              81H7
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                              [01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2000 Mhz
BIOS Version:              LENOVO 8QCN26WW(V1.14), 12/29/2020
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              00004009
Time Zone:                 (UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi
Total Physical Memory:     12,154 MB
Available Physical Memory: 7,634 MB
Virtual Memory: Max Size:  14,010 MB
Virtual Memory: Available: 8,057 MB
Virtual Memory: In Use:    5,953 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\LAPTOP-79RV456R
Hotfix(s):                 15 Hotfix(s) Installed.
                              [01]: KB5013887
                              [02]: KB4562830
                              [03]: KB4577586
                              [04]: KB4580325
                              [05]: KB4589212
                              [06]: KB5000736
                              [07]: KB5015807
                              [08]: KB5006753
                              [09]: KB5007273
                              [10]: KB5011352
                              [11]: KB5011651
                              [12]: KB5014032
                              [13]: KB5014035
                              [14]: KB5014671
                              [15]: KB5005699
Network Card(s):           4 NIC(s) Installed.
                              [01]: VirtualBox Host-Only Ethernet Adapter
                                    Connection Name: VirtualBox Host-Only Network
                                    DHCP Enabled:    No
                                    IP address(es)
                                    [01]: 192.168.56.1
                                    [02]: fe80::f839:dc84:9a7b:3087
                              [02]: Realtek 8821CE Wireless LAN 802.11ac PCI-E NIC
                                    Connection Name: Wi-Fi
                                    Status:          Media disconnected
                              [03]: Realtek PCIe FE Family Controller
                                    Connection Name: Ethernet
                                    Status:          Media disconnected
                              [04]: Bluetooth Device (Personal Area Network)
                                    Connection Name: Bluetooth Network Connection
                                    Status:          Media disconnected
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                              Virtualization Enabled In Firmware: Yes
                              Second Level Address Translation: Yes
                              Data Execution Prevention Available: Yes

C:\Users\Ashish Jain>

    
I was able to make a successful request from System 2:


(base) C:\Users\Ashish Jain>curl "https://api.twitter.com/2/users/by/username/vantagepoint21" -H "Authorization: Bearer A***V"

{"data":{"id":"96529689","name":"Ashish Jain","username":"vantagepoint21"}}

(base) C:\Users\Ashish Jain>curl "https://api.twitter.com/2/users/by/username/elonmusk" -H "Authorization: Bearer A***V"

{"data":{"id":"44196397","name":"Elon Musk","username":"elonmusk"}}      


The curl command does not work on System 1.

I think the network firewall settings on my office laptop are causing the issue, which is why I was not able to get a response from the Twitter API:

(base) C:\Users\ash\Desktop\twitter_api>curl "https://api.twitter.com/2/users/by/username/vantagepoint21" -H "Authorization: Bearer 9***2"

curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.

On further testing the "curl" command on 'System 1' for URLs with "http" and "https" protocols:

(base) C:\Users\ash\Desktop>curl www.survival8.blogspot.com
<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://survival8.blogspot.com/">here</A>.
</BODY>
</HTML>

Success for HTTP based URL

---

(base) C:\Users\ash\Desktop>curl https://survival8.blogspot.com
curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.

(base) C:\Users\ash\Desktop>curl https://survival8.blogspot.com/2022/08/lets-talk-about-whataboutery.html
curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.

Failure for HTTPS based URL.

---

Successful Testing With Another HTTP based URL:

(base) C:\Users\ash\Desktop>curl http://survival8.blogspot.com/2022/08/lets-talk-about-whataboutery.html
<!DOCTYPE html> <html class='v2' dir='ltr' lang='en'> <head> <link href='https://www.blogger.com/static/v1/widgets/2975350028-css_bundle_v2.css' rel='stylesheet' type='text/css'/> <meta content='width=1100' name='viewport'/> <meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/> <meta content='blogger' name='generator'/> <link href='http://survival8.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/> <link href='http://survival8.blogspot.com/2022/08/lets-talk-about-whataboutery.html' rel='canonical'/> <link rel="alternate" type="application/atom+xml" title="survival8 - Atom" href="http://survival8.blogspot.com/feeds/posts/default" /> <link rel="alternate" type="application/rss+xml" title="survival8 - RSS" href="http://survival8.blogspot.com/feeds/posts/default?alt=rss" /> <link rel="service.post" type="application/atom+xml" title="survival8 - Atom" href="https://draft.blogger.com/feeds/7823701911930369175/posts/default" /> <link rel="alternate" type="application/atom+xml" title="survival8 - Atom" href="http://survival8.blogspot.com/feeds/1169952638388485943/comments/default" /> <!--Can't find substitution for tag [blog.ieCssRetrofitLinks]--> <meta content='http://survival8.blogspot.com/2022/08/lets-talk-about-whataboutery.html' property='og:url'/> <meta content='Let’s talk about ‘Whataboutery’' property='og:title'/> <meta content=' what·about·ery [ˌwɒtəˈbaʊtəri] NOUN BRITISH the technique or practice of responding to an accusation or dif...' property='og:description'/> <title>survival8: Let’s talk about ‘Whataboutery’</title> <style id='page-skin-1' type='text/css'><!-- /* ----------------------------------------------- Blogger Template Style Name: Simple Designer: Blogger URL: www.blogger.com ----------------------------------------------- */ /* Content ----------------------------------------------- */ body { ...

Also note that if this were an authorization failure from the Twitter API, the output would still be an informative message in JSON format:

(base) C:\Users\Ashish Jain>curl "https://api.twitter.com/2/users/by/username/elonmusk" -H "Authorization: Bearer 9***INCORRECT_BEARER_TOKEN***2"
{
  "title": "Unauthorized",
  "type": "about:blank",
  "status": 401,
  "detail": "Unauthorized"
}
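
The distinction drawn above (a TLS-level failure versus an HTTP error returned by the API) can be sketched in Python with the standard library's urllib. This is a minimal sketch, not a reproduction of the curl behaviour: Python's ssl module uses OpenSSL rather than Windows schannel, so the same machine may not fail in the same way. The function name `probe` and its `opener` parameter are assumptions introduced for illustration.

```python
import ssl
import urllib.error
import urllib.request


def probe(url, token=None, opener=urllib.request.urlopen):
    """Classify a request outcome as ok, http_error, tls_failure or network_failure."""
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with opener(req, timeout=10) as resp:
            return ("ok", resp.status)
    except urllib.error.HTTPError as err:
        # The server was reached and answered (e.g. 401 Unauthorized),
        # so the network path and the TLS handshake both worked.
        return ("http_error", err.code)
    except urllib.error.URLError as err:
        # No HTTP response at all: the handshake or the connection itself failed.
        if isinstance(err.reason, ssl.SSLError):
            return ("tls_failure", str(err.reason))
        return ("network_failure", str(err.reason))
```

Calling `probe("https://api.twitter.com/2/users/by/username/elonmusk", token="...")` with a wrong token should come back as an `http_error`, whereas a blocked or intercepted TLS handshake would surface as `tls_failure` or `network_failure`.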

On a Side Note: Take a look at another error message from Twitter API:

(base) C:\Users\Ashish Jain>curl "https://api.twitter.com/2/users/by/username/elonmusj" -H "Authorization: Bearer A***V"
{
  "errors": [
    {
      "parameter": "username",
      "resource_id": "elonmusj",
      "value": "elonmusj",
      "detail": "User has been suspended: [elonmusj].",
      "title": "Forbidden",
      "resource_type": "user",
      "type": "https://api.twitter.com/2/problems/resource-not-found"
    }
  ]
}

Notice the typo in Elon Musk's user handle we provided in the query: elonmusj

Sunday, August 7, 2022

Diclogem Tablet (Diclofenac (50mg) + Paracetamol (325mg))

 
Diclogem Tablet

Prescription Required
Manufacturer: Omega Pharmaceuticals Pvt Ltd
SALT COMPOSITION: Diclofenac (50mg) + Paracetamol (325mg)
Storage: Store below 30°C

Product introduction

Diclogem Tablet is a pain-relieving medicine. It is used to reduce pain and inflammation in conditions like rheumatoid arthritis, ankylosing spondylitis, and osteoarthritis. It may also be used to relieve muscle pain, back pain, toothache, or pain in the ear and throat. Diclogem Tablet should be taken with food. This will prevent you from getting an upset stomach. You should take it regularly as advised by your doctor. Do not take more or use it for a longer duration than recommended by your doctor. Some of the common side effects of this medicine include nausea, vomiting, stomach pain, loss of appetite, heartburn, and diarrhea. If any of these side effects bother you or do not go away with time, you should let your doctor know. Your doctor may help you with ways to reduce or prevent the side effects. The medicine may not be suitable for everybody. Before taking it, let your doctor know if you have any problems with your heart, kidneys, liver, or have stomach ulcers. To make sure it is safe for you, let your doctor know about all the other medicines you are taking. Pregnant and breastfeeding mothers should first consult their doctors before using this medicine.

Uses of Diclogem Tablet

Pain relief

Benefits of Diclogem Tablet

In Pain relief

Diclogem Tablet is a combination of medicines that is used for short-term relief of pain, inflammation and swelling. It inhibits release of those chemical messengers in the brain that tell us that we have pain. It effectively relieves back pain, earache, throat pain, toothache and pain due to arthritis too. Take it as it is prescribed to get the most benefit. Do not take more or for longer than needed as that can be dangerous. In general, you should take the lowest dose that works, for the shortest possible time. This will help you to go about your daily activities more easily and have a better, more active, quality of life.

Side effects of Diclogem Tablet

Most side effects do not require any medical attention and disappear as your body adjusts to the medicine. Consult your doctor if they persist or if you’re worried about them.

Common side effects of Diclogem:
Nausea
Vomiting
Stomach pain/epigastric pain
Heartburn
Diarrhea
Loss of appetite

Fact Box

Habit Forming: No
Therapeutic Class: PAIN ANALGESICS
Tags: Medicine,

Saturday, August 6, 2022

Calcitas - D3 Soft Gelatin Capsule

 
Calcitas - D3 Soft Gelatin Capsule

Manufacturer: Intas Pharmaceuticals Ltd

Information about Calcitas - D3 Soft Gelatin Capsule

Calcitas D3 Capsule contains Cholecalciferol 60,000 IU (International Units). Cholecalciferol (Vitamin D3) is a fat-soluble vitamin that helps the body to absorb calcium and phosphorus found in food and supplements. Vitamin D is made by the body when skin is exposed to sunlight. Sunscreen, protective clothing, limited exposure to sunlight, dark skin, and age may prevent getting enough vitamin D from the sun, thus leading to Vitamin D3 deficiency. Thus, Vitamin D3 in Calcitas D3 Capsule is essential for calcium absorption in the body.

---

Cholecalciferol is a dietary supplement that is used to treat vitamin D deficiency. It is also used with calcium to maintain bone strength. This medicine is available both over-the-counter (OTC) and with your doctor's prescription.

---

Cholecalciferol, also known as vitamin D₃ and colecalciferol, is a type of vitamin D that is made by the skin when exposed to sunlight; it is found in some foods and can be taken as a dietary supplement. Cholecalciferol is made in the skin following UVB light exposure.

---

Other uses of Calcitas D3 Capsule are:
Building and keeping the bones & teeth strong
Reducing fatigue/stress and muscular pains
Boosting immunity and increasing resistance against infection
Supplement for patients with diabetic complications and cardiovascular diseases as well

Use under medical supervision.
Tags: Medicine,

Tuesday, August 2, 2022

Chatbot Examples in Use in Different Business Domains

The Apollo 11 Mission

Apollo 11 (July 16 - 24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil Armstrong and lunar module pilot Buzz Aldrin landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon's surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a quarter hours together exploring the site they had named Tranquility Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg) of lunar material to bring back to Earth as pilot Michael Collins flew the Command Module Columbia in lunar orbit, and were on the Moon's surface for 21 hours, 36 minutes before lifting off to rejoin Columbia. The data on the Apollo 11 rock samples was later made queryable through LUNAR, a system designed to let geologists ask their questions in natural language. The geologists would ask questions like "what is the average basalt content" and the system would respond.

Chatbots in Healthcare

Chatbots like Molly, Eva, Ginger, Replika, Florence, and Izzy are widely used in healthcare.

Chatbots for mental health support

Bots like Wysa and Woebot are designed to provide support the way a life coach would. They are good at asking the right probing questions, which helps users share their emotions and feelings after a hard day.

Chatbots for legal advice

Lawyers can use bots like DoNotPay, LISA, Ross, and BillyBot to accelerate their work and provide better client experiences.

Other Chatbot applications

In smart keyboards like SwiftKey, the software automatically completes your sentences by predicting the next word and corrects your spelling mistakes. Applications like Grammarly can automatically correct your spelling and grammar and assist you in writing better essays or emails.

Dated: 2022-Aug-02
Tags: Natural Language Processing,

Thursday, July 28, 2022

Natural Language Processing Questions and Answers (Set 4 of 7 Questions)

Course: INTRODUCTION TO NATURAL LANGUAGE PROCESSING

Q1: Multiple Choice Correct

Which of the following are potential use cases of NLP?

a) A self-driving car drawing your attention to an advertising billboard

b) Given the audio of a song, and its lyrics generate a translated song audio       

c) Understanding a cryptic language 

d) Determining the chances that you will win a lawsuit based on the outcomes of previous similar lawsuits.

Answer: All four are correct.

Q2: Multiple Choice Correct 

Which of the below tasks can be performed effectively even without using sophisticated NLP techniques:

a) Identifying the main topic of a document assuming that its title is not provided.

b) Detecting the language in a document 

c) Extracting the phone number, email address and year of graduation from a resume.

d) Substituting words like doesn't, can't, etc with does not, and can not, etc.

Answer: C and D 
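
Options (c) and (d) can be handled with plain regular expressions and a lookup table, which is why no sophisticated NLP is needed. The following is a minimal sketch; the patterns, the contraction table and the sample resume text are all illustrative assumptions and would need tuning for real resumes:

```python
import re

# Deliberately simple patterns; real-world phone and email formats vary more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

# Small illustrative table; "can not" follows the wording used in the question.
CONTRACTIONS = {"doesn't": "does not", "can't": "can not", "won't": "will not"}
CONTRACTION_RE = re.compile(
    "|".join(re.escape(c) for c in CONTRACTIONS), re.IGNORECASE
)


def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


resume = (
    "Graduated in 2015. Contact: jane.doe@example.com, +91 98765-43210. "
    "She doesn't smoke and can't relocate."
)

print(EMAIL_RE.search(resume).group(0))  # jane.doe@example.com
print(PHONE_RE.search(resume).group(0))  # +91 98765-43210
print(YEAR_RE.search(resume).group(0))   # 2015
print(expand_contractions("She doesn't smoke and can't relocate."))
```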

Q3: Spam email is a persistent problem that service providers have been trying to solve for years now. One of the key tasks in building an effective spam detection system is identifying the features of an email that could be used to classify the email as spam or not.

Rank the following features, which are based on the text content of an email, according to your understanding of each feature's importance.

a) Language (English, French, etc) used in the email text.

b) Presence of words with spelling mistakes / non standard form.

c) Emails addressed to you and contain your name.

Answer:
Correct order is: C > A > B 

Q4) Identify the kind of ambiguity in the given sentences:

a) Time flies like an arrow, fruit flies like a banana.

b) Iraqi head seeks arms.

c) A frog thought it saw a prince walk towards it. It thought it can't be true.

List of ambiguities for matching with the above sentences:

I) Anaphoric Ambiguity 

II) Semantic Ambiguity 

III) Syntactic Ambiguity.

Answer: 
A -> III 
B -> II  
C -> I 

Syntactic ambiguity
Take a look at the sentence given below
“Old men and women were taken to safe locations”
This sentence has a syntactic ambiguity where the scope of the adjective “old” needs to be resolved.
In this sentence, we may not know if the adjective applies only to men or to both men and women.

Semantic ambiguity
Semantic ambiguity refers to ambiguity in the meaning.
For example, the sentence
“Alice loves her mother and so does Jacob.”
The ambiguity here is, we may not know if Jacob loves his own mother or Alice’s mother.

Anaphoric Ambiguity 

In the below paragraph
“The horse ran up the hill. It was very steep. It soon got tired.”
In this paragraph, the pronoun ‘it’ is used to refer to the hill first and then to the horse. To interpret this sentence, we need to have knowledge of the world and context. These ambiguities are called anaphoric ambiguities.


Q5) Consider the below review for co-sleeper sheets for a baby. What is the sentiment in this review?
"The shipping was quick the colors are pretty but the sheets themselves are not soft."

a) positive
b) negative 
c) Neutral

Answer: Positive
The user is appreciating the shipping and the colors.

Q6) Do sentiment analysis of following sentence:

"The parking was great, the restaurant ambience was good. But the food was utterly terrible."

a) positive
b) negative 
c) Neutral

Answer:
Although the number of positive words is greater than the number of negative words in these sentences, the overall sentiment was negative.

Weighted Scores to Find The Polarity
The shortcoming of this dictionary-based, weighted-score approach to sentiment analysis is that it ignores the order of words and hence may classify the sentiment incorrectly.
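
To make the idea concrete, here is a minimal sketch of dictionary-based scoring applied to the sentence from Q6. The lexicon and its weights are made up for illustration and are not taken from the course material:

```python
import re

# Hypothetical sentiment lexicon: word -> weight (negative means negative sentiment).
LEXICON = {"great": 1.0, "good": 1.0, "terrible": -3.0}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_polarity(text):
    """Return (number of positive words, number of negative words)."""
    tokens = tokenize(text)
    positives = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    negatives = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    return positives, negatives


def weighted_score(text):
    """Sum the lexicon weights of all tokens; the sign gives the polarity."""
    return sum(LEXICON.get(t, 0.0) for t in tokenize(text))


review = ("The parking was great, the restaurant ambience was good. "
          "But the food was utterly terrible.")

print(count_polarity(review))   # (2, 1): more positive words than negative
print(weighted_score(review))   # -1.0: the weights still make it negative

# Word order is ignored, so the negation below has no effect on the score.
print(weighted_score("The food was not terrible."))  # -3.0
```

The last line shows the shortcoming described above: "not terrible" receives the same score as "terrible" because the scorer only sees an unordered bag of words.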

Q7) Assume that you have to build an NLP application that looks at a new document and estimates how similar it is to various text documents previously ingested. Consider that similarity of 2 documents is computed on the basis of presence of common words.

Based on your understanding of the NLP technique discussed so far, what are various basic pre-processing steps that you will include in this application while processing the historic data and making inferences on a new document?

Steps:

a. Remove any unwanted spaces, numbers, special characters, etc 
b. Convert all text into lower case.
c. Create n-grams based on the text.
d. Tokenize the text.
e. Normalize data using stemming and lemmatization techniques.
f. Determine the frequency of each word in each document and also in the whole corpus.
g. Remove stop words from the text.
h. Remove punctuation
i. Perform POS tagging on the text.

Options:

I. All the steps listed above need to be done.
II. a, b, d, f, g, h 
III. b, c, d, e, h, g 
IV. a, d, e, f, g

Answer: II 
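
The steps in option II (a, b, d, f, g, h) can be sketched in plain Python as below. This is only an illustration: the stop-word set is a tiny made-up list, the function names are hypothetical, and "similarity" is taken here to be the share of distinct words the two documents have in common (Jaccard overlap), which is one possible reading of "presence of common words".

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "and", "to", "in", "on"}


def preprocess(text):
    """Return word frequencies for one document."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # steps a and h: drop numbers, punctuation, special characters
    text = text.lower()                       # step b: lower case
    tokens = text.split()                     # step d: tokenize (split() also absorbs extra spaces)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step g: remove stop words
    return Counter(tokens)                    # step f: frequency of each word in the document


def corpus_frequency(documents):
    """Step f for the whole corpus: add up the per-document counts."""
    return sum((preprocess(doc) for doc in documents), Counter())


def common_word_similarity(doc_a, doc_b):
    """Share of distinct words that appear in both documents."""
    words_a, words_b = set(preprocess(doc_a)), set(preprocess(doc_b))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


print(preprocess("The cat sat on the mat, 2 times!"))
print(common_word_similarity("The cat sat on the mat.", "A cat is on the mat!"))
```

A new document would go through the same preprocess() call as the historic data, so both sides are compared on the same cleaned vocabulary.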
Tags: Natural Language Processing

20220728 - Monitoring Effects of 1 tablet of Trini Calm and 1 tablet of Petril Beta 10

Index of Journals
20220728

1910: 
1 Tablet of Trinicalm Plus
SALT COMPOSITION: Trifluoperazine (5mg) + Trihexyphenidyl (2mg) 

1 Tablet of Petril Beta 10 Tablet
SALT COMPOSITION: Clonazepam (0.25mg) + Propranolol (10mg)

Note: 
1. Trihexyphenidyl is also referred to as "THP" in medical prescriptions for psychiatric cases.
2. Clonazepam is also known as Clazzy in the underworld of drugs.

1914: Shiva Patel has just come for Math tuition.

1918:
My psychiatrist told me that Propranolol is used to slow down a racing heartbeat, which is an effect of facing a threatening situation.

2015: Finished teaching students.

2016: Having dinner.

2024: Going for shower.

2037: Am feeling sleepy and tired. Going for rest for an hour.

2040: Spoke to Anjali Devi's parents about NIOS (National Institute of Open Schooling) and readmitting her to study again.

2021: Going for rest.

8:52 pm: I cannot stop thinking how Rekha bua, Manju bua, and Kumkum bua are becoming a blocker in rental business.

8:54 pm: They do not understand that I purchased the flat after having a verbal fight with mom. Mom and I cannot live together.

9:32 pm: Self awareness was there but that panicky, irritated mood was not there.

2202: When I am in Mayur Vihar, I face harassment by uncle and aunt. And, when I am in Tri Nagar, I face harassment by three buas.

Tags: Medicine,Psychology,

Student Update (2022-Jul-28)

Index of Journals

Counting

Srishti Patel Class: Nursery Till: 8
Anjali Devi Class: 5 Till: 9

Tables

Sonam Patel Class: 7 Till: 12
Shiva Patel Class: 6C Till: 18

Addition

Sonam Patel Class: 7 Till Level: 4
Shiva Patel Class: 6C Till Level: 9

Subtraction

Sonam Patel Class: 7 Till Level: 8
Tags: Student Update,

Types of Ambiguities in Natural Language

Lexical ambiguity

Take a look at the following sentences:
John bagged two silver medals.
Mary made a silver speech.
Roger’s worries had silvered his hair.
The word silver is used as a noun, an adjective, and a verb. The word silver in isolation is mostly associated with the metal and considered as a noun. However, in other sentences, the context gives the word silver different meanings and also different parts of speech like adjectives and verbs. This ambiguity is called lexical ambiguity.

Syntactic ambiguity

Take a look at the sentence given below:
“Old men and women were taken to safe locations”
This sentence has a syntactic ambiguity where the scope of the adjective “old” needs to be resolved. In this sentence, we may not know if the adjective applies only to men or to both men and women.

Semantic ambiguity

Semantic ambiguity refers to ambiguity in the meaning. For example, the sentence:
“Alice loves her mother and so does Jacob.”
The ambiguity here is, we may not know if Jacob loves his own mother or Alice’s mother.

Anaphoric ambiguity

In the below paragraph:
“The horse ran up the hill. It was very steep. It soon got tired.”
In this paragraph, the pronoun ‘it’ is used to refer to the hill first and then to the horse. To interpret this sentence, we need to have knowledge of the world and context. These ambiguities are called anaphoric ambiguities.

Pragmatic Ambiguity

The hardest kind of ambiguity to resolve is the pragmatic ambiguity. This kind of ambiguity arises from the inability to process the intention or sentiment or world belief. For example, in the below conversation:
My wife said: "Please go to the store and buy a carton of milk and if they have eggs, get six."
I came back with 6 cartons of milk.
She said, "Why did you buy six cartons of milk?"
I replied, "They had eggs."
As you can see here, the ambiguity is in understanding the intention of the speaker.
Tags: Natural Language Processing,

Wednesday, July 27, 2022

Risperidone (Salt) from 1mg.com

Risperidone Uses

Risperidone is used in the treatment of schizophrenia and mania.

How Risperidone works

Risperidone is an atypical antipsychotic. It works by affecting the levels of chemical messengers (dopamine and serotonin) to improve mood, thoughts and behavior.

Common side effects of Risperidone

Insomnia (difficulty in sleeping), Parkinsonism, Sedation, Dizziness, Weight gain, Akathisia (inability to stay still), Anxiety, Gastrointestinal symptom, Increased prolactin level in blood.

EXPERT ADVICE FOR RISPERIDONE

1. Risperidone helps treat schizophrenia and mania.
2. It may cause less weight gain, sedation, and heart problems as compared to other similar medicines.
3. It may take 4-6 weeks to notice any medication effects. Keep taking it as prescribed.
4. Use caution while driving or doing anything that requires concentration as Risperidone can cause dizziness and sleepiness.
5. It may cause increase in weight, blood sugar, cholesterol, and fat. Eat healthy, exercise, and monitor your levels regularly.
6. Inform your doctor if you experience any abnormal movements or restlessness.
7. Inform your doctor if you have a history of heart diseases as Risperidone can increase your risk of irregular heartbeat.
8. Do not stop taking Risperidone without talking to your doctor first as it may cause worsening of symptoms.
Tags: Medicine,Psychology

Student Update (2022-Jul-27)

Index of Journals

Counting

Komal Kumari Class: 4
Trial 1 (Beginning of class): Till: 16
Trial 2 (After an hour): Till: 20
Srishti Patel Class: Nursery Till: 1

Tables

Kusum Kumari Class: 5 Till: 2

Addition

Kusum Kumari Class: 5 Level: 7

Subtraction

Kusum Kumari Class: 5 Level: 1
URL: https://survival8.blogspot.com/2022/01/add-subtract-multiply-divide.html
Tags: Student Update,

Tuesday, July 26, 2022

Detailed Solution to Upto Three Digit Subtraction

Note: We are going to subtract the smaller number from the bigger one.
Both numbers must be between 0 and 999.
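
As a rough sketch of how such a detailed, column-by-column solution could be produced, the snippet below subtracts the smaller number from the bigger one digit by digit and records each borrow. The function name and the wording of the steps are assumptions made for this illustration.

```python
def subtract_with_borrow(a, b):
    """Subtract the smaller of two numbers (0 to 999) from the bigger one,
    column by column, and return (difference, list of step descriptions)."""
    if not (0 <= a <= 999 and 0 <= b <= 999):
        raise ValueError("both numbers must be between 0 and 999")

    big, small = max(a, b), min(a, b)
    top = [int(c) for c in f"{big:03d}"]       # digits: hundreds, tens, ones
    bottom = [int(c) for c in f"{small:03d}"]
    places = {2: "ones", 1: "tens", 0: "hundreds"}

    result = [0, 0, 0]
    steps = []
    borrow = 0
    for i in (2, 1, 0):                        # work from the ones column leftwards
        t = top[i] - borrow                    # pay back a borrow taken by the previous column
        if t < bottom[i]:
            t += 10                            # borrow 10 from the next column
            borrow = 1
            note = "borrowed 10"
        else:
            borrow = 0
            note = "no borrow"
        result[i] = t - bottom[i]
        steps.append(f"{places[i]}: {t} - {bottom[i]} = {result[i]} ({note})")

    return int("".join(str(d) for d in result)), steps


difference, steps = subtract_with_borrow(502, 178)
for line in steps:
    print(line)
print(difference)  # 324
```

Because the bigger number is always placed on top, the hundreds column never needs to borrow.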


Tags: Mathematical Foundations for Data Science,

Monday, July 25, 2022

Student Update (2022-Jul-25)

Index of Journals

Counting

Komal Kumari Class: 4th Till: 16
Srishti Patel Class: Nursery Till: 10

Tables

Kusum Kumari Class: 5B Till: 3
Yash Kashyap Class: 5 Till: 8

Addition

Kusum Kumari Class: 5B Till Level: 4

Subtraction

Kusum Kumari Class: 5B Till Level: 1
Yash Kashyap Class: 5 Till Level: 2
Tags: Student Update,

Star Coat (Skin and Coat Tonic for Dogs)

For 3-4 Months Old Canine.
Tags: Medicine for dogs,

SkyCal (Pet Liquid for Stronger Bones)

For 3-4 Months old Canine.

Tags: Medicine for dogs,

Sunday, July 24, 2022

Converting image to text, saving to disk, reading text from disk and displaying image


A brief introduction of 'base64' functions 'b64encode' and 'b64decode':

(base) C:\Users\Ashish Jain>python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from base64 import b64encode as b, b64decode as d
>>> s = 'hello'
>>> b(bytes(s, 'utf-8'))
b'aGVsbG8='
>>> bs = b(bytes(s, 'utf-8'))
>>> d(bs)
b'hello'
>>> d(b'aGVsbG8=')
b'hello'
>>> d(bs).decode("utf-8") 
'hello'

Now with image:

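from base64 import b64decode, b64encode

# Read the raw image bytes and Base64-encode them.
with open('test_image.png', 'rb') as image_handle:
  raw_image_data = image_handle.read()
encoded_data = b64encode(raw_image_data)

# Save the encoded bytes to a text file.
with open('i.txt', 'wb') as f:
  f.write(encoded_data)

# Read the encoded bytes back from disk.
with open('i.txt', 'rb') as f:
  b = f.read()

print(type(b))            # <class 'bytes'>
print(encoded_data == b)  # True

# Decode the bytes and write the image back out.
with open('i.png', 'wb') as f:
  f.write(b64decode(b))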

If you have a text file and it has data such as this: b'iVB...ggg=='
That means the str() function was called on 'bytes' type data and the resulting string (including the b'...' wrapper) was saved.

If you have a text file that has data such as this: iVB...ggg==
Then you can open this file in binary mode, as in ">>> with open('img.txt', 'rb') as f:", and f.read() gives you 'bytes' type data.
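
For the first case, one possible way to get back to usable 'bytes' data is to let ast.literal_eval parse the saved b'...' text. The helper below is a small sketch and its name is hypothetical. It accepts either form of file content and returns the Base64 bytes, which can then be passed to b64decode.

```python
import ast
from base64 import b64decode, b64encode


def load_base64_text(text):
    """Return Base64 bytes from text that is either plain Base64
    or the str() of a bytes object (for example "b'aGVsbG8='")."""
    text = text.strip()
    if text.startswith(("b'", 'b"')):
        # The file holds the repr of a bytes object; parse it back safely.
        return ast.literal_eval(text)
    # The file holds plain Base64 characters, which are all ASCII.
    return text.encode("ascii")


saved_wrongly = str(b64encode(b"hello"))       # "b'aGVsbG8='"
saved_correctly = b64encode(b"hello").decode("ascii")  # "aGVsbG8="

print(b64decode(load_base64_text(saved_wrongly)))    # b'hello'
print(b64decode(load_base64_text(saved_correctly)))  # b'hello'
```

ast.literal_eval only evaluates Python literals, so it is a safer choice here than eval().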
Tags: Technology,Python,