Showing posts with label Web Scraping. Show all posts

Tuesday, January 24, 2023

Resolving Selenium error for WebDriver and Chrome Browser version mismatch (Jan 2023)

Error for WebDriver and Chrome Browser version mismatch:

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 86
Current browser version is 85.0.4183.102 with binary path C:\Program Files (x86)\Google\Chrome\Application\chrome.exe 

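You can resolve this error by downloading a ChromeDriver build whose major version number matches the major version number of your Chrome browser. In the error above, the driver supports Chrome 86 while the installed browser is version 85.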

Go to: Chrome Driver Download Site

Here you can see links for the different versions of ChromeDriver, for example v108 and v110.
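Before downloading a driver, it can help to read the two version numbers straight out of the error text. The following is a minimal sketch, assuming the message keeps the wording shown above; the function name `parse_version_mismatch` is hypothetical and is not part of Selenium.

```python
import re


def parse_version_mismatch(message):
    """Return (supported_major, browser_major) from the error text, or None.

    Assumes the message wording quoted in this post; other ChromeDriver
    versions may phrase the error differently.
    """
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if not (supported and current):
        return None
    return int(supported.group(1)), int(current.group(1))


# The error message from this post, copied as text.
message = (
    "session not created: This version of ChromeDriver only supports Chrome version 86\n"
    r"Current browser version is 85.0.4183.102 with binary path "
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
)

versions = parse_version_mismatch(message)
if versions is not None:
    supported_major, browser_major = versions
    if supported_major != browser_major:
        print(f"Download ChromeDriver {browser_major} (installed driver targets {supported_major})")
```

The browser's major version (85 here) is the one to look for on the download site.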
Tags: Technology,Web Scraping,

Saturday, March 19, 2022

Ad-Serving Limit Applied Second Time (Mar 2022)

Refresh a Blogspot page 200 times in the browser and Google will apply the 'Ad Serving Limit' on your blog for a month.

Development IP

CMD>ipconfig

Windows IP Configuration

Ethernet adapter Ethernet:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix . :

Ethernet adapter VirtualBox Host-Only Network:
   Connection-specific DNS Suffix . :
   Link-local IPv6 Address . . . . . : fe80::f839:dc84:9a7b:3087%8
   IPv4 Address. . . . . . . . . . . : 192.168.56.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Wireless LAN adapter Local Area Connection* 3:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix . :

Wireless LAN adapter Local Area Connection* 4:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix . :

Wireless LAN adapter Wi-Fi:
   Connection-specific DNS Suffix . :
   IPv6 Address. . . . . . . . . . . : 2401:4900:47f0:b66:e86c:43b5:9a49:7a4a
   Temporary IPv6 Address. . . . . . : 2401:4900:47f0:b66:74f6:6d0b:4751:c8c4
   Link-local IPv6 Address . . . . . : fe80::e86c:43b5:9a49:7a4a%10
   IPv4 Address. . . . . . . . . . . : 192.168.1.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : fe80::ec47:62ff:fe2b:c17b%10
                                       192.168.1.1

Ethernet adapter Bluetooth Network Connection:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix . :

Client IP

(base) ashish@ashishdesktop:~$ ifconfig

ens33: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 00:e0:4c:3c:16:6b txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 369 bytes 36814 (36.8 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 369 bytes 36814 (36.8 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlx00e02d420fcb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.109 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 2401:4900:47f0:b66:c0de:5a07:95f6:8804 prefixlen 64 scopeid 0x0<global>
        inet6 fe80::1cdd:53e7:d13a:4f52 prefixlen 64 scopeid 0x20<link>
        inet6 2401:4900:47f0:b66:8021:a91b:4cac:da59 prefixlen 64 scopeid 0x0<global>
        ether 00:e0:2d:42:0f:cb txqueuelen 1000 (Ethernet)
        RX packets 6242 bytes 4410516 (4.4 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 4207 bytes 625464 (625.4 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Blogger Stats

2022-Mar-12

Previous 7 Days:
24 Hours:
Posts:

2022-Mar-13

24 Hours

7 Days

Alphabets (Read it out loud, letter by letter)

Ad-serving limit notification

Policy Center Page

Ad serving issues

Popular Posts

Tags: Technology,Web Development,Web Scraping,

Friday, September 3, 2021

Google Sites, Tor Exit Nodes and Captcha



Top 10 Websites

Rank - Website - Monthly Visitors - Country of Origin - Category
1 - Google.com - 92.5B - U.S. - Search Engines
2 - Youtube.com - 34.6B - U.S. - TV Movies and Streaming
3 - Facebook.com - 25.5B - U.S. - Social Networks and Online Communities
4 - Twitter.com - 6.6B - U.S. - Social Networks and Online Communities
5 - Wikipedia.org - 6.1B - U.S. - Dictionaries and Encyclopedias
6 - Instagram.com - 6.1B - U.S. - Social Networks and Online Communities
7 - Baidu.com - 5.6B - China - Search Engines
8 - Yahoo.com - 3.8B - U.S. - News and Media
9 - xvideos.com - 3.4B - Czech Republic - Adult
10 - pornhub.com - 3.3B - Canada - Adult
...
41 - Walmart.com - 718.6M - U.S. - Marketplace
42 - Bilibili.com - 686.0M - China - Animation and Comics
43 - Tiktok.com - 663.2M - China - Social Networks and Online Communities
44 - Paypal.com - 657.2M - U.S. - Financial Planning and Management
45 - Google.de - 624.5M - Germany - Search Engines
46 - Amazon.co.jp - 619.2M - Japan - Marketplace
47 - Aliexpress.com - 611.0M - China - Marketplace
48 - Amazon.de - 608.8M - Germany - Marketplace
49 - Rakuten.co.jp - 593.4M - Japan - Marketplace
50 - Amazon.co.uk - 579.7M - United Kingdom - Marketplace

Top 50 Websites

Google.com

YouTube

2021-Sep-02
2021-Aug-24

survival8.blogspot.com

Ref: visualcapitalist
Tags: Technology,Cyber Security,Web Scraping,

Monday, August 23, 2021

Multifactor Authentication protects you from scrapers and automation tools (Case study of LinkedIn)



1. LinkedIn Scraping and Login Issue with MFA
2. LinkedIn Scraping - Google Auth fails as Google knows that you are in an automated environment
Tags: Technology,Cyber Security,Web Scraping,

Sunday, August 8, 2021

Censorship in India, Torrenting and Tribler (Aug 2021)



1: Censorship in India (Part 1)

Your requested URL has been blocked as per the directions received from Department of Telecommunications, Government of India. Please contact administrator for more information.

Note: You can still open the link in the Tor Browser.

2: Censorship in India (Part 2)

Censorship of trackers (discovered using qBit)

3: Now comes Tribler. What's that?

4: Getting the Magnet link for a Torrent download via qBitTorrent

5: Paste the Magnet link from qBit to Tribler

Then wait for Metadata to load.

6: Trackers information in Tribler

7: Peer info in Tribler

8: Caveat: Performance Issue in Tribler
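A magnet link like the one copied from qBitTorrent into Tribler in steps 4 and 5 is a URI whose query string carries the torrent's info hash (`xt`), an optional display name (`dn`), and any tracker URLs (`tr`). The following is a minimal sketch of reading those fields with Python's standard library. The function name `parse_magnet` is hypothetical, and the info hash and tracker in the sample link are made-up placeholders.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(link):
    """Split a magnet link into its info hash, display name, and trackers."""
    parsed = urlparse(link)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)  # also decodes percent-encoding
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    return {
        "info_hash": xt[len(prefix):] if xt.startswith(prefix) else None,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Placeholder link for illustration only; not a real torrent.
sample = (
    "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    "&dn=Example+File"
    "&tr=udp%3A%2F%2Ftracker.example%3A80"
)
info = parse_magnet(sample)
print(info["info_hash"], info["name"], info["trackers"])
```

The tracker list extracted this way is what a client would try to contact, which is why blocked trackers show up as failures in the client's tracker view.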

Labels: Cyber Security, Indian Politics, Politics, Technology, Web Development, Web Scraping

Saturday, July 3, 2021

Before you continue to YouTube, Google uses cookies and data to...



The above screenshot was taken from Tor Browser on 20210704.

Before you continue to YouTube

Google uses cookies and data to:
1. Deliver and maintain services, like tracking outages and protecting against spam, fraud, and abuse
2. Measure audience engagement and site statistics to understand how our services are used

If you agree, we’ll also use cookies and data to:
1. Improve the quality of our services and develop new ones
2. Deliver and measure the effectiveness of ads
3. Show personalized content, depending on your settings
4. Show personalized or generic ads, depending on your settings, on Google and across the web

For non-personalized content and ads, what you see may be influenced by things like the content you’re currently viewing and your location (ad serving is based on general location). Personalized content and ads can be based on those things and your activity like Google searches and videos you watch on YouTube. Personalized content and ads include things like more relevant results and recommendations, a customized YouTube homepage, and ads that are tailored to your interests.

Click “Customize” to review options, including controls to reject the use of cookies for personalization and information about browser-level controls to reject some or all cookies for other uses. You can also visit g.co/privacytools anytime.

Labels: Technology,Cyber Security,Web Development,Web Scraping,

Wednesday, June 23, 2021

Tor Browser, Anonymity and Your IP Address



1 - What is my IP Address When I Use Tor Browser
URL: WhatIsMyIPAddress.com
2 - GitHub Security Logs for IP Address on Tor Browser
3 - What is my IP address as reported by Google Search
4 - IP Address Lookup for 'whatIsMyIPAddress.com'
5 - IP Address Lookup for IP Address shown by Google search
Tags: Technology,Cyber Security,Web Development,Web Scraping,GitHub,

This is a Tor Exit Router: 156.146.58.134

Most likely you are accessing this website because you had some issue with the traffic coming from this IP. This router is part of the Tor Anonymity Network, which is dedicated to providing privacy to people who need it most: average computer users. This router IP should be generating no other traffic, unless it has been compromised.

How Tor works

Tor sees use by many important segments of the population, including whistle blowers, journalists, Chinese dissidents skirting the Great Firewall and oppressive censorship, abuse victims, stalker targets, the US military, and law enforcement, just to name a few. While Tor is not designed for malicious computer users, it is true that they can use the network for malicious ends. In reality however, the actual amount of abuse is quite low. This is largely because criminals and hackers have significantly better access to privacy and anonymity than do the regular users whom they prey upon. Criminals can and do build, sell, and trade far larger and more powerful networks than Tor on a daily basis. Thus, in the mind of this operator, the social need for easily accessible censorship-resistant private, anonymous communication trumps the risk of unskilled bad actors, who are almost always more easily uncovered by traditional police work than by extensive monitoring and surveillance anyway.

In terms of applicable law, the best way to understand Tor is to consider it a network of routers operating as common carriers, much like the Internet backbone. However, unlike the Internet backbone routers, Tor routers explicitly do not contain identifiable routing information about the source of a packet, and no single Tor node can determine both the origin and destination of a given transmission.

As such, there is little the operator of this router can do to help you track the connection further. This router maintains no logs of any of the Tor traffic, so there is little that can be done to trace either legitimate or illegitimate traffic (or to filter one from the other). Attempts to seize this router will accomplish nothing.

Furthermore, this machine also serves as a carrier of email, which means that its contents are further protected under the ECPA. 18 USC 2707 explicitly allows for civil remedies ($1000/account plus legal fees) in the event of a seizure executed without good faith or probable cause (it should be clear at this point that traffic with an originating IP address of nyc-exit.privateinternetaccess.com should not constitute probable cause to seize the machine). Similar considerations exist for 1st amendment content on this machine.

If you are a representative of a company who feels that this router is being used to violate the DMCA, please be aware that this machine does not host or contain any illegal content. Also be aware that network infrastructure maintainers are not liable for the type of content that passes over their equipment, in accordance with DMCA "safe harbor" provisions. In other words, you will have just as much luck sending a takedown notice to the Internet backbone providers. Please consult EFF's prepared response for more information on this matter.

For more information, please consult the following documentation:

  1. Tor Overview
  2. Tor Abuse FAQ
  3. Tor Legal FAQ

That being said, if you still have a complaint about the router, you may email the maintainer. If complaints are related to a particular service that is being abused, I will consider removing that service from my exit policy, which would prevent my router from allowing that traffic to exit through it. I can only do this on an IP+destination port basis, however. Common P2P ports are already blocked.

You also have the option of blocking this IP address and others on the Tor network if you so desire. The Tor project provides a web service to fetch a list of all IP addresses of Tor exit nodes that allow exiting to a specified IP:port combination, and an official DNSRBL is also available to determine if a given IP address is actually a Tor exit server. Please be considerate when using these options. It would be unfortunate to deny all Tor users access to your site indefinitely simply because of a few bad apples.
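The exit-node list mentioned in the notice above can be checked programmatically. The following is a minimal sketch, assuming the list has already been downloaded as plain text with one IP address per line (an assumption about the list format, not something the notice specifies). The function name `is_tor_exit` and the sample list are hypothetical; only 156.146.58.134 comes from this post.

```python
import ipaddress


def is_tor_exit(ip, exit_list_text):
    """Return True if `ip` appears in a one-address-per-line exit list."""
    target = ipaddress.ip_address(ip)
    for line in exit_list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            if ipaddress.ip_address(line) == target:
                return True
        except ValueError:
            continue  # ignore lines that are not valid addresses
    return False


# Hypothetical excerpt of an exit list; 192.0.2.x are documentation addresses.
sample_list = """# sample exit list
192.0.2.10
156.146.58.134
192.0.2.77
"""

print(is_tor_exit("156.146.58.134", sample_list))
print(is_tor_exit("192.168.1.100", sample_list))
```

Comparing parsed `ipaddress` objects rather than raw strings avoids false negatives from formatting differences, such as alternative spellings of the same IPv6 address.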

Thursday, May 20, 2021

Censorship in India, Torrenting and Tor Browser



We were looking for the 1997 movie "Good Will Hunting" in ".mp4" format because my Sony Bravia Smart TV does not accept some other video file formats such as ".webm" (see the End Note for more about WebM). Here is how we solved this problem:

Note: The third link in the top 10 Google search results met our requirement.

I1 - Good Will Hunting - Google Search
I2 - Magnet link site (the second link in the Google search results). It did not work: it downloaded a garbage video file that did not run in VLC media player.
I3 - The garbage files being downloaded with 'second search result' in qBitTorrent
I4 - The video file fails to open in VLC media player. Important Note: Along with the video file, a Windows batch program (.bat file) was also downloaded. This is a strong indication that something is fishy about this magnet link and these files.
I5 - first search result and censorship in India - ytstvmovies.xyz - failed to open in Firefox with Airtel connection
I6 - first search result and censorship in India - ytstvmovies.xyz - failed to open in Chrome with Airtel connection
I7 - first search result and censorship in India - ytstvmovies.xyz - failed to open in Tor with Airtel connection
I8 - third search result and censorship in India - yifytorrentme.com - failed to open in Firefox with Airtel connection
I9 - third search result and censorship in India - yifytorrentme.com - failed to open in Chrome with Airtel connection
I10 - third search result and censorship overridden with Tor Browser - yifytorrentme.com - opened in Tor with Airtel connection
I11 - yifytorrentme.com - Found the movie using Tor. Note the ads on the site (do not click any)
I12 - yifytorrentme.com - Download begins by copy-pasting the Magnet link in qBit
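The warning sign noted in I4, a batch file arriving alongside a video, can be checked for automatically before opening anything. The following is a minimal sketch. The function name `suspicious_files`, the extension list, and the sample file names are illustrative assumptions and are not taken from the actual download.

```python
from pathlib import PurePath

# Illustrative (not exhaustive) set of script/executable extensions that
# have no business shipping alongside a movie file.
SUSPICIOUS_EXTENSIONS = {".bat", ".cmd", ".exe", ".scr", ".vbs", ".ps1"}


def suspicious_files(file_names):
    """Return the names whose extension is in SUSPICIOUS_EXTENSIONS."""
    return [
        name
        for name in file_names
        if PurePath(name).suffix.lower() in SUSPICIOUS_EXTENSIONS
    ]


# Hypothetical contents of a finished download.
downloaded = ["movie.mp4", "Play_Movie.BAT", "readme.txt"]
flagged = suspicious_files(downloaded)
if flagged:
    print("Do not run these files:", flagged)
```

A clean result does not prove a download is safe; the check only catches the obvious case described above.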

End Note

WebM is an audiovisual media file format. It is primarily intended to offer a royalty-free alternative to use in the HTML5 video and the HTML5 audio elements. It has a sister project WebP for images. The development of the format is sponsored by Google, and the corresponding software is distributed under a BSD license. The WebM container is based on a profile of Matroska. WebM initially supported VP8 video and Vorbis audio streams. In 2013, it was updated to accommodate VP9 video and Opus audio. [ Ref 1 ]

About WebM

WebM is an open, royalty-free, media file format designed for the web. WebM defines the file container structure, video and audio formats. WebM files consist of video streams compressed with the VP8 or VP9 video codecs and audio streams compressed with the Vorbis or Opus audio codecs. The WebM file structure is based on the Matroska container. [ Ref 2 ] [ Ref 3 - YouTube ]

Tags: Technology,Cyber Security,Indian Politics,Politics,Web Development,Web Scraping,