Tuesday, October 13, 2020

Extracting Information From Search Engines


1. Google A Google Search URL looks like: https://www.google.com/search?q=nifty+50&start=0 In "nifty+50", "+" indicates [SPACE]. Pagination with 10 results per page, can be queried using 'start' parameter starting from 0. For first 10, start = 0 For second 10, start = 10 Notes: Google throws in a Captcha on sensing access through code or robots: Error: About this page Prove that you are not a robot by solving this Captcha... Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen? IP address: 34.93.163.15 Time: 2020-10-09T09:23:07Z URL: https://www.google.com/search?q=VERB+RIK&start=0 Ref 1: Guide to the Google Search Parameters Ref 2: RapidAPI Google Search (Free usage allows 300 requests per month. '300 requests per month' means you better look for other free alternative.) 2. Bing A Bing Search URL looks like: https://www.bing.com/search?q=nifty+50&first=21 In "nifty+50", "+" indicates [SPACE]. Pagination with 10 results per page, can be queried using 'first' parameter starting from 1. For first 10, first = 1 For second 10, first = 11 Notes: 2.1. The search engine ranks home pages, not blogs. 2.2. Forums are often ranked low in the search results. 2.3. Sends back empty pages with no results on sensing access through code or robots. 2.4. RapidAPI offers an API for Bing Search but it has '1000 / month quota Hard Limit' on free usage and this means you better look for other free alternatives of RapidAPI. 3. Yahoo By Oct 2020, Yahoo has dropped to the third spot in terms of market share. Its web portal is still popular, and it is said to be the eleventh most visited site according to Alexa. Notes: 3.1. Unclear labeling of ads make it hard to distinguish between organic and non-organic results 3.2. Boasts other services such as Yahoo Finance, Yahoo mail, Yahoo answers, and several mobile apps. 3.3. Search URL for Yahoo can formed using its "p" argument as in this URL: https://in.search.yahoo.com/search?p=nifty50 3.4. RapidAPI is not available for Yahoo. 4. Contextual Web Search Contextual Web says on its Home-page: Search APIs & Custom Search: Easy to use search APIs. Search through billions of webpages with a simple API call and without breaking the bank. Get started in minutes. [ Ref: contextualweb.io ] 'Contextual Web Search' has URL: usearch The 'Contextual Web Search' are available via RapidAPI site. It is a paid service with a 'free and limited' version allowing '500 / day quota Hard Limit (After which usage will be restricted)'. Code for RapidAPI: import requests from time import time import datetime import os import json url = "https://contextualwebsearch-websearch-v1.p.rapidapi.com/api/Search/WebSearchAPI" querystring = { "pageNumber" : "1", "q" : "nifty50", "autoCorrect" : "false", "pageSize" : "100" } headers = { 'x-rapidapi-host': "contextualwebsearch-websearch-v1.p.rapidapi.com", 'x-rapidapi-key': "..." } response = requests.request("GET", url, headers=headers, params=querystring) with open(os.path.join("output", querystring['q'] + "_" + str(datetime.datetime.now()).replace(":", "_") + ".json"), mode = 'w', encoding = 'utf-8') as f: f.write(json.dumps(response.json(), indent=2, sort_keys=True)) The above code produces output in the form of a JSON file in a folder alongside this script named "output" (this should be created before running the script). References % RapidAPI's All API(s)

No comments:

Post a Comment