ø Don’t waste your time reading this blog ø

Poor Man’s Google Scrape Technique

Filed under: Uncategorized — Tags: — taewoo @ 6:52 pm March 2, 2009

How to scrape links from Google search results using Notepad.

This requires Cygwin or some other shell that has grep and sed.

1) Search for whatever you want. Make sure you get 100 results per page (set this under “Advanced Search”).

2) Copy and paste the entire page:

  • Ctrl+A - selects all text
  • Ctrl+C - copies the text to the clipboard
  • Open Notepad (or your favorite text editor) and press Ctrl+V - pastes the text into the editor
  • Save the file (e.g. urls.txt).

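3) Run this command on the saved file. It keeps only the lines containing the “Cached” link, trims each one down to the bare URL (dropping everything after “ - ”, any query string, and leading whitespace), then sorts the URLs and removes duplicates:

 grep 'Cached' urls.txt | sed -e 's/ - .*//' -e 's/?.*//' -e 's/^[[:space:]]*//' | sort | uniq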


Voilà: a sorted list of unique result URLs.
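For reference, here is the step 3 pipeline written out as a commented shell function. This is a minimal sketch assuming a POSIX shell (such as the sh or bash that ships with Cygwin). The function name `extract_urls` and the sample lines in the demo are hypothetical: the sample layout (URL first, then “ - ”, then the page size and the Cached link) is inferred from what the sed expressions strip, not copied from a real results page.

```shell
#!/bin/sh
# Hypothetical helper wrapping the step-3 pipeline. It reads the pasted
# results page from the files given as arguments, or from stdin if none.
extract_urls() {
  # Keep only the lines carrying the "Cached" link; in the pasted text
  # these are assumed to begin with the result's URL.
  grep 'Cached' "$@" |
    # Cut everything from the first " - " (page size, "Cached", ...),
    # drop any query string, and strip leading whitespace.
    sed -e 's/ - .*//' -e 's/?.*//' -e 's/^[[:space:]]*//' |
    # Sort the URLs and drop duplicates (equivalent to "sort | uniq").
    sort -u
}

# Demo on made-up sample text in the layout the sed expressions expect.
printf '%s\n' \
  'Example Domain' \
  '      www.example.com/ - 2k - Cached - Similar pages' \
  'Some other result' \
  '      www.example.org/page?id=3 - 5k - Cached - Similar pages' \
  '      www.example.com/ - 2k - Cached - Similar pages' |
  extract_urls
# prints:
# www.example.com/
# www.example.org/page
```

On the saved file from step 2 you would run `extract_urls urls.txt`. `sort -u` is used only because it does the same job as `sort | uniq` in one command.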
