How to Scrape Mixergy for Audio Content
If you’re a nerd and care about what other nerds have to say (especially about tech companies), Mixergy is a good source of audio interviews. If you’re too lazy to click through every page hunting for the mp3 link and would rather download the “whole site”, here’s how:
1) get sitemap.xml
2) curl each page in the sitemap and search for “.mp3”
3) download the mp3
OR if you are even lazier than that, here’s the script:
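#!/bin/sh
# Start with an empty list of mp3 URLs.
> mixergy_files.txt
# Pull every <loc> URL out of the sitemap, stripping the XML tags and tabs.
for i in `curl -s http://mixergy.com/sitemap.xml | grep "<loc>" | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' -e "s/\t//g"`
do
echo "$i"
# Keep the href target of any line that links to an .mp3 file.
curl -s "$i" | grep "\.mp3" | sed 's/^.*<a href="//' | sed 's/".*$//' >> mixergy_files.txt
done
# Download every mp3 URL collected above.
for i in `cat mixergy_files.txt`
do
wget "$i"
done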
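
If you want to see what the grep and the two sed calls inside the loop actually do before pointing the script at the real site, here is a small offline sketch. The function name extract_mp3_links and the example.com sample line are made up for illustration; they are not part of the script above or of Mixergy’s markup.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the href target
# of every line that mentions ".mp3" (same pipeline as in the loop).
extract_mp3_links() {
  grep '\.mp3' | sed 's/^.*<a href="//' | sed 's/".*$//'
}

# Made-up sample line standing in for a real interview page.
sample='<p>Listen: <a href="http://example.com/audio/interview.mp3">Download</a></p>'

# Print the URL extracted from the sample line.
printf '%s\n' "$sample" | extract_mp3_links
```

One thing to keep in mind: the first sed is greedy, so if a single line contains several links it keeps only the last href on that line, whether or not that one is the mp3. If the list file ends up with odd entries, that is the first place to look.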
