How to Make a Sitemap with google-sitemap_gen on Nginx
Posted 4 years ago
I wanted to make a sitemap for my site to submit to Google, but I didn’t want to pay for a sitemap generator, since my site is not commercial.
google-sitemap_gen is one of the first resources I found.
The Problem is…
The author of the page I followed made a URL list by spidering his site with wget and writing the output to wget-log, then used that list of URLs to build sitemap.xml with google-sitemap_gen. I downloaded the sitemap generator package and unzipped it.
Next, I modified the config file as the linked page describes:
<?xml version="1.0" encoding="UTF-8"?>
<site
  base_url="http://YOURDOMAIN.com/"
  store_into="/var/www/sitemap_gen-1.4/sitemap.xml"
  verbose="1"
  >
  <url href="http://YOURDOMAIN.com/stats?q=name" />
  <url href="http://YOURDOMAIN.com/stats?q=age" lastmod="2004-11-14T01:00:00-07:00" changefreq="yearly" priority="0.3" />
  <urllist path="urllist.txt" encoding="UTF-8" />
  <!-- Exclude URLs that end with a '~' (IE: emacs backup files) -->
  <filter action="drop" type="wildcard" pattern="*~" />
  <!-- Exclude URLs within UNIX-style hidden files or directories -->
  <filter action="drop" type="regexp" pattern="/\.[^/]*" />
</site>
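If the generator complains about the config, it can help to confirm the file is well-formed XML before anything else. The following is a minimal sketch, assuming Python 3 and only the standard library; summarize_config is a hypothetical helper of mine, not part of sitemap_gen, and it only reads the attributes that appear in the config above.

```python
import xml.etree.ElementTree as ET


def summarize_config(xml_text):
    """Parse sitemap_gen-style config text and return its key settings.

    Hypothetical helper: raises ET.ParseError if the XML is not
    well-formed (e.g. an unquoted attribute or a missing </site>).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "site":
        raise ValueError("expected <site> as the root element, got <%s>" % root.tag)
    return {
        "base_url": root.get("base_url"),
        "store_into": root.get("store_into"),
        # Paths of every <urllist> input the config names.
        "urllists": [el.get("path") for el in root.findall("urllist")],
        # (action, type, pattern) for every <filter>, in file order.
        "filters": [
            (el.get("action"), el.get("type"), el.get("pattern"))
            for el in root.findall("filter")
        ],
    }
```

Reading the config file's text and passing it to summarize_config should either return the settings or fail with a parse error that points at the offending line and column.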
He spidered with
wget -mk --spider -r -l2 http://YOURDOMAIN.COM/
but I found it only worked once I added -o wget-log, which makes wget write its log to that file:
wget -mk --spider -r -l2 http://YOURDOMAIN.COM/ -o wget-log
Then his next command to make the URL list worked nicely:
cat wget-log | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt
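For anyone who finds that pipeline hard to read, here is roughly the same filtering as a small Python 3 sketch. The function names are my own (hypothetical), and one difference is worth noting: the result is sorted in plain code-point order, which may differ from what sort -u gives under your locale.

```python
import re


def extract_urls(log_text):
    """Return the sorted, unique URLs found in wget log text."""
    # Like tr ' ' '\012': break the log into one token per space or newline.
    tokens = re.split(r"[ \n]", log_text)
    urls = set()
    for token in tokens:
        # Like grep "^http": keep only tokens that start with "http".
        if not token.startswith("http"):
            continue
        # Like egrep -vi "[?]|[.]jpg$": drop URLs containing "?" and
        # URLs ending in ".jpg" (case-insensitive).
        if "?" in token or token.lower().endswith(".jpg"):
            continue
        urls.add(token)
    # Like sort -u: unique entries in sorted order.
    return sorted(urls)


def write_urllist(log_path="wget-log", out_path="urllist.txt"):
    """Read the wget log and write one URL per line to out_path."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        urls = extract_urls(log.read())
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(url + "\n" for url in urls)
    return urls
```

Calling write_urllist() in the directory that holds wget-log should leave a urllist.txt there, which is the file the config's urllist entry points at.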
No problems once I made sure the wget-log file was written and in place. Then all that remained was to run google-sitemap_gen with the command:
python sitemap_gen.py --config=example_config.xml
And presto! You have a nice sitemap. Just submit it to Google on their Webmaster Tools page.
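Before submitting, a quick check that the sitemap actually contains the URLs you expect can save a round trip. This is a minimal sketch, assuming Python 3 and an uncompressed sitemap.xml; sitemap_locations is a hypothetical helper, and it matches on the local tag name so it does not depend on which sitemap namespace the file declares.

```python
import xml.etree.ElementTree as ET


def sitemap_locations(xml_text):
    """Return the text of every <loc> element in sitemap XML, in file order."""
    root = ET.fromstring(xml_text)
    locations = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; keep only the part after "}".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locations.append(element.text.strip())
    return locations
```

Passing the contents of the generated file to sitemap_locations and comparing the count against the number of lines in urllist.txt gives a rough idea of whether the filters dropped more than intended.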