Les Perras . com

How to Make a Sitemap with google-sitemap_gen on Nginx

I wanted to make a sitemap for my site to submit to Google, but I didn’t want to pay for a sitemap generator, since my site is not commercial.

google-sitemap_gen is one of the first resources I found.

The Problem is…

The problem is that its instructions are written for Apache, but I am using the nginx server. A bit more searching led me to this page and Wing Tang Wong’s answer.

He Said

He said he made a URL list by spidering the site with wget and writing the output to wget-log, then used that list of URLs to make the sitemap.xml with google-sitemap_gen. I downloaded the sitemap generator package and unzipped it.

Next I modified the config file as he describes on the page linked above.

#<?xml version="1.0" encoding="UTF-8"?>


<url  href="http://YOURDOMAIN.com/stats?q=name"  />

<urllist  path="urllist.txt"  encoding="UTF-8"  />

<!-- Exclude URLs that end with a '~'   (IE: emacs backup files)      -->
<filter  action="drop"  type="wildcard"  pattern="*~"           />

<!-- Exclude URLs within UNIX-style hidden files or directories       -->
<filter  action="drop"  type="regexp"    pattern="/\.[^/]*"     />
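To get a feel for what those two drop filters would exclude, here is a minimal Python sketch. It assumes that a wildcard filter behaves like a case-sensitive shell glob against the whole URL and that a regexp filter matches anywhere in the URL; that is an assumption about how sitemap_gen applies them, not something taken from its source. The is_dropped helper and the example.com URLs are made up for illustration.

```python
import fnmatch
import re

# The two patterns from the config excerpt above.
WILDCARD_DROP = "*~"
REGEXP_DROP = re.compile(r"/\.[^/]*")


def is_dropped(url):
    """Return True if either drop rule would match this URL.

    Assumption: wildcard = case-sensitive glob on the full URL,
    regexp = search anywhere in the URL.
    """
    if fnmatch.fnmatchcase(url, WILDCARD_DROP):
        return True
    return REGEXP_DROP.search(url) is not None


# Hypothetical URLs, only for illustration.
for url in [
    "http://example.com/index.html",
    "http://example.com/notes.txt~",
    "http://example.com/.hidden/page.html",
]:
    print(url, "->", "drop" if is_dropped(url) else "keep")
```

Under those assumptions, only the ordinary page is kept; the backup file and the page inside the hidden directory are dropped.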

He spidered with
wget -mk --spider -r -l2 http://YOURDOMAIN.COM/

but I found it worked when I added -o wget-log, which tells wget to write its log to that file:

 wget -mk --spider -r -l2 http://YOURDOMAIN.COM/ -o wget-log

Then his next command to make the urllist worked nicely:

cat wget-log | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt
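If you would rather not use the shell pipeline, the same filtering can be sketched in Python. This is only a rough equivalent, not Wing Tang Wong's method: it splits the log on spaces, keeps tokens that start with http, drops anything containing a ? or ending in .jpg (case-insensitive), and removes duplicates. The function names are hypothetical, and the sort order can differ from sort -u, which depends on your locale.

```python
import re

# Same exclusion rule as the egrep step: any "?" or a trailing ".jpg".
SKIP = re.compile(r"[?]|[.]jpg$", re.IGNORECASE)


def extract_urls(log_text):
    """Return the sorted, de-duplicated URLs found in wget log text."""
    urls = set()
    for line in log_text.splitlines():
        # Mirror tr ' ' '\012': split on single spaces only.
        for token in line.split(" "):
            if token.startswith("http") and not SKIP.search(token):
                urls.add(token)
    return sorted(urls)


def write_urllist(log_path, out_path):
    """Read a wget log file and write one URL per line to out_path."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        urls = extract_urls(log_file.read())
    with open(out_path, "w", encoding="utf-8") as out_file:
        for url in urls:
            out_file.write(url + "\n")
    return len(urls)
```

Calling write_urllist("wget-log", "urllist.txt") from the directory that holds the log should produce the same kind of list as the pipeline above.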

No problems after ensuring that the wget-log file was written and in place. Then all that remained was to run google-sitemap_gen with the command:

python sitemap_gen.py --config=example_config.xml 

And presto! You have a nice sitemap. Just submit it to Google on their Webmaster Tools page.
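Before submitting, it may be worth a quick check that the generated file parses as XML and really lists your pages. The sketch below is one way to do that with Python's standard library. The sitemap_locations helper and the sample document are hypothetical, and the name of your real output file depends on your config, so sitemap.xml is only a placeholder.

```python
import xml.etree.ElementTree as ET


def sitemap_locations(xml_bytes):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_bytes)
    locations = []
    for element in root.iter():
        # Tags look like "{namespace}loc" when a namespace is declared.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locations.append(element.text.strip())
    return locations


# A made-up two-page sitemap, only to show the function working.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>"""

print(len(sitemap_locations(SAMPLE)), "URLs found")
```

For a real file, read it in binary mode, for example open("sitemap.xml", "rb").read(), and pass the result to sitemap_locations. A parse error or an empty list suggests something went wrong in an earlier step.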