Archive for the ‘Open Sitemap Generator’ Category.

Open Sitemap Generator is a db4o Community Project


We’re glad to announce that Open Sitemap Generator is now a db4o Community Project.
You can access its Project Space here.
Thanks to German for his support.

Mike

Open Sitemap Generator 0.6 released


Version 0.6 is out!
In this new version we’re using the great db4o technology to store all the retrieved URLs.
At the moment it’s only a temporary database that is deleted when the crawl ends, but it will let us add sitemap management in a future version.
The main new features are: splitting big sitemaps and using an index file, an option to ignore HTML comments, and an installer version (the software is still fully portable).
All the news about this version and a download link are available at the OSG home page.
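
To give an idea of what splitting a big sitemap involves, here is a minimal sketch, not the actual OSG code. It assumes the limit of 50,000 URLs per sitemap file set by the sitemaps.org protocol; the class and method names (SitemapSplitter, Split) are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SitemapSplitter
{
    // Limit from the sitemaps.org protocol: at most 50,000 URLs per sitemap file
    public const int MaxUrlsPerSitemap = 50000;

    // Splits a list of URLs into consecutive chunks of at most chunkSize entries.
    // Each chunk would become one sitemap file listed in the index file.
    public static List<List<string>> Split(IList<string> urls, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException("chunkSize");
        }

        List<List<string>> chunks = new List<List<string>>();
        for (int start = 0; start < urls.Count; start += chunkSize)
        {
            int count = Math.Min(chunkSize, urls.Count - start);
            List<string> chunk = new List<string>(count);
            for (int i = start; i < start + count; i++)
            {
                chunk.Add(urls[i]);
            }
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

Calling SitemapSplitter.Split(urls, SitemapSplitter.MaxUrlsPerSitemap) would give one list per sitemap file; writing the files and the index is left out of the sketch.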

Mike

Sitemaps.org updated


The sitemaps site had not been updated since November 2006, but it was updated today, as announced on the Official Google Webmaster Central Blog.
The site is now available in 18 languages, and the protocol has been updated to let webmasters specify the location of the sitemap in the robots.txt file!
Also, Ask.com now supports the sitemaps protocol.
There is nothing new at the moment in the development of our Open Sitemap Generator, but we look forward to more news in the near future (and maybe our inclusion in the Google sitemaps third-party tools page).
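
As an illustration of the robots.txt addition mentioned above, the directive is a single line giving the full URL of the sitemap. The address below is only a placeholder, not a real sitemap.

```text
Sitemap: http://www.example.com/sitemap.xml
```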

Mike

Open Sitemap Generator 0.5.1 released

We’ve released a bugfix version, 0.5.1.
We’re planning a big upgrade to the Mapper core in the next major release, using db4o, with the aim of automatically generating multiple sitemaps with an index.

You can download this version from here.

Mike

Escape a string for XML use in C#

If you need to escape a string for use in an XML file (or stream), you have to escape these characters:

Character           Escape Code
Ampersand (&)       &amp;
Single Quote (')    &apos;
Double Quote (")    &quot;
Greater Than (>)    &gt;
Less Than (<)       &lt;

To do this you could use the .NET method SecurityElement.Escape(string str), but it has a problem.
If your string contains entities that are already escaped, it escapes them again.
This happened to us while testing our sitemap generator, when it found URLs on a page that were already escaped.
So we’ve developed this function, which tests every & character before escaping it.

public string EscapeXmlString(string URL)
{
    // Entities that may already be present in the string;
    // their ampersand must not be escaped a second time
    string[] entities = { "&amp;", "&apos;", "&quot;", "&gt;", "&lt;" };

    // Test every & character, including one at the very end of the string
    for (int i = 0; i < URL.Length; i++)
    {
        if (URL[i] != '&')
        {
            continue;
        }

        // Avoid errors if the string is already escaped for xml use
        bool alreadyEscaped = false;
        foreach (string entity in entities)
        {
            if ((i + entity.Length <= URL.Length) && (URL.Substring(i, entity.Length) == entity))
            {
                alreadyEscaped = true;
                break;
            }
        }

        if (!alreadyEscaped)
        {
            //Escape it: "&" becomes "&amp;"
            URL = URL.Insert(i + 1, "amp;");
        }
    }

    // The remaining characters can be replaced directly
    URL = URL.Replace("'", "&apos;");
    URL = URL.Replace("\"", "&quot;");
    URL = URL.Replace(">", "&gt;");
    URL = URL.Replace("<", "&lt;");

    return URL;
}
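
The double escaping described above is easy to reproduce. The sketch below calls SecurityElement.Escape on a URL that already contains an escaped ampersand, and compares it with a regex-based variant of the same idea. That variant (EscapeOnce, in a class we call EscapeDemo) is a hypothetical alternative written for this example, not the function above, and the sample URL is made up.

```csharp
using System;
using System.Security;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical regex-based variant: escapes only the ampersands that do
    // not already start one of the five predefined XML entities.
    public static string EscapeOnce(string text)
    {
        string result = Regex.Replace(text, "&(?!(?:amp|apos|quot|gt|lt);)", "&amp;");
        return result.Replace("'", "&apos;")
                     .Replace("\"", "&quot;")
                     .Replace(">", "&gt;")
                     .Replace("<", "&lt;");
    }

    public static void Main()
    {
        // Made-up URL: the first ampersand is already escaped, the second is not
        string url = "page.aspx?a=1&amp;b=2&c=3";

        // SecurityElement.Escape escapes every ampersand, even the one in "&amp;"
        Console.WriteLine(SecurityElement.Escape(url));
        // prints: page.aspx?a=1&amp;amp;b=2&amp;c=3

        // The variant leaves "&amp;" alone and escapes only the bare ampersand
        Console.WriteLine(EscapeOnce(url));
        // prints: page.aspx?a=1&amp;b=2&amp;c=3
    }
}
```

Like the function above, this sketch only recognizes the five named entities, so a numeric reference such as &#38; would still have its ampersand escaped.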


Mike