Brief Project Description

The WebPageScraper is a C#-based application used to snarf data out of Web pages and stuff it into an Access database. That said, it uses basic SQL commands and a generalized parser, so it's easily repurposed to scrape a wide range of information and stick it just about anywhere (careful... :-) ). The application grew out of a need to obtain large amounts of fundamental financial data for publicly traded companies, as reported on various public financial information sites. Since then, people have modified the scraper to collect IP information from routers and tried to use it to collect email addresses (I refused input on that one), and one person is hoping someone will modify it to collect wine pricing. Running a software company in 2003, I've got more of a cheap beer budget these days, but it is an interesting idea.

While C# has its detractors, for development on Windows platforms the combination of C# and .NET makes it easy to develop applications quickly and cheaply. .NET and many of its supporting technologies also allow easy use and reuse of other software on the machine, such as native Windows, the Office products, Internet Exploder, and other third-party applications. If you have a religious opposition to the use of C#, or simply a practical one (C# doesn't work too well on Linux at this time), I'm more than willing to open up another module for a port, or you can open your own SourceForge project.

The navigation links to the left provide access to other descriptive orts and morsels, as well as a slightly longer paper describing the interesting bits of the architecture and how it can be changed to scrape other data and bother other site operators.