JavaScript should be an excellent language to write a web crawler in. So why doesn't anyone do it? Is it too easy, too difficult or just too dull?
The JavaScript web sites are full of scripts that solve problems that surely fall under at least one of those headings. Leaving such mysteries aside, I decided to try to create one. I should point out that I do have a sort of good reason besides intellectual curiosity: I want to make a search engine for my web site and I would like to ensure that it really does search the site and not just look up pages by keyword in a handmade index.
After some experimentation I have finally made a script that can crawl. Here are the notes: JsCrawler.
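The core of a crawler is small enough to sketch here. This is not the JsCrawler code; it is a minimal illustration that assumes the pages are available through a hypothetical lookup function (here an in-memory table called `site`) and that links can be picked out with a simple regular expression rather than a real HTML parser.

```javascript
// Hypothetical in-memory "site": maps a page path to its HTML text.
const site = {
  "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
  "/a.html": '<a href="/index.html">home</a> <a href="/c.html">C</a>',
  "/b.html": "no links here",
  "/c.html": '<a href="http://example.com/">external</a>',
  "/orphan.html": "never linked from anywhere",
};

// Pull out double-quoted href values. Good enough for a sketch,
// but a regular expression is not a real HTML parser.
function extractLinks(html) {
  const links = [];
  const re = /href\s*=\s*"([^"]*)"/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]);
  }
  return links;
}

// Breadth-first crawl: visit each reachable page exactly once.
// fetchPage is an assumed callback that returns the page text,
// or undefined for pages that are missing or outside the site.
function crawl(start, fetchPage) {
  const visited = [];
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const url = queue.shift();
    const html = fetchPage(url);
    if (html === undefined) continue;
    visited.push(url);
    for (const link of extractLinks(html)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}

const pages = crawl("/index.html", (url) => site[url]);
console.log(pages);
```

Note that `/orphan.html` is never visited: a crawler only finds what is actually linked, which is exactly the difference from a handmade index.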
This isn't quite the same idea as most existing programs of this kind. First of all, it is to be a command line program so that I can automate it, but it also has to take a VBP or VBG and generate a web of pages that will sit beside the originals.
In fact it is only the VBP files that need to be processed; they can refer directly to the modules.
The visible text of the VBP should be exactly the same as the real VBP so that a simple select all and copy to clipboard will copy the content of the VBP.
Well it didn't turn out quite like that as you can see by browsing the code.
Burning CDs as a means of freeing space is a pain. Burn to The Brim isn't quite as easy as it ought to be and the only competitor I know of is not free.
So what should such a program do:
What should it not do:
Free web hosts generally limit the size of the site that they will host. It occurs to me that some of this could be circumvented, without violating the spirit of the restriction, by partitioning a web site into several disjoint pieces and having links between them. Then you could sign up with a number of different hosts and put part of the site on each.
The problem is the management of the site source. It would obviously be most convenient to create the site as a single tree and not worry about which files should reside on which host. Then all internal links can be relative to the current file, or absolute but with the same root.
What is needed is a program that can take such a site and distribute it automatically. This would mean that many references would have to be rewritten. Or is there another way to do it?
One way would be to always use ECMAScript links instead of static href links. Such links would have to have some way of discovering which host the file was on and would replace the current file with the appropriate file.
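A minimal sketch of such a link, assuming the site is partitioned by top-level directory. The partition table, the host names and the function names `resolve` and `go` are all hypothetical; nothing here is tied to a particular host.

```javascript
// Hypothetical partition table: which host carries which top-level directory.
const partition = {
  code: "http://host-one.example",
  notes: "http://host-two.example",
};
// Assumed fallback host for anything not listed above.
const defaultHost = "http://host-zero.example";

// Turn a site-relative path into a full URL on the host that carries it.
function resolve(path) {
  const clean = path.replace(/^\/+/, "");
  const top = clean.split("/")[0];
  const host = Object.prototype.hasOwnProperty.call(partition, top)
    ? partition[top]
    : defaultHost;
  return host + "/" + clean;
}

// Replace the current page with the target. In a browser, pass window.location:
//   <a href="#" onclick="return go('/notes/dotplot.html', window.location)">DotPlot</a>
function go(path, loc) {
  loc.replace(resolve(path));
  return false; // suppress the default action of the static href
}
```

With this arrangement the pages themselves never name a host, so the site source can stay a single tree and only the partition table changes when a directory moves.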
Possible methods:
I heartily dislike drag and drop form designers. They seem so easy to use but as soon as you need any fine control they let you down. I always liked the simple containers that the Java user interface libraries have that automatically size their contents and arrange them in columns or rows.
So I have decided to create something like it in C#: CsharpLayout. I originally thought of creating new classes derived from Panel but I think that it is simpler to create some functions that take panels and other controls as arguments.
Both functions return the panel as their value. This lets us write code that shows the hierarchical structure of the layout. You can then build up the layout piece by piece, replacing simple elements with Rows or Columns as you go:
Column(panelTop, splitterH, panelText);
You can compile it and check that it works. Now expand it by adding controls to the top panel:
Column(Row(panelTop, picDotPlot, panelTopRight),
       splitterH,
       panelText);
The example is intended as preparation for the use of this idea in DotPlot.
Column(Row(panelTop, picDotPlot,
           Column(panelTopRight,
                  buttonGo,
                  frameDotScaling,
                  frameParsing)),
       splitterH,
       Row(panelText,
           Column(panelText1, comboFile1, textFile1),
           Column(panelText2, comboFile2, textFile2)));
All the examples assume that the various controls already exist and that all the relevant properties have been set.
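To make the idea concrete without a form designer, here is a toy model of the two functions in JavaScript rather than C#. It is not the CsharpLayout code. It assumes that the first argument is the container panel, that the remaining arguments are placed inside it, and that a control is just an object with a position and a size; the `control` helper and the outer `panelMain` are hypothetical.

```javascript
// Hypothetical stand-in for a control: a name, a size and a position.
function control(name, width, height) {
  return { name, width, height, left: 0, top: 0, children: [] };
}

// Place the children left to right inside panel, size panel to fit, return panel.
function Row(panel, ...children) {
  let x = 0;
  let maxHeight = 0;
  for (const c of children) {
    c.left = x;
    c.top = 0;
    x += c.width;
    maxHeight = Math.max(maxHeight, c.height);
  }
  panel.children = children;
  panel.width = x;
  panel.height = maxHeight;
  return panel;
}

// Place the children top to bottom inside panel, size panel to fit, return panel.
function Column(panel, ...children) {
  let y = 0;
  let maxWidth = 0;
  for (const c of children) {
    c.left = 0;
    c.top = y;
    y += c.height;
    maxWidth = Math.max(maxWidth, c.width);
  }
  panel.children = children;
  panel.width = maxWidth;
  panel.height = y;
  return panel;
}

// Because each function returns its panel, the calls nest like the layout itself.
const panelMain =
  Column(control("panelMain", 0, 0),
         Row(control("panelTop", 0, 0),
             control("picDotPlot", 300, 200),
             Column(control("panelTopRight", 0, 0),
                    control("buttonGo", 80, 25),
                    control("frameDotScaling", 120, 60))),
         control("splitterH", 420, 4),
         control("panelText", 420, 150));

console.log(panelMain.width, panelMain.height);
```

Inner calls are evaluated before outer ones, so each container already knows the size of its nested panels by the time it arranges them.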
Therefore I need a tool to synchronize the site with the local directory.
Write a VB program to scan the local directory tree and write a script for FTP. The script just includes:
The script will produce a list of all the files and directories on the server. Those files that exist in server directories that do not have counterparts on the local disk will not appear but the parent directories (or their ancestors) will.
Compare this list with the list of files and directories on the local disk and write new scripts to:
This project actually exists; look at prjFTPSyncScript.vbp. It uses the Microsoft console based FTP program to do the actual work. This is the only example that I can think of for this kind of interaction between Windows based and console based programs in MS Windows, though of course such things are perfectly normal in Unix style operating systems.
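The comparison step can be sketched as two set differences. This is an illustration in JavaScript, not the code in prjFTPSyncScript.vbp. It assumes that both listings have already been reduced to arrays of relative paths and that the only actions wanted are uploading files missing from the server and deleting server files that have no local counterpart. It ignores timestamps, directory creation and paths containing spaces. The function names are hypothetical; `put`, `delete` and `bye` are ordinary commands of the console FTP client.

```javascript
// Work out which files to upload and which to remove from the server.
function diffListings(localFiles, serverFiles) {
  const local = new Set(localFiles);
  const server = new Set(serverFiles);
  return {
    upload: localFiles.filter((f) => !server.has(f)),
    remove: serverFiles.filter((f) => !local.has(f)),
  };
}

// Turn the difference into the text of an FTP command script.
// Assumption: local and remote paths are identical relative paths.
function toFtpScript(diff) {
  const lines = [];
  for (const f of diff.upload) lines.push(`put ${f} ${f}`);
  for (const f of diff.remove) lines.push(`delete ${f}`);
  lines.push("bye");
  return lines.join("\n");
}

const diff = diffListings(
  ["index.html", "notes/dotplot.html", "notes/layout.html"],
  ["index.html", "notes/layout.html", "old/stale.html"]
);
console.log(toFtpScript(diff));
```

A real version would also have to log in, create missing directories before uploading into them, and decide what to do with files that exist on both sides but differ.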
To do:
I have been familiar with the basic dotplot idea for quite a while but, for some reason that I can no longer recall, I must have thought it too difficult to implement. In fact it is easy to create a small but usable dotplot application. Look at DotPlot for a discussion or dive straight into dotplot.vbp for the first version. That one actually works; the latest and potentially greatest is current, but it might not even compile.
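To show how little is needed, here is a minimal sketch of the basic idea in JavaScript. It is not taken from dotplot.vbp. It assumes the two inputs have already been split into tokens (words here) and it draws the plot as text rather than pixels: a dot is set at row i, column j when token i of the first sequence equals token j of the second.

```javascript
// Build the dotplot matrix: cell [i][j] is true when a[i] equals b[j].
function dotplot(a, b) {
  return a.map((x) => b.map((y) => x === y));
}

// Render the matrix as text, "*" for a match and "." for no match.
function render(matrix) {
  return matrix
    .map((row) => row.map((hit) => (hit ? "*" : ".")).join(""))
    .join("\n");
}

// Comparing a sequence with itself: the main diagonal is always set,
// and repeated phrases show up as shorter diagonals away from it.
const words = "to be or not to be".split(" ");
const plot = render(dotplot(words, words));
console.log(plot);
```

In the output the repeated "to be" appears as a two-dot diagonal in the top right and bottom left corners, which is the kind of pattern a dotplot is meant to expose.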