Open Source Version of LinkSearch

In the open source version, we expect to combine standard information retrieval techniques with links and link paths. For now, we shall ignore the content of web pages altogether and concentrate just on the content of the hypertext links.

By standard techniques, I mean the tf*idf term weighting with cosine normalization used in so many systems. (For a reference, see any information retrieval book.)
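
To be concrete, here is a minimal sketch (in Python) of tf*idf weighting with cosine normalization. The function names and the bag-of-terms representation are my own assumptions for illustration, not part of the proposal itself.

    import math
    from collections import Counter

    def tfidf_vector(terms, doc_freq, num_docs):
        # Build a cosine-normalized tf*idf vector for a bag of terms.
        # doc_freq maps a term to the number of documents containing it.
        tf = Counter(terms)
        weights = {}
        for term, freq in tf.items():
            df = doc_freq.get(term, 0)
            if df == 0:
                continue  # skip terms never seen in the collection
            weights[term] = freq * math.log(num_docs / df)
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {t: w / norm for t, w in weights.items()} if norm else weights

    def cosine_score(query_vec, doc_vec):
        # With both vectors normalized, cosine similarity is a dot product.
        return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())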

I propose that we apply this standard technique to hypertext links & paths in the following way:

One way to find these paths is to observe that each node on a path must contain a distinct word of Q. So we simply ensure that each new node added to the path contributes a distinct query word, and at each step we check whether the path built so far contains all of the query words.
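
Here is a rough sketch of that search as a depth-first enumeration in Python. The graph and node_terms representations are assumptions on my part; the only property the sketch relies on is the distinct-word constraint.

    def find_covering_paths(graph, node_terms, query, start):
        # Enumerate link paths starting at `start` in which every node added
        # contributes at least one query word not already covered; a path is
        # complete once it covers all of the query's words.
        #   graph      : node -> iterable of successor nodes (a link S -> T)
        #   node_terms : node -> set of query-relevant words in its link text
        #   query      : set of query words
        start_cov = node_terms.get(start, set()) & query
        if not start_cov:
            return []  # the start node itself must contribute a query word
        paths, stack = [], [([start], start_cov)]
        while stack:
            path, covered = stack.pop()
            if covered == query:
                paths.append(path)  # this path contains all query words
                continue
            for nxt in graph.get(path[-1], ()):
                new_words = (node_terms.get(nxt, set()) & query) - covered
                if new_words and nxt not in path:
                    stack.append((path + [nxt], covered | new_words))
        return paths

Because every step must contribute a new query word, no path can be longer than the query, which keeps the enumeration small.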

The search for paths at query time is not that expensive; for example, the following precomputed function can make things very fast: given a query term t and a target node T, give me all source nodes S that have an incoming link with term t and some edge to node T. You can then use this precomputed function to search backwards for paths whose node profiles contain the query terms. I tried this and it is fast, at least on the UW CSE web.
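
A sketch of that precomputation might look like the following; the names and the (term, target) keying are assumptions chosen to match the description above.

    from collections import defaultdict

    def build_backward_index(edges, incoming_link_terms):
        # Precompute (term t, target node T) -> set of source nodes S such
        # that S has an incoming link containing t and an edge from S to T.
        #   edges               : iterable of (source, target) link pairs
        #   incoming_link_terms : node -> set of terms in the anchor text of
        #                         links pointing at that node (its "profile")
        index = defaultdict(set)
        for source, target in edges:
            for term in incoming_link_terms.get(source, ()):
                index[(term, target)].add(source)
        return index

At query time, index[(t, T)] answers the question directly, so the backward search just keeps asking for predecessors that contribute a query term not yet covered by the partial path.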

Now, these paths induce the set of urls returned for the query Q.

The urls are ranked as follows. Consider a url U induced by Q.

Let P1, P2, ..., Pk be the set of paths that induce U given query Q.

For each path P_i, we compute two quantities, I_i and S_i.

Now, to determine the rank of U, we compute

    sum ( I_i * S_i )  over all paths P_i that induce U

The larger the sum, the higher U should be ranked.
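
As a sketch, the ranking step reduces to a few lines of Python. Since the definitions of I_i and S_i are not spelled out here, the sketch treats them as opaque per-path scores supplied by the caller.

    def rank_urls(paths_by_url, path_scores):
        # Rank urls by the sum of I_i * S_i over all paths that induce them.
        #   paths_by_url : url U -> list of ids of paths that induce U
        #   path_scores  : path id -> (I_i, S_i) pair for that path
        totals = {}
        for url, path_ids in paths_by_url.items():
            totals[url] = sum(path_scores[p][0] * path_scores[p][1]
                              for p in path_ids)
        # The larger the sum, the higher the url is ranked.
        return sorted(totals, key=totals.get, reverse=True)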

Some intuition: a url U gets credit from every path that induces it, so urls reachable along many strongly matching paths rise to the top.

Now for the real-time features.

It seems to me that local search engines should be designed quite differently from global ones. In particular, there are different tradeoffs to be made. In local search engines...

Specifically, I am thinking of a local search engine that works as follows:

This approach has several advantages:

Comments are welcome in the public forums!
Amir Michail