Syllabus: Algorithms for Web Indexing and Searching, Fall 2002

1. [ACGPR01] Searching the Web, Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan. ACM Transactions on Internet Technology, 1, p. 2-43, 2001. All
2. [BP98] The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page. Proceedings of the 7th International WWW conference, 1998. All
3. [NH01] High-Performance Web Crawling. Marc Najork and Allan Heydon. Compaq SRC Research Report 173. All
4. [HN99] Mercator: A Scalable, Extensible Web Crawler. Allan Heydon and Marc Najork. In World Wide Web, December 1999, pages 219-229. All
5. [NW01] Breadth-First Search Crawling Yields High-Quality Pages. Marc Najork and Janet L. Wiener. In Proceedings of the Tenth International World Wide Web Conference, pages 114-118, May 2001. All
6. [RM02] Search engines and Web dynamics. Knut Magne Risvik and Rolf Michelsen. Computer Networks, Volume 39, Issue 3, 21 June 2002, Pages 289-302. All
7. [CG00] The Evolution of the Web and Implications for an Incremental Crawler. Junghoo Cho and Hector Garcia-Molina. Proceedings of 26th International Conference on Very Large Data Bases (VLDB), 10-14 September 2000, Cairo, Egypt, Pages 200-209. All
8. [MPSR01] Evaluating Topic-Driven Web Crawlers. Filippo Menczer, Gautam Pant, Padmini Srinivasan, and Miguel E. Ruiz. Proceedings SIGIR 01, September 9-12, 2001, New Orleans, Louisiana, USA, Pages 241-249. All
9. [CBD99] Focused crawling: a new approach to topic-specific Web resource discovery. Soumen Chakrabarti, Martin van den Berg, and Byron Dom. Eighth World Wide Web conference, Toronto, 1999. All
10. [RG00] Crawling the Hidden Web. Sriram Raghavan and Hector Garcia-Molina. Stanford Technical Report 2000-36. Full version of paper appearing in proceedings of 27th International Conference on Very Large Data Bases, Rome, Italy, September 11-14, 2001. All
11. [MRYG00] Building a Distributed Full-Text Index for the Web. Sergey Melnik, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina. Stanford Technical Report 2000-55. Short version of paper appeared in the proceedings of the 10th International WWW conference, May 2-5, 2001, Hong Kong. All
12. [WMB99] Managing Gigabytes: Compressing and Indexing Documents and Images. Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Morgan Kaufmann Publishing, San Francisco, 1999. ISBN 1-55860-570-3. All
13. [BS97] Fast Algorithms for Sorting and Searching Strings. Jon Bentley and Robert Sedgewick. Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms. New Orleans, January, 1997. Pages 360-369. All
14. [MM90] Suffix arrays: a new method for on-line string searches. Udi Manber and Gene Myers. Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, 1990. Pages 319-327. All
15. [U95] On-Line Construction of Suffix Trees. Esko Ukkonen. Algorithmica, 14, pages 249-260, 1995. All
16. [H99] Efficient Computation of PageRank. Taher H. Haveliwala. Stanford Technical Report 1999-31. All
17. [K99] Authoritative Sources in a Hyperlinked Environment. Jon M. Kleinberg. Journal of the ACM, 46(5), 604-632, 1999. All
18. [CDRRGK98] Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson and Jon Kleinberg. Proceedings of the Seventh International World Wide Web Conference (WWW7). Brisbane, Australia. April 1998. Also appeared in: Computer Networks and ISDN Systems 30, 1998, pp. 65-74. All
19. [CDGKRRT98] Experiments in topic distillation. S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. In Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval on the Web, Melbourne, Australia, 1998. ACM. All
20. [BH98] Improved Algorithms for Topic Distillation in a Hyperlinked Environment. Krishna Bharat and Monika R. Henzinger. Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1998, pp. 104-111. All
21. [BB99] Understanding Search Engines: Mathematical Modeling and Text Retrieval. M.W. Berry and M. Browne. SIAM Book Series: Software, Environments, and Tools, (June 1999), ISBN: 0-89871-437-0. Chapter 3
22. [BR99] Modern Information Retrieval. Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Addison Wesley Higher Education, 1999. Sections 2.5.1-3, 3.2, 4.2-3, 8.4
23. [GT98] Data Structures and Algorithms in Java. M. Goodrich and R. Tamassia, Wiley, First edition, 1998. Pages 660-663
24. [B97] On the resemblance and containment of documents. Andrei Broder. In Compression and Complexity of Sequences (SEQUENCES'97), pages 21-29. IEEE Computer Society, 1998. All
25. [BGNZ97] Syntactic clustering of the Web. Andrei Broder, Steve Glassman, Mark Manasse, and Geoffrey Zweig. In Proceedings of the 6th International World Wide Web Conference, pages 391-404, April 1997. Also appeared as SRC Technical Note 1997-015. All
26. [B93] Some applications of Rabin's fingerprinting method. Andrei Z. Broder. In R. Capocelli, A. De Santis, U. Vaccaro (eds), Sequences II: Methods in Communications, Security, and Computer Science, Springer-Verlag, 1993. All
27. [HGI00] Scalable Techniques for Clustering the Web. Taher Haveliwala, Aristides Gionis, Piotr Indyk. Third International Workshop on the Web and Databases, 2000. All
28. [DFKVV99] Clustering in large graphs and matrices. P. Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay. Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, 1999. All
29. [BDB94] Using Linear Algebra for Intelligent Information Retrieval. Michael W. Berry, Susan T. Dumais, and Gavin W. O'Brien. December 1994. Published in SIAM Review 37:4 (1995), pp. 573-595. All
30. [AFKMS01] Spectral analysis of data. Yossi Azar, Amos Fiat, Anna R. Karlin, Frank McSherry, and Jared Saia. Proceedings of the 33rd annual ACM symposium on Theory of computing, pp 619-626, 2001. All
31. [BBDH00] A Comparison of Techniques to Find Mirrored Hosts on the WWW. Krishna Bharat, Andrei Broder, Jeffrey Dean, Monika R. Henzinger. Journal of the American Society for Information Science October 2000 Volume 51 Issue 12. Also in 1999 ACM Digital Library Workshop on Organizing Web Space (WOWS). All
32. [GL02] The Web Graph: an Overview. Jean-Loup Guillaume, Matthieu Latapy. In Proc. AlgoTel 2002. All
33. [BKMRRSTW00] Graph structure in the web. Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener. In Proc. Ninth International World Wide Web Conference (WWW9), 2000. All
34. [KRRSTU00] Stochastic models for the Web graph. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Proceedings of the 41st IEEE Symp. on Foundations of Computer Science. November 2000, pp. 57-65. All
35. [L98] Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. D. D. Lewis. Proceedings of the 10th European Conference on Machine Learning (ECML-98), LNAI, Vol. 1398, pp. 4-18, Springer, April 21-23 1998. All
36. [LM01] SALSA: The Stochastic Approach for Link-Structure Analysis. R. Lempel and S. Moran. ACM Transactions on Information Systems 19(2), pp. 131-160, April 2001. Also in 9th International World Wide Web conference, 2000. All
37. [P01] Algorithms, games, and the internet. Christos Papadimitriou. Proc. 33rd Annual ACM Symposium on Theory of Computing, 2001, pp. 749-753. All


Student | Gerth Stølting Brodal | Rolf Fagerberg