(Unknown). Focused Crawl Project Literature Review — Reference List


This is taken from: http://www.ibiblio.org/jewel/foar.old/research/lanl-crawl/FocusedCrawlLitReview.html

1 Focused Crawl Project Literature Review — Reference List

Note: Focused crawling in the ACM DL falls under the H.3.4 classification. For some of the articles below, I searched under "personalization", "information filtering" and "alerts" or "alerts services". I found most articles via focused crawling reviews (Crimmins, Bergmark) or by working my way back from the references section of a pertinent article, plus Google searching.

Aggarwal, C.C., Al-Garawi, F. & Yu, P.S. (2001), "Intelligent crawling on the World Wide Web with arbitrary predicates", in Proceedings of the 10th International World Wide Web Conference, Hong Kong, May 2001. http://citeseer.nj.nec.com/aggarwal01intelligent.html.

Angkawattanawit, N. & Rungsawang, A. (2002), "Learnable Crawling: An Efficient Approach to Topic-specific Web Resource Discovery", in the proceedings of the 2nd International Symposium on Communications and Information Technology (ISCIT), 2002. Available http://citeseer.nj.nec.com/angkawattanawit02learnable.html.

Arms, W.Y. (2000), "Automated Digital Libraries: How Effectively Can Computers Be Used for the Skilled Tasks of Professional Librarianship?", D-Lib Magazine, 6(7/8). Available http://www.dlib.org/dlib/july00/arms/07arms.html.

Bergmark, D. (2002), “Background Readings for Collection Synthesis”, bibliography of focused crawls of the web. Available http://citeseer.nj.nec.com/dlrg02background.html.

Bergmark, D. (2002), “Collection Synthesis”, Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 253-262. Available http://doi.acm.org/10.1145/544220.544275.

Bergmark, D., Lagoze, C., & Sbityakov, A. (2002), “Focused Crawls, Tunneling, and Digital Libraries”, in proceedings of the European Conference on Digital Libraries (ECDL) 2002, Rome, September 2002, 91-106. Available http://mercator.comm.nsdlib.org/CollectionBuilding/ECDLpaper.pdf.

Ben-Shaul, I., Herscovici, M., Jacovi, M., Maarek, Y.S., Pelleg, D., Shtalhaim, M., Soroka, V., & Ur, S. (1999), "Adding support for Dynamic and Focused Search with Fetuccino", in Proceedings of the 8th International World Wide Web Conference, Toronto, May 1999. Available http://www8.org/w8-papers/5a-search-query/adding/adding.html.

Bollacker, K.D., Lawrence, S., & Giles, C.L. (1999), "A System For Automatic Personalized Tracking of Scientific Literature on the Web", in the proceedings of the 4th ACM Conference on Digital Libraries, Berkeley, August, 1999. Available http://citeseer.nj.nec.com/bollacker99system.html.

Brin, S. & Page, L. (1998), "The Anatomy of a Large-Scale Hypertextual Web Search Engine", in the proceedings of the Seventh International World Wide Web Conference (WWW7), Brisbane, April 1998, 107-117. Available http://dbpubs.stanford.edu:8090/pub/1998-8.

Chakrabarti, S. (1999), "Recent Results in Automatic Web Resource Discovery", in ACM Computing Surveys (CSUR), 31(4es), No. 17. Available http://doi.acm.org/10.1145/345966.346007.

Chakrabarti, S. (2003), "Mining the Web: Discovering Knowledge from Hypertext Data", Morgan Kaufmann, Boston, 2003. Available in the LANL library, call number QA76.9.D343 C43 2003.

Chakrabarti, S., van den Berg, M., & Dom, B. (1999), "Focused crawling: a new approach to topic-specific Web resource discovery", in Proceedings of the 8th International World Wide Web Conference, Toronto, May 1999. Available http://citeseer.nj.nec.com/chakrabarti99focused.html.

Chakrabarti, S., Punera, K. & Subramanyam, M. (2002), "Accelerated Focused crawling through Online Relevance Feedback", in Proceedings of the 11th International World Wide Web Conference, Honolulu, May 2002. Available http://doi.acm.org/10.1145/511446.511466.

Chen, H., Chau, M., & Zeng, D. (2002), "CI spider: a tool for competitive intelligence on the web", Decision Support Systems 34(1), 1-17. Available http://citeseer.nj.nec.com/chen02ci.html.

Chen, H., Chung, Y., & Ramsey, M. (1998) “A Smart Itsy Bitsy Spider for the Web”, Journal of the American Society for Information Science, 49(7), 604-618. Available http://ai.bpa.arizona.edu/go/intranet/papers/A_Smart-98.pdf.

Cheong, F. (1996), "Internet Agents: Spiders, Wanderers, Brokers and Bots", New Riders Publishing, Indianapolis. Available in the LANL library, call number TK5105.875.I57 C435 1996.

Cho, J., Garcia-Molina, H., & Page, L. (1998), "Efficient Crawling Through URL Ordering", in proceedings of the 7th World Wide Web Conference, Brisbane, 1998. http://dbpubs.stanford.edu:8090/pub/1998-51.

Crimmins, F. (2001), "Focused Crawling Review", 2001. Available http://dev.funnelback.com/focused-crawler-review.html.

De Bra, P.M.E., Houben, G., Kornatzky, Y., & Post, R. (1994), "Information retrieval in distributed Hypertexts", in the proceedings of the 4th RIAO Conference, New York, 1994, 481-491. Available http://citeseer.nj.nec.com/debra94information.html.

De Bra, P.M.E. & Post, R.D.J. (1994), "Information Retrieval in the World-Wide Web: Making Client-based Searching Feasible", in Proceedings of the 1st International World Wide Web Conference, 1994. http://citeseer.nj.nec.com/99604.html.

Diligenti, M., Coetzee, F.M., Lawrence, S., Giles, C.L., & Gori, M. (2000), "Focused Crawling Using Context Graphs", in the proceedings of the 26th International Conference on Very Large Databases (VLDB), Cairo, 2000, 527-534. Available http://citeseer.nj.nec.com/diligenti00focused.html.

Faensen, L., Faulstich, L., Schweppe, H., Hinze, A., & Steidinger, A. (2001), “Hermes -- A Notification Service for Digital Libraries”, Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 373-380. Available http://doi.acm.org/10.1145/379437.379730.

French, J.C. & Viles, C.L. (1999), “Personalized Information Environments: An Architecture for Customizable Access to Distributed Digital Libraries”, D-Lib Magazine, 5(6). Available http://www.dlib.org/dlib/june99/french/06french.html.

Guernsey, L. (2003), "Digging for Nuggets of Wisdom", in the New York Times, October 16, 2003. Available http://www.clearforest.com/WhatsNew/IntheNewsArticles/Digging_for_Nuggets_of_Wisdom.pdf.

Kenney, A.R., McGovern, N.Y., Botticelli, P., Entlich, R., Lagoze, C., & Payette, S. (2002), “Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism”, D-Lib Magazine, 8(1). Available http://www.dlib.org/dlib/january02/kenney/01kenney.html.

Kenney, A.R., McGovern, N.Y., Martinez, I.T., & Heidig, L. (2003), “Google Meets eBay: What Academic Librarians Can Learn from Alternative Information Providers”, D-Lib Magazine, 9(6). Available http://www.dlib.org/dlib/june03/kenney/06kenney.html.

Heydon, A. & Najork, M. (1999), "Mercator: A Scalable, Extensible Web Crawler", World Wide Web 2(4), 219-229, December 1999. Available http://www.research.compaq.com/SRC/mercator/papers/www/paper.pdf.

Hersovici, M., Jacovi, M., Maarek, Y.S., Pelleg, D., Shtalhaim, M., & Ur, S. (1998), "The Shark-search Algorithm -- An Application: Tailored Web Site Mapping", in proceedings of the 7th World Wide Web Conference, Brisbane, 1998. http://www7.scu.edu.au/programme/fullpapers/1849/com1849.htm.

Lagoze, C. (1997), “From Static to Dynamic Surrogates: Resource Discovery in the Digital Age”, D-Lib Magazine. Available http://www.dlib.org/dlib/june97/06lagoze.html.

Lawrence, S., Bollacker, K., & Giles, C.L. (1999), "Indexing and Retrieval of Scientific Literature", from the proceedings of the Eighth International Conference on Information and Knowledge Management (CIKM), Kansas City, 1999. Available http://citeseer.nj.nec.com/lawrence99indexing.html.

Liu, F., Yu, C., & Meng, W. (2002), "Personalized Web Search by Mapping User Queries to Categories", from the proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM), McLean, VA, 2002. Available http://doi.acm.org/10.1145/584792.584884.

Liu, X. (2003), "Bibliography: Focused Web Crawler". Available http://www.cs.odu.edu/~liu_x/paper/bibtool/html/web_focuscrawler.html.

Liu, X., Brody, T., Harnad, S., Carr, L., Maly, K., Zubair, M. & Nelson, M.L. (2002), “A Scalable Architecture for Harvest-based Digital Libraries: The ODU/Southampton Experiments”, D-Lib Magazine, 8(11). Available http://www.dlib.org/dlib/november02/liu/11liu.html.

Masanès, J. (2002), “Towards Continuous Web Archiving: First Results and an Agenda for the Future”, D-Lib Magazine, 8(12). Available http://www.dlib.org/dlib/december02/masanes/12masanes.html.

McCallum, A., Nigam, K., Rennie, J., & Seymore, K. (1999), "Building Domain-Specific Search Engines with Machine Learning Techniques", in proceedings of the AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace. Available http://citeseer.nj.nec.com/mccallum99building.html.

Menczer, F., Pant, G., Srinivasan, P., & Ruiz, M.E. (2001), “Evaluating Topic-Driven Web Crawlers”, Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 241-249. Available http://doi.acm.org/10.1145/383952.383995.

Menczer, F., Street, W.N., Vishwakarma, N., Monge, A.E., & Jakobsson, M. (2002), "IntelliShopper: a proactive, personal, private shopping assistant", from the proceedings of AAMAS: the International Conference on Autonomous Agents and Multiagent Systems, 2002. Available http://www.informatics.indiana.edu/fil/Papers/intellishopper.pdf.

Miller, R.C. & Bharat, K. (1998), "SPHINX: A Framework for Creating Personal, Site-Specific Web Crawlers", in the proceedings of the Seventh International World Wide Web Conference (WWW7), Brisbane, April 1998. Available http://decweb.ethz.ch/WWW7/1875/com1875.htm.

Mitchell, S., Mooney, M., Mason, J., Paynter, G. W., Ruscheinski, J., Kedzierski, A., & Humphreys, K. (2003), “iVia Open Source Virtual Library System”, D-Lib Magazine, 9(1). Available http://www.dlib.org/dlib/january03/mitchell/01mitchell.html.

Najork, M., & Wiener, J.L. (2001), "Breadth-first search crawling yields high-quality pages", in the proceedings of the 10th International World Wide Web Conference, May 2001. http://citeseer.nj.nec.com/najork01breadthfirst.html.

Page, L., Brin, S., Motwani, R., & Winograd, T. (1998), "The PageRank Citation Ranking: Bringing Order to the Web", from the Stanford Digital Library Technologies Project. http://dbpubs.stanford.edu:8090/pub/1999-66.

Pinkerton, B. (1994), "Finding What People Want: Experiences with the WebCrawler", in Proceedings of the 1st International World Wide Web Conference, Geneva, 1994. http://archive.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html.

Raghavan, S. & Garcia-Molina, H. (2001), "Crawling the Hidden Web", in proceedings of the 27th International Conference on Very Large Data Bases, September 2001. Available http://citeseer.nj.nec.com/article/raghavan01crawling.html.

Rauber, A., Aschenbrenner, A., Witvoet, O., Bruckner, R., & Kaiser, M. (2002), “Uncovering Information Hidden in Web Archives: A Glimpse at Web Analysis Building on Data Warehouses”, D-Lib Magazine, 8(12). Available http://www.dlib.org/dlib/december02/rauber/12rauber.html.

Reich, V. & Rosenthal, D.S.H. (2001), “LOCKSS: A Permanent Web Publishing and Access System”, D-Lib Magazine, 7(6). Available http://www.dlib.org/dlib/june01/reich/06reich.html.

Tsoi, A.C., Forsali, D., Gori, M., Hagenbuchner, M., & Scarselli, F. (2003), “A Simple Focused Crawler”, Proceedings of the Twelfth International World Wide Web Conference, Budapest, 2003. Available http://www2003.org/cdrom/papers/poster/p181/p181-tsoi/p181-tsoi.html.

Warnick, W.L. (2000), “First Personalized Alert Service for Preprints”, D-Lib Magazine, 6(12). Available http://www.dlib.org/dlib/december00/12inbrief.html#WARNICK.

Warnick, W.L., Lederman, A., Scott, R.L., Spence, K.J., Johnson, L.A., & Allen, V.S. (2001), “Searching the Deep Web: Directed Query Engine Applications at the Department of Energy”, D-Lib Magazine, 7(1). Available http://www.dlib.org/dlib/january01/warnick/01warnick.html.

2 Focused Crawl Project Literature Review — To Be Read

Chen, C.C., Chen, M.C., & Sun, Y. (2001), “PVA: a Self-Adaptive Personal View Agent System”, from the proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2001. Available http://doi.acm.org/10.1145/502512.502548.

Mostafa, J., Mukhopadhyay, S., Palakal, M., & Lam, W. (1997), “A Multilevel Approach to Intelligent Information Filtering: Model, System, and Evaluation”, ACM Transactions on Information Systems (TOIS), 15(4). Available http://doi.acm.org/10.1145/263479.263481.

3 Focused Crawl Project Literature Review — News and General WWW

"Better Search Results Than Google?" "NEW YORK (AP) -- As wonderful as Internet search engines are, they have a pretty big flaw. They often deliver too much information, and a lot of it isn't quite what we're looking for. Who really bothers to read the dozens of pages of results that Google generates?" http://www.cnn.com/2004/TECH/internet/01/05/seeing.search1.ap/

Bot Spot: "The Spot for All Bots". "BotSpot classifies Bots and Intelligent Agents by subject. Most of the bots you'll find discussed at BotSpot can be downloaded and used on your computer; some require a fee for permanent registration. Others are completely free. Browse through Bots by Category to begin your journey in the brave new world of bots." http://www.botspot.com/

"Going deeper than Google" "...Grokker takes the raw output of a search and organizes it into categories and subcategories. Groxis has put more intelligence into the software this time, so it is not dependent, as it was with Northern Lights, on categories established by others. This means that a wide variety of types of databases can be Grokked -- now Grokker can search with six different engines simultaneously -- Yahoo, MSN, Alta Vista, Fast, Teoma, and WiseNet...." http://www.cnn.com/2003/TECH/ptech/12/17/fortune.ff.deeper.google/

Inmagic Alert Tool/Profiler "The Profiler, developed by Trimagic Software, Sydney, Australia, is an automated search alert service that allows individual users or information professionals to establish user profiles that specify the topics of interest to the user. When new materials come into the organization's Inmagic database, Profiler sends the latest relevant information directly to the user, via an e-mail alert." http://www.inmagic.com/solutions/products/profiler/profiler.html