Research @ Hekkas.Com

Linguistically Enhanced Information Retrieval of Structured Documents

Bibliography

[1]XML representation of a relational database. http://www.w3.org/XML/RDB.html, July 1997.
[2]XML Linking Language (XLink) Version 1.0. http://www.w3.org/TR/xlink/, May 27 2001.
[3]XML Pointer Language (XPointer). http://www.w3.org/TR/xptr/, August 16 2002.
[4]Extensible Markup Language (XML). http://www.w3.org/XML, May 2006.
[5]Extensible Markup Language (XML) 1.0 (Third Edition). http://www.w3.org/TR/REC-xml, May 2006.
[6]HyperText Markup Language (HTML) Home Page. http://www.w3.org/MarkUp, May 2006.
[7]LaTeX - A document preparation system. http://www.latex-project.org, May 2006.
[8]PDF Reference. http://partners.adobe.com/public/developer/pdf/index_reference.html, May 2006.
[9]The Extensible Stylesheet Language Family (XSL). http://www.w3.org/Style/XSL, May 2006.
[10]Word Reference Documentation. http://msdn.microsoft.com/office/understanding/word/documentation/default.aspx, May 2006.
[11]XML Schema. http://www.w3.org/XML/Schema, May 2006.
[12]SAX (Simple API for XML). http://sax.sourceforge.net, February 2008.
[13]SAXON - The XSLT and XQuery Processor. http://saxon.sourceforge.net, October 2008.
[14]Kjersti Aas and Line Eikvil. Text Categorisation: A Survey. Technical report, Norwegian Computing Center, June 1999.
[15]Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1(1):68-88, 1997.
[16]Steven Abney. Corpus-Based Methods in Language and Speech, chapter Part-of-Speech Tagging and Partial Parsing. Kluwer Academic Publishers, Dordrecht, 1996.
[17]Mohammad Abolhassani and Norbert Fuhr. Applying the Divergence From Randomness Approach for Content-Only Search in XML Documents. In Sharon McDonald and John Tait, editors, Proceedings of the 26th European Conference on Information Retrieval Research (ECIR), volume 2997 of Lecture Notes in Computer Science, Sunderland, UK, April 4-7 2004. University of Sunderland, Springer Verlag.
[18]Mohammad Abolhassani, Norbert Fuhr, Norbert Gövert, and Kai Großjohann. HyREX: Hypermedia Retrieval Engine for XML. Technical report, University of Dortmund, Department of Computer Science, Dortmund, Germany, 2002.
[19]James F. Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, USA, second edition, 1994.
[20]Ofer Arazy and Carson Woo. Enhancing Information Retrieval through Statistical Natural Language Processing: A Study of Collocation Indexing. Management Information Systems Quarterly (MIS), 31(3):525-546, September 2007.
[21]Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, ACM Press, New York, Essex, England, 1999.
[22]Peter Bailey and David Hawking. A Parallel Architecture For Query Processing over a Terabyte of Text. Technical Report TR-CS-96-04, Department of Computer Science, Australian National University, Canberra 0200 ACT, Australia, 1996.
[23]Francois Bancilhon, Gilles Barbedette, Véronique Benzaken, Claude Delobel, Sophie Gamerman, Christophe Lécluse, Patrick Pfeffer, Philippe Richard, and Fernando Velez. The Design and Implementation of O2, an Object-Oriented Database System. pages 1-22, New York, NY, USA, 1988. Springer-Verlag New York, Inc.
[24]Francisco-Mario Barcala, Jesús Vilares Ferro, Miguel A. Alonso, Jorge Graña, and Manuel Vilares. Tokenization and Proper Noun Recognition for Information Retrieval. In A. Min Tjoa and Roland R. Wagner, editors, Proceedings of the 13th International Workshop on Database and Expert Systems Applications (DEXA), pages 246-250, Washington, DC, USA, September 2-6 2002. IEEE, IEEE Computer Society Press.
[25]David T. Barnard, Gwen Clarke, and Nicholas Duncan. Tree-to-Tree Correction for Document Trees. Technical Report 95-372, Department of Computing and Information Science, Queen's University, 1995.
[26]Philip Bille. A Survey on Tree Edit Distance and Related Problems. Theoretical Computer Science, 337(1-3):217-239, 2005.
[27]Patrick Billingsley. Probability and Measure. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, Inc., 1979.
[28]Scott Boag, Don Chamberlin, Mary F. Fernández, Daniela Florescu, Jonathan Robie, and Jérôme Siméon. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery, January 23 2007.
[29]Angela Bonifati and Stefano Ceri. Comparative Analysis of Five XML Query Languages. SIGMOD Record, 29(1):68-79, 2000.
[30]Abdelhamid Bouchachia and Marcus Hassler. Classification of XML Documents. In IEEE Symposium Series on Computational Intelligence (SSCI), Computational Intelligence and Data Mining (CIDM), pages 390-396. Honolulu, Hawaii, United States, IEEE Computational Intelligence Society, April 1-5 2007.
[31]Thorsten Brants. TnT - A Statistical Part-Of-Speech Tagger. In Proceedings of the 6th Applied Natural Language Processing Conference (ANLP), April 29 - May 3 2000.
[32]Thorsten Brants. Natural Language Processing in Information Retrieval. In Proceedings of the 14th Meeting of Computational Linguistics in the Netherlands (CLIN), pages 1-13, Antwerp, The Netherlands, December 19 2003.
[33]Andrej Bratko and Bogdan Filipič. Exploiting Structural Information in Semi-Structured Document Classification. In Proceedings of the 13th International Electrotechnical and Computer Science Conference (ERK), 2004.
[34]Eric Brill. A Simple Rule-based Part of Speech Tagger. In Proceedings of the 3rd Conference on Applied Natural Language Processing (ANLP), pages 152-155, Morristown, NJ, USA, 1992. Laboratory for Computer Science, Massachusetts Institute of Technology, Association for Computational Linguistics.
[35]Eric Brill. A Corpus-Based Approach to Language Learning. PhD thesis, University of Pennsylvania, Philadelphia, PA, USA, December 1 1993.
[36]Eric Brill. Some Advances in Transformation-Based Part of Speech Tagging. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI), volume 1, pages 722-727, Menlo Park, CA, USA, 1994. Laboratory for Computer Science, Massachusetts Institute of Technology, American Association for Artificial Intelligence.
[37]Eric Brill. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, 21(4):543-565, 1995.
[38]Eric Brill. Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging. In David Yarowsky and Kenneth Church, editors, Proceedings of the 3rd Workshop on Very Large Corpora (WVLC), pages 1-13, Somerset, New Jersey, 1995. Association for Computational Linguistics.
[39]Eric Brill. Part-Of-Speech Tagging, pages 403-414. Volume 1 of Dale et al. [61], 2000.
[40]Horst Bunke and Abraham Kandel. Mean and Maximum Common Subgraph of Two Graphs. Pattern Recognition Letters, 21(2):163-168, 2000.
[41]Forbes J. Burkowski. Retrieval Activities in a Database Consisting of Heterogeneous Collections of Structured Text. In Nicholas J. Belkin, Peter Ingwersen, and Annelise Mark Pejtersen, editors, Proceedings of the 15th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 112-125, New York, NY, USA, June 1992. ACM.
[42]James P. Callan. Passage-Level Evidence in Document Retrieval. In W. Bruce Croft and C. J. van Rijsbergen, editors, Proceedings of the 17th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 302-310, Dublin, Ireland, July 1994. Springer-Verlag New York, Inc.
[43]Laurent Candillier, Isabelle Tellier, and Fabien Torre. Transforming XML Trees for Efficient Classification and Clustering. In Fuhr et al. [90], pages 469-480.
[44]David Carmel, Einat Amitay, Miki Herscovici, Yoëlle Maarek, Yael Petruschka, and Aya Soffer. Juru at TREC 10 - Experiments with Index Pruning. In Proceedings of the 10th NIST Text Retrieval Conference (TREC), pages 228-237, Haifa 31905, Israel, November 2001. Haifa IBM Labs.
[45]David Carmel, Nadav Efrati, Gad M. Landau, Yoelle S. Maarek, and Yosi Mass. An Extension of the Vector Space Model for Querying XML Documents via XML Fragments. In Ricardo Baeza-Yates, Norbert Fuhr, and Yoelle S. Maarek, editors, Proceedings of the SIGIR 2002 Workshop on XML and Information Retrieval, August 15 2002.
[46]Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne Jekat, Ralf Klabunde, and Hagen Langer, editors. Computerlinguistik und Sprachtechnologie - Eine Einführung. Elsevier GmbH, Spektrum Akademischer Verlag, second edition, 2004.
[47]Sudarshan S. Chawathe. Comparing Hierarchical Data in External Memory. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), pages 90-101, 1999.
[48]Sudarshan S. Chawathe and Hector Garcia-Molina. Meaningful Change Detection in Structured Data. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 26-37, 1997.
[49]Sudarshan S. Chawathe, Anand Rajaraman, Hector Garcia-Molina, and Jennifer Widom. Change Detection in Hierarchically Structured Information. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 493-504, 1996.
[50]Yves Chiaramella. Information Retrieval and Structured Documents. Lecture Notes in Computer Science (LNCS), 1980:286-309, 2001.
[51]Yves Chiaramella, Philippe Mulhem, and Franck Fourel. A Model for Multimedia Information Retrieval. Technical Report FERMI ESPRIT BRA 8134, University of Glasgow, July 4 1996.
[52]Heting Chu. Information Representation and Retrieval in the Digital Age. Thomas H. Hogan, Sr. for the American Society for Information Science and Technology, second edition, 2005.
[53]James Clark and Steve DeRose. XML Path Language (XPath). http://www.w3.org/TR/xpath, November 1999.
[54]Grégory Cobéna, Serge Abiteboul, and Amélie Marian. Detecting Changes in XML Documents. In Proceedings of the 18th International Conference on Data Engineering (ICDE), San Jose, CA, 2002.
[55]Ronald Cole, Joseph Mariani, Hans Uszkoreit, Giovanni Battista Varile, Annie Zaenen, and Antonio Zampolli, editors. Survey of the State of the Art in Human Language Technology. Cambridge University Press and Giardini, Center for Spoken Language Understanding CSLU, Carnegie Mellon University, Pittsburgh, PA., web edition, 1997.
[56]Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. Performance Evaluation of the VF Graph Matching Algorithm. In Proceedings of the 10th International Conference on Image Analysis and Processing (ICIAP), volume 2, pages 1172-1177, Washington, DC, USA, 1999. IEEE Computer Society.
[57]Gianni Costa, Giuseppe Manco, Riccardo Ortale, and Andrea Tagarelli. A Tree-Based Approach to Clustering XML Documents by Structure. In Jean-Francois Boulicaut, Dino Pedreschi, Floriana Esposito, and Fosca Giannotti, editors, Knowledge Discovery in Databases: Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), volume 3202, pages 137-148, Pisa, Italy, September 20-24 2004. Springer Lecture Notes in Computer Science (LNCS).
[58]W. Bruce Croft and Jinxi Xu. Corpus-Specific Stemming using Word Form Co-occurrence. In Proceedings of the 4th Symposium on Document Analysis and Information Retrieval, pages 147-159, Las Vegas, Nevada, April 1995.
[59]Doug Cutting, Julian Kupiec, Jan Pedersen, and Penelope Sibun. A Practical Part-of-Speech Tagger. In Proceedings of the 3rd Conference on Applied Natural Language Processing (ANLP), pages 133-140, Morristown, NJ, USA, 1992. Association for Computational Linguistics.
[60]Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, and Timos Sellis. A Methodology for Clustering XML Documents by Structure. Information Systems, 31(3):187-228, 2006.
[61]Robert Dale, Hermann Moisl, and Harold Somers, editors. Handbook of Natural Language Processing, volume 1. Marcel Dekker, Inc., 2000.
[62]John L. Dawson. Suffix Removal and Word Conflation. ALLC Bulletin, 2(3):33-46, 1974.
[63]A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39:1-38, 1977.
[64]Ludovic Denoyer and Patrick Gallinari. Bayesian Network Model for Semi-Structured Document Classification. Information Processing and Management, Pergamon Press, Inc., Tarrytown, NY, USA, 40(5):807-827, 2004.
[65]Ludovic Denoyer, Patrick Gallinari, and Anne-Marie Vercoustre. XML Mining Challenge at INEX 2005. Technical report, University of Paris VI, INRIA, 2006.
[66]Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. XML-QL: A Query Language for XML.
[67]Alin Deutsch, Mary F. Fernandez, Daniela Florescu, Alon Y. Levy, David Maier, and Dan Suciu. Querying XML Data. IEEE Data Engineering Bulletin, 22(3):10-18, 1999.
[68]Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley and Sons, Inc., second edition, 2001.
[69]Christos Faloutsos and Douglas W. Oard. A Survey of Information Retrieval and Filtering Methods. Technical Report CS-TR-3514, University of Maryland at College Park, College Park, MD, USA, 1995.
[70]Mary Fernández, Daniela Florescu, Jaewoo Kang, Alon Levy, and Dan Suciu. Catching the Boat with Strudel: Experiences with a Web-site Management System. In Proceedings of the 1998 ACM International Conference on Management of Data (SIGMOD), pages 414-425, New York, NY, USA, 1998. ACM.
[71]Sergio Flesca, Giuseppe Manco, Elio Masciari, and Luigi Pontieri. Fast Detection of XML Structural Similarity. IEEE Transactions on Knowledge and Data Engineering, 17(2):160-175, 2005.
[72]Günther Fliedl. Natürlichkeitstheoretische Morphosyntax - Aspekte der Theorie und Implementierung. Gunter Narr Verlag (GNV), Dischingerweg 5, D-72070 Tübingen, 1999.
[73]Daniela Florescu and Donald Kossmann. A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database. Technical report, May 1999.
[74]Richard Foster. Document Clustering in Large German Corpora Using Natural Language Processing. PhD thesis, University of Zurich, 2006.
[75]Christopher Fox. Information Retrieval: Data Structures and Algorithms, chapter Lexical Analysis and Stoplists, pages 102-130. Prentice-Hall, Inc., 1992.
[76]Christopher Fox. A Stop List for General Text. SIGIR Forum, 24(1-2):19-21, r 90.
[77]William Bill Frakes and Ricardo A. Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, USA, 1992.
[78]Francesco De Francesca, Gianluca Gordano, Riccardo Ortale, and Andrea Tagarelli. Distance-Based Clustering of XML Documents. In Luc De Raedt and Takashi Washio, editors, Proceedings of the 1st International Workshop on Mining Graphs, Trees and Sequences (MGTS), pages 75-78. ECML/PKDD'03 Workshop Proceedings, September 2003.
[79]Norbert Fuhr and Norbert Gövert. Index Compression vs. Retrieval Time of Inverted Files for XML Documents. In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM), pages 662-664, New York, NY, USA, 2002. ACM Press.
[80]Norbert Fuhr and Norbert Gövert. Index Compression vs. Retrieval Time of Inverted Files for XML Documents. Technical report, University of Dortmund, 2002.
[81]Norbert Fuhr, Norbert Gövert, and Kai Großjohann. HyREX: Hyper-Media Retrieval Engine for XML. In Kalervo Järvelin, Micheline Beaulieu, Ricardo Baeza-Yates, and Sung Hyon Myaeng, editors, Proceedings of the 25th ACM SIGIR International Conference on Research and Development in Information Retrieval, page 449, New York, NY, USA, 2002. ACM.
[82]Norbert Fuhr, Norbert Gövert, Gabriella Kazai, and Mounia Lalmas, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 1st International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), ERCIM Workshop Proceedings, Sophia Antipolis, France, December 9-11 2002. ERCIM.
[83]Norbert Fuhr, Norbert Gövert, and Thomas Rölleke. DOLORES: A System for Logic-Based Retrieval of Multimedia Objects. In Proceedings of the 21st ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 257-265, New York, NY, USA, 1998. ACM Press.
[84]Norbert Fuhr and Kai Großjohann. XIRQL - An Extension of XQL for Information Retrieval. In Ricardo Baeza-Yates, Norbert Fuhr, Ron Sacks-Davis, and Ross Wilkinson, editors, Proceedings of the SIGIR 2000 Workshop on XML and Information Retrieval, Athens, Greece, July 28 2000. ACM.
[85]Norbert Fuhr and Kai Großjohann. XIRQL: A Query Language for Information Retrieval in XML Documents. In Kraft et al. [157], pages 172-180.
[86]Norbert Fuhr and Kai Großjohann. XIRQL: An XML Query Language Based on Information Retrieval Concepts. ACM Transactions on Information Systems, 22(2):313-356, 2004.
[87]Norbert Fuhr, Kai Großjohann, and Sasha Kriewel. A Query Language and User Interface for XML Information Retrieval, volume 2818 of Lecture Notes in Computer Science, pages 59-75. Springer, Heidelberg, 2003.
[88]Norbert Fuhr and Mounia Lalmas, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 5th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), ERCIM Workshop Proceedings, Sophia Antipolis, France, December 17-19 2006. ERCIM Springer LNCS.
[89]Norbert Fuhr, Mounia Lalmas, and Saadia Malik, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 2nd International Workshop of the INitiative for the Evaluation of XML Retrieval (INEX), ERCIM Workshop Proceedings, Sophia Antipolis, France, December 15-17 2003. ERCIM Springer LNCS.
[90]Norbert Fuhr, Mounia Lalmas, Saadia Malik, and Gabriella Kazai, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), volume 3977 of ERCIM Workshop Proceedings, Sophia Antipolis, France, November 28-30 2005. Dagstuhl Castle, Germany, ERCIM Springer Lecture Notes in Computer Science (LNCS), Springer-Verlag GmbH.
[91]Norbert Fuhr, Mounia Lalmas, Saadia Malik, and Gabriella Zoltan Szlavik, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), volume 3493 of ERCIM Workshop Proceedings, Sophia Antipolis, France, December 06-08 2004. ERCIM Springer LNCS.
[92]Norbert Fuhr, Mounia Lalmas, and Andrew Trotman, editors. Advances in XML Information Retrieval and Evaluation, Proceedings of the 6th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), ERCIM Workshop Proceedings, Sophia Antipolis, France, December 17-19 2007. ERCIM Springer LNCS.
[93]Norbert Fuhr, Saadia Malik, and Mounia Lalmas. Overview of the INitiative for the Evaluation of XML Retrieval (INEX) 2003. In Fuhr et al. [89], pages 1-18.
[94]Norbert Fuhr and Thomas Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM Transactions on Information Systems, 15(1):32-66, 1997.
[95]Norbert Fuhr and Thomas Rölleke. HySpirit - A Probabilistic Inference Engine for Hypermedia Retrieval in Large Databases. In Proceedings of the 6th International Conference on Extending Database Technology (EDBT), pages 24-38, Heidelberg et al., 1998. Springer.
[96]Shlomo Geva, Marcus Hassler, and Xavier Tannier. XOR - XML Oriented Retrieval Language. In Andrew Trotman and Shlomo Geva, editors, Proceedings of the ACM SIGIR Workshop on XML Element Retrieval Methodology, pages 5-12, Seattle, WA, USA, August 10 2006. ACM Press, New York City, NY, USA.
[97]Joydeep Ghosh. The Handbook of Data Mining, chapter Scalable Clustering, pages 247-277. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, USA, 2003.
[98]Emmanuel Giguet. The Stakes of Multilinguality: Multilingual Text Tokenization in Natural Language Diagnosis. In Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence Workshop Future issues for Multilingual Text Processing, Cairns, Australia, August 27 1996.
[99]Kevin Glass and Shaun Bangay. Evaluating Parts-Of-Speech Taggers for Use in a Text-to-Scene Conversion System. In Proceedings of the 2005 Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries (SAICSIT), pages 20-28, Republic of South Africa, September 2005. South African Institute for Computer Scientists and Information Technologists.
[100]Marco Gori, Marco Maggini, and Lorenzo Sarti. Exact and Approximate Graph Matching Using Random Walks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7):1100-1111, 2005.
[101]Norbert Gövert. Bilingual Information Retrieval with HyREX and Internet Translation Services. In Carol Peters, editor, Workshop on Cross-Language Information Retrieval and Evaluation, volume 2069 of LNCS, pages 237-244, Heidelberg, 2001. Springer-Verlag, Lecture Notes in Computer Science.
[102]Norbert Gövert, Norbert Fuhr, Mohammad Abolhassani, and Kai Großjohann. Content-Oriented XML Retrieval with HyREX. In Fuhr et al. [82], pages 26-32.
[103]Norbert Gövert and Gabriella Kazai. Overview of the INitiative for the Evaluation of XML Retrieval (INEX) 2002. In Fuhr et al. [82], pages 1-17.
[104]Torsten Grabs, Klemens Böhm, and Hans-Jörg Schek. XMLTM: Efficient Transaction Management for XML Documents. In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM), pages 142-152, New York, NY, USA, 2002. ACM.
[105]Torsten Grabs and Hans-Jörg Schek. ETH Zürich at INEX: Flexible Information Retrieval from XML with PowerDB-XML. In Fuhr et al. [82], pages 141-148.
[106]Torsten Grabs and Hans-Jörg Schek. Generating Vector Spaces On-the-Fly for Flexible XML Retrieval. In Proceedings of the XML and Information Retrieval Workshop - 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4-13, Tampere, Finland, August 2002. ACM Press.
[107]Gregory Grefenstette and Pasi Tapanainen. What is a word, What is a Sentence? Problems of Tokenization. In Proceedings of the 3rd Conference on Computational Lexicography and Text Research (COMPLEX), pages 79-87. Xerox Research Centre Europe, MLTT, 1994.
[108]Kai Großjohann, Norbert Fuhr, Daniel Effing, and Sasha Kriewel. Query Formulation and Result Visualization for XML Retrieval. In Proceedings of the ACM SIGIR Workshop on XML and Information Retrieval. ACM, 2002.
[109]Kai Großjohann, Norbert Fuhr, Daniel Effing, and Sasha Kriewel. A User Interface for XML Document Retrieval. In Informatik bewegt: Informatik 2002 - 32. Jahrestagung der Gesellschaft für Informatik e.V. (GI), Informatik 2002, pages 166-170. Springer GI, Heidelberg, 2002.
[110]David A. Grossman and Ophir Frieder. Information Retrieval - Algorithms and Heuristics. Springer, second edition, 2004.
[111]David A. Grossman, Ophir Frieder, David O. Holmes, and David C. Roberts. Integrating Structured Data and Text: A Relational Approach. J. Am. Soc. Inf. Sci., 48(2):122-132, 1997.
[112]Torsten Grust. Accelerating XPath Location Steps. In Proceedings of the 2002 ACM International Conference on Management of Data (SIGMOD), pages 109-120. ACM Press, 2002.
[113]Sudipto Guha, H. V. Jagadish, Nick Koudas, Divesh Srivastava, and Ting Yu. Approximate XML Joins. In Proceedings of the 2002 ACM International Conference on Management of Data (SIGMOD), pages 287-298, New York, NY, USA, 2002. ACM Press.
[114]Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. In Proceedings of the 15th International Conference on Data Engineering (ICDE), pages 512-521, Washington, DC, USA, 1999. IEEE Computer Society.
[115]Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, 25(5):345-366, 2000.
[116]Jin Guo. Critical Tokenization and its Properties. Computational Linguistics, 23(4):569-596, 1997.
[117]Jin Guo. One Tokenization Per Source. In Christian Boitet and Pete Whitelock, editors, Proceedings of the 36th Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, volume 1, pages 457-463, Morristown, NJ, USA, 1998. Association for Computational Linguistics.
[118]Markus Hagenbuchner, Alessandro Sperduti, Ah Chung Tsoi, Francesca Trentini, Franco Scarselli, and Marco Gori. Clustering XML Documents Using Self-Organizing Maps for Structures. In Fuhr et al. [90], pages 481-496.
[119]David J. Hand, Padhraic Smyth, and Heikki Mannila. Principles of Data Mining. MIT Press, Cambridge, MA, USA, 2001.
[120]Marcus Hassler and Abdelhamid Bouchachia. Searching XML Documents - Preliminary Work. In Fuhr et al. [90], pages 119-133.
[121]Marcus Hassler, Abdelhamid Bouchachia, and Roland Mittermeir. Classification of XML Documents. International Journal of Information Technology and Intelligent Computing (Int. J. IT&IC), 2(4):26, November 2007.
[122]Marcus Hassler and Günther Fliedl. Text Preparation through Extended Tokenization. In Alessandro Zanasi, Carlos A. Brebbia, and Nelson F.F. Ebecken, editors, Data Mining VII: Data, Text and Web Mining and their Business Applications, volume 37, pages 13-21. Prague, Czech Republic, WIT Press, Wessex Institute of Technology, July 11-13 2006.
[123]Marcus Hassler, Christian Hofbauer, and Günther Fliedl. The Klagenfurt Computer Linguistic Resource Portal. http://clr.uni-klu.ac.at, September 2006.
[124]Marcus Hassler and Franz Kollmann. Secure Management of Structured Documents. In Veljko Milutinovic, editor, Proceedings of the International Conference on Advances in the Internet, Processing, Systems, and Interdisciplinary Research (IPSI), page 18. IPSI Belgrade, Academic Mind, November 10-13 2005.
[125]Kenji Hatano, Hiroko Kinutani, Masahiro Watanabe, Masatoshi Yoshikawa, and Shunsuke Uemura. Determining the Unit of Retrieval Results for XML Documents. In Fuhr et al. [82], pages 57-64.
[126]David Hawking. PADRE - A Parallel Document Retrieval Engine. In Proceedings of the 3rd Fujitsu Parallel Computing Workshop, Kawasaki, Japan, November 1994.
[127]David Hawking. The Design and Implementation of a Parallel Document Retrieval Engine. Technical Report TR-CS-95-08, Department of Computer Science, Australian National University, 1995.
[128]David Hawking. Document Retrieval in OCR-Scanned Text. In Proceedings of the 6th Parallel Computing Workshop, pages P2-F, Kawasaki, Japan, November 1996. paper P2-F.
[129]David Hawking. PADRE for COWs. In Paul Mackerras, editor, Proceedings of the 7th Parallel Computing Workshop, Canberra, Australia, September 1997. Department of Computer Science, ANU.
[130]David Hawking. Scalable Text Retrieval for Large Digital Libraries. In Proceedings of the 1st European Conference on Research and Advanced Technology for Digital Libraries, pages 127-145. Springer-Verlag, 1997.
[131]David Hawking and Peter Bailey. PADRE v. 2.4 User Manual. Department of Computer Science, Australian National University, August 28 1996.
[132]David Hawking, Peter Bailey, and David Campbell. A Parallel Document Retrieval Server for the World Wide Web. In Proceedings of the Australian Document Computing Symposium, pages 73-78, Melbourne, Australia, March 1996.
[133]David Hawking, Peter Bailey, David Campbell, Paul B. Thistlewaite, and Andrew Tridgell. A PADRE in MUFTI (A Multi User Free Text retrieval Intermediary). In Proceedings of the 4th Parallel Computing Workshop, pages 75-84, London, England, September 1995.
[134]David Hawking, Peter Bailey, and Nick Craswell. Efficient and Flexible Search Using Text and Metadata. Technical report, CSIRO Mathematical and Information Sciences, May 2000.
[135]David Hawking, Paul Thistlewaite, and Peter Bailey. ANU/ACSys TREC-5 Experiments. In D. K. Harman, editor, Proceedings of the 5th International Text Retrieval Conference (TREC), pages 275-290, Gaithersburg, MD, February 1 1997. U.S. National Institute of Standards and Technology.
[136]David Hawking and Paul B. Thistlewaite. Searching for Meaning with the Help of a PADRE. In D. K. Harman, editor, Proceedings of the 3rd International Text Retrieval Conference (TREC), pages 257-267, Gaithersburg, MD, November 1994.
[137]Djoerd Hiemstra. A Database Approach to Content-Based XML Retrieval. In Fuhr et al. [82], pages 111-118.
[138]Adel Hlaoui and Shengrui Wang. A New Algorithm for Inexact Graph Matching. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR), volume 4, 2002.
[139]Adel Hlaoui and Shengrui Wang. A New Median Graph Algorithm. In Edwin R. Hancock and Mario Vento, editors, Proceedings of the 4th IAPR International Workshop on Graph Based Representations in Pattern Recognition (GbRPR), volume 2726 of Lecture Notes in Computer Science, pages 225-234, York, UK, June 30 - July 2 2003. Springer.
[140]Xiuzhen Huang and Jing Lai. Maximum Common Subgraph: Upper Bound and Lower Bound Results. First International Multi-Symposium of Computer and Computational Sciences (IMSCCS), 1:40-47, 2006.
[141]Peter Jackson and Isabelle Moulinier. Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorisation. John Benjamins Publishing Company, Amsterdam, Netherlands, Wolverhampton, United Kingdom, 2002.
[142]Hosagrahar V. Jagadish, Shurug A. Al-Khalifa, Adriane Chapman, Laks V. S. Lakshmanan, Andrew Nierman, Stylianos Paparizos, Jaqdish Himatlal Patel, Divesh Srivastava, Nuwee Wiwatwattana, Yuqing Wu, and Cong Yu. TIMBER: A Native XML Database. The VLDB Journal, 11(4):274-291, April 2002.
[143]Kalervo Järvelin and Jaana Kekäläinen. Cumulated Gain-Based Evaluation of IR Techniques. ACM Transactions on Information Systems, 20(4):2002, 2002.
[144]Tao Jiang, Lusheng Wang, and Kaizhong Zhang. Alignment of Trees - An Alternative to Tree Edit. In Journal on Theoretical Computer Science, volume 143, pages 137-148, 1995.
[145]Jaap Kamps, Maarten de Rijke, and Börkur Sigurbjörnsson. Length Normalization in XML Retrieval. In Proceedings of the 27th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 80-87, New York, NY, USA, 2004. ACM Press.
[146]Jaap Kamps, Maarten Marx, Maarten de Rijke, and Börkur Sigurbjörnsson. XML Retrieval: What to Retrieve? In Proceedings of the 26th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 409-410, New York, NY, USA, 2003. ACM.
[147]Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A Local Search Approximation Algorithm for k-Means Clustering. In Proceedings of the 18th ACM Symposium on Computational Geometry (SCG), pages 10-18, New York, NY, USA, 2002. ACM Press.
[148]Marcin Kaszkiel and Justin Zobel. Passage Retrieval Revisited. In Proceedings of the 20th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 178-185, Philadelphia, July 1997. ACM Press.
[149]Gabriella Kazai and Mounia Lalmas. INEX 2005 Evaluation Measures. In Fuhr et al. [90], pages 16-29.
[150]Gabriella Kazai and Mounia Lalmas. Notes on What to Measure in INEX. In Andrew Trotman, Mounia Lalmas, and Norbert Fuhr, editors, Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition, pages 22-38. University of Glasgow, Glasgow, Scotland, July 30 2005.
[151]Gabriella Kazai, Mounia Lalmas, and Thomas Rölleke. Focussed Structured Document Retrieval. In Proceedings of the 9th International Symposium on String Processing and Information Retrieval (SPIRE), volume 2476, pages 241-247, London, UK, 2002. Springer-Verlag.
[152]Gabriella Kazai and Thomas Rölleke. A Scalable Architecture for XML Retrieval. In Fuhr et al. [82], pages 49-56.
[153]Latifur Khan and Yan Rao. A Performance Evaluation of Storing XML Data in Relational Database Management Systems. In Proceedings of the 3rd International Workshop on Web Information and Data Management (WIDM), pages 31-38, New York, NY, USA, 2001. ACM.
[154]Pekka Kilpeläinen and Heikki Mannila. Ordered and Unordered Tree Inclusion. Society for Industrial and Applied Mathematics (SIAM) Journal on Computing, 24(2):340-356, 1995.
[155]Donald Knuth. The Art of Computer Programming, volumes 1-3. Addison-Wesley, 1968.
[156]Evangelos Kotsakis. Structured Information Retrieval in XML Documents. In Proceedings New York, NY, USA, 2002. ACM Press.
[157]Donald H. Kraft, W. Bruce Croft, David J. Harper, and Justin Zobel, editors. Proceedings of the 24th ACM SIGIR International Conference on Research and Development in Information Retrieval, New Orleans, Louisiana, United States, September 9-13 2001. ACM Press.
[158]Robert Krovetz. Viewing Morphology as an Inference Process. In Proceedings of the 16th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 191­202, New York, NY, USA, 1993. ACM.
[159]Robert Jeffrey Krovetz and W. Bruce Croft. Word Sense Disambiguation Using Machine-Readable Dictionaries. SIGIR Forum, 23(SI):127­136, 1989.
[160]Daniel T. Larose. Discovering Knowledge in Data ­ An Introduction to Data Mining. John Wiley and Sons Inc, November 2004.
[161]Ray R. Larson. Cheshire II at INEX: Using a Hybrid Logistic Regression and Boolean Model for XML Retrieval. In Fuhr et al. [82], pages 18­25.
[162]Vladimir I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. In Soviet Physics Doklady, volume 10, pages 707­710, 1966.
[163]Wang Lian, David Wai-lok Cheung, Nikos Mamoulis, and Siu-Ming Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. IEEE Transactions on Knowledge and Data Engineering, 16(1):82-96, 2004.
[164]Shaorong Liu, Qinghua Zou, and Wesley W. Chu. Configurable Indexing and Ranking for XML Information Retrieval. In Proceedings of the 27th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 88­95. ACM Press, 2004.
[165]Julie Beth Lovins. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22-31, 1968.
[166]Anna Lubiw. Some NP-Complete Problems Similar to Graph Isomorphism. In SIAM Journal on Computing, volume 10, pages 11-21, 1981.
[167]Robert W. P. Luk, Alvin T. S. Chan, Tharam S. Dillon, and Hong Va Leong. A Survey of Search Engines for XML Documents. In Proceedings of the ACM SIGIR Workshop on XML and Information Retrieval, Athens, Greece, July 2000.
[168]Robert W. P. Luk, Hong Va Leong, Tharam S. Dillon, Alvin T. S. Chan, W. Bruce Croft, and James Allan. A Survey in Indexing and Searching XML Documents. Journal of the American Society for Information Science and Technology (JASIST), 53(6):415­437, 2002.
[169]J. B. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. In L. M. LeCam and J. Neyman, editors, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281­297. Berkeley University of California Press, 1967.
[170]Saadia Malik, Gabriella Kazai, Mounia Lalmas, and Norbert Fuhr. Overview of INEX 2005. In Fuhr et al. [90], pages 1­15.
[171]Murali Mani, Dongwon Lee, and Makoto Murata. Normal Forms for Regular Tree Grammars. Technical report, UCLA Computer Science Department, 2001.
[172]Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[173]Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Massachusetts, London, England, fifth printing edition, 2002.
[174]Yosi Mass and Matan Mandelbrod. Retrieving the Most Relevant XML Component. In Fuhr et al. [89], pages 53­58.
[175]Yosi Mass and Matan Mandelbrod. Component Ranking and Automatic Query Refinement for XML Retrieval. In Fuhr et al. [91], pages 134­140.
[176]Yosi Mass, Matan Mandelbrod, Einat Amitay, David Carmel, Yoelle Maarek, and Aya Soffer. JuruXML ­ An XML Retrieval System at INEX'02. In Fuhr et al. [82], pages 73­80.
[177]Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, and Jennifer Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, 26(3):54-66, 1997.
[178]Wolfgang Meier. eXist: An Open Source Native XML Database. In Erhard Rahm, B. Chaudhri, Mario Jeckle, and Rainer Unland, editors, Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, volume 2593, pages 169-183, London, UK, 2003. Springer-Verlag LNCS.
[179]Bruno T. Messmer and Horst Bunke. A New Algorithm for Error-Tolerant Subgraph Isomorphism Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):493­505, 1998.
[180]Andrei Mikheev. Periods, Capitalized Words, etc. Computational Linguistics, 28(3):289-318, 2002.
[181]Elke Mittendorf and Peter Schäuble. Document and Passage Retrieval Based on Hidden Markov Models. In Proceedings of the 17th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 318­327, Dublin, Ireland, July 1994. Springer-Verlag New York, Inc.
[182]Muc. Message Understanding Conference (MUC-6). http://www.cs.nyu.edu/cs/faculty/grishman/muc6.html, September 2006.
[183]Muc. Message Understanding Conference (MUC-7). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_toc.html, September 2006.
[184]Richard Myers, Richard C. Wilson, and Edwin R. Hancock. Bayesian Graph Edit Distance. In Proceedings of the 10th International Conference on Image Analysis and Processing (ICIAP), page 1166, Washington, DC, USA, 1999. IEEE Computer Society.
[185]Gonzalo Navarro and Ricardo A. Baeza-Yates. Proximal Nodes: A Model to Query Document Databases by Content and Structure. ACM Transactions on Information Systems, 15(4):400­435, October 1997.
[186]Richi Nayak and Sumei Xu. XML Documents Clustering by Structures. In Fuhr et al. [90], pages 432­442.
[187]Andrew Nierman and Hosagrahar Jagadish. Evaluating Structural Similarity in XML Documents. In Proceedings of the 5th International Workshop on the Web and Databases (WebDB), June 2002.
[188]John O'Connor. Retrieval of Answer-Sentences and Answer-figures from Papers by Text Search. In Information Processing and Management, volume 11, pages 155­164, 1975.
[189]John O'Connor. Answer-Passage Retrieval by Text Searching. Journal of the American Society for Information Science, 31(4):227­239, July 1980.
[190]Paul Ogilvie and Jamie Callan. Language Models and Structured Document Retrieval. In Fuhr et al. [82], pages 33­40.
[191]Richard A. O'Keefe and Andrew Trotman. The Simplest Query Language That Could Possibly Work. In Fuhr et al. [89], pages 167­174.
[192]openNLP Tagger. openNLP Tagger. http://opennlp.sourceforge.net, September 2006.
[193]Chris D. Paice. Another Stemmer. SIGIR Forum, 24(3):56­61, 1990.
[194]Chris D. Paice. An Evaluation Method for Stemming Algorithms. In Proceedings of the 17th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 42­50, New York, NY, USA, 1994. Springer-Verlag New York, Inc.
[195]Sukomal Pal. XML Retrieval ­ A Survey. Technical report, Indian Statistical Institute, Kolkata, Computer Vision and Pattern Recognition Unit (CVPR), June 30 2006.
[196]David D. Palmer. Tokenisation and Sentence Segmentation, pages 11­35. Volume 1 of Dale et al. [61], 2000.
[197]David D. Palmer and Marti A. Hearst. Adaptive Multilingual Sentence Boundary Disambiguation. Computational Linguistics, 23(2):241­267, June 1997.
[198]Uchang Park. An Implementation of XML Documents Search System Based on Similarity in Structure and Semantics. In Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), pages 97­103. Yeojin Seo Duksung Women's University, April 08­09 2005.
[199]Dan Pelleg and Andrew Moore. Accelerating Exact k-Means Algorithms with Geometric Reasoning. In Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 277­281, New York, NY, USA, 1999. ACM Press.
[200]Luuk Peters. Change Detection in XML Trees: A Survey. In Proceedings of the 4th Twente Student Conference on IT, 2005.
[201]Karen Pinel-Sauvagnat and Mohand Boughanem. A Survey on XML Focussed Component Retrieval. In Large-Scale Semantic Access to Content (Text, Image, Video and Sound) (RIAO), page Electronic Medium, Pittsburgh, USA, June 2007. Centre de Hautes Etudes Internationales D'Informatique Documentaire (C.I.D.).
[202]Benjamin Piwowarski, Georges-Etienne Faure, and Patrick Gallinari. Bayesian Networks and INEX. In Fuhr et al. [82], pages 149­154.
[203]Martin F. Porter. An Algorithm for Suffix Stripping. Program, 14(3):130­137, July 1980.
[204]QTag. QTag. http://www.english.bham.ac.uk/staff/omason/software/qtag.html, September 2006.
[205]Adwait Ratnaparkhi. A Maximum Entropy Model for Part-of-Speech Tagging. In Eric Brill and Kenneth Church, editors, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 133­142, Somerset, New Jersey, 1996. University of Pennsylvania, Association for Computational Linguistics.
[206]Jonathan Robie, Joe Lapp, and David Schach. XML Query Language (XQL). http://www.w3.org/TandS/QL/QL98/pp/xql.html, September 1998.
[207]Thomas Rölleke and Norbert Fuhr. HySpirit ­ A Flexible System for Investigating Probabilistic Logical Information Retrieval. Technical report, University of Dortmund, Dortmund, Germany, 1997.
[208]Thomas Rölleke and Norbert Fuhr. Retrieving Complex Objects with HySpirit. In J. Furner and D. J. Harper, editors, Proceedings of the 19th BCS-IRSG Colloquium on IR Research, pages 32­43, Aberdeen, 1997. Robert Gordon University.
[209]Thomas Rölleke, Ralf Lübeck, and Gabriella Kazai. The HySpirit Retrieval Platform. In Kraft et al. [157], page 454.
[210]Carl Sable and Ken Church. Using Bins to Empirically Estimate Term Weights for Text Categorization. In Lillian Lee and Donna Harman, editors, Proceedings of the 6th International Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 58­66, Pittsburgh, US, 2001. Association for Computational Linguistics, Morristown, US.
[211]Gerard Salton. The SMART Retrieval System ­ Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1971.
[212]Gerard Salton, J. Allan, and Chris Buckley. Approaches to Passage Retrieval in Full Text Information Systems. In Proceedings of the 16th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 49­58. ACM Press, 1993.
[213]Gerard Salton and Michael E. Lesk. Computer Evaluation of Indexing and Text Processing. Journal of the ACM, 15(1):8-36, 1968.
[214]Gerard Salton, A. Wong, and C. S. Yang. A Vector Space Model for Automatic Indexing. Commun. ACM, 18(11):613­620, 1975.
[215]Christer Samuelsson. Morphological Tagging Based Entirely on Bayesian Inference. In Robert Eklund, editor, Proceedings of the 9th Nordic Conference on Computational Linguistics (NODALIDA), pages 225­238. Stockholm University, 1993.
[216]Bilge Say. An Information-Based Approach to Punctuation. PhD thesis, Institute of Engineering and Science of Bilkent University, 1998.
[217]Bilge Say and Varol Akman. An Information-Based Treatment of Punctuation. In Proceedings of the 2nd International Conference on Mathematical Linguistics (ICML), pages 93­94, Tarragona, Spain, 1996.
[218]Anne Schiller, Simone Teufel, Christine Stöckert, and Christine Thielen. Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Institut für maschinelle Sprachverarbeitung, Stuttgart, 1999.
[219]Torsten Schlieder and Holger Meuss. Result Ranking for Structured Queries against XML Documents. In DELOS Workshop: Information Seeking, Searching and Querying in Digital Libraries, Zurich, Switzerland, December 2000.
[220]Torsten Schlieder and Holger Meuss. Querying and Ranking XML Documents. JASIST, 53(6):489­503, 2002.
[221]Helmut Schmid. Probabilistic Part-Of-Speech Tagging Using Decision Trees. In International Conference on New Methods in Language Processing, Manchester, UK, September 1994.
[222]Helmut Schmid. Improvements in Part-Of-Speech Tagging with an Application to German. In Proceedings of the 14th International Conference on Computational Linguistics (COLING), pages 172­176, 1995.
[223]Albrecht Schmidt, Martin L. Kersten, Menzo Windhouwer, and Florian Waas. Efficient Relational Storage and Retrieval of XML Documents. In Selected papers from the 3rd International Workshop WebDB 2000 on the World Wide Web and Databases (WebDB), volume 1997 of Lecture Notes in Computer Science, pages 137­150, London, UK, 2001. Springer-Verlag.
[224]Fabrizio Sebastiani. Machine Learning in Automated Text Categorisation. ACM Computing Surveys, 34(1):1­47, 2002.
[225]Stanley M. Selkow. The Tree-to-Tree Editing Problem. In Information Processing Letters, volume 6, pages 184­186, 1977.
[226]Jayavel Shanmugasundaram, Kristin Tufte, Chun Zhang, Gang He, David J. DeWitt, and Jeffrey F. Naughton. Relational Databases for Querying XML Documents: Limitations and Opportunities. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), pages 302­314, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[227]Dennis Shasha and Kaizhong Zhang. Approximate Tree Pattern Matching. In Pattern Matching Algorithms, pages 341­371. Oxford University Press, 1997.
[228]Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, and Joachim Thomas. The Rufus System: Information Organization for Semi-Structured Data. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB), pages 97­107, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.
[229]Peter Shoubridge, Miro Kraetzl, and David Ray. Detection of Abnormal Change in Dynamic Networks. In Proceedings of Information Decision and Control, pages 557­562. IEEE Inc., 1999.
[230]Alexander Strehl, Joydeep Ghosh, and Raymond Mooney. Impact of Similarity Measures on Web-Page Clustering. In Proceedings of the 17th National Conference on Artificial Intelligence: Workshop of Artificial Intelligence for Web Search (AAAI), pages 58­64, Austin, Texas, USA, 30­31 July 2000. AAAI.
[231]Stanford Tagger. Stanford Tagger. http://nlp.stanford.edu/software/tagger.shtml, September 2006.
[232]Kuo-Chung Tai. The Tree-to-Tree Correction Problem. Journal of the Association for Computing Machinery (ACM), 26(3):422­433, 1979.
[233]Xavier Tannier, Alan Woodley, Shlomo Geva, and Marcus Hassler. Approaches to Translating Natural Language Queries for Use in XML Information Retrieval Systems. Technical Report 2006-400-008, Ecole Nationale Supérieure des Mines de Saint-Etienne, Saint-Étienne, France, July 2006.
[234]Tei. The Text Encoding Initiative (TEI). http://www.tei-c.org, September 2006.
[235]Tei. The Text Encoding Initiative (TEI) Guidelines. http://www.tei-c.org/Guidelines2/index.html, September 2006.
[236]Anja Theobald and Gerhard Weikum. Adding Relevance to XML. In Selected papers from the 3rd International Workshop WebDB 2000 on the World Wide Web and Databases (WebDB), pages 105­124, London, UK, 2001. Springer-Verlag.
[237]Anja Theobald and Gerhard Weikum. The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking. In Proceedings of the 8th International Conference on Extending Database Technology (EDBT), pages 477­495, London, UK, 2002. Springer-Verlag.
[238]Richard M. Tong. Tarragon Consulting at INEX 2002: Experiments using the K2 Search Engine from Verity. In Fuhr et al. [82], pages 88­94.
[239]Kristina Toutanova and Christopher D. Manning. Enriching the Knowledge Sources used in a Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 63­70, Morristown, NJ, USA, 2000. Association for Computational Linguistics.
[240]Andrew Trotman and Börkur Sigurbjörnsson. Narrowed Extended XPath I (NEXI). In Fuhr et al. [91], pages 16­40.
[241]Andrew Trotman and Börkur Sigurbjörnsson. NEXI, Now and Next. In Fuhr et al. [91], pages 41-53.
[242]Dan Tufis and Oliver Mason. Tagging Romanian Texts: A Case Study for QTAG, a Language Independent Probabilistic Tagger. In Proceedings of the 1st International Conference on Language Resources and Evaluation (LREC), pages 589­596, May 28­30 1998.
[243]Unicode. The Unicode Consortium. http://www.unicode.org, September 2006.
[244]UTF. The Unicode Standard, Unicode Transformation Format (UTF). http://www.unicode.org/standard/standard.html, September 2006.
[245]Hans van Halteren, Jakub Zavrel, and Walter Daelemans. Improving Data Driven Wordclass Tagging by System Combination. In Proceedings of the 36th Meeting of the Association for Computational Linguistics (COLING), pages 491-497, Morristown, NJ, USA, 1998. Association for Computational Linguistics.
[246]Cornelis Joost van Rijsbergen. Information Retrieval. Butterworth-Heinemann, Dept. of Computer Science, University of Glasgow, London, England, 2nd edition, 1979.
[247]Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, and Yves Lechevallier. A Flexible Structure-Based Representation for XML Document Mining. In Fuhr et al. [90], pages 443­457.
[248]Anne-Marie Vercoustre, James A. Thom, Alexander Krumpholz, Ian Mathieson, Peter Wilkins, Mingfang Wu, Nick Craswell, and David Hawking. CSIRO INEX Experiments: XML Search using PADRE. In Fuhr et al. [82], pages 65­72.
[249]Ellen M. Voorhees. The Cluster Hypothesis Revisited. In Proceedings of the 8th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 188­196, New York, NY, USA, 1985. ACM Press.
[250]Ellen M. Voorhees. Natural Language Processing and Information Retrieval. In M. T. Pazienza, editor, Information Extraction: Towards Scalable, Adaptable Systems, volume 1714, pages 32­48, London, UK, 1999. LNCS Springer-Verlag.
[251]Jason T. L. Wang, Kaizhong Zhang, and Gung-Wei Chirn. Algorithms for Approximate Graph Matching. Information Sciences, 82(1-2):45-74, 1995.
[252]Jason Tsong-Li Wang, Kaizhong Zhang, Karpjoo Jeong, and Dennis Shasha. A System for Approximate Tree Matching. IEEE Transactions on Knowledge and Data Engineering (TKDE), 6(4):559­571, 1994.
[253]Yuan Wang. X-Diff: A Fast Change Detection Algorithm for XML Documents. Master's thesis, University of Wisconsin, WI, USA, 2003.
[254]Yuan Wang, David J. DeWitt, and Jin-Yi Cai. X-Diff: A Fast Change Detection Algorithm for XML Documents. In Umeshwar Dayal, Krithi Ramamritham, and T. M. Vijayaraman, editors, Proceedings of the 19th International Conference on Data Engineering (ICDE), Bangalore, India, March 5-8 2003. IEEE Computer Society.
[255]Jonathan J. Webster and Chunyu Kit. Tokenization as the Initial Phase in NLP. In International Center of Computational Logic (ICCL), editor, Proceedings of the 14th International Conference on Computational Linguistics (COLING), volume 4, pages 1106-1110, Morristown, NJ, USA, August 23-28 1992. University of Trier.
[256]Ralph Weischedel, Richard Schwartz, Jeff Palmucci, Marie Meteer, and Lance Ramshaw. Coping with Ambiguity and Unknown Words through Probabilistic Models. Comput. Linguist., 19(2):361­382, 1993.
[257]Ross Wilkinson. Effective Retrieval of Structured Documents. In Proceedings of the 17th ACM SIGIR International Conference on Research and Development in Information Retrieval, pages 311­317, Dublin, Ireland, 1994. Springer-Verlag New York, Inc.
[258]Ross Wilkinson and Justin Zobel. Comparison of Fragmentation Schemes for Document Retrieval. In Proceedings of the 3rd International Text Retrieval Conference (TREC), pages 81­84, 1994.
[259]Alan Woodley, Marcus Hassler, Xavier Tannier, and Shlomo Geva. Natural Language Processing and XML Retrieval. In Lawrence Cavedon and Ingrid Zukerman, editors, Proceedings of the Australasian Language Technology Workshop (ALTW), pages 165-166, Sydney, Australia, November 30 - December 1 2006. Australian Language Technology Association.
[260]Tatsuo Yamashita and Yuji Matsumoto. Language Independent Morphological Analysis. In Proceedings of the 6th International Conference on Applied Natural Language Processing, pages 232­238, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
[261]Yiming Yang. An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval, 1(1/2):69­90, 1999.
[262]Yiming Yang and Xin Liu. A Re-Examination of Text Categorization Methods. In Marti A. Hearst, Fredric Gey, and Richard Tong, editors, Proceedings of the 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR), pages 42-49, Berkeley, US, August 1999. ACM Press, New York, US.
[263]Øystein Grøvlen. Natural Language Processing in Information Retrieval. PhD term paper, Norwegian Institute of Technology, January 1995.
[264]Mohammed J. Zaki and Charu C. Aggarwal. XRules: An Effective Structural Classifier for XML Data. In Proceedings of the 9th International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2003.
[265]Kaizhong Zhang, Jason Wang, and Dennis Shasha. On the Editing Distance between Undirected Acyclic Graphs. International Journal of Foundations of Computer Science, 7(13), 1995.
[266]Kaizhong Zhang and Dennis Elliott Shasha. Simple Fast Algorithms for the Editing Distance between Trees and Related Problems. Society for Industrial and Applied Mathematics (SIAM) Journal on Computing, 18(6):1245­1262, 1989.
[267]Zhongping Zhang, Rong Li, Shunliang Cao, and Yangyong Zhu. Similarity Metric for XML Documents. In Workshop on Knowledge and Experience Management (FGWM), 2003.
[268]Ying Zhao and George Karypis. Criterion Functions for Document Clustering: Experiments and Analysis. Technical Report TR 01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN, 2001.