Lucene in Action: Covers Apache Lucene 3.0 (2nd Edition)

Erik Hatcher, Otis Gospodnetic, Michael McCandless

Language: English

Pages: 528

ISBN: 2:00047045

Format: PDF / Kindle (mobi) / ePub

When Lucene first hit the scene five years ago, it was nothing short of amazing. By using this open-source, highly scalable, super-fast search engine, developers could integrate search into applications quickly and efficiently. A lot has changed since then: search has grown from a "nice-to-have" feature into an indispensable part of most enterprise applications. Lucene now powers search in diverse companies including Akamai, Netflix, LinkedIn, Technorati, HotJobs, Epiphany, FedEx, Mayo Clinic, MIT, New Scientist Magazine, and many others.

Some things remain the same, though. Lucene still delivers high-performance search features in a disarmingly easy-to-use API. Due to its vibrant and diverse open-source community of developers and users, Lucene is relentlessly improving, with evolutions to APIs, significant new features such as payloads, and a huge increase (as much as 8x) in indexing speed with Lucene 2.3.

And with clear writing, reusable examples, and unmatched advice on best practices, Lucene in Action, Second Edition is still the definitive guide to developing with Lucene.

A Companion To Marx's Capital

The End of Poverty: Economic Possibilities for Our Time

Design Driven Testing: Test Smarter, Not Harder

Beginning COBOL for Programmers

Advanced Apex Programming for and (3rd Edition)

Foundations of Object-Oriented Languages Types and Semantics

should be proud for getting this far. You’ve seen how Lucene models content, the steps for indexing at a high level, and the basics of how to add, delete, and update documents in an index. You understand all the field options that tell Lucene precisely what to do with each field’s value, and you now know how to handle interesting cases like multivalued fields, field truncation, document/field boosting, and numeric and date/time values. We’ve covered why and how to optimize an index, and the

has no effect on scoring. The last two assertions in listing 3.9, where wild and mild are closer matches to the pattern than mildew, demonstrate this. Our next query is FuzzyQuery.

3.4.8 Searching for similar terms: FuzzyQuery

Lucene’s FuzzyQuery matches terms similar to a specified term. The Levenshtein distance algorithm determines how similar terms in the index are to a specified target term. (See for more information about Levenshtein
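To make the idea of Levenshtein distance concrete, here is a minimal sketch in plain Java. It is not Lucene's internal code; the class name LevenshteinSketch and its method are hypothetical, chosen only for illustration. The distance is the smallest number of single-character insertions, deletions, or substitutions needed to turn one string into another.

```java
// Illustrative sketch only: a textbook Levenshtein distance, not Lucene's implementation.
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /**
     * Returns the minimum number of single-character insertions, deletions,
     * or substitutions needed to turn source into target.
     */
    static int distance(String source, String target) {
        // previous[j] holds the distance between the first i-1 characters of
        // source and the first j characters of target; current[j] is row i.
        int[] previous = new int[target.length() + 1];
        int[] current = new int[target.length() + 1];

        // Turning an empty prefix into j characters takes j insertions.
        for (int j = 0; j <= target.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= source.length(); i++) {
            // Turning i characters into an empty prefix takes i deletions.
            current[0] = i;
            for (int j = 1; j <= target.length(); j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[target.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("fuzzy", "wuzzy")); // one substitution
        System.out.println(distance("fuzzy", "wuzza")); // two substitutions
    }
}
```

A smaller distance means the indexed term is closer to the target term, which is the notion of similarity the passage describes.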

in section 3.1.2 at the start of the chapter. In this section we’ll first delve into the specific syntax for each of Lucene’s core Query classes that QueryParser supports. We’ll also describe some of the settings that control the parsing of certain queries. We’ll wrap up with further syntax that QueryParser accepts for controlling grouping, boosting, and field searching of each query clause. This discussion assumes knowledge of the Query types discussed in section 3.4. Note that some of these

document (we show this example in section 5.7.2). Possibly, in a commerce setting, your documents correspond to products, each with its own shipping weight (stored as a float or double, per document), and you’d like to access that to present the shipping cost next to each search result. These are all examples easily handled by Lucene’s field cache API. One important restriction for using the field cache is that all documents must have a single value for the specified field. This means the field
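One way to picture the single-value restriction is to model the cache as an array with one slot per document ID. The sketch below is a conceptual model in plain Java under that assumption, not Lucene's field cache API; the class FieldCacheSketch, its methods, and the per-unit shipping rate are all hypothetical names introduced for illustration.

```java
// Conceptual model only: plain Java, not Lucene's field cache implementation.
final class FieldCacheSketch {
    // One slot per document ID, so each document can hold exactly one value.
    private final float[] valuesByDocId;

    FieldCacheSketch(int maxDoc) {
        valuesByDocId = new float[maxDoc];
    }

    /** Stores the value for a document; a second call for the same ID overwrites the first. */
    void put(int docId, float value) {
        valuesByDocId[docId] = value;
    }

    /** Looks up the cached value by document ID with a plain array access. */
    float get(int docId) {
        return valuesByDocId[docId];
    }

    /** Hypothetical pricing rule: shipping cost is the cached weight times a flat rate. */
    float shippingCost(int docId, float ratePerUnitWeight) {
        return get(docId) * ratePerUnitWeight;
    }

    public static void main(String[] args) {
        FieldCacheSketch weights = new FieldCacheSketch(3);
        weights.put(0, 2.5f);
        weights.put(1, 0.5f);
        // Present the cost next to each hit, given only its document ID.
        System.out.println(weights.shippingCost(0, 2.0f));
    }
}
```

In this model a document with two values for the same field has nowhere to put the second one, which is one way to see why a multivalued field does not fit.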

DecimalFormat scoreFormatter = new DecimalFormat("0.######");
for (ScoreDoc sd : results.scoreDocs) {
    int docID = sd.doc;
    float score = sd.score;
    Document doc = searcher.doc(docID);
    // One row per hit: title (abbreviated), publication month, doc ID, score.
    System.out.println(
        StringUtils.rightPad(
            StringUtils.abbreviate(doc.get("title"), 29), 30) +
        StringUtils.rightPad(doc.get("pubmonth"), 10) +
        StringUtils.center("" + docID, 4) +
        StringUtils.leftPad(
            scoreFormatter.format(score), 12));
    // out is presumably a PrintStream declared earlier in the listing (not shown here).
    out.println("   " + doc.get("category"));
    //out.println(searcher.explain(query, docID));
}
