Simple Query Language and Metadata
Revision as of 20:59, 20 March 2005
An effective search system must provide a query language which is:
At the same time, it is desirable for the language to be reasonably easy to parse in software.
Gnutella2 employs a simple query language that is familiar to users of web search engines and allows most common criteria to be entered intuitively.
Query Language Definition
- Every search string is considered a list of words.
- Words are identified as sequences of alphanumeric characters. Other symbols and white space only separate words and are otherwise ignored, apart from the dash (-) and quote (") operators described below.
- In the basic case, every word in the list must appear one or more times in a matching string.
- Words may be marked as negative words or excluded words by prefixing them with a dash (-).
- In this case, every positive word in the list must appear and every negative word must not appear in a matching string.
- Words can be grouped together with quotes. The negation operator (-) may not appear inside a quoted string, but it may prefix a quoted string in which case the negation is applied to the quoted string as a whole.
- The words in a quoted string must appear in the same order in a matching string. Conversely, a negated quoted string rules out any string in which its words appear in that order; a string that lacks some of the words, or contains them in a different order, still matches.
Examples:
    Cat Dog           (matches strings with "cat" and "dog", in any order)
    -Cat Dog          (matches strings with "dog" but not "cat", in any order)
    -Cat -Dog         (matches strings with neither "cat" nor "dog"; illegal in an external search as there are no positive words)
    "cat dog"         (matches strings with "cat" followed by "dog"; "cat dog" matches, "dog cat" does not)
    -"cat dog"        (matches strings without "cat dog"; "cat dog" does not match, "dog cat" does)
    "cat dog" -fish   (matches strings with "cat dog" and without "fish")
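The rules and examples above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation, and it rests on several assumptions the text does not state: matching is case-insensitive (as the examples suggest), words are compared whole rather than as substrings, the words of a quoted string must be adjacent, a dash only negates when it begins a term, and a query without positive words simply matches nothing. The names `parse_query`, `contains_phrase` and `matches` are hypothetical.

```python
import re

# Assumption: "alphanumeric" means any Unicode letter or digit.
WORD = re.compile(r"[^\W_]+")
# A term is an optional leading dash, then a quoted phrase or a bare word.
# Assumption: the dash only negates when it starts a term.
TERM = re.compile(r'(?:(?<!\S)(-))?(?:"([^"]*)"|([^\W_]+))')


def words(text):
    """Split text into lowercase alphanumeric words."""
    return [w.lower() for w in WORD.findall(text)]


def parse_query(query):
    """Return a list of (negated, [words]) terms."""
    terms = []
    for m in TERM.finditer(query):
        dash, phrase, word = m.groups()
        ws = words(phrase) if phrase is not None else [word.lower()]
        if ws:  # skip empty quoted strings
            terms.append((dash == "-", ws))
    return terms


def contains_phrase(haystack, phrase):
    """True if the words of phrase occur consecutively in haystack."""
    n = len(phrase)
    return any(haystack[i:i + n] == phrase
               for i in range(len(haystack) - n + 1))


def matches(query, text):
    """Apply the positive/negative word rules to a candidate string."""
    terms = parse_query(query)
    if not any(not negated for negated, _ in terms):
        return False  # assumption: queries without positive words match nothing
    target = words(text)
    # A positive term must be present; a negated term must be absent.
    return all(contains_phrase(target, ws) != negated
               for negated, ws in terms)
```

For instance, `matches('"cat dog" -fish', "my cat dog")` is true under these assumptions, while `matches('"cat dog"', "dog cat")` is false.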
Searching metadata involves a set of specific rules:
- If a metadata schema is specified as search criteria, matching objects that have metadata must share the same schema
- Metadata can only be compared if criteria and object share the same schema
- Each data member (attribute or element) of metadata is compared separately
- Where a member is specified in the criteria but not in the object, the match fails
- Where a member is specified in the object but not in the criteria, the member is ignored
- The search criteria for each text/string member are expressed in the simple query language defined above
- The search criteria for numeric members are range based, as defined under Numeric Range Matching below
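The member-by-member comparison described in this list might look as follows. This is only a sketch: metadata is modelled as a schema identifier plus a dictionary of members, the schema identifier `urn:example:audio` in the usage note is made up, and `simple_text_match` is a deliberately simplified stand-in for the real text and numeric matchers.

```python
def simple_text_match(criterion, value):
    """Stand-in matcher: every criterion word must occur in the value."""
    have = set(value.lower().split())
    return all(w in have for w in criterion.lower().split())


def metadata_matches(crit_schema, crit_members, obj_schema, obj_members,
                     match_member=simple_text_match):
    """Compare search criteria with an object's metadata, member by member."""
    # Metadata is only comparable when both sides use the same schema.
    if crit_schema != obj_schema:
        return False
    for name, criterion in crit_members.items():
        # A member named in the criteria but absent from the object fails.
        if name not in obj_members:
            return False
        if not match_member(criterion, obj_members[name]):
            return False
    # Members present only in the object are never inspected.
    return True
```

With the criteria `{"artist": "pink floyd"}` and an object carrying `{"artist": "Pink Floyd", "album": "Animals"}` under the same schema, the call returns true; the extra `album` member is ignored.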
Numeric Range Matching
When comparing numeric values, the match function is specified by the search criteria:
- If a value is specified, the value must match exactly
- If a range (X-Y) is specified, the value must lie within that inclusive range
- If (X-) is specified, the value must be greater than or equal to X
- If (-X) is specified, the value must be less than or equal to X
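As an illustration of these four cases, the following sketch parses a criterion into inclusive bounds and tests a value against them. It assumes that the parentheses above are notation rather than syntax, and that values are non-negative, since the dash already serves as the range separator. The function names `parse_range` and `in_range` are hypothetical.

```python
def parse_range(criterion):
    """Return (low, high) bounds; None means unbounded on that side."""
    text = criterion.strip()
    if "-" not in text:
        value = float(text)          # a single value must match exactly
        return value, value
    low, _, high = text.partition("-")
    low, high = low.strip(), high.strip()
    if not low and not high:
        raise ValueError("range needs at least one bound")
    return (float(low) if low else None,
            float(high) if high else None)


def in_range(criterion, value):
    """True if value satisfies the criterion; both bounds are inclusive."""
    low, high = parse_range(criterion)
    return ((low is None or value >= low) and
            (high is None or value <= high))
```

Under these assumptions, `in_range("128-320", 320)` and `in_range("128-", 1000)` are both true, while `in_range("-320", 321)` is false.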
Generic Matching on Metadata
Generic search criteria (/Q2/DN) can be matched against metadata fields. However, care must be taken not to match against data members that do not directly describe the object. Client-side schema descriptors are a good solution: for each recognised schema, they list the members that fall within the scope of a general search.
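One way to picture such a descriptor is a table mapping each recognised schema to the members a general search may inspect. The sketch below is purely illustrative: the schema identifiers, the member names and the `SEARCH_SCOPE` table are invented, the word test is a simplified stand-in for the full query language, and joining the in-scope members into one string before matching is an assumption rather than anything the text prescribes.

```python
# Hypothetical descriptors: schema identifier -> members in general-search scope.
SEARCH_SCOPE = {
    "urn:example:audio": ["title", "artist", "album"],
    "urn:example:video": ["title", "series"],
}


def generic_matches(query, schema, members):
    """Match a generic query against the in-scope members of one object."""
    scope = SEARCH_SCOPE.get(schema, [])  # unrecognised schema: nothing in scope
    text = " ".join(str(members[name]) for name in scope if name in members)
    have = set(text.lower().split())
    wanted = query.lower().split()
    # Simplified stand-in: every query word must occur in the in-scope text.
    return bool(wanted) and all(w in have for w in wanted)
```

Here a member such as a free-form comment or a bitrate stays outside the scope, so a generic query for a word that appears only there does not match the object.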