Search Guide

Our search engine tries to offer today's typical web searching experience, as known from popular search engines such as Google. The nature of bibliographic searching differs from that of web page searching, though. We provide many extensions to enable complex and precise structured searches, including a combined metadata, fulltext and reference search in one go. This page lists several tips and tricks that you may find useful to this effect.

Index

    Search guidance
    Searching for words versus phrases
    Boolean queries
    Parentheses
    Special characters and punctuation
    International characters
    Word truncation/stemming
    Structured metadata search
    Regular expressions
    Span queries
    Combined metadata/fulltext/citation search
    Frequently asked questions
        How to wisely choose your search terms (speed-wise)
        How to search for publications by a given author
        How to sort according to a certain pattern
        How to search in fulltext files
        How to search for citations

Search guidance

After you submit your query, the search engine analyzes it and tries to guide you whenever no exact match can be found. For example, if a term appears to be misspelt, it prints a list of the closest indexed terms.

Alternative choices will be printed in red. The search engine will similarly warn you when your search terms could not be found, or when they could but your boolean query couldn't be met. The search engine will also silently try to search for alternative forms (e.g. remove punctuation), etc.

Thanks to multiple search stages and the guidance provided at each stage, it is usually sufficient to simply type what you are looking for and see what the system says in return. If you aren't satisfied, you can then add or remove words from your query until you get a satisfactory result.

Searching for words versus phrases

The default search mode is a search for words. This means that any whitespace you type is not significant, but is rather interpreted to mean "add an automatic boolean AND between words", like Google does. For example, to find all records that contain both the word ellis and the word muon anywhere in the record, type:

The whitespace would be significant if you include it within quotes. There are two phrase searching modes:

The double quotes instruct the search engine to search for the exact phrase. This phrase search mode will match if and only if the given metadata field is exactly equal to the input pattern. For example, to find all documents written by Ellis, J spelt exactly that way, type:
The single quotes instruct the search engine to search for partial phrases. Unlike the exact phrase search, this mode allows for an extra text appearing before/after the given pattern. This is somewhat similar to the "phrase search mode" common on Google and other fulltext engines that search for phrase expressions inside Web pages. For example, to find all the titles containing the expression muon decay regardless of the position of the expression, type:

Now you see how to search for an author spelt sometimes as Ellis, J and sometimes as Ellis, Jonathan Richard at the same time. Please note that this search would also return other authors, such as De Lellis, Jim:

(See also our specific author searching tips.)

The difference between the exact and partial phrase searching modes may not be obvious at first glance. While the latter is closer to what "phrase search" usually means in the context of web page search engines, the former is usually an order of magnitude faster if you know the precise values you are looking for.

(Note: For some indexes such as any field, title, or abstract, there is no distinction between searching for double quoted and single quoted expressions. Both behave the same usual way.)

Another interesting searching mode besides the word and phrase searches is the regular expression search, introduced by slashes instead of quotes. For example, the above partial phrase query 'muon decay' is fully equivalent to the regular expression query /muon decay/. The regular expression syntax is very powerful and allows you to construct very complex queries. For more information, please consult the regular expression section of this guide.

Boolean queries

We have already seen how whitespace adds a silent boolean AND in the search for words. The other boolean operators include:

    Operator   Example           Meaning
    + (AND)    ellis +muon       matches all records that contain both the word ellis and the word muon
               ellis muon        ditto, syntactic sugar
               ellis and muon    ditto, syntactic sugar
    - (NOT)    ellis -muon       matches all records that contain the word ellis but that do not contain the word muon
               ellis not muon    ditto, syntactic sugar
    | (OR)     ellis |muon       matches all records that contain at least one of the words
               ellis or muon     ditto, syntactic sugar

Logical operations are automatically chained from left to right. For example, if you want to search for documents written by Ellis on muons or kaons, write:

    muon or kaon and ellis

which looks for (muon or kaon) and ellis. Note that this gives different results from:

    ellis and muon or kaon

which would search for (ellis and muon) or kaon, therefore also returning documents on kaons not written by Ellis.

The left-to-right chaining behaviour permits you to easily refine your search by adding/removing words with and/not or +/- operators. For example, to exclude the documents on decay from the above search, append -decay:

to get a refined list. Keep adding/removing terms until you are satisfied.
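The left-to-right chaining can be illustrated with a small sketch. It is not the search engine's actual code: the `evaluate` function and the toy `INDEX` (a mapping from words to sets of record ids) are assumptions made for illustration, and parentheses, phrases and field prefixes are left out on purpose.

```python
# Prefix forms of the boolean operators described above.
PREFIX_OPS = {"+": "and", "-": "not", "|": "or"}

# Hypothetical toy index: word -> ids of the records containing it.
INDEX = {
    "ellis": {1, 2},
    "muon": {1, 3},
    "kaon": {2, 4},
    "decay": {1},
}


def evaluate(query, index):
    """Evaluate a flat boolean query strictly from left to right."""
    result = None
    op = "and"  # plain whitespace between two words means AND
    for token in query.lower().split():
        if token in ("and", "or", "not"):
            op = token
            continue
        if len(token) > 1 and token[0] in PREFIX_OPS:
            op, token = PREFIX_OPS[token[0]], token[1:]
        hits = index.get(token, set())
        if result is None:
            result = set(hits)  # the first term seeds the result set
        elif op == "and":
            result &= hits
        elif op == "or":
            result |= hits
        else:  # "not"
            result -= hits
        op = "and"  # fall back to the implicit AND for the next term
    return result if result is not None else set()


if __name__ == "__main__":
    print(evaluate("muon or kaon and ellis", INDEX))
    print(evaluate("ellis and muon or kaon", INDEX))
    print(evaluate("muon or kaon and ellis -decay", INDEX))
```

With this toy index, the first two queries give different record sets because each operator is applied to everything accumulated so far, and appending -decay simply subtracts from the running result.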

Parentheses

You can also use parentheses in your queries to group boolean expressions together:

This query returns records containing either gravity or supergravity, and either ellis or perelstein anywhere in the record.

Note that you can use any number of parentheses in the query. Nested parentheses, such as foo AND (bar OR (fuux NOT quux)), are also supported.

Special characters and punctuation

When indexing a word, attention is paid to index it both with and without punctuation, so that you should be able to search for terms containing special characters, such as C++, verbatim:

For example, to find records containing the LaTeX expression $e^{+}e^{-}$ in the title, type:

To find the document with the report number hep-ph/0204133, type:

Note that the search is case-insensitive:

International characters

The search engine works with Unicode UTF-8 so you can type your query strings in any language stored in the database. For example, to find the documents written by (or on) Пушкин, type:

Note that you don't have to type accents to find accented results. For example, you may type Lemaitre to find papers by Lemaître:
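One common way to obtain this kind of accent-insensitive matching is to fold both the indexed terms and the query to an unaccented, lowercase form before comparing them. The sketch below assumes that approach; the function names `strip_accents` and `fold` are hypothetical, and the search engine may well normalize text differently.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text):
    """Fold a term to an unaccented, lowercase form for comparison."""
    return strip_accents(text).lower()


if __name__ == "__main__":
    # Query typed without the accent versus the stored author name.
    print(fold("Lemaitre") == fold("Lemaître"))
    # Non-Latin scripts pass through; only case is folded here.
    print(fold("Пушкин"))
```

Note that this simple folding also removes marks that are part of distinct letters in some alphabets, so a production index would probably need per-language care.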

Word truncation/stemming

The word truncation is supported via the asterisk (*) wildcard character. The wildcard instructs the search engine to match any number of characters in that place. For example, to find records that contain words muon, muons, muonic etc, type:

The wildcard query works in prefix, infix and suffix position. For example, to get all the words that start with CERN-TH and end with 31, type:

Note that the wildcard will be ignored if you try to apply it to very short words (fewer than 3 characters), such as a*:

The wildcard character can also be used in the phrase searching mode. For example, to find all the documents whose title starts with "Neutrino mass", type:

Recall that we have introduced exact and partial phrase search modes. In fact, a partial phrase search launches an exact search enclosed within wildcards: 'foo bar baz' is equivalent to "*foo bar baz*". Now you can see why the partial phrase search is slow: because of the asterisks before and after the text, each and every title in the database has to be looked up to determine whether it matches or not. (There are currently no partial phrase indexes.)
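The relationship between the two phrase modes can be mimicked with Python's standard `fnmatch` module, which understands the same asterisk wildcard. This is only a sketch: `exact_phrase` and `partial_phrase` are hypothetical helper names, the titles are made up, and `fnmatch` additionally interprets `?` and `[...]`, which the search engine may not.

```python
from fnmatch import fnmatchcase

# Made-up titles, for illustration only.
TITLES = [
    "Neutrino mass and oscillations",
    "Limits on neutrino mass",
    "Muon decay",
]


def exact_phrase(pattern, value):
    """Case-insensitive match of the whole value against the pattern."""
    return fnmatchcase(value.lower(), pattern.lower())


def partial_phrase(pattern, value):
    """A partial phrase is an exact phrase wrapped in two wildcards."""
    return exact_phrase("*" + pattern + "*", value)


if __name__ == "__main__":
    print([t for t in TITLES if exact_phrase("Neutrino mass*", t)])
    print([t for t in TITLES if partial_phrase("neutrino mass", t)])
```

The list comprehensions make the cost visible: the partial mode has to test every title, whereas a pattern with a fixed prefix could in principle be answered from a sorted index.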

Structured metadata search

Searching within various bibliographic fields (such as title, author) is supported via a syntax similar to Google's "site:" operator. If a search term is preceded by a field name and a colon, then the term is searched for inside this field only. For example, to find documents containing the word ellis within the author index, type:

To select documents written by Ellis that contain words like muon, muons, muonic within the title, type:

To select documents written by the NA60 experiment in year 2001, type:

The most common fields you may want to use are author, title, reportnumber, abstract, keyword, year, experiment, fulltext, and reference.
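To make the field:term convention concrete, here is a rough sketch of how such a query could be split into (field, term) pairs. It is not the engine's parser: the regular expression, the `parse` function and the `anyfield` placeholder for unprefixed terms are assumptions, the sample queries are only plausible spellings built from the field names listed above, and boolean operators are not treated specially.

```python
import re

# An optional "field:" prefix, then a double-quoted phrase, a single-quoted
# phrase, a /regexp/, or a plain run of non-space characters.
TOKEN = re.compile(r"(?:(\w+):)?(\"[^\"]*\"|'[^']*'|/[^/]*/|\S+)")


def parse(query):
    """Split a query into (field, term) pairs; quotes are kept on the term."""
    return [(field or "anyfield", term) for field, term in TOKEN.findall(query)]


if __name__ == "__main__":
    print(parse("author:ellis title:muon*"))
    print(parse("experiment:NA60 year:2001"))
    print(parse('author:"Ellis, J" muon'))
```

Keeping the quotes on the term matters because, as described earlier, double quotes, single quotes and slashes select different matching modes.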

Regular expressions

The regular expression searching mode is mostly for the power users acquainted with the traditional Unix/POSIX regexp syntax. In the Simple Search interface you can trigger it by using slashes instead of quotes:

The above example will find all the titles that start with the letter E, followed by any number of any characters, and end with the letter s.

Another example could be a search for an author expressed in the database as either Ellis, J or Ellis, John:

The regular expression search enables you to formulate very specific word proximity queries. For example, let us find all titles containing words dense and matter that are separated by at most one word that doesn't contain the letter l:

Note that you can also use character intervals such as [a-k] and occurrence counts such as {3}. For example, let us find all preprints that do not follow the year cataloguing policy, that is YYYY to denote year, optionally followed by ? or by another -YYYY:

You can use also character classes such as [:alnum:] or [:digit:], so that the above query is equivalent to:

To learn more about POSIX regular expressions, please consult the Wikipedia regexp article and the MySQL regexp documentation.

Span queries

The span query is provided via a -> sign. For example, to search for all documents on muon decay published between 1983 and 1992, type:

To find all documents by authors with names ranging from Ellis, J to Ellis, Qqq, type:
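As a rough illustration of what a span means, the sketch below treats `low->high` as an inclusive range and compares text values case-insensitively in plain alphabetical order. Both choices are assumptions, as are the helper names `parse_span` and `in_span` and the sample data; the search engine's own ordering rules may differ.

```python
def parse_span(text):
    """Split 'low->high' into its two boundary values."""
    low, high = text.split("->", 1)
    return low.strip(), high.strip()


def in_span(value, low, high):
    """Inclusive, case-insensitive range test on text values."""
    return low.lower() <= value.lower() <= high.lower()


# Made-up sample data.
YEARS = ["1980", "1983", "1990", "1992", "1993"]
AUTHORS = ["Ellis, J", "Ellis, John", "Ellis, R", "Ellis, Jim", "Elliott, A"]

if __name__ == "__main__":
    low, high = parse_span("1983->1992")
    print([y for y in YEARS if in_span(y, low, high)])
    low, high = parse_span("Ellis, J->Ellis, Qqq")
    print([a for a in AUTHORS if in_span(a, low, high)])
```

The upper bound Ellis, Qqq works as a cut-off in alphabetical order: names such as Ellis, John sort before it, while Ellis, R sorts after it and is left out.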

Combined metadata/fulltext/citation search

All the syntax mentioned above can be combined in one query. For example, to find documents that have the word ellis inside the author field, that do not contain words like muon, muonic etc. in any field, that contain the phrase (or the substring, to be more precise) 'dense quark matter' inside the abstract field, and that were published in a year starting with the digits 200, type:

Note that the default "any field" global index contains only the metadata terms, not the citation or fulltext terms. You have to explicitly mention the fulltext or reference index to search there. For example, to find the term Higgs in either metadata, references or fulltext files, type:

This allows an interesting combination of metadata, fulltext and citation search in the same query. For example, to get all documents written by Lin whose fulltext files contain the words Schwarzschild and AdS, and which cite the journal Adv. Theor. Math. Phys., type:

Frequently asked questions

How to wisely choose your search terms (speed-wise)

Whenever possible, prefer word searches instead of phrase searches. Search rather for black hole than for "black hole".
Avoid common terms such as and, of, or CERN.
If you are searching for a specific metadata information, such as a report number, select the corresponding index.
If you are looking for a specific document collection, such as Theses, select the Theses collection first, and start your search from there.

How to search for publications by a given author

You can search for an author in many ways, each having its own advantages and disadvantages.

First of all, note that searching for words isn't usually what you would want here. If you choose to search for the words Ellis J within the author index, it means that two queries (for the words Ellis and J) are effected first and a boolean AND is performed next:

Such a query would also match a document whose first author is Ellis, R and whose second author is Finch, A J, which is probably not what you wanted. While the search is very fast and does find the results for the author you were looking for, it may also return many false positives, such as the one cited above. Instead of searching for words, a more suitable technique in this case is to search for phrases, which lets you achieve higher search precision.
The author names are usually stored in a form containing initials only, such as Ellis, J. To get the list of publications of an author whose name is spelt exactly that way, type:

This way of searching gives you the highest precision and no false positives. (Assuming there are no other authors whose names are spelt Ellis, J, an assumption that is often false^*.) The search is very fast.
Sometimes an author's first name may be spelt abbreviated on some documents (such as Ellis, J) and sometimes full on others (such as Ellis, John; possibly also with the middle name: Ellis, John Rolfe). To get the list of publications for all these forms at the same time, you could use a boolean OR query:

This way of searching still keeps the highest precision and no false positives. (Assuming there are no other authors whose names are spelt Ellis, J or Ellis, John, an assumption that is often false^*.) The search is fast.
To match all of the above forms in a single search term, you can try to use a wildcard query:

It would match all author names that start with the text Ellis, J, i.e. not only the wanted forms Ellis, J and Ellis, John, but also Ellis, Jim, or Ellis, John Rolfe, or Ellis, Jonathan Richard.

This way of searching returns more results, which may be suitable in case you don't know how the names are spelt in the database. But you also risk the eventuality of getting false positives. The search is relatively fast.
Yet another, the most general alternative is to use a partial phrase matching:

It would find not only all the authors mentioned above, but also the ones whose names contain the expression Ellis, J anywhere inside the name, such as De Lellis, Jim. It thus gives you the largest possible number of hits at the largest risk of false positives. The search is relatively slow.

(Note though that this way of searching may be very handy in case of compound family names such as Pepe-Altarelli, M or 't Hooft, G, where a casual user query for Hooft, G would match the wanted author, unlike the methods mentioned above.)
Finally, let us note that you can use the regular expression syntax to construct any complex author query. A simple example is to search for an author expressed in the database as either Ellis, J or Ellis, John:

Please consult regular expression searching tips to find out more about regular expression search possibilities.
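The precision trade-off between these author searching techniques can be seen on a tiny example. The sketch below is an assumption-laden illustration rather than the engine's implementation: the three records are made up around names quoted in this guide, and the helper functions are hypothetical.

```python
import re

# Hypothetical records: record id -> list of author names.
RECORDS = {
    1: ["Ellis, J"],
    2: ["Ellis, R", "Finch, A J"],
    3: ["De Lellis, Jim"],
}


def author_words(authors):
    """All lowercase words occurring in any author name of a record."""
    return {w.lower() for name in authors for w in re.findall(r"\w+", name)}


def word_search(query, authors):
    """Every query word must occur somewhere among the record's authors."""
    words = author_words(authors)
    return all(w.lower() in words for w in query.split())


def exact_phrase_search(phrase, authors):
    """Some author name must equal the phrase (case-insensitively)."""
    return any(name.lower() == phrase.lower() for name in authors)


def partial_phrase_search(phrase, authors):
    """Some author name must contain the phrase as a substring."""
    return any(phrase.lower() in name.lower() for name in authors)


if __name__ == "__main__":
    for label, search, query in [
        ("words", word_search, "Ellis J"),
        ("exact phrase", exact_phrase_search, "Ellis, J"),
        ("partial phrase", partial_phrase_search, "Ellis, J"),
    ]:
        hits = [rid for rid, authors in RECORDS.items() if search(query, authors)]
        print(label, hits)
```

Record 2 shows the false positive of the word search (the J comes from the second author), and record 3 shows the false positive of the partial phrase search.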

^*NOTE: If you produce your own list of publications and you notice that sometimes your first name is spelt abbreviated and sometimes in full, or if you want to identify your publications among several authors with the same abbreviation, please contact the administrators of DKFZ so that they could work with you on inputting a consistently spelt and properly formatted first name everywhere. Only the consistent database content will ensure a proper author searching behaviour.

How to sort according to a certain pattern

You may select a certain field according to which you wish to sort the search results, for example to sort the results by main title. However, sometimes you may want to sort by a report number and it happens that your documents have several of them. For example, the report numbers hep-ph/0204140, CERN-TH-2002-069 and RM3-TH-02-4 all denote the same document. Now if you sort your search results set containing this document, the system will take into consideration the first report number, that may be any of these three. Sometimes you may want to classify this document under its hep-ph number, sometimes under its CERN number, depending on whether you produce a list of CERN or hep-ph publications. How can you influence the search engine to prefer one report number rather than the other?

In other words, the search engine by default answers a query like "sort by first author" or "sort by first report number", but sometimes you may want to ask the search engine to "sort by the first report number that starts with the text CERN-". The latter possibility is available via a "silent" sort parameter called sp (for "sort pattern") that sorts preferentially according to the given textual pattern if it can be found. The parameter is "silent" in the sense that it is not present in the search interface: you have to add it manually to your search URL. For example, to get all CERN-TH publications of the year 2001 sorted by their CERN-TH numbers, you would search for CERN-TH-2001* within the reportnumber index and, once you are satisfied with the results, add &sp=CERN-TH to the URL of the search results page. The results are then sorted preferentially by their CERN-TH report numbers, giving you a nicely sorted list of all CERN-TH 2001 publications.
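If you build such URLs from a script rather than by hand, the sp parameter can be appended with Python's standard `urllib.parse` module. In the sketch below only the name sp comes from this guide; the host name, the path and the other query parameters (`p`, `f`) are placeholders assumed for illustration, so copy the real URL from your own search results page instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_sort_pattern(url, pattern):
    """Return the URL with its sp (sort pattern) parameter set to `pattern`."""
    parts = urlsplit(url)
    # Keep all existing parameters except a previous sp value, if any.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sp"
    ]
    params.append(("sp", pattern))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Placeholder URL; the parameter names p and f are assumptions.
    url = "https://repository.example.org/search?p=CERN-TH-2001*&f=reportnumber"
    print(add_sort_pattern(url, "CERN-TH"))
```

Replacing an existing sp value instead of appending a second one keeps the URL unambiguous when you try several sort patterns in a row.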

How to search in fulltext files

If a metadata record contains some associated fulltext files, DKFZ tries to extract the textual information from the files and index it into a separate fulltext index. To search for all records that contain the term e- in their fulltext files, type:

Recall that fulltext words aren't included in the default global "any field" index, but that you may freely combine a fulltext and metadata search. For example, to find all articles written by Ellis that contain the word muon either in the metadata or in the fulltext, type:
