In practice, a search engine will discover links to a large number of new (previously unseen) sites and pages as it crawls known sites. One of the difficult problems a search engine has to solve is how to prioritise its work, so that the finite processing resources it has available are used in the most effective way. For example, should it focus on making sure it has up-to-date content for the sites it already knows about, or on discovering and indexing new sites? And among the previously unseen sites, which should it crawl and index first?
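The trade-off described above can be pictured as a priority queue of URLs waiting to be crawled. The following is only a minimal sketch: the scoring rule, the two weights and the function names are assumptions made for illustration, not a description of how any real search engine schedules its crawler.

```python
import heapq

# Hypothetical weights: how strongly to favour refreshing known pages
# versus exploring newly discovered ones. The values are assumptions.
REFRESH_WEIGHT = 1.0
DISCOVERY_WEIGHT = 0.5


def crawl_priority(days_since_crawl, inbound_links, is_new):
    """Return a score for a URL; a higher score means crawl it sooner."""
    if is_new:
        # A new page has no crawl history, so rank it by how many
        # already-known pages link to it.
        return DISCOVERY_WEIGHT * inbound_links
    # A known page becomes more urgent the longer it goes unvisited.
    return REFRESH_WEIGHT * days_since_crawl


def build_frontier(candidates):
    """candidates: iterable of (url, days_since_crawl, inbound_links, is_new)."""
    heap = []
    for url, days, links, is_new in candidates:
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(heap, (-crawl_priority(days, links, is_new), url))
    return heap


def next_url(heap):
    """Remove and return the URL with the highest priority."""
    return heapq.heappop(heap)[1]
```

Changing the two weights shifts the balance between keeping existing content fresh and exploring new sites, which is exactly the decision the paragraph above describes.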
The places within the page where the words occur are also stored in the index, as this can have an important influence on how relevant a search engine judges a page to be for a specific query. For example, with the search query "blue widgets", if these words occur in places that the search engine judges to be important, such as the page title or page headings, this will have a greater effect on the search results than if the words occurred in the normal text of a paragraph.
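As a rough illustration of storing word locations, the sketch below builds a small index that records, for every word, the field and position where it occurs, and then scores a page by where the query words were found. The field names and the weights in FIELD_WEIGHTS are assumptions chosen for the example; real search engines do not publish theirs.

```python
from collections import defaultdict
import re

# Hypothetical weights: a match in the title or a heading counts for
# more than a match in ordinary body text.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: {url: {field: text}} -> {word: {url: [(field, position), ...]}}"""
    index = defaultdict(lambda: defaultdict(list))
    for url, fields in pages.items():
        for field, text in fields.items():
            for position, word in enumerate(tokenize(text)):
                index[word][url].append((field, position))
    return index


def location_score(index, url, query):
    """Add up the field weight of every occurrence of every query word."""
    score = 0.0
    for word in tokenize(query):
        for field, _position in index.get(word, {}).get(url, []):
            score += FIELD_WEIGHTS.get(field, 1.0)
    return score
```

With these weights, a page that has "blue widgets" in its title scores higher for that query than a page that only mentions the words once in a paragraph.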
As well as storing information about the contents of a page (the "on-page" factors), a search engine index will also contain information specific to that page that comes from so-called "off-page" factors. The most important of these relate to links from other web pages that point to the page in question. The anchor text of such links (the words you actually click on, normally highlighted) will be stored in the index for that page and used by the search engine to help it decide what the page is about.
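One way to picture this is a table that files each link's anchor text under the page being linked to, rather than under the page that contains the link. The sketch below is illustrative only; the tuple layout and the function names are assumptions.

```python
from collections import defaultdict


def collect_anchor_text(links):
    """links: iterable of (source_url, target_url, anchor_text) tuples."""
    anchors = defaultdict(list)
    for _source, target, text in links:
        # File the text under the page being linked to, not the linking page.
        anchors[target].append(text.lower())
    return anchors


def anchor_matches(anchors, url, word):
    """Count inbound links to `url` whose anchor text contains `word`."""
    word = word.lower()
    return sum(1 for text in anchors.get(url, []) if word in text.split())
```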
It's important to keep in mind, however, that whatever measure a search engine uses for the link popularity of a page, this is a general property of the page and does not relate to any particular word or search query. When a search engine returns results for a specific search, it will combine such general, search-term independent factors with factors that relate directly to the search term (e.g. word density in the page content). In other words, just because a page is popular (has many links) doesn't mean it will rank well for a given search term. Pages with lower link popularity can easily rank higher in the results if their content is strongly relevant to the search term.
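To make the last point concrete, here is a minimal sketch of one possible way to combine the two kinds of factor: a simple weighted sum. The weights, and the assumption that both inputs are already normalised to the range 0 to 1, are invented for the example and do not reflect any search engine's actual formula.

```python
# Hypothetical weights; real engines do not publish theirs.
RELEVANCE_WEIGHT = 0.7
POPULARITY_WEIGHT = 0.3


def combined_score(relevance, popularity):
    """Blend a search-term dependent score with a search-term independent one.

    Both inputs are assumed to be normalised to the range 0..1.
    """
    return RELEVANCE_WEIGHT * relevance + POPULARITY_WEIGHT * popularity


def rank(pages):
    """pages: {url: (relevance, popularity)} -> list of urls, best first."""
    return sorted(pages, key=lambda url: combined_score(*pages[url]), reverse=True)
```

For example, rank({"popular": (0.2, 0.9), "relevant": (0.9, 0.2)}) places "relevant" first: the heavily linked page loses because its content matches the search term only weakly.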
Search-term dependent factors are, clearly, directly related to words used in the search term entered by the user. Pages that contain many occurrences of the words in the search query will be judged more relevant for the search, and where the words occur in important locations (title, headings) this will be given extra weighting by the search engines. The anchor text of links pointing to each web page is another important search-term dependent factor: in other words, if a page has many links to it from other sites where the words used in the anchor text match words in the search query, the page will tend to be ranked higher in the search results.
If the search query uses several words, the number of times those words are found close together in the page content (or anchor text) is also likely to be judged as important by the search engine.
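A simple way to measure this is to count how often the first query word appears with all the other query words within a few words of it. The sketch below does exactly that; the max_gap parameter and its default value are assumptions, and real engines are likely to use more refined measures.

```python
import re


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_count(text, query, max_gap=2):
    """Count occurrences of the first query word that have every other
    query word within `max_gap` words on either side."""
    words = tokenize(text)
    terms = tokenize(query)
    if not terms:
        return 0
    first, rest = terms[0], set(terms[1:])
    count = 0
    for i, word in enumerate(words):
        if word != first:
            continue
        # Words in the window around position i, including position i itself.
        nearby = set(words[max(0, i - max_gap): i + max_gap + 1])
        if rest <= nearby:
            count += 1
    return count
```

In the text "We sell blue widgets. The sky is blue and our other widgets are red." the query "blue widgets" is counted once with the default gap, because the second "blue" is four words away from the nearest "widgets".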
Search-term independent factors include all those general properties of a web page that are not related to a specific word or phrase. The link popularity of the page is one such factor, and others may include the age of the page content or the frequency with which it is updated.
Most search engines allow you to run a site search directly by entering a search term such as "site:www.bbc.co.uk".
Cache
The cache operator shows exactly what HTML content was retrieved by a search engine when it last visited a given page. The date at which the content was retrieved is usually also displayed. The Google cache for a specific page can be viewed by entering a query such as "cache:news.bbc.co.uk". The cached version of a page is available from most major search engines as a named link alongside the usual search results.
In cases where a particular website cannot be viewed (because the server is down or too many users are trying to access it), the search engine "cache" operator can be a useful way to see the contents of a web page.
When using search optimisation techniques it is often useful to know when a search engine last updated its index with the contents of a particular page. For example, if you've recently updated a page, you may want to check whether a search engine has visited the page since the content was updated. The cached copy of a page stored by a search engine normally includes the date on which the content was retrieved, allowing you to confirm whether an updated page has been found and indexed yet.
Link
The link operator is used to show which web pages link to a specific web page. Although all major search engines offer this option, there are some important differences in the results that different search engines return. In particular, Google displays only a selection of the pages that link to the specified page when you use the "link" option. It may have come across many more pages that include a link to the specified page, but it will still show only a small subset of these when you try a "link" search (a more complete list is available via the Google Webmaster console).
With Google and Yahoo! you can run a link check by entering a query such as "link:www.bbc.co.uk" directly into the search box.
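If you run these checks regularly, the operator queries described above ("site", "cache" and "link") can be built with a small helper. This is only a sketch: the function names are invented for the example, and which operators are supported varies between search engines and over time.

```python
from urllib.parse import quote_plus

# The three operators discussed in this section.
OPERATORS = {"site", "cache", "link"}


def operator_query(operator, target):
    """Build a query string such as "link:www.bbc.co.uk"."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{operator}:{target}"


def encoded(query):
    """Encode a query so it can be placed safely in a URL parameter."""
    return quote_plus(query)
```

For example, operator_query("link", "www.bbc.co.uk") returns "link:www.bbc.co.uk", which can be typed straight into the search box, and encoded() turns the colon into "%3A" for use in a URL.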