Search queries

Description

How to programmatically search and query content from a Plone site.

Introduction

Querying is the action of retrieving data from search indexes.

Preface

Plone uses the portal_catalog tool to perform most content-related queries. Special catalogs, such as reference_catalog, exist for specialized and optimized queries.

Accessing the portal catalog

Plone queries are performed using the portal_catalog tool, which is available at the site root.

Example:

# portal_catalog is defined in the site root
portal_catalog = site.portal_catalog

You can also use the ITools view to get access to portal_catalog if you do not have the Plone site object directly available:

from Acquisition import aq_inner
from zope.component import getMultiAdapter

context = aq_inner(self.context)
tools = getMultiAdapter((context, self.request), name=u'plone_tools')

portal_catalog = tools.catalog()

There is also a third way, using traversing. This is discouraged, as this includes extra processing overhead:

# Use magical Zope acquisition mechanism
portal_catalog = context.portal_catalog

...and the same in a TAL template:

<div tal:define="portal_catalog context/portal_catalog" />

Querying portal_catalog

Calling this persistent object is a shortcut to the query method itself, as it provides a __call__() method. Each keyword argument is the name of an index, and the argument value is what the index should contain:

# The following call does not return the actual objects,
# but brains instead
# The call takes index names and match values as keyword arguments
# Get all objects with Title "Foobar"
brains = portal_catalog(Title="Foobar")
for brain in brains:
    print("Name:" + brain["Title"] + " URL:" + brain.getURL())

Note that values can be special, depending on the queried index. Here is a path query:

# return myfolder and first level of child content
brains = portal_catalog(path={"query": "/myploneinstance/myfolder", "depth": 2})

If you call portal_catalog() without arguments, it will return all indexed content objects:

# Print all content on the site
all_brains = portal_catalog()
for brain in all_brains:
    print("Name:" + brain["Title"] + " URL:" + brain.getURL())

Limiting query (batching)

You can use the Python slice operator to limit the number of results.

Example: getting the 10 most recently modified content items on the site:

brains = portal_catalog(sort_on="modified", sort_order="reverse")[0:10]
for brain in brains:
    print(brain["Title"] + " " + brain["ModificationDate"])
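When only the first few results are needed, the catalog can also be given a sort_limit hint alongside the slice. The helper below is a minimal sketch: latest_modified is a hypothetical name, and it assumes catalog is any callable that behaves like portal_catalog. The slice is kept because sort_limit is only an optimization hint, and the catalog may still return more results than requested.

```python
def latest_modified(catalog, count=10):
    """Return at most ``count`` brains, most recently modified first.

    ``catalog`` is assumed to be portal_catalog or a compatible callable.
    """
    brains = catalog(
        sort_on="modified",
        sort_order="reverse",
        sort_limit=count,  # hint only; the result may still be longer
    )
    # Slice to enforce the limit regardless of how the hint was honored
    return brains[:count]
```

In a view this might be called as latest_modified(portal_catalog, 10).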

Brain objects

portal_catalog queries return an iterable of catalog brain objects.

Brains contain a subset of the actual content object information. The available subset is defined by the metadata columns in portal_catalog. You can see the available metadata columns on the portal_catalog “Metadata” tab in the ZMI. For more information, see indexing.

You can access the brain object information by metadata column name using Python dictionary look-up:

# Get the indexed Title of a portal_catalog entry, i.e. a brain
title = brain["Title"]

Brain result id

Result ID (RID) is given with the brain object and you can use this ID to query further info about the object from the catalog.

Example:

(Pdb) brain.getRID()
872272330

Brain object schema

To see what metadata columns a brain object contains, you can read its __record_schema__ attribute, which is a dict.

Example:

for i in brain.__record_schema__.items(): print i

('startDate', 32)
('endDate', 33)
('Title', 8)
('color', 31)
('data_record_score_', 35)
('exclude_from_nav', 13)
('Type', 9)
('id', 19)
('cmf_uid', 29)

Note

TODO: What do those numbers represent?

Getting the real object

A portal_catalog() query returns indexed brain objects. If you want to get the actual object from which the search data was indexed, use the following:

# Load the actual object from the database (SLOW!)
# and modify it
obj = brain.getObject()
obj.setSomething("foobar")

Note

Calling getObject() has performance implications. Waking up each object needs a separate query to the database.
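If a listing only needs values that are stored as metadata columns, it can often be built from the brains alone, without calling getObject(). The following is a minimal sketch under that assumption: listing_rows is a hypothetical helper name, brains is assumed to be the result of a catalog query, and Title is assumed to be available as a metadata column.

```python
def listing_rows(brains):
    """Build (title, url) pairs from brain metadata only.

    No content object is woken up, because only the metadata column
    ``Title`` and the brain method ``getURL()`` are used.
    """
    return [(brain["Title"], brain.getURL()) for brain in brains]
```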

getObject() and unrestrictedSearchResults()

You cannot call getObject() on a brain returned by unrestrictedSearchResults() if the current user is not allowed to access the object, even in trusted code.

Instead, you need to use:

obj = site.unrestrictedTraverse(brain.getPath())


URL of item

Example:

# Return object absolute_url()
url = brain.getURL()

Path of item

Example:

# Return object physical path (in the database) -
# this will include Plone site id inside Zope application server
path = brain.getPath()

Text format

Since most indexes use Archetypes accessors to index the field value, the returned text is UTF-8 encoded. This is a limitation inherited from the early days of Plone.

To get a unicode value for e.g. the title, you need to do the following:

title = brain["Title"]
title = title.decode("utf-8")

if title[0] == u"å":
    # Unicode text matching etc. functions work correctly now
    pass
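Because a metadata value may already be unicode in some cases, it can be convenient to wrap the decoding in a small guard. The function below is a sketch; to_text is a hypothetical name and not part of Plone, although Plone ships a similar utility, Products.CMFPlone.utils.safe_unicode.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to unicode text; pass other values through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

For example, to_text(brain["Title"]) can then be compared safely against unicode literals.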

Accessing indexed data

Normally you don’t get a copy of the indexed data with brains, only metadata. If you know what you are doing, you can still access the raw indexed data by using the RID of the brain object.

Example:

(Pdb) data = self.context.portal_catalog.getIndexDataForRID(872272330)
(Pdb) for i in data.items(): print i
('Title', ['ulkomuseon', 'tarinaopastukset'])
('effectiveRange', (21305115, 278752140))
('object_provides', ['Products.CMFCore.interfaces._content.IDublinCore', 'Products.ATContentTypes.interface.interfaces.IHistoryAware', 'AccessControl.interfaces.IOwned', 'OFS.interfaces.ITraversable', 'plone.portlets.interfaces.ILocalPortletAssignable', 'Products.Archetypes.interfaces._base.IBaseObject', 'zope.annotation.interfaces.IAttributeAnnotatable', 'vs.event.interfaces.IVSEvent', 'Products.CMFCore.interfaces._content.IMutableMinimalDublinCore', 'OFS.interfaces.IPropertyManager', 'OFS.interfaces.IZopeObject', 'AccessControl.interfaces.IRoleManager', 'zope.annotation.interfaces.IAnnotatable', 'Acquisition.interfaces.IAcquirer', 'Products.ATContentTypes.interface.event.IATEvent', 'OFS.interfaces.ICopySource', 'Products.LinguaPlone.interfaces.ITranslatable', 'Products.ATContentTypes.interface.interfaces.ICalendarSupport', 'Products.ATContentTypes.interface.interfaces.IATContentType', 'plone.app.iterate.interfaces.IIterateAware', 'Products.Archetypes.interfaces._base.IBaseContent', 'Products.CMFCore.interfaces._content.ICatalogableDublinCore', 'Products.CMFDynamicViewFTI.interface._base.IBrowserDefault', 'Products.Archetypes.interfaces._referenceable.IReferenceable', 'plone.locking.interfaces.ITTWLockable', 'plone.app.imaging.interfaces.IBaseObject', 'persistent.interfaces.IPersistent', 'webdav.interfaces.IDAVResource', 'AccessControl.interfaces.IPermissionMappingSupport', 'OFS.interfaces.ISimpleItem', 'plone.app.kss.interfaces.IPortalObject', 'plone.app.kss.interfaces.IContentish', 'archetypes.schemaextender.interfaces.IExtensible', 'App.interfaces.IUndoSupport', 'OFS.interfaces.IManageable', 'App.interfaces.IPersistentExtra', 'Products.CMFCore.interfaces._content.IMutableDublinCore', 'Products.Archetypes.interfaces._athistoryaware.IATHistoryAware', 'dateable.kalends.IRecurringEvent', 'OFS.interfaces.IItem', 'zope.interface.Interface', 'OFS.interfaces.IFTPAccess', 'Products.CMFDynamicViewFTI.interface._base.ISelectableBrowserDefault', 
'webdav.interfaces.IWriteLock', 'Products.CMFCore.interfaces._content.IMinimalDublinCore', 'Products.CMFCore.interfaces._content.IDynamicType', 'Products.CMFCore.interfaces._content.IContentish'])
('Type', u'VSEvent')
('id', 'ulkomuseon-tarinaopastukset')
('cmf_uid', 2)
('recurrence_days', [733960, 733981, 733974, 733967])
('end', 1077028380)
('Description', ['saamelaismuseon', 'ulkomuseossa', ...
('is_folderish', False)
('getId', 'ulkomuseon-tarinaopastukset')
('start', 1077028380)
('is_default_page', False)
('Date', 1077036795)
('review_state', 'published')
('Language', <LanguageIndex.IndexEntry id 872272330 language fi, cid 8b9a08c216b8e086f3446775ad71a748>)
('portal_type', 'VSEvent')
('expires', 1339244460)
('allowedRolesAndUsers', ['Anonymous'])
('getObjPositionInParent', 10)
('path', '/siida/sisalto/8-vuodenaikaa/ulkomuseon-tarinaopastukset')
('in_reply_to', '')
('UID', '8b9a08c216b8e086f3446775ad71a748')
('Creator', 'admin')
('effective', 1077036795)
('getRawRelatedItems', [])
('getEventType', [])
('created', 1077036792)
('modified', 1077048720)
('SearchableText', ['ulkomuseon', 'tarinaopastukset', ...
('sortable_title', 'ulkomuseon tarinaopastukset')
('meta_type', 'VSEvent')
('Subject', [])

You can also directly access a single index:

# Get event brain result id
rid = event.getRID()
# Get list of recurrence_days indexed value.
# ZCatalog holds internal Catalog object which we can directly poke in evil way
# This call goes to Products.PluginIndexes.UnIndex.Unindex class and we
# read the persistent value from there what it has stored in our index
# recurrence_days
indexed_days = portal_catalog._catalog.getIndex("recurrence_days").getEntryForObject(rid, default=[])

Dumping portal catalog content

The following is useful when debugging unit tests:

# Print all objects visible to the currently logged in user
for i in portal_catalog(): print(i.getURL())

Bypassing query security check

Note

Security: All portal_catalog queries are limited to the current user’s permissions by default.

If you want to bypass this restriction, use the unrestrictedSearchResults() method.

Example:

# Print all content in portal_catalog, regardless of permissions
for i in portal_catalog.unrestrictedSearchResults(): print(i.getURL())

Bypassing language check

Note

All portal_catalog() queries are limited to the selected language of the current user. You need to explicitly bypass the language check if you want to do multilingual queries.

Example of how to bypass the language check:

all_brains = portal_catalog(Language="all")

Expired content check

Plone and portal_catalog have a mechanism to list only active (non-expired) content by default.

Below is an example of how the expired content check is made:

mtool = context.portal_membership
show_inactive = mtool.checkPermission('Access inactive portal content', context)

contents = context.portal_catalog.queryCatalog(show_inactive=show_inactive)

See also:

* :doc:`Listing </content/listing>`

None as query parameter

Warning

Usually if you pass in None as the query value, it will match all the objects instead of zero objects.
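One way to guard against this is to check the criteria before running the query. The function below is a sketch under stated assumptions: search_required is a hypothetical name, catalog is assumed to be portal_catalog or a compatible callable, and returning an empty list for a None criterion is an illustrative policy rather than Plone behavior.

```python
def search_required(catalog, **query):
    """Run the query only if every criterion has a real value.

    A None value would otherwise match all objects for that index,
    so an empty result is returned instead.
    """
    if any(value is None for value in query.values()):
        return []
    return catalog(**query)
```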

Note

TODO: How to query None values?

Querying by path

ExtendedPathIndex is the index used for content object paths. The path index stores the physical path of the objects.

Warning: If you ever rename your Plone site instance, the path index needs to be rebuilt.

Example:

portal_catalog(path={ "query": "/myploneinstance/myfolder" }) # return myfolder and all child content
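Hard-coding the site id in the path is fragile, so the query string is usually derived from an object's physical path. The helper below is a minimal sketch; path_query is a hypothetical name, and physical_path is assumed to be a tuple of path parts such as the one returned by getPhysicalPath().

```python
def path_query(physical_path, depth=None):
    """Build a path index query from a tuple of path parts.

    ``depth`` is only included when given, so that the default
    behavior of the path index is left untouched.
    """
    query = {"query": "/".join(physical_path)}
    if depth is not None:
        query["depth"] = depth
    return query
```

A call might then look like portal_catalog(path=path_query(folder.getPhysicalPath())), where folder is any content object.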

Query multiple values

The KeywordIndex index type indexes a list of values. It is used e.g. by Plone’s categories (subject) feature and by the object_provides index of provided interfaces.

You can either query

  • a single value in the list
  • many values in the list (all must be present)
  • any value in the list

The index of the catalog to query is either the name of the keyword argument, a key in a mapping, or an attribute of a record object.

Attributes of record objects

  • query – either a sequence of objects or a single value to be passed as query to the index (mandatory)
  • operator – specifies the combination of search results when query is a sequence of values. (optional, default: ‘or’). Allowed values: ‘and’, ‘or’

Below is an example of matching any of multiple values given as a Python list in a KeywordIndex. It queries all event types, and the recurrence_days KeywordIndex must match any of the given dates:

# Query all events on the site
# Note that there is no separate list for recurrent events
# so if you want to speed up you can hardcode
# recurrent event type list here.
matched_recurrence_events = self.context.portal_catalog(
    portal_type=supported_event_types,
    recurrence_days={
        "query": recurrence_days_in_this_month,
        "operator": "or",
    })
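The "all must be present" case uses the same record structure with the operator set to "and". The helper below is a small sketch; keyword_query is a hypothetical name that only builds the mapping described under the record attributes above.

```python
def keyword_query(values, require_all=False):
    """Build a KeywordIndex query mapping.

    With ``require_all=True`` every value must be present ("and");
    otherwise any single value is enough ("or", the default).
    """
    return {
        "query": list(values),
        "operator": "and" if require_all else "or",
    }
```

For instance, portal_catalog(Subject=keyword_query(["news", "sports"], require_all=True)) would ask for items tagged with both categories; the category names here are made up for illustration.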

Query by content type

To get all catalog brains of a certain content type on the whole site:

campaign_brains = self.context.portal_catalog(portal_type="News Item")

To see the available type names, visit the portal_types tool in the ZMI.

Query published items

By default, the portal_catalog query does not care about the workflow state. You might want to limit the query to published items.

Example:

campaign_brains = self.context.portal_catalog(portal_type="News Item", review_state="published")

review_state is a portal_catalog index which reads the portal_workflow variable “review_state”. For more information, see the Contents tab of the portal_workflow tool in the ZMI.

Getting a random item

The following view snippet allows you to get one random item on the site:

import random

def getRandomCampaign(self):
    """Return a random published campaign page, or the current item as a fallback."""
    campaign_brains = self.context.portal_catalog(portal_type="CampaignPage", review_state="published")

    # Filter out the items which we do not want to pick
    bad_ids = ["you", "might", "want to black  list some ids here"]
    items = [brain for brain in campaign_brains if brain["getId"] not in bad_ids]

    # Check that we have items left after filtering
    if items:
        # Pick one
        chosen = random.choice(items)
        return chosen.getObject()
    else:
        # Fallback to the current content item if no random options available
        return self.context

Querying FieldIndexes by Range

The following examples demonstrate how to do range-based queries. This is useful if you want to find the “minimum” or “maximum” values of something. The examples assume that there is an index called ‘getPrice’.

Get a value that is greater than or equal to 2:

items = portal_catalog({'getPrice':{'query':2,'range':'min'}})

Get a value that is less than or equal to 40:

items = portal_catalog({'getPrice':{'query':40,'range':'max'}})

Get a value that falls between 2 and 1000:

items = portal_catalog({'getPrice':{'query':[2,1000],'range':'min:max'}})
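The three forms above differ only in the shape of the query value and the range keyword, so they can be produced by one small function. This is a sketch; range_query is a hypothetical helper name, not a catalog API.

```python
def range_query(minimum=None, maximum=None):
    """Build a range query mapping for a FieldIndex.

    Both bounds are inclusive, matching the examples above.
    """
    if minimum is not None and maximum is not None:
        return {"query": [minimum, maximum], "range": "min:max"}
    if minimum is not None:
        return {"query": minimum, "range": "min"}
    if maximum is not None:
        return {"query": maximum, "range": "max"}
    raise ValueError("at least one bound is required")
```

With the assumed ‘getPrice’ index, portal_catalog({'getPrice': range_query(2, 1000)}) is equivalent to the last example.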

Querying by date

See DateIndex.

Example:

from DateTime import DateTime

items = portal_catalog(effective={'query': (DateTime('2002-05-08 15:16:17'),
                                            DateTime('2062-05-08 15:16:17')),
                                  'range': 'min:max'})

Another example of how to get news items for a particular year in template code:

<div metal:fill-slot="main" id="content-news"
 tal:define="boundLanguages here/portal_languages/getLanguageBindings;
             prefLang python:boundLanguages[0];
             DateTime python:modules['DateTime'].DateTime;
             start_year request/year| python: 2004;
             end_year request/year| python: 2099;
             start_year python: int(start_year);
             end_year python: int(end_year);
             results python:container.portal_catalog(
                portal_type='News Item',
                sort_on='Date',
                sort_order='reverse',
                review_state='published',
                id=prefLang,
                created={ 'query' : [DateTime(start_year,1,1), DateTime(end_year,12,31)], 'range':'min:max'}
                );
             results python:[r for r in results if r.getObject()];
             Batch python:modules['Products.CMFPlone'].Batch;
             b_start python:request.get('b_start',0);
             portal_discussion nocall:here/portal_discussion;
             isDiscussionAllowedFor nocall:portal_discussion/isDiscussionAllowedFor;
             getDiscussionFor nocall:portal_discussion/getDiscussionFor;
             home_url python: mtool.getHomeUrl;
             localized_time python: modules['Products.CMFPlone.PloneUtilities'].localized_time;">
    ...
</div>

Query by language

You can query by language:

portal_catalog({"Language":"en"})

Note

Products.LinguaPlone must be installed.

Combining queries using Boolean operators

See AdvancedQuery.

Example:

from Products import AdvancedQuery

portal_catalog = self.portal_catalog # Acquire portal_catalog from higher hierarchy level

path = self.getPhysicalPath() # Limit the search to the current folder and its children

# object.getPhysicalPath() returns the path as tuples of path parts
# Convert path to string
path = "/".join(path)

# Limit search to path in the current context object and
# match all children implementing either of two interfaces
# AdvancedQuery operations can be combined using Python expressions & | and ~
# or AdvancedQuery objects
query = AdvancedQuery.Eq("path", path) & (AdvancedQuery.Eq("getMyIndexGetter1", "foo") | AdvancedQuery.Eq("getMyIndexGetter2", "bar"))

# The following result variable contains iterable of CatalogBrain objects
results = portal_catalog.evalAdvancedQuery(query)

# Convert the catalog brains to a Python list containing tuples of object unique ID and Title
pairs = []
for nc in results:
    pairs.append((nc["UID"], nc["Title"]))


# Another example, from inside a view method:
# diagnose_path and text_query_target are supplied by the calling code
from Products.AdvancedQuery import Eq

query = Eq("path", diagnose_path) & Eq("SearchableText", text_query_target)

return self.context.portal_catalog.evalAdvancedQuery(query)

Sorting results

A portal_catalog query takes a sort_on argument which names the index used for sorting. sort_order defines the sort direction; set it to the string “reverse” for descending order.

Sorting is supported only on FieldIndexes. Due to the nature of searchable text indexes (they index split text, not strings), they cannot be used for sorting. For example, to sort by title, an index called sortable_title should be used.

Example of how to sort by id:

results = context.portal_catalog.searchResults(sort_on="id",
                                               portal_type="Document",
                                               sort_order="reverse")

Unique values

ZCatalog has a uniqueValuesFor() method to retrieve all unique values for a certain index. It is intended to work on FieldIndexes only.

Example:

# getArea() is the Archetypes accessor for the area field,
# which is a string and tells the content area.
# Custom getArea FieldIndex indexes these values
# to portal catalog.
# The following line gives all area values
# inputted on the site.
areas = portal_catalog.uniqueValuesFor("getArea")