Richie Editions Search
Developer guide for website implementers
Richie Editions Search is a REST-JSON API for finding words and phrases in issues stored on the Richie Search server. It may be called from native applications as well as used as a companion service to Richie Editions HTML5.
The Search API itself does not require authentication. If you wish to restrict searching to logged-in users, you may implement restrictions on your own site. However, it may be beneficial to provide search functionality to all users, and redirect non-logged-in users to a subscription or payment flow when they select an issue to open.
When used with Richie Editions HTML5, you must generate sign-in links to access issues discovered by searches when an authorized user selects an issue to view. Generating sign-in links is described in the Single Sign-On documentation.
The Search API is available on the general Richie API server appdata.richie.fi. For example, to access version 1 of the search endpoint, use the URL https://appdata.richie.fi/search/v1.
API Endpoints
Phrase search
Path: /search/v1
Description
The Search API searches the document collection for the given query. It returns metadata for the matching documents, together with a headline containing the words around each match.
The text to be searched for in the document index is provided in the query parameter q. The query is split into words at each whitespace sequence, and very common words in the language of the document collection (stop words) are discarded. Each whitespace sequence in the query acts as a Boolean AND operator, so a matching document must contain every word in the query that is not a stop word. The order of the words is not significant.
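The matching rule above can be modelled in a few lines. This is a simplified client-side illustration, not the server's implementation: the stop word set is a hypothetical placeholder, case-insensitive comparison is an assumption, and stemming is not modelled at all.

```python
# Simplified model of the documented rule: drop stop words, then every
# remaining query word must occur in the document; word order is irrelevant.
# Stemming is NOT modelled here.

# Hypothetical stop word list for illustration only; the real list depends
# on the language of the document collection.
STOP_WORDS = {"ja", "on", "the", "and"}


def query_terms(q: str, stop_words: set = STOP_WORDS) -> list:
    """Split the query at each whitespace sequence and drop stop words."""
    return [w for w in q.split() if w.lower() not in stop_words]


def matches(q: str, document_words: set) -> bool:
    """Whitespace acts as Boolean AND: all non-stop-word terms must be present.
    Case-insensitive comparison is an assumption of this sketch."""
    words = {w.lower() for w in document_words}
    return all(term.lower() in words for term in query_terms(q))


print(query_terms("Töölönlahti ja  Kamppi"))  # ['Töölönlahti', 'Kamppi']
print(matches("Kamppi Töölönlahti", {"töölönlahti", "kamppi", "asema"}))  # True
```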
Also, the words are reduced to forms without suffixes while performing the search. For example, for Finnish-language content in the collection, a query with a word "lentoasema" will also match documents with possessive forms ("lentoaseman"), plural forms ("lentoasemat") or both possessive and plural forms ("lentoasemien") of word "lentoasema".
To restrict the search to a given time range, the parameters from and until can be used to specify the first and last publication dates included in the search. By default, all the issues in the database are searched regardless of their publication dates.
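A small helper can produce the YYYYMMDD values expected by the date-range parameters (the query parameter list below names the upper bound until). This is a sketch; the function names are illustrative, and omitted bounds are simply left out so that the default of searching all issues applies.

```python
from datetime import date
from typing import Dict, Optional


def date_param(d: date) -> str:
    """Format a date in the YYYYMMDD form, e.g. 19820101."""
    return d.strftime("%Y%m%d")


def date_range_params(first: Optional[date] = None,
                      last: Optional[date] = None) -> Dict[str, str]:
    """Build the optional 'from'/'until' parameters; missing bounds are omitted."""
    if first is not None and last is not None and first > last:
        raise ValueError("first date must not be after last date")
    params: Dict[str, str] = {}
    if first is not None:
        params["from"] = date_param(first)
    if last is not None:
        params["until"] = date_param(last)
    return params


print(date_range_params(date(1982, 1, 1), date(2017, 4, 30)))
# {'from': '19820101', 'until': '20170430'}
```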
The output is paginated. For instance, with the default page_offset and page_size, the API returns the 25 best-matching or newest documents in the store, and more matches can be fetched with a page_offset larger than 0.
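The following sketch walks through all result pages. It assumes a caller-supplied fetch_page(page_offset, page_size) function (hypothetical) that performs the request and returns the decoded JSON. This guide does not say whether page_offset counts pages or individual results, so the increment is a parameter. Stopping at the first short page is also an assumption; the total_count field seen in the example response could be used instead.

```python
from typing import Callable, Iterator


def iter_matches(
    fetch_page: Callable[[int, int], dict],
    page_size: int = 25,
    offset_step: int = 1,
) -> Iterator[dict]:
    """Yield matches page by page until a short or empty page is returned.

    offset_step: use 1 if page_offset counts pages, or page_size if it
    counts individual results (an assumption; check against the server).
    """
    page_offset = 0
    while True:
        matches = fetch_page(page_offset, page_size).get("matches", [])
        yield from matches
        if len(matches) < page_size:
            break
        page_offset += offset_step
```

Passing the fetch function in keeps the paging logic independent of the HTTP client, which also makes it easy to test with canned responses.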
Query parameters
products: comma-separated list of products to search in, like foo,archive
q: the search query (mandatory)
order_with_dir: the requested order and direction (rank, date_asc or date_desc). If no value is specified, rank is assumed.
from: the first date included in the search (optional, YYYYMMDD, like 19820101)
until: the last date included in the search (optional, YYYYMMDD, like 20170430)
page_size: the number of results per page (optional, default 25)
page_offset: (optional, default 0)
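A request URL for this endpoint can be assembled with the Python standard library, as in this sketch. The function and constant names are illustrative. Parameters that are not given are omitted so that server defaults apply, and "/" and "," are left unescaped so product lists stay readable, as in the example request below.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://appdata.richie.fi/search/v1"
ORDERS = {"rank", "date_asc", "date_desc"}


def search_url(
    q: str,
    products: Optional[List[str]] = None,
    order_with_dir: Optional[str] = None,
    date_from: Optional[str] = None,   # YYYYMMDD
    date_until: Optional[str] = None,  # YYYYMMDD
    page_size: Optional[int] = None,
    page_offset: Optional[int] = None,
) -> str:
    """Build a phrase search URL; only q is mandatory."""
    if not q:
        raise ValueError("q is mandatory")
    if order_with_dir is not None and order_with_dir not in ORDERS:
        raise ValueError(f"order_with_dir must be one of {sorted(ORDERS)}")
    params = {
        "q": q,
        "products": ",".join(products) if products else None,
        "order_with_dir": order_with_dir,
        "from": date_from,
        "until": date_until,
        "page_size": page_size,
        "page_offset": page_offset,
    }
    # Drop parameters that were not supplied so the server defaults apply.
    present = {k: v for k, v in params.items() if v is not None}
    return f"{BASE_URL}?{urlencode(present, safe='/,', quote_via=quote)}"


print(search_url("Töölönlahti", products=["hsarchive/archive-hs"],
                 order_with_dir="date_asc", page_size=1))
```

With these arguments the sketch reproduces the query string of the example request that follows.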
Example
Fetch metadata of the oldest document containing Töölönlahti or a closely related word in documents belonging to the product called hsarchive/archive-hs.
GET /search/v1?q=T%C3%B6%C3%B6l%C3%B6nlahti&products=hsarchive/archive-hs&order_with_dir=date_asc&page_size=1 HTTP/1.1
HTTP/1.1 200 OK
Content-Length: 855
Content-Type: application/json; charset=utf-8
Date: Wed, 31 May 2017 18:45:53 GMT
{
"direction": "asc",
"matches": [
{
"headline": "<b>Töölönlahteen</b>. Sokeritehdas aloitti Töölönlahden rannalla 1823. Uuden ympäristöohjelman tavoitteena on palauttaa <b>Töölönlahti</b> uimakelpoiseksi. Suunnitelmista",
"issue_uuid": "99ea7ce2-0155-48fe-88bb-8f1de06e41e6",
"metadata": {
"name": "28.05.1994",
"orientation": "portrait",
"thumbnail": {
"height": 240,
"url": "https://appdata.richie.fi/maggio/issue_42c74449-8042-41ff-823f-5c0080915744_issue/issue_html5_scaled_max.tar/13b85e51-006b-42fd-bc1a-194be8006628_p48/page_thumbnail.jpg",
"width": 176
}
},
"page": 49,
"product": "hsarchive/archive-hs",
"published_at": "19940528",
"rank": 5
}
],
"order": "date",
"params": {
"q": "Töölönlahti",
"page_offset": 0,
"page_size": 1
},
"total_count": 928
}
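In the example, the headline marks matched words with <b> tags. The sketch below extracts those words and renders the headline as HTML. It assumes, without confirmation from this guide, that the rest of the headline is plain (unescaped) text. Everything is therefore escaped first, and only the <b> markup is restored, so unexpected markup is not injected into your page.

```python
import html
import re

BOLD = re.compile(r"<b>(.*?)</b>")


def highlighted_words(headline: str) -> list:
    """Return the text wrapped in <b>...</b> in a search headline."""
    return BOLD.findall(headline)


def headline_to_safe_html(headline: str) -> str:
    """Escape the headline, then restore only the <b> highlighting tags.
    Assumes the headline is plain text apart from <b> markup."""
    escaped = html.escape(headline)
    return escaped.replace("&lt;b&gt;", "<b>").replace("&lt;/b&gt;", "</b>")
```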
Response fields
order: Either rank or date, depending on whether the response is ordered by ranking (relevance) or by publication date. If the value is date, the user might get better-matching results by using a more specific query.
direction: asc or desc, depending on whether the results are in ascending or descending order.
params: The parameters provided in the request.
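A typed wrapper keeps client code readable. This sketch parses the fields described above, plus matches and total_count as they appear in the example response. The class and property names are hypothetical. The narrower-query hint follows the note on order literally, so it is not meaningful when date ordering was explicitly requested.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SearchResponse:
    order: str       # "rank" or "date"
    direction: str   # "asc" or "desc"
    params: dict     # echo of the request parameters
    matches: list = field(default_factory=list)
    total_count: int = 0

    @property
    def suggest_narrower_query(self) -> bool:
        # Per the guide: date-ordered results may match better with a more
        # specific query. Ignore this if you asked for date ordering yourself.
        return self.order == "date"


def parse_search_response(body: str) -> SearchResponse:
    data = json.loads(body)
    return SearchResponse(
        order=data["order"],
        direction=data["direction"],
        params=data.get("params", {}),
        matches=data.get("matches", []),
        total_count=data.get("total_count", 0),
    )
```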
Each error response carries a JSON object with an error field. For example:
GET /search/v1
HTTP/1.1 400 Bad Request
Content-Length: 44
Content-Type: application/json; charset=utf-8
Date: Fri, 26 May 2017 16:15:45 GMT
{
"error": "Missing mandatory parameter 'q'"
}
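The following is a minimal way to surface these error messages with the standard library. SearchError and the helper names are illustrative. If an error body is not the expected JSON, the sketch falls back to the raw text.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


class SearchError(Exception):
    """Raised when the Search API answers with an error status."""


def error_message(body: bytes) -> str:
    """Extract the 'error' field from an error response body."""
    try:
        return str(json.loads(body.decode("utf-8"))["error"])
    except (ValueError, KeyError, TypeError):
        return body.decode("utf-8", errors="replace")


def get_json(url: str, timeout: float = 10.0) -> dict:
    """GET a Search API URL and decode the JSON body, raising SearchError on HTTP errors."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except HTTPError as err:
        raise SearchError(f"HTTP {err.code}: {error_message(err.read())}") from err
```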
Phrase coordinates
Path: /search/v1/matchCoordinates
Description
Returns the coordinates of a matching word on a page. These may be used to highlight matched words in the thumbnail shown to the user.
Query parameters
products: comma-separated list of products to search in, like foo,archive
q: the original search query
issue_uuid: the UUID of the issue containing the page
page: the number of the page
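The issue_uuid and page values come straight from a match in the phrase search response, as the two examples show (issue 99ea7ce2-… and page 49). This sketch builds the coordinates URL from them. The function name is illustrative.

```python
from typing import List
from urllib.parse import quote, urlencode

COORDS_URL = "https://appdata.richie.fi/search/v1/matchCoordinates"


def match_coordinates_url(products: List[str], q: str,
                          issue_uuid: str, page: int) -> str:
    """Build a matchCoordinates URL for one page of a search match."""
    params = {
        "products": ",".join(products),
        "q": q,  # reuse the original search query
        "issue_uuid": issue_uuid,
        "page": page,
    }
    return f"{COORDS_URL}?{urlencode(params, safe='/,', quote_via=quote)}"
```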
Example
GET /search/v1/matchCoordinates?products=hsarchive/archive-hs&q=T%C3%B6%C3%B6l%C3%B6nlahti&issue_uuid=99ea7ce2-0155-48fe-88bb-8f1de06e41e6&page=49 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Tue, 10 Oct 2017 12:07:13 GMT
Transfer-Encoding: chunked
{
"coordinates": [
[
[0.058258947, 0.13282979, 0.21115434, 0.13282979, 0.21115434, 0.15376516, 0.058258947, 0.15376516]
],
[
[0.07461662, 0.25288776, 0.08191043, 0.25288776, 0.08191043, 0.2558938, 0.07461662, 0.2558938]
],
[
[0.1191577, 0.28908217, 0.12631418, 0.28908217, 0.12631418, 0.2920882, 0.1191577, 0.2920882]
],
[
[0.058396276, 0.31206226, 0.0998703, 0.31206226, 0.0998703, 0.3164721, 0.058396276, 0.3164721]
],
[
[0.11885252, 0.32559702, 0.12616159, 0.32559702, 0.12616159, 0.32849622, 0.11885252, 0.32849622]
],
[
[0.14003204, 0.4051728, 0.1725948, 0.4051728, 0.1725948, 0.4094606, 0.14003204, 0.4094606],
[0.058258947, 0.4113985, 0.06877241, 0.4113985, 0.06877241, 0.41557947, 0.058258947, 0.41557947]
],
[
[0.08644236, 0.4532845, 0.12805371, 0.4532845, 0.12805371, 0.4576791, 0.08644236, 0.4576791]
]
]
}
Response fields
coordinates: An array of quadrilateral coordinates of word occurrences, relative to the dimensions of the page. Each quadrilateral is defined by 4 vertices, represented as an array of 8 numbers. The outermost array has an entry for each occurrence of the word. Each occurrence is represented by an array of coordinate arrays, since an occurrence may span multiple lines. Each innermost array holds the coordinates of the polygon containing the matching text.
In the example above, the page has 7 occurrences of a matching word. The 6th occurrence spans two lines, and each of the other occurrences is on a single line. Each 8-number array defines a polygon.
For example, the array
[0.14003204, 0.4051728, 0.1725948, 0.4051728, 0.1725948, 0.4094606, 0.14003204, 0.4094606]
defines a polygon whose vertices are the points (0.140, 0.405), (0.173, 0.405), (0.173, 0.409) and (0.140, 0.409) on the page, represented as an (x, y) plane.
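To draw highlights over a page image, the relative coordinates can be scaled to pixels. This sketch assumes that x is relative to the page width and y to the page height, and that the image's origin matches the page's. The guide does not state where the origin lies, so verify this against a real thumbnail. The 176 × 240 size in the usage line is taken from the example thumbnail metadata.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_points(quad: List[float]) -> List[Point]:
    """Turn [x1, y1, x2, y2, x3, y3, x4, y4] into four (x, y) vertices."""
    if len(quad) != 8:
        raise ValueError("expected 8 numbers per polygon")
    return [(quad[i], quad[i + 1]) for i in range(0, 8, 2)]


def to_pixels(quad: List[float], width: float, height: float) -> List[Point]:
    """Scale page-relative vertices to an image of the given pixel size."""
    return [(x * width, y * height) for x, y in to_points(quad)]


def highlight_polygons(coordinates: List[List[List[float]]],
                       width: float, height: float) -> List[List[Point]]:
    """Flatten occurrences -> per-line polygons into one pixel polygon per line."""
    return [to_pixels(quad, width, height)
            for occurrence in coordinates
            for quad in occurrence]


print(to_pixels([0, 0, 1, 0, 1, 1, 0, 1], 176, 240))
# [(0, 0), (176, 0), (176, 240), (0, 240)]
```

For the example response, highlight_polygons would return 8 polygons for 7 occurrences, because the 6th occurrence spans two lines.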