Richie Editions Search

Developer guide for website implementers

Richie Editions Search is a REST-JSON API for finding words and phrases in issues stored on the Richie Search server. It may be called from native applications as well as used as a companion service to Richie Editions HTML5.

The Search API itself does not require authentication. If you wish to restrict searching to logged-in users, you may implement restrictions on your own site. However, it may be beneficial to provide search functionality to all users, and redirect non-logged-in users to a subscription or payment flow when they select an issue to open.

When the Search API is used with Richie Editions HTML5, you must generate a sign-in link whenever an authorized user selects an issue found by a search, so that the user can open that issue. Generating sign-in links is described in the Single Sign-On documentation.

The Search API is available on the general Richie API server appdata.richie.fi. For example, to access version 1 of the search endpoint, use the URL https://appdata.richie.fi/search/v1.

API Endpoints

Path: /search/v1

Description

The Search API searches the document collection for the provided query. It returns the metadata of each matching document, together with a headline containing the words around the matching positions.

The text to be searched for in the document index is provided in the query parameter q. The query is split into words at each whitespace sequence, and words that are too common in the language of the document collection (stop words) are discarded. Each whitespace sequence in the query is treated as a Boolean AND operator, so a matching document has to contain every word in the query except the stop words. The order of the words is not significant.

Also, the words are reduced to forms without suffixes while performing the search. For example, for Finnish-language content in the collection, a query containing the word "lentoasema" will also match documents with possessive forms ("lentoaseman"), plural forms ("lentoasemat") or both possessive and plural forms ("lentoasemien") of the word "lentoasema".
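The matching rule described above can be illustrated with a small local sketch. It is not the server's implementation: the stop-word set is a hypothetical placeholder, the function names are made up for this example, and suffix reduction is not modelled at all.

```python
# Illustrative only: mimics the documented AND semantics on the client side.
# STOP_WORDS is a hypothetical placeholder; the real list lives on the server.
STOP_WORDS = {"ja", "on"}


def query_terms(q: str) -> list[str]:
    """Split the query at whitespace sequences and drop stop words."""
    return [word for word in q.split() if word not in STOP_WORDS]


def document_matches(document_words: set[str], q: str) -> bool:
    """A document matches when it contains every remaining query word."""
    return all(term in document_words for term in query_terms(q))


doc = {"sokeritehdas", "aloitti", "rannalla"}
print(query_terms("sokeritehdas   ja rannalla"))
print(document_matches(doc, "rannalla sokeritehdas"))         # word order is irrelevant
print(document_matches(doc, "sokeritehdas uimakelpoiseksi"))  # one word is missing
```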

To restrict the search to a given time range, the parameters from and until can be used to specify the first and last publication dates included in the search. By default, all the issues in the database are searched regardless of their publication dates.
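As a minimal sketch, the two date parameters could be produced from Python date objects as follows. The names from and until and the YYYYMMDD format are taken from the parameter list in this guide; the helper name date_range_params and the check that the range is not reversed are assumptions made for illustration.

```python
from datetime import date


def date_range_params(first: date, last: date) -> dict[str, str]:
    """Format an inclusive date range as the YYYYMMDD strings the API expects."""
    if first > last:
        raise ValueError("first date must not be after last date")
    return {
        "from": first.strftime("%Y%m%d"),
        "until": last.strftime("%Y%m%d"),
    }


print(date_range_params(date(1982, 1, 1), date(2017, 4, 30)))
# {'from': '19820101', 'until': '20170430'}
```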

The output is paginated. For instance, with the default page_offset and page_size, the API returns the 25 best-matching or newest documents in the store, and more matches can be queried with a page_offset larger than 0.
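Walking through all pages could look roughly like the sketch below. This guide does not say whether page_offset counts pages or documents, so the step is a parameter: the default of 1 assumes pages, which is an unverified assumption. The fetch_page callable and the fake_fetch stand-in are hypothetical; a real implementation would perform the HTTP request there.

```python
from typing import Callable, Iterator


def iter_matches(
    fetch_page: Callable[[int, int], dict],
    page_size: int = 25,
    offset_step: int = 1,
) -> Iterator[dict]:
    """Yield matches page by page until the results run out.

    ASSUMPTION: page_offset counts pages, so it grows by 1 per request.
    If it counts documents instead, pass offset_step=page_size.
    """
    offset = 0
    seen = 0
    while True:
        response = fetch_page(offset, page_size)
        matches = response.get("matches", [])
        if not matches:
            return
        yield from matches
        seen += len(matches)
        total = response.get("total_count")
        if total is not None and seen >= total:
            return
        offset += offset_step


# Stand-in for the real request: serves 60 fake matches, page by page.
def fake_fetch(page_offset: int, page_size: int) -> dict:
    items = [{"rank": i} for i in range(60)]
    start = page_offset * page_size
    return {"matches": items[start:start + page_size], "total_count": len(items)}


print(len(list(iter_matches(fake_fetch))))  # 60
```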

Query parameters

  • products for comma-separated list of products to search in, like foo,archive
  • q for query, mandatory
  • order_with_dir for requesting an order and a direction (rank, date_asc, date_desc). If no value is specified, rank is assumed.
  • from for first date included in the search (optional, YYYYMMDD like 19820101)
  • until for the last date included in the search (optional, YYYYMMDD like 20170430)
  • page_size (optional, default 25)
  • page_offset (optional, default 0)
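A request URL can be assembled from these parameters with the Python standard library, for example as in the sketch below. The helper name build_search_url and its keyword arguments are made up for this example (date_from stands in for from, which is a reserved word in Python); only the parameter names sent to the server come from the list above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://appdata.richie.fi/search/v1"


def build_search_url(q, products=None, order_with_dir=None,
                     date_from=None, until=None,
                     page_size=None, page_offset=None):
    """Return a search URL containing only the parameters that were given."""
    if not q:
        raise ValueError("q is mandatory")
    candidates = {
        "q": q,
        "products": ",".join(products) if products else None,
        "order_with_dir": order_with_dir,
        "from": date_from,
        "until": until,
        "page_size": page_size,
        "page_offset": page_offset,
    }
    params = {name: value for name, value in candidates.items() if value is not None}
    # safe="/," leaves slashes and commas in product lists unescaped.
    return f"{SEARCH_URL}?{urlencode(params, safe='/,')}"


print(build_search_url("Töölönlahti", products=["hsarchive/archive-hs"],
                       order_with_dir="date_asc", page_size=1))
```

The printed URL uses the same query string as the example request in this guide.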

Example

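Fetch the metadata of the oldest document containing Töölönlahti or a closely related word among the documents belonging to the product called hsarchive/archive-hs. The results are requested in ascending date order (date_asc), so the first match is the oldest one.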

GET /search/v1?q=T%C3%B6%C3%B6l%C3%B6nlahti&products=hsarchive/archive-hs&order_with_dir=date_asc&page_size=1 HTTP/1.1
HTTP/1.1 200 OK
Content-Length: 855
Content-Type: application/json; charset=utf-8
Date: Wed, 31 May 2017 18:45:53 GMT
 
{
    "direction": "asc",
    "matches": [
        {
            "headline": "<b>Töölönlahteen</b>. Sokeritehdas aloitti Töölönlahden rannalla 1823. Uuden ympäristöohjelman tavoitteena on palauttaa <b>Töölönlahti</b> uimakelpoiseksi. Suunnitelmista",
            "issue_uuid": "99ea7ce2-0155-48fe-88bb-8f1de06e41e6",
            "metadata": {
                "name": "28.05.1994",
                "orientation": "portrait",
                "thumbnail": {
                    "height": 240,
                    "url": "https://appdata.richie.fi/maggio/issue_42c74449-8042-41ff-823f-5c0080915744_issue/issue_html5_scaled_max.tar/13b85e51-006b-42fd-bc1a-194be8006628_p48/page_thumbnail.jpg",
                    "width": 176
                }
            },
            "page": 49,
            "product": "hsarchive/archive-hs",
            "published_at": "19940528",
            "rank": 5
        }
    ],
    "order": "date",
    "params": {
        "q": "Töölönlahti",
        "page_offset": 0,
        "page_size": 1
    },
    "total_count": 928
}

Response fields

  • order: Either rank or date, depending on whether response is ordered by ranking (relevancy) or publishing date. If the value is date, the user might get better-matching results by using a more specific query.
  • direction: asc or desc, depending on whether the results are in ascending or descending order.
  • params: The parameters provided in the request.
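A client might decode such a response along the following lines. The JSON below is an abbreviated copy of the example response (shortened headline, no metadata). Reading published_at as a YYYYMMDD date is inferred from that example rather than from a field description, and the helper name summarize is hypothetical.

```python
import json
from datetime import datetime

# Abbreviated copy of the example response in this guide.
raw = """
{
    "direction": "asc",
    "matches": [
        {
            "headline": "<b>Töölönlahteen</b>. Sokeritehdas aloitti",
            "issue_uuid": "99ea7ce2-0155-48fe-88bb-8f1de06e41e6",
            "page": 49,
            "product": "hsarchive/archive-hs",
            "published_at": "19940528",
            "rank": 5
        }
    ],
    "order": "date",
    "total_count": 928
}
"""


def summarize(response: dict) -> list[tuple]:
    """Return (issue_uuid, page, publication date) for every match."""
    rows = []
    for match in response["matches"]:
        # ASSUMPTION: published_at uses the YYYYMMDD format seen in the example.
        published = datetime.strptime(match["published_at"], "%Y%m%d").date()
        rows.append((match["issue_uuid"], match["page"], published))
    return rows


response = json.loads(raw)
print(response["order"], response["direction"], response["total_count"])
for issue_uuid, page, published in summarize(response):
    print(issue_uuid, page, published.isoformat())
```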

Each error response carries a JSON object with an error field describing the problem. For example:

GET /search/v1
HTTP/1.1 400 Bad Request
Content-Length: 44
Content-Type: application/json; charset=utf-8
Date: Fri, 26 May 2017 16:15:45 GMT
 
{
    "error": "Missing mandatory parameter 'q'"
}
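On the client side, error responses can be turned into exceptions, as in this sketch. The SearchError class and the decode_response helper are hypothetical names. Treating any status of 400 or above as an error is an assumption, since only the 400 case is shown in this guide.

```python
import json


class SearchError(Exception):
    """Raised when the API answers with an error object (hypothetical helper)."""


def decode_response(status: int, body: str) -> dict:
    """Decode a response body, raising SearchError for error responses."""
    payload = json.loads(body)
    # ASSUMPTION: every status >= 400 carries the documented error object.
    if status >= 400 or "error" in payload:
        raise SearchError(f"HTTP {status}: {payload.get('error', 'unknown error')}")
    return payload


try:
    decode_response(400, '{"error": "Missing mandatory parameter \'q\'"}')
except SearchError as exc:
    print(exc)  # HTTP 400: Missing mandatory parameter 'q'
```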

Phrase coordinates

Path: /search/v1/matchCoordinates

Description

Returns the coordinates of a matching word on a page. These may be used to highlight matched words in the thumbnail shown to the user.

Query parameters

  • products for comma-separated list of products to search in, like foo,archive
  • q for the original search query.
  • issue_uuid for the UUID of the issue containing the page.
  • page for the number of the page.
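Since these parameters mirror the fields of a search match, the coordinates URL can be derived from a match object, as sketched below. The helper name build_coordinates_url is hypothetical. Passing the match's product and page values through unchanged is an assumption based on the example requests in this guide.

```python
from urllib.parse import urlencode

COORDINATES_URL = "https://appdata.richie.fi/search/v1/matchCoordinates"


def build_coordinates_url(q: str, match: dict) -> str:
    """Build a matchCoordinates URL for one match returned by /search/v1."""
    params = {
        "products": match["product"],
        "q": q,  # the original search query, not the highlighted headline
        "issue_uuid": match["issue_uuid"],
        "page": match["page"],
    }
    return f"{COORDINATES_URL}?{urlencode(params, safe='/,')}"


match = {
    "issue_uuid": "99ea7ce2-0155-48fe-88bb-8f1de06e41e6",
    "page": 49,
    "product": "hsarchive/archive-hs",
}
print(build_coordinates_url("Töölönlahti", match))
```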

Example

GET /search/v1/matchCoordinates?products=hsarchive/archive-hs&q=T%C3%B6%C3%B6l%C3%B6nlahti&issue_uuid=99ea7ce2-0155-48fe-88bb-8f1de06e41e6&page=49 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Tue, 10 Oct 2017 12:07:13 GMT
Transfer-Encoding: chunked
 
{
  "coordinates": [
    [
      [0.058258947,  0.13282979,  0.21115434,  0.13282979,  0.21115434,  0.15376516,  0.058258947,  0.15376516]
    ],
    [
      [0.07461662,  0.25288776,  0.08191043,  0.25288776,  0.08191043,  0.2558938,  0.07461662,  0.2558938]
    ],
    [
      [0.1191577,  0.28908217,  0.12631418,  0.28908217,  0.12631418,  0.2920882,  0.1191577,  0.2920882]
    ],
    [
      [0.058396276,  0.31206226,  0.0998703,  0.31206226,  0.0998703,  0.3164721,  0.058396276,  0.3164721]
    ],
    [
      [0.11885252,  0.32559702,  0.12616159,  0.32559702,  0.12616159,  0.32849622,  0.11885252,  0.32849622]
    ],
    [
      [0.14003204,  0.4051728,  0.1725948,  0.4051728,  0.1725948,  0.4094606,  0.14003204,  0.4094606],
      [0.058258947,  0.4113985,  0.06877241,  0.4113985,  0.06877241,  0.41557947,  0.058258947,  0.41557947]
    ],
    [
      [0.08644236,  0.4532845,  0.12805371,  0.4532845,  0.12805371,  0.4576791,  0.08644236,  0.4576791]
    ]
  ]
}

Response fields

  • coordinates: An array of quadrilaterals marking word occurrences, with positions relative to the dimensions of the page. The outermost array has one entry for each occurrence of the word. Each occurrence is an array of coordinate arrays, since an occurrence may span multiple lines. Each innermost array holds 8 numbers: the (x, y) pairs of the 4 vertices of the polygon containing the matching text.

In the example above, the page has 7 occurrences of a matching word. The 6th occurrence spans two lines, and each of the other occurrences is on a single line.

For example, [0.14003204, 0.4051728, 0.1725948, 0.4051728, 0.1725948, 0.4094606, 0.14003204, 0.4094606] is a polygon whose vertices are the points (0.140, 0.405), (0.173, 0.405), (0.173, 0.409) and (0.140, 0.409) on the page represented as an (x, y) plane.
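To highlight matches on a thumbnail, the relative coordinates have to be scaled to pixels. The sketch below assumes that x is a fraction of the image width and y a fraction of the image height, both measured from the same corner as the image's pixel grid; this guide states only that the values are relative to the page dimensions. The function name to_pixel_polygons is hypothetical, and 176 by 240 is the thumbnail size from the search example.

```python
def to_pixel_polygons(coordinates, width, height):
    """Scale relative quadrilaterals to pixel vertices.

    Returns one list per occurrence, each holding one polygon per line,
    where a polygon is a list of four (x, y) pixel tuples.
    """
    occurrences = []
    for occurrence in coordinates:
        polygons = []
        for quad in occurrence:
            xs, ys = quad[0::2], quad[1::2]  # even indexes are x, odd are y
            polygons.append(
                [(round(x * width), round(y * height)) for x, y in zip(xs, ys)]
            )
        occurrences.append(polygons)
    return occurrences


# The 6th occurrence of the example response, which spans two lines.
sample = [
    [
        [0.14003204, 0.4051728, 0.1725948, 0.4051728,
         0.1725948, 0.4094606, 0.14003204, 0.4094606],
        [0.058258947, 0.4113985, 0.06877241, 0.4113985,
         0.06877241, 0.41557947, 0.058258947, 0.41557947],
    ]
]
pixels = to_pixel_polygons(sample, width=176, height=240)
print(pixels[0][0])  # [(25, 97), (30, 97), (30, 98), (25, 98)]
```

At thumbnail resolution a match is only a few pixels tall, so a client may want to pad the polygon or draw it on a larger rendition of the page.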
