Komodia's Redirector classification server integration

Background

Komodia's classification server (URL server) is integrated into Komodia's Redirector. For it to work, a key entry must be embedded in the code; this can only be done by Komodia.

If the key is not embedded, the code that handles classification is not compiled and will not work in your version.

Features

The module inside the SDK has the following features:

  • Detects which protocol to use with the classification server, binary or HTTP.
  • Detects the best (closest) server to work with.
  • On server failure, falls back to the next best server.
  • Site categories are managed according to the server replies: a site that the server reports as master (one category for the entire site) is not queried again, which improves latency.
  • Cached results are kept for one hour.
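
The caching behaviour in the list above can be pictured with a minimal sketch. It is not the SDK's implementation: the class name CategoryCache, its methods, and the choice to store master replies under the host name are all assumptions made for illustration. Only the one-hour lifetime and the master-site idea come from the list.

```python
import time
from urllib.parse import urlsplit

CACHE_TTL_SECONDS = 3600  # the one-hour lifetime stated in the feature list


class CategoryCache:
    """Hypothetical cache; not the SDK's actual data structure."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (category string, expiry time)

    def put(self, url, categories, is_master):
        # Assumption: a "master" reply covers the whole host, so it is
        # stored under the host name; other replies are stored per URL.
        key = urlsplit(url).hostname if is_master else url
        self._entries[key] = (categories, self._clock() + CACHE_TTL_SECONDS)

    def get(self, url):
        # Check the exact URL first, then a host-wide (master) entry.
        for key in (url, urlsplit(url).hostname):
            entry = self._entries.get(key)
            if entry is None:
                continue
            categories, expires_at = entry
            if self._clock() < expires_at:
                return categories
            del self._entries[key]  # entry is older than one hour
        return None
```

With this layout, a second URL on a master site is answered from the cache without a new server query, which is where the latency gain would come from.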

How it works

Category structure

Categories arrive as a string of comma-delimited IDs, using the category IDs of the classification server (URL server categories).

For example, the result:

47,70

means news and sports.
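
A minimal sketch of splitting such a string into numeric IDs follows. The function name parse_categories is hypothetical, and treating an empty string as "no category yet" is an assumption based on the empty category field described elsewhere in this page.

```python
def parse_categories(category_field):
    """Split a comma-delimited category string into integer IDs."""
    if not category_field:
        return []  # assumption: an empty field means no category is known yet
    return [int(part) for part in category_field.split(",") if part.strip()]
```

For the example above, parse_categories("47,70") yields the two IDs 47 and 70.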

At request

The SDK checks whether the category is in the cache; if it is, it supplies the cached result.

If the category is not cached, the SDK checks the extension of the requested URL. If the extension is excluded, it returns an "excluded" indication (category number 254). Otherwise it initiates a query and the category field is empty.

To avoid unnecessary delays, requests are never held, even if the site classification is not yet known. For most sites, the category should be available by the time the web server replies, that is, at the reply stage.
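
The request-stage decision can be sketched as below. This is an illustration only: the function name, the dict used as a cache, the start_query callback and the list of excluded extensions are all hypothetical (the actual excluded extensions are not listed here). Only the order of the checks and the number 254 come from the text.

```python
import posixpath
from urllib.parse import urlsplit

EXCLUDED_CATEGORY = "254"  # the "excluded" category number given in the text
EXCLUDED_EXTENSIONS = {".css", ".js", ".png"}  # hypothetical list


def category_at_request(url, cache, start_query):
    """Return the category field the handler would see at request time."""
    cached = cache.get(url)
    if cached is not None:
        return cached  # 1. cached result
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    if extension in EXCLUDED_EXTENSIONS:
        return EXCLUDED_CATEGORY  # 2. excluded extension, no query needed
    start_query(url)  # 3. start a query; the request itself is not held
    return ""  # the category field stays empty until the server answers
```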

COM Interface

The category is passed in bRedirectTo at the call to NewRequest (see Komodia's Redirector COM framework guide#NewRequest).

DLL Interface

The category is passed in ppNewHeader (if *ppNewHeader is not null, it contains the category) at the call to HTTPRequestBeforeSend (see Komodia's Redirector DLL framework guide#HTTPRequestBeforeSend).

Don't modify or delete the category string pointer; doing so will corrupt memory.

At reply

The SDK checks whether the category is in the cache (as a result of this query or a past query); if it is, it supplies the cached result.

If the category is not cached, the SDK checks the extension of the requested URL. If the extension is excluded, it returns an "excluded" indication (category number 254). If the extension is not excluded and the server has not replied yet, the category field is empty and the programmer can ask the SDK to wait for a result by returning hrWaitForCategory (COM) or dhrWaitForCategory (DLL).

COM Interface

The category is passed in bRedirectTo at the call to NewReply (see Komodia's Redirector COM framework guide#NewReply).

DLL Interface

The category is passed in ppNewHeader (if *ppNewHeader is not null, it contains the category) at the call to HTTPRequestBeforeReply (see Komodia's Redirector DLL framework guide#HTTPRequestBeforeReply).

Flow

  1. Receive the request notification. The category can be:
    1. An actual category, which lets you allow or block the request.
    2. The whitelist category: the extension is not classified.
    3. No category: wait for the category at reply.
  2. (Optional) Receive the reply notification for partial content. The category can be:
    1. An actual category, which lets you allow or block the request.
    2. The whitelist category (based on content-type): the extension is not classified.
    3. No category: wait for the full reply by returning hrNothing (COM) or dhrNothing (DLL).
  3. Receive the full reply notification. The category can be:
    1. An actual category, which lets you allow or block the request.
    2. The whitelist category (based on content-type): the extension is not classified.
    3. No category: return hrWaitForCategory (COM) or dhrWaitForCategory (DLL) to wait for the classification result; the function is then called again with the category.
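
The reply-stage part of this flow can be sketched as a small decision function. The sketch only mirrors the return codes named above; the Action enum, the BLOCK value, the blocked-ID policy and the function name are hypothetical, and the real handlers are the COM and DLL callbacks, not Python functions.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "hrNothing / dhrNothing"
    WAIT_FOR_CATEGORY = "hrWaitForCategory / dhrWaitForCategory"
    BLOCK = "block"  # hypothetical; stands in for whatever blocks a reply


BLOCKED_IDS = {"70"}  # hypothetical policy: block one category ID


def on_reply(category_field, is_full_reply):
    """Decide what to return for a partial or full reply notification."""
    if not category_field:
        # No category yet: let a partial reply pass and wait on the full one.
        return Action.WAIT_FOR_CATEGORY if is_full_reply else Action.NOTHING
    ids = {part.strip() for part in category_field.split(",")}
    if ids & BLOCKED_IDS:
        return Action.BLOCK
    return Action.NOTHING  # allowed categories pass through unchanged
```

After returning the wait code on a full reply, the handler would be called again with a non-empty category field, so the same function covers the second call.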

Protocol

The SDK implements both the HTTP and the binary protocols. It tries the binary protocol first and falls back to HTTP if a firewall blocks it.
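
The fallback order can be illustrated with a short sketch. The transports are passed in as plain callables because the wire formats are not described here; the function name and the (name, query) pairing are assumptions, not part of the SDK.

```python
def classify_with_fallback(url, transports):
    """Try each (name, query) pair in order; return (name, reply)."""
    last_error = None
    for name, query in transports:
        try:
            return name, query(url)
        except OSError as error:  # e.g. connection refused or timed out
            last_error = error  # remember the failure, try the next protocol
    raise ConnectionError("no protocol reached the server") from last_error
```

Called with the binary transport first and the HTTP transport second, this returns the binary reply when it is reachable and the HTTP reply otherwise.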

Client side PerPageSDK

When the client side PerPageSDK is present in the SDK, the classification will work as normal, with the following differences:

  • If you return dhrSkip or atSkip, the SDK will decide whether to hold the page or not.
  • If you return dhDontDoParental or atDontDoParental the PerPageSDK will not process the page.
  • If you return dhNothing or atNothing, the PerPageSDK will process the data if needed.

Flags

  • kcsppenabled - Set to 1 in order to enable per page classification
  • kcsppbreakdown - Set to 1 in order to distinguish between the server's response and the client's response; the response will be in the form of:
Server= ID, ID Client=ID: Weight, ID: Weight
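
A rough sketch of parsing a breakdown string of that form follows. The exact grammar is not specified here, so the sketch assumes the literal labels "Server=" and "Client=", integer IDs, and numeric weights; the function name and the sample string in the usage note are hypothetical.

```python
import re


def parse_breakdown(text):
    """Split a breakdown string into server IDs and client ID weights."""
    match = re.fullmatch(r"\s*Server=(.*?)\s*Client=(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("unexpected breakdown format")
    server_part, client_part = match.groups()
    server_ids = [int(p) for p in server_part.split(",") if p.strip()]
    client_weights = {}
    for pair in client_part.split(","):
        if not pair.strip():
            continue  # tolerate an empty client section
        category_id, weight = pair.split(":")
        # Assumption: weights are plain numbers; they are kept as floats.
        client_weights[int(category_id)] = float(weight)
    return server_ids, client_weights
```

For a made-up input such as "Server= 47, 70 Client=47: 80, 70: 20", this returns the server IDs as a list and the client weights as a dictionary keyed by category ID.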