Monitoring the Server

iFun Engine Counter

iFun Engine exports its internal state via a RESTful API. This feature is called Counter. The predefined counters are listed in Counters, and you can also add your own counters alongside them.

How to use the Counter

There are various counter-related functions in counters.h. Suppose we have a code snippet like this:

#include <funapi.h>

void example(){

  // Sets the item_count to 150.
  // You can choose a counter category, a counter name, and a value.
  UpdateCounter("server", "item_count", 150);

  // It is also possible to add a description.
  // UpdateCounter("server", "item_count", "The number of items", 150);

  // Increases "item_count" in the "server" category by 1.
  IncreaseCounterBy("server", "item_count", 1);

  // Decreases "item_count" in the "server" category by 1.
  DecreaseCounterBy("server", "item_count", 1);

  // Reads "item_count" in the "server" category.
  int64_t item_count = ReadCounterAsInteger("server", "item_count");
  BOOST_ASSERT(item_count == 150);

  // Floating point is also possible.
  UpdateCounter("server", "connection_per_second", 77.7);

  // Another counter category and name example.
  UpdateCounter("billing", "purchase_per_second", 7.1);
}

After writing the C++ code, RESTful APIs are automatically generated for the defined counters:

  • GET http://localhost:8014/v1/counters/: returns a list of all counter categories.

  • GET http://localhost:8014/v1/counters/funapi/: returns a list of counters in the reserved “funapi” category.

  • GET http://localhost:8014/v1/counters/server/item_count/: returns the counter “item_count” in the category “server”. It should be 150 in this example.

  • GET http://localhost:8014/v1/counters/billing/purchase_per_second/: returns the counter “purchase_per_second” in the “billing” category. It should be 7.1 in this example.

  • GET http://localhost:8014/v1/counters/server/item_count/description/: returns the description of the counter, if any. This is useful when you need to work with external engineers.
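
The URL pattern in the list above can be captured in a small helper. The following is a minimal sketch: CounterUrl is a hypothetical function written for this illustration and is not part of iFun Engine, and the base address is simply the one used in the examples above.

```cpp
#include <string>

// Hypothetical helper (not an iFun Engine API): builds the URL of a counter
// endpoint following the pattern /v1/counters/<category>/<name>/.
std::string CounterUrl(const std::string &base,
                       const std::string &category = "",
                       const std::string &name = "",
                       bool description = false) {
  std::string url = base + "/v1/counters/";
  if (category.empty()) {
    return url;  // Lists all counter categories.
  }
  url += category + "/";
  if (name.empty()) {
    return url;  // Lists the counters in the given category.
  }
  url += name + "/";
  if (description) {
    url += "description/";  // Asks for the counter's description instead.
  }
  return url;
}
```

For example, CounterUrl("http://localhost:8014", "server", "item_count") yields the third URL in the list above.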

Complex Counters with Callbacks

Some counters may be complex enough that their value must be calculated dynamically. Suppose we want to compute an average value on demand while storing only the sum and the count. In this case, we can add a callback for a counter. The callback is triggered whenever the counter is read. Please see the example below:

#include <funapi.h>

http::StatusCode OnAverageQueried(const string &counter_group, const string &counter_name, Json *ret) {
   // If multiple counters share the same callback, you can distinguish using
   // "counter_group" and "counter_name" parameters.

   // In this example,
   // "counter_group" must be "server" as we specified the group name below.
   // Similarly, "counter_name" must contain "average_users_per_room".

   // Say, we are keeping track of total_users and total_rooms as global variables.

   // To make it more RESTful, we return kNoContent if there is no room.
   if (total_rooms == 0) {
     return http::kNoContent;
   }

   // Computes an average from the sum and the number.
   // The cast avoids integer division if both variables are integers.
   double average = static_cast<double>(total_users) / total_rooms;

   // Stores the result.
   ret->SetDouble(average);

   // Returns that the result is available.
   return http::kOk;
}


void example() {
  // Registers a counter "average_users_per_room" in the "server" category.
  // And makes the counter trigger a callback when it's read.
  RegisterCallableCounter(
    "server", "average_users_per_room",
    "Returns the average number of users per game room", OnAverageQueried);
}

  • GET http://localhost:8014/v1/counters/server/average_users_per_room/: This API invokes OnAverageQueried to get the average value.
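
One detail of the averaging step is worth isolating: if the sum and the count are kept as integers, dividing them directly truncates the result. The sketch below shows the computation on its own, without any iFun Engine types. The two globals and the function ComputeAverageUsersPerRoom are hypothetical stand-ins for this illustration; the source only says that total_users and total_rooms are global variables and does not state their types.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the globals mentioned in the callback example.
static std::int64_t total_users = 0;
static std::int64_t total_rooms = 0;

// Returns false when there is no room (the case the callback maps to
// http::kNoContent). Otherwise stores the average in *out and returns true.
bool ComputeAverageUsersPerRoom(double *out) {
  if (total_rooms == 0) {
    return false;
  }
  // Converting to double first keeps the fractional part of the average.
  *out = static_cast<double>(total_users) / static_cast<double>(total_rooms);
  return true;
}
```

With 7 users in 2 rooms this stores 3.5, whereas plain integer division would give 3.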

Monitoring the Counter Values

Sometimes we need to be notified when certain counters exceed thresholds. For example, a spike in the number of rare items can signal an item duplication bug. iFun Engine can set a threshold on an integer or double counter.

Here’s an example:

#include <funapi.h>


void OnResetGoldCounterTimerExpired(const Timer::Id &,
                                    const WallClock::Value &) {
  // Resets "gold_per_hour" to 0.
  UpdateCounter("game", "gold_per_hour", 0);
}


void Install() {
  // Registers "gold_per_hour" in the category "game".
  UpdateCounter("game", "gold_per_hour", 0);

  // Requests iFun Engine to report if "gold_per_hour" exceeds 100K.
  // Please note that counter registration is required before setting a threshold.
  MonitorCounter("game", "gold_per_hour", 100000);

  // Registers a timer to reset the counter.
  Timer::ExpireRepeatedly(WallClock::FromSec(3600), OnResetGoldCounterTimerExpired);
}


// Say, this function is invoked when the player earns gold.
void PickGold(int64_t gold) {
  // Increases the amount of gold.
  IncreaseCounterBy("game", "gold_per_hour", gold);
}

Note

The counter must be registered with UpdateCounter(), IncreaseCounterBy(), or DecreaseCounterBy() before MonitorCounter() is called.

If the amount of gold exceeds 100K in an hour, we will see a message like this:

W0818 11:03:06.520730 18324 counter.cc:160] The 'gold_per_hour of game' counter exceeded threshold: value=123456, threshold=100000

If MonitorCounter() logged a message every time a counter exceeded its threshold, it would generate too many log messages. Instead, MonitorCounter() checks the monitored counters periodically, at the interval in seconds given by counter_monitoring_interval_in_sec in MANIFEST.json.
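
To illustrate why checking on an interval keeps the log short, here is a minimal, self-contained model of periodic threshold checking. ThresholdMonitor and its methods are hypothetical names made up for this sketch. It does not show how iFun Engine implements MonitorCounter(), and it assumes that "exceeds" means strictly greater than the threshold.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of interval-based counter monitoring (not engine code).
class ThresholdMonitor {
 public:
  typedef std::pair<std::string, std::string> Key;  // (category, name)

  void Update(const std::string &category, const std::string &name,
              std::int64_t value) {
    values_[Key(category, name)] = value;
  }

  void IncreaseBy(const std::string &category, const std::string &name,
                  std::int64_t delta) {
    values_[Key(category, name)] += delta;
  }

  void Monitor(const std::string &category, const std::string &name,
               std::int64_t threshold) {
    thresholds_[Key(category, name)] = threshold;
  }

  // Meant to be called once per monitoring interval. Returns one warning for
  // each monitored counter that is above its threshold at this moment.
  std::vector<std::string> CheckOnce() const {
    std::vector<std::string> warnings;
    for (const auto &entry : thresholds_) {
      auto it = values_.find(entry.first);
      if (it == values_.end() || it->second <= entry.second) {
        continue;  // Unknown counter or still within the threshold.
      }
      std::ostringstream msg;
      msg << "The '" << entry.first.second << " of " << entry.first.first
          << "' counter exceeded threshold: value=" << it->second
          << ", threshold=" << entry.second;
      warnings.push_back(msg.str());
    }
    return warnings;
  }

 private:
  std::map<Key, std::int64_t> values_;
  std::map<Key, std::int64_t> thresholds_;
};
```

In this model, calling IncreaseBy() many times between two CheckOnce() calls produces at most one warning per monitored counter, which is the effect of the interval-based check described above.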

In addition to the custom counters you register, iFun Engine itself monitors the following counters:

  • event

    • event_queue_length: the number of events in the queue.

  • object

    • outstanding_fetch_query: the number of DB read operations in the queue.

    • outstanding_update_query: the number of DB write operations in the queue.

Regarding the threshold and the interval, please see the Counter MANIFEST.json.

Counters

These are the predefined counter categories and the counters in each category:

  • process: Information about the running iFun Engine process. It is the same as the output of the Linux ps command.

    • vsz

    • cpu

    • nivcsw

    • nswap

    • oublock

    • minflt

    • idrss

    • isrss

    • ixrss

    • nsignals

    • majflt

    • maxrss

    • msgsnd

    • msgrcv

    • nvcsw

    • stime

    • updated

    • utime

    • inblock

    • refresh_interval: The counter’s refresh interval.

  • os: Information about the server OS.

    • procs: The number of processes.

    • totalswap: Total swap size in bytes.

    • freeswap: Available swap size in bytes.

    • bufferram: Buffer memory size in bytes.

    • load15: Load average seen during the last 15 minutes.

    • load5: Load average seen during the last 5 minutes.

    • load1: Load average seen during the last 1 minute.

    • uptime: Server up time in seconds.

    • cpus: The number of CPU cores.

    • totalram: Total RAM size in bytes.

    • freeram: Free RAM size in bytes.

    • sharedram: Shared memory size in bytes.

    • type: OS type string.

    • updated: Time stamp of the counter.

    • refresh_interval: The counter’s refresh interval.

  • funapi: Information about the iFun Engine.

    • object_database_stat: Per-DB query processing stats. Please see (Advanced) Profiling The ORM for more information.

    • object: The number of objects cached on the server, the number of DB read / write operations in the queue.

    • rpc_stat: RPC-related stats.

    • zookeeper_stat: Zookeeper performance stats. Please refer to Zookeeper profiling.

    • event/performance/queue: Returns the server's event inflow rate, throughput, the number of waiting events, and so on.

    • event/profiling/summary: Event profiling data. Please refer to Event profiling: summary for more information.

    • event/profiling/all: Per-event profiling data. Please refer to Event profiling: details for more information.

    • event/profiling/reset: Flag for resetting the event profiling data.

    • event/profiling/outstanding: The number of outstanding events.

  • funapi_object_model: The number of objects cached on the server, per object type.

MANIFEST.json

Counter relies on the API Service, so please see the API Service MANIFEST in addition to the parameters in this section.

  • Component: CounterService

    • Arguments

      • counter_monitoring_interval_in_sec: Interval in seconds at which monitored counters are checked. Please see Monitoring the Counter Values for details. It defaults to 30 seconds.

      • warning_threshold_event_queue_length: Threshold for the event_queue_length of the event counter. It defaults to 3000.

      • warning_threshold_outstanding_fetch_query: Threshold for the outstanding_fetch_query counter. It defaults to 5000.

      • warning_threshold_outstanding_update_query: Threshold for the outstanding_update_query counter. It defaults to 5000.

      • warning_threshold_slow_query_in_sec: Outputs a warning message during counter monitoring if a query takes longer than the specified number of seconds. (type=uint64, default=1)

      • warning_threshold_slow_distribution_in_sec: Outputs a warning message during counter monitoring if the processing time of the distribution layer (Redis/ZooKeeper) is longer than the specified number of seconds. (type=uint64, default=3)

iFun Engine Dashboard

iFun Engine provides a dashboard that collects various performance metrics and visualizes the status of a running server (including possible performance bottlenecks). The dashboard is implemented on top of iFun Engine Counter.

In short, iFun Engine Dashboard provides the following functionality:

OS-level resource monitoring

This monitoring includes the number of player sessions and the number of players authenticated, as well as CPU and RAM usage.

_images/game-server.png

Performance of iFun Engine’s event subsystem

The dashboard visualizes how many events are generated per second, how long it takes to handle them, and how much the event queue has grown.

_images/summary-page.png

Performance of iFun Engine’s ORM subsystem

For each DB (or each shard DB instance, if you use iFun Engine’s sharding feature), the dashboard visualizes read/write processing times, the number of read/write requests in the queue, and the number of data objects cached in memory.

_images/objects.png

Performance of iFun Engine’s distribution subsystem

The dashboard depicts RPC traffic between iFun Engine servers.

_images/rpc-page.png

Per-Stage processing time and queuing time

For individual, user-defined stages, iFun Engine can compute the processing time and the queuing delay before entering a stage.

Overview information

_images/dash-summary-1.png
_images/dash-summary-2.png

Tabular format

_images/dash-tabular.png

Execution time analysis

_images/dash-exec-time.png

Waiting time analysis

_images/dash-wait-time.png