HariKube determines data locality using the object key structure and applies routing based on configurable policies,
such as matching by resource type, namespace, key prefix, or custom resource definition.
Routing configurations are evaluated in order from top to bottom, and the first matching rule determines the data’s target database. Once a match is found, subsequent rules are ignored for that resource.
Routing policies must be carefully designed, as adding or changing a policy for resource types that already have stored data can result in the existing records becoming inaccessible. HariKube does not migrate previously stored resources to the new target automatically, so any change in routing may lead to apparent data loss unless handled manually.
ETCD with regular expression routing:
Routes Kubernetes RBAC resources to an ETCD store.
MySQL endpoint with namespace matching:
All objects in the kube-system namespace are routed to a MySQL backend.
Only built-in core resource types are supported; for custom resources you have to create a separate policy.
PostgreSQL endpoint with prefix matching:
All Pod resources, except those in the kube-system namespace, are routed to a PostgreSQL backend.
SQLite endpoint for specific custom resources:
Routes all resources of type shirts in the group stable.example.com to a lightweight embedded SQLite database.
The rest of the objects are stored in the default database.
You can deploy multiple instances of the middleware, where each instance can be configured to delegate to another middleware as its backend. This enables the construction of a multi-layered, hierarchical topology, where data access requests can propagate through multiple layers of the middleware.
Initialize your primary middleware instance to act as the entry point for Kubernetes by launching it with the following configuration. This instance will serve as the top-level interface for Kubernetes API server requests, abstracting the underlying data storage layer while maintaining full compatibility with the etcd API.
ETCD endpoint with namespace matching:
All objects in the kube-system namespace are routed to an ETCD backend.
The rest of the objects are stored in the default database.
As shown, this middleware connects to a backing etcd server to store and retrieve objects within the kube-system namespace. However, instead of connecting directly to etcd, you can deploy another middleware that listens on kube-system.etcd.server:2379. This allows you to insert an additional routing layer between Kubernetes and the storage backend — enabling advanced behaviors, all while preserving the etcd-compatible API surface.
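As a rough illustration of such a layered deployment, the sketch below starts two middleware instances. The binary name (harikube), the listen address format, and the endpoint values are assumptions for illustration only; --listen-address and --endpoint are the flags documented in the option reference further down, and the routing policy that forwards kube-system to the inner instance is not shown.

# Hypothetical two-layer sketch (binary name and addresses are placeholders)
# Inner layer: serves kube-system traffic and stores it in a dedicated etcd
harikube --listen-address http://kube-system.etcd.server:2379 \
         --endpoint http://dedicated-etcd.example:2379

# Outer layer: entry point for the Kubernetes API server; its routing policy
# (not shown here) sends the kube-system namespace to the inner instance
harikube --listen-address http://0.0.0.0:2379 \
         --endpoint http://default-etcd.example:2379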
ETCD endpoint with prefix matching:
All Pod resources are routed to an ETCD store.
MySQL endpoint with prefix matching:
All Secret resources are routed to a MySQL backend.
PostgreSQL endpoint with prefix matching:
All ConfigMap resources are routed to a PostgreSQL backend.
SQLite endpoint with prefix matching:
All Deployment resources are routed to a SQLite backend.
The rest of the objects are stored in the default database.
📘 Metadata Store Configuration
HariKube includes an internal metadata store that maintains mapping information about the underlying databases. It keeps track of which database is responsible for each data segment, ensuring consistency and fast lookups without querying every backend directly. The metadata store is central to HariKube’s ability to provide dynamic data placement, multi-database support, and high-performance routing across flat or hierarchical topologies. To configure the metadata store, set the corresponding environment variable(s) for the middleware. The default metadata store is SQLite.
REVISION_MAPPER_HISTORY: Defines how long metadata revisions are retained in the system. After this period, older revisions are treated as compacted and are no longer accessible for historical lookups. This helps manage storage usage by limiting how long old revision data is preserved. If set to 0, compaction is disabled. Default 4h
REVISION_MAPPER_CACHE_CAPACITY: Capacity of the in-memory cache. Default 10000 (1000000 for the in-memory store)
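For instance, a minimal shared configuration using only these two variables could look like this (the values shown are the documented defaults):

export REVISION_MAPPER_HISTORY=4h             # retain metadata revisions for 4 hours before compaction
export REVISION_MAPPER_CACHE_CAPACITY=10000   # in-memory cache capacity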
HariKube mappers are optimized for high-throughput environments, and some persist metadata asynchronously. In worst-case scenarios where metadata is lost, restarting the services that rely on historical revisions safely reinitializes a fresh revision history, allowing the system to continue operating even if older state is no longer available.
The SQLite Mapper provides a lightweight, embedded solution for storing revision metadata with minimal setup and no external dependencies. The SQLite Mapper maintains an in-memory cache of the latest revision data and writes to the SQLite database asynchronously. This means the mapper is always aware of the current logical state, but the on-disk database may temporarily lag behind.
REVISION_MAPPER=sqlite: Type of metadata store - Optional
REVISION_MAPPER_SQLITE_O2G_PATHS: Comma separated list of paths to the original-to-generated revision database directories. Default ./db
REVISION_MAPPER_SQLITE_G2O_PATHS: Comma separated list of paths to the generated-to-original revision database directories. Default ./db
REVISION_MAPPER_SQLITE_LEASE_PATH: Path to the lease database directory. Default ./db
REVISION_MAPPER_SQLITE_SYNCHRONOUS: Write mode of the database [OFF, NORMAL]. Default OFF
REVISION_MAPPER_SQLITE_CONN_LIFETIME: Connection lifetime in seconds. Default 60
REVISION_MAPPER_SQLITE_WRITE_QUEUE: Size of the write-ahead queue. Default 50
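A possible SQLite Mapper configuration using the variables above (the directory paths are illustrative, not required values):

export REVISION_MAPPER=sqlite
export REVISION_MAPPER_SQLITE_O2G_PATHS=/data/mapper/o2g-1,/data/mapper/o2g-2   # spread O2G data over two directories
export REVISION_MAPPER_SQLITE_G2O_PATHS=/data/mapper/g2o-1,/data/mapper/g2o-2
export REVISION_MAPPER_SQLITE_LEASE_PATH=/data/mapper/lease
export REVISION_MAPPER_SQLITE_SYNCHRONOUS=NORMAL   # safer writes than the default OFF, at some throughput cost
export REVISION_MAPPER_SQLITE_CONN_LIFETIME=60
export REVISION_MAPPER_SQLITE_WRITE_QUEUE=50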
Changing database directories is not supported at the moment. To change the database layout, you have to delete all existing databases and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 32s
Average response time: ~11 μs
Maximum response time: ~59 μs
The File System Mapper offers a lightweight, low-dependency alternative to traditional in-memory or database-backed mappers by persisting revision and metadata information directly to disk.
However, it comes with tradeoffs: file I/O can introduce latency under high concurrency, and it lacks the built-in guarantees and consistency of ETCD or SQL-based mappers.
REVISION_MAPPER=file: Type of metadata store
REVISION_MAPPER_FILE_O2G_PATHS: Comma separated list of paths to the original-to-generated database directories. Default ./db/mapping-o2g
REVISION_MAPPER_FILE_G2O_PATHS: Comma separated list of paths to the generated-to-original database directories. Default ./db/mapping-g2o
REVISION_MAPPER_FILE_LEASE_PATH: Path to the lease database directory. Default ./db/mapping-lease
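A corresponding File System Mapper configuration might look like this (the paths are illustrative):

export REVISION_MAPPER=file
export REVISION_MAPPER_FILE_O2G_PATHS=/data/mapper/o2g
export REVISION_MAPPER_FILE_G2O_PATHS=/data/mapper/g2o
export REVISION_MAPPER_FILE_LEASE_PATH=/data/mapper/lease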
Changing database directories is not supported at the moment. To change the storage layout, you have to delete all existing directories and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 43s
Average response time: ~14 μs
Maximum response time: ~65 μs
Performance Tip
When you layer a tmpfs-backed memory tier with an SSD-based cache (via dm-cache or bcache), you combine near-RAM latency with durable storage—despite bcache itself only boosting throughput by 10-20 percent, the overall architecture still delivers sub-millisecond I/O with guaranteed persistence. This hybrid model excels at handling small, random reads and writes in high-throughput environments while ensuring data survives reboots or failures. Looking ahead, as persistent memory modules and NVMe-over-Fabric become mainstream, this flexible two-tier approach will evolve into fully adaptive storage fabrics that automatically balance speed, capacity and resilience.
How to create bcache-boosted storage?
apk add bcache-tools                             # Install bcache
modprobe bcache                                  # Load bcache module
mkdir /mnt/ramcache0                             # Create tmpfs mount point
mount -t tmpfs -o size=4G tmpfs /mnt/ramcache0   # Create tmpfs
fallocate -l 4G /mnt/ramcache0/bcache.img        # Create a block device on tmpfs (you can use zram or brd)
losetup /dev/<loop0> /mnt/ramcache0/bcache.img   # Set an unused loop device (not necessary on zram or brd)
make-bcache -C /dev/<loop0>                      # Format your RAM-based block device as a cache
make-bcache -B /dev/<sdb1>                       # Format your SSD (or partition) as a backing store
echo /dev/<sdb1> | tee /sys/fs/bcache/register   # Register backing device (SSD)
echo /dev/<loop0> | tee /sys/fs/bcache/register  # Register cache device
mkfs.ext4 /dev/bcache0                           # Create file system on bcache device
mkdir /mnt/bcache0                               # Create bcache0 mount point
mount /dev/bcache0 /mnt/bcache0                  # Mount block device
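After mounting, you can sanity-check the setup; these are standard bcache sysfs paths and are unrelated to HariKube itself:

lsblk /dev/bcache0                         # confirm the bcache device exists and is mounted
cat /sys/block/bcache0/bcache/state        # should report clean or dirty once the cache is attached
cat /sys/block/bcache0/bcache/cache_mode   # shows the active cache mode, e.g. writethrough or writeback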
The BBolt Mapper provides a lightweight, embedded solution for storing revision metadata with minimal setup and no external dependencies. Under the hood it leverages BBolt’s memory‑mapped, ACID‑compliant key/value store, while keeping an in‑memory cache of the latest revision data and batching writes to the BBolt file asynchronously. This means the mapper always reflects the current logical state in your application, even if the on‑disk BBolt file may briefly lag behind during high‑throughput writes.
REVISION_MAPPER=bbolt: Type of metadata store - Optional
REVISION_MAPPER_BBOLT_BATCH_SIZE: Size of batch write operation. Default 8
REVISION_MAPPER_BBOLT_O2G_PATHS: Comma separated list of paths to the original-to-generated revision database directories. Default ./db
REVISION_MAPPER_BBOLT_G2O_PATHS: Comma separated list of paths to the generated-to-original revision database directories. Default ./db
REVISION_MAPPER_BBOLT_LEASE_PATH: Path to the lease database directory. Default ./db
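A possible BBolt Mapper configuration (the paths are illustrative; the batch size is the documented default):

export REVISION_MAPPER=bbolt
export REVISION_MAPPER_BBOLT_BATCH_SIZE=8
export REVISION_MAPPER_BBOLT_O2G_PATHS=/data/mapper/o2g
export REVISION_MAPPER_BBOLT_G2O_PATHS=/data/mapper/g2o
export REVISION_MAPPER_BBOLT_LEASE_PATH=/data/mapper/lease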
Changing database directories is not supported at the moment. To change the database layout, you have to delete all existing databases and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 30s
Average response time: ~10 μs
Maximum response time: ~75 μs
The ETCD Mapper provides a production-grade, strongly consistent solution for managing revision metadata. Designed to run against a dedicated ETCD instance, it offers high availability and safe multi-node coordination, making it ideal for large-scale deployments.
While the mapper itself maintains an up-to-date in-memory cache, revision metadata is persisted to ETCD asynchronously. This means that although the mapper has the latest state, the backing database may temporarily lag behind the most recent revision.
You can use the same database instance for all metadata; just make sure the endpoints are identical.
REVISION_MAPPER=etcd: Type of metadata store
REVISION_MAPPER_ETCD_BATCH_SIZE: Size of batch write operation. Default 8
REVISION_MAPPER_ETCD_BATCH_WRITE_CONN: Number of batch write connections. Default 64
REVISION_MAPPER_ETCD_O2G_ENDPOINTS: Comma separated list of original-to-generated endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_O2G_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the original-to-generated database connection
REVISION_MAPPER_ETCD_O2G_CRT_FILE: Path to the client certificate file used to authenticate with the original-to-generated database during TLS handshake
REVISION_MAPPER_ETCD_O2G_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the original-to-generated database connection
REVISION_MAPPER_ETCD_G2O_ENDPOINTS: Comma separated list of generated-to-original endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_G2O_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the generated-to-original database connection
REVISION_MAPPER_ETCD_G2O_CRT_FILE: Path to the client certificate file used to authenticate with the generated-to-original database during TLS handshake
REVISION_MAPPER_ETCD_G2O_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the generated-to-original database connection
REVISION_MAPPER_ETCD_LEASE_ENDPOINTS: Comma separated list of lease endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_LEASE_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the lease database connection
REVISION_MAPPER_ETCD_LEASE_CRT_FILE: Path to the client certificate file used to authenticate with the lease database during TLS handshake
REVISION_MAPPER_ETCD_LEASE_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the lease database connection
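A sketch of an ETCD Mapper configuration that, as suggested above, reuses one etcd instance for all three metadata stores (the endpoint and certificate paths are placeholders):

export REVISION_MAPPER=etcd
export REVISION_MAPPER_ETCD_BATCH_SIZE=8
export REVISION_MAPPER_ETCD_BATCH_WRITE_CONN=64
export REVISION_MAPPER_ETCD_O2G_ENDPOINTS=https://metadata-etcd.example:2379
export REVISION_MAPPER_ETCD_G2O_ENDPOINTS=https://metadata-etcd.example:2379
export REVISION_MAPPER_ETCD_LEASE_ENDPOINTS=https://metadata-etcd.example:2379
# TLS settings for the O2G connection; set the matching G2O_* and LEASE_* variables the same way
export REVISION_MAPPER_ETCD_O2G_CA_FILE=/etc/harikube/pki/ca.crt
export REVISION_MAPPER_ETCD_O2G_CRT_FILE=/etc/harikube/pki/client.crt
export REVISION_MAPPER_ETCD_O2G_KEY_FILE=/etc/harikube/pki/client.key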
--listen-address: Listen address of service. Default: unix://harikube.sock
--endpoint: Defines the default fallback database used to store any data not explicitly routed by the topology configuration
--ca-file: Path to the Certificate Authority (CA) file used to establish trust for the default database connection
--cert-file: Path to the client certificate file used to authenticate with the default database during TLS handshake
--key-file: Path to the private key file used in conjunction with the client certificate to authenticate the default database connection
--skip-verify: Controls whether the TLS client performs server certificate verification
--metrics-bind-address: The address the metric endpoint binds to. Default :8080, set 0 to disable metrics serving
--server-cert-file: Path to the TLS certificate used by the middleware to secure incoming client connections
--server-key-file: Path to the private key used by the middleware to establish secure TLS communication with etcd-compatible clients
--datastore-max-idle-connections: Maximum number of idle connections retained by default datastore. If value = 0, the system default will be used. If value < 0, idle connections will not be reused
--datastore-max-open-connections: Maximum number of open connections used by default datastore. If value <= 0, then there is no limit
--datastore-connection-max-lifetime: Maximum amount of time a default database connection may be reused. If value <= 0, then there is no limit
--datastore-connection-max-idle-lifetime: Maximum amount of time a default database idle connection may be reused. If value <= 0, then there is no limit
--slow-sql-threshold: The duration which SQL executed longer than will be logged. Default 1s, set <= 0 to disable slow SQL log
--metrics-enable-profiling: Enables Go performance profiling via net/http/pprof on the metrics bind address. Default is false
--watch-progress-notify-interval: Interval between periodic watch progress notifications. Default is 5s
--emulated-etcd-version: The emulated etcd version to return on a call to the status endpoint. Defaults to 3.5.13, in order to indicate support for watch progress notifications
--debug: Enable debug logging
Mounting the harikube_db volume isn't mandatory if all databases and the metadata store point to external targets.
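Putting the flags together, a start-up command could look like the following sketch; the binary name and all endpoint and path values are assumptions, while the flags themselves are the ones documented above:

# Hypothetical launch command (binary name, endpoints, and paths are placeholders)
harikube \
  --listen-address http://0.0.0.0:2379 \
  --endpoint https://default-etcd.example:2379 \
  --ca-file /etc/harikube/pki/ca.crt \
  --cert-file /etc/harikube/pki/client.crt \
  --key-file /etc/harikube/pki/client.key \
  --server-cert-file /etc/harikube/pki/server.crt \
  --server-key-file /etc/harikube/pki/server.key \
  --metrics-bind-address :8080 \
  --slow-sql-threshold 1s \
  --watch-progress-notify-interval 5s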
🚀 Setup and start Kubernetes
Kubernetes Configuration
HariKube requires specific Kubernetes configuration to enable custom resource routing and external data store integration:
| Mandatory | Category | Option | Description |
|---|---|---|---|
| ✅ | Feature Gate | CustomResourceFieldSelectors=true | Enables CR field selectors |
| ✅ | Feature Gate | WatchList=true | Enables watch list support |
| ✅ | Feature Gate | WatchListClient=true | Enables watch list client feature |
| ✅ | API Server Flag | --encryption-provider-config="" | Encryption not supported |
| ✅ | API Server Flag | --storage-media-type=application/json | Sets storage format to JSON |
| ✅ | API Server Flag | --watch-cache=false | Disables watch cache (recommended for large data) |
| ✅ | API Server Flag | --etcd-servers=http(s)://middleware.service:2379 | Sets the middleware as the ETCD backend |
| ➖ | API Server Flag | --max-mutating-requests-inflight=400 | Increases concurrency for mutating requests |
| ➖ | API Server Flag | --max-requests-inflight=800 | Increases concurrency for all requests |
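For reference, a kube-apiserver invocation fragment covering the options above might look like the sketch below; the middleware address is a placeholder, and WatchListClient is a client-side gate that is typically also enabled on client components such as kube-controller-manager:

kube-apiserver \
  --feature-gates=CustomResourceFieldSelectors=true,WatchList=true,WatchListClient=true \
  --encryption-provider-config="" \
  --storage-media-type=application/json \
  --watch-cache=false \
  --etcd-servers=https://middleware.service:2379 \
  --max-mutating-requests-inflight=400 \
  --max-requests-inflight=800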
Kubernetes is compatible with HariKube by default. However, due to architectural constraints in ETCD, its underlying storage system, Kubernetes is not optimized for handling very large datasets. To enable support for high-volume data workloads, modifications to specific Kubernetes components (such as the API server) are required.
You can use our pre-built images; the pre-built versions are:
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
make WHAT=cmd/kube-apiserver
make WHAT=cmd/kube-controller-manager
Find the compiled binaries in _output/local/bin/linux/amd64 folder.
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
./build/run.sh make WHAT=cmd/kube-apiserver
./build/run.sh make WHAT=cmd/kube-controller-manager
Find the compiled binaries in _output/dockerized/bin/linux/amd64 folder.
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
export KUBE_DOCKER_REGISTRY=<your-registry.example.com/kubernetes>
make release-images
Find the baked images at the local registry:
docker image ls | grep -E 'kube-apiserver|kube-controller-manager' | grep $KUBE_GIT_VERSION