HariKube determines data locality using the object key structure and applies routing based on configurable policies,
such as matching by resource type, namespace, key prefix, or custom resource definition.
Routing configurations are evaluated in order from top to bottom, and the first matching rule determines the data’s target database. Once a match is found, subsequent rules are ignored for that resource.
Routing policies must be carefully designed, as adding or changing a policy for resource types that already have stored data can result in the existing records becoming inaccessible. HariKube does not migrate previously stored resources to the new target automatically, so any change in routing may lead to apparent data loss unless handled manually.
ETCD with regular expression routing:
Routes Kubernetes RBAC resources to an ETCD store.
MySQL endpoint with namespace matching:
All objects in the kube-system namespace are routed to a MySQL backend.
Only built-in core resource types are supported; for custom resources you have to create a separate policy.
PostgreSQL endpoint with prefix matching:
All Pod resources, except those in the kube-system namespace, are routed to a PostgreSQL backend.
SQLite endpoint for specific custom resources:
Routes all resources of type shirts in the group stable.example.com to a lightweight embedded SQLite database.
The rest of the objects are stored in the default database.
You can deploy multiple instances of the middleware, where each instance can be configured to delegate to another middleware as its backend. This enables the construction of a multi-layered, hierarchical topology, where data access requests can propagate through multiple layers of the middleware.
Initialize your primary middleware instance to act as the entry point for Kubernetes by launching it with the following configuration. This instance will serve as the top-level interface for Kubernetes API server requests, abstracting the underlying data storage layer while maintaining full compatibility with the etcd API.
ETCD endpoint with namespace matching:
All objects in the kube-system namespace are routed to an ETCD backend.
The rest of the objects are stored in the default database.
As shown, this middleware connects to a backing etcd server to store and retrieve objects within the kube-system namespace. However, instead of connecting directly to etcd, you can deploy another middleware that listens on kube-system.etcd.server:2379. This allows you to insert an additional routing layer between Kubernetes and the storage backend — enabling advanced behaviors, all while preserving the etcd-compatible API surface.
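As a rough illustration of such a layered deployment, the sketch below starts two middleware instances. The binary name (harikube), the listen address format, and the endpoint values are assumptions for illustration only; --listen-address and --endpoint are the flags documented in the option reference further down, and the routing policy that forwards kube-system to the inner instance is not shown.

# Hypothetical two-layer sketch (binary name and addresses are placeholders)
# Inner layer: serves kube-system traffic and stores it in a dedicated etcd
harikube --listen-address http://kube-system.etcd.server:2379 \
         --endpoint http://dedicated-etcd.example:2379

# Outer layer: entry point for the Kubernetes API server; its routing policy
# (not shown here) sends the kube-system namespace to the inner instance
harikube --listen-address http://0.0.0.0:2379 \
         --endpoint http://default-etcd.example:2379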
ETCD endpoint with prefix matching:
All Pod resources are routed to an ETCD store.
MySQL endpoint with prefix matching:
All Secret resources are routed to a MySQL backend.
PostgreSQL endpoint with prefix matching:
All ConfigMap resources are routed to a PostgreSQL backend.
SQLite endpoint with prefix matching:
All Deployment resources are routed to a SQLite backend.
The rest of the objects are stored in the default database.
📘 Metadata Store Configuration
HariKube includes an internal metadata store that maintains mapping information about the underlying databases. It keeps track of which database is responsible for each data segment, ensuring consistency and fast lookups without querying every backend directly. The metadata store is central to HariKube’s ability to provide dynamic data placement, multi-database support, and high-performance routing across flat or hierarchical topologies. To configure the metadata store, set the corresponding environment variable(s) for the middleware. The default metadata store is SQLite.
REVISION_MAPPER_HISTORY: Defines how long metadata revisions are retained in the system. After this period, older revisions are treated as compacted and are no longer accessible for historical lookups. This helps manage storage usage by limiting how long old revision data is preserved. If set to 0, compaction is disabled. Default 4h
REVISION_MAPPER_CACHE_CAPACITY: Capacity of the in-memory cache. Default 10000 (1000000 for the in-memory store)
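For instance, a minimal shared configuration using only these two variables could look like this (the values shown are the documented defaults):

export REVISION_MAPPER_HISTORY=4h             # retain metadata revisions for 4 hours before compaction
export REVISION_MAPPER_CACHE_CAPACITY=10000   # in-memory cache capacity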
HariKube mappers are optimized for high-throughput environments, and some persist metadata asynchronously. In worst-case scenarios where metadata is lost, restarting the services that rely on historical revisions safely reinitializes a fresh revision history, allowing the system to continue operating even if older state is no longer available.
The SQLite Mapper provides a lightweight, embedded solution for storing revision metadata with minimal setup and no external dependencies. The SQLite Mapper maintains an in-memory cache of the latest revision data and writes to the SQLite database asynchronously. This means the mapper is always aware of the current logical state, but the on-disk database may temporarily lag behind.
REVISION_MAPPER=sqlite: Type of metadata store - Optional
REVISION_MAPPER_SQLITE_O2G_PATHS: Comma separated list of paths to the original-to-generated revision database directories. Default ./db
REVISION_MAPPER_SQLITE_G2O_PATHS: Comma separated list of paths to the generated-to-original revision database directories. Default ./db
REVISION_MAPPER_SQLITE_LEASE_PATH: Path to the lease database directory. Default ./db
REVISION_MAPPER_SQLITE_SYNCHRONOUS: Write mode of the database [OFF, NORMAL]. Default OFF
REVISION_MAPPER_SQLITE_CONN_LIFETIME: Connection lifetime in seconds. Default 60
REVISION_MAPPER_SQLITE_WRITE_QUEUE: Size of the write-ahead queue. Default 50
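A possible SQLite Mapper configuration using the variables above (the directory paths are illustrative, not required values):

export REVISION_MAPPER=sqlite
export REVISION_MAPPER_SQLITE_O2G_PATHS=/data/mapper/o2g-1,/data/mapper/o2g-2   # spread O2G data over two directories
export REVISION_MAPPER_SQLITE_G2O_PATHS=/data/mapper/g2o-1,/data/mapper/g2o-2
export REVISION_MAPPER_SQLITE_LEASE_PATH=/data/mapper/lease
export REVISION_MAPPER_SQLITE_SYNCHRONOUS=NORMAL   # safer writes than the default OFF, at some throughput cost
export REVISION_MAPPER_SQLITE_CONN_LIFETIME=60
export REVISION_MAPPER_SQLITE_WRITE_QUEUE=50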
Changing database directories is not supported at the moment. To change the database layout, you have to delete all existing databases and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 32s
Average response time: ~11 μs
Maximum response time: ~59 μs
The File System Mapper offers a lightweight, low-dependency alternative to traditional in-memory or database-backed mappers by persisting revision and metadata information directly to disk.
However, it comes with tradeoffs: file I/O can introduce latency under high concurrency, and it lacks the built-in guarantees and consistency of ETCD or SQL-based mappers.
REVISION_MAPPER=file: Type of metadata store
REVISION_MAPPER_FILE_O2G_PATHS: Comma separated list of paths to the original-to-generated database directories. Default ./db/mapping-o2g
REVISION_MAPPER_FILE_G2O_PATHS: Comma separated list of paths to the generated-to-original database directories. Default ./db/mapping-g2o
REVISION_MAPPER_FILE_LEASE_PATH: Path to the lease database directory. Default ./db/mapping-lease
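A corresponding File System Mapper configuration might look like this (the paths are illustrative):

export REVISION_MAPPER=file
export REVISION_MAPPER_FILE_O2G_PATHS=/data/mapper/o2g
export REVISION_MAPPER_FILE_G2O_PATHS=/data/mapper/g2o
export REVISION_MAPPER_FILE_LEASE_PATH=/data/mapper/lease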
Changing database directories is not supported at the moment. To change the storage layout, you have to delete all existing directories and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 43s
Average response time: ~14 μs
Maximum response time: ~65 μs
Performance Tip
When you layer a tmpfs-backed memory tier with an SSD-based cache (via dm-cache or bcache), you combine near-RAM latency with durable storage—despite bcache itself only boosting throughput by 10-20 percent, the overall architecture still delivers sub-millisecond I/O with guaranteed persistence. This hybrid model excels at handling small, random reads and writes in high-throughput environments while ensuring data survives reboots or failures. Looking ahead, as persistent memory modules and NVMe-over-Fabric become mainstream, this flexible two-tier approach will evolve into fully adaptive storage fabrics that automatically balance speed, capacity and resilience.
How to create bcache-boosted storage?
apk add bcache-tools                             # Install bcache
modprobe bcache                                  # Load bcache module
mkdir /mnt/ramcache0                             # Create tmpfs mount point
mount -t tmpfs -o size=4G tmpfs /mnt/ramcache0   # Create tmpfs
fallocate -l 4G /mnt/ramcache0/bcache.img        # Create a block device on tmpfs (you can use zram or brd)
losetup /dev/<loop0> /mnt/ramcache0/bcache.img   # Set an unused loop device (not necessary on zram or brd)
make-bcache -C /dev/<loop0>                      # Format your RAM-based block device as a cache
make-bcache -B /dev/<sdb1>                       # Format your SSD (or partition) as a backing store
echo /dev/<sdb1> | tee /sys/fs/bcache/register   # Register backing device (SSD)
echo /dev/<loop0> | tee /sys/fs/bcache/register  # Register cache device
mkfs.ext4 /dev/bcache0                           # Create file system on bcache device
mkdir /mnt/bcache0                               # Create bcache0 mount point
mount /dev/bcache0 /mnt/bcache0                  # Mount block device
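After mounting, you can sanity-check the setup; these are standard bcache sysfs paths and are unrelated to HariKube itself:

lsblk /dev/bcache0                         # confirm the bcache device exists and is mounted
cat /sys/block/bcache0/bcache/state        # should report clean or dirty once the cache is attached
cat /sys/block/bcache0/bcache/cache_mode   # shows the active cache mode, e.g. writethrough or writeback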
The BBolt Mapper provides a lightweight, embedded solution for storing revision metadata with minimal setup and no external dependencies. Under the hood it leverages BBolt’s memory‑mapped, ACID‑compliant key/value store, while keeping an in‑memory cache of the latest revision data and batching writes to the BBolt file asynchronously. This means the mapper always reflects the current logical state in your application, even if the on‑disk BBolt file may briefly lag behind during high‑throughput writes.
REVISION_MAPPER=bbolt: Type of metadata store - Optional
REVISION_MAPPER_BBOLT_BATCH_SIZE: Size of batch write operation. Default 8
REVISION_MAPPER_BBOLT_O2G_PATHS: Comma separated list of paths to the original-to-generated revision database directories. Default ./db
REVISION_MAPPER_BBOLT_G2O_PATHS: Comma separated list of paths to the generated-to-original revision database directories. Default ./db
REVISION_MAPPER_BBOLT_LEASE_PATH: Path to the lease database directory. Default ./db
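A possible BBolt Mapper configuration (the paths are illustrative; the batch size is the documented default):

export REVISION_MAPPER=bbolt
export REVISION_MAPPER_BBOLT_BATCH_SIZE=8
export REVISION_MAPPER_BBOLT_O2G_PATHS=/data/mapper/o2g
export REVISION_MAPPER_BBOLT_G2O_PATHS=/data/mapper/g2o
export REVISION_MAPPER_BBOLT_LEASE_PATH=/data/mapper/lease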
Changing database directories is not supported at the moment. To change the database layout, you have to delete all existing databases and restart all services that depend on revision history.
Creating 5000 records on 60 threads
Total execution time: 30s
Average response time: ~10 μs
Maximum response time: ~75 μs
The ETCD Mapper provides a production-grade, strongly consistent solution for managing revision metadata. Designed to run against a dedicated ETCD instance, it offers high availability and safe multi-node coordination, making it ideal for large-scale deployments.
While the mapper itself maintains an up-to-date in-memory cache, revision metadata is persisted to ETCD asynchronously. This means that although the mapper has the latest state, the backing database may temporarily lag behind the most recent revision.
You can use the same database instance for all metadata; just make sure the endpoints are identical.
REVISION_MAPPER=etcd: Type of metadata store
REVISION_MAPPER_ETCD_BATCH_SIZE: Size of batch write operation. Default 8
REVISION_MAPPER_ETCD_BATCH_WRITE_CONN: Number of batch write connections. Default 64
REVISION_MAPPER_ETCD_O2G_ENDPOINTS: Comma separated list of original-to-generated endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_O2G_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the original-to-generated database connection
REVISION_MAPPER_ETCD_O2G_CRT_FILE: Path to the client certificate file used to authenticate with the original-to-generated database during TLS handshake
REVISION_MAPPER_ETCD_O2G_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the original-to-generated database connection
REVISION_MAPPER_ETCD_G2O_ENDPOINTS: Comma separated list of generated-to-original endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_G2O_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the generated-to-original database connection
REVISION_MAPPER_ETCD_G2O_CRT_FILE: Path to the client certificate file used to authenticate with the generated-to-original database during TLS handshake
REVISION_MAPPER_ETCD_G2O_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the generated-to-original database connection
REVISION_MAPPER_ETCD_LEASE_ENDPOINTS: Comma separated list of lease endpoints, for example http://etcd.server:2379
REVISION_MAPPER_ETCD_LEASE_CA_FILE: Path to the Certificate Authority (CA) file used to establish trust for the lease database connection
REVISION_MAPPER_ETCD_LEASE_CRT_FILE: Path to the client certificate file used to authenticate with the lease database during TLS handshake
REVISION_MAPPER_ETCD_LEASE_KEY_FILE: Path to the private key file used in conjunction with the client certificate to authenticate the lease database connection
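A sketch of an ETCD Mapper configuration that, as suggested above, reuses one etcd instance for all three metadata stores (the endpoint and certificate paths are placeholders):

export REVISION_MAPPER=etcd
export REVISION_MAPPER_ETCD_BATCH_SIZE=8
export REVISION_MAPPER_ETCD_BATCH_WRITE_CONN=64
export REVISION_MAPPER_ETCD_O2G_ENDPOINTS=https://metadata-etcd.example:2379
export REVISION_MAPPER_ETCD_G2O_ENDPOINTS=https://metadata-etcd.example:2379
export REVISION_MAPPER_ETCD_LEASE_ENDPOINTS=https://metadata-etcd.example:2379
# TLS settings for the O2G connection; set the matching G2O_* and LEASE_* variables the same way
export REVISION_MAPPER_ETCD_O2G_CA_FILE=/etc/harikube/pki/ca.crt
export REVISION_MAPPER_ETCD_O2G_CRT_FILE=/etc/harikube/pki/client.crt
export REVISION_MAPPER_ETCD_O2G_KEY_FILE=/etc/harikube/pki/client.key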
--listen-address: Listen address of service. Default: unix://harikube.sock
--endpoint: Defines the default fallback database used to store any data not explicitly routed by the topology configuration
--ca-file: Path to the Certificate Authority (CA) file used to establish trust for the default database connection
--cert-file: Path to the client certificate file used to authenticate with the default database during TLS handshake
--key-file: Path to the private key file used in conjunction with the client certificate to authenticate the default database connection
--skip-verify: Controls whether the TLS client performs server certificate verification
--metrics-bind-address: The address the metric endpoint binds to. Default :8080, set 0 to disable metrics serving
--server-cert-file: Path to the TLS certificate used by the middleware to secure incoming client connections
--server-key-file: Path to the private key used by the middleware to establish secure TLS communication with etcd-compatible clients
--datastore-max-idle-connections: Maximum number of idle connections retained by default datastore. If value = 0, the system default will be used. If value < 0, idle connections will not be reused
--datastore-max-open-connections: Maximum number of open connections used by default datastore. If value <= 0, then there is no limit
--datastore-connection-max-lifetime: Maximum amount of time a default database connection may be reused. If value <= 0, then there is no limit
--datastore-connection-max-idle-lifetime: Maximum amount of time a default database idle connection may be reused. If value <= 0, then there is no limit
--slow-sql-threshold: The duration which SQL executed longer than will be logged. Default 1s, set <= 0 to disable slow SQL log
--metrics-enable-profiling: Enables Go performance profiling via net/http/pprof on the metrics bind address. Default is false
--watch-progress-notify-interval: Interval between periodic watch progress notifications. Default is 5s
--emulated-etcd-version: The emulated etcd version to return on a call to the status endpoint. Defaults to 3.5.13, in order to indicate support for watch progress notifications
--debug: Enable debug logging
Mounting the harikube_db volume isn't mandatory if all databases and the metadata store point to external targets.
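Putting the flags together, a start-up command could look like the following sketch; the binary name and all endpoint and path values are assumptions, while the flags themselves are the ones documented above:

# Hypothetical launch command (binary name, endpoints, and paths are placeholders)
harikube \
  --listen-address http://0.0.0.0:2379 \
  --endpoint https://default-etcd.example:2379 \
  --ca-file /etc/harikube/pki/ca.crt \
  --cert-file /etc/harikube/pki/client.crt \
  --key-file /etc/harikube/pki/client.key \
  --server-cert-file /etc/harikube/pki/server.crt \
  --server-key-file /etc/harikube/pki/server.key \
  --metrics-bind-address :8080 \
  --slow-sql-threshold 1s \
  --watch-progress-notify-interval 5s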
🚀 Setup and start Kubernetes
Kubernetes Configuration
HariKube requires specific Kubernetes configuration to enable custom resource routing and external data store integration:
| Mandatory | Category | Option | Description |
|---|---|---|---|
| ✅ | Feature Gate | CustomResourceFieldSelectors=true | Enables CR field selectors |
| ✅ | Feature Gate | WatchList=true | Enables watch list support |
| ✅ | Feature Gate | WatchListClient=true | Enables watch list client feature |
| ✅ | API Server Flag | --encryption-provider-config="" | Encryption not supported |
| ✅ | API Server Flag | --storage-media-type=application/json | Sets storage format to JSON |
| ✅ | API Server Flag | --watch-cache=false | Disables watch cache (recommended for large data) |
| ✅ | API Server Flag | --etcd-servers=http(s)://middleware.service:2379 | Sets the middleware as the ETCD backend |
| ➖ | API Server Flag | --max-mutating-requests-inflight=400 | Increases concurrency for mutating requests |
| ➖ | API Server Flag | --max-requests-inflight=800 | Increases concurrency for all requests |
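For reference, a kube-apiserver invocation fragment covering the options above might look like the sketch below; the middleware address is a placeholder, and WatchListClient is a client-side gate that is typically also enabled on client components such as kube-controller-manager:

kube-apiserver \
  --feature-gates=CustomResourceFieldSelectors=true,WatchList=true,WatchListClient=true \
  --encryption-provider-config="" \
  --storage-media-type=application/json \
  --watch-cache=false \
  --etcd-servers=https://middleware.service:2379 \
  --max-mutating-requests-inflight=400 \
  --max-requests-inflight=800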
Kubernetes is compatible with HariKube by default. However, due to architectural constraints in ETCD, its underlying storage system, Kubernetes is not optimized for handling very large datasets. To enable support for high-volume data workloads, modifications to specific Kubernetes components (such as the API server) are required.
You can use our pre-built images; the pre-built versions are:
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
make WHAT=cmd/kube-apiserver
make WHAT=cmd/kube-controller-manager
Find the compiled binaries in _output/local/bin/linux/amd64 folder.
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
./build/run.sh make WHAT=cmd/kube-apiserver
./build/run.sh make WHAT=cmd/kube-controller-manager
Find the compiled binaries in _output/dockerized/bin/linux/amd64 folder.
export KUBE_FASTBUILD=true   # false for cross compiling
export KUBE_GIT_TREE_STATE=clean
export KUBE_GIT_VERSION=v1.32.0
export KUBE_DOCKER_REGISTRY=<your-registry.example.com/kubernetes>
make release-images
Find the baked images at the local registry:
docker image ls | grep -E 'kube-apiserver|kube-controller-manager' | grep $KUBE_GIT_VERSION