GitHub - SneaksAndData/arcane-stream-json: JSON stream for Arcane Streaming Service

JSON Stream Plugin for Arcane

This repository contains the implementation of a JSON-to-Iceberg streaming plugin for Arcane. Use this app to livestream JSON files to an Iceberg table, with Trino as the streaming batch merge consumer and Lakekeeper as the data catalog.

Quickstart

This source continuously ingests files with multiline JSON content into a target Iceberg table. To configure the stream, you must provide the following:

Desired Avro schema for the source. Note that this schema must match the JSON as it looks after the JSON pointer expression and array explode have been applied, not the raw file content. All fields in the schema must be defined as nullable. You can use this handy tool to generate the schema.
Source S3 path
JSON pointer expression, if the desired data is a subset of the source JSON. For example, given

{
  "colA": "a",
  "colB": {
    "colC": "c",
    "propA": 1,
    "propB": "ABC"
  }
}

and jsonPointerExpression set to /colB, the source will be transformed to:

{
  "colC": "c",
  "propA": 1,
  "propB": "ABC"
}

JSON pointers for array explode, if any. For example, given

{
  "colA": "a",
  "colB": [{
    "colC": "c1",
    "propA": 1,
    "propB": "ABC1"
  },{
    "colC": "c2",
    "propA": 2,
    "propB": "ABC2"
  }]
}

and jsonArrayPointers set to "/colB": {}, the source will be transformed to:

{"colC": "c1", "propA": 1, "propB": "ABC1"}
{"colC": "c2", "propA": 2, "propB": "ABC2"}

This emits two rows from one source file entry.
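
The two transformations above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the plugin's actual implementation (which is written in Scala). The function names `resolve_pointer`, `explode_array` and `all_fields_nullable`, the record name `ExplodedRow`, and the choice of `long` for `propA` are assumptions made for this example. The pointer syntax is assumed to follow RFC 6901, and the meaning of the `{}` value in `jsonArrayPointers` is not modeled.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a JSON pointer such as "/colB" against parsed JSON (RFC 6901 assumed)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" stands for "/", "~0" stands for "~".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def explode_array(document, array_pointer):
    """Return one row per element of the array located at array_pointer."""
    target = resolve_pointer(document, array_pointer)
    if not isinstance(target, list):
        raise TypeError(f"{array_pointer!r} does not point to an array")
    return list(target)


# Hypothetical Avro schema for the exploded rows. Every field is a union
# with "null", which is how Avro expresses a nullable field.
ROW_SCHEMA = {
    "type": "record",
    "name": "ExplodedRow",  # assumed name, not taken from the repository
    "fields": [
        {"name": "colC", "type": ["null", "string"], "default": None},
        {"name": "propA", "type": ["null", "long"], "default": None},
        {"name": "propB", "type": ["null", "string"], "default": None},
    ],
}


def all_fields_nullable(schema):
    """Check that every field type is a union that includes "null"."""
    return all(
        isinstance(field["type"], list) and "null" in field["type"]
        for field in schema["fields"]
    )


if __name__ == "__main__":
    source = {
        "colA": "a",
        "colB": [
            {"colC": "c1", "propA": 1, "propB": "ABC1"},
            {"colC": "c2", "propA": 2, "propB": "ABC2"},
        ],
    }
    # Print one JSON line per exploded row.
    for row in explode_array(source, "/colB"):
        print(json.dumps(row))
    print(all_fields_nullable(ROW_SCHEMA))
```

Running a sample file through a sketch like this before writing the Avro schema is one way to see which fields survive the pointer and explode steps, since the schema has to describe the transformed rows rather than the raw file.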

Development

The project uses Scala 3.6.1 and is tested on JDK 23.

