Replies: 2 comments 1 reply
Hi @Jonsen94, the fun thing is: in an older version of Reitti this actually would have been possible. We had every chunk of data that needed to be calculated flying around in the RabbitMQ queue, and in theory multiple workers could have picked it up. But this made the calculation unreliable, because the chunks were calculated in parallel and got in each other's way. So I decided to drop that whole approach and made it synchronous.

For smaller systems, adjusting the BATCH_SIZE property could be a first step. It dictates how much data is handled in one batch when an import runs. I doubt it will have a meaningful impact on the handling of "normal" data points, though; those are generally handled on the fly.

There are already a couple of optimizations we apply when ingesting data. For example, Reitti marks geopoints that are too dense as ignored, so they are removed from the paths. We also allow at most 10k points per path, and that is already simplified in the database.

I would like to tackle #672 and further improve performance for now, before we decide at some point that it is still not enough. Since syncing multiple workers is highly complicated, that should be the last resort :)
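To give a rough idea of what "marking too-dense geopoints as ignored" can look like, here is a minimal sketch of such a density filter. This is not our actual code: the `GeoPoint` record, the `markDensePoints` method, and the 15 m threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical point type; the real entity certainly looks different.
record GeoPoint(double lat, double lon, boolean ignored) {}

class DensityFilter {

    // Arbitrary example threshold: points closer than this to the last
    // kept point get flagged as ignored.
    private static final double MIN_DISTANCE_METERS = 15.0;
    private static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Haversine distance between two lat/lon pairs, in meters.
    static double distanceMeters(GeoPoint a, GeoPoint b) {
        double dLat = Math.toRadians(b.lat() - a.lat());
        double dLon = Math.toRadians(b.lon() - a.lon());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.lat())) * Math.cos(Math.toRadians(b.lat()))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    // Walk the track in order and flag every point that is too close to the
    // previously kept point; only non-ignored points would feed later steps.
    static List<GeoPoint> markDensePoints(List<GeoPoint> track) {
        List<GeoPoint> out = new ArrayList<>();
        GeoPoint lastKept = null;
        for (GeoPoint p : track) {
            boolean tooDense = lastKept != null
                    && distanceMeters(lastKept, p) < MIN_DISTANCE_METERS;
            if (!tooDense) {
                lastKept = p;
            }
            out.add(new GeoPoint(p.lat(), p.lon(), tooDense));
        }
        return out;
    }
}
```

A filter like this keeps the first point of any dense cluster and flags the rest, which is one plausible way to thin out a noisy track without losing its overall shape.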
Hi, funny and understandable. I set BATCH_SIZE to 100; for daily operations I think it works fine, just not for loading multiple years at a time, and the initial fill was tough. But I think it mostly works.

Sorry if I missed it, but is there any technical documentation on the algorithms and data pipelines used? If not, I will take a look at the code as soon as I have time for a deeper dive. #642: sounds good :) Thank you, and yes, external (and multiple) workers would be rather expensive to implement and maintain.

If you don't mind and have the time, I have some general questions for a better understanding of the system. Points are then processed and marked, and the values from the raw_point_data table are used for further computations and for drawing the paths in the frontend — is that right?

Regarding compute, I currently see this in my log:

It is running on my more powerful computer, with the database still on the NAS. Any tips? And is it safe to stop Reitti during point processing and restart it later (possibly on another device configured against the same DB)? The docs say that everything except the database and reitti* is stateless.

Thank you also for the answers in the other topics! I will come back to them soon.
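For reference, setting it in a docker-compose deployment could look roughly like this. The variable name comes from this thread; the service name and image are assumptions, so check the Reitti docs for the exact keys.

```yaml
# Sketch only: service name and image are assumptions, not the official compose file.
services:
  reitti:
    image: dedicatedcode/reitti:latest
    environment:
      # Smaller batches lower peak CPU/memory pressure during large imports,
      # at the cost of longer total import time.
      - BATCH_SIZE=100
```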
Hi there,
even though I think this might be out of scope, I still want to start a discussion.
The idea is directly inspired by my (probably not perfect) setup.
I am using a TrueNAS box with a smallish processor: an Intel N100 with 16 GB of RAM.
When using Reitti, both Reitti and PostGIS consume a lot (most) of the available CPU, sometimes even requiring more than what's available, leading to crashes (out of memory) or slow/non-working requests.
So I was wondering: could we allow dynamically registering runners to do the compute? I am thinking of how GitLab runners work.
If a runner is available, use it to do the heavy lifting; otherwise use the main instance. Maybe it could even do some precalculations (reducing the raw point count by removing points on the same line, ...) and let the slower instance use that data for big requests (multi-year time spans and so on); see the sketch below.
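To make the precalculation idea a bit more concrete, here is a rough sketch of what I mean by removing points on the same line: drop a point when it lies (almost) on the straight segment between its neighbors. Everything here is made up for illustration — the tolerance, the class name, and treating lat/lon as a flat plane (which only holds for short segments).

```java
import java.util.ArrayList;
import java.util.List;

class CollinearReduction {

    // Hypothetical tolerance in degrees; roughly a meter or two near the equator.
    private static final double EPSILON = 1e-5;

    // Perpendicular distance of p from the line through a and b,
    // treating lat/lon as planar coordinates (fine for short segments).
    static double deviation(double[] a, double[] b, double[] p) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        return Math.abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / len;
    }

    // Single pass over the track: keep a point only if it deviates from the
    // straight line between the last kept point and the next point.
    static List<double[]> reduce(List<double[]> points) {
        if (points.size() < 3) return new ArrayList<>(points);
        List<double[]> kept = new ArrayList<>();
        kept.add(points.get(0));
        for (int i = 1; i < points.size() - 1; i++) {
            double[] prev = kept.get(kept.size() - 1);
            double[] next = points.get(i + 1);
            if (deviation(prev, next, points.get(i)) > EPSILON) {
                kept.add(points.get(i));
            }
        }
        kept.add(points.get(points.size() - 1));
        return kept;
    }
}
```

A more standard take on the same idea would be Douglas-Peucker simplification, which PostGIS also exposes as ST_Simplify, so a worker could even push that step into the database.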
Again, I am pretty sure this is a lot to ask and probably out of scope. I am just curious about your opinions.