Replies: 2 comments 1 reply
Hi @Jonsen94, the fun thing is: in an older version of Reitti this actually would have been possible. We had every chunk of data that needed to be calculated flying around in the RabbitMQ queue, and in theory multiple workers could have picked it up. But this made the calculation unreliable, because the chunks were calculated in parallel and got in each other's way. So I decided to drop that whole approach and made it synchronous.

For smaller systems, adjusting the BATCH_SIZE property could be a first step. It dictates how much data is handled in one batch when an import runs. I doubt it will have a meaningful impact on the handling of "normal" data points, though; those are generally handled on the fly.

There are already a couple of optimizations we apply when ingesting data. For example, Reitti marks geopoints that are too dense as ignored, so they are removed from the paths. We also allow at most 10k points per path, and that is already simplified in the database.

I would like to tackle #672 and further improve performance for now, before we decide at some point that it is still not enough. Since syncing multiple workers is highly complicated, that should be the last resort :)
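To give a rough idea of what "marking too-dense geopoints as ignored" can look like, here is a minimal sketch of such a density filter. This is not our actual code: the `GeoPoint` record, the `markDensePoints` method, and the 15 m threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical point type; the real entity certainly looks different.
record GeoPoint(double lat, double lon, boolean ignored) {}

class DensityFilter {

    // Arbitrary example threshold: points closer than this to the last
    // kept point get flagged as ignored.
    private static final double MIN_DISTANCE_METERS = 15.0;
    private static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Haversine distance between two lat/lon pairs, in meters.
    static double distanceMeters(GeoPoint a, GeoPoint b) {
        double dLat = Math.toRadians(b.lat() - a.lat());
        double dLon = Math.toRadians(b.lon() - a.lon());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.lat())) * Math.cos(Math.toRadians(b.lat()))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    // Walk the track in order and flag every point that is too close to the
    // previously kept point; only non-ignored points would feed later steps.
    static List<GeoPoint> markDensePoints(List<GeoPoint> track) {
        List<GeoPoint> out = new ArrayList<>();
        GeoPoint lastKept = null;
        for (GeoPoint p : track) {
            boolean tooDense = lastKept != null
                    && distanceMeters(lastKept, p) < MIN_DISTANCE_METERS;
            if (!tooDense) {
                lastKept = p;
            }
            out.add(new GeoPoint(p.lat(), p.lon(), tooDense));
        }
        return out;
    }
}
```

A filter like this keeps the first point of any dense cluster and flags the rest, which is one plausible way to thin out a noisy track without losing its overall shape.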
Hi, funny and understandable. I set BATCH_SIZE to 100; for daily operations I think it works fine, just not for loading multiple years at a time, and the initial fill was tough. But I think it mostly works.

Sorry if I missed it, but is there any technical documentation on the algorithms and data pipelines used? If not, I will take a look at the code as soon as I have time for a deeper dive. #642: sounds good :) Thank you, and yes, external (and multiple) workers would be rather expensive to implement and maintain.

If you don't mind and have the time, I have some general questions for a better understanding of the system. Points are then processed and marked, and the values from the raw_point_data table are used for further computations and for drawing the paths in the frontend — is that right?

Regarding compute, I currently see this in my log:

It is running on my more powerful computer, with the database still on the NAS. Any tips? And is it safe to stop Reitti during point processing and restart it later (possibly on another device configured against the same DB)? The docs say that everything except the database and reitti* is stateless.

Thank you also for the answers in the other topics! I will come back to them soon.
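For reference, setting it in a docker-compose deployment could look roughly like this. The variable name comes from this thread; the service name and image are assumptions, so check the Reitti docs for the exact keys.

```yaml
# Sketch only: service name and image are assumptions, not the official compose file.
services:
  reitti:
    image: dedicatedcode/reitti:latest
    environment:
      # Smaller batches lower peak CPU/memory pressure during large imports,
      # at the cost of longer total import time.
      - BATCH_SIZE=100
```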
Hi there,
even though I think this might be out of scope, I still want to start a discussion.
The idea is directly inspired by my (probably not perfect) setup.
I am using a TrueNAS box with a smallish processor: an Intel N100 with 16 GB of RAM.
When using Reitti, both Reitti and PostGIS consume a lot (most) of the available CPU, sometimes even requiring more than what's available, leading to crashes (out of memory) or slow/non-working requests.
So I was wondering: could we allow dynamically registering runners to do the compute? I am thinking of how GitLab runners work.
If a runner is available, use it to do the heavy lifting; otherwise use the main instance. Maybe it could even do some precalculations (reducing the raw point count by removing points on the same line, ...) and let the slower instance use that data for big requests (multi-year time spans and so on); see the sketch below.
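To make the precalculation idea a bit more concrete, here is a rough sketch of what I mean by removing points on the same line: drop a point when it lies (almost) on the straight segment between its neighbors. Everything here is made up for illustration — the tolerance, the class name, and treating lat/lon as a flat plane (which only holds for short segments).

```java
import java.util.ArrayList;
import java.util.List;

class CollinearReduction {

    // Hypothetical tolerance in degrees; roughly a meter or two near the equator.
    private static final double EPSILON = 1e-5;

    // Perpendicular distance of p from the line through a and b,
    // treating lat/lon as planar coordinates (fine for short segments).
    static double deviation(double[] a, double[] b, double[] p) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        return Math.abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / len;
    }

    // Single pass over the track: keep a point only if it deviates from the
    // straight line between the last kept point and the next point.
    static List<double[]> reduce(List<double[]> points) {
        if (points.size() < 3) return new ArrayList<>(points);
        List<double[]> kept = new ArrayList<>();
        kept.add(points.get(0));
        for (int i = 1; i < points.size() - 1; i++) {
            double[] prev = kept.get(kept.size() - 1);
            double[] next = points.get(i + 1);
            if (deviation(prev, next, points.get(i)) > EPSILON) {
                kept.add(points.get(i));
            }
        }
        kept.add(points.get(points.size() - 1));
        return kept;
    }
}
```

A more standard take on the same idea would be Douglas-Peucker simplification, which PostGIS also exposes as ST_Simplify, so a worker could even push that step into the database.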
Again, I am pretty sure this is a lot to ask and probably out of scope. I am just curious about your opinions.