Skip to content

Conversation

@swiedenfeld
Copy link
Contributor

I ran a profiler to analyse performance of ScannedResult # addFilesInDirectory in the scenario described in #1409. All ignore strings must be matched on each of the +60k files, and a not neglible amount of time is spent compiling the Strings from ignores into patterns over and over again. It seems sensible to cache already compiled patterns into a map for quick resolution.

Unfortunately, it does not improve the situation in #1409 too much. The majority of processing time is spent walking the directory tree, which is possibly bound by I/O.


private static final String PATTERN_CHARS_TO_ESCAPE = "\\.[]{}()*+-?^$|";

private static final Map<String, Pattern> PATTERNS = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is static, I'd suggest using ConcurrentHashMap here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I changed it to java.util.concurrent.ConcurrentHashMap

@bsideup
Copy link
Member

bsideup commented Jun 17, 2020

@swiedenfeld I was about to merge it but then realized... if the app is long running, such cache may eventually explode and cause an OOM. Since the class is in core, we can use Guava's cache with TTL to prevent that.

@swiedenfeld
Copy link
Contributor Author

@bsideup Good point. I was unaware it could be running long as I focused on its use in build step. There, the number of ignore patterns must be bounded by the lines in .dockerignore, isn't it? In which other dimensions can we expect it to grow?

For the TTL, do you suggest a reasonable time interval until cache entries expire?

@bsideup bsideup modified the milestones: 3.2.3, next Jun 17, 2020
@bsideup
Copy link
Member

bsideup commented Jun 17, 2020

@swiedenfeld there are a few usages of docker-java that run for hours, days or even months :D

Since it parses the user input, there is no guarantee that the set will be final, despite the relationship to "a few lines in .dockerignore.

For TTL, I'd suggest something like 1 hour. Alternatively, IIRC Guava can limit by the cache size, which may be another option (to limit it to 25Mb, or 10k patterns, or similar)

@bsideup bsideup modified the milestones: 3.2.4, next Jun 23, 2020
@bsideup
Copy link
Member

bsideup commented Jun 24, 2020

@swiedenfeld ping :)

@swiedenfeld
Copy link
Contributor Author

pong :)
I'm still on this and already tried something. I am quite convinced it can work using a LoadingCache with expiration. Still I want to write a test for the expiration behaviour.

@swiedenfeld swiedenfeld changed the title Compile each file match pattern only once Cache file match patterns Jun 24, 2020
@swiedenfeld swiedenfeld changed the title Cache file match patterns Memoize file match patterns Jun 24, 2020
@bsideup
Copy link
Member

bsideup commented Jun 25, 2020

@swiedenfeld FYI I just changed some defaults and will merge as soon as CI is green :)
I appreciate the desire to contribute the tests, feel free to submit a follow up PR containing them 👍

@bsideup bsideup merged commit c3661f3 into docker-java:master Jun 25, 2020
@swiedenfeld swiedenfeld deleted the compile_less_patterns branch June 25, 2020 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants