Memoize file match patterns #1412
Conversation
private static final String PATTERN_CHARS_TO_ESCAPE = "\\.[]{}()*+-?^$|";
private static final Map<String, Pattern> PATTERNS = new HashMap<>();
Since it is static, I'd suggest using ConcurrentHashMap here
Good, I changed it to java.util.concurrent.ConcurrentHashMap
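For reference, a minimal sketch of the memoization approach agreed on above, using a static ConcurrentHashMap as discussed in this review thread. The class and helper names (PatternCache, buildPatternFromIgnoreLine) are hypothetical placeholders for illustration, not the actual docker-java code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Sketch of the memoization idea; buildPatternFromIgnoreLine is a hypothetical
// stand-in for whatever translates an ignore line into a regex.
public final class PatternCache {

    // ConcurrentHashMap because the cache is static and may be hit from multiple threads.
    private static final ConcurrentMap<String, Pattern> PATTERNS = new ConcurrentHashMap<>();

    public static Pattern forIgnoreLine(String ignoreLine) {
        // computeIfAbsent compiles each distinct ignore line at most once.
        return PATTERNS.computeIfAbsent(ignoreLine, PatternCache::buildPatternFromIgnoreLine);
    }

    private static Pattern buildPatternFromIgnoreLine(String ignoreLine) {
        // Placeholder: the real code escapes special characters and expands glob syntax.
        return Pattern.compile(Pattern.quote(ignoreLine));
    }

    private PatternCache() {
    }
}
```

With computeIfAbsent, each distinct ignore line is compiled only once per process, and subsequent lookups skip recompilation entirely.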
@swiedenfeld I was about to merge it but then realized... if the app is long-running, such a cache may eventually explode and cause an OOM. Since the class is in
@bsideup Good point. I was unaware it could run for a long time, as I focused on its use in the build step. There, the number of ignore patterns must be bounded by the lines in
For the TTL, do you suggest a reasonable time interval until cache entries expire?
@swiedenfeld there are a few usages of docker-java that run for hours, days or even months :D Since it parses user input, there is no guarantee that the set will be final, despite the relationship to "a few lines in
For TTL, I'd suggest something like 1 hour. Alternatively, IIRC Guava can limit by the cache size, which may be another option (to limit it to 25 MB, or 10k patterns, or similar).
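For illustration, a hedged sketch of the bounded alternative suggested above, assuming Guava is on the classpath. The one-hour TTL and 10,000-entry cap mirror the numbers floated in this thread; BoundedPatternCache and its placeholder loader are hypothetical, not the code that was merged.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Sketch of a bounded pattern cache so that unbounded user input cannot cause an OOM.
public final class BoundedPatternCache {

    private static final LoadingCache<String, Pattern> PATTERNS = CacheBuilder.newBuilder()
            // Cap the number of cached patterns (size-based eviction).
            .maximumSize(10_000)
            // Drop entries that have not been used for an hour (the TTL suggested above).
            .expireAfterAccess(1, TimeUnit.HOURS)
            .build(new CacheLoader<String, Pattern>() {
                @Override
                public Pattern load(String ignoreLine) {
                    // Placeholder compilation step; the real code builds a glob-style regex.
                    return Pattern.compile(Pattern.quote(ignoreLine));
                }
            });

    public static Pattern forIgnoreLine(String ignoreLine) {
        return PATTERNS.getUnchecked(ignoreLine);
    }

    private BoundedPatternCache() {
    }
}
```

Combining an entry cap with an expire-after-access window keeps worst-case memory bounded even when a long-running application keeps feeding new user-supplied ignore patterns.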
@swiedenfeld ping :)

pong :)

@swiedenfeld FYI I just changed some defaults and will merge as soon as CI is green :)
I ran a profiler to analyse the performance of ScannedResult#addFilesInDirectory in the scenario described in #1409. Every ignore string must be matched against each of the 60k+ files, and a non-negligible amount of time is spent compiling the ignore strings into patterns over and over again. It seems sensible to cache already-compiled patterns in a map for quick resolution.
Unfortunately, it does not improve the situation in #1409 by much. The majority of processing time is spent walking the directory tree, which is possibly I/O-bound.