bundle #586
base: master
Conversation
I would argue that the tarball download optimization is welcome but unrelated. I would move it to a separate PR that should probably be merged first. Reusing the URL SCM for the bundles is IMHO not the right approach. It should instead work like the …

From a more general angle: should this work for indeterministic checkouts too? I would argue against this and only bundle deterministic checkouts. But if there is a good reason to decide otherwise I'm open to it. It's just that my gut feeling is that it will get nasty to get all corner cases correct...
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##           master     #586      +/-   ##
    ==========================================
    + Coverage   89.03%   89.11%   +0.08%
    ==========================================
      Files          50       50
      Lines       16137    16290     +153
    ==========================================
    + Hits        14367    14517     +150
    - Misses       1770     1773       +3

☔ View full report in Codecov by Sentry.
I added a new implementation using the archive stuff. Since adding files to the final bundle using … To get the blackbox test working, #606 is required.

As of today I haven't tested bundling / unbundling a larger, real-world project. I'll do this in the next days.
Force-pushed 9d269b4 to 059f566
I thought long about this and I have no final idea yet. So far, just some random thoughts:

IMHO, bundling the sources is nothing special compared to up- or downloading dist workspaces. In fact, I could imagine it would make sense to have this feature for all other transports too. Maybe not as default but as an option. As such, what about defining new …

In the end, the bundle option is just another way of defining a local source upload backend. It's just special in that it will create a big zip file in the end. That zip file could also hold "normal" binary artifacts. Not sure if this orthogonal design yields something useful, but in principle it looks feasible.

I'm not yet sure what we should do about possible git or svn directories in source workspaces. As they are ignored when hashing the workspace, we could skip them when packing the sources too...
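To make the idea concrete, here is a minimal sketch of how such per-backend source transfer flags could be expressed, assuming a spec dictionary shaped like an archive configuration. The "src-upload"/"src-download" flag names follow the wording in this thread and are not an existing Bob API.

```python
# Hypothetical sketch only: parsing per-backend source transfer flags from an
# archive spec, using the "src-upload"/"src-download" naming from this thread.
# Neither key is an existing Bob configuration option.

def parseArchiveSpec(spec):
    flags = set(spec.get("flags", []))
    return {
        "download": "download" in flags,          # binary artifacts (existing)
        "upload": "upload" in flags,              # binary artifacts (existing)
        "srcDownload": "src-download" in flags,   # proposed: source workspaces
        "srcUpload": "src-upload" in flags,       # proposed: source workspaces
    }

example = {
    "backend": "file",
    "path": "/mnt/artifacts",
    "flags": ["download", "upload", "src-download"],
}
print(parseArchiveSpec(example))
```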
🤷 I don't use indeterministic checkouts, so I don't have a strong opinion here. I see valid arguments for all 3 options, so maybe we should add a …

Holding it like this, we don't even need the …
Same as for the indeterministic packages. I think there are use cases where it could be useful to have these directories on the air-gapped machine, e.g. when development takes place there. If it's bundled just for review we can skip them, so I'd add another option here to have this configurable.
I guess the …
Sounds reasonable. Also the source up-/download should probably be separate switches.
Sounds reasonable. I think that we should throw it away by default because it will bloat the archive. But if somebody wants to retain the git-directories, why not...
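Purely illustrative, here is one possible interpretation of the three-way switch discussed above, using the {yes,no,fail} choices that show up later in the manpage diff. The function and the exception are assumptions, not code from this PR.

```python
# Illustrative only: one possible interpretation of a three-way
# {yes,no,fail} switch for indeterministic checkouts in a bundle.

class BundleError(Exception):
    pass

def shouldBundle(checkoutIsDeterministic, indeterministicMode):
    if checkoutIsDeterministic:
        return True                      # deterministic sources: always bundle
    if indeterministicMode == "yes":
        return True                      # bundle indeterministic sources as-is
    if indeterministicMode == "no":
        return False                     # silently skip them
    raise BundleError("indeterministic checkout not allowed in bundle")

print(shouldBundle(True, "fail"))   # True
print(shouldBundle(False, "no"))    # False
```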
I've created a 1.0 milestone for mid-April because a new release is overdue IMHO. Do you think this PR should be part of it? I'm not entirely sure because it feels like the design is not really settled yet. What's your opinion?
I did not continue working on this to avoid conflicts with the remote archive clean PR. Mid-April (which year?) might be possible with some priority shifting on my side... but it wouldn't be a big issue if this lands after 1.0.
Obviously this year. Otherwise I wouldn't have asked. 😆 😉
Then I would tentatively shift this feature to the next release if there is no need to hurry...
Force-pushed 496d465 to 276140d
jkloetzke left a comment
All in all I'm not sure how the interaction between bundle and non-bundle builds should work. That is, a previous build used one method and the next the other approach. There are 4 possibilities:
- Bundle build -> bundle build: The bundle was extracted in the first build. The next build will skip the checkout step.
- Regular build -> regular build: as always.
- Regular build -> bundle build: should this do nothing? Obviously the bundle should not be extracted again. But what if the bundle was updated for an indeterministic checkout? We cannot overwrite source code.
- Bundle build -> regular build: I think we must prevent the SCMs from running. Otherwise they will immediately collide with the existing files in the workspace.
My gut feeling is that these corner cases need more thought...
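A rough sketch of how these four cases could collapse into one rule, assuming the approach settled on later in the thread (move the workspace to the attic whenever the bundle state changes). All helper functions are stand-ins, not Bob internals.

```python
# Rough sketch of the four transitions, assuming: whenever a workspace
# switches between bundled and regular checkouts, it is moved to the attic
# and the checkout starts fresh.

def moveToAttic():
    print("move existing workspace to attic")

def extractBundle():
    print("extract sources from the bundle")

def runRegularCheckout():
    print("run the configured SCM checkouts")

def handleCheckout(workspaceExists, wasBundled, nowBundled):
    if workspaceExists and wasBundled != nowBundled:
        moveToAttic()            # cases 3 and 4: never mix both methods in place
        workspaceExists = False
    if nowBundled:
        if not workspaceExists:
            extractBundle()      # case 1: an existing bundled workspace is kept
    else:
        runRegularCheckout()     # case 2: the regular build as always
```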
doc/manpages/bob-build-dev.rst (outdated):

    Do not add packages matching (fnmatch) RE to the bundle.

    ``--bundle-indeterministic {yes,no,fail}``
        Control how indeterministic sources are added to bundles.

Could you add an explanation of what yes/no/fail mean?
    zip-file containing all sources needed to build the package. This also enables
    `--always-checkout`.

    ``--bundle-exclude BUNDLE_EXCLUDE``

Could we make this a regex? That would be more consistent with --always-checkout, --download=packages= and --download-layer=.
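For illustration, the difference between the two matching styles. Package names and patterns are made up; the exact match semantics would follow whatever --always-checkout and --download already do.

```python
import fnmatch
import re

name = "libs::openssl-dev"

print(fnmatch.fnmatch(name, "libs::*"))        # True  (glob / fnmatch style)
print(bool(re.fullmatch(r"libs::.*", name)))   # True  (regex, full match)
print(bool(re.search(r"openssl", name)))       # True  (regex, substring)
```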
doc/manpages/bob-build.rst (outdated):

    [--install | --no-install]
    [--sandbox | --slim-sandbox | --dev-sandbox | --strict-sandbox | --no-sandbox]
    [--clean-checkout] [--attic | --no-attic]
    [--bundle BUNDLE] [--bundle-exclude BUNDLE_EXCLUDE] [--unbundle]

Looks like this needs some updates. --bundle-indeterministic and --bundle-vcs are missing.
doc/tutorial/compile.rst (outdated):

    Using source bundles
    ====================

    A source code bundle is a tar file with all the sources needed to build a

The whole section needs some update too. The bundle is a zip file by now...
    def canDownload(self):
        return True

    def canDownloadSrc(self):

No need to touch the JenkinsArchive. It is some internal backend that is only used for Jenkins jobs. As such, it will never up-/download sources.
But I need to add these methods since they are called from:

        await self._cookCheckoutStep(step, depth)
      File "/home/ralf/projects/bob/pym/bob/builder.py", line 1237, in _cookCheckoutStep
        if self.__archive.canDownloadSrc():
Right. 🤦 But it does not need to up-/download sources. So just return false here and at least you do not need to bother with uploadPackage...
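A minimal sketch of what this suggests for the Jenkins-only backend: keep the new interface methods but opt out of source up-/download entirely. The class body is abbreviated and not the actual JenkinsArchive implementation.

```python
# Keep the new interface methods on the Jenkins-only backend but never
# up-/download sources there. Abbreviated for illustration.

class JenkinsArchive:
    def canDownload(self):
        return True     # binary artifacts: unchanged behaviour

    def canDownloadSrc(self):
        return False    # never download sources in Jenkins jobs

    def canUploadSrc(self, checkoutStep):
        return False    # never upload sources either
```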
pym/bob/archive.py (outdated):

    if self.__bundle:
        self.__temp = TemporaryDirectory(dir=os.getcwd(),
                                         prefix=".bundle")
        os.makedirs(self.__temp.name, exist_ok=True)

I don't think it is necessary to create the directory. The TemporaryDirectory will have securely created it.
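For reference, TemporaryDirectory already creates (and later removes) the directory, so the extra os.makedirs() call is redundant:

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory(dir=os.getcwd(), prefix=".bundle") as tmp:
    print(os.path.isdir(tmp))   # True -- no extra os.makedirs() needed
```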
pym/bob/archive.py (outdated):

    self.__srcUploadIndeterministic = spec.get("src-upload-indeterministic", "no")
    spec["flags"] = ["src-upload" if self.__bundle else "src-download"]
    if self.__bundle:
        self.__temp = TemporaryDirectory(dir=os.getcwd(),

Doesn't this leak the temporary directory if the build fails?
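One conceivable way to avoid leaking the directory on a failed build is to tie its lifetime to a context (ExitStack or try/finally) so cleanup also runs on the error path; this is only a sketch, not code from the PR:

```python
import os
from contextlib import ExitStack
from tempfile import TemporaryDirectory

with ExitStack() as stack:
    tmp = stack.enter_context(TemporaryDirectory(dir=os.getcwd(), prefix=".bundle"))
    # ... collect sources into tmp and build the bundle ...
    # if anything above raises, the directory is still cleaned up here
```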
    if self.__archive.canDownloadSrc():
        audit = os.path.join(os.path.dirname(checkoutStep.getWorkspacePath()),
                             "audit.json.gz")
        wasDownloaded = await self.__archive.downloadPackage(checkoutStep,

This looks very dangerous. It could overwrite existing workspaces. Downloading something must only ever happen into a freshly created workspace.
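A small sketch of the guard suggested here: only allow a source download into a workspace that is missing or empty. The helper name is hypothetical.

```python
import os

def mayDownloadInto(workspace):
    # only ever download into a freshly created (missing or empty) workspace
    return not os.path.exists(workspace) or len(os.listdir(workspace)) == 0
```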
pym/bob/builder.py (outdated):

    assert predicted, "Non-predicted incorrect Build-Id found!"
    self.__handleChangedBuildId(checkoutStep, checkoutHash)

    if self.__archive.canUploadSrc(checkoutStep):

So we would upload every time the checkout step is visited. This includes when the checkout was skipped! I think the upload should only be done after a fresh checkout, like the live build-id. That may be different for bundles, but otherwise it will create a lot of traffic with the artifact server.
I'd say deterministic packages will stay as they are as long as they are not changed. For indeterministic packages or if the checkout was updated there might be some attic logic needed...
IMO we also need to move the workspace to attic and run the scm-checkout.
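Sketch of the behaviour proposed in this exchange: upload sources only right after a checkout was actually executed, mirroring the live build-id handling. Both the checkoutExecuted flag and the upload entry point are assumptions for illustration.

```python
# Illustration only: 'checkoutExecuted' and 'uploadSources' are assumptions,
# not existing builder/archive attributes.

def maybeUploadSources(archive, checkoutStep, checkoutExecuted):
    if not checkoutExecuted:
        return                                  # checkout skipped: nothing new
    if archive.canUploadSrc(checkoutStep):
        archive.uploadSources(checkoutStep)     # hypothetical upload entry point
```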
I don't have a use-case for anything but 1 and 2. Maybe we can just forbid 3 and 4 somehow until there is a need for it?

BTW - I'm doing some testing with a real project ATM. Besides two packages triggering the "Hash doesn't match after unbundle" error (I'm looking into it), it was necessary to disable the … Also it was necessary to change the layers from managed, ...
The failure on name resolution errors should be fixed with the current master. Does it still happen with the latest version? By now they should be treated as if the artifact was not found.
Yes, this happens at https://github.com/BobBuildTool/bob/blob/master/pym/bob/archive.py#L459. I did a quick test on my system and it seems the handling of socket.gaierror is missing there...
Grmpf. You looked (like me) at the wrong spot because the error is handled 5 lines above. But there is another path for the live build-id. Hopefully fixed with 5c7e356.
Sounds good. So whenever we change between bundle and non-bundle mode, the sources will be moved to the attic. I'm unsure what to do about the indeterministic bundle packages. Is this a use case on your side? If not, I would forbid them.
I think we have the same question/issue for regular src-upload / src-download vs. SCM checkouts? If the sources were downloaded in the first place but can't be downloaded after a recipe change, we cannot simply run the regular checkout in this workspace. (Also, extracting the new sources in the same workspace could lead to a different result.) So every time a checkout should run, we need to move modified existing sources to the attic and run the checkout in a fresh workspace. Unmodified sources can be deleted? Or am I missing something?
Right. Sounds like the safest and easiest option. We should prevent re-runs for downloaded indeterministic checkouts though. Unless the bundle is changed, it will yield the same result and just create churn. BTW, are indeterministic bundled checkouts needed?
In principle yes. Currently we don't make a checksum before the update. That would be a new case for downloaded source workspaces. I wouldn't mind if we just consistently move the workspace to the attic instead if that turns out to be hard to implement.
I don't have a use case for them.
This method is called by the builder once the build has been finished successfully.
The bundle archive is for up-/download of sources only. It uses a temporary directory in the workspace to collect all sources needed to build the package. The temporary directory is in the workspace since the typical temp-dir location might not have enough free space. The finish method puts all the sources in the temp-dir into an uncompressed zip file. When downloading sources from this archive, the sources are extracted from the zip file again.
And use the builder functions to enable and finish bundling.
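A short illustration of the described finish step: pack everything collected in the temporary directory into an uncompressed zip file (ZIP_STORED). The function name and paths are made up; only the behaviour follows the description above.

```python
import os
import zipfile

def finishBundle(tempDir, bundlePath):
    # ZIP_STORED keeps the archive uncompressed, as described above
    with zipfile.ZipFile(bundlePath, "w", compression=zipfile.ZIP_STORED) as z:
        for root, dirs, files in os.walk(tempDir):
            for name in files:
                path = os.path.join(root, name)
                z.write(path, os.path.relpath(path, tempDir))
```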
Since 27d791f the error codes of subprocesses are returned if such a command fails. Make expect_fail happy with anything but zero.
Move the affected sources to attic whenever the bundle state changes.
I found some time to continue on this, but now I'm stuck. For the tests I'd like to verify that we only bundle sources as they are referenced by the recipes, without any local modifications. But for URL SCMs the …
No, it doesn't. Bob relies on git and svn to tell us when something was changed. For the URL SCM, no state is tracked internally. :/
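One conceivable way to detect local modifications for such a stateless SCM is to hash the checkout result once and compare against the stored digest later. This is only an illustration of the idea, not how Bob's workspace hashing is implemented.

```python
import hashlib
import os

def hashTree(path):
    # deterministic digest over relative paths and file contents
    digest = hashlib.sha1()
    for root, dirs, files in sorted(os.walk(path)):
        for name in sorted(files):
            filePath = os.path.join(root, name)
            digest.update(os.path.relpath(filePath, path).encode())
            with open(filePath, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()
```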
Bundle the results of all checkoutSteps needed to build a package and provide a bundle-config file to be able to rebuild the package only with the bundle. Such a bundle can be used to build on an air-gapped system or to archive all sources of a build. When building from the bundle, only the bundle is extracted but no other checkoutScripts are run.

There is one issue with the actual URL-SCM extraction. The source workspace contains both the original downloaded file and the extracted sources. This unnecessarily doubles the size of the bundle and - since the bundle extraction uses the UrlScm as well - produces a different workspace hash when the bundle is extracted. That's why I changed to download the original file into workspace/../_download, where also the .extracted file is placed. This change makes the unittest fail ATM as the ../_download folder is always the same when using a tempDir. I'm not sure how to proceed here:
- Use a dedicated directory inside the workspace (e.g. workspace/.bob-download/) for the downloaded files and exclude this directory when hashing / bundling?
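A sketch of the option floated above: keep downloads in a dedicated directory inside the workspace and prune it when walking the tree for hashing or bundling. The directory name follows the proposal; the helper itself is hypothetical.

```python
import os

DOWNLOAD_DIR = ".bob-download"   # directory name taken from the proposal above

def iterWorkspaceFiles(workspace):
    for root, dirs, files in os.walk(workspace):
        dirs[:] = [d for d in dirs if d != DOWNLOAD_DIR]   # skip downloads
        for name in files:
            yield os.path.join(root, name)
```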