
py_zipapp duplicates shared libraries #3857

Description

@gwjo

🐞 bug report

Affected Rule

The issue is caused by the rule: py_zipapp_binary

With the following settings

common --@rules_python//python/config_settings:bootstrap_impl=script
common --@rules_python//python/config_settings:venvs_site_packages=yes
common --@rules_python//python/config_settings:venvs_use_declare_symlink=yes

Is this a regression?

No

Description

Shared libraries shipped in pip packages are duplicated in the zipapp: once under the venv path and once under the wheel's runfiles path. For native-heavy packages such as torch, this can significantly increase the size of the resulting zip file.

This is related to the closed #3439, but that issue concerned packaging the runfiles with third-party rules.

In the runfiles layout, both paths are symlinks pointing to the real file, but when the zip file is built, one is created as a symlink and the other as a root_symlink. These are added to the zip manifest as separate entries, are not deduplicated by tools/private/zipapp/zipper.py, and are instead byte-copied into the zip file.

I would expect these to be deduplicated, so that only one copy is stored in the zip file and the other becomes a symlink to it.
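A minimal sketch of that expected behavior, written in Python because zipper.py is a Python tool. It is not based on the actual code or manifest format of tools/private/zipapp/zipper.py: the `write_deduped_zip` function and the `(archive_name, source_path)` entry shape are hypothetical. Entries are keyed by `os.path.realpath` of their source, so the symlink and the root_symlink that resolve to the same file collapse to one stored copy; every later entry is written as a zip symlink entry (Unix `S_IFLNK` mode bits in `external_attr`) whose content is a relative path to the first copy.

```python
import os
import posixpath
import stat
import zipfile


def write_deduped_zip(zip_path, entries):
    """Write (archive_name, source_path) pairs, storing each real file once.

    Hypothetical helper: entries whose source resolves (via os.path.realpath)
    to an already-stored file become zip symlink entries pointing at it.
    Returns a mapping of real source path -> archive name of the stored copy.
    """
    stored_by_real = {}
    # strict_timestamps=False clamps pre-1980 mtimes instead of raising.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED,
                         strict_timestamps=False) as zf:
        for arcname, src in entries:
            real = os.path.realpath(src)
            stored = stored_by_real.get(real)
            if stored is None:
                # First occurrence: copy the file's bytes (write() follows symlinks).
                stored_by_real[real] = arcname
                zf.write(src, arcname)
                continue
            # Duplicate: store a relative symlink to the first copy instead.
            target = posixpath.relpath(stored, posixpath.dirname(arcname))
            info = zipfile.ZipInfo(arcname)
            info.create_system = 3  # 3 = Unix, so tools read the mode bits below
            info.external_attr = (stat.S_IFLNK | 0o777) << 16
            zf.writestr(info, target)
    return stored_by_real
```

Storing the symlink is only half of such a fix: Python's own `zipfile` extraction writes a symlink entry out as a small regular file containing the target path, so whatever unpacks the zipapp at runtime would presumably also need to recreate these entries as real symlinks.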

🔬 Minimal Reproduction

# BUILD.bazel
load("@rules_python//python:py_binary.bzl", "py_binary")
load("@rules_python//python:py_zipapp_binary.bzl", "py_zipapp_binary")

py_binary(
    name = "demo",
    srcs = ["demo.py"],
    deps = ["@pypi//grpcio"],
)

py_zipapp_binary(name = "demo_zip", binary = ":demo")

$ bazel build //:demo_zip
$ unzip -l bazel-bin/demo_zip.pyz | grep cygrpc
   2419632  1980-01-01 00:00   runfiles/rules_python++pip+pypi_311_grpcio_.../site-packages/grpc/_cython/cygrpc.cpython-311-x86_64-linux-gnu.so
   2419632  1980-01-01 00:00   runfiles/_main/<pkg>/_demo.venv/lib/python3.11/site-packages/grpc/_cython/cygrpc.cpython-311-x86_64-linux-gnu.so
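The grep above checks a single library. To gauge the duplication across the whole archive, a small helper can group zip entries by CRC-32 and uncompressed size and total the bytes that storing each group once would save. This is an illustrative diagnostic, not part of rules_python; `find_duplicate_entries`, `wasted_bytes`, and the `min_size` parameter are hypothetical names, and a matching CRC and size is a strong hint rather than proof that two entries are byte-identical.

```python
import zipfile
from collections import defaultdict


def find_duplicate_entries(pyz_path, min_size=1):
    """Group entries that share a CRC-32 and uncompressed size.

    A match is a strong hint, not proof, that the bytes are identical.
    Entries smaller than min_size (e.g. empty __init__.py files) are skipped.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(pyz_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or info.file_size < min_size:
                continue
            groups[(info.CRC, info.file_size)].append(info.filename)
    # Keep only (crc, size) keys seen more than once.
    return {key: names for key, names in groups.items() if len(names) > 1}


def wasted_bytes(duplicates):
    """Uncompressed bytes saved if each duplicate group were stored once."""
    return sum(size * (len(names) - 1)
               for (_crc, size), names in duplicates.items())
```

Run against the reproduction's output, for example `wasted_bytes(find_duplicate_entries("bazel-bin/demo_zip.pyz", min_size=4096))`, the cygrpc pair shown above should appear as one group alongside any other duplicated payloads.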

🔥 Exception or Error

No exceptions or errors
