feat: Autocast #3878
base: main
Conversation
```python
enable_autocast: bool = _defaults.ENABLE_AUTOCAST,
low_precision_type: Optional[
    Union[torch.dtype, dtype]
] = _defaults.LOW_PRECISION_TYPE,
nodes_to_exclude: Collection[str] = _defaults.NODES_TO_EXCLUDE,
targets_to_exclude: Collection[Target] = _defaults.TARGETS_TO_EXCLUDE,
data_max: float = _defaults.DATA_MAX,
max_depth_of_reduction: Optional[int] = _defaults.MAX_DEPTH_OF_REDUCTION,
```
Before merging, these args should be added to other compile functions in this file.
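To illustrate the reviewer's point, here is a sketch of the autocast-related arguments from the diff above gathered into one kwargs dict, as they would be forwarded to the other compile entry points in this file. The values are placeholders, not the real `_defaults` values.

```python
# Illustrative only: the autocast arguments proposed in this PR,
# with placeholder values rather than the actual _defaults.
autocast_kwargs = dict(
    enable_autocast=True,
    low_precision_type=None,  # e.g. torch.float16 when enabled
    nodes_to_exclude=set(),
    targets_to_exclude=set(),
    data_max=512.0,
    max_depth_of_reduction=None,
)
print(sorted(autocast_kwargs))
```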
```python
]:
    # GEMM: A (M, K) @ B (K, N) = C (M, N)
    self.reduction_depth = input_0_dims[-1]
    # TODO: Add more reduction ops here
```
Should any more reduction targets be added?
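For context, the rule in the diff can be sketched in pure Python: the reduction depth of a GEMM is the shared dimension K, and a configurable cap decides whether a node is too reduction-heavy for low precision. The helper names below are illustrative, not the PR's actual API.

```python
# Sketch of the reduction-depth rule discussed above; helper names
# are hypothetical, not the PR's actual code.

def gemm_reduction_depth(input_0_dims):
    # GEMM: A (M, K) @ B (K, N) = C (M, N); the reduction depth is K,
    # i.e. the last dimension of the first input.
    return input_0_dims[-1]

def exceeds_max_depth(depth, max_depth_of_reduction):
    # None means "no limit": depth never disqualifies a node.
    if max_depth_of_reduction is None:
        return False
    return depth > max_depth_of_reduction

print(gemm_reduction_depth([32, 64]))  # K = 64
print(exceeds_max_depth(64, 32))       # True
```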
py/torch_tensorrt/dynamo/lowering/passes/rule_based_autocast.py
peri044 left a comment:
Can you also update the documentation at https://github.com/pytorch/TensorRT/blob/main/docsrc/user_guide/mixed_precision.rst
py/torch_tensorrt/dynamo/lowering/passes/rule_based_autocast.py
py/torch_tensorrt/dynamo/lowering/passes/rule_based_autocast.py
py/torch_tensorrt/dynamo/lowering/passes/rule_based_autocast.py
For Tests: L1 or L2 tests
```diff
-If we compile the above model using Torch-TensorRT, layer profiling logs indicate that all the layers are
-run in FP32. This is because TensorRT picks the kernels for layers which result in the best performance.
+If we compile the above model using Torch-TensorRT with the following settings, layer profiling logs indicate that all the layers are
+run in FP32. This is because TensorRT picks the kernels for layers which result in the best performance (i.e., weak typing in TensorRT).
```
We may want to reorient around strong typing first and then weak typing as an optimization. Right now this is a bit confusing
So, like in the tutorial:
- Demonstrate strong typing and explain that it's going to be the default behavior
- Show the weak typing behavior and talk about how the TRT graph changed (and maybe why)
- Show how you can recover the weak typing behavior using autocast for TRT 11 and beyond
Since TRT has deprecated weak typing, should we mention that weak typing is deprecated, so autocast needs to be used instead? Thus, we have only two modes:
- User-defined precision: use_explicit_typing=True + enable_autocast=False
- Autocast chooses precision: use_explicit_typing=True + enable_autocast=True
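The two-mode scheme above can be expressed as a small decision function. This is a hypothetical helper for illustration, not part of the PR; it just encodes the flag combinations from the comment.

```python
# Hypothetical helper mapping the two flags onto the two remaining
# modes (weak typing itself being deprecated in TensorRT).

def precision_mode(use_explicit_typing: bool, enable_autocast: bool) -> str:
    if not use_explicit_typing:
        raise ValueError("weak typing is deprecated; set use_explicit_typing=True")
    return "autocast chooses precision" if enable_autocast else "user-defined precision"

print(precision_mode(True, False))  # user-defined precision
print(precision_mode(True, True))   # autocast chooses precision
```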
```
Autocast
---------------

Weak typing behavior in TensorRT is deprecated. However it is a good way to maximize performance. Therefore, in Torch-TensorRT,
```
However mixed precision is a good way to maximize performance
```
reduced precision on the rest of the nodes. Torch-TensorRT Autocast also supports users to specify which nodes to exclude from Autocast,
considering some nodes might be more sensitive to affecting accuracy. In addition, Torch-TensorRT Autocast can cooperate with PyTorch
native Autocast, allowing users to use both PyTorch and Torch-TensorRT Autocast in the same model. Torch-TensorRT respects the precision
of the nodes within PyTorch Autocast.
```
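The interaction described in the quoted doc text can be sketched as a per-node precision decision. The function below is purely illustrative (hypothetical names, string dtypes instead of torch dtypes): a node keeps whatever dtype a PyTorch autocast region already assigned; otherwise the rule-based pass decides, honoring the user's exclusion list.

```python
# Hypothetical sketch of how the two autocasts could interact;
# not the PR's actual code.

def choose_dtype(node_name, pytorch_autocast_dtype, nodes_to_exclude, low_precision_type):
    if pytorch_autocast_dtype is not None:
        # Traced inside a torch.autocast region: respect PyTorch's choice.
        return pytorch_autocast_dtype
    if node_name in nodes_to_exclude:
        # User marked this node as accuracy-sensitive: keep full precision.
        return "fp32"
    return low_precision_type

print(choose_dtype("matmul_1", None, {"softmax_0"}, "fp16"))   # fp16
print(choose_dtype("softmax_0", None, {"softmax_0"}, "fp16"))  # fp32
print(choose_dtype("conv_2", "bf16", {"softmax_0"}, "fp16"))   # bf16
```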
Can you explain the difference between PyTorch and Torch-TensorRT autocast?
```diff
@@ -0,0 +1,70 @@
+import torch
```
Can you add comments to this doc? Here is an example of what im looking for https://docs.pytorch.org/TensorRT/tutorials/_rendered_examples/dynamo/converter_overloading.html
```python
    return out


if __name__ == "__main__":
```
I know it's not best practice, but let's just make them pure scripts so they render better.
```python
pre_lowering_pass_list = [
    remove_detach,
    remove_assert_nodes,
    rule_based_autocast,
```
Should this pass be conditionally added to the pre_lowering_pass_list?
There's a condition inside of rule_based_autocast.
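That early-exit guard can be sketched as follows: the pass stays in `pre_lowering_pass_list` unconditionally and bails out itself when autocast is disabled. Names and the settings shape are illustrative, not the PR's exact code.

```python
# Sketch of the early-exit guard described above (illustrative names;
# settings is modeled as a plain dict here).

def rule_based_autocast(gm, settings):
    if not settings.get("enable_autocast", False):
        return gm  # disabled: graph module passes through unchanged
    # ... otherwise rewrite eligible nodes to the low-precision dtype ...
    return gm

graph = object()
print(rule_based_autocast(graph, {"enable_autocast": False}) is graph)  # True
```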
narendasan left a comment:
Nice, it's looking good; some final polishing details, then I think it's good to go.
Description
Weak typing behavior in TensorRT is deprecated. However, it is a good way to maximize performance. Therefore, we want to create a similar PyTorch-native system to use with Torch-TensorRT that recovers some of this behavior.
Fixes #3869
Type of change
Checklist: