Skip to content

Conversation

@paragao
Copy link

@paragao paragao commented Oct 3, 2025

Issue #, if available: N/A

Description of changes:
Customers that want to share a single FSx for Lustre filesytem to different Sagemaker Hyperpod clusters need to update the FSx for Lustre ENIs with the new security group. FSx for Lustre scale the number of ENI depending on the size of storage it uses. For small filesystems (few TB), you may have just a few ENIs. For larger filesystems (hundreds of TB or PB), FSx for Lustre scale the number of ENIs (you can have dozens or hundreds).

This helper tool will automate adding the new security group to the existing file system ENIs.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


# First get one network interface then describe the network interface to get existing Security Groups attached
fsx_id_enis=$(aws fsx describe-file-systems "${awscli_args[@]}" --query 'FileSystems[0].NetworkInterfaceIds' --output text)
existing_sg=$(aws ec2 describe-network-interfaces "${awscli_args[@]}" --network-interface-ids $temp_eni_id --query 'NetworkInterfaces[0].Groups[*].GroupId' --output text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$temp_eni_id is undefined and is referenced here, it'll fail.

Need to define "temp_eni_id"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be fsx_id_enis (as we define in line 83 above).

;;
*)
[[ "$fsx_id" == "" ]] \
&& $fsx_id="$key" \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be fsx_id="$key"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to check for both. The syntax do:

  1. Test if fsx_id is empty
  2. If it is true, then it assigns key to fsx_id.

If the user does not provide a fsx_id as part of the arguments, then the test will be FALSE. The fsx_id won't get assigned a value which will cause the script to exit.

fi

echo -e "Amazon FSx for Lustre filesystem: ${GREEN}${fsx_id}${NC}"
echo -e "Existing security groups attached on the filesystem: ${GREEN}${$existing_sg}${NC}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect variable syntax: ${$existing_sg} -> ${existing_sg}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


echo -e "Amazon FSx for Lustre filesystem: ${GREEN}${fsx_id}${NC}"
echo -e "Existing security groups attached on the filesystem: ${GREEN}${$existing_sg}${NC}"
echo -e "Adding security group ID: ${GREEN}${$security_group}${NC}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, incorrect variable syntax: ${$security_group} -> ${security_group}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# Finally update the ENI to add the new security groups plus the existing security groups
for i in $fsx_id_enis; do
echo -e "Adding ${GREEN}${$security_group} to ENI $GREEN}${$i}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, ${$security_group} -> ${security_group} and ${$i} -> ${i}
Additionally, missing { before GREEN, should be ${GREEN}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Finally update the ENI to add the new security groups plus the existing security groups
for i in $fsx_id_enis; do
echo -e "Adding ${GREEN}${$security_group} to ENI $GREEN}${$i}"
$(aws ec2 modify-network-interface-attribute "${awscli_args[@]}" --network-interface-id $i --groups $existing_sg $security_group)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the command substitution here?
We just executing the command, right? We don't need to capture the output?

shouldn't be below instead?:
aws ec2 modify-network-interface-attribute "${awscli_args[@]}" --network-interface-id $i --groups $existing_sg $security_group

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either work, but you are right. I'm not capturing the output; I care about the exit signal only.

YELLOW='\033[1;33m'
NC='\033[0m' # No Color

escape_spaces() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function being used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed.

security groups to the FSx for Lustre ENIs"

# First get one network interface then describe the network interface to get existing Security Groups attached
fsx_id_enis=$(aws fsx describe-file-systems "${awscli_args[@]}" --query 'FileSystems[0].NetworkInterfaceIds' --output text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting ENIs for the first fsx found, and not really for defined $fsx_id. If there are multiple fsx, this will cause issues and might cause modifying wrong SG.

We should be using $fsx_id here to make sure we are getting ENIs for correct fsx.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, this is getting the ENIs for the $fsx_id defined on the arguments passed to the script. If the user does not pass an fsx_id on the arguments, the script doesn't run.

when the user pass the required arguments, then the parse_args function add the $fsx_id variable to the $awscli_args variable. Then on this CLI bash will expand ${awscli_args[@]} and use the required fsx_id as part of the command. It will then query the ENIs for the fsx_id passed as an argument to this function.

This is the right syntax.

Copy link
Author

@paragao paragao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed PR feedback and committed changes.

security groups to the FSx for Lustre ENIs"

# First get one network interface then describe the network interface to get existing Security Groups attached
fsx_id_enis=$(aws fsx describe-file-systems "${awscli_args[@]}" --query 'FileSystems[0].NetworkInterfaceIds' --output text)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, this is getting the ENIs for the $fsx_id defined on the arguments passed to the script. If the user does not pass an fsx_id on the arguments, the script doesn't run.

when the user pass the required arguments, then the parse_args function add the $fsx_id variable to the $awscli_args variable. Then on this CLI bash will expand ${awscli_args[@]} and use the required fsx_id as part of the command. It will then query the ENIs for the fsx_id passed as an argument to this function.

This is the right syntax.

YELLOW='\033[1;33m'
NC='\033[0m' # No Color

escape_spaces() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed.

;;
*)
[[ "$fsx_id" == "" ]] \
&& $fsx_id="$key" \
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to check for both. The syntax do:

  1. Test if fsx_id is empty
  2. If it is true, then it assigns key to fsx_id.

If the user does not provide a fsx_id as part of the arguments, then the test will be FALSE. The fsx_id won't get assigned a value which will cause the script to exit.

# Finally update the ENI to add the new security groups plus the existing security groups
for i in $fsx_id_enis; do
echo -e "Adding ${GREEN}${$security_group} to ENI $GREEN}${$i}"
$(aws ec2 modify-network-interface-attribute "${awscli_args[@]}" --network-interface-id $i --groups $existing_sg $security_group)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either work, but you are right. I'm not capturing the output; I care about the exit signal only.


# Finally update the ENI to add the new security groups plus the existing security groups
for i in $fsx_id_enis; do
echo -e "Adding ${GREEN}${$security_group} to ENI $GREEN}${$i}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


echo -e "Amazon FSx for Lustre filesystem: ${GREEN}${fsx_id}${NC}"
echo -e "Existing security groups attached on the filesystem: ${GREEN}${$existing_sg}${NC}"
echo -e "Adding security group ID: ${GREEN}${$security_group}${NC}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

fi

echo -e "Amazon FSx for Lustre filesystem: ${GREEN}${fsx_id}${NC}"
echo -e "Existing security groups attached on the filesystem: ${GREEN}${$existing_sg}${NC}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# First get one network interface then describe the network interface to get existing Security Groups attached
fsx_id_enis=$(aws fsx describe-file-systems "${awscli_args[@]}" --query 'FileSystems[0].NetworkInterfaceIds' --output text)
existing_sg=$(aws ec2 describe-network-interfaces "${awscli_args[@]}" --network-interface-ids $temp_eni_id --query 'NetworkInterfaces[0].Groups[*].GroupId' --output text)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be fsx_id_enis (as we define in line 83 above).

Copy link
Contributor

@nghtm nghtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving - this is a utility script, as long as it is tested and works, LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants