
Microsoft AppSource and Azure Marketplace

Learn how to grow your business by publishing your cloud solution on Microsoft AppSource and Azure Marketplace

Level 2 Contributor

Handling data disk in virtual machine offers

I'm preparing a virtual machine offer that needs enough IOPS to run smoothly. The only approaches I found were either to make the original VHD big enough, or to add a data disk big enough for our database to read from and write to. I find the data disk solution cleaner, but after going through the following doc, I'm wondering: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/add-disk

 

  • Is there no way to automount a data disk on creation?
  • If not, can I know in advance which device it will land on? Or which UUID?
  • After creating the VHDs (os and data) and providing them to the marketplace, will they keep the same device id? Or same UUID?
  • What is the right way to handle this?

The idea is that if we force the usage of a data disk, we need to make sure it is mounted on the proper path when the user launches the offer. Otherwise, our product would fail on boot.
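One defensive option for that boot-order concern is to poll for the device before starting the product. This is a hypothetical helper of my own, not anything from the Azure docs; the function name and retry count are placeholders:

```shell
# wait_for_device: block until a device path exists, or give up after
# a number of one-second retries. Returns 0 if the device appeared.
wait_for_device() {
  local dev="$1" tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    [ -e "$dev" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example: wait_for_device /dev/disk/azure/scsi1/lun0-part1 60 || exit 1
```

A service unit could gate the product's startup on this returning 0, so the product fails with a clear error instead of booting against a missing mount.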

1 ACCEPTED SOLUTION

Level 2 Contributor

Re: Handling data disk in virtual machine offers

I actually finished playing around and here are my observations:

 

TL;DR I decided to go with /dev/disk/azure/scsi1/lun0 and /dev/disk/azure/scsi1/lun0-part1

 

> The variability of the landing spot for the disks was something I worried over. I was thinking about what to do in that case; basically mapping out which data disk isn't in use would handle this last bit and would add a few more lines to the bash script.

 

I do agree this kind of scripting is doable, but I'd rather avoid any scripting if possible, in case it leads to unforeseen issues...

 

About UUIDs: I tried making a VHD from a data disk, duplicating it, and making two managed disks out of them, and ended up with devices with the same filesystem UUID, partition UUID, and partition label:

 

/dev/sdc1: UUID="5da212c2-6055-4316-8dd6-4a9bffa1335d" TYPE="ext4" PARTLABEL="FRONTLINE_DATA" PARTUUID="b61d94e3-1442-41ca-879b-529e38c84bb4"
/dev/sdd1: UUID="5da212c2-6055-4316-8dd6-4a9bffa1335d" TYPE="ext4" PARTLABEL="FRONTLINE_DATA" PARTUUID="b61d94e3-1442-41ca-879b-529e38c84bb4"

Which kind of bugs me...

 

Then I found there is actually an automatic symlink system inside /dev/disk/azure/scsi1 which links every data disk device to its LUN number: lun0, lun1, etc. Partitions are named lun0-part1, lun0-part2, etc.
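That naming scheme is regular enough to compute. A small helper of my own (the function name is hypothetical; the path layout is exactly the one observed above):

```shell
# Build the stable Azure device path for a data disk, given its LUN
# and, optionally, a partition number.
azure_lun_path() {
  local lun="$1" part="$2"
  local path="/dev/disk/azure/scsi1/lun${lun}"
  if [ -n "$part" ]; then
    path="${path}-part${part}"
  fi
  echo "$path"
}

azure_lun_path 0      # /dev/disk/azure/scsi1/lun0
azure_lun_path 0 1    # /dev/disk/azure/scsi1/lun0-part1
```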

 

Then I wrote the following install script, which only needs to be run once before creating the VHD:

 

# at this point, only lun0 exists, the next command will create lun0-part1
parted /dev/disk/azure/scsi1/lun0 --script mklabel gpt mkpart FRONTLINE_DATA ext4 0% 100%
# format the partition so the disk can be used
mkfs.ext4 /dev/disk/azure/scsi1/lun0-part1

# create a landing directory and mount
mkdir /data
mount /dev/disk/azure/scsi1/lun0-part1 /data

Added the following to /etc/fstab:

 

/dev/disk/azure/scsi1/lun0-part1 /data ext4 defaults,nofail 0 0

And that was pretty much it.

 

I'm not sure whether the azure/scsi1 links are an official feature, as I only tested them on Debian 10 with backports, but they do make the job really easy.


8 REPLIES
Visitor 1

Re: Handling data disk in virtual machine offers

Hey, 

 

So I'll preface this by saying that we are a Windows Server shop, so I can't comment on the Linux specifics of the data disk.

 

I will share with you how we deployed our solution to Azure Marketplace with a data disk.

 

1) I created my VM as normal

2) I then created a data disk which I mounted as E:\ in the VM so I could store my data on it. My C:\ is the OS disk and D:\ is temp storage

3) I then Captured my VM and created my Image

4) As a test, I created a new VM from my image; the data disk was applied with the same E:\ drive letter, and the volume label was also retained, which was nice.

5) I submitted my offer, and when adding the SKU I added the SAS URI for the OS disk and then the SAS URI for the data disk, both of which were in the vhds folder of my storage account.

6) To verify everything was sticky, I was able to create a new VM in Azure from my Marketplace offering and the data disk drive letter and volume name were retained.

 

I'm not sure that helps, hopefully it will.

 

Michael

 

Microsoft

Re: Handling data disk in virtual machine offers

I've asked around with my colleagues and we all agree that the solution with minimal complications is a single VHD which is large enough to house your application. That said, you may have extenuating circumstances which require a data disk. For that, I have some questions:

  1. At ship time, what goes on to the Data Disk?
  2. Is this simply a volume of some size that has been formatted and mounted but otherwise empty or will it have application data on it when you go into the Azure Marketplace?
  3. How much space is required for this Data Disk?

All that said, let's assume that you need to go down this path.

 

To automount the data disk on creation, you can put a script that automatically runs on boot which then updates fstab. If the script runs to completion, delete the script so it never runs again.
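The "run to completion, then delete the script" pattern above can be sketched as a small guard. This is a sketch of mine, not Microsoft-provided code; `run_once` and its arguments are hypothetical names:

```shell
# run_once: execute a setup command once; on success, delete the given
# script file so it never runs again on later boots. If the setup fails,
# the script is kept so it retries on the next boot.
run_once() {
  local script="$1"; shift
  "$@" || return 1          # the real work: partition, mkfs, update fstab...
  rm -f -- "$script"        # success: remove the boot script itself
}
```

On the VM, the boot script would call something like `run_once "$0" configure_data_disk`, so it removes itself only after the setup command succeeds.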

 

An added disk will land on a device that depends on the base OS of your image. For example, Red Hat Enterprise Linux 8 uses /dev/sda and /dev/sdb for the OS disk, leaving the data disk at /dev/sdc.

 

As for the UUID, I created two VMs using the same base VHD with datadisk.

First pass:

/dev/mapper/rootvg-datavol: UUID="02e9689c-fc7f-4a47-91e7-40fee8c3dffe" TYPE="ext4"

 

Second pass:

/dev/mapper/rootvg-datavol: UUID="0ab414ac-646f-42af-9fa4-26259c851b30" TYPE="ext4"

 

So, no, the UUID is not consistent across instances, though the UUIDs on the OS Disk do stay consistent; it's possible I made a mistake somewhere in my testing here.

 

So, what does the one-time startup script look like to do this magic? I came up with the following, which assumes the disk lands at /dev/sdc, a volume of 128 GB (formatted to 127 GB), and a desired path of /var/datavol. Please edit as needed for your situation. Also, this is "sample grade" code, which means that YMMV when asking for support. Please test to validate that it works in your scenarios. Finally, note that the sudo calls may be removable if the script is executed under the right identity. I tested the commands below over SSH.

# Set a label on the device
sudo parted /dev/sdc mklabel msdos

# Create the partition
sudo parted -a optimal -s /dev/sdc mkpart primary 1 100%

# Set the partition as an LVM volume
sudo parted /dev/sdc set 1 lvm on

# Create the physical volume
sudo pvcreate /dev/sdc1

# Extend rootvg onto this physical volume, then carve out the logical volume
sudo vgextend rootvg /dev/sdc1
sudo lvcreate -L 127GB -n datavol rootvg

# Set the file system
sudo mkfs -t ext4 /dev/rootvg/datavol

# Create the mount point
sudo mkdir /var/datavol

# Note: echoing directly to /etc/fstab did not work, so we do it locally.
sudo cp /etc/fstab .
sudo chmod 666 fstab

# Get the UUID of rootvg-datavol from the blkid output
datavol=$(sudo blkid | grep 'rootvg-datavol')
startuuid=$(echo "$datavol" | \grep -aob '"' | head -n1 | cut -d: -f1)
enduuid=$(echo "$datavol" | \grep -aob '"' | head -n2 | tail -n1 | cut -d: -f1)
let "startuuid = $startuuid + 1"
let "enduuid = $enduuid - $startuuid"
blockid=${datavol:$startuuid:$enduuid}

# Edit the local fstab copy so the volume mounts every boot.
echo -e "UUID=$blockid\t/var/datavol\text4\tdefaults,nofail\t1" >> fstab

# Copy the edits back
sudo chmod 644 fstab
sudo cp fstab /etc/fstab
sudo rm fstab

# Extra, if needed: make sure the LVM is mounted, it will auto-mount on reboot
sudo mount /var/datavol
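For what it's worth, the quote-offset arithmetic in the script can also be done with a single field split on the quote character. A sketch assuming blkid's usual KEY="value" output format; `blkid -s UUID -o value <device>` would print the bare UUID directly as well:

```shell
# Extract the value between the first pair of double quotes in a blkid line.
line='/dev/mapper/rootvg-datavol: UUID="02e9689c-fc7f-4a47-91e7-40fee8c3dffe" TYPE="ext4"'
blockid=$(echo "$line" | cut -d'"' -f2)
echo "$blockid"   # 02e9689c-fc7f-4a47-91e7-40fee8c3dffe
```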

 

Level 2 Contributor

Re: Handling data disk in virtual machine offers

  1. At ship time, what goes on to the Data Disk?
  2. Is this simply a volume of some size that has been formatted and mounted but otherwise empty or will it have application data on it when you go into the Azure Marketplace?
  3. How much space is required for this Data Disk?

I appreciate the time taken for this great, thorough and detailed response! Here are my answers:

 

1 & 2. It should be empty when booting the first time

3. The only requirement is that it be big enough to give us enough IOPS to boot the database properly, since the database behaves weirdly with low-throughput disks: roughly 128-256 GB

 

We don't actually need a data disk to meet this requirement, as we could increase the OS disk's default size and be good to go. However, using a data disk can ease the user experience when migrating to a newer version of our product: keep the data disk and you are good to go. With the OS disk approach, migrating from one version to the other would be more time-consuming. (Unless there is a method for this use case inside Azure!)

 

Hence the data disk. If the device path is always the same, indexed by the data disk number (I guess data disk 0 will always be /dev/sdc, 1 will be /dev/sdd, and so on), that works for me.

 

I also like the one-shot method provided to write into fstab on first boot only, as long as the UUID stays the same for the lifetime of the data disk once it is made from a VHD.

Microsoft

Re: Handling data disk in virtual machine offers

As an additional note: since the default caching policy for OS disks is read/write On, and this setting is recommended for OS performance, it is generally not good practice to put a database on the OS disk, as also mentioned in the SQL Server guidance for Azure VMs. With write caching on, a problem that causes the VM to suddenly fail might lead to DB consistency issues. From this perspective I would definitely recommend using a data disk. Just my 2 cents :-)

Level 2 Contributor

Re: Handling data disk in virtual machine offers

I ran some tests, and the data disk ended up at /dev/sda once... which does not reassure me at all. Still trying to find a way to predict which device the data disk will land on :/

Level 2 Contributor

Re: Handling data disk in virtual machine offers

Looks like /dev/disk/azure/scsi1/lun0 always points to the first data disk device. I will try scripting against this.

Microsoft

Re: Handling data disk in virtual machine offers

The variability of the landing spot for the disks was something I worried over. I was thinking about what to do in that case; basically mapping out which data disk isn't in use would handle this last bit and would add a few more lines to the bash script.
