Hmmm... I did get that the wrong way around: 1 vCPU VMs can perform worse than 2 vCPU VMs during UEFI PXE boot.
If you add the following line to the VM's configuration file:
vnet.recvClusterSize = "1"
does the 1 vCPU VM's performance improve?
It probably still won't approach the performance of a physical machine, though. The UEFI architecture does not permit us to use interrupts for NIC events, so we must poll, and polling is costly when every poll steals CPU time from the host OS in a shared environment. We've made a few compromises (i.e. limiting the polling rate) to balance TFTP throughput against host CPU usage while the firmware is running. Those compromises make it somewhat unlikely that we will be able to match the performance of a native firmware implementation, which realistically can use as much CPU as it wants... no one will care.
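As a rough illustration of why the polling rate matters so much for TFTP in particular: basic TFTP is lock-step, so the next block is not sent until the previous one has been acknowledged, and any delay in noticing a received packet is paid once per block. The sketch below is a back-of-the-envelope model only, not a description of the actual firmware. The function name, the 1468-byte block size, the 0.2 ms round-trip time and the polling intervals are all assumed values chosen for illustration, and the model assumes a packet waits half a polling interval on average before it is seen.

```python
def tftp_throughput_bytes_per_s(block_size, rtt_s, poll_interval_s):
    """Rough upper bound on lock-step TFTP throughput with a polled receiver.

    Hypothetical model: one block per round trip, plus the average time a
    received packet sits in the NIC before the next poll notices it.
    """
    # Assumption: packets arrive at a random point within the polling
    # interval, so the average extra wait is half the interval.
    avg_poll_delay_s = poll_interval_s / 2.0
    return block_size / (rtt_s + avg_poll_delay_s)


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    block_size = 1468   # bytes per TFTP data block (assumed)
    rtt_s = 0.0002      # network round-trip time in seconds (assumed)
    for poll_ms in (0.0, 0.1, 1.0, 10.0):
        rate = tftp_throughput_bytes_per_s(block_size, rtt_s, poll_ms / 1000.0)
        print(f"poll every {poll_ms:5.1f} ms -> ~{rate / 1e6:.2f} MB/s")
```

Under these assumptions, the polling delay dominates the per-block time once the polling interval grows beyond the network round trip, which is why a throttled polling rate trades throughput directly for lower host CPU usage.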
There's probably more that we can do here to improve our virtual UEFI firmware's TFTP performance if we have the time and opportunity to dig deeper...
Cheers,
--
Darius