ESXi 7.0 and Mellanox ConnectX 2 – support fix patch

I upgraded vCenter to version 7 successfully but failed when it came to updating my hosts from 6.7 to 7.

I got a warning stating that some PCI devices were incompatible, but tried anyway. That failed: my Mellanox ConnectX 2 wasn't showing up as an available physical NIC.

First, I needed the VID/DID device code for the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s], which is 15b3:6750.

Deprecated devices supported by VMKlinux drivers

Devices that were only supported in 6.7 or earlier by a VMKlinux inbox driver. These devices are no longer supported because all support for VMKlinux drivers and their devices have been completely removed in 7.0.

How to fix it? I adapted a small script, ESXi7-enable-nmlx4_co.v00.sh, to do it. Notes:

  • Edit the path to your datastore; the example uses /vmfs/volumes/ISO
  • nmlx4_co.v00.orig is a backup of the original nmlx4_co.v00
  • The new VIB is unsigned, so this ALERT message will appear in the log during reboot:
    ALERT: Failed to verify signatures of the following vib
  • An ESXi reboot is needed to load the new driver

    The script itself:

    # Keep a backup of the original driver tardisk, then work on a copy
    cp /bootbank/nmlx4_co.v00 /vmfs/volumes/ISO/nmlx4_co.v00.orig
    cp /bootbank/nmlx4_co.v00 /vmfs/volumes/ISO/n.tar
    cd /vmfs/volumes/ISO/
    # Convert the vmtar archive to a plain tar and unpack it into tmp-network
    vmtar -x n.tar -o output.tar
    rm -f n.tar
    mkdir tmp-network
    mv output.tar tmp-network/output.tar
    cd tmp-network
    tar xf output.tar
    rm output.tar
    # Map the ConnectX-2 PCI ID (15b3:6750) to the nmlx4_core driver
    echo '' >> /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
    echo 'regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core' >> /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
    cat /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
    # Add a human-readable device name for the same ID
    echo '        6750  Mellanox ConnectX-2 Dual Port 10GbE ' >> /vmfs/volumes/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids
    cat /vmfs/volumes/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids
    # Repack: plain tar -> vmtar -> gzip, under the original file name
    tar -cf /vmfs/volumes/ISO/FILE.tar *
    cd /vmfs/volumes/ISO/
    vmtar -c FILE.tar -o output.vtar
    gzip output.vtar
    mv output.vtar.gz nmlx4_co.v00
    rm FILE.tar
    # Install the patched tardisk; it takes effect after the next reboot
    cp /vmfs/volumes/ISO/nmlx4_co.v00 /bootbank/nmlx4_co.v00
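
One thing to watch: the `echo ... >>` lines append unconditionally, so running the script a second time against an already patched tree would add the same entry twice. A minimal sketch of an idempotent append, assuming a hypothetical helper named `append_once` (it is not part of the original script):

```shell
#!/bin/sh
# append_once FILE LINE
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain an identical line.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats LINE as a fixed string (the
    # dots in a map entry would otherwise act as regex wildcards), -q is quiet.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example use (paths as in the script above):
# append_once /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map "$entry"
```

With this in place, the two `echo` appends could be swapped for `append_once` calls and the script could be re-run safely.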

    The script adds the hardware ID to the file nmlx4_core.map:

    *********************************************************************
    /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
    *********************************************************************
    regtype=native,bus=pci,id=15b301f6..............,driver=nmlx4_core
    regtype=native,bus=pci,id=15b301f8..............,driver=nmlx4_core
    regtype=native,bus=pci,id=15b31003..............,driver=nmlx4_core
    regtype=native,bus=pci,id=15b31004..............,driver=nmlx4_core
    regtype=native,bus=pci,id=15b31007..............,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b30003......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b30006......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b30007......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b30008......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b3000c......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b3100715b3000d......,driver=nmlx4_core
    regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core
    -------------------------> The last line is the fix
    ________________________________________
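
Looking at the entries above, each `id=` value appears to be 22 characters wide: the 4-digit vendor ID, the 4-digit device ID, optionally a subvendor/subdevice pair, and dots filling the rest. A small sketch that builds such a line, assuming the dots act as wildcards and that 22 is a fixed width (both inferred from this listing, not taken from VMware documentation); `make_map_line` is a hypothetical helper name:

```shell
#!/bin/sh
# make_map_line IDPREFIX_VENDOR IDPREFIX_REST DRIVER
# Hypothetical helper: print a map entry, padding the id with dots to
# 22 characters like the existing entries in nmlx4_core.map.
make_map_line() {
    id="$1$2"
    # Pad with '.' until the id field is 22 characters wide
    while [ "${#id}" -lt 22 ]; do
        id="${id}."
    done
    printf 'regtype=native,bus=pci,id=%s,driver=%s\n' "$id" "$3"
}

# Vendor 15b3, device 6750 (the ConnectX-2 ID used in this post)
make_map_line 15b3 6750 nmlx4_core
```

The call at the end should print an entry of the same shape as the last line of the listing.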

    It also adds the hardware ID to the file nmlx4_core.ids:

    **************************************************************************************
    /vmfs/volumes/FreeNAS/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids 
    **************************************************************************************
    #
    # This file is mechanically generated.  Any changes you make
    # manually will be lost at the next build.
    #
    # Please edit _devices.py file for permanent changes.
    #
    # Vendors, devices and subsystems.
    #
    # Syntax (initial indentation must be done with TAB characters):
    #
    # vendor  vendor_name
    #       device  device_name                            <-- single TAB
    #               subvendor subdevice  subsystem_name    <-- two TABs
    
    15b3  Mellanox Technologies
            01f6  MT27500 [ConnectX-3 Flash Recovery]
            01f8  MT27520 [ConnectX-3 Pro Flash Recovery]
            1003  MT27500 Family [ConnectX-3]
            1004  MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
            1007  MT27520 Family [ConnectX-3 Pro]
                    15b3 0003  ConnectX-3 Pro VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX354A-FCC)
                    15b3 0006  ConnectX-3 Pro EN network interface card 40/56GbE dual-port QSFP(MCX314A-BCCT )
                    15b3 0007  ConnectX-3 Pro EN NIC; 40GigE; dual-port QSFP (MCX314A-BCC)
                    15b3 0008  ConnectX-3 Pro VPI adapter card; single-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX353A-FCC)
                    15b3 000c  ConnectX-3 Pro EN NIC; 10GigE; dual-port SFP+ (MCX312B-XCC)
                    15b3 000d  ConnectX-3 Pro EN network interface card; 10GigE; single-port SFP+ (MCX311A-XCC)
            6750  Mellanox ConnectX-2 Dual Port 10GbE
    --------> The last line is the fix
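
Note that the header of the .ids file says the initial indentation must be done with TAB characters, one TAB before a device line. The listing above shows spaces, which may simply be how the page rendered it, and whether ESXi actually cares about spaces versus TABs here is not verified. To follow the stated syntax strictly, a sketch like this emits the device line with a real TAB (`make_ids_line` is a hypothetical helper name):

```shell
#!/bin/sh
# make_ids_line DEVICE NAME
# Hypothetical helper: print a device entry for a pciids file, indented
# with a single TAB as the file's own syntax comment asks for.
make_ids_line() {
    printf '\t%s  %s\n' "$1" "$2"
}

make_ids_line 6750 'Mellanox ConnectX-2 Dual Port 10GbE'
```

Its output could be appended to nmlx4_core.ids in place of the space-indented `echo` line.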

    After the reboot, ESXi recognized the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s].

    The only complaint in the log is the ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core].

    2020-XX-XXTXX:XX:44.473Z cpu0:2097509)ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core]. All tardisks validated
    2020-XX-XXTXX:XX:47.909Z cpu1:2097754)Loading module nmlx4_core ...
    2020-XX-XXTXX:XX:47.912Z cpu1:2097754)Elf: 2052: module nmlx4_core has license BSD
    2020-XX-XXTXX:XX:47.921Z cpu1:2097754) nmlx4_core: init_module called
    2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Device: 194: Registered driver 'nmlx4_core' from 42
    2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Mod: 4845: Initialization of nmlx4_core succeeded with module ID 42.
    2020-XX-XXTXX:XX:47.921Z cpu1:2097754)nmlx4_core loaded successfully.
    2020-XX-XXTXX:XX:47.951Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_core_Attach - (nmlx4_core_main.c:2476) running
    2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
    2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
    2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
    2020-XX-XXTXX:XX:49.724Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:382) Requested RoCE mode RoCEv1
    2020-XX-XXTXX:XX:49.724Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:422) Requested RoCE mode is supported - choosing RoCEv1
    2020-XX-XXTXX:XX:49.934Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_CmdInitHca - (nmlx4_core_fw.c:1408) Initializing device with B0 steering support
    2020-XX-XXTXX:XX:50.561Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1744) Granted 38 MSIX vectors
    2020-XX-XXTXX:XX:50.561Z cpu1:2097754) nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1766) Using MSIX
    2020-XX-XXTXX:XX:50.781Z cpu1:2097754)Device: 330: Found driver nmlx4_core for device 0xxxxxxxxxxxxxxxxxxxxxxx
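
To check the result after a reboot without reading the whole log, a sketch like the following looks for the two messages shown above. It assumes they end up in /var/log/vmkernel.log (adjust the path if your host logs elsewhere); `check_nmlx4_log` is a hypothetical helper name:

```shell
#!/bin/sh
# check_nmlx4_log LOGFILE
# Hypothetical helper: report whether nmlx4_core loaded and whether the
# signature ALERT for the patched VIB is present in the given log file.
check_nmlx4_log() {
    log=$1
    if grep -q 'nmlx4_core loaded successfully' "$log"; then
        echo "driver: loaded"
    else
        echo "driver: NOT loaded"
    fi
    if grep -q 'ALERT: Failed to verify signatures' "$log"; then
        echo "signature alert: present (expected for the patched VIB)"
    else
        echo "signature alert: absent"
    fi
}

# Example (the path is an assumption):
# check_nmlx4_log /var/log/vmkernel.log
```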

    A 10 Gbps tuning test between two ESXi 7.0 hosts with two MT26448 cards looks great:

    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-120.00 sec   131 GBytes  9380 Mbits/sec    0             sender
    [  4]   0.00-120.00 sec   131 GBytes  9380 Mbits/sec                  receiver
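
As a sanity check on these numbers: they are consistent if the transfer is counted in binary GBytes (1024^3 bytes) and the bandwidth in decimal Mbits/sec. Then 131 GBytes over 120 seconds works out to roughly 9377 Mbits/sec, which matches the reported 9380 once the rounding of the displayed values is taken into account. A small sketch of the arithmetic (`mbits_per_sec` is a hypothetical helper name):

```shell
#!/bin/sh
# mbits_per_sec GBYTES SECONDS
# Hypothetical helper: convert a transfer size in binary GBytes (1024^3
# bytes) over a duration in seconds to decimal Mbits/sec (10^6 bits).
mbits_per_sec() {
    awk -v gb="$1" -v s="$2" \
        'BEGIN { printf "%.0f\n", gb * 1024 * 1024 * 1024 * 8 / s / 1000000 }'
}

mbits_per_sec 131 120
```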

    Only RoCEv1 is supported, because:

    • RoCEv2 support needs a newer card than this one: Mellanox ConnectX-3 Pro
    • The RoCEv2 options are visible in the nmlx4_core driver, but when I enabled enable_rocev2 it did NOT work
    [root@esxi~] esxcli system module parameters list -m nmlx4_core
    Name                    Type  Value  Description
    ----------------------  ----  -----  -----------
    enable_64b_cqe_eqe      int          Enable 64 byte CQEs/EQEs when the the FW supports this
    enable_dmfs             int          Enable Device Managed Flow Steering
    enable_qos              int          Enable Quality of Service support in the HCA
    enable_rocev2           int          Enable RoCEv2 mode for all devices
    enable_vxlan_offloads   int          Enable VXLAN offloads when supported by NIC
    log_mtts_per_seg        int          Log2 number of MTT entries per segment
    log_num_mgm_entry_size  int          Log2 MGM entry size, that defines the number of QPs per MCG, for example: value 10 results in 248 QP per MGM entry
    msi_x                   int          Enable MSI-X
    mst_recovery            int          Enable recovery mode(only NMST module is loaded)
    rocev2_udp_port         int          Destination port for RoCEv2

    Use this only in your home lab.