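Technical Forums

  • FPGA

    Official Xilinx University Program FPGA development boards (DIGILENT, the sole original manufacturer worldwide) & Pmod building-block sensors

    Posts

    459
    Zybo Z7-20 Linux + PL embedded development: an introductory experiment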
    Zybo Z7-20 Linux + PL experiment
    Nucleuslyk@gmail.com

    Environment:
        CentOS 7.8.2003: Vivado 2019.2 + Vitis 2019.2
        64-bit Windows 10: PuTTY 0.72
    Working path: ~/zybo_z7_linux

    1. Prepare working path
    Prepare two folders for this experiment: “ref” for all downloaded resources and “work” for all runtime project files.
        cd ~/zybo_z7_linux
        mkdir -p ref work

    2. Install Digilent board files into Vivado
    Go to ~/zybo_z7_linux/ref and download the board files:
        cd ~/zybo_z7_linux/ref
        git clone https://github.com/Digilent/vivado-boards.git
    We only need the board files for the Zybo Z7-20, so copy them to your Vivado installation directory (in my case, /tools/Xilinx/Vivado/2019.2). You may need root permission.
        cp -rf ~/zybo_z7_linux/ref/vivado-boards/new/board_files/zybo-z7-20 /tools/Xilinx/Vivado/2019.2/data/boards/board_files
    We no longer need “vivado-boards”, so we can delete it:
        rm -rf vivado-boards

    3. Set environment variables
    Go to ~/zybo_z7_linux/work and create a new file “setup.csh” with the contents below. XILINX_HOME must point to the same installation root as in Step 2 (in my case, /tools/Xilinx).
        setenv ARCH arm
        setenv CROSS_COMPILE arm-linux-gnueabihf-
        setenv PATH ${PATH}:${PWD}/u-boot-xlnx-xilinx-v2019.2/tools
        setenv PATH ${PATH}:${PWD}/u-boot-xlnx-xilinx-v2019.2/scripts/dtc
        source ${XILINX_HOME}/Vivado/2019.2/settings64.csh
    Add executable permission to setup.csh and source it. Source it from the work directory, because the PATH entries use ${PWD}:
        chmod +x ./setup.csh
        source ./setup.csh

    4. Prepare Vivado project
    Go to ~/zybo_z7_linux/work and prepare working directories for Vivado and Vitis:
        mkdir -p vivado_proj vitis_proj
    Go to vivado_proj and launch Vivado:
        cd vivado_proj
        vivado &
    Select Create Project -> Next, set Project name to “zybo_z7_20_linux”, ensure Project location is “~/zybo_z7_linux/work/vivado_proj”, and deselect “Create project subdirectory”. Click Next -> Next -> Next -> Next until you reach Default Part, switch to the Boards option, select “Zybo Z7-20”, then Next -> Finish.
    Select “Create Block Design”, set “Design name” to “cpu”, then click “OK”. Click the “+” button to add IP, search for “ZYNQ” and add it. Then click “Run Block Automation”, select “processing_system7_0” and click “OK”. Connect “FCLK_CLK0” to “M_AXI_GP0_ACLK”.
    Next we add a custom AXI4-Lite IP to this project, which we expect to control from the embedded Linux system. Select Tools -> Create and Package New IP -> Next, then select “Create a new AXI4 peripheral”. Change the name as you want. In my case, for example:
        Name: myled
        Version: 1.0
        Display name: myled_v1.0
        Description: My new AXI IP
        IP location: ~/zybo_z7_linux/work/vivado_proj/myled_ip_repo
    Make sure the “IP location” is under vivado_proj, just for easy management. On the “Add Interfaces” page nothing needs to change; leave the defaults. Click Next, select “Edit IP”, then Finish.
    Modify the design source file “myled_v1_0_S00_AXI.v” as the Embedded Linux Tutorial does (Reference [1], Step 9):
        a. Add user ports
        b. Add user logic
    Modify the design source file “myled_v1_0.v” as the Embedded Linux Tutorial does (Reference [1], Step 10):
        a. Add user ports
        b. Connect ports in the instance
    Switch to the “Package IP - myled” page. Click “File Groups -> Merge changes…”. Click “Customization Parameters -> Merge…”. Click “Review and Package -> Re-Package IP”. Click “Yes” in the prompt to close this temporary IP packaging project.
    Back on the “Diagram” page, click the “+” button to add IP, search for “myled_v1.0” and add it (the instance name is “myled_0”). Then click “Run Connection Automation”. Afterwards you can switch to the “Address Editor” page to see which “Offset Address” is assigned to the AXI slave IP “myled_0”. In my case it is 0x43C0_0000; please remember it.
    Right-click any blank position in the “Diagram” window and click “Regenerate Layout”. Right-click port “led[3:0]” of the myled_0 IP, select “Create Port” and click “OK”. Your simple SoC is now complete.
    Close the “BLOCK DESIGN - cpu” page and go back to “PROJECT MANAGER”. Right-click “cpu.bd” in the “Sources” column and select “Create HDL Wrapper -> Let Vivado manage wrapper and auto-update”. Ignore the several critical warnings about DDR.
    Now we need to add constraints for the “myled” IP, which resides in the PL (Programmable Logic) part. Select “Add Sources -> Add or create constraints -> Next -> Create File”. Enter any “File name” you want, for example “cpu_wrapper”, then click “Finish”. Edit “cpu_wrapper.xdc” in the “Sources” column like this (it corresponds to the 4 LEDs on the board):
        # GPIO
        set_property -dict { PACKAGE_PIN M14 IOSTANDARD LVCMOS33 } [get_ports { led[0] }]; # led[0]
        set_property -dict { PACKAGE_PIN M15 IOSTANDARD LVCMOS33 } [get_ports { led[1] }]; # led[1]
        set_property -dict { PACKAGE_PIN G14 IOSTANDARD LVCMOS33 } [get_ports { led[2] }]; # led[2]
        set_property -dict { PACKAGE_PIN D18 IOSTANDARD LVCMOS33 } [get_ports { led[3] }]; # led[3]
    Click “Generate Bitstream” and wait for a while. Select “File -> Export -> Export Hardware”, tick “Include bitstream”, then click “OK”. You can now find a “cpu_wrapper.xsa” file, prepared for Vitis, under the “vivado_proj” directory.

    5. Prepare Vitis FSBL project
    Go to vitis_proj and launch Vitis:
        cd ~/zybo_z7_linux/work/vitis_proj
        vitis &
    In the prompt window set “Workspace” to ~/zybo_z7_linux/work/vitis_proj, and do not select “Use this as default…”.
    Select File -> New -> Application Project to create a new project. Set the project name to “fsbl”, for example, then click Next. On the “Platform” page select “Create a new platform from hardware (XSA)”, click the “+” button, find the “cpu_wrapper.xsa” file from step 4 (it should be under ~/zybo_z7_linux/work/vivado_proj) and click “Next”. On the “Domain” page keep everything as default (ps7_cortexa9_0 / standalone / C) and click “Next”. On the templates page select “Zynq FSBL”, then click “Finish”.
    After “Finish” is clicked the projects are created automatically (a “cpu_wrapper” platform project and a “fsbl_system” application project). Press “Ctrl+B” to build them all. You can now find “fsbl.elf” under the vitis_proj directory (at fsbl/Debug/fsbl.elf, the path used in step 10). Do not close Vitis; go on to step 6.

    6. Create device tree
    Create a folder named “dts” under ~/zybo_z7_linux/work, then go back to the Vitis GUI. Select “Xilinx -> Generate Device Tree”. Set “Hardware Specification File” to the “cpu_wrapper.xsa” mentioned in Steps 4 & 5. Set “Output Directory” to the newly created ~/zybo_z7_linux/work/dts, then click “Generate”.
    You can now find several files under ~/zybo_z7_linux/work/dts. “pl.dtsi” is especially worth opening in vim and reading. The value “0x43c00000” of its “reg” property is equal to the address noted in Step 4; this is the bridge from hardware to software. Remember the value of “compatible”; it will be used in Step 11.
    Open “system-top.dts” and change the first 3 “#include” directives to “/include/”. Then run dtc under ~/zybo_z7_linux/work/dts to generate the file “devicetree.dtb” (do not change it to another file name). You can now find “devicetree.dtb” under ~/zybo_z7_linux/work/dts.
    Create a new folder named “sd_image” under ~/zybo_z7_linux/work. All files in this folder will later be copied to the micro-SD card. Now copy “devicetree.dtb” into “sd_image”:
        cp ~/zybo_z7_linux/work/dts/devicetree.dtb ~/zybo_z7_linux/work/sd_image

    7. Build Xilinx u-boot
    Go to ~/zybo_z7_linux/ref and download the Xilinx u-boot repository:
        cd ~/zybo_z7_linux/ref
        wget -O u-boot-xlnx-xilinx-v2019.2.zip https://github.com/Xilinx/u-boot-xlnx/archive/xilinx-v2019.2.zip
    Go to ~/zybo_z7_linux/work and unzip it there:
        cd ~/zybo_z7_linux/work
        unzip ../ref/u-boot-xlnx-xilinx-v2019.2.zip
    Go to the u-boot-xlnx-xilinx-v2019.2 directory and add two new lines to configs/zynq_zybo_z7_defconfig:
        CONFIG_OF_EMBED=y
        CONFIG_CMD_NET=n
    CONFIG_OF_EMBED=y embeds the device tree for the board into the binary. CONFIG_CMD_NET=n prevents the BOOTP attempts that otherwise run a few times before bootm.
    Build it:
        make zynq_zybo_z7_defconfig
        make
    You can now find a “u-boot” file under the u-boot-xlnx-xilinx-v2019.2 directory.

    8. Build Xilinx Linux kernel
    Go to ~/zybo_z7_linux/ref and download the Xilinx Linux kernel repository:
        cd ~/zybo_z7_linux/ref
        wget -O linux-xlnx-xlnx_rebase_v4.19_2019.2.zip https://github.com/Xilinx/linux-xlnx/archive/xlnx_rebase_v4.19_2019.2.zip
    Go to ~/zybo_z7_linux/work and unzip it there:
        cd ~/zybo_z7_linux/work
        unzip ../ref/linux-xlnx-xlnx_rebase_v4.19_2019.2.zip
    Go to the linux-xlnx-xlnx_rebase_v4.19_2019.2 directory and build it:
        make xilinx_zynq_defconfig
        make
    You can now find a “zImage” file under arch/arm/boot in the linux-xlnx-xlnx_rebase_v4.19_2019.2 directory. zImage is a compressed kernel image; U-Boot's bootm expects it wrapped in a U-Boot image header, so build a uImage as well:
        make UIMAGE_LOADADDR=0x8000 uImage
    You can now find a “uImage” file in the same arch/arm/boot directory. Copy “uImage” into “sd_image”:
        cp ~/zybo_z7_linux/work/linux-xlnx-xlnx_rebase_v4.19_2019.2/arch/arm/boot/uImage ~/zybo_z7_linux/work/sd_image

    9. Make RAM disk
    Download arm_ramdisk.image.gz from the link below:
        https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842473/Build+and+Modify+a+Rootfs
    Move arm_ramdisk.image.gz to ~/zybo_z7_linux/ref, then go to ~/zybo_z7_linux/work/sd_image and create uramdisk.image.gz:
        cd ~/zybo_z7_linux/work/sd_image
        mkimage -A arm -T ramdisk -C gzip -d ../../ref/arm_ramdisk.image.gz uramdisk.image.gz
    You can now find a “uramdisk.image.gz” file under the sd_image folder.

    10. Create boot image
    Go to ~/zybo_z7_linux/work, create a new folder named “boot_image” and go into it. Then copy all the files needed into it (there is no u-boot.elf to copy; copy u-boot and rename it to u-boot.elf).
        mkdir -p ~/zybo_z7_linux/work/boot_image
        cd ~/zybo_z7_linux/work/boot_image
        cp ~/zybo_z7_linux/work/vivado_proj/cpu_wrapper.bit ./
        cp ~/zybo_z7_linux/work/vitis_proj/fsbl/Debug/fsbl.elf ./
        cp ~/zybo_z7_linux/work/u-boot-xlnx-xilinx-v2019.2/u-boot ./u-boot.elf
    Create a new file named “boot.bif” with the contents below:
        image : {
            [bootloader]fsbl.elf
            cpu_wrapper.bit
            u-boot.elf
        }
    Use this command to generate “boot.bin” (do not change it to another file name):
        bootgen -image boot.bif -o i boot.bin
    Copy “boot.bin” to the “sd_image” folder:
        cp ~/zybo_z7_linux/work/boot_image/boot.bin ~/zybo_z7_linux/work/sd_image

    11. Create kernel driver
    Go to ~/zybo_z7_linux/work and create a new folder named “drivers”. Go into it and create a file named “myled_0.c” (the name must match the instance name from step 4). Its contents can be found at the link below:
        https://cdn.instructables.com/ORIG/FX8/HRRR/HX1W69D4/FX8HRRRHX1W69D4.c
    This source file needs a little modification:
        a. Add three header files at the beginning:
            #include <linux/uaccess.h>
            #include <linux/slab.h>
            #include <linux/mod_devicetable.h>
        b. Change the value of the macro “DRIVER_NAME” to “myled_0”.
        c. Change the “compatible” member of the structure myled_of_match to “xlnx,myled-1.0” (the same value you see in ~/zybo_z7_linux/work/dts/pl.dtsi, mentioned in step 6).
    Create a simple Makefile with the contents below (each recipe line must begin with a tab character):
        obj-m := myled_0.o

        all:
                make -C ../linux-xlnx-xlnx_rebase_v4.19_2019.2/ M=$(PWD) modules

        clean:
                make -C ../linux-xlnx-xlnx_rebase_v4.19_2019.2/ M=$(PWD) clean
    And make it:
        make
    You can now find a file named “myled_0.ko” under the drivers directory. Copy “myled_0.ko” to the “sd_image” folder:
        cp ~/zybo_z7_linux/work/drivers/myled_0.ko ~/zybo_z7_linux/work/sd_image

    12. Create user application for driver
    Go to ~/zybo_z7_linux/work and create a new folder named “user_app”. Go into it and create a file named “led_blink.c” with the contents below:
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main()
        {
            FILE *fp;

            while (1) {
                /* Turn all four LEDs on. */
                fp = fopen("/proc/myled_0", "w");
                if (fp == NULL) {
                    printf("Cannot open /proc/myled_0 for write\n");
                    return -1;
                }
                fputs("0x0F\n", fp);
                fclose(fp);
                sleep(1);

                /* Turn all four LEDs off. */
                fp = fopen("/proc/myled_0", "w");
                if (fp == NULL) {
                    printf("Cannot open /proc/myled_0 for write\n");
                    return -1;
                }
                fputs("0x00\n", fp);
                fclose(fp);
                sleep(1);
            }
            return 0;
        }
    Create a simple Makefile with the contents below (each recipe line must begin with a tab character):
        CC = arm-linux-gnueabihf-gcc
        CFLAGS = -g

        all: led_blink

        led_blink: led_blink.o
                $(CC) $(CFLAGS) $^ -o $@

        clean:
                rm -rf *.o
                rm -rf led_blink

        .PHONY: clean
    And make it:
        make
    You can now find a file named “led_blink” under the user_app directory. Copy “led_blink” to the “sd_image” folder:
        cp ~/zybo_z7_linux/work/user_app/led_blink ~/zybo_z7_linux/work/sd_image
    All files for the SD card are now ready: sd_image should contain boot.bin, devicetree.dtb, uImage, uramdisk.image.gz, myled_0.ko and led_blink.

    13. Boot the board
    Prepare a micro-SD card and format it with a FAT32 file system (in my case, an 8 GB card). Copy all files in the “sd_image” directory from step 12 into the first partition of the micro-SD card (if more than one partition exists).
    Plug a micro-USB cable from the “PROG UART” port on the board into the computer. Plug the micro-SD card into the “SD MICRO” slot on the back of the board. Set jumper “JP5” to “SD” mode.
    Now switch on the power of the board. The red “PGOOD” LED should light immediately, and the green “DONE” LED should light after about one second. Then the two yellow LEDs near the “PROG UART” port start flashing, which indicates that the system is booting correctly.

    14. Log in to the board
    Use any UART client to log in to the board, such as PuTTY (for Windows) or picocom (for Linux). In PuTTY, set “Connection type” to “Serial”, type in the correct “Serial line” (in my case, COM4), change “Speed” to 115200, then click “Open”. The serial console window should now open.
    The following steps test the kernel driver and the user application.
        a. Mount the first partition of the micro-SD card (ignore the warning):
            mount /dev/mmcblk0p1 /mnt
           Use lsmod to list the installed kernel modules.
        b. Go to /mnt and install the Linux kernel driver “myled_0.ko” (ignore the “out-of-tree” warning; it should be fixed but does not affect the next steps):
            cd /mnt
            insmod myled_0.ko
        c. Run the executable “led_blink”. You may see the LEDs on the board blink, but in my case it gives a warning instead. It might be something wrong with the dynamically linked libraries; I'm still working on it.
        d. The kernel module itself is correctly installed, as you can find a file named “myled_0” under the “/proc” directory. Now verify that the custom AXI4-Lite IP hardware is working properly:
            echo 15 > /proc/myled_0
           All 4 LEDs on the board are now on, as “15” is the binary number 4’b1111.
        e. Uninstall the kernel module “myled_0” (the 4 LEDs turn off, of course, once the kernel module is removed):
            mkdir -p /lib/modules/`uname -r`
            rmmod myled_0

    Reference
    [1] https://www.instructables.com/id/Embedded-Linux-Tutorial-Zybo/
    [2] https://qiita.com/yhmtmt/items/cba5330ad7ded151882d
    [3] http://www.ece.tamu.edu/~sunilkhatri/courses/ee449/labs/lab3.pdf
    [4] http://www.ece.tamu.edu/~sunilkhatri/courses/ee449/labs/lab4.pdf
    [5] http://www.ece.tamu.edu/~sunilkhatri/courses/ee449/labs/lab5.pdf
    [6] http://www.ece.tamu.edu/~sunilkhatri/courses/ee449/labs/lab6.pdf
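Two of the manual steps in this post lend themselves to small shell helpers: rewriting the first three #include directives of system-top.dts (step 6), and checking that sd_image holds every file that steps 6 to 12 copy into it. The block below is a minimal sketch and not part of the original post. The function names fix_dts_includes and check_sd_image are hypothetical, and the file list is simply the set of files the steps place in sd_image.

```shell
#!/bin/sh
# Hypothetical helpers; names and structure are illustrative assumptions.

# Rewrite the first three '#include' directives of the .dts file given as $1
# to '/include/', the form dtc parses without running the C preprocessor.
fix_dts_includes() {
    dts=$1
    awk 'n < 3 && /^[ \t]*#include/ { sub(/#include/, "/include/"); n++ }
         { print }' "$dts" > "$dts.tmp" && mv "$dts.tmp" "$dts"
}

# Report every file the tutorial copies into sd_image that is missing from
# the directory given as $1. Returns 0 if all six are present, 1 otherwise.
check_sd_image() {
    dir=$1
    missing=0
    for f in boot.bin devicetree.dtb uImage uramdisk.image.gz myled_0.ko led_blink; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is to run fix_dts_includes ~/zybo_z7_linux/work/dts/system-top.dts before compiling the tree, and check_sd_image ~/zybo_z7_linux/work/sd_image before copying to the card. The post's own dtc command is not shown; a typical invocation would be dtc -I dts -O dtb -o devicetree.dtb system-top.dts, which is an assumption rather than the author's exact command.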
    Posted the day before yesterday, 14:49
  • Open-source microcontrollers (LabVIEW supported)

    Raspberry Pi, BeagleBone Black, chipKIT

    Posts

    51
  • Pocket instruments

    OpenScope, Analog Discovery 2, Analog Discovery, Digital Discovery, Electronics Explorer

    Posts

    53
    On ADS compatibility with macOS
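    Many users worry about compatibility problems with the AD2 and ADS. In fact the AD series is fully compatible with Windows, Mac and Linux. This post focuses on how to get the ADS up and running on macOS. Whichever operating system you use, the first thing to do is download WaveForms; then follow these steps:
    Open the .dmg file.
    Open the WaveForms drive on the desktop.
    Move the WaveForms icon into the Applications folder.
    macOS may block WaveForms automatically the first time it is opened. In that case, go to the “Security & Privacy” page of System Preferences and click “Open Anyway”.
    The above applies to newer versions of macOS. If you are using OS X 10.13 or earlier, a few extra steps are needed to install the driver.
    For OS X 10.13 (or earlier): double-click DigilentFtdiDriver.pkg to launch the FTDI installer, then go through Introduction > Continue, Read Me > Continue, License > Continue > Agree, Destination Select > choose the user and Continue, Installation Type > Install, and the installation completes.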
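    Posted 01-13

Maker Circle

  • Competitions & Events

    2020 National IC Innovation Contest RISC-V Challenge Cup, DIGILENT All-Programmable Innovation and Entrepreneurship Design Contest (DDC), Jiangsu Province Virtual Instrument Contest, Geek DIY timed challenges, and more

    Posts

    23
    A question about building freedom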
    I downloaded the freedom repository from GitHub, but I ran into a problem during the three build steps and cannot generate a usable mcs file:
        make BOARD=arty -f Makefile.e300artydevkit clean
        make BOARD=arty -f Makefile.e300artydevkit verilog
        make BOARD=arty -f Makefile.e300artydevkit mcs
    The error output is as follows:
        /opt/Xilinx/Vivado/2017.1/bin/loader: line 179: 2456 Killed "$RDI_PROG" "$@"
        common.mk:81: recipe for target '/home/q/RISCV/freedom/builds/e300artydevkit/obj/E300ArtyDevKitFPGAChip.bit' failed
        make: *** [/home/q/RISCV/freedom/builds/e300artydevkit/obj/E300ArtyDevKitFPGAChip.bit] Error 137
        q@qBox:~/RISCV/freedom$ Parent process (pid 2456) has died. This helper process will now exit
    The full log is too long (over 4000 lines), so I am not posting it in the forum for now. Has anyone else seen the verilog step succeed but the mcs step fail?
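The “Error 137” in the log can be decoded. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which matches the “Killed” line. A common cause on Linux is the kernel's out-of-memory killer stopping Vivado during bitstream generation, although that is only a guess from this excerpt. A minimal sketch of the arithmetic, with a hypothetical function name:

```shell
# Hypothetical helper: print the signal number encoded in a shell exit status,
# or print nothing if the status does not indicate termination by a signal.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    fi
}
```

Calling signal_from_status 137 prints 9. If memory is the suspect, running dmesg | grep -i 'killed process' right after the failure usually shows whether the kernel killed the process.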
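    Posted the day before yesterday, 12:37
  • Maker groups (project hub)

    Here you will find not only inspiration and ideas, but also practice, passion and like-minded people. Makers are fearless!

    Posts

    26
    About the Pmod DHB1 motor driver module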
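    What is the relationship between the voltage supplied to the module and the voltage at the DC motor outputs? For example, if header J4 of the module is supplied with 7.2 V, what is the output voltage at headers J5 and J6?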
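    Posted 04-30
  • Engineering Street (jobs and recruitment)

    A topic street purely for engineering students on campus. Let your inner little monster loose and chat freely about anything, on campus or off.

    Posts

    52
    Hiring | Institute 802 of the Eighth Academy, China Aerospace Science and Technology Corporation (posted 2017.12.27)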
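    Posted | 2017.12.27
    Employer | Institute 802 of the Eighth Academy, China Aerospace Science and Technology Corporation
    About the institute
    A national key research institution engaged in technology research, product development, testing and production in electro-optical detection, data communications, satellite payloads, electromagnetic environment effects and related fields. Its research facilities and technical expertise are among the leading in China. It has ten specialized research departments, one national key laboratory and three Shanghai municipal key laboratories, and has set up joint R&D centers with a number of Chinese universities. It has more than 900 employees, including over 600 technical professionals, more than 450 holders of doctoral or master's degrees, over 260 technical experts at senior-engineer level or above, and about 20 experts at provincial or ministerial level or above, such as experts of the national "Hundred, Thousand and Ten Thousand Talents Project" and recipients of the State Council special government allowance. It has undertaken several hundred key national and provincial or ministerial research projects and production tasks, has won more than 200 science and technology awards at provincial or ministerial level or above, and holds more than 400 patents. It has been awarded the National May 1st Labor Certificate and the titles "National Model Home for Workers" and Shanghai "Enterprise Most Satisfactory to Employees", and has repeatedly been named a Shanghai Civilized Unit, a Group Corporation Civilized Unit and a Shanghai High-Tech Enterprise. The institute upholds a corporate culture of "diligence, professionalism, passion and inclusiveness" and strives to create a good working environment and cultural atmosphere. We warmly welcome outstanding talent to join us and build a better future together!
    Majors sought
    Information and Communication Engineering; Instrument Science and Technology; Signal and Information Processing; Control Science and Engineering; Communication and Information Systems; Mechanical Engineering; Electronic Science and Technology; Materials Science and Engineering; Electromagnetic Field and Microwave Technology; Computer Science and Technology; Circuits and Systems; Optical Engineering; Physical Electronics; Electrical Engineering; Aeronautical and Astronautical Science and Technology
    Positions
    Radar system designer; signal processing designer; communication system designer; digital/analog circuit designer; RF and microwave circuit designer; target characteristics researcher; electromagnetic environment effects researcher; antenna/radome designer; lidar (communication) designer; image processing designer; computer software development designer; FPGA development engineer; DSP development engineer; test equipment designer; intelligent manufacturing engineer; automation designer; AI product designer; structural engineer; power supply designer; telecommunications assembly process engineer; product assembly and test engineer
    Compensation and benefits
    Employees receive compensation that is competitive within the industry. Pay is linked to performance: more work and better work earn more.
    1. Competitive annual salary: base salary + performance pay + project bonuses, etc. Annual income once working independently: doctorate: over 200,000 yuan per year, plus a one-time settling-in allowance (about 300,000 to 360,000 yuan); master's: over 150,000 yuan per year; bachelor's: over 80,000 yuan per year; junior college and below: over 60,000 yuan per year.
    2. Five social insurances and two funds, enterprise annuity, critical illness insurance, travel accident insurance, etc.
    3. Transport subsidy, working-meal subsidy, rent subsidy and various other subsidies.
    4. Medical subsidy, medical fund, health check-ups, employee recuperation leave, etc.
    5. Paid annual leave, family-visit leave, various other kinds of leave, education and training opportunities, etc.
    6. For experienced professionals and high-level talent, compensation is more favorable and negotiated individually.
    Contact
    Contacts: Luo Huan (骆欢, 13817856987), Cheng Mianzhi (程冕之, 13636527110)
    Office phone: 021-65666006-222, 65662373
    Address: Human Resources Office, 203 Liping Road, Yangpu District, Shanghai
    Postal code: 200090
    E-mail: ht802hr@126.com (résumés may be sent to this address, marked "major-degree-school-name")
    When applying, please mention that you learned of this opening from the DIGILENT Chinese Technical Forum.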
    Posted 2017-12-27

Hot Posts

  • Zybo Z7-20 Linux + PL embedded development: an introductory experiment
    Likes (0) 19 0 the day before yesterday, 14:49
  • [RISC-V] 2020 DIGILENT
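    As the Internet of Things (IoT), 5G communications, artificial intelligence (AI) and other technologies keep developing, the industry's demands on chips are becoming ever more stringent, and both ultra-low power consumption and differentiation pose considerable challenges. RISC-V, an open instruction set architecture (ISA) based on reduced instruction set computing (RISC) design principles, has become a hot topic in IC design circles in Silicon Valley, China and worldwide thanks to being open and free, and some have called it "the Linux of the semiconductor industry". For China's chip industry, which has been seeking a breakthrough for years, RISC-V is a new hope for achieving independence, controllability, innovation and prosperity.
    As an important global ecosystem partner of Xilinx, DIGILENT offers IC design engineers a rich variety of FPGA prototyping platforms. In January 2020 the 4th National College Student Integrated Circuit Innovation and Entrepreneurship Contest, organized by the MIIT Talent Exchange Center, kicked off, and we hope this cup will help more innovative IC design talent overcome obstacles and show what they can do. To that end we have put together this roundup post for reference, so that the road to an independent, controllable CPU design is a little less bumpy.
    [Arty A7]
    · Building a RISC-V CPU on the Arty A7-100T (http://www.digilent.com.cn/project/details/216.html)
    · [Tutorial] Deploying the SiFive Freedom E310 on the Arty A7-100T (http://www.digilent.com.cn/community/709.html)
    · Evaluation versions of the SiFive RISC-V cores (E and U series), including prebuilt bit and mcs files for the Arty A7-100T (see attachments)
    · Hummingbird E200 open-source RISC-V core (https://github.com/SI-RISCV/e200_opensource)
    · MultiZone Secure IoT Stack based on the Freedom E300 (https://github.com/hex-five/multizone-fpga)
    [Nexys A7 / Nexys 4 DDR]
    · lowRISC open-source project (https://github.com/lowRISC/lowrisc-nexys4)
    · lowRISC open-source project (https://www.lowrisc.org/docs/minion-v0.4/fpga/)
    · Running Fedora Linux on the lowRISC SoC platform (https://fedoraproject.org/wiki/Architectures/RISC-V/FPGA)
    · Porting the open-source Freedom SoC to the Nexys A7 (https://github.com/DigilentChina/Freedom_on_Nexys_A7)
    · Building Instant SoC on the Nexys A7 (https://www.fpga-cores.com/instant-soc/)
    [Nexys Video]
    · OpenPiton + Ariane open-source project (a collaboration between Princeton University in the USA and ETH Zurich in Switzerland)
        o Single-core CPU clocked at up to 30 MHz
    · OpenTitan (Google open-source project)
        o https://docs.opentitan.org/doc/ug/quickstart/
    [Genesys 2]
    · OpenPiton + Ariane open-source project
        o Single-core or dual-core CPU clocked at up to 66 MHz
    · Ariane RISC-V CPU (ETH Zurich open-source project)
        o https://github.com/pulp-platform/ariane
    [Zybo / Zybo Z7]
    · Running a RISC-V Rocket core on a Zynq FPGA (https://github.com/ucb-bar/fpga-zynq)
    [SWORD]
    · Coming soon
    [This post will be updated continuously; feel free to drop by!]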
    Likes (1) 2904 1 01-22
  • The Swiss Army knife of pocket instruments: AD2 resource roundup
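    The Analog Discovery 2, known to its fans as the AD2, is a miniature USB oscilloscope and multi-function instrument. It provides 11 different tools, both analog and digital, that let users conveniently measure, read, generate, record and control all kinds of mixed-signal circuits, covering all the basic functions of benchtop equipment. It truly is the Swiss Army knife of pocket instruments!
    Gear: introductions and basic tutorials
        AD2 product introduction
        Using the Digital Discovery to view the Zynq boot sequence
    Hands-on: projects to learn from
        Smart parking-lot space monitoring system
        An electronic piano based on the Analog Discovery 2
    Guides: original posts
        AD2_Dual-channel USB digital oscilloscope
        AD2_Power supply module introduction
        [Demo series] Pmod TMP3 temperature sensor (I2C communication)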
    Likes (0) 892 0 2019-06-27
  • Tutorial | Building a Linux environment on the Zedboard from scratch (detailed steps)
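    If you are not familiar with the basics of the Zedboard, see the official product page; I won't repeat them here. This post simply records how, after getting the board, I built and ran a Linux system from scratch and then ran a hello_world program on it.
    Step one: install the arm-linux cross compiler. From the downloads page at https://code.google.com/p/zedboard-book-source/downloads/list, download the installer file. If you do not want a dual-boot setup, you can install Ubuntu 10.04_i386 in a VMware virtual machine and then copy the file into the Linux system (I put it in a shared folder; how to do that is easy to find with a web search). Then go to the folder containing the file and install xilinx-2011.09-50-arm-xilinx-linux-gnueabi.bin as follows (you must already be in the folder that contains the file):
        > sudo -s
        > (enter your password)
        > ./xilinx-2011.09-50-arm-xilinx-linux-gnueabi.bin
    Accept the defaults all the way through the installer. After installation, change the default shell from dash to bash; in the dialog that pops up, choose NO.
    Finally, edit /etc/bash.bashrc and add the export lines for the toolchain. To do so, enter gedit /etc/bash.bashrc on the command line. The third export is the default installation path of the software. Close the file and make it take effect immediately by entering source /etc/bash.bashrc. You can now check whether PATH has been updated: enter echo $PATH on the command line, and if the path added above appears in the output, the update succeeded.
    From the attachments of this post, download the Linux image and unzip it. Copy the files in the extracted sd_image folder to the SD card (make sure beforehand that the SD card is formatted as FAT32). Then insert the SD card into the board and connect the cables.
    Write the test file hello_world.c:
        #include <stdio.h>

        int main()
        {
            printf("Hello, ZedBoard!\nI'm jefby!\n");
            return 0;
        }
    Compile it on the command line (from the folder containing the file):
        > arm-xilinx-linux-gnueabi-gcc -o hello_world hello_world.c
    Copy the generated file to a USB drive, connect the board as before and power it on. If the USB serial port is not recognized, download the driver, open Device Manager and update the driver (CyUSB2Serial_v3.0.11.0.zip, available in the attachments). If a second device then shows the same problem, update its driver again from the same directory; the driver installation is then complete.
    Use HyperTerminal (windowsSuerzd.rar, available in the attachments) to open a serial connection and set the serial port parameters. In the boot output, note the mount name of your USB drive. On the command line, mount it under mnt and run hello_world. OK, that's it, all done!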
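The PATH check described in this post (after sourcing /etc/bash.bashrc) can be done without reading the whole variable by eye. The following is a minimal sketch. The function name path_contains is hypothetical, and the directory in the usage note is only a placeholder, because the post does not show the actual installation path.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (which defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, path_contains /path/to/toolchain/bin && echo "toolchain is on PATH". As a complementary check, command -v arm-xilinx-linux-gnueabi-gcc prints the compiler's location when the shell can find it.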
    Likes (0) 4687 0 2018-01-05
  • What do you think of Microsoft using FPGAs instead of traditional CPUs in its data centers?
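    Author: Li Bojie (李博杰)
    Link: https://www.zhihu.com/question/24174597/answer/138717507
    Source: Zhihu
    Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please credit the source.

    The word "instead" in the question "using FPGAs instead of CPUs" is not accurate. We have not stopped using CPUs. We use FPGAs to accelerate the computing tasks that suit them, while other tasks still run on the CPU, so that the FPGA and the CPU work together.
    This answer covers three questions:
    Why use FPGAs, and how do they differ from CPUs, GPUs and ASICs (application-specific chips)?
    Where are Microsoft's FPGAs deployed, and how do FPGAs communicate with each other and with CPUs?
    What role should FPGAs play in future cloud computing platforms? Are they merely compute accelerator cards like GPUs?

    I. Why use FPGAs?
    As is well known, Moore's law for general-purpose processors (CPUs) is in its twilight years, while the scale of machine learning and web services is growing exponentially. People use custom hardware to accelerate common computing tasks, yet the fast-changing industry demands that this custom hardware can be reprogrammed to perform new kinds of computation. The FPGA (Field Programmable Gate Array) is exactly such a hardware-reconfigurable architecture. For years it was used as a low-volume substitute for ASICs, but recently it has been deployed at scale in the data centers of Microsoft, Baidu and other companies, to provide both strong computing power and sufficient flexibility.
    [Figure: comparison of performance and flexibility across architectures.]
    Why are FPGAs fast? "The peers make it look good by comparison." CPUs and GPUs both belong to the von Neumann architecture: instructions are decoded and executed, and memory is shared. The reason FPGAs are more energy-efficient than CPUs and even GPUs is essentially the benefit of an architecture that has no instructions and needs no shared memory.
    In the von Neumann architecture, because an execution unit (such as a CPU core) may execute any instruction, it needs an instruction memory, a decoder, arithmetic units for all kinds of instructions, and branch handling logic. Because the control logic of an instruction stream is complex, there cannot be too many independent instruction streams, so GPUs use SIMD (single instruction, multiple data) to let multiple execution units process different data in lockstep, and CPUs also support SIMD instructions. In an FPGA, the function of every logic cell is already fixed when the device is reprogrammed, so no instructions are needed.
    Memory serves two purposes in the von Neumann architecture: holding state, and communicating between execution units. Because memory is shared, accesses must be arbitrated; to exploit locality of access, each execution unit has a private cache, and cache coherence must then be maintained among the execution units. For holding state, the registers and on-chip memory (BRAM) in an FPGA belong to their own control logic, with no unnecessary arbitration or caching. For communication, the connections between each FPGA logic cell and its neighbors are already fixed when the device is reprogrammed, so there is no need to communicate through shared memory.
    Having said so much from 3,000 feet up, how do FPGAs actually perform? Let us look at compute-intensive tasks and communication-intensive tasks separately.
    Examples of compute-intensive tasks include matrix operations, image processing, machine learning, compression, asymmetric encryption and Bing search ranking. For such tasks the CPU generally offloads the work to the FPGA. For these tasks, the Altera (it should apparently be called Intel now, but I am still used to saying Altera...) Stratix V FPGA that we currently use has integer multiplication performance roughly equal to a 20-core CPU and floating-point multiplication performance roughly equal to an 8-core CPU, but an order of magnitude lower than a GPU. The next-generation FPGA we are about to use, the Stratix 10, will be equipped with more multipliers and hardware floating-point units, and so in theory can reach computing power on par with today's top GPU compute cards.
    [Figure: FPGA integer multiplication capability (estimated, without using DSPs, based on logic resource usage).]
    [Figure: FPGA floating-point multiplication capability (estimated, float16 with soft cores, float32 with hard cores).]
    In the data center, the core advantage of FPGAs over GPUs is latency. For a task like Bing search ranking, returning results as quickly as possible requires reducing the latency of every step as far as possible. With a GPU, making full use of its computing power means the batch size cannot be too small, and latency reaches the millisecond range. With an FPGA, only microsecond-level PCIe latency is incurred (our current FPGA is a PCIe accelerator card). Once Intel releases the Xeon + FPGA connected over QPI, the latency between CPU and FPGA can drop below 100 nanoseconds, which is no different from accessing main memory.
    Why is FPGA latency so much lower than GPU latency? This is essentially an architectural difference. FPGAs have both pipeline parallelism and data parallelism, while GPUs have almost only data parallelism (their pipeline depth is limited). For example, if processing a packet takes 10 steps, an FPGA can build a 10-stage pipeline in which different stages process different packets, and each packet is finished after flowing through all 10 stages. As soon as a packet is finished it can be output. The GPU's data-parallel approach is to build 10 compute units, each also processing a different packet, but all compute units must do the same thing in lockstep (SIMD, Single Instruction Multiple Data). This requires 10 packets to enter together and leave together, which increases input and output latency. When tasks arrive one by one rather than in batches, pipeline parallelism achieves lower latency than data parallelism. For streaming computation tasks, therefore, FPGAs have an inherent latency advantage over GPUs.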
;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-1ffb204e56f3d02b0cabdcd6f6c3fb34_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1435&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;476&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1435&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-1ffb204e56f3d02b0cabdcd6f6c3fb34_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;计算密集型任务,CPU、GPU、FPGA、ASIC的数量级比较(以16位整数乘法为例,数字仅为数量级的估计)ASIC专用芯片在吞吐量、延迟和功耗三方面都无可指摘,但微软并没有采用,出于两个原因:数据中心的计算任务是灵活多变的,而ASIC研发成本高、周期长。好不容易大规模部署了一批某种神经网络的加速卡,结果另一种神经网络更火了,钱就白费了。FPGA只需要几百毫秒就可以更新逻辑功能。FPGA的灵活性可以保护投资,事实上,微软现在的FPGA玩法与最初的设想大不相同。数据中心是租给不同的租户使用的,如果有的机器上有神经网络加速卡,有的机器上有Bing搜索加速卡,有的机器上有网络虚拟化加速卡,任务的调度和服务器的运维会很麻烦。使用FPGA可以保持数据中心的同构性。接下来看通信密集型任务。相比计算密集型任务,通信密集型任务对每个输入数据的处理不甚复杂,基本上简单算算就输出了,这时通信往往会成为瓶颈。对称加密、防火墙、网络虚拟化都是通信密集型的例子。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-d74634adc21db32f6fafed538c7b91ca_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1434&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;478&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1434&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-d74634adc21db32f6fafed538c7b91ca_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;通信密集型任务,CPU、GPU、FPGA、ASIC的数量级比较(以64字节网络数据包处理为例,数字仅为数量级的估计)对通信密集型任务,FPGA相比CPU、GPU的优势就更大了。从吞吐量上讲,FPGA上的收发器可以直接接上40Gbs甚至100Gbs的网线,以线速处理任意大小的数据包;而CPU需要从网卡把数据包收上来才能处理,很多网卡是不能线速处理64字节的小数据包的。尽管可以通过插多块网卡来达到高性能,但CPU和主板支持的PCIe插槽数量往往有限,而且网卡、交换机本身也价格不菲。从延迟上讲,网卡把数据包收到CPU,CPU再发给网卡,即使
使用DPDK这样高性能的数据包处理框架,延迟也有4~5微秒。更严重的问题是,通用CPU的延迟不够稳定。例如当负载较高时,转发延迟可能升到几十微秒甚至更高(如下图所示);现代操作系统中的时钟中断和任务调度也增加了延迟的不确定性。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-b5b50a3c73cb770401d20a5223ada6c6_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;817&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;594&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;817&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-b5b50a3c73cb770401d20a5223ada6c6_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;ClickNP(FPGA)与DellS6000交换机(商用交换机芯片)、Click+DPDK(CPU)和Linux(CPU)的转发延迟比较,errorbar表示5%和95%。来源:[5]虽然GPU也可以高性能处理数据包,但GPU是没有网口的,意味着需要首先把数据包由网卡收上来,再让GPU去做处理。这样吞吐量受到CPU和/或网卡的限制。GPU本身的延迟就更不必说了。那么为什么不把这些网络功能做进网卡,或者使用可编程交换机呢?ASIC的灵活性仍然是硬伤。尽管目前有越来越强大的可编程交换机芯片,比如支持P4语言的Tofino,ASIC仍然不能做复杂的有状态处理,比如某种自定义的加密算法。综上,在数据中心里FPGA的主要优势是稳定又极低的延迟,适用于流式的计算密集型任务和通信密集型任务。二、微软部署FPGA的实践2016年9月,《连线》(Wired)杂志发表了一篇《微软把未来押注在FPGA上》的报道[3],讲述了Catault项目的前世今生。紧接着,Catault项目的老大DougBurger在Ignite2016大会上与微软CEOSatyaNadella一起做了FPGA加速机器翻译的演示。演示的总计算能力是103万Tos,也就是1.03Exa-o,相当于10万块顶级GPU计算卡。一块FPGA(加上板上内存和网络接口等)的功耗大约是30W,仅增加了整个服务器功耗的十分之一。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-de52ee704478410276b2acae767ec3a3_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1410&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;731&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1410&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-de52ee704478410276b2acae767ec3a3_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;
am;am;am;am;gt;Ignite2016上的演示:每秒1Exa-o(10^18)的机器翻译运算能力微软部署FPGA并不是一帆风顺的。对于把FPGA部署在哪里这个问题,大致经历了三个阶段:专用的FPGA集群,里面插满了FPGA每台机器一块FPGA,采用专用网络连接每台机器一块FPGA,放在网卡和交换机之间,共享服务器网络&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/50/v2-880465ced11d754f07f8edd225e48cab_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1077&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1335&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1077&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/v2-880465ced11d754f07f8edd225e48cab_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;微软FPGA部署方式的三个阶段,来源:[3]第一个阶段是专用集群,里面插满了FPGA加速卡,就像是一个FPGA组成的超级计算机。下图是最早的BFB实验板,一块PCIe卡上放了6块FPGA,每台1U服务器上又插了4块PCIe卡。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-8ed8783399c5b5cf4640c1450e73e1cf_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2483&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1101&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2483&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-8ed8783399c5b5cf4640c1450e73e1cf_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;最早的BFB实验板,上面放了6块FPGA。来源:[1]可以注意到该公司的名字。在半导体行业,只要批量足够大,芯片的价格都将趋向于沙子的价格。据传闻,正是由于该公司不肯给「沙子的价格」,才选择了另一家公司。当然现在数据中心领域用两家公司FPGA的都有。只要规模足够大,对FPGA价格过高的担心将是不必要的。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-a2257b21b10a8f6a91bfa87bd5db9165_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;714&am;am;am;am;am;am;am;am;quot;data-rawheigh
t=&am;am;am;am;am;am;am;am;quot;599&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;714&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-a2257b21b10a8f6a91bfa87bd5db9165_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;最早的BFB实验板,1U服务器上插了4块FPGA卡。来源:[1]像超级计算机一样的部署方式,意味着有专门的一个机柜全是上图这种装了24块FPGA的服务器(下图左)。这种方式有几个问题:不同机器的FPGA之间无法通信,FPGA所能处理问题的规模受限于单台服务器上FPGA的数量;数据中心里的其他机器要把任务集中发到这个机柜,构成了in-cast,网络延迟很难做到稳定。FPGA专用机柜构成了单点故障,只要它一坏,谁都别想加速了;装FPGA的服务器是定制的,冷却、运维都增加了麻烦。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-70aa39ff215037213a8d70e46094464c_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2534&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1206&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2534&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-70aa39ff215037213a8d70e46094464c_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;部署FPGA的三种方式,从中心化到分布式。来源:[1]一种不那么激进的方式是,在每个机柜一面部署一台装满FPGA的服务器(上图中)。这避免了上述问题(2)(3),但(1)(4)仍然没有解决。第二个阶段,为了保证数据中心中服务器的同构性(这也是不用ASIC的一个重要原因),在每台服务器上插一块FPGA(上图右),FPGA之间通过专用网络连接。这也是微软在ISCA'14上所发表论文采用的部署方式。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-7b73facc9e24d0fc6770a302de4eca7e_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;858&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;612&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;858&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am
;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-7b73facc9e24d0fc6770a302de4eca7e_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;OenComuteServer在机架中。来源:[1]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-e23b8d5c807ad2ff8245d992d950fbec_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2433&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;736&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2433&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-e23b8d5c807ad2ff8245d992d950fbec_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;OenComuteServer内景。红框是放FPGA的位置。来源:[1]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-f0c08601b8b0a82beaa1389406ebbc15_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1034&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;594&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1034&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-f0c08601b8b0a82beaa1389406ebbc15_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;插入FPGA后的OenComuteServer。来源:[1]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/50/v2-6d0f7965b3064df2e59d9a4b579fc59c_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1002&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1353&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;q
uot;1002&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/v2-6d0f7965b3064df2e59d9a4b579fc59c_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;FPGA与OenComuteServer之间的连接与固定。来源:[1]FPGA采用StratixVD5,有172K个ALM,2014个M20K片上内存,1590个DSP。板上有一个8GBDDR3-1333内存,一个PCIeGen3x8接口,两个10Gbs网络接口。一个机柜之间的FPGA采用专用网络连接,一组10G网口8个一组连成环,另一组10G网口6个一组连成环,不使用交换机。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-90500731a6a932a37354ce1f16ac4cd8_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2431&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1218&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2431&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-90500731a6a932a37354ce1f16ac4cd8_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;机柜中FPGA之间的网络连接方式。来源:[1]这样一个1632台服务器、1632块FPGA的集群,把Bing的搜索结果排序整体性能提高到了2倍(换言之,节省了一半的服务器)。如下图所示,每8块FPGA穿成一条链,中间用前面提到的10Gbs专用网线来通信。这8块FPGA各司其职,有的负责从文档中提取特征(黄色),有的负责计算特征表达式(绿色),有的负责计算文档的得分(红色)。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-aaef099e0f6cf7aaf9e5be6bb3b0bc27_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1655&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1155&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1655&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-aaef099e0f6cf7aaf9e5be6bb3b0bc27_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;FPGA加速Bing的搜索排序过程。来源:[1]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts:/
/ic2.zhimg.com/50/v2-8bd4141da4973664abed27f2cbb8605f_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;963&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;638&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;963&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-8bd4141da4973664abed27f2cbb8605f_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;FPGA不仅降低了Bing搜索的延迟,还显著提高了延迟的稳定性。来源:[4]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/50/v2-b81641c8231b29f3d4b687ce294b329c_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;963&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;647&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;963&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/v2-b81641c8231b29f3d4b687ce294b329c_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;本地和远程的FPGA均可以降低搜索延迟,远程FPGA的通信延迟相比搜索延迟可忽略。来源:[4]FPGA在Bing的部署取得了成功,Catault项目继续在公司内扩张。微软内部拥有最多服务器的,就是云计算Azure部门了。Azure部门急需解决的问题是网络和存储虚拟化带来的开销。Azure把虚拟机卖给客户,需要给虚拟机的网络提供防火墙、负载均衡、隧道、NAT等网络功能。由于云存储的物理存储跟计算节点是分离的,需要把数据从存储节点通过网络搬运过来,还要进行压缩和加密。在1Gbs网络和机械硬盘的时代,网络和存储虚拟化的CPU开销不值一提。随着网络和存储速度越来越快,网络上了40Gbs,一块SSD的吞吐量也能到1GB/s,CPU渐渐变得力不从心了。例如Hyer-V虚拟交换机只能处理25Gbs左右的流量,不能达到40Gbs线速,当数据包较小时性能更差;AES-256加密和SHA-1签名,每个CPU核只能处理100MB/s,只是一块SSD吞吐量的十分之一。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-5aeb1ccedd0b0f00cd1779d454b33382_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1842&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;546&am;am;am;a
m;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1842&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-5aeb1ccedd0b0f00cd1779d454b33382_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;网络隧道协议、防火墙处理40Gbs需要的CPU核数。来源:[5]为了加速网络功能和存储虚拟化,微软把FPGA部署在网卡和交换机之间。如下图所示,每个FPGA有一个4GBDDR3-1333DRAM,通过两个PCIeGen3x8接口连接到一个CPUsocket(物理上是PCIeGen3x16接口,因为FPGA没有x16的硬核,逻辑上当成两个x8的用)。物理网卡(NIC)就是普通的40Gbs网卡,仅用于宿主机与网络之间的通信。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-974d7118b5993fbd74756be5931fc14f_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1265&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;625&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1265&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-974d7118b5993fbd74756be5931fc14f_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;Azure服务器部署FPGA的架构。来源:[6]FPGA(SmartNIC)对每个虚拟机虚拟出一块网卡,虚拟机通过SR-IOV直接访问这块虚拟网卡。原本在虚拟交换机里面的数据平面功能被移到了FPGA里面,虚拟机收发网络数据包均不需要CPU参与,也不需要经过物理网卡(NIC)。这样不仅节约了可用于出售的CPU资源,还提高了虚拟机的网络性能(25Gbs),把同数据中心虚拟机之间的网络延迟降低了10倍。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/50/v2-061762f3dad5b8d8d6ac0e047a016924_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2371&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;1316&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2371&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/v2-061762f3dad5b8d8d6ac0e0
47a016924_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;网络虚拟化的加速架构。来源:[6]这就是微软部署FPGA的第三代架构,也是目前「每台服务器一块FPGA」大规模部署所采用的架构。FPGA复用主机网络的初心是加速网络和存储,更深远的影响则是把FPGA之间的网络连接扩展到了整个数据中心的规模,做成真正cloud-scale的「超级计算机」。第二代架构里面,FPGA之间的网络连接局限于同一个机架以内,FPGA之间专网互联的方式很难扩大规模,通过CPU来转发则开销太高。第三代架构中,FPGA之间通过LTL(LightweightTransortLayer)通信。同一机架内延迟在3微秒以内;8微秒以内可达1000块FPGA;20微秒可达同一数据中心的所有FPGA。第二代架构尽管8台机器以内的延迟更低,但只能通过网络访问48块FPGA。为了支持大范围的FPGA间通信,第三代架构中的LTL还支持PFC流控协议和DCQCN拥塞控制协议。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-c1db4799f5fe34ecc85611211b568bb5_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;2209&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;993&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;2209&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-c1db4799f5fe34ecc85611211b568bb5_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;纵轴:LTL的延迟,横轴:可达的FPGA数量。来源:[4]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/50/v2-7ce4e13c6a60fe56684f1e2217923ceb_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;968&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;791&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;968&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic3.zhimg.com/v2-7ce4e13c6a60fe56684f1e2217923ceb_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;FPGA内的逻辑模块关系,其中每个Role是用户逻辑(如DNN加速、网络功能加速、加密),外面的部分负责各个Role之间的通信及Role与外设之间的通信。来源:[4]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-b9d7f53b512
5aecfd5d0b719b1a4179f_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1272&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;971&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1272&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-b9d7f53b5125aecfd5d0b719b1a4179f_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;FPGA构成的数据中心加速平面,介于网络交换层(TOR、L1、L2)和传统服务器软件(CPU上运行的软件)之间。来源:[4]通过高带宽、低延迟的网络互联的FPGA构成了介于网络交换层和传统服务器软件之间的数据中心加速平面。除了每台提供云服务的服务器都需要的网络和存储虚拟化加速,FPGA上的剩余资源还可以用来加速Bing搜索、深度神经网络(DNN)等计算任务。对很多类型的应用,随着分布式FPGA加速器的规模扩大,其性能提升是超线性的。例如CNNinference,当只用一块FPGA的时候,由于片上内存不足以放下整个模型,需要不断访问DRAM中的模型权重,性能瓶颈在DRAM;如果FPGA的数量足够多,每块FPGA负责模型中的一层或者一层中的若干个特征,使得模型权重完全载入片上内存,就消除了DRAM的性能瓶颈,完全发挥出FPGA计算单元的性能。当然,拆得过细也会导致通信开销的增加。把任务拆分到分布式FPGA集群的关键在于平衡计算和通信。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/50/v2-5a17afc6d68df612e27e34778d0a0932_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1827&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;394&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1827&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic1.zhimg.com/v2-5a17afc6d68df612e27e34778d0a0932_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;从神经网络模型到HaaS上的FPGA。利用模型内的并行性,模型的不同层、不同特征映射到不同FPGA。来源:[4]在MICRO'16会议上,微软提出了HardwareasaService(HaaS)的概念,即把硬件作为一种可调度的云服务,使得FPGA服务的集中调度、管理和大规模部署成为可能。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-e87fddf2b776f27c0d37cba5a521beed_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1025&am;am;am;am;am;a
m;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;950&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1025&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-e87fddf2b776f27c0d37cba5a521beed_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;HardwareasaService(HaaS)。来源:[4]从第一代装满FPGA的专用服务器集群,到第二代通过专网连接的FPGA加速卡集群,到目前复用数据中心网络的大规模FPGA云,三个思想指导我们的路线:硬件和软件不是相互取代的关系,而是合作的关系;必须具备灵活性,即用软件定义的能力;必须具备可扩放性(scalability)。三、FPGA在云计算中的角色最后谈一点我个人对FPGA在云计算中角色的思考。作为三年级博士生,我在微软亚洲研究院的研究试图回答两个问题:FPGA在云规模的网络互连系统中应当充当怎样的角色?如何高效、可扩放地对FPGA+CPU的异构系统进行编程?我对FPGA业界主要的遗憾是,FPGA在数据中心的主流用法,从除微软外的互联网巨头,到两大FPGA厂商,再到学术界,大多是把FPGA当作跟GPU一样的计算密集型任务的加速卡。然而FPGA真的很适合做GPU的事情吗?前面讲过,FPGA和GPU最大的区别在于体系结构,FPGA更适合做需要低延迟的流式处理,GPU更适合做大批量同构数据的处理。由于很多人打算把FPGA当作计算加速卡来用,两大FPGA厂商推出的高层次编程模型也是基于OenCL,模仿GPU基于共享内存的批处理模式。CPU要交给FPGA做一件事,需要先放进FPGA板上的DRAM,然后告诉FPGA开始执行,FPGA把执行结果放回DRAM,再通知CPU去取回。CPU和FPGA之间本来可以通过PCIe高效通信,为什么要到板上的DRAM绕一圈?也许是工程实现的问题,我们发现通过OenCL写DRAM、启动kernel、读DRAM一个来回,需要1.8毫秒。而通过PCIeDMA来通信,却只要1~2微秒。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-ade077e9ffe5e9babe8712621204b857_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1761&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;647&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1761&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-ade077e9ffe5e9babe8712621204b857_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;PCIeI/Ochannel与OenCL的性能比较。纵坐标为对数坐标。来源:[5]OenCL里面多个kernel之间的通信就更夸张了,默认的方式也是通过共享内存。本文开篇就讲,FPGA比CPU和GPU能效高,体系结构上的根本优势是无指令、无需共享内存。使用共享内存在多个kernel之间通信,在顺序通信(FIFO)的情况下是毫无必要的。况且FPGA上的DRAM一般比GPU上的DRAM慢很多。因此我们提出了
ClickNP网络编程框架[5],使用管道(channel)而非共享内存来在执行单元(element/kernel)间、执行单元和主机软件间进行通信。需要共享内存的应用,也可以在管道的基础上实现,毕竟CSP(CommunicatingSequentialProcess)和共享内存理论上是等价的嘛。ClickNP目前还是在OenCL基础上的一个框架,受到C语言描述硬件的局限性(当然HLS比Verilog的开发效率确实高多了)。理想的硬件描述语言,大概不会是C语言吧。&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/50/v2-3749426b53f0af9adc57613b4ec35093_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1647&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;618&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1647&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic4.zhimg.com/v2-3749426b53f0af9adc57613b4ec35093_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;ClickNP使用channel在elements间通信,来源:[5]&am;am;am;am;am;am;am;am;am;am;lt;imgsrc=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/50/v2-6364688b070e6ca6cdab83c6602b7c73_hd.jg&am;am;am;am;am;am;am;am;quot;data-rawwidth=&am;am;am;am;am;am;am;am;quot;1585&am;am;am;am;am;am;am;am;quot;data-rawheight=&am;am;am;am;am;am;am;am;quot;359&am;am;am;am;am;am;am;am;quot;class=&am;am;am;am;am;am;am;am;quot;origin_imagezh-lightbox-thumb&am;am;am;am;am;am;am;am;quot;width=&am;am;am;am;am;am;am;am;quot;1585&am;am;am;am;am;am;am;am;quot;data-original=&am;am;am;am;am;am;am;am;quot;htts://ic2.zhimg.com/v2-6364688b070e6ca6cdab83c6602b7c73_r.jg&am;am;am;am;am;am;am;am;quot;&am;am;am;am;am;am;am;am;am;am;gt;ClickNP使用channel在FPGA和CPU间通信,来源:[5]低延迟的流式处理,需要最多的地方就是通信。然而CPU由于并行性的限制和操作系统的调度,做通信效率不高,延迟也不稳定。此外,通信就必然涉及到调度和仲裁,CPU由于单核性能的局限和核间通信的低效,调度、仲裁性能受限,硬件则很适合做这种重复工作。因此我的博士研究把FPGA定义为通信的「大管家」,不管是服务器跟服务器之间的通信,虚拟机跟虚拟机之间的通信,进程跟进程之间的通信,CPU跟存储设备之间的通信,都可以用FPGA来加速。成也萧何,败也萧何。缺少指令同时是FPGA的优势和软肋。每做一点不同的事情,就要占用一定的FPGA逻辑资源。如果要做的事情复杂、重复性不强,就会占用大量的逻辑资源,其中的大部分处于闲置状态。这时就不如用冯·诺依曼结构的处理器。数据中心里的很多任务有很强的局部性和重复性:一部分是虚拟化平台需要做的网络和存储,这些都属于通信;另一部分是客户计算任务
里的,比如机器学习、加密解密。我们首先把FPGA用于它最擅长的通信,日后也许也会像AWS那样把FPGA作为计算加速卡租给客户。不管通信还是机器学习、加密解密,算法都是很复杂的,如果试图用FPGA完全取代CPU,势必会带来FPGA逻辑资源极大的浪费,也会提高FPGA程序的开发成本。更实用的做法是FPGA和CPU协同工作,局部性和重复性强的归FPGA,复杂的归CPU。当我们用FPGA加速了Bing搜索、深度学习等越来越多的服务;当网络虚拟化、存储虚拟化等基础组件的数据平面被FPGA把持;当FPGA组成的「数据中心加速平面」成为网络和服务器之间的天堑……似乎有种感觉,FPGA将掌控全局,CPU上的计算任务反而变得碎片化,受FPGA的驱使。以往我们是CPU为主,把重复的计算任务卸载(offload)到FPGA上;以后会不会变成FPGA为主,把复杂的计算任务卸载到CPU上呢?随着Xeon+FPGA的问世,古老的SoC会不会在数据中心焕发新生?「跨越内存墙,走向可编程世界」(Acrossthememorywallandreachafullyrogrammableworld.)参考文献:[1]Large-ScaleReconfigurableComutinginaMicrosoftDatacenterhtts://www.microsoft.com/en-us/research/w-content/uloads/2014/06/HC26.12.520-Recon-Fabric-Pulnam-Microsoft-Catault.df[2]AReconfigurableFabricforAcceleratingLarge-ScaleDatacenterServices,ISCA'14htts://www.microsoft.com/en-us/research/w-content/uloads/2016/02/Catault_ISCA_2014.df[3]MicrosoftHasaWholeNewKindofComuterChi—andIt’llChangeEverything[4]ACloud-ScaleAccelerationArchitecture,MICRO'16htts://www.microsoft.com/en-us/research/w-content/uloads/2016/10/Cloud-Scale-Acceleration-Architecture.df[5]ClickNP:HighlyFlexibleandHigh-erformanceNetworkProcessingwithReconfigurableHardware-MicrosoftResearch[6]DanielFirestone,SmartNIC:AcceleratingAzure'sNetworkwith.FPGAsonOCSservers.
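    The latency argument in section 1 (a 10-stage pipeline versus 10 lockstep units that need a full batch) can be checked with a toy model. The sketch below is only an illustration under stated assumptions: one item arrives per time unit, every stage or step takes one time unit, and the function names `pipeline_latencies` and `batched_latencies` are hypothetical, not taken from any real FPGA or GPU toolkit.

```python
from statistics import mean


def pipeline_latencies(arrivals, stages, stage_time=1.0):
    """Per-item latency through a `stages`-deep pipeline (FPGA-style)."""
    latencies = []
    first_stage_free = 0.0  # time at which stage 0 can accept the next item
    for t in arrivals:
        start = max(t, first_stage_free)
        first_stage_free = start + stage_time
        # An item leaves once it has passed through every stage.
        latencies.append(start + stages * stage_time - t)
    return latencies


def batched_latencies(arrivals, batch_size, steps, step_time=1.0):
    """Per-item latency when `batch_size` items run in lockstep (SIMD-style)."""
    latencies = []
    engine_free = 0.0  # time at which the lockstep engine is idle again
    for i in range(0, len(arrivals), batch_size):
        batch = arrivals[i:i + batch_size]
        # The batch cannot start before its last item has arrived.
        start = max(batch[-1], engine_free)
        done = start + steps * step_time
        engine_free = done
        # All items of a batch leave together.
        latencies.extend(done - t for t in batch)
    return latencies


if __name__ == "__main__":
    arrivals = [float(t) for t in range(20)]  # one item per time unit
    pipe = pipeline_latencies(arrivals, stages=10)
    simd = batched_latencies(arrivals, batch_size=10, steps=10)
    print("pipeline: max", max(pipe), "mean", mean(pipe))
    print("batched:  max", max(simd), "mean", mean(simd))
```

    In this model both designs sustain the same throughput, but every pipelined item has a latency of 10 time units, while batched items wait between 10 and 19 units (14.5 on average) because the first item of each batch has to wait for the last one to arrive.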
    Likes (0) 1974 1 2018-01-04
  • Paper | University of Birmingham (UK) team develops a fixed-point deep recurrent neural network using Theano, Python, PYNQ, and Zynq
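    Translation source: xilinx.eetrend.com. Author: Sleibso. Translated by: csc57.

    A programmable logic device (PLD) is a general-purpose integrated circuit whose logic function is determined by how the user programs it, so users can integrate a digital system into a PLD themselves. After many years of development, programmable logic has evolved from the programmable logic array devices of the 1970s into today's field-programmable gate arrays (FPGAs) with tens of millions of gates. With AI research booming, the parallelism of FPGAs is already being applied to neural network computations that have demanding real-time requirements. Implementing floating-point arithmetic on an FPGA consumes a lot of hardware resources. Fixed-point numbers have limited precision, but with a suitable word length chosen for each application they can still guarantee convergence, and they are faster and use fewer resources than floating-point. This has made fixed-point an ideal choice for embedded AI and machine learning applications.

    The latest proof point is a recent paper by Yufeng Hao and Steven Quigley of the Department of Electronic, Electrical and Systems Engineering at the University of Birmingham, UK. The paper, "The Implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA", describes how a fixed-point deep recurrent neural network (DRNN) was successfully implemented and trained using the Python programming language; the Theano math library and multi-dimensional array framework; the open-source, Python-based PYNQ development environment; the Digilent PYNQ-Z1 development board; and the Xilinx Zynq Z-7020 SoC on the PYNQ-Z1 board.

    The Zynq-7000 family combines a dual-core ARM Cortex-A9 processor with 28 nm Artix-7 or Kintex-7 programmable logic. It integrates the functions of a CPU, DSP, and ASSP on a single chip and offers key analytics, hardware acceleration, and mixed-signal capabilities, along with excellent price/performance and maximum design flexibility.

    Using a Python DRNN hardware acceleration overlay (an overlay is a kind of hardware library proposed by Xilinx that uses a Python API to connect hardware logic with software and exchange data), the two authors achieved a processing throughput of 20 GOPS (giga-operations per second) for an NLP (natural language processing) application, outperforming earlier FPGA-based implementations by 2.75x to 70.5x.

    Much of the paper discusses NLP and language models (LMs), which "involve machine translation, voice search, speech tagging, and speech recognition". The paper then discusses implementing the DRNN LM hardware accelerator with the Vivado HLS development tool and Verilog, which produces a custom hardware overlay for the PYNQ development environment. The resulting accelerator contains five processing elements (PEs) and delivers 20 GOPS of data throughput in this application. The block diagram of the design:

    [Figure: DRNN accelerator block diagram]

    The Vivado Design Suite offers a new approach to next-generation, high-productivity C/C++ and IP-based design. Combined with the new UltraFast high-productivity design methodology, users can achieve a 10x to 15x productivity gain. Vivado HLS supports both the ISE and Vivado design environments and speeds up IP creation by letting C, C++, and SystemC specifications target Xilinx programmable devices directly, without creating an RTL model by hand.

    The paper contains a great deal of in-depth technical detail, but one sentence sums up the reason for this blog post: "More importantly, we demonstrate the application of a joint software and hardware design and simulation process in the neural network field."

    Readers interested in the PDF of the original paper can open the attachment to this post.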
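    To make the fixed-point trade-off described above concrete, here is a minimal Python sketch of fixed-point arithmetic for one neuron's weighted sum. It is not the paper's implementation: the Q8.8 format (16-bit word, 8 fractional bits) and the helper names `to_fixed`, `to_float`, and `fixed_dot` are assumptions chosen for illustration.

```python
FRAC_BITS = 8   # assumed number of fractional bits (Q8.8)
WORD_BITS = 16  # assumed word length in bits


def saturate(value, word_bits=WORD_BITS):
    """Clamp an integer to the signed range of a `word_bits`-bit word."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, value))


def to_fixed(x, frac_bits=FRAC_BITS, word_bits=WORD_BITS):
    """Quantize a float to a signed fixed-point integer, with saturation."""
    return saturate(int(round(x * (1 << frac_bits))), word_bits)


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights_q, inputs_q, frac_bits=FRAC_BITS, word_bits=WORD_BITS):
    """Dot product of two fixed-point vectors using integer operations only."""
    acc = 0  # wide accumulator, as a hardware MAC unit would use
    for w, x in zip(weights_q, inputs_q):
        acc += w * x  # each product carries 2 * frac_bits fractional bits
    # Shift back to frac_bits fractional bits, then fit into the word.
    return saturate(acc >> frac_bits, word_bits)


if __name__ == "__main__":
    weights = [to_fixed(v) for v in (0.5, -1.25, 2.0)]
    inputs = [to_fixed(v) for v in (1.0, 0.25, -0.75)]
    print(to_float(fixed_dot(weights, inputs)))
```

    With these example values every operand is exactly representable in Q8.8, so the fixed-point result equals the floating-point one (-1.3125). For values that are not exactly representable, the error depends on the chosen word length, which is the knob the article refers to.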
    Likes (0) 4135 0 2018-01-03
