Thread: Minimum II of 2 but HTML report has no further information

  1. #1 (dark_visage)

    I have a single-work-item kernel that uses a local memory as a ping-pong buffer, in a form similar to the following:

    local float __attribute__((bankwidth(4),
                               numreadports(2),
                               numwriteports(2),
                               doublepump,
                               bank_bits(2,1,0))) mem[1024][4][2];

    for (uint outer_outer = 0; outer_outer < 8; ++outer_outer)
    {
        // some integer adds, subs, and shifts that are used to help compute x_idx, y_idx later
        float x_pipe[4];
        float y_pipe[4];
        uint x_idx_pipe[4];
        uint y_idx_pipe[4];
        for (uint outer = 0; outer < 8; ++outer)
        {
            uint x_idx, y_idx;
            // compute x_idx and y_idx using integer adds, subs, and shifts

            #pragma unroll
            for (uint inner = 0; inner < 4; ++inner)
            {
                float x_fetched = mem[x_idx][inner][0];
                float y_fetched = mem[y_idx][inner][0];

                mem[x_idx_pipe[0]][inner][1] = x_pipe[0];
                mem[y_idx_pipe[0]][inner][1] = y_pipe[0];

                // shift register statements + computations on x and y

                x_pipe[3] = x_fetched;
                y_pipe[3] = y_fetched;
                x_idx_pipe[3] = x_idx;
                y_idx_pipe[3] = y_idx;
            }
        }
    }

    The compiler seems to parallelize the unrolled inner loop correctly, but the II of the 'outer' loop is 2. Unfortunately, the Loop Analysis section of the HTML report gives no additional information about the limiting factor. Does anyone have insight into what it means when the HTML report doesn't say what's limiting the II? Does that mean the control logic is the cause, and hence there's nothing I can do?

    I've tried forcing it with #pragma ii 1, but the compiler fails. In the system view, I notice that the two store operations are sequential (the second depends on the first), but I'm unsure whether this is just a display artifact (i.e., the system view doesn't show that double pumping allows the stores to happen in parallel).

  2. #2 (HRZ)

    In nested loops, the II of the outer loop will be 2, because an II of 1 on the outer loop would require evaluating the exit conditions of both the inner and the outer loop in the same cycle. That would create a very long critical path and significantly reduce the operating frequency. This does not necessarily result in lower performance; however, you can manually merge the loops into one to achieve an II of 1. There should be a note about this at the bottom of the report if you click on the line with the II information, but I don't remember exactly.
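    A minimal sketch of the manual loop merging suggested above, written as plain host-side C rather than kernel code so it can be run and checked anywhere. It assumes the 8 x 8 trip counts from the code in #1; the function names `visit_nested` and `visit_flat` are hypothetical. Both functions record the (outer_outer, outer) pairs in visit order; the flattened version keeps the old indices as wrapping counters, so it needs only one loop with a single exit condition and no divide or modulo.

```c
#include <stddef.h>

/* Trip counts taken from the example in #1 (8 x 8). */
#define N_OUTER_OUTER 8u
#define N_OUTER 8u
#define N_TOTAL (N_OUTER_OUTER * N_OUTER)

/* Nested form: two counters and two exit conditions. */
static size_t visit_nested(unsigned order[][2])
{
    size_t n = 0;
    for (unsigned oo = 0; oo < N_OUTER_OUTER; ++oo) {
        for (unsigned o = 0; o < N_OUTER; ++o) {
            order[n][0] = oo;
            order[n][1] = o;
            ++n;
        }
    }
    return n;
}

/* Flattened form: one loop, one exit condition. The former loop indices
   are carried as wrapping counters, which avoids a divide/modulo. */
static size_t visit_flat(unsigned order[][2])
{
    unsigned oo = 0, o = 0;
    size_t n = 0;
    for (unsigned i = 0; i < N_TOTAL; ++i) {
        order[n][0] = oo;
        order[n][1] = o;
        ++n;
        /* advance the (oo, o) pair exactly as the nested loops would */
        if (o == N_OUTER - 1u) {
            o = 0;
            ++oo;
        } else {
            ++o;
        }
    }
    return n;
}
```

    In a kernel, the work #1 does once per outer_outer iteration (the integer setup and the pipe arrays) would move under an `if (o == 0)` guard inside the single loop. Whether this alone brings the II down to 1 depends on the rest of the loop body; later posts in this thread find a store dependency that can keep it at 2.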

  3. #3 (dark_visage)

    HRZ,

    Thanks for the insight. I had not considered the nested loops, and experimenting with them produced something interesting. To test things out, I removed the inner loop and did the calculations on only one bank. I also manually unrolled the for loop I used to implement the shift registers (the idx pipes, etc.) so that there are no nested loops inside the 'outer' loop. The tool still shows an II of 2 on the 'outer' loop, but now it provides more information: it reports a store dependency between the two store lines. I wouldn't expect that, because the memory is double-pumped (2 write ports, 2 read ports).

    Do you have any advice on that?

    EDIT: This is with version 17.0.
    Last edited by dark_visage; November 14th, 2017 at 10:23 AM.

  4. #4 (HRZ)

    Please post your new code. You seem to be using indirect addressing on the local buffer; this is very likely not a good idea. Double-pumping memory should not affect load/store dependencies.

  5. #5 (dark_visage)

    I can see double pumping not affecting load/store dependencies, but from an II perspective I think it should matter. Here's my reasoning; please let me know if you disagree: if I do two writes per loop iteration at clock rate 'clk', and my memory is double-pumped so that it runs at 'clk2x', then the first write is performed on the first clk2x cycle and the second write on the second clk2x cycle. The writes are performed in order, within one 'clk' cycle.

    Also, do you have any insight into why indirect addressing is bad in OpenCL? Is it just Altera preventing anyone from accidentally causing write collisions?


    local float __attribute__((bankwidth(4),
                               numreadports(2),
                               numwriteports(2),
                               doublepump,
                               bank_bits(2,1,0))) mem[1024][4][2];

    for (uint outer_outer = 0; outer_outer < 8; ++outer_outer)
    {
        // some integer adds, subs, and shifts that are used to help compute x_idx, y_idx later
        float x_pipe[4];
        float y_pipe[4];
        uint x_idx_pipe[4];
        uint y_idx_pipe[4];
        for (uint outer = 0; outer < 8; ++outer)
        {
            uint x_idx, y_idx;
            // compute x_idx and y_idx using integer adds, subs, and shifts
            float x_fetched = mem[x_idx][0][0];
            float y_fetched = mem[y_idx][0][0];
            mem[x_idx_pipe[0]][0][1] = x_pipe[0];
            mem[y_idx_pipe[0]][0][1] = y_pipe[0];
            // manually coded shift register statements (i.e. no for loop) + computations on x and y
            x_pipe[3] = x_fetched;
            y_pipe[3] = y_fetched;
            x_idx_pipe[3] = x_idx;
            y_idx_pipe[3] = y_idx;
        }
    }
    Last edited by dark_visage; November 14th, 2017 at 11:20 AM.

  6. #6 (dark_visage)

    HRZ,

    You are correct that indirect addressing is causing it. If I index with constants, the II drops to 1. I'm not sure I understand why indirect addressing is such a problem, though.

  7. #7 (HRZ)

    By using indirect addressing, you prevent the compiler from determining whether the accesses to the mem buffer in the loop have any load/store dependency; hence, the compiler assumes the worst case and raises the II to guarantee correct behavior.
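    To illustrate what the compiler cannot see, here is a small C sketch with hypothetical helpers (not part of any SDK). In the #5 code, the two stores write `mem[x_idx_pipe[0]][0][1]` and `mem[y_idx_pipe[0]][0][1]`, with indices computed at run time. If the two indices are ever equal in the same iteration, program order says the y store must be the value that remains, so issuing both stores on two write ports in the same cycle would need some guarantee about what happens on a collision; this is presumably the store dependency reported in #3. With constant indices the compiler can compare the addresses at compile time; with run-time indices only a check over the actual values, like `stores_may_collide`, can answer the question.

```c
#include <stddef.h>

/* Hypothetical run-time check: does any iteration issue both stores to the
   same word? x_store_idx[i] and y_store_idx[i] stand in for x_idx_pipe[0]
   and y_idx_pipe[0] in iteration i of the 'outer' loop. */
static int stores_may_collide(const unsigned *x_store_idx,
                              const unsigned *y_store_idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x_store_idx[i] == y_store_idx[i])
            return 1;
    return 0;
}

/* Program-order semantics for one iteration: the y store is issued second,
   so when both indices match, y's value is the one left in memory. */
static void store_pair(float *mem, unsigned x_i, float x_v,
                       unsigned y_i, float y_v)
{
    mem[x_i] = x_v;
    mem[y_i] = y_v;
}
```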

  8. #8 (dark_visage)

    I see what you're saying, but I'm reading from one bank and writing to another. The tool correctly infers that the loads can be done in parallel, even with indirect addressing. I don't see how a store dependency would affect anything, since the memory is double-pumped: just schedule the second store on the second clk2x edge. Perhaps Altera needs to provide more control over the M20K configuration, such as the ability to specify what happens during a write collision, because this unexpected behavior halves the throughput of the kernel.

  9. #9 (HRZ)

    Double pumping only reduces Block RAM usage; it does not help eliminate load/store dependencies. You have two loops here, both of which are pipelined. Keep in mind that your pipeline might be long enough that not only all iterations of the inner loop, but also some iterations of the outer loop, are in flight at the same time. This can lead to iteration "i" of the outer loop and iteration "x" of the inner loop writing to the same location in the mem buffer that iteration "i+1" of the outer loop and iteration "y" of the inner loop is trying to read from. Unless the compiler can prove this will not happen, it assumes it does and adjusts the II accordingly.
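    A rough C model of the overlap described above, assuming every (outer, inner) pair is treated as one iteration in issue order. `min_raw_distance` finds the smallest number of iterations between a store and a later load of the same word; `has_hazard` then applies a simplified timing rule, assuming iterations start every `ii` cycles and that within an iteration the load is issued `load_to_store_cycles` before the store. Both helper names and the timing rule are illustrative assumptions, not how the compiler actually models its pipeline.

```c
#include <stddef.h>

/* Smallest d >= 1 such that the store in iteration i writes the word that
   the load in iteration i + d reads; 0 if no stored word is read later. */
static size_t min_raw_distance(const unsigned *store_idx,
                               const unsigned *load_idx, size_t n)
{
    size_t best = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (store_idx[i] == load_idx[j]) {
                size_t d = j - i;
                if (best == 0 || d < best)
                    best = d;
                break; /* larger j only gives larger d for this i */
            }
        }
    }
    return best;
}

/* Simplified rule: iteration i+d's load issues d*ii cycles after iteration
   i's load, i.e. d*ii - load_to_store_cycles cycles after iteration i's
   store. If that is not positive, the load may read stale data. */
static int has_hazard(const unsigned *store_idx, const unsigned *load_idx,
                      size_t n, size_t ii, size_t load_to_store_cycles)
{
    size_t d = min_raw_distance(store_idx, load_idx, n);
    return d != 0 && d * ii <= load_to_store_cycles;
}
```

    Because the real indices in this thread are only known at run time, the compiler cannot evaluate something like `min_raw_distance` during compilation, which is why it has to assume the short-distance case.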

  10. #10 (dark_visage)

    That's a good point I had not considered, although in my case the "outer_outer" loop is executed serially (as the tool correctly reports) because of the structure of my code. Therefore, an II of 1 should be achievable on the "outer" loop if the tool used both write ports of the BRAM simultaneously, which is a common design pattern in HDL but apparently not supported by the OpenCL compiler at this time.

    As an aside, from an RTL perspective, double pumping really does more than reduce BRAM usage: it increases the memory bandwidth, or throughput, of each BRAM. With two single-pumped BRAMs I should be able to do 1 write and 2 reads per kernel clock; with one double-pumped BRAM I should be able to do 2 writes and 2 reads per kernel clock.

    EDIT: I think I may have been wrong. According to the Best Practices guide:
    "By default, each local memory bank has one read port and one write port. The double pumping feature allows each local memory bank to support up to three read ports."
    I wonder whether something like the M20K read latency being half the write latency is what results in this.
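    The two views in this post can be compared with some simple port-slot arithmetic. This sketch assumes an M20K block has two physical ports and that double pumping clocks it at twice the kernel clock, giving four access slots per kernel cycle; `struct port_mix` and the helper names are hypothetical. It only does the counting: it checks the 2-write/2-read mix from this post against the raw slot count and against the one-write/three-read allocation quoted from the guide, and it does not claim which configuration the compiler actually built for the `numwriteports(2)` declaration in #1.

```c
/* Accesses requested per kernel clock on one memory bank. */
struct port_mix {
    unsigned reads;
    unsigned writes;
};

/* Assumption: each physical port can be used once per memory clock, and
   the memory clock runs at pump times the kernel clock. */
static unsigned slots_per_kernel_clock(unsigned physical_ports, unsigned pump)
{
    return physical_ports * pump;
}

/* The RTL view from this post: any read/write mix fits if the total does
   not exceed the available slots. */
static int fits_raw_slots(struct port_mix m, unsigned physical_ports,
                          unsigned pump)
{
    return m.reads + m.writes <= slots_per_kernel_clock(physical_ports, pump);
}

/* The allocation quoted from the Best Practices guide for a double-pumped
   bank: one write port and up to three read ports. */
static int fits_documented_doublepump(struct port_mix m)
{
    return m.writes <= 1 && m.reads <= 3;
}
```

    Under these assumptions both views use the same four slots per kernel clock; they differ only in how many of them may be writes.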
    Last edited by dark_visage; Yesterday at 10:35 AM.
