.. _arch_model_timing_tutorial:

Primitive Block Timing Modeling Tutorial
----------------------------------------

To accurately model an FPGA, the architect needs to specify the timing characteristics of the FPGA's primitive blocks.
This involves two key steps:

#. Specifying the logical timing characteristics of a primitive including:

   * whether primitive pins are sequential or combinational, and
   * what the timing dependencies are between the pins.

#. Specifying the physical delay values

These two steps separate the logical timing characteristics of a primitive from the physically dependent delays.
This enables a single logical netlist primitive type (e.g. Flip-Flop) to be mapped into different physical locations with different timing characteristics.

The :ref:`FPGA architecture description` describes the logical timing characteristics in the :ref:`models section`, while the physical timing information is specified on ``pb_types`` within :ref:`complex block`.

The following sections illustrate some common block timing modeling approaches.

Combinational block
~~~~~~~~~~~~~~~~~~~

A typical combinational block is a full adder,

.. figure:: fa.*
    :width: 50%

    Full Adder

where ``a``, ``b`` and ``cin`` are combinational inputs, and ``sum`` and ``cout`` are combinational outputs.

We can model these timing dependencies on the model with the ``combinational_sink_ports`` attribute, which specifies the output ports that depend on an input port:

.. code-block:: xml

The physical timing delays are specified on any ``pb_type`` instances of the adder model. For example:

.. code-block:: xml

specifies that all edges have a delay of 300ps, except the ``cin`` to ``cout`` edge, which has a delay of 10ps.

.. _dff_timing_modeling:

Sequential block (no internal paths)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A typical sequential block is a D-Flip-Flop (DFF).
DFFs have no internal timing paths between their input and output ports.

.. note:: If you are using BLIF's ``.latch`` directive to represent DFFs there is no need to explicitly provide a model definition, as it is supported by default.

.. figure:: dff.*
    :width: 50%

    DFF

Sequential model ports are specified by providing the ``clock`` attribute, whose value is the name of the associated clock port.
The associated clock port must have ``is_clock="1"`` specified to indicate it is a clock.

.. code-block:: xml

The physical timing delays are specified on any ``pb_type`` instances of the model.
In the example below the setup time of the input is specified as 66ps, while the clock-to-q delay of the output is set to 124ps.

.. code-block:: xml

.. _mixed_sp_ram_timing_modeling:

Mixed Sequential/Combinational Block
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to define a block with some sequential ports and some combinational ports.

In the example below, the ``single_port_ram_mixed`` has sequential input ports: ``we``, ``addr`` and ``data`` (which are controlled by ``clk``).

.. figure:: mixed_sp_ram.*
    :width: 75%

    Mixed sequential/combinational single port ram

However, the output port (``out``) is a combinational output, connected internally to the ``we``, ``addr`` and ``data`` input registers.

.. code-block:: xml

In the ``pb_type`` we define the external setup time of the input registers (50ps) as we did for :ref:`dff_timing_modeling`.
However, we also specify the following additional timing information:

* The internal clock-to-q delay of the input registers (200ps)
* The combinational delay from the input registers to the ``out`` port (800ps)

.. code-block:: xml

.. _seq_sp_ram_timing_modeling:

Sequential block (with internal paths)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some primitives represent more complex architecture primitives, which have timing paths contained completely within the block.

The model below specifies a sequential single-port RAM.
The ports ``we``, ``addr``, and ``data`` are sequential inputs, while the port ``out`` is a sequential output.
``clk`` is the common clock.

.. figure:: seq_sp_ram.*
    :width: 75%

    Sequential single port ram

.. code-block:: xml

Similarly to :ref:`mixed_sp_ram_timing_modeling` the ``pb_type`` defines the input register timing:

* external input register setup time (50ps)
* internal input register clock-to-q time (200ps)

Since the output port ``out`` is sequential we also define the:

* internal *output* register setup time (60ps)
* external *output* register clock-to-q time (300ps)

The combinational delay between the input and output registers is set to 740ps.

Note that the internal path from the input to output registers can limit the maximum operating frequency.
In this case the internal path delay is 1ns (200ps + 740ps + 60ps), limiting the maximum frequency to 1 GHz.

.. code-block:: xml

.. _seq_sp_ram_comb_inputs_timing_modeling:

Sequential block (with internal paths and combinational input)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A primitive may have a mix of sequential and combinational inputs.

The model below specifies a mostly sequential single-port RAM.
The ports ``addr`` and ``data`` are sequential inputs, while the port ``we`` is a combinational input.
The port ``out`` is a sequential output.
``clk`` is the common clock.

.. figure:: seq_comb_sp_ram.*
    :width: 75%

    Sequential single port ram with a combinational input

.. code-block:: xml
    :emphasize-lines: 3

We use register delays similar to :ref:`seq_sp_ram_timing_modeling`.
However, we also specify the purely combinational delay between the combinational ``we`` input and the sequential output ``out`` (800ps).
Note that the setup time of the output register (60ps) still affects the ``we`` to ``out`` path, for an effective delay of 860ps.

.. code-block:: xml
    :emphasize-lines: 17

.. _multiclock_dp_ram_timing_modeling:

Multi-clock Sequential block (with internal paths)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is also possible for a sequential primitive to have multiple clocks.

The following model represents a multi-clock simple dual-port sequential RAM with:

* one write port (``addr1``, ``data1`` and ``we1``) controlled by ``clk1``, and
* one read port (``addr2`` and ``data2``) controlled by ``clk2``.

.. figure:: multiclock_dp_ram.*
    :width: 75%

    Multi-clock sequential simple dual port ram

.. code-block:: xml

On the ``pb_type`` the input and output register timing is defined similarly to :ref:`seq_sp_ram_timing_modeling`, except multiple clocks are used.

.. code-block:: xml

.. _clock_generator_timing_modeling:

Clock Generators
~~~~~~~~~~~~~~~~

Some blocks (such as PLLs) generate clocks on-chip.
To ensure that these generated clocks are identified as clock sources, the associated model output port should be marked with ``is_clock="1"``.

As an example consider the following simple PLL model:

.. code-block:: xml

The port named ``in_clock`` is specified as a clock sink, since it is an input port with ``is_clock="1"`` set.

The port named ``out_clock`` is specified as a clock generator, since it is an *output* port with ``is_clock="1"`` set.

.. note:: Clock generators should not be the combinational sinks of primitive input ports.

Consider the following example netlist:

.. code-block:: none

    .subckt simple_pll \
        in_clock=clk \
        out_clock=clk_pll

Since we have specified that ``simple_pll.out_clock`` is a clock generator (see above), the user must specify what the clock relationship is between the input and output clocks.
This information must be specified in the SDC file; if no SDC file is specified, :ref:`VPR's default timing constraints` will be used instead.

.. note:: VPR has no way of determining what the relationship is between the clocks of a black-box primitive.

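
As a minimal sketch, a model matching the description of ``simple_pll`` given earlier in this section could look like the following.
Only the model name, the port names and the ``is_clock`` attributes are taken from the text; the overall layout is an assumption and should be checked against the architecture reference before use.

.. code-block:: xml

    <!-- Illustrative sketch only, not taken from a real architecture file -->
    <model name="simple_pll">
      <input_ports>
        <!-- Input port with is_clock="1": treated as a clock sink -->
        <port name="in_clock" is_clock="1"/>
      </input_ports>
      <output_ports>
        <!-- Output port with is_clock="1": treated as a clock generator -->
        <port name="out_clock" is_clock="1"/>
      </output_ports>
    </model>

Note that ``in_clock`` deliberately has no ``combinational_sink_ports`` attribute, so the generated clock is not a combinational sink of an input port.
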

Consider the case where the ``simple_pll`` above creates an output clock which is 2 times the frequency of the input clock.
If the input clock period was 10ns then the SDC file would look like:

.. code-block:: tcl

    create_clock clk -period 10
    create_clock clk_pll -period 5                      #Twice the frequency of clk

It is also possible to specify in SDC that there is a phase shift between the two clocks:

.. code-block:: tcl

    create_clock clk -waveform {0 5} -period 10         #Equivalent to 'create_clock clk -period 10'
    create_clock clk_pll -waveform {0.2 2.7} -period 5  #Twice the frequency of clk with a 0.2ns phase shift

.. _clock_buffers_timing_modeling:

Clock Buffers & Muxes
~~~~~~~~~~~~~~~~~~~~~

Some architectures contain special primitives for buffering or controlling clocks.
VTR supports modelling these using the ``is_clock`` attribute on the model to differentiate between 'data' and 'clock' signals, allowing users to control how clocks are traced through these primitives.

When VPR traces through the netlist it will propagate clocks from clock inputs to the downstream combinationally connected pins.

Clock Buffers/Gates
^^^^^^^^^^^^^^^^^^^

Consider the following black-box clock buffer with an enable:

.. code-block:: none

    .subckt clkbufce \
        in=clk3 \
        enable=clk3_enable \
        out=clk3_buf

We wish to have VPR understand that the ``in`` port of the ``clkbufce`` connects to the ``out`` port, and that as a result the nets ``clk3`` and ``clk3_buf`` are equivalent.

This is accomplished by tagging the ``in`` port as a clock (``is_clock="1"``), and combinationally connecting it to the ``out`` port (``combinational_sink_ports="out"``):

.. code-block:: xml

With the corresponding ``pb_type``:

.. code-block:: xml

Notably, although the ``enable`` port is combinationally connected to the ``out`` port, it will not be considered as a potential clock since it is not marked with ``is_clock="1"``.

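
The clock buffer described above can be sketched as follows.
The port names and the ``is_clock`` and ``combinational_sink_ports`` settings follow the text; the ``pb_type`` name, the ``blif_model`` string and the delay values (10ps and 5ps) are assumptions chosen purely for illustration.

.. code-block:: xml

    <!-- Illustrative sketch only; delay values are assumed, not measured -->
    <model name="clkbufce">
      <input_ports>
        <!-- Clock input: propagated to 'out' because it is marked is_clock="1" -->
        <port name="in" combinational_sink_ports="out" is_clock="1"/>
        <!-- Data input: combinationally connected to 'out' but never treated as a clock -->
        <port name="enable" combinational_sink_ports="out"/>
      </input_ports>
      <output_ports>
        <port name="out"/>
      </output_ports>
    </model>

    <pb_type name="clkbufce" blif_model=".subckt clkbufce" num_pb="1">
      <clock name="in" num_pins="1"/>
      <input name="enable" num_pins="1"/>
      <output name="out" num_pins="1"/>
      <!-- Assumed physical delays through the buffer -->
      <delay_constant max="10e-12" in_port="clkbufce.in" out_port="clkbufce.out"/>
      <delay_constant max="5e-12" in_port="clkbufce.enable" out_port="clkbufce.out"/>
    </pb_type>

With a model along these lines, a clock defined on ``clk3`` would be traced through to ``clk3_buf``, while ``clk3_enable`` would be analyzed as an ordinary data signal.
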

Clock Muxes
^^^^^^^^^^^

Another common clock control block is a clock mux, which selects from one of several potential clocks.

For instance, consider:

.. code-block:: none

    .subckt clkmux \
        clk1=clka \
        clk2=clkb \
        sel=select \
        clk_out=clk_downstream

which selects one of two input clocks (``clk1`` and ``clk2``) to be passed through to ``clk_out``, controlled by the value of ``sel``.

This could be modelled as:

.. code-block:: xml

where both input clock ports ``clk1`` and ``clk2`` are tagged with ``is_clock="1"`` and combinationally connected to the ``clk_out`` port.
As a result both nets ``clka`` and ``clkb`` in the netlist would be identified as independent clocks feeding ``clk_downstream``.

.. note::

    Clock propagation is driven by netlist connectivity, so if one of the input clock ports (e.g. ``clk1``) was disconnected in the netlist no associated clock would be created/considered.

Clock Mux Timing Constraints
""""""""""""""""""""""""""""

For the clock mux example above, if the user specified the following :ref:`SDC timing constraints`:

.. code-block:: tcl

    create_clock -period 3 clka
    create_clock -period 2 clkb

VPR would propagate both ``clka`` and ``clkb`` through the clock mux.
Therefore the logic connected to ``clk_downstream`` would be analyzed for both the ``clka`` and ``clkb`` constraints.

Most likely (unless ``clka`` and ``clkb`` are used elsewhere) the user should additionally specify:

.. code-block:: tcl

    set_clock_groups -exclusive -group clka -group clkb

This avoids analyzing paths between the two clocks (i.e. ``clka`` -> ``clkb`` and ``clkb`` -> ``clka``), which are not physically realizable.
The muxing logic means only one clock can drive ``clk_downstream`` at any point in time (i.e. the mux enforces that ``clka`` and ``clkb`` are mutually exclusive).
This is the behaviour of :ref:`VPR's default timing constraints`.
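
For reference, a minimal sketch of a ``clkmux`` model consistent with the description at the start of this section is given below.
The port names and the ``is_clock`` tagging of ``clk1`` and ``clk2`` follow the text; treating ``sel`` as a data input that is combinationally connected to ``clk_out`` is an assumption.

.. code-block:: xml

    <!-- Illustrative sketch only, not taken from a real architecture file -->
    <model name="clkmux">
      <input_ports>
        <!-- Both candidate clocks are propagated to clk_out -->
        <port name="clk1" combinational_sink_ports="clk_out" is_clock="1"/>
        <port name="clk2" combinational_sink_ports="clk_out" is_clock="1"/>
        <!-- Assumed: select is a data signal, so it is never treated as a clock -->
        <port name="sel" combinational_sink_ports="clk_out"/>
      </input_ports>
      <output_ports>
        <port name="clk_out"/>
      </output_ports>
    </model>

Because both ``clk1`` and ``clk2`` reach ``clk_out``, logic on ``clk_downstream`` is analyzed against both clocks, which is why the ``set_clock_groups`` constraint above is usually needed.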