# Manoj Kumar Jain, Ravi Khatwal

Abstract: High performance is a major concern in VLSI design, and the architectural behavior of the cache governs both performance and power consumption. Cache memory designs can be simulated in various forms with simulators such as SimpleScalar, Xilinx, and TopSPICE 8. This paper explores the issues and considerations involved in designing efficient cache memory and discusses cache memory simulation behavior on these simulators. We also propose high-performance cache simulation issues to be considered in the design of future mobile processors and customized mobile devices.

Index Terms: Application Specific Instruction Processors, memory design, SimpleScalar simulator, Xilinx, Microwind, TopSPICE 8 simulator.

#### I. INTRODUCTION

A memory unit is a collection of storage cells together with the associated circuits needed to transfer information into and out of the device. In a random access memory, the time it takes to transfer information to or from any desired location is always the same. Communication between a memory and its environment is achieved through data input and output lines, address selection lines, and control lines that specify the direction of transfer. Memory levels are ordered by the sequence in which information can be accessed. Various semiconductor memories have been designed for correct read/write operation. The goal of every memory system is to provide adequate storage capacity with an acceptable level of performance.

Cache is an essential component of high-performance computers; it aims to reduce the latency of the various cache levels. The real benefit of cache memory lies in storing the most frequently used instructions. For low-power SRAMs, access time is comparable to that of a standard DRAM. Memory access methods are used to reduce wasted memory and to reduce cache misses in the memory mapping approach. The need for cache memory comes from the requirement to hide the latency of accessing slower off-chip memory.

The SimpleScalar simulator is ideal for fast cache simulation when the effect of cache performance on execution time is not needed. The Microwind programs allow an integrated circuit to be designed and simulated at the physical description level. TopSPICE 8 is a native, full-featured, mixed-mode, mixed-signal circuit simulator capable of simulating circuits containing any arbitrary combination of analog devices, digital functions, and high-level behavioral blocks. The Xilinx tool supports timing analysis of memory: the access time of a memory is the time required to select a word and read it, and the cycle time of a memory is the time required to complete a write operation. The CPU must provide the memory control signals in such a way as to synchronize its internal clocked operations with the read and write operations of the memory.

Manuscript received March 2013.

**Dr Manoj Kumar Jain**, Associate Professor, Department of Computer Science, MLSU, Udaipur, India

Ravi Khatwal, Research Scholar, Department of Computer Science, MLSU, Udaipur, India

#### II. RELATED WORK

M. K. Jain, M. Balakrishnan, and Anshul Kumar [1] proposed a scheduler-based technique for exploring register file size, number of register windows, and cache configuration in an integrated manner. P. R. Panda, N. D. Dutt, and A. Nicolau [2] proposed scratch-pad memory architecture exploration for application-specific processor designs, together with optimization techniques for customized embedded systems. A custom memory organization can significantly reduce system cost and improve performance. M. Margala [3] presented low-power circuit techniques and methods for static random access memories. K. Itoh [4] categorized memories as embedded memories and stand-alone memories. Z. Ge, H. B. Lim, and W. F. Wong [5] customized the memory hierarchy for application-specific processors. The memory hierarchy is a bottleneck in modern embedded computer systems because the gap between processor speed and memory speed continues to grow. X. Wang [6] designed a combined method for tuning a two-level memory hierarchy that takes energy consumption in embedded systems into account. This kind of memory hierarchy allows the instruction and data cache branches to be evaluated separately. S. Simon Wong and A. E. Gamal [7] used 3-D integration for the design of 3-D SRAM. S. Mamagkakis [8] proposed a new approach to designing convenient dynamic memory management subsystems that take advantage of multiple memory levels. F. Hamzaoglu, Y. Wang, P. Kolar, U. Bhattacharya, and K. Zhang [9] designed six-transistor SRAM cells.

FELI [10] is a set of operating system mechanisms that allocate application data to on-chip memories without any user intervention. FELI automatically maps data to on-chip memories using the address translation mechanism. It relies on a set of TLB counters and on dynamic migration of pages from off-chip memory to on-chip memory. V. K. Singhal and B. Singh [11] presented a comparative study of power reduction techniques for static random access memory. S. Petit, J. Sahuquillo, P. Lopez, J. Duato, and A. Valero [12] proposed a hybrid n-bit macrocell that implements one SRAM cell and n-1 eDRAM cells. This cell is intended for use in an n-way set-associative first-level data cache. Architectural mechanisms (e.g., special write-back policies) were devised to completely avoid refresh logic, and performance, energy, and area were analyzed. J. Liu and H. Sun [13] designed 3-D DRAM, which can use high-performance logic dies to implement DRAM peripheral circuits and consequently improve speed and reduce silicon area. S. Singh, N. Arora, M. Suthar, and N. Gupta [14] simulated different SRAM cells and compared them on different parameters. K. Dhanumjaya, M. Sudha, M. MN. Giri Prasad, and K. Padmaraju [15] proposed a dynamic column-based power supply 8T SRAM cell and compared it with the conventional 6T SRAM cell. A. JhansiRani and VG. SanthiSwaroop [16] proposed a technique for low power consumption in VLSI circuit design.

#### III. SRAM DESIGN

A high-performance SRAM cell superficially resembles a flip-flop (Figure 1). A signal applied to the address line by the address decoder selects the cell for either a read or a write operation. Two data lines are used to transfer the stored data and its complement between the cell and the data drivers. The 6-T SRAM cell (Figure 2) does not require a refresh unit. Increasing the number of transistors in the cell in turn increases the performance of the SRAM cell.
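
The select-then-transfer behavior described above can be sketched at a purely behavioral level. The following Python model is a minimal illustration, not a transistor-level description; the class name `SramCell` and the signal names `word_line`, `write_enable`, and `bit` are assumptions introduced here.

```python
class SramCell:
    """Behavioral 1-bit SRAM cell: a latch that is reached through a word line."""

    def __init__(self):
        self.q = 0  # stored bit; the complement is derived when the cell is read

    def access(self, word_line, write_enable=0, bit=0):
        """Return (bit, bit_bar) when the cell is selected, else None."""
        if not word_line:
            return None            # cell not selected: it simply holds its value
        if write_enable:
            self.q = bit & 1       # data drivers force the new value into the cell
        return self.q, 1 - self.q  # stored data and its complement on the data lines


cell = SramCell()
cell.access(word_line=1, write_enable=1, bit=1)  # write a 1
print(cell.access(word_line=1))                  # read back data and complement
```

Because the value is held in a latch rather than on a capacitor, the model needs no refresh step, which mirrors the remark about the 6-T cell.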



Figure 1. 1-Bit SRAM Design



Figure 2. Design architecture of 6-T SRAM cell



Figure 3. 12-Bit cell design analysis

An SRAM cell uses more transistors than a DRAM cell but does not contain a soft-error removal mechanism. A higher number of transistors is used to support multi-ported read/write operation for better performance. Design tradeoffs include speed, volatility, cost, and features, and all of these factors should be considered when choosing an efficient cache RAM for an embedded system design. Each cache level contains a tag part and a data part, so the tag parts are compared and the data is transferred from one cache level to another. Figure 3 shows the architecture level of various caches.

#### IV. MEMORY MAPPING IMPLEMENTATION ANALYSIS WITH COMPARATOR TOOLS

The memory mapping implementation is analyzed with the help of comparators. To compare two 4-bit numbers A and B, we start with the most significant pair of digits. If the two digits are equal, we compare the next lower significant pair of digits, and this comparison continues until a pair of unequal digits is reached. If the corresponding digit of A is 1 and that of B is 0, we conclude that A > B. If the corresponding digit of A is 0 and that of B is 1, we conclude that A < B. Writing $X_i = A_iB_i + A_i'B_i'$, which is 1 only when the digits in position $i$ are equal, the sequential comparison can be expressed logically by the Boolean functions (1) and (2).

$$(A > B) = A_3B_3' + X_3A_2B_2' + X_3X_2A_1B_1' + X_3X_2X_1A_0B_0' \tag{1}$$

$$(A < B) = A_3'B_3 + X_3A_2'B_2 + X_3X_2A_1'B_1 + X_3X_2X_1A_0'B_0 \tag{2}$$
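
Equations (1) and (2) can be checked exhaustively in software. The following minimal Python sketch evaluates them bit by bit; the function name `compare_4bit` and the equality term `X[i]` (taken here as $A_iB_i + A_i'B_i'$, the usual bit-equality function) are assumptions made for illustration rather than part of the synthesized design.

```python
def compare_4bit(a, b):
    """Return (a_gt_b, a_lt_b, a_eq_b) for 4-bit inputs using only AND/OR/NOT."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("inputs must be 4-bit values (0..15)")

    A = [(a >> i) & 1 for i in range(4)]   # A[3] is the most significant bit
    B = [(b >> i) & 1 for i in range(4)]
    nA = [1 - bit for bit in A]            # complemented inputs A_i'
    nB = [1 - bit for bit in B]            # complemented inputs B_i'

    # X[i] = 1 when the bits in position i are equal
    X = [(A[i] & B[i]) | (nA[i] & nB[i]) for i in range(4)]

    gt = ((A[3] & nB[3])
          | (X[3] & A[2] & nB[2])
          | (X[3] & X[2] & A[1] & nB[1])
          | (X[3] & X[2] & X[1] & A[0] & nB[0]))          # equation (1)
    lt = ((nA[3] & B[3])
          | (X[3] & nA[2] & B[2])
          | (X[3] & X[2] & nA[1] & B[1])
          | (X[3] & X[2] & X[1] & nA[0] & B[0]))          # equation (2)
    eq = X[3] & X[2] & X[1] & X[0]
    return gt, lt, eq


print(compare_4bit(0b1010, 0b1001))
```

Running the function over all 256 input pairs and comparing the outputs with ordinary integer comparison is a quick way to confirm that the two expressions are mutually exclusive and agree with `a > b` and `a < b`.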



Figure 4. 4-Bit comparator Circuit





Figure 5. Synthesized design of comparator

Memory mapping is implemented with comparators. A comparator is used to compare entries between cache levels and to map data from the level 1 cache to the level 2 cache, and this process continues for the remaining cache levels. With the help of the Xilinx tool we analyzed the mapping between the levels; the 4-bit comparator design is shown in Figures 4 and 5.
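
As a hedged illustration of how a tag comparison decides between a hit and a fetch from the next level, the following minimal Python sketch models a direct-mapped cache in front of a backing level. The class name `DirectMappedCache`, the 4-line size, and the dictionary standing in for the next cache level are illustrative assumptions, not the design analyzed with the Xilinx tool.

```python
class DirectMappedCache:
    """Direct-mapped cache whose hit/miss decision is a single tag comparison."""

    def __init__(self, num_lines, next_level):
        self.num_lines = num_lines
        self.next_level = next_level      # mapping: address -> data (next level)
        self.tags = [None] * num_lines    # tag part of each line
        self.data = [None] * num_lines    # data part of each line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines  # which line the address maps to
        tag = address // self.num_lines   # remaining high-order address bits
        if self.tags[index] == tag:       # comparator: stored tag vs. address tag
            self.hits += 1
        else:
            self.misses += 1              # fetch from the next level and fill
            self.tags[index] = tag
            self.data[index] = self.next_level[address]
        return self.data[index]


level2 = {addr: addr * 10 for addr in range(16)}
level1 = DirectMappedCache(num_lines=4, next_level=level2)
for addr in (5, 5, 9, 5):                 # 5 and 9 map to the same line
    level1.read(addr)
print(level1.hits, level1.misses)
```

Addresses 5 and 9 share a line in this configuration, so alternating between them forces repeated fills; this is the kind of mapping conflict that a cache organization study is meant to expose.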

#### V. MEMORY OPERATION ANALYSIS

The write signal specifies a transfer-in operation and the read signal specifies a transfer-out operation. The steps that must be taken to transfer a new word into memory for storage are as follows:

- 1. Apply the binary address of the desired word to the address lines.
- 2. Apply the data bits that must be stored in memory to the data input lines.
- 3. Activate the write input.

The memory unit will then take the bits from the input data lines and store them in the word specified by the address lines. The steps that must be taken to transfer a stored word out of memory are as follows:

- 1. Apply the binary address of the desired word to the address lines.
- 2. Activate the read input.

The memory unit will then take the bits from the word that has been selected by the address and apply them to the output data lines. The content of the selected word does not change after reading. Commercial memory components available in integrated-circuit chips sometimes provide the control inputs for reading and writing in a somewhat different configuration. Instead of separate read and write inputs, most integrated circuits provide two other control inputs: one input selects the unit and the other determines the operation. The memory enable (sometimes called the chip select) is used to enable a particular memory chip in a multichip implementation of a large memory. When the memory enable input is active, the read/write input determines the operation to be performed. When the memory enable is inactive, the memory chip is not selected and no operation is performed. The memory operations were analyzed with VHDL, and the waveform results obtained with the help of the Microwind simulator are shown in Figure 6.
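
The write and read procedures and the role of the memory enable input can be sketched in software. The following Python model is a minimal behavioral illustration; the class name `MemoryUnit`, the method name `cycle`, and the encoding of the read/write input (1 for read, 0 for write) are assumptions chosen for this example and are not taken from the VHDL description.

```python
class MemoryUnit:
    """Word-addressed memory controlled by memory-enable and read/write inputs."""

    def __init__(self, num_words, word_bits):
        self.word_mask = (1 << word_bits) - 1
        self.words = [0] * num_words

    def cycle(self, enable, read_write, address, data_in=0):
        """Perform one operation.

        enable = 0          -> chip not selected, nothing happens, returns None
        read_write = 0      -> write data_in to the addressed word, returns None
        read_write = 1      -> return the addressed word (contents unchanged)
        """
        if not enable:
            return None
        if read_write == 0:
            self.words[address] = data_in & self.word_mask
            return None
        return self.words[address]


mem = MemoryUnit(num_words=16, word_bits=8)
mem.cycle(enable=1, read_write=0, address=3, data_in=0xAB)  # write steps 1 to 3
print(hex(mem.cycle(enable=1, read_write=1, address=3)))    # read steps 1 and 2
mem.cycle(enable=0, read_write=0, address=3, data_in=0xFF)  # ignored: not enabled
```

Reading the same address twice returns the same word, which reflects the statement that the content of the selected word does not change after reading.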



Figure 6. Memory operation analysis

#### VI. TEST RESULT ANALYSIS WITH VARIOUS SIMULATORS

#### A. Sim-Cache

Sim-cache can emulate a system with multiple levels of instruction and data caches, each of which can be configured for different sizes and organizations. The SimpleScalar simulator [17] produces application-specific results; the results for the cache hierarchy are shown in Table I and Figure 7.



Figure 7. Cache simulation result

Table I. Cache simulation result

| Benchmark program | Total instructions | Sim memory references | Sim elapsed time (s) | Sim inst rate (inst/sec) |
|-------------------|--------------------|-----------------------|----------------------|--------------------------|
| Calloc.c          | 5086757            | 1248274               | 3                    | 1695585.6                |
| Malloc.c          | 670697             | 162635                | 2                    | 335348.5                 |
| Matrix.c          | 7035               | 3936                  | 1                    | 7035.0                   |
| Compress.c        | 8339               | 4292                  | 1                    | 8339.0                   |
| 1112.c            | 8076               | 4263                  | 1                    | 8076.0                   |
| 1114.c            | 14095              | 4995                  | 1                    | 14095.0                  |
| DIJKSTRA.c        | 24676              | 10135                 | 18                   | 1370                     |
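
The last column of Table I appears to be the total instruction count divided by the elapsed simulation time. The short Python sketch below recomputes that ratio from the table values as a consistency check; the function name `inst_rate` and the interpretation of the elapsed time as whole seconds are assumptions made here.

```python
# (benchmark, total instructions, elapsed time, reported inst/sec) from Table I
TABLE_I = [
    ("Calloc.c", 5086757, 3, 1695585.6),
    ("Malloc.c", 670697, 2, 335348.5),
    ("Matrix.c", 7035, 1, 7035.0),
    ("Compress.c", 8339, 1, 8339.0),
    ("1112.c", 8076, 1, 8076.0),
    ("1114.c", 14095, 1, 14095.0),
    ("DIJKSTRA.c", 24676, 18, 1370.0),
]


def inst_rate(total_instructions, elapsed_seconds):
    """Simulated instructions per second of simulator run time."""
    return total_instructions / elapsed_seconds


for name, total, elapsed, reported in TABLE_I:
    computed = inst_rate(total, elapsed)
    print(f"{name:12s} computed={computed:12.1f} reported={reported:12.1f}")
```

Because the elapsed time is reported with one-second resolution, the rate for short benchmarks is only a coarse figure and mainly reflects the resolution of the timer.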



#### B. Microwind Simulator

The Microwind programs allow an integrated circuit to be designed and simulated at the physical description level. The package contains a library of common logic and analog ICs to view and simulate. Microwind includes all the commands of a mask editor as well as original tools never before gathered in a single module. The electrical extraction of the circuit is performed automatically, and the analog simulator produces voltage and current curves immediately. With the help of the Microwind simulator we analyzed the simulation behavior of the CMOS 6-T SRAM (Figures 8 and 9).



Figure 8. Circuit CMOS 6-T SRAM analysis.



Figure 9. 6-T SRAM Schematic behavior SRAM analysis

DSCH is used to validate the architecture of the logic circuit before the microelectronic design is started. DSCH provides a user-friendly environment for hierarchical logic design and fast simulation with delay analysis, which allows complex logic structures to be designed and validated. The design architecture of the 6-T SRAM cell and its schematic behavior, obtained in the DSCH design phase of Microwind, are shown in Figures 10 and 11.



Figure 10. Design architecture of 6-T SRAM cell



Figure 11. Schematic behavior 6-T SRAM analysis

#### C. TopSPICE 8 Simulator

TopSPICE 8 is a native, full-featured, mixed-mode, mixed-signal circuit simulator capable of simulating circuits containing any arbitrary combination of analog devices, digital functions, and high-level behavioral blocks. With TopSPICE we can verify and optimize our design from the system level down to the transistor level. TopSPICE offers a fully integrated environment to capture, simulate, and analyze circuit designs. Its flexible architecture allows the designer to integrate all design tools, including third-party tools and model libraries, into a complete CAD system. With the help of this tool we performed transistor-level simulation analysis (Figures 12 and 13) as well as high-level simulation.



Figure 12. 4-T SRAM circuit structure analysis





Figure 13. Schematic behavior SRAM analysis

#### D. XILINX Tool

#### 1) Timing waveform analysis

The Xilinx tool [18] supports timing waveform analysis of memory. The access time of a memory is the time required to select a word and read it. The cycle time of a memory is the time required to complete a write operation. The CPU must provide the memory control signals in such a way as to synchronize its internal clocked operations with the read and write operations of the memory.

#### 2) 16-word by 8-bit static random access memory



Figure 14. Design architecture of 16x8S SRAM cell

This element is a 16-word by 8-bit static random access memory with synchronous write capability (Figures 14 and 15). When the write enable (WE) is Low, transitions on the write clock (WCLK) are ignored and data stored in the RAM is not affected. When WE is High, any positive transition on WCLK loads the data on data inputs (D7:D0) into the word selected by the 4-bit address (A3:A0).

For predictable performance, address and data inputs must be stable before a Low-to-High WCLK transition. This RAM block assumes an active-High WCLK. However, WCLK can be active-High or active-Low. Any inverter placed on the WCLK input net is absorbed into the block. The signal output on the data output pins (O7:O0) is the data that is stored in the RAM at the location defined by the values on the address pins, as shown in logic table 1 (Table II).



Figure 15. Schematic behavior 16x8S SRAM analysis

Table II. Logic Table 1

| WE (mode) | WCLK | D7:D0 | O7:O0 |
|-----------|------|-------|-------|
| 0 (read)  | X    | X     | Data  |
| 1 (read)  | 0    | X     | Data  |
| 1 (read)  | 1    | X     | Data  |
| 1 (write) | ↑    | D7:D0 | D7:D0 |
| 1 (read)  | ↓    | X     | Data  |

WE, WCLK, and D7:D0 are inputs; O7:O0 is the output. Data = word addressed by bits A3:A0.
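
The synchronous-write behavior of the 16-word by 8-bit element can be sketched as a small behavioral model. The Python class below is an illustration only: the names `SyncWriteRam` and `step` are hypothetical, the model assumes an active-High WCLK, and it does not represent setup or hold timing.

```python
class SyncWriteRam:
    """Behavioral model of a RAM with synchronous write and combinational read."""

    def __init__(self, depth=16, width=8):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.words = [0] * depth
        self.prev_wclk = 0               # last WCLK level, used for edge detection

    def step(self, we, wclk, address, data):
        """Apply one set of input levels and return the word on the output pins."""
        if not 0 <= address < self.depth:
            raise ValueError("address out of range")
        rising_edge = self.prev_wclk == 0 and wclk == 1
        if we and rising_edge:           # write only on Low-to-High WCLK with WE High
            self.words[address] = data & self.mask
        self.prev_wclk = wclk
        return self.words[address]       # output always shows the addressed word


ram = SyncWriteRam()
ram.step(we=0, wclk=1, address=5, data=0xAA)   # WE Low: clock edge is ignored
ram.step(we=1, wclk=0, address=5, data=0xAA)   # falling edge: no write
print(hex(ram.step(we=1, wclk=1, address=5, data=0xAA)))  # rising edge: write
```

Detecting the edge from the previous clock level, rather than from the level alone, is what distinguishes the write row of the logic table from the rows where WCLK is held at 0 or 1.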

#### 3) 64-word by 1-bit static random access memory

This design element is a 64-word by 1-bit static random access memory (RAM) with synchronous write capability (Figures 16 and 17). When the write enable (WE) is Low, transitions on the write clock (WCLK) are ignored and data stored in the RAM is not affected. When WE is High, any positive transition on WCLK loads the data on the data input (D) into the word selected by the 6-bit address (A5:A0). This RAM block assumes an active-High WCLK. However, WCLK can be active-High or active-Low. Any inverter placed on the WCLK input net is absorbed into the block. The signal output on the data output pin (O) is the data that is stored in the RAM at the location defined by the values on the address pins, as shown in logic table 2 (Table III). We can initialize this element during configuration using the INIT attribute.

Table III. Logic Table 2

| WE (mode) | WCLK | D | O    |
|-----------|------|---|------|
| 0 (read)  | X    | X | Data |
| 1 (read)  | 0    | X | Data |
| 1 (read)  | 1    | X | Data |
| 1 (write) | ↑    | D | D    |
| 1 (read)  | ↓    | X | Data |

WE, WCLK, and D are inputs; O is the output. Data = word addressed by bits A5:A0.
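
Initialization through the INIT attribute can be pictured as loading one bit per address before any write occurs. The following Python sketch is a hedged illustration: the function names `words_from_init` and `read_bit` are hypothetical, and the mapping of bit position i of the INIT value to address i is an assumption that should be checked against the tool documentation.

```python
def words_from_init(init_value, depth=64):
    """Expand an INIT integer into one stored bit per address.

    Assumption: bit i of init_value initializes the word at address i.
    """
    if init_value < 0 or init_value >= (1 << depth):
        raise ValueError("INIT value does not fit in the RAM depth")
    return [(init_value >> address) & 1 for address in range(depth)]


def read_bit(contents, address):
    """Return the bit selected by the 6-bit address A5:A0."""
    return contents[address & 0x3F]


contents = words_from_init(0x8000000000000005)
print(read_bit(contents, 0), read_bit(contents, 1), read_bit(contents, 63))
```

With 64 one-bit words, the initial contents fit in a single 64-bit value, which is why a 6-bit address is sufficient to select any stored bit.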





Figure 16. Design architecture of 64x1s SRAM cell



Figure 17. Schematic behavior 64x1s SRAM analysis

#### VII. CONCLUSION

In this paper we have presented the simulation behavior and design of various cache memory configurations. Different simulators, such as SimpleScalar, Microwind, TopSPICE 8, and the Xilinx toolset, were used to study high performance and low power consumption. The SimpleScalar simulator produces application-specific results and reports on the cache hierarchy structure. The Microwind simulator simulates the cache structure and shows the schematic behavior of the cache RAM. The Xilinx tool provides timing waveform analysis and the schematic behavior of the cache, and it supports mapping and analysis of various cache memories. Following this analysis of simulation behavior, we are working on a bypass caching mechanism for embedded systems.

#### VIII. REFERENCES

- [1] M. K. Jain, M. Balakrishnan and A. Kumar, "Integrated on-chip storage evaluation in ASIP synthesis", VLSI Design, 18th International Conference, 2005, pp. 274-279.
- [2] P. R. Panda, N. D. Dutt and A. Nicolau, "Data Memory Organization and Optimization in Application Specific Systems", IEEE Design and Test of Computers, May-June 2001, pp. 56-68.
- [3] M. Margala, "Low power SRAM circuit design", IEEE Design and Test of Computers, 1999, pp. 115-122.
- [4] K. Itoh, "Embedded Memories: Progress and a Look into the Future", IEEE Circuits and Systems Society, IEEE Computer Society, Feb. 2011, Japan, pp. 10-13.
- [5] Z. Ge, "Memory Hierarchy Hardware-Software Co-design in Embedded Systems", IT lab, Singapore journal, 2004.
- [6] X. Wang, "A Combined Optimization Method for Tuning Two-level Hierarchy Considering Energy Consumption", EURASIP Journal on Embedded Systems, 21 Sept. 2010.
- [7] S. S. Wong and A. E. Gamal, "The prospect of 3-D IC", IEEE Design and Test of Computers, June 2009, pp. 445-447.
- [8] S. Mamagkakis, "Custom Design of Multi-Level Dynamic Memory Management Subsystem for Embedded Systems", IEEE Society, April 2004, pp. 170-175.
- [9] F. Hamzaoglu, Y. Wang, P. Kolar, U. Bhattacharya and K. Zhang, "Bit Cell Optimizations and Circuit Techniques for Nanoscale SRAM Design", IEEE Design and Test of Computers, vol. 28, 2011, pp. 22-31.
- [10] E. Jeannot, R. Namyst, and J. Roman (Eds.), "FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores", Euro-Par 2011, pp. 280–292.
- [11] V. K. Singhal and B. Singh, "Comparative Study of Power Reduction Techniques for Static Random Access Memory", International Journal of VLSI and Signal Processing Applications, Vol. 1, Issue 2, May 2011, pp. 80-88.
- [12] S. Petit, J. Sahuquillo, P. Lopez, J. Duato, and A. Valero, "Design, Performance and Energy Consumption of e-DRAM SRAM Macrocells for Data Caches", IEEE Trans. Computer Society, 29 July 2011.
- [13] J. Liu and H. Sun, "3-D DRAM Design and Application to 3-D Multicore System", IEEE Design and Test, 2009, pp. 36-48.
- [14] S. Singh, N. Arora, M. Suthar and N. Gupta, "Low power efficient SRAM cell structure at different technology", International Journal of VLSI and Signal Processing Applications, Vol. 2, Feb. 2012, pp. 41-46.
- [15] K. Dhanumjaya, M. Sudha, M. MN. Giri Prasad, and K. Padmaraju, "Cell stability analysis of conventional 6 T Dynamic 8T SRAM cell in 45 nm technology", International Journal of VLSI Design and Communication Systems (VLSICS), Vol. 3, No. 2, April 2012.
- [16] A. JhansiRani and VG. Santhi Swaroop, "Designing and analysis of 8bit SRAM cell with Low subthreshold Leakage Power", International Journal of Modern Engineering Research (IJMER), Vol. 2, Issue 3, May-June 2012, pp. 733-741.
- [17] T. M. Austin, (1994-2003). SimpleScalar tool site [Online]. Available: http://www.simplescalar.com/
- [18] Xilinx. Available: www.xilinx.com/homepage/



**Dr. M. K. Jain** is an Associate Professor in Computer Science at M. L. Sukhadia University, Udaipur. His current research interests include application-specific instruction set processor design, wireless sensor networks, the semantic web, and embedded systems.



Ravi Khatwal is a research scholar in the Department of Computer Science, MLSU, Udaipur, Rajasthan. His research area is VLSI design.

