## Neural architecture search for in-memory computing-based deep learning accelerators

#### Olga Krestinskaya

King Abdullah University of Science and Technology, 23955, Saudi Arabia

#### Based on:

Olga Krestinskaya, Mohammed E. Fouda, Hadjer Benmeziane, Kaoutar El Maghraoui, Abu Sebastian, Wei D. Lu, Mario Lanza, Hai Li, Fadi Kurdahi, Suhaib A. Fahmy, Ahmed Eltawil, and Khaled N. Salama.

Nature Reviews Electrical Engineering (2024): 1-17.



#### **Agenda**



#### Introduction, motivation and software-hardware co-design

Motivation to improve hardware efficiency and challenges in selecting optimum design



#### Hardware-aware Neural Architecture Search (HW-NAS)

HW-NAS methods, algorithms, hardware cost estimation methods, HW-NAS frameworks for IMC



#### Future directions, open challenges and final thoughts

Roadmap of HW-NAS for IMC, open issues, HW-NAS and other optimization techniques, summary

## Rapid AI development and Increasing Neural Network Complexity

#### **"AI everywhere"**

Figure: application areas include smart cities, self-driving vehicles, vision and language processing, and security applications.

#### **Increasing complexity of neural networks**

#### **Traditional von Neumann Architecture:**



#### In-memory computing accelerator:



## In-memory Computing Roadmap and Software-Hardware Co-design





## Software-Hardware Co-design for IMC Accelerators

Figure: parameters involved in software-hardware co-design:

- Neural network blocks, connections, layer sizes, etc.
- Number of bits per cell, etc.
- Crossbar size, ADCs per crossbar, etc.
- Number of tiles, etc.
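To illustrate how these parameters interact, the sketch below estimates how many crossbars a single convolutional layer occupies. It assumes a common simplified mapping that is not taken from the slides: the kernel is unrolled into a weight matrix with `kernel * kernel * c_in` rows and `c_out` columns, each weight is sliced across `ceil(weight_bits / bits_per_cell)` cells, and the matrix is tiled over crossbars of the given size. The function name and all parameter values are hypothetical.

```python
import math


def crossbars_for_conv(kernel, c_in, c_out, weight_bits,
                       bits_per_cell, xbar_rows, xbar_cols):
    """Estimate the crossbar count for one conv layer (simplified mapping)."""
    # Unrolled weight matrix: one row per input element of the kernel window.
    rows = kernel * kernel * c_in
    # Each weight is split over several cells when a cell stores fewer bits.
    cells_per_weight = math.ceil(weight_bits / bits_per_cell)
    cols = c_out * cells_per_weight
    # Tile the matrix over fixed-size crossbars.
    return math.ceil(rows / xbar_rows) * math.ceil(cols / xbar_cols)


# Same layer (3x3 kernel, 64 -> 128 channels, 8-bit weights, 2 bits per cell)
# mapped onto two hypothetical crossbar sizes.
n_small = crossbars_for_conv(3, 64, 128, 8, 2, 128, 128)
n_large = crossbars_for_conv(3, 64, 128, 8, 2, 256, 256)
print(n_small, n_large)
```

Even in this toy model, a layer-size choice (model level), the bits per cell (device level), and the crossbar size (circuit level) jointly determine the hardware footprint, which is why they are hard to tune independently.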



## **How to Select the Optimum Design?**









Too many parameters to consider

Example: 8.5 × 10<sup>85</sup> possible combinations



Manual optimization of parameters is infeasible: it relies on guessing (sometimes on certain rules) and requires a lot of human effort



Grid search is slow: search time increases exponentially with the number of hyperparameters to optimize
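The growth of a grid search can be made concrete with a few lines of Python. The sketch below counts the points in a small joint model-hardware grid and then shows how the count scales when every parameter has the same number of options. The parameter names and value lists are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

# Hypothetical joint search space (model and IMC hardware parameters).
space = {
    "num_blocks": [1, 2, 3, 4],
    "channels": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "weight_bits": [2, 4, 8],
    "crossbar_size": [64, 128, 256, 512],
    "adc_bits": [4, 6, 8],
}


def grid_size(search_space):
    """Number of grid points: product of the option counts per parameter."""
    return math.prod(len(values) for values in search_space.values())


def uniform_grid_size(options_per_parameter, num_parameters):
    """Grid size when every parameter has the same number of options."""
    return options_per_parameter ** num_parameters


total = grid_size(space)
# Explicit enumeration visits exactly one candidate per grid point.
enumerated = sum(1 for _ in product(*space.values()))
print(total, enumerated)

# With 10 options per parameter, each extra parameter multiplies the cost by 10.
for n in (5, 10, 20):
    print(n, uniform_grid_size(10, n))
```

Because every grid point requires evaluating a candidate design, the evaluation budget, not the enumeration itself, is what makes exhaustive search impractical.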



**Increasing complexity and network size** 

One approach to achieving such optimization is through Hardware-aware Neural Architecture Search (HW-NAS).

#### **Neural Architecture Search from Software Perspective**







Hardware efficiency is not considered, or only high-level metrics (e.g. FLOPs) are taken into account

#### Hardware-aware Neural Architecture Search





#### Hardware-Aware Neural Architecture Search



## **HW-NAS** for In-Memory Computing: Search Space

#### **Expanding HW-NAS Search Space for IMC Architectures**



#### **HW-NAS Methods**



#### Search algorithms



Notes: 11: time complexity increases with the search space; 12: can be used with a supernetwork search space; 13: search time does not increase exponentially with the search space size
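As one illustration of a search algorithm, the sketch below runs a small evolutionary algorithm (EA) over a joint model-hardware space under an energy budget. It is a toy example under stated assumptions, not the method of any specific framework: the search space, `toy_accuracy`, `toy_energy`, and the budget are hypothetical placeholders standing in for real training and hardware cost estimation.

```python
import random

# Hypothetical joint search space: model and IMC hardware parameters.
SPACE = {
    "num_blocks": [1, 2, 3, 4],
    "channels": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "crossbar_size": [64, 128, 256],
}
ENERGY_BUDGET = 200.0  # arbitrary units, illustrative only


def toy_accuracy(cand):
    """Placeholder accuracy proxy: larger models score higher, with saturation."""
    capacity = cand["num_blocks"] * cand["channels"]
    return capacity / (capacity + 64.0)


def toy_energy(cand):
    """Placeholder hardware cost; not a real energy model."""
    work = cand["num_blocks"] * cand["channels"] * cand["kernel_size"] ** 2
    return 16.0 * work / cand["crossbar_size"]


def fitness(cand):
    """Accuracy proxy minus a penalty for exceeding the energy budget."""
    overshoot = max(0.0, toy_energy(cand) - ENERGY_BUDGET)
    return toy_accuracy(cand) - 0.01 * overshoot


def mutate(cand, rng):
    """Copy a candidate and resample one randomly chosen parameter."""
    child = dict(cand)
    key = rng.choice(list(SPACE))
    child[key] = rng.choice(SPACE[key])
    return child


def evolve(generations=20, population_size=16, seed=0):
    rng = random.Random(seed)
    population = [{k: rng.choice(v) for k, v in SPACE.items()}
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # elitist selection
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, fitness(best))
```

Because the best half of the population is carried over unchanged, the best fitness never decreases between generations. In a real HW-NAS flow the two placeholder functions would be replaced by trained-model accuracy and a hardware cost estimate.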

#### Hardware cost estimation methods



Notes: 1: across neural network models; 2: depends on how similar the new model is; 3: across different hardware platforms; 4: requires regeneration; 5: depends on the hardware similarity; 6: with increasing search space (number of hyperparameters in a search)

#### State-of-the-art HW-NAS Frameworks for IMC

| Framework                           | Quantization (search) | Pruning | HW-NAS | Architecture search space                             | Hardware search space                   | Hardware cost     | Algorithm | Hardware non-idealities      |
|-------------------------------------|-----------------------|---------|--------|-------------------------------------------------------|-----------------------------------------|-------------------|-----------|------------------------------|
| AnalogNAS (2023)                    | ✗                     | ✗       | ✓      | # of blocks, channels, branches, kernel size          | ✗                                       | AIHWKit           | EA        | Variations, cond. drift      |
| NAS4RRAM (2021)                     | ✗                     | ✗       | ✓      | Layers, channels (residual blocks)                    | ✗                                       | RRAM simulator    | EA        | Variations                   |
| FLASH (2021)                        | ✗                     | ✗       | ✓      | # of skip connections, cells, layers, channels        | ✗                                       | NeuroSim, BookSim | SHGO      | ✗                            |
| NAX (2021)                          | ✗                     | ✗       | ✓      | Kernel size                                           | Crossbar size                           | GENIEx            | DS        | Wire/source/sink resistances |
| Gibbon (2022)                       | ✓                     | ✗       | ✓      | # of blocks, channels, groups, kernel size, bit-width | Crossbar size, ADC/DAC/device precision | MNSIM             | EA        | Variations                   |
| NACIM (2020)                        | ✓                     | ✗       | ✓      | Architecture hyperparameters, bit-width (int./frac.)  | Tile/buffer size, bandwidth             | NeuroSim          | RL        | Variations                   |
| UAE (2021)                          | ✓                     | ✗       | ✓      | # of channels, filter size, bit-width (int./frac.)    | ✗                                       | Analytical        | RL        | Variations, program. errors  |
| CMQ (2022)                          | ✓                     | ✗       | ✗      | Quantization threshold, bit-width                     | ✗                                       | MINT              | DS        | Variations                   |
| Mixed-precision quantization (2021) | ✓                     | ✗       | ✗      | Weight/inputs bit-width (int./frac.)                  | ADC precision                           | PUMAsim           | RL        | ✗                            |
| EGQ (2021)                          | ✓                     | ✗       | ✗      | Weight/activation bit-width                           | ✗                                       | NeuroSim          | GA        | ✗                            |
| RaQU (2021)                         | ✓                     | ✗       | ✗      | Weight/kernel bit-width                               | ✗                                       | Analytical        | RL        | ✗                            |
| ASBP (2021)                         | ✗                     | ✓       | ✗      | Bits of weights                                       | ✗                                       | Analytical        | RL        | ✗                            |
| Auto-prune (2021)                   | ✗                     | ✓       | ✗      | Weights (pruned unimportant columns)                  | ✗                                       | MNSIM             | RL        | ✗                            |

✓: included; ✗: not included

#### The latest frameworks (not included in the paper but worth mentioning):

| Framework                                      | Description                                                                                                                    |
|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| XPert (2023)                                   | Two-step co-optimization of software and hardware parameters, including channel depth, ADC precision, input precision, etc.    |
| CoMN (2024)                                    | Design space exploration for a large IMC hardware search space, including circuits and architecture parameters                 |
| Joint Hardware-Workload Co-optimization (2024) | Design space exploration for a large IMC hardware search space to optimize IMC hardware for different workloads simultaneously |





Limited hardware search space (only a few hardware parameters)





- Moitra, A., Bhattacharjee, et al. (2023, July). XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing. In 2023 60th ACM/IEEE Design Automation Conference (DAC) (pp. 1-6). IEEE.
- Han, L., Pan, et al. (2024). CoMN: Algorithm-Hardware Co-Design Platform for Non-Volatile Memory Based Convolutional Neural Network Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
- Krestinskaya, O., Fouda, M. E., Eltawil, A., & Salama, K. N. (2024). Towards Efficient IMC Accelerator Design Through Joint Hardware-Workload Co-optimization. arXiv preprint arXiv:2410.16759.

## Taxonomy of HW-NAS frameworks for IMC applications



## **HW-NAS** for IMC Applications Roadmap



## **Open Problems and Way Forward**



- Solving complexity and runtime issues (search space size vs search time)
- Including hardware non-idealities mitigation techniques in the NAS framework
- Expanding hardware and model search spaces, adding more applications
- Creating HW-NAS benchmarks
- Adapting existing HW-NAS methods for different IMC hardware
- Considering system-level challenges (dataflow optimization, scheduling, etc.)
- Unified framework incorporating neural network model and hardware search

## **Open Problems and Way Forward**

Neural network model and IMC hardware optimization covered by HW-NAS:



Can HW-NAS be combined with other optimization techniques into an end-to-end IMC hardware optimization tool?

#### One more step forward

#### **Optimum AI hardware design**





#### **Self-adapting Design Algorithms**



#### **AI-driven Design Tools**

- Fully-automated NAS methods capable of constructing new deep learning operations and algorithms suitable for IMC with minimal human design efforts.
- Example from software: AutoML-Zero, which automatically searches for complete machine learning algorithms (model, optimization procedure, etc.) with minimal restrictions on the form or math operations.
- Reducing human intervention in the design.
- No pre-defined blocks.
- Adaptable to different tasks and constraints.
- IMC awareness remains an open challenge.



Utilization of AI capabilities to improve and automate both algorithm and hardware design.



