# Low Jitter Gb/s CMOS Clock and Data Recovery Circuits for Large Synchronous Networks

A dissertation submitted to

the Faculty of Electrical and Computer Engineering

of University of Kaiserslautern

in partial fulfillment of the requirements

for the degree of

**Doctor of Philosophy** 

in Electrical Engineering

Sitt Tontisirin

Date of Submission: 3<sup>rd</sup> May 2007

Date of Defense: 25<sup>th</sup> April 2008

Dean of Faculty: Prof. Dr.-Ing. S. Liu

Promotion committee

Committee chair: Prof. Dr.-Ing. A. König

1. Committee examiner: Prof. Dr.-Ing. D. Schmitt-Landsiedel

2. Committee examiner: Prof. Dr.-Ing. N. Wehn

3. Committee examiner: Prof. Dr.-Ing. U. Brüning

## Thesis in Electrical Engineering

## Acknowledgement

This thesis could not have been accomplished without the help and support of many people. First, I would like to deeply thank my advisor, Professor Reinhard Tielert, for giving me the opportunity to conduct this research at the Institute of Microelectronics. He was always available and gave me valuable discussions. Even though he was not able to be a committee examiner at my examination, I believe he wished me luck and took care of me from above. I deeply appreciate Professor Doris Schmitt-Landsiedel from TU Munich (Technische Universität München) for kindly being my committee examiner, which made it possible for me to complete the examination. I would like to thank Professor Norbert Wehn for taking over the coordination of my examination process and for kindly being my committee examiner. I would like to express my thanks to Professor Ulrich Brüning for his valuable time as my committee examiner and for his kindness during our collaboration. I am grateful to the committee chair, Professor Andreas König, for his time and interest. I would also like to thank Dr. Jürgen Rötter for his kind administrative support.

I profited from research projects with many collaborators. I would like to express my appreciation to Heinz Endriss, Henrik Icking, and Andreas Hebenstreit of Infineon Technologies for their support. I am grateful to Professor Volker Lindenstruth of the Kirchhoff Institute for Physics, University of Heidelberg, for the challenging research project. I would also like to thank Walter Müller of GSI Darmstadt for the research project that motivated this thesis.

I am greatly indebted to Ursula Pöpperl, Axel Schmitz, Marco Lambert, Marc Wegener, and Jutta Praetorius for sharing good times at the Institute. I would like to thank David Muthers not only for the pleasant collaboration but also for sharing the experience of a long business trip. I would also like to thank my colleagues Emna Ayari, Muhammad Anis, and Thomas Ilnseher for sharing their knowledge and thoughts and for the enjoyable cooperation. I would like to express my thanks to Supriyanto and Faraz for their support with the layout design in the project work. I am grateful to Markus Müller, Andreas Christmann, and Roland Volk for their kind support with the CAD and experimental facilities. In addition, I would like to thank Barbara Mundell for her kind administrative assistance.

I would also like to thank my teachers, advisors, friends, and colleagues during my studies and career in Thailand and Germany, whom I cannot all name here. I appreciate the moments we shared and their contributions to my character and my thinking.

Finally, I would like to thank my parents, sisters, and brother for their care and encouragement. I would like to express my thanks to my wife, Supak. Without her love and support, I would not have overcome the difficulties up to today.

## **Table of Contents**

| Section | Title |
|---------|-------|
|         | Abstract |
|         | Kurzfassung |
| 1       | Introduction |
| 1.1     | Motivation |
| 1.2     | Scope and organization |
| 2       | Design considerations of CDR for time and clock distribution |
| 2.1     | Introduction |
| 2.2     | CDR specifications |
| 2.2.1   | Jitter transfer function |
| 2.2.2   | Jitter peaking |
| 2.2.3   | Jitter tolerance |
| 2.2.4   | Jitter generation |
| 2.3     | Jitter in serial communication system |
| 2.3.1   | Transmitter jitter |
| 2.3.2   | Channel jitter |
| 2.3.3   | Receiver jitter |
| 2.4     | PLL-based CDR and clock synthesizer |
| 2.4.1   | PLL linear model |
| 2.4.2   | Loop characteristic design and jitter in PLL |
| 2.5     | The architectures of CDRs for large synchronous networks |
| 2.5.1   | The CDR with a clock extraction and a phase tracking loop |
| 2.5.2   | The CDR with a clock-jitter-filter |
| 2.6     | Summary |
| 3       | Structure of the front-ended loop: PLL-based CDR |
| 3.1     | State of the art |
| 3.1.1   | PLL-based CDR with an external reference clock |
| 3.1.1.1 | CDR with frequency initialization |
| 3.1.1.2 | CDR with phase synthesis and phase interpolation |
| 3.1.2   | PLL-based CDR without external reference clock |
| 3.1.3   | Phase detector for serial data |
| 3.1.3.1 | Linear phase detector |
| 3.1.3.2 | Binary phase detector |
| 3.1.3.3 | Comparison of linear PD and binary PD |
| 3.1.4   | Frequency detector for serial data |
| 3.2     | Clock rate reduction architecture of CDR |
| 3.2.1   | Comparison of current-mode logic (CML) and CMOS logic |
| 3.2.2   | Comparison of a full rate and an 1/4-rate clock PFDs for CDRs |
| 3.3     | Power efficient CDR with 1/4-rate reduced-sampling-phase time-interleaving PFD (proposed in this work) |
| 3.3.1   | Architecture |
| 3.3.2   | Building blocks |
| 3.3.2.1 | Sense amplifier |
| 3.3.2.2 | 1/4-rate reduced-sampling-phase time-interleaving PFD |
| 3.3.2.3 | Voltage controlled oscillator (VCO) |
| 3.3.2.4 | Charge pump |
| 3.2.3   | Loop bandwidth reduction technique in CDR using the divided frequency impulse modulation technique (proposed in this thesis) |
| 3.2.4   | Loop bandwidth design for jitter tolerance |
| 3.2.5   | Simulation results |
| 3.2.6   | Experiment results |
| 3.4     | Summary |
| 4       | Design of PLL-based clock-jitter-filter |
| 4.1     | State of the art |
| 4.1.1   | Quartz crystal based phase-locked loop (QPLL) |
| 4.2     | PLL-based clock-jitter-filter with LC-VCO |
| 4.2.1   | Architecture |
| 4.2.2   | Building blocks |
| 4.2.2.1 | Phase frequency detector |
| 4.2.2.2 | LC-VCO |
| 4.2.2.3 | Frequency divider |
| 4.2.2.4 | Charge pump |
| 4.2.3   | Loop characteristic design |
| 4.2.4   | Simulation results |
| 4.2.5   | Experiment results |
| 4.3     | Summary |
| 5       | Experimental results of the 1/4-rate CDR and the jitter-filter |
| 5.2     | Measurement results |
| 5.3     | Summary |
| 6       | Conclusions |
| 6.1     | Future work |
| 7       | Appendices |
|         | Appendix-A: 8B/10B coding system |
|         | Appendix-B: Deterministic jitter from bandwidth limitation |
|         | Appendix-C: The effective gain of a digital phase detector |
|         | Appendix-E: The sampling phases for frequency detection in CDR |
|         | Appendix-F: The output frequency of the clock-jitter-filter |
|         | Appendix-G: The effect of loop resistor to peak-to-peak jitter of PLL |
|         | Appendix-H: The publication list in Figure 3.7 |
| 8       | List of figures |
| 9       | List of abbreviations |
| 10      | References |
|         | Curriculum vitae |

## **Abstract**

The high data throughput demanded for communication between units in a system can be met by short-haul optical communication and high speed serial data communication. In these schemes, the receiver has to extract the corresponding clock from the serial data stream with a clock and data recovery circuit (CDR). Data transceiver nodes have their own local reference clocks for their data transmission and data processing units. These reference clocks normally differ slightly even if they are specified to have the same frequency. Therefore, the data communication transceivers always work in a plesiochronous condition, that is, operation with slightly different reference frequencies. The difference in data rates is absorbed by an elastic buffer. In a data readout system for a particle physics experiment, such as a particle detector, the data from the analog-to-digital converters (ADCs) in all detector nodes are transmitted over the networks. The plesiochronous condition in these networks is undesirable because it complicates time stamping, which is used to indicate the relative time between events. A separate clock distribution network is normally required to overcome this problem. If the existing data communication networks can support the clock distribution function, the system complexity can be greatly reduced. The CDRs on all detector nodes then have to operate without a local reference clock and provide recovered clocks of sufficiently good quality to serve as the reference timing for their local data processing units.

In this thesis, a low jitter clock and data recovery circuit for large synchronous networks is presented. It has a two-loop topology consisting of a clock and data recovery loop and a clock-jitter-filter loop. In the CDR loop, a CDR with a rotational frequency detector is applied to increase the frequency capture range, so that operation without a local reference clock is possible. Its loop bandwidth can be freely adjusted to meet the specified jitter tolerance. The 1/4-rate time-interleaving architecture is used to reduce the operating frequency and optimize the power consumption. The clock-jitter-filter loop is applied to improve the jitter of the recovered clock. It uses a low jitter LC voltage controlled oscillator (VCO). The loop bandwidth of the clock-jitter-filter is minimized to suppress the jitter of the recovered clock. The 1/4-rate CDR with frequency detector and the clock-jitter-filter with LC-VCO were implemented in 0.18µm CMOS technology. Together, the two circuits occupy an area of 1.61mm² and consume 170mW from a 1.8V supply. The CDR covers data rates from 1 to 2Gb/s. Its loop bandwidth is configurable from 700kHz to 4MHz. Its jitter tolerance complies with the SONET standard. The clock-jitter-filter has configurable input/output frequencies from 9.191 to 78.125MHz. Its loop bandwidth is adjustable from 100kHz to 3MHz. A high frequency clock is also available for a serial data transmitter. The CDR with clock-jitter-filter can generate a clock with a jitter of 4.2ps rms from incoming serial data with an intersymbol-interference jitter of 150ps peak-to-peak.

## Kurzfassung (Summary):

## Gigabit-per-Second CMOS Clock and Data Recovery Circuits with Low Jitter for Large Synchronous Networks

## **Introduction**

The high demands on data throughput in communication systems can be met with serial data transmission. High-rate transmission in the Gb/s range is possible with differential signaling. The voltage swing of the signals can also be reduced, because differential signaling generates less interference and is more tolerant of coupled DC disturbances. The data throughput per link can thus be increased. Likewise, the limitations imposed by the large number of connections in parallel transmission can be overcome. Even with a point-to-point connection, the total throughput can be increased. The technique can also be used for data transmission between modules at board level or over other short links with coaxial cables. Short-haul optical transmission can bridge longer distances, because optical fibers exhibit less attenuation and are less sensitive to coupled interference. Transmission links between buildings can be bridged in this way. The data are transmitted without a clock, because a timing skew between clock and data cannot be prevented. In the receiver, the clock is recovered from the data stream by a clock and data recovery circuit (CDR). The recovered clock must be in phase with the data in order to achieve a low bit error rate through optimal sampling of the data.

In a serial data transceiver, a high frequency clock is needed to serialize the parallel data. The high frequency clock is derived from a local reference clock. The clock recovery circuit in the receiver generates a recovered clock synchronous to the received data stream. This clock is used for demultiplexing the data and for word synchronization. Mismatches in the data rates are unavoidable, because the reference clocks of the transceivers never match exactly. In the PCI Express standard, for example, a difference of +/- 300ppm is permitted. In the receiver, this difference leads to an overflow or underflow, depending on whether the transmitter reference clock is too high or too low. An elastic buffer is used to avoid this problem. It detects an overflow or underflow and removes or inserts special symbols to match the transmit clock to the receiver clock. Consequently, all nodes in a communication network operate at slightly different clock frequencies. This is not a problem as long as no precise requirements are placed on the synchronization between nodes.

In physics experiments, such as a data readout system in a particle detector, the charges recorded as the result of collisions are processed by preamplifiers and A/D converters. The digitized data are transmitted through a communication network. They are processed and grouped into packets before being transmitted for further processing. While still in the detector, they receive a time stamp so that the relative time differences between events can be evaluated. The distribution of a timing signal is therefore important for generating the time stamp. A conventional serial and optical transmission network with its own reference clock in each node cannot be used here; instead, a separate network for distributing the timing signal is needed. If, however, the existing data transmission network can be used to synchronize the timing signal, the complexity of the overall system can be greatly reduced. The data transmission network here is therefore slightly different from a conventional one.

If the clock recovered in the receiver is used as the reference clock for data processing and transmission, no difference arises between the clock frequencies of transmitter and receiver. All nodes in the network then have the same reference clock. The latencies are defined by the link lengths and can be determined through word synchronization. The main task of a CDR in a conventional data transmission system is the recovery of the data. The most important figure of merit is then the bit error rate. In published circuits, the jitter of the recovered clock is always increased by the differing reference clocks. Most CDRs also need the local reference clock, which is available in the transmitter anyway. In a system that transmits the data and the timing signal together, it is advantageous if the CDR works without a local reference clock. The system becomes more efficient this way. Overall, the CDR should therefore serve for data recovery and also provide a low jitter clock that can be used as a timing signal.

#### State of the art

CDR circuits based on PLLs are widely used because they are easy to integrate. In such a PLL-based CDR circuit, the phase difference between the VCO clock and the incoming data is evaluated at every signal transition. Only the phase error can be evaluated, not an existing frequency difference. A PLL-based CDR with only a phase detector has a limited capture range, which is defined by its loop bandwidth.

Some CDRs have an additional phase-locked loop that brings the frequency of the CDR into its capture range by frequency multiplication of the local reference clock. Another option is a two-loop topology consisting of a clock synthesizer and a phase interpolator. The first loop generates a multiphase clock from the reference clock, which the second loop uses to synchronize to the incoming data by selecting the optimal sampling phase. Slight differences between the frequencies are compensated by a continuous rotation of the sampling phase. Both approaches need a local reference clock and lead to high jitter if the frequencies do not match exactly.

## Power-efficient CDR circuit: quarter-rate CDR with reduced redundant sampling of the frequency detector

A CDR with its own frequency detector offers an extended capture range while eliminating the need for a local reference clock. In addition, the loop bandwidth can be adjusted individually. A well-known method for detecting frequency differences with random data is the quadricorrelation technique (or rotational frequency detection), which uses two groups of phase errors between the serial data and the quadrature clock phases. The frequency difference can be extracted by evaluating the correlation. Compared with a phase detector, a frequency detector needs more sampling clock phases. In CDR circuits with a full-rate clock, the phase-frequency detector (PFD) must operate at the full data rate. At a data rate of 2Gb/s, for example, this means an operating frequency of 2GHz, so that only current-mode logic (CML) can be used. The power consumption is then very high because of the constant current drawn by the CML circuits. The clock buffer also consumes a lot of power.

An architecture with reduced operating frequency works with multiphase clocks at a lower frequency to provide the necessary sampling phases. In a half-rate clock architecture, for example, an 8-phase clock is needed. These 8 phases are divided into 2 groups of 4 phases each for phase and frequency detection. A quarter-rate CDR already needs 16 clock phases. Of these, 8 serve the phase detector and 8 the frequency detector. The 8 phases for the frequency detector can be reduced to 4, because these are still sufficient to detect a frequency difference. A further advantage of a quarter-rate CDR is the possibility of saving power by using CMOS logic instead of CML. In addition, the quarter-rate CDR forms an intrinsic 1-to-4 demultiplexer, so that the deserializer for the data needs less power.

The multiphase clock can be generated with a ring oscillator. Differential circuits are used here because they offer better supply rejection. The unavoidable phase offset of a multiphase clock, which can limit the quality of the CDR, can be minimized by suitable layout techniques. A skew calibration is used to reduce the phase offset further. It costs some power and chip area, but improves the performance of the CDR.

A binary phase detector is used in this work. Its advantage over a linear phase detector is its statistical approach, which makes it insensitive to pulse distortion. In addition, the serial data are regenerated automatically. The sampling unit for phase error detection has the same function as the data regeneration, so there can be no mismatch between their sampling characteristics. A tri-state charge pump with a technique for reducing charge sharing is used in this work. Offset compensation is achieved by tracking the operating point of the charge pump. A bandwidth reduction technique is used when setting the loop bandwidth. A division factor and the error accumulation add a further parameter to the adjustment options.

## PLL-based clock-jitter-filter with an LC-VCO

The incoming serial data stream contains jitter from the transmission channel. The CDR must be able to track the phase error. The recovered clock does not necessarily have low jitter, even though it is sufficient for data detection. It therefore cannot be used as a reference clock as it is. For this reason, a clock-jitter-filter is needed. A quartz-crystal PLL (QPLL) uses a crystal oscillator as the VCO, whose frequency is controlled through the capacitive load. In this way, clock jitter can be filtered at frequencies up to the range of 100MHz. For serial data transmission, the high frequency clock must be generated by another PLL. The PLL-based clock generator also has a filtering function with respect to clock jitter. The two can therefore be combined. For a high frequency clock, LC-VCOs are attractive because of their low jitter, especially since on-chip inductors make an integrated solution possible. A PLL-based clock-jitter-filter can generate the high frequency clock for the transceiver and, at the same time, a clock in the medium frequency range for data processing or A/D converters in the detector node.

In the design of an LC-VCO, the slope of the VCO tuning curve is reduced in order to lower the jitter sensitivity. The frequency tuning range can nevertheless be extended by switching with configuration bits that connect additional capacitive loads. Frequency tuning is performed with an on-chip varactor. The control voltage of the loop filter is buffered to minimize kickback from the VCO. A tri-state charge pump with the same techniques as in the CDR for reducing charge sharing and charge offset is used. The phase-frequency detector is a standard design with 2 latches and a feedback reset. The frequency dividers are used in the feedback path and at the clock output. To save power, the clock dividers run at a quarter of the VCO frequency, which allows them to be implemented in CMOS logic. Finally, the output clock is retimed by the high frequency clock to eliminate the jitter of the clock divider and the logic.

## Measurement results

The quarter-rate CDR with reduced redundant sampling of the frequency detector and the PLL-based clock-jitter-filter with an LC-VCO were implemented on separate test chips in a 0.18µm CMOS technology. The CDR with 1-to-4 DEMUX occupies 0.79mm² and consumes 80mW at a supply voltage of 1.8V. It covers a range from 1 to 2Gb/s. With low jitter serial data, it has an rms jitter of 4.6ps. The loop parameters are configurable; the loop bandwidth can be set from 700kHz to 4MHz. The SONET jitter tolerance requirements are met. The frequency capture range is larger than 100MHz. The PLL-based clock-jitter-filter with LC-VCO occupies an area of 0.82mm² with a power consumption of 90mW at VDD=1.8V. Its input and output clock frequencies range from 9.191MHz to 78.125MHz. The loop parameters are likewise configurable. The loop bandwidth can be set from 100kHz to 3MHz. The output jitter has a minimum value of 2.8ps.

The quarter-rate CDR and the clock-jitter-filter were also tested together. With a serial data stream carrying an emulated peak-to-peak jitter of 150ps, the output jitter of the clock-jitter-filter is 4.2ps rms.

## Summary

In a large synchronous network for measuring timing data, a separate clock distribution network is normally necessary. If the timing clock can be distributed through the data network, the complexity of the network can be reduced significantly. The key component for this technique is a clock and data recovery circuit. The quality of the recovered clock must be high enough to serve as a reference clock. The CDR with frequency detector and clock-jitter-filter can meet these requirements. The CDR has an extended frequency capture range and can work without a local reference clock. The quarter-rate architecture reduces the power, because its PFD can be implemented in CMOS logic. In addition, the extra sampling phases for frequency detection can be reduced. The quarter-rate CDR is automatically also a 1-to-4 DEMUX. The power consumption of the deserializer is reduced this way. The clock-jitter-filter with LC-VCO generates not only a low jitter clock at medium frequencies, as required by A/D converters, but also a high frequency clock for serial data transmission. The quarter-rate CDR with frequency detector and clock-jitter-filter in 0.18µm CMOS offers data recovery and a low jitter reference clock.

## Chapter 1

## Introduction

### 1.1 Motivation



Figure 1.1 Serial data communication system

Optical communication and high speed serial data communication support the high demand for data throughput in data networks from the global to the sub-system level. Optical data communication is applied in long-haul data networks for the Internet or data highways. It also serves short-haul interconnections, such as those from building to building.

High speed serial data communication covers everything from large networks to data communication inside small systems. It is gradually replacing parallel data interfaces, which are physically limited by the extremely large number of ports and interconnections needed for high data throughput.

Figure 1.1 shows the data transmission in a serial data communication system. Parallel data are encoded, serialized, and transmitted as a serial data stream. Data can be transmitted at a higher data rate because of low swing differential signaling. Hence, the physical limitation of the parallel data interface can be overcome, because a high data throughput can be achieved with a smaller number of ports. For data serialization, the required high frequency clock is generated from a low frequency local reference clock by a clock synthesizer. It is used in a high speed data multiplexer to generate the serial data stream. The serial data are transmitted over transmission lines and cables by a high data rate transmitter driver. In order to avoid clock-data skew, the serial data are transmitted without an accompanying clock. The corresponding clock is extracted at the receiver by a clock and data recovery circuit (CDR). The recovered clock is used in data regeneration, demultiplexing, comma detection/alignment, and 8B/10B decoding. Normally, the receiver node has its own local reference clock. Its frequency rarely matches that of the reference clock in the transmitter node exactly; the two are normally specified to agree within a tolerance. In the PCI Express standard, the reference frequencies are allowed to differ within +/- 300 ppm. Therefore, an elastic buffer is applied to handle the difference between the reference frequencies. The front-end blocks in the receiver are synchronous to the incoming serial data. They are depicted as shaded blocks in Figure 1.1. The mismatch of the reference frequencies leads to data overflow or underflow if the reference frequency in the transmitter node is higher or lower, respectively. The elastic buffer observes the data overflow/underflow and removes/inserts a special symbol to make the data flow match the reference frequency in the receiver node. This means that all nodes in the data communication networks operate in a plesiochronous mode, that is, with slightly different reference frequencies. This kind of operation functions well if timing precision in each data node is not critical.
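The arithmetic behind the elastic buffer can be made concrete with a small sketch. It assumes, for illustration only, a 2 Gb/s link, 10-bit symbols as produced by 8B/10B coding, and the worst case in which one reference clock sits at +300 ppm and the other at -300 ppm. The function and constant names are hypothetical and do not come from any standard.

```python
def symbols_per_slip(ppm_offset: float) -> float:
    """Symbols that pass before two clocks drift apart by one whole symbol."""
    return 1.0 / (abs(ppm_offset) * 1e-6)


def slips_per_second(line_rate_bps: float, bits_per_symbol: int, ppm_offset: float) -> float:
    """Symbols per second that an elastic buffer must remove or insert."""
    symbol_rate = line_rate_bps / bits_per_symbol
    return symbol_rate * abs(ppm_offset) * 1e-6


# Assumed worst case: transmitter at +300 ppm, receiver at -300 ppm.
WORST_CASE_PPM = 600.0
LINE_RATE_BPS = 2e9    # assumed 2 Gb/s link
BITS_PER_SYMBOL = 10   # assumed 8B/10B coded symbols

print(f"one slip every {symbols_per_slip(WORST_CASE_PPM):.0f} symbols")
print(f"{slips_per_second(LINE_RATE_BPS, BITS_PER_SYMBOL, WORST_CASE_PPM):.0f} slips per second")
```

Under these assumptions the buffer has to remove or insert roughly one symbol in every 1667, or about 120,000 symbols per second. The same drift applies to any counter driven by the local reference, which is why plesiochronous operation becomes a problem once timing precision matters.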



Figure 1.2 The data readout networks in particle physics experiment

Figure 1.2 depicts the data readout networks of the particle detectors in a particle physics experiment. The particles resulting from collisions are converted to electrical signals in the detectors. The signals are transformed into digital data by charge amplifiers and analog-to-digital converters in the front-end electronic units (FEE). The data from the front-end electronic units are collected by the Concentrator Network (CNet) and further framed and processed in the Build Network (BNet), Processing Network (PNet), and High-level Network (HNet) for high level computing. The concentrator network operates not only as a data collector but also as a data selector in order to minimize the required data throughput of the networks. Moreover, it also operates as a time distributor for the front-end electronic units. The time distribution is important for the time stamps of events on the detectors. The information therefore consists not only of the readout data but also of the relative times of the events [1]. In conventional serial data communication networks, each node has its own local reference clock. Such a network can be used for data collection but not for time distribution. Therefore, separate time distribution networks are required. If the existing data networks can provide the timing function, the system complexity can be greatly reduced. Consequently, the data communication networks in this synchronous data readout system are slightly different from the standard ones.



Figure 1.3 The data communication networks for time and clock distribution

The data communication networks for time and clock distribution are shown in Figure 1.3. Only the time distributor node has a local reference clock. In the other nodes, the time-distributed clients, the local reference clocks are replaced by the clocks recovered by the CDRs. These recovered clocks are used as the reference clocks for data transmission and for data processing units such as analog-to-digital converters in the client nodes. Therefore, there is no plesiochronous condition as in standard data networks, and the elastic buffer is not required. The key to this concept is a low jitter clock and data recovery circuit, because the recovered clock has to be used as a reference frequency and must therefore be of good quality. The CDR for time and clock distribution has to fulfill the requirements of both data recovery and a precise reference clock.

The CDR for time and clock distribution has to operate without a local reference clock. Therefore, the CDR must have a wide frequency capture range. This can be provided by a rotational frequency detection technique. Additional sampling phases are required, but the frequency capture range of the CDR can then be larger than its loop bandwidth. Consequently, the CDR can have a wide frequency capture range while its jitter transfer function is adjusted independently. In order to improve the jitter quality, a clock-jitter-filter is applied to reduce the jitter of the recovered clock. It can also reduce jitter accumulation if the time-distributed client nodes further distribute the reference frequency.



Figure 1.4 The block diagram of CDR for time and clock distribution

The block diagram of the CDR for time and clock distribution proposed in this thesis is shown in Figure 1.4. The clock rate reduction technique is applied in the front-ended CDR. The CDR, acting as an intrinsic 1-to-4 demultiplexer, can reduce the power consumption of the entire deserializer. The sampling phases for frequency detection can be optimized in order to improve its efficiency. In the phase-locked-loop-based (PLL-based) clock-jitter-filter, a low jitter LC voltage-controlled oscillator (LC-VCO) is used and the loop bandwidth is minimized to suppress the jitter of the recovered clock. The front-ended loop with the 1/4-rate CDR provides the data recovery function, while the clock-jitter-filter improves the quality of the recovered clock. Together they can fulfill the requirements of a CDR for time and clock distribution.

## 1.2 Scope and organization

An adequate time measurement protocol is required to obtain the relative times of the events in different data readout nodes in a detector. It has to quantify the interconnection delay and the delay of front-ended electronics. However, this thesis is focused on the design of the crucial component for time synchronization. The organization of a complete relative time measurement system is not covered in detail.

This thesis is organized in the following way. The design considerations of the CDR for time and clock distribution are discussed in chapter 2. The specifications for the CDR are explained. The jitter sources in a serial data communication system are defined and analyzed. The characteristics of the PLL-based CDR and clock synthesizer are discussed. The 2-loop CDR architecture is proposed and the design goal of the CDR for timing reference distribution is specified. In chapter 3, the design of the front-ended loop, a PLL-based CDR, is described. The state of the art of CDRs in standard serial data communication systems is summarized. The various types of CDR are discussed and the rotational frequency detection technique in CDRs is explained. The proposed power efficient CDR with a reduced-sampling-phase time-interleaving 1/4-rate phase frequency detector is presented in this chapter. The architecture and the design considerations of its building blocks, i.e. phase detector, frequency detector, VCO, charge pump, and the proposed loop bandwidth reduction technique, are discussed in detail. The corresponding simulation and experimental results of the 1/4-rate CDR are shown. In chapter 4, the design of the secondary loop, a PLL-based clock-jitter-filter with a low jitter LC-VCO, is discussed. The design issues of its building blocks, i.e. phase/frequency detector, LC-VCO, and frequency divider, are explained. The clock-jitter-filter is optimized for low jitter generation. The simulation and experimental results are presented in this chapter. In chapter 5, the experimental results of the two-loop CDR for time and clock distribution implemented in a 0.18µm CMOS technology are shown. The conclusion of this thesis and the future work are presented in chapter 6.

## Chapter 2

## Design considerations of CDR for time and clock distribution

### 2.1 Introduction



Figure 2.1 The function and basic block diagram of CDR

The specifications of CDR in standard serial data communication will be discussed in this chapter. In order to specify the requirement of CDR for time and clock distribution, the jitter sources in the serial data communication have to be analyzed.

In serial data communication at high bit rates, differential signaling is a common feature. It has a smaller signal voltage swing and better common-mode noise rejection than single-ended signaling. Clock and data skew is problematic because the unit interval (UI) is small at high data rates. Consequently, only the serial data are transmitted and the corresponding clock is recovered at the receiver, as shown in Figure 1.1. The 8B/10B coding scheme is normally applied to the serial data for two main reasons. The first is to remove the dc component from the serial data stream, allowing transmission over ac-coupled connections. The second is to ensure sufficient data transitions for the CDR to recover the clock from the serial data. The PLL-based CDR is widely used because it is suitable for monolithic integration. The function and the basic block diagram of the CDR are depicted in Figure 2.1 a) and b) respectively. The CDR generates the clock required for data regeneration, as in Figure 2.1 a). The clock is generated by a VCO whose phase and frequency are regulated by the averaged phase error obtained from a phase detector (PD) and a loop filter (LF). The specifications of the CDR are briefly described in the following.

## 2.2 CDR specifications

#### 2.2.1 Jitter transfer function

The jitter transfer function is the transfer function from the jitter at the input, in the serial data, to the jitter at the output, in the recovered clock, as a function of jitter frequency. It defines the loop bandwidth, i.e. the closed-loop response, of the CDR. It has a low-pass characteristic, as shown in Figure 2.2 a). The quantitative design can be done with a PLL linear model, which will be discussed in section 2.4.1.

#### 2.2.2 Jitter peaking



Figure 2.2 CDR specifications

In a phase-locked loop, a peak can appear around the corner frequency of the closed loop response depending on the loop damping factor, Figure 2.2 a). The jitter peak causes errors in data regeneration. Therefore, it is normally specified to be less than 0.1 dB.

#### 2.2.3 Jitter Tolerance

Jitter tolerance describes the maximum jitter amplitude of the incoming serial data at which the CDR can still operate without error or with an extremely small error rate, e.g.  $10^{-12}$ , at various jitter frequencies. The jitter tolerance mask in SONET is shown as a grey line in Figure 2.2 b). The black line depicts the jitter tolerance of a qualified CDR. It can usually tolerate a higher jitter amplitude than specified by the jitter tolerance mask. Normally, a CDR has to allow for large jitter amplitudes at low frequencies to support plesiochronous operation, in which the clock frequency of the CDR is slightly different from that of the incoming serial data stream. In this condition, phase wander occurs, which is considered as low frequency jitter. The CDR should be able to track it.

According to its jitter transfer function, low frequency jitter in the serial data appears in the recovered clock of the CDR. This means that the clock phase can follow low frequency jitter inside the loop bandwidth, even if its amplitude is larger than a unit interval. Incoming jitter at high frequencies outside the CDR loop bandwidth is suppressed. The CDR cannot follow the high frequency jitter of the incoming serial data, therefore it tolerates only a smaller jitter amplitude there. The jitter tolerance of the CDR has the same corner frequency as the jitter transfer function, f<sub>-3dB</sub>. Therefore, a trade-off in the CDR design is required in order to meet both specifications.

#### 2.2.4 Jitter generation

Jitter generation is the jitter produced by the CDR itself when the input data contain no jitter. The jitter sources in a PLL-based CDR are VCO phase noise, interference, and supply/substrate noise. The jitter in a CDR can be categorized into two types: random jitter and deterministic jitter. Random jitter comes from device noise such as thermal noise in transistors or resistors. Deterministic jitter is periodic and bounded. It is caused by the circuit operation, e.g. by supply/substrate noise. Any offset voltage or device mismatch can degrade the performance and cause deterministic jitter. Careful circuit and layout design can reduce the jitter generation.

## 2.3 Jitter in serial communication system

The jitter sources in a serial communication system have to be analyzed in order to specify the requirements of the CDR for clock and time distribution. They can be classified into 3 types, i.e. transmitter jitter, channel jitter, and receiver jitter.

### 2.3.1 Transmitter jitter

The high frequency clock for data serialization and the data buffer are the main contributors to the transmitter jitter. The high frequency clock is normally generated by a PLL-based clock synthesizer. Its block diagram is shown in Figure 2.9 a). The dominant jitter sources in the PLL are the phase noise of its reference clock and of its VCO. They affect the PLL output clock in different ways. The reference clock phase noise sees a low-pass transfer function, while the VCO phase noise sees a high-pass transfer function with the same corner frequency. Therefore, the loop characteristic has to be optimized with regard to the jitter quality of the VCO and of the reference clock. The reported clock synthesizers use an on-chip low jitter LC-VCO and a low jitter reference clock. They can provide a high quality clock with jitter less than 1ps rms [2][3]. The jitter contributions of the building blocks in a PLL will be discussed in section 2.4.2. The design of a PLL-based clock synthesizer as a clock-jitter-filter will be discussed in Chapter 4.

High data rate buffers are usually realized in a differential circuit topology, because the common-mode noise rejection is improved compared with single-ended data buffers. Jitter induced by supply and substrate noise is well rejected. Figure 2.3 a) shows the device noise sources in a differential serial data buffer. The jitter contributed by the thermal noise of the devices in the data buffer is analyzed in [4]. It can be calculated by

$$\Delta t_{d-buff,rms} = \sqrt{\frac{kT \cdot C_L}{2}} \cdot \frac{\xi}{I_{ss}}$$
 Eq. 2.1

where  $\Delta t_{d\text{-}buff,rms}$  is the root-mean-square jitter from the thermal noise of the devices in the data buffer, k is the Boltzmann constant, T is the absolute temperature,  $C_L$  is the output capacitive load,  $I_{ss}$  is the data buffer tail current, and  $\xi = \sqrt{1 + (2/3) \cdot a_v}$  is the noise contribution factor, where  $a_v$  is the data buffer gain. Only jitter from thermal noise is considered in this equation. For an output driver with a characteristic impedance of 50 ohm, an output voltage swing of 400mV, a capacitive load of 500fF from the bonding pad capacitor plus 1.5pF from the effective parasitic capacitor of the interconnection, and a voltage gain of one, Eq. 2.1 gives a thermal noise jitter of 0.022ps rms. This value is small compared to the jitter contributed by PLL-based clock synthesizers. Therefore, it is normally neglected in a well-designed data buffer.
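Eq. 2.1 can be evaluated with a few lines of code. The following is a minimal sketch, not the calculation used in this thesis. The tail current and the temperature are not stated in the text, so the values below (the 400mV swing developed across 50 ohm, and 300 K) are assumptions, as are all function and variable names. With these assumptions the result lies in the same femtosecond range as the value quoted above, and the exact figure shifts with the assumed tail current and temperature.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def buffer_thermal_jitter_rms(c_load, i_tail, gain, temperature=300.0):
    """RMS jitter of a differential data buffer from thermal noise (Eq. 2.1).

    c_load in farad, i_tail in ampere, temperature in kelvin.
    """
    xi = math.sqrt(1.0 + (2.0 / 3.0) * gain)  # noise contribution factor
    return math.sqrt(K_BOLTZMANN * temperature * c_load / 2.0) * xi / i_tail


# Load from the text: 500 fF pad capacitance plus 1.5 pF interconnect parasitic.
c_load = 500e-15 + 1.5e-12
# Assumption: the full 400 mV swing is developed across a 50 ohm load.
i_tail = 0.4 / 50.0
jitter = buffer_thermal_jitter_rms(c_load, i_tail, gain=1.0)
print(f"thermal noise jitter: {jitter * 1e12:.4f} ps rms")
```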



a) Device noises in data buffer

b) Phase error induced by offset voltage

Figure 2.3 Device noise in the data buffer

The jitter contributed by flicker noise dominates at low frequencies. The flicker noise of a transistor can be calculated by

$$v_{flicker-noise}^2 = \frac{K_f}{W \cdot L \cdot C_{ox} \cdot f}$$
 Eq. 2.2

where  $v_{flicker-noise}$  is the amplitude of the flicker noise,  $K_f$  is the empirical flicker noise coefficient, W is the width of the transistor, L is the length of the transistor,  $C_{ox}$  is the gate oxide capacitance per unit area of the MOS transistor, and f is the noise frequency. The flicker noise has a high amplitude at low frequencies, therefore the flicker noise in the switch transistors can be regarded as an offset voltage in the data buffer. For instance, the contribution of the flicker noise between 0.001Hz and 1Hz from a transistor with W=10 $\mu$ m, L=0.2 $\mu$ m, and  $C_{ox}$  of 8fF/ $\mu$ m<sup>2</sup> can be calculated by Eq. 2.2 to be 19 $\mu$ V. This is small compared to the offset voltage contributed by the mismatch of the transistors in the differential circuit, which is in the millivolt range.
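Integrating Eq. 2.2 over a frequency band from f_low to f_high gives a mean-square voltage of K_f / (W L C_ox) multiplied by ln(f_high / f_low). The sketch below evaluates this expression for the transistor dimensions given above. The coefficient K_f is process dependent and is not given in the text, so the value used here is only an assumed placeholder, and the result should be read as an order-of-magnitude estimate in the microvolt range.

```python
import math


def flicker_noise_rms(k_f, width, length, c_ox, f_low, f_high):
    """RMS flicker noise voltage: Eq. 2.2 integrated from f_low to f_high.

    k_f in V^2*F, width and length in metre, c_ox in F/m^2, frequencies in Hz.
    """
    if not 0.0 < f_low < f_high:
        raise ValueError("require 0 < f_low < f_high")
    mean_square = k_f / (width * length * c_ox) * math.log(f_high / f_low)
    return math.sqrt(mean_square)


K_F = 1.0e-24   # assumed placeholder, not taken from the text
C_OX = 8.0e-3   # 8 fF/um^2 expressed in F/m^2
v_rms = flicker_noise_rms(K_F, 10e-6, 0.2e-6, C_OX, 1e-3, 1.0)
print(f"integrated flicker noise: {v_rms * 1e6:.1f} uV rms")
```

Because the result scales with the inverse square root of the gate area, doubling W or L reduces the integrated noise voltage by a factor of about 1.4.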

The effect of the offset voltage in the data buffer is depicted in Figure 2.3 b). The offset voltage from the mismatch and thermal noise of the switch transistors influences the rising and falling edges of the serial data in opposite ways. If the offset voltage delays the threshold crossing of the data rising edge, it advances the threshold crossing of the data falling edge, which results in a data duty cycle distortion. The phase errors of the rising and falling edges have the same amplitude but opposite directions. Therefore, the offset voltage of the data buffer is converted to high frequency phase errors. Normally, the serial data are encoded by the 8B/10B coding scheme, in which the longest run of identical successive bits is five, as described in Appendix-A. Therefore, the frequency range of the jitter induced by the offset voltage of the data buffer can be written as

$$\frac{f_{data-rate}}{10} < f_{jitter-offset} < \frac{f_{data-rate}}{2}$$
 Eq. 2.3

where  $f_{data-rate}$  is the data rate frequency,  $f_{jitter-offset}$  is the frequency range of jitter induced by the offset voltage of the data buffer. The jitter amplitude can be calculated from

$$\Delta t_{d-buff,offset} = \frac{2 \cdot C_L \cdot v_{offset}}{I_{ss}}$$
 Eq. 2.4

where  $\Delta t_{d-buff,offset}$  is the peak-to-peak jitter caused by the offset voltage,  $v_{offset}$  is the offset voltage of the data buffer,  $C_L$  is the output capacitive load, and  $I_{ss}$  is the tail current of the differential data buffer. In Eq. 2.4,  $C_L$  divided by  $I_{ss}$  is the inverse of the slew rate of the rising and falling edges at the output nodes. Jitter induced by the offset voltage is reduced if the transition time becomes smaller. This can be achieved by increasing the tail current or by reducing the capacitive load, i.e. by minimizing the up-scaling factor in the data buffer chain. For instance, the rise/fall times of the serial data buffer chain driving the output pads can be improved by using an up-scaling factor of 2 instead of 2.7 or 3. This gives the data buffer chain a longer delay time. However, the total delay of the serial data buffer chain is normally not critical.
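Eq. 2.3 and Eq. 2.4 can be combined into a short numerical check. The sketch below is illustrative only. The data rate, load, offset voltage and tail current are assumed example values, and the function names are not part of the thesis.

```python
def offset_jitter_pp(c_load, v_offset, i_tail):
    """Peak-to-peak jitter caused by the buffer offset voltage (Eq. 2.4)."""
    return 2.0 * c_load * v_offset / i_tail


def offset_jitter_band(data_rate):
    """Lower and upper bound of the offset-induced jitter frequency (Eq. 2.3)."""
    return data_rate / 10.0, data_rate / 2.0


# Assumed example: 2 pF load, 5 mV offset, 8 mA tail current, 2.5 Gb/s data.
jitter_pp = offset_jitter_pp(2e-12, 5e-3, 8e-3)
f_min, f_max = offset_jitter_band(2.5e9)
print(f"offset-induced jitter: {jitter_pp * 1e12:.2f} ps p-p")
print(f"jitter band: {f_min / 1e6:.0f} MHz to {f_max / 1e6:.0f} MHz")
```

Doubling the tail current or halving the load halves the jitter, which is the slew rate argument made in the text.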

Figure 2.4 shows the simulated power spectrum of the phase error induced by the offset voltage of the data buffer. The data transitions occur randomly depending on the transmitted data pattern, therefore a pseudo-random binary sequence of length  $2^{23}$ -1 encoded with the 8B/10B coding scheme is used in the simulation. The offset voltage of the data buffer in this simulation is 100mV. The phase error power spectrum can be observed in the frequency band predicted by Eq. 2.3.



Figure 2.4 The phase error induced by the offset voltage in data buffer

The flicker noise in the tail current transistor affects the delay time of the data buffer through its current variation. However, the area of the tail current transistor is typically large and the bias node usually has a relatively large capacitive load. Therefore, its contribution can be neglected.

The reported monolithic serializers [2][3] using on-chip LC-VCO and careful design of CML output buffers can provide serial data with jitter less than 10ps P-P.

#### 2.3.2 Channel jitter

The serial data are transmitted over channel media such as coaxial cables or interconnections on a PCB. The interconnections have more loss at high frequencies. Therefore, their characteristic can be modelled as a low-pass filter. Inaccurate termination of the interconnection and impedance discontinuities from connectors, bonding wires, and soldering points also contribute to the channel jitter. However, the jitter from the bandwidth limitation dominates and is therefore the main focus of this section. The data transmission through a bandwidth-limited channel is shown in Figure 2.5. If the corner frequency of the interconnection,  $f_{media}$ , is lower than the highest dominant frequency component of the serial data, which is half of the data rate for non-return-to-zero data, the data amplitude varies with the data pattern. The corresponding jitter appears in the data eye diagram and is called data-dependent jitter (DDJ) or inter-symbol interference (ISI).



Figure 2.5 Channel Jitter

If the interconnection bandwidth is not high enough for the data rate, a data bit cannot reach the complete logic level, as depicted in Figure 2.6. The ideal transmitted signal,  $V_{Tx}$ , is represented as a dashed line, while the signal at the receiver end of the interconnection,  $V_{Rx}$ , is depicted as a solid line. The rise time of  $V_{Rx}$  is limited by the interconnection bandwidth. If there is a data transition in the next bit,  $V_{Rx}$  reaches its peak at  $a_1$  and starts falling. Because it starts from  $a_1$  rather than from the complete logic level,  $V_{Rx}$  reaches the logic threshold voltage,  $v_{th}$ , earlier. Hence, the threshold crossing times vary. If a data transition also occurs in the following bit,  $V_{Rx}$  again cannot reach the complete logic level. Therefore, the next threshold crossing is also early, which is depicted as a negative phase error.  $V_{Rx}$  will not reach the complete logic level as long as there are data transitions in the following bits, resulting in early threshold crossings. If identical successive bits occur,  $V_{Rx}$  has enough time to approach the complete logic level, therefore the threshold crossing of the following transition is late, which is depicted as a positive phase error.



Figure 2.6 The detail of data-dependent jitter

From [5] and the derivation in Appendix-B, the peak-to-peak jitter caused by the channel bandwidth limitation can be calculated by

$$\Delta t_{d-DDJ,p-p} = -\tau \cdot \ln(1 - e^{-\frac{T}{\tau}})$$
 Eq. 2.5

where  $\Delta t_{d\text{-}DDJ,p\text{-}p}$  is the peak-to-peak jitter caused by the channel bandwidth limitation,  $\tau$  is the channel time constant, and T is the serial data bit period. An example of  $V_{Rx}$  and its phase error signal is depicted in Figure 2.7. A data transition after identical successive bits produces a positive phase error, i.e. a late threshold crossing, and successive data transitions produce negative phase errors, i.e. early threshold crossings.

The pseudo-random binary sequence of length  $2^{23}$ -1 encoded with the 8B/10B coding scheme is simulated for 16384 bit periods. The time constant of the interconnection channel is 0.5 bit periods. The eye diagram and the power spectrum of the jitter due to the bandwidth limitation are shown in Figure 2.8. The simulated power spectrum of the jitter is nearly white. The details of data-dependent jitter, interference jitter, and their equalization techniques are discussed in [5].
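Eq. 2.5 can be cross-checked against a brute-force search over short bit patterns. The sketch below is not the simulation behind Figure 2.8. It assumes a first-order channel, NRZ levels of -1 and +1, a threshold at 0, and a channel output that is fully settled at the level of the first bit. All names are illustrative.

```python
import itertools
import math


def ddj_pp_closed_form(tau, bit_period):
    """Peak-to-peak data-dependent jitter from Eq. 2.5."""
    return -tau * math.log(1.0 - math.exp(-bit_period / tau))


def ddj_pp_brute_force(tau, bit_period, n_bits=8):
    """Spread of the threshold crossing delays over all n_bits patterns."""
    decay = math.exp(-bit_period / tau)
    if decay >= 0.5:
        raise ValueError("channel too slow: a single bit may not cross the threshold")
    delays = []
    for bits in itertools.product((-1.0, 1.0), repeat=n_bits):
        v = bits[0]  # assumed start: output settled at the first bit level
        for prev, target in zip(bits, bits[1:]):
            if target != prev:
                # v(t) = target + (v - target) * exp(-t / tau) crosses 0 at:
                delays.append(tau * math.log(1.0 - v / target))
            # channel output at the end of this bit period
            v = target + (v - target) * decay
    return max(delays) - min(delays)


# Time constant of 0.5 bit periods, as in the simulation described above.
closed = ddj_pp_closed_form(0.5, 1.0)
brute = ddj_pp_brute_force(0.5, 1.0)
print(f"Eq. 2.5: {closed:.4f} UI, pattern search: {brute:.4f} UI")
```

Under these assumptions the latest crossing follows a long run of identical bits and the earliest crossing follows a single isolated bit, so the pattern search reproduces the closed form.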



Figure 2.7 An example of data-dependent jitter



Figure 2.8 Data-dependent jitter

#### 2.3.3 Receiver jitter

The receiver jitter can be divided into the jitter from the data buffer and the jitter from the CDR. The jitter from the CDR dominates the receiver jitter. Figure 2.9 b) shows the block diagram of a PLL-based CDR. As in the PLL-based clock synthesizer, the high frequency output jitter of a PLL-based CDR is dominated by the VCO phase noise, whereas the output jitter at low frequencies is related to the jitter of the incoming serial data. The incoming serial data always carry data-dependent jitter from the bandwidth limitation of the interconnections. In order to obtain a low jitter recovered clock, a small loop bandwidth has to be used to suppress the incoming jitter. However, the CDR should be able to track the input jitter according to the specified jitter tolerance. Therefore, the loop bandwidth cannot be too small. This trade-off can be relaxed by the 2-loop CDR architecture, which will be analyzed in the following sections.

## 2.4 PLL-based CDR and clock synthesizer



Figure 2.9 Block diagrams of PLL-based clock synthesizer and PLL-based CDR

Phase-locked loop circuits are used in precision timing circuits and systems. Because of their feedback topology, they can operate over variations of temperature, process parameters and supply voltage. The block diagrams of a PLL-based clock synthesizer and a PLL-based CDR circuit are shown in Figure 2.9 a) and b) respectively.

The clock synthesizer, Figure 2.9 a), adjusts the phase and frequency of the divided VCO clock to match the reference clock. In the lock condition, the frequency of the output clock is N times higher than that of the reference clock. It can be used for clock multiplication or as a clock-jitter-filter, if a low jitter VCO is available. In the PLL-based CDR circuit, Figure 2.9 b), the VCO clock tracks the phase of the incoming serial data. The phase-frequency detectors (PFD) of a CDR and of a clock synthesizer are different. The CDR circuit compares the phases and frequencies of random data and of a clock. Frequency detection in a CDR is more complicated because the transitions of random data do not occur in every unit interval (UI). In a clock synthesizer, in contrast, the frequency difference can be detected by clock edge counting or by latches with feedback resetting. The common method for frequency detection in a CDR is rotational frequency detection. Details will be described in Section 3.1.4. Importantly, the PFD of a CDR operates at a different frequency from the PFD of a clock synthesizer. The PFD of a clock synthesizer operates at the reference clock frequency, while the PFD of a CDR operates at the data rate frequency in the case of a full-rate clock CDR. Its operating frequency can be reduced by the clock rate reduction technique, which uses a multi-phase clock to supply sufficient sampling phases. The details of the clock rate reduction technique are discussed in Section 3.2.

#### 2.4.1 PLL linear model

A phase-locked loop has non-linear behavior in some conditions. However, the loop behavior in phase domain at frequencies one or two decades lower than the PLL operation frequency can be assumed as a linear system. The detailed analysis is shown in [6]. The block diagram of PLL and its linear model are shown in Figure 2.10 a) and b) respectively.



Figure 2.10 PLL linear model

The open loop response of PLL linear model can be written as

$$H_{open}(s) = \frac{P_{o}(s)}{P_{i}(s)} = \frac{2\pi \cdot K_{VCO} \cdot K_{PD} \cdot (1 + s \cdot R_{1} \cdot C_{1})}{s^{2} \cdot C_{1} \cdot N \cdot (1 + s \cdot R_{1} \cdot C_{2})}$$
 Eq. 2.6

for  $C_1 \gg C_2$ , where  $P_i(s)$  is the phase of the input clock/data,  $P_o(s)$  is the phase of the output clock,  $K_{PD}$  is the phase detector gain, i.e. the ratio of the average charge pump current to the phase difference between the feedback clock and the input clock/data,  $K_{VCO}$  is the VCO gain, and N is the dividing factor of the frequency divider. The VCO behaves as an integrator in the phase domain, providing one pole at the origin. The dividing factor of the frequency divider can be interpreted as a gain reduction factor in the phase detector.



Figure 2.11 The open loop and closed loop responses of PLL linear model

Figure 2.11 shows the Bode diagram of Eq. 2.6 and the corresponding closed loop response. Capacitor  $C_1$  averages the charge injected from the charge pump and operates as a current integrator, creating a pole at low frequency, ideally at the origin. Without resistor  $R_1$ , the loop has 2 poles at the origin and, hence, is an unstable system. Resistor  $R_1$  is required to create a zero for phase compensation.  $C_2$  is additionally applied to remove high frequency noise/interference at the VCO control node, but it has to be small enough not to disturb the loop stability.  $C_2$  creates a high frequency pole that can degrade the phase margin. If  $C_2$  is selected much smaller than  $C_1$ , the closed loop response can be simplified to that of a  $2^{nd}$ -order loop. The simplified closed loop response can be written as

$$H_{closed}(s) = \frac{P_o(s)}{P_i(s)} = \frac{(1 + 2 \cdot \varsigma \cdot (s/\omega_N))}{1 + 2 \cdot \varsigma \cdot (s/\omega_N) + (s/\omega_N)^2}$$
 Eq. 2.7

where,

$$\varsigma = \frac{1}{2} \cdot \sqrt{\frac{K_{PD} \cdot 2\pi \cdot K_{VCO} \cdot R_1^2 \cdot C_1}{N}}$$
 Eq. 2.8

and

$$\omega_N = \frac{2 \cdot \varsigma}{R_1 \cdot C_1} = \sqrt{\frac{K_{PD} \cdot 2\pi \cdot K_{VCO}}{N \cdot C_1}}$$
 Eq. 2.9

 $\varsigma$  is the damping factor and  $\omega_N$  is the loop natural frequency<sup>\*</sup>. As derived in [7], the loop bandwidth can be calculated from

$$\omega_{-3dB} = \omega_N \cdot \left( (2 \cdot \varsigma^2 + 1) + \sqrt{(2 \cdot \varsigma^2 + 1)^2 + 1} \right)^{\frac{1}{2}}$$
 Eq. 2.10

where  $\omega_{-3dB}$  is the corner frequency\* of the closed loop response. If a large value of  $C_1$  is applied, such that  $\omega_{-3dB}$  is more than 3 to 4 times  $1/R_1C_1$  and  $2\varsigma^2 \gg 1$ , the corner frequency of the closed loop transfer function can be simplified to

$$\omega_{-3dB} \approx 2 \cdot \omega_N \cdot \varsigma = \frac{K_{PD} \cdot 2\pi \cdot K_{VCO} \cdot R_1}{N}$$
 Eq. 2.11

Eq. 2.7 to Eq. 2.11 describe the closed loop response of a PLL. The damping factor defines how the loop reacts to the input. If the damping factor is smaller than 0.7, i.e. the loop is underdamped, the closed loop response shows an overshoot around its corner frequency, but the loop acquisition time is small. Conversely, if the damping factor is larger than 0.7, i.e. the loop is overdamped, the overshoot around the corner frequency is largely suppressed, but the loop needs more time to acquire the lock condition.
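The hand calculation with Eq. 2.8 to Eq. 2.11 can be sketched as follows. The component values and gains are hypothetical and serve only to compare the exact corner frequency of Eq. 2.10 with the approximation of Eq. 2.11. The function name and its argument order are illustrative assumptions.

```python
import math


def loop_parameters(k_pd, k_vco, r1, c1, n=1.0):
    """Second-order loop quantities from Eq. 2.8 to Eq. 2.11.

    k_pd in A/rad, k_vco in Hz/V, r1 in ohm, c1 in farad.
    Returns (zeta, omega_n, omega_3db, omega_3db_approx), frequencies in rad/s.
    """
    k = k_pd * 2.0 * math.pi * k_vco
    zeta = 0.5 * math.sqrt(k * r1 ** 2 * c1 / n)                 # Eq. 2.8
    omega_n = math.sqrt(k / (n * c1))                            # Eq. 2.9
    a = 2.0 * zeta ** 2 + 1.0
    omega_3db = omega_n * math.sqrt(a + math.sqrt(a * a + 1.0))  # Eq. 2.10
    omega_3db_approx = k * r1 / n                                # Eq. 2.11
    return zeta, omega_n, omega_3db, omega_3db_approx


# Hypothetical values: 50 uA charge pump current per 2*pi rad, 500 MHz/V VCO,
# R1 = 2 kohm, C1 = 1 nF, divider N = 10.
R1, C1 = 2.0e3, 1.0e-9
zeta, omega_n, w3db, w3db_approx = loop_parameters(
    50e-6 / (2.0 * math.pi), 500e6, R1, C1, n=10.0)
print(f"damping factor {zeta:.2f}, natural frequency {omega_n:.3e} rad/s")
print(f"corner frequency {w3db:.3e} rad/s (Eq. 2.10), "
      f"{w3db_approx:.3e} rad/s (Eq. 2.11)")
```

For this moderately overdamped example the approximation underestimates the exact corner frequency by somewhat less than ten percent. The gap narrows as the damping factor grows.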


<sup>\*</sup> frequency in this term is angular frequency



Figure 2.12 The open and closed loop responses of the 3<sup>rd</sup>-order loop by various loop resistors

From Eq. 2.8, the damping factor of the  $2^{nd}$ -order loop can be calculated. However, with the capacitor  $C_2$ , the loop is actually  $3^{rd}$ -order, so that the loop damping factor cannot be precisely predicted. The phase roll-off resulting from the  $3^{rd}$  pole is visible in the Bode diagram in Figure 2.12. By increasing the loop resistor  $R_1$ , the zero and the  $3^{rd}$  pole move towards lower frequencies. Consequently, the crossover frequency increases and the phase margin is degraded. However, the equations of the  $2^{nd}$ -order loop, Eq. 2.7 to Eq. 2.11, provide a simple hand calculation for the loop design. A more precise loop characteristic can be obtained by simulating the linear model.
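The influence of the loop resistor on the 3rd-order loop can be reproduced with a small numerical evaluation of Eq. 2.6. The following sketch finds the unity-gain crossover by bisection and reads the phase margin from the zero and the 3rd pole. The loop gain, divider ratio and capacitor values are hypothetical and are not the values behind Figure 2.12.

```python
import math


def open_loop_magnitude(omega, k, n, r1, c1, c2):
    """|H_open(j*omega)| from Eq. 2.6, with k = 2*pi*K_VCO*K_PD."""
    zero_term = math.hypot(1.0, omega * r1 * c1)
    pole_term = math.hypot(1.0, omega * r1 * c2)
    return k * zero_term / (omega ** 2 * c1 * n * pole_term)


def crossover_and_phase_margin(k, n, r1, c1, c2):
    """Return (crossover frequency in rad/s, phase margin in degrees)."""
    lo, hi = 1.0, 1.0e12  # search bracket in rad/s, assumed wide enough
    for _ in range(200):  # bisection on a logarithmic frequency axis
        mid = math.sqrt(lo * hi)
        if open_loop_magnitude(mid, k, n, r1, c1, c2) > 1.0:
            lo = mid
        else:
            hi = mid
    omega_c = math.sqrt(lo * hi)
    # The double pole at the origin contributes -180 degrees, so only the
    # zero and the 3rd pole remain in the phase margin.
    margin = math.atan(omega_c * r1 * c1) - math.atan(omega_c * r1 * c2)
    return omega_c, math.degrees(margin)


# Hypothetical loop values, chosen only for illustration.
K, N, C1, C2 = 2.5e4, 10.0, 1.0e-9, 50.0e-12
RESISTORS = (1.0e3, 2.0e3, 4.0e3, 8.0e3)
results = [crossover_and_phase_margin(K, N, r1, C1, C2) for r1 in RESISTORS]
for r1, (omega_c, pm) in zip(RESISTORS, results):
    print(f"R1 = {r1:.0f} ohm: crossover {omega_c:.3e} rad/s, "
          f"phase margin {pm:.1f} deg")
```

For these assumed values the crossover frequency rises and the phase margin falls as the resistor grows, in line with the trend described above.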

#### 2.4.2 Loop characteristic design and jitter in PLL



a) Linear model of jitter in PLL



Figure 2.13 Jitter in PLL

The main contributions of jitter in a PLL are the incoming jitter and the VCO phase noise. Figure 2.13 a) shows the linear model of the PLL with its jitter sources. The transfer function from the incoming jitter,  $P_i(s)$ , to the output clock jitter,  $P_o(s)$ , has a low-pass characteristic. It is described by the closed loop response, as shown in Figure 2.13 b). The transfer function from the VCO phase noise,  $N_{VCO}(s)$ , to the output clock jitter is depicted in Figure 2.13 c). It has a high-pass characteristic with the same corner frequency. Incoming jitter within the loop bandwidth is tracked by the output clock. Incoming jitter outside the loop bandwidth is suppressed. The VCO phase noise dominates the clock output jitter at high frequencies. The low frequency part of the VCO phase noise is cancelled by the negative feedback. The other building blocks have jitter induced by device noise, supply/substrate noise, interference, mismatch and offsets in the circuits. The jitter from the phase detector, the charge pump, and the frequency divider,  $N_{PD}(s)$  and  $N_{F-DIV}(s)$ , has a low-pass transfer function to the output clock. The jitter from the loop filter has a band-pass transfer function.
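The complementary low-pass and high-pass behaviour can be checked with the 2nd-order model of Eq. 2.7. In this sketch the VCO noise transfer function is taken as one minus the closed loop response, which follows from the linear model when the 3rd pole is neglected. The normalised frequencies and the damping factor are assumed example values.

```python
def h_closed(omega, omega_n, zeta):
    """Input jitter to output jitter, Eq. 2.7 evaluated at s = j*omega."""
    x = 1j * omega / omega_n
    return (1.0 + 2.0 * zeta * x) / (1.0 + 2.0 * zeta * x + x * x)


def h_vco(omega, omega_n, zeta):
    """VCO phase noise to output jitter for the same 2nd-order loop."""
    x = 1j * omega / omega_n
    return (x * x) / (1.0 + 2.0 * zeta * x + x * x)


OMEGA_N, ZETA = 1.0, 1.0  # normalised, assumed example values
for omega in (0.01, 1.0, 100.0):
    lp = abs(h_closed(omega, OMEGA_N, ZETA))
    hp = abs(h_vco(omega, OMEGA_N, ZETA))
    print(f"omega/omega_n = {omega:6.2f}: input jitter gain {lp:.4f}, "
          f"VCO noise gain {hp:.4f}")
```

Well below the natural frequency the input jitter passes almost unchanged while the VCO noise is strongly attenuated. Well above it the roles are reversed.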



Figure 2.14 Phase noise (jitter) contribution in PLL

Figure 2.14 shows the plot of phase noise contributions from various sources to the PLL output clock obtained by the simulation of PLL linear model. Phase noise describes jitter in frequency domain as the ratio of power spectra around the carrier frequency to the carrier power. The total phase noise of PLL output clock is depicted by the thick line. Inside loop bandwidth, up to around 10kHz in this example, the output phase noise is dominated by the phase noise of the reference clock. The VCO phase noise dominates the output phase noise at higher offset frequencies. The phase frequency detector and the frequency divider contribute to the output phase noise inside the PLL loop bandwidth.

From the linear model, the contributions of the jitter sources in a PLL are well understood, which is useful for the loop characteristic design. For example, in a PLL-based clock synthesizer that has a low jitter reference clock from a quartz oscillator and a ring oscillator as VCO, the loop bandwidth should be wide in order to suppress the poor jitter quality of the ring-based VCO clock. In a high performance system, a low jitter LC-VCO is applied. Its loop bandwidth is selected so that the reference clock and the VCO contribute equally to the jitter. Another example is a PLL-based clock-jitter-filter. It uses a low jitter VCO and a small loop bandwidth to suppress the input jitter.

The loop characteristic of a PLL-based CDR has to be designed to meet the requirements of the jitter tolerance and the jitter peaking. The CDR is normally designed to be overdamped, in order to avoid jitter peaking that degrades jitter tolerance.
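How strongly the loop has to be overdamped can be estimated by sweeping the magnitude of Eq. 2.7 and reading off its maximum. The sketch below does this for a few assumed damping factors. The sweep range and resolution are arbitrary choices. Evaluated this way, Eq. 2.7 still shows a small residual peak for damping factors above 0.7 because of the zero in its numerator, so a peaking limit of 0.1 dB calls for a considerably larger damping factor.

```python
import math


def jitter_peaking_db(zeta, points=20001):
    """Maximum of |H_closed| from Eq. 2.7 in dB, found by a logarithmic sweep
    of the normalised frequency x = omega / omega_n from 1e-3 to 1e3."""
    peak = 0.0
    for i in range(points):
        x = 10.0 ** (-3.0 + 6.0 * i / (points - 1))
        numerator = 1.0 + (2.0 * zeta * x) ** 2
        denominator = (1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2
        peak = max(peak, numerator / denominator)
    return 10.0 * math.log10(peak)


# Assumed example damping factors.
for zeta in (0.5, 1.0, 4.0, 5.0):
    print(f"damping factor {zeta:.1f}: jitter peaking {jitter_peaking_db(zeta):.3f} dB")
```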

## 2.5 The architectures of CDRs for large synchronous networks

With an understanding of the PLL jitter sources and their effects, the architecture of the CDR can be designed. Serial data always have unavoidable jitter contributed by the transmitter and the channel. Therefore, the CDR should be able to track the phase of the incoming data to provide a sufficient bit-error-rate (BER). The BER will be degraded if the loop bandwidth of the CDR is too small. The jitter tolerance specification has to be met. In large synchronous networks, where the recovered clock from the CDR has to be used as a local reference clock, the conventional CDR architecture is not suitable. Hence, a 2-loop architecture has to be applied in order to avoid the trade-off between jitter tolerance and jitter filtering. Different approaches will be discussed and compared in the following sections.

Figure 2.15 block labels: clock recovery loop with low loop bandwidth (FD, CP and loop filter, LC-VCO), providing the recovered clock for the system; clock and data recovery loop with loop bandwidth compliant with the jitter tolerance (PD, CP and loop filter, VCDL), providing the recovered clock for the serial data and the recovered data.

#### 2.5.1 The CDR with a clock extraction and a phase tracking loop

Figure 2.15 The CDR with clock extraction and phase tracking loops

The first approach uses a clock extraction loop and a phase tracking loop. The block diagram is shown in Figure 2.15. In the clock extraction loop, a low jitter VCO is required and the loop bandwidth has to be small to suppress the incoming jitter. Therefore, an LC-VCO is used. The extracted clock is then applied to the data tracking loop, which adjusts the clock phase with a voltage-controlled delay line (VCDL) in order to track the phase of the incoming serial data. The loop bandwidth of the data tracking loop has to comply with the jitter tolerance standard. This approach has the disadvantage that the frequency detector and the phase detector in both loops have to operate at the data rate frequency. Therefore, current-mode logic has to be applied, which gives the whole system a large power consumption. Moreover, it is difficult to obtain a small loop bandwidth when the phase detector operates at a high frequency. The clock rate reduction architecture is difficult to apply because the LC-VCO, which has to be used for low jitter clock generation, is not inherently multi-phase.

#### 2.5.2 The CDR with a clock-jitter-filter



Figure 2.16 The CDR with a clock-jitter-filter

The second approach uses a clock and data recovery loop and a clock-jitter-filter. Its block diagram is shown in Figure 2.16. In the CDR, a frequency detector is applied so that it can operate without a local reference clock. The loop bandwidth of the CDR is designed to comply with the jitter tolerance specification. The second loop operates as a clock-jitter-filter. Its loop bandwidth has to be small to reduce the jitter of the recovered clock. A low jitter VCO is needed, therefore an LC-VCO is applied in the clock-jitter-filter. This approach has several advantages over the previous one. The clock rate reduction architecture can be applied by using a ring oscillator in the front-ended CDR loop. With careful design, the ring oscillator can provide sufficiently low jitter for the data recovery function. With a reduced clock rate CDR, the recovered clock is at a lower frequency, which suits the small loop bandwidth design of the clock-jitter-filter. Moreover, the reduced clock rate CDR using a multi-phase clock is an intrinsic DEMUX, hence the power consumption of the whole deserializer is reduced. The clock-jitter-filter operates not only as a clock-jitter-filter but also as the high frequency clock synthesizer for the data serializer. Consequently, the power consumption of the entire transceiver is minimized. This architecture is selected for the CDR for large synchronous networks. The detailed designs of both loops, the CDR loop and the clock-jitter-filter loop, are discussed in the following chapters.

With the CDR in Figure 2.16, the incoming jitter inside the loop bandwidth of the clock-jitter-filter still remains in the output clock. It is mainly contributed by the channel jitter because of its white spectrum. The jitter resulting from the offset voltage in the data buffer, which is induced by flicker noise and device mismatch, is converted to high frequency jitter by the 8B/10B coding scheme, as discussed in Section 2.3.1. Therefore, it is filtered by the clock-jitter-filter. The remaining jitter at the output clock can be minimized by decreasing the ISI jitter, which can be achieved by an equalization technique in the serial data transmitter driver.

The jitter of the distributed clock is specified in [1] to be smaller than 25ps rms. Therefore, it is a primary goal of this thesis.

#### 2.6 Summary

From the analysis of the jitter sources in a serial data communication system, the jitter sources in the transmitter can be minimized by using a clean reference clock and a low jitter LC-VCO in the clock synthesizer of the time distributor node. Therefore, the main contribution to the serial data jitter is the channel jitter caused by its bandwidth limitation. It has a white spectrum because of the random nature of the serial data stream. The clock and data recovery circuit at the receiver has a low-pass jitter transfer function according to its loop bandwidth. It can filter some parts of the incoming jitter from the serial data. However, its loop bandwidth cannot be too small, because it needs a tracking ability for a sufficient bit-error-rate. In order to obtain clock distribution through the data network, the CDR with 2-loop topology has to be applied because it has no trade-off between jitter tolerance and jitter filtering. The front-ended loop generates the clock that tracks the serial data phase for an adequate bit-error-rate, and the secondary loop suppresses the jitter by its small loop bandwidth and low jitter LC-VCO. It can provide a low jitter clock for local reference timing.
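As a rough illustration of why the 2-loop topology removes the trade-off, the sketch below cascades two low-pass jitter transfer functions. It assumes first-order responses for both loops and uses hypothetical bandwidths (1.5 MHz for the CDR loop, 50 kHz for the clock-jitter-filter). Neither the model order nor the numbers are taken from this thesis.

```python
import math


def lowpass_gain(f_hz, bw_hz):
    """Magnitude of a first-order low-pass jitter transfer function (assumption)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / bw_hz) ** 2)


def two_loop_jitter_gain(f_hz, cdr_bw_hz, filter_bw_hz):
    """Jitter gain from the serial data to the filtered clock: both loops in cascade."""
    return lowpass_gain(f_hz, cdr_bw_hz) * lowpass_gain(f_hz, filter_bw_hz)


if __name__ == "__main__":
    cdr_bw, filter_bw = 1.5e6, 50e3  # hypothetical loop bandwidths in Hz
    for f in (10e3, 500e3, 5e6):     # hypothetical jitter frequencies in Hz
        print(f"{f:9.0f} Hz: CDR tracks {lowpass_gain(f, cdr_bw):.3f}, "
              f"output clock sees {two_loop_jitter_gain(f, cdr_bw, filter_bw):.3f}")
```

For jitter between the two bandwidths, the front-ended loop still tracks the data phase while the output clock sees the jitter strongly attenuated.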

## Chapter 3

# Structure of the front-ended loop: PLL-based CDR

#### 3.1 State of the art

The clock and data recovery circuit is a crucial component in a serial data receiver. The phase-locked loop based (PLL-based) CDR is widely used. The CDR uses phase error information from a phase detector to adjust the phase and frequency of its VCO clock to synchronize with the incoming serial data. The jitter characteristic of a PLL-based CDR is defined by its loop parameters. For instance, its loop bandwidth defines its jitter transfer function and its frequency capture range. The architectures of PLL-based CDRs can be roughly categorized into two types, i.e. the CDR with and without an external reference clock.

#### 3.1.1 PLL-based CDR with an external reference clock

This type of CDR requires the external reference clock for its operation. It is widely used in the standard serial data transceiver system, as shown in Figure 1.1, because the external reference clock is always available for the data transmitter. The architectures can be roughly separated into two groups. They are CDR with frequency initialization and CDR with phase synthesis and phase interpolation.

[Figure 3.1, block labels: frequency acquisition loop, active in initialisation (PFD for clock-to-clock comparison, reference clock at bitrate frequency / D, frequency divider by factor D, lock detector, control logic) and data tracking loop (PD, selector, CP, loop filter, VCO; serial data in, recovered data and recovered clock out).]

#### 3.1.1.1 CDR with frequency initialization


Figure 3.1 CDR with frequency initialization loop

With a simple phase detector for serial data, the frequency capture range of a PLL-based CDR is limited by its loop bandwidth. Therefore, a frequency acquisition loop has to be applied. CDRs with frequency initialization are reported in [2][3]. Figure 3.1 shows the block diagram and the important signals. The frequency acquisition loop, shown as grey lines, drives the CDR to lock to an external reference clock. The frequency of the external reference clock and the dividing factor D are carefully selected in order to bring the frequency of the VCO close to the incoming data rate. The phase frequency detector in the frequency acquisition loop is similar to the one in a clock synthesizer, which is designed for clock-to-clock comparison. It is discussed in detail in Section 4.2.2.1. After frequency acquisition is achieved, the CDR switches to the data tracking loop, shown as black lines. The clock phase is then adjusted to sample the incoming data at the optimum point.

In practice, the reference clocks of different transceivers cannot be exactly equal. Therefore, in serial communication standards, e.g. PCI Express, the unit intervals of transmitter and receiver are allowed to differ by +/-300 ppm. As a result, the operation frequency of the CDR at the beginning of the data tracking mode differs from the incoming data rate. The loop bandwidth of the data tracking loop has to be designed to cover this frequency difference.
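To give a feeling for the size of this frequency difference, the following sketch converts a ppm offset into a frequency offset and into the number of bits after which the accumulated phase drift reaches one unit interval. The 2.5 Gb/s data rate and the full-rate clock are assumptions chosen for illustration, and the function names are hypothetical.

```python
def frequency_offset_hz(data_rate_bps, offset_ppm):
    """Frequency difference for a full-rate clock whose frequency in Hz equals the data rate."""
    return data_rate_bps * offset_ppm * 1e-6


def bits_per_ui_slip(offset_ppm):
    """Number of bits after which an untracked offset has drifted by one unit interval."""
    return 1e6 / offset_ppm


if __name__ == "__main__":
    rate = 2.5e9   # assumed data rate in bit/s
    ppm = 300.0    # offset from the text
    print(f"frequency offset: {frequency_offset_hz(rate, ppm) / 1e3:.0f} kHz")
    print(f"one UI of drift about every {bits_per_ui_slip(ppm):.0f} bits")
```

The frequency capture range of the data tracking loop has to exceed the first number for the loop to pull in after switching from the frequency acquisition loop.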

#### 3.1.1.2 CDR with phase synthesis and phase interpolation



Figure 3.2 CDR with phase interpolation

A CDR with phase synthesis and phase interpolation is depicted in Figure 3.2. A phase synthesis loop, shown in grey lines, generates a multiphase clock locked to a reference clock by using a ring-based VCO or a voltage-controlled delay line (VCDL). The phase selectors and phase interpolators select suitable pairs of clock phases from the multiphase clock and interpolate them in order to obtain the optimum data sampling phases. CDRs with this architecture are reported in [8], [9], and [10]. If there is a frequency difference between the reference clock and the incoming serial data, the phase error changes cyclically. The control circuit of the phase selectors and the phase interpolators is designed to support this phase error rotation, and the CDR has to track the phase error over one unit interval, 360°. Parts of the phase error signals are used to drive the VCO clock to track the frequency difference. This shifts the sampling phase from the optimum sampling point and hence degrades the CDR operation. The compensation of the phase offset in plesiochronous operation is reported in [11]. An additional frequency detector based on edge counting is applied to estimate the frequency difference and generate signals to compensate the phase offset.
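The selection and interpolation step can be sketched behaviourally. The example assumes an ideal multiphase clock with equally spaced phases and ideal linear interpolation. Real interpolators are not perfectly linear, and the function name and the 8-phase example are hypothetical rather than taken from [8] to [10].

```python
def select_and_interpolate(target_deg, n_phases):
    """Return (lower phase index, upper phase index, weight of the upper phase).

    The two adjacent clock phases bracket the target sampling phase. The
    modulo arithmetic lets the target rotate past 360 degrees, as needed
    when the reference clock and the data rate differ.
    """
    step = 360.0 / n_phases
    target = target_deg % 360.0
    k = int(target // step)
    weight_upper = (target - k * step) / step
    lower = k % n_phases
    upper = (lower + 1) % n_phases
    return lower, upper, weight_upper


if __name__ == "__main__":
    for target in (100.0, 350.0, 370.0):
        print(target, select_and_interpolate(target, 8))
```

A target of 350° is built from the last phase (315°) and the first phase (0°), which is the wrap-around that the control circuit has to support during phase error rotation.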

#### 3.1.2 PLL-based CDR without external reference clock



Figure 3.3 CDR without external reference clock

A CDR without an external reference clock is presented in Figure 3.3. Its architecture is quite simple. However, its PFD is more complicated than the phase detector (PD) in a CDR with an external reference clock. Both the phase and the frequency error are obtained from the PFD. The frequency difference between the serial data and the VCO clock is indicated by rotational frequency detection, which is discussed in detail in Section 3.1.4. The CDR has a wider frequency capture range than its loop bandwidth, therefore the frequency acquisition loop and the external reference clock are not required. Normally, this type of CDR is applied in signal repeaters in optical communication systems or in applications where local reference clocks are not available. The design and implementation of the full-rate-clock architecture is reported in [12]. The half-rate-clock designs are shown in [13] and [14]. The 1/4-rate-clock approach is proposed in this thesis. It has also been published in [15].

#### 3.1.3 Phase detector for serial data

The phase detectors for serial data can be categorized into two types, i.e. the linear phase detector and the binary phase detector.

#### 3.1.3.1 Linear phase detector



Figure 3.4 Linear phase detector

The circuit of a linear phase detector, its operation and its characteristic curve are shown in Figure 3.4. The linear phase detector was first published by Hogge in [16]. Its operation can be described as follows. The linear phase detector generates reference impulses with a duration of half a unit interval at every serial data transition, shown as signal 'X' in Figure 3.4 b) and c). They can be produced by two D-flipflops (D-FFs), which are clocked by the rising and falling clock edges, and an XOR-gate. The phase-measuring impulses are generated from the recovered data, signal 'B', and the incoming serial data. They are shown as signal 'Y'. If the clock is too late/early, the phase-measuring impulses are shorter/longer than half a unit interval, as depicted in Figure 3.4 b) and Figure 3.4 c) respectively. In the lock condition, the phase-measuring impulses have a duration of half a unit interval. The difference of the impulse durations represents the phase error, therefore the output depends linearly on the phase error between the serial data and the VCO clock. Moreover, the recovered serial data is automatically regenerated in the phase detection circuit as signal 'B'. The disadvantage of the linear phase detector is its reliance on impulse durations: distortions from unbalanced gate delays or rise/fall times cause a phase offset, i.e. the sampling point shifts from the middle of the data eye. Furthermore, the phase error is detected at every serial data transition, but not in every unit interval. Consequently, the gain of this phase detector type depends on the data transition density.

Figure 3.4 d) shows the characteristic curve of the linear phase detector. The average output current of a linear phase detector combined with a charge pump can be written as

$$I_{avg-linear-PD}(\Delta\phi) = \frac{\Delta\phi \cdot I_{CP} \cdot D_T}{2\pi} \left[ \mathbf{A} \right] \quad \text{for } -\pi \le \Delta\phi \le +\pi$$
 Eq. 3.1

where $I_{avg-linear-PD}$ is the average output current of the linear phase detector combined with a charge pump, $\Delta \phi$ is the phase error, $I_{CP}$ is the charge pump current and $D_T$ is the data transition density, i.e. the number of data transitions per unit interval. Therefore, the gain of a linear phase detector can be written as

$$K_{linear-PD} = \frac{I_{CP} \cdot D_T}{2 \cdot \pi} \left[ \frac{A}{rad} \right]$$
 Eq. 3.2

where  $K_{linear-PD}$  is the gain of a linear phase detector.
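Eq. 3.1 and Eq. 3.2 can be evaluated directly. The sketch below does so for a hypothetical charge pump current of 100 µA and a transition density of 0.5. Both values are assumptions for illustration only.

```python
import math


def linear_pd_avg_current(phase_error_rad, i_cp, d_t):
    """Eq. 3.1: average output current in A, valid for -pi <= phase error <= +pi."""
    if not -math.pi <= phase_error_rad <= math.pi:
        raise ValueError("phase error outside the linear range of Eq. 3.1")
    return phase_error_rad * i_cp * d_t / (2.0 * math.pi)


def linear_pd_gain(i_cp, d_t):
    """Eq. 3.2: gain of the linear phase detector in A/rad."""
    return i_cp * d_t / (2.0 * math.pi)


if __name__ == "__main__":
    i_cp, d_t = 100e-6, 0.5  # assumed values
    print(f"gain: {linear_pd_gain(i_cp, d_t) * 1e6:.2f} uA/rad")
    print(f"current at pi/2: {linear_pd_avg_current(math.pi / 2, i_cp, d_t) * 1e6:.2f} uA")
```

Halving the transition density halves the gain, which is the dependence on the data pattern noted above.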

#### 3.1.3.2 Binary phase detector



Figure 3.5 Binary-state and ternary-state binary phase detectors

Unlike the linear phase detector, a binary phase detector indicates only if the clock phases are too early or too late but it cannot provide the quantitative phase error. It utilizes the unavoidable jitter in the incoming serial data and the VCO clock to generate the same amounts of the early and the late signals in the lock condition. The binary phase detector can be separated into the binary-state type and the ternary-state type (Alexander's phase detector).

#### 3.1.3.2.1 Binary-state binary phase detector

The block diagram and operation of a binary-state binary phase detector and its characteristic curve are shown in Figure 3.5 a) and c) respectively. The circuit consists of a D-flipflop, or data sampling unit, that uses the serial data as its clock input and the VCO clock as its data input. In the lock condition, the rising edges of the serial data sample around the falling edges of the VCO clock, therefore the rising edges of the VCO clock can be used as the optimum sampling edges for data regeneration. In this condition, the output 'Q' of the D-FF alternates between high and low, i.e. phase late and early, in equal amounts. If the VCO clock is early, the rising edges of the serial data sample the logic 'low' of the VCO clock. Conversely, if the VCO clock is late, the rising edges of the serial data sample the logic 'high' of the VCO clock. The phase error result is held until the next rising edge of the serial data. This makes the gain of this phase detector independent of the serial data transition density because it presents a phase error result in every unit interval [17]. Both-edge-triggered D-flipflops can be applied so that the phase detector indicates the phase error at both rising and falling edges of the serial data. This kind of phase detector has the disadvantage that the phase error accumulates in the absence of data transitions. Furthermore, automatic data regeneration cannot be obtained because the phase error is detected by the serial data sampling the clock. An additional circuit for data regeneration is required. The non-identical operations of phase detection and data regeneration can cause a phase offset, leading to non-optimum data regeneration.
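A behavioural sketch of this detector is given below. It assumes an ideal full-rate clock with a period of one unit interval, a rising edge at t = 0 and a falling edge at t = 0.5 UI, so the lock point is at 0.5 UI. The time convention and the function name are assumptions made for the example.

```python
def binary_state_pd(data_rise_time_ui):
    """Level of the VCO clock sampled by a rising data edge.

    Assumed clock model: high during the first half of each unit interval,
    low during the second half. A sampled 'high' means the falling clock
    edge has not yet arrived (clock late), a sampled 'low' means it has
    already passed (clock early).
    """
    phase = data_rise_time_ui % 1.0
    clock_is_high = phase < 0.5
    return "late" if clock_is_high else "early"


if __name__ == "__main__":
    # Rising data edges jittering around the falling clock edge at 0.5 UI.
    for t in (0.45, 1.55, 3.48, 6.52):
        print(f"data edge at {t:.2f} UI -> clock {binary_state_pd(t)}")
```

In the real circuit the D-FF holds each result until the next rising data edge. The sketch returns one decision per edge and leaves the hold to the caller.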

#### 3.1.3.2.2 Ternary-state binary phase detector (Alexander's phase detector)

It was first published by Alexander in [18], therefore it is widely known as Alexander's phase detector. The block diagram of Alexander's phase detector, its operation and its characteristic curve are shown in Figure 3.5 b) and Figure 3.5 c). Both edges of the VCO clock sample the serial data to provide the phase error information. In the lock condition, the falling edges of the clock sample around the serial data transitions while the rising edges sample at the middle of the data eyes. The phase error is detected by XOR-gates. If the data sample (a) is different from its edge sample (b) and the next data sample (c), the clock phase is late. Conversely, if the data sample (a) and the edge sample (b) differ from the next data sample (c), the PD indicates that the clock phase is early. In the lock state, the phase detector generates equal numbers of phase-early and phase-late signals. The phase error is detected only when serial data transitions occur. Therefore, its gain depends on the data transition density. It has the advantage that the serial data is automatically regenerated as data samples (a) or (c). Moreover, the phase detection circuit operates identically to the data regeneration unit, a clock sampling data, therefore there is no phase offset in the data regeneration. Its characteristic curve is similar to that of the binary-state phase detector. Figure 3.6 a) shows the internal signals of the ternary-state phase detector indicating an early clock phase.
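The decision rule on the three samples can be written as a small truth-table function. The label "hold" for the third state (no data transition, hence no phase information) is a name chosen here for illustration.

```python
def alexander_pd(a, b, c):
    """Early/late decision from data sample a, edge sample b and next data sample c.

    Each argument is 0 or 1. In hardware the two comparisons are XOR gates
    (a XOR b and b XOR c); the comparisons below are equivalent.
    """
    if a == c:
        return "hold"   # no data transition between the two data samples
    if b == c:
        return "late"   # a differs from both b and c
    return "early"      # a and b both differ from c


if __name__ == "__main__":
    for samples in ((0, 1, 1), (0, 0, 1), (1, 1, 1), (1, 0, 0)):
        print(samples, "->", alexander_pd(*samples))
```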



Figure 3.6 Alexander's phase detector and the statistical approach to the binary PD gain

#### 3.1.3.2.3 The effective gain of a binary phase detector

Both types of binary phase detectors can indicate only phase early or late but cannot provide the quantitative phase error. Therefore, their characteristic curve is non-linear in the phase error. In a CDR, the binary phase detector outputs are used as small frequency adjustment steps, and a loop filter averages them to generate the control voltage for the VCO. Moreover, the CDR loop bandwidth is normally far lower than the operation frequency of the phase detector, therefore a statistical approach can be applied to define the effective gain of a binary phase detector. This approach was reported in [19]. If the jitter of the serial data has a certain probability density function (PDF), the gain of a binary PD can be defined by the difference between the probabilities that the phase detector indicates phase early and phase late. The statistical approach and the effective gain of a binary PD are shown in Figure 3.6 b). The non-linear behavior occurs only when the jitter of the serial data is very small, which does not happen in practice.

The binary phase detector has a linear behavior for a certain incoming jitter probability density function. Therefore, the linear model is still applicable for the loop characteristic design. The average output current of a binary phase detector combined with a charge pump can be written as

$$I_{avg-binary-PD}(\Delta\phi) = I_{CP-AV} \cdot D_T \cdot \left( \int_{-\pi}^{\Delta\phi} f(x) dx - \int_{\Delta\phi}^{+\pi} f(x) dx \right) \quad \text{for } -\pi \le \Delta\phi \le +\pi$$
 Eq. 3.3

where f(x) is the probability density function of incoming jitter in serial data,  $D_T$  is the data transition density and  $I_{CP-AV}$  is the average current of charge pump per UI. The gain of binary phase detector can be calculated from the slope of the average current by

$$K_{binary-PD}(\Delta\phi) = \frac{d \left(I_{avg-binary-PD}(\Delta\phi)\right)}{d(\Delta\phi)} \left[\frac{A}{rad}\right]$$
 Eq. 3.4

where $K_{binary-PD}$ is the gain of a binary phase detector. It varies with the phase error. Therefore, an average gain has to be used, which is calculated by integrating the PD gain weighted by the probability of each phase error. It can be written as

$$K_{binary-PD-avg} = \frac{\int K_{binary-PD}(\Delta\phi) \cdot f(\Delta\phi) \cdot d(\Delta\phi)}{\int f(\Delta\phi) \cdot d(\Delta\phi)} \left[ \frac{A}{rad} \right]$$
 Eq. 3.5

where  $K_{binary-PD-avg}$  is the average gain of a binary phase detector for the incoming jitter with a certain probability density. Some examples of the average gain of a binary phase detector are shown in Appendix-C. Eq. 3.5 can be simplified to be

$$K_{binary-PD-avg} = \frac{2 \cdot I_{CP-AV} \cdot D_T \cdot K_{PDF}}{PPJ} \left[ \frac{A}{rad} \right]$$
 Eq. 3.6

where PPJ is the peak-to-peak incoming jitter in radians and $K_{PDF}$ is a constant depending on the form of the jitter probability density function. It is shown in Appendix-C that $K_{PDF}$ lies between 1.4 and 1.7 for a typical jitter probability density function.
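Eq. 3.3 to Eq. 3.6 can be checked numerically. Differentiating Eq. 3.3 gives $K_{binary-PD}(\Delta\phi) = 2 \cdot I_{CP-AV} \cdot D_T \cdot f(\Delta\phi)$, which the sketch below inserts into Eq. 3.5 and then solves Eq. 3.6 for $K_{PDF}$. The two jitter distributions (uniform, and Gaussian truncated at a peak-to-peak width of six standard deviations) and the integration routine are assumptions chosen for illustration. They are not the cases evaluated in Appendix-C.

```python
import math


def k_pdf(pdf, ppj_rad, steps=20000):
    """Numerically evaluate K_PDF for a jitter PDF defined on [-PPJ/2, +PPJ/2].

    `pdf` need not be normalized; it is normalized over the interval here.
    I_CP-AV and D_T are set to 1 because they cancel in K_PDF.
    """
    i_cp_av, d_t = 1.0, 1.0
    dx = ppj_rad / steps
    xs = [-ppj_rad / 2.0 + (k + 0.5) * dx for k in range(steps)]  # midpoint rule
    area = sum(pdf(x) for x in xs) * dx
    f = [pdf(x) / area for x in xs]
    gain = [2.0 * i_cp_av * d_t * fx for fx in f]          # Eq. 3.4 applied to Eq. 3.3
    k_avg = (sum(g * fx for g, fx in zip(gain, f)) * dx) / (sum(f) * dx)  # Eq. 3.5
    return k_avg * ppj_rad / (2.0 * i_cp_av * d_t)         # Eq. 3.6 solved for K_PDF


if __name__ == "__main__":
    sigma = 0.1                      # assumed rms jitter in rad
    ppj = 6.0 * sigma                # assumed peak-to-peak definition
    print("uniform :", round(k_pdf(lambda x: 1.0, ppj), 3))
    print("gaussian:", round(k_pdf(lambda x: math.exp(-x * x / (2 * sigma ** 2)), ppj), 3))
```

Under these assumptions the uniform distribution gives $K_{PDF} = 1$ and the truncated Gaussian gives a value of about 1.7. The result depends on how the peak-to-peak jitter is related to the standard deviation.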

#### 3.1.3.3 Comparison of linear PD and binary PD

The linear phase detector uses the duration difference of the up-impulse and the down-impulse to derive the phase error. Impulse distortion or mismatch in any delay unit can cause a phase offset. An example of a compensation technique is shown in Figure 3.4 a). The delay unit, shown as a dashed line, is normally applied to emulate the delay time of a D-flipflop. In this way, the linear phase detector can generate equal impulse durations in the lock condition and the phase offset is reduced. However, the linear phase detector still has the drawback that short and accurate impulses are required in high data rate operation.

A binary phase detector uses statistics to derive the phase error. If there is a phase error, the numbers of phase-early and phase-late impulses are not equal. In the lock condition, with the phase error approaching zero, the binary phase detector generates equal numbers of phase-early and phase-late impulses. With this principle, its operation is more robust and less dependent on non-idealities. Therefore, binary phase detectors are often applied in high data rate CDRs. Figure 3.7 shows CDR designs reported at the International Solid-State Circuits Conference from 1999 to 2006. CDRs with binary phase detectors are shown as black dots and CDRs with linear phase detectors as triangle symbols. The reference numbers of the publications are shown in square brackets and their details are listed in Appendix-H. The performances of the CDRs are compared by the ratio between their data rate and the transit frequency of the transistor in their technologies. For CMOS technologies, where a transit frequency is not available, the unity gain frequency of a transistor with a fixed ratio between its width and length is used instead. The numbers in parentheses beside the reference numbers show the ratio between data rates and operation frequencies. The majority of the latest designs use a binary phase detector and the clock rate reduction architecture.



Figure 3.7 CDRs reported at the International Solid-State Circuits Conference from 1999 to 2006

Unlike the linear phase detector, Alexander's phase detector does not connect the incoming serial data directly to the phase detector logic circuit. Therefore, the quality of the serial data input, i.e. the voltage swing and the rise/fall times, does not affect its operation, and consequently the data receiver buffer is not always required. In some applications, sense-amplifiers are applied as the input stages in order to reduce the voltage swing of the transmitted data, so that the overall power consumption of the data communication is minimized, as reported in [20].

In comparison to the binary-state binary phase detector, Alexander's phase detector has no phase error accumulation in the absence of data transitions. Because of these advantages, Alexander's phase detector is selected and applied in this thesis.

#### 3.1.4 Frequency detector for serial data

Detecting the frequency difference between random serial data and a clock is complicated because data transitions are missing in random serial data. The frequency difference cannot be detected by edge counting or by the latch-and-reset method used in the PFD of a clock synthesizer. Instead, rotational frequency detection can be applied in the frequency detector of a CDR.

#### 3.1.4.1 Rotational frequency detector

A rotational frequency detector uses two phase detectors and a quadrature-phase clock to sense the frequency difference. Figure 3.8 shows the operation of the full-rate clock rotational frequency detector. The quadrature-phase clock consisting of clk-0 and clk-90 divides one bit period or unit interval into four phase states, PS-1, PS-2, PS-3, and PS-4. The incoming serial data at various data rates are depicted with their marks: (a) the clock frequency is higher than the incoming data rate, (b) the clock frequency is lower, and (c) the clock frequency matches the incoming data rate. If the clock frequency is higher, the clock phases overtake the data phase. Therefore, data transitions rotate forwards through PS-1, PS-2, PS-3, and PS-4. Conversely, if the clock frequency is lower, the data phase overtakes the clock phases. Therefore, data transitions rotate backwards through PS-4, PS-3, PS-2, and PS-1. If the clock frequency matches the data rate, there is no phase state rotation. Data transitions alternately occur between PS-2 and PS-3.
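The rotation principle can be sketched behaviourally. The example assumes that PS-1 to PS-4 correspond to the four 90° quadrants of the clock period in ascending order, and it declares a frequency error only after a net rotation of one full cycle. Both choices, and the function names, are assumptions for illustration. The circuit itself uses two phase detectors and detection logic instead.

```python
def phase_state(transition_phase_deg):
    """Map the clock phase at which a data transition occurs to PS-1..PS-4 (assumed quadrants)."""
    return int((transition_phase_deg % 360.0) // 90.0) % 4 + 1


def rotation_direction(states, full_cycle=4):
    """Classify a sequence of phase states by its net rotation."""
    net = 0
    for prev, cur in zip(states, states[1:]):
        step = (cur - prev) % 4
        if step == 1:
            net += 1      # forwards: PS-1 -> PS-2 -> PS-3 -> PS-4 -> PS-1
        elif step == 3:
            net -= 1      # backwards: PS-4 -> PS-3 -> PS-2 -> PS-1 -> PS-4
    if net >= full_cycle:
        return "clock too fast"
    if net <= -full_cycle:
        return "clock too slow"
    return "no rotation"


if __name__ == "__main__":
    # Hypothetical case: the transition phase advances by 3.6 degrees per bit.
    fast = [phase_state(135.0 + 3.6 * n) for n in range(200)]
    locked = [2, 3, 2, 3, 2, 3]
    print(rotation_direction(fast), "/", rotation_direction(locked))
```

Transitions that only alternate between PS-2 and PS-3 produce no net rotation, so the detector stays inactive in lock.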

The operation of the rotational frequency detector can be described more explicitly by a rotation ring analogy [21], shown in Figure 3.9 and Figure 3.10 for the cases of higher and lower clock frequency respectively. In Figure 3.9, the black dot on the external ring symbolizes the phase of the data transition. The black square and the grey dot on the internal ring symbolize the phases of the quadrature-phase clock. Four snapshots of the rotation rings are depicted in time sequence beginning from the top-left corner. If the angular frequency of the clock, $\omega_{clk}$, is higher than that of the serial data, $\omega_{data}$, the clock phases overtake the data phase. Figure 3.10 shows the rotation rings when the clock frequency is lower: the data phase overtakes the clock phases.



- (a)  $f_{clock}$  is higher, data transitions occur in PS-1 -> PS-2 -> PS-3 -> PS-4
- (b)  $f_{clock}$  is lower, data transitions occur in PS-1 -> PS-4 -> PS-3 -> PS-2
- (c)  $f_{clock}$  matches the data rate, data transitions alternately occur in PS-2 and PS-3

Figure 3.8 The operation of the rotational frequency detector



Figure 3.9 Rotation ring analogy: clock frequency is higher



Figure 3.10 Rotation ring analogy: clock frequency is lower

The simplified block diagram of the rotational frequency detector and its internal signals are depicted in Figure 3.11. It consists of two phase detectors and additional frequency detection logic. If the clock frequency is higher, the clock phase 0°, the grey clock, overtakes the data phase before the clock phase 90°, the black clock. Therefore, the phase error signal from PD-1 leads the phase error signal from PD-2. The opposite occurs if the clock frequency is lower. The frequency detection logic uses these two internal signals to reduce the frequency error. If the frequency error becomes zero, there is no phase rotation, therefore the frequency detector is automatically deactivated. Note that without the second phase detector and the quadrature-phase clock, phase rotation can be detected but its direction is unknown. Therefore, the second phase detector and the quadrature-phase clock are required to provide the direction of the frequency error.

In the implementation of the PFD for a CDR, both types of binary phase detectors can be applied. The PFD is simpler with a binary-state binary phase detector, in which the data samples the clocks, as presented in [12]; however, an additional circuit for data recovery is required. In [14], the rotational frequency detector for the CDR was minimized for frequency tracking only, therefore a separate phase detector is applied for data regeneration. By applying Alexander's phase detector to the rotational frequency detector, data regeneration is inherent. Combined with a clock rate reduction architecture, the CDR can provide the data demultiplexing function. Moreover, the additional sampling phases required for frequency detection can be optimized by applying a time-interleaving architecture, which is discussed in Section 3.3.2.2. Therefore, the CDR with a 1/4-rate PFD based on Alexander's phase detector is proposed in this thesis.



Figure 3.11 The block diagram and operation of a rotational frequency detector

#### 3.2 Clock rate reduction architecture of CDR

The phase frequency detector in a CDR has to operate at the incoming data rate, unlike the phase frequency detector in a clock synthesizer, which operates at the reference clock frequency. For instance, the phase frequency detector in a 2.5 Gb/s CDR has to operate at 2.5 GHz. The operation frequency can be reduced by using a multi-phase clock. The sampling phases of CDRs without and with clock rate reduction are depicted in Figure 3.12. The full-rate binary phase detector has to sample two points within one unit interval, as shown in Figure 3.12 a). Both rising and falling edges of a full-rate clock are used for sampling the serial data. Similar sampling phases can be obtained from a multi-phase clock at a lower frequency. Figure 3.12 b) depicts the sampling behavior of the 1/4-rate phase detector using four clock phases. The sampled data are retimed and processed at 1/4 of the data rate. Figure 3.12 c) shows the sampling phases of the full-rate CDR with a phase/frequency detector. Twice as many sampling points, four per unit interval, are required to provide the phase and frequency errors. Sufficient sampling phases can be obtained from an 8-phase 1/4-rate clock: Figure 3.12 d) depicts the sampling phases of the 1/4-rate clock CDR with a phase frequency detector. The concept can be extended to 1/8-rate or 1/16-rate architectures, but the circuit complexity increases with the larger number of parallel logic circuits. In a multi-phase clock, phase offsets are problematic. However, they can be minimized by various layout design techniques and a skew calibration scheme, which are discussed in a later section.



Figure 3.12 The sampling phases of CDRs without and with clock rate reduction

With this principle, the operation frequency of phase/frequency detector is reduced. Furthermore, the regenerated data are automatically demultiplexed. Therefore, the additional DEMUX is not required.
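The phase count and the inherent demultiplexing can be illustrated with a short sketch. It assumes that both edges of every clock phase are used for sampling, which matches the phase counts quoted above. The helper names and the bit pattern are hypothetical.

```python
def required_clock_phases(samples_per_ui, rate_divider, use_both_edges=True):
    """Clock phases needed to place samples_per_ui sampling points in each unit interval.

    One period of a 1/rate_divider-rate clock spans rate_divider unit intervals.
    """
    instants = samples_per_ui * rate_divider
    return instants // 2 if use_both_edges else instants


def demultiplex(bits, rate_divider=4):
    """Each parallel sampler keeps every rate_divider-th bit of the serial stream."""
    return [bits[k::rate_divider] for k in range(rate_divider)]


if __name__ == "__main__":
    print("1/4-rate PD :", required_clock_phases(2, 4), "phases")
    print("1/4-rate PFD:", required_clock_phases(4, 4), "phases")
    print("clock for 2.5 Gb/s at 1/4 rate:", 2.5e9 / 4 / 1e6, "MHz")
    print(demultiplex([1, 0, 1, 1, 0, 0, 1, 0]))
```

Because each sampler already delivers one of the parallel streams, no separate DEMUX stage is needed after the detector.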

#### 3.2.1 Comparison of current-mode logic (CML) and CMOS logic



Figure 3.13 Circuit topology comparison between CMOS logic and current-mode logic

The full-rate clock PFD in a CDR has to operate at the incoming data rate, therefore current-mode logic (CML) has to be applied as the high speed digital circuit. Figure 3.13 depicts a circuit topology comparison between CMOS logic and CML. Table 3.1 compares the setup time and clock-to-Q delay of CML and CMOS logic D-FFs in a 0.18µm CMOS technology. The CML D-FF is constructed from two CML latches, shown in Figure 3.13 d). The CMOS logic D-FF is based on the true single-phase-clock D-FF in [22]. Table 3.2 compares CML and CMOS logic clock buffers. They are simulated with two unit loads connected at their outputs.

Table 3.1 Comparison of CML and CMOS logic D-FFs

| Logic style | Setup time | Clk-to-Q delay |
|-------------|------------|----------------|
| CML         | 40ps       | 63ps           |
| CMOS Logic  | 200ps      | 100ps          |

Table 3.2 Comparison of CML and CMOS logic clock buffers

| Clock Buffer | t-delay | Variation of t-delay by supply noise 100mVp-p |
|--------------|---------|-----------------------------------------------|
| CML          | 26ps    | 0.6ps                                         |
| CMOS logic   | 57ps    | 6ps                                           |

The D-FF in CML has a shorter delay time and setup time. The CML clock buffer also has better supply noise rejection, as shown in Table 3.2, because of its differential topology. The D-FF in CMOS logic has a longer delay time and setup time because it has a larger intrinsic input capacitive load from its PMOS transistors. Therefore, CMOS logic is suitable for operation at moderate or low frequencies. The power consumption of a CMOS logic circuit is dominated by dynamic power. It can be calculated by

$$P_{CMOS} = V_{dd}^{2} \cdot C_{eff} \cdot f_{clk}$$
 Eq. 3.7

where $P_{CMOS}$ is the power consumption of a CMOS logic circuit, $V_{dd}$ is the supply voltage, $f_{clk}$ is the clock frequency of the logic circuit, and $C_{eff}$ is the effective output capacitive load of the logic circuit, which is defined by the parasitic capacitance of the interconnection, the input capacitance of the logic units connected to its output, and the transition density of the output signal. By comparison, the power consumption of a CML circuit can be calculated by

$$P_{CML} = \frac{V_{dd} \cdot V_{logic\text{-}swing}}{R_{CML\text{-}load}}$$
 Eq. 3.8

where $P_{CML}$ is the power consumption of a CML circuit, $V_{logic\text{-}swing}$ is the voltage swing of the CML signal, and $R_{CML\text{-}load}$ is the effective resistive pull-up load. The power consumption of the CML circuit is dominated by its tail current, which is set by the logic swing and the pull-up resistive load.
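As a rough numerical illustration of Eq. 3.7 and Eq. 3.8, the following Python sketch evaluates both expressions and the clock frequency at which they are equal. The function names and all component values (supply, load capacitance, swing, load resistance) are illustrative assumptions and are not taken from this design.

```python
def p_cmos(vdd, c_eff, f_clk):
    """Dynamic power of a CMOS logic circuit, Eq. 3.7 (watts)."""
    return vdd ** 2 * c_eff * f_clk


def p_cml(vdd, v_swing, r_load):
    """Static power of a CML circuit, Eq. 3.8 (watts)."""
    return vdd * v_swing / r_load


def crossover_frequency(vdd, c_eff, v_swing, r_load):
    """Clock frequency at which Eq. 3.7 equals Eq. 3.8 (hertz)."""
    return v_swing / (vdd * c_eff * r_load)


# Hypothetical example values, chosen only for illustration.
VDD = 1.8        # supply voltage in volts
C_EFF = 50e-15   # effective load capacitance in farads
V_SWING = 0.4    # CML logic swing in volts
R_LOAD = 2e3     # CML pull-up resistance in ohms

f_x = crossover_frequency(VDD, C_EFF, V_SWING, R_LOAD)
print(f"P_CML = {p_cml(VDD, V_SWING, R_LOAD) * 1e6:.0f} uW")
print(f"CMOS and CML power are equal near {f_x / 1e9:.2f} GHz")
```

Below the crossover frequency the CMOS expression gives the lower power and above it the CML expression does, which is the qualitative behaviour described for the two logic styles.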

Figure 3.14 depicts the power consumption of a CMOS logic circuit and a CML circuit at various operation frequencies. The power consumption of a CMOS logic circuit depends linearly on the operation frequency because current is drawn only when the logic value of the output node changes. The power consumption of a CML circuit is almost constant over frequency and is set by its tail current. At high operation frequencies, a logic unit in CMOS logic consumes more power than one in CML. Moreover, CML can operate reliably at high frequencies because of its shorter delay time. In most designs, the CML part is minimized by using it only where it is necessary. For instance, in a tree-structured Gb/s multiplexer (MUX), the front-end part is in CML while the low data rate part is in CMOS logic [23]. The clock rate reduction technique discussed in the previous section reduces the operation frequency, so the CMOS logic part can be expanded. However, only a small amount of power is saved because the clock rate reduction architecture uses a parallel structure. Nevertheless, clock rate reduction architectures allow the maximum data rate to be achieved for a given technology.



Figure 3.14 Power consumptions of CMOS logic and CML [23]

#### 3.2.2 Comparison of a full-rate and a 1/4-rate clock PFD for CDRs

Figure 3.15 shows the detailed power consumption of a full-rate clock PFD in CML and a 1/4-rate clock PFD in CMOS logic, obtained by simulation at a data rate of 2.5Gbps in 0.18µm CMOS technology. In the full-rate PFD, a large portion of the power is spent on the CML clock buffer in order to guarantee sufficiently short rise and fall times at a high clock frequency. In the 1/4-rate PFD logic circuit, the clock buffers also take a large share of the power consumption because they have to supply a multi-phase clock for the parallel structure. The phase offset compensation circuit for the multi-phase clock consumes additional power. Although the logic circuit in the 1/4-rate PFD is more complicated, its power consumption is not higher because it operates at a lower frequency.



Figure 3.15 Detailed power consumption of CML and CMOS-logic PFDs

The 1/4-rate clock PFD consumes only slightly less power than the full-rate PFD, but the 1/4-rate architecture is inherently a 1-to-4 DEMUX. Therefore, the entire 1-to-4 deserializer consumes less power with the 1/4-rate architecture. In addition, the 1/4-rate architecture can be implemented in a less advanced technology in which the operation frequency is limited. Moreover, if the CDR is designed for wide range operation, a CDR with CMOS logic is more attractive because it consumes less power than one with CML at low data rates.

### 3.3 Power efficient CDR with 1/4-rate reduced-sampling-phase time-interleaving PFD (proposed in this work)

A power efficient CDR with a 1/4-rate PFD is proposed in this work. Its time-interleaving architecture provides an improved frequency capture range, which in turn makes it possible to optimize the number of sampling phases used for frequency detection. This leads to a smaller circuit area and lower power consumption. The phase detection is based on Alexander's phase detector, which operates robustly and provides the data recovery function. The CDR operates at the 1/4-rate clock, so the logic circuit can be implemented in CMOS logic style. For the CDR's loop characteristic, a loop bandwidth reduction technique using a divided frequency impulse modulator and a phase error accumulator is proposed. The monolithic CDR achieves a satisfactory loop bandwidth without an external loop component.

#### 3.3.1 Architecture

The block diagram of the CDR with the reduced-sampling-phase time-interleaving 1/4-rate PFD and its sampling phases are shown in Figure 3.16. The CDR consists of a fully differential 4-phase ring-based VCO, a skew calibration circuit and phase generator, 12 sense-amplifiers, a time-interleaving 1/4-rate PFD, a divided frequency impulse modulator and phase error accumulator (DFIM&PEA), and a charge pump with loop filter components (CP&LF). The skew-calibrated 8-phase clocks, clk-0 to clk-7, are obtained by using both the rising and falling edges from the VCO. The four additional sampling phases, clk-A to clk-D, are generated by delay cells for frequency difference detection. All twelve phases are applied to the 12 sense-amplifiers, which sample the incoming serial data over 4 unit intervals. Clk-0 to clk-7 sample at the middle of the data eye and at the data transitions to provide the phase error. The additional sampling phases, clk-A to clk-D, divide one unit interval into four phase states for phase rotation detection, as in a rotational frequency detector. Only one of every two unit intervals has these additional sampling phases, so the phase state is indicated once per 2 unit intervals. This slightly degrades the frequency capture range. However, with the time-interleaving architecture, it is sufficient for a frequency capture range of 50-100MHz at a 2Gb/s data rate. This sampling phase reduction decreases the complexity of the 1/4-rate parallel architecture, so the power consumption and occupied area are largely reduced.
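The timing of the eight main sampling phases can be illustrated with a short Python sketch. It assumes ideal, equally spaced phases measured relative to clk-0 and assumes that even-numbered phases fall at the eye centre and odd-numbered phases at the data transitions in lock; the function name and the role labels are hypothetical, and the additional phases clk-A to clk-D are not modelled.

```python
def sampling_instants(data_rate_bps, n_phases=8):
    """Return (name, time in seconds, role) for clk-0..clk-7 in one clock period.

    Assumptions: ideal phase spacing, times measured from clk-0, and
    even phases at the eye centre, odd phases at the data transitions.
    """
    ui = 1.0 / data_rate_bps
    spacing = 4.0 * ui / n_phases   # the 1/4-rate clock period spans 4 UI
    instants = []
    for k in range(n_phases):
        role = "eye centre" if k % 2 == 0 else "data transition"
        instants.append((f"clk-{k}", k * spacing, role))
    return instants


for name, t, role in sampling_instants(2e9):
    print(f"{name}: {t * 1e12:7.1f} ps  ({role})")
```

At 2Gb/s the unit interval is 500ps, so under these assumptions adjacent phases are half a unit interval apart.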



Figure 3.16 The block diagram of the power efficient 1/4-rate CDR and its sampling phases

#### 3.3.2 Building blocks

##### 3.3.2.1 Sense Amplifier

Figure 3.17 represents the input stage of the CDR. It consists of sense-amplifiers operating as data sampling units connected in parallel, as in [20]. They are controlled by the multi-phase clock from the VCO. The outputs of the sense-amplifiers are connected to the inputs of latches that hold the sampled data. The schematics and operation of the sense-amplifiers and latches are shown in Figure 3.18. The outputs of the sense-amplifier are precharged to the supply voltage when the clock 'clk' is low. The serial data are evaluated when the clock goes high. The two inverters, each consisting of a PMOS and an NMOS transistor, are connected as a latch in order to regenerate the sampled differential input. The outputs are held even if the incoming data change after sampling. The outputs of the sense-amplifier are used as the set and reset inputs of an SR-latch, so the latch outputs represent the sampled data and stay constant until the next evaluation. With sense-amplifiers as input stages, the CDR locks to the incoming data directly at the interconnection termination. There is no input buffer delay, which makes it easier to determine the interconnection delay time.



Figure 3.17 Input stage of 1/4-rate CDR



Figure 3.18 Schematics and operation of sense-amplifier and latch

##### 3.3.2.2 1/4-rate reduced-sampling-phase time-interleaving PFD

The operation of the synchronous 1/4-rate PFD is depicted in Figure 3.19. The operation of the improved version with the reduced-sampling-phase time-interleaving architecture is represented in Figure 3.20. Both show an example in which the clock frequency is too low. The phase state incidences, which are the data transitions occurring in the different phase states, are shown as numbers in brackets at the top of each figure. They are indicated by edge detectors and retimed by a system clock to create the retimed phase states, shown as numbers in brackets with their corresponding retiming clocks beneath the multi-phase clock. The retimed phase states are used to generate the signals 'Q1', 'Q2', 'F-down-disable', and 'F-up-disable' according to the conditions specified in Table 3.3. As depicted in Figure 3.19, the synchronous 1/4-rate PFD evaluates the phase states every unit interval and retimes the results with a single clock phase, clk-1. In contrast, in the reduced-sampling-phase time-interleaving 1/4-rate PFD, the phase states are evaluated once per two unit intervals but the results are retimed by several clock phases. This time-interleaved retiming improves the phase state resolution, so that a resolution similar to that of a full-rate PFD is obtained. In order to compare the frequency capture ranges of the various PFD architectures, the limitation of the frequency capture range by the VCO tuning range is not considered. The frequency capture range of the PFD is estimated by the maximum frequency of the phase state rotation that the PFD can detect. As shown in Appendix-D, the frequency capture range of the synchronous 1/4-rate PFD can be estimated by

$$\text{FCR-1/4-rate-sync} = \frac{f_{data}}{16} = 0.0625 \cdot f_{data} \quad [Hz]$$
 Eq. 3.9

where FCR-1/4-rate-sync is the frequency capture range of the synchronous 1/4-rate PFD and  $f_{data}$  is the data rate frequency. The frequency capture range of a reduced-sampling-phase time-interleaving 1/4-rate PFD can be estimated by

$$\text{FCR-1/4-RSP-TIL} = \frac{f_{data} \cdot D_T}{4 \cdot N_{PSI}} \quad [Hz]$$
 Eq. 3.10

where FCR-1/4-RSP-TIL is the frequency capture range of the reduced-sampling-phase time-interleaving 1/4-rate PFD, $D_T$ is the data transition density (the number of data transitions per unit interval), and $N_{PSI}$ is the number of unit intervals between two phase state indications.

In this design, $N_{PSI}$ is two. $D_T$ is around 0.6 for serial data with the 8b/10b coding scheme, so Eq. 3.10 gives a frequency capture range of $0.075 \cdot f_{data}$ for the reduced-sampling-phase time-interleaving 1/4-rate PFD, which is comparable to that of the synchronous 1/4-rate PFD in Eq. 3.9. The reduced-sampling-phase time-interleaving 1/4-rate PFD reduces the complexity of the parallel circuit, leading to a smaller area and lower power consumption.
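The two estimates can be compared numerically with a minimal Python sketch of Eq. 3.9 and Eq. 3.10. The function names are hypothetical; the values $D_T = 0.6$ and $N_{PSI} = 2$ are the ones quoted above, and the 2Gb/s data rate is used only as an example operating point.

```python
def fcr_sync_quarter_rate(f_data):
    """Frequency capture range of the synchronous 1/4-rate PFD, Eq. 3.9 (Hz)."""
    return f_data / 16.0


def fcr_rsp_til(f_data, d_t, n_psi):
    """Frequency capture range of the reduced-sampling-phase
    time-interleaving 1/4-rate PFD, Eq. 3.10 (Hz)."""
    return f_data * d_t / (4.0 * n_psi)


f_data = 2e9  # example data rate in Hz
print(f"synchronous 1/4-rate PFD: {fcr_sync_quarter_rate(f_data) / 1e6:.1f} MHz")
print(f"RSP-TIL 1/4-rate PFD:     {fcr_rsp_til(f_data, 0.6, 2) / 1e6:.1f} MHz")
```

Because Eq. 3.10 scales with the transition density, the estimate drops for data patterns with fewer transitions than 8b/10b coded data.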



Figure 3.19 The operation of the synchronous 1/4-rate PFD: clock frequency too low [15]



Figure 3.20 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD,  $f_{clk}$  too low



Figure 3.21 Block diagram of a reduced-sampling-phase time-interleaving 1/4-rate PFD

Figure 3.21 represents the block diagram of the reduced-sampling-phase time-interleaving 1/4-rate PFD. The retiming clock phases have been slightly adapted to suit the phase state incidences better: clk-1 and clk-5 retime the phase incidences PS-3 and PS-4, and clk-2 and clk-6 retime the phase incidences PS-1 and PS-2. All results from the time-interleaving parallel phase state indicators are combined by OR-gates. The internal signals 'Q1', 'Q2', 'F-down-disable', and 'F-up-disable' are generated according to Table 3.3. The phase detector indicates phase errors in a similar way to Alexander's phase detector and generates the 'PS-1-or-2' and 'PS-3-or-4' signals representing clock late and clock early.

**Table 3.3 Operation Table of Frequency Detector** 

| Signal         | Set-condition                | Reset-condition      |
|----------------|------------------------------|----------------------|
| Q1             | F-down-state-4 = '1'         | F-up-state-2 = '1'   |
| Q2             | F-up-state-1 = '1'           | F-down-state-3 = '1' |
| F-down-disable | Rising edge of Q1 & Q2 = '1' | Falling edge of Q1   |
| F-up-disable   | Rising edge of Q2 & Q1 = '1' | Falling edge of Q2   |

If the frequency error is larger than the CDR loop bandwidth, a phase state rotation occurs and the phase detector indicates equal amounts of phase early and phase late. Therefore, equal amounts of the frequency-up signal 'PS-1-or-2' and the frequency-down signal 'PS-3-or-4' are generated, and the CDR cannot track the phase of the serial data. The 'F-down-disable' and 'F-up-disable' signals produce the required imbalance between the frequency up and down signals to compensate the frequency error. The detailed operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD when the clock frequency is too high and too low is depicted in Figure 3.22 and Figure 3.23, respectively. The 'F-down-disable' and 'F-up-disable' signals are gated with the 'PS-1-or-2' and 'PS-3-or-4' signals to generate the 'F-up-12' and 'F-down-34' signals, which drive the clock frequency toward the incoming data rate.
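The set and reset rules of Table 3.3 can be expressed as a small behavioural model. The Python sketch below is only an interpretation of the table: the class name, the one-call-per-evaluation interface, the choice that reset wins over set, and the choice to read the other latch with its value from before the current step are all assumptions, and the gating with the phase detector outputs is not modelled.

```python
class FrequencyDetectorModel:
    """Behavioural sketch of the set/reset rules in Table 3.3."""

    def __init__(self):
        self.q1 = False
        self.q2 = False
        self.f_down_disable = False
        self.f_up_disable = False

    def step(self, f_down_state_4=False, f_up_state_2=False,
             f_up_state_1=False, f_down_state_3=False):
        """Apply one retimed phase-state evaluation; return both disable flags."""
        prev_q1, prev_q2 = self.q1, self.q2

        # Q1 and Q2 behave as set/reset latches.
        # Assumption: if set and reset arrive together, reset wins.
        if f_down_state_4:
            self.q1 = True
        if f_up_state_2:
            self.q1 = False
        if f_up_state_1:
            self.q2 = True
        if f_down_state_3:
            self.q2 = False

        # Assumption: the other latch is read with its value from before this step.
        if self.q1 and not prev_q1 and prev_q2:   # rising edge of Q1 while Q2 = '1'
            self.f_down_disable = True
        if prev_q1 and not self.q1:               # falling edge of Q1
            self.f_down_disable = False
        if self.q2 and not prev_q2 and prev_q1:   # rising edge of Q2 while Q1 = '1'
            self.f_up_disable = True
        if prev_q2 and not self.q2:               # falling edge of Q2
            self.f_up_disable = False

        return self.f_down_disable, self.f_up_disable


fd = FrequencyDetectorModel()
fd.step(f_up_state_1=True)             # sets Q2
print(fd.step(f_down_state_4=True))    # Q1 rises while Q2 is set
print(fd.step(f_up_state_2=True))      # Q1 falls again
```

In this model a disable flag is raised only when the two latches are set in a particular order, which is how the order of the phase state incidences, and hence the direction of the phase state rotation, is distinguished.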



Figure 3.22 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD,  $f_{clk}$  too high



Figure 3.23 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD,  $f_{clk}$  too low

##### 3.3.2.3 Voltage controlled oscillator (VCO)

A ring-based VCO inherently generates the multi-phase clock required for a 1/4-rate CDR. It provides a moderate jitter quality that is sufficient for the data recovery function. Therefore, it is suitable for the 1/4-rate architecture.

###### 3.3.2.3.1 Ring-based VCO for 1/4-rate CDR

Figure 3.24 depicts the circuit of the ring-based VCO. A differential delay unit is used because it has good supply noise rejection. The voltage swing is controlled by a replica bias circuit [25]. The VCO gain is reduced to decrease the output clock jitter, and the overall frequency range is extended through digital control bits, which select the frequency range by switching the resistive loads in the delay units. The control voltage is converted to a current by a V-I converter in order to linearize the tuning characteristic. In order to reduce the phase offset in the multi-phase clock, matching layout techniques, i.e. balanced interconnection delays, common centroid topology, and dummy cells, are applied.



Figure 3.24 Ring-based VCO

The VCO is optimized to reduce both device noise and interference noise. Device noise is the intrinsic noise of electronic components, such as thermal noise and flicker noise in transistors. The jitter caused by white noise sources is analyzed in [26]. The impulse sensitivity function of an oscillator is applied in [27]. The phase noise in oscillators is well summarized in [28].

The noise sources in a differential delay unit of a ring-based oscillator are depicted in Figure 3.25. The phase noise from the white noise sources is calculated in [28] by

$$\mathcal{L}_{wn}(f_{offset}) = \frac{2 \cdot kT}{I \cdot \ln 2} \left[ \gamma \cdot \left( \frac{3}{4 \cdot V_{effd}} + \frac{1}{V_{efft}} \right) + \frac{1}{V_{op}} \right] \cdot \left( \frac{f_{osc}}{f_{offset}} \right)^{2} \quad \text{[dBc]}$$
 Eq. 3.11

where $\mathcal{L}_{wn}$ is the phase noise contributed by the white noise, k is Boltzmann's constant, T is the absolute temperature, $\gamma$ is the noise factor of the MOS transistors, I is the tail current of a delay unit, $V_{effd}$ is the effective gate-source voltage of the switching transistor, $V_{efft}$ is the effective gate-source voltage of the tail current transistor, $V_{op}$ is the differential output voltage, $f_{osc}$ is the oscillation frequency, and $f_{offset}$ is the frequency offset of the noise from the oscillation frequency.
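To show how Eq. 3.11 behaves, the following Python sketch evaluates the expression and converts the result to decibels. The function name, the default noise factor of 2/3, the temperature of 300 K, and all bias values in the example call are illustrative assumptions and are not the values of this design.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K


def white_noise_phase_noise_dbc(i_tail, v_effd, v_efft, v_op, f_osc, f_offset,
                                gamma=2.0 / 3.0, temp_k=300.0):
    """Evaluate Eq. 3.11 and return the phase noise in dB relative to the carrier.

    gamma and temp_k are assumed defaults, not values from the text.
    """
    bracket = gamma * (3.0 / (4.0 * v_effd) + 1.0 / v_efft) + 1.0 / v_op
    ratio = (2.0 * K_BOLTZMANN * temp_k / (i_tail * math.log(2.0))
             * bracket * (f_osc / f_offset) ** 2)
    return 10.0 * math.log10(ratio)


# Hypothetical operating point: 1 mA tail current, 0.3 V effective voltages,
# 0.8 V differential swing, 500 MHz oscillation frequency.
for f_off in (1e5, 1e6, 1e7):
    value = white_noise_phase_noise_dbc(1e-3, 0.3, 0.3, 0.8, 500e6, f_off)
    print(f"offset {f_off:10.0f} Hz: {value:7.1f} dBc")
```

The expression falls by 20 dB per decade of offset frequency and by about 3 dB for each doubling of the tail current, which reflects the usual trade-off between power and white-noise-induced phase noise.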

The main contribution of the phase noise from the flicker noise in the transistor is at the current mirror circuit. It is calculated by [28]

$$\mathcal{L}_{fn}(f_{offset}) = \frac{A \cdot K_f}{W \cdot L \cdot C_{ox}} \left[ \frac{1}{V_{eff}^2} \right] \cdot \frac{f_{osc}^2}{f_{offset}^3} \quad \text{[dBc]}$$
 Eq. 3.12

where $\mathcal{L}_{fn}$ is the phase noise contributed by the flicker noise, A is the multiplication factor of the current mirror circuit, $K_f$ is the empirical coefficient, W is the width of the tail current transistor, L is the length of the tail current transistor, and $C_{ox}$ is the gate capacitance per unit area.



Figure 3.25 Device noises in ring-based VCO

The phase noise contributed by device noise can be predicted by Eq. 3.11 and Eq. 3.12. It can also be simulated using SpectreRF. Figure 3.26 shows the calculated and simulated phase noise of a 500MHz clock at frequency offsets from 10kHz to 10MHz; the two are in agreement. In this design, a VCO phase noise of -75dBc at a 10kHz frequency offset is obtained.



Figure 3.26 Calculation and simulation of VCO phase noise at 500 MHz

Phase noise describes jitter in the frequency domain. Therefore, the cycle-to-cycle jitter of the oscillator clock can be calculated from its phase noise, as shown in [4][26], by

$$\phi_{\tau}^2 = \frac{\mathcal{L}(f_{offset}) \cdot f_{offset}^2}{f_{osc}^3}$$
 Eq. 3.13

where $\phi_{\tau}$ is the root-mean-square cycle-to-cycle jitter, $f_{offset}$ is the frequency offset of the phase noise from the oscillation frequency, $f_{osc}$ is the oscillation frequency, and $\mathcal{L}(f_{offset})$ is the phase noise at $f_{offset}$. The contribution of the device noise in this oscillator to the CDR output clock is calculated by integrating the VCO phase noise at frequencies higher than the CDR loop bandwidth, where the VCO phase noise dominates. If this VCO is applied in a CDR with a loop bandwidth of 2MHz, the VCO phase noise contributes approximately 2.8ps rms to the CDR clock output. The supply voltage dependence of the delay units contributes a large part of the jitter in a ring-based VCO. This jitter depends directly on the number of delay stages, so the number of delay stages has to be minimized in order to reduce it. The supply voltage can be considered as a second control node of the VCO. For that reason, the supply voltage has to be well regulated or filtered by a voltage regulator or a decoupling capacitor. Using a differential circuit improves the supply noise rejection of the VCO.
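Eq. 3.13 can be evaluated with a few lines of Python. The sketch below assumes that the phase noise is supplied in dB relative to the carrier and has to be converted to a linear ratio first, and that the chosen offset lies in the region where the phase noise falls with the square of the offset frequency; the function name is hypothetical.

```python
import math


def cycle_jitter_rms(phase_noise_dbc, f_offset, f_osc):
    """RMS cycle-to-cycle jitter in seconds from Eq. 3.13.

    Assumption: phase_noise_dbc is given in dB and is taken from the
    region dominated by white noise (slope of -20 dB per decade).
    """
    l_linear = 10.0 ** (phase_noise_dbc / 10.0)
    return math.sqrt(l_linear * f_offset ** 2 / f_osc ** 3)


# Example with the figures quoted for this VCO: -75 dBc at 10 kHz, 500 MHz clock.
jitter = cycle_jitter_rms(-75.0, 10e3, 500e6)
print(f"cycle-to-cycle jitter: {jitter * 1e12:.3f} ps rms")
```

This per-cycle figure is a different quantity from the integrated contribution of approximately 2.8ps rms quoted above, which accumulates the phase noise over all offsets beyond the loop bandwidth. If the 10kHz point lies in the flicker-noise region, the result should be read as an upper estimate of the white-noise part only.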

In this design, the VCO frequency range is divided into several digitally selectable frequency ranges. The digital control bits could be adjusted automatically by a digital finite state machine and a VCO control node tracking circuit, as reported in [29], but this is not implemented in this thesis.

###### 3.3.2.3.2 Phase offset compensation and additional sampling phase generation

In order to further reduce the phase offsets, a skew calibration, as described in [30], is utilized. The block diagram is shown in Figure 3.27 a). Each clock phase has its own delay-locked loop (DLL) to calibrate its phase position. The phase control hierarchy is shown in Figure 3.27 b). Clk-4 is calibrated to be in the middle of two successive clk-0 edges. Clk-2 is adjusted to be in the middle of clk-0 and clk-4. Similarly, clk-6 is controlled to be in the middle of clk-4 and the next clk-0. The remaining clock phases are controlled in the same way.
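The midpoint hierarchy can be illustrated by computing the ideal target position of each phase. The Python sketch below is a purely arithmetic model, not a model of the DLLs themselves; the function name is hypothetical, and the treatment of the odd-numbered phases as midpoints of their even-numbered neighbours is an assumption based on the statement that the remaining phases are controlled in the same way.

```python
def calibration_targets(period):
    """Ideal positions of clk-0..clk-7 within one clock period,
    found by repeatedly taking midpoints of already placed phases."""
    targets = {0: 0.0, 8: period}   # index 8 stands for the next clk-0
    step = 8
    while step > 1:
        half = step // 2
        for left in range(0, 8, step):
            # Place the phase halfway between its two reference phases.
            targets[left + half] = 0.5 * (targets[left] + targets[left + step])
        step = half
    return [targets[k] for k in range(8)]


# Example: 500 MHz clock, period of 2 ns.
for k, t in enumerate(calibration_targets(2e-9)):
    print(f"clk-{k}: {t * 1e12:7.1f} ps")
```

The first pass places clk-4, the second places clk-2 and clk-6, and the last pass places the odd phases, so every phase is referenced to phases that have already been calibrated.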



Figure 3.27 A skew calibration scheme

The additional sampling clock phases for frequency detection, clk-A, clk-B, clk-C, and clk-D, are generated by simple delay units as shown in Figure 3.28. The delay cells shown in dashed lines serve as dummy loads for clock load balancing. A delay time of 1/4 of the UI provides the maximum frequency detection sensitivity, as derived in Appendix-E. Such sampling phases can be obtained inherently from a quadrature-phase clock. Using the sampling phase generation circuit instead of a quadrature-phase clock slightly degrades the frequency detection sensitivity, leading to a longer frequency acquisition time. However, this is not critical for this CDR. Without the need for a quadrature-phase clock, the VCO circuit is less complicated and the number of additional sampling phases can be optimized. The delay time of the delay unit is chosen to be around 1/4 of the UI at the maximum data rate. In this design, the delay time is 100ps.



Figure 3.28 The sampling phase generator and the skew calibration circuit

##### 3.3.2.4 Charge pump



Figure 3.29 Phase offset induced by charge offset or current mismatch

A tri-state charge pump with pull-up and pull-down current sources and current switches is applied because it can operate with small impulse inputs. Figure 3.29 represents the effect of charge offset and current mismatch in a charge pump. Figure 3.29 a) shows an example of the charge pump input signals, its output currents, and the VCO control node in the locked condition without charge offset or current mismatch. The up and down signals are set to occur simultaneously for simplicity. The operation of the charge pump with current mismatch is shown in Figure 3.29 b). In the locked condition, the average voltage of the VCO control node is constant. The current mismatch is therefore compensated by a phase offset that delivers the same amount of charge as the charge offset $Q_{offset}$, as depicted in Figure 3.29 b). The phase offset can be calculated from

$$t_{offset} = \frac{Q_{offset}}{I_{cp}}$$
 Eq. 3.14

where $t_{offset}$ is the time offset caused by the charge offset, $Q_{offset}$ is the charge offset, and $I_{cp}$ is the charge pump current. Eq. 3.14 shows that the charge pump current cannot be made too small when it is decreased in order to reduce the CDR loop bandwidth, because the time offset grows as the current shrinks. The phase offset causes impulse patterns at the VCO control node and corresponding spurs in the power spectrum of the output clock. However, the charge offset can be diminished through the techniques described in the following section.
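The inverse dependence in Eq. 3.14 can be made concrete with a short Python sketch. The charge offset of 1 fC and the charge pump currents are hypothetical values chosen only to show the trend, and the function name is an assumption.

```python
def phase_offset_time(q_offset, i_cp):
    """Static time offset in seconds caused by a charge offset, Eq. 3.14."""
    return q_offset / i_cp


Q_OFFSET = 1e-15  # hypothetical charge offset of 1 fC

for i_cp in (100e-6, 50e-6, 10e-6):
    t_off = phase_offset_time(Q_OFFSET, i_cp)
    print(f"I_cp = {i_cp * 1e6:5.0f} uA -> t_offset = {t_off * 1e12:6.1f} ps")
```

With these assumed numbers, reducing the charge pump current by a factor of ten raises the time offset by the same factor, which is why lowering the loop bandwidth through the charge pump current alone is unattractive.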

###### 3.3.2.4.1 Phase offset reduction



Figure 3.30 Charge pump circuit

A tri-state charge pump and its charge offset compensation circuit are shown in Figure 3.30. The offset charge is caused by charge coupling from the control signals of the current switches, 'Up-p' and 'Down-n'. It also results from charge sharing with the parasitic capacitors of the current sources, $C_{csp}$ and $C_{csn}$. This charge sharing can be reduced by forcing the voltages on $C_{csp}$ and $C_{csn}$ to follow the VCO control node [31]. This is realized by the operational amplifier Op-1, which forces the node $V_x$ to follow the VCO control node. During the inactive period when 'Up-p' and 'Down-p' are 'low', both current sources are connected to the node $V_x$, so their voltages also track the charge pump output (the VCO control node). A phase offset can also be induced by a mismatch between the pull-up and pull-down currents. Because of the finite output impedances of the pull-up and pull-down current sources, this mismatch increases if the VCO control node is far from the intrinsic common-mode voltage of the charge pump. It can be reduced by applying the adjustable current sources, depicted in Figure 3.30 as $I_{cpp}$ and $I_{cpn}$. They amount to around 10 percent of the pull-up and pull-down currents and are adjusted by tracking the voltage of the charge pump output. If the voltage of the VCO control node is higher than the intrinsic common-mode voltage of the charge pump, the pull-up current source tends to supply less current than the pull-down current source. Therefore, $I_{cpp}$ is increased while $I_{cpn}$ is decreased to reduce the current mismatch. The intrinsic common-mode voltage of the charge pump is generated by the replica circuit shown on the left of Figure 3.30.

#### 3.2.3 Loop bandwidth reduction technique in CDR using the divided frequency impulse modulation technique (proposed in this thesis)



b) Block diagram of PLL-based CDR with divided frequency impulse modulation

Figure 3.31 PLL-Based CDRs without and with the divided frequency impulse modulation

In a PLL-based clock synthesizer, there is a frequency divider in the feedback path. It reduces the open-loop unity-gain frequency and, hence, the loop bandwidth. Unfortunately, no such divider is available in a conventional PLL-based CDR. The divided frequency impulse modulation technique proposed in this thesis provides a comparable dividing factor for loop bandwidth adjustment. Figure 3.31 compares the conventional PLL-based CDR with the PLL-based CDR with divided frequency impulse modulation. The additional components are a divided frequency impulse modulator and a phase error accumulator (DFIM&PEA). The operation of the divided frequency impulse modulation, for an example with a dividing factor N of 2, is shown in Figure 3.32. The charge pump is activated by the phase and frequency error signals gated by generated impulses of constant duration. The modulation frequency is lower than the VCO frequency by a factor of N. The phase error accumulator collects the phase errors between the activating impulses and uses a majority vote to decide the output status, i.e. frequency up, frequency down, or no action. This reduces the average current of the charge pump without decreasing the charge pump current itself, which would induce a larger phase offset. It helps not only to decrease the CDR loop bandwidth but also to reduce the activity of the charge pump and the noise at the VCO control node and, hence, the jitter of the output clock. With the divided frequency impulse modulation technique, a satisfactory loop bandwidth can be achieved without an external loop component.
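The accumulate-and-vote behaviour can be sketched in Python as follows. This is a simplified behavioural interpretation, not the implemented circuit: the per-cycle decisions are encoded as +1 (frequency up), -1 (frequency down), and 0 (no action), the majority vote is assumed to be the sign of their sum, a tie is assumed to produce no action, and the function name is hypothetical.

```python
def dfim_pea(decisions, n):
    """Group per-cycle PFD decisions into blocks of n clock cycles and
    return one majority-vote output per activating impulse.

    decisions: sequence of +1 (up), -1 (down), or 0 (no action).
    Assumptions: the vote is the sign of the block sum, a tie gives
    no action, and an incomplete trailing block is ignored.
    """
    outputs = []
    for start in range(0, len(decisions) - n + 1, n):
        total = sum(decisions[start:start + n])
        outputs.append((total > 0) - (total < 0))
    return outputs


per_cycle = [1, 1, -1, 0, -1, -1, 0, 0, 1, -1, 0, 0]
print(dfim_pea(per_cycle, 4))
```

In this model the charge pump receives at most one impulse per n clock cycles, which reflects the reduction in charge pump activity described above.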



Figure 3.32 The operation of the divided frequency impulse modulation

Figure 3.33 represents the linear model of this 1/4-rate binary CDR with divided frequency impulse modulation. The average charge pump current can be calculated by

$$I_{cp-av} = \frac{I_{cp} \cdot P}{4 \cdot UI \cdot N}$$
 Eq. 3.15

where $I_{cp-av}$ is the average charge pump current with the divided frequency impulse modulation, $I_{cp}$ is the charge pump current injected into the loop filter during the activating duration, N is the dividing factor, P is the impulse duration, and UI is the unit interval. As discussed in section 3.1.3.2.3, the phase detector gain of a digital phase detector, $K_{PD}$, is constant for a certain incoming jitter probability density function (PDF) and is a function of the average charge pump current, $I_{cp-av}$. Hence, $K_{PD}$ can be determined by using Eq. 3.6 and applied in a PLL linear model for the loop characteristic design.
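A minimal Python sketch of Eq. 3.15 shows how the dividing factor scales the average current. The charge pump current, impulse duration, and unit interval in the example are hypothetical values, and the function name is an assumption.

```python
def average_cp_current(i_cp, pulse_s, ui_s, n):
    """Average charge pump current with divided frequency impulse
    modulation, Eq. 3.15 (amperes)."""
    return i_cp * pulse_s / (4.0 * ui_s * n)


I_CP = 20e-6      # hypothetical charge pump current in amperes
PULSE = 500e-12   # hypothetical impulse duration in seconds
UI = 500e-12      # unit interval at 2 Gb/s in seconds

for n in (1, 2, 4, 8):
    i_av = average_cp_current(I_CP, PULSE, UI, n)
    print(f"N = {n}: average current = {i_av * 1e6:.3f} uA")
```

Doubling N halves the average current while the instantaneous current $I_{cp}$, and therefore the time offset of Eq. 3.14, stays unchanged.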



Figure 3.33 A linear model of the 1/4-rate CDR with divided frequency impulse modulation

#### 3.2.4 Loop bandwidth design for jitter tolerance

As derived in [32], the jitter tolerance of the CDR with linear phase detector can be described in the following function.

$$JTOL(s) = \frac{1}{1 - JTRAN(s)}$$
 Eq. 3.16

where JTOL(s) determines the shape of the jitter tolerance and JTRAN(s) is the jitter transfer function obtained from the PLL linear model. If the jitter transfer function is simplified to an asymptotic single-pole low-pass transfer function, as shown in [32], it can be written as

$$JTRAN(s) = \frac{\omega_{-3dB}}{s + \omega_{-3dB}}$$
 Eq. 3.17

where $\omega_{-3dB}$ is the corner frequency of the jitter transfer function or the loop bandwidth. Eq. 3.16 can be rewritten as

$$JTOL(s) = \frac{s + \omega_{-3dB}}{s}$$
 Eq. 3.18

The jitter tolerance of a CDR can be estimated from its loop bandwidth because both share the same corner frequency, as shown in Figure 2.2. At jitter frequencies higher than the corner frequency, the CDR can tolerate only small jitter amplitudes. At jitter frequencies lower than the corner frequency, the CDR can tolerate larger jitter amplitudes, and the tolerated amplitude increases as the jitter frequency decreases.
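The shape described by Eq. 3.18 can be evaluated on the frequency axis by substituting $s = j 2 \pi f$, which gives the magnitude $\sqrt{1 + (f_{-3dB}/f)^2}$. The Python sketch below computes this normalized shape only; the function name and the 1MHz corner frequency are illustrative assumptions, and the absolute tolerance in UI depends on further factors that are not modelled here.

```python
import math


def jtol_magnitude(f_jitter, f_corner):
    """Magnitude of Eq. 3.18 at s = j*2*pi*f_jitter (dimensionless shape)."""
    return math.sqrt(1.0 + (f_corner / f_jitter) ** 2)


F_CORNER = 1e6  # assumed loop bandwidth of 1 MHz

for f in (1e4, 1e5, 1e6, 1e7, 1e8):
    mag = jtol_magnitude(f, F_CORNER)
    print(f"{f:12.0f} Hz: {20.0 * math.log10(mag):6.1f} dB")
```

Well below the corner frequency the magnitude rises by 20 dB for each decade of decreasing jitter frequency, and well above it the magnitude flattens toward 0 dB, in line with the behaviour described for a CDR with a linear phase detector.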





b) Jitter tolerances for various peak-to-peak jitter amplitudes

Figure 3.34 Jitter transfer functions and Jitter tolerance of binary phase detector

The gain of a binary phase detector is not constant. It depends on the incoming peak-to-peak jitter. Thus, it is more complicated to design a CDR with a binary phase detector to comply with a jitter tolerance specification. Figure 3.34 a) represents the jitter transfer functions of the CDR with a binary phase detector for various input peak-to-peak jitter amplitudes (jitter P-P). As shown in Eq. 3.6, the gain of the binary phase detector decreases as the incoming peak-to-peak jitter becomes larger. As shown in Figure 3.34 b), the actual jitter tolerance of a CDR with a digital phase detector can be plotted with the aid of presumed jitter tolerances for various incoming peak-to-peak jitter amplitudes. The constant jitter transfer function of the CDR at a fixed incoming jitter amplitude is used to draw one presumed jitter tolerance, shown as a dashed line in Figure 3.34 b). In this way, the presumed jitter tolerances for various incoming peak-to-peak jitter amplitudes are obtained. The actual jitter tolerance is drawn through the crossing points between each presumed jitter tolerance plot and its own peak-to-peak jitter amplitude. For instance, the presumed jitter tolerance for an incoming peak-to-peak jitter of 0.6UI crosses the 0.6UI line, marked on the Y axis, at a frequency of 2.4MHz. The corresponding frequency in the actual jitter tolerance plot for an incoming peak-to-peak jitter of 0.7UI is 1.8MHz. In this way, the actual jitter tolerance of the CDR with a digital phase detector can be predicted, and the CDR's loop characteristic can be designed to comply with a jitter tolerance mask. The slope of the jitter tolerance of the CDR with a binary phase detector is flatter than that of the CDR with a linear phase detector.

#### 3.2.5 Simulation results



Figure 3.35 Simulation of CDR approaching lock

The CDR is simulated at several design levels. The building blocks of the 1/4-rate CDR are simulated at transistor level for performance optimization and characterization. The loop characteristic is designed by simulating the PLL linear model using the loop parameters from the transistor level simulations of the building blocks. The PFD logic circuit is first verified with AHDL building blocks in order to obtain a reasonably short simulation time; the transistor level simulation is done afterwards. Figure 3.35 shows the simulation results of the CDR approaching lock. The phase state rotation was observed by the PFD. In this simulation, the VCO frequency was too low at the beginning. The 'F-down-disable' signal, shown in Figure 3.35 as '/DISABLE\_DOWN', was generated to drive the frequency toward lock and disappeared when frequency acquisition was achieved.



Figure 3.36 Eye diagram of the incoming serial data and the recovered clocks.

Figure 3.36 shows eye diagrams of the incoming serial data and the multi-phase clock in the locked condition. Clk-0 and clk-2 sample at the middle of the data eye while clk-1 samples at the data edges. The jitter transfer function was simulated using the PLL linear model, as represented in Figure 3.37. The loop bandwidth was designed to be 1 MHz with an incoming peak-to-peak jitter of 0.3 UI.



Figure 3.37 The simulated CDR loop bandwidth

#### 3.2.6 Experiment results

The CDR with the reduced-sampling-phase time-interleaving 1/4-rate PFD was implemented in 0.18µm CMOS technology. The die photograph is depicted in Figure 3.38. The CDR covers data rates from 1 to 2Gb/s with a configurable loop bandwidth from 700kHz to 4MHz without an external loop component. The frequency capture range is larger than 100 MHz. It consumes 80mW from a 1.8V supply. The CDR occupies an area of 0.47 mm<sup>2</sup>, plus an additional 0.32 mm<sup>2</sup> for the on-chip loop capacitors. Compared with the previous CDR with the synchronous 1/4-rate PFD in [15], the techniques in this work reduce the CDR core circuit area by 30% and its power consumption by 40%. In this design, more power is spent on the VCO; nevertheless, the overall power consumption is reduced by 20%. Jitter is reduced by 30% because of the divided frequency impulse modulation technique and the improved VCO. The performance is summarized in Table 3.4. Figure 3.39 a) depicts the 4-bit output of the DEMUX and the recovered clock. The jitter histogram of the corresponding recovered clock is shown in Figure 3.39 b). The clock recovered from 2Gb/s PRBS 2<sup>7</sup>-1 serial data has a jitter of 4.6ps rms. The CDR was also tested with PRBS 2<sup>7</sup>-1 serial data with ISI emulated by 12 meters of RG-58 cable. The jitter histograms of the incoming serial data and the corresponding recovered clock are depicted in Figure 3.40. The incoming serial data have a jitter of 150ps P-P, and the recovered clock has a jitter of 6.9ps rms. The jitter tolerance of the 1/4-rate CDR tested at a data rate of 2Gb/s is depicted in Figure 3.41 in comparison with the jitter tolerance mask of the SONET OC-48 specification. The CDR complies with the specification. The slope of the jitter tolerance at jitter frequencies inside the loop bandwidth is flatter than -20dB/decade, as predicted in Section 3.2.4.



Figure 3.38 The die photograph of the 1/4-rate CDR

Table 3.4 Performance summary of the 1/4-rate CDR

| Parameter           | Value                             |
|---------------------|-----------------------------------|
| Technology          | 0.18μm CMOS                       |
| Active area         | CDR: 0.47 mm<sup>2</sup>          |
|                     | Loop filter: 0.32 mm<sup>2</sup>  |
| Power consumption   | 80 mW at Vdd = 1.8 V              |
| VCO frequency       | 240 - 550 MHz                     |
| Data rate           | 1 - 2.1 Gb/s                      |
| Loop bandwidth      | 700 kHz - 4 MHz                   |
| Freq. capture range | > 100 MHz                         |



Figure 3.39 DEMUX output and the jitter histogram of the recovered clock



Figure 3.40 The 1/4-rate CDR testing with serial data with ISI



Figure 3.41 The jitter tolerance of the 1/4-rate CDR at 2Gb/s

# 3.4 Summary

The clock and data recovery circuit for large synchronous networks has to operate without an external reference clock. Therefore, a frequency detector has to be applied in order to extend the CDR frequency capture range. The Alexander phase detector is selected because it is suitable for a CDR with sense-amplifiers as input stages and has an inherent data recovery function. By using the clock rate reduction architecture, data demultiplexing is automatically obtained and the PFD logic can be implemented in a low power CMOS logic style. The power consumption of the CMOS logic circuit also scales with the data rate, which is an advantage for wide range operation. By applying the time-interleaving architecture in the 1/4-rate CDR, the number of additional sampling phases for frequency error detection can be reduced. The 1/4-rate CDR with reduced-sampling-phase time-interleaving PFD is proposed in this thesis and was implemented in 0.18µm CMOS technology. It covers data rates from 1 to 2Gb/s and can operate without an external reference clock. With the proposed loop bandwidth reduction technique, divided frequency impulse modulation, the CDR has a configurable loop bandwidth from 700kHz to 4MHz without the need for an external loop component. It consumes 80mW from a 1.8V supply and has a recovered clock with 4.6ps rms jitter.

# Chapter 4

# Design of PLL-based clock-jitter-filter

# 4.1 State of the art

PLL-based clock synthesizers are widely used in timing circuits. The loop characteristic can be designed through the parameters of its building blocks. As discussed in Section 2.4.3, the loop filter in a PLL can be optimized to reduce either the jitter from the reference clock or the VCO phase noise. For the clock-jitter-filter application, the input jitter has to be suppressed by a small loop bandwidth. This causes a long loop acquisition time, which is not critical in this application. With a small loop bandwidth, the VCO phase noise dominates the jitter quality of the output clock. For this reason, a low jitter VCO has to be applied. In Section 4.1.1, one approach to a clock-jitter-filter, the quartz crystal based phase-locked loop, is discussed.

## 4.1.1 Quartz crystal based Phase-Locked Loop (QPLL)

The quartz crystal based PLL utilizes the narrow spectrum of a quartz crystal for clock generation. The structure is similar to a conventional PLL-based clock synthesizer, but the VCO of the QPLL is a voltage-controlled crystal oscillator (VCXO). An extremely small clock jitter can be obtained by using a high performance separate VCXO and a small loop bandwidth with off-chip loop components [33]. Another approach, which has fewer off-chip components, utilizes a simple quartz crystal and an on-chip varactor for the phase and frequency adjustment. Figure 4.1 a) shows the architecture of this QPLL [34]. The frequency is adjusted by its capacitive load, as depicted in Figure 4.1 b). The frequency ranges of the quartz oscillator are digitally calibrated. The on-chip varactor is applied for analog frequency adjustment. This QPLL can provide good clock jitter filtering. However, it is not a monolithic solution because the off-chip quartz crystal is required. Furthermore, it can only generate clocks in the middle frequency range between 100 and 200MHz. The high frequency clock for data serialization in a serial data transmitter has to be generated by a separate clock synthesizer.



Figure 4.1 Quartz oscillator phase-locked loop (QPLL) [34]

# 4.2 PLL-based clock-jitter-filter with LC-VCO

A clock synthesizer for a serial data transceiver generates a high frequency clock for data serialization. It can also be used as a clock-jitter-filter if a low jitter VCO is available. Nowadays, an inductor can be integrated by using multiple metal layers or an extra thick top metal layer, available in technologies with RF option. This reduces the resistive loss of the coil and hence increases its quality factor, which makes a monolithic solution possible. The PLL-based clock-jitter-filter with LC-VCO is more efficient because it combines both the jitter-filter function and high frequency clock synthesis. The jitter performance may not reach that of a system using an additional off-chip clock jitter filter and a frequency synthesizer with on-chip LC-VCO, but the clock-jitter-filter designed in this work is still a good compromise. By careful circuit design, the clock-jitter-filter with on-chip LC-VCO can have low jitter generation and can provide a reference clock with sufficiently small jitter.

## 4.2.1 Architecture

The block diagram of the PLL-based clock-jitter-filter with LC-VCO is shown in Figure 4.2. The PLL core consists of a PFD, a charge pump, a loop filter, a VCO, and a frequency divider. It operates as a clock synthesizer with a configurable frequency multiplying factor. At the clock input, a configurable frequency divider is also applied in order to make the PLL input frequency match the divided clock frequency. In the feedback path, another configurable frequency divider is applied to divide the high frequency clock output from the VCO; therefore, the output clock frequency can be independently selected. The design considerations of the building blocks and the loop characteristic are discussed in the following sections.



Figure 4.2 The block diagram of PLL-based clock-jitter-filter with LC-VCO

## 4.2.2 Building blocks

## 4.2.2.1 Phase frequency detector



Figure 4.3 Phase frequency detector for clocks

A phase frequency detector for a clock synthesizer is shown in Figure 4.3. The operations of the PFD are depicted on the right. If the VCO clock is late, the PFD generates the UP impulse from the rising edge of the reference clock. The rising edge of the VCO clock is used to generate the DN impulse and to reset both impulses after a constant delay time. The difference between the UP and the DN impulse durations represents the phase delay between the two clocks. The PFD operates in the opposite way if the VCO clock is early. In the lock condition, the PFD generates UP and DN impulses of equal duration. A sufficiently long impulse is necessary in order to avoid the dead-zone when the phase error approaches zero. The dead-zone is defined by the shortest impulse that can be generated by the logic circuits. Figure 4.4 represents the PFD characteristic curve and its operations when the clock frequencies are different. If there is a frequency difference, for instance, if the VCO clock frequency is too low, rising edges of the reference clock occur more often. Therefore, the UP impulses have a longer duration than the DN impulses, as shown in Figure 4.4 b). The PFD operates in the opposite way if the VCO clock frequency is too high, as depicted in Figure 4.4 c).



Figure 4.4 The characteristic curve of PFD and its operations

When a PFD is combined with a charge pump, the average output current depends linearly on the phase error. The PFD gain can be calculated from the slope of its characteristic curve as

$$K_{linear-PFD} = \frac{I_{CP}}{2\pi} \quad \left[ \frac{A}{rad} \right]$$
 Eq. 4.1

where  $K_{linear-PFD}$  is the PFD gain, and  $I_{CP}$  is the charge pump current.
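The PFD operation and Eq. 4.1 can be illustrated with a small behavioral sketch. It models a single pair of rising edges, assumes ideal logic apart from the constant reset delay, and checks that the average charge pump current equals the gain of Eq. 4.1 times the phase error. The function names and the numerical values are hypothetical and are not taken from the design.

```python
import math


def pfd_pulses(t_ref: float, t_vco: float, t_reset: float):
    """Return (up_width, dn_width) for one pair of rising edges.

    UP starts at the reference edge, DN at the VCO edge. Both are
    cleared a constant delay t_reset after the later of the two edges.
    """
    t_clear = max(t_ref, t_vco) + t_reset
    return t_clear - t_ref, t_clear - t_vco


def average_cp_current(t_ref, t_vco, t_reset, period, i_cp):
    """Average charge pump output current over one reference period."""
    up, dn = pfd_pulses(t_ref, t_vco, t_reset)
    return i_cp * (up - dn) / period


def pfd_gain(i_cp: float) -> float:
    """PFD gain of Eq. 4.1 in A/rad."""
    return i_cp / (2 * math.pi)


if __name__ == "__main__":
    period, i_cp = 10.0, 100e-6          # hypothetical values
    t_ref, t_vco, t_reset = 0.0, 1.0, 0.5  # VCO edge is late by 1.0
    phase_error = 2 * math.pi * (t_vco - t_ref) / period
    print(pfd_pulses(t_ref, t_vco, t_reset))
    print(average_cp_current(t_ref, t_vco, t_reset, period, i_cp))
    print(pfd_gain(i_cp) * phase_error)
```

In the lock condition both impulses collapse to the reset delay, which is the minimum impulse length that keeps the detector out of the dead-zone.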

The PFD in a frequency synthesizer operates at the divided clock frequency; therefore, it can be implemented in CMOS logic. However, in order to reduce the supply noise generation and improve the supply/substrate noise rejection, current-mode logic (CML) is used. A CML circuit consumes a constant current and therefore more power than CMOS logic. Nevertheless, the tail current can be reduced to suit the lower operation frequency. The CML circuit has differential signaling; hence, the complementary logic signals for a charge pump are readily obtained. A PFD for a clock synthesizer is less complicated than a PFD for a CDR, so it is a reasonable trade-off to apply CML circuits in the PFD of the clock synthesizer in order to minimize the jitter generation.



Figure 4.5 Current mode logic (CML) circuits

Figure 4.5 shows a CML replicated bias scheme and a CML NAND/AND-gate. The replicated bias circuit consists of a CML circuit connected to constant input logic levels. Its logic low output is controlled to match the logic low reference level by adjusting the bias voltage of the tail current transistor. The bias voltage is applied to the tail current transistors of the other CML logic circuits. With this method, the bias voltage can be precisely generated. The satisfactory voltage swing can be obtained over the variations of process parameters, supply voltage, and temperature.

## 4.2.2.2 LC-VCO

The on-chip spiral inductor is implemented in the extra thick top metal layer in order to reduce the resistive loss and improve the quality factor. Its layout and the equivalent circuit are shown in Figure 4.6.  $R_s$  is the parasitic series resistor of the coil,  $C_{coupling}$  is the parasitic capacitor inside the coil between different turns,  $C_{ox}$  is the parasitic capacitor between the coil and substrate,  $R_{sub}$  is the equivalent resistor for substrate loss, and  $C_{sub}$  is the equivalent capacitor for substrate loss. The equivalent circuit contains the three main loss contributions in an on-chip inductor [7]. They are

- The parasitic series resistor: It is contributed by the parasitic resistance of the metal and the skin effect.
- The coupling capacitor to the substrate: It induces the substrate current and reduces the voltage swing.
- The induced eddy current in the substrate: It causes an opposing magnetic field and thus degrades the coil quality factor.



Figure 4.6 On-chip inductor

The LC-VCO circuit and its equivalent circuit are shown in Figure 4.7. The PMOS and NMOS transistors are applied as negative resistors to compensate the resistive loss of the coil. The current source is omitted to maximize the signal swing and to eliminate the up-conversion of flicker noise from the tail current transistor. The voltage swing of the oscillator is limited between ground and supply voltage, unlike in some topologies that let their VCOs swing beyond the supply voltage. Therefore, this VCO is more reliable and avoids degradation effects [35]. The oscillation frequency of an LC-VCO can be calculated from

$$f_{LC-VCO} = \frac{1}{2\pi\sqrt{LC}}$$
 Eq. 4.2

where  $f_{LC-VCO}$  is the oscillation frequency of an LC-VCO, L is the inductance of the coil, and C is the total capacitance of the varactor and the MOS capacitors of the NMOS and PMOS transistors, which are shown as  $C_{var}$  and  $C_{MOS}$  in Figure 4.7, respectively. The quality factor can be determined by

$$Q_{LC-VCO} = \frac{1}{R} \sqrt{\frac{L}{C}}$$
 Eq. 4.3

where  $Q_{LC-VCO}$  is the quality factor of an LC-VCO and R is the dominant loss of the oscillator, which is the resistive loss of the on-chip coil  $R_{s,L}$ . Following [35], the phase noise of the LC-VCO can be predicted by

$$S_{SBB} = F \cdot \frac{kT}{V_{peak}^2} \cdot \frac{R^3}{4\pi^2 L^2 f_{offset}^2} \quad [dBc]$$
 Eq. 4.4

where  $S_{SBB}$  is the phase noise of the LC-VCO, F is the noise factor described in detail in [36],  $V_{peak}$  is the amplitude of the voltage swing,  $f_{offset}$  is the frequency offset from the carrier, and R is the resistive loss in the LC-VCO. From Eq. 4.3 and Eq. 4.4, it can be concluded that the inductance L has to be selected as high as possible while the total tank capacitance and the series resistance have to be minimized. Furthermore, the coil area has to be minimized, because a small coil has less series resistance and couples less magnetic field into the substrate; hence, the extra resistive loss from the induced substrate current is smaller [37].
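Eq. 4.2 and Eq. 4.3 can be evaluated numerically with a short sketch. The component values used here (2 nH, 2 pF, 4 Ω) are hypothetical and chosen only so that the results land near the 2.5 GHz operating point; they are not the values of the implemented tank. The helper `tank_capacitance` is an inversion of Eq. 4.2 added for illustration. Eq. 4.4 is deliberately left out of the sketch.

```python
import math


def lc_frequency(inductance: float, capacitance: float) -> float:
    """Oscillation frequency of Eq. 4.2 in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def lc_quality(inductance: float, capacitance: float, resistance: float) -> float:
    """Quality factor of Eq. 4.3 for a series loss resistance."""
    return math.sqrt(inductance / capacitance) / resistance


def tank_capacitance(frequency: float, inductance: float) -> float:
    """Total tank capacitance needed for a target frequency (Eq. 4.2 inverted)."""
    return 1.0 / ((2.0 * math.pi * frequency) ** 2 * inductance)


if __name__ == "__main__":
    L, C, R = 2e-9, 2e-12, 4.0  # hypothetical values
    print(f"f = {lc_frequency(L, C) / 1e9:.3f} GHz")
    print(f"Q = {lc_quality(L, C, R):.2f}")
    # Capacitance ratio required to tune from 2.8 GHz down to 2.4 GHz.
    ratio = tank_capacitance(2.4e9, L) / tank_capacitance(2.8e9, L)
    print(f"C(2.4 GHz) / C(2.8 GHz) = {ratio:.3f}")
```

Since the frequency scales with the inverse square root of C, covering 2.4 to 2.8 GHz requires the total tank capacitance to change by a factor of (2.8/2.4)², roughly 1.36, independent of the chosen inductance.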

The transistors for the negative resistances have non-minimum lengths in order to reduce the 1/f noise that contributes to the  $1/f^3$  region of the VCO phase noise [3]. This reduces the VCO tuning range because their MOS capacitances are in parallel with the varactor. However, the total frequency range can be extended by using an additional digital frequency range selection. The widths of the transistors are selected to give a large enough  $g_m$  in order to provide a sufficient negative resistance for oscillation. The ratio of the widths of the PMOS and NMOS transistors is chosen so that the VCO has a common-mode output around half of the supply voltage. This maximizes the output voltage swing and makes the LC-VCO characteristic curve symmetric.



Figure 4.7 LC-VCO circuit



Figure 4.8 NMOS in N-well as Varactor

The varactor is implemented as an accumulation-mode MOS-varactor constructed from n-diffusion areas in an N-well. Its structure and characteristic curve are depicted in Figure 4.8. It has a phase noise performance comparable to a diode-varactor [38]. Unlike the diode-varactor, it carries no risk of a forward-biased p-n junction. It is superior in phase noise performance compared with an inversion-mode MOS-varactor, which is constructed from p-diffusion areas in an N-well [39]. It is suitable for modern technologies with lower supply voltages.

The LC-VCO is designed to cover clock frequencies from 2.4 to 2.8 GHz in 8 configurable frequency ranges. Figure 4.9 shows the VCO characteristic curves for different settings of the digital control bits. The frequency ranges are designed to overlap so that they can be used in the automatic frequency range selection [40].



Figure 4.9 The VCO characteristic curves by different digital control bits

Figure 4.10 shows the phase noise of the LC-VCO obtained from simulation and from the calculation predicted by Eq. 4.4. When the VCO is used in a PLL-based clock synthesizer, its contribution to the output jitter can be calculated by integrating the VCO phase noise outside of the PLL loop bandwidth. If the PLL loop bandwidth is 300kHz, the contribution of this VCO to the output jitter is 0.3 ps rms.
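The integration step can be sketched as follows, assuming that the phase noise falls with a pure 1/f² slope above the loop bandwidth and that the loop removes all VCO noise inside the bandwidth. Both are simplifications, and the reference point of -120 dBc/Hz at 1 MHz offset and the upper integration limit are hypothetical, so the result is not expected to reproduce the 0.3 ps rms quoted above.

```python
import math


def vco_rms_jitter(l_ref_dbc_hz: float, f_ref: float, f_carrier: float,
                   f_low: float, f_high: float) -> float:
    """RMS jitter in seconds from a 1/f^2 phase noise profile.

    The single-sideband noise is modeled as L(f) = L_ref * (f_ref / f)^2
    and integrated in closed form from f_low (taken as the loop
    bandwidth) to f_high. The factor 2 accounts for both sidebands.
    """
    l_ref = 10.0 ** (l_ref_dbc_hz / 10.0)
    phase_variance = 2.0 * l_ref * f_ref ** 2 * (1.0 / f_low - 1.0 / f_high)
    return math.sqrt(phase_variance) / (2.0 * math.pi * f_carrier)


if __name__ == "__main__":
    # Hypothetical profile: -120 dBc/Hz at 1 MHz offset, 2.5 GHz carrier.
    for bandwidth in (100e3, 300e3, 1e6):
        jitter = vco_rms_jitter(-120.0, 1e6, 2.5e9, bandwidth, 100e6)
        print(f"loop bandwidth {bandwidth / 1e3:6.0f} kHz -> {jitter * 1e12:.3f} ps rms")
```

The sketch shows the trade-off behind the clock-jitter-filter: narrowing the loop bandwidth suppresses more input jitter but leaves a larger share of the VCO phase noise uncorrected.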



Figure 4.10 The phase noise of the LC-VCO from simulation and calculation

## 4.2.2.3 Frequency divider

Figure 4.11 represents the block diagram of the frequency divider used in this work. In the first stage, the high frequency clock is divided by frequency dividers based on CML circuits. This stage provides clocks divided by factors of 2 and 4. It is applied to reduce the operation frequency of the configurable frequency divider, so that the configurable part can be implemented in CMOS logic to reduce its power consumption. In this design, the LC-VCO generates a clock at 2.5GHz and the configurable frequency divider operates at 625MHz. At this lower operation frequency, the CMOS logic circuit consumes less power than a CML circuit, but it has worse supply/substrate noise rejection. For that reason, the output of the CMOS logic frequency divider is retimed by the differential CML clock in order to remove the jitter caused by the CMOS logic circuit.



Figure 4.11 The block diagram of the frequency divider

Figure 4.12 depicts the circuit diagram of the configurable part of the frequency divider implemented in CMOS logic. The 1/4-rate clock with CML signaling is converted to a CMOS logic clock with rail-to-rail swing. A D-flipflop whose inverted output is connected to its input is applied as a divide-by-two clock divider. Five divide-by-two stages are used to operate as a 5-bit counter. The 5-bit output is retimed in order to make the bits synchronous. The synchronous 5-bit output 'CT' is compared with the 5 configurable bits 'CB' in order to generate the counter reset signal. The active-low comparator output 'RO' is retimed to generate the synchronous reset signal '~reset'. It is used not only for resetting the counter but also for toggling the divided output clock by activating the inverted feedback path of a D-FF. With this pipeline architecture, the delay time of the resetting process is well defined and independent of the configuration bits 'CB'.



Figure 4.12 The block diagram of the configurable part of the frequency divider (CMOS logic)

The operation of the configurable frequency divider is represented in Figure 4.13, which shows an example with 'CB' equal to two. When the comparator detects that the inverted value of 'CT' ('/CT') is equal to the configurable bits 'CB', it generates the active-low 'RO' signal. The 'RO' signal is retimed to generate the counter reset signal '~reset', and then the clock output 'clk out' is toggled. The 5-bit counter is reset and starts counting two clock periods later. The whole resetting process takes 4 clock cycles. The process repeats when the counter again reaches the value of the configuration bits 'CB'. From this operation, the dividing factor of the CMOS logic configurable frequency divider can be determined by

$$DIV_{CMOS} = 2 \cdot (CB + 4)$$
 Eq. 4.5

where  $DIV_{CMOS}$  is the dividing factor of the CMOS logic configurable frequency divider, and CB is the 5-bit configurable dividing factor. The factor of two is the number of operation cycles needed to generate a complete output clock period. There is one special case of 'CB': if 'CB' is equal to  $1f_{hex}$  or  $11111_{bin}$ , the  $DIV_{CMOS}$  is equal to two because the reset signal is always active.

The total dividing factor can be calculated from

$$DIV_{TOTAL} = 8 \cdot (CB + 4)$$
 Eq. 4.6

where  $DIV_{TOTAL}$  is the dividing factor of the whole CML/CMOS frequency divider. The dividing factor is configurable from 32 to 272. Therefore, with a 2.5GHz LC-VCO, the output frequency can be selected from 78.125MHz down to approximately 9.19MHz. The complete table of the configurable output frequencies is shown in Appendix-F.
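Eq. 4.5 and Eq. 4.6 can be tabulated with a few lines of code. This is a minimal sketch: the function names are hypothetical, the CML prescaler is taken as the divide-by-4 path described above, and the total dividing factor for the special case CB = 1f (hex) is an inference from that assumption rather than a value stated in the text.

```python
CML_PRESCALE = 4  # fixed CML division ahead of the CMOS divider (2.5 GHz -> 625 MHz)


def div_cmos(cb: int) -> int:
    """Dividing factor of the configurable CMOS divider, Eq. 4.5."""
    if not 0 <= cb <= 0x1F:
        raise ValueError("CB is a 5-bit value (0..31)")
    if cb == 0x1F:
        # Special case from the text: the reset signal is always active.
        return 2
    return 2 * (cb + 4)


def div_total(cb: int) -> int:
    """Dividing factor of the whole CML/CMOS divider, Eq. 4.6."""
    return CML_PRESCALE * div_cmos(cb)


def output_frequency(f_vco: float, cb: int) -> float:
    """Output clock frequency in Hz for a given VCO frequency and CB."""
    return f_vco / div_total(cb)


if __name__ == "__main__":
    for cb in (0, 1, 2, 30):
        f_out = output_frequency(2.5e9, cb)
        print(f"CB={cb:2d}  DIV_TOTAL={div_total(cb):3d}  f_out={f_out / 1e6:.3f} MHz")
```

For CB from 0 to 30 the total factor runs from 32 to 272 in steps of 8, which corresponds to the 78.125 MHz and 9.191 MHz end points of the frequency range.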



Figure 4.13 Operation of configurable frequency divider, CB =2

In the output stage, the CMOS logic clock of the configurable stage is retimed by CML D-FFs in order to remove the jitter from the CMOS logic circuit. The delay between the divided clock output and the retiming CML clock is well defined because of the pipeline architecture in the configurable CMOS logic frequency divider. It can be selected to avoid the CML clock sampling at the edges of the CMOS logic output clock. Clock buffers are applied to the CMOS logic output clock in order to decrease its rise/fall times and hence reduce the probability of the CML clock sampling an edge of the CMOS logic clock.

The jitter contribution of this frequency divider is similar to that of the CML data buffer, which is discussed in Section 2.3.1. It is relatively small, because the CML data buffer normally has a good supply noise rejection. Therefore, this frequency divider does not significantly degrade the clock jitter quality. Moreover, it consumes low power because the main logic circuit is implemented in CMOS logic.

## 4.2.2.4 Charge pump

A charge pump similar to the one in the CDR circuit is used in this PLL-based clock-jitter-filter. All charge offset compensation techniques are also applied.

## 4.2.3 Loop characteristic design

The loop bandwidth of the PLL-based clock-jitter-filter has to be as small as possible in order to suppress the input jitter. Some parameters are optimized in the building block designs. For instance, the VCO gain is minimized in order to reduce the output jitter, but it must remain large enough to cover the variations of process parameters, supply voltage and temperature. According to the PLL linear model discussed in Chapter 2, the closed loop response of the PLL is well described by Eq. 2.7 to Eq. 2.9, and the corner frequency of the jitter transfer function can be calculated by Eq. 2.10 and Eq. 2.11. Passive on-chip resistors and capacitors are used as loop components. The accurate loop behavior is obtained by the PLL linear model simulation. Some parameters are designed to be configurable, for instance, the loop resistor, the charge pump current, and the feedback dividing factor. Basically, the PLL loop bandwidth is reduced by using a large loop capacitor, because this does not decrease the loop damping factor, as described in Eq. 2.8. However, the loop capacitor size is always limited in a monolithic solution. In this design, accumulation-mode MOS capacitors, implemented by PMOS transistors, are combined with metal-insulator-metal capacitors to implement the on-chip loop capacitor with an approximate value of 600pF.

The dividing factor can be adjusted to reduce the loop bandwidth. This decreases the loop damping factor; therefore, the loop resistor has to be increased. The loop resistor cannot be arbitrarily large, because it affects the peak-to-peak jitter of the output clock if there is a charge pump leakage, as described in Appendix-G. Nevertheless, by compromising between the dividing factor and the loop resistor, the loop bandwidth can be considerably reduced.
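The interaction between dividing factor, damping factor and loop resistor can be illustrated with the standard textbook second-order charge-pump PLL model. This is an assumption of the sketch: its notation may differ from Eq. 2.7 to Eq. 2.11, and it ignores the second loop capacitor. Apart from the 600 pF loop capacitor, all values (charge pump current, VCO gain, dividing factors) are hypothetical.

```python
import math


def natural_frequency(i_cp: float, k_vco: float, n_div: int, c_loop: float) -> float:
    """Natural frequency in rad/s; k_vco is given in rad/s/V."""
    return math.sqrt(i_cp * k_vco / (2.0 * math.pi * n_div * c_loop))


def damping_factor(r_loop: float, c_loop: float, omega_n: float) -> float:
    """Damping factor of the series R-C loop filter."""
    return 0.5 * r_loop * c_loop * omega_n


def resistor_for_damping(zeta: float, c_loop: float, omega_n: float) -> float:
    """Loop resistor needed to reach a target damping factor."""
    return 2.0 * zeta / (c_loop * omega_n)


if __name__ == "__main__":
    i_cp = 10e-6                    # hypothetical charge pump current
    k_vco = 2.0 * math.pi * 100e6   # hypothetical VCO gain, 100 MHz/V
    c_loop = 600e-12                # on-chip loop capacitor from the text
    for n_div in (32, 128):
        w_n = natural_frequency(i_cp, k_vco, n_div, c_loop)
        r = resistor_for_damping(1.0, c_loop, w_n)
        print(f"N={n_div:3d}  f_n={w_n / (2 * math.pi) / 1e3:.1f} kHz  R(zeta=1)={r / 1e3:.1f} kOhm")
```

In this model, quadrupling the dividing factor halves the natural frequency, and the loop resistor has to double to keep the same damping factor, which is the trade-off against charge pump leakage described above.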

## 4.2.4 Simulation results

Like the CDR circuit, the clock-jitter-filter is simulated at several design levels. Its building blocks are designed and optimized by transistor level simulations, from which the parameters of the building blocks, such as the VCO gain and the charge pump current, are specified. The model of the on-chip inductor is provided by the technology manufacturer. The inductor in this design has a quality factor of around 8. The time domain behavior of the loop is simulated using AHDL building blocks to keep the simulation time reasonable. The loop behavior in the frequency domain is designed by the PLL linear model simulation. Figure 4.14 shows the simulated jitter transfer function of the clock-jitter-filter. A corner frequency of 100kHz is obtained for its loop bandwidth.



Figure 4.14 The simulated jitter transfer function of the clock-jitter-filter

## 4.2.5 Experiment results

The PLL-based clock-jitter-filter with LC-VCO was implemented in 0.18µm CMOS technology. The die photograph is shown in Figure 4.15. The performance of the clock-jitter-filter is summarized in Table 4.1. The clock-jitter-filter and the loop filter occupy areas of 0.50 mm² and 0.32 mm², respectively. It consumes 90mW from a 1.8V supply at the LC-VCO frequency of 2.5GHz. The clock-jitter-filter has independently configurable input and output frequencies from 9.191 to 78.125MHz, set through the I²C interface. The loop bandwidth can be adjusted from 100kHz to 3MHz. The output clock jitter has a minimum value of 2.8ps rms, as shown in the jitter histogram in Figure 4.16.



Figure 4.15 Die photograph of clock-jitter-filter

Table 4.1 Performance Summary of the clock-jitter-filter

| Parameter          | Value                                  |
|--------------------|----------------------------------------|
| Technology         | 0.18μm CMOS                            |
| Active area        | Clk-jitter-filter: 0.50 mm<sup>2</sup> |
|                    | Loop filter: 0.32 mm<sup>2</sup>       |
| Power consumption  | 90 mW at Vdd = 1.8 V                   |
| Clk. input         | 9.191 - 78.125 MHz                     |
| Clk. output        | 9.191 - 78.125 MHz                     |
| Loop bandwidth     | 100 kHz - 3 MHz                        |
| Jitter performance | 2.8 ps rms                             |



Figure 4.16 The jitter histogram of the output clock

# 4.3 Summary

The monolithic clock-jitter-filter is an attractive solution. It is made feasible by the availability of on-chip RF components, namely an inductor and a varactor. The LC-VCO provides a low jitter clock and is suitable for high frequency operation. Consequently, the clock-jitter-filter can also provide the high frequency clock for data serialization, leading to an optimum power consumption for the serial data transceiver. The required output clock frequency is obtained by dividing the high frequency clock with a configurable frequency divider. CMOS logic is used in the configurable part of the frequency divider. The output clock is retimed by the low jitter CML clock to reduce the output jitter. The loop bandwidth of the clock-jitter-filter has to be small. Reducing the loop bandwidth with a large loop capacitor is limited by the large chip area it occupies. Therefore, a configurable frequency dividing factor is applied, which can reduce the loop bandwidth over a considerable range. It is implemented by two frequency dividers, one at the input clock and one in the feedback path. The clock-jitter-filter was implemented in 0.18µm CMOS technology. It consumes 90mW from a 1.8V supply. Its building blocks utilize CML circuits for good supply/substrate noise rejection. Hence, the jitter generation of this clock-jitter-filter is approximately 1.7ps rms. Its loop bandwidth is configurable from 100kHz to 3MHz. Its frequency range is 9.191 to 78.125MHz. The 2.5GHz clock is also available for data serialization.

# Chapter 5

# Experimental results of the 1/4-rate CDR and the clock-jitter-filter

# 5.1 Measurement setup



Figure 5.1 The measurement setup

The 1/4-rate CDR with frequency detector and the clock-jitter-filter with LC-VCO are connected together to provide the data recovery function and a low jitter clock. The 1/4-rate recovered clock from the CDR is connected to the clock-jitter-filter as the clock input. The measurement setup is depicted in Figure 5.1. The serial data are generated by an Agilent 81133A Pulse/Pattern Generator. It can provide various serial data patterns at data rates from 10Mb/s to 3.35Gb/s. The phase modulation of the serial data is obtained by using an arbitrary waveform from an HP 33120A Function/Arbitrary Waveform Generator, which was connected to the delay modulation port of the Agilent 81133A. In this way, jitter modulation at various frequencies can be applied, so that the jitter transfer function and the jitter tolerance can be verified. The recovered data and clock from the CDR were observed with an HP54120B Digitizing Oscilloscope Mainframe, which is able to display repetitive signals up to 18GHz. Both the high frequency and the low frequency clocks from the clock-jitter-filter were also verified with the HP54120B oscilloscope. For testing under realistic conditions, the serial data are connected to the CDR under test through long lossy cables, 12 meters of RG-58, which emulate the channel jitter.

# 5.2 Measurement results



Figure 5.2 The jitter histograms of the 1/4-rate CDR and the clock-jitter-filter

The performances of the 1/4-rate CDR and the clock-jitter-filter are shown in Table 3.4 in Chapter 3 and Table 4.1 in Chapter 4, respectively. Figure 5.2 depicts the jitter histograms of the clock outputs of the CDR and of the clock-jitter-filter tested with small-jitter serial data at 2Gb/s. The recovered clock has a jitter of 4.6ps rms, and the output of the clock-jitter-filter has a jitter of 2.8ps rms. The measured jitter includes the jitter contributed by the off-chip frequency divider, which is used to match the frequency ranges.

Figure 5.3 Output clocks by various jitter modulation frequencies (test chain: function generator, pulse generator at 2Gb/s, 1/4-rate CDR with 1MHz loop bandwidth, off-chip frequency divider, clock-jitter-filter with 100kHz loop bandwidth; panels: recovered clock and jitter-filtered clock at jitter modulation of 10 kHz, 200 kHz, and 2 MHz)

The measurements of the output clocks from the 1/4-rate CDR and the clock-jitter-filter tested with serial data modulated by various jitter frequencies are shown in Figure 5.3. The loop bandwidths of the 1/4-rate CDR and the clock-jitter-filter are 1MHz and 100kHz, respectively. With 10kHz jitter modulation, the jitter appeared on the output clocks of both the 1/4-rate CDR and the clock-jitter-filter because it was still inside the loop bandwidths of both units under test. When the phase of the serial data was modulated at 200kHz, the jitter of the clock-jitter-filter output became smaller because the jitter modulation frequency was outside of its loop bandwidth, while the output clock of the 1/4-rate CDR had a jitter amplitude similar to the incoming jitter since the jitter modulation frequency was still inside its loop bandwidth. When the jitter modulation frequency was 2MHz, the jitter of both output clocks from the 1/4-rate CDR and the clock-jitter-filter became smaller because the jitter modulation frequency lay outside of both loop bandwidths. Figure 5.4 represents the 1/4-rate CDR and the clock-jitter-filter tested with incoming serial data with inter-symbol-interference (ISI) jitter. The ISI was emulated by 12-meter long RG-58 cables. An incoming jitter of 150ps P-P was measured. The recovered clock had a jitter of 6.9ps rms, and the output clock of the clock-jitter-filter had 4.2ps rms jitter.
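The behavior at the three modulation frequencies can be approximated with a first-order low-pass model of each jitter transfer function. This is a simplification assumed for illustration: the real loops are of higher order and may show peaking. The bandwidths of 1 MHz and 100 kHz are taken from the text; the function names are hypothetical.

```python
import math


def jitter_gain(f_mod: float, f_bw: float) -> float:
    """Magnitude of a first-order low-pass jitter transfer function."""
    return 1.0 / math.sqrt(1.0 + (f_mod / f_bw) ** 2)


def cascade_gain(f_mod: float, f_bw_cdr: float, f_bw_filter: float) -> float:
    """Jitter gain from the serial data to the clock-jitter-filter output.

    The filter input is the recovered clock, so both transfer
    functions multiply.
    """
    return jitter_gain(f_mod, f_bw_cdr) * jitter_gain(f_mod, f_bw_filter)


if __name__ == "__main__":
    f_bw_cdr, f_bw_filter = 1e6, 100e3
    for f_mod in (10e3, 200e3, 2e6):
        g_cdr = jitter_gain(f_mod, f_bw_cdr)
        g_out = cascade_gain(f_mod, f_bw_cdr, f_bw_filter)
        print(f"{f_mod / 1e3:6.0f} kHz  recovered clock {g_cdr:.3f}  filtered clock {g_out:.3f}")
```

Under this model the 10 kHz modulation passes both loops almost unattenuated, the 200 kHz modulation is reduced mainly by the clock-jitter-filter, and the 2 MHz modulation is attenuated by both, in line with the qualitative observations in Figure 5.3.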



Figure 5.4 Testing with ISI jitter of 150ps P-P

The 1/4-rate CDR and the clock-jitter-filter were further tested with a larger ISI jitter. The ISI jitter was enlarged to 340ps P-P by using an additional 2 meters of low quality interconnections. In this condition, the recovered clock and the output of the clock-jitter-filter had a jitter of 15ps rms and 11ps rms, respectively. The jitter histograms are shown in Figure 5.5.



Figure 5.5 Testing with ISI jitter of 340ps P-P
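To relate the measured values to the bit period, the jitter figures can be converted into unit intervals (UI). The sketch below uses only values reported in this chapter; the helper name `to_ui` is hypothetical.

```python
def to_ui(jitter_seconds: float, bit_rate: float) -> float:
    """Express a jitter value as a fraction of one bit period (UI)."""
    return jitter_seconds * bit_rate


if __name__ == "__main__":
    bit_rate = 2e9  # 2 Gb/s, i.e. a 500 ps unit interval
    measurements = {
        "ISI jitter, 12 m RG-58 (P-P)": 150e-12,
        "ISI jitter, enlarged (P-P)": 340e-12,
        "jitter-filtered clock at 150 ps ISI (rms)": 4.2e-12,
        "jitter-filtered clock at 340 ps ISI (rms)": 11e-12,
    }
    for name, value in measurements.items():
        print(f"{name}: {to_ui(value, bit_rate):.4f} UI")
```

At 2Gb/s, the 150ps P-P ISI jitter corresponds to 0.3 UI, and the enlarged 340ps P-P case to 0.68 UI.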

# 5.3 Summary

In the experiments, the 1/4-rate CDR and the clock-jitter-filter were tested with 2 Gb/s serial data with ISI jitter emulated by 12-meter RG-58 cables. The 1/4-rate CDR can recover the clock from the incoming serial data without an external reference clock. The clock-jitter-filter improves the jitter quality of the recovered clock. A clock with a jitter of 4.2ps rms is achieved from incoming serial data with an ISI jitter of 150ps P-P, which would be expected under realistic conditions. The circuits were also tested with incoming serial data with an ISI jitter of 340ps P-P. In this condition, a synchronous clock with a jitter of 11ps rms can be generated.

# Chapter 6

# Conclusions

In large synchronous networks, data communication networks and separate clock distribution networks are typically required. If the clock can be distributed through the high data rate serial data networks, the interconnection complexity can be largely reduced. The key component for this concept is the clock and data recovery circuit (CDR). The recovered clock has to fulfill the jitter requirement for use as a reference clock for local components, e.g. analog-to-digital converters and data transceivers.

In this work, the jitter in serial data communication systems is analyzed in order to understand its characteristics and to specify the requirements of the CDR for large synchronous networks. The jitter caused by the transmitter circuits, the bandwidth limitation of the interconnection channel, and the receiver circuits is discussed. Moreover, the jitter in a phase-locked loop circuit is studied for the design and the optimization of a PLL-based CDR. A 2-loop CDR with frequency detector and clock-jitter-filter is proposed in this work. The CDR is able to provide good jitter filtering while its jitter tolerance can be independently adjusted. Furthermore, the CDR with frequency detector has an extended frequency capture range and can operate without an external reference clock. In the design of this 2-loop CDR, many techniques are applied in order to obtain a monolithic solution and to optimize the power consumption and the circuit area.

In the front-ended CDR loop, a power-efficient CDR with a reduced-sampling-phase time-interleaving 1/4-rate phase frequency detector is proposed in this work. It has sense-amplifiers as its input stage. The CDR synchronizes directly to the incoming serial data at the interconnection termination, which is useful for the envisaged interconnection timing measurement system. The 1/4-rate architecture reduces the power consumption because its PFD can be implemented in a low-power CMOS logic style. Moreover, the number of additional sampling phases for frequency detection can be reduced. By applying the divided frequency impulse modulation technique, an off-chip loop filter is not required. The technique also reduces the noise at the VCO control node and, hence, the jitter of the output clock. The 1/4-rate CDR is an inherent 1-to-4 DEMUX. Therefore, the power consumption for the entire deserializing is minimized.

In the secondary loop, the clock-jitter-filter, an LC-VCO is applied and provides a low-jitter clock. Furthermore, a high-frequency clock for a data serializer is also available. A monolithic solution is possible with an integrated on-chip inductor. In the proposed clock-jitter-filter architecture, frequency dividers are applied at the clock input and in the feedback path. The loop dividing factor can be selected so that a small loop bandwidth is obtained to suppress the incoming jitter from the front-ended CDR. In the phase frequency detector circuit, a differential topology is chosen in order to achieve low jitter generation. The output frequency can be selected by a configurable frequency divider connected to the high-frequency clock from the LC-VCO. The proposed frequency divider has low jitter generation because its output clock is retimed by the low-jitter differential clock.

The 1/4-rate CDR with frequency detector and clock-jitter-filter are implemented in 0.18µm CMOS Technology. The experimental results show that the circuits can perform the data recovery function and can provide the low jitter clock that fulfills the requirement of large synchronous networks.

In conclusion, the system developed in this work is a combination of several separate building blocks. In the clock and data recovery circuit, they are a phase frequency detector for CDR, sense-amplifiers, a multi-phase ring-based VCO with phase offset compensation, a charge pump with charge offset compensation, and a divided frequency impulse modulator and a phase error accumulator for the loop bandwidth reduction. In the clock-jitter-filter, the building blocks are a phase frequency detector for clocks, current-mode logic circuits with replicated bias technique, configurable clock dividers with low jitter generation, an LC-VCO with on-chip inductor and varactor, and a charge pump.

Two new techniques have been added: (i) the reduced-sampling-phase time-interleaving architecture in the 1/4-rate phase frequency detector for CDR, and (ii) the divided frequency impulse modulation technique for CDR loop bandwidth reduction. The proposed 1/4-rate PFD for CDR extends the existing topologies, such as the full-rate architectures in [12] and [41] to [46] and the half-rate architecture in [14], by offering a lower operating frequency and automatic data regeneration/demultiplexing. With the proposed divided frequency impulse modulation technique, this CDR achieves a satisfactory loop bandwidth without the external loop component that is always required by the CDRs reported in [14] and [41] to [46]. Compared with the monolithic CDRs reported in [15] and [47], the front-ended CDR in Section 3.3 is more efficient in power consumption and circuit area.

The CDR with jitter filtering feature proposed in this work can operate without off-chip components. This is an advantage over the CDR with jitter filtering feature reported in [48], which needs off-chip loop filter components and a voltage-controlled crystal oscillator (VCXO). Another CDR with jitter filtering feature is reported in [49]. It obtains its jitter filtering characteristic from a dual DLL/PLL topology; however, it needs an off-chip loop capacitor.

#### **6.1 Future work**

The performance of the 2-loop CDR is the basis for a relative time measurement system. One direction for future work is the use of an LC-VCO and a frequency-dividing-based multi-phase clock generator in the front-ended CDR in order to improve the CDR's jitter generation. A more sophisticated loop filter, e.g. a mixed-mode or a dual-path loop filter as in [40], [47], and [50], can be applied to increase the loop damping factor without trading it off against the sizes of the loop resistor and the loop capacitors.

A technique similar to the divided frequency impulse modulation and phase error accumulation can be used in the clock-jitter-filter for the loop dividing factor configuration. Unlike the loop dividing factor configuration using frequency dividers, the phase error then does not fold back into the loop bandwidth.

For the relative time measurement system, the zero-delay buffer for the serial data can be realized by a delay-locked loop. It synchronizes the transmitted serial data to the received serial data by using Alexander's phase detectors with sense-amplifiers as the input stage. It can eliminate the delay time of the serial data transceiver and makes the measurement of the interconnection delay possible.

## 7 Appendices

## Appendix-A: 8B/10B coding system

The 8B/10B coding represents 8-bit data words as 10-bit codes, which results in a data overhead of 20%. The 10-bit code set consists of data characters (Dxx.y) and control characters (Kxx.y). The bit assignments for the characters (Dxx.y) and (Kxx.y) are X=EDCBA, Y=HGF and Z=control. The encoding is implemented by a 5B/6B encoding of X followed by a 3B/4B encoding of Y, corresponding to Z. The resulting 10-bit characters are designated as 'abcdei fghj', where 'abcdei' relates to 'EDCBAZ' and 'fghj' relates to 'HGFZ', as shown in Figure A.1.



Figure A.1 a) The 8B/10B encoding bit-configuration b) Block diagram of 8B/10B encoding

The control characters have many important special functions, which can be grouped into two categories:

• Delimiting frames, part of protocol: start of frame and end of frame.

• Serial link maintenance: Comma for synchronization (COM) and data rate matching.

This encoding system guarantees a transition density that is sufficient for the clock recovery in the PLL of the receiving device. It also provides some error detection capabilities (single-bit and two-bit errors). The running disparity of the previously transmitted bytes is tracked and used to maintain the DC balance, so that the coding scheme generates a bit stream with a balanced number of '1' and '0'. For example, the original data byte 0x77 corresponds to the character D23.3:

- If the running disparity is positive, i.e. the data stream has more '1' than '0', the data byte 0x77 will be encoded to '0001011100'.
- If the running disparity is negative, i.e. the data stream has more '0' than '1', the data byte 0x77 will be encoded to '1110100011'.
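
The selection rule in the two cases above can be sketched in a few lines of Python. This is only an illustration for the single byte 0x77: the two code words are copied from the list above, while the function names and the representation of the running disparity as +1 or -1 are assumptions made here, not part of a complete 8B/10B encoder.

```python
from typing import Tuple

# Code words for data byte 0x77 (D23.3), copied from the two cases above.
CODE_WHEN_RD_POSITIVE = "0001011100"
CODE_WHEN_RD_NEGATIVE = "1110100011"


def disparity(code: str) -> int:
    """Number of '1' bits minus number of '0' bits in a code word."""
    return code.count("1") - code.count("0")


def encode_0x77(running_disparity: int) -> Tuple[str, int]:
    """Pick the code word that pulls the running disparity back toward zero.

    running_disparity is +1 (more '1' sent so far) or -1 (more '0' sent so far);
    this +1/-1 convention is an assumption of this sketch.
    Returns the chosen 10-bit code and the updated running disparity.
    """
    if running_disparity > 0:
        code = CODE_WHEN_RD_POSITIVE
    else:
        code = CODE_WHEN_RD_NEGATIVE
    return code, running_disparity + disparity(code)
```

Calling `encode_0x77(+1)` selects the code word with more '0' than '1' and flips the running disparity to negative, and `encode_0x77(-1)` does the opposite, which is how the bit stream stays DC-balanced.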

The 8B/10B coding system narrows the bandwidth of the serial data and makes it DC-free. As shown in Figure A.2 a), the power spectrum of the uncoded serial data covers the frequency range  $0-0.5f_{data}$ . After encoding with the 8B/10B coding system, the bandwidth is  $0.1f_{data}-0.5f_{data}$ , as depicted in Figure A.2 b).



Figure A.2 a) Serial data frequency range without 8B/10B encoding b) Serial data frequency range with 8B/10B encoding

# Appendix-B: Deterministic Jitter from bandwidth limitation



Figure B.1 Transmitted serial data and degraded received serial data

When random serial data are transmitted through a bandwidth-limited channel or a bandwidth-limited electronic unit, such as a data buffer, deterministic data-dependent jitter occurs. In this analysis, it is assumed that the first data transition starts from a signal level of 0, as shown in Figure B.1. The bandwidth-limited channel degrades the rise/fall times of the signal at the receiver,  $V_{Rx}$ . For a single bit of duration $T$ (the bit period), the received signal through the bandwidth-limited channel can be calculated by

$$V_{Rx}(t) = \begin{cases} 0 & \text{for } t < 0 \\ a_0 \cdot (1 - e^{-\frac{t}{\tau}}) & \text{for } 0 \le t < T \\ e^{-\frac{t - T}{\tau}} \cdot a_0 \cdot (1 - e^{-\frac{T}{\tau}}) & \text{for } t \ge T \end{cases}$$
 Eq. B1

where  $a_0$  is the amplitude of the transmitted signal, and  $\tau$  is the time constant of the bandwidth-limited channel. The slowest threshold crossing time arises when the data start the transition from the complete logic level, because the transition then starts from the voltage farthest from the logic switching threshold  $v_{th}$ . The slowest threshold crossing time can be calculated by

$$t_{slowest} = -\tau \cdot \ln \left( \frac{a_0 - v_{th}}{a_0} \right)$$
 Eq. B2

where  $v_{th}$  is the logic switching threshold. The fastest threshold crossing occurs when a transition follows in the very next bit, because the transition then starts from the voltage nearest to the threshold voltage. Measured from the start of that transition, the fastest threshold crossing time can be calculated by

$$t_{fastest} = -\tau \cdot \ln \left( \frac{v_{th}}{a_0 (1 - e^{-\frac{T}{\tau}})} \right)$$
 Eq. B3

Therefore, the peak-to-peak jitter can be calculated by

$$\Delta t_{d-DDJ,p-p} = t_{slowest} - t_{fastest} = -\tau \cdot \ln \left( \frac{(a_0 - v_{th}) \cdot (1 - e^{-\frac{T}{\tau}})}{v_{th}} \right)$$
 Eq. B4

where  $\Delta t_{d\text{-}DDJ,p\text{-}p}$  is the peak-to-peak jitter caused by the bandwidth-limited channel. If there is no offset in the logic decision, the logic switching threshold is 0.5a<sub>0</sub> and Eq. B4 simplifies to

$$\Delta t_{d-DDJ,p-p} = -\tau \cdot \ln(1 - e^{-\frac{T}{\tau}})$$
 Eq. B5

This derivation is adapted from [5]. The details of second-order effects, interference-induced jitter, and compensation techniques are covered in that reference.
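
As a numeric illustration of Eq. B2 to Eq. B5, the following Python sketch evaluates the peak-to-peak data-dependent jitter for a first-order channel. The function name and the example values (a time constant of 200 ps and a bit period of 500 ps, corresponding to 2 Gb/s) are assumptions chosen for illustration and are not measured values from this work.

```python
import math


def ddj_peak_to_peak(tau, bit_period, a0=1.0, v_th=None):
    """Peak-to-peak data-dependent jitter of a first-order channel (Eq. B4).

    tau        : channel time constant in seconds
    bit_period : bit period T in seconds
    a0         : amplitude of the transmitted signal
    v_th       : logic switching threshold; defaults to 0.5 * a0 (no offset)
    """
    if v_th is None:
        v_th = 0.5 * a0
    # Level reached after one bit period, starting from a signal level of 0.
    reached = a0 * (1.0 - math.exp(-bit_period / tau))
    if not 0.0 < v_th < reached:
        raise ValueError("threshold is not crossed within one bit period")
    t_slowest = -tau * math.log((a0 - v_th) / a0)   # Eq. B2
    t_fastest = -tau * math.log(v_th / reached)     # Eq. B3
    return t_slowest - t_fastest                    # Eq. B4
```

With the default threshold of 0.5·a0, the result equals Eq. B5; for the assumed example values, `ddj_peak_to_peak(200e-12, 500e-12)` evaluates to roughly 17 ps.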

# Appendix-C: The effective gain of a digital phase detector

A binary phase detector (PD) behaves non-linearly when the jitter of its inputs is very small. However, if the data rate is far higher than the loop bandwidth, a statistical approach can be applied by averaging the phase error results [19]. Hence, the binary phase detector responds approximately linearly to the phase error input within the span of the input jitter probability density function (PDF). Figure C.1 a) shows the incoming serial data eye and its jitter probability density function. The average output current of a phase detector combined with a charge pump is calculated from the difference between the probabilities that the PD detects early and late. The average output current can be calculated by

$$I_{avg-binary-PD}(\Delta\phi) = I_{CP-AV} \cdot D_T \cdot \left( \int_{-\pi}^{\Delta\phi} f(x) dx - \int_{\Delta\phi}^{+\pi} f(x) dx \right) \text{ for } -\pi \le \Delta\phi \le +\pi \quad \text{Eq. C1}$$

where f(x) is the probability density function of the incoming jitter,  $D_T$  is the data transition density and  $I_{CP-AV}$  is the average current of a charge pump per UI. The gain of a binary phase detector can be calculated from the slope of the average output current by

$$K_{binary-PD}(\Delta\phi) = \frac{d \left(I_{avg-binary-PD}(\Delta\phi)\right)}{d(\Delta\phi)} \quad \left[\frac{A}{rad}\right]$$
 Eq. C2

where  $K_{binary-PD}$  is the gain of the binary phase detector. It varies with the phase error. Therefore, the average gain has to be used; it is obtained by integrating the PD gain weighted by the probability of each phase error. It can be written as

$$K_{binary-PD-avg} = \frac{\int K_{binary-PD}(\Delta\phi) \cdot f(\Delta\phi) \cdot d(\Delta\phi)}{\int f(\Delta\phi) \cdot d(\Delta\phi)} \left[ \frac{A}{rad} \right]$$
 Eq. C3

where  $K_{binary-PD-avg}$  is the average gain of the binary phase detector for a certain incoming jitter probability density. Figure C.1 b) represents the jitter PDF as a standard distribution function whose peak-to-peak jitter covers 90% of the area beneath the graph. Its corresponding average output current is shown in Figure C.1 c). The highest PD gain, i.e. the steepest slope of the average output current, is at the middle value of the PDF.
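
The position of the highest gain follows directly from differentiating Eq. C1. As a short worked step, assuming only that f(x) is continuous on the interval from $-\pi$ to $+\pi$, the two integrals contribute $+f(\Delta\phi)$ and $-(-f(\Delta\phi))$ respectively:

$$K_{binary-PD}(\Delta\phi) = I_{CP-AV} \cdot D_T \cdot \left( f(\Delta\phi) + f(\Delta\phi) \right) = 2 \cdot I_{CP-AV} \cdot D_T \cdot f(\Delta\phi)$$

The gain is therefore proportional to the PDF itself and peaks where the PDF peaks. Inserting this expression into Eq. C3 gives

$$K_{binary-PD-avg} = 2 \cdot I_{CP-AV} \cdot D_T \cdot \frac{\int f(\Delta\phi)^2 \cdot d(\Delta\phi)}{\int f(\Delta\phi) \cdot d(\Delta\phi)}$$

so the average gain always carries the common factor $2 \cdot I_{CP-AV} \cdot D_T$, multiplied by a term that depends only on the shape and width of the jitter PDF.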



Figure C.1 The statistical approach to the binary phase detector gain, and input jitter PDF Example-1 with its corresponding phase detector gain

Figure C.2 a) and b) depict input jitter PDFs consisting of two separated standard distribution functions and two overlapped standard distribution functions, respectively. Their corresponding average output currents are shown in Figure C.2 c) and d). The slopes of the average output currents vary with the phase error. The average phase detector gains for the three input jitter PDFs in Example-1, -2, and -3 are calculated by Eq. C3 and summarized in Table C.1.



Figure C.2 Input jitter PDF Example-2 and Example-3 and their corresponding phase detector gains

Table C.1 Average PD gains for various input jitter PDFs

| Jitter PDF                                            | Average PD gain                                     |
|-------------------------------------------------------|-----------------------------------------------------|
| Single standard distribution function (EX-1)          | $2 \cdot I_{CP-AV} \cdot D_T \cdot 1.61/PPJ$        |
| Two separated standard distribution functions (EX-2)  | $2 \cdot I_{CP-AV} \cdot D_T \cdot 1.60/PPJ$        |
| Two overlapped standard distribution functions (EX-3) | $2 \cdot I_{CP-AV} \cdot D_T \cdot 1.43/PPJ$        |

where 
$$PPJ = \frac{2 \cdot \pi \cdot t_{peak-to-peak-jitter}}{T}$$
 Eq. C4

and PPJ is the peak-to-peak jitter (radian),  $t_{peak-to-peak-jitter}$  is the peak-to-peak jitter (second), and T is the bit period (second). It can be concluded that the average PD gain is inversely proportional to the peak-to-peak jitter and depends only slightly on the shape of the incoming jitter PDF.
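
The entries of Table C.1 together with Eq. C4 can be evaluated with a small Python sketch. The coefficients 1.61, 1.60, and 1.43 are taken from Table C.1; the function names, the dictionary keys, and any numeric inputs are assumptions for illustration only.

```python
import math

# Shape-dependent coefficients from Table C.1.
PD_GAIN_COEFFICIENT = {"EX-1": 1.61, "EX-2": 1.60, "EX-3": 1.43}


def peak_to_peak_jitter_rad(t_pp, bit_period):
    """Eq. C4: convert peak-to-peak jitter from seconds to radians."""
    return 2.0 * math.pi * t_pp / bit_period


def average_pd_gain(i_cp_av, d_t, t_pp, bit_period, pdf="EX-1"):
    """Average binary PD gain in A/rad for one of the PDFs in Table C.1.

    i_cp_av : average charge pump current per UI in ampere
    d_t     : data transition density
    t_pp    : peak-to-peak jitter in seconds
    """
    ppj = peak_to_peak_jitter_rad(t_pp, bit_period)
    return 2.0 * i_cp_av * d_t * PD_GAIN_COEFFICIENT[pdf] / ppj
```

Doubling `t_pp` halves the returned gain, which is the reciprocal dependence on the peak-to-peak jitter stated above.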

# Appendix-D: The frequency capture range of a PFD for CDR

A full-rate rotational frequency detector for serial data utilizes a quadrature-phase clock to detect the frequency error. The quadrature-phase clock divides one unit interval into 4 phase states, i.e. PS-1, PS-2, PS-3, and PS-4. When there is a frequency error, the data transitions cyclically occur in different phase states. The rotational frequency detector uses this phase state rotation to indicate the frequency error and generates a suitable signal for driving the loop toward frequency lock. Its operation is depicted in Figure D.1 for the example of a clock frequency that is too low. The phase state incidences, i.e. the phase states in which the data transitions occur, are shown as numbers in brackets at the top of the figure. They are indicated by edge detectors and retimed by a system clock to create the retimed phase states, shown as numbers in brackets with their corresponding retiming clocks beneath the quadrature-phase clock.

The PFD detects the phase state rotation with the aid of the internal signals 'Q1' and 'Q2', which are generated according to the conditions described in Table D.1. If the clock frequency is too low, the phase states of the data transitions rotate backwards and, consequently, 'Q2' leads 'Q1', as shown in Figure D.1. The 'F-down-disable' and 'F-up-disable' signals are generated from the internal signals Q1 and Q2 as described in Table D.1 and are used to reduce the frequency error. If the frequency error is larger than the CDR loop bandwidth, the phase states rotate and the phase detector indicates equal amounts of early and late phase. Therefore, equal amounts of 'f-up' and 'f-down' signals are generated and the CDR cannot track the phase of the serial data. The 'F-down-disable' and 'F-up-disable' signals produce the imbalance between the 'f-up' and 'f-down' signals that is required to compensate the frequency error.

| Signal         | Set-condition                | Reset-condition      |
|----------------|------------------------------|----------------------|
| Q1             | F-down-state-4 = '1'         | F-up-state-2 = '1'   |
| Q2             | F-up-state-1 = '1'           | F-down-state-3 = '1' |
| F-down-disable | Rising edge of Q1 & Q2 = '1' | Falling edge of Q1   |
| F-up-disable   | Rising edge of Q2 & Q1 = '1' | Falling edge of Q2   |

Table D.1 The operation table of a frequency detector

The phase states are updated every full-rate clock cycle. However, the absence of the random data transitions limits the indication of phase states. The frequency capture range of the PFD is estimated by the maximum frequency of the phase state rotation that is possible to be detected by the PFD. Therefore, the frequency capture range of the full-rate PFD can be calculated from

$$\text{FCR-full-rate} = \frac{f_{VCO} \cdot D_T}{4} = \frac{f_{data} \cdot D_T}{4} \quad [Hz]$$
 Eq. D.1

where FCR-full-rate is the frequency capture range of a full-rate PFD,  $f_{VCO}$  is the clock frequency,  $f_{data}$  is the data rate frequency and  $D_T$  is the data transition density, the number of data transitions per unit interval. At least one phase state rotation cycle has to be detected in order to detect the frequency difference. Therefore, at least four phase states have to be indicated.

The operation of a synchronous 1/4-rate PFD is depicted in Figure D.2. One clock cycle covers four unit intervals of the serial data. The clock frequency of a 1/4-rate PFD can be written as

$$f_{1/4-VCO} = \frac{f_{Data}}{4}$$
 Eq. D.2

where  $f_{1/4\text{-}VCO}$  is the 1/4-rate clock frequency. The required sampling phases, i.e. 16 clock phases, are provided by a multi-phase clock.



Figure D.1 The operation of a full-rate PFD,  $f_{clk}$  too low



Figure D.2 The operation of 1/4-rate PFD: f<sub>clk</sub> too low [15]

This PFD architecture has been reported in [15]. The sampling data are retimed in order to have the synchronous operation. Phase states are updated every four unit intervals. Therefore, the frequency capture range of synchronous 1/4-rate PFD can be calculated by

$$\text{FCR-1/4-rate-sync} = \min \left[ \frac{f_{Data} \cdot D_T}{4}, \frac{f_{1/4-VCO}}{4} \right]$$
 Eq. D.3

where *FCR-1/4-rate-sync* is the frequency capture range of a synchronous 1/4-rate PFD. In the 1/4-rate PFD, the frequency capture range has two limitations: the data transition density and the system clock frequency. If the data transition density is relatively low, it limits the frequency capture range of the PFD, because phase errors can be detected only when data transitions occur. In this case, the frequency capture ranges of the full-rate PFD and of the 1/4-rate PFD no longer differ. When the serial data have a sufficiently high data transition density, more than 0.25 in this case, the frequency capture range is limited by the system clock frequency. In general, the serial data are encoded by the 8B/10B coding scheme, which provides enough data transitions for the CDR; the data transition density is around 0.6. Therefore, the frequency capture range of the synchronous 1/4-rate PFD is

$$\text{FCR-1/4-rate-sync} = \frac{f_{Data}}{16} \quad [Hz] \quad \text{for } D_T \ge 0.25$$
 Eq. D.4

For a  $D_T$  of 0.6, the frequency capture range of a full-rate PFD is 0.15  $f_{data}$  while the frequency capture range of a synchronous 1/4-rate PFD is 0.0625  $f_{data}$ .



Figure D.3 The operation of time-interleaving 1/4-rate PFD, clock frequency too low



Figure D.4 The operation of 1/4-rate reduced-sampling-phase time-interleaving PFD,  $f_{\text{clk}}$  too low

The operation of a time-interleaving 1/4-rate PFD is depicted in Figure D.3. The phase states are retimed by different clock phases, and the retiming clock phases are selected to suit the phase state incidences. Therefore, the phase states are continuously updated and the 1/4-rate PFD operates seamlessly, like a full-rate PFD.

The interval between retiming clock phases can be written as

$$T_{retime} = \frac{T_{data}}{N_{retime}} = \frac{1}{f_{data} \cdot N_{retime}}$$
 Eq. D.5

where  $T_{retime}$  is the time interval between two retiming clock phases,  $T_{data}$  is the serial data bit period, and  $N_{retime}$  is the number of retiming clock phases per bit period.

The frequency capture range of a time-interleaving 1/4-rate PFD can be calculated from

$$\text{FCR-1/4-TIL} = \frac{D_T}{4 \cdot T_{retime}} = \frac{D_T \cdot f_{data} \cdot N_{retime}}{4} \quad [Hz]$$
 Eq. D.6

where FCR-1/4-TIL is the frequency capture range of a time-interleaving 1/4-rate PFD. It becomes independent from clock frequency. The  $N_{retime}$  of a full-rate clock PFD is one. In a time-interleaving 1/4-rate PFD, its  $N_{retime}$  of one can be obtained by using adequate retiming clock phases. Therefore, the frequency capture range of a time-interleaving 1/4-rate PFD is similar to that of a full-rate PFD.

The frequency capture range of a time-interleaving 1/4-rate PFD is considerably larger than the target frequency capture range of 100 MHz in this design. Hence, the number of sampling phases for frequency detection in a time-interleaving 1/4-rate PFD can be reduced. Figure D.4 shows the operation of a reduced-sampling-phase time-interleaving 1/4-rate PFD for the example of a clock frequency that is too low. The four phase states for phase rotation detection are applied only once per two unit intervals, so the frequency capture range is reduced by approximately a factor of 2. The frequency capture range of a reduced-sampling-phase time-interleaving PFD can be calculated from

$$\text{FCR-1/4-RSP-TIL} = \frac{f_{data} \cdot D_T}{4 \cdot N_{PSI}} \quad [Hz]$$
 Eq. D.7

where FCR-1/4-RSP-TIL is the frequency capture range of a reduced-sampling-phase time-interleaving 1/4-rate PFD, and  $N_{PSI}$  is the number of unit intervals per phase state rotation detection. For a  $D_T$  of 0.6 and an  $N_{PSI}$  of 2, the frequency capture range of a reduced-sampling-phase time-interleaving 1/4-rate PFD is  $0.075 \, f_{data}$ , while the frequency capture range of a synchronous 1/4-rate PFD is  $0.0625 \, f_{data}$ . For an incoming data rate of 2 Gb/s with a  $D_T$  of 0.6 and an  $N_{PSI}$  of 2, Eq. D.7 gives a frequency capture range of 150 MHz, which still exceeds the target frequency capture range in this design.
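
The capture-range formulas of this appendix can be compared side by side with a short Python sketch. The equations are Eq. D.1, D.3, D.6, and D.7 as given above; the function names are hypothetical and chosen here only for illustration.

```python
def fcr_full_rate(f_data, d_t):
    """Eq. D.1: full-rate PFD, limited by the data transition density."""
    return f_data * d_t / 4.0


def fcr_quarter_rate_sync(f_data, d_t):
    """Eq. D.3: synchronous 1/4-rate PFD, limited by the lower of two bounds."""
    f_quarter_vco = f_data / 4.0                 # Eq. D.2
    return min(f_data * d_t / 4.0, f_quarter_vco / 4.0)


def fcr_time_interleaving(f_data, d_t, n_retime=1):
    """Eq. D.6: time-interleaving 1/4-rate PFD with n_retime phases per bit."""
    return d_t * f_data * n_retime / 4.0


def fcr_reduced_sampling_phase(f_data, d_t, n_psi):
    """Eq. D.7: reduced-sampling-phase time-interleaving 1/4-rate PFD."""
    return f_data * d_t / (4.0 * n_psi)
```

For a data rate of 2 Gb/s and a transition density of 0.6, `fcr_reduced_sampling_phase(2e9, 0.6, 2)` reproduces the 150 MHz quoted above, while `fcr_quarter_rate_sync(2e9, 0.6)` gives 125 MHz, i.e. 0.0625 times the data rate.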

# Appendix-E: The sampling phases for frequency detection in CDR

The data eye diagram and the sampling phases of the PFD for CDR are depicted in Figure E.1. The sampling phases for phase detection, i.e. an eye sampling phase and edge sampling phases, are shown as black arrows. They are separated from each other by half a unit interval. The additional sampling phases for frequency detection are shown as gray arrows. They are placed between two phase detection sampling phases, at a delay time  $T_P$  on either side of the eye sampling phase. Together, the sampling phases divide one unit interval into 4 phase states. The frequency error is detected by the phase state rotation, which means that all phase states, i.e. PS-1, PS-2, PS-3, and PS-4, have to be detected. The probability of detecting one phase state is the ratio between the phase state duration and one unit interval. The frequency detection sensitivity is the probability that the PFD detects one cycle of phase state rotation. It can be calculated as the product of these probabilities:

$$FDS = P_{PS-1} \cdot P_{PS-2} \cdot P_{PS-3} \cdot P_{PS-4} = \left(\frac{\frac{T}{2} - T_{P}}{T}\right) \cdot \left(\frac{T_{P}}{T}\right) \cdot \left(\frac{T_{P}}{T}\right) \cdot \left(\frac{\frac{T}{2} - T_{P}}{T}\right)$$

therefore,

$$FDS = \left(\frac{1}{2} - \frac{T_P}{T}\right)^2 \cdot \left(\frac{T_P}{T}\right)^2$$
 Eq. E.1

where FDS is the frequency detection sensitivity,  $P_{PS-N}$  is the probability of detecting the phase-state-N, T is the bit period or unit interval, and  $T_P$  is the delay time of the additional phases for frequency detection. The frequency detection sensitivity indicates how sensitive the PFD is to the frequency error. The plot of FDS for various values of  $T_P/T$  is shown in Figure E.1 b). FDS reaches its maximum when  $T_P/T$  is equal to 0.25. This means that the optimum sampling phases for frequency detection are at the 90° clock phase, which can be provided by a quadrature-phase clock. With the additional sampling phases in other phase positions, the frequency error can still be detected, but the frequency acquisition time is longer because of the lower frequency detection sensitivity.
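
The location of the maximum can be checked numerically with a minimal Python sketch of Eq. E.1. The function names and the simple grid search are assumptions made for illustration; the formula itself is Eq. E.1 with x standing for the ratio of the delay time to the unit interval.

```python
def frequency_detection_sensitivity(tp_over_t):
    """Eq. E.1 with x = T_P / T, valid for 0 <= x <= 0.5."""
    if not 0.0 <= tp_over_t <= 0.5:
        raise ValueError("T_P/T must lie between 0 and 0.5")
    return (0.5 - tp_over_t) ** 2 * tp_over_t ** 2


def best_sampling_offset(steps=1000):
    """Scan T_P/T over [0, 0.5] and return the value that maximizes FDS."""
    candidates = [0.5 * i / steps for i in range(steps + 1)]
    return max(candidates, key=frequency_detection_sensitivity)
```

The scan returns 0.25, where each of the four phase states covers a quarter of the unit interval and the sensitivity equals 0.25 to the fourth power.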



Figure E.1 Sampling phases for PFD in CDR and its frequency detection sensitivity

If the additional sampling phases cannot be placed at the optimum positions, they should be placed closer to the eye sampling phase. Since the data transitions occur around the edge sampling phases in the lock condition, this reduces the probability of a false frequency error detection, which would cause additional jitter in the CDR.

# Appendix-F: The output frequency of the clock-jitter-filter

Table F.1 The output frequency of the clock-jitter-filter

f-VCO: 2.5 GHz; f-VCO divided by 4 (f-in of CMOS F-divider): 625 MHz

| CB    | Dividing factor 4 x (8 + CB x 2) | fout (MHz) |
|-------|----------------------------------|------------|
| 00000 | 32                               | 78.125     |
| 00001 | 40                               | 62.500     |
| 00010 | 48                               | 52.083     |
| 00011 | 56                               | 44.643     |
| 00100 | 64                               | 39.063     |
| 00101 | 72                               | 34.722     |
| 00110 | 80                               | 31.250     |
| 00111 | 88                               | 28.409     |
| 01000 | 96                               | 26.042     |
| 01001 | 104                              | 24.038     |
| 01010 | 112                              | 22.321     |
| 01011 | 120                              | 20.833     |
| 01100 | 128                              | 19.531     |
| 01101 | 136                              | 18.382     |
| 01110 | 144                              | 17.361     |
| 01111 | 152                              | 16.447     |
| 10000 | 160                              | 15.625     |
| 10001 | 168                              | 14.881     |
| 10010 | 176                              | 14.205     |
| 10011 | 184                              | 13.587     |
| 10100 | 192                              | 13.021     |
| 10101 | 200                              | 12.500     |
| 10110 | 208                              | 12.019     |
| 10111 | 216                              | 11.574     |
| 11000 | 224                              | 11.161     |
| 11001 | 232                              | 10.776     |
| 11010 | 240                              | 10.417     |
| 11011 | 248                              | 10.081     |
| 11100 | 256                              | 9.766      |
| 11101 | 264                              | 9.470      |
| 11110 | 272                              | 9.191      |
| 11111 | 8                                | 312.5      |
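
The relation between the control word CB and the output frequency in Table F.1 can be reproduced with a few lines of Python. The dividing-factor formula, the VCO frequency of 2.5 GHz, and the entry for CB = 11111 are taken from the table; the function names are hypothetical, and treating CB = 11111 as a listed exception is an assumption of this sketch, since the formula alone does not produce that entry.

```python
F_VCO_MHZ = 2500.0  # LC-VCO frequency from Table F.1 (2.5 GHz)


def dividing_factor(cb):
    """Total dividing factor for a 5-bit control word CB (0..31)."""
    if not 0 <= cb <= 31:
        raise ValueError("CB must be a 5-bit value")
    if cb == 0b11111:
        return 8  # entry listed in Table F.1; not given by the formula
    return 4 * (8 + cb * 2)


def output_frequency_mhz(cb):
    """Output frequency in MHz: VCO frequency over the total dividing factor."""
    return F_VCO_MHZ / dividing_factor(cb)
```

For example, `output_frequency_mhz(0b00000)` returns 78.125 and `output_frequency_mhz(0b11111)` returns 312.5, matching the first and last table rows.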

# Appendix-G: The effect of loop resistor to peak-to-peak jitter of PLL

The equations for calculating the loop bandwidth of the simplified  $2^{nd}$  order PLL are recalled from Section 2.4.1. They are

$$\omega_{-3dB} = \omega_N \cdot \left( (2 \cdot \zeta^2 + 1) + \sqrt{(2 \cdot \zeta^2 + 1)^2 + 1} \right)^{\frac{1}{2}}$$
 Eq. G1

where a loop damping factor is

$$\zeta = \frac{1}{2} \cdot \sqrt{\frac{K_{PD} \cdot 2\pi \cdot K_{VCO} \cdot R_1^2 \cdot C_1}{N}}$$
 Eq. G2

and a loop natural frequency is

$$\omega_N = \frac{2 \cdot \zeta}{R_1 \cdot C_1} = \sqrt{\frac{K_{PD} \cdot 2\pi \cdot K_{VCO}}{N \cdot C_1}}$$
 Eq. G3

If the loop bandwidth is reduced by decreasing the phase detector gain  $(K_{PD})$  and the VCO gain  $(K_{VCO})$ , or by increasing the dividing factor (N), the loop damping factor has to be compensated by using a larger loop resistor  $R_1$ .
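
This compensation can be illustrated numerically with a minimal Python sketch of Eq. G1 to Eq. G3. The function names are hypothetical, and any component values passed to them are assumptions for illustration rather than the values used in this design.

```python
import math


def damping_factor(k_pd, k_vco, r1, c1, n):
    """Eq. G2: loop damping factor (k_pd in A/rad, k_vco in Hz/V)."""
    return 0.5 * math.sqrt(k_pd * 2.0 * math.pi * k_vco * r1 ** 2 * c1 / n)


def natural_frequency(k_pd, k_vco, c1, n):
    """Eq. G3: loop natural frequency in rad/s."""
    return math.sqrt(k_pd * 2.0 * math.pi * k_vco / (n * c1))


def loop_bandwidth(k_pd, k_vco, r1, c1, n):
    """Eq. G1: -3 dB loop bandwidth in rad/s."""
    zeta = damping_factor(k_pd, k_vco, r1, c1, n)
    w_n = natural_frequency(k_pd, k_vco, c1, n)
    a = 2.0 * zeta ** 2 + 1.0
    return w_n * math.sqrt(a + math.sqrt(a ** 2 + 1.0))
```

Increasing N by a factor of 4 halves the natural frequency; keeping the damping factor unchanged then requires doubling the loop resistor, after which the loop bandwidth given by Eq. G1 is also halved.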

A charge pump with a loop filter and the practical VCO control node waveform of a PLL in the lock condition are shown in Figure G.1. When the charge pump is activated, the VCO control node voltage changes abruptly by the voltage  $V_{R1}$ , which is defined by the charge pump current and the loop resistor. At the same time, the voltage over the loop capacitor,  $V_{C1}$ , changes slowly at a rate defined by the charge pump current and the loop capacitor. When the charge pump is deactivated, no current flows through the loop resistor. Therefore, the resulting voltage is the voltage remaining on the loop capacitor.

Normally, charge pump leakage cannot be totally eliminated. If charge pump leakage occurs in a PLL, the feedback loop compensates it, which causes a phase offset and voltage peaks at the VCO control node. As the previous analysis shows, a large loop resistor causes a higher voltage peak at the VCO control node. The effect can be reduced by adding a second loop capacitor  $C_2$ , as shown in the waveform below. However, for loop stability  $C_2$  has to be smaller than  $C_1$ , which in turn is always limited by the area it occupies. Therefore, the loop resistor cannot be too large. Moreover, the thermal noise of the loop resistor also limits its maximum value, as described in [51].



Figure G.1 The operation of a PLL: time domain loop behavior

There is a trade-off between the loop damping factor and the dynamic behavior in PLL loop design. Ideally, a large loop capacitor  $C_1$  should be used, so that a large loop resistor  $R_1$  is not required. Moreover, a larger second loop capacitor can then be applied, leading to better filtering at the VCO control node. Unfortunately, the size of the loop capacitor  $C_1$  is always limited. Therefore, the values of  $C_2$  and  $R_1$  have to be optimized. An alternative is to use more sophisticated loop filter topologies, such as the two-signal-path architectures shown in [40] and [50].

# Appendix-H: The publication list in Figure 3.7

### CDR with binary phase detector

- [B1] R. Gu *et al.*, "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLII, pp. 352–353, Feb. 1999.
- [B2] P. Larsson, "A 2-1600MHz 1.2-2.5V CMOS Clock-Recovery PLL with Feedback Phase-Selection and Averaging Phase-Interpolation for Jitter Reduction," in *ISSCC Dig. Tech. Papers*, vol. XLII, pp. 356–357, Feb. 1999.
- [B3] M. Meghelli *et al.*, "A SiGe BiCMOS 3.3V Clock and Data Recovery Circuit for 10Gb/s Serial Transmission Systems," in *ISSCC Dig. Tech. Papers*, vol. XLIII, pp. 56–57, Feb. 2000.
- [B4] M. Reinhold *et al.*, "A Fully-Integrated 40Gb/s Clock and Data Recovery / 1:4 DEMUX IC in SiGe Technology," in *ISSCC Dig. Tech. Papers*, vol. XLIV, pp. 84–85, Feb. 2001.
- [B5] J. Rogers and J. Long, "A 10Gb/s CDR/DEMUX with LC Delay Line VCO in 0.18µm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLV, pp. 254–255, Feb. 2002.
- [B6] S.-H. Lee *et al.*, "A 5Gb/s 0.25µm CMOS Jitter-Tolerant Variable-Interval Oversampling Clock/Data Recovery Circuit," in *ISSCC Dig. Tech. Papers*, vol. XLV, pp. 256–257, Feb. 2002.
- [B7] S. Kaeriyama and M. Mizuno, "A 10Gb/s/ch 50mW 120x130µm<sup>2</sup> Clock and Data Recovery Circuit," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 70–71, Feb. 2003.
- [B8] H. Takauchi *et al.*, "A CMOS Multi-Channel 10Gb/s Transceiver," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 72–73, Feb. 2003.
- [B9] M.-J. Lee *et al.*, "A Second-Order Semi-Digital Clock Recovery Circuit Based on Injection Locking," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 74–75, Feb. 2003.
- [B10] B.-J. Lee *et al.*, "A 2.5-10Gb/s CMOS Transceiver with Alternating Edge Sampling Phase Detection for Loop Characteristic Stabilization," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 76–77, Feb. 2003.
- [B11] M. Meghelli *et al.*, "A 0.18μm SiGe BiCMOS Receiver and Transmitter Chipset for SONET OC-768 Transmission Systems," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 230–231, Feb. 2003.
- [B12] A. Ong *et al.*, "A 40-43Gb/s Clock and Data Recovery IC with Integrated SFI-5 1:16 Demultiplexer in SiGe Technology," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 234–235, Feb. 2003.
- [B13] J. Yen *et al.*, "A Fully Integrated 43.2Gb/s Clock and Data Recovery and 1:4 DEMUX in InP HBT Technology," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 238–239, Feb. 2003.
- [B14] S. Sidiropoulos *et al.*, "An 800mW 10Gb Ethernet Transceiver in 0.13μm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 168–169, Feb. 2004.
- [B15] H.-R. Lee *et al.*, "A Fully Integrated 0.13µm CMOS 10Gb Ethernet Transceiver with XAUI Interface," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 170–171, Feb. 2004.
- [B16] H. Werker *et al.*, "A 10Gb/s SONET-Compliant CMOS Transceiver with Low Cross-Talk and Intrinsic Jitter," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 172–173, Feb. 2004.
- [B17] J. Yang *et al.*, "A Quad-Channel 3.125Gb/s/ch Serial-Link Transceiver with Mixed-Mode Adaptive Equalizer in 0.18μm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 176–177, Feb. 2004.
- [B18] N. Krishnapura *et al.*, "A 5Gb/s NRZ Transceiver with Adaptive Equalization for Backplane Transmission," in *ISSCC Dig. Tech. Papers*, vol. XLVIII, pp. 60–61, Feb. 2005.
- [B19] K. Yamaguchi *et al.*, "12Gb/s Duobinary Signaling with x2 Oversampling Edge Equalization," in *ISSCC Dig. Tech. Papers*, vol. XLVIII, pp. 70–71, Feb. 2005.
- [B20] J. Kenney *et al.*, "A 9.95 to 11.1Gb/s XFP Transceiver in 0.13μm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 864–873, Feb. 2006.
- [B21] M. Ierssel *et al.*, "A 3.2Gb/s Semi-Blind-Oversampling CDR," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 1304–1313, Feb. 2006.
- [B22] H. Partovi *et al.*, "Data Recovery and Retiming for the Fully Buffered DIMM 4.8Gb/s Serial Links," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 1314–1323, Feb. 2006.
- [B23] J. Jaussi *et al.*, "A 20Gb/s Embedded Clock Transceiver in 90nm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 1334–1343, Feb. 2006.

### CDR with linear phase detector

- [L1] H. Wang *et al.*, "A 1Gb/s CMOS Clock and Data Recovery Circuit," in *ISSCC Dig. Tech. Papers*, vol. XLII, pp. 354–355, Feb. 1999.
- [L2] T. Morikawa *et al.*, "A SiGe Single-Chip 3.3V Receiver IC for 10Gb/s Optical Communication System," in *ISSCC Dig. Tech. Papers*, vol. XLII, pp. 380–381, Feb. 1999.
- [L3] S. Ueno *et al.*, "A Single-Chip 10Gb/s Transceiver LSI using SiGe SOI/BiCMOS," in *ISSCC Dig. Tech. Papers*, vol. XLIV, pp. 82–83, Feb. 2001.
- [L4] J. Cao *et al.*, "OC-192 Receiver in Standard 0.18μm CMOS," in *ISSCC Dig. Tech. Papers*, vol. XLV, pp. 250–251, Feb. 2002.
- [L5] H. Noguchi *et al.*, "A 9.9G-10.8Gb/s Rate-Adaptive Clock and Data Recovery with No External Reference Clock for WDM Optical Fiber Transmission," in *ISSCC Dig. Tech. Papers*, vol. XLV, pp. 252–253, Feb. 2002.
- [L6] T. Takeshita and T. Nishimura, "A 622Mb/s Fully-Integrated Optical IC with a Wide Range Input," in *ISSCC Dig. Tech. Papers*, vol. XLV, pp. 258–259, Feb. 2002.
- [L7] A. Koyama *et al.*, "43Gb/s Full-Rate-Clock 16:1 Multiplexer and 1:16 Demultiplexer with SFI Interface in SiGe BiCMOS Technology," in *ISSCC Dig. Tech. Papers*, vol. XLVI, pp. 232–233, Feb. 2003.
- [L8] K. Watanabe *et al.*, "A Low-Jitter 16:1 MUX and High-Sensitivity 1:16 DEMUX with Integrated 39.8 to 43GHz VCO for OC-768 Communication Systems," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 166–167, Feb. 2004.
- [L9] Y. Ohtomo *et al.*, "A 12.5Gb/s CMOS BER Test Using a Jitter-Tolerant Parallel CDR," in *ISSCC Dig. Tech. Papers*, vol. XLVII, pp. 174–175, Feb. 2004.
- [L10] M. Perrott *et al.*, "A 2.5Gb/s Multi-Rate 0.25µm CMOS CDR Utilizing a Hybrid Analog/Digital Loop Filter," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 1276–1285, Feb. 2006.
- [L11] S. Byun *et al.*, "A 10Gb/s CMOS CDR and DEMUX IC with a Quarter-Rate Linear Phase Detector," in *ISSCC Dig. Tech. Papers*, vol. XLIX, pp. 1324–1333, Feb. 2006.

## 8 List of figures

- Figure 1.1 Serial data communication system
- Figure 1.2 The data readout networks in particle physics experiment
- Figure 1.3 The data communication networks for time and clock distribution
- Figure 1.4 The block diagram of CDR for time and clock distribution
- Figure 2.1 The function and basic block diagram of CDR
- Figure 2.2 CDR specifications
- Figure 2.3 Device noise in the data buffer
- Figure 2.4 The phase error induced by the offset voltage in data buffer
- Figure 2.5 Channel jitter
- Figure 2.6 The detail of data-dependent jitter
- Figure 2.7 An example of data-dependent jitter
- Figure 2.8 Data-dependent jitter
- Figure 2.9 Block diagrams of PLL-based clock synthesizer and PLL-based CDR
- Figure 2.10 PLL linear model
- Figure 2.11 The open loop and closed loop responses of PLL linear model
- Figure 2.12 The open and closed loop responses of the 3<sup>rd</sup>-order loop by various loop resistors
- Figure 2.13 Jitter in PLL
- Figure 2.14 Phase noise (jitter) contribution in PLL
- Figure 2.15 The CDR with clock extraction and phase tracking loops
- Figure 2.16 The CDR with a clock-jitter-filter
- Figure 3.1 CDR with frequency initialization loop
- Figure 3.2 CDR with phase interpolation
- Figure 3.3 CDR without external reference clock
- Figure 3.4 Linear phase detector
- Figure 3.5 Binary-state and ternary-state binary phase detectors
- Figure 3.6 Alexander's phase detector and the statistic approach of a binary PD gain
- Figure 3.7 CDR reported in international solid-state circuit conference from 1999 to 2006
- Figure 3.8 The operation of the rotational frequency detector
- Figure 3.9 Rotation ring analogy: clock frequency is higher
- Figure 3.10 Rotation ring analogy: clock frequency is lower
- Figure 3.11 The block diagram and operation of a rotational frequency detector
- Figure 3.12 The sampling phases of CDRs without and with clock rate reduction
- Figure 3.13 Circuit topology comparison between CMOS logic and current-mode logic
- Figure 3.14 Power consumptions of CMOS logic and CML [23]
- Figure 3.15 Detail power consumptions of CML and CMOS-logic PFDs
- Figure 3.16 The block diagram of the power efficient 1/4-rate CDR and its sampling phases
- Figure 3.17 Input state of 1/4-rate CDR
- Figure 3.18 Schematics and operation of sense-amplifier and latch
- Figure 3.19 The operation of the synchronous 1/4-rate PFD: clock frequency too low [15]
- Figure 3.20 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD, f<sub>clk</sub> too low
- Figure 3.21 Block diagram of a reduced-sampling-phase time-interleaving 1/4-rate PFD
- Figure 3.22 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD, f<sub>clk</sub> too high
- Figure 3.23 The operation of the reduced-sampling-phase time-interleaving 1/4-rate PFD, f<sub>clk</sub> too low
- Figure 3.24 Ring-based VCO
- Figure 3.25 Device noises in ring-based VCO
- Figure 3.26 Calculation and simulation of VCO phase noise at 500 MHz
- Figure 3.27 A skew calibration scheme
- Figure 3.28 The sampling phase generator and the skew calibration circuit
- Figure 3.29 Phase offset induced by charge offset or current mismatch
- Figure 3.30 Charge pump circuit
- Figure 3.31 PLL-based CDRs without and with the divided frequency impulse modulation
- Figure 3.32 The operation of the divided frequency impulse modulation
- Figure 3.33 A linear model of the 1/4-rate CDR with divided frequency impulse modulation
- Figure 3.34 Jitter transfer functions and jitter tolerance of binary phase detector
- Figure 3.35 Simulation of CDR approaching lock
- Figure 3.36 Eye diagram of the incoming serial data and the recovered clocks
- Figure 3.37 The simulated CDR loop bandwidth
- Figure 3.38 The die photograph of the 1/4-rate CDR
- Figure 3.39 DEMUX output and the jitter histogram of the recovered clock
- Figure 3.40 The 1/4-rate CDR testing with serial data with ISI
- Figure 3.41 The jitter tolerance of the 1/4-rate CDR at 2Gb/s
- Figure 4.1 Quartz oscillator phase-locked loop (QPLL) [34]
- Figure 4.2 The block diagram of PLL-based clock-jitter-filter with LC-VCO
- Figure 4.3 Phase frequency detector for clocks
- Figure 4.4 The characteristic curve of PFD and its operations
- Figure 4.5 Current mode logic (CML) circuits
- Figure 4.6 On-chip inductor
- Figure 4.7 LC-VCO circuit
- Figure 4.8 NMOS in N-well as varactor
- Figure 4.9 The VCO characteristic curves by different digital control bits
- Figure 4.10 The phase noise of the LC-VCO from simulation and calculation
- Figure 4.11 The block diagram of the frequency divider
- Figure 4.12 The block diagram of the configurable part of the frequency divider (CMOS logic)
- Figure 4.13 Operation of configurable frequency divider, CB = 2
- Figure 4.14 The simulated jitter transfer function of the clock-jitter-filter
- Figure 4.15 Die photograph of clock-jitter-filter
- Figure 4.16 The jitter histogram of the output clock
- Figure 5.1 The measurement setup
- Figure 5.2 The jitter histograms of the 1/4-rate CDR and the clock-jitter-filter
- Figure 5.3 Output clocks by various jitter modulation frequencies
- Figure 5.4 Testing with ISI jitter of 150ps P-P
- Figure 5.5 Testing with ISI jitter of 340ps P-P

## 9 List of abbreviations

A Ampere

BNet Build Network

CDR Clock and data recovery circuit

CML Current-mode logic

CNet Concentrator network

CP Charge pump

DDJ Data-dependent jitter

DEMUX Demultiplexer

DFIM Divided frequency impulse modulation

D-FF Data flip-flop or one-bit data register

DLL Delay-locked loop

FD Frequency detector

FEE Front-end electronic unit

Gb/s Gigabit per second

HNet High-level Network

LF Loop filter

MUX Multiplexer

MOS Metal-oxide-semiconductor

PD Phase detector

PDF Probability density function

PEA Phase error accumulator

PLL Phase-locked loop

PFD Phase frequency detector

PNet Processing Network

P-P Peak to peak

rad radian

rms root mean square

ISI Intersymbol interference

VCDL Voltage controlled delay line

VCO Voltage controlled oscillator

VCXO Voltage controlled crystal oscillator

## 10 References

- [1] 2005 Technical Status Report of the Compressed Baryonic Matter Experiment (CBM), see http://www.gsi.de/documents/DOC-2005-Feb-447-1.pdf, Chapter 10.
- [2] J. Cao *et al.*, "OC-192 Transmitter and Receiver in Standard 0.18-µm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 1768–1780, Dec. 2002.
- [3] A. Momtaz *et al.*, "A Fully Integrated SONET OC-48 Transceiver in Standard CMOS," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 1964–1973, Dec. 2001.
- [4] T. C. Weigandt, *Low-Phase-Noise, Low-Timing-Jitter Design Techniques for Delay Cell Based VCOs and Frequency Synthesizers*. PhD Thesis, University of California, Berkeley, 1998.
- [5] J. F. Buckwalter, *Deterministic Jitter in Broadband Communication*. PhD Thesis, California Institute of Technology, 2006.
- [6] F. M. Gardner, "Charge-Pump Phase-Locked Loops," *IEEE Transactions on Communications*, Nov. 1980.
- [7] B. Razavi, *Design of Integrated Circuits for Optical Communications*. McGraw-Hill, 2003.
- [8] R. Farjad-Rad *et al.*, "A 33-mW 8-Gb/s CMOS Clock Multiplier and CDR for Highly Integrated I/O," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1553–1561, Sep. 2004.
- [9] M.-J. E. Lee, *An Efficient I/O and Clock Recovery Design for Terabit Integrated Circuits*. PhD Thesis, Stanford University, 2001.
- [10] R. Kreienkamp *et al.*, "A 10-Gb/s CMOS Clock and Data Recovery Circuit With an Analog Phase Interpolator," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 736–743, March 2005.
- [11] H.-T. Ng *et al.*, "A Second-Order Semidigital Clock Recovery Circuit Based on Injection Locking," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 2101–2110, Dec. 2003.

- [12] A. Pottbäcker, U. Langmann, and H.-U. Schreiber, "A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s," *IEEE Journal of Solid-State Circuits*, vol. 27, pp. 1747–1751, Dec. 1992.
- [13] J. Savoj and B. Razavi, "A 10Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLIV, pp. 78–79, Feb. 2001.
- [14] R.-J. Yang, S.-P. Chen, and S.-I. Liu, "A 3.125-Gb/s Clock and Data Recovery Circuit for the 10-Gbase-LX4 Ethernet," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1356–1360, Aug. 2004.
- [15] S. Tontisirin and R. Tielert, "A Gb/s one-forth-rate CMOS CDR Circuit without External Reference Clock," in *Proc. IEEE Int. Symp. On Circuits and Systems* (*ISCAS*), pp. 3265–3268, June 2006.
- [16] C. R. Hogge, "A Self-Correcting Clock Recovery Circuit," *IEEE Journal of Lightwave Technology*, Dec. 1985.
- [17] H. Nosaka *et al.*, "A 10-Gb/s Data-Pattern Independent Clock and Data Recovery Circuit With a Two-Mode Phase Comparator," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 192–197, Feb. 2003.
- [18] J. D. H. Alexander, "Clock Recovery from Random Binary Signals," *Electronics Letters*, Oct. 1975.
- [19] Y. Choi, D.-K. Jeong and W. Kim, "Jitter Transfer Analysis of Tracked Oversampling Techniques for Multigigabit Clock and Data Recovery," *IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing*, vol. 50, pp. 775–783, Nov. 2003.
- [20] M.-J. E. Lee, W. J. Dally and P. Chiang, "Low-Power Area-Efficient High-Speed I/O Circuit Techniques," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1591–1599, Nov. 2000.
- [21] A. Buchwald and K. Martin, *Integrated Fiber-Optic Receivers*, Kluwer Academic Publishers, 1995.

- [22] Y. Ji-Ren *et al.*, "A True Single-Phase-Clock Dynamic CMOS Circuit Technique," *IEEE Journal of Solid-State Circuits*, vol. 22, pp. 899–901, Oct. 1987.
- [23] A. Tanabe *et al.*, "0.18-µm CMOS 10-Gb/s Multiplexer/Demultiplexer ICs Using Current Mode Logic with Tolerance to Threshold Voltage Fluctuation," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 988–996, June 2001.
- [24] R. Walker, "Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems," *Phase Locking in High-Performance Systems, From Devices to Architectures*. IEEE Press, 2002.
- [25] J. Maneatis, "Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 1723–1732, Dec. 1996.
- [26] T. Weigandt, B. Kim and P. R. Gray, "Analysis of Time Jitter in CMOS Ring Oscillators," in *Proc. IEEE Int. Symp. On Circuits and Systems (ISCAS)*, pp. 27–30, June 1994.
- [27] A. Hajimiri, S. Limotyrakis and T. Lee, "Jitter and Phase Noise in Ring Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 790–804, June 1999.
- [28] A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 1803–1816, Aug. 2006.
- [29] H.-R. Lee *et al.*, "A 1.2-V-Only 900-mW 10Gb Ethernet Transceiver and XAUI Interface With Robust VCO Tuning Technique," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 2148–2158, Nov. 2005.
- [30] L. Wu and C. Black Jr., "A Low Jitter Skew-Calibrated Multi-Phase Clock Generator for Time Interleaved Application," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLIV, pp. 396–397, Feb. 2001.
- [31] M. Johnson and E. Hudson, "A variable delay line PLL for CPU-co-processor synchronization," *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 179–194, Feb. 1998.
- [32] Y. Greshishchev and P. Schvan, "SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1353–1359, Feb. 2000.

- [33] F. Kabir, *Phase-Noise (Jitter) Performance of CDC7005 With Different VCXOs*, see http://focus.ti.com/lit/an/scaa067a/scaa067a.pdf.
- [34] P. Moreira and A. Marchioro, "QPLL a Quartz Crystal Based PLL for Jitter Filtering Applications in LHC," in *9<sup>th</sup> Workshop on Electronics for LHC Experiments*, Sep. 2003.
- [35] M. Tiebout, "Low-power low-phase-noise differentially tuned quadrature VCO design in Standard CMOS," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 1018–1024, July 2001.
- [36] J. Rael and A. Abidi, "Physical Processes of Phase Noise in Differential LC-Oscillators," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, pp. 569–572, May 2000.
- [37] J. Craninckx and M. Steyaert, "A 1.8-GHz Low-Phase-Noise CMOS VCO Using Optimized Hollow Spiral Inductor," *IEEE Journal of Solid-State Circuits*, vol. 32, pp. 736–744, May 1997.
- [38] P. Andreani and S. Mattisson, "On the Use of MOS Varactors in RF VCO's," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 905–910, June 2000.
- [39] A.-S. Porret, T. Melly and C. C. Enz, "Design of High-Q Varactor for Low-Power Wireless Applications using a Standard CMOS Process," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, pp. 641–644, May 1999.
- [40] J. Kim *et al.*, "A 20-GHz Phase-Locked Loop for 40-Gb/s serializing Transmitter in 0.13μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 899–908, April 2006.
- [41] H. Wang *et al.*, "A 1Gb/s CMOS Clock and Data Recovery Circuit," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLII, pp. 354–355, Feb. 1999.
- [42] J. Park and W. Kim, "An Auto-Ranging 50-210Mb/s Clock Recovery Circuit with a Time-to-Digital Converter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLII, pp. 350–351, Feb. 1999.

- [43] S. Ueno *et al.*, "A Single-Chip 10Gb/s Transceiver LSI using SiGe SOI/BiCMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLIV, pp. 82–83, Feb. 2001.
- [44] H. Noguchi *et al.*, "A 9.9G-10.8Gb/s Rate-Adaptive Clock and Data Recovery with No External Reference Clock for WDM Optical Fiber Transmission," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLV, pp. 252–253, Feb. 2002.
- [45] T. Takeshita and T. Nishimura, "A 622Mb/s Fully-Integrated Optical IC with a Wide Range Input," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLV, pp. 258–259, Feb. 2002.
- [46] R.-J. Yang *et al.*, "A 155.52 Mbps-3.125 Gbps Continuous-Rate Clock and Data Recovery Circuit," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 1380–1390, June 2006.
- [47] M. Perrott *et al.*, "A 2.5-Gb/s Multi-Rate 0.25-μm CMOS Clock and Data Recovery Circuit Utilizing a Hybrid Analog/Digital Loop Filter and All-Digital Referenceless Frequency Acquisition," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2930–2944, June 2006.
- [48] S. Sidiropoulos *et al.*, "An 800mW 10Gb Ethernet Transceiver in 0.13μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. XLVII, pp. 168–169, Feb. 2004.
- [49] D. Dalton *et al.*, "A 12.5-Mb/s to 2.7-Gb/s Continuous-Rate CDR With Automatic Frequency Acquisition and Data-Rate Readback," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 2713–2725, Dec. 2005.
- [50] J. Craninckx and M. Steyaert, "A Fully Integrated CMOS DCS-1800 Frequency Synthesizer," *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 2054–2065, Dec. 1998.
- [51] R. v. d. Beek, *High-Speed Low-Jitter Clock Multiplication in CMOS*. PhD Thesis, University of Twente, 2004.

### **Curriculum Vitae**

Name: Sitt Tontisirin

Permanent Address: 8 Soi Ladprew 22, Ladprao Rd., Jatujak, Ladyaw, 10900, Bangkok, Thailand

Birth: 3 March 1975

Birth Place: Bangkok, Thailand

Nationality: Thai

Gender: Male

Marital Status: Married

### **Education**

01/2002-04/2007 PhD candidate, Institute of Microelectronics, University of Kaiserslautern, Kaiserslautern, Germany

08/1999-12/2001 Master of Science International Program, Microelectronics, University of Kaiserslautern, Kaiserslautern, Germany

06/1992-03/1996 Bachelor of Engineering, Electrical Engineering, Chulalongkorn University, Bangkok, Thailand

06/1990-03/1992 High School, Triam Udom Suksa School, Bangkok, Thailand

06/1987-03/1990 Secondary School, Chidladda School, Bangkok, Thailand

06/1979-03/1987 Elementary School, Vatchirawut College, Bangkok, Thailand

### **Career History**

since 07/2007 Product development engineer, Qimonda AG, Munich, Germany

01/2002-04/2007 Research assistant, Institute of Microelectronics, University of Kaiserslautern, Kaiserslautern, Germany

12/1997-01/1999 Product design engineer, Sony Siam Industries Co., Ltd., Ayuthaya, Thailand

12/1996-11/1997 R&D engineer, Alphatel Co., Ltd, Pathumthani, Thailand

05/1996-10/1996 Support engineer, Computer Network Administrator of Chulalongkorn University, Chulalongkorn University, Bangkok, Thailand

### **Publications and Conferences**

#### S. Tontisirin and R. Tielert:

#### "Gb/s CDR Circuit for Large Synchronous Networks"

European Solid-State Circuit Conference (ESSCIRC) 2007, Munich, Germany, September 2007.

#### S. Tontisirin and R. Tielert:

#### "A Gb/s one-forth-rate CMOS CDR Circuit without External Reference Clock"

IEEE International Symposium on Circuits and Systems (ISCAS) 2006, Kos, Greece, May 2006.

#### S. Tontisirin and R. Tielert:

#### "Gb/s CMOS 1-4th-rate CDR with Frequency Detector and Skew calibration"

International Symposium on VLSI Design, Automation & Test (VLSI-DAT) 2006, Hsinchu, Taiwan, April 2006.

#### S. Tontisirin and R. Tielert:

#### "Loop Bandwidth Reduction Technique using Divided Feedback Impulse Modulation for a Low Jitter Gigabit CDR"

Analog Workshop 2006, Kaiserslautern, Germany, March 2006.

#### S. Tontisirin and R. Tielert:

#### "2.5 Gbps Clock Data Recovery using One-Forth-Rate Quadricorrelator Frequency Detector and Skew-Calibrated Multi-Phase Clock Generator"

Kleinheubacher Conference (Tagung) 2005, Miltenberg, Germany, September 2005.

#### S. Tontisirin and R. Tielert:

#### "CMOS 2.5 Gbps 10:1 Serializer combined with VCSEL Driver"

Analog Workshop 2004, Freiburg, Germany, March 2004.

#### R. Tielert and S. Tontisirin:

#### "High Speed CMOS I/O-Circuits (solicited)"

Kleinheubacher Conference (Tagung) 2003, Miltenberg, Germany, September 2003.

#### S. Tontisirin and R. Tielert:

#### "Study of Cascade-Error-Feedback Sigma Delta Modulation in DAC for Software Radio Application"

Analog Workshop 2003, Berlin, Germany, March 2003.