Building a Rich Text Editor

Building a rich text editor might as well be a rite of passage for frontend developers. Most people doing any kind of digital work have used this type of editor before, and most content-based websites have some kind of unique text editor in their interface. A textarea might be good enough for 90% of use cases, but that last 10% is where you’ll find some of the more recognizable text editors on the Internet.

A rich text editor, a type of WYSIWYG (“What You See Is What You Get”) editor, is a catch-all term describing a text editing interface where what the user sees in the editor has a similar appearance to what the end-user sees. Whether you’re writing a blog post, an email, or a tweet, you’re probably using a custom-built WYSIWYG text editor. And it was probably a pain.

Before we talk about how these editors can be built, let’s take a quick look at what’s out there.

Google’s Gmail service is obviously massively popular. Countless emails are drafted up in this editor every day. Most emails fall into two camps: all-text for basic communications, or lots of images stitched together, more commonly used for marketing. Gmail’s editor is clearly more focused on text, giving you controls for bolding and italicizing your text, as well as adding header styles and properties like text alignment. You can also add images into the email, with an easy drag and drop interface. The important thing to take note of is that the email appears the same to you in the editor as it will in the recipient’s inbox. If you select text and click “bold”, that information becomes visually apparent immediately. If you add an image, you won’t see <img src='images/your-image-here.jpg' />; you’ll see the image as soon as it’s done uploading. There’s no guesswork involved in what your users will see.

Another editor you might be familiar with is Twitter’s new tweet box.

At first glance, the box is pretty straightforward. You have a faded placeholder message prompting you to type into it. You can spread your tweet out across multiple lines, but there’s no option for bold or italic text. You can’t add inline images like Gmail lets you. However, Twitter’s doing some interesting find-and-replace work in this box as you type. If you type a hashtag with the # symbol, or tag another user with an @, Twitter will immediately swap out the boring, black text, with their branded blue color, making those pieces of text look more like a hyperlink. Again, you’re seeing what the end-user (in this case, your followers) will see.

Twitter knows to remove the hyperlink as soon as the # or @ are removed, and for some of the bigger branded hashtags, they’ll even add extra content into your tweet for you in the form of custom emojis.

Pretty slick.

If you’re a developer reading this, you’re probably familiar with GitHub’s interface. GitHub takes a more functional approach to text editing; instead of giving live visual updates as you type up an issue, pull request or comment, they give users a tabbed editing box, allowing you to toggle between “Write” and “Preview” views of your content easily.

GitHub uses their own brand of Markdown syntax, a common formatting language that transforms contextual symbols into HTML. Surrounding text with **asterisks** results in text wrapped in strong tags. Using _underscores_ gives you italicized text wrapped in em tags.
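To make that transformation concrete, here's a rough sketch of the idea in plain JavaScript. The renderInlineMarkdown function is a hypothetical helper that only handles the two patterns mentioned above; it is not GitHub's parser, and real Markdown has many more rules (this version would also mangle underscores inside snake_case words, for example).

```javascript
// Hypothetical helper: converts two inline Markdown patterns to HTML.
// This is an illustration only, not how GitHub renders Markdown.
function renderInlineMarkdown(source) {
  return source
    // **text** becomes <strong>text</strong>
    .replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
    // _text_ becomes <em>text</em>
    .replace(/_([^_]+)_/g, '<em>$1</em>');
}

console.log(renderInlineMarkdown('Use the **force**, _Luke_.'));
// Use the <strong>force</strong>, <em>Luke</em>.
```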

This editor doesn’t give us the kind of immediate visual feedback that other editors do, but it does make the serialization layer very apparent. Markdown is an easy syntax to recognize, and once you’re familiar with it, it’s easy to translate into what the approximate HTML output would be. The format is portable, lightweight and simple enough for non-technical users to grasp, even if they’re not seeing exactly what the end-user is going to see.

Since GitHub’s not directly storing user-submitted HTML, they’re also protecting themselves against some security threats of script injection. Beyond that, simplifying the data that they’re working with helps to make the logic for the editor itself simpler.

The challenges of building an editor

Let’s think about how we could implement this with an Ember component.

Ember is our preferred front-end framework at Movable Ink, because it gives us a very structured system to manage the complexity of our applications. The “What is Ember?” section of the official Ember guides describes it as “a JavaScript front-end framework designed to help you build websites with rich and complex user interactions.” This editor definitely qualifies as rich and complex, making Ember a great choice for such a project. If you’re not already familiar, you may want to read into the Ember guides a bit further.

Your first instinct might be to start with an <input type='text' /> tag, but then we lose the ability to show line breaks in the text. Using a textarea tag fixes that problem for us.

So we’ve got our textarea, and if we know we’re going to try to handle a find and replace for those hashtags, we’ll probably be using some kind of RegExp match and an event handler. If you look very closely, you can see that the blue highlighting for a hashtag doesn’t get added until after at least one character appears next to the # sign, so Twitter probably isn’t using a keydown or input handler. If they were, we wouldn’t see that quick flash as the black text gets wrapped in their link styling.

We tend to use DockYard’s excellent ember-one-way-controls addon for our form fields, as they help enforce the Data Down, Actions Up (DDAU) philosophy which Ember has embraced.

When a user types into the textarea, the one-way-textarea component logic reacts to an input event, in which it pulls the value property off of the textarea, and passes that value to whatever update action has been defined on it. The update action is expected to change the value attribute that’s passed into the component, triggering a re-render and displaying the new, updated value to the user inside the textarea.

Something like this should serve us:

// controllers/application.js
import Ember from 'ember';

export default Ember.Controller.extend({
  text: 'I am a #Jedi, like my father before me.',

  actions: {
    // Data Down, Actions Up: the component reports the new value, and we store it here.
    updateText(newValue) {
      this.set('text', newValue);
    }
  }
});

// components/tweet-box.js
import Ember from 'ember';

export default Ember.Component.extend({
  // Group 1 keeps the character before the hashtag (or the start of the string);
  // group 2 is the hashtag itself. A preceding ">" means it is already wrapped.
  hashtagPattern: /(^|[^>])(#\w+)/g,

  html: Ember.computed('text', function() {
    const text = this.get('text');
    const pattern = this.get('hashtagPattern');

    return text.replace(pattern, "$1<span class='hashtag'>$2</span>");
  })
});

{{!-- templates/application.hbs --}}
{{tweet-box text=text onchange=(action 'updateText')}}

<style>
  .hashtag { color: blue; }
</style>

{{!-- templates/components/tweet-box.hbs --}}
{{one-way-textarea value=html update=(action onchange)}}

This looks decent: our RegExp will match any text prefixed with a # that isn’t already wrapped in a span tag, and surround it with opening and closing tags. We’re checking that the hashtag isn’t already wrapped pretty naively, but for a quick demo, it’ll do. If you load up this example in Ember Twiddle, you’ll quickly realize that we’re not getting what we want anyway.

The immediate problem is that we’re not seeing the visual feedback we wanted. The link should be colored blue, and we shouldn’t be seeing the HTML that’s being generated. There’s also some nasty cursor behavior going on: whenever our component runs a find-and-replace and wraps a new hashtag, the cursor immediately jumps to the end of the textarea.

Neither textarea tags nor normal input fields can render HTML inside of them. Fortunately, there’s a widely available native solution in the contenteditable HTML attribute. Setting nearly any HTML tag to use contenteditable="true" will let you type inside the element as if it were a textarea tag. However, since you’re editing the HTML directly, and not actually transforming it into a form field, it will still render the way you want, respecting whatever CSS you’ve applied to the page.
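The first problem is easy to reproduce outside of Ember. Here's a minimal sketch, where highlightHashtags is a hypothetical stand-in for our computed property, using a pattern that keeps the character in front of the hashtag. The result is just a string of markup, and a textarea will display any string it's given literally.

```javascript
// Group 1 keeps whatever precedes the hashtag (or the start of the string);
// group 2 is the hashtag. A preceding ">" is treated as "already wrapped".
const hashtagPattern = /(^|[^>])(#\w+)/g;

// Hypothetical stand-in for the component's computed property.
function highlightHashtags(text) {
  return text.replace(hashtagPattern, "$1<span class='hashtag'>$2</span>");
}

const html = highlightHashtags('I am a #Jedi, like my father before me.');
console.log(html);
// I am a <span class='hashtag'>#Jedi</span>, like my father before me.

// Running it on its own output should not wrap the hashtag a second time.
console.log(highlightHashtags(html) === html); // true
```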

If we update our component to use a <div contenteditable="true"> tag instead of a textarea, things start to look a little better. We don’t have access to most of the event handlers that input and textarea tags have, but contenteditable elements do fire an input event. We can pull the innerHTML property of our editable div inside that event, and use that in place of our nice update action in the DockYard {{one-way-textarea}} component.

Here’s our updated Twiddle.

It’s looking better, but interacting with the editable div is even worse than before. Typing anything into the div moves our cursor to the front of the text. Unless you’re typing in a language read right-to-left, this is less than ideal.
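For reference, the approach boils down to something like the following framework-free sketch. The makeEditable function and its highlight argument are hypothetical names, not part of Ember or of our Twiddle. It also hints at the cursor trouble: assigning innerHTML rebuilds the element's child nodes, so the browser typically loses track of where the caret was.

```javascript
// Hypothetical helper: turns an element into an editable region and re-runs
// a highlighting function over its markup every time the user types.
function makeEditable(element, highlight) {
  element.setAttribute('contenteditable', 'true');

  element.addEventListener('input', () => {
    const current = element.innerHTML;
    const next = highlight(current);

    if (next !== current) {
      // Replacing innerHTML throws away the existing nodes, including the
      // one the caret was sitting in.
      element.innerHTML = next;
    }
  });
}
```

In a browser, you could try it with something like makeEditable(document.querySelector('#tweet-box'), someHighlightFunction), where both the selector and the function are placeholders.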

Further investigation will reveal more problems, some that come from Glimmer (Ember’s rendering engine) throwing up its hands when trying to understand what you’re doing inside the editable div. Clearing the contents of the div throws an error, because you’ve unexpectedly removed HTML nodes that are invisible to you.

The contenteditable attribute is a double-edged sword. It gives you some really nice behavior out of the box, including the ability to toggle bold and italicized text with the expected shortcuts (Command+B or Command+I on macOS). Pasting HTML inside the editable element will render it as you would expect, with full styling.

Aside from that, the contenteditable attribute is not going to get you to the finish line of building a rich text editor without putting in more of the heavy lifting yourself, and the more custom behavior you add into the editor, the more you’ll have to wrestle with the browser’s native behavior.

If you’ve been tasked with building one of these editors, the contenteditable attribute might seem like a quick solution, but it’s a building block at best. A good solution will probably involve using the attribute in some way, but you’re still going to need to get your hands dirty to give your users a better experience.

The bottom line is that there are no easy solutions for building a text editor with a great user experience. It’s a common problem to solve, but oftentimes, your particular needs are going to require something bespoke.

Our Experience

At Movable Ink, we recently released a new WYSIWYG editor inside our web platform called Studio. Prior to the release, most of the content created in our platform was built by extending some shared logic on top of a custom template, which would be built out using HTML, JavaScript and CSS. Our goal was to lower the barrier to entry for making your own content, while still delivering reliable, intelligent content to our clients, and our clients’ customers.

Studio lets you create those templates using an interface that would be familiar to anyone that has used programs like Sketch, Photoshop or PowerPoint; elements on the page are created by clicking and dragging in a workspace, using simple tools. However, the more interesting parts of our platform are centered on delivering content that isn’t fully determined until an email is opened. A dynamic image might be built with our platform that includes your customer’s name and some account details, for example, but representing that template inside Studio in a way that clearly communicates how it will change is challenging. Studio has made it much easier to create the visual template that your content will be presented inside, but we faced a fairly unique challenge of making sure that users understood that the text that they see inside their workspace is not what the end-user will see.

We use a token system inside fields that hold “dynamic text” to show that parts of the text will change at time-of-open. A personalized message might be shown in our interface as “Hello, [mi_name]!”, but a user unfamiliar with our system might read that and be confused as to what “mi_name” represents. We decided to build out our own rich text editor that would perform a similar kind of find-and-replace logic that other editors use: if a user types something matching our token format, it should immediately be visually distinguished from the rest of the static text surrounding it. We also wanted our text editor to support the same kind of standard controls that others use to apply bold or italic styles to text.

We’re still in the process of expanding our text editor to cover all of our needs, but we’ve come to a few conclusions in our development process. Here’s some advice that could help guide your own plans.

It’s dangerous to go alone!

Be invisible

Editing text is such a core experience of working on a computer that most users don’t think twice about things that the editor manages for them. There are lots of things the editor is keeping track of that you might take for granted. A good editor shouldn’t sacrifice any of those things. Users probably won’t be very impressed when you get those things right, but they’ll immediately realize when something doesn’t work as they expect. Your implementation should feel like it’s native to the browser. This includes things like copying and pasting text, using the undo and redo shortcuts, cursor movement, being able to make text bold or italic, indenting text, and others. All of these minor interactions add up to something familiar to most users. Stepping outside of that familiar territory will make your editor feel half-baked. Your goal should always be to make your work as invisible to the end-user as possible.

Use the tools that you’re given

The contenteditable attribute will get you close to a rich text editor, but as you add your own abstractions on top of that API, it’s important to make sure you’re not stepping in front of the browser unnecessarily. The browser handles all of those small features we just talked about, requiring nothing from your end. You’ll probably find yourself needing to recreate some of that behavior yourself, but use as light a touch as you can there. For example, our tweet-box component that we built earlier had an issue where it would reset the user’s cursor position anytime its find-and-replace logic ran on a hashtag. That’s unacceptable, but your solution should be as minimal as possible; if you need to save and restore the cursor’s position at certain moments, that’s fine, but taking it a step further and controlling all of the logic for cursor movement could quickly become a tangled nightmare. Modern, major browsers are built and maintained by large teams, who are paid to build and test efficient, clean platforms for you to build on top of. Leveraging the work those teams have done for you as much as possible frees you and your team to build new functionality rather than reinventing the wheel.
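As an illustration of that light touch, here's one way saving and restoring the caret might look. This is a sketch under assumptions, not the code we ship: all of the function names are made up for this example, it only tracks a collapsed caret (not a full selection), and it treats the caret as a character offset into the editor's text so that it survives the markup being rebuilt around it. The browser's Selection and Range APIs do the actual reading and writing.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the browser

// Total number of text characters underneath a node.
function textLength(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue.length;
  return Array.from(node.childNodes).reduce((sum, child) => sum + textLength(child), 0);
}

// Converts a DOM position (node + offset) into a character offset from the
// start of root's text. Returns null if the node is not inside root.
function toTextOffset(root, target, offset) {
  if (root === target) {
    if (root.nodeType === TEXT_NODE) return offset;
    // For elements, the offset counts child nodes rather than characters.
    return Array.from(root.childNodes).slice(0, offset)
      .reduce((sum, child) => sum + textLength(child), 0);
  }
  if (root.nodeType === TEXT_NODE) return null;
  let before = 0;
  for (const child of Array.from(root.childNodes)) {
    const inner = toTextOffset(child, target, offset);
    if (inner !== null) return before + inner;
    before += textLength(child);
  }
  return null;
}

// Converts a character offset back into a DOM position, clamping to the end
// of the last text node if the offset is too large.
function fromTextOffset(root, textOffset) {
  let remaining = textOffset;
  let last = null;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      if (remaining <= node.nodeValue.length) return { node, offset: remaining };
      remaining -= node.nodeValue.length;
      last = node;
    } else {
      // Push children in reverse so they are visited in document order.
      const children = Array.from(node.childNodes);
      for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
    }
  }
  return last ? { node: last, offset: last.nodeValue.length } : { node: root, offset: 0 };
}

// Browser-only glue: call saveCaret before rewriting the markup...
function saveCaret(root) {
  const selection = window.getSelection();
  if (selection.rangeCount === 0) return null;
  const range = selection.getRangeAt(0);
  return toTextOffset(root, range.startContainer, range.startOffset);
}

// ...and restoreCaret afterwards, with the value saveCaret returned.
function restoreCaret(root, textOffset) {
  if (textOffset === null) return;
  const { node, offset } = fromTextOffset(root, textOffset);
  const range = document.createRange();
  range.setStart(node, offset);
  range.collapse(true);
  const selection = window.getSelection();
  selection.removeAllRanges();
  selection.addRange(range);
}
```

Everything else about cursor movement stays with the browser; the only moment we step in is around our own find-and-replace.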

Understand the tools that you’re given

If you’re going to be doing any kind of programmatic editing inside the field (like toggling markup or wrapping text in tags as it’s typed), there are a few browser APIs you’re going to want to become an expert in. The Selection and Range objects are good tools that are available to you to manage cursor positions and apply changes to a section of a tag, or to a group of tags. You’re going to want to brush up on DOM traversal in general. The Node and Element documentation is a good place to start.

Assume malicious users will find your interface

Anytime you’re working with user-submitted HTML, security should be on your mind. Script injection attacks are a real threat, and even if you believe your users are trustworthy, it’s important not to give them the benefit of the doubt when it comes to security. If anyone outside of your team has access to your platform, you should expect to face some form of malicious user at some point. This part of the development process is going to be largely dependent on your specific use case, but generally, never take user input directly. Ideally, you should have a serialization layer between what the user types and what HTML gets generated. Allowing users to type or paste HTML into your field is dangerous, so you should do what you can to take input and turn it into HTML yourself. At the very least, you should whitelist which tags are allowed in your HTML, and escape anything that doesn’t make the list.
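As a rough sketch of that last point, one conservative approach is to escape everything first, then restore only an explicit list of harmless, attribute-free tags. The ALLOWED_TAGS list and both function names here are assumptions made for this example; treat it as a starting point, and prefer a well-tested sanitization library for anything production-facing.

```javascript
// Hypothetical allowlist: only these tags survive, and only without attributes.
const ALLOWED_TAGS = ['b', 'i', 'strong', 'em'];

// Escapes every character that has special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escapes the whole input, then un-escapes exact matches of allowed tags.
// Tags carrying attributes never match, so they stay escaped.
function sanitize(input) {
  const allowed = new RegExp(`&lt;(/?)(${ALLOWED_TAGS.join('|')})&gt;`, 'gi');
  return escapeHtml(input).replace(allowed, '<$1$2>');
}

console.log(sanitize('<b>hi</b><script>alert(1)</script>'));
// <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Note that an opening tag with attributes stays escaped while its plain closing tag is restored, so the output can contain a stray closing tag. That's untidy, but it doesn't let any markup through that isn't on the list.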

Rely on the community whenever possible

It’s tempting to look at your specific challenges as completely unique, but chances are, other teams and other developers have faced the same challenges at some point, and hopefully, have shared their solutions. There are already multiple well-designed editors out there that will probably fill your needs. You might normally prefer to roll your own solutions to cut down on vendor bloat, but in this case, it’s usually better to fall back on a community-provided solution for a number of reasons. If you find an open source package with even a small user base, you’ve already got more people beyond your own team invested in making sure that the tool is a good one.

Some of the more fully-featured editors we’re fans of are Quill and Bustle Labs’ Mobiledoc Kit.

Ultimately, your primary goal should be to create a seamless experience for your users. With the right amount of preparation, you can deliver that experience. But if you try to dive into building a rich editor without doing the due diligence of building acceptance criteria, investigating your available options, and testing your work thoroughly, you’re going to have a bad time.