# remark-parse [![Build Status][build-badge]][build-status] [![Coverage Status][coverage-badge]][coverage-status] [![Chat][chat-badge]][chat]
[Parser][] for [**unified**][unified]. Parses markdown to an
[**mdast**][mdast] syntax tree. Used in the [**remark**
processor][processor]. Can be [extended][extend] to change how
markdown is parsed.
## Installation
[npm][]:
```bash
npm install remark-parse
```
## Usage
```js
var unified = require('unified');
var markdown = require('remark-parse');
var html = require('remark-html');

process.stdin
  .pipe(unified())
  .use(markdown)
  .use(html)
  .pipe(process.stdout, {
    'commonmark': true
  });
```
## Table of Contents
* [API](#api)
    * [processor.use(parse)](#processoruseparse)
    * [parse.Parser](#parseparser)
* [Extending the Parser](#extending-the-parser)
    * [Parser#blockTokenizers](#parserblocktokenizers)
    * [Parser#blockMethods](#parserblockmethods)
    * [Parser#inlineTokenizers](#parserinlinetokenizers)
    * [Parser#inlineMethods](#parserinlinemethods)
    * [function tokenizer(eat, value, silent)](#function-tokenizereat-value-silent)
    * [tokenizer.locator(value, fromIndex)](#tokenizerlocatorvalue-fromindex)
    * [eat(subvalue)](#eatsubvalue)
    * [add(node\[, parent\])](#addnode-parent)
    * [add.test()](#addtest)
    * [add.reset(node\[, parent\])](#addresetnode-parent)
* [License](#license)
## API
### `processor.use(parse)`
Configure the `processor` to read markdown as input and process an
[**mdast**][mdast] syntax tree.
#### `options`
Options are passed later through [`processor.parse()`][parse],
[`processor.process()`][process], or [`processor.pipe()`][pipe].
The following settings are supported:
* [`gfm`][options-gfm] (`boolean`, default: `true`);
* [`yaml`][options-yaml] (`boolean`, default: `true`);
* [`commonmark`][options-commonmark] (`boolean`, default: `false`);
* [`footnotes`][options-footnotes] (`boolean`, default: `false`);
* [`pedantic`][options-pedantic] (`boolean`, default: `false`);
* [`breaks`][options-breaks] (`boolean`, default: `false`).
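
To make the defaults concrete, here is a minimal sketch of how given settings could be overlaid on them. The `resolveSettings` helper is hypothetical and written only for illustration; it is not how remark-parse resolves options internally.

```js
/* Defaults as listed above. */
var defaults = {
  gfm: true,
  yaml: true,
  commonmark: false,
  footnotes: false,
  pedantic: false,
  breaks: false
};

/* Hypothetical helper (not part of remark-parse): overlay the given
 * settings on the defaults, ignoring unknown keys. */
function resolveSettings(given) {
  var settings = {};

  Object.keys(defaults).forEach(function (key) {
    settings[key] = given && key in given ? given[key] : defaults[key];
  });

  return settings;
}

var settings = resolveSettings({commonmark: true, footnotes: true});

console.log(settings);
```

Settings that are not passed, such as `gfm` here, keep their default value.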
##### `options.gfm`
```md
hello ~~hi~~ world
```
GFM mode (default: `true`) turns on:
* [Fenced code blocks](https://help.github.com/articles/github-flavored-markdown/#fenced-code-blocks);
* [Autolinking of URLs](https://help.github.com/articles/github-flavored-markdown/#url-autolinking);
* [Deletions (strikethrough)](https://help.github.com/articles/github-flavored-markdown/#strikethrough);
* [Task lists](https://help.github.com/articles/writing-on-github/#task-lists);
* [Tables](https://help.github.com/articles/github-flavored-markdown/#tables).
##### `options.yaml`
```md
---
title: YAML is Cool
---
# YAML is Cool
```
YAML mode (default: `true`) enables raw YAML front matter to be detected
at the top.
##### `options.commonmark`
```md
This is a paragraph
    and this is also part of the preceding paragraph.
```
CommonMark mode (default: `false`) allows:
* Empty lines to split blockquotes;
* Parentheses (`(` and `)`) around link and image titles;
* Any escaped [ASCII-punctuation][escapes] character;
* Closing parenthesis (`)`) as an ordered list marker;
* URL definitions (and footnotes, when enabled) in blockquotes.
CommonMark mode disallows:
* Code directly following a paragraph;
* ATX-headings (`# Hash headings`) without spacing after opening hashes
or before closing hashes;
* Setext headings (`Underline headings\n---`) when following a paragraph;
* Newlines in link and image titles;
* White space in link and image URLs in auto-links (links in brackets,
`<` and `>`);
* Lazy blockquote continuation, lines not preceded by a closing angle
bracket (`>`), for lists, code, and thematicBreak.
##### `options.footnotes`
```md
Something something[^or something?].

And something else[^1].

[^1]: This reference footnote contains a paragraph...

    * ...and a list
```
Footnotes mode (default: `false`) enables reference footnotes and inline
footnotes. Both are wrapped in square brackets and preceded by a caret
(`^`), and can be referenced from inside other footnotes.
##### `options.breaks`
```md
This is a
paragraph.
```
Breaks mode (default: `false`) exposes newline characters inside
paragraphs as breaks.
##### `options.pedantic`
```md
Check out some_file_name.txt
```
Pedantic mode (default: `false`) turns on:
* Emphasis (`_alpha_`) and importance (`__bravo__`) with underscores
in words;
* Unordered lists with different markers (`*`, `-`, `+`);
* If `commonmark` is also turned on, ordered lists with different
markers (`.`, `)`);
* And pedantic mode removes fewer spaces in list items (at most four,
instead of the whole indent).
### `parse.Parser`
Access to the [parser][], if you need it.
## Extending the Parser
Most often, using transformers to manipulate a syntax tree produces
the desired output. Sometimes, mainly when introducing new syntactic
entities with a certain level of precedence, interfacing with the parser
is necessary.
If this plug-in is used, it adds a [`Parser`][parser] constructor to
the `processor`. Other plug-ins can add tokenizers to the parser’s
prototype to change how markdown is parsed.
The below plug-in adds a [tokenizer][] for at-mentions.
```js
function mentions(processor) {
  var Parser = processor.Parser;
  var tokenizers = Parser.prototype.inlineTokenizers;
  var methods = Parser.prototype.inlineMethods;

  /* Add an inline tokenizer (defined in the following example). */
  tokenizers.mention = tokenizeMention;

  /* Run it just before `text`. */
  methods.splice(methods.indexOf('text'), 0, 'mention');
}

module.exports = mentions;
```
### `Parser#blockTokenizers`
An object mapping tokenizer names to [tokenizer][]s. These
tokenizers (for example: `fencedCode`, `table`, and `paragraph`) eat
from the start of a value to a line ending.
### `Parser#blockMethods`
Array of `blockTokenizers` names (`string`) specifying the order in
which they run.
### `Parser#inlineTokenizers`
An object mapping tokenizer names to [tokenizer][]s. These tokenizers
(for example: `url`, `reference`, and `emphasis`) eat from the start
of a value. To increase performance, they depend on [locator][]s.
### `Parser#inlineMethods`
Array of `inlineTokenizers` names (`string`) specifying the order in
which they run.
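
Because the order is a plain array of names, changing precedence amounts to ordinary array manipulation. The sketch below uses a shortened, made-up list of names as a stand-in for `inlineMethods` (the real array is longer and its exact contents are not assumed here).

```js
/* Stand-in for `Parser.prototype.inlineMethods`; this shortened list is
 * an assumption for illustration, not the parser's real order. */
var methods = ['escape', 'link', 'emphasis', 'text'];

/* Insert `mention` just before `text`. */
methods.splice(methods.indexOf('text'), 0, 'mention');

console.log(methods);
```

If the name searched for is missing, `indexOf` returns `-1` and `splice` inserts before the last entry, so a guard may be worthwhile when the target name is not guaranteed to exist.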
### `function tokenizer(eat, value, silent)`
```js
function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value);

  if (match) {
    if (silent) {
      return true;
    }

    return eat(match[0])({
      'type': 'link',
      'url': 'https://social-network/' + match[1],
      'children': [{
        'type': 'text',
        'value': match[0]
      }]
    });
  }
}

tokenizeMention.notInLink = true;
tokenizeMention.locator = locateMention;
```
The parser knows two types of tokenizers: block level and inline level.
Block level tokenizers are the same as inline level tokenizers, with
the exception that the latter must have a [locator][].
Tokenizers _test_ whether a document starts with a certain syntactic
entity. In _silent_ mode, they return whether that test passes.
In _normal_ mode, they consume that token, a process which is called
“eating”. Locators enable tokenizers to function faster by providing
information on where the next entity may occur.
###### Signatures
* `Node? = tokenizer(eat, value)`;
* `boolean? = tokenizer(eat, value, silent)`.
###### Parameters
* `eat` ([`Function`][eat]) — Eat, when applicable, an entity;
* `value` (`string`) — Value which may start an entity;
* `silent` (`boolean`, optional) — Whether to detect or consume.
###### Properties
* `locator` ([`Function`][locator])
— Required for inline tokenizers;
* `onlyAtStart` (`boolean`)
— Whether nodes can only be found at the beginning of the document;
* `notInBlock` (`boolean`)
— Whether nodes cannot be in blockquotes, lists, or footnote
definitions;
* `notInList` (`boolean`)
— Whether nodes cannot be in lists;
* `notInLink` (`boolean`)
— Whether nodes cannot be in links.
###### Returns
* In _silent_ mode, whether a node can be found at the start of `value`;
* In _normal_ mode, a node if it can be found at the start of `value`.
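
The two modes can be tried outside the parser with a stub. In the sketch below, `fakeEat` is a hypothetical stand-in for the real `eat` (it tracks nothing and adds no position); the tokenizer itself mirrors the at-mention example.

```js
function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value);

  if (match) {
    if (silent) {
      return true;
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    });
  }
}

/* Hypothetical stub: returns an `add` that hands the node straight back. */
function fakeEat(subvalue) {
  return function add(node) {
    return node;
  };
}

/* Silent mode: only reports whether a mention starts the value. */
var detected = tokenizeMention(fakeEat, '@wooorm says hi', true);

/* Normal mode: eats the mention and yields a node. */
var node = tokenizeMention(fakeEat, '@wooorm says hi');

/* Nothing to find at the start: yields `undefined` in either mode. */
var missed = tokenizeMention(fakeEat, 'no mention here', true);

console.log(detected, node.url, missed);
```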
### `tokenizer.locator(value, fromIndex)`
```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex);
}
```
Locators are required for inline tokenization to keep the process
performant. Locators enable inline tokenizers to function faster by
providing information on where the next entity may occur. Locators
may be wrong: it’s OK if there actually isn’t a node to be found at
the index they return, but they must not skip any nodes.
###### Parameters
* `value` (`string`) — Value which may contain an entity;
* `fromIndex` (`number`) — Position to start searching at.
###### Returns
Index at which an entity may start, and `-1` otherwise.
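
A short sketch of what the at-mention locator returns for a few inputs. The sample strings are made up; the last case shows a position the locator reports even though the tokenizer's pattern would not match there, which is acceptable.

```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex);
}

var value = 'hi @a and @b';

/* First candidate, then the next one when searching past it. */
var first = locateMention(value, 0);
var second = locateMention(value, first + 1);

/* No `@` at all: `-1`. */
var none = locateMention('no mentions', 0);

/* A candidate that is not a mention: the locator still reports it,
 * and the tokenizer's pattern rejects it afterwards. */
var candidate = locateMention('a @ b', 0);
var matches = /^@(\w+)/.test('a @ b'.slice(candidate));

console.log(first, second, none, candidate, matches);
```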
### `eat(subvalue)`
```js
var add = eat('foo');
```
Eat `subvalue`, which is a string at the start of the
[tokenize][tokenizer]d `value` (it’s tracked to ensure the correct
value is eaten).
###### Parameters
* `subvalue` (`string`) - Value to eat.
###### Returns
[`add`][add].
### `add(node[, parent])`
```js
var add = eat('foo');
add({type: 'text', value: 'foo'});
```
Add [positional information][location] to `node` and add it to `parent`.
###### Parameters
* `node` ([`Node`][node]) - Node to patch position on and insert;
* `parent` ([`Node`][node], optional) - Place to add `node` to in
the syntax tree. Defaults to the currently processed node.
###### Returns
The given `node`.
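
As a rough mental model of how `eat` and `add` cooperate, consider the simplified sketch below. It is an assumption-laden illustration, not remark-parse's implementation: `createEater` is a hypothetical helper, positions carry only an `offset` (no line or column), and a mismatch simply throws.

```js
/* Simplified model of `eat` and `add`; NOT remark-parse's actual code. */
function createEater(value, root) {
  var offset = 0;

  return function eat(subvalue) {
    var start = offset;
    var end = offset + subvalue.length;

    /* Track the value, so only what comes next can be eaten. */
    if (value.slice(start, end) !== subvalue) {
      throw new Error('Incorrectly eaten value: `' + subvalue + '`');
    }

    offset = end;

    return function add(node, parent) {
      /* Patch (simplified) positional information and insert the node. */
      node.position = {start: {offset: start}, end: {offset: end}};
      (parent || root).children.push(node);

      return node;
    };
  };
}

var root = {type: 'root', children: []};
var eat = createEater('foo bar', root);
var node = eat('foo')({type: 'text', value: 'foo'});

console.log(node.position, root.children.length);
```

In this model, eating `'bar'` next would throw, because the remaining value starts with a space.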
### `add.test()`
Get the [positional information][location] which would be patched on
`node` by `add`.
###### Returns
[`Location`][location].
### `add.reset(node[, parent])`
`add`, but resets the internal location. Useful for example in
lists, where the same content is first eaten for a list, and later
for list items.
###### Parameters
* `node` ([`Node`][node]) - Node to patch position on and insert;
* `parent` ([`Node`][node], optional) - Place to add `node` to in
the syntax tree. Defaults to the currently processed node.
###### Returns
The given `node`.
## License
[MIT][license] © [Titus Wormer][author]
<!-- Definitions -->
[build-badge]: https://img.shields.io/travis/wooorm/remark.svg
[build-status]: https://travis-ci.org/wooorm/remark
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/remark.svg
[coverage-status]: https://codecov.io/github/wooorm/remark
[chat-badge]: https://img.shields.io/gitter/room/wooorm/remark.svg
[chat]: https://gitter.im/wooorm/remark
[license]: https://github.com/wooorm/remark/blob/master/LICENSE
[author]: http://wooorm.com
[npm]: https://docs.npmjs.com/cli/install
[unified]: https://github.com/wooorm/unified
[parse]: https://github.com/wooorm/unified#processorparsefilevalue-options
[process]: https://github.com/wooorm/unified#processorprocessfilevalue-options-done
[pipe]: https://github.com/wooorm/unified#processorpipestream-options
[processor]: https://github.com/wooorm/remark/blob/master/packages/remark
[mdast]: https://github.com/wooorm/mdast
[escapes]: http://spec.commonmark.org/0.25/#backslash-escapes
[node]: https://github.com/wooorm/unist#node
[location]: https://github.com/wooorm/unist#location
[options-gfm]: #optionsgfm
[options-yaml]: #optionsyaml
[options-commonmark]: #optionscommonmark
[options-footnotes]: #optionsfootnotes
[options-pedantic]: #optionspedantic
[options-breaks]: #optionsbreaks
[parser]: https://github.com/wooorm/unified#processorparser
[extend]: #extending-the-parser
[tokenizer]: #function-tokenizereat-value-silent
[locator]: #tokenizerlocatorvalue-fromindex
[eat]: #eatsubvalue
[add]: #addnode-parent