介绍
logstash拥有丰富的filter插件,它们扩展了进入过滤器的原始数据,进行复杂的逻辑处理,甚至可以无中生有的添加新的 logstash 事件到后续的流程中去!Grok 是 Logstash 最重要的插件之一。也是迄今为止使蹩脚的、无结构的日志结构化和可查询的最好方式。Grok在解析 syslog logs、apache and other webserver logs、mysql logs等任意格式的文件上表现完美。
- 解析任意文本并对其进行结构化。
- Grok 是将非结构化日志数据解析为结构化和可查询的好方法。
- 该工具非常适合 syslog 日志、apache 和其他网络服务器日志、mysql 日志,以及通常为人类而非计算机消耗而编写的任何日志格式。
官网地址:
https://www.elastic.co/guide/en/logstash/7.17/plugins-filters-grok.html#_getting_help_116
grok_10">使用grok前注意
grok 模式是正则表达式,因此这个插件的性能受到正则表达式引擎严重影响。尽管知道 grok 模式与日志条目可以多快匹配非常重要,但是了解它在什么时候匹配失败也很重要。匹配成功和匹配失败的性能可能会差异很大。
原文地址:https://www.elastic.co/blog/do-you-grok-grok
翻译地址:https://segmentfault.com/a/1190000013051254
Logstash 常用正则
grokpatterns_17">grok-patterns
Github链接:https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/ecs-v1/grok-patterns
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
EMAILLOCALPART [a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]{1,64}(?:\.[a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]{1,62}){0,63}
EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\bPOSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# URN, allowing use of RFC 2141 section 2.3 reserved characters
URN urn:[0-9A-Za-z][0-9A-Za-z-]{0,31}:(?:%[0-9a-fA-F]{2}|[0-9A-Za-z()+,.:=@;$_!*'/?#-])+# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST (?:%{IP}|%{HOSTNAME})
HOSTPORT %{IPORHOST}:%{POSINT}# paths (only absolute paths are matched)
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (/[[[:alnum:]]_%!$@:.,+~-]*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]([A-Za-z0-9+\-.]+)+
URIHOST %{IPORHOST}(?::%{POSINT})?
# uripath comes loosely from RFC1738, but mostly from what Firefox doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+
URIQUERY [A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
# deprecated (kept due compatibility):
URIPARAM \?%{URIQUERY}
URIPATHPARAM %{URIPATH}(?:\?%{URIQUERY})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATH}(?:\?%{URIQUERY})?)?# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND %{SECOND}
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[APMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG [\x21-\x5a\x5c\x5e-\x7e]+
SYSLOGPROG %{PROG:[process][name]}(?:\[%{POSINT:[process][pid]:int}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:[log][syslog][facility][code]:int}.%{NONNEGINT:[log][syslog][priority]:int}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}# Shortcuts
QS %{QUOTEDSTRING}# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:[host][hostname]} %{SYSLOGPROG}:# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?|INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
Java
Github链接:https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/ecs-v1/java
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
EMAILLOCALPART [a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]{1,64}(?:\.[a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]{1,62}){0,63}
EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\bPOSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# URN, allowing use of RFC 2141 section 2.3 reserved characters
URN urn:[0-9A-Za-z][0-9A-Za-z-]{0,31}:(?:%[0-9a-fA-F]{2}|[0-9A-Za-z()+,.:=@;$_!*'/?#-])+# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST (?:%{IP}|%{HOSTNAME})
HOSTPORT %{IPORHOST}:%{POSINT}# paths (only absolute paths are matched)
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (/[[[:alnum:]]_%!$@:.,+~-]*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]([A-Za-z0-9+\-.]+)+
URIHOST %{IPORHOST}(?::%{POSINT})?
# uripath comes loosely from RFC1738, but mostly from what Firefox doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+
URIQUERY [A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
# deprecated (kept due compatibility):
URIPARAM \?%{URIQUERY}
URIPATHPARAM %{URIPATH}(?:\?%{URIQUERY})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATH}(?:\?%{URIQUERY})?)?# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND %{SECOND}
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[APMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG [\x21-\x5a\x5c\x5e-\x7e]+
SYSLOGPROG %{PROG:[process][name]}(?:\[%{POSINT:[process][pid]:int}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:[log][syslog][facility][code]:int}.%{NONNEGINT:[log][syslog][priority]:int}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}# Shortcuts
QS %{QUOTEDSTRING}# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:[host][hostname]} %{SYSLOGPROG}:# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?|INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
Httpd
Github链接:https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/ecs-v1/httpd
JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*
#Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
JAVAFILE (?:[a-zA-Z$_0-9. -]+)
#Allow special <init>, <clinit> methods
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
#Line number is optional in special cases 'Native method' or 'Unknown source'
JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:[java][log][origin][class][name]}\.%{JAVAMETHOD:[log][origin][function]}\(%{JAVAFILE:[log][origin][file][name]}(?::%{INT:[log][origin][file][line]:int})?\)
# Java Logs
JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
JAVALOGMESSAGE (?:.*)# MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM
# matches default logging configuration in Tomcat 4.1, 5.0, 5.5, 6.0, 7.0
CATALINA7_DATESTAMP %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} (?:AM|PM)
CATALINA7_LOG %{CATALINA7_DATESTAMP:timestamp} %{JAVACLASS:[java][log][origin][class][name]}(?: %{JAVAMETHOD:[log][origin][function]})?\s*(?:%{LOGLEVEL:[log][level]}:)? %{JAVALOGMESSAGE:message}# 31-Jul-2020 16:40:38.578 in Tomcat 8.5/9.0
CATALINA8_DATESTAMP %{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND}
CATALINA8_LOG %{CATALINA8_DATESTAMP:timestamp} %{LOGLEVEL:[log][level]} \[%{DATA:[java][log][origin][thread][name]}\] %{JAVACLASS:[java][log][origin][class][name]}\.(?:%{JAVAMETHOD:[log][origin][function]})? %{JAVALOGMESSAGE:message}CATALINA_DATESTAMP (?:%{CATALINA8_DATESTAMP})|(?:%{CATALINA7_DATESTAMP})
CATALINALOG (?:%{CATALINA8_LOG})|(?:%{CATALINA7_LOG})# in Tomcat 5.5, 6.0, 7.0 it is the same as catalina.out logging format
TOMCAT7_LOG %{CATALINA7_LOG}
TOMCAT8_LOG %{CATALINA8_LOG}# NOTE: a weird log we started with - not sure what TC version this should match out of the box (due the | delimiters)
TOMCATLEGACY_DATESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}(?: %{ISO8601_TIMEZONE})?
TOMCATLEGACY_LOG %{TOMCATLEGACY_DATESTAMP:timestamp} \| %{LOGLEVEL:[log][level]} \| %{JAVACLASS:[java][log][origin][class][name]} - %{JAVALOGMESSAGE:message}TOMCAT_DATESTAMP (?:%{CATALINA8_DATESTAMP})|(?:%{CATALINA7_DATESTAMP})|(?:%{TOMCATLEGACY_DATESTAMP})TOMCATLOG (?:%{TOMCAT8_LOG})|(?:%{TOMCAT7_LOG})|(?:%{TOMCATLEGACY_LOG})
默认情况下,Logstash 附带大约 120 种模式。你可以在这里找到它们: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns 。
如果您需要帮助构建模式以匹配您的日志,您会发现以下应用程序非常有用!
实例
- Java实例一:
2022-05-31 19:34:29.271 INFO [user-center,,,] 20731 --- [com.alibaba.nacos.client.naming.updater] com.alibaba.nacos.client.naming : current ips:(1) service: DEFAULT_GROUP@@ai-files-server -> [{"clusterName":"DEFAULT","enabled":true,"ephemeral":true,"healthy":true,"instanceId":"172.28.2.54#8098#DEFAULT#DEFAULT_GROUP@@ai-files-server","ip":"172.28.2.54","metadata":{"preserved.register.source":"SPRING_CLOUD"},"port":8098,"serviceName":"DEFAULT_GROUP@@ai-files-server","weight":1.0}]
2022-05-31 19:34:30.081 INFO [user-center,,,] 20731 --- [com.alibaba.nacos.client.naming.updater] com.alibaba.nacos.client.naming : current ips:(1) service: DEFAULT_GROUP@@gateway-zuul -> [{"clusterName":"DEFAULT","enabled":true,"ephemeral":true,"healthy":true,"instanceId":"172.28.2.54#8080#DEFAULT#DEFAULT_GROUP@@gateway-zuul","ip":"172.28.2.54","metadata":{"preserved.register.source":"SPRING_CLOUD"},"port":8080,"serviceName":"DEFAULT_GROUP@@gateway-zuul","weight":1.0}]
表达式:
%{TIMESTAMP_ISO8601:date} %{SPACE}%{LOGLEVEL:level} %{SPACE}\[%{NOTSPACE:service},*] %{SPACE}%{WORD:thread} %{NOTSPACE} %{SPACE}\[%{JAVACLASS:logger}] %{SPACE}%{JAVACLASS:loginfo} %{SPACE}%{GREEDYDATA:msg}
- Java实例二:
2022-06-07 15:04:20.648 INFO 12358 --- [asyncExecutor_1654076742199] com.cloud.generated.cmd.VideoProcessor : 执行时长:7126, currentIndex:0
2022-06-07 15:04:20.648 INFO 12358 --- [asyncExecutor_1654076742199] c.c.g.s.impl.BatchVideoServiceImpl : 批量视频,任务名称:3b3254b9aa834b8e895c5a8bfa91a0e0.mp4 生成耗时:7149
表达式:
%{TIMESTAMP_ISO8601:date} %{SPACE}%{LOGLEVEL:level} %{SPACE}%{WORD:thread} %{NOTSPACE}* %{SPACE}\[%{JAVACLASS:logger}] %{SPACE}%{NOTSPACE:loginfo} %{SPACE}:%{SPACE}%{GREEDYDATA:msg}
- Nginx实例一:
83.149.9.216 - - [17/May/2015:10:05:50 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard.png HTTP/1.1" 200 321631 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
表达式:
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}
grok_290">grok基础
Grok 通过将文本模式组合成与您的日志匹配的内容来工作。
grok模式的语法如下:
%{SYNTAX:SEMANTIC}
- SYNTAX :匹配文本的模式名称。例如,3.44将被NUMBER模式匹配,55.3.244.1将被IP模式匹配。
- SEMANTIC :代表存储该值的一个标识符(变量名称)。例如,3.44可能是事件的持续时间,因此您可以简单地称之为duration。此外,字符串55.3.244.1可能是请求的client 客户端。
对于上面的例子,你的 grok 过滤器看起来像这样:
%{NUMBER:duration} %{IP:client}
或者,您可以将数据类型转换添加到您的 Grok 模式。默认情况下,所有语义都保存为字符串。如果您希望转换语义的数据类型,例如将字符串更改为整数,然后使用目标数据类型作为后缀。例如%{NUMBER:num:int},它将num语义从字符串转换为整数。目前唯一支持的转换是int和float。
示例:
有了语法和语义的想法,我们可以从示例日志中提取有用的字段,例如这个虚构的 http 请求日志:
55.3.244.1 GET /index.html 15824 0.043
可以使用如下grok pattern来匹配这种记录:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
grok语法调试工具
logstash.conf添加过滤器配置
input {file {path => "/var/log/http.log"}}filter {grok {match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }}}
在 grok 过滤器之后,事件中将包含的字段:
- client: 55.3.244.1
- method: GET
- request: /index.html
- bytes: 15824
- duration: 0.043
正则表达式
Grok位于正则表达式之上,所以任何正则表达式在grok中都是有效的。正则表达式库是Oniguruma,您可以在Oniguruma网站上看到完整支持的regexp语法。
- 中文学习正则表达式网站: https://www.runoob.com/regexp/regexp-tutorial.html
自定义模式
有时logstash grok没办法提供你所需要的匹配类型。为此,您有几个选择。
Oniguruma 语法
首先,您可以使用 Oniguruma 语法进行命名捕获,它可以让您匹配一段文本并将其保存为字段:
(?<field_name>the pattern here)
例如,日志有一个 queue id 为一个长度为10或11个字符的十六进制值。使用下列语法可以获取该片段,并把值赋予queue_id
(?<queue_id>[0-9A-F]{10,11})
自定义 patterns 文件
或者,您可以创建自定义模式文件。
①创建一个名为patterns其中创建一个文件postfix (文件名无关紧要,随便起),在该文件中,将您需要的模式写为模式名称、一个空格,然后是该模式的正则表达式。例如:
# contents of ./patterns/postfix:POSTFIX_QUEUEID [0-9A-F]{10,11}
②然后使用这个插件中的patterns_dir设置告诉logstash目录是你的自定义模式。这是一个完整的示例的示例日志:
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>
filter {grok {patterns_dir => ["./patterns"]match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }}}
上述内容将匹配并生成以下字段:
- timestamp: Jan 1 06:25:43
- logsource: mailserver14
- program: postfix/cleanup
- pid: 21403
- queue_id: BEF25A72965
- syslog_message: message-id=20130101142543.5828399CCAF@mailserver14.example.com
timestamp, logsource, program 和 pid 字段来自 SYSLOGBASE 模式,该模式本身由其他模式定义。
另一种选择是使用pattern_definitions在筛选器中 patterns inline。这主要是为了方便,并允许用户定义一个可以在该过滤器中使用的模式。pattern_definitions中新定义的模式在该特定 grok 滤波器之外将不可用。
Grok 过滤器配置选项
该插件支持以下配置选项以及稍后描述的通用选项。
break_on_match
- 值类型是布尔值
- 默认值为true
- 描述:match可以一次设定多组,预设会依照顺序设定处理,如果日志满足设定条件,则会终止向下处理。但有的时候我们会希望让Logstash跑完所有的设定,这时可以将break_on_match设为false。
keep_empty_captures
- 值类型是布尔值
- 默认值是 false
- 描述:如果为true,捕获失败的字段将设置为空值
match
- 值类型是数组
- 默认值是 {}
- 描述:字段⇒值匹配 ,定义了查找位置以及使用哪些模式的映射
例如,以下将匹配 message 给定模式的字段中的现有值,如果找到匹配项,则将该字段添加 duration 到具有捕获值的事件中:
filter { grok { match => { "message" => "Duration: %{NUMBER:duration}" } } }
如果你需要针对单个字段匹配多个模式,则该值可以是一组,例如:
filter {grok {match => {"message" => ["Duration: %{NUMBER:duration}","Speed: %{NUMBER:speed}"]}}}
named_captures_only
- 值类型是布尔值
- 默认值是 true
- 描述:If true, only store named captures from grok.
overwrite
- 值类型是 array
- 默认值是 []
- 描述:覆盖字段内容,这允许您覆盖已存在的字段中的值。
例如,如果字段中有一个 syslog 行,则可以使用匹配项的一部分覆盖消息字段,如下所示:
filter {grok {match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }overwrite => [ "message" ]}}
- 在这种情况下,类似 May 29 16:37:11 sadness logger: hello world 这样的行将被解析,经过match属性处理后,hello world 将覆盖原始消息,message的值变成了hello world。这时如果使用了overwrite => [ “message” ]属性,那么原来的message的值将被覆盖成新值。
如果在 overwrite 中使用字段引用,则必须在 pattern 中使用字段引用。例子:
filter {grok {match => { "somefield" => "%{NUMBER} %{GREEDYDATA:[nested][field][test]}" }overwrite => [ "[nested][field][test]" ]}}
pattern_definitions
- 值类型是 hash
- 默认值是 {}
- 描述:模式名称和模式正则表达式,也是用于定义当前过滤器要使用的自定义模式。与现有名称匹配的模式将覆盖预先存在的定义。可以将其视为可用于 grok 定义的内联模式。patterns_dir是将模式写在外部。
例如:
filter {grok {patterns_dir => "/usr/local/elk/logstash/patterns"pattern_definitions => {"MYSELFTIMESTAMP" => "20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})"}match => {"message" => ["%{MYSELFTIMESTAMP:timestamp} %{JAVACLASS:message}","%{MYSELF:content}"]}}
}
patterns_dir
- 值类型是数组
- 默认值是 []
- 描述:默认情况下,Logstash 附带一堆patterns,因此,除非您要添加其他patterns,否则您不一定需要自己定义它。一些复杂的正则表达式,不适合直接写到filter中,可以指定一个文件夹,用来专门保存正则表达式的文件,请注意,Grok 将读取目录中与 patterns_files_glob 匹配的所有文件,并假定它是一个模式文件(包括任何 tilde 备份文件)。
patterns_dir => ["/opt/logstash/patterns", "/opt/logstash/extra_patterns"]
正则文件是纯文本,格式为:
#空格前是正则表达式的名称,空格后是具体的正则表达式
NAME PATTERN
例如:这是一个数字的表达式
NUMBER \d+
创建管道时加载patterns
patterns_files_glob
- 属性值的类型:string
- 默认值:“*”
- 描述:针对patterns_dir属性中指定的文件夹里哪些正则文件,可以在这个filter中生效,需要本属性来指定。默认值“*”是指所有正则文件都生效。
tag_on_failure
- 值类型是数组
- 默认值是 [“_grokparsefailure”]
- 描述:在没有成功匹配时,将值追加到标记字段
tag_on_timeout
- 值类型是字符串
- 默认值是 “_groktimeout”
- 描述:如果Grok正则表达式超时,则应用标记。
timeout_millis
- 值类型是数字
- 默认值是 30000
- 描述: 尝试在这段时间后终止正则表达式。如果应用了多个模式,则适用于每个模式,这将永远不会提前超时,但可能需要更长的时间才能超时。实际的超时时间是基于250ms量化的近似值。设置为0以禁用超时。
timeout_scope
- 值类型是字符串
- 默认值为 “pattern”
- 支持的值是 “pattern"和"event”
- 描述: 当提供多种模式以匹配时,超时一直应用于每个模式,从而为尝试的每个模式产生开销;当 grok 过滤器配置为 timeout_scope => event 时,插件会在事件的所有尝试匹配中强制执行一次超时,因此它可以实现类似的保护,以显着减少开销。
通常最好确定整个事件的超时范围。
常用选项
所有过滤器插件都支持以下配置选项
add_field
- 值类型是hash
- 默认值是 {}
- 描述:如果此筛选器成功,向此事件添加任何任意字段。可以通过%{field}动态命名field名或field的值。例如:
filter {grok {add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }}}
#你也可以一次添加多个字段
filter {grok {add_field => {"foo_%{somefield}" => "Hello world, from %{host}""new_field" => "new_static_value"}}}
如果事件具有字段"somefield" == “hello”,筛选成功时,将添加字段 foo_hello(如果存在),其中包含上面的值,并且 %{host} 部分将替换为事件中的该值。第二个示例还将添加一个硬编码字段。
add_tag
- 值类型是数组
- 默认值是 []
- 描述:如果此过滤器成功,请向该事件添加任意标签。标签可以是动态的,并使用%{field} 语法包含事件的一部分。
例如:
filter {grok {add_tag => [ "foo_%{somefield}" ]}}
#你也可以一次添加多个标签
filter {grok {add_tag => [ "foo_%{somefield}", "taggedy_tag"]}}
enable_metric
- 值类型是 boolean
- 默认值是 true
- 描述:禁用或启用度量标准。默认情况下,我们会记录所有可以记录的指标,但您可以禁用特定插件的指标收集。
id
- 值类型是字符串
- 此设置没有默认值。
- 描述:向插件实例添加唯一ID,此ID用于跟踪插件特定配置的信息,如果未指定 ID,Logstash 将生成一个 ID。强烈建议在配置中设置此 ID。当您有两个或多个相同类型的插件时,例如,如果您有2个grok过滤器,这特别有用。在这种情况下,添加命名 ID 将有助于在使用监视 API 时监视 Logstash。
例如:
filter {grok {id => "ABC"}}
id 字段中的变量替换仅支持环境变量,不支持使用secret机密存储中的值。
periodic_flush
- 值类型是布尔值
- 默认值是 false
- 描述:如果设置为ture,会定时的调用filter的更新函数(flush method)。自选。
remove_field
- 值的类型:array
- 默认值:[]
- 描述:如果此筛选器成功,请从此事件中删除任意字段。字段名称可以是动态的,并使用 %{field} 包含事件的一部分 示例:
filter {grok {remove_field => [ "foo_%{somefield}" ]}}
#你也可以一次移除多个字段:
filter {grok {remove_field => [ "foo_%{somefield}", "my_extraneous_field" ]}}
remove_tag
- 值类型是数组
- 默认值是 []
- 描述:如果此过滤器成功,请从该事件中移除任意标签。标签可以是动态的,并使用%{field} 语法包括事件的一部分。
例如:
filter {grok {remove_tag => [ "foo_%{somefield}" ]}}
#你也可以一次删除多个标签
filter {grok {remove_tag => [ "foo_%{somefield}", "sad_unwanted_tag"]}}
如果事件具有字段"somefield" == “hello”,则此筛选器在成功时将删除标记 foo_hello(如果存在)。第二个示例也会删除一个悲伤的、不需要的标签。