Editing Module:Citation/CS1/Identifiers
Latest revision | Your text | ||
Line 164: | Line 164: | ||
--[=[-------------------------< I S _ V A L I D _ R X I V _ D A T E > | --[=[-------------------------< I S _ V A L I D _ B I O R X I V _ D A T E >------------------------------------ | ||
returns true if: | |||
2019-12-11T00:00Z <= biorxiv_date < today + 2 days | 2019-12-11T00:00Z <= biorxiv_date < today + 2 days | ||
The dated form of biorxiv identifier has a start date of 2019-12-11. The Unix timestamp for that date is {{#time:U|2019-12-11}} = 1576022400 | The dated form of biorxiv identifier has a start date of 2019-12-11. The Unix timestamp for that date is {{#time:U|2019-12-11}} = 1576022400 | ||
biorxiv_date is the date provided in those |biorxiv= parameter values that are dated at time 00:00:00 UTC | |||
today is the current date at time 00:00:00 UTC plus 48 hours | |||
if today | if today is 2015-01-01T00:00:00 then | ||
adding 24 hours gives | adding 24 hours gives 2015-01-02T00:00:00 – one second more than today | ||
adding another 24 hours gives | adding another 24 hours gives 2015-01-03T00:00:00 – one second more than tomorrow | ||
This function does not work if it is fed month names for languages other than English. The MediaWiki #time: parser | |||
function apparently does not understand non-English month names, so this function always returns false when the date | |||
contains a non-English month name, because good1 is false after the call to lang_object.formatDate(). To work | |||
around that, call this function with date parts and build a YYYY-MM-DD format date from them. | |||
]=] | ]=] | ||
local function | local function is_valid_biorxiv_date (y, m, d) | ||
local biorxiv_date = table.concat ({y, m, d}, '-'); -- make ymd date | |||
local | |||
local good1, good2; | local good1, good2; | ||
local | local biorxiv_ts, tomorrow_ts; -- to hold Unix timestamps representing the dates | ||
local lang_object = mw.getContentLanguage(); | local lang_object = mw.getContentLanguage(); | ||
good1, | good1, biorxiv_ts = pcall (lang_object.formatDate, lang_object, 'U', biorxiv_date); -- convert biorxiv_date value to Unix timestamp | ||
good2, tomorrow_ts = pcall (lang_object.formatDate, lang_object, 'U', 'today + 2 days' ); -- today midnight + 2 days is one second more than all day tomorrow | good2, tomorrow_ts = pcall (lang_object.formatDate, lang_object, 'U', 'today + 2 days' ); -- today midnight + 2 days is one second more than all day tomorrow | ||
if good1 and good2 then -- lang.formatDate() returns a timestamp in the local script which tonumber() may not understand | if good1 and good2 then -- lang.formatDate() returns a timestamp in the local script which tonumber() may not understand | ||
biorxiv_ts = tonumber (biorxiv_ts) or lang_object:parseFormattedNumber (biorxiv_ts); -- convert to numbers for the comparison; | |||
tomorrow_ts = tonumber (tomorrow_ts) or lang_object:parseFormattedNumber (tomorrow_ts); | tomorrow_ts = tonumber (tomorrow_ts) or lang_object:parseFormattedNumber (tomorrow_ts); | ||
else | else | ||
Line 209: | Line 200: | ||
end | end | ||
return ((1576022400 <= biorxiv_ts) and (biorxiv_ts < tomorrow_ts)) -- 2019-12-11T00:00Z <= biorxiv_date < today + 2 days | |||
end | end | ||
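The window test above can be tried outside MediaWiki. The following is a minimal sketch in plain Lua, assuming the caller supplies `now_ts` (Unix seconds) in place of the `lang_object.formatDate` calls; `days_from_civil` and `in_biorxiv_window` are hypothetical names, not part of the module. Like the original, it does not reject impossible dates such as February 31.

```lua
-- Sketch only: plain-Lua stand-in for the date window check.
local BIORXIV_START_TS = 1576022400   -- 2019-12-11T00:00:00Z, as stated in the module comment
local SECONDS_PER_DAY = 86400

-- Days since 1970-01-01 for a proleptic Gregorian date (UTC).
local function days_from_civil(y, m, d)
	if m <= 2 then y = y - 1 end                 -- treat Jan/Feb as months of the previous year
	local era = math.floor(y / 400)
	local yoe = y - era * 400                    -- year of era, 0..399
	local mp = (m + 9) % 12                      -- March = 0 ... February = 11
	local doy = math.floor((153 * mp + 2) / 5) + d - 1
	local doe = yoe * 365 + math.floor(yoe / 4) - math.floor(yoe / 100) + doy
	return era * 146097 + doe - 719468
end

-- True when start <= date < (midnight of the day containing now_ts) + 2 days.
local function in_biorxiv_window(y, m, d, now_ts)
	y, m, d = tonumber(y), tonumber(m), tonumber(d)
	if not (y and m and d) then return false end
	local date_ts = days_from_civil(y, m, d) * SECONDS_PER_DAY
	local today_midnight = math.floor(now_ts / SECONDS_PER_DAY) * SECONDS_PER_DAY
	local limit_ts = today_midnight + 2 * SECONDS_PER_DAY
	return BIORXIV_START_TS <= date_ts and date_ts < limit_ts
end
```

Passing the clock in as an argument keeps the check deterministic, which the module itself cannot do because it asks the wiki for 'today + 2 days'.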
Line 261: | Line 250: | ||
--[[--------------------------< N O R M A L I Z E _ L C C N >-------------------------------------------------- | --[[--------------------------< N O R M A L I Z E _ L C C N >-------------------------------------------------- | ||
LCCN normalization ( | LCCN normalization (http://www.loc.gov/marc/lccn-namespace.html#normalization) | ||
1. Remove all blanks. | 1. Remove all blanks. | ||
2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash. | 2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash. | ||
Line 298: | Line 287: | ||
--[[--------------------------< A R X I V >-------------------------------------------------------------------- | --[[--------------------------< A R X I V >-------------------------------------------------------------------- | ||
See: | See: http://arxiv.org/help/arxiv_identifier | ||
format and error check arXiv identifier. There are three valid forms of the identifier: | format and error check arXiv identifier. There are three valid forms of the identifier: | ||
Line 378: | Line 367: | ||
if is_set (class) then | if is_set (class) then | ||
if id:match ('^%d+') then | if id:match ('^%d+') then | ||
text = table.concat ({text, ' [[ | text = table.concat ({text, ' [[//arxiv.org/archive/', class, ' ', class, ']]'}); -- external link within square brackets, not wikilink | ||
else | else | ||
set_message ('err_class_ignored'); | set_message ('err_class_ignored'); | ||
Line 392: | Line 381: | ||
Validates (sort of) and formats a bibcode ID. | Validates (sort of) and formats a bibcode ID. | ||
Format for bibcodes is specified here: | Format for bibcodes is specified here: http://adsabs.harvard.edu/abs_doc/help_pages/data.html#bibcodes | ||
But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters | But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters | ||
Line 410: | Line 399: | ||
local access = options.access; | local access = options.access; | ||
local handler = options.handler; | local handler = options.handler; | ||
local err_type; | local err_type; | ||
local err_msg = ''; | local err_msg = ''; | ||
Line 433: | Line 421: | ||
if id:find('&%.') then | if id:find('&%.') then | ||
err_type = cfg.err_msg_supl.journal; -- journal abbreviation must not have '&.' (if it does it's missing a letter) | err_type = cfg.err_msg_supl.journal; -- journal abbreviation must not have '&.' (if it does it's missing a letter) | ||
end | end | ||
end | end | ||
end | end | ||
if is_set (err_type) | if is_set (err_type) then -- if there was an error detected | ||
set_message ('err_bad_bibcode', {err_type}); | set_message ('err_bad_bibcode', {err_type}); | ||
options.coins_list_t['BIBCODE'] = nil; -- when error, unset so not included in COinS | options.coins_list_t['BIBCODE'] = nil; -- when error, unset so not included in COinS | ||
end | end | ||
Line 470: | Line 456: | ||
local patterns = { | local patterns = { | ||
'^10 | '^10.1101/%d%d%d%d%d%d$', -- simple 6-digit identifier (before 2019-12-11) | ||
'^10 | '^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%dv%d+$', -- y.m.d. date + 6-digit identifier + version (after 2019-12-11) | ||
'^10 | '^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%d$', -- y.m.d. date + 6-digit identifier (after 2019-12-11) | ||
} | } | ||
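As a rough illustration of how these three patterns behave, here is a minimal sketch in plain Lua. The patterns are copied from the right-hand column as shown (note that the '.' after '10' is unescaped there, so it matches any character); `match_biorxiv` is a hypothetical helper, not the module's own function.

```lua
-- Sketch only: try each pattern in order and report any captured date parts.
local patterns = {
	'^10.1101/%d%d%d%d%d%d$',                                          -- simple 6-digit identifier
	'^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%dv%d+$',    -- dated identifier with version
	'^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%d$',        -- dated identifier
}

local function match_biorxiv(id)
	for _, pattern in ipairs(patterns) do
		local y, m, d = id:match(pattern)
		if y then                          -- some pattern matched
			if m then
				return true, y, m, d       -- dated form: three captures
			end
			return true                    -- six-digit form: no captures, so m is nil
		end
	end
	return false
end
```

For the first pattern, which has no captures, `string.match` returns the whole match as its only result, which is why `m` is nil for the six-digit form.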
Line 480: | Line 466: | ||
if m then -- m is nil when id is the six-digit form | if m then -- m is nil when id is the six-digit form | ||
if not | if not is_valid_biorxiv_date (y, m, d) then -- validate the encoded date; TODO: don't ignore leap-year and actual month lengths ({{#time:}} is a poor date validator) | ||
break; -- date fail; break out early so we don't unset the error message | break; -- date fail; break out early so we don't unset the error message | ||
end | end | ||
Line 541: | Line 527: | ||
and terminal punctuation may not be technically correct but it appears that, in practice, these characters are rarely | and terminal punctuation may not be technically correct but it appears that, in practice, these characters are rarely | ||
if ever used in DOI names. | if ever used in DOI names. | ||
]] | ]] | ||
Line 557: | Line 540: | ||
local text; | local text; | ||
if is_set (inactive) then | if is_set (inactive) then | ||
local inactive_year = inactive:match("%d%d%d%d"); | local inactive_year = inactive:match("%d%d%d%d") or ''; -- try to get the year portion from the inactive date | ||
local inactive_month, good; | local inactive_month, good; | ||
Line 568: | Line 551: | ||
end | end | ||
end | end | ||
else | |||
inactive_year = nil; -- |doi-broken-date= has something but it isn't a date | |||
end | |||
if is_set (inactive_year) and is_set (inactive_month) then | if is_set (inactive_year) and is_set (inactive_month) then | ||
Line 583: | Line 568: | ||
local registrant_err_patterns = { -- these patterns are for code ranges that are not supported | local registrant_err_patterns = { -- these patterns are for code ranges that are not supported | ||
'^[^1-3]%d%d%d%d%.%d | '^[^1-3]%d%d%d%d%.%d%d*$', -- 5 digits with subcode (0xxxx, 40000+); accepts: 10000–39999 | ||
'^[^1- | '^[^1-5]%d%d%d%d$', -- 5 digits without subcode (0xxxx, 60000+); accepts: 10000–59999 | ||
'^[^1-9]%d%d%d%.%d | '^[^1-9]%d%d%d%.%d%d*$', -- 4 digits with subcode (0xxx); accepts: 1000–9999 | ||
'^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999 | '^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999 | ||
'^%d%d%d%d%d%d+', -- 6 or more digits | '^%d%d%d%d%d%d+', -- 6 or more digits | ||
'^%d%d?%d?$', -- less than 4 digits without subcode ( | '^%d%d?%d?$', -- less than 4 digits without subcode (with subcode is legitimate) | ||
'^5555$', -- test registrant will never resolve | '^5555$', -- test registrant will never resolve | ||
'[^%d%.]', -- any character that isn't a digit or a dot | '[^%d%.]', -- any character that isn't a digit or a dot | ||
Line 637: | Line 621: | ||
if ever used in HDLs. | if ever used in HDLs. | ||
Query string parameters are named here: | Query string parameters are named here: http://www.handle.net/proxy_servlet.html. Query strings are not displayed | ||
but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we | but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we | ||
have to detect the query string so that it isn't URL-encoded with the rest of the identifier. | have to detect the query string so that it isn't URL-encoded with the rest of the identifier. | ||
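The detection described above might look something like the following plain-Lua sketch. It assumes only the two parameter names visible in this diff ('noredirect' and 'ignore_aliases'); `split_hdl` is a hypothetical helper, and the module's actual splitting logic is not shown here.

```lua
-- Sketch only: separate a trailing query string from an HDL so that
-- only the identifier part would be URL-encoded.
local query_params = { 'noredirect', 'ignore_aliases' }   -- the two names visible in the diff

local function split_hdl(id)
	for _, param in ipairs(query_params) do
		-- '?' followed by a known parameter name marks the start of the query string
		local hdl, query = id:match('^(.-)(%?' .. param .. '.*)$')
		if hdl then
			return hdl, query
		end
	end
	return id, nil                                         -- no recognized query string
end
```

A '?' that is not followed by a known parameter name is left in the identifier, since '?' is itself an allowed HDL character.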
Line 647: | Line 631: | ||
local access = options.access; | local access = options.access; | ||
local handler = options.handler; | local handler = options.handler; | ||
local query_params = { -- list of known query parameters from | local query_params = { -- list of known query parameters from http://www.handle.net/proxy_servlet.html | ||
'noredirect', | 'noredirect', | ||
'ignore_aliases', | 'ignore_aliases', | ||
Line 816: | Line 800: | ||
Determines whether an ISMN string is valid. Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the | Determines whether an ISMN string is valid. Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the | ||
same check digit calculations. See | same check digit calculations. See http://www.ismn-international.org/download/Web_ISMN_Users_Manual_2008-6.pdf | ||
section 2, pages 9–12. | section 2, pages 9–12. | ||
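Since the check digit calculation is the same as for ISBN-13 (alternating weights of 1 and 3, total divisible by 10), a minimal plain-Lua sketch of the test could be written as below. `is_valid_ismn` is a hypothetical stand-alone name, and stripping hyphens and spaces before checking is an assumption rather than something this diff shows.

```lua
-- Sketch only: 13 digits beginning 9790, ISBN-13 style check digit.
local function is_valid_ismn(id)
	local digits = id:gsub('[%s%-]', '')                   -- assumption: ignore hyphens and spaces
	if not digits:match('^9790%d%d%d%d%d%d%d%d%d$') then   -- must be 13 digits starting 979-0
		return false
	end
	local sum = 0
	for i = 1, 13 do
		local weight = (i % 2 == 1) and 1 or 3             -- odd positions weigh 1, even positions 3
		sum = sum + weight * tonumber(digits:sub(i, i))
	end
	return sum % 10 == 0                                   -- valid when the weighted sum is a multiple of 10
end
```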
Line 865: | Line 849: | ||
like this: | like this: | ||
|issn=0819 4327 gives: [ | |issn=0819 4327 gives: [http://www.worldcat.org/issn/0819 4327 0819 4327] -- can't have spaces in an external link | ||
This code now prevents that by inserting a hyphen at the ISSN midpoint. It also validates the ISSN for length | This code now prevents that by inserting a hyphen at the ISSN midpoint. It also validates the ISSN for length | ||
Line 969: | Line 953: | ||
Format LCCN link and do simple error checking. LCCN is a character string 8-12 characters long. The length of | Format LCCN link and do simple error checking. LCCN is a character string 8-12 characters long. The length of | ||
the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits. | the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits. | ||
http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/ | |||
length = 8 then all digits | length = 8 then all digits | ||
Line 1,024: | Line 1,008: | ||
return external_link_id ({link = handler.link, label = handler.label, q = handler.q, redirect = handler.redirect, | return external_link_id ({link = handler.link, label = handler.label, q = handler.q, redirect = handler.redirect, | ||
prefix = handler.prefix, id = lccn, separator = handler.separator, encode = handler.encode}); | prefix = handler.prefix, id = lccn, separator = handler.separator, encode = handler.encode}); | ||
end | end | ||
Line 1,131: | Line 1,069: | ||
elseif id:match('^%d+$') then -- no prefix | elseif id:match('^%d+$') then -- no prefix | ||
number = id; -- get the number | number = id; -- get the number | ||
if | if 10 < number:len() then | ||
number = nil; -- | number = nil; -- constrain to 1 to 10 digits; change this when OCLC issues 11-digit numbers | ||
end | end | ||
end | end | ||
Line 1,593: | Line 1,531: | ||
['JSTOR'] = jstor, | ['JSTOR'] = jstor, | ||
['LCCN'] = lccn, | ['LCCN'] = lccn, | ||
['MR'] = mr, | ['MR'] = mr, | ||
['OCLC'] = oclc, | ['OCLC'] = oclc, |