Editing Module:Citation/CS1/Identifiers

Latest revision Your text
Line 164: Line 164:




--[=[-------------------------< I S _ V A L I D _ R X I V _ D A T E >------------------------------------------
--[=[-------------------------< I S _ V A L I D _ B I O R X I V _ D A T E >------------------------------------


for biorxiv, returns true if:
returns true if:
2019-12-11T00:00Z <= biorxiv_date < today + 2 days
2019-12-11T00:00Z <= biorxiv_date < today + 2 days
for medrxiv, returns true if:
2020-01-01T00:00Z <= medrxiv_date < today + 2 days
The dated form of biorxiv identifier has a start date of 2019-12-11.  The Unix timestamp for that date is {{#time:U|2019-12-11}} = 1576022400
The dated form of biorxiv identifier has a start date of 2019-12-11.  The Unix timestamp for that date is {{#time:U|2019-12-11}} = 1576022400
The medrxiv identifier has a start date of 2020-01-01.  The Unix timestamp for that date is {{#time:U|2020-01-01}} = 1577836800


<rxiv_date> is the date provided in those |biorxiv= parameter values that are dated and in |medrxiv= parameter values at time 00:00:00 UTC
biorxiv_date is the date provided in those |biorxiv= parameter values that are dated at time 00:00:00 UTC
<today> is the current date at time 00:00:00 UTC plus 48 hours
today is the current date at time 00:00:00 UTC plus 48 hours
if today's date is 2023-01-01T00:00:00 then
if today is 2015-01-01T00:00:00 then
adding 24 hours gives 2023-01-02T00:00:00 – one second more than today
adding 24 hours gives 2015-01-02T00:00:00 – one second more than today
adding 48 hours gives 2023-01-03T00:00:00 – one second more than tomorrow
adding 48 hours gives 2015-01-03T00:00:00 – one second more than tomorrow


inputs:
This function does not work if it is fed month names for languages other than English.  Wikimedia #time: parser
<y>, <m>, <d> – year, month, day parts of the date from the biorxiv or medrxiv identifier
apparently doesn't understand non-English date month names. This function will always return false when the date
<select> 'b' for biorxiv, 'm' for medrxiv; defaults to 'b'
contains a non-English month name because good1 is false after the call to lang_object.formatDate().  To get
around that call this function with date parts and create a YYYY-MM-DD format date.


]=]
]=]


local function is_valid_rxiv_date (y, m, d, select)
local function is_valid_biorxiv_date (y, m, d)
if 0 == tonumber (m) or 12 < tonumber (m) then -- <m> must be a number 1–12
local biorxiv_date = table.concat ({y, m, d}, '-'); -- make ymd date
return false;
end
if 0 == tonumber (d) or 31 < tonumber (d) then -- <d> must be a number 1–31; TODO: account for month length and leap year?
return false;
end
local rxiv_date = table.concat ({y, m, d}, '-'); -- make ymd date string
local good1, good2;
local good1, good2;
local rxiv_ts, tomorrow_ts; -- to hold Unix timestamps representing the dates
local biorxiv_ts, tomorrow_ts; -- to hold Unix timestamps representing the dates
local lang_object = mw.getContentLanguage();
local lang_object = mw.getContentLanguage();


good1, rxiv_ts = pcall (lang_object.formatDate, lang_object, 'U', rxiv_date); -- convert rxiv_date value to Unix timestamp  
good1, biorxiv_ts = pcall (lang_object.formatDate, lang_object, 'U', biorxiv_date); -- convert biorxiv_date value to Unix timestamp  
good2, tomorrow_ts = pcall (lang_object.formatDate, lang_object, 'U', 'today + 2 days' ); -- today midnight + 2 days is one second more than all day tomorrow
good2, tomorrow_ts = pcall (lang_object.formatDate, lang_object, 'U', 'today + 2 days' ); -- today midnight + 2 days is one second more than all day tomorrow
if good1 and good2 then -- lang.formatDate() returns a timestamp in the local script which tonumber() may not understand
if good1 and good2 then -- lang.formatDate() returns a timestamp in the local script which tonumber() may not understand
rxiv_ts = tonumber (rxiv_ts) or lang_object:parseFormattedNumber (rxiv_ts); -- convert to numbers for the comparison;
biorxiv_ts = tonumber (biorxiv_ts) or lang_object:parseFormattedNumber (biorxiv_ts); -- convert to numbers for the comparison;
tomorrow_ts = tonumber (tomorrow_ts) or lang_object:parseFormattedNumber (tomorrow_ts);
tomorrow_ts = tonumber (tomorrow_ts) or lang_object:parseFormattedNumber (tomorrow_ts);
else
else
Line 209: Line 200:
end
end


local limit_ts = ((select and ('m' == select)) and 1577836800) or 1576022400; -- choose the appropriate limit timestamp
return ((1576022400 <= biorxiv_ts) and (biorxiv_ts < tomorrow_ts)) -- 2019-12-11T00:00Z <= biorxiv_date < tomorrow's date
 
return ((limit_ts <= rxiv_ts) and (rxiv_ts < tomorrow_ts)) -- limit_ts <= rxiv_date < tomorrow's date
end
end
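
The date window described in the header comment can be sketched without the MediaWiki language object. This is only an illustration: `ymd_to_unix`, `in_rxiv_window`, `LIMITS`, and the `today_ts` argument are hypothetical names that do not appear in the module, and the timestamp arithmetic replaces the module's `lang_object.formatDate` call with a plain days-from-civil calculation.

```lua
-- Hypothetical standalone sketch; not part of Module:Citation/CS1/Identifiers.

-- Convert a proleptic Gregorian date to seconds since 1970-01-01T00:00Z.
local function ymd_to_unix (y, m, d)
	if m <= 2 then y = y - 1 end                      -- treat Jan/Feb as months of the previous year
	local era = math.floor (y / 400)
	local yoe = y - era * 400                         -- year of era, 0..399
	local mp = (m + 9) % 12                           -- March = 0 ... February = 11
	local doy = math.floor ((153 * mp + 2) / 5) + d - 1
	local doe = yoe * 365 + math.floor (yoe / 4) - math.floor (yoe / 100) + doy
	return (era * 146097 + doe - 719468) * 86400
end

-- Lower limits quoted in the header comment above.
local LIMITS = {b = 1576022400, m = 1577836800}      -- 2019-12-11 (biorxiv), 2020-01-01 (medrxiv)

-- <today_ts> is today's date at 00:00:00 UTC as a Unix timestamp (passed in so the check is deterministic).
local function in_rxiv_window (y, m, d, select, today_ts)
	local ts = ymd_to_unix (tonumber (y), tonumber (m), tonumber (d))
	local limit = LIMITS[select] or LIMITS.b          -- defaults to the biorxiv limit
	return limit <= ts and ts < today_ts + 2 * 86400  -- limit <= date < today + 2 days
end
```

Passing today's midnight in as a parameter is a design choice for the sketch only; the module itself obtains it from `'today + 2 days'`.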


Line 261: Line 250:
--[[--------------------------< N O R M A L I Z E _ L C C N >--------------------------------------------------
--[[--------------------------< N O R M A L I Z E _ L C C N >--------------------------------------------------


LCCN normalization (https://www.loc.gov/marc/lccn-namespace.html#normalization)
LCCN normalization (http://www.loc.gov/marc/lccn-namespace.html#normalization)
1. Remove all blanks.
1. Remove all blanks.
2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash.
2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash.
Line 298: Line 287:
--[[--------------------------< A R X I V >--------------------------------------------------------------------
--[[--------------------------< A R X I V >--------------------------------------------------------------------


See: https://arxiv.org/help/arxiv_identifier
See: http://arxiv.org/help/arxiv_identifier


format and error check arXiv identifier.  There are three valid forms of the identifier:
format and error check arXiv identifier.  There are three valid forms of the identifier:
Line 378: Line 367:
if is_set (class) then
if is_set (class) then
if id:match ('^%d+') then
if id:match ('^%d+') then
text = table.concat ({text, ' [[https://arxiv.org/archive/', class, ' ', class, ']]'}); -- external link within square brackets, not wikilink
text = table.concat ({text, ' [[//arxiv.org/archive/', class, ' ', class, ']]'}); -- external link within square brackets, not wikilink
else
else
set_message ('err_class_ignored');
set_message ('err_class_ignored');
Line 392: Line 381:
Validates (sort of) and formats a bibcode ID.
Validates (sort of) and formats a bibcode ID.


Format for bibcodes is specified here: https://adsabs.harvard.edu/abs_doc/help_pages/data.html#bibcodes
Format for bibcodes is specified here: http://adsabs.harvard.edu/abs_doc/help_pages/data.html#bibcodes


But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters
But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters
Line 410: Line 399:
local access = options.access;
local access = options.access;
local handler = options.handler;
local handler = options.handler;
local ignore_invalid = options.accept;
local err_type;
local err_type;
local err_msg = '';
local err_msg = '';
Line 433: Line 421:
if id:find('&%.') then
if id:find('&%.') then
err_type = cfg.err_msg_supl.journal; -- journal abbreviation must not have '&.' (if it does it's missing a letter)
err_type = cfg.err_msg_supl.journal; -- journal abbreviation must not have '&.' (if it does it's missing a letter)
end
if id:match ('.........%.tmp%.') then -- temporary bibcodes when positions 10–14 are '.tmp.'
set_message ('maint_bibcode');
end
end
end
end
end
end


if is_set (err_type) and not ignore_invalid then -- if there was an error detected and accept-as-written markup not used
if is_set (err_type) then -- if there was an error detected
set_message ('err_bad_bibcode', {err_type});
set_message ('err_bad_bibcode', {err_type});
options.coins_list_t['BIBCODE'] = nil; -- when error, unset so not included in COinS
options.coins_list_t['BIBCODE'] = nil; -- when error, unset so not included in COinS
end
end


Line 470: Line 456:
local patterns = {
local patterns = {
'^10%.1101/%d%d%d%d%d%d$', -- simple 6-digit identifier (before 2019-12-11)
'^10.1101/%d%d%d%d%d%d$', -- simple 6-digit identifier (before 2019-12-11)
'^10%.1101/(20%d%d)%.(%d%d)%.(%d%d)%.%d%d%d%d%d%dv%d+$', -- y.m.d. date + 6-digit identifier + version (after 2019-12-11)
'^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%dv%d+$', -- y.m.d. date + 6-digit identifier + version (after 2019-12-11)
'^10%.1101/(20%d%d)%.(%d%d)%.(%d%d)%.%d%d%d%d%d%d$', -- y.m.d. date + 6-digit identifier (after 2019-12-11)
'^10.1101/(20[1-9]%d)%.([01]%d)%.([0-3]%d)%.%d%d%d%d%d%d$', -- y.m.d. date + 6-digit identifier (after 2019-12-11)
}
}
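
The practical effect of escaping the dot in these patterns can be shown with a small sketch. In a Lua pattern an unescaped `.` matches any single character, so the older form accepts identifiers that are not DOIs under the 10.1101 prefix. The names `loose`, `strict`, and `accepts` are hypothetical and used only for illustration.

```lua
-- Hypothetical sketch comparing the two forms of the six-digit pattern.
local loose  = '^10.1101/%d%d%d%d%d%d$'    -- '.' matches any character
local strict = '^10%.1101/%d%d%d%d%d%d$'   -- '%.' matches a literal dot only

-- Returns true when <id> matches <pattern>.
local function accepts (pattern, id)
	return id:match (pattern) ~= nil
end
```

With these definitions, `accepts (loose, '10x1101/123456')` is true while `accepts (strict, '10x1101/123456')` is false; both accept `'10.1101/123456'`.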
Line 480: Line 466:


if m then -- m is nil when id is the six-digit form
if m then -- m is nil when id is the six-digit form
if not is_valid_rxiv_date (y, m, d, 'b') then -- validate the encoded date; 'b' for biorxiv limit
if not is_valid_biorxiv_date (y, m, d) then -- validate the encoded date; TODO: don't ignore leap-year and actual month lengths ({{#time:}} is a poor date validator)
break; -- date fail; break out early so we don't unset the error message
break; -- date fail; break out early so we don't unset the error message
end
end
Line 541: Line 527:
and terminal punctuation may not be technically correct but it appears, that in practice these characters are rarely
and terminal punctuation may not be technically correct but it appears, that in practice these characters are rarely
if ever used in DOI names.
if ever used in DOI names.
https://www.doi.org/doi_handbook/2_Numbering.html -- 2.2 Syntax of a DOI name
https://www.doi.org/doi_handbook/2_Numbering.html#2.2.2 -- 2.2.2 DOI prefix


]]
]]
Line 557: Line 540:
local text;
local text;
if is_set (inactive) then
if is_set (inactive) then
local inactive_year = inactive:match("%d%d%d%d"); -- try to get the year portion from the inactive date
local inactive_year = inactive:match("%d%d%d%d") or ''; -- try to get the year portion from the inactive date
local inactive_month, good;
local inactive_month, good;


Line 568: Line 551:
end
end
end
end
end -- otherwise, |doi-broken-date= has something but it isn't a date
else
inactive_year = nil; -- |doi-broken-date= has something but it isn't a date
end
if is_set (inactive_year) and is_set (inactive_month) then
if is_set (inactive_year) and is_set (inactive_month) then
Line 583: Line 568:


local registrant_err_patterns = { -- these patterns are for code ranges that are not supported  
local registrant_err_patterns = { -- these patterns are for code ranges that are not supported  
'^[^1-3]%d%d%d%d%.%d+$', -- 5 digits with subcode (0xxxx, 40000+); accepts: 10000–39999
'^[^1-3]%d%d%d%d%.%d%d*$', -- 5 digits with subcode (0xxxx, 40000+); accepts: 10000–39999
'^[^1-6]%d%d%d%d$', -- 5 digits without subcode (0xxxx, 70000+); accepts: 10000–69999
'^[^1-5]%d%d%d%d$', -- 5 digits without subcode (0xxxx, 60000+); accepts: 10000–59999
'^[^1-9]%d%d%d%.%d+$', -- 4 digits with subcode (0xxx); accepts: 1000–9999
'^[^1-9]%d%d%d%.%d%d*$', -- 4 digits with subcode (0xxx); accepts: 1000–9999
'^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999
'^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999
'^%d%d%d%d%d%d+', -- 6 or more digits
'^%d%d%d%d%d%d+', -- 6 or more digits
'^%d%d?%d?$', -- less than 4 digits without subcode (3 digits with subcode is legitimate)
'^%d%d?%d?$', -- less than 4 digits without subcode (with subcode is legitimate)
'^%d%d?%.[%d%.]+', -- 1 or 2 digits with subcode
'^5555$', -- test registrant will never resolve
'^5555$', -- test registrant will never resolve
'[^%d%.]', -- any character that isn't a digit or a dot
'[^%d%.]', -- any character that isn't a digit or a dot
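
How a list like this might be applied can be sketched as follows, using the patterns from the latest revision. This is an assumption about usage, not the module's code: `registrant_is_rejected` is a hypothetical helper, and it assumes the registrant code (the part of the DOI between `10.` and the first `/`) has already been extracted.

```lua
-- Hypothetical sketch: test a DOI registrant code against the rejection patterns.
local registrant_err_patterns = {
	'^[^1-3]%d%d%d%d%.%d+$',   -- 5 digits with subcode outside 10000–39999
	'^[^1-6]%d%d%d%d$',        -- 5 digits without subcode outside 10000–69999
	'^[^1-9]%d%d%d%.%d+$',     -- 4 digits with subcode, leading zero
	'^[^1-9]%d%d%d$',          -- 4 digits without subcode, leading zero
	'^%d%d%d%d%d%d+',          -- 6 or more digits
	'^%d%d?%d?$',              -- fewer than 4 digits without subcode
	'^%d%d?%.[%d%.]+',         -- 1 or 2 digits with subcode
	'^5555$',                  -- test registrant
	'[^%d%.]',                 -- any character that isn't a digit or a dot
}

-- Returns true when <registrant> matches any rejection pattern.
local function registrant_is_rejected (registrant)
	for _, pattern in ipairs (registrant_err_patterns) do
		if registrant:match (pattern) then
			return true
		end
	end
	return false
end
```

Under this sketch `'1101'`, `'69999'`, and `'123.4'` pass, while `'5555'`, `'0123'`, `'70000'`, `'40000.1'`, and `'12.34'` are rejected.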
Line 637: Line 621:
if ever used in HDLs.
if ever used in HDLs.


Query string parameters are named here: https://www.handle.net/proxy_servlet.html.  query strings are not displayed
Query string parameters are named here: http://www.handle.net/proxy_servlet.html.  query strings are not displayed
but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we
but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we
have to detect the query string so that it isn't URL-encoded with the rest of the identifier.
have to detect the query string so that it isn't URL-encoded with the rest of the identifier.
Line 647: Line 631:
local access = options.access;
local access = options.access;
local handler = options.handler;
local handler = options.handler;
local query_params = { -- list of known query parameters from https://www.handle.net/proxy_servlet.html
local query_params = { -- list of known query parameters from http://www.handle.net/proxy_servlet.html
'noredirect',
'noredirect',
'ignore_aliases',
'ignore_aliases',
Line 816: Line 800:


Determines whether an ISMN string is valid.  Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the
Determines whether an ISMN string is valid.  Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the
same check digit calculations.  See https://www.ismn-international.org/download/Web_ISMN_Users_Manual_2008-6.pdf
same check digit calculations.  See http://www.ismn-international.org/download/Web_ISMN_Users_Manual_2008-6.pdf
section 2, pages 9–12.
section 2, pages 9–12.
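
The check-digit rule that ISMN shares with ISBN-13 can be sketched as below: digits are weighted alternately 1 and 3 from the left, and the weighted sum must be divisible by 10. `ismn_checksum_ok` is a hypothetical helper written for this illustration and is not the module's own validation function.

```lua
-- Hypothetical sketch of the ISBN-13 style check used for ISMN.
local function ismn_checksum_ok (ismn)
	local digits = (ismn:gsub ('[%s%-]', ''))               -- drop spaces and hyphens; parentheses keep only the string
	if not digits:match ('^9790%d%d%d%d%d%d%d%d%d$') then   -- 13 digits beginning 9790
		return false
	end
	local sum = 0
	for i = 1, 13 do
		local n = tonumber (digits:sub (i, i))
		sum = sum + n * ((i % 2 == 1) and 1 or 3)           -- weights 1, 3, 1, 3, ...
	end
	return sum % 10 == 0
end
```

For example, `979-0-2600-0043-8` has a weighted sum of 80 and passes, while changing the final digit to 9 fails.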


Line 865: Line 849:
like this:
like this:


|issn=0819 4327 gives: [https://www.worldcat.org/issn/0819 4327 0819 4327] -- can't have spaces in an external link
|issn=0819 4327 gives: [http://www.worldcat.org/issn/0819 4327 0819 4327] -- can't have spaces in an external link
This code now prevents that by inserting a hyphen at the ISSN midpoint.  It also validates the ISSN for length
This code now prevents that by inserting a hyphen at the ISSN midpoint.  It also validates the ISSN for length
Line 969: Line 953:
Format LCCN link and do simple error checking.  LCCN is a character string 8-12 characters long. The length of
Format LCCN link and do simple error checking.  LCCN is a character string 8-12 characters long. The length of
the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits.
the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits.
https://oclc-research.github.io/infoURI-Frozen/info-uri.info/info:lccn/reg.html
http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/


length = 8 then all digits
length = 8 then all digits
Line 1,024: Line 1,008:
return external_link_id ({link = handler.link, label = handler.label, q = handler.q, redirect = handler.redirect,
return external_link_id ({link = handler.link, label = handler.label, q = handler.q, redirect = handler.redirect,
prefix = handler.prefix, id = lccn, separator = handler.separator, encode = handler.encode});
prefix = handler.prefix, id = lccn, separator = handler.separator, encode = handler.encode});
end
--[[--------------------------< M E D R X I V >-----------------------------------------------------------------
Format medRxiv ID and do simple error checking.  Similar to later bioRxiv IDs, medRxiv IDs are prefixed with a
yyyy.mm.dd. date and suffixed with an optional version identifier.  Earliest date accepted is 2020.01.01
The medRxiv ID is a date followed by an eight-digit number followed by an optional version indicator 'v' and one or more digits:
https://www.medrxiv.org/content/10.1101/2020.11.16.20232009v2 -> 10.1101/2020.11.16.20232009v2
]]
local function medrxiv (options)
local id = options.id;
local handler = options.handler;
local err_msg_flag = true; -- flag; assume that there will be an error
local patterns = {
'%d%d%d%d%d%d%d%d$', -- simple 8-digit identifier; these should be relatively rare
'^10%.1101/(20%d%d)%.(%d%d)%.(%d%d)%.%d%d%d%d%d%d%d%dv%d+$', -- y.m.d. date + 8-digit identifier + version (2020-01-01 and later)
'^10%.1101/(20%d%d)%.(%d%d)%.(%d%d)%.%d%d%d%d%d%d%d%d$', -- y.m.d. date + 8-digit identifier (2020-01-01 and later)
}
for _, pattern in ipairs (patterns) do -- spin through the patterns looking for a match
if id:match (pattern) then
local y, m, d = id:match (pattern); -- found a match, attempt to get year, month and date from the identifier
if m then -- m is nil when id is the 8-digit form
if not is_valid_rxiv_date (y, m, d, 'm') then -- validate the encoded date; 'm' for medrxiv limit
break; -- date fail; break out early so we don't unset the error message
end
end
err_msg_flag = nil; -- we found a match so unset the error message
break; -- and done
end
end -- <err_msg_flag> remains set here when no match
if err_msg_flag then
options.coins_list_t['MEDRXIV'] = nil; -- when error, unset so not included in COinS
set_message ('err_bad_medrxiv'); -- and set the error message
end
return external_link_id ({link = handler.link, label = handler.label, q = handler.q, redirect = handler.redirect,
prefix = handler.prefix, id = id, separator = handler.separator,
encode = handler.encode, access = handler.access});
end
end


Line 1,131: Line 1,069:
elseif id:match('^%d+$') then -- no prefix
elseif id:match('^%d+$') then -- no prefix
number = id; -- get the number
number = id; -- get the number
if tonumber (id) > handler.id_limit then
if 10 < number:len() then
number = nil; -- unset when id value exceeds the limit
number = nil; -- constrain to 1 to 10 digits; change this when OCLC issues 11-digit numbers
end
end
end
end
Line 1,593: Line 1,531:
['JSTOR'] = jstor,
['JSTOR'] = jstor,
['LCCN'] = lccn,
['LCCN'] = lccn,
['MEDRXIV'] = medrxiv,
['MR'] = mr,
['MR'] = mr,
['OCLC'] = oclc,
['OCLC'] = oclc,
