Discussion:
[ocaml-platform] on the need and design of OCaml namespaces
Gabriel Scherer
2013-02-21 13:08:29 UTC
Permalink
Hi,

For about a year now, there has been an intermittent discussion ongoing on
the idea of introducing "namespaces" to the OCaml language. The basis for
discussion is a set of pain points of the current implementation (all
modules live in a flat space defined by the search path (the -I option),
which is not very resilient to change), but there have been fairly
different ideas about how to best solve those problems, or even what
"namespace" means.
The current flat module space has mostly been felt as a problem by
distributors of largeish codebases designed not to be used in isolation but
to be required by user code, in particular sets of libraries for OCaml
(such as Jane Street Core or Batteries, and possibly components from the
future OCaml platform).

I worked on these issues last year with Didier Remy, and also Fabrice
Le Fessant and Nicolas Pouillard, and gave a presentation at (the informal
part of) the last meeting of the Caml Consortium. Here are a few documents
we have written in the process:
- a design document: http://gallium.inria.fr/~scherer/namespaces/spec.pdf
- the slides of the talk:
http://gallium.inria.fr/~scherer/namespaces/consortium-talk-2012.pdf

While I think the core problems with the current compilation unit lookup
system are rather consensual, there is little agreement on what a
reasonable extension to the language or implementation would be, or whether
it is even needed at all. The documents above take an intentionally "rich"
approach to the question, presenting a formal framework and a language
designed to be rather expressive. It would be desirable to isolate a
simpler feature set that would cover the practical needs, but this needs a
careful examination of the use cases, etc. I think people working on or
interested in the OCaml Platform may have interesting input on the
problems and use cases at hand.

(A good example of the design trade-offs involved is: who should have the
responsibility of choosing the names by which the OCaml user refers to
modules / software components on her system, the user or the component
provider? Letting users name things adds flexibility but also
complexity. Having a global shared namespace (e.g. opam) makes the overall
design simpler, but may also rule out some potentially interesting use
cases, such as users keeping version-pinned or modified versions of their
dependencies locally, and being able to refer explicitly either to the
standard version or to the local version of a package. Do we want to forbid
that? Allow picking one or the other in the global build system of the
developer's project? Let the developer use and link both the standard and
the local version at the same time in a program, to compare them and run
tests, or when one dependency uses the standard version and another the
local one? Besides design trade-offs, there are also underlying
implementation questions, as the current OCaml linker has rather
inflexible semantics in this regard.)
Leo White
2013-02-21 15:36:26 UTC
Permalink
Post by Gabriel Scherer
For about a year now, there has been an intermittent discussion ongoing
on the idea of introducing "namespaces" to the OCaml language[...], but
there have been fairly different ideas about how to best solve those
problems, or even what "namespace" means.
As I see it, what we *need* from namespaces is fairly simple:

Developers must be able to give their components long (hierarchical)
names without changing the component's filename.

This allows components with the same filename to coexist within the search
path. It also allows these components to be grouped together without
packing them into a single module.

Any other features, such as allowing users to use multiple versions of a
component or automatically assigning long names to components based on
their position within the filesystem should be considered superfluous and
unnecessary for an initial implementation.


In practical terms, what we need (based on Fabrice's "namespaces" branch of
the OCaml source tree) is to be able to start a file with a syntax like:

in Core.Std

This path is then included in the .cmi file and other compiled files. Then,
when a user writes "Core.Std.List", lookup proceeds as follows (sketched in
code below):

1. Look for a module called Core in the current local environment.

2. Look for a file "core.cmi" in the search path that is not attached to a
namespace.

3. Look for a file "std.cmi" in the search path that is attached to the
"Core" namespace.

4. Look for a file "list.cmi" in the search path that is attached to the
"Core.Std" namespace.

This lookup scheme could be simplified by, as Gabriel has suggested, using
a different separator for namespaces (e.g. Core#Std#List). Personally, I
don't have a strong opinion either way. A new separator is less ambiguous,
but it is one more piece of syntax for beginners to learn.


Other simple features that would be useful include:

- Opening namespaces ("open Core.Std")

- Aliasing namespaces ("open Core.Std as CS")

- Attaching a component to multiple namespaces ("in Core.Std and
Core.Containers")

- A command-line option alternative to the "in" syntax.

- A command-line option to pre-open namespaces.

Regards,

Leo
Gabriel Scherer
2013-02-21 15:56:13 UTC
Permalink
How would one specify which search path is associated to a given namespace
path (eg. Core.Std)? Is it easy to integrate into ocamlfind?
Post by Gabriel Scherer
For about a year now, there has been an intermittent discussion ongoing
Post by Gabriel Scherer
on the idea of introducing "namespaces" to the OCaml language[...], but
there have been fairly different ideas about how to best solve those
problems, or even what "namespace" means.
Developers must be able to give their components long (hierarchical)
names without changing the component's filename.
This allows components with the same filename to coexist within the search
path. It also allows these components to be grouped together without
packing them into a single module.
Any other features, such as allowing users to use multiple versions of a
component or automatically assigning long names to components based on
their position within the filesystem should be considered superfluous and
unnecessary for an initial implementation.
In practical terms, what we need (based on Fabrice's "namespaces" branch
in Core.Std
This path is then included in the .cmi file and other compiled files.
1. Look for a module called Core in the current local environment.
2. Look for a file "core.cmi" in the search path that is not attached to a
namespace.
3. Look for a file "std.cmi" in the search path that is attached to the
"Core" namespace.
4. Look for a file "list.cmi" in the search path that is attached to the
"Core.Std" namespace.
This lookup scheme could be simplified by, as Gabriel has suggested, using
a different separator for namespaces (e.g. Core#Std#List). Personally, I
don't have a strong opinion either way. A new separator is less ambiguous,
but it is one more piece of syntax for beginners to learn.
- Opening namespaces ("open Core.Std")
- Aliasing namespaces ("open Core.Std as CS")
- Attaching a component to multiple namespaces ("in Core.Std and
Core.Containers")
- A command-line option alternative to the "in" syntax.
- A command-line option to pre-open namespaces.
Regards,
Leo
Leo White
2013-02-21 16:01:58 UTC
Permalink
Post by Gabriel Scherer
How would one specify which search path is associated to a given namespace
path (eg. Core.Std)? Is it easy to integrate into ocamlfind?
.cmi files would be looked up exactly as they are now, using the search
path specified with -I options. The only difference is that when it finds a
"list.cmi" file it checks if that file is attached to "Core.Std". If the
file is attached to that namespace then it is used, otherwise the compiler
keeps looking in the search path for more "list.cmi" files.
Gabriel Scherer
2013-02-21 16:12:16 UTC
Permalink
So in this case you would have to look for list.cmi, std.cmi then core.cmi
(if you don't know which are namespaces, and which are actual compilation
unit names).

One problem with this proposal is that the compiler has no knowledge of the
set of "existing" namespaces. This combines very badly with the
module/namespace syntactic ambiguity: when you write "open Lsit" (List,
with a typo), the compiler will silently accept the opening of the Lsit
namespace. I formalized this semantics in an earlier proposal, but Fabrice
noticed that this was quite bad from a user interface point of view, and
further proposals used a model with "existing" namespaces and
"non-existing" namespaces -- in the current proposal linked above, the
compiler consults an explicit hierarchical mapping.
(Removing the syntactic ambiguity makes this slightly less of a problem,
but it's still a pain to not be warned of namespace typos.)
Post by Gabriel Scherer
How would one specify which search path is associated to a given namespace
Post by Gabriel Scherer
path (eg. Core.Std)? Is it easy to integrate into ocamlfind?
.cmi files would be looked up exactly as they are now, using the search
path specified with -I options. The only difference is that when it finds a
"list.cmi" file it checks if that file is attached to "Core.Std". If the
file is attached to that namespace then it is used, otherwise the compiler
keeps looking in the search path for more "list.cmi" files.
Leo White
2013-02-21 17:00:25 UTC
Permalink
Post by Gabriel Scherer
So in this case you would have to look for list.cmi, std.cmi then core.cmi
(if you don't know which are namespaces, and which are actual compilation
unit names).
I would probably look for core.cmi then std.cmi and then list.cmi, but
basically yes.
Post by Gabriel Scherer
One problem with this proposal is that the compiler has no knowledge of the
set of "existing" namespaces. This combines very badly with the
module/namespace syntactic ambiguity: when you write "open Lsit" (List,
with a typo), the compiler will silently accept the opening of the Lsit
namespace.
[...]
(Removing the syntactic ambiguity makes this slightly less of a problem,
but it's still a pain to not be warned of namespace typos.)
Catching this error (without syntactic ambiguity) should not actually be
that hard in practice.

Consider something like "open Core#Sdt". This typo prevents us from finding
an appropriate .cmi file, because the namespace is Core#Std rather than
Core#Sdt. The unbound module error could include the fact that a .cmi file
was found but that its namespace was not open.

The compiler could even locate the unused "open Core#Sdt" and use the new
spell-checking code to suggest that it might be the cause of the error.

There is obviously still a risk that the typo in the open statement will
not cause an error because there is another .cmi with the same name whose
namespace is open. However, the "open Core#Sdt" instruction should still
raise an "unused open" warning.

Overall, I don't think that it is worth forcing users to pre-declare all
possible namespaces, just to avoid a slightly confusing error message in
the case of a typo.

Even if you decide that you need to pre-declare all namespaces, it should
not be the responsibility of the user. It would be better to create special
".cmi" files to represent the namespaces. These would not contain any
information other than the name of their parent namespace; they would
simply exist to show that the namespace exists. This might also make it
easier to have namespaces and modules use the same separator.

Leo
Alain Frisch
2013-02-21 18:03:01 UTC
Permalink
Post by Leo White
Developers must be able to give their components long (hierarchical)
names without changing the component's filename.
This allows components with the same filename to coexist within the
search path. It also allows these components to be grouped together
without packing them into a single module.
What would be the justification for hierarchical names? It seems that a
flat qualifier is enough to support the two goals you mention (making
components with the same filename coexist, and grouping them with a
common name). In practice, we will have a namespace per distributed
library (e.g. Core, Extlib, Xml-light, ...). Restricting to flat
qualifiers might enable a simpler design.

Personally, I'm not even convinced of the need for supporting several
compilation units with the same filename. Basically, we will encode
namespace information inside the .cmi instead of doing it in the
filename, forcing the compiler to open files only to discover they are
not in the correct namespace. Is it really so tedious to use longer
filenames? For the library developer, I'd say no. For the library
user, maybe, and I'd rather focus on providing ways to make it simpler
to refer to long module names, such as a good module alias feature in
the language and/or a way to customize the link between names in source
files and external module names (a mapping could be specified in
external files).


Alain
Leo White
2013-02-21 18:39:37 UTC
Permalink
Post by Alain Frisch
What would be the justification for hierarchical names? It seems that a
flat qualifier is enough to support the two goals you mention (making
components with the same filename coexist, and grouping them with a
common name). In practice, we will have a namespace per distributed
library (e.g. Core, Extlib, Xml-light, ...). Restricting to flat
qualifiers might enable a simpler design.
Large libraries can also contain multiple components with the same name.
For example, a library might provide both Foo.Async.IO and Foo.Lwt.IO.

I don't think having hierarchical namespaces really adds much additional
complexity to the system.

More generally, almost all naming systems are hierarchical. It is a tried
and tested way of organising things.
Post by Alain Frisch
Personally, I'm not even convinced of the need for supporting several
compilation units with the same filename. Basically, we will encode
namespace information inside the .cmi instead of doing it in the
filename, forcing the compiler to open files only to discover they are
not in the correct namespace. Is it really so tedious to use longer
filenames?
Is it really so tedious for the compiler to look in multiple .cmi files
until the right one is found?

Long filenames don't allow you to open or alias a namespace; they also
don't allow you to change the namespaces that are open by default. All of
these are useful features.

I also think that long filenames *are* tedious; if they weren't, people
would use them already. If you are using a large library, even with a good
aliasing feature, you would end up writing:

open Core_Std_Mutex as Mutex
open Core_Std_Thread as Thread
open Core_Std_Date as Date

at the beginning of all your files, instead of writing:

open Core.Std

Regards,

Leo
Alain Frisch
2013-02-22 09:22:23 UTC
Permalink
Post by Leo White
More generally, almost all naming systems are hierarchical. It is a
tried and tested way of organising things.
The namespace system I've used most, XML Namespaces, has flat qualifiers
(we could argue that they are in general URL/URN, with a hierarchical
structure, but namespaces are really just strings matched with strict
equality).

I agree with Xavier that if we give a hierarchical syntax to namespaces,
this should somehow be reflected in the semantics to avoid confusion.
Post by Leo White
Is it really so tedious for the compiler to look in multiple .cmi files
until the right one is found?
Nothing is really tedious for a compiler, but there are technical
drawbacks of doing so:

- Performance: looking up and opening files takes time, especially
under a bad OS such as Windows.

- It prevents putting .cmi files from many libraries in the same
directory, which is sometimes useful (to simplify deployment; to control
precisely the set of .cmi files available for a given file; to improve
performance by avoiding repeated lookups in many directories).

- Spurious dependencies: technically, since the compiler will open
them, all x.cmi files in the search path should be considered as
dependencies for a module which refers to X. This is necessary to have
a correct notion of dependency for the build system (formally, each
x.cmi could become the "correct one" if its namespace changes in the
source file; and since all these files are opened, they should not be
overwritten in parallel). This complicates the build system,
especially for parallel builds, and creates a risk of dependency cycles.
Post by Leo White
I also think that long filenames *are* tedious, if they weren't people
would use them already. If you are using a large library, even with a
open Core_Std_Mutex as Mutex
open Core_Std_Thread as Thread
open Core_Std_Date as Date
open Core.Std
That's why I've proposed allowing the mapping between references to
external modules to be specified in dedicated files. We could have a file
core_std.ns (probably shipped with Core) with this content:

Mutex = Core_std_mutex
Thread = Core_std_thread
Date = Core_std_date

and just a reference in the source code (or on the command-line):

open namespace Core_std

which would load core_std.ns and use the corresponding module renaming
in the rest of the module.
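
In terms of what exists today, the renaming triggered by "open namespace
Core_std" could be approximated with plain module bindings (a dedicated
module aliasing feature would avoid their runtime cost); a minimal sketch,
assuming the long-named units above exist:

(* Approximation of "open namespace Core_std" with ordinary bindings;
   Core_std_mutex etc. are the hypothetical long-named units above. *)
module Mutex = Core_std_mutex
module Thread = Core_std_thread
module Date = Core_std_date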

Regards,

Alain
Leo White
2013-02-22 14:06:22 UTC
Permalink
Post by Alain Frisch
- Performance: looking up and opening files takes time, especially
under bad OS such as Windows.
There are a number of possible solutions to spurious opens. The simplest of
these is to only look for "Core.Std.Mutex" in directories which contain the
special "Core.Std" .cmi file (mentioned previously as a solution to typos
in open statements). These special .cmi files could also be extended to
include a list of the modules in the current directory that belong to that
namespace, which would prevent spurious reads entirely.
Post by Alain Frisch
- It prevents from putting .cmi files from many libraries in the same
directory, which is sometimes useful (to simplify deployment; to control
precisely the set of .cmi available for a given file; to improve
performance by avoiding repeated lookups in many directories).
I think that this ability is of dubious value and not really a big loss.
Post by Alain Frisch
- Spurious dependencies: technically, since the compiler will open
them, all x.cmi files in the search path should be considered as
dependencies for a module which refers to X. This is necessary to have
a correct notion of dependency for the build system (formally, each
x.cmi could become the "correct one" if its namespace changes in the
source file; and since all these files are opened, they should not be
overwritten in parallel). This complexifies the build system,
especially for parallel builds, and creates a risk of dependency cycles.
The solutions above should solve this problem.

Even without those solutions, there is no need for a proper dependency,
since changing the namespace would result in two files with the same name
and namespace. This is basically an error, so dependencies should not be
expected to be correct. It would not be difficult for an OCaml-specific
build system to detect the existence of two files with the same name and
namespace and raise an error.

There is possibly the need for some kind of partial dependency for parallel
builds. This is more like a lock than a dependency, so there should be no
question of circular dependencies. I'm not really familiar with how
parallel file accesses work on different file systems, but perhaps the
compiler could lock ".cmi" files before reading and writing. This might be
a good idea more generally for cases where dependencies have not been
correctly calculated.
Post by Alain Frisch
That's why I've proposed to allow specifying mapping between references
to external modules in dedicated files. We could have a file
Mutex = Core_std_mutex
Thread = Core_std_thread
Date = Core_std_date
open namespace Core_std
which would load core_std.ns and use the corresponding module renaming
in the rest of the module.
There is very little difference between that suggestion and having
a core_std.ns file containing:

Mutex;
Thread;
Date;

and using that as a (partial) declaration of a Core_std namespace, except
that you have to give every file a unique long filename. So I don't really
see the particular benefit of using long filenames.

There is also a more general problem with any solution like this, which
tries to define namespaces (or sets of aliases) in a single file. It is
difficult to use the namespace from inside the modules that are within the
namespace. For example, if I use:

open namespace Core_std

from within mutex.ml then it will attempt to open itself.

This is particularly problematic for language extensions, because they want
to generate code like:

Core_Std.Mutex.lock

but if they are used within another module in Core_Std then it breaks.

The solution to these problems is to have membership of a namespace encoded
in the module itself.
Alain Frisch
2013-02-22 17:22:44 UTC
Permalink
Post by Leo White
There are a number of possible solutions to spurious opens. The simplest
of which is to only look for "Core.Std.Mutex" in directories which
contain the special "Core.Std" .cmi file (mentioned previously as a
solution to typos in open statements). These special .cmi files could
also be extended to include a list of modules that have that namespace
within the current directory which would prevent spurious reads entirely.
This seems a little bit hackish to me, and likely to require more
over-engineering (do we need a tool to create those .cmi files; if they
are plain text files, it's ugly to use the .cmi extension).
Post by Leo White
Post by Alain Frisch
- It prevents from putting .cmi files from many libraries in the same
directory, which is sometimes useful (to simplify deployment; to
control precisely the set of .cmi available for a given file; to
improve performance by avoiding repeated lookups in many directories).
I think that this ability is of dubious value and not really a big loss.
We use this quite intensively on LexiFi's code base. This really speeds
up compilation time under Windows, and we also use it to simplify
deployment (our application is shipped with some .cmi files from many
libraries and automatically compiles user-provided addins against them;
it would be tedious -- and useless -- to reproduce a complete hierarchy
of libraries on the installation side).

We would hate to have the third-party libraries we use adopt a new
feature (namespaces) which solves a problem we don't have but forces us
to change in non-trivial ways how we organize our code base and deployment.
Post by Leo White
This is basically an error, so dependencies
should not be expected to be correct.
I don't agree. Having wrong dependencies is a nightmare to debug, and
during development having errors in the code is not an exceptional
situation. A robust build system should handle nicely things like
moving files around, renaming them, etc.
Post by Leo White
There is possibly the need for some kind of partial dependency for
parallel builds. This is more like a lock than a dependency, so there
should be no question of circular dependencies. I'm not really familiar
with how parallel file accesses work on different file systems, but
perhaps the compiler could lock ".cmi" files before reading and writing.
This might be a good idea more generally for cases where dependencies
have not been correctly calculated.
These locks do not really solve the problem. Imagine you have a big
project with two modules

foo/a.ml in namespace Foo
bar/a.ml in namespace Bar

Now you compile x.ml, which refers to Foo # A. "ocamldep -modules"
reports that the dependencies for it include module "A", which must be
mapped to all buildable a.cmi/a.cmx in your tree, i.e. both foo/a and
bar/a. (Things are even worse if you use the same syntax as for
modules, because then any reference like Foo.A must be interpreted as a
potential dependency on foo.cmi/cmx or on a.cmi/cmx.) But maybe
bar/a.ml refers to x.ml, and then you have a circular dependency.

I'd like any proposal about namespaces to come with a description of (i)
how ocamldep is supposed to behave; (ii) how build systems (based on,
say, make, omake and ocamlbuild) are supposed to be adapted.

So here it is for my proposal of using "short names" declared in
external files:

- Doing "open namespace Core_std" is strictly equivalent to doing
"module Mutex == Core_std_mutex;; module Thread == Core_std_thread;;
module Date == Core_std_date" assuming a new module aliasing feature
(available in structures and signatures).

- ocamldep would read the core_std.ns file (meaning that it must
exist and be up-to-date when ocamldep runs; I expect those files to be
quite static so this shouldn't be a big problem -- otherwise, we would
need to have a first pass where ocamldep would returns the list of .ns
files to be opened, and the build system would arrange to build them).

- when ocamldep encounters a module reference "Mutex" in a scope where
an alias "module Mutex == Core_std_mutex" has been defined (manually or
by loading core_std.ns), it reports a dependency on module
Core_std_mutex instead of Mutex.

- the build systems do not have to be adapted.
Post by Leo White
Post by Alain Frisch
That's why I've proposed to allow specifying mapping between
references to external modules in dedicated files. We could have a
Mutex = Core_std_mutex
Thread = Core_std_thread
Date = Core_std_date
open namespace Core_std
which would load core_std.ns and use the corresponding module renaming
in the rest of the module.
There is very little difference between that suggestion and having a
Mutex;
Thread;
Date;
and using that as a (partial) declaration of a Core_std namespaces,
except that you have to give every file a unique long filename. So I
don't really see the particular benefit of using long filenames.
With my proposal, you don't force users of the library to use the new
feature (meaning that if for some reason your local build system does
not work nicely with namespaces, you can always refer to modules using
their long names). Moreover, the semantics is very easy to explain, the
linker does not need to be changed, and we don't change how OCaml
behaves w.r.t. the file system (your proposal prevents users from
using the currently valid technique of copying .cmi files from many
libraries in the same directory).
Post by Leo White
There is also a more general problem with any solution like this, which
tries to define namespaces (or sets of aliases) in a single file. It is
difficult to use the namespace from inside the modules that are within
open namespace Core_std
from within mutex.ml then it will attempt to open itself.
I don't understand the problem. The source file would be
core_std_mutex.ml and it is fine if it does "open namespace Core_std" as
long as it doesn't refer to Mutex (which would be a circular dependency).


Alain
Leo White
2013-02-22 18:12:20 UTC
Permalink
Post by Alain Frisch
This seems a little bit hackish to me, and likely to require more
over-engineering (do we need a tool to create those .cmi files; if they
are plain text file, it's ugly to use the .cmi extension).
I'm still not sure that they are really needed, but if they are, the
simplest thing to do would be to automatically generate some kind of
"core.std.cmn" file whenever an .mli file was compiled that contained "in
Core.Std". This would basically indicate that a "Core.Std" namespace
existed and that some ".cmi" files in this directory used it.
Post by Alain Frisch
We use this quite intensively on LexiFi's code base. This really speeds
up compilation time under Windows, and we also use it to simplify
deployment (our application is shipped with some .cmi files from many
libraries and automatically compiles user-provided addins against them;
it would be tedious -- and useless -- to reproduce a complete hierarchy
of libraries on the installation side).
We would hate to have the third-party libraries we use adopt a new
feature (namespaces) which solves a problem we don't have but forces us
to change in non trivial ways how we organize our code base and deployment.
It is quite a specialised case, but I don't think it would be too hard to
accommodate. If the compiler accepted filenames like "list.1.cmi" as
possible files containing a "List" module, then you could safely put all
your files in a single directory.
Post by Alain Frisch
Post by Leo White
This is basically an error, so dependencies
should not be expected to be correct.
I don't agree. Having wrong dependencies is a nightmare to debug and
during development having errors in the code is not an exceptional
situation. A robust build system should handle nicely things like
moving files around, renaming them, etc.
I didn't mean that the build system wouldn't detect the error, only that it
wouldn't detect it until dependencies were recalculated.
Post by Alain Frisch
Imagine you have a big
project with two modules
foo/a.ml in namespace Foo
bar/a.ml in namespace Bar
Now you compile x.ml which refers to Foo # A. "ocamldep -modules"
reports that the dependencies for it include mode "A", which must be
mapped to all buildable a.cmi/a.cmx in your tree, i.e. both foo/a and
bar/a. (Things are even worse if you use the same syntax as for
modules, because then any reference like Foo.A must be interpreted as a
potential dependency to foo.cmi/cmx or to a.cmi/cmx.) But maybe
bar/a.ml refers to x.ml, and then you have a circular dependency.
This won't be a problem if you use a build system specialised for OCaml,
since it would know about namespaces and create the following dependencies:

x.ml: Foo#A
Foo#A: foo/a.ml
Bar#A: bar/a.ml

Even the makefile output of ocamldep could be modified to avoid this
problem using phony targets, although it is probably not worth it.
Post by Alain Frisch
I'd like any proposal about namespaces to come with a description of (i)
how ocamldep is supposed to behave; (ii) how build systems (based on,
say, make, omake and ocamlbuild) are supposed to be adapted.
For namespaces:

- Whenever ocamldep encounters a line "in A#B" within c.mli then it creates
a dependency "A#B#C: c.mli".

- Whenever it finds a use A#B#C in a file e.mli it creates a dependency
"e.mli: A#B#C".

- Build systems are modified to include support for phony namespace
targets.
Alain Frisch
2013-02-22 18:21:11 UTC
Permalink
Post by Leo White
- Whenever ocamldep encounters a line "in A#B" within c.mli then it creates
a dependency "A#B#C: c.mli".
- Whenever it finds a use A#B#C in a file e.mli it creates a dependency
"e.mli: A#B#C".
- Build systems are modified to include support for phony namespace
targets.
Are you suggesting that support for namespaces would require changing
make and omake, or just their "OCaml-specific" rules (defined in user
land, not in the tool itself)?

I'm not sure that what you describe above corresponds to phony targets as
currently understood by make and omake (I might be wrong).

Could you also describe how this would be affected if we allow opening
namespaces? It seems to me that the safe thing to do would be quite
ugly (a reference to a module "A" would create many candidate
dependencies for all opened namespaces). Moreover, would you support
opening namespaces within namespaces (i.e. is "open namespace A;; open
namespace B" a valid way to open the namespace A/B?)

Alain
Leo White
2013-02-25 11:35:09 UTC
Permalink
Post by Alain Frisch
Are you suggesting that support for namespace would require to change
make and omake, or just their "OCaml-specific" rules (defined in user
land, not in the tool itself)?
Just their OCaml-specific rules.
Post by Alain Frisch
I'm not sure that what you describe above correspond to phony targets as
currently understood by make and omake (I might be wrong).
I think that a phony target is just a target without a corresponding file
(I might also be wrong).

Although, I had forgotten that ocamldep only creates dependencies on modules
that have an existing .ml or .mli file. So instead of using phony targets,
I propose the following solution:

- Whenever it finds a use of A#B#C in a file e.mli, it creates a dependency
"e.cmi: c.cmi" if it can find an ".mli" or ".ml" file in its search path
which starts with the line "in A#B" (see the sketch below).

- Build systems are left as they are.
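
A rough sketch of the dependency rule above, as ordinary OCaml (the
function name and the "in A#B" header convention are assumptions of this
illustration, not existing ocamldep behaviour): emit "e.cmi: c.cmi" only
when some c.ml or c.mli on the search path starts with the line "in A#B".

let dependency_line ~search_path ~referrer ~namespace ~target =
  (* Does [file] start with the header line "in <namespace>"? *)
  let declares_namespace file =
    try
      let ic = open_in file in
      let first_line = try input_line ic with End_of_file -> "" in
      close_in ic;
      String.trim first_line = "in " ^ namespace
    with Sys_error _ -> false
  in
  let candidates = [ target ^ ".mli"; target ^ ".ml" ] in
  let found =
    List.exists
      (fun dir ->
         List.exists (fun f -> declares_namespace (Filename.concat dir f))
           candidates)
      search_path
  in
  if found then Some (Printf.sprintf "%s.cmi: %s.cmi" referrer target)
  else None

For instance, dependency_line ~search_path:["."; "foo"] ~referrer:"e"
~namespace:"A#B" ~target:"c" would yield Some "e.cmi: c.cmi" when foo/c.mli
starts with "in A#B".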
Post by Alain Frisch
Could you also describe how this would be affected if we allow opening
namespaces? It seems to me that the safe thing to do would be quite
ugly (a reference to a module "A" would create many candidate
dependencies for all opened namespaces).
With the solution given above, a reference to "A" would only create a
dependency on the "a.cmi" that was in the most recently opened namespace.
Post by Alain Frisch
Moreover, would you support
opening namespaces within namespaces (i.e. is "open namespace A;; open
namespace B" a valid way to open the namespace A/B?)
I would like to, but it is not a deal-breaker.
Alain Frisch
2013-02-21 18:23:19 UTC
Permalink
Post by Leo White
Developers must be able to give their components long (hierarchical)
names without changing the component's filename.
This allows components with the same filename to coexist within the
search path. It also allows these components to be grouped together
without packing them into a single module.
What would be the justification for hierarchical names? It seems that a
flat qualifier is enough to support the two goals you mention (making
components with the same filename coexist, and grouping them with a
common name). In practice, we will have a namespace per distributed
library (e.g. Core, Extlib, Xml-light, ...). Restricting to flat
qualifiers might enable a simpler design.

Personally, I'm not even convinced of the need for supporting several
compilation units with the same filename. Basically, we will encode
namespace information inside the .cmi instead of doing it in the
filename, forcing the compiler to open files only to discover they are
not in the correct namespace. Is it really so tedious to use longer
filenames? For the library developer, I'd say no. For the library
user, maybe, and I'd rather focus on providing ways to make it simpler
to refer to long module names, such as a good module alias feature in
the language and/or a way to customize the link between names in source
files and external module names (a mapping could be specified in
external files).


Alain
Stefano Zacchiroli
2013-02-21 20:31:40 UTC
Permalink
Post by Alain Frisch
What would be the justification for hierarchical names?
One of the advantages that comes to mind is the ability to piggyback on
already existing, world-wide, unambiguous, hierarchical namespaces out
there, such as DNS. It's overly verbose, but if you dream of a very
widespread adoption of the language (a-la Java), then namespaces like
org.apache.... have their advantages in terms of scalability.

Just my 0.02€,
Cheers.
--
Stefano Zacchiroli . . . . . . . zack at upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Debian Project Leader . . . . . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »
Xavier Clerc
2013-02-22 08:54:37 UTC
Permalink
----- Mail original -----
Post by Stefano Zacchiroli
Post by Alain Frisch
What would be the justification for hierarchical names?
One of the advantages that comes to mind is the ability to piggyback on
already existing, world-wide, unambiguous, hierarchical namespaces out
there, such as DNS. It's overly verbose, but if you dream of a very
widespread adoption of the language (a-la Java), then namespaces like
org.apache.... have their advantages in terms of scalability.
I concur. However, I would like to draw your attention to the fact that
hierarchical names tend to convey the intuition that the parent element
somehow "contains" the child elements. In Java, this is an incorrect
assumption, as the namespace is indeed flat. My bet is that if hierarchical
names are used, this has to be reflected in the semantics.


More generally on the subject of namespaces, shouldn't we assess the
merits and mistakes of their equivalents in other languages?


Regards,

Xavier
Sylvain Le Gall
2013-02-22 09:31:20 UTC
Permalink
Post by Xavier Clerc
----- Mail original -----
Post by Stefano Zacchiroli
Post by Alain Frisch
What would be the justification for hierarchical names?
One of the advantages that comes to mind is the ability to piggyback on
already existing, world-wide, unambiguous, hierarchical namespaces out
there, such as DNS. It's overly verbose, but if you dream of a very
widespread adoption of the language (a-la Java), then namespaces like
org.apache.... have their advantages in terms of scalability.
I concur. However, I would like to draw your attention to the fact that
hierarchical names tend to convey the intuition that the parent element
somehow "contains" the child elements. In Java, this is an incorrect
assumption, as the namespace is indeed flat. My bet is that if hierarchical
names are used, this has to be reflected in the semantics.
More generally on the subject of namespaces, shouldn't we assess the
merits and mistakes of their equivalents in other languages?
I tend to agree with Alain that hierarchical namespaces are overkill.
A flat namespace is easier to achieve and will solve most problems.

Working with Java and its namespaces all day long, I would say it is
totally useless to copy that! What is the meaning of org.apache? Will I
have to name my libraries net.le-gall.sylvain.Foo or
org.ocamlcore.forge.ounit.OUnit?

Honestly, hierarchical namespaces just make IDE completion mandatory
(read Eclipse or IntelliJ here). And even in this case the
autocomplete boxes are too small to show the difference between
org.apache.utils.String and org.apache.bar.utils.String.

If we have to vote on this topic, I would say: a flat namespace, and
that's enough for me.

Regards
Sylvain
Daniel Bünzli
2013-02-22 10:39:41 UTC
Permalink
This may be a silly suggestion, as I'm not sure I'm really convinced by the absolute *need* for namespaces (I'd rather not have an additional concept in a language that I already find sufficiently complex to my taste).

However, it strikes me that while it seems to be agreed that a simple mechanism like `-pack` solves the problem, albeit not in a technically satisfying way, the worked-out proposal seems to skyrocket into complexity. The best way to avoid bureaucracy/complexity is not to introduce the tools to manage it...

So if the problem is that `-pack` is not good enough because it produces a huge `cmo`, why not just try to find a corresponding concept that works with `cm[x]a`?

Here's a proposal --- from a user's point of view; I'll let the compiler hackers comment on how/if this could be workable.

Introduce `cmia` files that bundle the `cmi`s of a `cm[x]a` with the corresponding file name. The name of the `cmia` file (and hence of the `cm[x]a` file) defines your toplevel module name à la `-pack`. It could even be backward compatible, in the sense that if there's no `cmia` for a `cm[x]a` the modules are accessed as before. Now you have a kind of `-pack` that works with `cm[x]a`.

Best,

Daniel

P.S. flat vs hierarchical, I'd also rather go flat.
Anil Madhavapeddy
2013-02-22 11:41:10 UTC
Permalink
There's one scenario which absolutely requires the ability to explicitly open a particular namespace: camlp4 code generation.

Right now, several camlp4 extensions break because they use modules from the standard Pervasives library, and have no way to explicitly state that. If Core.Std is opened, then compilation fails.

The two workarounds are:
- hack the build system to pass -pp options to the camlp4 generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std' locally, irrespective of the current module environment.

I believe namespaces address the latter workaround.

-anil
Post by Daniel Bünzli
This may be a silly suggestion as I'm not sure I'm really convinced by the absolute *need* for namespaces (I'd rather not have an additional concept in a language that I already find sufficiently complex to my taste).
However it strikes me that while it seems to be agreed upon that a simple mechanism like `-pack` solves the problem albeit not in a technically satisfying way, the worked out proposal seems to skyrocket into complexity. The best way to avoid bureaucracy/complexity is not to introduce the tools to be able to manage it...
So if the problem is `-pack` is not good enough because it produces a huge `cmo`, why not just try to find a corresponding concept workable with `cm[x]a` ?
Here's a proposal --- from a user point of view, I'll let the compiler hackers comment on how/if this could be workable.
Introduce `cmia` files that bundles the `cmi`'s of a `cm[x]a` with the corresponding file name. The name of the `cmia` file (and hence of the `cm[x]a` file) defines your toplevel module name à la `-pack`. Make it even be backward compatible in the sense that if there's no `cmia` for a `cm[x]a` the modules are accessed as before. Now you have a kind of `-pack` that work with `cm[x]a`.
Best,
Daniel
P.S. flat vs hierarchical, I'd also rather go flat.
Daniel Bünzli
2013-02-22 12:12:16 UTC
Permalink
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly open a particular namespace: camlp4 code generation.
Well, I'm not sure I'd like more complexity in the system to support the otherwise ugly tool that camlp4 is. Besides, I'm sure this problem can be tackled using modules and the language as it stands, instead of introducing a new bureaucratic concept into the language.

Daniel
Anil Madhavapeddy
2013-02-22 12:14:21 UTC
Permalink
Post by Daniel Bünzli
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly open a particular namespace: camlp4 code generation.
Well I'm not sure I'd like more complexity in the system to support the otherwise ugly tool that camlp4 is. Besides I'm sure this problem can be tackled using modules and the language as it stands instead of introducing a new bureaucratic concept in the language.
This remains a problem in any code-generation approach, including ppx. Namespaces are just another way of manipulating modules, so we could call them 'module aliases' if having a new word is scaring people off.

-anil
Daniel Bünzli
2013-02-22 12:51:05 UTC
Permalink
This remains a problem in any code-generation approach, including ppx. Namespaces are just another way of manipulating modules, so we could call them 'module aliases' if having a new word is scaring people off.
But this is already part of the language (module M = M'); just make its cost negligible. Now maybe it's just a matter of making the stdlib less pervasive or, as you suggest, of making its components available under another toplevel name. What I doubt is that some new mechanism really has to be introduced.

Daniel
Malcolm Matalka
2013-02-22 13:52:38 UTC
Permalink
So would a syntax extension always have to store the modules it wants to
be sure to access at the beginning of every file that requires it, making
sure not to choose overlapping names?
Post by Anil Madhavapeddy
This remains a problem in any code-generation approach, including ppx. Namespaces are just another way of manipulating modules, so we
could call them 'module aliases' if having a new word is scaring people off.
But this is already part of the language (module M = M'), just make its cost negligible. Now maybe it's just a matter of rendering stdlib
less pervasives or as you suggest to make its components available under another toplevel name. What I doubt is that some new mechanism
really has to be introduced.
Daniel
_______________________________________________
Platform mailing list
Platform at lists.ocaml.org
http://lists.ocaml.org/listinfo/platform
Daniel Bünzli
2013-02-22 14:33:57 UTC
Permalink
Post by Malcolm Matalka
So would a syntax extension always have to store the modules it wants to
be sure to access at the beginning of a file it requires, making sure
not to choose overlapping names?
Why not? Each extension could have its own module, against which you have to link in order to use the extension, and which contains the modules it uses (MyExt.List?).
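
As a concrete (hypothetical) reading of that suggestion, valid in today's
OCaml: the extension ships a support module that re-exports the standard
modules its generated code relies on, and the generated code refers to them
through that module, which a user-level "open Core.Std" cannot shadow. The
module and file names below are invented:

(* myext_runtime.ml -- hypothetical support module linked with the
   extension.  It re-exports stdlib modules at a point where nothing
   shadows them. *)
module List = List
module Array = Array
module Printf = Printf

Generated code would then use Myext_runtime.List.map rather than List.map.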

It's not evident to me that namespaces actually solve the problem either; they just seem to push the problem into the build system. The actual problem seems very related to the problem of hygienic macros (accidental name capture).

Besides, I'm not sure the example Anil gave is as widespread as it seems; it feels like a corner case that could be avoided, as he suggested, by having the ability to refer to stdlib's modules under a toplevel name (OCaml.List), and this still seems to be in the realm of the module system (and could be realized via something like cmia).

Daniel
Gabriel Scherer
2013-02-22 14:39:44 UTC
Permalink
Post by Daniel Bünzli
It's not evident to me that namespaces actually solve the problem either,
they just seem to push the problem in the build system. The actual problem
seems very related to the problem of hygienic macros (accidental name
capture).

Indeed.

What you really want of Camlp4 extensions is not "namespaces", but
*hygiene*. Hygiene in hygienic macro systems means two things:
1. having binders in macros not capture bound variables in expanded
user-provided code (unless explicitly desired, to implement a new binding
structure as a macro)
2. having bound variables in macros not be captured by user-provided
binders

Neither of those two distinct aspects is specifically accounted for in the
current OCaml+Camlp4 combination. The lack of the first can be worked
around by being careful in extension code (preserving the scope of
user-provided code and not expanding it under extension binders, which is
possible thanks to OCaml's expressive "let ... and ..." feature); the lack of
the second is more problematic, as it would require the extension *user* to
be careful (not happening in practice).

This is related to this bug report by Hongbo:
http://caml.inria.fr/mantis/view.php?id=5849

Namespaces are only an attempt to work around the problem by adding an
indirection in naming that would make the lack of hygiene less visible. It's
a reasonable side-effect of a namespace proposal, but it cannot reasonably
be the main motivation for adding namespaces. "Let's add this new feature
so that this problem with that ugly tool can be partially fixed in this
ugly way" is not going to fly.

(Note that Lisp languages solve this by embedding the macro facility in the
language itself, instead of having an up-front AST-generation model. This
allows macro expansion to elaborate internal unique names instead of
surface names, which solves this hygiene problem. An OCaml equivalent would
be to give a way to address a module by its internal compilation unit name.)

On Fri, Feb 22, 2013 at 3:33 PM, Daniel Bünzli
Post by Daniel Bünzli
Post by Malcolm Matalka
So would a syntax extension always have to store the modules it wants to
be sure to access at the beginning of a file it requires, making sure
not to choose overlapping names?
Why not ? Each extension could have its own module against which you have
to link to use the extension that has the module it uses in it
(MyExt.List?).
It's not evident to me that namespaces actually solve the problem either,
they just seem to push the problem in the build system. The actual problem
seems very related to the problem of hygienic macros (accidental name
capture).
Besides I'm not sure the example Anil gave is as widespread as it seems,
it feels like a corner case that could be avoided, as he suggested, by
having the ability to refer to stdlib's modules under a toplevel name
(OCaml.List) and this still seems in the realm of the module system (and
could be realized via something like cmia).
Daniel
Leo White
2013-02-22 17:10:39 UTC
Permalink
Post by Gabriel Scherer
Post by Daniel Bünzli
It's not evident to me that namespaces actually solve the problem either,
they just seem to push the problem in the build system. The actual problem
seems very related to the problem of hygienic macros (accidental name
capture).
Indeed.
There is also another issue with using language extensions which namespaces
solve. When you write a language extension to produce code that calls your
library, you need to have it produce calls like:

Foo.Comp.func ()

where Foo is the packed module that your library produces.

However, then you cannot use the syntax extension within your library
because the Foo module does not exist yet.

This is why, for example, COW's quotations need command-line arguments to
tell them whether they are inside COW or not.

More generally, packed modules create two different names for a module, one
used within the other packed modules, and one used externally. Namespaces
can simply avoid this problem because not all modules in the namespace need
to exist before the namespace can be used.
Post by Gabriel Scherer
"Let's add this new feature
so that this problem with that ugly tool can be partially fixed in this
ugly way" is not going to fly.
I actually think that long names are a perfectly valid solution for hygiene,
rather than an ugly one. I can also think of at least 4 other problems that
namespaces solve:

1. Grouping modules without using "pack".
2. Providing multiple names for modules (e.g. "Core.Std.List" and
"Platform.List").
3. The circular naming problem described above.
4. Control of the default set of names that are open when compiling a
module.

All of which are of obvious benefit to the platform (which is what we are
discussing on this list).
Malcolm Matalka
2013-02-22 12:45:39 UTC
Permalink
I'm not paying attention to this thread as well as I should be, but what
you said reminded me of something I'd thought about. Maybe it's really
ignorant, but it would be nice if I could access modules from a
'root' name, kind of like DNS. So if I have opened Core.Std I can
access the old 'List' module by doing '.List' or something. This would
require camlp4 modules to be written intelligently, but the ability would
ensure they know which module they are getting.

Just a thought,
/M
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly open a particular namespace: camlp4 code generation.
Right now, several camlp4 extensions break because they use modules from the standard Pervasives library, and have no way to explicitly state that. If Core.Std is opened, then compilation fails.
- hack the build system to pass -pp options to the camlp4 generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std' locally, irrespective of the current module environment.
I believe namespaces addresses the latter workaround.
-anil
Post by Daniel Bünzli
This may be a silly suggestion as I'm not sure I'm really convinced by the absolute *need* for namespaces (I'd rather not have an
additional concept in a language that I already find sufficiently complex to my taste).
However it strikes me that while it seems to be agreed upon that a simple mechanism like `-pack` solves the problem albeit not in a
technically satisfying way, the worked out proposal seems to skyrocket into complexity. The best way to avoid bureaucracy/complexity is
not to introduce the tools to be able to manage it...
So if the problem is `-pack` is not good enough because it produces a huge `cmo`, why not just try to find a corresponding concept workable with `cm[x]a` ?
Here's a proposal --- from a user point of view, I'll let the compiler hackers comment on how/if this could be workable.
Introduce `cmia` files that bundles the `cmi`'s of a `cm[x]a` with the corresponding file name. The name of the `cmia` file (and hence of
the `cm[x]a` file) defines your toplevel module name à la `-pack`. Make it even be backward compatible in the sense that if there's no
`cmia` for a `cm[x]a` the modules are accessed as before. Now you have a kind of `-pack` that work with `cm[x]a`.
Best,
Daniel
P.S. flat vs hierarchical, I'd also rather go flat.
Christophe TROESTLER
2013-02-24 19:26:45 UTC
Permalink
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to
explicitly open a particular namespace: camlp4 code generation.
Right now, several camlp4 extensions break because they use modules
from the standard Pervasives library, and have no way to explicitly
state that. If Core.Std is opened, then compilation fails.
- hack the build system to pass -pp options to the camlp4
generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std'
locally, irrespective of the current module environment.
I believe namespaces addresses the latter workaround.
Camlp4 can insert some code to alias the standard modules needed by
code generation at the beginning of the source files (not foolproof,
because a name needs to be generated, but good enough in practice).
It would be better if that facility were provided by a Camlp4 module
instead of needing to be redone by each extension.
Yaron Minsky
2013-02-25 14:12:11 UTC
Permalink
On Sun, Feb 24, 2013 at 2:26 PM, Christophe TROESTLER
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly
open a particular namespace: camlp4 code generation.
Right now, several camlp4 extensions break because they use modules from
the standard Pervasives library, and have no way to explicitly state that.
If Core.Std is opened, then compilation fails.
- hack the build system to pass -pp options to the camlp4 generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std' locally,
irrespective of the current module environment.
I believe namespaces addresses the latter workaround.
Camlp4 can insert some code to alias the standard modules needed by code
generation at the beginning of the source files (not foolproof because a
name needs to be generated but good enough in practice). It would be better
if that facility was provided by a Camlp4 module instead of needing to be
redone by each extension.
I like this workaround a lot, and am embarrassed not to have thought
of it myself...

y
Yaron Minsky
2013-02-25 14:15:27 UTC
Permalink
Post by Yaron Minsky
On Sun, Feb 24, 2013 at 2:26 PM, Christophe TROESTLER
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly
open a particular namespace: camlp4 code generation.
Right now, several camlp4 extensions break because they use modules from
the standard Pervasives library, and have no way to explicitly state that.
If Core.Std is opened, then compilation fails.
- hack the build system to pass -pp options to the camlp4 generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std' locally,
irrespective of the current module environment.
I believe namespaces addresses the latter workaround.
Camlp4 can insert some code to alias the standard modules needed by code
generation at the beginning of the source files (not foolproof because a
name needs to be generated but good enough in practice). It would be better
if that facility was provided by a Camlp4 module instead of needing to be
redone by each extension.
I like this workaround a lot, and am embarrassed not to have thought
of it myself...
As a side note, if we were to build such a facility for ppx and/or
camlp4, I'd love for camlp4 extensions to refer to Pervasives
explicitly, and not rely on the assumption that it has been opened.

y
Anil Madhavapeddy
2013-02-25 14:15:54 UTC
Permalink
Post by Anil Madhavapeddy
There's one scenario which absolutely requires the ability to explicitly open a particular namespace: camlp4 code generation.
Right now, several camlp4 extensions break because they use modules from the standard Pervasives library, and have no way to explicitly state that. If Core.Std is opened, then compilation fails.
- hack the build system to pass -pp options to the camlp4 generator. Painful.
- have some facility to explicitly open 'Caml_std' or 'Core_std' locally, irrespective of the current module environment.
I believe namespaces addresses the latter workaround.
Camlp4 can insert some code to alias the standard modules needed by code generation at the beginning of the source files (not foolproof because a name needs to be generated but good enough in practice). It would be better if that facility was provided by a Camlp4 module instead of needing to be redone by each extension.
That's an interesting idea. The only hitch is that it's a little hard to do in one pass, as the code generation is called on the local AST fragment.

I think it would work if placed as a feature into type_conv itself, as the individual generators (e.g. sexp/orm) all register themselves with it quite early. They could request global modules, which type_conv does in one pass (thus also avoiding duplicate requests for the original namespace).

CCing Markus Mottl to see what he thinks...

-anil
Markus Mottl
2013-02-25 15:10:27 UTC
Permalink
Post by Anil Madhavapeddy
I think it would work if placed as a feature into type_conv itself, as the individual generators (e.g. sexp/orm) all register themselves with it quite early. They could request global modules, which type_conv does in one pass (thus also avoiding duplicate requests for the original namespace).
I'm not sure that type_conv should be the place to implement this.
Since this issue can affect all kinds of camlp4 macros, it seems like
a feature that camlp4 should provide. There should be some
standardized module name, e.g. "Camlp4_stdlib" or similar, which would
allow generated code to refer to the original OCaml standard library.

Regards,
Markus
--
Markus Mottl http://www.ocaml.info markus.mottl at gmail.com
Christophe TROESTLER
2013-02-25 18:21:20 UTC
Permalink
Post by Anil Madhavapeddy
Post by Christophe TROESTLER
Camlp4 can insert some code to alias the standard modules needed
by code generation at the beginning of the source files (not
foolproof because a name needs to be generated but good enough in
practice). It would be better if that facility was provided by a
Camlp4 module instead of needing to be redone by each extension.
That's an interesting idea. The only hitch is that it's a little
hard to do in one pass, as the code generation is called on the
local AST fragment.
It can be done in one pass thanks to AstFilters.register_str_item_filter.
Example code:

let declarations_at_beginning = ref []

(* The filters are evaluated after the whole source is read, thus all
constants will have been collected. *)
let () =
if not_interactive then begin
let add_top_declarations str_item =
let add s decl = <:str_item@here< $decl$ $s$ >> in
List.fold_left add str_item !declarations_at_beginning in
AstFilters.register_str_item_filter add_top_declarations
end

let add_to_beginning_of_file decl =
declarations_at_beginning := decl :: !declarations_at_beginning
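
(For illustration, an extension could then register its alias like this -- a sketch assuming, as in the snippet above, the Camlp4.PreCast environment of a syntax extension, with a Loc.t value `here` and the `not_interactive` flag defined elsewhere; the module name is made up:)

let () =
  (* alias the original stdlib module under a name unlikely to be shadowed *)
  add_to_beginning_of_file
    <:str_item@here< module Camlp4_stdlib_pervasives = Pervasives >>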
Gabriel Scherer
2013-02-26 09:37:14 UTC
Permalink
In this post, I'll try to take a high-level view of the discussion that has
happened so far. You can take this as an (opinionated) summary.

The main ideas of my proposal linked at the top, or more generally my work
on what people have been suggesting for namespaces, are the following:

1. Namespaces are about the out-of-the-language question of how a given
in-source compilation unit name (masquerading as a module name) maps to an
in-filesystem compilation unit.
2. I was suggesting new features that (mostly) do not change the OCaml
language itself, but the semantics of this mapping and the way it is given
to the type-checker. This is a choice that some people may not agree with
(more on that later).
3. Actual OCaml implementations also have a notion of "internal module
name" that is contained in compiled object files; two modules of the same
internal name cannot be mixed together. This has important practical
implications, and we can consider the implementation space for internal
module names.

Alain has a large codebase with house-developed tools, unique build process
conventions, and the ironically unusual constraint of compiling quickly on
Windows. He pushes for having as few changes as possible. No change to the
internal module names (this means developers should use long
hopefully-unique filenames to avoid conflicts), no change to the structure
of compilation unit names (no hierarchy), and a simple
compunit-name-to-compunit-name mapping to alleviate the pain. In my
initial proposal, this corresponds to restricting the compilation
environment descriptions to a flat mapping built by literals and merging
only.

Leo has the fairly different use case of Janestreet Core's library in mind:
hierarchy is important (Core.Foo looks better than Core_Foo and "open
namespace Core" is important), changing the language for the greater good is
ok, and his particular suggestion is an in-language, compunit-wide "in
Core.Foo" construct that would both be added to the internal module name,
and be used to generate the mapping from compunit names to compunits *on
the developer side*, rather than on the user side (with then the simple
semantics that all mappings from the search path are merged). This can
again be seen as a way to define the compilation environment, giving
slightly more control to the module provider (mostly a placement in a
hierarchy). It corresponds to a restriction of the way to define
compilation environments that is even more severe, the user having no
control (compilation environments are built as the merging of all
developer-generated environments present in the search path).

Daniel is opposed to the addition of a new "namespace" concept to the OCaml
language. It is not completely clear to me whether he objects to:
1. the addition of more flexibility in the way compilation unit names are
mapped to compilation units, outside the language (the mapping file Alain
suggests, the more expressive mapping languages I have considered, or a
compilation-option version of the "in Foo.Bar" proposed by Leo)
2. or only the addition of a new concept *in the language itself*, such as
can be seen if we insist on writing Core#Std#Map.Make rather than
Core.Std.Map.Make

I lament that, except Alain, nobody gives a damn about letting users
redefine their own names or paths in a different way than what the module
provider planned for. I think it's an important part of the design space
that may allow quite convenient things (eg. scenarios of companies
maintaining an in-house repository of modules with pinned versions, and still
being able to interact with non-curated module collections in the same
programs), but apparently this is not something people care about. Well,
that's how it is, and hopefully this won't turn out to be overly
restrictive in the mid-term future.


## A more detailed discussion of Daniel's arguments

Daniel, I am glad that you defend this point of view, as I myself grow more
and more conservative about language extension ideas (maybe it's a contagious
effect of proximity to the OCaml designers...). I do completely agree that new
language features should be motivated by an (expressivity vs.
complexity/kludginess) estimation, and that it is not a priori clear that we need
another "programming in the large" concept above modules.

Regarding point (2): About a year ago, I worked with Nicolas Pouillard
on a way to make the objection go away, by seeing compilation unit names
only as modules in the source code: given a hierarchical compilation
environment, there is a principled way to turn any non-leaf path (that does
not denote a compilation unit by itself) into a module. Instead of
Core#Std#Map, we could write Core#Std.Map (seeing Core#Std as a module) or
even Core.Std.Map (seeing Core as a module) without changing the semantics of
my proposal (or any consensual restriction of it). You can find this
specified in an older design document,
http://gallium.inria.fr/~scherer/namespaces/pack_et_functor_pack.html
I had decided not to link it in my introductory mail because it adds a bit
more complexity to the semantics of mapping compilation unit names to
compilation units, and my feeling was that discussing the design space of a
simpler basis was a better way to start a discussion that would certainly
become unwieldy, at the risk of being unproductive.

Summing things up: if you have a hierarchical mapping outside the language,
you can make it appear in OCaml source code as just a module hierarchy, as
you suggest. I think it actually adds *complexity* to the underlying
design, so there is a price to pay to hide this distinct notion of
(structured) compilation unit name. I would be ready to pay that price if
there is a wide agreement it is the right thing to do, but I personally
favor more explicit designs, at least as an experimentation and discussion
device (I think we should make the difference explicit and have a (module
Core#Std) construct turning a non-leaf compilation unit path into a module
with submodules, with the shared understanding that we can decide to hide
it under a syntactic ambiguity).

(The documentation also discusses the possibility of having *functors* that
span several compilation units, and that is something Yaron has requested
in the past. It is more complex and I don't think it is as ready, robust
and canonical as the module part, so I encourage you to ignore it in the
context of this discussion.)

Regarding point (1), your suggestion to use .cma as an already-existing
grouping of modules into submodules that could help solve module name
conflicts, I feel this is a more anecdotal part of your (still imprecise)
proposal that is more in the "you see it doesn't need to be that complex"
league than an actual principled design. I have sympathy for the "least
effort" design process that it shares with Alain's proposal, but I think it
should not stop us from thinking about the whole design space in a
scientific way. (For example: unless I'm mistaken, .cma cannot currently
embed .cma, so your proposal, for implementation reasons, cannot express
non-flat module hierarchies.) There are important technical details left to
be defined, such as how to avoid internal name clashes between submodules
of different .cma-packs. My intuition is that getting the details
straightened out would amount to re-implementing -pack (in particular the
-for-pack approach to internal module names prefixing) in terms of .cma
rather than .cmo. So the mapping from compilation unit names to compilation
units would still be entirely directed by the filesystem and search path as
it currently is, but with link-economic module packs. Why not, that's an
idea we could explore in more details, but note that its semantics would
still be split in two different "phases":
- a new notion of structured paths Foo#Bar that allows to denote a
compilation unit for bar.cmo embedded into foo.cma
- a possible way to hide this additional structure to source code,
masquerading Foo as a module, with a precise semantics when it is used for
something else than a projection (as in the document linked above; in fact
you could reuse the same proposal, I think)

This particular implementation choice gives up quite a few things we
might want to have, such as:
- the ability to merge two sets of submodules that share a common parent
name (admittedly this is not useful in a provenance-oriented view of
compilation unit naming, but more in a Data.List view that is not
necessarily the good one to start with)
- the ability for users to build new/distinct parallel module organizations
- more-than-depth-2 hierarchies (implementation detail; might be fixed with
a non-negligible amount of work)
- compilation unit name aliasing / redefinition

But it looks interesting. Please go ahead with more detailed proposals!
Leo White
2013-02-26 12:56:27 UTC
Permalink
Post by Gabriel Scherer
I lament that, except Alain, nobody gives a damn about letting users
redefine their own names or paths in a different way than what the module
provider planned for.
I wouldn't say that people don't give a damn about it, only that it is not
a priority: first we need to be able to give a component the name its
developer intended for it, then we can worry about how to allow other
people to rename it.

I also think that it is important that the default case (i.e. use the name
that the developer gave it) requires no input on the part of the user.
Yaron Minsky
2013-02-26 14:16:09 UTC
Permalink
Post by Gabriel Scherer
I lament that, except Alain, nobody gives a damn about letting users
redefine their own names or paths in a different way than what the module
provider planned for.
I wouldn't say that people don't give a damn about it, only that it is not a
priority: first we need to be able to give a component the name its
developer intended for it, then we can worry about how to allow other people
to rename it.
I also think that it is important that the default case (i.e. use the name
that the developer gave it) requires no input on the part of the user.
I agree. I would like to let users flexibly redefine namespaces, and
to have multiple namespaces for the same modules. One example of the
use of this that I'd have in Core is Core.Std and Core.Stable.

Core.Std is the standard thing you open if you want to use Core in the
ordinary way. Nothing to see here.

Core.Stable exports just a subset of Core, in particular, a set of
so-called "stable" types that are guaranteed not to change from
release to release. (there are explicit version numbers attached to
these to allow new versions to be minted without changing the old.)
This is useful for building protocols that OCaml programs can
communicate with even when they're built with different versions of
Core.

It would be ideal if the namespace proposal supported this use case,
and once it does, well, it seems like you almost need to have the
flexibility you describe.
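(For readers who have not seen the pattern, here is a minimal sketch of the idea -- the module and type names below are invented for illustration, not Core's actual contents:)

module Stable = struct
  module Price = struct
    (* V1 is frozen forever; a change of format adds a V2 next to it. *)
    module V1 = struct
      type t = { symbol : string; cents : int }
    end
  end
end

(* A protocol pins a specific version, e.g. Stable.Price.V1.t, so programs
   built against different Core releases still agree on the representation. *)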
Leo White
2013-02-22 16:32:57 UTC
Permalink
Post by Daniel Bünzli
So if the problem is `-pack` is not good enough because it produces a
huge `cmo`, why not just try to find a corresponding concept workable
with `cm[x]a` ?
Here's a proposal --- from a user point of view, I'll let the compiler
hackers comment on how/if this could be workable.
Introduce `cmia` files that bundle the `cmi`'s of a `cm[x]a` with the
corresponding file name. The name of the `cmia` file (and hence of the
`cm[x]a` file) defines your toplevel module name à la `-pack`. Make it
even backward compatible in the sense that if there's no `cmia` for a
`cm[x]a` the modules are accessed as before. Now you have a kind of
`-pack` that works with `cm[x]a`.
The problem with such a proposal is: what happens when someone tries to
apply a functor to this `cmia` module?

Either it represents a module, in which case it must be a large .cmo file,
or it is not a module and cannot be used as such.

Once you have top-level module names that are not actually modules, what
you have *is* a namespace.
Daniel Bünzli
2013-02-22 18:23:03 UTC
Permalink
Post by Leo White
The problem with such a proposal is: what happens when someone tries to
apply a functor to this `cmia` module?
Then you cannot do anything but treat it like a regular module. I don't see any problem in that: everything in the cmxa will be linked in by your functor application and that's all.

I'm sure the compiler should be able to handle that without any problem or excessive complexity.

Daniel
Martin Jambon
2013-02-22 19:50:30 UTC
Permalink
Post by Daniel Bünzli
This may be a silly suggestion as I'm not sure I'm really convinced
by the absolute *need* for namespaces (I'd rather not have an
additional concept in a language that I already find sufficiently
complex to my taste).
Namespaces allow a single organization to provide a collection of
optional packages named consistently (e.g. String, Json, Http,
Elasticsearch, ... instead of having to invent silly names for each of
these because there are gazillions of implementations out there).

A namespace identifies a software vendor (an organized group of people),
who can manage the name of their packages and modules unilaterally, i.e.
without having to worry about the rest of the world (which includes
competitors and people who just won't talk to each other).

It is different from the concept of module, which cannot be split into
optional packages.
Post by Daniel Bünzli
However it strikes me that while it seems to be agreed upon that a
simple mechanism like `-pack` solves the problem albeit not in a
technically satisfying way, the worked out proposal seems to
skyrocket into complexity. The best way to avoid
bureaucracy/complexity is not to introduce the tools to be able to
manage it...
So if the problem is `-pack` is not good enough because it produces a
huge `cmo`, why not just try to find a corresponding concept workable
with `cm[x]a` ?
Here's a proposal --- from a user point of view, I'll let the
compiler hackers comment on how/if this could be workable.
Introduce `cmia` files that bundle the `cmi`'s of a `cm[x]a` with
the corresponding file name. The name of the `cmia` file (and hence
of the `cm[x]a` file) defines your toplevel module name à la `-pack`.
Make it even backward compatible in the sense that if there's no
`cmia` for a `cm[x]a` the modules are accessed as before. Now you
have a kind of `-pack` that works with `cm[x]a`.
Best,
Daniel
P.S. flat vs hierarchical, I'd also rather go flat.
Daniel Bünzli
2013-02-22 20:38:58 UTC
Permalink
Post by Xavier Clerc
I beg to differ. My understanding is that we
need to be able to gather several modules
under a name *without* crafting a new
module in the process.
Why exactly? What's the problem if a new module is crafted in the process? For me the problem seems only to be related to the way modules are linked in (the "pack problem" to give it a name). From a conceptual perspective I see absolutely no other, orthogonal, concept at play and hence see no reason to introduce a new one in the core language.
Post by Xavier Clerc
It is different from the concept of module, which cannot be split into
optional packages.
Ibid.

Maybe that's what should be challenged.

Daniel
Christophe TROESTLER
2013-02-24 19:39:34 UTC
Permalink
Post by Daniel Bünzli
So if the problem is `-pack` is not good enough because it produces
a huge `cmo`, why not just try to find a corresponding concept
workable with `cm[x]a` ?
As I understand it, people are not happy with -pack not for the large
cm[x]a files but because the whole library is included in every
executable even if only a small subset of the library is used. Would
the "problem" with -pack still exist if this was solved?
Yaron Minsky
2013-02-25 14:14:17 UTC
Permalink
On Sun, Feb 24, 2013 at 2:39 PM, Christophe TROESTLER
Post by Daniel Bünzli
So if the problem is `-pack` is not good enough because it produces a huge
`cmo`, why not just try to find a corresponding concept workable with
`cm[x]a` ?
As I understand it, people are not happy with -pack not for the large cm[x]a
files but because the whole library is included in every executable even if
only a small subset of the library is used. Would the "problem" with -pack
still exist if this was solved?
I think of there as being three key problems with -pack:

- The pack is a single unit that has to be loaded or not as a unit (as
per your point).
- The pack is a choke-point in the dependency graph. If you depend on
one thing in a pack, you need to be recompiled if anything in
the pack changes.
- Opening a pack like Core.Std is brutally slow, and really affects
performance of the build.

I think all of these issues need solving.

y
Christophe TROESTLER
2013-02-25 20:28:14 UTC
Permalink
Post by Yaron Minsky
On Sun, Feb 24, 2013 at 2:39 PM, Christophe TROESTLER
Post by Christophe TROESTLER
Post by Daniel Bünzli
So if the problem is `-pack` is not good enough because it
produces a huge `cmo`, why not just try to find a corresponding
concept workable with `cm[x]a` ?
As I understand it, people are not happy with -pack not for the
large cm[x]a files but because the whole library is included in
every executable even if only a small subset of the library is
used. Would the "problem" with -pack still exist if this was
solved?
- The pack is a single unit that has to be loaded or not as a unit (as
per your point).
- The pack is a choke-point in the dependency graph. If you depend on
one thing in a pack, you need to be recompiled if anything in
the pack changes.
- Opening a pack like Core.Std is brutally slow, and really affects
performance of the build.
I think all of these issues need solving.
I'd like these issues to be solved too. Is there any work at the Labs
on these?
Leo White
2013-02-22 16:29:24 UTC
Permalink
Post by Sylvain Le Gall
I tend to agree with Alain about the fact that hierarchical namespaces
is overkill. Flat namespace is easier to achieve and will solve most
problems.
I don't think there is any additional work needed to support hierarchical
namespaces.
Post by Sylvain Le Gall
Working with Java all days long and its namespace, I would say it is
totally useless to copy that! What is the meaning of org.apache? Will I
have to name my libraries net.le-gall.sylvain.Foo or
org.ocamlcore.forge.ounit.OUnit?
There is a big difference between supporting hierarchical names and
mandating them. It seems fairly ridiculous to assume that people are going
to suddenly start using long unwieldy names with no obvious benefit.
Wojciech Meyer
2013-02-22 11:38:50 UTC
Permalink
Hi,

I want to say that the (not yet finished) proposal looks good to me. I like
the idea of being able to structure the environments into a tree.
The reason why I think we should use hierarchies is:
- the ability to structure several libraries into the same "shelves" - looking
at what Haskell does [1]: Control.Arrow or Data.Array. It will help to
provide one consistent library for the platform. It does not
necessarily require (but enables) using a "provenance" style of naming
(e.g. the Java convention)
- it enables us to pack some of the platform-specific modules into a
single name under some hierarchy, consider:

Win32 # System # Unix
Unix # System # Unix

now we are able to conveniently merge in the appropriate one of these at
compile time.

I'd like to emphasise that we would need to encourage people, in a social
way, to use the hierarchies wisely, and to have some fixed conventions for
the community projects. One way to do this is the Platform itself - when
it comes with a pre-bundled hierarchy, people will start putting their
libraries into the right shelves.

I'm a big fan of the DSL approach, and the spec has clear semantics
regarding namespaces - a namespace is a name for an environment.

I don't think I have anything else to say, besides that so far I like
it. Looking forward to having more input on the proposal.

--
Wojciech

[1] http://www.haskell.org/ghc/docs/latest/html/libraries/


Daniel Bünzli
2013-02-22 12:42:21 UTC
Permalink
Post by Wojciech Meyer
- ability to structure several libraries into same "shelves" - looking
what Haskell does [1]: Control.Arrow or Data.Array. It will help to
provide one and consistent library for the platform. It does not
necessarily require (but enables) to use "provenance" style of naming
(e.g. Java convention)
Actually I find this way of structuring things not pertinent *at all*. It brings absolutely nothing.

First with hierarchies there's always the problem that at certain point you want two things to be in two different places. This is why "tags" are usually better than "folders" to organize things, see e.g. gmail.

Second, if we take the link you provided there are not that many toplevel descriptors, which means that it lengthens the names without bringing much benefit except noise. More precisely, for me there are two things that need to be distinguished:

1) A user looking for a library to solve a problem.
2) A user writing code with the library he found, or reading code already written.

In case 1) a tagging system is a better way of finding what he's looking for, and this doesn't actually happen in the language anyway. More likely he'll browse opam's html representation (opam recently added tags; I'm sure they will eventually implement tag browsing in their html representation).

In case 2) the reader gets absolutely no benefit from these intermediate descriptors/hierarchies (e.g. unsurprisingly Data.Time.Calendar isn't just "Data"). He just needs a quick way to access the leaves of the hierarchy. So the hierarchy itself is, here again, useless in the language; it's only noise.

Keep it flat.

Daniel
Daniel Bünzli
2013-02-22 12:52:23 UTC
Permalink
Post by Daniel Bünzli
First with hierarchies there's always the problem that at certain point you want two things to be in two different places.
That should read "one thing in two different places" of course.

Daniel
Maxence Guesdon
2013-02-22 13:13:50 UTC
Permalink
On Fri, 22 Feb 2013 13:42:21 +0100
Post by Daniel Bünzli
Post by Wojciech Meyer
- ability to structure several libraries into same "shelves" - looking
what Haskell does [1]: Control.Arrow or Data.Array. It will help to
provide one and consistent library for the platform. It does not
necessarily require (but enables) to use "provenance" style of naming
(e.g. Java convention)
Actually I find this way of structuring things not pertinent *at all*. It brings absolutely nothing.
First with hierarchies there's always the problem that at certain point you want two things to be in two different places. This is why "tags" are usually better than "folders" to organize things, see e.g. gmail.
1) A user looking for a library to solve a problem.
2) A user writing code with the library he found or reading code already written.
In case 1) a tagging system is a better way of finding what he's looking for, and this doesn't actually happen in the language anyway. More likely he'll browse opam's html representation (opam recently added tags; I'm sure they will eventually implement tag browsing in their html representation).
In case 2) the reader gets absolutely no benefit from these intermediate descriptors/hierarchies (e.g. unsurprisingly Data.Time.Calendar isn't just "Data"). He just needs a quick way to access the leaves of the hierarchy. So the hierarchy itself is, here again, useless in the language; it's only noise.
Keep it flat.
I fully agree.

All this reminds me of the hierarchy discussions on the batteries list:
https://lists.forge.ocamlcore.org/pipermail/batteries-devel/2008-November/thread.html#278
The hierarchy looked like this:
http://batteries.forge.ocamlcore.org/doc.preview:batteries-alpha2/batteries/html/api/index.html

"The flatter, the better".

Cheers,

Maxence,
Leo White
2013-02-22 16:40:25 UTC
Permalink
Post by Daniel Bünzli
Actually I find this way of structuring things not pertinent *at all*. It
brings absolutely nothing.
First with hierarchies there's always the problem that at certain point
you want two things to be in two different places. This is why "tags" are
usually better than "folders" to organize things, see e.g. gmail.
There is no real reason that something cannot be placed in two locations
within the hierarchy.
Post by Daniel Bünzli
Second if we take the link you provided there are not so much toplevel
descriptors which means that it lengthen the names without bringing much
benefit except noise.
I agree with this in general. However, there are some specific cases where
a hierarchical namespace is very useful. For example:

1. In some very large libraries, it is useful to be able to divide the
modules into sub-components that can be opened individually.

2. In some libraries it is useful to provide multiple versions of some
components (e.g. Foo.Async.Io and Foo.Lwt.Io).

Since it doesn't cost anything to support hierarchical namespaces, I see no
strong reason to not do so.
Daniel Bünzli
2013-02-22 18:40:08 UTC
Permalink
Post by Leo White
There is no real reason that something cannot be placed in two locations
within the hierarchy.
But as I suggested, from the perspective of programming *in the language* there is no point in having it in two different locations. Even from the perspective of reading code it's confusing: you have to learn all the places where a module may be and understand that all these modules are in fact the same.
Post by Leo White
I agree with this in general. However, there are some specific cases where
1. In some very large libraries, it is useful to be able to divide the
modules into sub-components that can opened individually.
2. In some libraries it is useful to provide multiple versions of some
components (e.g. Foo.Async.Io and Foo.Lwt.Io).
For me all this should be solved within the realm of the module system. Maybe we need to fix the way we link things in (since that seems to be the only reason why -pack is not deemed reasonable). But I really see no point in introducing new naming mechanisms into the core language.

Daniel
Wojciech Meyer
2013-02-25 23:50:13 UTC
Permalink
Hi Daniel,

I appreciate your strong stand that we should not introduce redundant
concepts to the language. I can understand that the concept of module can
give the impression of managing compilation units. However, I think this
is where the similarities end. Given the semantics of the OCaml module
system - that each module is a separate compilation unit - it's hard to say
it's flexible enough to cover the functionality of namespaces, which scale up
across the boundaries of compilation units. All the technical problems of
-pack are because modules are not namespaces. So why not fix it
properly?

In the ML world the smallest abstraction available is the function, which scales
up to the level of modules; then we have modules that scale up to the
library level; and then we have an empty place that we try to structure
using modules. It would be natural to think that at the library level
we should not use modules for abstracting and structuring things.

If you are still not sure, please see what the SML community did many
years ago with their compilation manager:

http://www.smlnj.org/doc/CM/index.html

This is an interesting paper. I won't literally repeat what is inside,
as I read it quite a long time ago, but it covers a lot more than just
namespace management.

Hope this will make you more confident.
Post by Daniel Bünzli
Post by Leo White
There is no real reason that something cannot be placed in two locations
within the hierarchy.
But as I suggested, from the perspective of programming *in the
language* there is no point in having it in two different locations. Even
from the perspective of reading code it's confusing: you have to learn
all the places where a module may be and understand that all these
modules are in fact the same.
Post by Leo White
I agree with this in general. However, there are some specific cases where
1. In some very large libraries, it is useful to be able to divide the
modules into sub-components that can opened individually.
2. In some libraries it is useful to provide multiple versions of some
components (e.g. Foo.Async.Io and Foo.Lwt.Io).
For me all this should be solved within the realm of the module
system. Maybe we need to fix the way we link things in (since that
seems to be the only reason why -pack is not deemed reasonable). But I
really see no point in introducing new naming mechanisms into the
core language.
Daniel
--
Wojciech Meyer
http://danmey.org
Daniel Bünzli
2013-02-26 11:44:04 UTC
Permalink
Given the semantics of the OCaml module system, that each module is a separate compilation unit
That's not exactly right. In OCaml each compilation unit is a module (derived from a single file) but each module is not a compilation unit.
All the technical problems of -pack are because modules are not namespaces.
I don't understand this. Modules are namespaces! They are containers for names, and two identical names in two different modules are not deemed equal; that defines a namespace for me. A good deal of the module system is about managing names and their dissemination in the program. Modules are, however, as Yaron points out, much more than that.
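
(A trivial, self-contained illustration of that point:)

module A = struct let version = "a's idea of version" end
module B = struct let version = "b's idea of version" end

(* The two [version] names do not clash: the enclosing module
   disambiguates them, which is exactly the name-management role
   being discussed here. *)
let () =
  print_endline A.version;
  print_endline B.version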
In ML world the smallest abstraction available is function, that scale
up to the level of modules, then we have modules that scale up to the
library level, and then we have an empty place that we try to structure
using modules. It would be natural to think that at the library level,
we should not use modules for abstracting and structuring things.
Well, my own *naturalism* leads me to believe that, since a good part of modules is about managing names, and given their recursive definition (modules can contain modules), it is natural to try to use them at the library level.

Daniel
Didier Remy
2013-02-26 14:10:58 UTC
Permalink
Post by Daniel Bünzli
Modules are namespaces !
I think that this assertion is misleading (and wrong), as it confuses the
word "namespaces", which refers to a specific proposal that could be called
"foo", with the intuitive meaning of namespaces, which refers to the expression
"name management". In the latter sense, yes, modules are doing some form of name
management. But in the former sense, no, modules are not doing the same kind
of name management as foo (namespaces).
Post by Daniel Bünzli
They are containers for names and two identical names in two different
modules are not deemed equal, that defines a namespace for me. A good
deal of the module system is about managing names and their
dissemination in the program. Modules are, however, as Yaron points
out, much more than that.
Yes, there are similarities:

- modules manage _internal_ names: names of values, names of types, and
names of submodules. They manage names of objects visible in the OCaml
world. They manage names of objects visible in the OCaml world. They
cannot speak about the outer world and of course cannot manage it.

- namespaces manage names of toplevel modules, of collections of modules.
They connect the outer world (files in the environment) with the inner
world of OCaml, but cannot see into the world of OCaml. Modules are
atomic/opaque objects for namespaces.

So there are similarities, but already some significant differences when just
comparing them in terms of name manipulation.

Moreover, as you noticed, modules do a lot more, and do so simultaneously:
they manage types, generativity, consistency of imports, keep the objects
closed, etc. Because of these additional invariants in module objects, there
are operations that modules cannot do with names that namespaces can do.
For instance, modules cannot insert a new submodule inside an existing
module, or pick a module out of a submodule and rebuild a new module from
the component outside of its original context/closure: this could break
types, abstraction (and bindings!)---unless you change the way modules are
typechecked, e.g. using mixin modules to keep track of some internal
dependencies and distinguish between before/after linking status.
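
(To make the contrast concrete: in today's OCaml one never modifies an existing module in place; at best one builds a brand-new module around it, as in this small sketch:)

(* List itself is untouched; List_ext is a separate module that
   happens to include it and add a submodule. *)
module List_ext = struct
  include List
  module Extra = struct
    let singleton x = [ x ]
  end
end

let _ = List_ext.length (List_ext.Extra.singleton 42)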

Namespaces can do almost arbitrary manipulations of toplevel modules and
collections of modules, because they see modules as atomic/opaque objects and
cannot change them. If namespaces mess up module names, the compiler
and linker may pick the wrong objects, but they will catch those mistakes
because module objects are checked against module objects independently of
the names that were used to pick them.

So the rigid name manipulation of modules with strong invariants, combined
with the arbitrary name manipulation of namespaces, remains sound---and with
a clear semantics.

Modules are a very sophisticated concept (think of their meta-theory).
Namespaces are a very trivial concept (trivial tree manipulation).

The complexity of Modules + Namespaces is just the max of the two, that is
the complexity of Modules. Namespaces do not add complexity.

So, I think that the separation of concerns is quite healthy here. Each
concept can do powerful but different sorts of things in its own world,
and the combination remains simple, because there is a clear separation.
Post by Daniel Bünzli
Well my own *naturalism* leads me to believe that since a good part of
modules is about managing names and given their recursive definition
(modules can contain modules) I find it natural to try to use them at the
library level.
I am not sure what you mean by *naturalism*. Keeping a minimal set of
orthogonal features is probably something to seek for in the design of a
programming language.

So if namespaces were modules, they should not be added, but they are
an orthogonal concept.

If you wished to mix the power of modules and namespaces together as a
single concept, you probably need something like mixin modules (that can
reason about open modules and linking) whose metatheory is even harder than
that of modules plus an additional mechanism to be able to keep track in
mixins of which components can be stripped off at link time.

So *minimalism* in this situation is probably to keep two separate concepts.
At least, this is feasible in the short term. The mixin approach may
perhaps be superior in the long term, but it cannot be a short term goal.

Besides, namespaces are not a new concept. They already exist in OCaml in a
trivial way, via the canonical mapping between the external file name in which a
module and its interface are implemented and the internal OCaml name of the
module object, plus the search path in which OCaml looks for module objects
to build its initial namespace.

This design choice was an excellent choice in the early days of OCaml when
applications were small and the focus was more on the power of the module
language than on dealing with a large industrial community of users.

Today, this choice shows its limitations.

The proposal is just to relax this rigid rule for building the initial
namespace (i.e. mapping from external file names to internal module
objects). At the same time, using a tree-like structure rather than a flat
structure would allow for even more flexibility, but conceptually, it is not
significantly different from the flat map.

Didier
Xavier Clerc
2013-02-22 19:04:00 UTC
Permalink
Post by Daniel Bünzli
For me all this should be solved within the realm of the module system. Maybe we need to fix the way we link things in (since that seems to be the only reason why -pack is not deemed reasonable). But I really see no point in introducing new namings mechanisms into the core language.
I beg to differ. My understanding is that we
need to be able to gather several modules
under a name *without* crafting a new
module in the process.


Regards,

Xavier
Xavier Clerc
2013-02-22 21:51:25 UTC
Permalink
----- Original Message -----
Post by Daniel Bünzli
Post by Xavier Clerc
I beg to differ. My understanding is that we
need to be able to gather several modules
under a name *without* crafting a new
module in the process.
Why exactly ? What's the problem if a new module is crafted in the
process ? For me the problem seems only to be related to the way
modules are linked in (the "pack problem" to give it a name). From a
conceptual perspective I see absolutely no other, orthogonal,
concept at play and hence see no reason to introduce a new one in
the core language.
Well, as you point out, once you have built a full-fledged module,
you cannot break it into pieces even if you are only interested in
some parts of it.

On the other hand, if namespaces are added to the language, "pack"
would still be useful, as you may sometimes actually need to build
a module from pieces -- e. g. in order to pass it to a functor.

So, as of today, we have:
- "archives" (cma / cmxa) allowing us to gather modules but without
naming (at the language level) the gathering;
- "packs" allowing us to gather modules into a module.
I regard namespaces as gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.


Xavier
Daniel Bünzli
2013-02-25 18:04:49 UTC
Permalink
So, as of today, we have :- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces are gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not technically satisfying. That's not the way I would like the language I program in to be designed. I'd rather see the problems pack has be fixed, which I'm sure could be done by allowing archives to be named at the language level as a module.

Daniel
Yaron Minsky
2013-02-25 19:16:03 UTC
Permalink
On Mon, Feb 25, 2013 at 1:04 PM, Daniel Bünzli
Post by Daniel Bünzli
So, as of today, we have :- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces are gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying. That's not the way I would like the language
I program in to be designed. I'd rather see the problems pack has
fixed which I'm sure could be done by allowing archives to be named
at the language level as a module.
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly. The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management. Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...

y
Daniel Bünzli
2013-02-25 20:37:03 UTC
Permalink
Post by Yaron Minsky
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly.
I don't dismiss it. I would prefer it to be solved with already existing concepts.
Post by Yaron Minsky
The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management.
In which sense? Why is it too powerful? What does powerful mean in that particular case? That is, I think, what prevents me from understanding the need for namespaces.
Post by Yaron Minsky
Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...
That's not the way I reason; the principle I have here is to try, *if practical*, to use existing concepts rather than pile new ones into the language. Now I would really like to be convinced that we need namespaces; it's just that so far the arguments presented all seem to revolve around the idea that "module aliasing"/"pack" would do the job but don't, because of implementation issues.

Daniel
Yaron Minsky
2013-02-25 21:50:33 UTC
Permalink
On Mon, Feb 25, 2013 at 3:43 PM, Christophe TROESTLER
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 1:04 PM, Daniel Bünzli
Post by Daniel Bünzli
Post by Xavier Clerc
- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces are gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying. That's not the way I would like the language
I program in to be designed. I'd rather see the problems pack has
fixed which I'm sure could be done by allowing archives to be named
at the language level as a module.
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly. The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management. Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...
That's an interesting take on this. Would you care to elaborate on
why a module approach may not be sane? Is it from a semantic or an
implementation point of view?
To be clear: I'm not an expert on the internals of the compiler, and
am mostly repeating claims made by others who are.

But my understanding is roughly this: we want namespaces to behave
differently than modules currently do: in particular, we need to be
able to depend on only a subset of a namespace, and to track
dependencies within the different components of a namespace.

One could imagine building these features into modules directly, but
this is hampered by the fact that there is a rich set of operators on
modules; for example, you can apply a functor to a module.

It's of course possible that either (a) one could naturally add these
features to modules directly and thus neatly avoid the need for
another language feature; or (b) that one could have two classes of
modules whose implementations differ under the skin but that present
themselves almost identically to users.

But I am unaware of anyone who understands the compiler internals who
believes either (a) or (b) is reasonably easy to do.

y
Yaron Minsky
2013-02-25 23:04:56 UTC
Permalink
On Mon, Feb 25, 2013 at 5:37 PM, Christophe TROESTLER
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 3:43 PM, Christophe TROESTLER
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 1:04 PM, Daniel Bünzli
Post by Daniel Bünzli
Post by Xavier Clerc
- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces are gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying. That's not the way I would like the language
I program in to be designed. I'd rather see the problems pack has
fixed which I'm sure could be done by allowing archives to be named
at the language level as a module.
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly. The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management. Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...
That's an interesting take on this. Would you care to elaborate on
why a module approach may not be sane? Is it from a semantic or an
implementation point of view?
To be clear: I'm not an expert on the internals of the compiler, and
am mostly repeating claims made by others who are.
But my understanding is roughly this: we want namespaces to behave
differently than modules currently do: in particular, we need to be
able to depend on only a subset of a namespace, and to track
dependencies within the different components of a namespace.
One could imagine building these features into modules directly, but
this is hampered by the fact there is a rich set of operators on
modules, for example, you can apply a functor to a module.
It's of course possible that either (a) one could naturally add these
features to modules directly and thus neatly avoid the need for
another language feature; or (b) that one could have two classes of
modules whose implementations differ under the skin but that present
themselves almost identically to users.
But I am unaware of anyone who understands the compiler internals who
believes either (a) or (b) is reasonably easy to do.
No doubt I understand even less about the compiler internals
than you do. Nonetheless, shouldn't these questions receive a
definitive answer before we speak about namespaces? I have not heard
that these things are hard to do, either.
It's hard to track the thread; there's a lot going on. But earlier in
this thread, Leo White mentioned it, raising the functor issue in
particular.
And, if they are, it would be interesting to understand what
features of modules hamper their efficiency as simple containers.
That way, it seems to me, either the technical problems will get
solved or what kind of "stripped modules" namespaces must be will
emerge naturally.
Indeed. Maybe someone can explain that to those of us who are less
schooled in compiler internals...

y
Christophe TROESTLER
2013-02-25 22:37:26 UTC
Permalink
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 3:43 PM, Christophe TROESTLER
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 1:04 PM, Daniel Bünzli
Post by Daniel Bünzli
Post by Xavier Clerc
- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces are gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying. That's not the way I would like the language
I program in to be designed. I'd rather see the problems pack has
fixed which I'm sure could be done by allowing archives to be named
at the language level as a module.
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly. The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management. Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...
That's an interesting take on this. Would you care to elaborate on
why a module approach may not be sane? Is it from a semantic or an
implementation point of view?
To be clear: I'm not an expert on the internals of the compiler, and
am mostly repeating claims made by others who are.
But my understanding is roughly this: we want namespaces to behave
differently than modules currently do: in particular, we need to be
able to depend on only a subset of a namespace, and to track
dependencies within the different components of a namespace.
One could imagine building these features into modules directly, but
this is hampered by the fact there is a rich set of operators on
modules, for example, you can apply a functor to a module.
It's of course possible that either (a) one could naturally add these
features to modules directly and thus neatly avoid the need for
another language feature; or (b) that one could have two classes of
modules whose implementations differ under the skin but that present
themselves almost identically to users.
But I am unaware of anyone who understands the compiler internals who
believes either (a) or (b) is reasonably easy to do.
No doubt I understand even less about the compiler internals
than you do. Nonetheless, shouldn't these questions receive a
definitive answer before we speak about namespaces? I have not heard
that these things are hard to do, either. And, if they are, it would
be interesting to understand what features of modules hamper their
efficiency as simple containers. That way, it seems to me, either the
technical problems will get solved or what kind of "stripped modules"
namespaces must be will emerge naturally.

Best,
C.
Gabriel Scherer
2013-02-26 10:12:04 UTC
Permalink
Post by Christophe TROESTLER
It seems to me that the openness of namespaces is the only feature I
have seen mentioned that modules do not have. But is the openness of
namespaces something considered useful? What problem does this solve?
If you have a haskell-ish view of module hierarchies as functional
classification rather than provenance, eg.
Data.List
Data.Array
Data.Array.Mutable
Foreign.ForeignPtr
Control.Concurrent.IO
...

then having a "merge {Data.{List,Array.Mutable}} with
{Data.{Array,String}}" is important. This is something open structures
naturally have, and that is not a good fit for closed structures. The
discussion for namespaces (before it landed on this list) insisted on
openness as a distinctive aspect for a while, but then we realized
that, at the moment of compilation of a single module, all the
information about the compilation environment is known, so you can
have a closed view of the world -- even if the world may change
between compilations. This idea that "once everything is decided you
are in a closed world again" makes it possible to present (open)
namespaces as (closed) modules at the source-code level if deemed
desirable -- see
http://gallium.inria.fr/~scherer/namespaces/pack_et_functor_pack.html

Summary: "open" is not essential, but "open merge" is a useful
primitive to have (even in a closed world). You can always locally
assume that the world is closed, and (locally) closed structures are
simpler to deal with.
Post by Christophe TROESTLER
With no doubt, I understand even less about the compiler internals
than you do. Nonetheless, shouldn't these questions receive a
definitive answer before we speak about namespaces? I have heard
neither that these things are hard to do. And, if they are, it would
be interesting to understand what features of modules hamper their
efficiency as simple containers. That way, it seems to me, either the
technical problems will get solved or what kind of "stripped modules"
namespaces must be will emerge naturally.
The reason why this isn't done more is that the people who know the
compiler internals well don't want to be bothered with the
namespace discussion (except Alain, who knows very well about these
issues and is active, if maybe a bit conservatively so, in the
discussions), and these questions are full of tedious implementation
details that even the implementors don't keep all in mind at the same
time. Some partial answers:

- The reason why packing everything in a single .cmo (compilation
unit) bloats linking results is that dependencies are handled at a
granularity of the compilation unit. Relying (dynamically) on the
module Foo in your source unit will result in a mention of Foo's
internal name in your compilation unit, and Foo will need to be linked
as a whole (see the sketch after these two points). To reduce the
granularity in a principled way, one could decide to track dependencies
at the definition/structure-item level rather than the whole-unit
level, but that would be a fairly invasive implementation change that
has so far been resisted, with unclear (and potentially scary)
implications for compilation time, for example.

- One should keep in mind that while bytecode compiled files (.cmo and
.cmi) are easy to deal with, native objects (.cmx and friends) are a
pain because of portability issues. The reason why we have this
two-step "-for-pack" then "-pack" process for packing is that
implementors found that (at the time) MacOS systems couldn't be relied
on to manipulate and change .cmx files. That's the reason why you need
to prepare for packing in advance (-for-pack) instead of simply
packing regularly compiled modules. These are the kind of limitations
you have to work with in linker-related settings.
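To make the first point concrete, here is a minimal sketch (invented
unit names, not any real library). Suppose foo.ml defines several
independent values and main.ml uses only one of them:

  (* foo.ml -- one compilation unit with several independent definitions *)
  let small = 42
  let big_table = Array.init 1_000_000 (fun i -> i)

  (* main.ml -- uses only Foo.small *)
  let () = print_int Foo.small

Because dependencies are recorded per unit, linking main.cmo requires
linking all of foo.cmo, and Foo's initialisation (including building
big_table) runs at program start-up even though only Foo.small is used;
tracking dependencies per structure item would be needed to avoid that.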

Christophe TROESTLER
2013-02-25 20:43:40 UTC
Permalink
Post by Yaron Minsky
On Mon, Feb 25, 2013 at 1:04 PM, Daniel Bünzli
Post by Daniel Bünzli
Post by Xavier Clerc
- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering ;
- "packs" allowing to gather modules into a module.
I regard namespaces as gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying. That's not the way I would like the language
I program in to be designed. I'd rather see the problems pack has
fixed which I'm sure could be done by allowing archives to be named
at the language level as a module.
You might be right, but I think there's a deep issue here that
shouldn't be dismissed so lightly. The argument is that modules are
simply too powerful to be used as the complete solution to namespace
management. Deciding that the only principled approach is to always
pick the most powerful, most general purpose primitive is attractive,
but not always sane...
That's an interesting take on this. Would you care to elaborate on
why a module approach may not be sane? Is it from a semantic or an
implementation point of view?
Leo White
2013-02-26 12:30:15 UTC
Permalink
That's an interesting take on this. Would you care to elaborate on
why a module approach may not be sane? Is it from a semantic or an
implementation point of view?
Ignoring the implementation issues for now, consider the run-time semantics
of the module system.

At run-time a module is a record. Initialising a module involves
initialising every component of the module and placing them in this record.
Initialising these components can involve executing arbitrary code; in fact
the execution of an OCaml program is simply the initialisation of all its
modules.

The problems with pack are related to these dynamic semantics. In order to
be a module the "pack" must create a record to represent this module. This
means that it must initialise all of its components. It is this (rather
than any detail of pack's implementation) that causes the problems
identified by Yaron and others.

Now, access to the components of a top-level module could proceed without
the existence of this record. However, the record is required in order to
"alias" the module, use the module as a first-class value or use it as the
argument to a functor.

Any attempt to overcome the problems with pack, whilst still maintaining
the illusion that the "pack" is a normal module, would result (at the very
least) in one of the following unhealthy situations:

- The module type of the "pack" module would depend on which of its
components were accessed by the program.

- Any use of the "pack" module other than as a simple container
(e.g. "module CS = Core.Std") could have a dramatic effect on what was
linked into the program and potentially on the semantics of the program.

Namespaces are basically modules that can only be used as a simple
container. This means that they do not need a corresponding record at
run-time (or any other run-time representation). This avoids the problems
with pack as well as enabling other useful features (e.g. open
definitions).
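To illustrate with a self-contained toy (invented modules, not Core): at
run time a pack of units A and B amounts to the record built by the
module expression below, so producing it requires both components to
exist as initialised records.

  (* toy stand-ins for two separately compiled units *)
  module A = struct
    let () = print_endline "initialising A"
    let f x = x + 1
  end
  module B = struct
    let () = print_endline "initialising B"
  end
  (* what the pack amounts to at run time: a record of the component records *)
  module Pack = struct
    module A = A
    module B = B
  end
  (* client code touching only Pack.A *)
  let () = print_int (Pack.A.f 1)

With real separately compiled units the same shape means that linking
the pack pulls in B and runs B's initialisation even though only Pack.A
is ever used, which is exactly the cost described above.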
Daniel Bünzli
2013-02-26 13:00:40 UTC
Permalink
Thanks Leo. For me this is the first clear and well-argued explanation of why packs won't work anyway; I'm convinced now. Indeed, I don't think we can avoid the unhealthy situations you mention.
 
Regarding namespaces, I assume they then don't have any runtime semantics, so you can just see them as a preprocessing step that lengthens the name of a module? Or do they have other properties?

Daniel
Leo White
2013-02-26 13:53:20 UTC
Permalink
Post by Daniel Bünzli
Regarding namespaces, I assume then they don't have any runtime
semantics, so you can just see them as a preprocessing step that
lengthens the name of a module ?
Yes, they are a purely compile-time construct.
Xavier Clerc
2013-02-25 19:13:56 UTC
Permalink
----- Original Message -----
Post by Daniel Bünzli
So, as of today, we have:
- "archives" (cma / cmxa) allowing to gather modules but without
naming (at the language level) the gathering;
- "packs" allowing to gather modules into a module.
I regard namespaces as gathering modules into a named entity but
without creating a module. Hence, it is a new beast, different from
archives and packs.
So basically a new concept is introduced because "pack" is not
technically satisfying.
This is not what I mean.
Packs and namespaces serve two different purposes.
It just happens that the lack of namespaces forced
people (including me) to use packs as namespaces.


Xavier
Alain Frisch
2013-02-25 21:24:09 UTC
Permalink
Post by Xavier Clerc
This is no what I mean.
Packs and namespaces serve two different purposes.
It just happens that the lack of namespace forced
people (including me) to use packs as namespaces.
Why forced? I haven't seen a lot of libraries relying on -pack instead
of using unique enough module names (but it's true that I don't use a
lot of third-party libraries). Using the library name as a common
prefix for all its modules (and maybe having a module whose name is the
library name itself in case of libraries with a clear notion of "main
public module") seems a quite good solution to me and a better one than
-pack. Maybe this solution is not so good for libraries whose goal is
to act as a "standard library" (such as Core), because the intention is
to create the impression that the library is actually part of the
language (I don't have the impression to use a library when I write
String.length or List.map, contrary to when I write Xmlm.make_input); so
I understand why Jane Street is reluctant to have Core_list.map
everywhere in their code. But would it really be a problem to have the
users write "open Kaputt_abbreviations" instead of "open
Kaputt.Abbreviations", or Bolt_logger.log instead of Bolt.Logger.log?


Alain
Yaron Minsky
2013-02-25 21:53:31 UTC
Permalink
I understand your point Alain, but while what you're saying is
technically reasonable, I think it doesn't hold together. When
programming in the large, it is useful to be able to manipulate the
namespace and group parts of the world together. Many libraries, not
just Core or Async, want to be able to remap the world by adding a
collection of related names to the namespace. The ability to do the
moral equivalent of:

open Core.Std

is powerful and important. Your proposal of having people add
prefixes to module names does not fit the bill, and the resulting
system does not, in my opinion, scale.

y
Alain Frisch
2013-02-26 13:03:02 UTC
Permalink
Post by Yaron Minsky
I understand your point Alain, but while what you're saying is
technically reasonable, I think it doesn't hold together. When
programming in the large, it is useful to be able to manipulate the
namespace and group parts of the world together.
Can you give concrete examples of which manipulations are desired (and why)?
Post by Yaron Minsky
Many libraries, not
just Core or Async, want to be able to remap the world by adding a
collection of related names to the namespace. The ability to do the
open Core.Std
is powerful and important. Your proposal of having people add
prefixes to module names does not fit the bill, and the resulting
system does not, in my opinion, scale.
I believe my proposal covers this use case. If you depend a lot on a
specific library which exports many modules, you might indeed want to
avoid using long names everywhere to access these modules. That's why I
propose to give simple ways to alias long module names to short ones,
in a way which can be factorized (with external mapping files).

In my proposal, the equivalent of "open Core.Std" would simply be to
tell the compiler (through a command-line option or with a directive in
the code) to use a mapping file. The same can be done within the
library itself.

Here is a minimalistic version of my proposal, restricted to specifying
those mapping files on the compiler and tools command-lines. To make it
clear that this is only about mapping module references to compiled
units, let's piggy-back the -I option. If its argument is a file with a
.ns suffix and not a directory, the compiler interprets it as a mapping
file (a sequence of lines of the form "Module_name =
relative_path_to_compiled_unit", e.g. "List = core_list"), and when this
-I option is considered during the resolution of a module reference
(such as "List") and the module is defined in the file, the compiler
simply resolves the module to the corresponding unit.

This approach could also be used to restrict which units are visible to
the compiler (and avoid repeated lookups on the file system) without
moving files around on the file system.

ocamldep (without -modules) would apply the same logic; ocamldep
-modules could either implement the same logic, or leave it to the build
system interpreting its output (allowing more dynamic scenarios where
the mapping files themselves are generated). I expect ocamldoc to work
mostly out-of-the-box, even though one could think about using the
mapping files (in reverse direction) to provide shorter names in the
generated documentation (or not).

Users of a library are never forced to use the mapping files, and they
can always refer to a module with its full name (provided the
corresponding -I <path> is used).

The OCaml stdlib would be adapted to use longer names (stdlib_list,
stdlib_array, etc) and shipped with a stdlib.ns file opened by default
(unless -nostdlib is used). To be clear: this mapping file will be used
by the stdlib itself, so references within itself don't need to use long
names.
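
For concreteness, a minimal sketch of client code under this proposal
(all names invented): assume a core.ns file containing the two lines
"List = core_list" and "Array = core_array", and a client compiled with
"ocamlc -I core.ns -c client.ml". The client source could then read:

  (* client.ml -- with core.ns in effect, "List" resolves to the unit
     core_list; without the mapping file one would write Core_list instead *)
  let () = print_int (List.length [ 1; 2; 3 ])

The long name Core_list remains usable in any context, with or without
the mapping file, as noted above.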

I'm interested to see concrete examples of manipulations or scenarios
not covered by this proposal.


Alain
Yaron Minsky
2013-02-26 14:30:50 UTC
Permalink
Post by Alain Frisch
Post by Yaron Minsky
I understand your point Alain, but while what you're saying is
technically reasonable, I think it doesn't hold together. When
programming in the large, it is useful to be able to manipulate the
namespace and group parts of the world together.
Can you give concrete examples of which manipulations are desired (and why)?
I think the conversation has gotten confused. You said "who needs
pack? Just use hierarchically named modules". I'm saying: that's
crazy, you need to be able to do simple manipulations of namespaces
(like the ones implied by "open Core.Std" or "open Core.Stable".)

Now you're pointing out that your namespace proposal covers the
manipulations I'm describing. That may well be right, and I wasn't
contesting that point.

All I'm saying is that simply relying on long module names without any
kind of explicit namespace control does not scale. I stand by that,
without necessarily objecting to your namespace proposal.
Post by Alain Frisch
Post by Yaron Minsky
Many libraries, not
just Core or Async, want to be able to remap the world by adding a
collection of related names to the namespace. The ability to do the
open Core.Std
is powerful and important. Your proposal of having people add
prefixes to module names does not fit the bill, and the resulting
system does not, in my opinion, scale.
I believe my proposal covers this use case. If you depend a lot on a
specify library which exports many modules, you might indeed want to avoid
using long names everywhere to access these modules. That's why I propose
to give simple ways to alias long modules names to short ones, in a way
which can be factorized (with external mapping files).
In my proposal, the equivalent of "open Core.Std" would simply be to tell
the compiler (through a command-line option or with a directive in the code)
to use a mapping file. The same can be done within the library itself.
Here is a minimalistic version of my proposal, restricted to specifying
those mapping files on the compiler and tools command-lines. To make it
clear that this is only about mapping module references to compiled units,
let's piggy-back the -I option. If its argument is a file with a .ns suffix
and not a directory, the compiler interprets it as a mapping file (a
sequence of lines of the form "Module_name =
relative_path_to_compiled_unit", e.g. "List = core_list"), and when this -I
option is considered during the resolution of a module reference (such as
"List") and the module is defined in the file, the compiler simply resolves
the module to the corresponding unit.
This approach could also be used to restrict which units are visible by the
compiler (and avoid repeated lookup on the file system) without moving files
around on the file system.
ocamldep (without -modules) would apply the same logic; ocamldep -modules
could either implement the same logic, or leave it to the build system
interpreting its output (allowing more dynamic scenarios where the mapping
file themselves are generated). I expect ocamldoc to work mostly
out-of-the-box, even though one could think about using the mapping files
(in reverse direction) to provide shorter names in the generated
documentation (or not).
Users of a library are never forced to use the mapping files, and they can
always refer to a module with its full name (provided the corresponding -I
<path> is used).
The OCaml stdlib would be adapted to use longer names (stdlib_list,
stdlib_array, etc) and shipped with a stdlib.ns file opened by default
(unless -nostdlib is used). To be clear: this mapping file will be used by
the stdlib itself, so references within itself don't need to use long names.
I'm interested to see concrete examples of manipulation or scenarios not be
covered by this proposal.
I am unaware of any. I think this does mostly support the
Core.Std/Core.Stable tricks we use.

One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.

y
Yaron Minsky
2013-02-26 14:38:23 UTC
Permalink
Leo, do you have a summary of what you don't like about Alain's
namespace proposal? I'm not presently able to identify any obvious
weaknesses in it. The downsides I see are:

- UNSANITARY: Having both "open namespace Core.Std" and Core_List as
names seems a little unsanitary. Indeed, to provide a decent user
experience, you probably want to hide the Core_List name almost
everywhere. You don't want it showing up in error messages,
documentation, source files, etc. When you need to do a bunch of
work to hide something, maybe it's better not to include it at all.

- NO HIDING: I'm not sure that the other namespace proposals do
support this, but I'd like to be able to hide some modules so that
they are not reachable outside of the namespace. We can do this
with the current Core.Std, but I don't see how to do it in Alain's
proposal.

Are there other issues I'm missing?

y
Gabriel Scherer
2013-02-26 15:12:04 UTC
Permalink
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
This is handled as a "bonus feature" discussed in
http://gallium.inria.fr/~scherer/namespaces/pack_et_functor_pack.html
, under the name "flat access". The idea is that adding values to
namespaces directly doesn't work very well (in OCaml, values cannot
live outside a compilation unit), but you can have some modules that
will be explicitly opened if you open the namespace (or included if
you use it as a module). However, we're not yet sure that this works
as well as the other aspects of the design, and would recommend
starting without it to get a feel of the system.
Post by Yaron Minsky
Leo, do you have a summary of what you don't like about Alain's
namespace proposal? I'm not presently able to identify any obvious
- UNSANITARY: Having both "open namespace Core.Std" and Core_List as
names seems a little unsanity. Indeed, to provide a decent user
experience, you probably want to hide the Core_List name almost
everywhere. You don't want it showing up in error messages,
documentation, source files, etc. When you need to do a bunch of
work to hide something, maybe it's better not to include it at all.
I want to point out that Alain's idea of using long, hopefully-unique
names for the source files, which you deem "unsanitary" here, is also a
workaround for the problem of internal name conflicts. In Alain's
proposal, internal name conflicts are (hopefully) avoided with no
implementation change, just by judicious name choices and letting the
implementation do its work. This imposes the "unsanitary" aspect you
mention, but it is a complexity tradeoff: the other alternative would
be to design a more clever implementation to choose internal names
(other than the module's source filename), which could be hidden from
the user (no exposed Core_List) but would require an implementation
change. I think that could warrant an exploration (because it could
support use cases such as linking together two versions of the same
library, which isn't possible with a purely filename-based solution),
but in the context of Alain's proposal this "unsanitary" aspect has
more advantages than downsides.
Post by Yaron Minsky
- NO HIDING: I'm not sure that the other namespace proposals do
support this, but I'd like to be able to hide some modules so that
they are not reachable outside of the namespace. We can do this
with the current Core.Std, but I don't see how to do it in Alain's
proposal.
I'm surprised by this "hiding" idea. What does it mean, and what would
be an use case for that?
Post by Yaron Minsky
Are there other issues I'm missing?
My gut feeling is that a hierarchical model would add little
complexity to Alain's proposal and give a saner semantics to "open
namespace". With a flat model there can be no real notion of "open
namespace", only handcrafted renamings en masse (Core_list -> List,
Core_array -> Array...). (On the other hand, "open namespace"
directives, if done at the source level, are a source-language change;
no such change has been necessary so far, so they will require more
work to handle in any case.)

There is the problem of internal name conflicts, but that is one of
the more "advanced" questions that may be left out of a first
experiment.

When deciding to leave out some questions for later, it is however
important to wonder whether we're sure that we will actually be able
to extend the design to support them. Some early design choices may
make future extensions harder (without breaking compatibility). One
solution is to make a coherent design that covers advanced features,
and decide to implement only a subset. Another is to try to guess
what will be easy to add afterwards, and be lucky. Finally, one can
experiment with a first design and move to another, but without
committing to backward compatibility (meaning no language release
between the two design iterations).

My personal guess is that
- "flat access" will be an easy extension.
- "hierarchical rather than flat mappings" will be relatively easy in
theory but will make for a painful transition period for providers that
have modules organized flatly and wish to change; but it depends on a
question of "seeing non-leaf namespaces as modules" that Alain's flat
namespaces don't need to consider.
- "extending the mapping description language" must be planned for
beforehand to avoid growing pains.
- "finer implementation of internal names to avoid linking conflicts"
is still of unknown difficulty, both today and after any first
proposal that does not tackle this aspect has been acted upon.
Yaron Minsky
2013-02-26 15:53:21 UTC
Permalink
I should mention that the term "UNSANITARY" might sound a bit
negative, but that is not intended. My actual opinion is that Alain's
proposal sounds to me quite practical, and is basically my favorite
proposal thus far (though I consider my own opinion to be of limited
value, because of my lack of understanding of many implementation
issues). I just wanted us to call out the downsides explicitly,
so as to better understand what the opposition to Alain's plan might
be.

On Tue, Feb 26, 2013 at 10:12 AM, Gabriel Scherer
Post by Gabriel Scherer
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
This is handled as a "bonus feature" discussed in
http://gallium.inria.fr/~scherer/namespaces/pack_et_functor_pack.html
, under the name "flat access". The idea is that adding values to
namespaces directly doesn't work very well (in OCaml, values cannot
live outside a compilation unit), but you can have some modules that
will be explicitly opened if you open the namespace (or included if
you use it as a module). However, we're not yet sure that this works
as well as the other aspects of the design, and would recommend
starting without it to get a feel of the system.
That sounds dandy.
Post by Gabriel Scherer
Post by Yaron Minsky
Leo, do you have a summary of what you don't like about Alain's
namespace proposal? I'm not presently able to identify any obvious
- UNSANITARY: Having both "open namespace Core.Std" and Core_List as
names seems a little unsanity. Indeed, to provide a decent user
experience, you probably want to hide the Core_List name almost
everywhere. You don't want it showing up in error messages,
documentation, source files, etc. When you need to do a bunch of
work to hide something, maybe it's better not to include it at all.
I want to point out that Alain's idea of using long, hopefully-unique
names for the source files, that you deem "unsanitary" here, is also a
workaround for the problem of internal name conflict. In Alain's
proposal, internal names conflicts are (hopfeully) avoided with no
implementation change, just by judicious name choices and letting the
implementation do its work. This imposes this "unsanitary" aspect you
mention, but it is a complexity tradeoff: the other alternative would
be to design a more clever implementation to choose internal names
(than the module's source filename), which could be hidden from the
user (no exposed Core_List) but require an implementation change. I
think that could warrant an exploration (because it could support use
cases such as linking together two versions of the same library, which
isn't possible with a purely filename-based solution), but in the
context Alain's proposal this "unsanitary" aspect has more advantages
than downsides.
I don't disagree with any of this.
Post by Gabriel Scherer
Post by Yaron Minsky
- NO HIDING: I'm not sure that the other namespace proposals do
support this, but I'd like to be able to hide some modules so that
they are not reachable outside of the namespace. We can do this
with the current Core.Std, but I don't see how to do it in Alain's
proposal.
I'm surprised by this "hiding" idea. What does it mean, and what would
be an use case for that?
Post by Yaron Minsky
Are there other issues I'm missing?
My gut feeling is that a hierarchical model would add little
complexity to Alain's proposal and give a saner semantics to "open
namespace". With a flat model there can be no real notion of "open
namespace", only handcrafted renamings en masse (Core_list -> List,
Core_array -> Array...). (On the other hand "open namespace", if they
are done at the source level, are a source language change, while none
are necessary so far, so they will require more work to handle in any
case.)
There is the problem of internal name conflicts, but that is one of
the more "advanced" questions that may be left out of a first
experiment.
When deciding to leave out some questions for later, it is however
important to wonder whether we're sure that we will actually be able
to extend the design to support them. Some early design choices may
make future extensions harder (without breaking compatibility). One
solution is to make a coherent design that covers advanced features,
and decide to implement only a subset. One other is to try to guess
what will be easy to add afterwards, and be lucky. Finally, one can
experiment with a first design and move to another, but without
committing to backward compatibility (meaning no language release
between the two design iterations).
I appreciate your point about picking a modest design. I would argue,
however, that it's not worth actually implementing a namespace
mechanism that doesn't handle the case of a library like Core. I
think libraries like Core are really one of the biggest drivers for
namespaces, and it would seem a mistake to build and deploy something
that doesn't solve that problem.
Post by Gabriel Scherer
My personal guess is that
- "flat access" will be easy an easy extension.
- "hierarchical rather than flat mappings" will be relatively easy in
theory but make a painful transition period for providers that have
modules organized flatly and desire to change; but it depends on a
question of "seeing non-leaf namespaces as modules" that Alain's flat
namespaces don't need to consider.
- "extending the mapping description language" must be planned for
beforehand to avoid growing pains.
- "finer implementation of internal names to avoid linking conflicts"
is yet of unknown difficulty, either today and after any first
proposal not tackling this aspect is acted upon.
This all sounds reasonable.
Alain Frisch
2013-02-26 15:28:18 UTC
Permalink
Post by Yaron Minsky
- UNSANITARY: Having both "open namespace Core.Std" and Core_List as
names seems a little unsanity. Indeed, to provide a decent user
experience, you probably want to hide the Core_List name almost
everywhere. You don't want it showing up in error messages,
documentation, source files, etc. When you need to do a bunch of
work to hide something, maybe it's better not to include it at all.
Concretely, what should the OCamldoc-generated page for a Core module
look like? Should references to types defined in other Core modules
be displayed without any indication that they come from Core (without
clicking on them)? I would find this very confusing. Similarly for error
messages: it is indeed nice to be able to see List instead of Core_list
when you live with Core all day long and there is no ambiguity.
Still, there needs to be a way to have a more explicit error message,
which could distinguish Stdlib_list from Core_list. For me, seeing List
instead of Core_list in "common cases" is only nice to have (and
definitely achievable with a little more effort in my proposal), but
being able to distinguish Core_list from Stdlib_list is crucial. (And I
don't think that showing Core#List is nicer than Core_list, at least not
enough to justify the addition of a new notion.)

Having long unique names which can be used in any context is a very good
property and would also avoid problems reported with camlp4 extensions
not being able to access a specific "standard" module. If we can use
Stdlib_list in any context, we don't need to do any hack to refer to it.
Post by Yaron Minsky
- NO HIDING: I'm not sure that the other namespace proposals do
support this, but I'd like to be able to hide some modules so that
they are not reachable outside of the namespace. We can do this
with the current Core.Std, but I don't see how to do it in Alain's
proposal.
If the library does not install the .cmi file of the private module, it
cannot be accessed by any "client code". We do this regularly for our
internal libraries. For me, this is not really related to namespaces.

With my proposal, it is even better: you can install .cmi files so that
some "privileged" parts of your code base can still access it, but also
restrict access to other parts by providing a mapping file which does
not alias the private module (and just arrange so that you don't have a
-I on the directory where the .cmi is installed). Currently, what we do
to achieve it is to have several install directories for some libraries
and copy a different subset of .cmi files to each of them.


Alain
Yaron Minsky
2013-02-26 16:00:22 UTC
Permalink
Post by Yaron Minsky
- UNSANITARY: Having both "open namespace Core.Std" and Core_List as
names seems a little unsanity. Indeed, to provide a decent user
experience, you probably want to hide the Core_List name almost
everywhere. You don't want it showing up in error messages,
documentation, source files, etc. When you need to do a bunch of
work to hide something, maybe it's better not to include it at all.
Concretly, what should the OCamldoc-generated page for a Core module look
like? Should the reference to types defined in other Core modules be
displayed without any indication that they come from Core (without clicking
on it)?
I believe that's right. When you're reading Core's documentation,
much as when you're reading Core's code, you want to see the
in-namespace names.
I would find this very confusing. Similarly for error messages: it
is indeed nice to be able to see List instead of Core_list when you live
with Core all you day long and there is no ambiguity. Still, there needs to
be a way to have a more explicit error message, which could distinguish
Stdlib_list from Core_list.
I think if you've opened the Core namespace in preference to the Stdlib
one, you want to see Core_list as List, and you want to see
Stdlib_list as Stdlib_list.

Core should be able to stand up as a full-fledged alternative to the
standard library. No one wants error messages for the standard
library to suddenly get twice as long because it's littered with
qualifiers. We want the same for Core.

I would argue that when referencing identifiers in other namespaces,
you should at that point have the namespace qualifier in place.
For me, seeing List instead Core_list in "common cases" is only nice
to have (and definitely achievable with a little more effort in my
proposal), but being able to distinguish Core_list from Stdlib_lib
is crucial. (And I don't think that showing Core#List is nicer than
Core_list, at least not enough to justify the addition of a new
notion.)
I think you're underselling the importance of keeping names concise.
We've started using Garrigue's short-paths patch, and it is amazing
how much it helps error messages, and the key thing it does is remove
unnecessary qualifications. OCaml's error messages aren't great as
they are, and making them even longer is a big user-interface mistake.
Having long unique names which can be used in any context is a very good
property and would also avoid problems reported with camlp4 extensions not
being able to access a specific "standard" module. If we can use
Stdlib_list in any context, we don't need to do any hack to refer to it.
I agree that having unique names that are accessible from any context
is good, though I think namespaces could provide this. You could
refer to Stdlib#List anytime you needed to be explicit, and just use
List when you did not.

That said, I agree that your proposal provides this feature as well,
and that it's a good thing.
Post by Yaron Minsky
- NO HIDING: I'm not sure that the other namespace proposals do
support this, but I'd like to be able to hide some modules so that
they are not reachable outside of the namespace. We can do this
with the current Core.Std, but I don't see how to do it in Alain's
proposal.
If the library does not install the .cmi file of the private module, it
cannot be accessed by any "client code". We do this regularly for our
internal libraries. For me, this is not really related to namespaces.
That's an interesting point. Maybe we could do the same.
With my proposal, it is even better: you can install .cmi files so that some
"privileged" parts of your code base can still access it, but also restrict
access to other parts by providing a mapping file which does not alias the
private module (and just arrange so that you don't have a -I on the
directory where the .cmi is installed). Currently, what we do to achieve it
is to have several install directories for some libraries and copy a
different subset of .cmi files to each of them.
Alain
Daniel Bünzli
2013-02-26 16:30:46 UTC
Permalink
Post by Yaron Minsky
I think you're underselling the importance of keeping names concise.
We've started using Garrigue's short-paths patch, and it is amazing
how much it helps error messages, and the key thing it does is remove
unnecessary qualifications. OCaml's error messages aren't great as
they are, and making them even longer is a bug user-interface mistake.
This is also something that's very important to me. Beyond one (maybe two) "dot" levels I find things become unreadable, especially when printed as part of aggregate types like tuples or polymorphic variants.

It was also that reason that prompted me to design the Gg modules mentioned before the way they are, i.e. if you take care to define your module signatures as follows:

type a
module A : sig
  type t = a
  (* mention a, not t, in the signatures *)
  val create : unit -> a
  val op : a -> a
end

They are still compatible with the M.t convention for functors, but you get short Cunit.a type names in errors and in your definitions.
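
A self-contained sketch of that convention (toy definitions, not actual
Gg code), showing that such a module still matches signatures phrased in
terms of t while errors and annotations elsewhere use the short type name:

  type a = int  (* in a real library, a would be abstract in the .mli *)
  module A : sig
    type t = a
    val create : unit -> a
    val op : a -> a
  end = struct
    type t = a
    let create () = 0
    let op x = x + 1
  end

  (* A still fits where a functor expects a module with "type t" *)
  module S = Set.Make (struct type t = A.t let compare = compare end)
  let _ : a = A.op (A.create ())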

Daniel
Leo White
2013-02-26 17:03:28 UTC
Permalink
Post by Yaron Minsky
Leo, do you have a summary of what you don't like about Alain's
namespace proposal? I'm not presently able to identify any obvious
weaknesses in it.
I have a few issues with Alain's proposal specifically, and then a
second group of issues with that kind of proposal in general (i.e.
defining the namespace separately from the modules it contains).

Issues with Alain's proposal specifically:

1. Alain's proposal does not include a notion of opening a namespace. The
ability to only open a namespace when actually using it (e.g. a local
open) would be very useful. For example, this means that the proposal
provides no assistance for handling multiple standard libraries in the
same program: you can either make "List" equal "Core_List" everywhere or
you can make it equal "Stdlib_List" everywhere. I see no real reason
why Alain's namespaces could not be extended with a namespace opening
feature.

2. Alain's proposal provides no means of aliasing namespaces. Similar to
opening a namespace, it would be useful to be able to say "open
Core.Std as CS" and then refer to Core_Std_Mutex as "CS#Mutex"

3. Alain's proposal only supports a flat namespace. As I said before,
hierarchical namespaces are very useful in certain situations.
Obviously this criticism only makes sense if you first add the ability
to open namespaces. Again this feature could easily be added to Alain's
proposal.

Issues with this kind of proposal:

1. They require unique file names. This means that every file must become
"package_Module.ml" or "package_Subpackage_Module.ml".

2. A module's name depends on other files. A module's name (by which I
mean its namespaced name) depends on the ".ns" file, this could
definitely be confusing.

3. They require you to manage an ".ns" file. Not a major burden, but it is
one more place to look for errors. It also must be included in all
calls to the compiler.

4. To me they put the namespace information in the wrong place. A
component should define its own names, it should also not have to open
its own namespace.

I think that the first group of issues above is important, but the second
group of issues is of a more aesthetic nature. Also, if you want to
automatically open modules along with a namespace, then that is much easier
with a centralised definition of the namespace.

As a side note, if people are okay with long filenames, why not support
having filenames like "core-std-mutex.mli" and then allow "open namespace
Core" and "open namespace Core#Std"?

These could be treated as if there were automatically a "core.ns":

namespace Std = "core-std.ns"

and "core-std.ns":

module Mutex = "core-std-mutex.mli"

that were passed to any compiler invocation that had such files on its
search path.
Alain Frisch
2013-02-26 18:33:18 UTC
Permalink
Post by Leo White
1. Alain's proposal does not include a notion of opening a namespace.
The ability to only open a namespace when actually using it (e.g. a
local open) would be very useful. For example, this means that the
proposal provides no assistance for handling multiple standard
libraries in the same program: you can either make "List" equal
"Core_List" everywhere or
you can make it equal "Stdlib_List" everywhere. I see no real reason
why Alain's namespaces could not be extended with a namespace
opening feature.
Yes, my initial proposal was about adding a "(local) namespace opening"
statement to the language:

open namespace Core

let open namespace Core in ....

(Both would look for a core.ns file on the search path.)

I've simplified it even further by restricting its use to entire units
and putting the directive on the command-line instead of in the source
code. If the need for controlling locally which namespaces are used is
felt, I won't object to it. But I'm not yet convinced that this is so
useful (and it complicates things, such as deciding how to display a
nice type name in case of a type error, because this depends on the
location in your module).
Post by Leo White
2. Alain's proposal provides no means of aliasing namespaces. Similar to
opening a namespace, it would be useful to be able to say "open
Core.Std as CS" and then refer to Core_Std_Mutex as "CS#Mutex"
I believe that aliasing modules is what matters. Since I don't
understand the need for a dedicated notion of namespace yet (with some
operations on namespaces) I also don't see the need for aliasing them.
Post by Leo White
2. A module's name depends on other files. A module's name (by which I
mean its namespaced name) depends on the ".ns" file, this could
definitely be confusing.
I don't agree with this interpretation: the name of the module depends
only on its file name, and we provide a convenient way for client code
to refer to these modules with shorter names, locally. It is a big
advantage, for me, that client code could always be written without
"namespaces", if required.
Post by Leo White
3. They require you to manage an ".ns" file. Not a major burden, but it
is one more place to look for errors. It also must be included in all
calls to the compiler.
For any serious use of the compiler with some third-party libraries, the
command-line is not built by hand, but by your build system and/or
ocamlfind.
Post by Leo White
4. To me they put the namespace information in the wrong place. A
component should define its own names, it should also not have to open
its own namespace.
I guess this is a matter of taste, and corresponds to a conception of
what namespaces should be as an identified notion, rather than solving
an actual need or addressing a concrete problem with my proposal
(correct me if I'm wrong).
Post by Leo White
As a side note, if people are okay with long filenames, why not support
having filenames like "core-std-mutex.mli" and then allow "open
namespace Core" and "open namespace Core#Std".
namespace Std = "core-std.ns"
module Mutex = "core-std-mutex.mli"
that were passed to any compiler invocation that had such files on its
search path.
I don't like this implicitness: how can we infer dependencies? If
ocamldep sees a reference to "List" in scope where "open namespace Core"
and "open namespace Bar" are both active, it would have to look for both
core-list and bar-list files (and there will be a combinatorial
explosion in the resolution if you allow opening nested namespaces).
Worse: if we use "ocamldep -modules", this resolution has to be done by
the build system, so this complex logic (which depends on the location
in the source file) will have to be re-implemented in omake and other
build systems around. It is an important property that "ocamldep
-modules" does not need to look for the existence of compiled units on
the current tree.


Alain
Yaron Minsky
2013-02-26 19:54:55 UTC
Permalink
Post by Leo White
2. Alain's proposal provides no means of aliasing namespaces. Similar to
opening a namespace, it would be useful to be able to say "open
Core.Std as CS" and then refer to Core_Std_Mutex as "CS#Mutex"
I believe that aliasing modules is what matters. Since I don't understand
the need for a dedicated notion of namespace yet (with some operations on
namespaces) I also don't see the need for aliasing them.
Within Jane Street, we alias modules quite a bit (and it would be nice
if this was lighter weight than it is, but that's another story!). I
think you'd want to alias namespaces for the same reason.

The intuition is this: by default, you want long descriptive
identifiers. But for a small scope, you're happy to either make the
identifier disappear entirely (as in, [open namespace
Super_awesome_openGL_renderer]) or make it small (as in, [alias
namespace Render = Super_awesome_openGL_renderer]).
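For reference, the module-level version of this already works today
(the module and function names here are made up):

module Render = Super_awesome_openGL_renderer   (* cheap, local alias *)
let frame = Render.render_scene Render.default_scene

I'd want the same thing one level up, for namespaces.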
Post by Leo White
2. A module's name depends on other files. A module's name (by which I
mean its namespaced name) depends on the ".ns" file, this could
definitely be confusing.
I don't agree with this interpretation: the name of the module depends only
on its file name, and we provide a convenient way for client code to refer
to these modules with shorter names, locally. It is a big advantage, for
me, that client code could always be written without "namespaces", if
required.
I understand why you view this as an advantage, but I think of this as
roughly a disadvantage. The "namespace" names, I think, will be far
more usable and convenient, and those should be the primary
ones. I believe the fully-qualified names should be viewed as a
second-class, low-level feature. For one thing, having both of them
be primary things that are presented to the user is going to be
awfully confusing.
Post by Leo White
3. They require you to manage an ".ns" file. Not a major burden, but it
is one more place to look for errors. It also must be included in all
calls to the compiler.
For any serious use of the compiler with some third-party libraries, the
command-line is not built by hand, but by your build system and/or
ocamlfind.
Post by Leo White
4. To me they put the namespace information in the wrong place. A
component should define its own names, it should also not have to open
its own namespace.
I guess this is a matter of taste, and corresponds to a conception of what
namespaces should be as an identified notion, rather than solving an actual
need or addressing a concrete problem with my proposal (correct me if I'm
wrong).
I think it addresses the usability of the overall system. Your approach
of making the flat names primary maximizes continuity with toolchains
of the past, but also maximizes the confusion of the system for new
users. I think we should largely present one way of naming things to
users, and making the namespace-names the primary ones seems like a
clear usability win.
Post by Leo White
As a side note, if people are okay with long filenames, why not support
having filenames like "core-std-mutex.mli" and then allow "open
namespace Core" and "open namespace Core#Std".
namespace Std = "core-std.ns"
module Mutex = "core-std-mutex.mli"
that were passed to any compiler invocation that had such files on its
search path.
I don't like this implicitness: how can we infer dependencies? If ocamldep
sees a reference to "List" in scope where "open namespace Core" and "open
namespace Bar" are both active, it would have to look for both core-list and
bar-list files (and there will be a combinatorial explosion in the
resolution if you allow opening nested namespaces). Worse: if we use
"ocamldep -modules", this resolution has to be done by the build system, so
this complex logic (which depends on the location in the source file) will
have to be re-implemented in omake and other build systems around. It is an
important property that "ocamldep -modules" does not need to look for the
existence of compiled units on the current tree.
Alain
Didier Remy
2013-02-26 14:59:08 UTC
Permalink
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
Yaron,

Do you really need this level of granularity? I'd like to think of
modules as the smallest compilation unit. Can you give us examples of what
you'd like to do with value manipulation?

Didier
Daniel Bünzli
2013-02-26 15:40:53 UTC
Permalink
Do you really need this level of granularity? I'd like to think of
modules as the smallest compilation unit. Can you give us examples of what
you'd like to do with value manipulation?
I have a (yet unreleased) library that provides basic types for computer graphics (vectors, matrices, sizes, quaternions, colors, boxes). I manually "packed" it as follows:

module Gg = struct
  module Float : sig ... end

  type m2
  type m3
  type m4

  (* Vectors *)
  type v2
  type v3
  type v4
  module type V = sig ... end (* implemented by all vector types *)
  module V2 : sig type t = v2 ... end
  module V3 : sig type t = v3 ... end
  module V4 : sig type t = v4 ... end

  (* Points *)
  type p2 = v2
  type p3 = v3
  module type P = sig ... end (* implemented by all point types *)
  module P2 : sig type t = p2 ... end
  module P3 : sig type t = p3 ... end

  (* Matrices *)
  module type M = sig ... end (* implemented by all matrix types *)
  module M2 : sig type t = m2 ... end
  module M3 : sig type t = m3 ... end
  module M4 : sig type t = m4 ... end

  (* Quaternions *)
  type quat = v4
  module Quat : sig ... end

  (* Sizes *)
  type size2 = v2
  type size3 = v3
  module Size2 : sig ... end
  module Size3 : sig ... end

  (* Boxes *)
  ...

end

The idea was that people using the library would do an

open Gg;;

at the top of their source. Now I would gladly replace that Gg with a namespace, perhaps to benefit from smaller executables (e.g. a 2D program that only uses V2, P2, M2). However, at the top level there are quite a few things that are not modules.

1) Module types. These are mainly here to be able to functorize code over multiple dimensions.

2) Types. These are there for various reasons: first, to avoid recursive module definitions; second, so that the printing of errors in the toplevel and by the compiler does not become long-winded and/or unreadable (e.g. print Gg.size2, Gg.v2 instead of Gg.Size2.t, Gg.V2.t).

3) (Values. I don't do that anymore: initially I also had value constructors for the types, but I pushed these back into modules so that open Gg guarantees that no value is added to your environment; only types and modules are defined.)

So I guess in that case namespaces won't be of any help, since only modules can be attached to them. The obvious solution would then be to define all top-level types in, say, a Gg.Types module, but I'd lose the short printing of type names and would need to introduce artificial modules (Gg.Types).

Daniel
Yaron Minsky
2013-02-26 15:43:57 UTC
Permalink
Post by Didier Remy
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
Yaron,
Do you really need this level of granularity? I'd like to think of
modules as the smallest compilation unit. Can you give us examples of what
you'd like to do with value manipulation?
I think we do. What we want is essentially the same thing that we
need to do when OCaml opens Pervasives by default. We simply have
another module that we wish to open by default.

There are two reasons for this: the first is to open at the top-level
some constructors and values that are very commonly useful. Much as
pervasives has None and Some from option available everywhere, we have
Ok and Error from Result.t available everywhere.
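As a rough sketch of what that buys (the Common module and its
auto-opening are hypothetical; the result type is just the usual
hand-rolled one):

module Common = struct
  type ('ok, 'err) result = Ok of 'ok | Error of 'err
end

open Common  (* what opening values along with the namespace would do for you *)

let safe_div x y = if y = 0 then Error "division by zero" else Ok (x / y)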

Another reason is to shadow values from other modules. Core.Std hides
various values from Pervasives that we view as harmful. For example,
we hide ==, and instead expose phys_equal. (We think == is too
confusing to people from other languages.)
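Here is a minimal sketch of the shadowing trick (not Core's actual
code):

module Std = struct
  let phys_equal = ( == )
  (* rebinding ( == ) to a non-function turns accidental uses into type errors *)
  let ( == ) = `Use_phys_equal_instead
end

open Std
let same = phys_equal [1] [1]   (* false: physical equality on freshly allocated lists *)
(* 1 == 1 no longer type-checks after the open *)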

Similarly, Async hides blocking operations that are available in
Core.Std, like print_string, so when you write:

open Core.Std
open Async.Std

those problematic values are hidden from view. It would be a shame, I
think, to make every single file that uses Core have to change from:

open Core.Std

to

open namespace Core.Std
open Core.Std.Common

or whatever it would need to be.

y
Post by Didier Remy
Didier
_______________________________________________
Platform mailing list
Platform at lists.ocaml.org
http://lists.ocaml.org/listinfo/platform
Alain Frisch
2013-02-26 16:43:04 UTC
Permalink
Post by Yaron Minsky
I think we do. What we want is essentially the same thing that we
need to do when OCaml opens Pervasives by default. We simply have
another module that we wish to open by default.
Do you agree that this need of importing values automatically in the
global scope is quite specific to "stdlib" replacements and to basic
"infrastructure" libraries? For me, this would justify some "side"
support for the feature under the form of a command-line option, if this
can give the expected benefit without touching the language itself. I
expect that most of your code base does "open Core.Std", right? What
would be the problem of putting this information in the build system so
that this module is opened by default (and maybe also to enable the
corresponding "module/unit mapping" with my proposal)? Generally
speaking I agree that putting too much information in the build system
is a bad idea, but here the point is precisely to provide the illusion
that Core.Std is the standard library (for a specific code base) and to
avoid one extra "open" statement.

There is a default behavior in OCaml of opening Pervasives, and it
can be switched off. Now, if Core is to be thought of as a different
standard library, why should we require users to even write "open
Core.Std" at all? We should be able to arrange for the compiler to
open it by default, just as it usually opens Pervasives.

Again, for me, this is quite orthogonal to module name management (which
can be useful to many libraries) and is rather specific to stdlib
replacement / basic infrastructure libraries.


Alain
Yaron Minsky
2013-02-26 16:53:49 UTC
Permalink
Post by Yaron Minsky
I think we do. What we want is essentially the same thing that we
need to do when OCaml opens Pervasives by default. We simply have
another module that we wish to open by default.
Do you agree that this need of importing values automatically in the global
scope is quite specific to "stdlib" replacements and to basic
"infrastructure" libraries?
Infrastructure is a relative question. Core and Async are both
libraries that want this. But there are more we have internally.
Indeed, we mint a fair number of little programming worlds, each with
their own conventions that are encoded by such a Std library. There
aren't hundreds of these, but there are probably a dozen.

Generally, it's important to keep scale in mind. Jane
Street has a hundred people actively hacking on OCaml every day,
millions of lines of code, and hundreds of software artifacts. Things
that seem like occasional special cases become first-class
requirements pretty quickly.
For me, this would justify some "side" support
for the feature under the form of a command-line option, if this can give
the expected benefit without touching the language itself.
I expect that most of your code base does "open Core.Std", right?
What would be the problem of putting this information in the build
system so that this module is opened by default (and maybe also to
enable to corresponding "module/unit mapping" with my proposal)?
Generally speaking I agree that putting too much information in the
build system is a bad idea, but here the point is precisely to
provide the illusion that Core.Std is the standard library (for a
specific code base) and to avoid one extra "open" statement.
There is default behavior of OCaml of opening Pervasives by
default. This can be switched of. Now if Core has to be thought as
a different standard library, why should we require users to even
write "open Core.Std" at all? We should be able to arrange so that
the compiler opens it by default as it usually opens Pervasives.
Again, for me, this is quite orthogonal to module name management (which can
be useful to many libraries) and is rather specific to stdlib replacement /
basic infrastructure libraries.
I very much disagree. This is a common enough primitive that people
should be able to do it without hacking the build system. Indeed,
Daniel seems to have the same requirements as we do.

y
Alain Frisch
2013-02-26 18:14:06 UTC
Permalink
Post by Yaron Minsky
I very much disagree. This is a common enough primitive that people
should be able to do it without hacking the build system. Indeed,
Daniel seems to have the same requirements as we do.
I'd say "using" the build system rather than hacking it. Actually, it
is more about using ocamlfind. We will anyway need to rely on ocamlfind
to manage the installation of many libraries, right? I think that users
don't really care that passing "-package core" to ocamlfind will result
in it passing the following three options to the compiler:

-I ..../core/core.ns
(* To "open" the the default mapping provided by core *)

-open Core_std
(* To open the default module which, in particular,
overrides definitions in Pervasives. *)

-I ..../core
(* So that you can use long names like Core_list if you
wish, to be explicit *)

(and probably some -ppx options as well :-))


(There could be variants like a "-package core.raw" which would only pass
the third one.)

Do you agree that the information about which libraries are used by a
specific part of the code base will be part of the build system anyway?

Now we could discuss putting library-related information in the client
code itself (this is a common request for "syntax extensions", but I
don't see why this is specific to them), e.g. in the form of special
comments/attributes understood by ocamlfind, but this seems a different
topic to me.



Alain
Yaron Minsky
2013-02-26 19:48:03 UTC
Permalink
Post by Alain Frisch
Do you agree that the information about which libraries are used by a
specific part of the code base will be part of the build system anyway?
Now we could discuss putting library-related information in the client code
itself (this is a common request for "syntax extensions", but I don't see
why this is specific to them), e.g. in the form of special
comments/attributes understood by ocamlfind, but this seems a different
topic to me.
Your point is fair. I guess what I'm saying specifically is that
namespace manipulation belongs in the source. Right now, if I get the build
system wrong, it might cause my build to fail, or too much to be linked in,
but it won't change the meaning of my program in a material way.

In short, I'd like to keep the things that affect the meaning of my
program largely in the source. I agree that for very pervasive
things, like Core.Std or sexplib within Jane Street, you may want to
make this silently present at the build-system level. But most of the
time, you want to be explicit about this kind of change to the
semantics of your program.

Maybe the main difference between us is this: I want to use namespaces
for multiple libraries, not just highly pervasive ones like Core. You
think of namespaces as a thing to solve the fairly narrow problem of
Core. If I thought you were right about this, I would agree with the
rest.
Alain Frisch
2013-02-26 22:17:31 UTC
Permalink
Post by Yaron Minsky
In short, I'd like to keep the things that affect the meaning of my
program largely in the source. I agree that for very pervasive
things, like Core.Std or sexplib within Jane Street, you may want to
make this silently present at the build-system level. But most of the
time, you want to be explicit about this kind of change to the
semantics of your program.
Maybe the main difference between us is this: I want to use namespaces
for multiple libraries, not just highly pervasive ones like Core. You
think of namespaces as a thing to solve the fairly narrow problem of
Core. If I thought you were right about this, I would agree with the
rest.
I think we agree on why we disagree. For me, the only problem which is
serious enough to justify changes to the language and toolchain is about
name clashes (not being able to use two libraries together because they
happen to use identical module names). A simple non-technical solution
exists today: a policy to avoid such clashes (long names). The only
cases where this is problematic, I believe, is for big libraries used in
a pervasive way, for which (i) long names become really tedious for the
users; (ii) module aliasing is not enough because the library has many
modules. I think this is fairly limited and can be solved by very light
additions to the toolchain, allowing basically to define "user-land
stdlibs" (with a context-dependent definition of what a stdlib is). I
consider other use cases and "required" features of namespaces as nice
to have (or not) but not important enough to justify any serious
addition to the language definition and adaptation of existing tools,
especially when we are talking about breaking some current invariants
and properties on which some existing users might rely.


Alain
Yaron Minsky
2013-02-26 22:29:05 UTC
Permalink
Post by Alain Frisch
Post by Yaron Minsky
In short, I'd like to keep the things that affect the meaning of my
program largely in the source. I agree that for very pervasive
things, like Core.Std or sexplib within Jane Street, you may want to
make this silently present at the build-system level. But most of the
time, you want to be explicit about this kind of change to the
semantics of your program.
Maybe the main difference between us is this: I want to use namespaces
for multiple libraries, not just highly pervasive ones like Core. You
think of namespaces as a thing to solve the fairly narrow problem of
Core. If I thought you were right about this, I would agree with the
rest.
I think we agree on why we disagree. For me, the only problem which is
serious enough to justify changes to the language and toolchain is about
name clashes (not being able to use two libraries together because they
happen to use identical module names). A simple non-technical solution
exists today: a policy to avoid such clashes (long names). The only cases
where this is problematic, I believe, is for big libraries used in a
pervasive way, for which (i) long names become really tedious for the users;
(ii) module aliasing is not enough because the library has many modules.
I think the reason we disagree ties to questions of scale. We have a
number of projects that are big enough to want to build their own
"pervasive" library. For example, we have a library whose purpose is
designing trading systems. It has its own Std and its own set of
conventions, and many thousands of lines of code built within that
world. We also have a separate library whose purpose is building user
interfaces for trading systems. It too has its own Std and its own
conventions that it wants to standardize on.

Namespace control becomes a real issue in this world quite quickly. A
solution designed for one hulking Stdlib replacement doesn't address
enough of the problem space.

And I don't think these scale issues are Jane Street only. A big part
of the goal of the work we're putting in to OCamlLabs and Real World
OCaml and these working groups is to really grow the OCaml community,
and for that, OCaml needs to get good at programming-in-the-large. I
think good namespace control is a key part of doing that right.
Post by Alain Frisch
I think this is fairly limited and can be solved by very light
additions to the toolchain, allowing basically to define "user-land
stdlibs" (with a context-dependent definition of what a stdlib is).
I consider other use cases and "required" features of namespaces as
nice to have (or not) but not important enough to justify any
serious addition to the language definition and adaptation of
existing tools, especially when we are talking about breaking some
current invariants and properties on which some existing users might
rely.
I understand where you're coming from, but I don't think it's the
right decision for the future of the language. I think this is case
where OCaml can and should change.

y

Didier Remy
2013-02-26 22:03:52 UTC
Permalink
Yaron,
Post by Yaron Minsky
Post by Didier Remy
Do you really need this level of granularity? I'd like to think of
modules as the smallest compilation unit. Can you give us examples of what
you'd like to do with value manipulation?
I think we do. What we want is essentially the same thing that we
need to do when OCaml opens Pervasives by default. We simply have
another module that we wish to open by default.
This is quite different from cherry-picking values from a module. Also, open
is a bit misleading here, because it must behave like an include. While
open brings all subcomponents into scope but does not evaluate them, include
must evaluate (use) all the definitions and bind them in the current
structure.

This is quite different from what happens when building namespaces: when a
new definition shadows an older one in a namespace, the older definition
will not be visible by the linker and will not be evaluated.

"Include-on-open" (called flat-acccess by Gabriel) is doable as Gabriel
explained, and would generalize the behavior of pervasives, but it is
another mechanism than just building a namespace (because it also _uses_ a
module).
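The part of this distinction that is already visible at the module
level today (this sketch shows only the re-binding aspect, not the
linking/evaluation aspect):

module M = struct let x = 42 end

module Uses_open = struct
  open M            (* x is merely in scope; Uses_open has no component x *)
  let y = x + 1
end

module Uses_include = struct
  include M         (* M's bindings are re-exported: Uses_include.x exists *)
  let y = x + 1
end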
Post by Yaron Minsky
There are two reasons for this: the first is to open at the top-level
some constructors and values that are very commonly useful. Much as
pervasives has None and Some from option available everywhere, we have
Ok and Error from Result.t available everywhere.
Fine.
Post by Yaron Minsky
Another reason is to shadow values from other modules. Core.Std hides
various values from Pervasives that we view as harmful. For example,
we hide ==, and instead expose phys_equal. (We think == is too
confusing to people from other languages.)
This example is different I think, and perhaps another solution could be used.
My understanding is that you wish to have your own version of stdlib, say
stdlib_minus_plus.

Why don't you create a new module stdlib_minus_plus that includes stdlib
with a restricted safe interface (so that bindings "minus" will be evaluated
but not visible) and then add your own safer definitions "plus", bind the
resulting module to some namespace Core#stdlib and then use your own version
of Core#stdlib instead of the original stdlib?
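Something along these lines, with made-up names, using only today's
features:

module Stdlib_minus_plus : sig
  val print_string : string -> unit    (* a binding kept from the original *)
  val phys_equal : 'a -> 'a -> bool    (* an added "plus"; ( == ) is omitted *)
end = struct
  include Pervasives                   (* the "minus": evaluated, but only partially re-exported *)
  let phys_equal = ( == )
end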

It seems that you wish Core.std to be only a diff against the original
stdlib and not a patched copy. Is this the case? Why is it important?
Post by Yaron Minsky
Similarly, Async hides blocking operations that are available in
open Core.Std
open Async.Std
those problematic values are hidden from.
Isn't this fragile? For example, what happens if the user mistakenly writes
these two lines in the other order? If Async.Std _must_ hide values of
Core.Std for safety reasons (for instance to avoid blocking), why is it not
returning a patched copy of Core.Std with these values overridden instead of
relying on the user to open modules in the right order?
Post by Yaron Minsky
open Core.Std
to
open namespace Core.Std
open Core.Std.Common
or whatever it would need to be.
I don't see what you mean here.

Didier
Alain Frisch
2013-02-26 15:38:52 UTC
Permalink
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
I can understand that this is nice, but is it really important?

Concretely, we are talking about the necessity to add an extra "open
Core_std" statement in addition to the namespace opening statement (or a
corresponding command-line option). Does this extra open statement
justify making concepts less orthogonal?

My "flat mapping" proposal could be extended to support "implicit opens"
in the same mapping files (in addition to lines like "List = core_list",
we could have "open Core_std" lines). I don't think it is a good idea
to do so, however (and it would require having two versions of the
mapping file, one to be used inside the implementation of Core itself
and another one for client code -- without the open statement). We
could also let the user specify implicit opens on the command-line, so
as to push the problem to the build system or ocamlfind. Again, I'm not
sure this is a great idea, but at least it would make it clear
that this feature is not related to the module naming system.


Alain
Yaron Minsky
2013-02-26 16:02:59 UTC
Permalink
Post by Alain Frisch
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
I can understand that this is nice, but is it really important?
I think it is important. Part of what we're doing in thinking through
Real World OCaml is to figure out what we can do to make using the
system as easy and painless as possible. Requiring two separate
declarations is asking people to forget one of them (and from the
comments we've gotten from people alpha-testing the book, they make
these mistakes quite a bit.)

The build system I think is just the wrong place to specify this. You
should be able to control your namespace conveniently from the source.
Specifying implicit opens from the build system is just asking for
people to be confused about how their namespace is controlled.
Post by Alain Frisch
Concretely, we are talking about the necessity to add an extra "open
Core_std" statement in addition to the namespace opening statement (or a
corresponding command-line option). Does this extra open statement justify
to make concepts less orthogonal?
My "flat mapping" proposal could be extended to support "implicit opens" in
the same mapping files (in addition to lines like "List = core_list", we
could have "open Core_std" lines). I don't think it is a good idea to do
so, however (and it would require to have two versions of the mapping file,
one to be used inside the implementation of Core itself and another one for
client code -- without the open statement). We could also let the user
specify implicit opens on the command-line, so as to push the problem to the
build system or ocamlfind. Again, I'm not sure this is a great idea to do
so, but at least, it would make it clear that this feature is not related to
the module naming system.
Alain
Christophe TROESTLER
2013-02-26 16:37:15 UTC
Permalink
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
Would these values be able to run arbitrary code (e.g. initializing an
underlying C library and executing some functions)?
Yaron Minsky
2013-02-26 16:48:55 UTC
Permalink
On Tue, Feb 26, 2013 at 11:37 AM, Christophe TROESTLER
Post by Christophe TROESTLER
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
Would these values be able to run arbitrary code (e.g. initializing an
underlying C library and executing some functions)?
I think yes. It's the exact equivalent of opening a module, which
requires the module be linked in, and all its toplevel effects
executed.

y
Christophe TROESTLER
2013-02-26 17:44:27 UTC
Permalink
Post by Yaron Minsky
On Tue, Feb 26, 2013 at 11:37 AM, Christophe TROESTLER
Post by Christophe TROESTLER
Post by Yaron Minsky
One thing I'll say is that it is important to be able to add values,
and not just modules, to the namespace. Open Core.Std also adds
top-level values, as does the traditional standard library (i.e.,
Pervasives), and I don't want to lose that.
Would these values be able to run arbitrary code (e.g. initializing an
underlying C library and executing some functions)?
I think yes. It's the exact equivalent of opening a module, which
requires the module be linked in, and all its toplevel effects
executed.
There is something I do not understand: if we allow toplevel values
(possibly by automatically opening a module), these might require that
some other modules in the namespace are initialized. Will these be
automatically traced? Or will the other modules need to be specified
on the command line? But then, compiling something using, say, Core
will not be as easy as adding "-pkg core". I guess each module can
come with its own dependency file (as we do now with findlib) but then
each use of a module in the namespace will need to be mimicked in the
build system (or the toplevel)...
Xavier Clerc
2013-02-25 21:40:30 UTC
Permalink
----- Mail original -----
Post by Alain Frisch
Post by Xavier Clerc
This is not what I mean.
Packs and namespaces serve two different purposes.
It just happens that the lack of namespace forced
people (including me) to use packs as namespaces.
Why forced?
"forced" was maybe too strong... :-)
Post by Alain Frisch
I haven't seen a lot of libraries relying on -pack
instead
of using unique enough module names (but it's true that I don't use a
lot of third-party libraries). Using the library name as a common
prefix for all its modules (and maybe having a module whose name is
the
library name itself in case of libraries with a clear notion of "main
public module") seems a quite good solution to me and a better one
than
-pack.
Well, it is not a bad solution, but packing allows opening the library
all at once. Not to say that this is a must-have, but I find it convenient
in practice.
Post by Alain Frisch
Maybe this solution is not so good for libraries whose goal
is
to act as a "standard library" (such as Core), because the intention
is
to create the impression that the library is actually part of the
language (I don't have the impression to use a library when I write
String.length or List.map, contrary to when I write Xmlm.make_input);
so
I understand why Jane Street is reluctant to have Core_list.map
everywhere in their code. But would it really be a problem to have
the
users write "open Kaputt_abbreviations" instead of "open
Kaputt.Abbreviations", or Bolt_logger.log instead of Bolt.Logger.log?
For those libraries, with few modules, it could be a decent solution.
But for Barista (with 50+ modules), it is nice to be able to open the
library as mentioned above. Moreover, library code is also made lighter
by packing: in another Bolt module, "Logger.log" is sufficient (there
is no ambiguity), while your solution would mean writing "Bolt_logger.log".

It just occurs to me that packing obviously allows (or enforces)
"closed" namespaces: no one can add a module to the packed module.
Is the ability to "close" a namespace something considered useful?

Xavier
Christophe TROESTLER
2013-02-25 22:33:39 UTC
Permalink
Post by Xavier Clerc
Maybe this solution is not so good for libraries whose goal is to
act as a "standard library" (such as Core), because the intention
is to create the impression that the library is actually part of
the language (I don't have the impression to use a library when I
write String.length or List.map, contrary to when I write
Xmlm.make_input); so I understand why Jane Street is reluctant to
have Core_list.map everywhere in their code. But would it really
be a problem to have the users write "open Kaputt_abbreviations"
instead of "open Kaputt.Abbreviations", or Bolt_logger.log
instead of Bolt.Logger.log?
For those libraries, with few modules, it could be a decent solution.
But for Barista (with 50+ modules), it is nice to be able to open the
library as mentioned above. Moreover, library code is also made lighter
by packing: in another Bolt module, "Logger.log" is sufficient (there
is no ambiguity) while your solution would imply to write "Bolt_logger.log".
It just occurs to me that packing obviously allows (or enforces)
"closed" namespaces: no one can add a module to the packed module.
Is the ability to "close" a namespace something considered useful?
It seems to me that the openness of namespaces is the only feature I
have seen mentioned that modules do not have. But is the openness of
namespaces something considered useful? What problem does this solve?

Cheers,
C.
David Brown
2013-02-26 17:31:53 UTC
Permalink
Post by Christophe TROESTLER
It seems to me that the openness of namespaces is the only feature I
have seen mentioned that modules do not have. But is the openness of
namespaces something considered useful? What problem does this solves?
The simplest case is that it allows different packages to put things
under the same hierarchy. As an example, from Haskell:

Data.List from 'base'
Data.Array from 'array'
Data.Text from 'text' - which is not part of the distribution

without openness, these would all need a different prefix.

Admittedly, Haskell has it much easier, since modules are very simple,
and cannot be manipulated within the language, other than the contents
of a particular file being a single module.

David
David Brown
2013-02-26 18:22:53 UTC
Permalink
Post by David Brown
Post by Christophe TROESTLER
It seems to me that the openness of namespaces is the only feature I
have seen mentioned that modules do not have. But is the openness of
namespaces something considered useful? What problem does this solves?
The simplest case is that it allows different packets to put things in
Data.List from 'base'
Data.Array from 'array'
Data.Text from 'text' - which is not part of the distribution
without openness, these would all need a different prefix.
Right but you say what openness is, not why you want it. IMHO, it is
unlikely you want to say "open namespace Data"; it is more a
convenience for documentation and can be handled at another level.
I suppose you could call it a convenience. But, it allows modules to be
named based on what they do and where they belong, rather than needing
prefixes to distinguish them. It has the disadvantage that you can't
tell from the name which package a given module comes from. But, after
an open, this is, in general, the case anyway.

David